TheCanadian

Hi,

I've made a "probability" histogram of my data. It's based on 14000 data points in total, BUT the bins are not equally populated (e.g. bin 1 might contain 200 data points while bin 50 contains only 3). You can find it in image 1. Based on those relative frequencies, I then computed the average weight of each bin independently and plotted it up to a maximum of 1.7 on the x-axis (this results in 34 bins). This can be found in image 2.
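For reference, the binning step described above might look like the following minimal NumPy sketch. It assumes 34 equal-width bins starting at 0 (width 1.7/34 = 0.05) and uses the plain mean of y in each bin as a stand-in, since the exact weighting isn't specified here; `bin_data` is a hypothetical helper name.

```python
import numpy as np

def bin_data(x, y, x_max=1.7, n_bins=34):
    """Assign points to equal-width bins on [0, x_max] and return per-bin stats.

    Assumption: bins start at 0 and share one width (1.7 / 34 = 0.05).
    Points outside [0, x_max] are ignored.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    edges = np.linspace(0.0, x_max, n_bins + 1)
    counts, _ = np.histogram(x, bins=edges)
    rel_freq = counts / counts.sum()  # fraction of in-range points per bin
    # np.digitize gives 1..n_bins for in-range points; shift to 0-based indices
    idx = np.digitize(x, edges) - 1
    # np.histogram puts x == x_max in the last bin; do the same here
    idx[x == x_max] = n_bins - 1
    mean_y = np.full(n_bins, np.nan)  # NaN marks empty bins
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            mean_y[b] = y[in_bin].mean()  # stand-in for the post's weighting
    return edges, counts, rel_freq, mean_y
```

The per-bin `counts` make the unequal population explicit, which is what later drives the very different error bars between well-filled and sparse bins.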

There are a couple of things to note, though. I've attached a third image, which simply shows a random selection of the raw points before they were ever binned or averaged (my apologies for the different scales used in each of the images). All these points have an error in the independent variable (x) that is not constant, not even within a particular bin, but no error in the dependent variable (y). So, when computing the average x-error in each bin, I used:

##\bar{\sigma}_x = \frac{\sqrt{\sum_{n=1}^{N} \sigma_{x,n}^2}}{N}## where ##N## is the number of points in that particular bin and ##\sigma_{x,n}## is the x-error of the ##n##-th point. Having computed an x-error this way for each of the 34 independent bins, I also wanted the y-error that results from combining the points in each bin. I did this using the method shown here (i.e. from Cochran (1977)): http://stats.stackexchange.com/questions/25895/computing-standard-error-in-weighted-mean-estimation.
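A sketch of both per-bin errors, assuming the per-point x-errors and some per-point weights are available as arrays for one bin. `bin_x_error` implements the formula above; `weighted_mean_se` is one reading of the Cochran (1977) ratio-estimator expression discussed in the linked thread. Both names are hypothetical, and which weights are appropriate for this data is left open.

```python
import numpy as np

def bin_x_error(sigma_x):
    """Propagated error of a bin's mean x: sqrt(sum of sigma_n^2) / N."""
    sigma_x = np.asarray(sigma_x, dtype=float)
    return np.sqrt(np.sum(sigma_x ** 2)) / sigma_x.size

def weighted_mean_se(values, weights):
    """Weighted mean of one bin and its standard error (Cochran-style
    ratio-estimator variance). Needs at least two points."""
    x = np.asarray(values, dtype=float)
    p = np.asarray(weights, dtype=float)
    n = x.size
    if n < 2:
        raise ValueError("need at least two points in the bin")
    p_bar = p.mean()
    x_w = np.sum(p * x) / np.sum(p)  # weighted mean of the bin
    term1 = np.sum((p * x - p_bar * x_w) ** 2)
    term2 = -2.0 * x_w * np.sum((p - p_bar) * (p * x - p_bar * x_w))
    term3 = x_w ** 2 * np.sum((p - p_bar) ** 2)
    var = n / ((n - 1) * (n * p_bar) ** 2) * (term1 + term2 + term3)
    return x_w, np.sqrt(var)
```

A quick sanity check: with equal weights, `term2` and `term3` vanish and the variance reduces to the familiar ##s^2/N##, i.e. the ordinary standard error of the mean.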

This gives me 34 x-errors and 34 y-errors, one pair for each bin's point in the plot of the weighted sum.

Now I am simply wondering: have I done anything wrong? I believe I've essentially been treating the x- and y-errors as independent, but is that the correct approach here? For example, the y-error depends on which samples fall in each bin, while the error in x indicates how likely a point is to actually lie outside that bin's limits. Shouldn't this be accounted for somehow in the y-error, or shouldn't there be an additional covariance term? If so, how exactly would I go about doing that?
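One possible way to probe the bin-migration effect numerically (a Monte Carlo technique swapped in here, not what the post did) is to jitter every raw x by its own error, re-bin with the same edges, recompute the bin means, and repeat. The spread over trials gives each bin a 2×2 covariance matrix for (mean x, mean y). This sketch assumes independent Gaussian x-errors and unweighted bin means; `mc_bin_covariance` is a hypothetical name.

```python
import numpy as np

def mc_bin_covariance(x, sigma_x, y, edges, n_trials=1000, seed=0):
    """Per-bin covariance of (mean x, mean y) caused by x-errors moving
    points across bin edges. Returns an array of shape (n_bins, 2, 2)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    sigma_x = np.asarray(sigma_x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_bins = len(edges) - 1
    mx = np.full((n_trials, n_bins), np.nan)
    my = np.full((n_trials, n_bins), np.nan)
    for t in range(n_trials):
        xs = x + rng.normal(0.0, sigma_x)  # jitter each x by its own sigma
        idx = np.digitize(xs, edges) - 1   # 0-based bin index after jitter
        valid = (idx >= 0) & (idx < n_bins)
        cnt = np.bincount(idx[valid], minlength=n_bins)
        sum_x = np.bincount(idx[valid], weights=xs[valid], minlength=n_bins)
        sum_y = np.bincount(idx[valid], weights=y[valid], minlength=n_bins)
        with np.errstate(invalid="ignore", divide="ignore"):
            mx[t] = sum_x / cnt  # empty bins become NaN
            my[t] = sum_y / cnt
    cov = np.full((n_bins, 2, 2), np.nan)
    for b in range(n_bins):
        ok = ~np.isnan(mx[:, b])  # skip trials where the bin was empty
        if ok.sum() > 1:
            cov[b] = np.cov(mx[ok, b], my[ok, b])
    return cov
```

Note that this only captures the part of the uncertainty coming from the x-errors; the sampling spread of y within each bin (the Cochran-type error) would still have to be combined with it.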

I am ultimately trying to fit a linear function to this data by orthogonal distance regression. Any suggestions or help with regard to what I've done (possibly incorrectly), and maybe better formulas to use, would be greatly appreciated!
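For the final step, SciPy's `scipy.odr` module (a wrapper around ODRPACK) can perform orthogonal distance regression with per-point x and y standard deviations. A minimal sketch, assuming the 34 per-bin errors are treated as independent `sx` and `sy`; `linear` and `fit_line_odr` are illustrative names, and the starting guess `beta0` is arbitrary.

```python
import numpy as np
from scipy import odr

def linear(beta, x):
    # straight line y = beta[0] * x + beta[1]
    return beta[0] * x + beta[1]

def fit_line_odr(x, y, sx, sy, beta0=(1.0, 0.0)):
    """ODR fit of a straight line, using per-bin x and y errors as
    standard deviations (no x-y correlation in this sketch)."""
    data = odr.RealData(x, y, sx=sx, sy=sy)
    model = odr.Model(linear)
    out = odr.ODR(data, model, beta0=list(beta0)).run()
    return out.beta, out.sd_beta
```

Bins with large errors (e.g. the sparsely populated ones) are automatically down-weighted, since ODRPACK weights each point by the inverse variances derived from `sx` and `sy`.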

