Hi,

I've made a "probability" histogram for my data, based on 14000 datapoints in total, BUT the bins do not all contain the same number of points (e.g. bin 1 might be composed of 200 datapoints while bin 50 has only 3). You can find it in image 1. Now, based on those relative frequencies, I constructed the weighted average of each bin independently and plotted it up to a maximum of 1.7 on the x-axis (this results in 34 bins). This can be found in image 2.

There are a couple things to note, though. I've attached a third image, and this essentially just shows a bunch of random points plotted, before they were ever put into bins or averaged (my apologies for the different scales used in each of the images). All these points have an error in the independent variable that is not constant (not even in a particular bin), but none in the dependent variable. Thus, when computing the average x-error in each particular bin, I used:

## \sigma_x = \frac{\sqrt{\sum_{n=1}^{N} \sigma_{x,n}^2}}{N} ## where ## N ## = length(bin) is simply the number of elements in that particular bin and ## \sigma_{x,n} ## is the x-error of the n-th point in it. Having calculated an x-error for each of the 34 independent bins in this way, I also want to find the y-error that results from combining the datapoints in each bin. I did this by the method shown here (i.e. from Cochran (1977)): http://stats.stackexchange.com/questions/25895/computing-standard-error-in-weighted-mean-estimation.
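To make the two per-bin calculations concrete, here is a rough Python sketch of what I mean. The function names (`bin_x_error`, `weighted_mean_se`) and argument names are made up for illustration, and the second function is my reading of the ratio-variance formula given in the linked answer (attributed there to Cochran (1977)), so treat it as an assumption rather than a reference implementation.

```python
import numpy as np

def bin_x_error(sigma_x):
    """x-error of one bin: sqrt(sum of squared point errors) / number of points."""
    sigma_x = np.asarray(sigma_x, dtype=float)
    return np.sqrt(np.sum(sigma_x**2)) / sigma_x.size

def weighted_mean_se(values, weights):
    """Weighted mean of one bin and its approximate standard error.

    Assumed form of the ratio-variance approximation from the linked answer;
    needs at least two points in the bin.
    """
    v = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = v.size
    w_bar = w.mean()                       # mean weight in the bin
    mean_w = np.sum(w * v) / np.sum(w)     # weighted mean of the values
    a = w * v - w_bar * mean_w
    b = w - w_bar
    var = n / ((n - 1) * np.sum(w)**2) * (
        np.sum(a**2) - 2.0 * mean_w * np.sum(a * b) + mean_w**2 * np.sum(b**2)
    )
    return mean_w, np.sqrt(var)
```

With equal weights the second function reduces to the usual standard error of the mean, s / sqrt(n), which is a quick sanity check. A bin with a single point has no defined error under this formula.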

Now, by doing this, I have calculated 34 y-errors and 34 x-errors, one corresponding to each bin's datapoint for the plot of the weighted sum.

Now I am simply wondering, have I done anything wrong? I believe I've essentially been treating the x- and y-errors independently, but is that the correct approach in this case? For example, the y-error depends on which samples fall in each bin, while the error in x means that some of those samples may really belong outside the bin's limits. Shouldn't this be accounted for somehow in the error in y, or shouldn't there be an additional covariance parameter? If so, how exactly would I go about doing that?

I am ultimately then trying to fit a linear function to this data by orthogonal distance regression. Any suggestions and help with regards to what I've done (possibly incorrectly) and maybe better formulas to use would be greatly appreciated!
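For reference, the fit I have in mind would look roughly like the sketch below, using `scipy.odr`. The helper name `fit_line_odr` and the starting guess `beta0` are placeholders of mine; `x`, `y`, `sx`, `sy` stand for the 34 binned values and their per-bin errors.

```python
import numpy as np
from scipy import odr

def linear(beta, x):
    """Straight line y = beta[0] * x + beta[1]."""
    return beta[0] * x + beta[1]

def fit_line_odr(x, y, sx, sy, beta0=(1.0, 0.0)):
    """Fit a line by orthogonal distance regression with errors in x and y.

    sx and sy are per-point standard deviations; RealData converts them
    into weights for the fit.
    """
    data = odr.RealData(x, y, sx=sx, sy=sy)
    model = odr.Model(linear)
    result = odr.ODR(data, model, beta0=list(beta0)).run()
    # result.beta holds (slope, intercept); result.sd_beta their standard errors
    return result.beta, result.sd_beta
```

Note that this treats the x- and y-errors of each bin as uncorrelated, which is exactly the assumption I am unsure about.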

**Physics Forums - The Fusion of Science and Community**

# Computing the standard deviation in y
