# Propagating uncertainties through Gaussian fit

1. Jun 8, 2015

### McCoy13

I'm doing an analysis where I have a set of random variables with some known uncertainties (the uncertainties are different for each random variable). The random variable is roughly Gaussian distributed. I'd like to get a meaningful characteristic value and uncertainty for the whole set. I can imagine two ways of doing this:

1) Take the mean of the data, and propagate the uncertainties as $\sigma=\frac{1}{N}\sqrt{\sum_i \sigma_{x_i}^2}$. However, propagating the uncertainty this way doesn't take into account the spread in the data set: only the values of $\sigma_{x_i}$ matter, not the values of $x_i$, whether they are broadly or narrowly distributed. Therefore this $\sigma$ is not representative of the data.

2) Fit a Gaussian to the histogram of the data. This gives me a $\sigma$ that is characteristic of the spread. However, it does not propagate the uncertainties I already know about my random variable.

What I'd like to do is propagate my uncertainties through the Gaussian fit, but I don't know how to do this and haven't been able to find a method to do so.
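A minimal numerical sketch of the mismatch between the two options (Python/NumPy; the simulated spread and per-point uncertainty are hypothetical, chosen to make the effect obvious):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated measurements: broad spread but small per-point uncertainties.
x = rng.normal(loc=10.0, scale=2.0, size=500)   # true spread ~ 2
sigma_x = np.full_like(x, 0.05)                 # instrument precision

# Option 1: mean with propagated uncertainty (ignores the spread).
mean = x.mean()
sigma_prop = np.sqrt(np.sum(sigma_x**2)) / len(x)

# Option 2: sample standard deviation (ignores the known uncertainties).
sigma_spread = x.std(ddof=1)

print(mean, sigma_prop, sigma_spread)
```

Here `sigma_prop` comes out around 0.002 while `sigma_spread` is near 2, so the two notions of "uncertainty" differ by three orders of magnitude on the same data.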

2. Jun 8, 2015

### gleem

I may be missing something or you may not be telling me everything but the uncertainties should be a measure of the spread in the random variables.

3. Jun 8, 2015

### McCoy13

The uncertainties are a result of the precision of my measurement. If I were measuring lengths of rods, it'd be how precise my ruler is. However, I could also have a very broad or very narrow distribution of rod length. These two quantities (the uncertainty and the spread) are unrelated.

4. Jun 8, 2015

### gleem

So you want to combine the uncertainties for rods of different lengths?

5. Jun 8, 2015

### McCoy13

I'm not sure, because your wording is ambiguous. I'll try to be as clear as possible (don't be offended if some of this is elementary).

If you are making measurements, your apparatus has some limited precision, so your measurement has some uncertainty. For a quantity you're interested in, $f$, which is a function of your set of measurements $x_i$, you can find the uncertainty of $f$ as $$\sigma_f = \sqrt{\sum_i \left(\frac{\partial f}{\partial x_i}\right)^2\sigma_{x_i}^2}.$$

Option #1 as in my original post is taking f to be the mean. However, since $\frac{\partial f}{\partial x_i}=\frac{1}{N}$, $\sigma_f$ doesn't depend on the values of $x_i$ at all, only on the values of $\sigma_{x_i}$. Therefore, one could easily imagine having a large spread in $x_i$ but small values of $\sigma_{x_i}$, in which case the resulting $\sigma_f$ is very small compared to the spread of measurements, and therefore is not a representative value for the data.

Option #2 doesn't use error propagation at all; it totally ignores the fact that my measurements have an uncertainty. The $\sigma$ value I get out of this method only cares about $x_i$ and doesn't take $\sigma_{x_i}$ into account.

In essence, I know two things are happening when I take a measurement: 1) my measurement tool is imprecise so I can't be sure of the $x_i$ I measure and 2) the physical process I measure has actual variability and therefore the values of $x_i$ should not all be the same. I want to have one number that captures both these aspects.
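The general propagation formula above can be sanity-checked against a Monte Carlo simulation. A sketch for a hypothetical nonlinear function $f(x_1, x_2) = x_1 x_2$ (the values and uncertainties are made up):

```python
import numpy as np

rng = np.random.default_rng(1)

x1, s1 = 3.0, 0.1   # measurement 1 and its uncertainty
x2, s2 = 5.0, 0.2   # measurement 2 and its uncertainty

# First-order propagation: df/dx1 = x2, df/dx2 = x1
sigma_f = np.sqrt((x2 * s1)**2 + (x1 * s2)**2)

# Monte Carlo check: sample the inputs and look at the spread of f.
f_samples = rng.normal(x1, s1, 100_000) * rng.normal(x2, s2, 100_000)
print(sigma_f, f_samples.std())
```

The linearized estimate and the simulated spread agree closely here because the relative uncertainties are small; for large relative uncertainties the first-order formula starts to break down.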

6. Jun 9, 2015

### gleem

If $\sigma_x$ is the uncertainty in your measurement apparatus and the spread in the measurements is greater than this, then there is another factor (or factors) affecting the measurement. A simple scheme is this: if, for example, you are weighing a small sample of objects (say grains of sand from a particular beach) and you want to determine the mean and the uncertainty in that value, then you must determine the sample standard deviation $\sigma_s=\sqrt{\sum_i(x_i-\overline x)^2/(n-1)}$, where $\overline x=\sum_i x_i/n$ is the sample mean. The total uncertainty will then be
$\sigma=\sqrt{\sigma_x^2+\sigma_s^2}$.
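A short sketch of this scheme (Python/NumPy; the measurement values and apparatus uncertainty are hypothetical):

```python
import numpy as np

x = np.array([10.1, 9.8, 10.4, 9.6, 10.2])  # hypothetical measurements
sigma_x = 0.05                               # apparatus uncertainty

x_bar = x.mean()
# Sample standard deviation, with the n-1 (Bessel) correction.
sigma_s = np.sqrt(np.sum((x - x_bar)**2) / (len(x) - 1))
# Combine apparatus uncertainty and sample spread in quadrature.
sigma_total = np.sqrt(sigma_x**2 + sigma_s**2)

print(x_bar, sigma_s, sigma_total)
```

Note that `sigma_s` is just `np.std(x, ddof=1)`; when the spread dominates the apparatus precision, as here, `sigma_total` is barely larger than `sigma_s`.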

7. Jun 11, 2015

### Stephen Tashi

It isn't clear whether you want to do "statistical estimation" or a theoretical calculation (or perhaps an "estimation" that also incorporates a theoretical calculation). If you have numerical realizations of a random variable, you can "estimate" parameters of its distribution (e.g. its standard deviation, which is what I assume you mean by "uncertainty"). On the other hand, if you "know" the parameters of several random variables, you can do a theoretical calculation and determine the parameters of the distribution of a function of those random variables.

People speak of "knowing" or "finding" the mean and standard deviation of a random variable from data, but this is not precise terminology. The only thing you can "know" or "find" from data is a sample mean and sample standard deviation.

8. Jun 11, 2015

### McCoy13

Maybe being more specific will help.

I am analyzing image data. Basically I have images of bright rings on a dark background from several samples prepared with differing concentrations of a chemical reagent. I want to plot the characteristic intensity of the rings against the concentration, including error bars on the data. My question is about how big these error bars should be.

The way I measure the intensities is by using a circle finding algorithm to identify the rings in the image, and then taking a number of sample pixels along the circumference of the ring, reading off the intensities. For each ring in the image I have on the order of a hundred intensity values. There is significant spread in the samples along each ring, i.e. this data is rather noisy. For each concentration I sample several hundred rings.

What I was doing was breaking down the analysis into two steps. I measure the intensities of pixels comprising an individual ring. Then I take the mean and standard deviation of those samples for that ring and note them down. Then what I was doing was to take a histogram of the mean intensity values of all the rings and performing a Gaussian fit. The output parameters of the Gaussian fit are the mean and the variance, but the variance only depends on the mean intensities of the individual rings, and the variance in the samples along individual rings is ignored.

An obvious solution that I didn't think of until just now is to fit a histogram of all the pixel values rather than taking a mean ring by ring.
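That pooled approach can be sketched with a toy two-level model (Python/NumPy; all numbers are hypothetical, and it assumes the pixel samples within a ring are independent, which may not hold for real image data):

```python
import numpy as np

rng = np.random.default_rng(2)

# Between-ring variability: each ring has its own mean intensity.
ring_means = rng.normal(100.0, 5.0, size=300)
# Within-ring noise: ~100 pixel samples per ring.
pixels = [rng.normal(m, 10.0, size=100) for m in ring_means]

all_pixels = np.concatenate(pixels)
per_ring_means = np.array([p.mean() for p in pixels])

# Pooling all pixels captures both variance sources
# (law of total variance: 5**2 + 10**2 = 125, so std ~ 11.2),
# while the spread of ring means mostly reflects the between-ring term.
print(all_pixels.std(ddof=1), per_ring_means.std(ddof=1))
```

So the pooled histogram's width folds in the within-ring noise, whereas the histogram of ring means averages it away by roughly a factor of the per-ring sample size.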

9. Jun 11, 2015

### Stephen Tashi

The pixel intensity values in a single ring (for a given concentration) may not be independent random variables. For example, does a bright pixel tend to have other bright pixels adjacent to it?

10. Jun 11, 2015

### McCoy13

Yes. If you plot the pixel intensities against their angle, you can easily see that it's not just random scatter.

11. Jun 11, 2015

### Stephen Tashi

Assuming that your definition of the "characteristic" value of intensity is the mean value of pixel intensity (for a given concentration) then this theoretical mean value is a population parameter. You can estimate it by taking the mean of the individual mean values for each of your samples. The simplest way to estimate the standard deviation of the sample means is to do what you did - just compute the standard deviation of the sample means.
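A sketch of the estimate described above, using a hypothetical array of per-ring mean intensities (Python/NumPy; whether the error bar should be the spread of the means or the standard error of their average depends on what the error bar is meant to convey):

```python
import numpy as np

rng = np.random.default_rng(3)
ring_means = rng.normal(100.0, 5.0, size=300)  # hypothetical per-ring means

# Characteristic intensity: mean of the per-ring means.
estimate = ring_means.mean()
# Spread of the sample means, as suggested above.
std_of_means = ring_means.std(ddof=1)
# Standard error of the estimate itself.
error_bar = std_of_means / np.sqrt(len(ring_means))

print(estimate, std_of_means, error_bar)
```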

It's tempting to think that one can make a better estimate of the sample standard deviation by doing some calculation that involves the intensities of each individual pixel. If you want to go deeply into that, you should create a probability model for how the data is generated (one that explains the dependence between adjacent pixels). Applying statistics always involves assuming a probability model. Many people try to avoid discussing a probability model and talk only in terms of applying the blah-blah test or method. However, the suitability of tests and methods involves assumptions about probability models.

Incorporate what you know about the physics of the problem in the model. When applying statistics to a physical situation, you can't expect the answers to come from "math". Some of them have to come from physics.
