Propagating uncertainties through Gaussian fit

SUMMARY

This discussion focuses on propagating uncertainties through Gaussian fits in the context of analyzing random variables with known uncertainties. Two primary methods are considered: calculating the mean and propagating uncertainties using the formula \(\sigma_f = \sqrt{\Sigma_i (\frac{\partial f}{\partial x_i})^2\sigma_{x_i}^2}\), and fitting a Gaussian to the histogram of data, which does not account for measurement uncertainties. The participants conclude that a combined approach, incorporating both the spread of measurements and the uncertainties, is necessary for accurate analysis. A suggested method involves fitting a histogram of all pixel values instead of averaging individual ring measurements.

PREREQUISITES
  • Understanding of Gaussian distribution and its properties
  • Familiarity with uncertainty propagation techniques
  • Knowledge of statistical estimation methods
  • Experience with image analysis and pixel intensity measurement
NEXT STEPS
  • Research Gaussian fitting techniques in Python using libraries like SciPy
  • Learn about uncertainty propagation in complex functions
  • Explore statistical models for correlated data, particularly in image analysis
  • Investigate methods for fitting histograms to noisy data distributions
USEFUL FOR

Researchers and analysts in fields such as physics, image processing, and statistics who are looking to accurately propagate uncertainties in their measurements and improve data analysis methodologies.

McCoy13
I'm doing an analysis where I have a set of random variables with some known uncertainties (the uncertainties are different for each random variable). The random variable is roughly Gaussian distributed. I'd like to get a meaningful characteristic value and uncertainty for the whole set. I can imagine two ways of doing this:

1) Take the mean of the data, and propagate the uncertainties as \sigma=\frac{1}{N}\sqrt{\Sigma_i \sigma_{x_i}^2}. However, propagating the uncertainty this way doesn't take into account the spread in the data set, i.e. only the values of \sigma_{x_i} matter, not the values of x_i, whether they are broadly or narrowly distributed. Therefore this \sigma is not representative of the data.

2) Fit a Gaussian to the histogram of the data. This gives me a \sigma that is characteristic of the spread. However, it does not propagate the uncertainties I already know about my random variable.

What I'd like to do is propagate my uncertainties through the Gaussian fit, but I don't know how to do this and haven't been able to find a method to do so.
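The contrast between the two methods can be seen numerically. The sketch below uses simulated data (all numbers are illustrative, not from the actual analysis): values with a broad physical spread but small per-measurement uncertainties.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical data: N measurements with a broad physical spread (~2.0)
# but small per-measurement uncertainties sigma_i (~0.05-0.10).
N = 1000
x = rng.normal(loc=10.0, scale=2.0, size=N)
sigma_i = rng.uniform(0.05, 0.10, size=N)

# Method 1: mean with propagated uncertainty.
# This ignores the spread of the x_i entirely.
mean = x.mean()
sigma_mean = np.sqrt(np.sum(sigma_i**2)) / N

# Method 2: maximum-likelihood Gaussian fit to the data.
# This ignores the sigma_i entirely.
mu_fit, sigma_fit = norm.fit(x)

print(mean, sigma_mean)    # sigma_mean is tiny (~0.002)
print(mu_fit, sigma_fit)   # sigma_fit tracks the spread (~2.0)
```

The two sigmas differ by three orders of magnitude here, which is exactly the mismatch described above: neither one alone characterizes the data.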
 
I may be missing something, or you may not be telling me everything, but the uncertainties should be a measure of the spread in the random variables.
 
The uncertainties are a result of the precision of my measurement. If I were measuring lengths of rods, it'd be how precise my ruler is. However, I could also have a very broad or very narrow distribution of rod length. These two quantities (the uncertainty and the spread) are unrelated.
 
So you want to combine the uncertainties for rods of different lengths?
 
I'm not sure, because your wording is ambiguous. I'll try to be as clear as possible (don't be offended if some of this is elementary).

If you are making measurements, your apparatus has some limited precision, so your measurement has some uncertainty. For a quantity you're interested in, f, which is a function of your set of measurements x_i, you can find the uncertainty of f as \sigma_f = \sqrt{\Sigma_i (\frac{\partial f}{\partial x_i})^2\sigma_{x_i}^2}.

Option #1 as in my original post is taking f to be the mean. However, since \frac{\partial f}{\partial x_i}=\frac{1}{N}, \sigma_f doesn't depend on the values of x_i at all, only on the values of \sigma_{x_i}. Therefore, one could easily imagine having a large spread in x_i but small values of \sigma_{x_i}, in which case the resulting \sigma_f is very small compared to the spread of measurements, and therefore is not a representative value for the data.

Option #2 doesn't use error propagation at all; it totally ignores the fact that my measurements have an uncertainty. The \sigma value I get out of this method only cares about x_i and doesn't take \sigma_{x_i} into account.

In essence, I know two things are happening when I take a measurement: 1) my measurement tool is imprecise so I can't be sure of the x_i I measure and 2) the physical process I measure has actual variability and therefore the values of x_i should not all be the same. I want to have one number that captures both these aspects.
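The propagation formula above can be applied to any function of the measurements. A minimal sketch, using central finite differences to estimate the partial derivatives (the function and values are illustrative):

```python
import numpy as np

def propagate(f, x, sigma_x, eps=1e-6):
    """Propagate uncertainties through f using
    sigma_f = sqrt(sum_i (df/dx_i)^2 * sigma_i^2),
    with df/dx_i estimated by central finite differences."""
    x = np.asarray(x, dtype=float)
    sigma_x = np.asarray(sigma_x, dtype=float)
    grad = np.empty_like(x)
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        grad[i] = (f(x + dx) - f(x - dx)) / (2 * eps)
    return np.sqrt(np.sum(grad**2 * sigma_x**2))

# For f = mean, each df/dx_i = 1/N, so sigma_f = sqrt(sum sigma_i^2)/N,
# independent of how widely the x_i themselves are spread out.
x = [1.0, 5.0, 9.0]          # widely spread values
sigma_x = [0.1, 0.1, 0.1]    # small, identical uncertainties
sf = propagate(np.mean, x, sigma_x)
print(sf)                    # 0.1/sqrt(3), regardless of the spread in x
</imports>```

This makes the point concrete: the spread of the x_i never enters sigma_f for the mean, only the sigma_{x_i} do.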
 
If ##\sigma_x## is the uncertainty in your measurement apparatus and the spread in the measurements is greater than this, then there is another factor (or factors) affecting the measurement. A simple scheme is this: for example, if you are weighing a small sample of objects for which you want to determine the mean and the uncertainty in that value (say, grains of sand on a particular beach), then you must determine the sample standard deviation ##\sigma_s=\sqrt{\frac{\sum_i(x_i-\overline{x})^2}{n-1}}##, where ##\overline{x}## is the sample mean, ##\overline{x}=\frac{\sum_i x_i}{n}##. The total uncertainty will then be
##\sigma=\sqrt{\sigma_x^2+\sigma_s^2}##.
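This quadrature combination is straightforward to compute. A small sketch with made-up weighing data (the numbers are illustrative):

```python
import numpy as np

# Hypothetical repeated weighings, with apparatus precision sigma_x.
x = np.array([10.2, 9.8, 10.5, 9.9, 10.1, 10.4])
sigma_x = 0.05

xbar = x.mean()
sigma_s = x.std(ddof=1)                   # sample standard deviation (n-1)
sigma_total = np.hypot(sigma_x, sigma_s)  # sqrt(sigma_x^2 + sigma_s^2)
print(xbar, sigma_s, sigma_total)
```

Note that when the spread dominates the instrument precision, as here, the combined sigma is nearly equal to the sample standard deviation alone.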
 
McCoy13 said:
I'm doing an analysis where I have a set of random variables with some known uncertainties (the uncertainties are different for each random variable). The random variable is roughly Gaussian distributed. I'd like to get a meaningful characteristic value and uncertainty for the whole set.

It isn't clear whether you want to do "statistical estimation" or a theoretical calculation (or perhaps an "estimation" that also incorporates a theoretical calculation). If you have numerical realizations of a random variable, you can "estimate" parameters of its distribution (e.g. its standard deviation, which is what I assume you mean by "uncertainty"). On the other hand, if you "know" the parameters of several random variables, you can do a theoretical calculation and determine the parameters of the distribution of a function of those random variables.

People speak of "knowing" or "finding" the mean and standard deviation of a random variable from data, but this is not precise terminology. The only thing you can "know" or "find" from data is a sample mean and a sample standard deviation.
 
Maybe being more specific will help.

I am analyzing image data. Basically I have images of bright rings on a dark background from several samples prepared with differing concentrations of a chemical reagent. I want to plot the characteristic intensity of the rings against the concentration, including error bars on the data. My question is about how big these error bars should be.

The way I measure the intensities is by using a circle-finding algorithm to identify the rings in the image, and then taking a number of sample pixels along the circumference of each ring, reading off the intensities. For each ring in the image I have on the order of a hundred intensity values. There is significant spread in the samples along each ring, i.e. this data is rather noisy. For each concentration I sample several hundred rings.

What I was doing was breaking the analysis into two steps. First, I measure the intensities of the pixels comprising an individual ring, then take the mean and standard deviation of those samples and note them down. Second, I take a histogram of the mean intensity values of all the rings and perform a Gaussian fit. The output parameters of the fit are the mean and the variance, but the variance only depends on the mean intensities of the individual rings; the variance of the samples along each individual ring is ignored.

An obvious solution that I didn't think of until just now is to fit a histogram of all the pixel values rather than taking a mean ring by ring.
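A sketch of that pooled approach, using simulated stand-in data (the ring counts, intensities, and noise levels below are illustrative, not the real measurements):

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)

# Simulated stand-in: several hundred rings, ~100 pixel samples each,
# with ring-to-ring spread (5.0) and within-ring pixel noise (8.0).
n_rings, n_pix = 300, 100
ring_means = rng.normal(100.0, 5.0, size=n_rings)
pixels = ring_means[:, None] + rng.normal(0.0, 8.0, size=(n_rings, n_pix))

# Pool every pixel value and fit a Gaussian to the histogram,
# instead of histogramming only the per-ring means.
counts, edges = np.histogram(pixels.ravel(), bins=80, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

(mu, sigma), _ = curve_fit(gauss, centers, counts,
                           p0=[pixels.mean(), pixels.std()])
print(mu, sigma)  # sigma now reflects both ring-to-ring and within-ring spread
```

With these inputs the fitted sigma comes out near sqrt(5^2 + 8^2) ≈ 9.4, i.e. the two sources of variation combine in quadrature, which is what the pooled histogram captures and the ring-mean histogram does not.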
 
McCoy13 said:
An obvious solution that I didn't think of until just now is to fit a histogram of all the pixel values rather than taking a mean ring by ring.

The pixel intensity values in a single ring (for a given concentration) may not be independent random variables. For example, does a bright pixel tend to have other bright pixels adjacent to it?
 
Yes. If you plot the pixel intensities against their angle, you can easily see that it's not just random scatter.
 
Assuming that your definition of the "characteristic" value of intensity is the mean value of pixel intensity (for a given concentration) then this theoretical mean value is a population parameter. You can estimate it by taking the mean of the individual mean values for each of your samples. The simplest way to estimate the standard deviation of the sample means is to do what you did - just compute the standard deviation of the sample means.
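That estimation procedure is a few lines of code. A minimal sketch on simulated per-ring means (the values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical per-ring mean intensities for one concentration.
ring_means = rng.normal(100.0, 5.0, size=300)

# Estimate the population mean by the mean of the per-ring means;
# take the standard deviation of those means as the spread estimate,
# and divide by sqrt(n) for the standard error of the mean itself.
mu_hat = ring_means.mean()
sd_means = ring_means.std(ddof=1)
sem = sd_means / np.sqrt(ring_means.size)
print(mu_hat, sd_means, sem)
```

Whether the error bar should be sd_means or sem depends on whether it is meant to show the spread of ring intensities or the uncertainty of the estimated mean.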

It's tempting to think that one can make a better estimate of the sample standard deviation by doing some calculation that involves the intensities of each individual pixel. If you want to go deeply into that, you should create a probability model for how the data is generated (one that explains the dependence between adjacent pixels). Applying statistics always involves assuming a probability model. Many people try to avoid discussing a probability model and talk only in terms of applying the blah-blah test or method. However, the suitability of tests and methods involves assumptions about probability models.

Incorporate what you know about the physics of the problem in the model. When applying statistics to a physical situation, you can't expect the answers to come from "math". Some of them have to come from physics.
 
