Propagating uncertainties through Gaussian fit

In summary, the conversation is about finding a meaningful characteristic value and uncertainty for a set of random variables that are roughly Gaussian distributed. Two possible ways of doing this are discussed: taking the mean and propagating the uncertainties, or fitting a Gaussian to the histogram of the data. However, both methods have limitations and the speaker is looking for a way to combine the uncertainties and spread of the data to get a more representative value. The conversation then delves into specific examples and calculations related to image data analysis.
  • #1
McCoy13
I'm doing an analysis where I have a set of random variables with some known uncertainties (the uncertainties are different for each random variable). The random variable is roughly Gaussian distributed. I'd like to get a meaningful characteristic value and uncertainty for the whole set. I can imagine two ways of doing this:

1) Take the mean of the data, and propagate the uncertainties as [itex]\sigma=\frac{1}{N}\sqrt{\Sigma_i \sigma_{x_i}^2}[/itex]. However, propagating the uncertainty this way doesn't take into account the spread in the data set, i.e. only the values of [itex]\sigma_{x_i}[/itex] matter, not the values of [itex]x_i[/itex], whether they are broadly or narrowly distributed. Therefore this [itex]\sigma[/itex] is not representative of the data.

2) Fit a Gaussian to the histogram of the data. This gives me a [itex]\sigma[/itex] that is characteristic of the spread. However, it does not propagate the uncertainties I already know about my random variable.

What I'd like to do is propagate my uncertainties through the Gaussian fit, but I don't know how to do this and haven't been able to find a method to do so.
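
In case it helps to see the two options side by side, here is a minimal Python sketch (assuming numpy and scipy are available; the data values are invented for illustration):

[code]
import numpy as np
from scipy.stats import norm

# Illustrative data: measured values x_i with per-measurement uncertainties sigma_i.
x = np.array([9.8, 10.4, 10.1, 9.6, 10.3, 9.9])
sigma_x = np.array([0.05, 0.04, 0.06, 0.05, 0.05, 0.04])

# Option 1: mean of the data, uncertainty from error propagation.
# Note that sigma_mean depends only on the sigma_i, not on the spread of the x_i.
mean = x.mean()
sigma_mean = np.sqrt(np.sum(sigma_x**2)) / len(x)

# Option 2: characterize the spread of the data directly.
# With only a handful of points a histogram fit is unstable, so this fits the
# Gaussian by maximum likelihood; for large samples it is equivalent to reading
# mu and sigma off a Gaussian fitted to the histogram.
mu_fit, sigma_fit = norm.fit(x)

print(f"Option 1: {mean:.3f} +/- {sigma_mean:.3f}  (ignores the spread of x)")
print(f"Option 2: {mu_fit:.3f} +/- {sigma_fit:.3f}  (ignores the sigma_i)")
[/code]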
 
  • #2
I may be missing something, or you may not be telling me everything, but the uncertainties should be a measure of the spread in the random variables.
 
  • #3
The uncertainties are a result of the precision of my measurement. If I were measuring lengths of rods, it'd be how precise my ruler is. However, I could also have a very broad or very narrow distribution of rod length. These two quantities (the uncertainty and the spread) are unrelated.
 
  • #4
So you want to combine the uncertainties for rods of different lengths?
 
  • #5
I'm not sure, because your wording is ambiguous. I'll try to be as clear as possible (don't be offended if some of this is elementary).

If you are making measurements, your apparatus has some limited precision, so your measurement has some uncertainty. For a quantity you're interested in, [itex]f[/itex], which is a function of your set of measurements [itex]x_i[/itex], you can find the uncertainty of [itex]f[/itex] as [tex]\sigma_f = \sqrt{\Sigma_i (\frac{\partial f}{\partial x_i})^2\sigma_{x_i}^2}[/tex].

Option #1 as in my original post is taking f to be the mean. However, since [itex]\frac{\partial f}{\partial x_i}=\frac{1}{N}[/itex], [itex]\sigma_f[/itex] doesn't depend on the values of [itex]x_i[/itex] at all, only on the values of [itex]\sigma_{x_i}[/itex]. Therefore, one could easily imagine having a large spread in [itex]x_i[/itex] but small values of [itex]\sigma_{x_i}[/itex], in which case the resulting [itex]\sigma_f[/itex] is very small compared to the spread of measurements, and therefore is not a representative value for the data.

Option #2 doesn't use error propagation at all; it totally ignores the fact that my measurements have an uncertainty. The [itex]\sigma[/itex] value I get out of this method only cares about [itex]x_i[/itex] and doesn't take [itex]\sigma_{x_i}[/itex] into account.

In essence, I know two things are happening when I take a measurement: 1) my measurement tool is imprecise so I can't be sure of the [itex]x_i[/itex] I measure and 2) the physical process I measure has actual variability and therefore the values of [itex]x_i[/itex] should not all be the same. I want to have one number that captures both these aspects.
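
A tiny numeric sketch of the point in Option #1 (the numbers are invented): with a precise instrument the propagated [itex]\sigma_f[/itex] stays tiny regardless of how spread out the [itex]x_i[/itex] are.

[code]
import numpy as np

# Two hypothetical data sets with identical per-point uncertainties
# but very different physical spreads.
sigma_x = np.full(10, 0.01)                  # precise instrument
x_narrow = np.random.normal(5.0, 0.02, 10)   # narrow physical variability
x_broad = np.random.normal(5.0, 2.00, 10)    # broad physical variability

# For f = mean, df/dx_i = 1/N, so the propagated uncertainty is
# sigma_f = (1/N) * sqrt(sum(sigma_x_i^2)); it is the same for both data sets.
sigma_f = np.sqrt(np.sum(sigma_x**2)) / len(sigma_x)

print("propagated sigma_f:", sigma_f)              # ~0.003 in both cases
print("spread (narrow):   ", x_narrow.std(ddof=1))
print("spread (broad):    ", x_broad.std(ddof=1))  # far larger than sigma_f
[/code]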
 
  • #6
If ##\sigma_x## = the uncertainty in your measurement apparatus and the spread in the measurements is greater than this, then there is another factor (or factors) affecting the measurement. A simple scheme is this: for example, if you are weighing a small sample of objects for which you want to determine the mean and the uncertainty in that value (say, grains of sand on a particular beach), then you must determine the sample standard deviation [itex]\sigma_s=\sqrt{\sum{(\overline x-x_i)^2}/(n-1)}[/itex], where ##\overline x## is the sample mean ##\overline x=\sum{x_i}/n##. The total uncertainty will then be
##\sigma=\sqrt{\sigma_x^2+\sigma_s^2}##.
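
For example, a short sketch of that scheme in Python (the quantities and values are illustrative, and a single instrument uncertainty ##\sigma_x## is assumed for all measurements):

[code]
import numpy as np

def combined_uncertainty(x, sigma_x):
    """Combine instrument uncertainty and sample spread in quadrature:
    sigma = sqrt(sigma_x^2 + sigma_s^2)."""
    x = np.asarray(x, dtype=float)
    x_bar = x.mean()                                            # sample mean
    sigma_s = np.sqrt(np.sum((x_bar - x) ** 2) / (len(x) - 1))  # sample std dev
    return x_bar, np.sqrt(sigma_x**2 + sigma_s**2)

# Example: masses of sand grains (mg) weighed on a balance with 0.02 mg precision.
mean, sigma = combined_uncertainty([1.13, 1.27, 1.05, 1.31, 1.18], sigma_x=0.02)
print(f"{mean:.3f} +/- {sigma:.3f} mg")
[/code]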
 
  • #7
McCoy13 said:
I'm doing an analysis where I have a set of random variables with some known uncertainties (the uncertainties are different for each random variable). The random variable is roughly Gaussian distributed. I'd like to get a meaningful characteristic value and uncertainty for the whole set.

It isn't clear whether you want to do "statistical estimation" or a theoretical calculation (or perhaps an "estimation" that also incorporates a theoretical calculation). If you have numerical realizations of a random variable, you can "estimate" parameters of its distribution (e.g. its standard deviation, which is what I assume you mean by "uncertainty"). On the other hand, if you "know" the parameters of several random variables, you can do a theoretical calculation and determine the parameters of the distribution of a function of those random variables.

People speak of "knowing" or "finding" the mean and standard deviation of a random variable from data, but this is not precise terminology. The only thing you can "know" or "find" from data is a sample mean and a sample standard deviation.
 
  • #8
Maybe being more specific will help.

I am analyzing image data. Basically I have images of bright rings on a dark background from several samples prepared with differing concentrations of a chemical reagent. I want to plot the characteristic intensity of the rings against the concentration, including error bars on the data. My question is about how big these error bars should be.

The way I measure the intensities is by using a circle-finding algorithm to identify the rings in the image, and then taking a number of sample pixels along the circumference of each ring and reading off their intensities. For each ring in the image I have on the order of a hundred intensity values. There is significant spread in the samples along each ring, i.e. this data is rather noisy. For each concentration I sample several hundred rings.

What I was doing was breaking the analysis into two steps. First, I measure the intensities of the pixels comprising an individual ring, then take the mean and standard deviation of those samples for that ring and note them down. Second, I take a histogram of the mean intensity values of all the rings and perform a Gaussian fit. The output parameters of the Gaussian fit are the mean and the variance, but that variance only depends on the mean intensities of the individual rings; the variance of the samples along individual rings is ignored.

An obvious solution that I didn't think of until just now is to fit a histogram of all the pixel values rather than taking a mean ring by ring.
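
For concreteness, here is a rough sketch of the two-step bookkeeping described above (the ring_intensities list is a hypothetical stand-in for the output of the circle-finding step, filled with random numbers here):

[code]
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, amp, mu, sigma):
    return amp * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Hypothetical input: one array of pixel intensities per detected ring.
rng = np.random.default_rng(0)
ring_intensities = [rng.normal(rng.normal(100, 5), 10, size=120) for _ in range(300)]

# Step 1: per-ring mean and standard deviation.
ring_means = np.array([r.mean() for r in ring_intensities])
ring_stds = np.array([r.std(ddof=1) for r in ring_intensities])

# Step 2: histogram the ring means and fit a Gaussian to the histogram.
counts, edges = np.histogram(ring_means, bins=30)
centers = 0.5 * (edges[:-1] + edges[1:])
popt, pcov = curve_fit(gaussian, centers, counts,
                       p0=[counts.max(), ring_means.mean(), ring_means.std()])
amp, mu, sigma = popt
print(f"fitted mean intensity: {mu:.1f}, spread of ring means: {abs(sigma):.1f}")
# Note: this sigma reflects only the scatter of the ring means; the within-ring
# scatter (ring_stds) never enters the fit, which is exactly the problem.
[/code]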
 
  • #9
McCoy13 said:
An obvious solution that I didn't think of until just now is to fit a histogram of all the pixel values rather than taking a mean ring by ring.

The pixel intensity values in a single ring (for a given concentration) may not be independent random variables. For example, does a bright pixel tend to have other bright pixels adjacent to it?
 
  • #10
Yes. If you plot the pixel intensities against their angle, you can easily see that it's not just random scatter.
 
  • #11
Assuming that your definition of the "characteristic" value of intensity is the mean value of pixel intensity (for a given concentration) then this theoretical mean value is a population parameter. You can estimate it by taking the mean of the individual mean values for each of your samples. The simplest way to estimate the standard deviation of the sample means is to do what you did - just compute the standard deviation of the sample means.

It's tempting to think that one can make a better estimate of the sample standard deviation by doing some calculation that involves the intensities of each individual pixel. If you want to go deeply into that, you should create a probability model for how the data is generated (one that explains the dependence between adjacent pixels). Applying statistics always involves assuming a probability model. Many people try to avoid discussing a probability model and talk only in terms of applying the blah-blah test or method. However, the suitability of tests and methods involves assumptions about probability models.

Incorporate what you know about the physics of the problem in the model. When applying statistics to a physical situation, you can't expect the answers to come from "math". Some of them have to come from physics.
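
As a sketch, the simple estimate described in the first paragraph amounts to something like this (ring_means is a hypothetical array of per-ring mean intensities for one concentration, here just random numbers):

[code]
import numpy as np

# Hypothetical per-ring mean intensities for one concentration.
ring_means = np.random.default_rng(1).normal(100.0, 5.0, size=300)

characteristic_value = ring_means.mean()   # estimate of the population mean intensity
error_bar = ring_means.std(ddof=1)         # standard deviation of the sample means

print(f"{characteristic_value:.1f} +/- {error_bar:.1f}")
[/code]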
 

1. What is "propagating uncertainties"?

"Propagating uncertainties" refers to the process of determining the uncertainty in a calculated value based on the uncertainty in the input parameters used in the calculation. This is particularly important in scientific experiments and data analysis, as it allows for a more accurate representation of the final result.

2. How is uncertainty propagated through a Gaussian fit?

In a Gaussian fit, uncertainty is propagated by taking the uncertainties of the individual data points into account in the fit, typically by weighting each point by its uncertainty. These uncertainties then determine the uncertainties of the fitted parameters, such as the mean and standard deviation, which in turn affect the uncertainty of any result derived from the fit.
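
One common way of doing this is a weighted least-squares fit; a minimal sketch using scipy.optimize.curve_fit (with made-up data) might look like this:

[code]
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, amp, mu, sigma):
    return amp * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Made-up data points y(x) with known per-point uncertainties y_err.
x = np.linspace(-5, 5, 41)
y_err = np.full_like(x, 0.05)
y = gaussian(x, 1.0, 0.3, 1.2) + np.random.normal(0.0, 0.05, x.size)

# Passing sigma=y_err weights the fit by the measurement uncertainties;
# absolute_sigma=True makes the returned covariance reflect those uncertainties.
popt, pcov = curve_fit(gaussian, x, y, p0=[1.0, 0.0, 1.0],
                       sigma=y_err, absolute_sigma=True)
perr = np.sqrt(np.diag(pcov))  # 1-sigma uncertainties on amp, mu, sigma

for name, value, err in zip(["amp", "mu", "sigma"], popt, perr):
    print(f"{name} = {value:.3f} +/- {err:.3f}")
[/code]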

3. Why is it important to consider uncertainties in a Gaussian fit?

Considering uncertainties in a Gaussian fit is important because it allows for a more realistic representation of the data. In many cases, the measured values will have some level of uncertainty, and not accounting for this can lead to inaccurate or misleading results.

4. What is the difference between systematic and random uncertainties in a Gaussian fit?

Systematic uncertainties refer to errors that consistently affect the measurement in the same direction, such as a faulty instrument. Random uncertainties, on the other hand, refer to errors that vary in magnitude and direction and are often due to factors that cannot be controlled. In a Gaussian fit, both types of uncertainties are considered and propagated in the final result.

5. How can I determine the uncertainty in my Gaussian fit?

The uncertainty in a Gaussian fit can be determined in several ways, for example from the parameter covariance matrix returned by the fitting routine, or by propagating the uncertainties of the input data by hand. It is important to consider all sources of uncertainty and to use a method appropriate for how the data were collected.
