Error on the Mean: How to Compute with Individual Errors

  • Thread starter: kelly0303
  • Tags: Error, Mean
Summary
To compute the mean with associated errors for measurements with individual errors, the standard formula for the error on the mean (##\sigma/\sqrt{N}##) may not suffice, as it overlooks the individual measurement errors. Instead, the propagation of errors formula should be applied, which accounts for both the individual errors and the standard deviation of the measurements. The discussion highlights the importance of considering whether the measurements represent the same quantity, as differing variances can indicate they do not. Monte Carlo simulations are suggested as a method to visualize the distribution of the mean and its variance. Ultimately, accurately reflecting both precision and accuracy in the final error on the mean is crucial for reliable statistical reporting.
  • #31
Dale said:
Ok, so having reviewed the NIST document and your post I think that I understand the “official” procedure.

Ok, so this is a type A uncertainty with a standard uncertainty of ##u_s = 100##. If you are trying to measure the mean of the signal then this uncertainty contributes to the uncertainty of the measurement. But if you are trying to measure an individual value of this signal then this uncertainty is not relevant since it is part of the measurand.

This is then a type B uncertainty with a standard uncertainty ##u_1 = 50/(2\sqrt{3}) \approx 14##

Which is a type B standard uncertainty of ##u_2 = 200/(2\sqrt{3}) \approx 58##
So if your goal is to measure the individual signal value you would use the propagation of errors. For that, the combined uncertainty is ##u_c = \sqrt{u_1^2+u_2^2}/2 \approx 30##

But if your goal was to measure the mean of the signal then I am not certain, but I think that the combined uncertainty would be ##u_c = \sqrt{u_s^2 + u_1^2/4 + u_2^2/4} \approx 104##

I am not confident about that last one.
Thank you for this! I actually found this: https://ned.ipac.caltech.edu/level5/Leo/Stats4_5.html I think this is what I was looking for.
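The formula on the linked page is the inverse-variance weighted mean. A minimal sketch of it in Python, using the ##900 \pm 50## and ##1100 \pm 200## example values that come up later in the thread:

```python
import numpy as np

def weighted_mean(values, sigmas):
    """Inverse-variance weighted mean and its standard error:
    mu = sum(x_i / s_i^2) / sum(1 / s_i^2),  sigma_mu = 1 / sqrt(sum(1 / s_i^2))
    """
    values = np.asarray(values, dtype=float)
    weights = 1.0 / np.asarray(sigmas, dtype=float) ** 2
    mu = np.sum(weights * values) / np.sum(weights)
    sigma_mu = 1.0 / np.sqrt(np.sum(weights))
    return mu, sigma_mu

# Example values from the thread: 900 +/- 50 and 1100 +/- 200
mu, sigma_mu = weighted_mean([900, 1100], [50, 200])
print(f"weighted mean = {mu:.1f} +/- {sigma_mu:.1f}")  # ~911.8 +/- 48.5
```

Note how the precise measurement dominates: the result sits much closer to 900, and the combined uncertainty is slightly below the smaller of the two input uncertainties.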
 
  • #33
Dale said:
That seems good. It still uses the propagation of errors, but in a way that reduces the overall variance.
I still think something is missing (and I realize now my title was misleading, I am sorry for that). This gives you the error on the mean, i.e. how confident you are about the mean value. However, if I am trying to approximate the real distribution with my measurements (which I assume is what experimentalists are trying to do in general), do I need to add the standard deviation of the samples themselves on top of that error?
 
  • #34
kelly0303 said:
I still think something is missing
So my recommendation when faced with situations that you are having trouble figuring out: Monte Carlo. At a minimum it lets you give any theoretical calculations a bit of a reality check.

kelly0303 said:
if I am trying to approximate the real distribution with my measurements
There are two approaches that I know. One is to assume some class of parametric distributions and then use the data to estimate the parameters. The other is to simply use the empirical distribution. The empirical distribution is nonparametric but has known error bounds, so I like it (a sketch follows below).
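As a sketch of that second approach: the empirical distribution function together with its known error bound, the Dvoretzky–Kiefer–Wolfowitz (DKW) inequality. The sample here is simulated, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(1000, 100, size=50)  # stand-in data for illustration

def ecdf(sample):
    """Empirical CDF: returns sorted points x and F_hat(x) = rank/n."""
    x = np.sort(sample)
    f = np.arange(1, len(x) + 1) / len(x)
    return x, f

x, f = ecdf(sample)

# DKW inequality: P(sup |F_hat - F| > eps) <= 2 exp(-2 n eps^2), so a
# 95% confidence band has half-width eps = sqrt(ln(2/0.05) / (2n)).
n = len(sample)
eps = np.sqrt(np.log(2 / 0.05) / (2 * n))
lower, upper = np.clip(f - eps, 0, 1), np.clip(f + eps, 0, 1)
print(f"95% DKW band half-width for n={n}: {eps:.3f}")
```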
 
  • #35
kelly0303 said:
Hmmm, ok I will try to be more specific (I am sorry, I don't know much about statistics, so I hope this will help). Say we have a source that produces a signal (in arbitrary units) of mean 1000 and standard deviation 100. And I have a measuring device with a resolution of 50 and another one with a resolution of 200. I do one measurement with each of them and I get: ##900 \pm 50## and ##1100 \pm 200##. How should I properly combine these 2 measurements? Please let me know if I need to give more details.

Asking about the "proper" way to "combine" measurements is not a well-defined mathematical question. If you don't want to tackle the sophisticated concepts involved in statistics, find some authority that has done similar work and copy what they did.

A slightly inferior version of that approach is to find people who will cross-examine you until they can guess how to create a probability model for your problem and provide a solution based on that guess. If you want to pursue that route, let's try to formulate a specific question.

1. What is the population you are considering and what is being measured about it? Define this precisely. (e.g. The population of males between the ages of 20 and 30 in the state of Tennessee and their weights measured in pounds.)

2. Are you assuming the distribution of this population comes from a particular family of probability distributions? If so, what family of distributions? (e.g. lognormal)

3. Apparently you want to estimate some property of that population. What property is it? Is it one parameter of the distribution of the population, or more than one parameter, or enough parameters to define the entire distribution function?

4. How is the population being sampled? Is it randomly sampled such that each member of the population has the same probability of being included in a sample, or is it sampled in some systematic way? (e.g. pick 10 males at random versus pick 1 male at random from each of the age groups 21, 22, 23, ..., 29.)

In your example, above, if I make up a population and make up a distribution for it, I still don't have information about how the two samples were selected. In particular, did the sampling process involve both picking a measuring instrument and a source at random? Or did the experimenter have two given measuring instruments and decide to use both of them? If so, were both used on the same source or were they used on two possibly different sources taken from the population of sources?

5. To estimate a parameter of a distribution, some algorithm is performed on a random sample of measurements. A result of such an algorithm is technically called a "statistic". When a "statistic" is used to estimate a parameter of a distribution, the statistic is called an "estimator". Statistics and estimators are random variables because they depend on the random values in samples. A statistic computed from a sample taken from a population usually does not have the same distribution of values as the population. (e.g. Suppose the population has a lognormal distribution. Suppose the statistic is defined by the algorithm "Take the mean value of measurements from 10 randomly selected individuals". The distribution of this statistic is not lognormal.)

Since statistics are random variables, they have their own distributions; these distributions have their own parameters (e.g. mean, variance) that can be different from the values of similar parameters in the distribution of the population. So it makes sense to talk about things like "the mean of the sample mean" or "the variance of the sample mean".

kelly0303 said:
However, if I am trying to approximate the real distribution with my measurements
The distribution of what? The population has a distribution. The sample mean has a different distribution.
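To make that last distinction concrete, here is a toy Monte Carlo (the approach Dale recommended above) of the example from this post: the source has mean 1000 and sd 100, and, as a modeling assumption, each instrument adds independent Gaussian noise of sd 50 or 200 to the same draw. The spread of the combined estimate is not the population sd:

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 100_000

# Assumed model: one draw from the source (mean 1000, sd 100), read by
# two instruments that add independent Gaussian noise of sd 50 and sd 200.
source = rng.normal(1000, 100, size=n_trials)
x1 = source + rng.normal(0, 50, size=n_trials)
x2 = source + rng.normal(0, 200, size=n_trials)

# Inverse-variance weighted combination of the two readings.
w1, w2 = 1 / 50**2, 1 / 200**2
combined = (w1 * x1 + w2 * x2) / (w1 + w2)

print("population sd: 100 (by construction)")
print(f"sd of combined estimate: {combined.std():.1f}")
# Expect roughly sqrt(100^2 + 48.5^2) ~ 111: the estimator has its own
# distribution, distinct from the population's.
```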
 
  • #37
Figuring out what the question is is the most important part.
Dale said:
It is completely meaningless to say that they come from the same distribution and then give different variances for the two measurements. If they come from the same distribution then they must have the same variance.
That is not true. The uncertainty doesn't have to be the variance of the underlying distribution of the numbers. Toy example: You measure radioactive decays in 1 minute and estimate the true decay rate based on that. A measurement that has 5 decays will come with a different uncertainty than a measurement with 10 decays, even though they are identical repetitions of the experiment.
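A tiny illustration of that toy example, assuming the usual ##\sqrt{N}## uncertainty on a Poisson count:

```python
import math

# Two identical repetitions of a 1-minute counting experiment:
for counts in (5, 10):
    print(f"{counts} decays -> rate estimate {counts} +/- {math.sqrt(counts):.2f} per minute")
# Same underlying distribution, yet the two reported uncertainties differ.
```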
kelly0303 said:
Thank you for this! I actually found this: https://ned.ipac.caltech.edu/level5/Leo/Stats4_5.html I think this is what I was looking for.
That is the approach to get a best estimate for the "true" parameter and the uncertainty for your estimate.
kelly0303 said:
However, if I am trying to approximate the real distribution with my measurements (which I assume is what experimentalists are trying to do in general), do I need to add the standard deviation of the samples themselves on top of that error?
Your measured variance will be the sum of the variance of the underlying distribution itself and the variance from your measurements. The above formula will give you the variance coming from your measurement uncertainties. Subtract that from the sample variance to estimate the variance of the underlying distribution.

Note: That will only give a best estimate, which might even be negative (e.g. you measure ##1000 \pm 100## and ##1030 \pm 60##; the spread is smaller than you would expect from the measurement uncertainties). If your measurement uncertainties are large compared to the width of the underlying distribution you will need many measurements to make this approach viable.
If you need a confidence interval for your estimate of the variance of the distribution, run a toy MC; doing it analytically won't work well.
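A sketch of that subtraction, using a simple unweighted estimate; the data, the instrument mix, and the helper name `underlying_std` are all hypothetical, and the estimate is clipped at zero since, as noted, the raw difference can come out negative:

```python
import numpy as np

def underlying_std(values, sigmas):
    """Estimate the sd of the underlying distribution by subtracting
    the mean measurement variance from the sample variance."""
    values = np.asarray(values, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    sample_var = np.var(values, ddof=1)       # unbiased sample variance
    meas_var = np.mean(sigmas**2)             # average measurement variance
    return np.sqrt(max(sample_var - meas_var, 0.0))  # clip negative estimates

# Hypothetical data: many repeated measurements with mixed uncertainties
rng = np.random.default_rng(1)
true_vals = rng.normal(1000, 100, size=200)   # underlying sd = 100
sigmas = rng.choice([50.0, 200.0], size=200)  # two instruments, used at random
values = true_vals + rng.normal(0, sigmas)
print(f"estimated underlying sd: {underlying_std(values, sigmas):.1f}")  # ~100
```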
 
  • #38
mfb said:
That is not true. The uncertainty doesn't have to be the variance of the underlying distribution of the numbers.
Oh, you are right. Good point.

mfb said:
If you need a confidence interval for your estimate of the variance of the distribution, run a toy MC; doing it analytically won't work well.
I have also suggested this a couple of times here. Monte Carlo is so flexible and useful that it should be a standard tool anyone doing statistics uses.
 
