# I Error on the mean

#### Dale

Mentor
ok... but still, how do I combine them into one result.
How you combine them depends on the application, but taking the mean is a reasonable approach. If you have one measurement which is X~N(100,1) and another measurement that is Y~N(110,5) (mean, standard deviation), then the mean will be approximately N(105, 2.55), since the standard deviation of the mean is $\sqrt{1^2+5^2}/2 \approx 2.55$. This is exactly what the propagation of errors formula predicts.
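A quick Monte Carlo sketch of this claim (the sample size and seed are arbitrary choices of mine): averaging one draw from each distribution should give a spread of about $\sqrt{1^2+5^2}/2 \approx 2.55$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

x = rng.normal(100, 1, n)   # first measurement, sd = 1
y = rng.normal(110, 5, n)   # second measurement, sd = 5
m = (x + y) / 2             # simple mean of the two measurements

# propagation of errors predicts sd of the mean = sqrt(1^2 + 5^2)/2 ~ 2.55
print(m.mean())  # ~105
print(m.std())   # ~2.55
```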

Perhaps you mean that your measurements would be X~U(99.5, 100.5) and Y~U(107.5, 112.5), and you are objecting to using the normal distribution for these? I am not certain whether the propagation of errors formula depends on the specific distribution.

#### Dale

Mentor
@kelly0303 I think that I may be misunderstanding your situation. Perhaps it would help to look at the NIST guide for evaluating uncertainty. They do a really good job of distilling the complicated topic of measurement uncertainty into something approachable.

I think that especially section 4.6 may be relevant here. Also note section 5.1 advocates the use of the propagation of errors formula for combining uncertainties.

#### kelly0303

You have to be specific about the interpretation of the number you wish to publish. Your example of the "uncertainty" in measuring with a ruler doesn't involve probability. If the ruler can be read to $\pm$ a tenth of an inch, then a measurement with the ruler is guaranteed to be within that distance of the actual value. There is no probability associated with that guarantee - except that it is 100% probable that it's true.

"Uncertainty" is an ambiguous term. How is it interpreted in your field of study? Is it supposed to be the standard deviation of a random variable? Or is it supposed to give an absolute guarantee about something?

To repeat, there is no objective answer to your question unless the question is converted to an unambiguous form. To make that conversion, you must appreciate the conceptual sophistication of statistics.
Hmmm, ok I will try to be more specific (I am sorry, I don't know much about statistics, so I hope this will help). Say we have a source that produces a signal (in arbitrary units) of mean 1000 and standard deviation 100. And I have a measuring device with a resolution of 50 and another one with resolution of 200. I do one measurement with each of them and I get: $900 \pm 50$ and $1100 \pm 200$. How should I properly combine these 2 measurements? Please let me know if I need to give more details.

#### Dale

Mentor
I do one measurement with each of them
OK, since you are doing one measurement with each of them, that is definitely a "type B" uncertainty. I think that section 4.6 is what you want.

#### Dale

Mentor
Ok, so having reviewed the NIST document and your post I think that I understand the “official” procedure.

Say we have a source that produces a signal (in arbitrary units) of mean 1000 and standard deviation 100.
Ok, so this is a type A uncertainty with a standard uncertainty of $u_s = 100$. If you are trying to measure the mean of the signal then this uncertainty contributes to the uncertainty of the measurement. But if you are trying to measure an individual value of this signal then this uncertainty is not relevant since it is part of the measurand.

And I have a measuring device with a resolution of 50
This is then a type B uncertainty with a standard uncertainty $u_1 = 50/(2\sqrt{3}) \approx 14$

and another one with resolution of 200
Which is a type B standard uncertainty of $u_2 = 200/(2\sqrt{3}) \approx 58$
I do one measurement with each of them and I get: 900±50 and 1100±200. How should I properly combine these 2 measurements?
So if your goal is to measure the individual signal value you would use the propagation of errors. For that the combined uncertainty is $u_c = \sqrt{u_1^2+u_2^2}/2 \approx 30$

But if your goal was to measure the mean of the signal then I am not certain, but I think that the combined uncertainty would be $u_c = \sqrt{u_s^2 + u_1^2/4 + u_2^2/4} \approx 104$

I am not confident about that last one.
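The arithmetic above can be sketched numerically (the variable names are mine; the resolution-to-uncertainty conversion $u = r/(2\sqrt{3})$ is the NIST type B rule for a rectangular distribution of half-width $r/2$):

```python
import math

# type B standard uncertainty for a device of resolution r:
# half-width a = r/2, rectangular distribution => u = a/sqrt(3) = r/(2*sqrt(3))
u1 = 50 / (2 * math.sqrt(3))    # ~14.4
u2 = 200 / (2 * math.sqrt(3))   # ~57.7

# combined uncertainty of the two-measurement mean (propagation of errors)
u_c = math.sqrt(u1**2 + u2**2) / 2   # ~30

# if the goal is the mean of the signal, include the signal spread u_s = 100
u_s = 100
u_mean = math.sqrt(u_s**2 + u1**2 / 4 + u2**2 / 4)  # ~104
print(u1, u2, u_c, u_mean)
```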

#### kelly0303

Ok, so having reviewed the NIST document and your post I think that I understand the “official” procedure. [...] I am not confident about that last one.
Thank you for this! I actually found this: https://ned.ipac.caltech.edu/level5/Leo/Stats4_5.html I think this is what I was looking for.

#### kelly0303

That seems good. It still uses the propagation of errors, but in a way that reduces the overall variance.
I still think something is missing (and I realize now my title was misleading, sorry about that). This gives you the error on the mean, i.e. how confident you are about the mean value. However, if I am trying to approximate the real distribution with my measurements (which I assume is what experimentalists are trying to do in general), do I need to add the standard deviation of the samples themselves on top of that error?

#### Dale

Mentor
I still think something is missing
So my recommendation when faced with situations that you are having trouble figuring out: Monte Carlo. At a minimum it lets you give any theoretical calculations a bit of a reality check.

if I am trying to approximate the real distribution with my measurements
There are two approaches that I know. One is to assume some class of parametric distributions and then use the data to estimate the parameters. The other is to simply use the empirical distribution. The empirical distribution is non-parametric but has known error bounds, so I like it.
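As a sketch of the second approach: the empirical CDF has a uniform error bound via the Dvoretzky-Kiefer-Wolfowitz inequality. The function below is my own illustration, not from the thread; the normal test data stands in for real measurements.

```python
import numpy as np

def empirical_cdf_band(samples, alpha=0.05):
    """Empirical CDF with a DKW confidence band.

    By the Dvoretzky-Kiefer-Wolfowitz inequality, the true CDF lies
    within +-eps of the empirical CDF everywhere, with probability
    at least 1 - alpha, where eps = sqrt(log(2/alpha) / (2*n)).
    """
    x = np.sort(samples)
    n = len(x)
    f = np.arange(1, n + 1) / n                 # empirical CDF at sorted points
    eps = np.sqrt(np.log(2 / alpha) / (2 * n))  # uniform error bound
    return x, f, np.clip(f - eps, 0, 1), np.clip(f + eps, 0, 1)

rng = np.random.default_rng(1)
x, f, lo, hi = empirical_cdf_band(rng.normal(1000, 100, 500))
```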

#### Stephen Tashi

Hmmm, ok I will try to be more specific (I am sorry, I don't know much about statistics, so I hope this will help). Say we have a source that produces a signal (in arbitrary units) of mean 1000 and standard deviation 100. And I have a measuring device with a resolution of 50 and another one with resolution of 200. I do one measurement with each of them and I get: $900 \pm 50$ and $1100 \pm 200$. How should I properly combine these 2 measurements? Please let me know if I need to give more details.
Asking about the "proper" way to "combine" measurements is not a well defined mathematical question. If you don't want to tackle the sophisticated concepts involved in statistics, find some authority that has done similar work and copy what they did.

A slightly inferior version of that approach is to find people who will cross examine you until they can guess how to create a probability model for your problem and provide a solution based on that guess. If you want to pursue that route, let's try to formulate a specific question.

1. What is the population you are considering and what is being measured about it? Define this precisely. (e.g. The population of males between the ages of 20 and 30 in the state of Tennessee and their weights measured in pounds.)

2. Are you assuming the distribution of this population comes from a particular family of probability distributions? If so, what family of distributions? (e.g. lognormal)

3. Apparently you want to estimate some property of that population. What property is it? Is it one parameter of the distribution of the population? - or is it more than one parameter? - enough parameters to define the entire distribution function?

4. How is the population being sampled? Is it randomly sampled such that each member of the population has the same probability of being included in a sample? - or is it sampled in some systematic way? (e.g. pick 10 males at random versus pick 1 male at random from each of the age groups 21,22,23,...29.)

In your example, above, if I make up a population and make up a distribution for it, I still don't have information about how the two samples were selected. In particular, did the sampling process involve both picking a measuring instrument and a source at random? Or did the experimenter have two given measuring instruments and decide to use both of them? If so, were both used on the same source or were they used on two possibly different sources taken from the population of sources?

5. To estimate a parameter of distribution, some algorithm is performed on a random sample of measurements. A result of such an algorithm is technically called a "statistic". When a "statistic" is used to estimate a parameter of a distribution, the statistic is called an "estimator". Statistics and estimators are random variables because they depend on the random values in samples. A statistic computed from a sample taken from a population usually does not have the same distribution of values as the population. (e.g. Suppose the population has a lognormal distribution. Suppose the statistic is defined by the algorithm "Take the mean value of measurements from 10 randomly selected individuals". The distribution of this statistic is not lognormal. )

Since statistics are random variables, they have their own distributions; these distributions have their own parameters (e.g. mean, variance) that can be different from the values of similar parameters in the distribution of the population. So it makes sense to talk about things like "the mean of the sample mean" or "the variance of the sample mean".
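A small simulation of the lognormal example above (my own illustration, with arbitrary sample sizes): the mean-of-10 statistic shares the population's mean, but has a smaller spread and is not itself lognormal.

```python
import numpy as np

rng = np.random.default_rng(2)

# population: lognormal, clearly skewed
pop = rng.lognormal(mean=0.0, sigma=1.0, size=200_000)

# statistic: mean of 10 randomly selected individuals, repeated many times
sample_means = rng.choice(pop, size=(20_000, 10)).mean(axis=1)

# the statistic's mean matches the population mean...
print(pop.mean(), sample_means.mean())
# ...but its spread is smaller (~sigma/sqrt(10)), and its shape is
# far less skewed than the lognormal population
print(pop.std(), sample_means.std())
```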

However if I am trying to approximate the real distribution with my measurements
The distribution of what? The population has a distribution. The sample mean has a different distribution.

#### DEvens

Gold Member
The buzz phrase is "least squares fit with error bars." Here are two examples of working it out. If you don't like these, please Google some more.

Basically, you minimize $\sum_i \left(\frac{x_i - \bar{x}}{\sigma_i}\right)^2$ in the least-squares instead of the usual thing. Then there's a formula to estimate the net error in the slope and intercept you get.
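A minimal sketch of such a fit, assuming a straight-line model $y = a + bx$ with per-point errors $\sigma_i$ (the closed-form solution below is the standard textbook one; the data values are made up for illustration):

```python
import numpy as np

def weighted_line_fit(x, y, sigma):
    """Chi-square straight-line fit y = a + b*x with per-point errors.

    Minimizes sum(((y_i - a - b*x_i) / sigma_i)^2) and returns the
    intercept, slope, and their standard errors.
    """
    w = 1.0 / np.asarray(sigma) ** 2
    S, Sx, Sy = w.sum(), (w * x).sum(), (w * y).sum()
    Sxx, Sxy = (w * x * x).sum(), (w * x * y).sum()
    d = S * Sxx - Sx**2
    b = (S * Sxy - Sx * Sy) / d      # slope
    a = (Sxx * Sy - Sx * Sxy) / d    # intercept
    return a, b, np.sqrt(Sxx / d), np.sqrt(S / d)  # a, b, sigma_a, sigma_b

# made-up data close to y = 1 + 2x, with differing error bars
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.1, 4.9, 7.0])
sigma = np.array([0.1, 0.1, 0.2, 0.2])
a, b, sa, sb = weighted_line_fit(x, y, sigma)
```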

#### mfb

Mentor
Figuring out what the question is is the most important part.
It is completely meaningless to say that they come from the same distribution and then give different variances for the two measurements. If they come from the same distribution then they must have the same variance.
That is not true. The uncertainty doesn't have to be the variance of the underlying distribution of the numbers. Toy example: You measure radioactive decays in 1 minute and estimate the true decay rate based on that. A measurement that has 5 decays will come with a different uncertainty than a measurement with 10 decays, even though they are identical repetitions of the experiment.
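The decay-counting toy example can be made concrete; the $\sqrt{N}$ uncertainty below is the usual Poisson estimate (the helper function is mine, for illustration):

```python
import math

# counting experiment: N observed decays in a fixed 1-minute window.
# the usual rate-uncertainty estimate is sqrt(N) (Poisson), so identical
# repetitions of the same experiment legitimately report different errors.
def poisson_rate_estimate(n_observed):
    return n_observed, math.sqrt(n_observed)  # (rate per minute, uncertainty)

for n in (5, 10):
    rate, err = poisson_rate_estimate(n)
    print(f"{rate} +- {err:.2f} decays/min")
```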
Thank you for this! I actually found this: https://ned.ipac.caltech.edu/level5/Leo/Stats4_5.html I think this is what I was looking for.
That is the approach to get a best estimate for the "true" parameter and the uncertainty for your estimate.
However if I am trying to approximate the real distribution with my measurements (which I assume it is why experimentalists are trying to do in general), I need to add on top of the error the standard deviation of the samples themselves?
Your measured variance will be the sum of the variance of the underlying distribution and the variance from your measurement uncertainties. The above formula gives you the variance coming from your measurement uncertainties. Subtract that from the sample variance to estimate the variance of the underlying distribution.

Note: That will only give a best estimate, which might even be negative (e.g. you measure $1000 \pm 100$ and $1030 \pm 60$; the spread is smaller than you expect from the measurement uncertainties). If your measurement uncertainties are large compared to the width of the underlying distribution you will need many measurements to make this approach viable.
If you need a confidence interval for your estimate of the variance of the distribution, run a toy MC; doing that analytically won't work well.
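A toy simulation of the subtraction described above (simplified: I give every measurement the same known noise level, whereas the thread's example has two different ones):

```python
import numpy as np

rng = np.random.default_rng(3)

true_sd, meas_sd, n = 100.0, 50.0, 10_000

signal = rng.normal(1000, true_sd, n)          # underlying distribution
measured = signal + rng.normal(0, meas_sd, n)  # add known measurement noise

# sample variance ~ underlying variance + measurement variance,
# so subtract the known measurement variance to estimate the former
est_var = measured.var(ddof=1) - meas_sd**2
print(np.sqrt(est_var))  # should be close to true_sd = 100
```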

#### Dale

Mentor
That is not true. The uncertainty doesn't have to be the variance of the underlying distribution of the numbers.
Oh, you are right. Good point.

If you need a confidence interval for your estimate of the variance of the distribution run toy MC, doing that analytic won't work well.
I have also suggested this a couple of times here. Monte Carlo is so flexible and useful that it should be a standard tool anyone doing statistics uses.
