Error on the Mean: How to Compute with Individual Errors

  • Thread starter: kelly0303
  • Tags: Error, Mean
Summary
To compute the mean with associated errors for measurements with individual errors, the standard formula for the error on the mean (##\sigma/\sqrt{N}##) may not suffice, as it overlooks the individual measurement errors. Instead, the propagation of errors formula should be applied, which accounts for both the individual errors and the standard deviation of the measurements. The discussion highlights the importance of considering whether the measurements represent the same quantity, as differing variances can indicate they do not. Monte Carlo simulations are suggested as a method to visualize the distribution of the mean and its variance. Ultimately, accurately reflecting both precision and accuracy in the final error on the mean is crucial for reliable statistical reporting.
  • #31
Dale said:
Ok, so having reviewed the NIST document and your post I think that I understand the “official” procedure.

Ok, so this is a type A uncertainty with a standard uncertainty of ##u_s = 100##. If you are trying to measure the mean of the signal then this uncertainty contributes to the uncertainty of the measurement. But if you are trying to measure an individual value of this signal then this uncertainty is not relevant since it is part of the measurand.

This is then a type B uncertainty with a standard uncertainty ##u_1 = 50/(2\sqrt{3}) \approx 14##

Which is a type B standard uncertainty of ##u_2 = 200/(2\sqrt{3}) \approx 58##
So if your goal is to measure the individual signal value you would use the propagation of errors. For that, the combined uncertainty is ##u_c = \sqrt{u_1^2+u_2^2}/2 \approx 30##

But if your goal was to measure the mean of the signal then I am not certain, but I think that the combined uncertainty would be ##u_c = \sqrt{u_s^2 + u_1^2/4 + u_2^2/4} \approx 104##

I am not confident about that last one.
Thank you for this! I actually found this: https://ned.ipac.caltech.edu/level5/Leo/Stats4_5.html I think this is what I was looking for.
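The formula on the linked page is the inverse-variance weighted mean. A minimal sketch of it in Python, using the ##900 \pm 50## and ##1100 \pm 200## example values that come up later in the thread:

```python
import numpy as np

def weighted_mean(values, sigmas):
    """Inverse-variance weighted mean and its standard error:
    mu = sum(x_i / s_i^2) / sum(1 / s_i^2),  sigma_mu = 1 / sqrt(sum(1 / s_i^2))
    """
    values = np.asarray(values, dtype=float)
    weights = 1.0 / np.asarray(sigmas, dtype=float) ** 2
    mu = np.sum(weights * values) / np.sum(weights)
    sigma_mu = 1.0 / np.sqrt(np.sum(weights))
    return mu, sigma_mu

# Example values from the thread: 900 +/- 50 and 1100 +/- 200
mu, sigma_mu = weighted_mean([900, 1100], [50, 200])
print(f"weighted mean = {mu:.1f} +/- {sigma_mu:.1f}")  # ~911.8 +/- 48.5
```

Note how the precise measurement dominates: the result sits much closer to 900, and the combined uncertainty is slightly below the smaller of the two input uncertainties.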
 
  • #33
Dale said:
That seems good. It still uses the propagation of errors, but in a way that reduces the overall variance.
I still think something is missing (and I realize now my title was misleading, I am sorry for that). This gives you the error on the mean, i.e. how confident you are about the mean value. However, if I am trying to approximate the real distribution with my measurements (which I assume is what experimentalists are trying to do in general), do I need to add the standard deviation of the samples themselves on top of that error?
 
  • #34
kelly0303 said:
I still think something is missing
So my recommendation when faced with situations that you are having trouble figuring out: Monte Carlo. At a minimum it lets you give any theoretical calculations a bit of a reality check.

kelly0303 said:
if I am trying to approximate the real distribution with my measurements
There are two approaches that I know. One is to assume some class of parametric distributions and then use the data to estimate the parameters. The other is to simply use the empirical distribution. The empirical distribution is nonparametric but has known error bounds, so I like it (a sketch follows below).
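As a sketch of that second approach: the empirical distribution function together with its known error bound, the Dvoretzky–Kiefer–Wolfowitz (DKW) inequality. The sample here is simulated, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(1000, 100, size=50)  # stand-in data for illustration

def ecdf(sample):
    """Empirical CDF: returns sorted points x and F_hat(x) = rank/n."""
    x = np.sort(sample)
    f = np.arange(1, len(x) + 1) / len(x)
    return x, f

x, f = ecdf(sample)

# DKW inequality: P(sup |F_hat - F| > eps) <= 2 exp(-2 n eps^2), so a
# 95% confidence band has half-width eps = sqrt(ln(2/0.05) / (2n)).
n = len(sample)
eps = np.sqrt(np.log(2 / 0.05) / (2 * n))
lower, upper = np.clip(f - eps, 0, 1), np.clip(f + eps, 0, 1)
print(f"95% DKW band half-width for n={n}: {eps:.3f}")
```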
 
  • #35
kelly0303 said:
Hmmm, ok I will try to be more specific (I am sorry, I don't know much about statistics, so I hope this will help). Say we have a source that produces a signal (in arbitrary units) of mean 1000 and standard deviation 100. And I have a measuring device with a resolution of 50 and another one with a resolution of 200. I do one measurement with each of them and I get: ##900 \pm 50## and ##1100 \pm 200##. How should I properly combine these 2 measurements? Please let me know if I need to give more details.

Asking about the "proper" way to "combine" measurements is not a well-defined mathematical question. If you don't want to tackle the sophisticated concepts involved in statistics, find some authority that has done similar work and copy what they did.

A slightly inferior version of that approach is to find people who will cross-examine you until they can guess how to create a probability model for your problem and provide a solution based on that guess. If you want to pursue that route, let's try to formulate a specific question.

1. What is the population you are considering and what is being measured about it? Define this precisely. (e.g. The population of males between the ages of 20 and 30 in the state of Tennessee and their weights measured in pounds.)

2. Are you assuming the distribution of this population comes from a particular family of probability distributions? If so, what family of distributions? (e.g. lognormal)

3. Apparently you want to estimate some property of that population. What property is it? Is it one parameter of the distribution of the population, or more than one parameter, or enough parameters to define the entire distribution function?

4. How is the population being sampled? Is it randomly sampled such that each member of the population has the same probability of being included in a sample, or is it sampled in some systematic way? (e.g. pick 10 males at random versus pick 1 male at random from each of the age groups 21, 22, 23, ..., 29.)

In your example, above, if I make up a population and make up a distribution for it, I still don't have information about how the two samples were selected. In particular, did the sampling process involve both picking a measuring instrument and a source at random? Or did the experimenter have two given measuring instruments and decide to use both of them? If so, were both used on the same source or were they used on two possibly different sources taken from the population of sources?

5. To estimate a parameter of a distribution, some algorithm is performed on a random sample of measurements. A result of such an algorithm is technically called a "statistic". When a "statistic" is used to estimate a parameter of a distribution, the statistic is called an "estimator". Statistics and estimators are random variables because they depend on the random values in samples. A statistic computed from a sample taken from a population usually does not have the same distribution of values as the population. (e.g. Suppose the population has a lognormal distribution. Suppose the statistic is defined by the algorithm "Take the mean value of measurements from 10 randomly selected individuals". The distribution of this statistic is not lognormal.)

Since statistics are random variables, they have their own distributions; these distributions have their own parameters (e.g. mean, variance) that can be different from the values of similar parameters in the distribution of the population. So it makes sense to talk about things like "the mean of the sample mean" or "the variance of the sample mean".

kelly0303 said:
However, if I am trying to approximate the real distribution with my measurements
The distribution of what? The population has a distribution. The sample mean has a different distribution.
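To make that last distinction concrete, here is a toy Monte Carlo (the approach Dale recommended above) of the example from this post: the source has mean 1000 and sd 100, and, as a modeling assumption, each instrument adds independent Gaussian noise of sd 50 or 200 to the same draw. The spread of the combined estimate is not the population sd:

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 100_000

# Assumed model: one draw from the source (mean 1000, sd 100), read by
# two instruments that add independent Gaussian noise of sd 50 and sd 200.
source = rng.normal(1000, 100, size=n_trials)
x1 = source + rng.normal(0, 50, size=n_trials)
x2 = source + rng.normal(0, 200, size=n_trials)

# Inverse-variance weighted combination of the two readings.
w1, w2 = 1 / 50**2, 1 / 200**2
combined = (w1 * x1 + w2 * x2) / (w1 + w2)

print("population sd: 100 (by construction)")
print(f"sd of combined estimate: {combined.std():.1f}")
# Expect roughly sqrt(100^2 + 48.5^2) ~ 111: the estimator has its own
# distribution, distinct from the population's.
```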
 
  • #37
Figuring out what the question is is the most important part.
Dale said:
It is completely meaningless to say that they come from the same distribution and then give different variances for the two measurements. If they come from the same distribution then they must have the same variance.
That is not true. The uncertainty doesn't have to be the variance of the underlying distribution of the numbers. Toy example: You measure radioactive decays in 1 minute and estimate the true decay rate based on that. A measurement that has 5 decays will come with a different uncertainty than a measurement with 10 decays, even though they are identical repetitions of the experiment.
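A tiny illustration of that toy example, assuming the usual ##\sqrt{N}## uncertainty on a Poisson count:

```python
import math

# Two identical repetitions of a 1-minute counting experiment:
for counts in (5, 10):
    print(f"{counts} decays -> rate estimate {counts} +/- {math.sqrt(counts):.2f} per minute")
# Same underlying distribution, yet the two reported uncertainties differ.
```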
kelly0303 said:
Thank you for this! I actually found this: https://ned.ipac.caltech.edu/level5/Leo/Stats4_5.html I think this is what I was looking for.
That is the approach to get a best estimate for the "true" parameter and the uncertainty for your estimate.
kelly0303 said:
However, if I am trying to approximate the real distribution with my measurements (which I assume is what experimentalists are trying to do in general), do I need to add the standard deviation of the samples themselves on top of that error?
Your measured variance will be the sum of the variance of the underlying distribution itself and the variance from your measurements. The above formula will give you the variance coming from your measurement uncertainties. Subtract that from the sample variance to estimate the variance of the underlying distribution.

Note: That will only give a best estimate, which might even be negative (e.g. you measure ##1000 \pm 100## and ##1030 \pm 60##; the spread is smaller than you would expect from the measurement uncertainties). If your measurement uncertainties are large compared to the width of the underlying distribution you will need many measurements to make this approach viable.
If you need a confidence interval for your estimate of the variance of the distribution, run a toy MC; doing it analytically won't work well.
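A sketch of that subtraction, using a simple unweighted estimate; the data, the instrument mix, and the helper name `underlying_std` are all hypothetical, and the estimate is clipped at zero since, as noted, the raw difference can come out negative:

```python
import numpy as np

def underlying_std(values, sigmas):
    """Estimate the sd of the underlying distribution by subtracting
    the mean measurement variance from the sample variance."""
    values = np.asarray(values, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    sample_var = np.var(values, ddof=1)       # unbiased sample variance
    meas_var = np.mean(sigmas**2)             # average measurement variance
    return np.sqrt(max(sample_var - meas_var, 0.0))  # clip negative estimates

# Hypothetical data: many repeated measurements with mixed uncertainties
rng = np.random.default_rng(1)
true_vals = rng.normal(1000, 100, size=200)   # underlying sd = 100
sigmas = rng.choice([50.0, 200.0], size=200)  # two instruments, used at random
values = true_vals + rng.normal(0, sigmas)
print(f"estimated underlying sd: {underlying_std(values, sigmas):.1f}")  # ~100
```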
 
  • #38
mfb said:
That is not true. The uncertainty doesn't have to be the variance of the underlying distribution of the numbers.
Oh, you are right. Good point.

mfb said:
If you need a confidence interval for your estimate of the variance of the distribution, run a toy MC; doing it analytically won't work well.
I have also suggested this a couple of times here. Monte Carlo is so flexible and useful that it should be a standard tool anyone doing statistics uses.
 
