Why do we report the standard deviation of the mean in error analysis?

dydxforsn · May 29, 2012

I've done a decent amount of reading on the subject, and I used this method of course when reporting errors during lab experience I got as an undergraduate, but it was never quite fully explained to me why exactly we report a value +/- the standard deviation of the mean as our result. Why not just the standard deviation? I understand it's because we only took a sample of the population, but how in the heck does this problem get fixed and we magically get the standard deviation of the mean (which I might still need defined to me because I don't see how it could possibly be what it says it is) if we just divide σ by the square root of 'n'?..

I remember something about multiplying every point along the frequency distribution by each other and maximizing that by a derivative (so you're assuming that the 10 or so points you randomly sampled would just happen to be the most likely 10 points or some logic like that..) and then solving for the variables 'a' and 'b' of a gaussian curve that you assume the experimental data is going to conform to (which I guess I would also like to know if it's more than an assumption that experimental data will conform to a gaussian curve, I've never found an explanation of this either, they always just say "assume a gaussian curve"...) But if this method is used by what logic do people use to call this 'b' value found in this way the "standard deviation OF THE MEAN". I don't see how those particular semantics follow from that method.

Sorry if my irritation is openly transparent. This topic is always poorly and frustratingly explained. None of my physics books go into any kind of depth on this subject at all; I've completed my undergraduate degree and still don't know about this and apparently have no way of finding out (some of my other physics friends are also confused about how this comes about).

chiro · May 29, 2012

Hey dydxforsn.

I don't know about the +- but I'm guessing it's just a natural convention that's been used and still is used.

As for the Gaussian, I absolutely agree that this assumption tends to be used just whenever it is 'suitable'. The reason it is used is that a lot of statistical analyses are built around Gaussian distributions, especially the classical frequentist approaches.

I guess if they provided evidence that the distribution was indeed Gaussian through a variety of statistics, then it would be ok, but yeah just assuming something is Gaussian is definitely something that should be looked down on.

Another thing to mention is that the distribution may not be for, say, a sample or collected data, but for a model, and this needs to be taken into account. For example you might want to use Gaussian distributions as noise models for a communication system and there are very good reasons for doing this. I imagine a lot of other similar situations apply.

In terms of actually getting the standard deviation of the mean, there are results like the Central Limit Theorem, which says that the mean of independent data points drawn from any distribution with finite variance is approximately normally distributed, given enough data points. This theorem is actually the foundation for a lot of the classical frequentist statistics that many scientists use.

Using this, you can derive distributions for the mean with respect to the sample mean (mean calculated from your data) and from this you get a distribution of your population mean under certain assumptions.
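
A rough numerical illustration of the Central Limit Theorem claim, as a sketch only. It assumes exponentially distributed data (clearly not Gaussian) and checks one Gaussian fingerprint: about 68.3% of values should lie within one standard deviation of the centre. The function names and sample sizes are made up for this example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sample_means(n_points, n_repeats=10000):
    """Means of n_repeats data sets, each made of n_points exponential draws."""
    return [
        sum(random.expovariate(1.0) for _ in range(n_points)) / n_points
        for _ in range(n_repeats)
    ]

def fraction_within_one_sd(values):
    """Fraction of values within one standard deviation of their own mean."""
    centre = statistics.mean(values)
    spread = statistics.pstdev(values)
    return sum(abs(v - centre) <= spread for v in values) / len(values)

# A Gaussian puts about 68.3% of its probability within one standard deviation.
frac_single = fraction_within_one_sd(sample_means(1))   # raw exponential draws
frac_mean50 = fraction_within_one_sd(sample_means(50))  # means of 50 draws
print(f"n = 1:  {frac_single:.3f}")
print(f"n = 50: {frac_mean50:.3f}")
```

The single draws should sit well away from the Gaussian figure, while the means of 50 draws should land close to it, even though the underlying data are the same skewed distribution in both cases.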

The actual understanding behind this involves the CLT, Random Variables, Transformation of PDF's amongst other things and if you want to understand this, then open up a solid probability/statistics book and take a look.

I see your frustration and I empathize with it because I did some tutoring recently for an engineer who had some troubles with these kinds of things. Upon looking at the notes, it was really no surprise that he had some conceptual difficulties because there were no explanations and it was basically a non-conceptual, take it on faith, kind of exercise which is absolutely pathetic for anyone trying to use mathematical methods properly whether they be engineers, scientists, analysts or anyone else of any kind.

If you have specific questions, post them in this forum and I'm sure you will get a decent answer that is deeper and more conceptual than what you are getting in your coursework.

haruspex · May 29, 2012

dydxforsn said:

I've done a decent amount of reading on the subject, and I used this method of course when reporting errors during lab experience I got as an undergraduate, but it was never quite fully explained to me why exactly we report a value +/- the standard deviation of the mean as our result. Why not just the standard deviation? I understand it's because we only took a sample of the population, but how in the heck does this problem get fixed and we magically get the standard deviation of the mean (which I might still need defined to me because I don't see how it could possibly be what it says it is) if we just divide σ by the square root of 'n'?..

Are you saying there's a convention that the error range is always quoted as one (estimated) standard deviation? If so, I guess that's ok, provided the reader bears in mind that's all it means. For many purposes, depending on the risks involved in making the wrong decision, it may be wiser to use several standard deviations, or in other cases perhaps only half a standard deviation.
I don't understand the next part of your question. What problem is to be fixed? In what procedure are you dividing a σ by the square root of 'n'? Is this σ the actual s.d. of the population (which is known how?), or the s.d. of a sample, or an estimate of the s.d. of the population based on a sample?
From a set of observations, you can form an estimate of σ for the whole population. This will be more than the standard deviation of the sample, size n say, by a factor √(n/(n-1)). I can explain that if you like. This does not depend on any knowledge of the underlying distribution, only on the assumption that the samples are independent.
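
The √(n/(n-1)) factor can be checked directly with Python's standard library, where `statistics.pstdev` divides by n and `statistics.stdev` divides by n - 1. This is a minimal sketch; the data values are invented for illustration.

```python
import math
import statistics

# Hypothetical repeated measurements of one quantity (illustrative values only).
data = [4.1, 3.9, 4.3, 4.0, 3.8]
n = len(data)

sd_sample = statistics.pstdev(data)    # divides by n: spread of the sample itself
sd_estimate = statistics.stdev(data)   # divides by n - 1: estimate for the population

ratio = sd_estimate / sd_sample
print(ratio, math.sqrt(n / (n - 1)))   # the two numbers agree
```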

Parlyne · May 30, 2012

It seems to me you've asked pretty much all of the subtle questions that are hidden in basic discussions of error analysis; so, let me see if I can untangle them a bit for you. I'll start with the distinction between the standard deviation and the standard deviation of the mean.

When we consider a data set, we generally expect that measurements of any physical quantity will tend to fluctuate randomly about some "real" value. When we say this, what we really mean is that there is some value we should get if we could separate out only the physics that we're interested in; but, in reality, there are always other effects that we can't control, but that will tend to change from trial to trial. When we calculate the standard deviation of our data, what we're doing is finding the typical size of the fluctuation in any single measurement.

What we're generally interested in, however, is getting the best possible estimate of the value that our measurements are fluctuating around and characterizing how closely that best-fit value approximates the "true" value. As long as the fluctuations in the individual measurements are random, averaging the measured values will tend to cancel the effects of the random fluctuations, which will mean that the average will tend to be closer to the "true" value than any individual measurement will be. Thus, we should expect that a quantity characterizing the typical deviation of the average of a data set from the "true" value will be smaller than that characterizing the typical deviation of an individual measurement from the "true" value. We can see this by considering error propagation. If we take the uncertainty in each measurement to be [itex]\sigma[/itex] (with the presumption that this represents the standard deviation of the measured values), the uncertainty in the average, [itex]\frac{1}{N}\sum_{i=1}^Nx_i[/itex], will be [itex]\sqrt{\sum_{i=1}^N(\frac{\sigma}{N})^2}[/itex], which simplifies to [itex]\frac{\sigma}{\sqrt{N}}[/itex]. You should recognize this as the standard deviation of the mean.
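
The σ/√N result can also be checked by brute force. The sketch below assumes Gaussian fluctuations with made-up values for the true value, σ and N, simulates many data sets, and compares the scatter of their averages with the error-propagation prediction.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

SIGMA = 0.5       # assumed spread of a single measurement
TRUE_VALUE = 4.0  # assumed "true" value the measurements scatter around
N = 25            # measurements per data set
REPEATS = 20000   # number of simulated data sets

means = []
for _ in range(REPEATS):
    measurements = [random.gauss(TRUE_VALUE, SIGMA) for _ in range(N)]
    means.append(sum(measurements) / N)

observed = statistics.pstdev(means)   # scatter of the averages across data sets
predicted = SIGMA / math.sqrt(N)      # sigma / sqrt(N) from error propagation
print(f"observed {observed:.4f}, predicted {predicted:.4f}")
```

So the "standard deviation of the mean" is literally that: the standard deviation you would see in the mean if you repeated the whole N-measurement experiment many times.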

In point of fact, however, there are still some unstated assumptions packed into this discussion, most of which really just come down to unstated assumptions about the distribution of measured values. It is certainly true that we often assume without further comment that measured values are normally distributed. As you may already suspect, this is pretty much always only approximately true, at best. Measured values will strictly only follow a normal distribution when the domain of possible values that can be measured is the entire real line and when the deviation of each measurement is totally uncorrelated with the deviation of each other measurement. The second condition is often unproblematic (although rarely exactly true); however, the first is never satisfied. Our measurements always have finite resolution and restricted range. So, the gaussian approximation is best when the standard deviation of measured values is large compared with the instrumental resolution but small compared with the allowed range of measured values.

Whatever is actually the case about the statistical properties of our measurements, we need to justify the use of the average as the best estimator of the "true" value, as well as the use of error propagation rules. This is usually done using the Method of Maximum Likelihood, which is just a fancy way of saying that we guess at the form of the distribution from which our measured values are sampled, use that to find the form of the probability of the data set, and maximize that probability with respect to the parameters of the distribution. That's a little more involved than I want to try to type up here; but, you can find discussions elsewhere which go through this in some detail.
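
For a concrete feel of what "maximize that probability" means, here is a minimal sketch assuming a Gaussian form and a handful of invented measurements. It does a crude grid search over candidate (μ, σ) pairs instead of the usual calculus; the grid limits and step are arbitrary choices for this example.

```python
import math
import statistics

# Hypothetical measurements (illustrative values only).
data = [4.1, 3.9, 4.3, 4.0, 3.8, 4.2]

def log_likelihood(mu, sigma, xs):
    """Log of the product of Gaussian densities evaluated at each data point."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)
        for x in xs
    )

# Brute-force search over a grid of candidate (mu, sigma) pairs.
mus = [3.5 + 0.005 * i for i in range(201)]      # 3.5 ... 4.5
sigmas = [0.05 + 0.005 * j for j in range(91)]   # 0.05 ... 0.5
best_mu, best_sigma = max(
    ((m, s) for m in mus for s in sigmas),
    key=lambda p: log_likelihood(p[0], p[1], data),
)

print(best_mu, statistics.mean(data))       # grid maximum vs sample average
print(best_sigma, statistics.pstdev(data))  # grid maximum vs sample s.d. (divide by n)
```

Up to the grid spacing, the most likely μ is the sample average and the most likely σ is the sample standard deviation with the divide-by-n convention.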

dydxforsn · May 30, 2012

chiro said:

Hey dydxforsn.

I don't know about the +- but I'm guessing it's just a natural convention that's been used and still is used.

heh, I didn't realize there was a ± button in the quick symbols of this forum. I just meant to say that you report experimental values that are determined in a lab setting in your formal lab report as something like (if it's a position measurement) 4.0 cm ± 0.2 cm (where 0.2 cm is the standard deviation of the mean. Or rather, usually 2 times the standard deviation of the mean, to get like 95% confidence..)
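
As a sketch of that reporting convention, with invented readings: the spread of single readings and the standard deviation of the mean are computed separately, and only the latter goes after the ±. The "about 95%" reading of the 2σ_m interval is itself an assumption that leans on roughly Gaussian fluctuations and a reasonably large sample.

```python
import math
import statistics

# Hypothetical position readings in cm (illustrative values only).
readings_cm = [4.1, 3.8, 4.2, 3.9, 4.0, 4.3, 3.7, 4.0, 4.1, 3.9]

n = len(readings_cm)
mean = statistics.mean(readings_cm)
sd = statistics.stdev(readings_cm)   # spread of single readings (n - 1 divisor)
sem = sd / math.sqrt(n)              # standard deviation of the mean

print(f"single-reading spread: {sd:.2f} cm")
print(f"result: {mean:.2f} cm ± {sem:.2f} cm (one standard deviation of the mean)")
print(f"result: {mean:.2f} cm ± {2 * sem:.2f} cm (two standard deviations of the mean)")
```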

chiro said:

Another thing to mention is that the distribution may not be for, say, a sample or collected data, but for a model, and this needs to be taken into account. For example you might want to use Gaussian distributions as noise models for a communication system and there are very good reasons for doing this. I imagine a lot of other similar situations apply.

Yeah that is definitely true, like if you were measuring speeds of molecules in a gas or something. There are definitely those situations.

chiro said:

In terms of actually getting the standard deviation of the mean, there are results like the Central Limit Theorem, which says that the mean of independent data points drawn from any distribution with finite variance is approximately normally distributed, given enough data points. This theorem is actually the foundation for a lot of the classical frequentist statistics that many scientists use.

Using this, you can derive distributions for the mean with respect to the sample mean (mean calculated from your data) and from this you get a distribution of your population mean under certain assumptions.

Okay, this is definitely helpful, I clearly need to delve into the Central Limit Theorem some more, I'll look into this. This is definitely along the lines of what I'm looking for. It's this "distribution for the mean" that I'm looking for the proof of ([itex]\frac{\sigma}{\sqrt{n}}[/itex]), seeing as how σ_m along with the average value (which we presumably already know to be the average value of the sample we've already obtained) completely define the theoretical distribution of average (true) values.

chiro said:

The actual understanding behind this involves the CLT, Random Variables, Transformation of PDF's amongst other things and if you want to understand this, then open up a solid probability/statistics book and take a look.

If you have specific questions, post them in this forum and I'm sure you will get a decent answer that is deeper and more conceptual than what you are getting in your coursework.

I definitely appreciate your response Chiro, it's helped. I may have to go to the university library and read up on some statistics books on the subject, but I was always afraid that if I got a math book on statistics that it might stray too far from what I'm looking for if it gets included at all, but I may give it a try soon if I can't clear everything up with this topic. Currently I'm reading Experiments in Modern Physics by Melissinos, but the section on the subject of this forum topic is a mere appendix topic and I feel could use further explanation.

Parlyne said:

What we're generally interested in, however, is getting the best possible estimate of the value that our measurements are fluctuating around and characterizing how closely that best-fit value approximates the "true" value. As long as the fluctuations in the individual measurements are random, averaging the measured values will tend to cancel the effects of the random fluctuations, which will mean that the average will tend to be closer to the "true" value than any individual measurement will be. Thus, we should expect that a quantity characterizing the typical deviation of the average of a data set from the "true" value will be smaller than that characterizing the typical deviation of an individual measurement from the "true" value. We can see this by considering error propagation. If we take the uncertainty in each measurement to be [itex]\sigma[/itex] (with the presumption that this represents the standard deviation of the measured values), the uncertainty in the average, [itex]\frac{1}{N}\sum_{i=1}^Nx_i[/itex], will be [itex]\sqrt{\sum_{i=1}^N(\frac{\sigma}{N})^2}[/itex], which simplifies to [itex]\frac{\sigma}{\sqrt{N}}[/itex]. You should recognize this as the standard deviation of the mean.

That's it! Okay, I see what you're doing, you're simply assuming each measurement has its own error value associated with it, 'σ', and using propagation of errors to find that there is an error in the average value, where the average value is simply a function of each of the independent measurements, which allows you to do the familiar [itex]\delta_{w}^{2} = \sum_{i}{(\frac{\partial w}{\partial x_i}\delta_{x_{i}})}^2[/itex] (where w = x_avg = f(x₁, x₂, ..., x_N)). I think I've been looking at it from this "Maximum Likelihood Method" angle and thus didn't really see how this came about.
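
Written out step by step, under the assumptions that every measurement carries the same uncertainty δ_{x_i} = σ and that the N measurements are independent, the propagation formula gives:

```latex
w = \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i
\quad\Rightarrow\quad
\frac{\partial w}{\partial x_i} = \frac{1}{N}

\delta_w^2
  = \sum_{i=1}^{N}\left(\frac{\partial w}{\partial x_i}\,\delta_{x_i}\right)^2
  = \sum_{i=1}^{N}\left(\frac{\sigma}{N}\right)^2
  = \frac{\sigma^2}{N}
\quad\Rightarrow\quad
\delta_w = \frac{\sigma}{\sqrt{N}}
```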

Parlyne said:

Whatever is actually the case about the statistical properties of our measurements, we need to justify the use of the average as the best estimator of the "true" value, as well as the use of error propagation rules. This is usually done using the Method of Maximum Likelihood, which is just a fancy way of saying that we guess at the form of the distribution from which our measured values are sampled, use that to find the form of the probability of the data set, and maximize that probability with respect to the parameters of the distribution. That's a little more involved than I want to try to type up here; but, you can find discussions elsewhere which go through this in some detail.

I sort of talked about this in my original post. I've seen such a process, and I believe it explains why the sample average and sample standard deviation can be used to find the population average (which is simply just the sample average according to this "Maximum Likelihood Method") as well as the population standard deviation ([itex]\frac{\sigma}{\sqrt{n}}[/itex]). However, I don't see how this isn't just another assumption. The Maximum Likelihood Method seems to just say "Well, let's just multiply the probability distribution (using the additional assumption that it's a gaussian) at the specific points that we measured together and try to maximize that product. So basically we're assuming that the 10 points we obtained somehow combine to form the highest likelihood 10 point set that we could have sampled." Or something like that... I dunno, everything about error analysis just seems to boil down to assumptions (btw I very much appreciate your reply Parlyne, it was very useful.)

I'll make my ideas on this subject clear at this point to avoid confusion in this topic and be very specific. I came into this topic having 3 specific questions:

1.) How is it that we can assume an underlying gaussian distribution in the spread of our experimentally measured values?

2.) How can we say that from the sample we can find the population distribution with the average value we obtained being equal to the average value of the population as well as the standard deviation we obtained being equal to the standard deviation of the population? But wait, we're not finding the average value for the real population distribution, we're finding the average value and its standard deviation from the sample set (so a distribution involving average values, not a distribution for the population of all values possible to experimentally obtain.)

3.) How is it that the Method of Maximum Likelihood isn't just simply another assumption? How can we possibly be certain that our true value is in the range that we specified? This Method of Maximum Likelihood seems like a nastier assumption than the gaussian distribution itself!

Question 2 has been sort of answered for me by Parlyne. I see how the standard deviation of the mean would be related to the standard deviation of the sample by simply just a factor of 1 over the square root of the sample size. But the proof that the average you obtained via the sample is the center point for the distribution of the mean is still completely unknown to me. How do we know that this average is a sufficient center point for the gaussian specified with standard deviation equal to the standard deviation of the mean?

I'm sorry for such a confusing topic, but this whole subject can get pretty long-winded near as I can tell. My only hope is that I have kept my topic and wording not as jumbled as it seems to me.

Parlyne · May 31, 2012

It's certainly true that you always have to make assumptions. The trick is to make as few as possible and to have the ones you make be well justified. So, let me see if I can help with your remaining questions.

1) This really does get to the heart of most discussions of error analysis at the undergrad level, doesn't it? The simple answer is that you are never justified in this assumption. In fact, you can be 100% certain that no set of values you ever get is sampled from a gaussian. This is for exactly the reasons I mentioned above - your data sets will always consist of rational-valued numbers sampled from a finite range. So, the use of the gaussian is never strictly an exact thing, but rather, as chiro was getting at, a good enough approximation in many cases, due to the simple fact that, for certain ranges of parameters, most single-peaked probability distributions are well approximated by a gaussian.

Of course, this comes with caveats. First of all, if you can actually figure out the form of the real underlying distribution (from theoretical arguments), you can get better results by using it. And, when using a gaussian to approximate anything else, you should take estimates of the probabilities of large deviations with a comparably large grain of salt.
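
To put a number on the grain of salt, here is a sketch using one case where the real distribution is known exactly: the number of heads in 100 fair coin flips (an assumption chosen for this example, not something from the thread). It compares the exact binomial tail probability with the gaussian estimate at 2 and at 6 standard deviations above the mean.

```python
import math

N_TRIALS = 100   # assumed: 100 fair coin flips, counting heads
P = 0.5
MEAN = N_TRIALS * P
SD = math.sqrt(N_TRIALS * P * (1 - P))

def exact_tail(k):
    """Exact binomial probability of k or more heads (valid for P = 0.5 only)."""
    return sum(math.comb(N_TRIALS, j) for j in range(k, N_TRIALS + 1)) / 2**N_TRIALS

def gaussian_tail(k):
    """Gaussian estimate of the same probability, with a continuity correction."""
    z = (k - 0.5 - MEAN) / SD
    return 0.5 * math.erfc(z / math.sqrt(2))

for k in (60, 80):   # 2 and 6 standard deviations above the mean
    exact, approx = exact_tail(k), gaussian_tail(k)
    print(f"P(heads >= {k}): exact {exact:.3e}, gaussian {approx:.3e}, "
          f"ratio {approx / exact:.2f}")
```

Near the peak the two agree closely; far out in the tail the gaussian estimate is off by a factor of a few, even though both numbers are tiny.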

2) As you seem to be getting pretty close to saying here, no amount of statistical work on your data will be able to recover the parameters of the real population distribution. Instead, what you can do is find the parameters of the distribution (of an assumed form) that would be most likely to generate your data set, which makes these parameters the best possible estimate of the parameters of the underlying distribution. Then, the uncertainties in the derived parameters give you an idea of how large a range in the distribution's parameters would still be reasonably consistent with the data.

3) As I've described in my answer to question number two, the Method of Maximum Likelihood is an estimation method. It does not (and cannot) give you the actual parameters. It can only give you estimates and uncertainties on the estimates. Of course, as with measurements in general, knowing the uncertainty only allows you to make probabilistic statements about the value of the parameter. There's no inherent guarantee that the peak of the real distribution is within one standard deviation of the mean of your estimate. There's only a statement that in ~68% of cases where a data set matching yours was generated from an underlying distribution, the parameters of that underlying distribution were within 1 standard deviation of the mean from your estimated values (and so on).
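
One way to see what the probabilistic statement does and does not promise is to simulate it. The sketch below takes the repeated-experiment reading of the statement and assumes Gaussian measurements with made-up parameters. It counts how often the interval "mean ± one standard deviation of the mean" actually contains the true value, for a very small and a larger sample size.

```python
import math
import random

random.seed(2)  # fixed seed so the run is repeatable

TRUE_VALUE = 4.0  # assumed true value, known only because this is a simulation
SIGMA = 0.5       # assumed spread of a single measurement

def coverage(n, repeats=20000):
    """Fraction of simulated data sets whose mean ± 1 SEM contains TRUE_VALUE."""
    hits = 0
    for _ in range(repeats):
        data = [random.gauss(TRUE_VALUE, SIGMA) for _ in range(n)]
        mean = sum(data) / n
        variance = sum((x - mean) ** 2 for x in data) / (n - 1)  # n - 1 divisor
        sem = math.sqrt(variance / n)  # estimated standard deviation of the mean
        if abs(mean - TRUE_VALUE) <= sem:
            hits += 1
    return hits / repeats

cov_small = coverage(3)    # a typical undergraduate-lab sample size
cov_large = coverage(50)
print(f"n = 3:  {cov_small:.3f}")
print(f"n = 50: {cov_large:.3f}")
```

With 50 measurements the interval catches the true value roughly two times in three, as advertised. With only 3 it does noticeably worse, because the standard deviation is itself poorly estimated from so few points.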

I should probably note here that the method of maximum likelihood is in no way specific to gaussian distributions. It can be used in any case where you know the form, but not the parameters, of the distribution from which a data set is sampled. The only requirement is that each measurement in the data set be statistically independent from the others (this is why the multiplication of the probabilities of the individual measurements is justified).

chiro · May 31, 2012

Parlyne said:

I should probably note here that the method of maximum likelihood is in no way specific to gaussian distributions. It can be used in any case where you know the form, but not the parameters, of the distribution from which a data set is sampled. The only requirement is that each measurement in the data set be statistically independent from the others (this is why the multiplication of the probabilities of the individual measurements is justified).

The only warning I would give the OP is that the likelihood has to have a proper maximum with respect to the parameter being estimated. With a uniform distribution, for example, you can't get the estimate by the usual route of setting the derivative to zero, because the likelihood has no stationary point that picks out a best value; its maximum sits at a boundary set by the extreme data points.
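
A small sketch of how the uniform case behaves, assuming the specific model Uniform(0, θ) with unknown upper edge θ (an assumption for this example; the post does not say which uniform family is meant). Here the likelihood is θ^(-n) whenever θ is at least as large as the biggest data point and zero otherwise, so there is no stationary point to find with a derivative and the maximum sits on the boundary.

```python
import random

random.seed(3)  # fixed seed so the run is repeatable

THETA_TRUE = 10.0   # assumed upper edge of a Uniform(0, theta) distribution
data = [random.uniform(0.0, THETA_TRUE) for _ in range(20)]

def likelihood(theta, xs):
    """Product of Uniform(0, theta) densities: theta**-n if every x fits, else 0."""
    if theta < max(xs):
        return 0.0
    return theta ** (-len(xs))

# Scan candidate values of theta; the likelihood jumps from 0 to its largest
# value at theta = max(data) and then only decreases.
candidates = [max(data) + 0.01 * k for k in range(-200, 201)]
best_theta = max(candidates, key=lambda t: likelihood(t, data))
print(best_theta, max(data))
```

In this model a maximum-likelihood value still exists (the largest data point), but it comes from inspecting the edge of the allowed region, not from solving "derivative equals zero".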

dydxforsn · May 31, 2012

Well, I'll conclude this topic this way. It seems as if I'm not so much bothered by the assumptions that are necessary in the process (there is both the probability distribution assumption as well as the method of maximum likelihood assumption, both of which seem to be decent assumptions, especially when you make the parameters in the experiment (resolution of the measuring device, etc.) suitable, as has been said by others in this topic), but I am more bothered by everyone's (not anybody in this topic) constant reference to their supposed certainty that the predicted value is going to FOR SURE be within the 68.3%/95.5%/99.7% range corresponding to 1σ_m/2σ_m/3σ_m etc. It's as if they don't know that there is underlying guesswork; it's probably more this than the actual physics of things that makes me feel like I don't understand something. (Though most of the undergraduate labs included like only 3 measurements of a specific quantity; with much larger sets of data I could obviously see the assumed certainty in the percents usually given by people to label the precision of their experiment, but when teaching the idea of error analysis professors should be more clear about this.) That's all I am going to delve into on the subject of how these things come about from certain reasonable assumptions. I really appreciate the responses by both Chiro and Parlyne, this topic has increased my understanding tenfold.
