# I don't understand the standard deviation.

Note that I qualified what I said, using "about." I know the difference between the mean and median, and also the mode, another measure of central tendency. I was just trying to distinguish between two types of statistical measures.
It's unfortunate that they all begin with the same letter; it is easy to get the names confused. Why not use 'average' for mean? The terms are not helpful if you are not familiar with them.

I have had a look at that article and it is pretty 'dense' and drags me all over the place via links, so it's hard work.

Anyway, it goes a bit too much into the maths of it; it does not seem to get at the root of the problem.
I had hoped there would be a simple explanation. It is not looking like I will get an answer I am happy with here, so I will probably have to work out an answer myself before I will be happy (as is sometimes the case).
As I am being pointed to wiki pages it does not seem like anyone will be able to post an answer here I am happy with.
If I'm understanding your initial question, you want to know why standard deviation is the "right" statistic. This is of course a very subjective question, and of course it's not defined mathematically. Some have pointed out various nice mathematical properties it enjoys, but if that's not part of your criteria for being "right," then I'm afraid we have no chance of convincing you :)

http://en.wikipedia.org/wiki/Standard_deviation#Geometric_interpretation

So if you have two data points, (m1, m2), their standard deviation (computed with the n - 1 denominator) is the minimal distance between the POINT (m1, m2) and the line {(t, t) | t in R}.

The geometrical explanation is very intuitive with data sets of one, two, or three data points, and to me, I can see why it generalizes to n. For continuous distributions, you are simply replacing a summation by integration, as is common when going from discrete *anything* to continuous.

It's a pity there is no diagram with that, it would be much easier to follow.

To gain some geometric insights, we will start with a population of three values, x1, x2, x3. This defines a point P = (x1, x2, x3) in R3. Consider the line L = {(r, r, r) : r in R}. This is the "main diagonal" going through the origin. If our three given values were all equal, then the standard deviation would be zero and P would lie on L. So it is not unreasonable to assume that the standard deviation is related to the distance of P to L. And that is indeed the case. To move orthogonally from L to the point P, one begins at the point:

$$M = (\overline{x}, \overline{x}, \overline{x})$$

whose coordinates are the mean of the values we started out with. A little algebra shows that the distance between P and M (which is the same as the orthogonal distance between P and the line L) is equal to the standard deviation of the vector x1, x2, x3, multiplied by the square root of the number of dimensions of the vector.
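A quick numerical check of the geometric picture may help here. This is only a sketch: the three values are made up for illustration, and the helper name `distance_to_diagonal` is hypothetical. It computes the distance from P to the point M on the diagonal and compares it with the standard deviation scaled by a square root.

```python
import math
import statistics


def distance_to_diagonal(values):
    """Distance from P = (x1, ..., xn) to the line L = {(r, ..., r)}."""
    mean = statistics.fmean(values)
    # The closest point on L is M = (mean, ..., mean), so the distance is |P - M|.
    return math.sqrt(sum((x - mean) ** 2 for x in values))


values = [2.0, 4.0, 9.0]  # hypothetical example values, not from the thread
n = len(values)

distance = distance_to_diagonal(values)
population_sd = statistics.pstdev(values)  # divides by n
sample_sd = statistics.stdev(values)       # divides by n - 1

# Both products should agree with the distance, up to rounding.
print(distance, population_sd * math.sqrt(n))
print(distance, sample_sd * math.sqrt(n - 1))
```

So, for these numbers at least, the distance from P to L is the population standard deviation times the square root of n, or equivalently the sample standard deviation times the square root of n - 1.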

1. How can 3 different values define a point?
2. Does R3 mean 3D 3 dimensions?
3. I think I would need a diagram to understand it properly.

If I'm understanding your initial question, you want to know why standard deviation is the "right" statistic. This is of course a very subjective question, and of course it's not defined mathematically. Some have pointed out various nice mathematical properties it enjoys, but if that's not part of your criteria for being "right," then I'm afraid we have no chance of convincing you :)
Seems like that may well be the case.
It has always sounded to me like something plucked out of thin air, or thereabouts.
Perhaps a bit like trying to work out the standard size of a tin of beans.

Anyhow if I had never heard of it I doubt I would have dreamt it up myself, perhaps because I have never stumbled across any need to.

lavinia
Gold Member
The standard deviation is the expected distance to the mean. It is a natural measure of the average error in estimating the mean from a sample.

From this point of view it has universal meaning as a statistic and is not tied only to the Gaussian distribution.

The conceptual difference between variance and standard deviation is the same as the difference between the concepts of distance and squared distance.

The idea of distance is, to me, more natural. Further, it has the same units as the underlying distribution's points and so can be compared to them.

Mark44
Mentor
Three values define a point in three-dimensional space, which is often called R3.

Most of the graphs I see are in 2D (all of them, in fact).

Mark44
Mentor
Seems like that may well be the case.
It has always sounded to me like something plucked out of thin air, or thereabouts.
We have been laboring away, trying to explain to you that it was not plucked out of thin air. Unfortunately, your response to most of the explanations seems to be that they involve mathematics that you don't understand, or that an article is too dense with links to too many other sites, or that a sentence that seems crystal clear to me is "circular reasoning."

Perhaps a bit like trying to work out the standard size of a tin of beans.

Anyhow if I had never heard of it I doubt I would have dreamt it up myself, perhaps because I have never stumbled across any need to.
Just because you have never seen the need to work with variance or standard deviation doesn't mean that these statistics are unneeded. To use your example of a can of beans, manufacturers and food processors are very interested in making sure that the variability of what goes in a can or package is tightly controlled. If they put more beans in the can than the advertised weight on the can, they are losing money. If they put too few beans in the can, they can be liable to lawsuits for failing to deliver the advertised amount. You'd better bet that they are keeping track of the standard deviation here.

The standard deviation is the expected distance to the mean.
Is that right? I thought the expected distance to the mean would be

$$\frac{1}{n}\sum_{i=1}^n|x_i-\overline{x}|, \quad \text{with mean } \overline{x}=\frac{1}{n}\sum_{i=1}^n x_i$$

(for a discrete set of values $x_i$), which isn't in general the same as the root mean square value of the difference from the mean,

$$\sqrt{\frac{1}{n}\sum_{i=1}^n(x_i-\overline{x})^2}$$

which I thought was the definition of the standard deviation of the set.
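To make the difference between the two quantities concrete, here is a minimal sketch that computes both for a small made-up data set. The function names and the numbers are illustrative assumptions, not anything from the posts above.

```python
import math
import statistics


def mean_absolute_deviation(values):
    """Average distance of the values from their mean."""
    mean = statistics.fmean(values)
    return sum(abs(x - mean) for x in values) / len(values)


def rms_deviation(values):
    """Root mean square distance from the mean (population standard deviation)."""
    mean = statistics.fmean(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))


values = [1.0, 2.0, 3.0, 10.0]  # hypothetical data, chosen to include one large value

mad = mean_absolute_deviation(values)
rms = rms_deviation(values)

print("mean absolute deviation:", mad)
print("root mean square deviation:", rms)
print("statistics.pstdev:", statistics.pstdev(values))
```

The two numbers differ, and the root mean square value matches `statistics.pstdev`. In general the mean absolute deviation is never larger than the root mean square deviation, because squaring gives more weight to the values far from the mean.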

Isn't the expected distance to the mean the average distance to the mean?
The average distance does not require squaring.

Some vague terms there 'conceptual difference' based on what concept?

Some of that seems to involve a kind of circular argument such as "blue is a colour which is blue in appearance" although saying that in a more long winded way.
Or put another way, you are using your theorem as the basis of your proof, when you strip out the other maths.

Also, the distribution of errors in nature "would" follow a perfect normal distribution, if everything was "truly" random and behaving perfectly according to long-run patterns.
The pedant would disagree with this rash statement.

Homework Helper
The standard deviation is not the expected distance to the mean.

The primary reason the mean and standard deviation have been used together for so long is the primacy of the assumption of normality for data (rightly or wrongly, usually wrongly). IF your data are normally distributed, or you are willing to believe it is, these are the natural choices for measures of location and spread.

If you prefer to work backwards and say "The best measure of location is the one that gives me the smallest measure of variability from that number to my data", then

a) If you measure variability by using the sum of the squares of the residuals, then it turns out that the mean is the measure that gives the minimum dispersion - that is, you end up working with

$$\sum (x-\bar x)^2$$

b) If you decide to measure variability using the sum of the absolute values, then it turns out that the appropriate measure of location (appropriate meaning it gives the lowest value of variability) is the MEDIAN, not the mean. These two go together, but are not as "efficient" for normally distributed data as the mean and the standard deviation.
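Points a) and b) can be checked by brute force. The sketch below uses made-up data and a coarse grid of candidate "centres" (both are assumptions for illustration only), and simply picks the candidate that makes each measure of variability smallest.

```python
import statistics


def sum_squared_residuals(values, center):
    return sum((x - center) ** 2 for x in values)


def sum_absolute_residuals(values, center):
    return sum(abs(x - center) for x in values)


values = [1.0, 2.0, 3.0, 4.0, 20.0]  # hypothetical data with one outlier
candidates = [i / 10 for i in range(0, 251)]  # grid from 0.0 to 25.0 in steps of 0.1

# Candidate centre that minimises each measure of variability.
best_for_squares = min(candidates, key=lambda c: sum_squared_residuals(values, c))
best_for_absolute = min(candidates, key=lambda c: sum_absolute_residuals(values, c))

print("minimises squared residuals: ", best_for_squares, "mean =", statistics.fmean(values))
print("minimises absolute residuals:", best_for_absolute, "median =", statistics.median(values))
```

On this data the squared-residual criterion lands on the mean and the absolute-residual criterion lands on the median, which is the pairing described above. A grid search is only a demonstration, not a proof.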

Interesting side point: R. A. Fisher and A. Eddington had a similar discussion early in the 20th century. The "dispute" centered on this: IF you assume data is normally distributed, what is the best way to estimate the population standard deviation?

Fisher argued that an appropriate multiple of

$$\sqrt{ \frac 1 n \sum (x-\bar x)^2 }$$

was the answer, while Eddington backed a multiple of

$$\frac 1 n \sum |x - \bar x|$$

was better. It has since been shown that in this limited case (strict assumption of normality) Fisher was correct (his estimate has certain optimum properties as long as normality is assumed).
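A small simulation can illustrate, though not prove, the Fisher side of this dispute. The sketch below draws repeated normal samples with a fixed seed and compares how much each statistic varies relative to its own average (the coefficient of variation), which sidesteps the question of what the "appropriate multiple" is. The sample size, repetition count and seed are arbitrary assumptions.

```python
import math
import random
import statistics


def rms_estimate(sample):
    """Fisher's statistic: root mean square deviation from the sample mean."""
    mean = statistics.fmean(sample)
    return math.sqrt(sum((x - mean) ** 2 for x in sample) / len(sample))


def mean_abs_estimate(sample):
    """Eddington's statistic: mean absolute deviation from the sample mean."""
    mean = statistics.fmean(sample)
    return sum(abs(x - mean) for x in sample) / len(sample)


def coefficient_of_variation(estimates):
    """Spread of an estimator relative to its average, so scale factors cancel."""
    return statistics.stdev(estimates) / statistics.fmean(estimates)


rng = random.Random(12345)   # fixed seed so the run is repeatable
n, repetitions = 20, 5000    # arbitrary choices for this illustration

rms_estimates, abs_estimates = [], []
for _ in range(repetitions):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    rms_estimates.append(rms_estimate(sample))
    abs_estimates.append(mean_abs_estimate(sample))

cv_rms = coefficient_of_variation(rms_estimates)
cv_abs = coefficient_of_variation(abs_estimates)
print("relative spread, root mean square:", cv_rms)
print("relative spread, mean absolute:   ", cv_abs)
```

With normal data the root-mean-square statistic should show the smaller relative spread, which is the sense in which it is the more efficient of the two. If the samples were drawn from a heavier-tailed distribution instead, the comparison could come out differently.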

Is there a link to somewhere which shows Fisher is correct?

Bit off topic but Fisher was interested in Eugenics.

Also he did not believe smoking caused lung cancer, perhaps he got his analysis of the statistics wrong ;)

http://en.wikipedia.org/wiki/Ronald_Fisher

Fisher was opposed to the conclusions of Richard Doll and A.B. Hill that smoking caused lung cancer. He compared the correlations in their papers to a correlation between the import of apples and the rise of divorce in order to show that correlation does not imply causation.

I have to say that is pretty poor form for someone who is supposed to be an expert statistician.
Perhaps it was because he was using the root mean square method.

"He was legendary in being able to produce mathematical results without setting down the intermediate steps."

Well that does not surprise me!

Homework Helper
The reason Fisher was correct is this: for the problem stated, his estimate - that is, the one he backed - has the characteristic of being Uniformly Minimum Variance Unbiased, or UMVU, for the standard deviation.

"Fisher was opposed to the conclusions of Richard Doll and A.B. Hill that smoking caused lung cancer. He compared the correlations in their papers to a correlation between the import of apples and the rise of divorce in order to show that correlation does not imply causation.
I have to say that is pretty poor form for someone who is supposed to be an expert statistician.
Perhaps it was because he was using the root mean square method. "

Remember that it wasn't until much later that the link between smoking and cancer was generally accepted. Fisher was not alone in this - and nobody has claimed he was omniscient.

Throughout this period Eddington lectured on relativity, and was particularly well known for his ability to explain the concepts in lay terms as well as scientific. He collected many of these into the Mathematical Theory of Relativity in 1923, which Albert Einstein suggested was "the finest presentation of the subject in any language." He was an early advocate of Einstein's General Relativity, and an interesting anecdote well illustrates his humor and personal intellectual investment: Ludwig Silberstein, a physicist who thought of himself as an expert on relativity, approached Eddington at the Royal Society's (6 November) 1919 meeting where he had defended Einstein's Relativity with his Brazil-Principe Solar Eclipse calculations with some degree of scepticism and ruefully charged Arthur as one who claimed to be one of three men who actually understood the theory (Silberstein, of course, was including himself and Einstein as the other two). When Eddington refrained from replying, he insisted Arthur not be "so shy", whereupon Eddington replied, "Oh, no! I was wondering who the third one might be!"

Well I am unfamiliar with the term UMVU so I can't comment on that now.

Perhaps the reason the link was not accepted was the work of people such as Fisher, who incidentally was employed by the tobacco firms as a consultant, so he had a significant conflict of interest. That could perhaps be used as an excuse for his failure to see the correlation; the alternative, perhaps, is being seen as a poor statistician!
He also, I think, would be seen as racist these days.

Phizo is right to continue with this question - so far nobody has yet meaningfully explained here, why the standard deviation is somehow "best" or "most natural" as an approach to a measure of spread for data. Having useful mathematical properties, or neat interpretations in some other context, is not unique to the standard deviation - thus "best" or "natural" or "the appropriate choice", is not inherent in such observations. A mathematical expression having certain optimal properties, is not necessarily an explanation either, in the absence of any clear proof of uniqueness.

Also, so far nobody has mentioned the very important data-aspect of the degrees of freedom and associated denominator of (n-1) in the formulas for the sample-level variance and standard deviation -- as opposed to the denominator of (n) in the corresponding population-level formulas.

Phizo, you are asking a very good question here, and it is good of you to persist! The fact is that, in practice, standard deviation is not essentially or necessarily "best" or most natural as a measure of spread. If statistics as an applied science were to be reborn anew tomorrow - alongside all of our current widely and readily available computing technology - it is certainly possible that the standard deviation formula as we see & use it today, would not be the most popular or default choice for a measure of spread.

Homework Helper
Phizo is right to continue with this question - so far nobody has yet meaningfully explained here, why the standard deviation is somehow "best" or "most natural" as an approach to a measure of spread for data. Having useful mathematical properties, or neat interpretations in some other context, is not unique to the standard deviation - thus "best" or "natural" or "the appropriate choice", is not inherent in such observations. A mathematical expression having certain optimal properties, is not necessarily an explanation either, in the absence of any clear proof of uniqueness.
In this context "best" has a certain statistical meaning. IF you assume your data comes from the normal distribution, then the best estimates of mean, variance, and standard deviation are the ones being discussed. If you make different assumptions, you get different answers.

Also, so far nobody has mentioned the very important data-aspect of the degrees of freedom and associated denominator of (n-1) in the formulas for the sample-level variance and standard deviation -- as opposed to the denominator of (n) in the corresponding population-level formulas.
The denominator in the sample variance is chosen to be $n-1$ in order to make the statistic unbiased - so that its expectation equals the population variance.
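The unbiasedness claim can be checked exactly on a tiny example. The sketch below assumes a hypothetical population (the six faces of a fair die) and enumerates every equally likely sample of size 2 drawn with replacement, using exact fractions so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product


def variance(sample, denominator):
    """Sum of squared deviations from the sample's own mean, over a chosen denominator."""
    mean = Fraction(sum(sample), len(sample))
    return sum((x - mean) ** 2 for x in sample) / denominator


population = [1, 2, 3, 4, 5, 6]  # hypothetical population: faces of a fair die
n = 2                            # sample size

population_variance = variance(population, len(population))

# Every sample of size n drawn with replacement is equally likely.
samples = list(product(population, repeat=n))
mean_of_n_minus_1 = sum(variance(s, n - 1) for s in samples) / len(samples)
mean_of_n = sum(variance(s, n) for s in samples) / len(samples)

print("population variance:          ", population_variance)
print("average with n - 1 denominator:", mean_of_n_minus_1)
print("average with n denominator:    ", mean_of_n)
```

Averaged over all possible samples, the n - 1 version reproduces the population variance, while the n version comes out too small by a factor of (n - 1)/n.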
Phizo, you are asking a very good question here, and it is good of you to persist! The fact is that, in practice, standard deviation is not essentially or necessarily "best" or most natural as a measure of spread. If statistics as an applied science were to be reborn anew tomorrow - alongside all of our current widely and readily available computing technology - it is certainly possible that the standard deviation formula as we see & use it today, would not be the most popular or default choice for a measure of spread.
Possibly - there are many other methods for measuring variability now. However,
a) It would still be the case that the same quantities would be found as "most natural" to use when people assume normality
b) It would probably be the (unfortunate) case that the normal distribution would rise to prominence as the most used (and so, mis-used) distributional assumption
c) It would be the case that non-parametric, and robust, measures, would be adopted more readily than they have been (even though their use is becoming more common) as a consequence of the widely available computing power

Well, as I said initially, I just do not really see where it comes from, and most of the answers I get seem to be based on some other dubious and unexplained concept.

I mean, measuring a spread is a somewhat vague concept anyway; it seems to be a process of measuring the unmeasurable. For example, I think they use it in opinion polls, and they are pretty much pot luck in that you hope to pick a representative sample.

CRGreathouse
Homework Helper
That also seems a bit of a circular argument.
You need to explain why this is so first I think.
I need to explain why the mean and standard deviation define a normal distribution?!?

Homework Helper
Mark44 is correct - I will be a little less diplomatic: try actually studying and learning about the material BEFORE you write all of it off. It will take some work (unlike referring to the world's largest repository of unreliable material, Wikipedia)

I need to explain why the mean and standard deviation define a normal distribution?!?
Yes please. You seem to indicate it is a simple answer, so why not just explain it?

It's not the mathematics I don't understand but the language used to hide the mathematics.

Integral
Staff Emeritus
Gold Member
If you cannot understand the math, and are not willing to do the work necessary to understand it, then there is really no point in continuing this discussion. Seems to me that you are way eager to argue and reluctant to put any effort into learning.