I don't understand the standard deviation.

AI Thread Summary
Standard deviation (SD) is a key statistical measure that quantifies the dispersion of data points around the mean, providing insight into data variability. It is derived as the square root of the variance, which is calculated from the squared deviations of each data point from the mean. Squaring is essential because the raw deviations always sum to zero; squaring prevents that cancellation and yields a meaningful measure of spread. Taking the square root then makes the SD interpretable in the same units as the original data, enhancing its practical application. Understanding SD is crucial for analyzing distributions, particularly in fields like finance and science, where data variability is significant.
phizo
OK I was going to post this in homework, but it's not homework (left school a long time ago ;))

Anyway when I was at school I did not do stats I did mechanics instead (you had a choice).
However all the people doing stats were always on about the SD, which was a bit annoying
cos I didn't know what it was.
Anyway I asked one guy who did stats what it was, and he did not seem to give a very convincing/good answer; I think he said it was so many percent or something like that.

OK so I looked on wiki and it gives a definition

SD = the square root of its variance

and:-

variance of a random variable or distribution is the expectation, or mean, of the squared deviation of that variable from its expected value or mean.


OK...so I can understand how to calculate it however... I don't understand *why* you would want to calculate that?

Why not take the cube root of the variance, or 1/variance or log(variance) or sin(variance)
or indeed any other function you can think of?

I just do not see what is significant about the square root of the variance!

Hope this is not a 'stupid question' :)
 
phizo said:
OK I was going to post this in homework, but it's not homework (left school a long time ago ;))

Anyway when I was at school I did not do stats I did mechanics instead (you had a choice).
However all the people doing stats were always on about the SD, which was a bit annoying
cos I didn't know what it was.
Anyway I asked one guy who did stats what it was, and he did not seem to give a very convincing/good answer; I think he said it was so many percent or something like that.

OK so I looked on wiki and it gives a definition

SD = the square root of its variance

and:-

variance of a random variable or distribution is the expectation, or mean, of the squared deviation of that variable from its expected value or mean.


OK...so I can understand how to calculate it however... I don't understand *why* you would want to calculate that?

Why not take the cube root of the variance, or 1/variance or log(variance) or sin(variance)
or indeed any other function you can think of?

I just do not see what is significant about the square root of the variance!

Hope this is not a 'stupid question' :)

There are two measures that are typically used to describe some distribution - the mean, and the standard deviation. The mean (aka "average") is a measure of central tendency. In other words, it's a measure of where the middle of the distribution is.

The other commonly used measure is the standard deviation, which is a measure of dispersion. In other words, how spread out from the middle the data is.

For example, if the average income of a group of people happened to be $50,000, all you can say is that about half the people in the group have an income higher than $50,000, and about half have lower incomes. You have no information at all about the highest or lowest wages in the group.

However, if you know the standard deviation, this fact gives you an idea of how spread out the wages are in this group. A small standard deviation implies that the wages are clustered pretty closely to the mean; a large s.d. implies that they are more spread out.

The variance is closely related to the standard deviation, and is essentially the mean of the squared distances from the mean value. (This is somewhat oversimplified.) Since this variance comes from the squares of numbers, the most reasonable thing would be to define another statistic (the standard deviation) that is the square root of the variance.
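
To make that concrete, here's a quick sketch in Python (the wage figures are made up purely for illustration):

Code:
# Mean, variance, and standard deviation of a made-up wage sample.
# (Population form; the n - 1 "sample" version comes up later in the thread.)
wages = [42000, 48000, 50000, 55000, 61000]

n = len(wages)
mean = sum(wages) / n                                # 51200.0

# Variance: average of the squared distances from the mean.
variance = sum((w - mean) ** 2 for w in wages) / n   # 41360000.0

# Standard deviation: square root of the variance, which brings
# the answer back to the original units (dollars).
sd = variance ** 0.5                                 # ~6431.2

print(mean, variance, sd)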
 
It's actually pretty simple: the square root of variance gives you the original units back. If your data is in meters, the variance is in square meters and the standard deviation is in meters. It's a unit that can be interpreted in real-world terms fairly readily.
 
Would the 'not squared' method (absolute deviation, it seems to be called) not give a similar indication of spread?

The squaring method gives more weight to some values than others?
 
CRGreathouse said:
It's actually pretty simple: the square root of variance gives you the original units back. If your data is in meters, the variance is in square meters and the standard deviation is in meters. It's a unit that can be interpreted in real-world terms fairly readily.

Well my problem is I don't understand the need to square in the first place!
Sure taking the root would be sensible after squaring, but my point is, if you don't square the values in the first place, then you don't need to take the square root, and you have not given more weight to some values than others.
 
phizo said:
Well my problem is I don't understand the need to square in the first place!
Sure taking the root would be sensible after squaring, but my point is, if you don't square the values in the first place, then you don't need to take the square root, and you have not given more weight to some values than others.

Ah. It happens that there are a number of useful properties of the variance that make it the natural choice. The sum of independent normal distributions has the property that the mean is the sum of the means and the variance is the sum of the variances, for example.
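
A quick simulation of that additivity (a sketch; the particular parameters are arbitrary):

Code:
import random

# X ~ N(10, sd=3) and Y ~ N(20, sd=4), independent.
# Means add: 10 + 20 = 30.  Variances add: 9 + 16 = 25.
N = 100_000
sums = [random.gauss(10, 3) + random.gauss(20, 4) for _ in range(N)]

mean = sum(sums) / N
variance = sum((s - mean) ** 2 for s in sums) / N

print(round(mean, 2))      # close to 30
print(round(variance, 2))  # close to 25, i.e. sd close to 5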
 
Well, you are saying you can add these values together, which may well be useful.
I don't think that tells me anything more, because if the things you add together have little meaning, then being able to add them together does not really increase their usefulness IMO.
I think you have to establish the usefulness of the individual sums/elements in the first place.

For example, cubes, or some other power, could also be used to give some indication of spread. I just do not see a reason to use anything other than the first power of the variables.

For example, whenever I have wanted to deal with a set of data I have never felt any compulsion to use anything but the first power of the data, i.e. the data itself or the deviation from the mean (actually I have never gone even that far!).

I did once plot the log of some data, because the values went 'off the screen', so why not take the log of the values? And then the 'anti-log'(?) of the sums?

It just seems to me there are a lot of arbitrary data functions you could use and I see nothing particularly compelling about the square/root.
 
The sum of the deviations from the mean is not a useful measure, since it always comes out to 0. The sum of the absolute values of the deviations from the mean is a better choice, but to quote from one text,
"the absolute value is somewhat troublesome mathematically: it does not have a simple arithmetic formula, nor is it a differentiable function. Squaring the deviations proves to be a much more attractive solution."
- An Introduction to Mathematical Statistics and Its Applications, 2nd Ed. Larsen and Marx.

The main reason that data is plotted on log scales is the orders of magnitude difference between the smallest and largest data values. Another reason is that if there is a power function relationship between two variables (i.e., y = x^n), the graph of log y against log x will appear as a straight line.
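
A minimal sketch of that power-law point:

Code:
import math

# For y = x^3, log y = 3 * log x, so a log-log plot is a straight
# line of slope 3.
for x in [1, 2, 4, 8, 16]:
    print(math.log(x), math.log(x ** 3))  # second value is always 3 times the first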
 
phizo said:
Well my problem is I don't understand the need to square root in the first place!
Sure taking the root would be sensible after squaring, but my point is, if you don't square the values in the first place, then you don't need to take the square root, also you have not given more weight to some values than others.

The reason that we square it might best be demonstrated by the following data points:
1, 5, 9, 13, 17
The mean is obviously 9.
If our method for calculating variance/deviation were defined as simply summing the differences...
(1-9) + (5-9) + (9-9) + (13-9) + (17-9)
IS ZERO! So that's out.

Why not take the absolute value? Ugly to differentiate, for one.

Squaring and later "square-rooting" is almost like the absolute value, but ends up being easier to work with.
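
The same comparison in Python, for anyone who wants to check the arithmetic (a quick sketch):

Code:
data = [1, 5, 9, 13, 17]
mean = sum(data) / len(data)   # 9.0

# 1. Plain sum of deviations: always zero, hence useless.
print(sum(x - mean for x in data))                               # 0.0

# 2. Mean absolute deviation: a workable measure of spread.
print(sum(abs(x - mean) for x in data) / len(data))              # 4.8

# 3. Standard deviation: square, average, then square-root.
print((sum((x - mean) ** 2 for x in data) / len(data)) ** 0.5)   # ~5.657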
 
  • #10
The Chaz said:
The reason that we square it might best be demonstrated by the following data points:
1, 5, 9, 13, 17
The mean is obviously 9.
If our method for calculating variance/deviation were defined as simply summing the differences...
(1-9) + (5-9) + (9-9) + (13-9) + (17-9)
IS ZERO! So that's out.

Why not take the absolute value? Ugly to differentiate, for one.

Squaring and later "square-rooting" is almost like the absolute value, but ends up being easier to work with.

Yes I would have said take the absolute values, I see no reason not to.
I don't see your point about differentiating, I'm afraid. A sine wave goes negative; no problems differentiating it, really.
 
  • #11
phizo said:
Well, you are saying you can add these values together, which may well be useful.
I don't think that tells me anything more, because if the things you add together have little meaning, then being able to add them together does not really increase their usefulness IMO.

The mean and standard deviation/variance are jointly sufficient to define a normal distribution!
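
For reference, the normal density is pinned down entirely by the mean \mu and the standard deviation \sigma:

f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}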
 
  • #12
Mark44 said:
The sum of the deviations from the mean is not a useful measure, since it always comes out to 0. The sum of the absolute values of the deviations from the mean is a better choice, but to quote from one text,
"the absolute value is somewhat troublesome mathematically: it does not have a simple arithmetic formula, nor is it a differentiable function. Squaring the deviations proves to be a much more attractive solution."
- An Introduction to Mathematical Statistics and Its Applications, 2nd Ed. Larsen and Marx.

The main reason that data is plotted on log scales is the orders of magnitude difference between the smallest and largest data values. Another reason is that if there is a power function relationship between two variables (i.e., y = x^n), the graph of log y against log x will appear as a straight line.


I would give you the same answer as to Chaz.


Also:-

"the absolute value is somewhat troublesome mathematically: it does not have a simple arithmetic formula, nor is it a differentiable function. Squaring the deviations proves to be a much more attractive solution."


I would love to see the arithmetic formula for "attractive".
Also, as I said, I don't really see the problem with differentiation either.
And if I were to get picky in a similar manner, I would say the square root has two values, a positive and a negative one; surely one solution is more attractive than two?

Furthermore I could add, as I am not too happy with the standard deviation in the first place, it is rather unlikely I would be any happier differentiating it!
I would have derived an even unhappier answer!
 
  • #13
CRGreathouse said:
The mean and standard deviation/variance are jointly sufficient to define a normal distribution!

Really?
Well I will have to get back to you on that one, that's a whole new can of worms for me!
(You will probably wish you had not bothered!).
 
  • #14
phizo said:
Yes I would have said take the absolute values, I see no reason not to.
I don't see your point about differentiating, I'm afraid. A sine wave goes negative; no problems differentiating it, really.
It's not a matter of going negative. The sine function is continuous, as is the absolute value function. However, the sine function is differentiable at every point, but the absolute value is not differentiable at a number right in the middle of its domain. This is one of the reasons that the variance is more "attractive" as a statistic, according to the reference I cited. They didn't need to give an arithmetic formula to justify their preference; they gave reasons.
 
  • #15
CRGreathouse said:
The mean and standard deviation/variance are jointly sufficient to define a normal distribution!
phizo said:
Really?
Yes, really. If you want to argue your point and be taken seriously, you're going to first have to get up to speed on some basic statistics. In your original post you said that you didn't get the standard deviation. Three regulars on this forum (CRGreathouse, The Chaz, and myself) have attempted to enlighten you as to what the standard deviation is and what it is used for.

As already stated numerous times, the sum of the absolute values of the differences between the data values and the means also works as a measure of dispersion, but for mathematical reasons already given, this particular measure is not the preferred one.

Now, have we answered your question?
phizo said:
Well I will have to get back to you on that one, that's a whole new can of worms for me!
(You will probably wish you had not bothered!).
 
  • #16
Phizo,
You seem to have missed a significant fact. The definitions for mean and SD that have been discussed in this thread are the key parameters of a normal distribution: http://en.wikipedia.org/wiki/Normal_distribution . Please read the wiki article.

So your real question is not, why take the square root, but, why are we so fixated on the normal distribution. We had no choice in the matter; we did not choose the normal distribution, nature did. This is the distribution which describes any measurement with random errors. Have you ever seen the demonstration where balls are dropped from a single point onto a uniformly spaced grid of nails? The pile they make approximates the normal distribution.

Now the fact that this distribution occurs naturally combined with the fact that there is a very simple mathematical description of it makes it irresistible.
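
That nail-board (Galton board) demonstration is easy to simulate; here's a rough sketch in Python (bin counts will vary from run to run):

Code:
import random

# Each ball bounces left (-1) or right (+1) at each of 10 rows of
# nails; its final bin is the sum of those bounces.
bins = {}
for _ in range(10_000):
    pos = sum(random.choice((-1, 1)) for _ in range(10))
    bins[pos] = bins.get(pos, 0) + 1

# Crude text histogram: the pile is bell-shaped (binomial, which
# the normal distribution approximates).
for pos in sorted(bins):
    print(f"{pos:+3d} {'#' * (bins[pos] // 50)}")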
 
Last edited by a moderator:
  • #17
Phizo's question is perfectly reasonable - using the absolute differences makes sense, it just proves a little more awkward to use this approach in practice (too awkward it appears, prior to the development of our modern calculating conveniences).

And while the mean and S.D. are sufficient for defining a normal distribution, that does not mean they are necessary. Are there not other equally valid approaches that could also be used to define a normal distribution?

Noting that variances add when the associated random variables are independent is not necessarily a valid justification either. For instance, the standard deviations add when the variables are perfectly positively correlated. Further, establishing perfect independence of variables is not generally possible in practice - it is usually just a "hail mary" assumption in a mathematical model, to ignore possible lurking variables.

Also, the distribution of errors in nature "would" follow a perfect normal distribution, if everything was "truly" random and behaving perfectly according to long-run patterns. These mathematical idealizations are useful in theory - but it is only a model (nature is far richer than any mathematical model).

And by the way: The mean does not tell us that "about half" of the data lie to one side of it, etc - you are thinking of the median. [And when the median is the measure of center being used, then the interquartile range is clearly a more appropriate measure of spread than the standard deviation.]

In other words: Don't be discouraged, Phizo, in your inquiry. The S.D. is not the "most natural" measure of spread that could ever have been developed in the history of statistics - it is just a useful convention (one of many that could have been developed).
 
  • #18
G-U-E-S-T said:
And by the way: The mean does not tell us that "about half" of the data lie to one side of it, etc - you are thinking of the median.
Note that I qualified what I said, using "about." I know the difference between the mean and median, and also the mode, another measure of central tendency. I was just trying to distinguish between two types of statistical measures.
 
  • #19
Mark44 said:
...I know the difference between the mean and median, and also the mode, another measure of central tendency...
I was getting nervous for a minute there! :-p
 
  • #20
Integral said:
Phizo,
You seem to have missed a significant fact. The definitions for mean and SD that have been discussed in this thread are the key parameters of a normal distribution: http://en.wikipedia.org/wiki/Normal_distribution . Please read the wiki article.

So your real question is not, why take the square root, but, why are we so fixated on the normal distribution. We had no choice in the matter; we did not choose the normal distribution, nature did. This is the distribution which describes any measurement with random errors. Have you ever seen the demonstration where balls are dropped from a single point onto a uniformly spaced grid of nails? The pile they make approximates the normal distribution.

Now the fact that this distribution occurs naturally combined with the fact that there is a very simple mathematical description of it makes it irresistible.

I would add that before the days of high-speed computers, empirical distributions were often impossible to describe. However, it was known that a sampling distribution of averages is close to normal if the number of observations in each sample is large enough.

The process of averaging made empirical distributions tractable and reduced the problem of describing the distribution to estimating only 2 parameters, the mean and the variance.

Nowadays, computers can describe distributions easily and have allowed more accurate descriptions of empirical distributions. For instance, careful studies indicate that the distribution of securities returns is not normal but has "fat tails" i.e. too much probability of outlier returns. These studies require huge amounts of data.
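
Coming back to the point about averages: it is easy to check numerically (a sketch, starting from a decidedly non-normal, flat distribution):

Code:
import random

# Averages of 30 uniform draws pile up in a bell shape around 0.5:
# the central limit theorem at work.
averages = [sum(random.random() for _ in range(30)) / 30
            for _ in range(10_000)]

m = sum(averages) / len(averages)
sd = (sum((a - m) ** 2 for a in averages) / len(averages)) ** 0.5

print(round(m, 3))   # close to 0.5
print(round(sd, 3))  # close to sqrt((1/12)/30) ~ 0.053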
 
Last edited by a moderator:
  • #21
I was always curious about this. Even my statistics-major ex-girlfriend couldn't give me a good explanation.

Finally, someone on Wikipedia created this page:

http://en.wikipedia.org/wiki/Standard_deviation#Geometric_interpretation

So if you have two data points, (m1, m2), their standard deviation is the minimal distance between the POINT (m1, m2) and the line {(t, t) | t in R}.

The geometrical explanation is very intuitive with data sets of one, two, or three data points, and to me, I can see why it generalizes to n. For continuous distributions, you are simply replacing a summation by integration, as is common when going from discrete *anything* to continuous.
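
A two-point check of that picture (a sketch; note it is the n-1 "sample" form of the SD that matches the distance exactly):

Code:
import math

m1, m2 = 3.0, 11.0
mean = (m1 + m2) / 2
n = 2

# Sample standard deviation (n - 1 = 1 in the denominator).
sd = math.sqrt(((m1 - mean) ** 2 + (m2 - mean) ** 2) / (n - 1))

# Distance from the point (m1, m2) to the line {(t, t)}; the
# nearest point on that line is (mean, mean).
dist = math.hypot(m1 - mean, m2 - mean)

print(sd, dist)  # both equal |m1 - m2| / sqrt(2) ~ 5.657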
 
  • #22
Mark44 said:
It's not a matter of going negative. The sine function is continuous, as is the absolute value function. However, the sine function is differentiable at every point, but the absolute value is not differentiable at a number right in the middle of its domain. This is one of the reasons that the variance is more "attractive" as a statistic, according to the reference I cited. They didn't need to give an arithmetic formula to justify their preference; they gave reasons.

Sorry, that makes little sense to me.
The initial data will not be 'differentiable' either as far as I can see, as it is usually just a number of points, i.e. not continuous, so I don't think your argument is valid.
It seems a very flawed argument to me.
 
  • #23
CRGreathouse said:
The mean and standard deviation/variance are jointly sufficient to define a normal distribution!


That also seems a bit of a circular argument.
You need to explain why this is so first, I think.
 
  • #24
Integral said:
Phizo,
You seem to have missed a significant fact. The definitions for mean and SD that have been discussed in this thread are the key parameters of a normal distribution: http://en.wikipedia.org/wiki/Normal_distribution . Please read the wiki article.

So your real question is not, why take the square root, but, why are we so fixated on the normal distribution. We had no choice in the matter; we did not choose the normal distribution, nature did. This is the distribution which describes any measurement with random errors. Have you ever seen the demonstration where balls are dropped from a single point onto a uniformly spaced grid of nails? The pile they make approximates the normal distribution.

Now the fact that this distribution occurs naturally combined with the fact that there is a very simple mathematical description of it makes it irresistible.


I have had a look at that article and it is pretty 'dense' and drags me all over the place via links, so it's hard work.

Anyway it's going a bit too much into the maths of it; it does not seem to get at the root of the problem.
I had hoped there would be a simple explanation. It is not looking like I will be getting an answer I am happy with here; I will probably have to work out an answer myself before I will be happy (as is sometimes the case).
As I am being pointed to wiki pages, it does not seem like anyone will be able to post an answer here I am happy with.
 
Last edited by a moderator:
  • #25
G-U-E-S-T said:
Phizo's question is perfectly reasonable - using the absolute differences makes sense, it just proves a little more awkward to use this approach in practice (too awkward it appears, prior to the development of our modern calculating conveniences).

And while the mean and S.D. are sufficient for defining a normal distribution, that does not mean they are necessary. Are there not other equally valid approaches that could also be used to define a normal distribution?

Noting that variances add when the associated random variables are independent is not necessarily a valid justification either. For instance, the standard deviations add when the variables are perfectly positively correlated. Further, establishing perfect independence of variables is not generally possible in practice - it is usually just a "hail mary" assumption in a mathematical model, to ignore possible lurking variables.

Also, the distribution of errors in nature "would" follow a perfect normal distribution, if everything was "truly" random and behaving perfectly according to long-run patterns. These mathematical idealizations are useful in theory - but it is only a model (nature is far richer than any mathematical model).

And by the way: The mean does not tell us that "about half" of the data lie to one side of it, etc - you are thinking of the median. [And when the median is the measure of center being used, then the interquartile range is clearly a more appropriate measure of spread than the standard deviation.]

In other words: Don't be discouraged, Phizo, in your inquiry. The S.D. is not the "most natural" measure of spread that could ever have been developed in the history of statistics - it is just a useful convention (one of many that could have been developed).


Thanks for that, that was helpful, I won't be discouraged but I fear I might be 'stopped' lol.
 
  • #26
Mark44 said:
Note that I qualified what I said, using "about." I know the difference between the mean and median, and also the mode, another measure of central tendency. I was just trying to distinguish between two types of statistical measures.

It's unfortunate that they all begin with the same letter; it is easy to get the names confused. Why not use 'average' for mean? The terms are not helpful if you are not familiar with them.
 
  • #27
phizo said:
I have had a look at that article and it is pretty 'dense' and drags me all over the place via links, so it's hard work.

Anyway it's going a bit too much into the maths of it; it does not seem to get at the root of the problem.
I had hoped there would be a simple explanation. It is not looking like I will be getting an answer I am happy with here; I will probably have to work out an answer myself before I will be happy (as is sometimes the case).
As I am being pointed to wiki pages, it does not seem like anyone will be able to post an answer here I am happy with.
If I'm understanding your initial question, you want to know why standard deviation is the "right" statistic. This is of course a very subjective question, and of course it's not defined mathematically. Some have pointed out various nice mathematical properties it enjoys, but if that's not part of your criteria for being "right," then I'm afraid we have no chance of convincing you :)
 
  • #28
Tac-Tics said:
I was always curious about this. Even my statistics-major ex-girlfriend couldn't give me a good explanation.

Finally, someone on Wikipedia created this page:

http://en.wikipedia.org/wiki/Standard_deviation#Geometric_interpretation

So if you have two data points, (m1, m2), their standard deviation is the minimal distance between the POINT (m1, m2) and the line {(t, t) | t in R}.

The geometrical explanation is very intuitive with data sets of one, two, or three data points, and to me, I can see why it generalizes to n. For continuous distributions, you are simply replacing a summation by integration, as is common when going from discrete *anything* to continuous.


It's a pity there is no diagram with that, it would be much easier to follow.

To gain some geometric insights, we will start with a population of three values, x1, x2, x3. This defines a point P = (x1, x2, x3) in R3. Consider the line L = {(r, r, r) : r in R}. This is the "main diagonal" going through the origin. If our three given values were all equal, then the standard deviation would be zero and P would lie on L. So it is not unreasonable to assume that the standard deviation is related to the distance of P to L. And that is indeed the case. To move orthogonally from L to the point P, one begins at the point M = (x̄, x̄, x̄), whose coordinates are the mean of the values we started out with. A little algebra shows that the distance between P and M (which is the same as the orthogonal distance between P and the line L) is equal to the standard deviation of the vector x1, x2, x3, multiplied by the square root of the number of dimensions of the vector.


1. How can 3 different values define a point?
2. Does R3 mean 3D, i.e. three dimensions?
3. I think I would need a diagram to understand it properly.
 
  • #29
zhentil said:
If I'm understanding your initial question, you want to know why standard deviation is the "right" statistic. This is of course a very subjective question, and of course it's not defined mathematically. Some have pointed out various nice mathematical properties it enjoys, but if that's not part of your criteria for being "right," then I'm afraid we have no chance of convincing you :)

Seems like that may well be the case.
It has always sounded to me like something plucked out of thin air, or thereabouts.
Perhaps a bit like trying to work out the standard size of a tin of beans.

Anyhow if I had never heard of it I doubt I would have dreamt it up myself, perhaps because I have never stumbled across any need to.
 
  • #30
The standard deviation is the expected distance to the mean. It is a natural measure of the average error in estimating the mean from a sample.

From this point of view it has universal meaning as a statistic and is not tied only to the Gaussian distribution.

The conceptual difference between variance and standard deviation is the same as the difference between the concepts of distance and squared distance.

The idea of distance, to me, is more natural. Further, it has the same units as the underlying distribution's points and so can be compared to them.
 
  • #31
phizo said:
It's a pity there is no diagram with that, it would be much easier to follow.

To gain some geometric insights, we will start with a population of three values, x1, x2, x3. This defines a point P = (x1, x2, x3) in R3. Consider the line L = {(r, r, r) : r in R}. This is the "main diagonal" going through the origin. If our three given values were all equal, then the standard deviation would be zero and P would lie on L. So it is not unreasonable to assume that the standard deviation is related to the distance of P to L. And that is indeed the case. To move orthogonally from L to the point P, one begins at the point M = (x̄, x̄, x̄), whose coordinates are the mean of the values we started out with. A little algebra shows that the distance between P and M (which is the same as the orthogonal distance between P and the line L) is equal to the standard deviation of the vector x1, x2, x3, multiplied by the square root of the number of dimensions of the vector.


1. How can 3 different values define a point?
2. Does R3 mean 3D, i.e. three dimensions?
3. I think I would need a diagram to understand it properly.
Three values define a point in three-dimensional space, which is often called R3.
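
In lieu of the missing diagram, a quick numeric check (a sketch, using the population form of the SD):

Code:
import math

x = (2.0, 7.0, 9.0)     # three values = one point P in R3
n = len(x)
mean = sum(x) / n       # 6.0; M = (6, 6, 6) lies on the diagonal L

# Orthogonal distance from P to the line L (i.e. from P to M).
dist = math.sqrt(sum((xi - mean) ** 2 for xi in x))

# Population standard deviation of the three values.
sd = math.sqrt(sum((xi - mean) ** 2 for xi in x) / n)

print(dist, sd * math.sqrt(n))  # equal: distance = SD * sqrt(n)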
 
  • #32
Mark44 said:
Three values define a point in three-dimensional space, which is often called R3.

Most of the graphs I see are in 2D. (All of them, in fact.)
 
  • #33
phizo said:
Seems like that may well be the case.
It has always sounded to me like something plucked out of thin air, or thereabouts.
We have been laboring away, trying to explain to you that it was not plucked out of thin air. Unfortunately, your response to most of the explanations seems to be that they involve mathematics that you don't understand, or that an article is too dense with links to too many other sites, or that a sentence that seems crystal clear to me is "circular reasoning."

phizo said:
Perhaps a bit like trying to work out the standard size of a tin of beans.

Anyhow if I had never heard of it I doubt I would have dreamt it up myself, perhaps because I have never stumbled across any need to.
Just because you have never seen the need to work with variance or standard deviation doesn't mean that these statistics are unneeded. To use your example of a can of beans, manufacturers and food processors are very interested in making sure that the variability of what goes in a can or package is tightly controlled. If they put more beans in the can than the advertised weight on the can, they are losing money. If they put too few beans in the can, they can be liable to lawsuits for failing to deliver the advertised amount. You can bet that they are keeping track of the standard deviation here.
 
  • #34
lavinia said:
The standard deviation is the expected distance to the mean.
Is that right? I thought the expected distance to the mean would be

\frac{1}{n}\sum_{i=1}^n |x_i - \overline{x}|, \qquad \text{with mean } \overline{x} = \frac{1}{n}\sum_{i=1}^n x_i

(for a discrete set of values x_i), which isn't in general the same as the root mean square value of the difference from the mean,

\sqrt{\frac{1}{n}\sum_{i=1}^n (x_i - \overline{x})^2},

which I thought was the definition of the standard deviation of the set.
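
On a concrete data set the two do differ, and the comparison shows why: the squared version weights the big deviation more heavily (a quick sketch with made-up numbers):

Code:
data = [2, 3, 5, 8, 20]     # one outlier on purpose
n = len(data)
mean = sum(data) / n        # 7.6

mad = sum(abs(x - mean) for x in data) / n              # 5.12
rms = (sum((x - mean) ** 2 for x in data) / n) ** 0.5   # ~6.53

print(mad, rms)  # the mean absolute deviation is not the SD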
 
  • #35
lavinia said:
The standard deviation is the expected distance to the mean. It is a natural measure of the average error in estimating the mean from a sample.

From this point of view it has universal meaning as a statistic and is not tied only to the Gaussian distribution.

The conceptual difference between variance and standard deviation is the same as the difference between the concepts of distance and squared distance.

The idea of distance, to me, is more natural. Further, it has the same units as the underlying distribution's points and so can be compared to them.

Isn't the expected distance to the mean the average distance to the mean?
The average distance does not require squaring.

Some vague terms there: 'conceptual difference' based on what concept?

Some of that seems to involve a kind of circular argument, such as "blue is a colour which is blue in appearance", only said in a more long-winded way.
Or, put another way, you are using your theorem as the basis of your proof, when you strip out the other maths.
 
Last edited:
  • #36
G-U-E-S-T said:
Also, the distribution of errors in nature "would" follow a perfect normal distribution, if everything was "truly" random and behaving perfectly according to long-run patterns.

The pedant would disagree with this rash statement.
 
  • #37
The standard deviation is not the expected distance to the mean.

The primary reason the mean and standard deviation have been used together for so long is the primacy of the assumption of normality for data (rightly or wrongly, usually wrongly). IF your data are normally distributed, or you are willing to believe it is, these are the natural choices for measures of location and spread.

If you prefer to work backwards and say "The best measure of location is the one that gives me the smallest measure of variability from that number to my data", then

a) If you measure variability by using the sum of the squares of the residuals, then it turns out that the mean is the measure that gives the minimum dispersion - that is, you end up working with

\sum (x - \bar x)^2

b) If you decide to measure variability using the sum of the absolute values, then it turns out that the appropriate measure of location (appropriate meaning it gives the lowest value of variability) is the MEDIAN, not the mean. These two go together, but are not as "efficient" for normally distributed data as the mean and standard deviation (see the quick numerical check below).
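
A quick numerical check of both minimization facts (a sketch over a grid of candidate centres):

Code:
data = [1, 2, 2, 3, 10]   # skewed on purpose: mean = 3.6, median = 2

def sum_sq(c):
    return sum((x - c) ** 2 for x in data)

def sum_abs(c):
    return sum(abs(x - c) for x in data)

candidates = [i / 100 for i in range(1101)]   # 0.00 .. 11.00

print(min(candidates, key=sum_sq))    # 3.6  (the mean)
print(min(candidates, key=sum_abs))   # 2.0  (the median)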

Interesting side point: R. A. Fisher and A. Eddington had a similar discussion early in the 20th century. The "dispute" centered on this: IF you assume data is normally distributed, what is the best way to estimate the population standard deviation?

Fisher argued that an appropriate multiple of

\sqrt{\frac{1}{n}\sum (x - \bar x)^2}

was the answer, while Eddington argued that a multiple of

\frac{1}{n}\sum |x - \bar x|

was better. It has since been shown that in this limited case (strict assumption of normality) Fisher was correct (his estimate has certain optimum properties as long as normality is assumed).
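
Here is a rough simulation of that dispute (a sketch: it uses the simple large-sample multipliers, not the exact unbiasing constants, and relies on the fact that E|X - mu| = sigma*sqrt(2/pi) for a normal variable):

Code:
import math, random

SIGMA, N, TRIALS = 2.0, 20, 20_000
fisher, eddington = [], []
for _ in range(TRIALS):
    xs = [random.gauss(0, SIGMA) for _ in range(N)]
    m = sum(xs) / N
    # Fisher: root-mean-square deviation.
    fisher.append(math.sqrt(sum((x - m) ** 2 for x in xs) / N))
    # Eddington: mean absolute deviation, rescaled by sqrt(pi/2).
    eddington.append(math.sqrt(math.pi / 2) * sum(abs(x - m) for x in xs) / N)

def spread(est):
    mu = sum(est) / len(est)
    return math.sqrt(sum((e - mu) ** 2 for e in est) / len(est))

# Fisher's estimator fluctuates less around the true sigma = 2.
print(spread(fisher), spread(eddington))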
 
  • #38
Is there a link to somewhere which shows Fisher is correct?
 
  • #39
Bit off topic but Fisher was interested in Eugenics.

Also he did not believe smoking caused lung cancer, perhaps he got his analysis of the statistics wrong ;)

http://en.wikipedia.org/wiki/Ronald_Fisher

Fisher was opposed to the conclusions of Richard Doll and A.B. Hill that smoking caused lung cancer. He compared the correlations in their papers to a correlation between the import of apples and the rise of divorce in order to show that correlation does not imply causation.

I have to say that is pretty poor form for someone who is supposed to be an expert statistician.
Perhaps it was because he was using the root mean square method. :-p:smile:

"He was legendary in being able to produce mathematical results without setting down the intermediate steps."

Well that does not surprise me!
 
Last edited:
  • #40
The reason Fisher was correct is this: for the problem stated, his estimate - that is, the one he backed - has the characteristic of being uniformly minimum variance unbiased (UMVU) for the standard deviation.

"Fisher was opposed to the conclusions of Richard Doll and A.B. Hill that smoking caused lung cancer. He compared the correlations in their papers to a correlation between the import of apples and the rise of divorce in order to show that correlation does not imply causation.
I have to say that is pretty poor form for someone who is supposed to be an expert statistician.
Perhaps it was because he was using the root mean square method. "

Remember that it wasn't until much later that the link between smoking and cancer was generally accepted. Fisher was not alone in this - and nobody has claimed he was omniscient.
 
  • #41
Anecdote about Eddington

Throughout this period Eddington lectured on relativity, and was particularly well known for his ability to explain the concepts in lay terms as well as scientific. He collected many of these into the Mathematical Theory of Relativity in 1923, which Albert Einstein suggested was "the finest presentation of the subject in any language." He was an early advocate of Einstein's General Relativity, and an interesting anecdote well illustrates his humor and personal intellectual investment: Ludwig Silberstein, a physicist who thought of himself as an expert on relativity, approached Eddington at the Royal Society's (6 November) 1919 meeting, where Eddington had defended Einstein's relativity with his Brazil-Principe solar eclipse calculations against some degree of scepticism, and ruefully charged that Arthur claimed to be one of three men who actually understood the theory (Silberstein, of course, was including himself and Einstein as the other two). When Eddington refrained from replying, he insisted Arthur not be "so shy", whereupon Eddington replied, "Oh, no! I was wondering who the third one might be!"

Anyway interesting reading about these too as I had never heard of either before.
 
  • #42
statdad said:
The reason Fisher was correct is this: for the problem stated, his estimate - that is, the one he backed - has the characteristic being Uniformly Minimum Variance Unbiased, or UMVU, for the standard deviation.

"Fisher was opposed to the conclusions of Richard Doll and A.B. Hill that smoking caused lung cancer. He compared the correlations in their papers to a correlation between the import of apples and the rise of divorce in order to show that correlation does not imply causation.
I have to say that is pretty poor form for someone who is supposed to be an expert statistician.
Perhaps it was because he was using the root mean square method. "

Remember that it wasn't until much later that the link between smoking and cancer was generally accepted. Fisher was not alone in this - and nobody has claimed he was omniscient.

Well, I am unfamiliar with the term UMVU, so I can't comment on that now.

Perhaps the reason why the link was not accepted was the work of people such as Fisher, who incidentally was employed by the tobacco firms as a consultant, so he had a significant conflict of interest. Perhaps that could be used as an excuse for his failure to see the correlation; the alternative, perhaps, is being seen as a poor statistician!
He also, I think, would be seen as racist these days.
 
  • #43
Phizo is right to continue with this question - so far nobody has yet meaningfully explained here, why the standard deviation is somehow "best" or "most natural" as an approach to a measure of spread for data. Having useful mathematical properties, or neat interpretations in some other context, is not unique to the standard deviation - thus "best" or "natural" or "the appropriate choice", is not inherent in such observations. A mathematical expression having certain optimal properties, is not necessarily an explanation either, in the absence of any clear proof of uniqueness.

Also, so far nobody has mentioned the very important data-aspect of the degrees of freedom and associated denominator of (n-1) in the formulas for the sample-level variance and standard deviation -- as opposed to the denominator of (n) in the corresponding population-level formulas.

Phizo, you are asking a very good question here, and it is good of you to persist! The fact is that, in practice, standard deviation is not essentially or necessarily "best" or most natural as a measure of spread. If statistics as an applied science were to be reborn anew tomorrow - alongside all of our current widely and readily available computing technology - it is certainly possible that the standard deviation formula as we see & use it today, would not be the most popular or default choice for a measure of spread.
 
Last edited:
  • #44
G-U-E-S-T said:
Phizo is right to continue with this question - so far nobody has yet meaningfully explained here, why the standard deviation is somehow "best" or "most natural" as an approach to a measure of spread for data. Having useful mathematical properties, or neat interpretations in some other context, is not unique to the standard deviation - thus "best" or "natural" or "the appropriate choice", is not inherent in such observations. A mathematical expression having certain optimal properties, is not necessarily an explanation either, in the absence of any clear proof of uniqueness.
In this context "best" has a certain statistical meaning. IF you assume your data comes from the normal distribution, then the best estimates of mean, variance, and standard deviation are the ones being discussed. If you make different assumptions, you get different answers.

G-U-E-S-T said:
Also, so far nobody has mentioned the very important data-aspect of the degrees of freedom and associated denominator of (n-1) in the formulas for the sample-level variance and standard deviation -- as opposed to the denominator of (n) in the corresponding population-level formulas.
The denominator in the sample variance is selected to be n - 1 in order to make the statistic unbiased - so that its expectation equals the population variance.
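
A simulation makes the bias visible (a sketch; the true variance here is 9):

Code:
import random

N, TRIALS = 5, 100_000
avg_n, avg_n1 = 0.0, 0.0
for _ in range(TRIALS):
    xs = [random.gauss(0, 3) for _ in range(N)]   # true variance 9
    m = sum(xs) / N
    ss = sum((x - m) ** 2 for x in xs)
    avg_n  += ss / N          # divide by n
    avg_n1 += ss / (N - 1)    # divide by n - 1

print(avg_n / TRIALS)   # about 7.2 = (n-1)/n * 9: biased low
print(avg_n1 / TRIALS)  # about 9.0: unbiased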
G-U-E-S-T said:
Phizo, you are asking a very good question here, and it is good of you to persist! The fact is that, in practice, standard deviation is not essentially or necessarily "best" or most natural as a measure of spread. If statistics as an applied science were to be reborn anew tomorrow - alongside all of our current widely and readily available computing technology - it is certainly possible that the standard deviation formula as we see & use it today, would not be the most popular or default choice for a measure of spread.

Possibly - there are many other methods for measuring variability now. However,
a) It would still be the case that the same quantities would be found as "most natural" to use when people assume normality
b) It would probably be the (unfortunate) case that the normal distribution would rise to prominence as the most used (and so, mis-used) distributional assumption
c) It would be the case that non-parametric, and robust, measures, would be adopted more readily than they have been (even though their use is becoming more common) as a consequence of the widely available computing power
 
  • #45
G-U-E-S-T said:
Phizo is right to continue with this question - so far nobody has yet meaningfully explained here, why the standard deviation is somehow "best" or "most natural" as an approach to a measure of spread for data. Having useful mathematical properties, or neat interpretations in some other context, is not unique to the standard deviation - thus "best" or "natural" or "the appropriate choice", is not inherent in such observations. A mathematical expression having certain optimal properties, is not necessarily an explanation either, in the absence of any clear proof of uniqueness.

Also, so far nobody has mentioned the very important data-aspect of the degrees of freedom and associated denominator of (n-1) in the formulas for the sample-level variance and standard deviation -- as opposed to the denominator of (n) in the corresponding population-level formulas.

Phizo, you are asking a very good question here, and it is good of you to persist! The fact is that, in practice, standard deviation is not essentially or necessarily "best" or most natural as a measure of spread. If statistics as an applied science were to be reborn anew tomorrow - alongside all of our current widely and readily available computing technology - it is certainly possible that the standard deviation formula as we see & use it today, would not be the most popular or default choice for a measure of spread.

Well, as I said initially, I just do not really see where it comes from, and most of the answers I get seem to be based on some other dubious and unexplained concept.

I mean, measuring a spread is a somewhat vague concept anyway; it seems to be a process of measuring the unmeasurable. For example, I think they use it in opinion polls, and those are pretty much pot luck in that you hope to pick a representative sample.
 
  • #46
phizo said:
That also seems a bit of a circular argument.
You need to explain why this is so first, I think.

I need to explain why the mean and standard deviation define a normal distribution?!?
 
  • #47
Mark44 is correct - I will be a little less diplomatic: try actually studying and learning about the material BEFORE you write all of it off. It will take some work (unlike referring to the world's largest repository of unreliable material, Wikipedia).
 
  • #48
CRGreathouse said:
I need to explain why the mean and standard deviation define a normal distribution?!?

Yes please. You seem to indicate it is a simple answer, so why not just explain it?
 
  • #49
Mark44 said:
We have been laboring away, trying to explain to you that it was not plucked out of thin air. Unfortunately, your response to most of the explanations seems to be that they involve mathematics that you don't understand, or that an article is too dense with links to too many other sites, or that a sentence that seems crystal clear to me is "circular reasoning."


Just because you have never seen the need to work with variance or standard deviation doesn't mean that these statistics are unneeded. To use your example of a can of beans, manufacturers and food processors are very interested in making sure that the variability of what goes in a can or package is tightly controlled. If they put more beans in the can than the advertised weight on the can, they are losing money. If they put too few beans in the can, they can be liable to lawsuits for failing to deliver the advertised amount. You can bet that they are keeping track of the standard deviation here.


It's not the mathematics I don't understand but the language used to hide the mathematics.
 
  • #50
phizo said:
I have had a look at that article and it is pretty 'dense' and drags me all over the place via links, so it's hard work.

Anyway it's going a bit too much into the maths of it; it does not seem to get at the root of the problem.
I had hoped there would be a simple explanation. It is not looking like I will be getting an answer I am happy with here; I will probably have to work out an answer myself before I will be happy (as is sometimes the case).
As I am being pointed to wiki pages, it does not seem like anyone will be able to post an answer here I am happy with.

If you cannot understand the math, and are not willing to do the work necessary to understand it, then there is really no point in continuing this discussion. It seems to me that you are far more eager to argue than to put any effort into learning.

I am not interested in reading any more of your arguments. The answers you seek are in this thread. Please read it over a few times. Try opening your mind while making an effort to understand.

Thread locked
 