# I don't understand the standard deviation.

## Main Question or Discussion Point

OK I was going to post this in homework, but it's not homework (left school a long time ago ;))

Anyway when I was at school I did not do stats I did mechanics instead (you had a choice).
However all the people doing stats were always on about the SD, which was a bit annoying
cos I didn't know what it was.
Anyway I asked one guy who did stats what it was, and he did not seem to give a very convincing/good answer, I think he said it was so many percent or something like that.

OK so I looked on wiki and it gives a definition

SD is the square root of its variance

and:-

variance of a random variable or distribution is the expectation, or mean, of the deviation squared of that variable from its expected value or mean.

OK.......so I can understand how to calculate it however.... I don't understand *why* you would want to calculate that?

Why not take the cube root of the variance, or 1/variance or log(variance) or sin(variance)
or indeed any other function you can think of???

I just do not see what is significant about the square root of the variance!!

Hope this is not a 'stupid question' :)

Mark44
Mentor
There are two measures that are typically used to describe some distribution - the mean, and the standard deviation. The mean (aka "average") is a measure of central tendency. In other words, it's a measure of where the middle of the distribution is.

The other commonly used measure is the standard deviation, which is a measure of dispersion. In other words, how spread out from the middle the data is.

For example, if the average income of a group of people happened to be $50,000, all you can say is that about half the people in the group have an income higher than $50,000, and about half have lower incomes. You have no information at all about the highest or lowest wages in the group.

However, if you know the standard deviation, this fact gives you an idea of how spread out the wages are in this group. A small standard deviation implies that the wages are clustered pretty closely to the mean; a large s.d. implies that they are more spread out.

The variance is closely related to the standard deviation, and is essentially the average of the squares of the distances from the mean value. (This is somewhat oversimplified.) Since the variance is built from squares of numbers, the most reasonable thing is to define another statistic (the standard deviation) as the square root of the variance.
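A minimal Python sketch of the calculation just described, using two hypothetical groups of incomes that share a mean of 50,000 but differ in spread. It assumes the population form (divide by n); the sample form divides by n - 1 instead, which is what `statistics.stdev` does.

```python
import math
import statistics


def population_variance(data):
    """Average of the squared deviations from the mean (divides by n)."""
    n = len(data)
    mu = sum(data) / n
    return sum((x - mu) ** 2 for x in data) / n


def population_sd(data):
    """Square root of the population variance, in the same units as the data."""
    return math.sqrt(population_variance(data))


# Hypothetical incomes: same mean (50,000), very different spread.
clustered = [48_000, 50_000, 52_000]
spread_out = [20_000, 50_000, 80_000]

print(round(population_sd(clustered)))   # prints 1633
print(round(population_sd(spread_out)))  # prints 24495

# Cross-check against the standard library's population SD.
assert math.isclose(population_sd(clustered), statistics.pstdev(clustered))
```

The mean alone cannot tell these two groups apart; the standard deviation can.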

CRGreathouse
Homework Helper
It's actually pretty simple: the square root of variance gives you the original units back. If your data is in meters, the variance is in square meters and the standard deviation is in meters. It's a unit that can be interpreted in real-world terms fairly readily.
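The units point can be checked numerically. This sketch assumes some hypothetical heights, first in metres and then in centimetres: rescaling the data by 100 rescales the standard deviation by 100 but the variance by 100 squared, which is what you would expect if the variance is in squared units.

```python
import statistics

# Hypothetical heights, first in metres, then the same heights in centimetres.
heights_m = [1.60, 1.70, 1.80]
heights_cm = [100 * h for h in heights_m]

var_m = statistics.pvariance(heights_m)    # units: square metres
var_cm = statistics.pvariance(heights_cm)  # units: square centimetres
sd_m = statistics.pstdev(heights_m)        # units: metres
sd_cm = statistics.pstdev(heights_cm)      # units: centimetres

# The SD scales like the data; the variance scales like the data squared.
print(round(sd_cm / sd_m, 6))    # 100.0
print(round(var_cm / var_m, 6))  # 10000.0
```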

Wouldn't the 'not squared' method (absolute deviation, it seems to be called) give a similar indication of spread?

The squaring method gives more weight to some values than others?

It's actually pretty simple: the square root of variance gives you the original units back. If your data is in meters, the variance is in square meters and the standard deviation is in meters. It's a unit that can be interpreted in real-world terms fairly readily.
Well my problem is I don't understand the need to take the square root in the first place!!
Sure, taking the root would be sensible after squaring, but my point is, if you don't square the values in the first place, then you don't need to take the square root, and you have not given more weight to some values than others.

CRGreathouse
Homework Helper
Well my problem is I don't understand the need to take the square root in the first place!!
Sure, taking the root would be sensible after squaring, but my point is, if you don't square the values in the first place, then you don't need to take the square root, and you have not given more weight to some values than others.
Ah. It happens that the variance has a number of useful properties that make it the natural choice. For example, for a sum of independent normally distributed random variables, the mean is the sum of the means and the variance is the sum of the variances.
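A quick simulation makes the additivity concrete. It is a sketch assuming two independent normal variables with standard deviations 3 and 4 (normal here only for illustration; variances add for any independent variables with finite variance). The variances add to about 25, so the SD of the sum is about 5, not 3 + 4 = 7.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable
N = 50_000               # number of simulated draws (arbitrary)

# Independent normal variables with SD 3 and SD 4 (variances 9 and 16).
x = [rng.gauss(0, 3) for _ in range(N)]
y = [rng.gauss(0, 4) for _ in range(N)]
total = [a + b for a, b in zip(x, y)]


def pop_var(data):
    """Population variance computed directly with floats."""
    m = statistics.fmean(data)
    return statistics.fmean([(v - m) ** 2 for v in data])


var_total = pop_var(total)    # close to 9 + 16 = 25
sd_total = var_total ** 0.5   # close to 5, not 3 + 4 = 7
print(round(var_total, 1), round(sd_total, 2))
```

It is the variance, not the SD, that behaves additively, which is one reason the squared deviations are worked with first and the square root is taken only at the end.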

Well, you are saying you can add these values together, which may well be useful.
I don't think that tells me anything more, because if the things you add together have little meaning, then being able to add them together does not really increase their usefulness IMO.
I think you have to establish the usefulness of the individual sums/elements in the first place.

For example, cubes could also be used to give some indication of spread, or some other power. I just do not see a reason to use anything other than the first power of the variables.

For example, whenever I have wanted to deal with a set of data I have never felt any compulsion to use anything but the first power of the data, i.e. the data itself or the deviation from the mean (actually I have never gone even that far!).

I did once plot the log of some data, because the values went 'off the screen', so why not take the log of the values? And then the 'anti-log'(?) of the sums?

It just seems to me there are a lot of arbitrary data functions you could use and I see nothing particularly compelling about the square/root.

Mark44
Mentor
The sum of the deviations from the mean is not a useful measure, since it always comes out to 0. The sum of the absolute values of the deviations from the mean is a better choice, but to quote from one text,
"the absolute value is somewhat troublesome mathematically: it does not have a simple arithmetic formula, nor is it a differentiable function. Squaring the deviations proves to be a much more attractive solution."
- An Introduction to Mathematical Statistics and Its Applications, 2nd Ed. Larsen and Marx.

The main reason that data is plotted on log scales is the orders-of-magnitude difference between the smallest and largest data values. Another reason is that if there is a power function relationship between two variables (i.e., y = x^n), a plot of log y against log x will appear as a straight line with slope n.
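Because log(x^n) = n·log(x), a power law turns into a straight line of slope n when log y is plotted against log x. A small sketch, assuming a hypothetical exponent n = 3:

```python
import math

n = 3                      # hypothetical exponent in y = x**n
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [x ** n for x in xs]

log_x = [math.log(x) for x in xs]
log_y = [math.log(y) for y in ys]

# Slope between consecutive points on a log-log plot; every slope equals n.
slopes = [(log_y[i + 1] - log_y[i]) / (log_x[i + 1] - log_x[i])
          for i in range(len(xs) - 1)]
print(slopes)  # each value is 3.0 up to floating-point rounding
```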

Well my problem is I don't understand the need to take the square root in the first place!!
Sure, taking the root would be sensible after squaring, but my point is, if you don't square the values in the first place, then you don't need to take the square root, and you have not given more weight to some values than others.
The reason that we square it might best be demonstrated by the following data points:
1, 5, 9, 13, 17
The mean is obviously 9.
If our method for calculating variance/deviation were defined as simply summing the differences...
(1-9) + (5-9) + (9-9) + (13-9) + (17-9)
IS ZERO!!!!!! So that's out.

Why not take the absolute value? Ugly to differentiate, for one.

Squaring and later "square-rooting" is almost like the absolute value, but ends up being easier to work with.
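Working through the same data set (1, 5, 9, 13, 17) in Python shows all three candidates side by side: signed deviations cancel to zero, absolute deviations give a mean absolute deviation, and squared deviations give the variance and, after the square root, the standard deviation. The population form (divide by n) is assumed.

```python
import math

data = [1, 5, 9, 13, 17]
mean = sum(data) / len(data)                       # 9.0

sum_dev = sum(x - mean for x in data)              # 0.0: signed deviations cancel
sum_abs_dev = sum(abs(x - mean) for x in data)     # 24.0
sum_sq_dev = sum((x - mean) ** 2 for x in data)    # 160.0

mad = sum_abs_dev / len(data)                      # 4.8 (mean absolute deviation)
variance = sum_sq_dev / len(data)                  # 32.0 (population variance)
sd = math.sqrt(variance)                           # about 5.66

print(sum_dev, mad, variance, round(sd, 2))        # 0.0 4.8 32.0 5.66
```

Both 4.8 and 5.66 are in the original units and both describe spread; they simply weight the deviations differently.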

The reason that we square it might best be demonstrated by the following data points:
1, 5, 9, 13, 17
The mean is obviously 9.
If our method for calculating variance/deviation were defined as simply summing the differences...
(1-9) + (5-9) + (9-9) + (13-9) + (17-9)
IS ZERO!!!!!! So that's out.

Why not take the absolute value? Ugly to differentiate, for one.

Squaring and later "square-rooting" is almost like the absolute value, but ends up being easier to work with.
Yes, I would have said take the absolute values; I see no reason not to.
I don't see your point about differentiating, I'm afraid. A sine wave goes negative, and there's no real problem differentiating it.

CRGreathouse
Homework Helper
Well, you are saying you can add these values together, which may well be useful.
I don't think that tells me anything more, because if the things you add together have little meaning, then being able to add them together does not really increase their usefulness IMO.
The mean and standard deviation/variance are jointly sufficient to define a normal distribution!
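This is also where the "so many percent" answer from the opening post comes from. For a normal distribution, the mean and SD pin down the whole curve, so you can read off, for example, the share of values within one or two SDs of the mean. The sketch below assumes hypothetical parameters (mean 100, SD 15) and uses Python's `statistics.NormalDist`; the percentages hold only when the data really are normal.

```python
from statistics import NormalDist

# Hypothetical parameters: mean 100, standard deviation 15.
dist = NormalDist(mu=100, sigma=15)

# With only mu and sigma we can answer questions about the whole curve,
# e.g. the share of values within one or two SDs of the mean.
within_1sd = dist.cdf(115) - dist.cdf(85)
within_2sd = dist.cdf(130) - dist.cdf(70)
print(f"{within_1sd:.3f} {within_2sd:.3f}")  # 0.683 0.954
```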

The sum of the deviations from the mean is not a useful measure, since it always comes out to 0. The sum of the absolute values of the deviations from the mean is a better choice, but to quote from one text,
"the absolute value is somewhat troublesome mathematically: it does not have a simple arithmetic formula, nor is it a differentiable function. Squaring the deviations proves to be a much more attractive solution."
- An Introduction to Mathematical Statistics and Its Applications, 2nd Ed. Larsen and Marx.

The main reason that data is plotted on log scales is the orders-of-magnitude difference between the smallest and largest data values. Another reason is that if there is a power function relationship between two variables (i.e., y = x^n), a plot of log y against log x will appear as a straight line with slope n.

I would give you the same answer as to Chaz.

Also:-

"the absolute value is somewhat troublesome mathematically: it does not have a simple arithmetic formula, nor is it a differentiable function. Squaring the deviations proves to be a much more attractive solution."

I would love to see the arithmetic formula for "attractive".
Also, as I said, I don't really see the problem with differentiation either.
And if I were to get picky in a similar manner, I would say the square root has two values, a positive and a negative one, surely one solution is more attractive than two??

Furthermore, I could add that as I am not too happy with the standard deviation in the first place, it is rather unlikely I would be any happier differentiating it!!!
I would have derived an even unhappier answer!!

The mean and standard deviation/variance are jointly sufficient to define a normal distribution!
Really?
Well I will have to get back to you on that one, that's a whole new can of worms for me!!
(You will probably wish you had not bothered!!).

Mark44
Mentor
Yes, I would have said take the absolute values; I see no reason not to.
I don't see your point about differentiating, I'm afraid. A sine wave goes negative, and there's no real problem differentiating it.
It's not a matter of going negative. The sine function is continuous, as is the absolute value function. However, the sine function is differentiable at every point, but the absolute value is not differentiable at a number right in the middle of its domain. This is one of the reasons that the variance is more "attractive" as a statistic, according to the reference I cited. They didn't need to give an arithmetic formula to justify their preference; they gave reasons.
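One way to see where differentiability enters: the derivative is taken with respect to a candidate centre c, not with respect to the data points. Asking which c makes the total squared deviation smallest gives a clean calculus answer, namely the mean:

$$
\frac{d}{dc}\sum_{i=1}^{n}(x_i - c)^2 = -2\sum_{i=1}^{n}(x_i - c) = 0
\quad\Longrightarrow\quad
c = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x},
\qquad
\frac{d^2}{dc^2}\sum_{i=1}^{n}(x_i - c)^2 = 2n > 0 .
$$

The analogous sum $\sum_{i=1}^{n}\lvert x_i - c\rvert$ is minimized by a median of the data, but it has no derivative at any $c = x_i$, so the same one-line argument does not go through.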

Mark44
Mentor
CRGreathouse said:
The mean and standard deviation/variance are jointly sufficient to define a normal distribution!
Really?
Yes, really. If you want to argue your point and be taken seriously, you're going to first have to get up to speed on some basic statistics. In your original post you said that you didn't get the standard deviation. Three regulars on this forum (CRGreathouse, The Chaz, and myself) have attempted to enlighten you as to what the standard deviation is and what it is used for.

As already stated numerous times, the sum of the absolute values of the differences between the data values and the mean also works as a measure of dispersion, but for mathematical reasons already given, this particular measure is not the preferred one.
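The two measures really do behave differently, which bears on the earlier question of whether squaring "gives more weight to some values than others". In this sketch, two hypothetical data sets have the same mean (5) and the same mean absolute deviation (1), but in the second set the deviation is concentrated in two larger values, and the SD, which squares before averaging, reports a larger spread.

```python
import statistics


def mean_absolute_deviation(data):
    """Average absolute distance from the mean."""
    mu = statistics.fmean(data)
    return statistics.fmean([abs(x - mu) for x in data])


# Hypothetical data: same mean, same mean absolute deviation.
a = [4, 4, 6, 6]   # every value is 1 away from the mean
b = [3, 5, 5, 7]   # two values on the mean, two values 2 away

for data in (a, b):
    print(mean_absolute_deviation(data), round(statistics.pstdev(data), 3))
# 1.0 1.0
# 1.0 1.414
```

So yes, squaring weights larger deviations more heavily; whether that is a feature or a drawback depends on the application.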

Well I will have to get back to you on that one, that's a whole new can of worms for me!!
(You will probably wish you had not bothered!!).

Integral
Staff Emeritus
Gold Member
Phizo,
You seem to have missed a significant fact. The definitions for mean and SD that have been discussed in this thread are the key parameters of a normal distribution (http://en.wikipedia.org/wiki/Normal_distribution). Please read the wiki article.

So your real question is not why we take the square root, but why we are so fixated on the normal distribution. We had no choice in the matter: we did not choose the normal distribution, nature did. This is the distribution which describes any measurement with random errors. Have you ever seen the demonstration where balls are dropped from a single point into a uniformly spaced grid of nails? The pile they make approximates the normal distribution.

Now the fact that this distribution occurs naturally combined with the fact that there is a very simple mathematical description of it makes it irresistible.
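The nail-board demonstration can be sketched in a few lines. This is a hypothetical simulation, assuming each ball goes left or right with equal probability at every row, so the final bin is a binomial count; with enough rows that binomial pile is close to a normal curve, with mean rows/2 and variance rows/4.

```python
import random
from collections import Counter


def galton_board(n_balls, n_rows, seed=0):
    """Hypothetical board: at each row a ball goes left (0) or right (1)
    with equal probability; its final bin is the number of right bounces."""
    rng = random.Random(seed)
    return Counter(
        sum(rng.randint(0, 1) for _ in range(n_rows)) for _ in range(n_balls)
    )


rows, balls = 12, 10_000
counts = galton_board(balls, rows)

# Crude text histogram: the pile is highest in the middle bins.
for bin_index in range(rows + 1):
    print(f"{bin_index:2d} {'#' * (counts[bin_index] // 50)}")

mean = sum(k * c for k, c in counts.items()) / balls
var = sum((k - mean) ** 2 * c for k, c in counts.items()) / balls
print(round(mean, 2), round(var, 2))  # near rows/2 = 6 and rows/4 = 3
```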

Phizo's question is perfectly reasonable - using the absolute differences makes sense, it just proves a little more awkward to use this approach in practice (too awkward it appears, prior to the development of our modern calculating conveniences).

And while the mean and S.D. are sufficient for defining a normal distribution, that does not mean they are necessary. Are there not other equally valid approaches that exist, that could also be used to define a normal distribution?

Noting that variances add when the associated random variables are independent is not necessarily a valid justification either. For instance, variances also add when the variables are merely uncorrelated, which is a weaker condition than independence. Further, establishing perfect independence of variables is not generally possible in practice - it is usually just a "hail mary" assumption in a mathematical model, to ignore possible lurking variables.
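The general identity Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) shows exactly when variances add: when the covariance is zero. The sketch below uses hypothetical data. `y_uncorrelated` is a deterministic function of `x` (so not independent of it) yet has zero covariance with it, and the variances still add; a perfectly correlated copy of `x` quadruples the variance, and in that case it is the SDs that add.

```python
import statistics


def pvar_of_sum(a, b):
    """Population variance of the element-wise sum a + b."""
    return statistics.pvariance([p + q for p, q in zip(a, b)])


x = [1, 2, 3, 4, 5]               # hypothetical data
y_uncorrelated = [1, 3, 5, 3, 1]  # depends on x, yet has zero covariance with it
y_perfect = list(x)               # perfectly positively correlated with x

var_x = statistics.pvariance(x)               # 2
var_y = statistics.pvariance(y_uncorrelated)  # 2.24

print(pvar_of_sum(x, y_uncorrelated), var_x + var_y)  # equal up to rounding
print(pvar_of_sum(x, y_perfect), 2 * var_x)           # 8 4: variances do not add
```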

Also, the distribution of errors in nature "would" follow a perfect normal distribution, if everything was "truly" random and behaving perfectly according to long-run patterns. These mathematical idealizations are useful in theory - but it is only a model (nature is far richer than any mathematical model).

And by the way: The mean does not tell us that "about half" of the data lie to one side of it, etc - you are thinking of the median. [And when the median is the measure of center being used, then the interquartile range is clearly a more appropriate measure of spread than the standard deviation.]
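A small example of the mean/median distinction, assuming a hypothetical set of incomes with one very large value: the mean is pulled far above the typical income, and only one of the five values lies above it, whereas the median sits in the middle by construction.

```python
import statistics

# Hypothetical incomes with one very large value.
incomes = [30_000, 35_000, 40_000, 45_000, 500_000]

mean_income = statistics.mean(incomes)      # 130000
median_income = statistics.median(incomes)  # 40000
above_mean = sum(x > mean_income for x in incomes)

print(mean_income, median_income, above_mean)  # 130000 40000 1
```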

In other words: don't be discouraged in your inquiry, Phizo. The S.D. is not the "most natural" measure of spread that could ever have been developed in the history of statistics - it is just a useful convention (one of many that could have also been developed).

Mark44
Mentor
And by the way: The mean does not tell us that "about half" of the data lie to one side of it, etc - you are thinking of the median.
Note that I qualified what I said, using "about." I know the difference between the mean and median, and also the mode, another measure of central tendency. I was just trying to distinguish between two types of statistical measures.

...I know the difference between the mean and median, and also the mode, another measure of central tendency...
I was getting nervous for a minute there!

lavinia
Gold Member
Phizo,
You seem to have missed a significant fact. The definitions for mean and SD that have been discussed in this thread are the key parameters of a normal distribution (http://en.wikipedia.org/wiki/Normal_distribution). Please read the wiki article.

So your real question is not why we take the square root, but why we are so fixated on the normal distribution. We had no choice in the matter: we did not choose the normal distribution, nature did. This is the distribution which describes any measurement with random errors. Have you ever seen the demonstration where balls are dropped from a single point into a uniformly spaced grid of nails? The pile they make approximates the normal distribution.

Now the fact that this distribution occurs naturally combined with the fact that there is a very simple mathematical description of it makes it irresistible.
I would add that before the days of high speed computers - empirical distributions were often impossible to describe. However, it was known that a sampling distribution of averages is close to normal if the number of observations in the sample is large enough.

The process of averaging made empirical distributions tractable and reduced the problem of describing the distribution to estimating only 2 parameters, the mean and the variance.
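The effect of averaging can be simulated. This sketch assumes hypothetical settings (averages of 30 uniform draws, repeated 20,000 times): each draw is flat, not bell-shaped, yet the averages cluster around 0.5 with SD close to sqrt(1/12)/sqrt(30), and roughly 68% of them fall within one SD, as a normal distribution would predict.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 30                  # observations averaged in each sample (hypothetical)
trials = 20_000         # number of samples drawn

# Each observation is uniform on [0, 1): clearly not normal on its own.
means = [statistics.fmean([rng.random() for _ in range(n)]) for _ in range(trials)]

# Theory: the averages have mean 0.5 and SD sqrt(1/12) / sqrt(n).
expected_sd = (1 / 12) ** 0.5 / n ** 0.5
observed_sd = statistics.pstdev(means)

# For a normal shape, about 68% of the averages should fall within one SD.
within_1sd = sum(abs(m - 0.5) < expected_sd for m in means) / trials
print(round(observed_sd, 4), round(expected_sd, 4), round(within_1sd, 3))
```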

Nowadays, computers can describe distributions easily and have allowed more accurate descriptions of empirical distributions. For instance, careful studies indicate that the distribution of securities returns is not normal but has "fat tails" i.e. too much probability of outlier returns. These studies require huge amounts of data.


http://en.wikipedia.org/wiki/Standard_deviation#Geometric_interpretation

So if you have two data points, (m1, m2), their standard deviation is proportional to the minimal distance between the POINT (m1, m2) and the line {(t, t) | t in R}: that distance equals √2 times the (population) standard deviation.

The geometrical explanation is very intuitive with data sets of one, two, or three data points, and to me, I can see why it generalizes to n. For continuous distributions, you are simply replacing a summation by integration, as is common when going from discrete *anything* to continuous.
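The geometric picture can be checked numerically. The closest point on the diagonal line is (mean, ..., mean), and the distance to it equals √n times the population standard deviation, matching the Wikipedia section linked above. The data sets here are hypothetical.

```python
import math
import statistics


def distance_to_diagonal(point):
    """Distance from a point in R^n to the line {(t, t, ..., t)}.

    The closest point on that line is (mean, mean, ..., mean)."""
    m = statistics.fmean(point)
    return math.dist(point, tuple([m] * len(point)))


for pt in [(2.0, 6.0), (1.0, 5.0, 9.0)]:   # hypothetical data sets
    d = distance_to_diagonal(pt)
    sd = statistics.pstdev(pt)
    print(round(d / sd, 6), round(math.sqrt(len(pt)), 6))  # ratio equals sqrt(n)
```

Going from n points to a continuous distribution replaces the sums inside the distance with integrals, as described in the post.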

It's not a matter of going negative. The sine function is continuous, as is the absolute value function. However, the sine function is differentiable at every point, but the absolute value is not differentiable at a number right in the middle of its domain. This is one of the reasons that the variance is more "attractive" as a statistic, according to the reference I cited. They didn't need to give an arithmetic formula to justify their preference; they gave reasons.
Sorry, that makes little sense to me.
The initial data will not be 'differentiable' either as far as I can see, as it is usually just a number of points, i.e. not continuous, so I don't think your argument is valid.
It seems a very flawed argument to me.

The mean and standard deviation/variance are jointly sufficient to define a normal distribution!

That also seems a bit of a circular argument.
You need to explain why this is so first, I think.

Phizo,
You seem to have missed a significant fact. The definitions for mean and SD that have been discussed in this thread are the key parameters of a normal distribution (http://en.wikipedia.org/wiki/Normal_distribution). Please read the wiki article.

So your real question is not why we take the square root, but why we are so fixated on the normal distribution. We had no choice in the matter: we did not choose the normal distribution, nature did. This is the distribution which describes any measurement with random errors. Have you ever seen the demonstration where balls are dropped from a single point into a uniformly spaced grid of nails? The pile they make approximates the normal distribution.

Now the fact that this distribution occurs naturally combined with the fact that there is a very simple mathematical description of it makes it irresistible.

I have had a look at that article and it is pretty 'dense' and drags me all over the place via links, so it's hard work.

Anyway, it's going a bit too much into the maths of it; it does not seem to get at the root of the problem.
I had hoped there would be a simple explanation. It is not looking like I will get an answer I am happy with here; I will probably have to work out an answer myself before I will be happy (as is sometimes the case).
As I am being pointed to wiki pages, it does not seem like anyone will be able to post an answer here I am happy with.

Phizo's question is perfectly reasonable - using the absolute differences makes sense, it just proves a little more awkward to use this approach in practice (too awkward it appears, prior to the development of our modern calculating conveniences).

And while the mean and S.D. are sufficient for defining a normal distribution, that does not mean they are necessary. Are there not other equally valid approaches that exist, that could also be used to define a normal distribution?

Noting that variances add when the associated random variables are independent is not necessarily a valid justification either. For instance, variances also add when the variables are merely uncorrelated, which is a weaker condition than independence. Further, establishing perfect independence of variables is not generally possible in practice - it is usually just a "hail mary" assumption in a mathematical model, to ignore possible lurking variables.

Also, the distribution of errors in nature "would" follow a perfect normal distribution, if everything was "truly" random and behaving perfectly according to long-run patterns. These mathematical idealizations are useful in theory - but it is only a model (nature is far richer than any mathematical model).

And by the way: The mean does not tell us that "about half" of the data lie to one side of it, etc - you are thinking of the median. [And when the median is the measure of center being used, then the interquartile range is clearly a more appropriate measure of spread than the standard deviation.]

In other words: don't be discouraged in your inquiry, Phizo. The S.D. is not the "most natural" measure of spread that could ever have been developed in the history of statistics - it is just a useful convention (one of many that could have also been developed).

Thanks for that, that was helpful, I won't be discouraged but I fear I might be 'stopped' lol.