I don't understand the standard deviation.

SUMMARY

The discussion centers on the concept of standard deviation (SD) and its significance in statistics. Standard deviation is defined as the square root of variance, which measures the dispersion of a dataset around its mean. Participants emphasize that squaring the deviations and then taking the square root yields a measure of spread in the original units of the data, and that the squared form is mathematically more tractable than alternatives such as the absolute deviation. The conversation highlights the question of why the square root of the variance is preferred over other mathematical transformations.

PREREQUISITES
  • Understanding of basic statistical concepts such as mean and variance.
  • Familiarity with the mathematical operations of squaring and square rooting.
  • Knowledge of the properties of normal distributions.
  • Basic comprehension of data dispersion and its significance in statistics.
NEXT STEPS
  • Study the properties of normal distributions and their relationship with mean and standard deviation.
  • Learn about the mathematical derivation of variance and standard deviation.
  • Explore alternative measures of dispersion, such as absolute deviation and interquartile range.
  • Investigate the application of standard deviation in real-world data analysis scenarios.
USEFUL FOR

Statisticians, data analysts, students learning statistics, and anyone interested in understanding data dispersion and its implications in various fields.

phizo
OK I was going to post this in homework, but it's not homework (left school a long time ago ;))

Anyway when I was at school I did not do stats I did mechanics instead (you had a choice).
However, all the people doing stats were always on about the SD, which was a bit annoying
cos I didn't know what it was.
Anyway I asked one guy who did stats what it was, and he did not seem to give a very convincing answer; I think he said it was so many percent or something like that.

OK so I looked on wiki and it gives a definition

SD is the square root of the variance

and:-

variance of a random variable or distribution is the expectation, or mean, of the squared deviation of that variable from its expected value or mean.
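For concreteness, here is that recipe as a minimal Python sketch (using the population convention of dividing by n; the numbers are made up for illustration):

Code:
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up sample

mean = sum(data) / len(data)                                # the "expected value"
variance = sum((x - mean) ** 2 for x in data) / len(data)   # mean of squared deviations
sd = variance ** 0.5                                        # square root of the variance

print(mean, variance, sd)  # 5.0 4.0 2.0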


OK...so I can understand how to calculate it however... I don't understand *why* you would want to calculate that?

Why not take the cube root of the variance, or 1/variance or log(variance) or sin(variance)
or indeed any other function you can think of?

I just do not see what is significant about the square root of the variance!

Hope this is not a 'stupid question' :)
 
phizo said:
OK so I looked on wiki and it gives a definition: SD is the square root of the variance... I can understand how to calculate it, however I don't understand *why* you would want to calculate that... I just do not see what is significant about the square root of the variance!

There are two measures that are typically used to describe some distribution - the mean, and the standard deviation. The mean (aka "average") is a measure of central tendency. In other words, it's a measure of where the middle of the distribution is.

The other commonly used measure is the standard deviation, which is a measure of dispersion. In other words, how spread out from the middle the data is.

For example, if the average income of a group of people happened to be $50,000, all you can say is that about half the people in the group have an income higher than $50,000, and about half have lower incomes. You have no information at all about the highest or lowest wages in the group.

However, if you know the standard deviation, this fact gives you an idea of how spread out the wages are in this group. A small standard deviation implies that the wages are clustered pretty closely to the mean; a large s.d. implies that they are more spread out.

The variance is closely related to the standard deviation, and is essentially the sum of the squares of the distances from the mean value. (This is somewhat oversimplified.) Since this variance comes from the squares of numbers, the most reasonable thing would be to define another statistic (the standard deviation) that is the square root of the variance.
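A sketch of that wage example in Python (the figures are invented for illustration; statistics.pstdev is the population standard deviation):

Code:
import statistics

tight  = [48_000, 49_000, 50_000, 51_000, 52_000]  # wages clustered near the mean
spread = [20_000, 35_000, 50_000, 65_000, 80_000]  # same mean, far more dispersed

print(statistics.mean(tight),  statistics.pstdev(tight))   # 50000  ~1414
print(statistics.mean(spread), statistics.pstdev(spread))  # 50000  ~21213

Both groups have a mean of $50,000, but the standard deviations tell very different stories about the spread.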
 
It's actually pretty simple: the square root of variance gives you the original units back. If your data is in meters, the variance is in square meters and the standard deviation is in meters. It's a unit that can be interpreted in real-world terms fairly readily.
 
Would not the 'not squared' method (absolute deviation, it seems to be called) give a similar indication of spread?

The squaring method gives more weight to some values than others?
 
CRGreathouse said:
It's actually pretty simple: the square root of variance gives you the original units back. If your data is in meters, the variance is in square meters and the standard deviation is in meters. It's a unit that can be interpreted in real-world terms fairly readily.

Well my problem is I don't understand the need to square in the first place!
Sure, taking the root would be sensible after squaring, but my point is, if you don't square the values in the first place, then you don't need to take the square root; also, you have not given more weight to some values than others.
 
phizo said:
Well my problem is I don't understand the need to square in the first place!
Sure, taking the root would be sensible after squaring, but my point is, if you don't square the values in the first place, then you don't need to take the square root; also, you have not given more weight to some values than others.

Ah. It happens that there are a number of useful properties about the variance that make it the natural choice. The sum of independent normal distributions has the property that the mean is the sum of the means and the variance is the sum of the variances, for example.
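That additivity is easy to see in a quick simulation (a sketch; the means and SDs here are arbitrary choices):

Code:
import random
import statistics

N = 100_000
x = [random.gauss(2.0, 3.0) for _ in range(N)]  # mean 2, sd 3, so variance 9
y = [random.gauss(5.0, 4.0) for _ in range(N)]  # mean 5, sd 4, so variance 16
s = [a + b for a, b in zip(x, y)]               # sum of independent normals

print(statistics.mean(s))       # ~7.0  (2 + 5: the means add)
print(statistics.pvariance(s))  # ~25.0 (9 + 16: the variances add)

Note that it is the variances that add, not the standard deviations: the SD of the sum comes out near 5, not 3 + 4 = 7.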
 
Well you are saying you can add these values together, which may well be useful.
I don't think that tells me anything more, because if the things you add together have little meaning, then being able to add them together does not really increase their usefulness, IMO.
I think you have to establish the usefulness of the individual sums/elements in the first place.

For example, cubes could also be used to give some indication of spread, or some other power. I just do not see a reason to use anything other than the first power of the variables.

For example, whenever I have wanted to deal with a set of data I have never felt any compulsion to use anything but the first power of the data, i.e. the data itself or the deviation from the mean (actually I have never gone even that far!).

I did once plot the log of some data, because the values went 'off the screen', so why not take the log of the values? And then the 'anti-log'(?) of the sums?

It just seems to me there are a lot of arbitrary data functions you could use and I see nothing particularly compelling about the square/root.
 
The sum of the deviations from the mean is not a useful measure, since it always comes out to 0. The sum of the absolute values of the deviations from the mean is a better choice, but to quote from one text,
"the absolute value is somewhat troublesome mathematically: it does not have a simple arithmetic formula, nor is it a differentiable function. Squaring the deviations proves to be a much more attractive solution."
- An Introduction to Mathematical Statistics and Its Applications, 2nd Ed. Larsen and Marx.

The main reason that data is plotted on log scales is the orders-of-magnitude difference between the smallest and largest data values. Another reason is that if there is a power-function relationship between two variables (i.e., y = x^n), the graph of log y against log x will appear as a straight line.
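Both measures of spread are simple to compute side by side; a minimal sketch (made-up data) showing that they generally give different numbers for the same set:

Code:
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
m = statistics.mean(data)                        # 5.0

mad = sum(abs(x - m) for x in data) / len(data)  # mean absolute deviation
sd = statistics.pstdev(data)                     # population standard deviation

print(mad)  # 1.5
print(sd)   # 2.0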
 
phizo said:
Well my problem is I don't understand the need to square root in the first place!
Sure taking the root would be sensible after squaring, but my point is, if you don't square the values in the first place, then you don't need to take the square root, also you have not given more weight to some values than others.

The reason that we square it might best be demonstrated by the following data points:
1, 5, 9, 13, 17
The mean is obviously 9.
If our method for calculating variance/deviation were defined as simply summing the differences...
(1-9) + (5-9) + (9-9) + (13-9) + (17-9)
IS ZERO! So that's out.

Why not take the absolute value? Ugly to differentiate, for one.

Squaring and later "square-rooting" is almost like the absolute value, but ends up being easier to work with.
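That arithmetic, checked in a short Python sketch:

Code:
data = [1, 5, 9, 13, 17]
mean = sum(data) / len(data)                     # 9.0

print(sum(x - mean for x in data))               # 0.0: plain deviations cancel out
print(sum(abs(x - mean) for x in data))          # 24.0: absolute deviations do not
print((sum((x - mean) ** 2 for x in data) / len(data)) ** 0.5)  # ~5.66: the SD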
 
  • #10
The Chaz said:
The reason that we square it might best be demonstrated by the data points 1, 5, 9, 13, 17... Squaring and later "square-rooting" is almost like the absolute value, but ends up being easier to work with.

Yes I would have said take the absolute values, I see no reason not to.
I don't see your point about differentiation, I'm afraid. A sine wave goes negative; no problems differentiating it, really.
 
  • #11
phizo said:
Well you are saying you can add these values together, which may well be useful.
I don't think that tells me anything more, because if the things you add together have little meaning, then being able to add them together does not really increase their usefulness, IMO.

The mean and standard deviation/variance are jointly sufficient to define a normal distribution!
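In other words, once you fix the mean mu and the standard deviation sigma, the entire bell curve is pinned down by the density formula; a sketch (NormalDist is in Python's standard library):

Code:
import math
from statistics import NormalDist

mu, sigma = 0.0, 1.0

def normal_pdf(x):
    # f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

print(normal_pdf(1.0))                 # ~0.2420, from the formula
print(NormalDist(mu, sigma).pdf(1.0))  # same value, from the library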
 
  • #12
Mark44 said:
The sum of the deviations from the mean is not a useful measure, since it always comes out to 0. The sum of the absolute values of the deviations from the mean is a better choice... "Squaring the deviations proves to be a much more attractive solution."


I would give you the same answer as to Chaz.


Also:-

"the absolute value is somewhat troublesome mathematically: it does not have a simple arithmetic formula, nor is it a differentiable function. Squaring the deviations proves to be a much more attractive solution."


I would love to see the arithmetic formula for "attractive".
Also as I said I don't really see the problem with differentation either.
And if I were to get picky in a similar manner, I would say the square root has two values, a positive and a negative one, surely one solution is more attractive than two??

Furthermore I could add, as I am not too happy with the standard deviation in the first place, it is rather unlikely I would be any happier differentiating it!
I would have derived an even unhappier answer!
 
  • #13
CRGreathouse said:
The mean and standard deviation/variance are jointly sufficient to define a normal distribution!

Really?
Well I will have to get back to you on that one, that's a whole new can of worms for me!
(You will probably wish you had not bothered!).
 
  • #14
phizo said:
Yes I would have said take the absolute values, I see no reason not to.
I don't see your point about differentiation, I'm afraid. A sine wave goes negative; no problems differentiating it, really.
It's not a matter of going negative. The sine function is continuous, as is the absolute value function. However, the sine function is differentiable at every point, but the absolute value function is not differentiable at 0, right in the middle of its domain. This is one of the reasons that the variance is more "attractive" as a statistic, according to the reference I cited. They didn't need to give an arithmetic formula to justify their preference; they gave reasons.
 
  • #15
CRGreathouse said:
The mean and standard deviation/variance are jointly sufficient to define a normal distribution!
phizo said:
Really?
Yes, really. If you want to argue your point and be taken seriously, you're going to first have to get up to speed on some basic statistics. In your original post you said that you didn't get the standard deviation. Three regulars on this forum (CRGreathouse, The Chaz, and myself) have attempted to enlighten you as to what the standard deviation is and what it is used for.

As already stated numerous times, the sum of the absolute values of the differences between the data values and the means also works as a measure of dispersion, but for mathematical reasons already given, this particular measure is not the preferred one.

Now, have we answered your question?
phizo said:
Well I will have to get back to you on that one, that's a whole new can of worms for me!
(You will probably wish you had not bothered!).
 
  • #16
Phizo,
You seem to have missed a significant fact. The definitions for mean and SD that have been discussed in this thread are the key parameters of a normal distribution (http://en.wikipedia.org/wiki/Normal_distribution). Please read the wiki article.

So your real question is not why take the square root, but why are we so fixated on the normal distribution. We had no choice in the matter; we did not choose the normal distribution, nature did. This is the distribution which describes measurements subject to many small random errors. Have you ever seen the demonstration where balls are dropped from a single point onto a uniformly spaced grid of nails? The pile they make approximates the normal distribution.

Now the fact that this distribution occurs naturally combined with the fact that there is a very simple mathematical description of it makes it irresistible.
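That balls-and-nails demonstration is a Galton board, and it is easy to mimic; a rough sketch in which each ball bounces left or right at each of 12 rows of nails:

Code:
import random
from collections import Counter

ROWS, BALLS = 12, 10_000

# A ball's final bin is its number of rightward bounces: Binomial(12, 1/2),
# whose histogram is close to the normal bell shape.
bins = Counter(sum(random.choice((0, 1)) for _ in range(ROWS)) for _ in range(BALLS))

for k in range(ROWS + 1):
    print(f"{k:2d} {'#' * (bins[k] // 50)}")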
 
  • #17
Phizo's question is perfectly reasonable - using the absolute differences makes sense, it just proves a little more awkward to use this approach in practice (too awkward it appears, prior to the development of our modern calculating conveniences).

And while the mean and S.D. are sufficient for defining a normal distribution, that does not mean they are necessary. Are there not other equally valid approaches that exist, that could also be used to define a normal distribution?

Noting that variances add when the associated random variables are independent, is not necessarily a valid justification either. For instance, variances also add when the variables are perfectly correlated as well. Further, establishing perfect independence of variables is not generally possible, in practice - it is usually just a "hail mary" assumption in a mathematical model, to ignore possible lurking variables.

Also, the distribution of errors in nature "would" follow a perfect normal distribution, if everything was "truly" random and behaving perfectly according to long-run patterns. These mathematical idealizations are useful in theory - but it is only a model (nature is far richer than any mathematical model).

And by the way: The mean does not tell us that "about half" of the data lie to one side of it, etc - you are thinking of the median. [And when the median is the measure of center being used, then the interquartile range is clearly a more appropriate measure of spread than the standard deviation.]

In other words: Don't be discouraged, Phizo, in your inquiry. The S.D. is not the "most natural" measure of spread that could ever have been developed in the history of statistics - it is just a useful convention (one of many that could have also been developed).
 
  • #18
G-U-E-S-T said:
And by the way: The mean does not tell us that "about half" of the data lie to one side of it, etc - you are thinking of the median.
Note that I qualified what I said, using "about." I know the difference between the mean and median, and also the mode, another measure of central tendency. I was just trying to distinguish between two types of statistical measures.
 
  • #19
Mark44 said:
...I know the difference between the mean and median, and also the mode, another measure of central tendency...
I was getting nervous for a minute there! :-p
 
  • #20
Integral said:
So your real question is not why take the square root, but why are we so fixated on the normal distribution... Now the fact that this distribution occurs naturally combined with the fact that there is a very simple mathematical description of it makes it irresistible.

I would add that before the days of high-speed computers, empirical distributions were often impossible to describe. However, it was known that a sampling distribution of averages is close to normal if the number of observations in each sample is large enough.

The process of averaging made empirical distributions tractable and reduced the problem of describing the distribution to estimating only 2 parameters, the mean and the variance.

Nowadays, computers can describe distributions easily and have allowed more accurate descriptions of empirical distributions. For instance, careful studies indicate that the distribution of securities returns is not normal but has "fat tails" i.e. too much probability of outlier returns. These studies require huge amounts of data.
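A sketch of the "sampling distribution of averages is close to normal" point above, starting from a decidedly non-normal (uniform) distribution:

Code:
import random
import statistics

# Average 40 uniform draws; repeat many times and study the averages themselves.
averages = [statistics.mean(random.random() for _ in range(40)) for _ in range(20_000)]

print(statistics.mean(averages))    # ~0.5
print(statistics.pstdev(averages))  # ~0.0456, i.e. sqrt(1/12) / sqrt(40)

The spread of the averages shrinks like 1/sqrt(n), which is why averaging made otherwise intractable distributions manageable.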
 
  • #21
I was always curious about this. Even my statistics-major ex-girlfriend couldn't give me a good explanation.

Finally, someone on Wikipedia created this page:

http://en.wikipedia.org/wiki/Standard_deviation#Geometric_interpretation

So if you have two data points, (m1, m2), their standard deviation is the minimal distance between the POINT (m1, m2) and the line {(t, t) | t in R}.

The geometrical explanation is very intuitive with data sets of one, two, or three data points, and to me, I can see why it generalizes to n. For continuous distributions, you are simply replacing a summation by integration, as is common when going from discrete *anything* to continuous.
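A numeric check of that geometric picture (a sketch; with the population convention, the distance from the point P to the diagonal works out to the SD multiplied by sqrt(n)):

Code:
import math

p = [1, 5, 9, 13, 17]
n = len(p)
m = sum(p) / n  # the nearest point on the diagonal is (m, m, ..., m)

dist = math.sqrt(sum((x - m) ** 2 for x in p))    # distance from P to the diagonal
sd = math.sqrt(sum((x - m) ** 2 for x in p) / n)  # population standard deviation

print(dist)               # ~12.65
print(sd * math.sqrt(n))  # ~12.65: the same number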
 
  • #22
Mark44 said:
It's not a matter of going negative. The sine function is continuous, as is the absolute value function. However, the sine function is differentiable at every point, but the absolute value function is not differentiable at 0, right in the middle of its domain. This is one of the reasons that the variance is more "attractive" as a statistic, according to the reference I cited. They didn't need to give an arithmetic formula to justify their preference; they gave reasons.

Sorry makes little sense to me.
The initial data will not be 'differentiable' either, as far as I can see, as it is usually just a number of points, i.e. not continuous, so I don't think your argument is valid.
It seems a very flawed argument to me.
 
  • #23
CRGreathouse said:
The mean and standard deviation/variance are jointly sufficient to define a normal distribution!


That also seems a bit of a circular argument.
You need to explain why this is so first I think.
 
  • #24
Integral said:
The definitions for mean and SD that have been discussed in this thread are the key parameters of a normal distribution (http://en.wikipedia.org/wiki/Normal_distribution). Please read the wiki article... Now the fact that this distribution occurs naturally combined with the fact that there is a very simple mathematical description of it makes it irresistible.


I have had a look at that article and it is pretty 'dense' and drags me all over the place via links, so it's hard work.

Anyway it's going a bit too much into the maths of it; it does not seem to get at the root of the problem.
I had hoped there would be a simple explanation. It is not looking like I will be getting an answer I am happy with here; I will probably have to work out an answer myself before I will be happy (as is sometimes the case).
As I am being pointed to wiki pages, it does not seem like anyone will be able to post an answer here that I am happy with.
 
  • #25
G-U-E-S-T said:
Phizo's question is perfectly reasonable - using the absolute differences makes sense, it just proves a little more awkward to use this approach in practice... The S.D. is not the "most natural" measure of spread that could ever have been developed in the history of statistics - it is just a useful convention (one of many that could have also been developed).


Thanks for that, that was helpful, I won't be discouraged but I fear I might be 'stopped' lol.
 
  • #26
Mark44 said:
Note that I qualified what I said, using "about." I know the difference between the mean and median, and also the mode, another measure of central tendency. I was just trying to distinguish between two types of statistical measures.

It's unfortunate that they all begin with the same letter; it is easy to get the names confused. Why not use 'average' for mean? The terms are not helpful if you are not familiar with them.
 
  • #27
phizo said:
I have had a look at that article and it is pretty 'dense'... I had hoped there would be a simple explanation; it is not looking like I will be getting an answer I am happy with here.
If I'm understanding your initial question, you want to know why standard deviation is the "right" statistic. This is of course a very subjective question, and of course it's not defined mathematically. Some have pointed out various nice mathematical properties it enjoys, but if that's not part of your criteria for being "right," then I'm afraid we have no chance of convincing you :)
 
  • #28
Tac-Tics said:
I was always curious about this... Finally, someone on Wikipedia created this page: http://en.wikipedia.org/wiki/Standard_deviation#Geometric_interpretation ... So if you have two data points, (m1, m2), their standard deviation is the minimal distance between the POINT (m1, m2) and the line {(t, t) | t in R}.


It's a pity there is no diagram with that, it would be much easier to follow.

To gain some geometric insights, we will start with a population of three values, x1, x2, x3. This defines a point P = (x1, x2, x3) in R3. Consider the line L = {(r, r, r) : r in R}. This is the "main diagonal" going through the origin. If our three given values were all equal, then the standard deviation would be zero and P would lie on L. So it is not unreasonable to assume that the standard deviation is related to the distance of P to L. And that is indeed the case. To move orthogonally from L to the point P, one begins at the point M = (m, m, m), whose coordinates are the mean of the values we started out with. A little algebra shows that the distance between P and M (which is the same as the orthogonal distance between P and the line L) is equal to the standard deviation of the vector (x1, x2, x3), multiplied by the square root of the number of dimensions of the vector.


1. How can 3 different values define a point?
2. Does R3 mean 3D, i.e. 3 dimensions?
3. I think I would need a diagram to understand it properly.
 
  • #29
zhentil said:
If I'm understanding your initial question, you want to know why standard deviation is the "right" statistic. This is of course a very subjective question, and of course it's not defined mathematically. Some have pointed out various nice mathematical properties it enjoys, but if that's not part of your criteria for being "right," then I'm afraid we have no chance of convincing you :)

Seems like that may well be the case.
It has always sounded to me like something plucked out of thin air, or thereabouts.
Perhaps a bit like trying to work out the standard size of a tin of beans.

Anyhow if I had never heard of it I doubt I would have dreamt it up myself, perhaps because I have never stumbled across any need to.
 
  • #30
The standard deviation is the root-mean-square distance to the mean. It is a natural measure of the typical error made when estimating the mean from a sample.

From this point of view it has universal meaning as a statistic and is not tied only to the Gaussian distribution.

The conceptual difference between variance and standard deviation is same as the difference between the concepts of distance and squared distance.

The idea of distance, to me, is more natural. Further, it has the same units as the underlying distribution's points and so can be compared to them.
 
