Why Is Standard Deviation Defined Using Squared Differences?

aaaa202
1) For the normal distribution it seems that the integral of the probability density function from ##\mu-\sigma## to ##\mu+\sigma## is independent of ##\sigma##. I guess that gives kind of a nice interpretation of ##\sigma##. But how do you prove this, when the antiderivative of an exponential of a square is not an elementary function, and the limits are not from minus infinity to infinity?
2) Secondly, it doesn't seem that ##\sigma## has this neat property for other distributions. So what on Earth makes the standard deviation a useful number for these distributions?
 
aaaa202 said:
1) For the normal distribution it seems that the integral of the probability density function from ##\mu-\sigma## to ##\mu+\sigma## is independent of ##\sigma##. I guess that gives kind of a nice interpretation of ##\sigma##. But how do you prove this, when the antiderivative of an exponential of a square is not an elementary function, and the limits are not from minus infinity to infinity?

You can make a change of variables to see that the integral is independent of ##\sigma##, namely it is ##\mathrm{erf}(1/\sqrt{2})##. You don't need to be able to express it in terms of more elementary functions to see the independence.
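To see it concretely (a standard substitution, spelled out here): putting ##z = (x-\mu)/\sigma## gives
$$\int_{\mu-\sigma}^{\mu+\sigma} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}\, dx = \int_{-1}^{1} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz = \mathrm{erf}\!\left(\frac{1}{\sqrt{2}}\right) \approx 0.6827,$$
and the last expression contains no ##\sigma## at all, so the probability is the same for every normal distribution.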

2) Secondly, it doesn't seem that ##\sigma## has this neat property for other distributions. So what on Earth makes the standard deviation a useful number for these distributions?

The standard deviation (or the variance) is still a measure of how far a random sample can be expected to deviate from the mean. Having the probability within ##\pm\sigma## of the mean come out independent of ##\sigma## is not a fundamental property (it has nothing to do with how the standard deviation is defined); it is a very special behavior of the normal distribution.
 
In addition to the property you noted, the normal distribution has many nice features that are not shared by other distributions: a linear combination of normal random variables is normal, uncorrelated normal random variables are independent, the characteristic function is of the same form as the pdf, etc. It is a very unusual distribution in many respects.
 
Well, if the standard deviation in general does not say that a certain percentage lies within ##\pm\sigma##, how does it measure the deviation from the mean?
I mean, suppose you have a small standard deviation but only 15% lies within ##\pm\sigma##, and a large ##\sigma## with 95% within it; then a small standard deviation is not necessarily an indicator of a small deviation from the mean.
 
When dealing with distributions, the standard deviation is defined as the square root of the variance. If the random variable is ##X##, then the variance is the expectation value of the square of the difference between the variable and the mean:

$$\begin{split} &\mathrm{Var}(X) = E[(X-\mu)^2],\\
&\mu = E[X].\end{split}$$

So, if we measure the variable ##X##, the variance (and hence the standard deviation) is a measure of how far the value will tend to be from the mean. The percentage of measurements within a certain distance (##\pm\sigma## or any other) of the mean is a specific property of the distribution. It is something that needs to be computed for whatever distribution we're using. In general the percentage will depend on the same parameters that the variance does, so it depends, at least implicitly, on the variance.
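As a rough illustration (my own sketch, not from the thread; the distributions and sample size are arbitrary choices), one can estimate the ##\pm\sigma## probability for a few distributions numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000  # sample size, chosen arbitrarily

samples = {
    "normal":      rng.normal(0.0, 1.0, size=n),
    "exponential": rng.exponential(1.0, size=n),
    "uniform":     rng.uniform(0.0, 1.0, size=n),
}

for name, x in samples.items():
    mu, sigma = x.mean(), x.std()
    within = np.mean(np.abs(x - mu) <= sigma)  # fraction within one sigma of the mean
    print(f"{name:12s} fraction within ±σ ≈ {within:.3f}")
```

The normal gives roughly 0.68, while the exponential and uniform give clearly different fractions, even though ##\sigma## is a perfectly good measure of spread in every case.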

Again, the normal distribution has very special properties. The statistical concepts themselves are generally defined in such a way that they can be applied to any distribution.
 
aaaa202 said:
Well, if the standard deviation in general does not say that a certain percentage lies within ##\pm\sigma##, how does it measure the deviation from the mean?

It doesn't in that sense.

I mean, suppose you have a small standard deviation but only 15% lies within ##\pm\sigma##, and a large ##\sigma## with 95% within it; then a small standard deviation is not necessarily an indicator of a small deviation from the mean.

Extreme behavior is at least limited by Chebyshev's inequality: http://en.wikipedia.org/wiki/Chebyshev%27s_inequality
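For reference, Chebyshev's inequality states that for any distribution with finite mean ##\mu## and standard deviation ##\sigma##, and any ##k > 0##,
$$P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2},$$
so, for example, no distribution can put more than 25% of its probability farther than ##2\sigma## from the mean.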
 
And while on the topic: why use the square of each point's deviation from the mean to calculate the standard deviation? Why not just the absolute value? It just seems that the standard deviation could have been defined in a lot of other ways that would make just as much sense.
 
aaaa202 said:
And while on the topic: why use the square of each point's deviation from the mean to calculate the standard deviation? Why not just the absolute value? It just seems that the standard deviation could have been defined in a lot of other ways that would make just as much sense.
Certainly there are many ways to measure the deviation of a random variable from its mean. The standard deviation is ##\sqrt{E[(x-\mu)^2]}##. Your suggestion is ##E[|x-\mu|]##. These are both special cases of a more general form: for any real number ##p \geq 1##, we may define what mathematicians call the ##p##-norm:
$$\|x-\mu\|_p = \left(E[|x - \mu|^p]\right)^{1/p}$$
The standard deviation is the ##p=2## case, and your suggestion is the ##p=1## case. All of these are measures of the deviation of ##x## from its mean. Higher values of ##p## assign greater weight (or penalty) when ##|x - \mu| > 1##. The reverse is true when ##|x - \mu| < 1##.
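As a quick numerical sketch (my own addition; the sample distribution is an arbitrary choice), the general ##p##-deviation can be estimated directly from data:

```python
import numpy as np

def p_deviation(x, p):
    """Estimate (E[|x - mu|^p])^(1/p) from a sample."""
    mu = x.mean()
    return np.mean(np.abs(x - mu) ** p) ** (1.0 / p)

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=100_000)  # arbitrary skewed example distribution

for p in (1, 2, 4):
    print(f"p = {p}: deviation ≈ {p_deviation(x, p):.3f}")
# p = 2 reproduces the usual standard deviation, i.e. x.std()
```

The ##p=2## value matches `x.std()`; higher ##p## weights large deviations more heavily, so the numbers grow for a skewed sample like this one.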

The ##p=2## case has some nice mathematical properties.

First, we can easily expand the variance:
$$E[|x-\mu|^2] = E[(x-\mu)^2] = E[x^2] - 2\mu E[x] + \mu^2 = E[x^2] - \mu^2$$
(assuming ##x## is real-valued and using ##E[x] = \mu##). This makes it very easy to work analytically with the ##p=2## case.

Second, it plays nicely with the notion of covariance. If we define the covariance of two random variables ##x## and ##y## to be ##\text{cov}(x,y) = E[(x-\mu_x)(y - \mu_y)]##, we get a measure of how correlated the two variables are. Then, we immediately have ##\text{var}(x) = \text{cov}(x,x)##. We also have the Cauchy-Schwarz inequality: ##|\text{cov}(x,y)| \leq \sigma_x \sigma_y##, where ##\sigma_x## and ##\sigma_y## are the standard deviations of ##x## and ##y##.

Another nice feature is that if ##x## and ##y## are independent, then you can simply add the variances: ##\text{var}(x+y) = \text{var}(x) + \text{var}(y)##. This very convenient property is not true for ##p \neq 2##.
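A minimal numerical check of this additivity (my own sketch; the two independent distributions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

x = rng.normal(0.0, 3.0, size=n)        # var(x) ≈ 9
y = rng.exponential(scale=2.0, size=n)  # var(y) ≈ 4, independent of x

def mad(z):
    """Mean absolute deviation: the p = 1 measure of spread."""
    return np.mean(np.abs(z - z.mean()))

print(np.var(x) + np.var(y), np.var(x + y))  # both ≈ 13: variances add
print(mad(x) + mad(y), mad(x + y))           # these differ: the p = 1 measure does not add
```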
 
"and while on the topic. Why use the square of each points deviation from the mean to calculate the standard deviation? Why not just the numerical value. It just seems that standard deviation couldve been defined in a lot of other ways that would make just as much sense."

Much had to do with ease of calculation: working with squared differences was easier (many years ago) than working with other powers, and it was certainly valuable to have nice, brief little formulas to use for their calculation.
Some was due to the assumption of Gaussian distributions for the data: if you assume your data is Gaussian (normal), then both the mean and variance (and hence the standard deviation) are important. They are easily interpreted and, once you assume a normal distribution, together they uniquely identify the distribution: if you know them you are the supreme emperor of your problem.
There were, as has been pointed out, other approaches.
 