Why Is Standard Deviation Defined Using Squared Differences?

Discussion Overview

The discussion centers around the definition of standard deviation using squared differences, particularly in the context of its application to various probability distributions, including the normal distribution. Participants explore the implications of this definition, its mathematical properties, and its usefulness across different distributions.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • Some participants note that for the normal distribution, the integral of the probability density function from \(\mu - \sigma\) to \(\mu + \sigma\) appears to be independent of \(\sigma\), raising questions about how to prove this given the complexity of the antiderivative.
  • Others argue that while the standard deviation is a measure of how far a random sample can deviate from the mean, this property does not hold universally across all distributions, suggesting that the usefulness of standard deviation may vary.
  • One participant highlights that the normal distribution possesses unique features, such as the independence of uncorrelated normal random variables and the linear combination of normal variables resulting in a normal variable.
  • Concerns are raised about the interpretation of standard deviation, questioning how it can measure deviation from the mean if the percentage of values within \(\pm \sigma\) can differ significantly across distributions.
  • Some participants discuss the mathematical definition of variance as the expectation of the squared differences from the mean, emphasizing that this definition is not inherently tied to the properties of the normal distribution.
  • There is a suggestion that alternative methods for measuring deviation, such as using absolute values, could also be valid, and the discussion touches on the broader concept of \(p\)-norms in measuring deviations.
  • Participants mention that the choice of using squared differences may have historical roots in ease of calculation and assumptions about data distributions.

Areas of Agreement / Disagreement

Participants express a range of views on the definition and utility of standard deviation, with no clear consensus on its applicability across different distributions or the best method for measuring deviation from the mean.

Contextual Notes

Some limitations are noted regarding the assumptions underlying the use of standard deviation, particularly in relation to the properties of specific distributions and the implications of using squared differences versus other methods of measurement.

aaaa202
1) For the normal distribution it seems that the integral of the probability density function from ##\mu-\sigma## to ##\mu+\sigma## is independent of ##\sigma##. I guess that gives kind of a nice interpretation of ##\sigma##. But how do you prove this, when the antiderivative of an exponential of a square has no elementary closed form, and the limits are not from minus infinity to infinity?
2) Secondly, it doesn't seem that for other distributions their ##\sigma## has this neat property. So what on Earth makes the standard deviation a useful number for these distributions?
 
aaaa202 said:
1) For the normal distribution it seems that the integral of the probability density function from ##\mu-\sigma## to ##\mu+\sigma## is independent of ##\sigma##. I guess that gives kind of a nice interpretation of ##\sigma##. But how do you prove this, when the antiderivative of an exponential of a square has no elementary closed form, and the limits are not from minus infinity to infinity?

You can make a change of variables to see that the integral is independent of ##\sigma##, namely it is ##\mathrm{erf}(1/\sqrt{2})##. You don't need to be able to express it in terms of more elementary functions to see the independence.
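As an illustrative check (a sketch added here, not part of the original exchange), one can integrate the pdf numerically and confirm that the result matches ##\mathrm{erf}(1/\sqrt{2}) \approx 0.6827## for several arbitrary choices of ##\mu## and ##\sigma##:

```python
import math

def normal_pdf(x, mu, sigma):
    """Probability density of N(mu, sigma^2)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def prob_within_one_sigma(mu, sigma, n=100_000):
    """Midpoint-rule integral of the pdf from mu - sigma to mu + sigma."""
    a, b = mu - sigma, mu + sigma
    h = (b - a) / n
    return sum(normal_pdf(a + (i + 0.5) * h, mu, sigma) for i in range(n)) * h

# The integral equals erf(1/sqrt(2)), independent of mu and sigma.
target = math.erf(1 / math.sqrt(2))
for mu, sigma in [(0, 1), (5, 0.1), (-3, 42)]:
    assert abs(prob_within_one_sigma(mu, sigma) - target) < 1e-6
```

The change of variables ##z = (x-\mu)/\sigma## is exactly what the midpoint sum realizes numerically: the ##\sigma## in the limits cancels the ##\sigma## in the density.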

2) Secondly, it doesn't seem that for other distributions their ##\sigma## has this neat property. So what on Earth makes the standard deviation a useful number for these distributions?

The standard deviation (or the variance) is still a measurement of how far a random sample can be expected to deviate from the mean. Having certain cumulative distributions be independent of the standard deviation is not a fundamental property (it has nothing to do with how the standard deviation is defined), but rather is a very special behavior of the normal distribution.
 
In addition to the property you noted, the normal distribution has many nice features that are not shared by other distributions: a linear combination of normal random variables is normal, uncorrelated normal random variables are independent, the characteristic function is of the same form as the pdf, etc. It is a very unusual distribution in many respects.
 
Well, if the standard deviation in general does not say that a certain percentage lies within ##\pm\sigma##, how does it measure the deviation from the mean?
I mean, suppose you have a small standard deviation but only 15% lies within ##\pm\sigma##, while another distribution has a large ##\sigma## with 95% within it. Then a small standard deviation is not necessarily an indicator of a small deviation from the mean.
 
When dealing with distributions, the standard deviation is defined as the square root of the variance. If the random variable is ##X##, then the variance is the expectation value of the square of the difference between the variable and the mean:

$$\begin{split} &\mathrm{Var}(X) = E[(X-\mu)^2],\\
&\mu = E[X].\end{split}$$

So, if we measure the variable ##X##, the variance (and hence the standard deviation) is a measure of how far the value will tend to be from the mean. The percentage of measurements within a certain distance (##\pm\sigma## or any other) of the mean is a specific property of the distribution. It is something that needs to be computed for whatever distribution we're using. In general the percentage will depend on the same parameters that the variance does, so it depends, at least implicitly, on the variance.

Again, the normal distribution has very special properties. The statistical concepts themselves are generally defined in such a way that they can be applied to any distribution.
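To make this concrete (an added sketch, not from the thread): the within-one-sigma fraction is distribution-specific. For an exponential distribution with rate 1, the exact value is ##1 - e^{-2} \approx 0.865##, not the familiar normal 0.683. A Monte Carlo estimate confirms it:

```python
import math
import random

random.seed(0)

# Sample an exponential distribution with rate 1 (mean 1, standard deviation 1)
# and estimate mean, standard deviation, and the within-one-sigma fraction.
samples = [random.expovariate(1.0) for _ in range(200_000)]
mu = sum(samples) / len(samples)
var = sum((x - mu) ** 2 for x in samples) / len(samples)
sigma = math.sqrt(var)

within = sum(1 for x in samples if abs(x - mu) <= sigma) / len(samples)
# For Exp(1), mu = sigma = 1 and P(|X - mu| <= sigma) = P(0 <= X <= 2) = 1 - e^(-2).
```

The definition of ##\sigma## is the same in both cases; only the probability attached to the ##\pm\sigma## interval changes with the distribution.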
 
aaaa202 said:
Well, if the standard deviation in general does not say that a certain percentage lies within ##\pm\sigma##, how does it measure the deviation from the mean?

It doesn't in that sense.

I mean, suppose you have a small standard deviation but only 15% lies within ##\pm\sigma##, while another distribution has a large ##\sigma## with 95% within it. Then a small standard deviation is not necessarily an indicator of a small deviation from the mean.

Extreme behavior is at least limited by Chebyshev's inequality: http://en.wikipedia.org/wiki/Chebyshev%27s_inequality
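Chebyshev's inequality says ##P(|X - \mu| \geq k\sigma) \leq 1/k^2## for any distribution with finite variance. A quick empirical check (an added sketch; the mixture distribution below is an arbitrary choice, not from the thread):

```python
import math
import random

random.seed(1)

# A heavy-tailed mixture: mostly N(0, 1), occasionally N(0, 25).
samples = [random.gauss(0, 1) if random.random() < 0.9 else random.gauss(0, 5)
           for _ in range(100_000)]
mu = sum(samples) / len(samples)
sigma = math.sqrt(sum((x - mu) ** 2 for x in samples) / len(samples))

# Chebyshev: the fraction beyond k standard deviations never exceeds 1/k^2.
for k in (1.5, 2, 3):
    tail = sum(1 for x in samples if abs(x - mu) >= k * sigma) / len(samples)
    assert tail <= 1 / k**2
```

The bound is loose for most distributions, but it guarantees that a small ##\sigma## cannot coexist with a large probability mass far from the mean.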
 
And while on the topic: why use the square of each point's deviation from the mean to calculate the standard deviation? Why not just the absolute value? It just seems that standard deviation could have been defined in a lot of other ways that would make just as much sense.
 
aaaa202 said:
And while on the topic: why use the square of each point's deviation from the mean to calculate the standard deviation? Why not just the absolute value? It just seems that standard deviation could have been defined in a lot of other ways that would make just as much sense.
Certainly there are many ways to measure the deviation of a random variable from its mean. The standard deviation is ##\sqrt{E[(x-\mu)^2]}##. Your suggestion is ##E[|x-\mu|]##. These are both special cases of a more general form: for any real number ##p \geq 1##, we may define what mathematicians call the ##p##-norm:
$$\|x-\mu\|_p = \left(E[|x - \mu|^p]\right)^{1/p}$$
The standard deviation is the ##p=2## case, and your suggestion is the ##p=1## case. All of these are measures of the deviation of ##x## from its mean. Higher values of ##p## assign greater weight (or penalty) when ##|x - \mu| > 1##. The reverse is true when ##|x - \mu| < 1##.
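A sample version of the ##p##-norm makes the weighting effect visible (an added sketch; the data values are arbitrary):

```python
def p_deviation(samples, p):
    """Sample estimate of (E[|x - mu|^p])^(1/p)."""
    mu = sum(samples) / len(samples)
    return (sum(abs(x - mu) ** p for x in samples) / len(samples)) ** (1 / p)

data = [1.0, 2.0, 2.0, 3.0, 10.0]   # one outlier at 10.0
mad = p_deviation(data, 1)  # mean absolute deviation (p = 1)
sd = p_deviation(data, 2)   # standard deviation (p = 2)

# Squaring weights the outlier more heavily, so sd exceeds mad.
assert sd > mad
```

In general the ##p##-norm is nondecreasing in ##p##, so higher ##p## is never *less* sensitive to large deviations.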

The ##p=2## case has some nice mathematical properties.

First, we can easily expand the variance into three terms:
$$E[|x-\mu|^2] = E[(x-\mu)^2] = E[x^2] - 2\mu E[x] + \mu^2$$
(assuming ##x## is real-valued). This makes it very easy to work analytically with the ##p=2## case.
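A quick numerical check of this expansion (added illustration; the data values are arbitrary):

```python
data = [0.5, 1.0, 2.5, 4.0, 4.5]
n = len(data)
mu = sum(data) / n

# Direct definition: E[(x - mu)^2]
var_direct = sum((x - mu) ** 2 for x in data) / n
# Expanded form: E[x^2] - 2*mu*E[x] + mu^2, which collapses to E[x^2] - mu^2
var_expanded = sum(x * x for x in data) / n - mu * mu

assert abs(var_direct - var_expanded) < 1e-12
```

No comparable algebraic shortcut exists for ##E[|x-\mu|]##, since the absolute value does not expand into moments of ##x##.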

Second, it plays nicely with the notion of covariance. If we define the covariance of two random variables ##x## and ##y## to be ##\text{cov}(x,y) = E[(x-\mu_x)(y - \mu_y)]##, we get a measure of how correlated the two variables are. Then, we immediately have ##\text{var}(x) = \text{cov}(x,x)##. We also have the Cauchy-Schwarz inequality: ##|\text{cov}(x,y)| \leq \sigma_x \sigma_y##, where ##\sigma_x## and ##\sigma_y## are the standard deviations of ##x## and ##y##.

Another nice feature is that if ##x## and ##y## are independent, then you can simply add the variances: ##\text{var}(x+y) = \text{var}(x) + \text{var}(y)##. This very convenient property is not true for ##p \neq 2##.
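The additivity property is easy to verify by simulation (an added sketch; the two distributions are arbitrary choices):

```python
import random

random.seed(2)

def var(xs):
    """Population variance of a sample."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

n = 200_000
# Two independent random variables: Uniform(0, 1) has variance 1/12,
# and Exp(rate=2) has variance 1/4.
x = [random.uniform(0, 1) for _ in range(n)]
y = [random.expovariate(2.0) for _ in range(n)]
s = [a + b for a, b in zip(x, y)]

# For independent x and y: var(x + y) = var(x) + var(y).
assert abs(var(s) - (var(x) + var(y))) < 0.01
```

The same experiment with the ##p=1## deviation would show no such additivity, which is a large part of why the squared version became standard.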
 
"And while on the topic: why use the square of each point's deviation from the mean to calculate the standard deviation? Why not just the absolute value? It just seems that standard deviation could have been defined in a lot of other ways that would make just as much sense."

Much had to do with ease of calculation: working with squared differences was easier (many years ago) than working with other powers, and it was certainly valuable to have nice, brief little formulas to use for their calculation.
Some was due to the assumption of Gaussian distributions for the data: if you assume your data is Gaussian (normal), then both the mean and variance (and so the standard deviation as well) are important. They are easily interpreted and, once you assume a normal distribution, together they uniquely identify the distribution: if you know them, you are the supreme emperor of your problem.
There were, as has been pointed out, other approaches.
 