Why Is Standard Deviation Defined Using Squared Differences?

Discussion Overview

The discussion centers around the definition of standard deviation using squared differences, particularly in the context of its application to various probability distributions, including the normal distribution. Participants explore the implications of this definition, its mathematical properties, and its usefulness across different distributions.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • Some participants note that for the normal distribution, the integral of the probability density function from \(\mu - \sigma\) to \(\mu + \sigma\) appears to be independent of \(\sigma\), raising questions about how to prove this given the complexity of the antiderivative.
  • Others argue that while the standard deviation is a measure of how far a random sample can deviate from the mean, this property does not hold universally across all distributions, suggesting that the usefulness of standard deviation may vary.
  • One participant highlights that the normal distribution possesses unique features, such as the independence of uncorrelated normal random variables and the linear combination of normal variables resulting in a normal variable.
  • Concerns are raised about the interpretation of standard deviation, questioning how it can measure deviation from the mean if the percentage of values within \(\pm \sigma\) can differ significantly across distributions.
  • Some participants discuss the mathematical definition of variance as the expectation of the squared differences from the mean, emphasizing that this definition is not inherently tied to the properties of the normal distribution.
  • There is a suggestion that alternative methods for measuring deviation, such as using absolute values, could also be valid, and the discussion touches on the broader concept of \(p\)-norms in measuring deviations.
  • Participants mention that the choice of using squared differences may have historical roots in ease of calculation and assumptions about data distributions.

Areas of Agreement / Disagreement

Participants express a range of views on the definition and utility of standard deviation, with no clear consensus on its applicability across different distributions or the best method for measuring deviation from the mean.

Contextual Notes

Some limitations are noted regarding the assumptions underlying the use of standard deviation, particularly in relation to the properties of specific distributions and the implications of using squared differences versus other methods of measurement.

aaaa202
1) For the normal distribution it seems that the integral of the probability density function from ##\mu-\sigma## to ##\mu+\sigma## is independent of ##\sigma##. I guess that gives kind of a nice interpretation of ##\sigma##. But how do you prove this, when the antiderivative of an exponential of a square has no elementary closed form, and the limits are not from minus infinity to infinity?
2) Secondly, it doesn't seem that for other distributions their ##\sigma## has this neat property. So what on Earth makes the standard deviation a useful number for these distributions?
 
aaaa202 said:
1) For the normal distribution it seems that the integral of the probability density function from ##\mu-\sigma## to ##\mu+\sigma## is independent of ##\sigma##. I guess that gives kind of a nice interpretation of ##\sigma##. But how do you prove this, when the antiderivative of an exponential of a square has no elementary closed form, and the limits are not from minus infinity to infinity?

You can make a change of variables to see that the integral is independent of ##\sigma##, namely it is ##\mathrm{erf}(1/\sqrt{2})##. You don't need to be able to express it in terms of more elementary functions to see the independence.
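As an illustrative check (a sketch added here, not part of the original exchange), one can integrate the pdf numerically and confirm that the result matches ##\mathrm{erf}(1/\sqrt{2}) \approx 0.6827## for several arbitrary choices of ##\mu## and ##\sigma##:

```python
import math

def normal_pdf(x, mu, sigma):
    """Probability density of N(mu, sigma^2)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def prob_within_one_sigma(mu, sigma, n=100_000):
    """Midpoint-rule integral of the pdf from mu - sigma to mu + sigma."""
    a, b = mu - sigma, mu + sigma
    h = (b - a) / n
    return sum(normal_pdf(a + (i + 0.5) * h, mu, sigma) for i in range(n)) * h

# The integral equals erf(1/sqrt(2)), independent of mu and sigma.
target = math.erf(1 / math.sqrt(2))
for mu, sigma in [(0, 1), (5, 0.1), (-3, 42)]:
    assert abs(prob_within_one_sigma(mu, sigma) - target) < 1e-6
```

The change of variables ##z = (x-\mu)/\sigma## is exactly what the midpoint sum realizes numerically: the ##\sigma## in the limits cancels the ##\sigma## in the density.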

2) Secondly, it doesn't seem that for other distributions their ##\sigma## has this neat property. So what on Earth makes the standard deviation a useful number for these distributions?

The standard deviation (or the variance) is still a measurement of how far a random sample can be expected to deviate from the mean. Having certain cumulative distributions be independent of the standard deviation is not a fundamental property (it has nothing to do with how the standard deviation is defined), but rather is a very special behavior of the normal distribution.
 
In addition to the property you noted, the normal distribution has many nice features that are not shared by other distributions: a linear combination of normal random variables is normal, uncorrelated normal random variables are independent, the characteristic function is of the same form as the pdf, etc. It is a very unusual distribution in many respects.
 
Well, if the standard deviation in general does not say that a certain percentage lies within ##\pm\sigma##, how does it measure the deviation from the mean?
I mean, suppose you have a small standard deviation but only 15% lies within ##\pm\sigma##, while another distribution has a large ##\sigma## with 95% within it. Then a small standard deviation is not necessarily an indicator of a small deviation from the mean.
 
When dealing with distributions, the standard deviation is defined as the square root of the variance. If the random variable is ##X##, then the variance is the expectation value of the square of the difference between the variable and the mean:

$$\begin{split} &\mathrm{Var}(X) = E[(X-\mu)^2],\\
&\mu = E[X].\end{split}$$

So, if we measure the variable ##X##, the variance (and hence the standard deviation) is a measure of how far the value will tend to be from the mean. The percentage of measurements within a certain distance (##\pm\sigma## or any other) of the mean is a specific property of the distribution. It is something that needs to be computed for whatever distribution we're using. In general the percentage will depend on the same parameters that the variance does, so it depends, at least implicitly, on the variance.

Again, the normal distribution has very special properties. The statistical concepts themselves are generally defined in such a way that they can be applied to any distribution.
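To make this concrete (an added sketch, not from the thread): the within-one-sigma fraction is distribution-specific. For an exponential distribution with rate 1, the exact value is ##1 - e^{-2} \approx 0.865##, not the familiar normal 0.683. A Monte Carlo estimate confirms it:

```python
import math
import random

random.seed(0)

# Sample an exponential distribution with rate 1 (mean 1, standard deviation 1)
# and estimate mean, standard deviation, and the within-one-sigma fraction.
samples = [random.expovariate(1.0) for _ in range(200_000)]
mu = sum(samples) / len(samples)
var = sum((x - mu) ** 2 for x in samples) / len(samples)
sigma = math.sqrt(var)

within = sum(1 for x in samples if abs(x - mu) <= sigma) / len(samples)
# For Exp(1), mu = sigma = 1 and P(|X - mu| <= sigma) = P(0 <= X <= 2) = 1 - e^(-2).
```

The definition of ##\sigma## is the same in both cases; only the probability attached to the ##\pm\sigma## interval changes with the distribution.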
 
aaaa202 said:
Well, if the standard deviation in general does not say that a certain percentage lies within ##\pm\sigma##, how does it measure the deviation from the mean?

It doesn't in that sense.

I mean, suppose you have a small standard deviation but only 15% lies within ##\pm\sigma##, while another distribution has a large ##\sigma## with 95% within it. Then a small standard deviation is not necessarily an indicator of a small deviation from the mean.

Extreme behavior is at least limited by Chebyshev's inequality: http://en.wikipedia.org/wiki/Chebyshev%27s_inequality
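Chebyshev's inequality says ##P(|X - \mu| \geq k\sigma) \leq 1/k^2## for any distribution with finite variance. A quick empirical check (an added sketch; the mixture distribution below is an arbitrary choice, not from the thread):

```python
import math
import random

random.seed(1)

# A heavy-tailed mixture: mostly N(0, 1), occasionally N(0, 25).
samples = [random.gauss(0, 1) if random.random() < 0.9 else random.gauss(0, 5)
           for _ in range(100_000)]
mu = sum(samples) / len(samples)
sigma = math.sqrt(sum((x - mu) ** 2 for x in samples) / len(samples))

# Chebyshev: the fraction beyond k standard deviations never exceeds 1/k^2.
for k in (1.5, 2, 3):
    tail = sum(1 for x in samples if abs(x - mu) >= k * sigma) / len(samples)
    assert tail <= 1 / k**2
```

The bound is loose for most distributions, but it guarantees that a small ##\sigma## cannot coexist with a large probability mass far from the mean.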
 
And while on the topic: why use the square of each point's deviation from the mean to calculate the standard deviation? Why not just the absolute value? It just seems that standard deviation could have been defined in a lot of other ways that would make just as much sense.
 
aaaa202 said:
And while on the topic: why use the square of each point's deviation from the mean to calculate the standard deviation? Why not just the absolute value? It just seems that standard deviation could have been defined in a lot of other ways that would make just as much sense.
Certainly there are many ways to measure the deviation of a random variable from its mean. The standard deviation is ##\sqrt{E[(x-\mu)^2]}##. Your suggestion is ##E[|x-\mu|]##. These are both special cases of a more general form: for any real number ##p \geq 1##, we may define what mathematicians call the ##p##-norm:
$$\|x-\mu\|_p = \left(E[|x - \mu|^p]\right)^{1/p}$$
The standard deviation is the ##p=2## case, and your suggestion is the ##p=1## case. All of these are measures of the deviation of ##x## from its mean. Higher values of ##p## assign greater weight (or penalty) when ##|x - \mu| > 1##. The reverse is true when ##|x - \mu| < 1##.
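A sample version of the ##p##-norm makes the weighting effect visible (an added sketch; the data values are arbitrary):

```python
def p_deviation(samples, p):
    """Sample estimate of (E[|x - mu|^p])^(1/p)."""
    mu = sum(samples) / len(samples)
    return (sum(abs(x - mu) ** p for x in samples) / len(samples)) ** (1 / p)

data = [1.0, 2.0, 2.0, 3.0, 10.0]   # one outlier at 10.0
mad = p_deviation(data, 1)  # mean absolute deviation (p = 1)
sd = p_deviation(data, 2)   # standard deviation (p = 2)

# Squaring weights the outlier more heavily, so sd exceeds mad.
assert sd > mad
```

In general the ##p##-norm is nondecreasing in ##p##, so higher ##p## is never *less* sensitive to large deviations.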

The ##p=2## case has some nice mathematical properties.

First, we can easily expand the variance into three terms:
$$E[|x-\mu|^2] = E[(x-\mu)^2] = E[x^2] - 2\mu E[x] + \mu^2$$
(assuming ##x## is real-valued). This makes it very easy to work analytically with the ##p=2## case.
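A quick numerical check of this expansion (added illustration; the data values are arbitrary):

```python
data = [0.5, 1.0, 2.5, 4.0, 4.5]
n = len(data)
mu = sum(data) / n

# Direct definition: E[(x - mu)^2]
var_direct = sum((x - mu) ** 2 for x in data) / n
# Expanded form: E[x^2] - 2*mu*E[x] + mu^2, which collapses to E[x^2] - mu^2
var_expanded = sum(x * x for x in data) / n - mu * mu

assert abs(var_direct - var_expanded) < 1e-12
```

No comparable algebraic shortcut exists for ##E[|x-\mu|]##, since the absolute value does not expand into moments of ##x##.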

Second, it plays nicely with the notion of covariance. If we define the covariance of two random variables ##x## and ##y## to be ##\text{cov}(x,y) = E[(x-\mu_x)(y - \mu_y)]##, we get a measure of how correlated the two variables are. Then, we immediately have ##\text{var}(x) = \text{cov}(x,x)##. We also have the Cauchy-Schwarz inequality: ##|\text{cov}(x,y)| \leq \sigma_x \sigma_y##, where ##\sigma_x## and ##\sigma_y## are the standard deviations of ##x## and ##y##.

Another nice feature is that if ##x## and ##y## are independent, then you can simply add the variances: ##\text{var}(x+y) = \text{var}(x) + \text{var}(y)##. This very convenient property is not true for ##p \neq 2##.
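The additivity property is easy to verify by simulation (an added sketch; the two distributions are arbitrary choices):

```python
import random

random.seed(2)

def var(xs):
    """Population variance of a sample."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

n = 200_000
# Two independent random variables: Uniform(0, 1) has variance 1/12,
# and Exp(rate=2) has variance 1/4.
x = [random.uniform(0, 1) for _ in range(n)]
y = [random.expovariate(2.0) for _ in range(n)]
s = [a + b for a, b in zip(x, y)]

# For independent x and y: var(x + y) = var(x) + var(y).
assert abs(var(s) - (var(x) + var(y))) < 0.01
```

The same experiment with the ##p=1## deviation would show no such additivity, which is a large part of why the squared version became standard.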
 
"And while on the topic: why use the square of each point's deviation from the mean to calculate the standard deviation? Why not just the absolute value? It just seems that standard deviation could have been defined in a lot of other ways that would make just as much sense."

Much had to do with ease of calculation: working with squared differences was easier (many years ago) than working with other powers, and it was certainly valuable to have nice, brief little formulas to use for their calculation.
Some was due to the assumption of Gaussian distributions for the data: if you assume your data is Gaussian (normal), then both the mean and variance (and so the standard deviation as well) are important. They are easily interpreted and, once you assume a normal distribution, together they uniquely identify the distribution: if you know them, you are the supreme emperor of your problem.
There were, as has been pointed out, other approaches.
 