Variance & Standard Deviation

SUMMARY

Variance and standard deviation are both measures of data dispersion, defined mathematically as Var(X) = σ² = (1/(n − 1)) Σᵢ₌₁ⁿ (xᵢ − x̄)² and standard deviation σ = √Var(X). A higher value of either indicates greater variability in the data. When calculated from a sample, the standard deviation is a biased estimator of the population standard deviation, although dividing by n − 1 instead of n does give an unbiased estimator of the variance. Because the square root function is non-linear, differences in variance do not translate linearly into differences in standard deviation.

PREREQUISITES
  • Understanding of basic statistical concepts such as mean, variance, and standard deviation.
  • Familiarity with mathematical notation and operations, including summation and square roots.
  • Knowledge of sample vs. population statistics and the implications of bias in estimators.
  • Basic comprehension of probability distributions and their properties.
NEXT STEPS
  • Study the properties of Chebyshev's inequality and its implications for variance and standard deviation.
  • Learn about the differences between biased and unbiased estimators in statistics.
  • Explore the concept of Z-scores and their application in standardizing data.
  • Investigate the implications of non-linear transformations in statistical analysis.
USEFUL FOR

Statisticians, data analysts, students studying statistics, and anyone involved in data analysis or interpretation who seeks to understand the nuances of variance and standard deviation.

Agent Smith
TL;DR
What are the uses of variance and standard deviation and how do they differ?
Going through my notes ... and I see the following:

1. Variance = ##\displaystyle \text{Var(X)} = \sigma^2 = \frac{1}{n - 1} \sum_{i = 1}^n \left(x_i - \overline x \right)^2##
2. Standard Deviation = ##\sigma = \sqrt{\text{Var(X)}} = \sqrt{\sigma^2}##

Both variance and standard deviation are measures of dispersion (colloquially, the spread in the data). The higher their values, the more spread out the data is.
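
For instance, a minimal Python sketch of these two formulas (the data set here is just a made-up illustration):

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up illustrative numbers
n = len(data)
mean = sum(data) / n

variance = sum((x - mean) ** 2 for x in data) / (n - 1)   # the n - 1 divisor from formula 1
std_dev = math.sqrt(variance)                             # formula 2

print(mean, variance, std_dev)        # 5.0  4.571...  2.138...
print(statistics.variance(data),      # the stdlib sample versions use the same n - 1 divisor
      statistics.stdev(data))
```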

Statement B: The square root function is not linear and so standard deviation is biased when compared to variance.

Questions:
1. Do high variance and standard deviation mean greater variability in the data?
2. What does statement B mean?
 
The answer to 1 is "Yes".
The answer to 2 is "nothing". It is a meaningless statement because the concept of 'bias' only applies to estimators. A correct statement would be "The standard deviation of a sample from a population is a downwards-biased estimator of the standard deviation of the population from which the sample is taken". For more detail see https://en.wikipedia.org/wiki/Standard_deviation#Uncorrected_sample_standard_deviation.
 
andrewkirk said:
A correct statement would be "The standard deviation of a sample from a population is a downwards-biased estimator of the standard deviation of the population from which the sample is taken".
That is true if the sum is divided by n, but the OP has divided by n-1, giving an unbiased estimator.
 
@andrewkirk , by downward bias, do you mean it underestimates the variability in the data?

@FactChecker I didn't know dividing by ##n - 1## instead of ##n## corrects the bias. Gracias.

Statistics is hard! And I'm merely scratching the surface.

Can someone tell me what "square root function is not linear" means? I know that, if ##\text{Var(X)} = 100## and ##\text{Var(Y)} = 81##, then ##100 - 81 = 19##, but ##\sqrt {100} - \sqrt {81} = 10 - 9 = 1##. A ##19##-point difference in variance becomes only a ##1##-point difference in standard deviation.
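
To put numbers on that compression: the same 19-point gap in variance can correspond to very different gaps in standard deviation, depending on where on the curve it sits. A quick sketch:

```python
import math

# Three pairs of variances, each pair differing by 19
for v1, v2 in [(100, 81), (400, 381), (20, 1)]:
    print(v1 - v2, round(math.sqrt(v1) - math.sqrt(v2), 3))
# 19 1.0
# 19 0.481
# 19 3.472
```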
 
Agent Smith said:
Can someone tell me what "square root function is not linear" means?
The graph is not a straight line.
[Attached plot of the square root function, which is visibly not a straight line.]

Agent Smith said:
I know that, if ##\text{Var(X)} = 100## and ##\text{Var(Y)} = 81##, then ##100 - 81 = 19##, but ##\sqrt {100} - \sqrt {81} = 10 - 9 = 1##. A ##19##-point difference in variance becomes only a ##1##-point difference in standard deviation.
That is not a good way to judge if a function is linear. Consider the function, ##y=100 x##. It is linear, but a 1 unit change in ##x## becomes a 100 unit change in ##y##.
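
One way to state the test being hinted at here: a linear map satisfies ##f(a + b) = f(a) + f(b)##, which ##y = 100x## passes and the square root fails. A rough Python check:

```python
import math

def additive(f, a, b, tol=1e-9):
    # linear maps satisfy f(a + b) == f(a) + f(b)
    return abs(f(a + b) - (f(a) + f(b))) < tol

print(additive(lambda x: 100 * x, 81, 19))  # True:  100*100 == 100*81 + 100*19
print(additive(math.sqrt, 81, 19))          # False: 10 != 9 + sqrt(19)
```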
 
Agent Smith said:
TL;DR Summary: What are the uses of variance and standard deviation and how do they differ?

The square root function is not linear and so standard deviation is biased when compared to variance.
The big difference between variance and standard deviation is that the standard deviation has the same units as the mean.
 
I'd say that they measure the same thing. The advantage of variance is that the variances of independent random variables add meaningfully while their standard deviations don't.
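
A quick numerical sketch of that additivity (assuming numpy is available; the distributions and parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0, 3, 100_000)   # independent samples, sd 3, variance 9
y = rng.normal(0, 4, 100_000)   # independent samples, sd 4, variance 16

print(np.var(x + y))   # ~25 = 9 + 16: variances of independent variables add
print(np.std(x + y))   # ~5, not 3 + 4 = 7: standard deviations do not
```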
 
@FactChecker , gracias for the graph, it makes it clearer. I was trying to show how a given difference in variance becomes a much smaller difference between the corresponding standard deviations.

Dale said:
The big difference between variance and standard deviation is that the standard deviation has the same units as the mean.
:thumbup:
 
FactChecker said:
That is true if the sum is divided by n, but the OP has divided by n-1, giving an unbiased estimator.
This is true of the variance, but not of the standard deviation. Sample standard deviation is always a biased estimator of population standard deviation.
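
A small Monte Carlo sketch of that distinction (assuming numpy is available; a normal population with ##\sigma = 2## and samples of size 5):

```python
import numpy as np

rng = np.random.default_rng(1)
true_sd = 2.0                                 # population sd, so population variance is 4
samples = rng.normal(0, true_sd, size=(200_000, 5))

print(samples.var(axis=1, ddof=1).mean())     # ~4.00: the n - 1 variance is unbiased
print(samples.std(axis=1, ddof=1).mean())     # ~1.88: the n - 1 sd still sits below 2
```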
 
  • #10
Agent Smith said:
Statistics is hard! And I'm merely scratching the surface.
groan. Actually very nicely done. groan.
 
  • #11
mjc123 said:
This is true of the variance, but not of the standard deviation. Sample standard deviation is always a biased estimator of population standard deviation.
I stand corrected. Thanks. I learned something I have had wrong all my life. The relevant part of this backs up what you said.
 
  • #12
Agent Smith said:
@FactChecker , gracias for the graph, it makes it clearer. I was trying to show how a given difference in variance becomes a much smaller difference between the corresponding standard deviations.


:thumbup:
This is not correct. If the variance is less than one then the standard deviation is larger than the variance. And this depends entirely on what units one is using.
 
  • #13
Hornbein said:
This is not correct. If the variance is less than one then the standard deviation is larger than the variance. And this depends entirely on what units one is using.
Point! But can the variance ever be < 1? I know variance can be 0 e.g. for the data set 4, 4, 4, 4, 4.

Say we take this data set: 4, 4, 4, 5.
The mean = 4.25
The variance = ##\displaystyle \frac{1}{n} \sum_{i = 1}^4 \left(x_i - 4.25 \right)^2 = 0.1875##
The standard deviation = ##\sqrt {0.1875} \approx 0.433##

##\sigma > \sigma^2##

Is the variance contradicting the standard deviation?
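
For reference, the arithmetic in that example, using the ##1/n## form written above (a quick Python check):

```python
data = [4, 4, 4, 5]
mean = sum(data) / len(data)                           # 4.25
var = sum((x - mean) ** 2 for x in data) / len(data)   # 0.1875
sd = var ** 0.5                                        # ~0.4330

print(mean, var, sd)
print(sd > var)   # True: with the variance below 1, the sd is the larger number
```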
 
  • #14
@FactChecker

So if we take a linear function ##f(X) = 2X## then ##E[f(X)] = f(E[X])##?
 
  • #16
No, expectation itself is linear. E[2X]=2E[X].
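
A quick numerical illustration of that linearity (assuming numpy is available; the particular distribution doesn't matter):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=3.0, size=100_000)   # E[X] = 3 for this choice

print(np.mean(2 * x))    # ~6.0
print(2 * np.mean(x))    # ~6.0, i.e. E[2X] = 2 E[X]
```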
 
  • #17
Agent Smith said:
Point! But can the variance ever be < 1? I know variance can be 0 e.g. for the data set 4, 4, 4, 4, 4.

Say we take this data set: 4, 4, 4, 5.
The mean = 4.25
The variance = ##\displaystyle \frac{1}{n} \sum_{i = 1}^4 \left(x_i - 4.25 \right)^2 = 0.1875##
The standard deviation = ##\sqrt {0.1875} \approx 0.433##

##\sigma > \sigma^2##

Is the variance contradicting the standard deviation?
What do you mean by variance contradicting the standard deviation?
Re linearity, I suspect both unbiasedness and the maximum likelihood property may only be preserved by linear transformations. Maybe @statdad can confirm or deny this?
 
  • #18
WWGD said:
What do you mean by variance contradicting the standard deviation?
Well, ##\text{Standard Deviation } \sigma > \text{Variance } \sigma^2##

The standard deviation suggests there's high variation while the variance suggests there's low variation. The same problem arises when ##\sigma^2 > 1##.
 
  • #19
WWGD said:
No, expectation itself is linear. E[2X]=2E[X].
##f(X) = 2X##? The other function/operation must be linear?
 
  • #20
Agent Smith said:
##f(X) = 2X##? The other function/operation must be linear?
What I mean is that if X is a random variable, then the expectation of the RV 2X is twice the expectation of X.
 
  • #21
Agent Smith said:
Well, ##\text{Standard Deviation } \sigma > \text{Variance } \sigma^2##

The standard deviation suggests there's high variation while the variance suggests there's low variation. The same problem arises when ##\sigma^2 > 1##.
@WWGD ☝️
 
  • #22
Agent Smith said:
The standard deviation suggests there's high variation while the variance suggests there's low variation.
Well, you can argue, too, that when the variance increases, so does the SD, by their functional dependence, at least when the variance is > 1. But what notion or measure, other than those two, would you use for variability? Also, I was wrong above: there are nonlinear transformations that preserve unbiasedness and the maximum likelihood property.
 
  • #23
@WWGD it seems I'm unable to read statistical results. What does a standard deviation ##\sigma = 2## mean? Assuming a normal distribution and a mean ##\mu = 35##, I can infer that ##95\%## of the data lie in the range ##35 \pm 2(2)##. Is that how we interpret standard deviation?
 
  • #24
Agent Smith said:
@WWGD it seems I'm unable to read statistical results. What does a standard deviation ##\sigma = 2## mean? Assuming a normal distribution and a mean ##\mu = 35##, I can infer that ##95\%## of the data lie in the range ##35 \pm 2(2)##. Is that how we interpret standard deviation?
That's close, yes.
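
The "close" can be made precise: under a normal model ##\mu \pm 2\sigma## actually captures about ##95.45\%##, and the exact ##95\%## interval is ##\mu \pm 1.96\sigma##. A short check (assuming scipy is available):

```python
from scipy.stats import norm

mu, sigma = 35, 2

within_2sd = norm.cdf(mu + 2 * sigma, mu, sigma) - norm.cdf(mu - 2 * sigma, mu, sigma)
print(within_2sd)                                  # ~0.9545

print(norm.interval(0.95, loc=mu, scale=sigma))    # ~(31.08, 38.92), i.e. 35 ± 1.96*2
```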
 
  • #25
Agent Smith said:
Point! But can the variance ever be < 1? I know variance can be 0 e.g. for the data set 4, 4, 4, 4, 4.

Say we take this data set: 4, 4, 4, 5.
The mean = 4.25
The variance = ##\displaystyle \frac{1}{n} \sum_{i = 1}^4 \left(x_i - 4.25 \right)^2 = 0.1875##
The standard deviation = ##\sqrt {0.1875} \approx 0.433##

##\sigma > \sigma^2##

Is the variance contradicting the standard deviation?
Say your data set 4, 4, 4, 5 is in liters.

This is exactly the same as 4000, 4000, 4000, 5000 milliliters. Calculate again and you get a larger variance, which is greater than the standard deviation. As you can see, comparing the magnitudes of the two isn't meaningful.
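
A one-line check of that rescaling: switching units multiplies the standard deviation by 1000 but the variance by 1000² (sketch assuming numpy is available):

```python
import numpy as np

litres = np.array([4.0, 4.0, 4.0, 5.0])
ml = litres * 1000

print(np.var(litres), np.std(litres))   # 0.1875 L^2   and ~0.433 L   (sd > variance)
print(np.var(ml), np.std(ml))           # 187500 mL^2  and ~433 mL    (variance > sd)
```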
 
  • #26
The units are different, so it makes no sense to compare the size of the variance and the standard deviation. It does make sense to compare the size of the mean and the standard deviation, since they are in the same units.
 
  • #27
There are two numbers. One is the square of the other. All the usual rules apply.
There is no statistical significance to any of this. Am I missing something??
 
  • #28
All this is why statistics are standardized/normalized to the unitless "Z-score."
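
For example, standardizing the thread's 4, 4, 4, 5 data (the units cancel, whatever they are; sketch assuming numpy is available):

```python
import numpy as np

data = np.array([4.0, 4.0, 4.0, 5.0])
z = (data - data.mean()) / data.std()   # unitless z-scores

print(z)                    # [-0.577 -0.577 -0.577  1.732]
print(z.mean(), z.std())    # ~0.0 and 1.0 by construction
```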
 
  • #29
hutchphd said:
There are two numbers. One is the square of the other. All the usual rules apply.
Including the most important - they have different units. What is bigger, a gallon or a calorie?
 
  • #30
@Dale Gracias, si, it maketh no sense to compare ##m^2## with ##m##. But which is better, the variance or the standard deviation, at giving us an accurate measure of variability, which I presume both are measuring? By squaring (variance) we amplify the magnitude of the variability, and by taking the square root (standard deviation) we shrink it (both in terms of pure magnitude). However, the magnitude itself conveys no information (vide infra 👇)
Hornbein said:
Say your data set 4, 4, 4, 5 is in liters.

This is exactly the same as 4000, 4000, 4000, 5000 milliliters. Calculate again and you get a larger variance, which is greater than the standard deviation. As you can see, comparing the magnitudes of the two isn't meaningful.
 