Variance & Standard Deviation

Summary: Variance and standard deviation are both measures of dispersion: variance is the average of squared deviations from the mean, and standard deviation is its square root. Higher values of either indicate greater variability in the data. The discussion clarifies that "bias" properly applies to estimators: the sample standard deviation is a downward-biased estimator of the population standard deviation, even when the variance is computed with the ##n - 1## divisor. Variance is additive for independent random variables, while standard deviation shares units with the mean, making it more interpretable. Both measures serve distinct purposes, and since they have different units, their magnitudes should not be compared directly.
Agent Smith
TL;DR
What are the uses of variance and standard deviation and how do they differ?
Going through my notes ... and I see the following:

1. Variance = ##\displaystyle \text{Var}(X) = \sigma^2 = \frac{1}{n - 1} \sum_{i = 1}^n \left(x_i - \overline x \right)^2##
2. Standard Deviation = ##\sigma = \sqrt {Var(X)} = \sqrt {\sigma^2}##

Both variance and standard deviation are measures of dispersion (colloquially, the spread in the data). The higher their values, the more spread out the data is.
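For concreteness, the two formulas can be sketched in Python (a minimal sketch; the data set below is a made-up example, not from the notes):

```python
# Minimal sketch of the two formulas above: sample variance with the
# n - 1 (Bessel-corrected) divisor, and standard deviation as its square root.
def sample_variance(xs):
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def sample_sd(xs):
    return sample_variance(xs) ** 0.5

data = [2, 4, 4, 4, 5, 5, 7, 9]     # made-up example data
print(sample_variance(data))        # 4.571428571428571 (= 32/7)
print(sample_sd(data))              # about 2.138
```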

Statement B: The square root function is not linear and so standard deviation is biased when compared to variance.

Questions:
1. Do high variance and standard deviation mean greater variability in the data?
2. What does statement B mean?
 
The answer to 1 is "Yes".
The answer to 2 is "nothing". It is a meaningless statement because the concept of 'bias' only applies to estimators. A correct statement would be "The standard deviation of a sample from a population is a downwards-biased estimator of the standard deviation of the population from which the sample is taken". For more detail see https://en.wikipedia.org/wiki/Standard_deviation#Uncorrected_sample_standard_deviation.
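This downward bias is easy to see in a quick simulation (illustrative only; the normal population, sample size, and trial count are arbitrary choices):

```python
# Even with the n - 1 divisor, the sample standard deviation
# (statistics.stdev) is a downward-biased estimator of the population
# sigma (here sigma = 1): the square root of an unbiased variance
# estimate is not itself unbiased.
import random
import statistics

random.seed(0)                      # for reproducibility
n, trials = 5, 20000
sds = [statistics.stdev(random.gauss(0, 1) for _ in range(n))
       for _ in range(trials)]
print(sum(sds) / trials)            # roughly 0.94, noticeably below sigma = 1
```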
 
andrewkirk said:
A correct statement would be "The standard deviation of a sample from a population is a downwards-biased estimator of the standard deviation of the population from which the sample is taken".
That is true if the sum is divided by n, but the OP has divided by n-1, giving an unbiased estimator.
 
@andrewkirk , by downward bias, do you mean it underestimates the variability in the data?

@FactChecker I didn't know dividing by ##n - 1## instead of ##n## corrects the bias. Gracias.

Statistics is hard! And I'm merely scratching the surface.

Can someone tell me what "square root function is not linear" means? I know that, if ##\text{Var(X)} = 100## and ##\text{Var(Y)} = 81##, then ##100 - 81 = 19## but ##\sqrt {100} - \sqrt {81} = 10 - 9 = 1##. A ##19##-point difference in variance becomes only a ##1##-point difference in standard deviation.
 
Agent Smith said:
Can someone tell me what "square root function is not linear" means?
The graph is not a straight line.
[Graph of ##y = \sqrt x##: a curve, not a straight line]

Agent Smith said:
I know that, if ##\text{Var(X)} = 100## and ##\text{Var(Y)} = 81##, then ##100 - 81 = 19## but ##\sqrt {100} - \sqrt {81} = 10 - 9 = 1##. A ##19##-point difference in variance becomes only a ##1##-point difference in standard deviation.
That is not a good way to judge if a function is linear. Consider the function, ##y=100 x##. It is linear, but a 1 unit change in ##x## becomes a 100 unit change in ##y##.
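The distinction can also be checked directly in code (illustrative, using the 100 and 81 figures from the post being quoted):

```python
# A linear function f satisfies f(a - b) = f(a) - f(b); sqrt does not.
import math

f = math.sqrt
g = lambda x: 100 * x          # a genuinely linear function

print(f(100 - 81))             # about 4.36 ...
print(f(100) - f(81))          # ... but this is 1.0, so sqrt is not linear
print(g(100 - 81) == g(100) - g(81))   # True: the linear g passes the test
```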
 
Agent Smith said:
TL;DR Summary: What are the uses of variance and standard deviation and how do they differ?

The square root function is not linear and so standard deviation is biased when compared to variance.
The big difference between variance and standard deviation is that the standard deviation has the same units as the mean.
 
I'd say that they measure the same thing. The advantage of variance is that the variances of independent random variables add meaningfully while their standard deviations don't.
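The additivity of variances can be checked by simulation (a sketch with arbitrarily chosen independent normal variables):

```python
# For independent X and Y, Var(X + Y) = Var(X) + Var(Y),
# but sd(X + Y) != sd(X) + sd(Y): here 5 != 3 + 4.
import random
import statistics

random.seed(1)
N = 100_000
xs = [random.gauss(0, 3) for _ in range(N)]    # Var(X) = 9
ys = [random.gauss(0, 4) for _ in range(N)]    # Var(Y) = 16
zs = [x + y for x, y in zip(xs, ys)]

print(statistics.pvariance(zs))   # close to 9 + 16 = 25
print(statistics.pstdev(zs))      # close to 5, not 3 + 4 = 7
```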
 
@FactChecker , gracias for the graph, it makes it clearer. I was trying to show how a given difference in variance becomes a much smaller difference between the corresponding standard deviations.

Dale said:
The big difference between variance and standard deviation is that the standard deviation has the same units as the mean.
:thumbup:
 
FactChecker said:
That is true if the sum is divided by n, but the OP has divided by n-1, giving an unbiased estimator.
This is true of the variance, but not of the standard deviation. Sample standard deviation is always a biased estimator of population standard deviation.
 
  • #10
Agent Smith said:
Statistics is hard! And I'm merely scratching the surface.
groan. Actually very nicely done. groan.
 
  • #11
mjc123 said:
This is true of the variance, but not of the standard deviation. Sample standard deviation is always a biased estimator of population standard deviation.
I stand corrected. Thanks. I learned something I have had wrong all my life. The relevant part of this backs up what you said.
 
  • #12
Agent Smith said:
@FactChecker , gracias for the graph, it makes it clearer. I was trying to show how a given difference in variance becomes a much smaller difference between the corresponding standard deviations.


:thumbup:
This is not correct. If the variance is less than one then the standard deviation is larger than the variance. And this depends entirely on what units one is using.
 
  • #13
Hornbein said:
This is not correct. If the variance is less than one then the standard deviation is larger than the variance. And this depends entirely on what units one is using.
Point! But can the variance ever be < 1? I know variance can be 0 e.g. for the data set 4, 4, 4, 4, 4.

Say we take this data set: 4, 4, 4, 5.
The mean = 4.25
The variance = ##\displaystyle \frac{1}{n} \sum_{i = 1}^4 \left(x_i - 4.25 \right)^2 = 0.1875##
The standard deviation = ##\sqrt {0.1875} \approx 0.433##

##\sigma > \sigma^2##

Is the variance contradicting the standard deviation?
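For what it's worth, the numbers can be checked with Python's statistics module (pvariance and pstdev implement the ##1/n## population formulas used in this post):

```python
# Checking the 4, 4, 4, 5 computation; pvariance and pstdev use the
# 1/n "population" divisor, matching the formula in the post.
import statistics

data = [4, 4, 4, 5]
var = statistics.pvariance(data)   # 0.1875
sd = statistics.pstdev(data)       # about 0.433
print(var, sd, sd > var)           # sd exceeds var here because var < 1
```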
 
  • #14
@FactChecker

So if we take a linear function ##f(X) = 2X## then ##E[f(X)] = f(E[X])##?
 
  • #16
No, expectation itself is linear. E[2X]=2E[X].
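Linearity of expectation is easy to verify numerically (illustrative; the simulated fair die below is an assumed example):

```python
# E[2X] = 2 E[X] for any random variable X; here X is a simulated fair die.
# (Contrast with E[X**2], which is not (E[X])**2 in general.)
import random

random.seed(2)
xs = [random.randint(1, 6) for _ in range(100_000)]
e_x = sum(xs) / len(xs)
e_2x = sum(2 * x for x in xs) / len(xs)
print(e_x)                    # close to 3.5
print(e_2x)                   # twice e_x
```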
 
  • #17
Agent Smith said:
Point! But can the variance ever be < 1? I know variance can be 0 e.g. for the data set 4, 4, 4, 4, 4.

Say we take this data set: 4, 4, 4, 5.
The mean = 4.25
The variance = ##\displaystyle \frac{1}{n} \sum_{i = 1}^4 \left(x_i - 4.25 \right)^2 = 0.1875##
The standard deviation = ##\sqrt {0.1875} \approx 0.433##

##\sigma > \sigma^2##

Is the variance contradicting the standard deviation?
What do you mean by variance contradicting the standard deviation?
Re linearity, I suspect both unbiasedness and the maximum-likelihood property may only be preserved by linear transformations. Maybe @statdad can confirm or deny this?
 
  • #18
WWGD said:
What do you mean by variance contradicting the standard deviation?
Well, ##\text{Standard Deviation } \sigma > \text{Variance } \sigma^2##

The standard deviation suggests there's high variation while the variance suggests there's low variation. The same problem arises when ##\sigma^2 > 1##.
 
  • #19
WWGD said:
No, expectation itself is linear. E[2X]=2E[X].
You mean ##f(X) = 2X##? So the function applied to the random variable must be linear?
 
  • #20
Agent Smith said:
##f(X) = 2X##? The other function/operation must be linear?
What I mean is that if X is a random variable, then the expectation of the RV 2X is twice the expectation of X.
 
  • #21
Agent Smith said:
Well, ##\text{Standard Deviation } \sigma > \text{Variance } \sigma^2##

The standard deviation suggests there's high variation while the variance suggests there's low variation. The same problem arises when ##\sigma^2 > 1##.
@WWGD ☝️
 
  • #22
Well, you can argue, too, that when the variance increases, so does the SD, by their functional dependence, at least when the variance is > 1. But what notion or measure, other than those two, would you use as a measure of variability? And I was wrong above: there are nonlinear transformations that preserve unbiasedness and the maximum-likelihood property.
 
  • #23
@WWGD it seems I'm unable to read statistical results. What does a standard deviation ##\sigma = 2## mean? Assuming a normal distribution and a mean ##\mu = 35##, I can infer that ##95\%## of the data lie in the range ##35 \pm 2(2)##. Is that how we interpret standard deviation?
 
  • #24
Agent Smith said:
@WWGD it seems I'm unable to read statistical results. What does a standard deviation ##\sigma = 2## mean? Assuming a normal distribution and a mean ##\mu = 35##, I can infer that ##95\%## of the data lie in the range ##35 \pm 2(2)##. Is that how we interpret standard deviation?
That's close, yes.
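A quick simulation backs this up (illustrative; strictly, the exact central 95% of a normal distribution is ##\mu \pm 1.96\sigma##, so ##\mu \pm 2\sigma## covers slightly more, about 95.45%):

```python
# Fraction of draws from Normal(mu=35, sigma=2) falling within mu +/- 2*sigma.
import random

random.seed(3)
N = 200_000
hits = sum(1 for _ in range(N) if abs(random.gauss(35, 2) - 35) <= 4)
print(hits / N)               # about 0.954
```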
 
  • #25
Agent Smith said:
Point! But can the variance ever be < 1? I know variance can be 0 e.g. for the data set 4, 4, 4, 4, 4.

Say we take this data set: 4, 4, 4, 5.
The mean = 4.25
The variance = ##\displaystyle \frac{1}{n} \sum_{i = 1}^4 \left(x_i - 4.25 \right)^2 = 0.1875##
The standard deviation = ##\sqrt {0.1875} \approx 0.433##

##\sigma > \sigma^2##

Is the variance contradicting the standard deviation?
Say your data set 4, 4, 4, 5 is in liters.

This is exactly the same as 4000, 4000, 4000, 5000 milliliters. Calculate again and you get a larger variance, which is greater than the standard deviation. As you can see, comparing the magnitudes of the two isn't meaningful.
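The same point in code (a sketch using Python's statistics module): rescaling by ##c = 1000## multiplies the standard deviation by ##c## but the variance by ##c^2##, so which one is numerically bigger flips with the choice of units.

```python
import statistics

liters = [4, 4, 4, 5]
ml = [1000 * x for x in liters]   # the identical data in different units

print(statistics.pvariance(liters), statistics.pstdev(liters))
# 0.1875 and about 0.433 -> sd bigger than variance
print(statistics.pvariance(ml), statistics.pstdev(ml))
# 187500.0 and about 433.0 -> variance bigger than sd
```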
 
  • #26
The units are different, so it makes no sense to compare the size of the variance and the standard deviation. It does make sense to compare the size of the mean and the standard deviation, since they have the same units.
 
  • #27
There are two numbers. One is the square of the other. All the usual rules apply.
There is no statistical significance to any of this. Am I missing something??
 
  • #28
All this is why statistics are standardized/normalized to the unitless "Z-score."
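A minimal sketch of standardization (using the 4, 4, 4, 5 data set from earlier in the thread): z-scores are unitless, so rescaling the data leaves them unchanged.

```python
# z-score: (x - mean) / sd, a unitless measure of how far x is from the mean.
import statistics

def z_scores(xs):
    mu, sd = statistics.mean(xs), statistics.pstdev(xs)
    return [(x - mu) / sd for x in xs]

liters = [4.0, 4.0, 4.0, 5.0]
ml = [1000 * x for x in liters]
print(z_scores(liters))   # same z-scores for both unit choices: units cancel
print(z_scores(ml))
```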
 
  • #29
hutchphd said:
There are two numbers. One is the square of the other. All the usual rules apply.
Including the most important - they have different units. What is bigger, a gallon or a calorie?
 
  • #30
@Dale Gracias, si, it maketh no sense to compare ##m^2## with ##m##, but which is better, the variance or the standard deviation, at giving us an accurate measure of variability, which I presume both are measuring? By squaring (variance) we amplify the magnitude of the variability, and by taking the square root (standard deviation) we shrink it (both in terms of pure magnitude). However, the magnitude itself conveys no information (vide infra 👇 )
Hornbein said:
Say your data set 4, 4, 4, 5 is in liters.

This is exactly the same as 4000, 4000, 4000, 5000 milliliters. Calculate again and you get a larger variance, which is greater than the standard deviation. As you can see, comparing the magnitudes of the two isn't meaningful.
 
