B Variance & Standard Deviation

Agent Smith
TL;DR Summary
What are the uses of variance and standard deviation and how do they differ?
Going through my notes ... and I see the following:

1. Variance = ##\displaystyle \text{Var}(X) = \sigma^2 = \frac{1}{n - 1} \sum_{i = 1}^n \left(x_i - \overline x \right)^2##
2. Standard Deviation = ##\sigma = \sqrt{\text{Var}(X)} = \sqrt{\sigma^2}##

Both variance and standard deviation are measures of dispersion (colloquially, the spread in the data). The higher their values, the more spread out the data are.
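As a quick sanity check of these formulas, here is a minimal Python sketch using the standard library's statistics module, which applies the same ##n - 1## convention for the sample variance (the data set is made up):

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

var = statistics.variance(data)  # sample variance: divides by n - 1
sd = statistics.stdev(data)      # sample standard deviation: square root of the variance

print(var)                       # ~4.5714
print(sd)                        # ~2.1381
print(math.isclose(sd**2, var))  # True: sd is just the square root of var
```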

Statement B: The square root function is not linear and so standard deviation is biased when compared to variance.

Questions:
1. Do high variance and standard deviation mean greater variability in the data?
2. What does statement B mean?
 
The answer to 1 is "Yes".
The answer to 2 is "nothing". It is a meaningless statement because the concept of 'bias' only applies to estimators. A correct statement would be "The standard deviation of a sample from a population is a downwards-biased estimator of the standard deviation of the population from which the sample is taken". For more detail see https://en.wikipedia.org/wiki/Standard_deviation#Uncorrected_sample_standard_deviation.
 
  • Like
Likes Agent Smith and FactChecker
andrewkirk said:
A correct statement would be "The standard deviation of a sample from a population is a downwards-biased estimator of the standard deviation of the population from which the sample is taken".
That is true if the sum is divided by n, but the OP has divided by n-1, giving an unbiased estimator.
 
  • Like
Likes Agent Smith
@andrewkirk , by downward bias, do you mean it underestimates the variability in the data?

@FactChecker I didn't know dividing by ##n - 1## instead of ##n## corrects the bias. Gracias.

Statistics is hard! And I'm merely scratching the surface.

Can someone tell me what "square root function is not linear" means? I know that, if ##\text{Var}(X) = 100## and ##\text{Var}(Y) = 81##, then ##100 - 81 = 19##, but ##\sqrt{100} - \sqrt{81} = 10 - 9 = 1##. A ##19##-point difference in variance becomes only a ##1##-point difference in standard deviation.
 
Agent Smith said:
Can someone tell me what "square root function is not linear" means?
The graph is not a straight line.
[Graph of the square root function]

Agent Smith said:
I know that, if ##\text{Var}(X) = 100## and ##\text{Var}(Y) = 81##, then ##100 - 81 = 19##, but ##\sqrt{100} - \sqrt{81} = 10 - 9 = 1##. A ##19##-point difference in variance becomes only a ##1##-point difference in standard deviation.
That is not a good way to judge if a function is linear. Consider the function, ##y=100 x##. It is linear, but a 1 unit change in ##x## becomes a 100 unit change in ##y##.
 
  • Like
Likes Agent Smith
Agent Smith said:
TL;DR Summary: What are the uses of variance and standard deviation and how do they differ?

The square root function is not linear and so standard deviation is biased when compared to variance.
The big difference between variance and standard deviation is that the standard deviation has the same units as the mean.
 
Last edited:
  • Like
Likes Agent Smith
I'd say that they measure the same thing. The advantage of variance is that the variances of independent random variables add meaningfully while their standard deviations don't.
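For instance, for independent random variables ##X## and ##Y##, $$\operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y), \qquad \sigma_{X+Y} = \sqrt{\sigma_X^2 + \sigma_Y^2} \ne \sigma_X + \sigma_Y \text{ in general}.$$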
 
  • Skeptical
Likes Agent Smith
@FactChecker , gracias for the graph, it makes it clearer. I was trying to show how a given difference in variance becomes a much smaller difference between the corresponding standard deviations.

Dale said:
The big difference between variance and standard deviation is that the standard deviation has the same units as the mean.
:thumbup:
 
FactChecker said:
That is true if the sum is divided by n, but the OP has divided by n-1, giving an unbiased estimator.
This is true of the variance, but not of the standard deviation. Sample standard deviation is always a biased estimator of population standard deviation.
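To illustrate the distinction, here is a minimal simulation sketch (assuming NumPy; the population, sample size, and trial count are arbitrary choices): draw many small samples from a normal population with known ##\sigma = 2##, and average the sample variances and sample standard deviations, both computed with Bessel's correction.

```python
import numpy as np

rng = np.random.default_rng(0)
true_sigma = 2.0        # population standard deviation, so sigma^2 = 4
n, trials = 5, 100_000  # small samples make the bias easy to see

samples = rng.normal(loc=0.0, scale=true_sigma, size=(trials, n))
s2 = samples.var(axis=1, ddof=1)  # sample variances (n - 1 in the denominator)
s = samples.std(axis=1, ddof=1)   # sample standard deviations

print(s2.mean())  # ~4.00 : unbiased for sigma^2
print(s.mean())   # ~1.88 : systematically below sigma = 2
```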
 
  • Like
Likes hutchphd and FactChecker
  • #10
Agent Smith said:
Statistics is hard! And I'm merely scratching the surface.
groan. Actually very nicely done. groan.
 
  • Haha
Likes Agent Smith
  • #11
mjc123 said:
This is true of the variance, but not of the standard deviation. Sample standard deviation is always a biased estimator of population standard deviation.
I stand corrected. Thanks. I learned something I have had wrong all my life. The relevant part of this backs up what you said.
 
  • Like
Likes Agent Smith
  • #12
Agent Smith said:
@FactChecker , gracias for the graph, it makes it clearer. I was trying to show how a given difference in variance becomes a much smaller difference between the corresponding standard deviations.


:thumbup:
This is not correct. If the variance is less than one then the standard deviation is larger than the variance. And this depends entirely on what units one is using.
 
  • Like
Likes Agent Smith
  • #13
Hornbein said:
This is not correct. If the variance is less than one then the standard deviation is larger than the variance. And this depends entirely on what units one is using.
Point! But can the variance ever be < 1? I know variance can be 0 e.g. for the data set 4, 4, 4, 4, 4.

Say we take this data set: 4, 4, 4, 5.
The mean = 4.25
The variance = ##\displaystyle \frac{1}{n} \sum_{i = 1}^4 \left(x_i - 4.25 \right)^2 = 0.1875##
The standard deviation = ##\sqrt{0.1875} \approx 0.433##

##\sigma > \sigma^2##

Is the variance contradicting the standard deviation?
 
Last edited:
  • #14
@FactChecker

So if we take a linear function ##f(X) = 2X## then ##E[f(X)] = f(E[X])##?
 
  • #16
No, expectation itself is linear: ##E[2X] = 2E[X]##.
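More generally, expectation is linear: $$E[aX + b] = a\,E[X] + b$$ for any constants ##a, b##, but it does not commute with nonlinear functions; in general ##E[\sqrt{X}] \ne \sqrt{E[X]}##, which is exactly why taking the square root of an unbiased variance estimator does not give an unbiased standard deviation estimator.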
 
  • #17
Agent Smith said:
Point! But can the variance ever be < 1? I know variance can be 0 e.g. for the data set 4, 4, 4, 4, 4.

Say we take this data set: 4, 4, 4, 5.
The mean = 4.25
The variance = ##\displaystyle \frac{1}{n} \sum_{i = 1}^4 \left(x_i - 4.25 \right)^2 = 0.1875##
The standard deviation = ##\sqrt{0.1875} \approx 0.433##

##\sigma > \sigma^2##

Is the variance contradicting the standard deviation?
What do you mean by variance contradicting the standard deviation?
Re linearity, I suspect that both unbiasedness and the maximum-likelihood property may only be preserved by linear transformations. Maybe @statdad can confirm or deny this?
 
  • #18
WWGD said:
What do you mean by variance contradicting the standard deviation?
Well, ##\text{Standard Deviation } \sigma > \text{Variance } \sigma^2##

The standard deviation suggests there's high variation while the variance suggests there's low variation. The same problem arises when ##\sigma^2 > 1##.
 
  • #19
WWGD said:
No, expectation itself is linear. E[2X]=2E[X].
You mean ##f(X) = 2X##? Must the other function/operation be linear?
 
  • #20
Agent Smith said:
##f(X) = 2X##? The other function/operation must be linear?
What I mean is that if X is a random variable, then the expectation of the RV 2X is twice the expectation of X.
 
  • Like
Likes Agent Smith
  • #21
Agent Smith said:
Well, ##\text{Standard Deviation } \sigma > \text{Variance } \sigma^2##

The standard deviation suggests there's high variation while the variance suggests there's low variation. The same problem arises when ##\sigma^2 > 1##.
@WWGD ☝️
 
  • #22
Well, you can argue, too, that when the variance increases so does the SD, by their functional dependence, at least when the variance is ##> 1##. But what notion or measure, other than those two, would you use for variability? And I was wrong above: there are nonlinear transformations that preserve unbiasedness and the maximum-likelihood property.
 
  • Like
Likes Agent Smith
  • #23
@WWGD it seems I'm unable to read statistical results. What does a standard deviation ##\sigma = 2## mean? Assuming a normal distribution and a mean ##\mu = 35##, I can infer that ##95\%## of the data lie in the range ##35 \pm 2(2)##. Is that how we interpret standard deviation?
 
  • #24
Agent Smith said:
@WWGD it seems I'm unable to read statistical results. What does a standard deviation ##\sigma = 2## mean? Assuming a normal distribution and a mean ##\mu = 35##, I can infer that ##95\%## of the data lie in the range ##35 \pm 2(2)##. Is that how we interpret standard deviation?
That's close, yes.
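More precisely, ##\mu \pm 2\sigma## covers about ##95.45\%## of a normal distribution; the exact central ##95\%## interval is ##\mu \pm 1.96\sigma##. A quick check, assuming SciPy is available:

```python
from scipy.stats import norm

# Probability within mu +/- 2 sigma for a normal distribution:
print(norm.cdf(2) - norm.cdf(-2))  # ~0.9545

# Number of sigmas giving an exact central 95% interval:
print(norm.ppf(0.975))  # ~1.96
```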
 
  • Haha
Likes Agent Smith
  • #25
Agent Smith said:
Point! But can the variance ever be < 1? I know variance can be 0 e.g. for the data set 4, 4, 4, 4, 4.

Say we take this data set: 4, 4, 4, 5.
The mean = 4.25
The variance = ##\displaystyle \frac{1}{n} \sum_{i = 1}^4 \left(x_i - 4.25 \right)^2 = 0.1875##
The standard deviation = ##\sqrt{0.1875} \approx 0.433##

##\sigma > \sigma^2##

Is the variance contradicting the standard deviation?
Say your data set 4, 4, 4, 5 is in liters.

This is exactly the same as 4000, 4000, 4000, 5000 milliliters. Calculate again and you get a larger variance, which is greater than the standard deviation. As you can see, comparing the magnitudes of the two isn't meaningful.
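A quick numerical check of this point (a sketch assuming NumPy, and using the population formulas, i.e. dividing by ##n##, to match the quoted post):

```python
import numpy as np

liters = np.array([4.0, 4.0, 4.0, 5.0])
ml = liters * 1000  # the same data in milliliters

print(liters.var(), liters.std())  # 0.1875   ~0.433 -> sd > variance
print(ml.var(), ml.std())          # 187500.0 ~433   -> variance > sd
```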
 
  • Like
Likes Agent Smith and Vanadium 50
  • #26
The units are different, so it makes no sense to compare the size of the variance and the standard deviation. It does make sense to compare the size of the mean and the standard deviation, since they have the same units.
 
  • Like
Likes Agent Smith, Hornbein, Vanadium 50 and 2 others
  • #27
There are two numbers. One is the square of the other. All the usual rules apply.
There is no statistical significance to any of this. Am I missing something??
 
  • #28
All this is why statistics are standardized/normalized to the unitless "Z-score."
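For a value ##x## from a distribution with mean ##\mu## and standard deviation ##\sigma##, the z-score is $$z = \frac{x - \mu}{\sigma},$$ which is dimensionless: the units of ##x - \mu## and ##\sigma## cancel.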
 
  • Like
Likes Agent Smith and Dale
  • #29
hutchphd said:
There are two numbers. One is the square of the other. All the usual rules apply.
Including the most important - they have different units. What is bigger, a gallon or a calorie?
 
  • Like
Likes hutchphd and Dale
  • #30
@Dale Gracias, si, it maketh no sense to compare ##m^2## with ##m##, but which is better, the variance or the standard deviation, at giving us an accurate measure of variability, which I presume both are measuring? By squaring (variance), we're amplifying the magnitude of the variability, and by taking the square root (standard deviation) we're downsizing the variability (both in terms of pure magnitude). However, the magnitude itself conveys no information (vide infra 👇 )
Hornbein said:
Say your data set is 4, 4, 4, 5 is in liters.

This is exactly the same as 4000, 4000, 4000, 5000 milliliters. Calculate again and you get a larger variance, which is greater than the standard deviation. As you can see, comparing the magnitudes of the two isn't meaningful.
 
  • #31
Agent Smith said:
which is better
Which is better, momentum or energy? Temperature or pressure? Volume or area?
 
  • Haha
Likes Agent Smith
  • #32
Agent Smith said:
which is better, the variance or the standard deviation, at giving us an accurate measure of variability
The distinctions between them are not in terms of accuracy. Variance is nice because it is additive and you can partition a total variance into separate portions. The standard deviation is nice because it has the same units as the random variable itself and can be meaningfully compared to the mean.

Agent Smith said:
By squaring (variance), we're amplifying the magnitude of the variability and by taking the square root (standard deviation) we're downsizing the variability
Neither of these statements is true.
 
  • Like
  • Wow
Likes Agent Smith, Vanadium 50 and hutchphd
  • #33
Agent Smith said:
@Dale Gracias, si, it maketh no sense to compare ##m^2## with ##m##, but which is better, the variance or the standard deviation, at giving us an accurate measure of variability, which I presume both are measuring? By squaring (variance), we're amplifying the magnitude of the variability, and by taking the square root (standard deviation) we're downsizing the variability (both in terms of pure magnitude). However, the magnitude itself conveys no information (vide infra 👇 )
@Dale & @Vanadium 50 ☝️ 🤔

For ##\sigma = 9## (say) meters, the ##9## meters by itself is meaningless, oui?
 
  • #34
One can deduce that the variance is a measure of the spread or dispersion of a probability distribution through Chebyshev's inequality. E. Parzen: "Modern Probability Theory and Its Applications"
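For reference, Chebyshev's inequality says that for any random variable ##X## with mean ##\mu## and finite standard deviation ##\sigma##, $$P\big(|X - \mu| \ge k\sigma\big) \le \frac{1}{k^2} \qquad \text{for all } k > 0,$$ so ##\sigma## bounds the spread of any distribution, with no normality assumption needed.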

Since I am not well versed in this subject I leave it to the cognoscenti to elaborate.
 
  • Like
Likes Agent Smith
  • #35
gleem said:
One can deduce that the variance is a measure of the spread or dispersion of a probability distribution through Chebyshev's inequality. E. Parzen: "Modern Probability Theory and Its Applications"

Since I am not well versed in this subject I leave it to the cognoscenti to elaborate.
Gracias.
 
  • #36
Agent Smith said:
@Dale & @Vanadium 50 ☝️ 🤔

For ##\sigma = 9## (say) meters, the ##9## meters by itself is meaningless, oui?
##9 \mathrm{\ m}## is meaningful since it is never "by itself" but is in the context of the SI system. What I was objecting to was your claim that "By squaring (variance), we're amplifying the magnitude of the variability and by taking the square root (standard deviation) we're downsizing the variability". Neither of these is true.

The numerical point you are alluding to is numerically wrong. Yes, ##9^2 = 81 > 9##, but ##0.9^2 = 0.81 < 0.9##. So it is not always true that squaring a number "amplifies" the number. Nor is it always true that taking the square root always "downsizes" the number. Yesterday I was working with a data set with means in the ##-0.010## range and standard deviations in the ##0.005## range, so variances had even smaller numbers that were annoying to format. The variances were decidedly not "amplified".

Also, the units don't work out. ##(9\mathrm{\ m})^2 = 81 \mathrm{\ m^2}## cannot be compared to ##9\mathrm{\ m}## at all. So you cannot say that ##81 \mathrm{\ m^2}## is "amplified" from ##9 \mathrm{\ m}##.

It doesn't make sense to me to talk about the magnitude or the size of the variability as something different from the standard deviation and the variance.
 
  • Like
Likes Vanadium 50 and Agent Smith
  • #37
Dale said:
##9 \mathrm{\ m}## is meaningful since it is never "by itself" but is in the context of the SI system. What I was objecting to was your claim that "By squaring (variance), we're amplifying the magnitude of the variability and by taking the square root (standard deviation) we're downsizing the variability". Neither of these is true.

The numerical point you are alluding to is numerically wrong. Yes, ##9^2 = 81 > 9##, but ##0.9^2 = 0.81 < 0.9##. So it is not always true that squaring a number "amplifies" the number. Nor is it always true that taking the square root always "downsizes" the number. Yesterday I was working with a data set with means in the ##-0.010## range and standard deviations in the ##0.005## range, so variances had even smaller numbers that were annoying to format. The variances were decidedly not "amplified".

Also, the units don't work out. ##(9\mathrm{\ m})^2 = 81 \mathrm{\ m^2}## cannot be compared to ##9\mathrm{\ m}## at all. So you cannot say that ##81 \mathrm{\ m^2}## is "amplified" from ##9 \mathrm{\ m}##.

It doesn't make sense to me to talk about the magnitude or the size of the variability as something different from the standard deviation and the variance.
Gracias for the clarification, but it's still non liquet.

Say we have a statistical report on whale shark lengths: ##\mu = 20 \text{ m}## and ##\sigma = 3 \text{ m}##. For me, ##\sigma## only makes sense if linked to ##\mu##, and one also has to have a fair idea of what a ##\text{meter}## is. I think someone mentioned that we need to keep our units in mind before interpreting statistical information.

Correct?
 
  • #38
Agent Smith said:
for me ##\sigma## only makes sense if linked to ##\mu##
Why? I could have a random variable with ##\sigma = 3 \mathrm{\ m}## regardless of ##\mu##. The only thing that ##\sigma=3\mathrm{\ m}## tells us about ##\mu## is that ##\mu## has dimensions of length.
 
  • Like
Likes Agent Smith
  • #39
A follow up question.
Watched a video on Pearson's correlation coefficient: ##r = \frac{\sum_i (x_i - \overline x)(y_i - \overline y)}{\sqrt{\sum_i (x_i - \overline x)^2 \sum_i (y_i - \overline y)^2}}##. The denominator, says the author, performs standardization. I don't know what that means. Is this ##\frac{x - \overline x}{\sigma}## (the z-score) also standardization?
 
  • #40
Agent Smith said:
A follow up question.
Watched a video on Pearson's correlation coefficient: ##r = \frac{\sum_i (x_i - \overline x)(y_i - \overline y)}{\sqrt{\sum_i (x_i - \overline x)^2 \sum_i (y_i - \overline y)^2}}##. The denominator, says the author, performs standardization. I don't know what that means. Is this ##\frac{x - \overline x}{\sigma}## (the z-score) also standardization?
It normalizes the result to lie within ##[-1, 1]##, where ##1## is perfectly positively correlated, ##-1## is perfectly negatively correlated, and anything else is in between. When you want to know how strongly two things are related, you want to be able to ignore the scale of variation of each one, so you divide each one by its standard deviation.
Consider what happens without the denominators to normalize the calculations. Suppose I wanted to know how strongly the length of a person's left toe tended to imply other size characteristics.
In one case, I want to see how well it implied the length of the same person's right toe. Without the denominators, all the numbers and variations from the mean would be small and the calculation would be a small number even though you know the relationship is very strong.
In another case, I want to see how well it implied the heights of the same people. You know that the relationship is not as strong, but the calculation will give a larger number because the heights are larger and have more variation.
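A minimal sketch of this scale-invariance (assuming NumPy; the data are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
left_toe = rng.normal(5.0, 0.3, size=100)            # hypothetical toe lengths, cm
height = 30 * left_toe + rng.normal(0, 5, size=100)  # noisy linear relation, cm

r1 = np.corrcoef(left_toe, height)[0, 1]
r2 = np.corrcoef(left_toe * 10, height / 100)[0, 1]  # change units: mm and m
print(r1, r2)  # identical (up to floating point): r ignores the scale of each variable
```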
 
Last edited:
  • Like
Likes Dale and Agent Smith
  • #41
The words "standardized" and "normalized" get tossed around quite a bit in these discussions, rather like "derivative" and "integral" get tossed around in analysis: the words can mean different procedures while describing things that are quite similar in concept.

Think about the correlation coefficient you have posted. The expression $$\frac 1 n \sum{\left(x-\overline{x}\right)\left(y-\overline{y}\right)}$$ is the covariance of the variables, and its size depends on the units of both variables, so it can be arbitrarily large or small, depending on the scales of those variables.

Remember that the (biased) standard deviation for x is $$\sqrt{\frac 1 n \sum{\left(x-\overline{x}\right)^2}}$$, with a similar expression for y. The correlation coefficient is the covariance divided by the product of the two standard deviations: this has the effect of cancelling out the units of both variables so that correlation is a pure number. That process is usually referred to as normalization, but unfortunately standardization is also used.

How does it ensure correlation is between -1 and 1? It comes from the Cauchy-Schwarz inequality, which says $$\left(\sum{ab}\right)^2 \le \left(\sum{a^2}\right)\left(\sum{b^2}\right).$$ If you divide both sides by the right-hand side you get $$\frac{\left(\sum{ab}\right)^2}{\left(\sum{a^2}\right)\left(\sum{b^2}\right)} \le 1,$$ and taking square roots gives the ##-1 \le r \le 1## part. Use ##x - \overline{x}## for ##a## and ##y - \overline y## for ##b##.
 
  • Like
Likes Agent Smith and FactChecker
  • #42
What is this?
[attached screenshot]
 
  • #43
  • Like
Likes Agent Smith