Intuitively, shouldn't variance always be 0?

  • Thread starter samh
In summary, the conversation discusses how variance is calculated and why it is not always zero. The formula for variance is Var[X] = E[(X-E[X])^2]. Although the expected value of X is indeed the mean, the variance is the average of the squared differences between observations and the mean, not of the signed differences themselves. Squaring makes every nonzero deviation positive, so the average is positive for any non-constant random variable. The conversation concludes with a better understanding of why variance should not always be zero.
  • #1
samh
Ok I know that Var[X] = E[(X-E[X])^2]. But I just can't help but think that the variance should always be zero. I think it makes so much sense, but obviously the formula says otherwise... But look, my reasoning seems so perfect:

1) The variance is the expected difference from the mean, squared

2) The expected value of X is the mean

3) So shouldn't we always expect X to be the mean, and so (X - mean)^2 = 0^2 = 0?

But obviously it doesn't work out that way...it's so weird. Does anyone know an intuitive reason why Var[X] shouldn't be zero? What's wrong with my logic?
 
  • #2
samh said:
1) The variance is the expected difference from the mean, squared.

What's wrong with my logic?
The first item is what is wrong with your logic. The expected difference from the mean is indeed tautologically zero, but that is not what the variance is. The variance is the expected value of the square of the difference between the random variable and its mean.

What does this mean? The difference between a random variable and its mean is identically zero only for a constant random variable. In a way it is a bit silly to even talk about a constant random variable, because there is no randomness: the cumulative distribution function (CDF) of a constant is a step function. Not a very interesting random variable.

Now consider drawing a set of sample values from a random process whose underlying CDF is a smooth function. Some of these sample values will fall above the mean, some below it. While the difference between the ith sampled value and the mean may be positive or negative, the square of that difference is never negative, and it is strictly positive whenever the sample differs from the mean. The average of a bunch of positive values is positive, and it is this average that forms the variance.
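The argument above is easy to check numerically. A minimal sketch in Python (the normal distribution, the seed, and the sample size are arbitrary choices for illustration): draw samples from a smooth distribution, then compare the average signed deviation with the average squared deviation.

```python
import random

random.seed(0)
# Hypothetical data: 100,000 draws from a normal distribution
# with mean 5 and standard deviation 2.
samples = [random.gauss(5.0, 2.0) for _ in range(100_000)]

mean = sum(samples) / len(samples)

# Average signed deviation: positives and negatives cancel, so this is
# zero up to floating-point rounding.
signed = sum(x - mean for x in samples) / len(samples)

# Average squared deviation: every term is nonnegative, so this is
# positive -- and close to the true variance, 2^2 = 4.
squared = sum((x - mean) ** 2 for x in samples) / len(samples)

print(signed)
print(squared)
```

The signed deviations always average to zero by construction of the mean; only the squared deviations carry information about spread.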
 
  • #3
Variance is the average of the squared differences of the observed values from the mean, and its size depends on the underlying statistical distribution. In other words, it is a measure of deviation from the average.
The squaring is what makes all the difference: when summing and averaging, we square each difference so that only nonnegative quantities are added. Had we kept the negative sign for values below the mean, the sum would cancel to 0, exactly as you said.
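A tiny worked example of the cancellation described above, using a made-up dataset:

```python
data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
mean = sum(data) / len(data)     # 5.0

# Signed deviations cancel exactly: -3 -1 -1 -1 +0 +0 +2 +4 = 0.
deviations = [x - mean for x in data]
print(sum(deviations))           # 0.0

# Squared deviations do not cancel: (9+1+1+1+0+0+4+16) / 8 = 4.0.
variance = sum(d ** 2 for d in deviations) / len(data)
print(variance)                  # 4.0
```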
 
  • #4
Ok I see. I forgot that E[...] is kind of like the average of many repetitions. I see what you guys are saying, it makes sense that the average of a bunch of positive numbers is positive. Thanks :)
 

1. Why is variance 0 when all the data points are the same?

According to the formula for calculating variance, the measure is based on the deviation of each data point from the mean. When all the data points are the same, there is no deviation from the mean, resulting in a variance of 0.
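For a concrete check of this answer (the dataset is an arbitrary constant list):

```python
# Every data point equals the mean, so every deviation is zero
# and the variance is exactly 0.
data = [7, 7, 7, 7]
mean = sum(data) / len(data)
variance = sum((x - mean) ** 2 for x in data) / len(data)
print(variance)  # 0.0
```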

2. Can variance ever be negative?

No, variance cannot be negative. It is always a non-negative value because it is calculated by squaring the differences between each data point and the mean.

3. Is a lower variance always better?

Not necessarily. The concept of a "better" or "worse" variance depends on the context and the purpose of the analysis. In some cases, a lower variance may be preferred because it indicates less variability in the data. However, in other cases, a higher variance may be desired to capture a wider range of values.

4. How does changing the units of measurement affect the variance?

Changing the units of measurement does affect the variance. Because variance is calculated from squared differences between the data points and the mean, it carries squared units and scales with the square of the conversion factor: if every data point is multiplied by a constant a, the variance is multiplied by a². (The standard deviation, the square root of the variance, scales linearly with a.)
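To see the squared scaling, convert a hypothetical dataset from metres to centimetres; `variance` here is a helper defined just for this sketch:

```python
def variance(xs):
    """Population variance: mean of squared deviations from the mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

heights_m = [1.5, 1.6, 1.7, 1.8]          # hypothetical heights in metres
heights_cm = [100 * h for h in heights_m]  # same heights in centimetres

print(variance(heights_m))   # ~0.0125 (square metres)
print(variance(heights_cm))  # 125.0 (square centimetres) = 100**2 * 0.0125
```

Multiplying the data by 100 multiplies the variance by 100² = 10,000, confirming the a² scaling.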

5. Can the variance of a dataset be greater than the range of the data?

Yes, it is possible for the variance to exceed the range numerically, though the comparison mixes units: the range is in the original units, while the variance is in squared units. The range only considers the difference between the maximum and minimum values, whereas the variance averages the squared deviations of every data point from the mean; for any dataset the variance is at most (range)²/4, which exceeds the range itself whenever the range is greater than 4. For example, the dataset {0, 10} has range 10 but variance 25.
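The two-point example from the answer above, checked directly:

```python
data = [0, 10]  # the example dataset {0, 10}
mean = sum(data) / len(data)                               # 5.0
variance = sum((x - mean) ** 2 for x in data) / len(data)  # 25.0
data_range = max(data) - min(data)                         # 10

print(variance > data_range)  # True: 25 > 10
```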
