If I have a sample consisting of n measurements why is the sample variance the result of dividing by n-1 instead of n?
jf
Well, some texts/people use n, but the reason for using n-1 is to make the estimate unbiased. That is, you want the expected value of your estimate to equal the true population variance, and this requires using n-1. I'll leave the details to you...
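For anyone who would rather check the unbiasedness claim numerically before working through the algebra, here is a minimal simulation sketch. The population (normal, mean 0, variance 4), the sample size n = 5, the number of trials, and all variable names are assumptions chosen for illustration, not anything from the posts above. It averages the divide-by-n and divide-by-(n-1) estimates over many samples, and for comparison also divides by n around the true mean.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the run is repeatable
n, trials = 5, 200_000                  # assumed sample size and number of repeated samples
true_mean, true_var = 0.0, 4.0          # assumed population: normal, mean 0, variance 4

# Each row is one sample of n measurements.
x = rng.normal(true_mean, np.sqrt(true_var), size=(trials, n))

# Sum of squared deviations from each sample's own mean.
xbar = x.mean(axis=1, keepdims=True)
ss = ((x - xbar) ** 2).sum(axis=1)

# Average each estimator over all trials to approximate its expected value.
avg_div_n = (ss / n).mean()
avg_div_n_minus_1 = (ss / (n - 1)).mean()

# Deviations taken from the true mean instead of the sample mean, divided by n.
avg_known_mean = (((x - true_mean) ** 2).sum(axis=1) / n).mean()

print("divide by n, sample mean:  ", avg_div_n)
print("divide by n-1, sample mean:", avg_div_n_minus_1)
print("divide by n, true mean:    ", avg_known_mean)
```

With these assumed settings, the divide-by-n average should come out near (n-1)/n times the true variance (about 3.2), while the other two should land near 4. Making n larger shrinks the gap, which is why the choice matters mostly for small samples.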
maverick280857 said: Well, if you use (n-1), then the expectation of the sample variance so defined exactly equals the population variance.
Why is this? Or is division by n-1 just a better estimator than division by n in the finite case? If so, why?
It is degrees of freedom. Specifically, it's because you're already using the same data to estimate the mean. If you knew the population mean ahead of time and were only interested in the variance, then the unbiased estimator would indeed use a denominator of n. What's more common, however, is that you first need to estimate the mean and then use that estimate in your estimate of the variance. It's this cascaded estimation that throws off the variance estimator and requires the n-1 denominator. Intuitively speaking, introducing the mean estimate into the variance estimator eliminates one degree of freedom, because the mean estimate (which is just the sample average), together with any n-1 of the samples, uniquely determines the remaining sample.
I have had the same problem understanding this issue. Frequently, textbooks and online websites gloss over the issue with a pithy and unsatisfactory statement about degrees of freedom, leaving me to wonder whether the real explanation has anything to do with degrees of freedom at all.
Let's crank through it: