Tedjn said:
I have had the same problem understanding this issue. Frequently, textbooks and online websites gloss over the issue with a pithy and unsatisfactory statement about degrees of freedom, leaving me to wonder whether the real explanation has anything to do with degrees of freedom at all.
It is degrees of freedom. Specifically, it's because you're already using the same data to estimate the mean; if you knew the population mean ahead of time, and were only interested in the variance on its own, then the unbiased estimator would indeed use a denominator of n. What's more common, however, is that you need to first estimate the mean, and then use that estimate in your estimate of the variance. It's this cascaded method of estimation that throws off the variance estimator and requires the n-1 denominator. Intuitively speaking, introducing the mean estimate into the variance estimator eliminates one degree of freedom, because the mean estimate (which is just the sample average), together with any n-1 of the samples, uniquely determines the remaining sample.
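To make that last point concrete, here is a minimal numerical sketch (my own illustration, not from the original post; it assumes NumPy, and the sample size and distribution are arbitrary) showing that the sample mean together with any n-1 of the samples determines the remaining one:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=10)  # n = 10 i.i.d. samples
xbar = x.mean()                              # the sample-average estimate of the mean

# Given the sample mean and the first n - 1 samples, the last sample is forced:
# x_n = n * xbar - (x_1 + ... + x_{n-1})
x_last_recovered = len(x) * xbar - x[:-1].sum()
print(x[-1], x_last_recovered)  # identical up to floating-point rounding
```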
Tedjn said:
Why is this, or is division by n-1 just a better estimator than division by n in the finite case? If so, why?
Let's crank through it:
Assume we have n i.i.d. samples \left\{x_1,\ldots,x_n \right\} with mean \mu and variance \sigma^2. First, let's consider what would happen if we knew the true mean \mu and only wanted to estimate the variance:
E\left[ \sum_{i=1}^n (x_i - \mu)^2 \right] = \sum_{i=1}^n E\left[ (x_i - \mu)^2 \right] = n\sigma^2.
That is, we'd use an estimator with denominator n to get an unbiased estimate. So far, so good, right? Now, let's examine what happens if we don't know \mu and instead need to estimate it, using the usual sample-average estimator (which is unbiased):
E\left [ \sum_{i=1}^n\left( x_i - \frac{1}{n}\sum_{k=1}^n x_k \right)^2 \right ] = \sum_{i=1}^nE\left[ \left( x_i - \frac{1}{n}\sum_{k=1}^n x_k \right)^2 \right]
= \sum_{i=1}^n E \left[ x_i^2 - \frac{2}{n}x_i\sum_{k=1}^nx_k + \frac{1}{n^2}\left( \sum_{k=1}^n x_k \right)^2 \right]
= n\left( \sigma^2 + \mu^2 - \frac{2}{n}(\sigma^2 + n\mu^2) + \frac{1}{n^2}(n\sigma^2 + n^2\mu^2) \right) = (n-1)\sigma^2
So, we see that the terms arising from the mean estimator (which is itself a random variable) have the net effect of subtracting \sigma^2 from the sum, requiring a denominator of (n-1) for unbiasedness. I.e., it's as if you were estimating the variance with a known mean, but with only (n-1) data points.
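To see both expectations numerically, here is a rough Monte Carlo sketch (again my own check, assuming NumPy; the normal distribution, n = 5, \sigma = 2, and the trial count are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, trials = 3.0, 2.0, 5, 200_000

x = rng.normal(loc=mu, scale=sigma, size=(trials, n))

# Sum of squared deviations from the true mean vs. from the sample mean.
ss_true_mean   = ((x - mu) ** 2).sum(axis=1)
ss_sample_mean = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

print(ss_true_mean.mean() / sigma**2)    # ~ n     = 5
print(ss_sample_mean.mean() / sigma**2)  # ~ n - 1 = 4
```

So dividing the second sum by n-1 (rather than n) is exactly what restores unbiasedness.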
A more explicit way to demonstrate this is to write the mean estimate in terms of the true mean: \frac{1}{n}\sum_{i=1}^n x_i = \mu + \epsilon where E(\epsilon) = 0, Var(\epsilon) = \frac{\sigma^2}{n} and E(x_i \epsilon) = \frac{\sigma^2}{n} for all i. Then, the expectation of the sum in the variance estimator becomes:
\sum_{i=1}^nE\left[ (x_i - \mu - \epsilon)^2 \right] = \sum_{i=1}^n E\left[ (x_i - \mu)^2 - 2(x_i-\mu)\epsilon + \epsilon^2 \right] = n\left( \sigma^2 - 2\frac{\sigma^2}{n} + \frac{\sigma^2}{n} \right) = (n-1)\sigma^2.
Comparing that derivation with the first one (using the true mean, that is), it should be evident that the introduction of the "error term" (\epsilon) has had an effect equivalent to the elimination of a degree of freedom.
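The three properties of \epsilon used above (zero mean, variance \sigma^2/n, and E(x_i\epsilon) = \sigma^2/n) can also be checked the same way; here is a quick sketch along the lines of the previous one (my own illustration, with the same arbitrary parameters assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n, trials = 3.0, 2.0, 5, 500_000

x = rng.normal(loc=mu, scale=sigma, size=(trials, n))
eps = x.mean(axis=1) - mu        # the "error term": sample mean minus true mean

print(eps.mean())                # ~ 0
print(eps.var())                 # ~ sigma^2 / n = 0.8
print((x[:, 0] * eps).mean())    # ~ sigma^2 / n = 0.8  (E[x_i * eps], here i = 1)
```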