Confused about the intuitive explanation of degrees of freedom

kotreny
One common explanation of the concept of D.F. is this:

Suppose you have n numbers (a, b, c, ...) that make up a sample of a population. You want to estimate the variance of the population with the sample variance. But the sample mean m is being calculated from these numbers, so when determining the variance ((a-m)^2 + (b-m)^2 + (c-m)^2 + ...)/n, only n-1 numbers are free to vary. The n-th number must be chosen so that the mean of all n numbers comes out to m. Thus, there are only n-1 "degrees of freedom."
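
To make the quoted explanation concrete, here is a minimal Python sketch of the constraint it relies on. The three-number sample and the function names are made up purely for illustration. It only shows that the deviations from the sample mean sum to zero, so the last one can be recovered from the others.

```python
from fractions import Fraction

def residuals(sample):
    """Return each value's deviation from the sample mean m."""
    m = Fraction(sum(sample), len(sample))
    return [x - m for x in sample]

def last_residual(first_residuals):
    """Recover the n-th residual from the other n-1 via the sum-to-zero rule."""
    return -sum(first_residuals)

sample = [2, 4, 9]      # hypothetical data, chosen only for illustration
r = residuals(sample)   # deviations from the sample mean
print(sum(r))                   # the residuals add up to zero
print(last_residual(r[:-1]))    # matches r[-1]
```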

But wait--shouldn't m be free to vary in this case? The value of the n-th number is a function of the other numbers and m. Fair enough, but that means m must become the n-th degree of freedom!
 
I am not sure what your point is. However, in estimating the variance, the divisor of the sample variance is n-1 so that it is an unbiased estimate of the true variance.
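
As a rough check of the unbiasedness claim, the sketch below averages the variance estimate over every possible sample of size 2 drawn with replacement from a tiny population. The population {0, 1, 2}, the sample size, and the function name are assumptions made only for this example. Exact fractions are used so no rounding is involved.

```python
from fractions import Fraction
from itertools import product

def mean_of_variance_estimates(values, n, divisor):
    """Average the variance estimate over every equally likely sample of size n."""
    samples = list(product(values, repeat=n))  # sampling with replacement
    total = Fraction(0)
    for s in samples:
        m = Fraction(sum(s), n)                # sample mean
        total += sum((x - m) ** 2 for x in s) / divisor
    return total / len(samples)

values = [0, 1, 2]  # hypothetical population, each value equally likely
mu = Fraction(sum(values), len(values))
true_var = sum((x - mu) ** 2 for x in values) / len(values)

n = 2
unbiased = mean_of_variance_estimates(values, n, n - 1)  # divisor n-1
biased = mean_of_variance_estimates(values, n, n)        # divisor n
print(true_var, unbiased, biased)
```

In this small case the n-1 divisor averages out to the true variance exactly, while the n divisor comes out low by a factor of (n-1)/n.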
 
Sorry, I forgot to add that this is a common intuitive explanation for why the n-1 creates an unbiased sample variance. I take it it's a bad one? Regardless, n-1 is generally said to be the number of degrees of freedom in the case of n numbers whose residuals must sum to zero. Supposedly, only n-1 numbers are useful as information because they are free to vary. The nth number is completely determined by the previous n-1 numbers and the condition that all n residuals sum to zero. Sometimes the explanation describes the sample mean as the condition. My argument is that either of these additional conditions qualifies as a degree of freedom itself, making it n degrees of freedom no matter what.
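
Here is a small Python sketch of what I mean, with made-up numbers and a hypothetical helper name. The same n-1 free residuals produce a different sample for each choice of m, so reconstructing the n original numbers takes n inputs in total.

```python
def rebuild_sample(free_residuals, m):
    """Rebuild all n values from n-1 residuals plus the sample mean m."""
    # The n-th residual is forced by the sum-to-zero condition.
    all_residuals = list(free_residuals) + [-sum(free_residuals)]
    return [m + r for r in all_residuals]

free = [-3, -1]                  # hypothetical n-1 residuals (n = 3)
print(rebuild_sample(free, 5))   # one sample with these residuals
print(rebuild_sample(free, 10))  # a different sample, same residuals
```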

Here is a small sample of links with the D.F. explanation I am questioning. Either all are wrong (not likely), I misinterpreted them, or my own reasoning is naive. Please, clear up the situation for me if you can.

http://en.wikipedia.org/wiki/Degrees_of_freedom_(statistics)#Linear_regression

http://www.tufts.edu/~gdallal/dof.htm

http://arnoldkling.com/apstats/df.html

 
As a mathematician specializing in probability theory (not statistics), I have not worked with the concept of degrees of freedom. However, the justification for n-1 comes directly from computing the expected value of the sample variance: for that expected value to equal the true variance, the divisor must be n-1.
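
For reference, that calculation can be sketched as follows. It assumes the observations X_1, ..., X_n are independent with common mean \mu and variance \sigma^2 (symbols introduced here, not used elsewhere in the thread), and writes m for the sample mean as in the original post. It relies on the standard fact that the variance of m is \sigma^2/n.

```latex
\begin{align*}
\sum_{i=1}^{n} (X_i - m)^2
  &= \sum_{i=1}^{n} (X_i - \mu)^2 - n\,(m - \mu)^2, \\
E\!\left[\sum_{i=1}^{n} (X_i - m)^2\right]
  &= n\sigma^2 - n \cdot \frac{\sigma^2}{n} = (n-1)\,\sigma^2, \\
E\!\left[\frac{1}{n-1}\sum_{i=1}^{n} (X_i - m)^2\right]
  &= \sigma^2 .
\end{align*}
```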
 
Thank you for replying anyway. I am familiar with the proof you speak of, but some people have said that the n-1 "makes sense because it is the number of degrees of freedom." I rather doubt this claim; in fact, as I said twice, I doubt the entire claim that n-1 is even the number of D.F. to begin with.
 
I, too, have been struggling with this concept. I don't think degrees of freedom really work in an intuitive manner, so I'm just settling for using n-1 in the sample variance to make it an unbiased estimator.
 
Hi mezza8, thanks for the input and welcome to the forums. Even if we discard completely the D.F. connection to the sample variance, D.F. is still an important concept in statistics. It is applied in the chi-square test for example. A lot of people say that degrees of freedom is an intuitive concept, and make the questionable argument seen in my links and discussed above. (Check the YouTube one for a particularly clear demonstration of this dubious reasoning. If the link doesn't work for any reason, the uploader's name is jdeisenberg. You can search that with "degrees of freedom.") I hope I have made clear why I think this argument is false. When using an estimated parameter to justify removing a D.F., the parameter itself becomes the so-called removed D.F.
 