Quadratic form and degrees of freedom fixed title 
#1
Jan15-13, 01:20 PM

P: 25

My question is: Is there any gain in intuitive mathematical understanding of degrees of freedom from learning their expression using the 'quadratic form' and matrix algebra techniques?
This sort of explanation is at least understandable and self-consistent, if not mathematically rigorous: "Degrees of freedom are a way of keeping score. A data set contains a number of observations, say, n. They constitute n individual pieces of information. These pieces of information can be used to estimate either parameters or variability. In general, each item being estimated costs one degree of freedom. The remaining degrees of freedom are used to estimate variability. All we have to do is count properly." However, I don't like that I am doing relatively simple mathematical operations without understanding their mathematical justification. If I learned enough matrix algebra to understand the 'quadratic form' sense of d.f., would it make more sense to me why we use a denominator of, e.g., n-1 for estimating average variance, or would I just know a more complicated way of deriving degrees of freedom? (I have very limited exposure to matrix algebra, and it didn't seem that intuitive, i.e. it was hard to translate into non-matrix terms, but maybe that would change if I studied it more seriously and at least got used to its rules.) Thanks in advance!


#2
Jan16-13, 11:20 AM

Sci Advisor
P: 3,297

The general idea is that if you have a set S of things that can be defined by a description that uses n variables, and each arbitrary choice of values for the n variables defines a distinct element of S, then you can claim that S has n degrees of freedom.
In the typical applied math situation, you have a set S defined by M variables and K constraints on the variables. (For example, an element of S might be defined by M real numbers x1, x2, ..., xM subject to the constraints x1 + x2 + ... + xM = 1 and (x1)^2 + (x2)^2 + ... + (xM)^2 = 7.) So you don't have M degrees of freedom, because you can't assign the x's an arbitrary set of M values. The way things usually work out is that the number of degrees of freedom is M minus the number of constraints. This is not a foolproof rule, because it won't work if some of the constraints are "dependent" on each other. (This brings up the further problem of how to define "dependent", since we aren't necessarily talking about constraints as vectors.) Most useful formulas involving degrees of freedom give us the convenience of not having to rewrite the set S explicitly in terms of a new set of M-K variables. We just use the values of the original set of M variables and fix the answer by some adjustment involving K.
In a typical statistics problem, each element in the set S is a single real number that is the value of some function. Often the function is a statistic, which by definition is a function of the values in a sample. I think you can find a mathematical justification for the formulas you encounter that use "degrees of freedom". It certainly might involve matrix algebra and quadratic forms if it is a statistics formula. But I don't think you'll find a single mathematical theory that justifies all the "degrees of freedom" formulas that pop up. Explain which particular "degrees of freedom" formula you want to justify.
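The counting rule described above (M variables minus the number of constraints, with a caveat for dependent constraints) can be sketched numerically. This is only an illustration under an assumption that is not in the post: the constraints are differentiable, and "independent" is taken to mean that their gradients are linearly independent at the point being examined. The helper name `local_dof` and the sample point are hypothetical.

```python
import numpy as np

def local_dof(jacobian):
    """M minus the number of independent constraints at one point.

    `jacobian` has one row per constraint (its gradient at the point)
    and one column per variable. Independence is judged by matrix rank,
    which is an assumption of this sketch.
    """
    jacobian = np.atleast_2d(np.asarray(jacobian, dtype=float))
    num_variables = jacobian.shape[1]
    return num_variables - np.linalg.matrix_rank(jacobian)

# A point with M = 4 coordinates that satisfies both example constraints:
# x1 + x2 + x3 + x4 = 1 and x1^2 + x2^2 + x3^2 + x4^2 = 7.
x = np.array([2.0, -1.0, 1.0, -1.0])

# Gradients of the two constraints at x: (1, 1, 1, 1) and 2 * x.
independent = np.vstack([np.ones(4), 2 * x])

# Two constraints that say the same thing: sum = 1 and 2 * sum = 2.
dependent = np.vstack([np.ones(4), 2 * np.ones(4)])

dof_independent = local_dof(independent)
dof_dependent = local_dof(dependent)
print(dof_independent, dof_dependent)
```

In the first case both constraints count, so the result is M minus 2. In the second case the naive rule "M minus K" would overcount, because the two constraints remove only one direction of freedom.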


#3
Jan16-13, 11:21 PM

P: 25

Very quickly, it would be great if I could see how, with a three-point sample of items a, b, and c, you have to divide the total variance into n-1 portions instead of n when you calculate the sample variance. I am reading H. W. Walker's frequently cited 1940 paper on d.f. (Journal of Educational Psychology, 31(4) (1940) 253-269), which explains that degrees of freedom arise from the dimensions of your sample minus the number of constraints you have placed on your data:
"Consider now a point (x, y, z) in three-dimensional space (N = 3). If no restrictions are placed on its coördinates, it can move with freedom in each of three directions, has three degrees of freedom. All three variables are independent. If we set up the restriction x + y + z = c , where c is any constant, only two of the numbers can be freely chosen, only two are independent observations. For example, let x − y − z = 10 . If now we choose, say, x = 7 and y = 9 , then z is forced to be − 12 . The equation x − y − z = c is the equation of a plane, a two-dimensional space cutting across the original three dimensional space, and a point lying on this space has two degrees of freedom. N − r = 3 − 1 = 2. If the coördinates of the (x, y, z) point are made to conform to the condition x^2 + y^2 + z^2 = k , the point will be forced to lie on the surface of a sphere whose center is at the origin and whose radius is √k. The surface of a sphere is a two dimensional space. (N = 3, r = 1, N − r = 3 − 1 = 2.)"
It would help me, I think, if the above paragraph were written explicitly from the point of view of, say, having 3 data points, showing how the constraints emerge for those three data points and reduce the d.f. when you calculate the sample variance after calculating the sample mean. For example, I see that x^2 + y^2 + z^2 = k means you have restricted the degrees of freedom to 3-1 for any k, but in that example x, y, z are sample points, not residuals, right? So that's not directly analogous to calculating variance. Thanks a lot for your response; it already gives me a lot to think about.

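One possible way to restate Walker's picture for three data points is sketched below, under assumptions that are not in the thread: the sample values are hypothetical, and the "quadratic form" is taken to be x^T C x with the centering matrix C = I - (1/n) 11^T. Once the sample mean is subtracted, the residuals always satisfy the linear constraint r1 + r2 + r3 = 0, so they lie on a plane through the origin, and the sum of squared residuals equals the quadratic form.

```python
import numpy as np

n = 3
x = np.array([2.0, 5.0, 11.0])  # hypothetical sample values a, b, c

# Centering matrix: multiplying by C subtracts the sample mean from each entry.
C = np.eye(n) - np.ones((n, n)) / n

# Residuals from the sample mean; they always sum to zero (one constraint).
residuals = C @ x

# Sum of squared residuals, computed directly and as the quadratic form x^T C x.
sum_sq_direct = np.sum((x - x.mean()) ** 2)
sum_sq_quadratic_form = x @ C @ x

# Rank and trace of C both count the directions the residual vector can move in.
rank_C = np.linalg.matrix_rank(C)
trace_C = np.trace(C)

print(residuals, residuals.sum())
print(sum_sq_direct, sum_sq_quadratic_form)
print(rank_C, trace_C)
```

The rank of C is n-1 rather than n, which is one reading of "a degree of freedom is lost when the mean is estimated": the residual vector is confined to an (n-1)-dimensional plane, whatever the original sample was.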

#4
Jan17-13, 12:36 AM

Sci Advisor
P: 3,297

The first thing to straighten out is the distinction between "the sample variance" and "an estimator of the population variance".
"The sample variance" is actually an ambiguous phrase. Some books define it with the formula [itex] \sum_{i=1}^n \frac{ (X_i - \bar{X})^2} {n} [/itex] (e.g. http://mathworld.wolfram.com/SampleVariance.html) and some books define it to be [itex] \sum_{i=1}^n \frac{ (X_i - \bar{X})^2} {n-1} [/itex], where the n independent sample values are the [itex] X_i [/itex] and [itex] \bar{X} [/itex] is the sample mean.
An "estimator" is a function of the sample values whose purpose is to estimate some parameter of the population. Since an estimator is a function of the random values in the sample, the estimator itself is a random variable. An "unbiased estimator" is an estimator whose expected value is exactly equal to the population parameter it is intended to estimate. One "unbiased estimator" for the population variance is given by [itex] \sum_{i=1}^n \frac{ (X_i - \bar{X})^2} {n-1} [/itex]. To prove it is unbiased, you must prove that the expected value of this estimator is the population variance. The estimator [itex] \sum_{i=1}^n \frac{ (X_i - \bar{X})^2} {n} [/itex] is not an unbiased estimator of the population variance.
So you shouldn't look for any mathematical argument that "proves" the formula for the sample variance. Its formula is simply a matter of convention. You may look for a mathematical argument proving that a formula is an unbiased estimator of the population variance. Mathematically correct arguments about estimators can be complicated and nonintuitive. For example, the estimator [itex] \sum_{i=1}^n \frac{ (X_i - \bar{X})^2} {n-1} [/itex] is an unbiased estimator of the population variance, but the estimator [itex] \sqrt{ \sum_{i=1}^n \frac{ (X_i - \bar{X})^2} {n-1}} [/itex] is not necessarily an unbiased estimator of the population standard deviation.

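The claims about bias can be checked exactly on a small example. This is a sketch, not a proof, and it assumes a hypothetical population (a fair six-sided die) and independent samples of size n = 3. Because each of the 6^3 possible samples is equally likely, the expected value of each estimator can be computed by enumeration rather than by simulation.

```python
from fractions import Fraction
from itertools import product
import math

faces = [1, 2, 3, 4, 5, 6]  # hypothetical population: a fair die
n = 3                       # sample size

# Population mean and population variance, as exact fractions.
pop_mean = Fraction(sum(faces), len(faces))
pop_var = sum((Fraction(f) - pop_mean) ** 2 for f in faces) / len(faces)

# Every possible sample of size n; all are equally likely.
samples = list(product(faces, repeat=n))

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean."""
    mean = Fraction(sum(sample), len(sample))
    return sum((Fraction(v) - mean) ** 2 for v in sample)

# Expected value of each estimator = average over all equally likely samples.
mean_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
mean_div_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)
mean_sqrt = sum(math.sqrt(sum_sq_dev(s) / (n - 1)) for s in samples) / len(samples)

print(pop_var, mean_div_n_minus_1, mean_div_n)
print(math.sqrt(pop_var), mean_sqrt)
```

For this population the n-1 version averages to exactly the population variance, the n version averages to (n-1)/n times it, and the square root of the n-1 version averages to less than the population standard deviation.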

#5
Jan17-13, 12:43 AM

Sci Advisor
P: 3,297

At the moment Google thinks the PDF http://www.google.com/url?sa=t&rct=j...,d.aWM&cad=rja is the best proof that the estimator with denominator n-1 is unbiased. I haven't read this proof carefully, but at least it gives you an idea of what a mathematical proof would involve.



#6
Jan17-13, 12:59 AM

P: 25

So there is no way to take the language from Walker's paper and explicitly make it about how calculating an unbiased estimate of the population variance causes you to divide by n-1 instead of n? What I'm getting is that there are proofs that n-1 works, which is fine, and there is this sense that degrees of freedom are about loss of dimensionality when estimating (?), but I don't see how the latter sense arises mathematically.



#7
Jan17-13, 02:19 AM

Sci Advisor
P: 3,297




#8
Jan17-13, 12:34 PM

P: 25




#9
Jan17-13, 05:27 PM

Sci Advisor
P: 3,297

As far as I can see, her discussion of the figure does not lead to a proof of any of the formulas that involve the degrees of freedom. I think her exposition would have been clearer if she had reconciled her algebra with the geometry of the picture by setting N = 3 and mu = 0. A complicated situation is often broken up into simpler cases. She is considering a case where all the samples (of size 3) have the same mean. Within such a case, the possible samples are points that lie on the shaded triangle, which is a 2D figure. The geometry shows that for the sample S, the sum of the squares of the deviations from the population mean is the square of length OS. The sum of the squares of the deviations from the sample mean is the square of length AS. I don't see how this leads to a specific formula for an unbiased estimator of the variance. It simply illustrates that the two ways of calculating the squares of the deviations are different.
If there is a proof of the formula for the unbiased estimator of the variance based on geometry and reducing degrees of freedom, my guess is that it would involve conditional expectation. The expected value of a random variable can be computed by dividing the possible outcomes up into mutually exclusive sets of outcomes. In the case of a function of 3 sample values, the 3D solid of possible outcomes can be divided up into "layers", each of which is a 2D figure. We compute a weighted mean of the variable on each layer and add up the results. To compute the mean value of the variable X on the layer L, compute the mean value of X using the conditional probability density that restricts X to be in L. Then we multiply this mean value by the probability that X is in L. But this general pattern of proof doesn't produce any universal rule that says "always use n-1 instead of n".
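The layering pattern described above can be illustrated with an exact enumeration. This sketch assumes a hypothetical fair-die population and samples of size 3, takes the "layers" to be the sets of samples sharing the same sum k, and uses the sum of squared deviations from the sample mean divided by n-1 as the variable being averaged. None of these choices come from the thread.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

faces = [1, 2, 3, 4, 5, 6]  # hypothetical population: a fair die
n = 3                       # sample size

def estimator(sample):
    """Sum of squared deviations from the sample mean, divided by n-1."""
    mean = Fraction(sum(sample), len(sample))
    return sum((Fraction(v) - mean) ** 2 for v in sample) / (len(sample) - 1)

# Group the equally likely samples into layers by their sum k.
layers = defaultdict(list)
for sample in product(faces, repeat=n):
    layers[sum(sample)].append(sample)

total = len(faces) ** n
overall_mean = Fraction(0)
for k, layer in sorted(layers.items()):
    prob_layer = Fraction(len(layer), total)                     # P(sample is in layer k)
    mean_on_layer = sum(estimator(s) for s in layer) / len(layer)  # conditional mean
    overall_mean += prob_layer * mean_on_layer

# The same expected value computed without layering, for comparison.
direct_mean = sum(estimator(s) for s in product(faces, repeat=n)) / total

print(overall_mean, direct_mean)
```

The two numbers agree, which is all the layering argument guarantees. As the post says, the pattern by itself does not explain why the divisor should be n-1.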


#10
Jan17-13, 11:41 PM

P: 25




#11
Jan18-13, 05:23 AM

Sci Advisor
P: 3,297

Instead of an x, y, z axis system, she has an X1, X2, X3 axis system. A point (a,b,c) represents a sample of size 3 (three realizations of the same random variable). If we ask for the surface where X1+X2+X3 = k, then we get a plane [itex] P_k [/itex] that includes the points (0,0,k), (0,k,0), (k,0,0). Those points are where the shaded triangle hits the 3 axes. The shaded triangle is the part of the plane where each sample value is nonnegative. (There is no requirement that each sample value be nonnegative; it just makes it easier to visualize the plane if you only show the shaded triangle.) Everywhere on the plane [itex] P_k [/itex], the sample mean is (X1+X2+X3)/3 = k/3.
The point A represents the sample where all 3 sample values are equal to each other. This is the sample (k/3, k/3, k/3). She is assuming the origin [itex] O [/itex] = (0,0,0) represents a sample where each realization of the random variable was exactly equal to the mean of the population. So the picture assumes the mean of the population is 0. The point S represents a sample where the sum of the sample values is k, but the sample values are not equal to each other. The sum of the squared deviations of the sample values in S from the population mean (0,0,0) is the square of the length of line segment OS. This follows from the distance formula for the distance between two points in 3 dimensions. The sum of the squared deviations of the sample values in S from the sample mean is the square of the length of line segment AS, again by the distance formula. Using some convoluted language, she says (I think) that the ratio OA/OS is the value of the t-statistic for the sample represented by S. As far as I can see, she hasn't proved any theorem. She didn't give any particular reason for only considering samples where the values sum to k. My reaction is "Why the heck is this an often-cited paper?". Perhaps some other forum member has more insight.
To illustrate my remarks about conditional expectation: if you look at the planes [itex] P_k [/itex] and let k range over all possible values, these planes include all possible 3-value samples. If you know a formula for the mean of some estimator on the plane [itex] P_k [/itex], then you can find the mean value of the estimator (over all samples) by "averaging up" its mean value over all possible [itex] P_k [/itex].
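The distance claims about O, A and S can be checked numerically. This is a minimal sketch assuming a hypothetical sample S = (1, 4, 7) and, as in the picture, a population mean of 0. The Pythagorean relation checked at the end is not stated in the thread; it holds because the segment AS lies in the plane and OA is perpendicular to that plane.

```python
import numpy as np

S = np.array([1.0, 4.0, 7.0])  # hypothetical sample; its sum is k
k = S.sum()

O = np.zeros(3)          # origin: every value equals the population mean 0
A = np.full(3, k / 3)    # point on the plane where all three values are equal

# Squared lengths of the three segments via the distance formula.
OS_sq = np.sum((S - O) ** 2)
AS_sq = np.sum((S - A) ** 2)
OA_sq = np.sum((A - O) ** 2)

# The same quantities written as sums of squared deviations.
dev_from_pop_mean = np.sum((S - 0.0) ** 2)
dev_from_sample_mean = np.sum((S - S.mean()) ** 2)

# OA is perpendicular to AS, so the dot product is zero.
dot_OA_AS = np.dot(A - O, S - A)

print(OS_sq, AS_sq, OA_sq, dot_OA_AS)
```

With these numbers OS^2 equals OA^2 + AS^2, so the deviations from the population mean split into a part along the equal-values line and a part inside the plane. Only the part inside the plane, which is 2-dimensional, enters the sum of squared deviations from the sample mean.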

