# Confusion on application of definition of degrees of freedom

Gold Member
I am confused about the counting of degrees of freedom. Yes, I know that it is the number of vectors which are free to vary. But that definition gives way to different interpretations:
(1) the number of data points minus the number of independent variables. This seems to be the basis of the standard "n-1" or "n-2" in many applications.
(2) just the number of independent variables. This seems to be the basis in applications with 1 degree of freedom (example below), or when one says that the movement of a robot arm has 6 degrees of freedom, being +x,+y,+z,-x,-y,-z. [In this latter example, I am puzzled why, say (2,0,0) is considered the same as (1,0,0) for the purposes of counting, but they are considered distinct from (-1, 0, 0). Both (2,0,0) and (-1,0,0) are just λ(1,0,0).]

So, for example, reading a psychology paper whose statistics appear to me dubious, I came across the following set of data, in which the authors make a correlation between female first names and places of residence:

Milwaukee: women named Mildred = 865, expected value = 806
Virginia Beach: women named Mildred = 230, expected value = 289
Milwaukee: women named Virginia = 544, expected value = 603
Virginia Beach: women named Virginia = 275, expected value = 216

[I am not making this up. Ig Nobel Prizes, take note: "Why Susie Sells Seashells by the Seashore: Implicit Egotism and Major Life Decisions" by Pelham, B., Mirenberg, M., and Jones, J.; Journal of Personality and Social Psychology 2002, Vol. 82, No. 4, 469-487]

The authors then state (p. 471) that the "association between name and place of residence for women was highly significant, $\chi^2(1) = 38.25$, $p < .001$." Apart from other questions about the validity of this study, my question is whether the df = 1 here is justified. This would seem to be the number-of-independent-variables interpretation, ignoring the number of data points.

So, three questions: whether (1) or (2) above is correct (and why the other interpretation exists), why +x and -x are counted separately for a robot arm, and whether the psychology paper is fudging the df count.
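For concreteness, the 2x2 table quoted above can be checked by hand. A sketch in plain Python, using the observed counts from the paper; the expected values are recomputed from the row and column margins rather than taken on faith:

```python
# Chi-square test of independence on the 2x2 name/city table quoted above.
observed = [
    [865, 544],  # Milwaukee: Mildred, Virginia
    [230, 275],  # Virginia Beach: Mildred, Virginia
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count in each cell under the independence hypothesis.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(2)
    for j in range(2)
)

# For a contingency table, df = (rows - 1) * (cols - 1), not n - 1.
df = (2 - 1) * (2 - 1)

print(f"chi2 = {chi2:.2f}, df = {df}")  # prints: chi2 = 38.13, df = 1
```

The recomputed expected values (806.09, 602.91, 288.91, 216.09) match the paper's, and the statistic comes out near the reported 38.25 (the small gap presumably reflects rounding in the quoted expected values), with df = 1 rather than anything like sample size minus one.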

Simon Bridge
Homework Helper
Look up "chi-square distribution" for the usage of the phrase "degrees of freedom" in this context.

Gold Member
Simon Bridge, thanks for the answer, but of course I had looked it up before posting my question; the fact that this did not give me a clear answer led to my confusion and my question. Here's what I came up with [(1) and (2) refer to the two interpretations in my original post]:

Stat Trek says "... a random sample of size n from a normal population ... v = n - 1 is the number of degrees of freedom...."
and further
"The number of degrees of freedom generally refers to the number of independent observations in a sample minus the number of population parameters that must be estimated from sample data. "
And in an example
"Therefore, [in this example] the number of degrees of freedom is equal to the sample size minus one."
all of which implies (1).

Wikipedia
the parameter corresponds to the degrees of freedom of an underlying random vector, as in the preceding ANOVA example. Another simple example is: if Xi; i =1,...n are independent normal (μ,σ2) random variables, the [chi-squared] statistic ...follows a chi-squared distribution with n−1 degrees of freedom.
Sounds like (1)

However:
Wolfram mathworld:
"The number of degrees of freedom in a problem, distribution, etc., is the number of parameters which may be independently varied."
sounds vaguely like (2)

Khan Academy seems to imply (2).

The psychology example I presented seems to imply (2).

Therefore I am asking Forum contributors to judge whether the use in the psychology example was valid, which would help me decide.

Stephen Tashi
There is no universal definition of "degrees of freedom" that applies across all technical fields. I think the "spirit" of the notion is universal in that it is supposed to mean how many variables can be varied independently.

In robotics, the degrees of freedom of a robot arm (according to the Wikipedia article) is the number of rotating joints. It is possible to have a joint with a non-reversible motor that spins in only one direction. That might explain why +x and -x count as different "degrees". It wouldn't make sense to analyze robot arms only in terms of the dimensions of the 3D manifold that the tip of the arm can travel, since you have to worry about other parts of the arm bumping into things. Two very different postures of the arm might put the tip at the same location in 3D space. You can't actually vary the rotations of all the joints independently in some types of arms, since the arm might bump into itself.

In statistics, "degrees of freedom" appears in behind-the-scenes theoretical calculations that are done to prove that a particular estimator or statistic has "the formula in the book". Such calculations involve doing multiple integrals. When you do a multiple integral, you can view it as integrating a function over some subset of N-dimensional space. If there are no constraints on the variables, you integrate over all of an N-dimensional space. If there are K > 0 "independent" constraints, then you integrate over some proper subset of N-dimensional space (for example, an N-dimensional sphere). The degrees of freedom counts how many independent variables are involved in the integration. This vague description can be made more specific by considering specific statistics.

Applying statistics to practical problems is a subjective matter. Hypothesis testing for "significance" is simply a procedure. It isn't a proof of anything, and it doesn't quantify the probability that a given hypothesis is correct. It does involve quantifying the probability of the data on the assumption of a given hypothesis. What is the hypothesis that is of interest? There is a distinction between the hypothesis "People who live in Virginia have the same likelihood of being named 'Virginia' as people who live in any other state" and the compound hypothesis "People who live in Milwaukee have the same probability of being named 'Mildred' as people who live in any other state, and people who live in Virginia have the same probability of being named 'Virginia' as people who live in any other state".

Gold Member
Many thanks for the answer, Stephen Tashi. I definitely like the definition as the number of variables over which you would have to integrate; I then do not see when one would ever use data points (unless they were one per independent variable). Your answer on the robotics issue makes perfect sense. The psychology article I cited is trying to handle the proposition "people are disproportionately likely to live in places whose names resemble their own first or last names" (taken from the abstract). As I mentioned, there are plenty of reasons to question the validity of the article's treatment, but I am concentrating on the statistical part, and I find the p-value oddly tiny, i.e., the chi-squared statistic suspiciously high. So my first suspicion fell on the small df=1, but it appears to me that you are saying that this, at least, is correct. Am I reading you wrong?

ssd
Can you put forward the data?

Stephen Tashi
I then do not see when one would ever use data points (unless they were one per independent variable)
In statistics, a "statistic" is not a single number. A "statistic" is a function of the data values. Since the data values are random variables, a statistic is also a random variable. A typical use of a statistic is as an "estimator" of an unknown parameter, so you can think of a typical statistic as a formula whose variables are the values in a sample. Properties such as the mean value of a statistic are computed by an integration. For example, if you have two independent sample values $x_1$ and $x_2$ from the same probability density function $f(x)$, the usual estimator for the mean of the distribution is $\frac{x_1 + x_2}{2}$. To prove that the average value of this estimator is exactly equal to the actual mean of the distribution, you must prove $\int x f(x)\, dx = \int \int \frac{x_1 + x_2}{2} f(x_1) f(x_2)\, dx_1\, dx_2$. This shows how the number of data values in a sample does affect the number of variables involved in the integrations.
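That double-integral identity can be checked numerically. A sketch in plain Python; the normal distribution and the parameters mu and sigma here are just an arbitrary choice of $f(x)$ for illustration:

```python
import random

# Monte Carlo check that the estimator (x1 + x2)/2 has average value equal
# to the true mean of f(x), i.e. that
#   integral integral (x1 + x2)/2 f(x1) f(x2) dx1 dx2  =  integral x f(x) dx.

random.seed(0)
mu, sigma = 5.0, 1.0        # arbitrary choice: f(x) is Normal(mu, sigma^2)
trials = 100_000

total = 0.0
for _ in range(trials):
    x1 = random.gauss(mu, sigma)  # two independent draws from f(x)
    x2 = random.gauss(mu, sigma)
    total += (x1 + x2) / 2        # the estimator of the mean

estimate = total / trials
print(estimate)  # very close to mu = 5.0
```

The Monte Carlo average stands in for the two-variable integral: each term samples the integrand $(x_1 + x_2)/2$ with weight $f(x_1) f(x_2)$, which is exactly the two-dimensional integration the degrees-of-freedom count refers to.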

So my first suspicion fell on the small df=1, but it appears to me that you are saying that this, at least, is correct. Am I reading you wrong?
If you are only interested in a hypothesis about whether the particular name "Virginia" is equally likely to be the name of a person in the state of Virginia as anywhere else, then df = 1 looks OK to me. The general question of whether some names might be more likely to be chosen in particular states is more complicated. For example, you can imagine an unethical researcher going through lists of names and "cherry-picking" some that occurred more often in one state than another, and only publishing the chi-square df=1 results for those names. If thousands of names were randomly assigned independently of states, then just by chance there might be a few that were more frequent in particular states.

Simon Bridge
Homework Helper
Aside to what Stephen says:
Regarding post #2: Wolfram and Khan Academy appear to be talking in general terms, while the other references are talking specifically about the chi-squared distribution. That seems to be why you are seeing two possibilities: there are two situations generating two different meanings.

ImaLooser
I am confused about the counting of degrees of freedom. Yes, I know that it is the number of vectors which are free to vary. But that definition gives way to different interpretations:
(1) the number of data points minus the number of independent variables. This seems to be the basis of the standard "n-1" or "n-2" in many applications.
You are confused as to what is a variable and what is not. This is common in math; usually it has to be inferred from context. It's natural to think of data points as constants, but in this respect they aren't. They are variables.

The "degrees of freedom" concept is hard to explain, and it also means different things in different contexts. I learned the idea through linear algebra, where it is the same thing as the rank of a matrix; by analogy you can then guess what the writer means.

The reason for n-1 in the calculation of the standard deviation is that we aren't using the n variables themselves; instead we are using the differences between the variables and the mean. It is easiest to see when you have precisely one data point. Call it X with value x. The mean m is always going to be x, so x-m will always be zero and you have a constant. So you have no variables at all, and zero degrees of freedom. When you have more variables than that it isn't so obvious, and you have to figure out the rank of a matrix.
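The point about the deviations can be seen directly: the n deviations from the sample mean always sum to zero, so once n - 1 of them are known, the last one is forced. A sketch in plain Python with made-up data:

```python
# The deviations x_i - m from the sample mean m satisfy one linear
# constraint: they sum to zero. So only n - 1 of them are free to vary.

data = [3.2, 5.1, 4.8, 6.0, 2.9]       # arbitrary made-up sample
m = sum(data) / len(data)              # sample mean
deviations = [x - m for x in data]

print(sum(deviations))                 # 0, up to floating-point rounding

# Given any n - 1 of the deviations, the last one is determined:
last = -sum(deviations[:-1])
print(abs(last - deviations[-1]) < 1e-9)  # True
```

This is the single constraint that reduces n variables to n - 1 degrees of freedom in the standard deviation formula.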

Gold Member
Thanks for all the replies. One by one:
Stephen Tashi: thank you for the explanation and the very enlightening example. This was much more concrete than the more common definitions to be found. In your example, then, the degrees of freedom appear to me to be 2, since you have the two variables $x_1$ and $x_2$ over which you are integrating. Right?
It appears that the authors did not do any data dredging, but as you remark, there are all sorts of other issues in this research.

ImaLooser: Thank you for your explanation which unifies the two apparently different ways ((1) & (2) from my original post) to calculate df. That is a great help.

Simon Bridge: Thanks: true, there are two different situations, but I am looking for a definition that is at the same time general enough to cover the different situations, yet specific and concrete enough to be applied systematically. The other responses seem to be doing just that: both ImaLooser and Stephen Tashi are pointing out that I have been putting the cart before the horse, looking at the data points as constants which then are calculated on, whereas first comes the form of the calculation, on which the count of degrees of freedom is based, and only then are the constants thrown in. Otherwise put, if I am understanding this correctly, one must decide the minimum number of dimensions in which the data are points before worrying about the actual values. (By the way, you said that Wiki and Stat Trek were talking about the chi-squared distribution when they use n-1, whereas the article is also talking about the chi-squared distribution, and n-1 is not what it is using. That is, the chi-squared distribution can cover both cases.)

ssd: Thanks for being willing to look through the data. The article only presents data in compilation form, as I presented in my original post. The full article is to be found at http://www.stat.columbia.edu/~gelman/stuff_for_blog/susie.pdf

I think I am starting to see through the fog, for which I am very grateful. Any further remarks will also be greatly appreciated.

Gold Member
Sorry for this continuation, but although I almost got the idea, I came across a problem: in all the expositions of the chi-squared distribution, they insist that the d.f.= sample size minus one. At first I figured that, in line with the definition using an n-dimensional vector space, this just meant that you had n different independent samples, but the examples kept insisting on a sample space of one independent variable, with n different data points for that variable. This doesn't seem to fit.
For example, using the example of my original post
Milwaukee: 865 Mildred's & E[M]= 806; 544 Virginia's & E[V] = 603
Virginia Beach: 230 Mildred's & E[M]= 289; 275 Virginia's & E[V] = 216
It would seem that there is one independent variable, location, so k = 1. But by the definition "sample size minus one", the number would be much higher, such as the combined population of the two cities.
So I am still puzzled.
Many thanks for the continued explanation.

Stephen Tashi
in all the expositions of the chi-squared distribution, they insist that the d.f.= sample size minus one.
Can you give a link to an exposition that makes that claim? Expositions of "Pearson's chi-square test" don't say that.

Stephen Tashi
In the chi-square goodness-of-fit test, it is the number of "cells" that enters into the degrees of freedom calculation, not the number of observations. http://en.wikipedia.org/wiki/Pearson's_chi-squared_test

I don't know what variant of a chi-square test the first link you gave is talking about and the second link has expired.

ImaLooser
In the chi-square goodness-of-fit test, it is the number of "cells" that enters into the degrees of freedom calculation, not the number of observations. http://en.wikipedia.org/wiki/Pearson's_chi-squared_test

I don't know what variant of a chi-square test the first link you gave is talking about and the second link has expired.
The cells are bins that have a mean. Since he is using the two sample means, there is only one bin, I think. If he divided it into points greater than the mean and points less than the mean, then there would be two bins. Etc.

So it's confusing. In the first example the variables are the data points; in the second example the variables are the sample means. This is a common difficulty in mathematics: what is a variable depends on context, and it often isn't explicitly explained.

Gold Member
Thank you, Stephen Tashi: if I am following you correctly, the k in my original example could be calculated by having two cells, so k = 2-1?
Thanks, ImaLooser, for the moral support in agreeing that it is confusing.
Sorry about the link that timed out. Strange, it was OK for me.

Stephen Tashi
Thank you, Stephen Tashi: if I am following you correctly, the k in my original example could be calculated by having two cells, so k = 2-1?
I think your example is a 2x2 grid of 4 cells.

|              | Named "Virginia" | Not named "Virginia" | Row total |
|--------------|------------------|----------------------|-----------|
| From VA      | x                | y                    | x + y     |
| Not from VA  | z                | w                    | z + w     |
| Column total | x + z            | y + w                |           |

If you were given the row and col totals and you wanted to assign values of x,y,z,w that were consistent with those totals, you could make 1 "free" choice (for example you could set x to some number between 0 and the smaller of x+y, x+z). After that one free choice, the other numbers in the table would be determined. So there is 1 degree of freedom.
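That one-free-choice argument can be played out in code. A sketch in plain Python, using the margins from the Mildred/Virginia data quoted earlier; x is the single free cell:

```python
# Given the row and column totals of a 2x2 table, choosing the single
# cell x determines the other three cells -- hence 1 degree of freedom.

row1, row2 = 1409, 505   # row margins from the data (865+544, 230+275)
col1, col2 = 1095, 819   # column margins (865+230, 544+275)

x = 865                  # the one free choice (here, the observed value)
y = row1 - x             # the rest of the table is forced by the margins
z = col1 - x
w = row2 - z             # equivalently col2 - y

print(x, y, z, w)        # prints: 865 544 230 275
```

Changing x to any other admissible value still forces y, z, and w, which is what df = (2 - 1)(2 - 1) = 1 is counting.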

Gold Member
Many thanks, Stephen Tashi.
This is assuming I am given the row and column totals. However, if I only had the total sample size before doing the experiment, then I would need at least two pieces of data before being able to determine the other numbers in the table, giving me then 2 degrees of freedom. So, before determining the number of degrees of freedom, one needs to know the information given at the outset of the experiment, no?

Stephen Tashi
This is assuming I am given the row and column totals. However, if I only had the total sample size before doing the experiment, then I would need at least two pieces of data before being able to determine the other numbers in the table, giving me then 2 degrees of freedom.

As I said, there is no universal definition of "degrees of freedom". In the particular case of Pearson's chi-square test for the independence of two classifications, degrees of freedom are counted as I indicated. The count has to do with the mathematics done in "behind the scenes" computations. Attempts to justify the count by procedures that say "if you were given thus-and-so, you would have this many free choices" are merely ways to assist in memorizing the method for counting the degrees of freedom. These procedures don't actually prove anything about degrees of freedom. If you wanted a proof, you would have to tackle the theoretical mathematics in detail.

So, before determining the number of degrees of freedom, one needs to know the information given at the outset of the experiment, no?
One needs to know the probability model for the data specified by the null hypothesis, and one needs to know the specific statistic being used. And, in practice, one needs to know what procedure is used to count the degrees of freedom for the distribution of that particular statistic. I don't know any universal procedure that would work for all possible statistics.

Gold Member
Thanks again, Stephen Tashi. I appreciate the fact that the more interesting concepts in mathematics and elsewhere do not have any "quick fix", and so your answers tell me that I should go back and dig deeper into the underlying mathematics. I shall do so, and thanks for bringing me this far.

ssd
Sorry for this continuation, but although I almost got the idea, I came across a problem: in all the expositions of the chi-squared distribution, they insist that the d.f.= sample size minus one.
My goodness!

Following are examples of three well-known tests done with the frequency chi-square:

1/ Test for goodness of fit: df = k - r - 1, where k = # classes, r = # parameters estimated

2/ Test for homogeneity: df = (k - 1)(n - 1), where k = # classes, n = # populations

3/ Test for independence of two attributes: df = (k - 1)(n - 1), where k = # classes for one attribute, n = # classes for the other attribute
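The three formulas can be collected in code. A sketch in plain Python; the function names are illustrative, not standard terminology:

```python
# Degrees of freedom for three common frequency chi-square tests.
# (Function names are mine, chosen for readability.)

def df_goodness_of_fit(k, r):
    """k classes, r parameters estimated from the data."""
    return k - r - 1

def df_homogeneity(k, n):
    """k classes, n populations."""
    return (k - 1) * (n - 1)

def df_independence(k, n):
    """k classes for one attribute, n classes for the other."""
    return (k - 1) * (n - 1)

# The 2x2 name/city table: two classes for each attribute.
print(df_independence(2, 2))      # prints: 1

# Goodness of fit with, say, 6 count classes and 1 estimated parameter:
print(df_goodness_of_fit(6, 1))   # prints: 4
```

Note that none of the three formulas involves the number of observations; they count classes, populations, and estimated parameters.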

Gold Member
Thanks, ssd. I shall widen my search for better sources. (I am against book burning, but sometimes.... It would not take much more effort for the authors who give oversimplifications to add a footnote saying that there are other cases.)

ssd
I am against book burning, but sometimes.... It would not take much more effort for the authors who give oversimplifications to add a footnote saying that there are other cases.
Agree.
My approach to studying a 'notion' of a subject not rigorously known to me is to find someone who has gone through the subject (e.g., a professor or graduate student of the topic) and ask for standard references. In this way, I have generally been able to avoid popular but oversimplified books.