The chi-square goodness-of-fit test with no degrees of freedom left

Ad VanderVen
TL;DR Summary
How to deal with a chi-square goodness-of-fit test when the number of degrees of freedom is zero?
I have an empirical frequency distribution as for example below:

##f_{2} = \, \, \, 21##
##f_{3} = 111##
##f_{4} = \, \, \, 24##

The theoretical distribution is determined by two parameters. So for a chi-square goodness-of-fit test there are actually no degrees of freedom left. Yet the theoretical distribution deviates from the observed one. The fact that there are no degrees of freedom left does not guarantee that the theoretical and the observed distributions coincide. Can you still say something about the goodness of fit?
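A minimal sketch of the situation described above, in Python. The observed counts are the ones from the post; the fitted probabilities are hypothetical placeholders (the actual two-parameter model is not specified), used only to show how the statistic and the degrees-of-freedom count are formed:

```python
# Observed counts f_2, f_3, f_4 from the post.
observed = [21, 111, 24]
N = sum(observed)  # 156

# HYPOTHETICAL fitted probabilities, for illustration only;
# the poster's actual two-parameter model is not given.
probs = [0.14, 0.70, 0.16]
expected = [N * p for p in probs]

# Pearson chi-square statistic: sum of (O - E)^2 / E over classes.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Usual goodness-of-fit convention: df = classes - estimated parameters - 1.
classes, params = len(observed), 2
df = classes - params - 1  # 0 here, which is the crux of the question

print(chi_sq, df)
```

With zero residual degrees of freedom the statistic can still be positive (the fit need not be exact), but there is no reference chi-square distribution against which to judge it.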
 
If you have three measurements and two parameters, you have one degree of freedom.

An example of zero degrees of freedom would be a linear fit to two points. In that case there is no goodness of fit information.
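The linear-fit example above can be sketched in a few lines: two parameters (slope and intercept) fitted to two points leave zero residual degrees of freedom, the fit is exact, and the residuals carry no information about goodness of fit. The points used are arbitrary illustrations:

```python
def line_through(p1, p2):
    """Slope and intercept of the unique line through two distinct points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept

points = [(0.0, 1.0), (2.0, 5.0)]  # arbitrary example data
m, b = line_through(*points)

# With 2 data points and 2 parameters the residuals are identically zero,
# so they say nothing about how well the model describes the data.
residuals = [y - (m * x + b) for x, y in points]
print(m, b, residuals)
```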
 
For the theoretical distribution ##P (X = k)## it holds in this case that ##k = 2, 3, 4, \dots##. If I calculate chi square for the scores ##2, 3, 4## with expected values ##NP (X = 2)##, ##NP (X = 3)## and ##NP (X \geq 4)##, then I get .381. But if I compute chi square for the scores ##2, 3, 4## and ##5##, where ##f_{5} = 0##, with expected values ##NP(X = 2)##, ##NP(X = 3)##, ##NP (X = 4)## and ##NP(X \geq 5)##, then I get 3.719, and that would be significant with one degree of freedom.
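The binning effect described above can be sketched as follows. The model probabilities here are hypothetical placeholders (the post does not give the fitted distribution), so the statistic values will not match the .381 and 3.719 quoted; the point is only that regrouping the tail into more classes can change the statistic and its reference distribution. The 1-df tail probability uses the exact identity ##P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})##:

```python
import math

def chi_square(observed, expected):
    """Pearson chi-square statistic over matched class counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_sf_1df(x):
    """Survival function of chi-square with 1 df, exact via erfc."""
    return math.erfc(math.sqrt(x / 2.0))

N = 156  # total count from the post

# 3-bin version: classes for X=2, X=3, X>=4 (probabilities assumed).
obs3 = [21, 111, 24]
exp3 = [N * p for p in (0.14, 0.70, 0.16)]

# 4-bin version: X=2, X=3, X=4, X>=5, with the empty class f_5 = 0.
obs4 = [21, 111, 24, 0]
exp4 = [N * p for p in (0.14, 0.70, 0.12, 0.04)]

stat3 = chi_square(obs3, exp3)
stat4 = chi_square(obs4, exp4)
print(stat3, stat4, chi2_sf_1df(stat4))
```

Splitting off an empty tail class adds an ##(0 - E)^2/E = E## term for that class, which is why the statistic can jump even though no new observations were added.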
 
I don't see how your second message has anything to do with your first. Your first message describes 3 measurements and 2 parameters - i.e. one degree of freedom.
 
I thought the number of degrees of freedom (##df##) was equal to the number of classes minus the number of estimated parameters minus 1. So in this case for ##f_{2} = \, \, \, 21##, ##f_{3} = 111## and ##f_{4} = \, \, \, 24## one would expect ##df = 0## (##= 3-2-1##).
 
I don't know what you mean by "class". You have X data points and Y fit parameters, so you have X-Y degrees of freedom. So if, as your OP says, you have 3 data points and 2 parameters in your model, you have one degree of freedom.
 
Ad VanderVen said:
but if I compute chi square for the scores ##2, 3, 4## and ##5##, where ##f_{5} = 0##, with expected values ##NP(X = 2)##, ##NP(X = 3)##, ##NP (X = 4)## and ##NP(X \geq 5)##, then I get 3.719, and that would be significant with one degree of freedom.

You are comparing two different probability models to the same data, so it isn't surprising that you get different measures of fit. The first probability model makes no prediction for X=4. The second one does.
 