G-Test: Is it Dependent on Total Amount of Observations?

  • Thread starter: mnb96
AI Thread Summary
The G-test is proposed as an alternative to the chi-squared test. However, for a fixed split of the observations, its value is directly proportional to the total number of observations, which at first seems to complicate its interpretation. In a coin-toss example, increasing the sample size from 10 to 100 while keeping 10% heads and 90% tails increases G tenfold, from about 7.36 to about 73.6. The discussion points out that G is compared against a critical value of the chi-squared distribution, which requires choosing a significance level and the number of degrees of freedom. With α=0.005 and 1 degree of freedom, the critical value is about 7.879, so 10 tosses do not justify rejecting the hypothesis that the coin is fair, while 100 tosses do. The growth of G with sample size reflects the fact that more trials provide more evidence against the null hypothesis; it is not a flaw in the test.
mnb96
Hello,

it is claimed that the so-called G-test can be used as a replacement for the well-known chi-squared test. The G-test is defined as
$$G = 2\sum_i O_i \log\left( \frac{O_i}{E_i}\right)$$
where ##O_i## and ##E_i## are the observed and expected counts in cell ##i## of a contingency table, and ##\log## is the natural logarithm.
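As a minimal sketch, the definition translates directly into Python. The helper `g_statistic` is hypothetical and not a library function. The sketch assumes natural logarithms and uses the usual convention that a cell with ##O_i = 0## contributes 0.

```python
import math

def g_statistic(observed, expected):
    """G = 2 * sum(O_i * ln(O_i / E_i)) over all cells.

    Cells with O_i == 0 contribute 0, using the limit convention 0 * ln(0) = 0.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for o, e in zip(observed, expected):
        if o > 0:
            total += o * math.log(o / e)  # math.log is the natural logarithm
    return 2.0 * total

# Coin example: a fair coin expects half of the N tosses on each side.
print(round(g_statistic([1, 9], [5, 5]), 2))      # 7.36
print(round(g_statistic([10, 90], [50, 50]), 2))  # 73.61
```

If SciPy is available, `scipy.stats.power_divergence(f_obs, f_exp, lambda_="log-likelihood")` returns the same statistic together with a p-value.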

I see a big problem with this.
Namely, for a fixed split of the observations among the cells, the value G is directly proportional to the total number N of observations!
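To see where the proportionality comes from, assume the expected counts come from fixed null-hypothesis probabilities ##p_i## (so ##E_i = N p_i##, e.g. ##p_i = 1/2## for a fair coin), and write the observed proportions as ##\hat p_i = O_i / N##. Substituting into the definition of ##G## gives

$$
\begin{aligned}
G &= 2\sum_i O_i \log\left(\frac{O_i}{E_i}\right)
   = 2\sum_i N\hat p_i \log\left(\frac{N\hat p_i}{N p_i}\right) \\
  &= 2N\sum_i \hat p_i \log\left(\frac{\hat p_i}{p_i}\right).
\end{aligned}
$$

So if the proportions ##\hat p_i## stay the same (say 10% heads and 90% tails), ##G## grows linearly with ##N##. The remaining sum is the Kullback-Leibler divergence of the observed proportions from the expected ones.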

This is easily seen even with the most trivial example of a coin toss. Suppose we want to test whether a coin is fair. We collect N=10 samples and obtain {1 head, 9 tails}. According to the above formula, G≈7.36.
Now suppose we collect N=100 samples and obtain {10 heads, 90 tails}. According to the above formula, we now get G≈73.6, exactly ten times more.

So, what is the threshold value for G above which we reject the null-hypothesis that the coin is fair?
 
mnb96 said:
exactly ten times more.

What's bad about that? Intuitively, more trials provide more evidence of a trend toward tails.


So, what is the threshold value for G above which we reject the null-hypothesis that the coin is fair?

What ##\alpha## do you want to use?

"The" chi-square distribution is actually a family of distributions. You have to specify the "degrees of freedom" to specify a particular distribution.
 
Stephen Tashi said:
What ##\alpha## do you want to use?

"The" chi-square distribution is actually a family of distributions. You have to specify the "degrees of freedom" to specify a particular distribution.

Well, let's say I want to set α=0.005.
If we stick with the coin-toss example from my previous post, we have only 1 degree of freedom, for which the critical value of the chi-squared distribution at α=0.005 is 7.879. Thus in the first case, with only 10 tosses, we cannot yet reject the hypothesis that the coin is fair (G≈7.36 < 7.879).
In the second case, with 100 tosses (more evidence), we obtained G≈73.6, which is far above 7.879, so we reject the hypothesis that the coin is fair.
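A minimal Python sketch of this decision rule follows. It assumes the 1-degree-of-freedom case only and relies on the fact that a chi-squared variable with 1 degree of freedom is the square of a standard normal variable, so the standard library suffices. The names `g_statistic`, `chi2_critical_df1` and `p_value_df1` are hypothetical helpers. The sketch also prints the p-value, i.e. the probability under the null hypothesis of a G at least this large. That is a different quantity from the critical value 7.879 that G is compared against.

```python
import math
from statistics import NormalDist

def g_statistic(observed, expected):
    # G = 2 * sum O_i * ln(O_i / E_i); empty cells contribute 0 (0 * ln 0 = 0).
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

def chi2_critical_df1(alpha):
    # If Z ~ N(0, 1) then Z**2 ~ chi-squared(1), so the upper-tail critical value
    # c with P(Z**2 > c) = alpha is z**2, where z = Phi^{-1}(1 - alpha / 2).
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * z

def p_value_df1(g):
    # Upper-tail probability of chi-squared(1): P(Z**2 > g) = erfc(sqrt(g / 2)).
    return math.erfc(math.sqrt(g / 2.0))

alpha = 0.005
critical = chi2_critical_df1(alpha)  # about 7.879 for alpha = 0.005

for heads, tails in [(1, 9), (10, 90)]:
    n = heads + tails
    g = g_statistic([heads, tails], [n / 2, n / 2])  # fair coin: E = N/2 per side
    decision = "reject fairness" if g > critical else "do not reject fairness"
    print(f"N={n:3d}  G={g:6.2f}  p={p_value_df1(g):.2g}  {decision}")
```

For other degrees of freedom, a library routine such as `scipy.stats.chi2.ppf(1 - alpha, df)` (critical value) or `scipy.stats.chi2.sf(g, df)` (p-value) would be the usual choice.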

Stephen Tashi said:
What's bad about that? Intuitively, more trials provide more evidence of a trend toward tails.

Yes, now it makes sense. I was just missing the correct interpretation.
I believe that what confused me is that in both scenarios we had 90% tails and 10% heads, and I wrongly expected to get the same G-value.
 