G-Test: Is it Dependent on Total Amount of Observations?

mnb96 · Apr 15, 2014

Hello,

it is claimed that the so-called G-test can be used as a replacement for the well-known chi-squared test. The G-test statistic is defined as: G = 2\sum_i O_i \cdot \log \left( \frac{O_i}{E_i}\right), where \log is the natural logarithm and O_i and E_i are the observed and expected counts in cell i of a contingency table.

I see a big problem with this.
Namely, for fixed observed proportions, the value of G is directly proportional to the total number N of observations!

This is easily seen even with the most trivial example of a coin toss. Suppose we want to test whether a coin is fair or not. We collect N=10 samples and we obtain {1 head, 9 tails}. Thus, according to the above formula, G≈7.36.
Now suppose we collect N=100 samples and we obtain {10 heads, 90 tails}. Well, according to the above formula we now get G≈73.6, exactly ten times more.
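The two values can be checked with a short sketch. The function name `g_statistic` is made up for illustration, and the expected counts assume a fair coin (N/2 heads, N/2 tails).

```python
import math

def g_statistic(observed, expected):
    """G = 2 * sum(O_i * ln(O_i / E_i)); cells with O_i = 0 contribute 0."""
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )

# Fair-coin null hypothesis: expected counts are N/2 for each face.
g_10 = g_statistic([1, 9], [5, 5])        # N = 10
g_100 = g_statistic([10, 90], [50, 50])   # N = 100

print(round(g_10, 2), round(g_100, 2))    # 7.36 73.61
```

The ratio `g_100 / g_10` comes out as 10 because both samples have the same observed proportions.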

So, what is the threshold value for G above which we reject the null-hypothesis that the coin is fair?

Stephen Tashi · Apr 16, 2014

mnb96 said:

exactly ten times more.

What's bad about that? Intuitively, more trials provide more evidence of a trend toward tails.

So, what is the threshold value for G above which we reject the null-hypothesis that the coin is fair?

What \alpha do you want to use?

"The" chi-square distribution is actually a family of distributions. You have to specify the "degrees of freedom" to specify a particular distribution.

mnb96 · Apr 16, 2014

Stephen Tashi said:

What \alpha do you want to use?

"The" chi-square distribution is actually a family of distributions. You have to specify the "degrees of freedom" to specify a particular distribution.

Well, let's say I want to set α=0.005.
If we stick with the example of the coin toss in my previous post, we have only 1 degree of freedom, for which the corresponding critical value of the chi-squared distribution is 7.879. Thus in the first case, where we had only 10 tosses, we won't yet reject the hypothesis that the coin is fair (G was ~7.36).
In the second case when we have 100 tosses (more evidence), we obtained G≈73.6 which is more than enough to reject the hypothesis that the coin is fair.
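As a rough check of the 7.879 figure and the two decisions, here is a standard-library sketch. It assumes the usual large-sample approximation in which G follows a chi-squared distribution with 1 degree of freedom under the null hypothesis. The helper names are hypothetical. It uses the fact that the square of a standard normal variable is chi-squared with 1 degree of freedom.

```python
import math
from statistics import NormalDist

def chi2_1df_critical(alpha):
    """Upper-alpha critical value of chi-squared with 1 df.

    If Z is standard normal, Z**2 is chi-squared with 1 df, so the
    critical value is the squared two-sided normal quantile.
    """
    return NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2

def chi2_1df_pvalue(x):
    """P(X > x) for X chi-squared with 1 df, which equals erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

alpha = 0.005
critical = chi2_1df_critical(alpha)
p_10 = chi2_1df_pvalue(7.36)    # G from the 10-toss sample
p_100 = chi2_1df_pvalue(73.6)   # G from the 100-toss sample

print(round(critical, 3))            # 7.879
print(p_10 > alpha, p_100 < alpha)   # True True
```

Comparing G with the critical value and comparing the p-value with α are the same decision rule, so the 10-toss sample does not reject at α=0.005 while the 100-toss sample does.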

Stephen Tashi said:

What's bad about that? Intuitively, more trials provide more evidence of a trend toward tails.

Yes, now it makes sense. I was just missing the correct interpretation.
I believe that what confused me is that in both scenarios we had 90% tails and 10% heads, and I wrongly expected to get the same G-value.
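The scaling with N can be made explicit with a short derivation. The symbols p_i (observed proportions) and q_i (proportions under the null hypothesis) are notation introduced here, with O_i = N p_i and E_i = N q_i.

```latex
G = 2\sum_i O_i \log\left(\frac{O_i}{E_i}\right)
  = 2\sum_i N p_i \log\left(\frac{N p_i}{N q_i}\right)
  = 2N \sum_i p_i \log\left(\frac{p_i}{q_i}\right)

% Coin example: p = (0.1, 0.9), q = (0.5, 0.5)
\sum_i p_i \log\left(\frac{p_i}{q_i}\right)
  = 0.1\log(0.2) + 0.9\log(1.8) \approx 0.368,
\qquad G \approx 0.736\,N
```

So the same 10%/90% split gives G≈7.36 at N=10 and G≈73.6 at N=100: the sum depends only on the proportions, while the factor N measures how much evidence has been collected.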
