Question on G-Test

  1. Apr 15, 2014 #1

    It is claimed that the so-called G-test can be used as a replacement for the well-known chi-squared test. The G statistic is defined as: [tex]G = 2\sum_i O_i \cdot \log \left( \frac{O_i}{E_i}\right)[/tex] where [itex]O_i[/itex] and [itex]E_i[/itex] are the observed and expected counts in cell i of a contingency table, and log is the natural logarithm.

    I see a big problem with this.
    Namely, for fixed observed proportions, the value of G is directly proportional to the total number N of observations!

    This is easily seen even with the most trivial example of a coin toss. Suppose we want to test whether a coin is fair or not. We collect N=10 samples and obtain {1 head, 9 tails}, against expected counts of {5, 5}. According to the above formula, G≈7.36.
    Now suppose we collect N=100 samples and obtain {10 heads, 90 tails}. According to the above formula we now get G≈73.6, exactly ten times more.
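
    As a quick check of the two numbers above, here is a minimal Python sketch of the formula. The function name g_statistic is only an illustrative choice, and skipping empty cells (treating 0·log 0 as 0) is an assumption that does not matter for this example.

```python
import math

def g_statistic(observed, expected):
    """G = 2 * sum(O_i * ln(O_i / E_i)) over all cells.

    Cells with O_i == 0 are skipped, i.e. 0 * ln(0) is taken to be 0
    (an assumption; the coin example has no empty cells).
    """
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )

# Fair coin: expected counts are N/2 heads and N/2 tails.
g_10 = g_statistic([1, 9], [5, 5])        # N = 10
g_100 = g_statistic([10, 90], [50, 50])   # N = 100

print(round(g_10, 2))    # 7.36
print(round(g_100, 1))   # 73.6
print(g_100 / g_10)      # ratio is 10, up to floating-point rounding
```

    Scaling every observed and expected count by the same factor leaves each ratio O_i/E_i unchanged, so only the leading factor O_i grows, which is why G scales linearly with N here.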

    So, what is the threshold value for G above which we reject the null-hypothesis that the coin is fair?
  3. Apr 16, 2014 #2

    Stephen Tashi

    Science Advisor

    What's bad about that? Intuitively, more trials provide more evidence of a trend toward tails.

    What [itex] \alpha [/itex] do you want to use?

    "The" chi-square distribution is actually a family of distributions. You have to specify the "degrees of freedom" to specify a particular distribution.
  4. Apr 16, 2014 #3
    Well, let's say I want to set α=0.005.
    If we stick with the coin-toss example from my previous post, we have only 1 degree of freedom, for which the chi-squared critical value at α=0.005 is 7.879. Thus in the first case, where we had only 10 tosses, we do not yet reject the hypothesis that the coin is fair (G was ~7.36, just below the threshold).
    In the second case, with 100 tosses (more evidence), we obtained G≈73.6, which is far above 7.879 and more than enough to reject the hypothesis that the coin is fair.

    Yes, now it makes sense. I was just missing the correct interpretation.
    I believe that what confused me is that in both scenarios we had 90% tails and 10% heads, and I wrongly expected to get the same G-value.
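
    For anyone who wants to reproduce the decision, here is a minimal Python sketch under the assumptions used in this thread: 1 degree of freedom, α=0.005, and G compared against the chi-squared distribution. The helper names are hypothetical. The p-value uses the fact that a chi-squared variable with 1 degree of freedom is the square of a standard normal, so its upper tail probability is erfc(sqrt(x/2)); this shortcut is only valid for 1 degree of freedom.

```python
import math

ALPHA = 0.005
CRITICAL_VALUE = 7.879  # chi-squared critical value for 1 dof at alpha = 0.005

def g_statistic(observed, expected):
    """G = 2 * sum(O_i * ln(O_i / E_i)); empty cells are skipped."""
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )

def p_value_1dof(g):
    """Upper-tail chi-squared probability, valid for 1 degree of freedom only."""
    return math.erfc(math.sqrt(g / 2.0))

results = {}
for heads, tails in [(1, 9), (10, 90)]:
    n = heads + tails
    g = g_statistic([heads, tails], [n / 2, n / 2])
    p = p_value_1dof(g)
    # Comparing G to the critical value is equivalent to comparing p to alpha.
    results[n] = (g, p, g > CRITICAL_VALUE)
    print(f"N={n}: G={g:.2f}, p={p:.3g}, reject fair coin: {g > CRITICAL_VALUE}")
```

    With 10 tosses the p-value lands slightly above α, so the fair-coin hypothesis is kept; with 100 tosses it is many orders of magnitude below α.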