G-Test: Is it Dependent on Total Amount of Observations?

  • Thread starter mnb96
In summary, for fixed observed proportions the G statistic grows in proportion to the total number of observations N. This is expected: more trials provide more evidence against the null hypothesis. The rejection threshold comes from the chi-square distribution, which requires choosing a significance level α and specifying the degrees of freedom.
  • #1
mnb96
Hello,

it is claimed that the so-called G-test can be used as a replacement for the well-known chi-squared test. The G-test is defined as: [tex]G = 2\sum_i O_i \cdot \ln \left( \frac{O_i}{E_i}\right)[/tex] where [itex]O_i[/itex] and [itex]E_i[/itex] are the observed and expected counts in cell i of a contingency table, and ln is the natural logarithm.
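As a minimal sketch of the definition above, the statistic can be computed directly from two lists of counts. The function name `g_statistic` and the example counts are illustrative assumptions, not part of the original post; cells with an observed count of zero are skipped, following the usual convention that 0·ln 0 = 0.

```python
import math

def g_statistic(observed, expected):
    """G = 2 * sum(O_i * ln(O_i / E_i)) over all cells.

    Cells with O_i == 0 contribute nothing (convention: 0 * ln 0 = 0).
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )

# Hypothetical counts for three equally likely categories (60 observations).
observed = [18, 22, 20]
expected = [20.0, 20.0, 20.0]
print(g_statistic(observed, expected))
```

When the observed counts equal the expected counts, every logarithm is zero and so is G.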

I see a big problem with this.
Namely, for fixed observed proportions, the value of G is directly proportional to the total number N of observations!

This is easily seen even with the most trivial example of a coin toss. Suppose we want to test whether a coin is fair or not. We collect N=10 samples and we obtain {1 head, 9 tails}. Thus, according to the above formula, G≈7.36.
Now suppose we collect N=100 samples and we obtain {10 heads, 90 tails}. According to the above formula we now get G≈73.6, exactly ten times more.
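A short derivation of this scaling, as a sketch: the symbols [itex]p_i[/itex] and [itex]\pi_i[/itex] are notation introduced here for illustration (the observed proportions and the proportions under the null hypothesis), so that [itex]O_i = N p_i[/itex] and [itex]E_i = N \pi_i[/itex]. With ln the natural logarithm, the factor N cancels inside the logarithm:

[tex]G = 2\sum_i N p_i \ln\left(\frac{N p_i}{N \pi_i}\right) = 2N \sum_i p_i \ln\left(\frac{p_i}{\pi_i}\right)[/tex]

For the coin example, [itex]p = (0.1, 0.9)[/itex] and [itex]\pi = (0.5, 0.5)[/itex], so

[tex]G = 2N\left(0.1 \ln 0.2 + 0.9 \ln 1.8\right) \approx 0.736\,N[/tex]

which gives about 7.36 for N=10 and about 73.6 for N=100.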

So, what is the threshold value for G above which we reject the null-hypothesis that the coin is fair?
 
  • #2
Stephen Tashi
mnb96 said:
exactly ten times more.

What's bad about that? Intuitively, more trials provide more evidence of a trend toward tails.


mnb96 said:
So, what is the threshold value for G above which we reject the null-hypothesis that the coin is fair?

What [itex] \alpha [/itex] do you want to use?

"The" chi-square distribution is actually a family of distributions. You have to specify the "degrees of freedom" to specify a particular distribution.
 
  • #3
mnb96
Stephen Tashi said:
What [itex] \alpha [/itex] do you want to use?

"The" chi-square distribution is actually a family of distributions. You have to specify the "degrees of freedom" to specify a particular distribution.

Well, let's say I want to set α=0.005.
If we stick with the coin-toss example from my previous post, we have only 1 degree of freedom, for which the critical value at α=0.005 is 7.879. Thus in the first case, where we had only 10 tosses, we cannot yet reject the hypothesis that the coin is fair (G was ≈7.36).
In the second case, with 100 tosses (more evidence), we obtained G≈73.6, which is more than enough to reject the hypothesis that the coin is fair.
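The comparison above can be checked numerically. The following is a sketch that only covers the 1-degree-of-freedom case, where the upper tail of the chi-square distribution can be written with the complementary error function: P(X > x) = erfc(√(x/2)). The helper names `chi2_sf_df1` and `chi2_critical_df1` are assumptions made for this illustration; in practice a statistics library would normally be used.

```python
import math

def chi2_sf_df1(x):
    """Upper-tail probability P(X > x) for chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def chi2_critical_df1(alpha, lo=0.0, hi=100.0, tol=1e-10):
    """Critical value c with P(X > c) = alpha, found by bisection.

    The tail probability decreases as x grows, so bisection converges.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_sf_df1(mid) > alpha:
            lo = mid  # tail still too heavy: move right
        else:
            hi = mid
    return (lo + hi) / 2.0

alpha = 0.005
critical = chi2_critical_df1(alpha)
print("critical value:", critical)

# G values quoted in the thread for 10 and 100 tosses.
for g in (7.36, 73.6):
    print(g, "p-value:", chi2_sf_df1(g), "reject:", g > critical)
```

The threshold depends only on α and the degrees of freedom, not on N; what changes with N is the value of G being compared against it.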

Stephen Tashi said:
What's bad about that? Intuitively, more trials provide more evidence of a trend toward tails.

Yes, now it makes sense. I was just missing the correct interpretation.
I believe that what confused me is that in both scenarios we had 90% tails and 10% heads, and I wrongly expected to get the same G-value.
 

1. What is the G-Test and how is it used?

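The G-test is a likelihood-ratio test for categorical count data. It compares observed counts with the counts expected under a null hypothesis, either as a goodness-of-fit test (for example, whether a coin is fair) or as a test of independence between two categorical variables in a contingency table. Under the null hypothesis, G approximately follows a chi-square distribution, so it is used in the same settings as Pearson's chi-squared test.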

2. How is the G-Test different from other statistical tests?

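The G-test and Pearson's chi-squared test use the same observed and expected counts and the same chi-square reference distribution; they differ in the statistic computed. G is based on the logarithm of the likelihood ratio, while Pearson's statistic sums [itex](O_i - E_i)^2 / E_i[/itex] over the cells. The two values are close when the observed counts are near the expected counts and can differ noticeably otherwise. Both rely on a large-sample approximation, so for small samples an exact test is usually preferred.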
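To make the comparison concrete, the sketch below computes both G and Pearson's statistic from the same observed and expected counts, using the coin-toss counts from the thread. The function names are illustrative assumptions.

```python
import math

def g_statistic(observed, expected):
    """Likelihood-ratio statistic: 2 * sum(O * ln(O / E)), skipping O == 0."""
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )

def pearson_chi2(observed, expected):
    """Pearson's statistic: sum((O - E)^2 / E)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Coin-toss counts from the thread: (heads, tails) under a fair-coin null.
for heads, tails in [(1, 9), (10, 90)]:
    n = heads + tails
    observed = [heads, tails]
    expected = [n / 2, n / 2]
    print(n, g_statistic(observed, expected), pearson_chi2(observed, expected))
```

Both statistics are ten times larger for the 100-toss sample than for the 10-toss sample, so the scaling with N is not specific to the G-test.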

3. Is the G-Test dependent on the total amount of observations?

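Yes. For fixed observed proportions, G is directly proportional to the total number of observations N, so the same proportions give stronger evidence against the null hypothesis as N grows. The threshold for rejection does not change with N; it depends only on the significance level and the degrees of freedom. Separately, the chi-square approximation requires enough data: a common rule of thumb is an expected count of at least 5 in each category.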

4. How do I interpret the results of a G-Test?

The value of G is converted to a p-value using the chi-square distribution with the appropriate degrees of freedom. The p-value is the probability, if the null hypothesis is true, of obtaining a value of G at least as large as the one observed. A p-value below the chosen significance level (often 0.05) leads to rejecting the null hypothesis; a larger p-value means the data do not provide enough evidence to reject it, which is not the same as showing that the null hypothesis is true.

5. Can the G-Test be used for more than two categories?

Yes. The G-test applies to any number of categories, including contingency tables with multiple rows and columns, where the degrees of freedom are (rows − 1) × (columns − 1). With more categories the same total sample is spread over more cells, so expected counts per cell become smaller and the chi-square approximation becomes less reliable; a sufficient sample size in each cell is therefore important.
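For a table with several rows and columns, a minimal sketch of the test of independence might look as follows. It assumes the expected count in each cell is (row total × column total) / N and that the degrees of freedom are (rows − 1) × (columns − 1); the function name and the 2×3 table are hypothetical, chosen only for illustration.

```python
import math

def g_test_independence(table):
    """Return (G, degrees of freedom) for a contingency table given as rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            if observed > 0:  # convention: 0 * ln 0 = 0
                total += observed * math.log(observed / expected)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return 2.0 * total, df

# Hypothetical 2x3 table of counts.
table = [[12, 8, 10],
         [6, 14, 10]]
g, df = g_test_independence(table)
print("G =", g, "df =", df)
```

The resulting G would then be compared with the chi-square critical value for the returned degrees of freedom at the chosen significance level.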
