Chi-square test: why does it follow a Chi-square distribution

  • Thread starter: mnb96
  • Tags: Distribution, Test
SUMMARY

The Chi-square statistic is linked to the Kullback-Leibler divergence: it is twice the second-order Taylor approximation of the divergence, 2\,\mathcal{D}_{KL}(O \| E) \approx \sum_i \frac{(O_i-E_i)^2}{E_i} = \chi^2. An individual term \frac{(O_i-E_i)^2}{E_i} is nonnegative, so it cannot follow a normal distribution; what is approximately N(0,1) is the standardized count of a binomial random variable. The assumption that the counts in the contingency table follow a multinomial distribution is what justifies the test; without it, the calculated \chi^2 value does not necessarily follow a Chi-square distribution.

PREREQUISITES
  • Understanding of Chi-square tests and their applications
  • Familiarity with Kullback-Leibler divergence
  • Knowledge of multinomial distributions
  • Basic statistics, including binomial random variables
NEXT STEPS
  • Study the derivation of the Chi-square test from Kullback-Leibler divergence
  • Explore the properties of multinomial distributions in statistical testing
  • Learn about the implications of assumptions in statistical tests
  • Investigate alternative tests when assumptions of the Chi-square test are violated
USEFUL FOR

Statisticians, data analysts, researchers conducting hypothesis testing, and anyone involved in statistical modeling and inference.

mnb96
Hello,

it is well known that the Chi-square test between an observed distribution O and an expected distribution E can be interpreted as a test based on (twice) the second-order Taylor approximation of the Kullback-Leibler divergence, i.e.: 2\,\mathcal{D}_{KL}(O \| E) \approx \sum_i \frac{(O_i-E_i)^2}{E_i} = \chi^2
where i is the bin of the histogram (or contingency table). A proof is given here (page 5).
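
A quick numerical check of the approximation above can be sketched as follows. It assumes O and E are count vectors with the same total, so that 2\,\mathcal{D}_{KL}(O \| E) is computed on counts as 2 \sum_i O_i \ln(O_i/E_i). The function names and the example counts are made up for illustration and do not come from the linked proof.

```python
import math

def pearson_chi2(observed, expected):
    # Sum of (O_i - E_i)^2 / E_i over all bins.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def twice_kl(observed, expected):
    # 2 * sum_i O_i * ln(O_i / E_i); empty bins contribute zero.
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)

# Hypothetical counts: both vectors sum to 100.
expected = [25.0, 25.0, 25.0, 25.0]
observed = [22.0, 27.0, 26.0, 25.0]

chi2 = pearson_chi2(observed, expected)
two_dkl = twice_kl(observed, expected)
print(f"chi2 = {chi2:.4f}, 2*D_KL = {two_dkl:.4f}")
```

The two numbers should agree closely when every O_i is near its E_i, and drift apart as the deviations grow, which is what one expects from a second-order expansion around O = E.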

The question is: how do we know that each of the error terms \frac{(O_i-E_i)^2}{E_i} on the right side of the above equation follows a normal distribution N(0,1)? There is probably some assumption to be made...?
 
mnb96 said:
The question is: how do we know that each of the error terms \frac{(O_i-E_i)^2}{E_i} on the right side of the above equation follows a normal distribution N(0,1)?

\frac{ (O_i - E_i)^2}{E_i} is nonnegative, so it doesn't follow a normal distribution.

If X is a binomial random variable representing the number of "successes" in n independent trials with probability of success p on each trial, then for large n the distribution of Y = \frac{X-np}{\sqrt{np(1-p)}} can be approximated by a N(0,1) distribution.
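
The statement about Y can be checked with a small simulation. The sketch below also verifies a simple algebraic fact: for a table with just two cells ("success" and "failure"), the Pearson sum equals Y^2 exactly, which is why the squared terms, and not the terms themselves, are tied to N(0,1). The values of n, p, the seed, and the number of trials are arbitrary choices for illustration.

```python
import math
import random

def draw_binomial(n, p, rng):
    # Count successes in n independent Bernoulli(p) trials.
    return sum(rng.random() < p for _ in range(n))

rng = random.Random(0)          # fixed seed so the run is repeatable
n, p, trials = 200, 0.3, 4000   # illustrative values, not from the thread
sd = math.sqrt(n * p * (1 - p))

ys = []
max_gap = 0.0
for _ in range(trials):
    x = draw_binomial(n, p, rng)
    y = (x - n * p) / sd
    ys.append(y)
    # Two-cell Pearson statistic: "success" cell plus "failure" cell.
    chi2 = ((x - n * p) ** 2 / (n * p)
            + ((n - x) - n * (1 - p)) ** 2 / (n * (1 - p)))
    max_gap = max(max_gap, abs(chi2 - y * y))

mean_y = sum(ys) / trials
var_y = sum((y - mean_y) ** 2 for y in ys) / trials
inside = sum(abs(y) < 1.96 for y in ys) / trials

print(f"mean(Y) = {mean_y:.3f}, var(Y) = {var_y:.3f}")
print(f"fraction with |Y| < 1.96 = {inside:.3f}")
print(f"largest |chi2 - Y^2| = {max_gap:.2e}")
```

If the normal approximation is reasonable, the mean should be near 0, the variance near 1, and roughly 95% of the draws should fall inside ±1.96.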
 
Stephen Tashi said:


If X is a binomial random variable ...

I see. There is our assumption!
It seems to me that such an assumption automatically implies that the data in the cells of the contingency table are assumed to follow a multinomial distribution.

So in the end, although the formula for the \chi^2 value is just an approximation of (twice) the Kullback-Leibler divergence, if we want to use it in a decision test we still need the assumption that the counts follow a multinomial distribution; otherwise the \chi^2 value calculated with the formula above does not necessarily follow a \chi^2 distribution.
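
The role of the multinomial assumption can be illustrated with a simulation. The sketch below draws counts for a 4-cell table in two ways: as a genuine multinomial sample, and as a "clumped" sample in which every independent draw adds 5 counts to its cell. The clumped scheme is a made-up example of data that violates the assumption; the cell probabilities, sample size, and seed are likewise arbitrary. With 4 cells the reference distribution has 3 degrees of freedom, whose 95th percentile is about 7.815.

```python
import random

def pearson_chi2(observed, expected):
    # Sum of (O_i - E_i)^2 / E_i over all cells.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def sample_counts(n, probs, rng, clump=1):
    # Draw n // clump independent cells; each draw adds `clump` counts.
    # clump=1 gives an ordinary multinomial sample of size n.
    counts = [0] * len(probs)
    cells = range(len(probs))
    for cell in rng.choices(cells, weights=probs, k=n // clump):
        counts[cell] += clump
    return counts

rng = random.Random(1)            # fixed seed so the run is repeatable
probs = [0.1, 0.2, 0.3, 0.4]      # hypothetical cell probabilities
n, trials = 200, 3000
expected = [n * p for p in probs]
CRIT_95_DF3 = 7.815               # 95th percentile of chi-square, 3 d.o.f.

def summarize(clump):
    stats = [pearson_chi2(sample_counts(n, probs, rng, clump), expected)
             for _ in range(trials)]
    mean = sum(stats) / trials
    reject = sum(s > CRIT_95_DF3 for s in stats) / trials
    return mean, reject

mean_multi, reject_multi = summarize(clump=1)
mean_clump, reject_clump = summarize(clump=5)

print(f"multinomial: mean = {mean_multi:.2f}, rejection rate = {reject_multi:.3f}")
print(f"clumped:     mean = {mean_clump:.2f}, rejection rate = {reject_clump:.3f}")
```

For the multinomial sample the mean of the statistic should sit near 3 and the rejection rate near 5%. For the clumped sample the same formula gives values about five times larger, so comparing them against the \chi^2 critical value would reject far too often even though the cell probabilities are correct.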
 
