Chi-square test: why does it follow a Chi-square distribution?

  • Context: Graduate
  • Thread starter: mnb96
  • Tags: Distribution, Test
SUMMARY

The Chi-square test is fundamentally linked to the Kullback-Leibler divergence through the second-order Taylor approximation $2\,\mathcal{D}_{KL}(O \| E) \approx \sum_i \frac{(O_i-E_i)^2}{E_i} = \chi^2$. Each error term $\frac{(O_i-E_i)^2}{E_i}$ is nonnegative, so it cannot follow a normal distribution. The assumption that the counts in the contingency table follow a multinomial distribution is crucial for the validity of the Chi-square test; without this assumption, the calculated $\chi^2$ value does not necessarily follow a Chi-square distribution.
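As a quick numerical check of the approximation in the first sentence, the sketch below computes both sides for made-up counts. It assumes $\mathcal{D}_{KL}(O \| E)$ is evaluated directly on the counts as $\sum_i O_i \ln(O_i/E_i)$ with $\sum_i O_i = \sum_i E_i = N$; in that case $2\sum_i O_i \ln(O_i/E_i)$ is the G statistic of the G-test, which equals $2N$ times the KL divergence between the normalized histograms. The function names and the example counts are illustrative only, not taken from any library.

```python
import math

def pearson_chi2(observed, expected):
    """Pearson statistic: sum over bins of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def twice_kl_counts(observed, expected):
    """2 * sum_i O_i * ln(O_i / E_i) on raw counts (the G statistic).

    Bins with O_i = 0 contribute 0, following the convention 0 * ln 0 = 0.
    """
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

# Hypothetical counts over 4 bins with 100 observations in total.
expected = [25.0, 25.0, 25.0, 25.0]
near = [27, 24, 23, 26]   # observed counts close to the expected ones
far = [50, 20, 20, 10]    # observed counts far from the expected ones

for name, observed in (("near", near), ("far", far)):
    print(name, pearson_chi2(observed, expected), twice_kl_counts(observed, expected))
```

For the near counts the two statistics agree closely (0.4 versus about 0.4004); for the far counts the second-order approximation is visibly off (36 versus about 33.14), as expected from a Taylor expansion around $O = E$.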

PREREQUISITES
  • Understanding of Chi-square tests and their applications
  • Familiarity with Kullback-Leibler divergence
  • Knowledge of multinomial distributions
  • Basic statistics, including binomial random variables
NEXT STEPS
  • Study the derivation of the Chi-square test from Kullback-Leibler divergence
  • Explore the properties of multinomial distributions in statistical testing
  • Learn about the implications of assumptions in statistical tests
  • Investigate alternative tests when assumptions of the Chi-square test are violated
USEFUL FOR

Statisticians, data analysts, researchers conducting hypothesis testing, and anyone involved in statistical modeling and inference.

mnb96
Messages
711
Reaction score
5
Hello,

It is well known that the Chi-square test between an observed distribution $O$ and an expected distribution $E$ can be interpreted as a test based on (twice) the second-order Taylor approximation of the Kullback-Leibler divergence, i.e.: $$2\,\mathcal{D}_{KL}(O \| E) \approx \sum_i \frac{(O_i-E_i)^2}{E_i} = \chi^2$$
where $i$ indexes the bins of the histogram (or contingency table). A proof is given here (page 5).
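The expansion behind this approximation can be sketched as follows, assuming $\mathcal{D}_{KL}(O \| E) = \sum_i O_i \ln(O_i/E_i)$ on the counts, with $\sum_i O_i = \sum_i E_i$, and writing $\delta_i = O_i - E_i$. This is a standard Taylor argument and not necessarily the one in the linked proof.

```latex
\begin{aligned}
O_i \ln\frac{O_i}{E_i}
  &= (E_i + \delta_i)\,\ln\!\left(1 + \frac{\delta_i}{E_i}\right) \\
  &= (E_i + \delta_i)\left(\frac{\delta_i}{E_i} - \frac{\delta_i^2}{2E_i^2}
     + O\!\left(\frac{\delta_i^3}{E_i^3}\right)\right) \\
  &= \delta_i + \frac{\delta_i^2}{2E_i} + O\!\left(\frac{\delta_i^3}{E_i^2}\right), \\[4pt]
2\,\mathcal{D}_{KL}(O \| E)
  &= 2\sum_i \delta_i + \sum_i \frac{\delta_i^2}{E_i} + \dots
   = \sum_i \frac{(O_i - E_i)^2}{E_i} + \dots = \chi^2 + \dots
\end{aligned}
```

The linear terms cancel because $\sum_i \delta_i = \sum_i O_i - \sum_i E_i = 0$.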

The question is: how do we know that each of the error terms $\frac{(O_i-E_i)^2}{E_i}$ on the right side of the above equation follows a normal distribution $N(0,1)$? There is probably some assumption to be made...?
 
Stephen Tashi
mnb96 said:
The question is: how do we know that each of the error terms $\frac{(O_i-E_i)^2}{E_i}$ on the right side of the above equation follows a normal distribution $N(0,1)$?

$\frac{(O_i - E_i)^2}{E_i}$ is nonnegative, so it doesn't follow a normal distribution.

If $X$ is a binomial random variable representing the number of "successes" in $n$ independent trials with probability of success $p$ on each trial, then for large $n$ the distribution of $Y = \frac{X-np}{\sqrt{np(1-p)}}$ can be approximated by a $N(0,1)$ distribution.
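The normal approximation can be checked by simulation. The sketch below uses only the standard library; $n = 200$, $p = 0.3$ and the seed are arbitrary illustrative choices. It draws binomial counts, standardizes them as $Y$, and reports the sample mean and variance of $Y$ and the fraction of draws with $|Y| < 1.96$, which should be near 0, 1 and 0.95. It also computes the single-cell term $(X-np)^2/(np) = (1-p)Y^2$ from the question: that term is nonnegative and averages $1-p$ rather than 1, consistent with it being approximately a scaled square of a standard normal variable rather than a normal variable itself.

```python
import math
import random

def binomial_draw(n, p, rng):
    """One Binomial(n, p) count: number of successes in n Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p)

def standardized_draws(n, p, num_draws, seed=0):
    """Draws of Y = (X - n p) / sqrt(n p (1 - p)) with X ~ Binomial(n, p)."""
    rng = random.Random(seed)
    sd = math.sqrt(n * p * (1 - p))
    return [(binomial_draw(n, p, rng) - n * p) / sd for _ in range(num_draws)]

n, p = 200, 0.3  # arbitrary illustrative parameters
ys = standardized_draws(n, p, num_draws=5000)
mean_y = sum(ys) / len(ys)
var_y = sum((y - mean_y) ** 2 for y in ys) / (len(ys) - 1)
coverage = sum(1 for y in ys if abs(y) < 1.96) / len(ys)  # N(0,1) gives about 0.95
# Single-cell term (X - np)^2 / (np) = (1 - p) * Y^2: nonnegative, mean near 1 - p.
cell_term_mean = (1 - p) * sum(y * y for y in ys) / len(ys)
print(f"mean {mean_y:.3f}, variance {var_y:.3f}, "
      f"P(|Y| < 1.96) {coverage:.3f}, mean cell term {cell_term_mean:.3f}")
```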
 
Last edited:
  • Like
Likes   Reactions: 1 person
Stephen Tashi said:

If $X$ is a binomial random variable ...

I see. There is our assumption!
It seems to me that such an assumption automatically implies that the data in the cells of the contingency table are assumed to follow a multinomial distribution.
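Under the multinomial assumption each cell count is marginally binomial, as in the reply above, and the Pearson statistic over $k$ cells is approximately chi-square with $k-1$ degrees of freedom (one degree is lost because the counts must sum to the fixed total); its exact mean under the multinomial model is $k-1$. The simulation sketch below, with hypothetical cell probabilities and an arbitrary seed, draws multinomial tables and checks that the statistic has mean close to $k-1 = 3$ and exceeds the 95th percentile of $\chi^2_3$ (about 7.815) roughly 5% of the time.

```python
import random
from collections import Counter

def pearson_chi2(observed, expected):
    """Pearson statistic: sum over cells of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def multinomial_counts(total, probs, rng):
    """One Multinomial(total, probs) table: each of `total` independent
    trials falls into cell i with probability probs[i]."""
    tally = Counter(rng.choices(range(len(probs)), weights=probs, k=total))
    return [tally[i] for i in range(len(probs))]

def simulate_chi2(total, probs, num_draws, seed=1):
    rng = random.Random(seed)
    expected = [total * q for q in probs]  # expected counts under the null
    return [pearson_chi2(multinomial_counts(total, probs, rng), expected)
            for _ in range(num_draws)]

probs = [0.1, 0.2, 0.3, 0.4]  # hypothetical null cell probabilities, k = 4
stats = simulate_chi2(total=200, probs=probs, num_draws=5000)
mean_stat = sum(stats) / len(stats)
# 7.815 is (approximately) the 95th percentile of chi-square with k - 1 = 3 df.
reject_rate = sum(1 for s in stats if s > 7.815) / len(stats)
print(f"mean statistic: {mean_stat:.2f}, rejection rate: {reject_rate:.3f}")
```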

So in the end, although the formula for calculating the $\chi^2$ value is just an approximation of the Kullback-Leibler divergence, if we want to perform a decision test we still need the assumption that we are dealing with a multinomial distribution; otherwise the $\chi^2$ value calculated with the formula above does not necessarily follow a $\chi^2$ distribution.
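To see what can go wrong when the counts are not multinomial, the sketch below uses a hypothetical clustered-sampling scheme: units arrive in clusters of $m = 2$ that always land in the same cell, so the $N = 200$ units are not 200 independent trials. Each table is then exactly twice a multinomial table of 100 draws, so its Pearson statistic is exactly twice that table's statistic and its mean is about $2(k-1) = 6$ instead of 3. Comparing it with $\chi^2_3$ therefore rejects a true null far more often than the nominal 5%. This is only one illustrative way the assumption can fail, and the helper names are made up for the example.

```python
import random
from collections import Counter

def pearson_chi2(observed, expected):
    """Pearson statistic: sum over cells of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def clustered_counts(total, cluster_size, probs, rng):
    """Hypothetical clustered sampling: total // cluster_size clusters are
    assigned to cells independently, and every unit in a cluster lands in
    the same cell. The resulting counts are NOT Multinomial(total, probs)."""
    num_clusters = total // cluster_size
    tally = Counter(rng.choices(range(len(probs)), weights=probs, k=num_clusters))
    return [cluster_size * tally[i] for i in range(len(probs))]

def simulate_clustered_chi2(total, cluster_size, probs, num_draws, seed=2):
    rng = random.Random(seed)
    expected = [total * q for q in probs]  # expected counts under the null
    return [pearson_chi2(clustered_counts(total, cluster_size, probs, rng), expected)
            for _ in range(num_draws)]

probs = [0.1, 0.2, 0.3, 0.4]  # hypothetical null cell probabilities, k = 4
stats = simulate_clustered_chi2(total=200, cluster_size=2, probs=probs, num_draws=5000)
mean_stat = sum(stats) / len(stats)
# Nominal 5% test against chi-square with 3 df, even though the null is true.
reject_rate = sum(1 for s in stats if s > 7.815) / len(stats)
print(f"mean statistic: {mean_stat:.2f}, rejection rate: {reject_rate:.3f}")
```

Setting `cluster_size=1` recovers ordinary multinomial sampling, where the statistic behaves like $\chi^2_3$ again.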
 

Similar threads

  • · Replies 6 ·
Replies
6
Views
2K
  • · Replies 4 ·
Replies
4
Views
2K
  • · Replies 7 ·
Replies
7
Views
2K
  • · Replies 4 ·
Replies
4
Views
3K
  • · Replies 20 ·
Replies
20
Views
4K
  • · Replies 4 ·
Replies
4
Views
1K
  • · Replies 2 ·
Replies
2
Views
2K
  • · Replies 2 ·
Replies
2
Views
2K
  • · Replies 1 ·
Replies
1
Views
3K
  • · Replies 11 ·
Replies
11
Views
2K