Chi-square test: why does it follow a Chi-square distribution

mnb96 · Apr 14, 2014

Hello,

it is well known that the Chi-square test between an observed distribution O and an expected distribution E can be interpreted as a test based on (twice) the second-order Taylor approximation of the Kullback-Leibler divergence, i.e.: [tex]2\,\mathcal{D}_{KL}(O \| E) \approx \sum_i \frac{(O_i-E_i)^2}{E_i} = \chi^2[/tex]
where i is the bin of the histogram (or contingency table). A proof is given here (page 5).
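
A quick numerical check of this approximation may help. The following is only a minimal sketch: the two distributions `e` and `o` are made-up illustrative values (not taken from the linked proof), chosen so that O is close to E, which is where the second-order expansion should be accurate.

```python
import math

def kl_divergence(o, e):
    """Kullback-Leibler divergence D_KL(O || E) for two discrete distributions."""
    return sum(oi * math.log(oi / ei) for oi, ei in zip(o, e) if oi > 0)

def pearson_sum(o, e):
    """Sum over bins of (O_i - E_i)^2 / E_i."""
    return sum((oi - ei) ** 2 / ei for oi, ei in zip(o, e))

# Illustrative (assumed) distributions; both sum to 1 and O is a small
# perturbation of E.
e = [0.25, 0.25, 0.25, 0.25]
o = [0.26, 0.24, 0.255, 0.245]

two_kl = 2 * kl_divergence(o, e)
chi2_like = pearson_sum(o, e)

print("2 * D_KL      :", two_kl)
print("Pearson sum   :", chi2_like)
print("absolute gap  :", abs(two_kl - chi2_like))
```

The gap should shrink further as O moves closer to E, and grow when the two distributions differ substantially, since the higher-order Taylor terms are then no longer negligible.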

The question is: how do we know that each of the error terms [itex]\frac{(O_i-E_i)^2}{E_i}[/itex] on the right side of the above equation follows a normal distribution N(0,1)? There is probably some assumption to be made...?

Stephen Tashi · Apr 15, 2014

mnb96 said:

The question is: how do we know that each of the error terms [itex]\frac{(O_i-E_i)^2}{E_i}[/itex] on the right side of the above equation follows a normal distribution N(0,1)?

[itex]\frac{ (O_i - E_i)^2}{E_i}[/itex] is nonnegative, so it doesn't follow a normal distribution.

If [itex]X[/itex] is a binomial random variable representing the number of "successes" in [itex]n[/itex] independent trials with probability of success [itex]p[/itex] on each trial, then the distribution of [itex]Y = \frac{X-np}{\sqrt{np(1-p)}}[/itex] can be approximated by a [itex]N(0,1)[/itex] distribution.
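
To make the link to the Chi-square statistic concrete, here is a small simulation sketch using only the Python standard library. The values of `n`, `p`, the number of repetitions and the seed are arbitrary assumptions for illustration. It checks two things: that the standardized variable Y has mean near 0 and variance near 1, and that for a table with just two cells ("success" and "failure") the Pearson sum is algebraically equal to Y squared.

```python
import math
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is reproducible
n, p = 200, 0.3          # assumed number of trials and success probability
repetitions = 4000       # assumed number of simulated experiments

def binomial_draw(rng, n, p):
    """Number of successes in n independent Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p)

ys = []
max_gap = 0.0
for _ in range(repetitions):
    x = binomial_draw(rng, n, p)
    # Standardized binomial variable from the post above.
    y = (x - n * p) / math.sqrt(n * p * (1 - p))
    # Pearson sum over the two cells: successes and failures.
    chi2 = ((x - n * p) ** 2 / (n * p)
            + ((n - x) - n * (1 - p)) ** 2 / (n * (1 - p)))
    max_gap = max(max_gap, abs(chi2 - y * y))
    ys.append(y)

mean_y = statistics.mean(ys)
var_y = statistics.pvariance(ys)

print("mean of Y            :", mean_y)
print("variance of Y        :", var_y)
print("max |chi2 - Y^2|     :", max_gap)
```

So with two cells the statistic is the square of a single approximately standard normal variable, which is why one degree of freedom appears rather than two.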

mnb96 · Apr 15, 2014

Stephen Tashi said:

If [itex]X[/itex] is a binomial random variable ...

I see. So that is our assumption!
It seems to me that such an assumption automatically implies that the data in the cells of the contingency table are assumed to follow a multinomial distribution.

So in the end, although the formula for calculating the [itex]\chi^2[/itex] value is just an approximation of the Kullback-Leibler divergence, if we want to perform a hypothesis test we still need the assumption that we are dealing with a multinomial distribution. Otherwise the [itex]\chi^2[/itex] value calculated according to the formula above does not necessarily follow a [itex]\chi^2[/itex] distribution.
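
This conclusion can be checked empirically. The sketch below assumes a four-cell table with made-up cell probabilities `probs` and sample size `n`, draws the counts from a multinomial distribution, and compares the simulated statistic with what a [itex]\chi^2[/itex] distribution with 3 degrees of freedom predicts: mean 3, variance 6, and about 5% of values above the 0.95 quantile (approximately 7.815).

```python
import random
import statistics

rng = random.Random(1)            # fixed seed so the run is reproducible
probs = [0.1, 0.2, 0.3, 0.4]      # assumed cell probabilities under H0
n = 200                           # assumed number of observations per table
repetitions = 4000                # assumed number of simulated tables
critical_95 = 7.815               # 0.95 quantile of chi-square with 3 d.o.f.

expected = [n * p for p in probs]

def pearson_statistic(rng):
    """Draw one multinomial table and return sum of (O_i - E_i)^2 / E_i."""
    draws = rng.choices(range(len(probs)), weights=probs, k=n)
    observed = [0] * len(probs)
    for cell in draws:
        observed[cell] += 1
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

stats = [pearson_statistic(rng) for _ in range(repetitions)]

mean_stat = statistics.mean(stats)
var_stat = statistics.pvariance(stats)
reject_rate = sum(s > critical_95 for s in stats) / repetitions

print("mean of statistic     :", mean_stat)      # chi-square(3) predicts 3
print("variance of statistic :", var_stat)       # chi-square(3) predicts 6
print("fraction above 7.815  :", reject_rate)    # chi-square(3) predicts 0.05
```

If the counts were generated by some other mechanism (for example, observations that are not independent), the same formula could still be evaluated, but there would be no guarantee that these three numbers match the [itex]\chi^2[/itex] predictions.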
