Pearson's chi-square test versus chi-squared distribution


Discussion Overview

The discussion revolves around the relationship between Pearson's chi-square test and the chi-squared distribution, focusing on the definitions of the chi-square statistic, the nature of statistics as random variables, and the convergence of the statistic to the distribution as the number of observations increases. The scope includes theoretical aspects and mathematical reasoning.

Discussion Character

  • Technical explanation
  • Mathematical reasoning
  • Exploratory

Main Points Raised

  • One participant outlines the definitions of the chi-square statistic, the chi-squared distribution, and the chi-squared test, expressing confusion about how a single statistic can approach a distribution.
  • Another participant clarifies that a statistic is a function of a random variable and that its realizations follow a distribution function, explaining that as the number of observations increases, the distribution of the statistic converges to the chi-square distribution.
  • A later reply references a mathematical approach involving saddle point approximation for large dimensions, suggesting a method to explore the connection between the statistic and the distribution.

Areas of Agreement / Disagreement

Participants generally agree on the definitions and the nature of the chi-square statistic as a function of random variables. However, the discussion remains unresolved regarding the proof of the convergence of the statistic to the chi-squared distribution and the specifics of the mathematical connection.

Contextual Notes

The discussion includes assumptions about the nature of statistics and convergence, but these assumptions are not fully explored or proven within the thread.

nomadreid
I know that there are many web-sites that explain Pearson's chi-square test, but they all leave the same questions unanswered. First, to make sure I have the definitions right:
1) For a fixed population with standard deviation σ, a fixed number of degrees of freedom df = k, and a fixed sample with variance ##s^2##,
the chi-square statistic = k times the ratio of the sample variance to the population variance = ##k s^2/\sigma^2##, also expressed as the sum of the squares of the differences between each observation and the expected value, measured in units of the population standard deviation.
2) For this population and this df, the chi-squared distribution is then the graph for all samples with the chi-squared statistic on the x-axis and the probability density on the y-axis.
3) The chi-squared test uses the statistic
##\sum_i (O_i - E_i)^2 / E_i## summed over the values i, with ##O_i## being the observed frequency of the i-th value and ##E_i## being its expected frequency.
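To make definition 3 concrete, here is a minimal sketch that evaluates the test statistic for one sample. The die-roll counts and the helper name `pearson_statistic` are made up for illustration and are not from the thread.

```python
def pearson_statistic(observed, expected):
    """Return the sum over categories of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 60 rolls of a die assumed to be fair,
# so the expected frequency of each face is 60 / 6 = 10.
observed = [8, 12, 9, 11, 14, 6]
expected = [10.0] * 6

t = pearson_statistic(observed, expected)
print(round(t, 3))  # 4.2
```

For this one sample the statistic is a single number. A different set of 60 rolls would give a different number, which is the point the reply below builds on.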

OK, so far so good. But now what I do not get is the next comment: that as i goes to infinity, the chi-squared statistic approaches the chi-square distribution. First and foremost, how does a statistic, which is a single number, approach a distribution? Does it mean the cumulative distribution? Second (but not as important), is there a relatively short proof of this fact? Or at least a way to see the connection between the formulas? Thanks in advance.
 
nomadreid said:
First and foremost, how does a statistic, which is a single number, approach a distribution?

A statistic T isn't a single number; it is a function of a (usually vector-valued) random variable, T=f(Y). That means that T itself is a random variable. Its realizations ##t_j## in repetitions of an experiment follow a distribution function.
In case of the chi-square statistic, the realization y of Y is the vector ##(O_i)^T##.
If you repeat the experiment, you will get different realizations ##y_j## and different realizations of the statistic ##t_j##.
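When the number of observations goes to infinity (with the number of categories, i.e. the maximal i, held fixed), ##T## converges in distribution to the chi-square distribution, with one degree of freedom fewer than the number of categories when the expected frequencies are fully specified.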
See
http://en.wikipedia.org/wiki/Convergence_of_random_variables
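The idea that T is a random variable whose distribution approaches the chi-square distribution can be checked by simulation. The sketch below is only an illustration under assumed settings (a fair six-sided die, 2000 repetitions per sample size, a fixed seed); the names `simulate` and `CRIT_95_DF5` are hypothetical. It repeats the experiment many times and compares the realizations ##t_j## with two properties of the chi-square distribution with 5 degrees of freedom: mean 5 and a 5% probability of exceeding 11.070.

```python
import random
from collections import Counter


def pearson_statistic(observed, expected):
    """Return the sum over categories of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def simulate(n_obs, n_reps, n_cat=6, seed=0):
    """Repeat the experiment n_reps times and return the realizations t_j."""
    rng = random.Random(seed)
    expected = [n_obs / n_cat] * n_cat  # fair die: equal expected frequencies
    stats = []
    for _ in range(n_reps):
        counts = Counter(rng.choices(range(n_cat), k=n_obs))
        observed = [counts[c] for c in range(n_cat)]  # missing faces count as 0
        stats.append(pearson_statistic(observed, expected))
    return stats


# Standard table value: 95% quantile of chi-square with 5 degrees of freedom.
CRIT_95_DF5 = 11.070

for n_obs in (10, 50, 500):
    ts = simulate(n_obs, n_reps=2000)
    mean = sum(ts) / len(ts)
    tail = sum(t > CRIT_95_DF5 for t in ts) / len(ts)
    print(f"n_obs={n_obs:4d}  mean={mean:.2f}  P(T > 11.070)={tail:.3f}")
```

Each call to `simulate` produces a whole collection of values of T, not one number. As `n_obs` grows, the empirical tail fraction should settle near 0.05, which is what convergence in distribution predicts.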
 
Dr. Du: Thank you, that adequately answers my first question.
Now, if I am lucky, someone will answer my second question.
 
Calculate ##p(t)=\int\ldots\int dy_1\ldots dy_n p(y_1)\ldots p(y_n) \delta(t-f(\vec{y}))## and use a saddle point approximation for large n.
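For readers who want to see the connection between the formulas without carrying out the integral, the simplest case can be worked by hand. The sketch below is not the saddle point calculation suggested above; it swaps in the central limit theorem and assumes only two categories with a fully specified probability p.

```latex
% Two categories (k = 2), n observations, category-1 probability p.
% O_1 ~ Binomial(n, p), O_2 = n - O_1, E_1 = np, E_2 = n(1 - p).
\begin{align*}
T &= \frac{(O_1 - np)^2}{np} + \frac{(O_2 - n(1-p))^2}{n(1-p)} \\
  &= (O_1 - np)^2 \left[ \frac{1}{np} + \frac{1}{n(1-p)} \right]
     \qquad \text{since } O_2 - n(1-p) = -(O_1 - np) \\
  &= \frac{(O_1 - np)^2}{np(1-p)}
   = Z_n^2, \qquad Z_n = \frac{O_1 - np}{\sqrt{np(1-p)}} .
\end{align*}
% By the de Moivre-Laplace central limit theorem, Z_n converges in
% distribution to N(0, 1) as n goes to infinity, so T = Z_n^2 converges
% in distribution to the square of a standard normal variable, which is
% the chi-square distribution with 1 degree of freedom.
```

So for two categories the test statistic is exactly the square of a standardized binomial count, and the sum-of-squared-standardized-deviations form in definition 1 appears in the limit. The general case with more categories follows the same pattern using the multivariate central limit theorem.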
 
Dr. Du: thanks very much. Makes sense. Enlightening.
 
