Pearson's chi-square test versus chi-squared distribution

In summary: a statistic computed from random data is itself a random variable, so repeating an experiment produces a whole distribution of values of the chi-square statistic, and it is this distribution that converges to the chi-square distribution as the number of observations grows.
  • #1
nomadreid
I know that there are many websites that explain Pearson's chi-square test, but they all leave the same questions unanswered. First, to make sure I have the definitions right:
1) For a fixed population with standard deviation σ, a fixed number of degrees of freedom df = k, and a fixed sample with variance ##s^2##, the chi-square statistic is k times the ratio of the sample variance to the population variance, ##k\,s^2/\sigma^2##. It can also be expressed as the sum of the squares of the differences between each observation and the expected value, measured in units of the population standard deviation.
2) For this population and this df, the chi-squared distribution is then the graph, over all samples, with the chi-squared statistic on the x-axis and the probability density on the y-axis.
3) The chi-squared test uses the statistic ##\sum_i (O_i-E_i)^2/E_i##, summed over the i values, with ##O_i## the observed frequency of the i-th value and ##E_i## its expected frequency.
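Item 3 can be written as a short Python function. This is a minimal sketch; the function name and the die-roll counts are hypothetical, chosen only for illustration.

```python
def pearson_statistic(observed, expected):
    """Return sum_i (O_i - E_i)**2 / E_i for paired sequences of frequencies."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: a die rolled 60 times, so E_i = 10 for each face if it is fair.
observed = [5, 8, 9, 8, 10, 20]
expected = [10] * 6
print(pearson_statistic(observed, expected))
```

For these counts the squared deviations sum to 134, so the statistic is 134/10 = 13.4 over 6 categories.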

OK, so far so good. But now what I do not get is the next comment: that as i goes to infinity, the chi-squared statistic approaches the chi-square distribution. First and foremost, how does a statistic, which is a single number, approach a distribution? Does it mean the cumulative distribution? Second (but not as important), is there a relatively short proof of this fact? Or at least a way to see the connection between the formulas? Thanks in advance.
 
  • #2
nomadreid said:
First and foremost, how does a statistic, which is a single number, approach a distribution?

A statistic ##T## isn't a single number; it is a function ##T=f(Y)## of a (usually vector-valued) random variable ##Y##. That means ##T## is itself a random variable, and its realizations ##t_j## in repetitions of an experiment follow a distribution.
In case of the chi-square statistic, the realization y of Y is the vector ##(O_i)^T##.
If you repeat the experiment, you will get different realizations ##y_j## and different realizations of the statistic ##t_j##.
As the total number of observations ##\sum_i O_i## goes to infinity, with the number of categories fixed, ##T## converges in distribution to the chi-square distribution with k−1 degrees of freedom, where k is the number of categories.
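To see concretely what "converges in distribution" means, here is a Monte Carlo sketch in Python. The category probabilities, sample sizes, repetition count, and helper names are all hypothetical. It repeats a multinomial experiment many times, collects the realizations ##t_j## of Pearson's statistic, and compares the fraction above a threshold with the chi-square tail probability. In the classical result (Pearson's theorem) the limit is taken as the total number of observations n grows with the number of categories fixed, which is the setting simulated here. With 3 categories the limit has 2 degrees of freedom, whose upper tail has the closed form ##P(X>x)=e^{-x/2}##, so no statistics library is needed.

```python
import math
import random


def pearson_statistic(observed, expected):
    """Sum over categories of (O_i - E_i)**2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def simulate_statistic(n, probs, reps, seed=0):
    """Repeat a multinomial experiment `reps` times and return the realizations t_j.

    Each experiment sorts n observations into len(probs) categories with the
    given probabilities, so the null hypothesis is true by construction.
    """
    rng = random.Random(seed)
    k = len(probs)
    expected = [n * p for p in probs]
    realizations = []
    for _ in range(reps):
        counts = [0] * k
        for category in rng.choices(range(k), weights=probs, k=n):
            counts[category] += 1
        realizations.append(pearson_statistic(counts, expected))
    return realizations


probs = [0.2, 0.3, 0.5]       # 3 categories, so 3 - 1 = 2 degrees of freedom
threshold = 2 * math.log(20)  # for chi-square(2), P(X > x) = exp(-x/2), which is 0.05 here

for n in (10, 100, 1000):
    t = simulate_statistic(n, probs, reps=2000)
    tail = sum(value > threshold for value in t) / len(t)
    print(f"n={n:5d}  fraction of t_j above threshold: {tail:.3f}  (chi-square(2): 0.050)")
```

For large n the printed fractions should sit near 0.05, up to Monte Carlo noise; for small n, where some expected counts are below 5, they may differ more.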
See
http://en.wikipedia.org/wiki/Convergence_of_random_variables
 
  • #3
Dr. Du: Thank you, that adequately answers my first question.
Now, if I am lucky, someone will answer my second question.
 
  • #4
Calculate ##p(t)=\int\ldots\int dy_1\ldots dy_n\, p(y_1)\ldots p(y_n)\, \delta(t-f(\vec{y}))##, where the Dirac delta restricts the integral to the points with ##f(\vec{y})=t##, and use a saddle-point approximation for large n.
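As a worked check of this formula, here is the one special case where the integral can be done exactly, without the saddle-point step: the ##y_i## independent standard normal and ##f(\vec{y})=\sum_i y_i^2##, which is the defining case of the chi-square distribution with n degrees of freedom. The sketch below uses n-dimensional spherical coordinates, with ##S_{n-1}## the surface area of the unit sphere in ##\mathbb{R}^n##; it is not the saddle-point calculation suggested above.

```latex
\begin{aligned}
p(t) &= \int d^n y \,\prod_{i=1}^{n} \frac{e^{-y_i^2/2}}{\sqrt{2\pi}}\;
        \delta\!\Big(t - \sum_{i=1}^{n} y_i^2\Big)
      = \frac{S_{n-1}}{(2\pi)^{n/2}} \int_0^\infty r^{n-1} e^{-r^2/2}\,\delta(t - r^2)\,dr,
      \qquad S_{n-1} = \frac{2\pi^{n/2}}{\Gamma(n/2)}, \\
\delta(t - r^2) &= \frac{\delta(r - \sqrt{t})}{2\sqrt{t}} \qquad (r > 0,\ t > 0), \\
p(t) &= \frac{2\pi^{n/2}}{\Gamma(n/2)}\cdot\frac{1}{(2\pi)^{n/2}}\cdot
        \frac{t^{(n-1)/2}\, e^{-t/2}}{2\sqrt{t}}
      = \frac{t^{n/2-1}\, e^{-t/2}}{2^{n/2}\,\Gamma(n/2)}, \qquad t > 0 .
\end{aligned}
```

The last line is the chi-square density with n degrees of freedom.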
 
  • #5
Dr. Du: thanks very much. Makes sense. Enlightening.
 

What is Pearson's chi-square test?

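Pearson's chi-square test is a statistical method for checking whether observed frequencies in categories are consistent with the frequencies expected under a null hypothesis. It is used either to test goodness of fit to a specified distribution or, with a contingency table, to test whether two categorical variables are independent; in the latter case it compares the observed frequencies with those expected if there were no relationship between the variables.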

What is the chi-squared distribution?

The chi-squared distribution with k degrees of freedom is the probability distribution of the sum of the squares of k independent standard normal random variables. Pearson's test uses it because, under the null hypothesis and for large samples, the test statistic approximately follows a chi-squared distribution.

What is the difference between Pearson's chi-square test and the chi-squared distribution?

Pearson's chi-square test is a procedure; the chi-squared distribution is a probability distribution. The test computes a statistic from the observed and expected frequencies and then uses the chi-squared distribution, with the appropriate degrees of freedom, as the reference distribution for that statistic under the null hypothesis: the p-value is the probability that a chi-squared random variable exceeds the observed value of the statistic.
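The p-value step can be sketched with the Python standard library alone. This is a minimal sketch assuming an integer number of degrees of freedom; `chi2_sf` is a hypothetical helper name, and in practice a library routine such as SciPy's `scipy.stats.chi2.sf` would normally be used. It relies on the fact that for ##X\sim\chi^2(\mathrm{df})##, ##P(X>x)## equals the regularized upper incomplete gamma function ##Q(\mathrm{df}/2,\,x/2)##, built up from ##Q(1,z)=e^{-z}## or ##Q(1/2,z)=\operatorname{erfc}(\sqrt{z})## with the recurrence ##Q(a+1,z)=Q(a,z)+z^a e^{-z}/\Gamma(a+1)##.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-square(df), df a positive integer."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Start the recurrence at Q(1, z) for even df or Q(1/2, z) for odd df.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    # Step a up to df/2 using Q(a+1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a+1),
    # evaluating the added term in log space for numerical stability.
    while a < df / 2.0:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q


# Hypothetical example: a statistic of 13.4 from 6 categories, so df = 6 - 1 = 5.
print(chi2_sf(13.4, 5))
```

For 2 degrees of freedom the function reduces to ##e^{-x/2}##, so `chi2_sf(2 * math.log(20), 2)` returns 0.05 up to rounding.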

When should I use Pearson's chi-square test?

Use Pearson's chi-square test when you want to test whether observed counts in categories fit a hypothesized distribution, or whether two categorical variables are related. It is commonly used to analyze survey or experimental data in which observations are classified into two or more categories.

What are the assumptions of Pearson's chi-square test?

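There are three main assumptions of Pearson's chi-square test: 1) the data are counts of observations in categories, 2) the observations are independent, and 3) the expected frequencies are large enough for the chi-squared approximation to hold; a common rule of thumb is that the expected frequency in each cell should be at least 5. If these assumptions are not met, the p-value from the test may not be reliable.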
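A minimal Python sketch of the two-variable case, assuming the data arrive as an r × c table of counts (the function name and the example tables are hypothetical). It computes each cell's expected frequency as row total × column total / grand total, which is the expected count if the variables are independent, then the statistic, the degrees of freedom ##(r-1)(c-1)##, and a list of cells whose expected frequency falls below the rule-of-thumb value of 5.

```python
def independence_test(table):
    """Pearson's chi-square test of independence for an r x c table of counts.

    Returns (statistic, degrees of freedom, expected table, cells with E < 5).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    statistic = sum((o - e) ** 2 / e
                    for obs_row, exp_row in zip(table, expected)
                    for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    # Flag cells that violate the "expected frequency at least 5" rule of thumb.
    small_cells = [(i, j) for i, exp_row in enumerate(expected)
                   for j, e in enumerate(exp_row) if e < 5]
    return statistic, df, expected, small_cells


print(independence_test([[20, 30], [30, 20]]))
print(independence_test([[3, 1], [2, 4]]))
```

For the first table every expected count is 25, so the statistic is 4.0 with 1 degree of freedom and no cell is flagged; for the second, all four expected counts are below 5, so the chi-squared approximation is questionable there.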
