Question on Pearson's Chi-squared test

  • Context: Graduate 
  • Thread starter: mnb96
  • Tags: Chi-squared Test

Discussion Overview

The discussion centers on the interpretation and assumptions of Pearson's Chi-squared test, particularly the relationship between the variances and the expected values of the random variables involved. Participants explore the implications of standardizing random variables and the conditions under which the Chi-squared test is applicable, touching on the multinomial, binomial, Poisson, and Gamma distributions.

Discussion Character

  • Exploratory, Technical explanation, Debate/contested

Main Points Raised

  • One participant interprets the Chi-squared formula as evaluating the probability of a sum of squares of standardized random variables, and asks why the test appears to assume that each variable's variance equals its expected value, a condition that holds only for some distributions.
  • Another participant clarifies that the O_i are counts of observations falling within defined cells, from which the original poster infers that the random variables X_i follow a multinomial distribution.
  • A different participant argues that the Chi-squared test can be applied to repeated independent samples of a single random variable, regardless of its distribution, as long as the cells partition the range of the variable.
  • One participant elaborates on the counting nature of X_i, proposing that X_1 follows a binomial distribution when a single cell (success versus failure) is considered, and extending this reasoning to multiple cells to obtain a multinomial distribution.

Areas of Agreement / Disagreement

Participants express differing views on the assumptions underlying the Chi-squared test, particularly regarding the distribution of the random variables and the necessity of certain conditions. The discussion remains unresolved with multiple competing interpretations presented.

Contextual Notes

Participants highlight limitations in understanding the assumptions about the distributions of random variables and the implications for the applicability of the Chi-squared test, but do not resolve these issues.

mnb96
Hello,

I was trying to interpret the formula of Pearson's Chi-squared test:
\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}

I thought that if we assume that each O_i is an observation of the random variable X_i, then the above formula essentially considers the sum of squares of n standardized random variables Y_i=\frac{X_i-\mu_i}{\sigma_i}. In fact, if these random variables are independent with Y_i \sim N(0,1), then the random variable S = \sum_{i=1}^n Y_i^2 follows a \chi^2-distribution with n degrees of freedom. Thus, the Chi-squared test would essentially evaluate the tail probability \mathrm{P}\left( S \geq \chi^2 \right) and compare it to some chosen significance level.
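For concreteness, here is a minimal Python sketch (standard library only) that computes the statistic above and the upper-tail probability P(S ≥ x) for a χ² reference distribution with an integer number of degrees of freedom. The tail function uses the standard recursion for the regularized upper incomplete gamma function. The observed and expected counts in the example are made up, and passing df = len(observed) - 1 reflects the usual goodness-of-fit convention when the cell probabilities are fully specified in advance; that choice is an assumption of the example, not something stated in this post.

```python
import math


def pearson_statistic(observed, expected):
    """Pearson's statistic: sum of (O_i - E_i)^2 / E_i over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_upper_tail(x, df):
    """P(S >= x) for S ~ chi-squared with a positive integer df.

    Uses Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1) for the
    regularized upper incomplete gamma function Q, with y = x / 2.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        q, s = math.exp(-y), 1.0              # Q(1, y): the df = 2 case
    else:
        q, s = math.erfc(math.sqrt(y)), 0.5   # Q(1/2, y): the df = 1 case
    while 2 * s < df:                         # step s up by 1 until s = df / 2
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1))
        s += 1.0
    return q


# Made-up example: 100 observations in 4 cells, equal hypothesized probabilities.
observed = [18, 22, 20, 40]
expected = [25.0, 25.0, 25.0, 25.0]
stat = pearson_statistic(observed, expected)                # about 12.32
p_value = chi2_upper_tail(stat, df=len(observed) - 1)       # assumed df = k - 1
print(f"chi2 = {stat:.2f}, P(S >= chi2) = {p_value:.4f}")
```

If SciPy is available, `scipy.stats.chi2.sf` gives the same tail probability; the hand-rolled version only keeps the sketch dependency-free.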

My question is about the standardization of the random variables X_i.
If my interpretation above is correct, then Pearson's Chi-squared test somehow assumes that each random variable X_i has variance equal to its expected value, that is: \sigma_i^2 = \mu_i.

Why so?
Can anybody explain why we would need to assume that variance and expected values are numerically equal? That condition is satisfied only for some distributions like Poisson and Gamma (with \theta=1). Why such a restriction?
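A short calculation suggests that the denominators E_i are not meant to be the variances. It assumes the counting model the thread arrives at further down: n independent draws, k cells, each draw landing in cell i with probability p_i, so that E_i = n p_i. Note that k here plays the role of the n in the formula above, and n is the number of draws.

```latex
% Assumed counting model: n independent draws, k cells, P(cell i) = p_i,
% O_i = X_i with X_i ~ B(n, p_i) marginally, and E_i = n p_i.
\mathrm{E}\left[(X_i - E_i)^2\right] = \mathrm{Var}(X_i) = n p_i (1 - p_i) = E_i (1 - p_i)

\mathrm{E}\left[\chi^2\right]
  = \sum_{i=1}^{k} \frac{\mathrm{E}\left[(X_i - n p_i)^2\right]}{n p_i}
  = \sum_{i=1}^{k} (1 - p_i)
  = k - \sum_{i=1}^{k} p_i
  = k - 1
```

Each count's variance falls short of E_i by the factor (1 - p_i), and these shortfalls add up to exactly one, so the mean is k - 1: the mean of a χ² distribution with k - 1 degrees of freedom, one fewer than the number of cells because the counts must sum to n. This only matches the first moment; the full distributional statement is a large-n approximation.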
 
mnb96 said:
if we assume that each O_i is an observation of the random variable X_i

The O_i are supposed to be a count of how many observations of a random variable fall within a "cell". How are you defining the ith cell?
 
Stephen Tashi said:
The O_i are supposed to be a count of how many observations of a random variable fall within a "cell".

I see! That is an important observation. It probably means that the random variables X_i are supposed to follow a multinomial distribution.

For instance, if we consider only one cell, then X_1 could be the number of successes (observations falling into that cell) out of m independent trials of some experiment. Thus, X_1 would follow a binomial distribution, which approaches a Poisson distribution (for which \sigma^2=\mu=\lambda) when m is large and the success probability p is small, with mp = \lambda held fixed.
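The binomial-to-Poisson step can be checked numerically. The sketch below uses hypothetical values (λ = 5 and m = 10, 100, 1000) and sets p = λ/m, so the binomial mean stays at λ while its variance mp(1 - p) = λ(1 - p) only approaches λ as p shrinks. It also reports the largest pointwise gap between the two probability mass functions.

```python
import math


def binomial_pmf(j, m, p):
    """P(X = j) for X ~ B(m, p)."""
    return math.comb(m, j) * p ** j * (1 - p) ** (m - j)


def poisson_pmf(j, lam):
    """P(Y = j) for Y ~ Poisson(lam), computed on the log scale."""
    return math.exp(j * math.log(lam) - lam - math.lgamma(j + 1))


def max_pmf_gap(m, lam):
    """Largest |B(m, lam/m) pmf - Poisson(lam) pmf| over j = 0..m."""
    p = lam / m
    return max(abs(binomial_pmf(j, m, p) - poisson_pmf(j, lam)) for j in range(m + 1))


lam = 5.0  # hypothetical mean m * p, held fixed while m grows
for m in (10, 100, 1000):
    p = lam / m
    variance = m * p * (1 - p)  # binomial variance = lam * (1 - p) < lam
    print(f"m={m:5d}  p={p:.3f}  mean={m * p:.1f}  variance={variance:.3f}  "
          f"max pmf gap={max_pmf_gap(m, lam):.5f}")
```

With p fixed instead (say p = 0.5), growing m does not produce a Poisson limit; the variance stays at half the mean.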

If the above reasoning is correct, then Pearson's Chi-squared test should work only when the number of trials is sufficiently large.
 
mnb96 said:
It probably means that the random variables X_i are supposed to follow a multinomial distribution.

I'm not sure what you mean by that statement.

The test can be applied to repeated independent samples of a single random variable. The single random variable can have any distribution. It is only necessary to define the cells so that they partition the range of the random variable.
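A sketch of the procedure described above, under made-up assumptions: Z is taken to be exponential with rate 1, its range is cut at the hypothetical edges 0.5, 1 and 2 into four cells, and the expected counts come from the exponential CDF. Any other distribution would work the same way, as long as its CDF is available to compute the cell probabilities.

```python
import bisect
import math
import random


def bin_counts(samples, edges):
    """Count samples per cell; edges must be sorted ascending.

    Cell 0 is (-inf, edges[0]), cell j is [edges[j-1], edges[j]), and the
    last cell is [edges[-1], +inf), so the cells partition the real line.
    """
    counts = [0] * (len(edges) + 1)
    for z in samples:
        counts[bisect.bisect_right(edges, z)] += 1  # number of edges <= z
    return counts


def expected_counts(n, edges, cdf):
    """n * P(Z in cell) for each cell, computed from the hypothesized CDF."""
    cum = [0.0] + [cdf(e) for e in edges] + [1.0]
    return [n * (hi - lo) for lo, hi in zip(cum, cum[1:])]


def exponential_cdf(z, rate=1.0):
    """CDF of the (hypothesized) exponential distribution."""
    return 1.0 - math.exp(-rate * z) if z > 0 else 0.0


rng = random.Random(42)
n = 1000
samples = [rng.expovariate(1.0) for _ in range(n)]
edges = [0.5, 1.0, 2.0]                      # hypothetical cell boundaries
observed = bin_counts(samples, edges)
expected = expected_counts(n, edges, exponential_cdf)
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(observed, [round(e, 1) for e in expected], round(stat, 3))
```

The choice of edges is free; they only need to cover the whole range, and in practice one keeps every expected count reasonably large.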
 
Hi Stephen, and thanks for your help!

What I meant is that X_i is a random variable that "counts" the number of observations that happened to fall into the i-th cell. For instance, if we consider a continuous random variable Z having some unknown probability density function, and we partition the real line into two cells corresponding to the events Z\geq 10 (=success) and Z< 10 (=failure), then the two events will have probabilities p and (1-p).

We can sample the random variable Z many times, say n times.
Now, X_1 is the random variable that counts the total number of successes, so X_1 follows a binomial distribution, i.e. X_1\sim B(n,p).
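With these two cells (O_1 = X_1, O_2 = n - X_1, E_1 = np, E_2 = n(1-p)), a short calculation shows how Pearson's statistic relates to the binomial variance. Because O_2 - E_2 = -(X_1 - np), the two terms combine into a single squared standardized binomial:

```latex
\chi^2 = \frac{(X_1 - np)^2}{np} + \frac{(X_1 - np)^2}{n(1-p)}
       = (X_1 - np)^2 \, \frac{(1-p) + p}{np(1-p)}
       = \left( \frac{X_1 - np}{\sqrt{np(1-p)}} \right)^2
```

So with two cells, dividing by E_i rather than by the variances loses nothing: the combined denominator is exactly np(1-p) = Var(X_1). By the de Moivre-Laplace theorem the standardized count is approximately N(0,1) for large n, which makes the statistic approximately χ² with one degree of freedom.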

I thought that if we extend this reasoning to k cells, then the vector of random variables (X_1,\ldots,X_k) should follow a multinomial distribution, i.e. (X_1,\ldots,X_k) \sim M(n;p_1,\ldots,p_k).

Or am I misunderstanding something?
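The multinomial picture can be checked by simulation. The sketch below draws multinomial counts with hypothetical settings (n = 200 draws, cell probabilities 0.1, 0.2, 0.3, 0.4, 5000 replications, fixed seed) and computes two sums per replication: Pearson's statistic with denominators E_i = n p_i, and a "naively standardized" sum that divides by the binomial variances n p_i (1 - p_i). Under this model the exact expectations are k - 1 = 3 and k = 4 respectively, and Pearson's statistic should exceed the 95th percentile of χ² with 3 degrees of freedom in roughly 5% of replications.

```python
import random


def simulate(n, probs, reps, seed=0):
    """Return (pearson, naive) statistic lists for `reps` draws from M(n; probs)."""
    rng = random.Random(seed)
    num_cells = len(probs)
    expected = [n * p for p in probs]                  # E_i = n p_i
    variances = [n * p * (1 - p) for p in probs]       # Var(X_i) = n p_i (1 - p_i)
    pearson, naive = [], []
    for _ in range(reps):
        counts = [0] * num_cells
        for cell in rng.choices(range(num_cells), weights=probs, k=n):
            counts[cell] += 1
        pearson.append(sum((o - e) ** 2 / e for o, e in zip(counts, expected)))
        naive.append(sum((o - e) ** 2 / v
                         for o, e, v in zip(counts, expected, variances)))
    return pearson, naive


CHI2_3DF_95TH_PERCENTILE = 7.814727903251178  # 95th percentile of chi-squared, 3 df

pearson, naive = simulate(n=200, probs=[0.1, 0.2, 0.3, 0.4], reps=5000)
mean_pearson = sum(pearson) / len(pearson)
mean_naive = sum(naive) / len(naive)
rejection_rate = sum(s >= CHI2_3DF_95TH_PERCENTILE for s in pearson) / len(pearson)
print(f"mean Pearson: {mean_pearson:.3f}  (exact expectation 3)")
print(f"mean naive:   {mean_naive:.3f}  (exact expectation 4)")
print(f"rejection rate at the 5% level: {rejection_rate:.3f}")
```

The naive sum has the larger mean because the counts are negatively correlated (they must add up to n); Pearson's denominators are what make the limiting distribution χ² with k - 1 degrees of freedom.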
 
