Question about Chi-Square Test Regarding Normal Distribution

Discussion Overview

The discussion revolves around the application of the Chi-Square test in relation to normal distribution, specifically addressing how different data groupings can affect the outcome of the hypothesis test. Participants explore the implications of bin size and sample size on the results of statistical tests.

Discussion Character

  • Debate/contested
  • Technical explanation
  • Mathematical reasoning

Main Points Raised

  • One participant notes that different groupings of data can lead to different conclusions regarding the null hypothesis, questioning whether this indicates a mistake in their calculations.
  • Another participant suggests that the Chi-Square test is sensitive to bin size, recommending the Shapiro-Wilk test as an alternative for assessing normality.
  • A participant points out that a sample size of 50 may not be sufficient to confirm a distribution, indicating potential issues with statistical power.
  • It is mentioned that, given the finite variance of height, the Central Limit Theorem (CLT) could be invoked to assume normality with a sample size of 50.
  • One participant agrees that different results are possible due to the level of detail in the first grouping and notes the impact of degrees of freedom on the Chi-Squared distribution.

Areas of Agreement / Disagreement

Participants generally agree that different data groupings can yield different results in hypothesis testing, but there is no consensus on the implications of this variability or the adequacy of the sample size.

Contextual Notes

Limitations include the potential impact of bin size on the Chi-Square test results, the small sample size of 50 events, and the assumptions underlying the application of the Central Limit Theorem.

Who May Find This Useful

This discussion may be useful for statisticians, researchers conducting hypothesis testing, and students learning about statistical methods and their applications in analyzing data distributions.

songoku
TL;DR
Suppose I have raw height data for 50 students. I want to run a goodness-of-fit test to check whether a normal distribution is an appropriate model for the data at a given significance level.
The first step is to group the data into a table so I can get the observed frequency for each interval. I tried two different groupings (something like 150 - 160, 160 - 170, etc., and the other 150 - 170, 170 - 190, etc.) and found that the conclusion of the test differs: one grouping leads to not rejecting the null hypothesis and the other to rejecting it.

Is it possible for different groupings to give different conclusions? Or must there be a mistake in my working?

Thanks
 
You suffer from low statistics -- 50 events isn't much to confirm a distribution.
 
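In practice, since the height of individuals has finite variance, you can rely on the CLT with n=50 to treat the sample mean as approximately normal. Note that this is a statement about the mean, not about the shape of the raw data, which is what a goodness-of-fit test checks.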
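
The CLT describes the distribution of the sample mean rather than the raw observations. The sketch below is a small simulation of that distinction, not anything from the thread: it assumes a deliberately skewed exponential population (chosen only for illustration, heights are not exponential), and the helper `skewness` and the seed are hypothetical choices.

```python
import random
from statistics import mean, pstdev


def skewness(values):
    """Population skewness: third central moment divided by sd cubed."""
    m, s = mean(values), pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)


rng = random.Random(1)  # fixed seed so the run is repeatable

# Deliberately skewed raw observations (exponential, for illustration only).
raw = [rng.expovariate(1.0) for _ in range(5000)]

# Means of many independent samples of size n = 50 from the same population.
sample_means = [mean(rng.expovariate(1.0) for _ in range(50)) for _ in range(2000)]

raw_skew = skewness(raw)
means_skew = skewness(sample_means)
print(f"skewness of raw data:     {raw_skew:.2f}")
print(f"skewness of sample means: {means_skew:.2f}")
```

The sample means come out far closer to symmetric than the raw data, which stays skewed however many observations are collected. A normality test on the raw heights therefore still has to be judged on the raw data.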
 
It certainly is possible to get different results. Your first grouping shows more detail than your second. It also has about twice as many bins and therefore more degrees of freedom, so the test statistic is compared against a different Chi-Squared distribution.
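
To make the effect of grouping concrete, here is a minimal sketch that computes the statistic X^2 = sum of (O_i - E_i)^2 / E_i for one data set under 10 cm and 20 cm bins. Everything in it is an assumption for illustration: the heights are synthetic (not the original poster's data), the function name `chi_square_normal_fit` is hypothetical, and the degrees of freedom follow the common textbook convention of bins minus 1 minus the two estimated parameters.

```python
import random
from statistics import NormalDist, mean, stdev

# Upper 5% critical values of the chi-square distribution, by degrees of freedom.
CRITICAL_5PCT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi_square_normal_fit(data, edges):
    """Return (statistic, degrees of freedom) for a normal fit binned at `edges`.

    The two outermost bins are open-ended, so expected counts sum to len(data).
    """
    n = len(data)
    model = NormalDist(mean(data), stdev(data))  # mean and sd estimated from the data
    bounds = [float("-inf"), *edges, float("inf")]
    statistic = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        observed = sum(lo <= x < hi for x in data)
        expected = n * (model.cdf(hi) - model.cdf(lo))
        statistic += (observed - expected) ** 2 / expected
    n_bins = len(bounds) - 1
    dof = n_bins - 1 - 2  # minus 1 for the fixed total, minus 2 estimated parameters
    return statistic, dof


# Synthetic heights in cm (hypothetical data, seeded for repeatability).
rng = random.Random(0)
heights = [rng.gauss(170, 10) for _ in range(50)]

fine = chi_square_normal_fit(heights, [150, 160, 170, 180, 190])  # 10 cm bins
coarse = chi_square_normal_fit(heights, [150, 170, 190])          # 20 cm bins

for label, (statistic, dof) in (("10 cm bins", fine), ("20 cm bins", coarse)):
    verdict = "reject" if statistic > CRITICAL_5PCT[dof] else "do not reject"
    print(f"{label}: X^2 = {statistic:.2f}, dof = {dof}, {verdict} H0 at 5%")
```

With these edges the fine grouping has 6 bins (3 degrees of freedom) and the coarse one has 4 bins (1 degree of freedom), so the two statistics are judged against different critical values. With only 50 observations the tail bins have small expected counts; a common rule of thumb is to merge bins until each expected count is at least 5.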
 
Thank you very much for the help and explanation BWV, BvU, FactChecker
 