Kolmogorov-Smirnov goodness of fit test

  • Thread starter: big man
  • Tags: Fit Test
SUMMARY

The discussion centers on applying the Kolmogorov-Smirnov goodness-of-fit test, via the "kstest" function in MATLAB, to determine whether a dataset follows a normal distribution. With the full dataset of 13 million points the test rejected the null hypothesis (H=1), meaning the data are not consistent with a normal distribution. With a smaller sample of 1,000 points the test failed to reject the null hypothesis (H=0). The user suspects that the distribution is leptokurtic (positive excess kurtosis, similar to a logistic distribution) and questions whether it can be treated as comparable to a normal distribution.

PREREQUISITES
  • Understanding of the Kolmogorov-Smirnov goodness-of-fit test
  • Familiarity with MATLAB and the "kstest" function
  • Knowledge of statistical concepts such as kurtosis
  • Basic understanding of normal and leptokurtic distributions
NEXT STEPS
  • Research the implications of sample size on statistical tests, particularly the Kolmogorov-Smirnov test
  • Learn about alternative goodness-of-fit tests, such as the Anderson-Darling test
  • Explore the characteristics and applications of leptokurtic distributions
  • Investigate the relationship between Poisson and normal distributions in statistical analysis
USEFUL FOR

Statisticians, data analysts, and researchers working with large datasets who need to assess distribution characteristics and validate statistical assumptions.

big man

Homework Statement


I need to determine whether my data follow a normal distribution. By eye it looks roughly normal, but I need to perform the Kolmogorov-Smirnov goodness-of-fit test to verify this. Below is a picture of my dataset with a normal curve fitted to it (the red line is the normal curve).

http://img79.imageshack.us/img79/6730/statsis9.jpg


The Attempt at a Solution


To test this I used the "kstest" function in MATLAB. I essentially have 13 million data points, and when I test all of them I get H=1, which means that the null hypothesis (that the data DO follow a normal distribution) has been rejected. However, when I use only 1000 data points it returns H=0, which means that the null hypothesis has not been rejected.

I was just wondering if anyone knows why this would be so, and whether you have any recommendations on what I should do?

Appreciate any advice.

Thanks.
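One likely explanation for the H=1 versus H=0 result is sample size: the critical value of the Kolmogorov-Smirnov statistic shrinks roughly like 1/sqrt(n), so with millions of points even a small departure from normality is detected, while 1000 points may not be enough to see it. The sketch below illustrates this in plain Python rather than MATLAB. It is only an illustration under stated assumptions: the data are simulated from a logistic distribution scaled to unit variance (a stand-in for the real photometry data), the helper names are hypothetical, and the asymptotic 5% critical value 1.358/sqrt(n) is used instead of an exact p-value.

```python
import math
import random


def normal_cdf(x):
    """CDF of the standard normal distribution N(0, 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ks_statistic(sample):
    """One-sample KS statistic D_n = sup |F_n(x) - Phi(x)| against N(0, 1)."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_critical_value(n, c_alpha=1.358):
    """Asymptotic critical value; c_alpha = 1.358 corresponds to alpha = 0.05."""
    return c_alpha / math.sqrt(n)


def logistic_sample(n, rng):
    """Draw n logistic variates with mean 0 and variance 1 (assumed stand-in data)."""
    scale = math.sqrt(3.0) / math.pi  # logistic variance is scale^2 * pi^2 / 3
    out = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:  # avoid log(0)
            u = rng.random()
        out.append(scale * math.log(u / (1.0 - u)))
    return out


rng = random.Random(0)
for n in (1_000, 100_000):
    d = ks_statistic(logistic_sample(n, rng))
    crit = ks_critical_value(n)
    print(f"n={n}: D={d:.4f}, critical={crit:.4f}, reject={d > crit}")
```

Two practical notes. In MATLAB, `kstest(x)` with no further arguments compares the data with the standard normal distribution, so the data need to be standardized (or a hypothesized CDF supplied) first. Also, if the mean and standard deviation are estimated from the same data, the standard KS critical values are too conservative; a Lilliefors-type or Anderson-Darling test is the usual alternative.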
 
Thanks for that Astronuc. That was quite interesting. I'm using the Hipparcos photometry data to estimate the possible number of occultations that were observed throughout the mission. From the photon statistics theory that I've read so far, I should have a normal distribution (apparently the Poisson distribution is approximately normal in this case), or the data should at least be comparable to a normal distribution.

I will have to do a test to verify this, but from the looks of it my distribution has positive excess kurtosis, which makes it similar to the logistic distribution. So this data couldn't really be classed as comparable to a normal distribution, could it?
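A quick way to check the kurtosis impression is to compute the sample excess kurtosis, which is 0 for a normal distribution and 1.2 for a logistic distribution. The sketch below is a minimal illustration on simulated data, not on the Hipparcos measurements. The function name `excess_kurtosis` and the simulated samples are assumptions made for the example, and it uses the simple moment estimator without small-sample bias correction.

```python
import math
import random


def excess_kurtosis(sample):
    """Moment estimator of excess kurtosis: m4 / m2^2 - 3."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in sample) / n  # fourth central moment
    return m4 / (m2 * m2) - 3.0


def logistic_variate(rng):
    """One standard logistic variate via the inverse CDF."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return math.log(u / (1.0 - u))


rng = random.Random(0)
n = 100_000
normal_data = [rng.gauss(0.0, 1.0) for _ in range(n)]
logistic_data = [logistic_variate(rng) for _ in range(n)]

print("normal   excess kurtosis:", round(excess_kurtosis(normal_data), 3))
print("logistic excess kurtosis:", round(excess_kurtosis(logistic_data), 3))
```

A value for the real data that sits clearly above 0 would support the leptokurtic reading. Whether that makes the data "not comparable" to a normal distribution depends on how much departure the later analysis can tolerate, which a hypothesis test on millions of points does not answer by itself.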
 
