Kolmogorov-Smirnov goodness-of-fit test

  • Thread starter big man
  • Tags
    Fit Test
In summary, the conversation is about testing whether a dataset follows a normal distribution. The poster used the Kolmogorov-Smirnov goodness-of-fit test and found that the null hypothesis was rejected with 13 million data points but not rejected with only 1000. They also note that the distribution appears leptokurtic, ask whether it can still be considered comparable to a normal distribution, and ask for recommendations. The data are Hipparcos photometry, used to estimate the number of occultations observed.
  • #1
big man

Homework Statement


I need to determine whether or not my data follow a normal distribution. By eye it kind of looks like they do, but I need to perform the Kolmogorov-Smirnov goodness-of-fit test to verify this. Below is a picture of my dataset with a normal curve fitted to it (the red line is the normal curve).

http://img79.imageshack.us/img79/6730/statsis9.jpg [Broken]


The Attempt at a Solution


So, to test this, I used the "kstest" function in Matlab. I have about 13 million data points, and when I test them I get H=1, which means that the null hypothesis (that the data DO follow a normal distribution) is rejected. However, when I use only 1000 data points it returns H=0, which means that the null hypothesis is not rejected.

I was just wondering if anyone knew why this would be so and if you maybe had any recommendations on what I should do?

Appreciate any advice.

Thanks.
 
  • #3
Thanks for that, Astronuc. That was quite interesting. I'm using the Hipparcos photometry data to estimate the possible number of occultations that were observed throughout the mission. From the photon statistics theory that I've read so far, I should have a normal distribution (apparently the Poisson distribution is approximately normal in this case), or the data should at least be comparable to a normal distribution.

I will have to do a test to verify this, but from the looks of it my distribution has positive excess kurtosis, which makes it look similar to the logistic distribution. So this data couldn't really be classed as comparable to a normal distribution, could it?
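One way to see how H=0 at 1000 points and H=1 at 13 million points can both be right is to compare a unit-variance logistic distribution (leptokurtic, like the data described above) with the standard normal. The sketch below is only an illustration in Python, not the Matlab analysis from this thread: the choice of a logistic distribution, the seed, the sample sizes, and all helper names are assumptions. It uses the large-sample approximation to the 5% critical value, roughly 1.358/sqrt(n).

```python
import math
import random


def normal_cdf(x):
    """CDF of the standard normal, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


# Scale that gives a logistic distribution with mean 0 and variance 1.
LOGISTIC_SCALE = math.sqrt(3.0) / math.pi


def logistic_cdf(x):
    """CDF of the unit-variance logistic distribution (assumed stand-in)."""
    return 1.0 / (1.0 + math.exp(-x / LOGISTIC_SCALE))


def ks_statistic(sample, cdf):
    """D = largest gap between the empirical CDF and a continuous CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The ECDF jumps from (i-1)/n to i/n at x, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def critical_value(n, alpha=0.05):
    """Large-sample approximation to the critical value of D."""
    return math.sqrt(-0.5 * math.log(alpha / 2.0)) / math.sqrt(n)


def population_gap(step=0.001, limit=6.0):
    """Largest |logistic CDF - normal CDF| on a grid over [-limit, limit]."""
    k = int(limit / step)
    return max(
        abs(logistic_cdf(i * step) - normal_cdf(i * step))
        for i in range(-k, k + 1)
    )


def draw_logistic(n, rng):
    """Draw n unit-variance logistic values by inverting the CDF."""
    out = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:  # avoid log(0)
            u = rng.random()
        out.append(LOGISTIC_SCALE * math.log(u / (1.0 - u)))
    return out


if __name__ == "__main__":
    gap = population_gap()
    print("true CDF gap:", round(gap, 4))
    rng = random.Random(0)  # arbitrary seed
    for n in (1000, 100000):
        d = ks_statistic(draw_logistic(n, rng), normal_cdf)
        crit = critical_value(n)
        print(n, "D =", round(d, 4), "critical =", round(crit, 4),
              "reject" if d > crit else "do not reject")
```

The largest gap between the two CDFs is only about 0.02. With 1000 points the critical value (about 0.043) is larger than that gap, so the test will often fail to reject; with millions of points the critical value is far smaller than the gap and rejection is essentially guaranteed. Whether a gap of that size matters for the occultation estimate is a separate question from whether the test can detect it.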
 

1. What is the Kolmogorov-Smirnov goodness of fit test?

The Kolmogorov-Smirnov goodness of fit test is a statistical test used to determine whether a sample of data follows a specific probability distribution. It compares the empirical cumulative distribution function (CDF) of the sample to the theoretical CDF of the distribution being tested.

2. When should the Kolmogorov-Smirnov test be used?

The Kolmogorov-Smirnov test is used to check whether a sample is consistent with a fully specified theoretical distribution (the one-sample test). It can also be used to compare two samples to see whether they are drawn from the same distribution (the two-sample test).
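For the two-sample case, the statistic is the largest gap between the two empirical CDFs. The following is a minimal Python sketch under that definition; the function names and the toy samples are made up for illustration, and the critical value is the usual large-sample approximation rather than an exact one.

```python
import math
from bisect import bisect_right


def ks_two_sample_statistic(a, b):
    """D = largest gap between the empirical CDFs of samples a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    n, m = len(a_sorted), len(b_sorted)
    d = 0.0
    # The gap can only change at observed values, so check each one.
    for x in a_sorted + b_sorted:
        fa = bisect_right(a_sorted, x) / n  # fraction of a that is <= x
        fb = bisect_right(b_sorted, x) / m  # fraction of b that is <= x
        d = max(d, abs(fa - fb))
    return d


def two_sample_critical_value(n, m, alpha=0.05):
    """Large-sample approximation to the critical value of D."""
    c = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c * math.sqrt((n + m) / (n * m))


if __name__ == "__main__":
    a = [1, 2, 3, 4]  # toy data
    b = [3, 4, 5, 6]  # toy data
    print("D =", ks_two_sample_statistic(a, b))
    print("critical =", round(two_sample_critical_value(len(a), len(b)), 3))
```

With samples this small the approximation is rough; the example is only meant to show how D is formed.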

3. How does the Kolmogorov-Smirnov test work?

The test calculates the maximum absolute difference (D) between the empirical CDF and the theoretical CDF. The larger the D value, the stronger the evidence that the sample did not come from the theoretical distribution. The test then compares D to a critical value that depends on the sample size and significance level. The critical value shrinks roughly as 1/sqrt(n), so with very large samples even small departures from the theoretical distribution lead to rejection.
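As a small worked example of the calculation, here is a Python sketch that computes D for three made-up points against a Uniform(0, 1) reference. The function names and the data are illustrative assumptions; the critical value uses the large-sample approximation sqrt(-ln(alpha/2)/2)/sqrt(n), which is not accurate for a sample this small and is included only to show the comparison step.

```python
import math


def uniform_cdf(x):
    """CDF of the Uniform(0, 1) distribution."""
    return min(max(x, 0.0), 1.0)


def ks_statistic(sample, cdf):
    """D = largest gap between the empirical CDF and a continuous CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The ECDF jumps from (i-1)/n to i/n at x, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def critical_value(n, alpha=0.05):
    """Large-sample approximation to the critical value of D."""
    return math.sqrt(-0.5 * math.log(alpha / 2.0)) / math.sqrt(n)


if __name__ == "__main__":
    sample = [0.1, 0.4, 0.7]  # toy data
    d = ks_statistic(sample, uniform_cdf)
    # ECDF values are 1/3, 2/3, 1; the largest gap is 1 - 0.7 = 0.3.
    print("D =", round(d, 3))
    print("reject at 5%:", d > critical_value(len(sample)))
```

The gap is checked on both sides of each data point because the empirical CDF is a step function: the largest difference can occur just before a jump as well as just after it.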

4. What are the assumptions of the Kolmogorov-Smirnov test?

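The main assumptions of the Kolmogorov-Smirnov test are that the observations are independent, that the data come from a continuous distribution, and that the reference distribution is fully specified in advance rather than having its parameters estimated from the same sample. Large samples are not required, although the critical value depends on the sample size. In the two-sample version, each sample is assumed to be drawn independently from a single population, and the null hypothesis is that both populations have the same distribution.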
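One condition that is easy to overlook is that the reference distribution's parameters should be fixed in advance. If the mean and standard deviation are estimated from the same sample, the fitted curve hugs the data and the standard critical values become too lenient. The Monte Carlo sketch below is a rough Python illustration of that effect on truly normal data; the sample size, repetition count, seed, and function names are all arbitrary assumptions.

```python
import math
import random
import statistics


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(sample, cdf):
    """D = largest gap between the empirical CDF and a continuous CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def critical_value(n, alpha=0.05):
    """Large-sample approximation to the critical value of D."""
    return math.sqrt(-0.5 * math.log(alpha / 2.0)) / math.sqrt(n)


def rejection_rate(n, reps, estimate_params, rng, alpha=0.05):
    """Fraction of normal samples rejected using the standard KS cutoff."""
    crit = critical_value(n, alpha)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if estimate_params:
            # Parameters fitted to the same data being tested.
            mu = statistics.mean(sample)
            sigma = statistics.stdev(sample)
        else:
            # Parameters known in advance (the case the test assumes).
            mu, sigma = 0.0, 1.0
        d = ks_statistic(sample, lambda x: normal_cdf(x, mu, sigma))
        if d > crit:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    rng = random.Random(1)  # arbitrary seed
    print("known parameters:    ", rejection_rate(50, 400, False, rng))
    print("estimated parameters:", rejection_rate(50, 400, True, rng))
```

With known parameters the rejection rate should land near the nominal 5%; with estimated parameters it should fall well below that, which means real departures from normality are also harder to detect. In Matlab, kstest compares the data with a standard normal unless another distribution is supplied, so raw data usually have to be standardized first; if the mean and standard deviation used for that come from the same data, this caveat applies, and Matlab's lillietest is designed for that case.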

5. What are the advantages of using the Kolmogorov-Smirnov test?

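The Kolmogorov-Smirnov test is a non-parametric test: under the null hypothesis, the distribution of D does not depend on which continuous distribution is being tested. It is relatively easy to calculate and is sensitive to differences in location, scale, and shape. It is designed for continuous data; with discrete data the standard critical values make the test conservative. It also tends to be more sensitive near the center of the distribution than in the tails, so for normality testing specifically, tests such as Anderson-Darling or Shapiro-Wilk are often more powerful.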
