Kolmogorov-Smirnov and P-value

  • Thread starter: Nyasha
  • Tags: P-value
In summary, the thread discusses how to choose a p-value threshold for the Kolmogorov-Smirnov test when checking whether two data sets could have come from the same distribution, and what a p-value does and does not tell you.
  • #1
Nyasha
So I am testing a sensor I recently developed. I am changing parts on the sensor and testing to see whether my data is reproducible with different parts. I compare the data sets using the KS test, and they always have the same distribution, or trend line. I want to test whether the two data sets are the same, i.e. whether the variations are just due to random error. What should my p-value threshold be? I thought 0.05 was too small since the waveforms are the same, so would using a threshold of, say, 0.5 help me determine whether a difference between the data sets is due to something being wrong?
 
  • #2
Nyasha said:
so would using a threshold of, say, 0.5 help me determine whether a difference between the data sets is due to something being wrong?

Statistical testing with p-values is a subjective procedure, not an objective one. The p-value doesn't quantify the probability that the hypothesis being tested is true, and it doesn't quantify the probability that it is false. If you want to know a good p-value threshold for a given situation, you must either have empirical experience applying the test to similar situations, or you must have some model for how the data are generated when the null hypothesis is false and use that model to evaluate (analytically or by Monte Carlo simulation) how well the threshold detects that situation.

If you use a p-value threshold of 0.5 and the distributions are identical, you will increase the probability that the test concludes the distributions are different. Will that be helpful to you? (Statistical tests using p-values are not proofs. In some academic circles, certain tests and p-value thresholds are accepted as a standard of evidence for publishing a result. This is a cultural matter, not a consequence of mathematical deduction.)
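To make the Monte Carlo suggestion concrete, here is a minimal sketch in Python (assuming `numpy` and `scipy` are available; the sample size and the shifted alternative are illustrative assumptions, not details from this thread) of how one might estimate both the false-alarm rate and the detection rate of a two-sample KS test at different thresholds:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
n_trials = 2000
n = 100  # measurements per data set (illustrative assumption)

def rejection_rate(alpha, shift):
    """Fraction of simulated trials in which the KS test rejects at level alpha.

    shift=0 simulates the null hypothesis (identical distributions);
    shift>0 simulates a hypothetical alternative (a mean offset).
    """
    rejections = 0
    for _ in range(n_trials):
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(shift, 1.0, n)
        if ks_2samp(a, b).pvalue < alpha:
            rejections += 1
    return rejections / n_trials

for alpha in (0.05, 0.5):
    print(f"alpha={alpha}: false-alarm rate ~ {rejection_rate(alpha, 0.0):.3f}, "
          f"detection rate vs. shift=0.3 ~ {rejection_rate(alpha, 0.3):.3f}")
```

Raising the threshold from 0.05 to 0.5 raises the detection rate, but it also drives the false-alarm rate up to roughly 50%, which is the trade-off described above.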
 
  • #3
Stephen Tashi said:
Statistical testing with p-values is a subjective procedure, not an objective one. The p-value doesn't quantify the probability that the hypothesis being tested is true, and it doesn't quantify the probability that it is false. If you want to know a good p-value threshold for a given situation, you must either have empirical experience applying the test to similar situations, or you must have some model for how the data are generated when the null hypothesis is false and use that model to evaluate (analytically or by Monte Carlo simulation) how well the threshold detects that situation.

If you use a p-value threshold of 0.5 and the distributions are identical, you will increase the probability that the test concludes the distributions are different. Will that be helpful to you? (Statistical tests using p-values are not proofs. In some academic circles, certain tests and p-value thresholds are accepted as a standard of evidence for publishing a result. This is a cultural matter, not a consequence of mathematical deduction.)

I just want to know whether the data sets are equal. I know they have the same distribution, but I want to know whether the difference between the two data sets is statistically significant, i.e. a cause for concern. Thanks for your insight.
 
  • #4
Nyasha said:
I just want to know whether the data sets are equal. I know they have the same distribution, but I want to know whether the difference between the two data sets is statistically significant, i.e. a cause for concern. Thanks for your insight.

"Statistically significant" means "the p-value has passed an arbitrary threshold", so decide on a threshold that seems reasonable to you. Mind you, the p-value doesn't actually give you the probability that the two data sets are equal, it assumes that the two data sets are equal and gives you the probability that such a difference would occur by random chance, in the completely hypothetical scenario that the two data sets did actually happen to be equal. One of the unfortunate drawbacks of dealing with null-hypothesis testing is that the p-value doesn't actually tell you what you want to know, and rarely does it provide useful information on its own.
 
  • #5
Number Nine said:
"Statistically significant" means "the p-value has passed an arbitrary threshold", so decide on a threshold that seems reasonable to you. Mind you, the p-value doesn't actually give you the probability that the two data sets are equal, it assumes that the two data sets are equal and gives you the probability that such a difference would occur by random chance, in the completely hypothetical scenario that the two data sets did actually happen to be equal. One of the unfortunate drawbacks of dealing with null-hypothesis testing is that the p-value doesn't actually tell you what you want to know, and rarely does it provide useful information on its own.

So what is the best way to answer my question: are the two data sets equal?
 
  • #6
Nyasha said:
So what is the best way to answer my question: are the two data sets equal?

That question has no answer. What are you measuring? What is the prior probability of the data sets being equal? Statistics is hard; there are no tests that you can run that will give you your answer.
 
  • #7
Number Nine said:
That question has no answer. What are you measuring? What is the prior probability of the data sets being equal? Statistics is hard; there are no tests that you can run that will give you your answer.


I am measuring the number of counts in a sensor that is being hit by a light source; the variation in the counts follows a Poisson distribution.
 
  • #8
Nyasha said:
I am measuring the number of counts in a sensor that is being hit by a light source; the variation in the counts follows a Poisson distribution.

You aren't using language precisely. If you know that two sets of data both come from Poisson distributions, you should not say that they are from "the same distribution" unless you mean that they are both from the same Poisson distribution, i.e. the parameter of the distribution is the same for the processes that generated the two sets of data. Perhaps what you are trying to ask is how to test whether two Poisson distributions have the same parameter.

Likewise, "I want to know" is an imprecise statement when you are dealing with situations involving probability. State a realistic goal. If you think that statistical tests allow you to know things with certainty, you are mistaken.
 
  • #9
Stephen Tashi said:
You aren't using language precisely. If you know that two sets of data both come from Poisson distributions, you should not say that they are from "the same distribution" unless you mean that they are both from the same Poisson distribution, i.e. the parameter of the distribution is the same for the processes that generated the two sets of data. Perhaps what you are trying to ask is how to test whether two Poisson distributions have the same parameter.

Likewise, "I want to know" is an imprecise statement when you are dealing with situations involving probability. State a realistic goal. If you think that statistical tests allow you to know things with certainty, you are mistaken.

Yes, it is the same. The data is generated from a radioactive source which is constant. The parameter of the distribution is one.
 
  • #10
Nyasha said:
Yes, it is the same. The data is generated from a radioactive source which is constant. The parameter of the distribution is one.

Then what do you mean by saying that you want to "know" if "the two data sets are equal"? Two sets are equal if and only if they contain the same elements. In this case the elements are numbers.
 
  • #11
So do you want to do a hypothesis test of whether the two parameters are the same (i.e., fail to reject, at some significance level, the null hypothesis that they are the same), given the assumption that the data come from the same distribution family (I'm guessing Poisson or exponential from the context of the current discussion)?
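For the Poisson case specifically, one standard approach is the conditional (binomial) test: with equal exposure times, under the null hypothesis of equal rates, the first count given the total count is Binomial(total, 1/2). Here is a minimal sketch in Python assuming `scipy` is available; the count values are hypothetical placeholders, not data from this thread:

```python
from scipy.stats import binomtest

# Hypothetical total counts from the two sensor configurations,
# recorded over equal exposure times (placeholder numbers).
counts_before = 10234
counts_after = 10410

# Under H0 (equal Poisson rates, equal exposure), counts_before given
# the total is Binomial(total, 0.5); test that proportion against 0.5.
total = counts_before + counts_after
result = binomtest(counts_before, n=total, p=0.5)
print(f"p-value for equal rates: {result.pvalue:.3f}")
```

A small p-value would be evidence that the two rates differ; this test uses the Poisson assumption directly instead of the distribution-free KS machinery.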
 
  • #12
Stephen Tashi said:
Then what do you mean by saying that you want to "know" if "the two data sets are equal"? Two sets are equal if and only if they contain the same elements. In this case the elements are numbers.


I am changing parts on the sensor, and I want to know whether the performance stays the same as I change the parts. In other words, I take measurements on the sensor, then change, say, a transistor, and then take measurements again. I want to compare the data from before I removed the transistor to the data from after I removed it. Using the K-S test, I want to be able to tell whether there is any big difference between the two data sets.
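In code, the before/after comparison described here might look like the following sketch (Python with `scipy`; the arrays are placeholders, not real sensor data). One caveat: the classical KS test assumes continuous distributions, so with discrete count data, which will contain ties, its p-values tend to be conservative.

```python
import numpy as np
from scipy.stats import ks_2samp

# Placeholder arrays: per-interval counts recorded before and after
# swapping the part (real measurements would go here).
before = np.array([98, 103, 110, 95, 101, 107, 99, 104])
after = np.array([101, 97, 108, 112, 96, 105, 100, 102])

res = ks_2samp(before, after)
print(f"KS statistic = {res.statistic:.3f}, p-value = {res.pvalue:.3f}")
# A small p-value suggests the two runs differ by more than chance;
# a large one means the test found no evidence of a difference.
```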
 

1. What is the Kolmogorov-Smirnov test and how does it work?

The Kolmogorov-Smirnov test is a non-parametric statistical test used to determine whether two datasets have significantly different distributions. It works by computing the maximum absolute difference between the empirical cumulative distribution functions of the two datasets.
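As a sketch of those mechanics (Python with `numpy`; a hand-rolled illustration rather than production code), the two-sample KS statistic is just the largest gap between the two empirical CDFs:

```python
import numpy as np

def ks_statistic(x, y):
    """Maximum absolute difference between the empirical CDFs of x and y."""
    x, y = np.sort(x), np.sort(y)
    # The maximum gap occurs at an observed value, so it suffices to
    # evaluate both empirical CDFs at every point from either sample.
    grid = np.concatenate([x, y])
    cdf_x = np.searchsorted(x, grid, side="right") / len(x)
    cdf_y = np.searchsorted(y, grid, side="right") / len(y)
    return np.max(np.abs(cdf_x - cdf_y))
```

`scipy.stats.ks_2samp` computes the same statistic and also supplies a p-value.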

2. What is the significance of the P-value in the Kolmogorov-Smirnov test?

The P-value in the Kolmogorov-Smirnov test represents the probability of obtaining a test statistic at least as extreme as the observed one, assuming that the null hypothesis is true (i.e. that the two datasets were drawn from the same distribution). A lower P-value indicates stronger evidence against the null hypothesis.
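A quick simulation makes this definition tangible (a sketch assuming `numpy` and `scipy`): when the null hypothesis really is true, the p-value should fall below 0.05 only about 5% of the time.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
# Draw both samples from the same distribution, so the null is true.
pvals = [ks_2samp(rng.normal(size=50), rng.normal(size=50)).pvalue
         for _ in range(5000)]
print(f"fraction of p-values below 0.05: {np.mean(np.array(pvals) < 0.05):.3f}")
```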

3. How is the Kolmogorov-Smirnov test different from other statistical tests?

The Kolmogorov-Smirnov test is a non-parametric test, meaning it does not make any assumptions about the underlying distribution of the data. This makes it more versatile than parametric tests, which are limited to specific types of distributions. Additionally, the Kolmogorov-Smirnov test can be used to compare datasets of any size, whereas other tests may have restrictions on sample size.

4. Can the Kolmogorov-Smirnov test be used for small sample sizes?

Yes, the Kolmogorov-Smirnov test can be used for small sample sizes. However, for small samples it tends to have less power than parametric tests such as the t-test, which make stronger distributional assumptions. It is important to consider the sample size and the assumptions you can justify when choosing a statistical test for the data.

5. How do you interpret the results of a Kolmogorov-Smirnov test?

If the P-value is above the chosen significance level (usually 0.05), there is not enough evidence to reject the null hypothesis; note that this does not prove the two datasets have the same distribution, only that the test detected no significant difference. If the P-value is below the significance level, there is significant evidence to reject the null hypothesis, and it can be concluded that the two datasets have different distributions.
