Kolmogorov-Smirnov and P-value

  • Thread starter: Nyasha
  • Tags: P-value
In summary, the thread discusses how to choose a p-value threshold for the Kolmogorov-Smirnov test when checking whether two data sets could have come from the same distribution, and what a p-value does and does not tell you.
  • #1
Nyasha
So I am testing a sensor I recently developed. I am changing parts on the sensor and testing to see whether my data is reproducible with different parts. I compare the data sets using the KS test, and they always have the same distribution, or trend line. I want to test whether the two data sets are the same, i.e. whether the variations are just due to random error. What should my p-value threshold be? I thought 0.05 was too small since the waveforms are the same, so would using a threshold of, say, 0.5 help me determine whether a difference between the data sets is due to something being wrong?
 
  • #2
Nyasha said:
so would using a threshold of, say, 0.5 help me determine whether a difference between the data sets is due to something being wrong?

Statistical testing with p-values is a subjective procedure, not an objective one. The p-value doesn't quantify the probability that the hypothesis being tested is true, and it doesn't quantify the probability that it is false. If you want to know a good p-value threshold for a given situation, you must either have empirical experience applying the test to similar situations, or you must have some model for how the data are generated when the null hypothesis is false and use that model to evaluate (analytically or by Monte Carlo simulation) how well the threshold detects that situation.

If you use a p-value threshold of 0.5 and the distributions are identical, you will increase the probability that the test concludes the distributions are different. Will that be helpful to you? (Statistical tests using p-values are not proofs. In some academic circles, certain tests and p-value thresholds are accepted as a standard of evidence for publishing a result. This is a cultural matter, not a consequence of mathematical deduction.)
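To make the Monte Carlo suggestion concrete, here is a minimal sketch in Python (assuming `numpy` and `scipy` are available; the sample size and the shifted alternative are illustrative assumptions, not details from this thread) of how one might estimate both the false-alarm rate and the detection rate of a two-sample KS test at different thresholds:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
n_trials = 2000
n = 100  # measurements per data set (illustrative assumption)

def rejection_rate(alpha, shift):
    """Fraction of simulated trials in which the KS test rejects at level alpha.

    shift=0 simulates the null hypothesis (identical distributions);
    shift>0 simulates a hypothetical alternative (a mean offset).
    """
    rejections = 0
    for _ in range(n_trials):
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(shift, 1.0, n)
        if ks_2samp(a, b).pvalue < alpha:
            rejections += 1
    return rejections / n_trials

for alpha in (0.05, 0.5):
    print(f"alpha={alpha}: false-alarm rate ~ {rejection_rate(alpha, 0.0):.3f}, "
          f"detection rate vs. shift=0.3 ~ {rejection_rate(alpha, 0.3):.3f}")
```

Raising the threshold from 0.05 to 0.5 raises the detection rate, but it also drives the false-alarm rate up to roughly 50%, which is the trade-off described above.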
 
  • #3
Stephen Tashi said:
Statistical testing with p-values is a subjective procedure, not an objective one. The p-value doesn't quantify the probability that the hypothesis being tested is true, and it doesn't quantify the probability that it is false. If you want to know a good p-value threshold for a given situation, you must either have empirical experience applying the test to similar situations, or you must have some model for how the data are generated when the null hypothesis is false and use that model to evaluate (analytically or by Monte Carlo simulation) how well the threshold detects that situation.

If you use a p-value threshold of 0.5 and the distributions are identical, you will increase the probability that the test concludes the distributions are different. Will that be helpful to you? (Statistical tests using p-values are not proofs. In some academic circles, certain tests and p-value thresholds are accepted as a standard of evidence for publishing a result. This is a cultural matter, not a consequence of mathematical deduction.)

I just want to know whether the data sets are equal. I know they have the same distribution, but I want to know whether the difference between the two data sets is statistically significant, i.e. a cause for concern. Thanks for your insight.
 
  • #4
Nyasha said:
I just want to know whether the data sets are equal. I know they have the same distribution, but I want to know whether the difference between the two data sets is statistically significant, i.e. a cause for concern. Thanks for your insight.

"Statistically significant" means "the p-value has passed an arbitrary threshold", so decide on a threshold that seems reasonable to you. Mind you, the p-value doesn't actually give you the probability that the two data sets are equal, it assumes that the two data sets are equal and gives you the probability that such a difference would occur by random chance, in the completely hypothetical scenario that the two data sets did actually happen to be equal. One of the unfortunate drawbacks of dealing with null-hypothesis testing is that the p-value doesn't actually tell you what you want to know, and rarely does it provide useful information on its own.
 
  • #5
Number Nine said:
"Statistically significant" means "the p-value has passed an arbitrary threshold", so decide on a threshold that seems reasonable to you. Mind you, the p-value doesn't actually give you the probability that the two data sets are equal, it assumes that the two data sets are equal and gives you the probability that such a difference would occur by random chance, in the completely hypothetical scenario that the two data sets did actually happen to be equal. One of the unfortunate drawbacks of dealing with null-hypothesis testing is that the p-value doesn't actually tell you what you want to know, and rarely does it provide useful information on its own.

So what is the best way to answer my question: are the two data sets equal?
 
  • #6
Nyasha said:
So what is the best way to answer my question: are the two data sets equal?

That question has no answer. What are you measuring? What is the prior probability of the data sets being equal? Statistics is hard; there are no tests that you can run that will give you your answer.
 
  • #7
Number Nine said:
That question has no answer. What are you measuring? What is the prior probability of the data sets being equal? Statistics is hard; there are no tests that you can run that will give you your answer.


I am measuring the number of counts in a sensor that is being hit by a light source; the variation in the counts follows a Poisson distribution.
 
  • #8
Nyasha said:
I am measuring the number of counts in a sensor that is being hit by a light source; the variation in the counts follows a Poisson distribution.

You aren't using language precisely. If you know that two sets of data both come from Poisson distributions, you should not say that they are from "the same distribution" unless you mean that they are both from the same Poisson distribution, i.e. the parameter of the distribution is the same for the processes that generated the two sets of data. Perhaps what you are trying to ask is how to test whether two Poisson distributions have the same parameter.

Likewise, "I want to know" is an imprecise statement when you are dealing with situations involving probability. State a realistic goal. If you think that statistical tests allow you to know things with certainty, you are mistaken.
 
  • #9
Stephen Tashi said:
You aren't using language precisely. If you know that two sets of data both come from Poisson distributions, you should not say that they are from "the same distribution" unless you mean that they are both from the same Poisson distribution, i.e. the parameter of the distribution is the same for the processes that generated the two sets of data. Perhaps what you are trying to ask is how to test whether two Poisson distributions have the same parameter.

Likewise, "I want to know" is an imprecise statement when you are dealing with situations involving probability. State a realistic goal. If you think that statistical tests allow you to know things with certainty, you are mistaken.

Yes, it is the same. The data is generated from a radioactive source which is constant. The parameter of the distribution is one.
 
  • #10
Nyasha said:
Yes, it is the same. The data is generated from a radioactive source which is constant. The parameter of the distribution is one.

Then what do you mean by saying that you want to "know" if "the two data sets are equal"? Two sets are equal if and only if they contain the same elements. In this case the elements are numbers.
 
  • #11
So do you want to do a hypothesis test of whether the two parameters are the same (i.e., fail to reject, at some significance level, the null hypothesis that they are the same), given the assumption that the data come from the same distribution family (I'm guessing Poisson or exponential from the context of the current discussion)?
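For the Poisson case specifically, one standard approach is the conditional (binomial) test: with equal exposure times, under the null hypothesis of equal rates, the first count given the total count is Binomial(total, 1/2). Here is a minimal sketch in Python assuming `scipy` is available; the count values are hypothetical placeholders, not data from this thread:

```python
from scipy.stats import binomtest

# Hypothetical total counts from the two sensor configurations,
# recorded over equal exposure times (placeholder numbers).
counts_before = 10234
counts_after = 10410

# Under H0 (equal Poisson rates, equal exposure), counts_before given
# the total is Binomial(total, 0.5); test that proportion against 0.5.
total = counts_before + counts_after
result = binomtest(counts_before, n=total, p=0.5)
print(f"p-value for equal rates: {result.pvalue:.3f}")
```

A small p-value would be evidence that the two rates differ; this test uses the Poisson assumption directly instead of the distribution-free KS machinery.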
 
  • #12
Stephen Tashi said:
Then what do you mean by saying that you want to "know" if "the two data sets are equal"? Two sets are equal if and only if they contain the same elements. In this case the elements are numbers.


I am changing parts on the sensor, and I want to know whether the performance stays the same as I change the parts. In other words, I take measurements on the sensor, then change, say, a transistor, and then take measurements again. I want to compare the data from before I removed the transistor to the data from after I removed it. Using the K-S test, I want to be able to tell whether there is any big difference between the two data sets.
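In code, the before/after comparison described here might look like the following sketch (Python with `scipy`; the arrays are placeholders, not real sensor data). One caveat: the classical KS test assumes continuous distributions, so with discrete count data, which will contain ties, its p-values tend to be conservative.

```python
import numpy as np
from scipy.stats import ks_2samp

# Placeholder arrays: per-interval counts recorded before and after
# swapping the part (real measurements would go here).
before = np.array([98, 103, 110, 95, 101, 107, 99, 104])
after = np.array([101, 97, 108, 112, 96, 105, 100, 102])

res = ks_2samp(before, after)
print(f"KS statistic = {res.statistic:.3f}, p-value = {res.pvalue:.3f}")
# A small p-value suggests the two runs differ by more than chance;
# a large one means the test found no evidence of a difference.
```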
 

1. What is the Kolmogorov-Smirnov test and how does it work?

The Kolmogorov-Smirnov test is a non-parametric statistical test used to determine whether two datasets have significantly different distributions. It works by computing the maximum absolute difference between the empirical cumulative distribution functions of the two datasets.
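As a sketch of those mechanics (Python with `numpy`; a hand-rolled illustration rather than production code), the two-sample KS statistic is just the largest gap between the two empirical CDFs:

```python
import numpy as np

def ks_statistic(x, y):
    """Maximum absolute difference between the empirical CDFs of x and y."""
    x, y = np.sort(x), np.sort(y)
    # The maximum gap occurs at an observed value, so it suffices to
    # evaluate both empirical CDFs at every point from either sample.
    grid = np.concatenate([x, y])
    cdf_x = np.searchsorted(x, grid, side="right") / len(x)
    cdf_y = np.searchsorted(y, grid, side="right") / len(y)
    return np.max(np.abs(cdf_x - cdf_y))
```

`scipy.stats.ks_2samp` computes the same statistic and also supplies a p-value.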

2. What is the significance of the P-value in the Kolmogorov-Smirnov test?

The P-value in the Kolmogorov-Smirnov test represents the probability of obtaining a test statistic at least as extreme as the observed one, assuming that the null hypothesis is true (i.e. that the two datasets were drawn from the same distribution). A lower P-value indicates stronger evidence against the null hypothesis.
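A quick simulation makes this definition tangible (a sketch assuming `numpy` and `scipy`): when the null hypothesis really is true, the p-value should fall below 0.05 only about 5% of the time.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
# Draw both samples from the same distribution, so the null is true.
pvals = [ks_2samp(rng.normal(size=50), rng.normal(size=50)).pvalue
         for _ in range(5000)]
print(f"fraction of p-values below 0.05: {np.mean(np.array(pvals) < 0.05):.3f}")
```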

3. How is the Kolmogorov-Smirnov test different from other statistical tests?

The Kolmogorov-Smirnov test is a non-parametric test, meaning it does not make any assumptions about the underlying distribution of the data. This makes it more versatile than parametric tests, which are limited to specific types of distributions. Additionally, the Kolmogorov-Smirnov test can be used to compare datasets of any size, whereas other tests may have restrictions on sample size.

4. Can the Kolmogorov-Smirnov test be used for small sample sizes?

Yes, the Kolmogorov-Smirnov test can be used for small sample sizes. However, for small samples it tends to have less power than parametric tests such as the t-test, which make stronger distributional assumptions. It is important to consider the sample size and the assumptions you can justify when choosing a statistical test for the data.

5. How do you interpret the results of a Kolmogorov-Smirnov test?

If the P-value is above the chosen significance level (usually 0.05), there is not enough evidence to reject the null hypothesis; note that this does not prove the two datasets have the same distribution, only that the test detected no significant difference. If the P-value is below the significance level, there is significant evidence to reject the null hypothesis, and it can be concluded that the two datasets have different distributions.
