Kolmogorov-Smirnov in R: cannot compute correct p-values with ties

  • Thread starter joanne34567
In summary, the thread is about using the Kolmogorov-Smirnov test in R to compare two samples. The original poster gets the warning "cannot compute correct p-values with ties" because the data contain repeated values, and asks whether the resulting p-value is still valid. A reply suggests checking how sensitive the p-value is by adding small random perturbations to the data.
  • #1
joanne34567
Hi,
I'm trying to use the Kolmogorov-Smirnov test in R to compare one distribution with another. I'm getting the following error: cannot compute correct p-values with ties
I think this is because the first dataset I am using has around 2000 values and the second has around 50000. A number of these are inevitably going to be repeats of another value in the series (i.e. "ties"). I'm just wondering if this has implications for the p-value I get. Is the p-value valid?
Cheers all
 
  • #2
Did you read the R documentation for the ks.test function?

You could test the sensitivity of the p-value empirically. Add small random perturbations to your data to generate new data sets that have more digits, and see how much the p-value changes.
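A rough sketch of that sensitivity check is below. The data are made-up stand-ins (rounded normal draws with the same sample sizes as in the question), and the noise width of half the rounding step is an assumption; with real data the perturbation should be smaller than the measurement resolution.

```r
set.seed(1)

# Hypothetical stand-in data: rounding to one decimal creates many ties.
x <- round(rnorm(2000), 1)
y <- round(rnorm(50000, mean = 0.05), 1)

# Baseline p-value on the tied data (any warning about ties is suppressed).
p_raw <- suppressWarnings(ks.test(x, y)$p.value)

# Break the ties with uniform noise of at most half the rounding step (0.05),
# repeat several times, and collect the resulting p-values.
p_jit <- replicate(20, {
  xj <- x + runif(length(x), -0.05, 0.05)
  yj <- y + runif(length(y), -0.05, 0.05)
  ks.test(xj, yj)$p.value
})

print(p_raw)
print(range(p_jit))
```

If the jittered p-values stay close to the original one, the ties are probably not driving the conclusion. If they move across your significance threshold, the original p-value should be treated with caution.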
 

1. What is Kolmogorov-Smirnov in R?

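The Kolmogorov-Smirnov (KS) test is a statistical test of whether a sample follows a specified distribution (one-sample test) or whether two samples come from the same distribution (two-sample test, as in this thread). In R, both are performed with the function ks.test().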
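For illustration, a minimal call to ks.test() in each form might look like this. The simulated samples are placeholders, not data from the thread.

```r
set.seed(42)

# Placeholder samples drawn from continuous distributions (no ties).
x <- rnorm(200)
y <- rnorm(300, mean = 0.3)

# One-sample test: does x look like a standard normal sample?
one <- ks.test(x, "pnorm")

# Two-sample test: do x and y come from the same distribution?
two <- ks.test(x, y)

# Each result is an "htest" object holding the statistic D and the p-value.
print(one$statistic)
print(one$p.value)
print(two$statistic)
print(two$p.value)
```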

2. How does the Kolmogorov-Smirnov test work?

In the one-sample case, the Kolmogorov-Smirnov test compares the empirical distribution function (EDF) of the sample to the theoretical cumulative distribution function (CDF) of the specified distribution. In the two-sample case, the EDFs of the two samples are compared with each other. In both cases the test statistic is the maximum absolute difference between the two functions.
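To make the definition of the statistic concrete, here is a small sketch that recomputes it by hand and compares it with what ks.test() reports. The samples are simulated placeholders and the variable names are only for illustration.

```r
set.seed(42)
x <- rnorm(200)
y <- rnorm(300, mean = 0.3)

# One-sample statistic against the standard normal CDF. The EDF jumps at each
# sorted observation, so the gap is checked just before and at each jump.
n  <- length(x)
Fx <- pnorm(sort(x))
d_one <- max(seq_len(n) / n - Fx, Fx - (seq_len(n) - 1) / n)

# Two-sample statistic: evaluate both EDFs at every pooled observation and
# take the largest absolute gap.
pooled <- c(x, y)
d_two <- max(abs(ecdf(x)(pooled) - ecdf(y)(pooled)))

# Compare with the values reported by ks.test().
print(all.equal(unname(ks.test(x, "pnorm")$statistic), d_one))
print(all.equal(unname(ks.test(x, y)$statistic), d_two))
```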

3. What does it mean when the p-value cannot be computed correctly in Kolmogorov-Smirnov?

R still returns a p-value in this situation; the message is a warning that the value is not exact. The null distribution of the test statistic is derived assuming continuous data, where ties cannot occur, so when ties are present the reported p-value is only an approximation. The message says nothing about whether the difference between the distributions is significant.

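4. Why can Kolmogorov-Smirnov p-values be incorrect when there are ties?

The Kolmogorov-Smirnov test assumes the data come from a continuous distribution, under which two observations are equal with probability zero. The p-value is computed from the distribution of the test statistic under that assumption. Ties (two or more data points with the same value) indicate discrete or rounded data, for which that distribution no longer holds exactly. The resulting p-values are approximate and tend to be conservative, and the discrepancy grows with the proportion of ties.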
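One way to see the effect of ties is a small simulation under the null hypothesis. The sketch below is only illustrative: the sample size, the rounding to whole numbers, and the number of repetitions are arbitrary choices, and exact = FALSE is set so that the asymptotic p-value is used.

```r
set.seed(123)
n_sim <- 2000

# Both samples come from the same distribution, so the null hypothesis is true.
# Rounding to whole numbers makes the data heavily tied.
p_null <- replicate(n_sim, {
  a <- round(rnorm(50))
  b <- round(rnorm(50))
  suppressWarnings(ks.test(a, b, exact = FALSE)$p.value)
})

# Share of simulated data sets rejected at the 5% level.
reject_rate <- mean(p_null < 0.05)
print(reject_rate)
```

If the p-values were exact, the rejection rate would be close to 0.05. With heavily tied data one would expect it to come out lower, which is what conservative means here.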

5. How can the issue of ties be addressed in Kolmogorov-Smirnov tests in R?

To address the issue of ties, one option is a bootstrap version of the Kolmogorov-Smirnov test, such as the function ks.boot() from the Matching package (it is not part of base R). It estimates the p-value by resampling, so it does not rely on the continuity assumption and remains usable when the data contain ties.
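A minimal sketch of the bootstrap approach follows. It assumes the Matching package is installed (install.packages("Matching")) and skips the test otherwise. The data are simulated placeholders, and 1000 bootstrap replicates is an arbitrary choice.

```r
# Assumption: the Matching package is available; nothing runs if it is not.
if (requireNamespace("Matching", quietly = TRUE)) {
  set.seed(7)

  # Placeholder data, rounded so that they contain ties.
  x <- round(rnorm(200), 1)
  y <- round(rnorm(500, mean = 0.2), 1)

  # Bootstrap KS test with 1000 resamples.
  res <- suppressWarnings(Matching::ks.boot(x, y, nboots = 1000))

  print(res$ks.boot.pvalue)  # bootstrap p-value
  print(res$ks$p.value)      # p-value from the ordinary ks.test, for comparison
}
```

Comparing the two printed values gives a sense of how much the ties affect the ordinary p-value for a given data set.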
