Can a Chi2 test be used on uniform p test results?

Discussion Overview

The discussion revolves around the appropriateness of using a Chi-squared test versus a Kolmogorov-Smirnov (KS) test for analyzing a series of p-values that are expected to be uniformly distributed. Participants explore the implications of uniform distribution and the characteristics of the tests in question.

Discussion Character

  • Debate/contested
  • Technical explanation
  • Conceptual clarification

Main Points Raised

  • One participant questions the necessity of a KS test for uniformly distributed p-values, suggesting that a Chi-squared test could be more straightforward.
  • Another participant points out a contradiction in the assumption that uniformly distributed p-values should cluster around 0.5, clarifying that uniform distribution implies randomness across the entire range of 0 to 1.
  • A participant argues that the Chi-squared test requires binning of data, which may lead to loss of information, especially with a small sample size of 10 p-values, making the KS test a more powerful alternative.
  • One participant acknowledges confusion between discrete and continuous distributions, reflecting on the flawed assumption that uniform p-values would lead to expected values clustering around 0.5.
  • Another participant reiterates that while the expected value of a uniform distribution is 0.5, the actual values should be randomly distributed between 0 and 1, and clustering around 0.5 is not a necessary outcome.

Areas of Agreement / Disagreement

Participants express differing views on the suitability of Chi-squared versus KS tests for the analysis of p-values. There is no consensus on which test is definitively better, and the discussion remains unresolved regarding the best approach.

Contextual Notes

Participants highlight limitations related to assumptions about distribution and the implications of sample size on the choice of statistical tests. The discussion reflects an ongoing exploration of these concepts without definitive resolutions.

Paul Uszak
Not sure that I've phrased the question correctly. If you have a series of p values from a series of tests, and they're all meant to be uniformly distributed, why do you have to do a KS test on that, and not another Chi-squared test?

The following is an extract from a test program's output:-
Test no. 1 p-value .886973
Test no. 2 p-value .473563
Test no. 3 p-value .358962
Test no. 4 p-value .894858
Test no. 5 p-value .767457
Test no. 6 p-value .583446
Test no. 7 p-value .227626
Test no. 8 p-value .765091
Test no. 9 p-value .298747
Test no. 10 p-value .108371
Results of the OSUM test for pu256.bin
KSTEST on the above 10 p-values: .059581

The p-values are meant to be uniformly distributed across 0.0 to 1.0. This implies that they should all be 0.5ish. Why doesn't the program (Diehard randomness tester) perform a Chi-squared test on the ps? This happens several times in the complete report, so I take it to be deliberate. It's always a KS test on uniformly distributed ps. Isn't the Chi-squared test numerically simpler too?
 
Paul Uszak said:
The p-values are meant to be uniformly distributed across 0.0 to 1.0. This implies that they should all be 0.5ish.
This is a contradiction. If the values of p are uniformly distributed, they should not be clustered around 0.5.
Paul Uszak said:
Why doesn't the program (Diehard randomness tester) perform a Chi-squared test on the ps? This happens several times in the complete report, so I take it to be deliberate. It's always a KS test on uniformly distributed ps. Isn't the Chi-squared test numerically simpler too?
Here is my 2 cents. For Chi-squared, you have to divide the sample data into bins. That loses a lot of information and you do not have a lot of data (10 values of p). Since KS looks at the maximum difference between the cumulative distribution of the sample versus the theoretical cumulative distribution, it is more powerful for small data sets.
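To make the binning point concrete, here is a minimal sketch in plain Python that computes both statistics for the ten p-values quoted above. The function names and the bin counts (5 and 2) are my own choices for illustration, not anything Diehard uses, and the sketch does not try to reproduce the .059581 figure, since the thread does not show how Diehard's KSTEST arrives at it.

```python
# The ten p-values quoted in the first post.
p_values = [0.886973, 0.473563, 0.358962, 0.894858, 0.767457,
            0.583446, 0.227626, 0.765091, 0.298747, 0.108371]


def ks_statistic_uniform(sample):
    """Largest gap between the empirical CDF and the Uniform(0, 1) CDF F(x) = x."""
    xs = sorted(sample)
    n = len(xs)
    # Gap just after each step of the empirical CDF, and just before it.
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))
    d_minus = max(x - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)


def chi2_statistic_uniform(sample, n_bins):
    """Pearson chi-squared statistic against equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for x in sample:
        # min() keeps a value of exactly 1.0 inside the last bin.
        counts[min(int(x * n_bins), n_bins - 1)] += 1
    expected = len(sample) / n_bins
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return counts, stat


d = ks_statistic_uniform(p_values)
counts5, chi2_5 = chi2_statistic_uniform(p_values, 5)
counts2, chi2_2 = chi2_statistic_uniform(p_values, 2)

print("KS statistic D:", round(d, 6))
print("5 bins:", counts5, "chi-squared:", chi2_5)
print("2 bins:", counts2, "chi-squared:", chi2_2)
```

With 5 bins the expected count is only 2 per bin, well below the usual rule of thumb of about 5 for the chi-squared approximation. With 2 bins the expected count is 5, but the counts come out 5 and 5, so the statistic is exactly 0 and everything about where the values sit inside each half is thrown away. The KS statistic uses all ten values at their actual positions.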
 
Hmm, sounds like I'm confusing discrete and continuous distributions... Again. I'd made the assumption that if the p-values were uniformly distributed 0 - 1, then the expected p-values would be 0.5, hence a Chi-squared test could be attempted. As I write my assumption, I can see that it's logically flawed.
 
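The chi-squared test is named for its test statistic, which under the null hypothesis approximately follows a chi-squared distribution (the distribution of a sum of squares of independent standard normal variables). That approximation needs binned data with reasonably large expected counts per bin. The KS test compares the sample directly with the hypothesized continuous distribution and needs no binning, so for uniform variables it is a better choice.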
 
Paul Uszak said:
Hmm, sounds like I'm confusing discrete and continuous distributions... Again. I'd made the assumption that if the p-values were uniformly distributed 0 - 1, then the expected p-values would be 0.5, hence a Chi-squared test could be attempted. As I write my assumption, I can see that it's logically flawed.
You are right that the expected value of p, uniform on [0,1], is 0.5. But that does not mean that the values will cluster around 0.5. In fact they should be spread randomly between 0 and 1. As an extreme example, suppose you have a random variable X which is 0 half the time and 1 the other half. Then its expected value is 0.5, but it is always at 0 or 1.

PS: There can be more clustering in uniformly random data than you would expect, but there is no reason for the clustering to prefer 0.5.
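A quick check of this on the ten p-values from the first post, as a small sketch in plain Python. The ±0.1 window around 0.5 and the variable names are arbitrary choices of mine for illustration.

```python
from statistics import mean

# The ten p-values quoted in the first post.
p_values = [0.886973, 0.473563, 0.358962, 0.894858, 0.767457,
            0.583446, 0.227626, 0.765091, 0.298747, 0.108371]

# Values that actually lie close to 0.5 (within an arbitrary +/- 0.1 window).
near_half = [p for p in p_values if abs(p - 0.5) <= 0.1]

# The extreme example: X is 0 half the time and 1 the other half.
two_point = [0, 1] * 5

print("mean of p-values:", round(mean(p_values), 4))
print("p-values within 0.1 of 0.5:", len(near_half), "of", len(p_values))
print("mean of two-point sample:", mean(two_point))
print("two-point values equal to 0.5:", two_point.count(0.5))
```

The sample mean is close to 0.5, yet only 2 of the 10 values fall between 0.4 and 0.6, which matches the 20% a uniform distribution puts in that window. The two-point sample has mean exactly 0.5 and never takes the value 0.5 at all.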
 
