Can a Chi2 test be used on uniform p test results?

Paul Uszak
Not sure that I've phrased the question correctly. If you have a series of p values from a series of tests, and they're all meant to be uniformly distributed, why do you have to do a KS test on that, and not another Chi-squared test?

The following is an extract from a test program's output:-
Test no. 1 p-value .886973
Test no. 2 p-value .473563
Test no. 3 p-value .358962
Test no. 4 p-value .894858
Test no. 5 p-value .767457
Test no. 6 p-value .583446
Test no. 7 p-value .227626
Test no. 8 p-value .765091
Test no. 9 p-value .298747
Test no. 10 p-value .108371
Results of the OSUM test for pu256.bin
KSTEST on the above 10 p-values: .059581

The p-values are meant to be uniformly distributed across 0.0 to 1.0. This implies that they should all be 0.5ish. Why doesn't the program (Diehard randomness tester) perform a Chi-squared test on the ps? This happens several times in the complete report, so I take it to be deliberate. It's always a KS test on uniformly distributed ps. Isn't the Chi-squared test numerically simpler too?
 
Paul Uszak said:
The p-values are meant to be uniformly distributed across 0.0 to 1.0. This implies that they should all be 0.5ish.
This is a contradiction. If the values of p are uniformly distributed, they should not be clustered around 0.5.
Why doesn't the program (Diehard randomness tester) perform a Chi-squared test on the ps? This happens several times in the complete report, so I take it to be deliberate. It's always a KS test on uniformly distributed ps. Isn't the Chi-squared test numerically simpler too?
Here is my 2 cents. For Chi-squared, you have to divide the sample data into bins. That loses a lot of information, and you do not have much data to begin with (10 values of p). Since KS looks at the maximum difference between the cumulative distribution of the sample and the theoretical cumulative distribution, it uses every value directly and is more powerful for small data sets.
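To make the binning point concrete, here is a rough Python sketch that computes both statistics for the ten p-values quoted in the first post. The function names and the choice of 5 bins are my own assumptions for illustration. This is a plain textbook K-S statistic, not Diehard's own KSTEST routine, and it makes no attempt to reproduce the .059581 figure from the report.

```python
# Sketch: a hand-rolled K-S statistic versus a binned chi-squared statistic
# on the ten p-values quoted in the first post.
p_values = [0.886973, 0.473563, 0.358962, 0.894858, 0.767457,
            0.583446, 0.227626, 0.765091, 0.298747, 0.108371]


def ks_statistic(sample):
    """Largest gap between the empirical CDF and the uniform(0, 1) CDF F(x) = x."""
    xs = sorted(sample)
    n = len(xs)
    # Gap just after each jump of the empirical CDF.
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))
    # Gap just before each jump of the empirical CDF.
    d_minus = max(x - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)


def chi_squared_statistic(sample, bins):
    """Pearson statistic for equal-width bins on [0, 1]; returns (statistic, counts)."""
    counts = [0] * bins
    for x in sample:
        # min() keeps a value of exactly 1.0 in the last bin.
        counts[min(int(x * bins), bins - 1)] += 1
    expected = len(sample) / bins
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, counts


ks_d = ks_statistic(p_values)
chi2, counts = chi_squared_statistic(p_values, bins=5)  # 5 bins is an arbitrary choice

print(f"K-S statistic D = {ks_d:.6f}")
print(f"bin counts = {counts}, chi-squared statistic = {chi2:.3f}")
```

With 10 values in 5 bins the expected count is only 2 per bin, well under the usual rule of thumb of about 5 per bin for the chi-squared approximation, and the result also changes with the number of bins you pick. The K-S statistic has no such tuning choice.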
 
Hmm, sounds like I'm confusing discrete and continuous distributions... Again. I'd made the assumption that if the p-values were uniformly distributed 0 - 1, then the expected p-values would be 0.5, hence a Chi-squared test could be attempted. As I write my assumption, I can see that it's logically flawed.
 
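The Chi-squared test bins the data and compares observed with expected counts; its test statistic approximately follows a chi-squared distribution (the distribution of a sum of squares of independent standard normal variables), and that approximation needs reasonably large expected counts in every bin. The K-S test compares the sample directly with the continuous reference distribution, with no binning, so for uniform variables it is a better choice.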
 
Paul Uszak said:
Hmm, sounds like I'm confusing discrete and continuous distributions... Again. I'd made the assumption that if the p-values were uniformly distributed 0 - 1, then the expected p-values would be 0.5, hence a Chi-squared test could be attempted. As I write my assumption, I can see that it's logically flawed.
You are right that the expected value of p, uniform in [0,1], would be 0.5. But that does not mean that the values will cluster around 0.5. In fact they should be randomly spread between 0 and 1. As an extreme example, suppose you have a random variable X which is 0 half the time and 1 the other half. Then its expected value is 0.5, but it never takes a value anywhere near 0.5; it is always at 0 or 1.

PS: There can be more clustering in uniform random data than you would expect, but there is no reason for the clustering to prefer 0.5.
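A quick simulation shows the difference between "expected value 0.5" and "values near 0.5". This is only a sketch: the seed, the sample size and the helper names are arbitrary choices of mine, and it uses Python's standard `random` module rather than anything from Diehard.

```python
import random

rng = random.Random(12345)  # fixed seed, chosen arbitrarily, for repeatability
n = 100_000

# Continuous uniform draws on [0, 1) and the extreme 0-or-1 example.
uniform_draws = [rng.random() for _ in range(n)]
coin_draws = [rng.choice((0, 1)) for _ in range(n)]


def mean(values):
    return sum(values) / len(values)


def fraction_near_half(values, width=0.1):
    """Share of values within `width` of 0.5."""
    return sum(1 for v in values if abs(v - 0.5) <= width) / len(values)


uniform_mean = mean(uniform_draws)
uniform_near = fraction_near_half(uniform_draws)
coin_mean = mean(coin_draws)
coin_near = fraction_near_half(coin_draws)

print(f"uniform: mean = {uniform_mean:.3f}, share in [0.4, 0.6] = {uniform_near:.3f}")
print(f"0-or-1:  mean = {coin_mean:.3f}, share in [0.4, 0.6] = {coin_near:.3f}")
```

Both means come out close to 0.5, but only about a fifth of the uniform draws fall in [0.4, 0.6], which is just the width of that interval, and none of the 0-or-1 draws do.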
 