Challenge Micromass' big statistics challenge

  • #101
mfb said:
The random generation doesn't keep track of that by definition
I never said "by definition".
I said "one of the following things may happen...", implying that a human may or may not keep track of the current count of H's: if he does not, he might ideally get caught by a properly designed statistical test (not mine); if he does, he may still fail Test #2.

mfb said:
It will always be higher, unless we happen to have exactly as many T as H
This is probably a good point. Perhaps the test I proposed to check whether the string is produced by a biased coin flip experiment is not good (besides I have probably made a mistake in deriving the final formula). I have just noticed that some users were previously working on interesting ideas to achieve the same task. Maybe they will come up with a better solution than mine.

mfb said:
A randomly generated string will have correlations
I think this statement, in its current form, is incorrect, but I probably understand what you meant in the context: if we have a string z1...zN we can already say something about the pair (zN, zN+1). On the other hand, I think we cannot say anything about (zN+1, zN+2). I guess that this just implies that when we populate the 2x2 contingency table we should not consider "all the consecutive pairs" of the string, as I previously said, but rather all the disjoint consecutive pairs.
Unfortunately, if I do that, then the p-values are well above the threshold for both strings, so the test does not work.
 
  • #102
mnb96 said:
I never said "by definition".
I said it.
mnb96 said:
implying that a human may or may not keep track of the current count of H's: if he does not, he might ideally get caught by a properly designed statistical test
That is wrong.
The random sequence does not keep track. Why should a human have to keep track?
mnb96 said:
This is probably a good point. Perhaps the test I proposed to check whether the string is produced by a biased coin flip experiment is not good (besides I have probably made a mistake in deriving the final formula). I have just noticed that some users were previously working on interesting ideas to achieve the same task. Maybe they will come up with a better solution than mine.
We had simple hypothesis testing already: how likely is it to get a deviation from 50% at least as large as the one observed?
The probability of observing 91 or fewer of either symbol (T or H) in 199 trials (like sequence 1) is 0.25; the probability of observing 94 or fewer of either symbol in 199 trials (like sequence 2) is 0.48. Not really a strong preference for one sequence here.
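
A minimal sketch of how those two numbers can be checked, assuming a fair coin and an exact two-sided binomial tail. The function name `two_sided_binomial_p` is made up for illustration; only the counts 91, 94 and 199 come from the post above.

```python
from math import comb


def two_sided_binomial_p(k: int, n: int) -> float:
    """Probability that the rarer symbol appears k times or fewer in n fair flips.

    Sums the binomial weights of every outcome i whose minority count
    min(i, n - i) is at most k, so both tails are included exactly once.
    """
    k = min(k, n - k)  # treat k as the count of the rarer symbol
    favourable = sum(comb(n, i) for i in range(n + 1) if min(i, n - i) <= k)
    return favourable / 2**n


# Counts taken from the post: 91 and 94 of the rarer symbol in 199 flips.
p_seq1 = two_sided_binomial_p(91, 199)
p_seq2 = two_sided_binomial_p(94, 199)
print(f"sequence 1: {p_seq1:.2f}")
print(f"sequence 2: {p_seq2:.2f}")
```

Under these assumptions the results should land close to the 0.25 and 0.48 quoted above. Neither is small enough to reject a fair coin at any usual threshold.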

mnb96 said:
I think this statement, in its current form, is incorrect, but I probably understand what you meant in the context: if we have a string z1...zN we can already say something about the pair (zN, zN+1). On the other hand, I think we cannot say anything about (zN+1, zN+2)
Well, we know 8 out of 16 things the pair cannot be.
mnb96 said:
we should not consider "all the consecutive pairs" of the string, as I previously said, but rather all the disjoint consecutive pairs.
We can do that, but it discards a lot of information.
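
To make the trade-off concrete, here is a rough sketch that tallies the 2x2 table both ways. The helper `pair_table` and the short example string are hypothetical and are not the sequences from the challenge.

```python
from collections import Counter


def pair_table(s: str, disjoint: bool = False) -> Counter:
    """Count consecutive two-symbol pairs in s.

    disjoint=False uses every overlapping pair (z1 z2, z2 z3, ...).
    disjoint=True uses only non-overlapping pairs (z1 z2, z3 z4, ...).
    """
    step = 2 if disjoint else 1
    return Counter(s[i:i + 2] for i in range(0, len(s) - 1, step))


example = "HTHHT"  # illustrative string only
print(pair_table(example))                 # overlapping pairs
print(pair_table(example, disjoint=True))  # disjoint pairs

# For a string of length 199 the table is built from 198 overlapping pairs
# but only 99 disjoint ones.
dummy = "H" * 199
print(sum(pair_table(dummy).values()), sum(pair_table(dummy, disjoint=True).values()))
```

The disjoint table has independent entries under the null hypothesis, which a standard contingency-table test assumes, but it is built from roughly half as many pairs, so the test loses power.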

Check the previous pages; there was already a lot of analysis in that direction.
 