Challenge Micromass' big statistics challenge

The discussion centers on a statistics challenge involving various probability problems, including train passenger estimates, coin toss sequences, fish population estimation, and distinguishing between real and generated text. Participants are tasked with providing not only answers but also detailed strategies and reasoning for their approaches, emphasizing the importance of statistical models. The thread encourages the use of outside sources for reference while prohibiting direct searches for specific answers. The goal is to foster a collaborative environment for exploring statistical concepts and methodologies. Engaging with these problems enhances understanding of probability theory and its applications in real-world scenarios.
  • #101
mfb said:
The random generation doesn't keep track of that by definition
I never said "by definition".
I said "one of the following things may happen...", implying that a human may or may not keep track of the current count of H's: if he does not, he might ideally get caught by a properly designed statistical test (not mine); if he does, he may still fail Test #2.

mfb said:
It will always be higher, unless we happen to have exactly as many T as H
This is probably a good point. Perhaps the test I proposed to check whether the string is produced by a biased coin flip experiment is not good (besides I have probably made a mistake in deriving the final formula). I have just noticed that some users were previously working on interesting ideas to achieve the same task. Maybe they will come up with a better solution than mine.

mfb said:
A randomly generated string will have correlations
I think this statement, in its current form, is incorrect, but I probably understand what you meant in the context: if we have a string z1...zN we can already say something about the pair (zN, zN+1). On the other hand, I think we cannot say anything about (zN+1, zN+2). I guess that this just implies that when we populate the 2x2 contingency table we should not consider "all the consecutive pairs" of the string, as I previously said, but rather all the disjoint consecutive pairs.
Unfortunately, if I do that, then the p-values are well above the threshold for both strings, so the test does not work.
 
  • #102
mnb96 said:
I never said "by definition".
I said it.
mnb96 said:
implying that a human may or may not keep track of the current count of H's: if he does not, he might ideally get caught by a properly designed statistical test
That is wrong.
The random sequence does not keep track. Why should a human have to keep track?
mnb96 said:
This is probably a good point. Perhaps the test I proposed to check whether the string is produced by a biased coin flip experiment is not good (besides I have probably made a mistake in deriving the final formula). I have just noticed that some users were previously working on interesting ideas to achieve the same task. Maybe they will come up with a better solution than mine.
We had simple hypothesis testing already: how likely is it to get a larger deviation from 50% than observed?
The probability of observing 91 or fewer of either T or H in 199 trials (like sequence 1) is 0.25; the probability of observing 94 or fewer of either T or H in 199 trials (like sequence 2) is 0.48. Not really a strong preference for one sequence here.
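For anyone who wants to reproduce tail probabilities of this kind, here is a minimal sketch. It assumes a fair coin and reads the quoted numbers as two-sided probabilities (either H or T occurring at most k times). The function name `two_sided_tail` is made up for illustration, and the exact values may differ slightly from the rounded figures above.

```python
from math import comb

def two_sided_tail(n, k):
    """P(min(#H, #T) <= k) for n fair coin flips.

    Assumes k < n/2, so the lower tail (#H <= k) and the
    upper tail (#T <= k) do not overlap and can simply be added.
    """
    if not 0 <= k < n / 2:
        raise ValueError("k must satisfy 0 <= k < n/2")
    # Number of outcomes with at most k heads, out of 2**n equally likely ones.
    lower = sum(comb(n, i) for i in range(k + 1))
    # By symmetry the 'at most k tails' tail has the same probability.
    return 2 * lower / 2**n

# Counts taken from the post: 199 flips, rarer symbol seen 91 or 94 times.
p_seq1 = two_sided_tail(199, 91)
p_seq2 = two_sided_tail(199, 94)
print(f"sequence 1: {p_seq1:.3f}")
print(f"sequence 2: {p_seq2:.3f}")
```

Both probabilities are far from any usual rejection threshold, which is the point being made: the H/T balance alone does not separate the two sequences.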

mnb96 said:
I think this statement, in its current form, is incorrect, but I probably understand what you meant in the context: if we have a string z1...zN we can already say something about the pair (zN, zN+1). On the other hand, I think we cannot say anything about (zN+1, zN+2)
Well, we know 8 out of 16 things the pair cannot be.
mnb96 said:
we should not consider "all the consecutive pairs" of the string, as I previously said, but rather all the disjoint consecutive pairs.
We can do that, but that discards a lot of information.
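To make the trade-off concrete, here is a small sketch of how the 2x2 contingency table could be filled from all consecutive pairs versus only disjoint pairs. The helper `pair_table` and the short example string are hypothetical, not the challenge sequences, and the sketch assumes the string contains only the characters H and T.

```python
from collections import Counter

def pair_table(seq, disjoint=False):
    """2x2 table of consecutive-pair counts for a string of 'H'/'T'.

    Rows index the first symbol, columns the second, both in the order H, T.
    disjoint=False uses pairs (z1,z2), (z2,z3), ...  (overlapping)
    disjoint=True  uses pairs (z1,z2), (z3,z4), ...  (non-overlapping)
    """
    step = 2 if disjoint else 1
    counts = Counter(seq[i:i + 2] for i in range(0, len(seq) - 1, step))
    return [[counts[a + b] for b in "HT"] for a in "HT"]

example = "HTHHT"  # hypothetical toy string
print(pair_table(example))                 # overlapping pairs
print(pair_table(example, disjoint=True))  # disjoint pairs
```

For a string of length N the overlapping version uses N-1 pairs and the disjoint version only floor(N/2), so for 199 flips the table shrinks from 198 counts to 99. The disjoint pairs are independent under the null hypothesis, which is what the standard contingency-table test assumes, but the test has roughly half the data to work with.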

Check the previous pages; there was already a lot of analysis in that direction.