Micromass' big statistics challenge

micromass · May 20, 2016

If we're having a thread about probability theory, then we must have one on statistics too! The following questions are all very open-ended and thus multiple answers may seem possible. Your goal is to find a strategy to find the answer to the questions. Furthermore, you must provide some kind of reasoning as to why your strategy is a decent one.

For an answer to count, not only the answer but also a detailed strategy must be given. An explanation of why the strategy is plausible must be given. Don't forget to detail the model you're working with and why this model is plausible.
Any use of outside sources is allowed, but do not look up the question directly. For example, it is ok to go check probability books, but it is not allowed to google the exact question.
If you previously encountered this statement and remember the solution, then you cannot participate in this particular statement.
All mathematical methods are allowed.
Please reference every source you use.
What I feel are the best answers will be awarded on this original post. If you think you came up with a better answer, you must prove your answer is better.

Here you go:

SOLVED BY mfb Every day there is one train between Mordor and Rohan. These are the numbers of people who wanted to take the train on each day:

Day 1: 233
Day 2: 231
Day 3: 254
Day 4: 212
Day 5: 202

Find the optimal number of seats in the train.

Take the following two sequences of coin tosses:

Code:

THHHHTTTTHHHHTHHHHHHHHTTTHHTTHHHHHTTTTTTHHTHHTHHHTTTHTTHHHHTHTTHTTTHHTTTTHHHHHHTTTHHTTHHHTHHHHHTTTTTHTTTHHTTHTTHHTTTHHTTTHHTHHTHHTTTTTHHTHHHHHHTHTHTTHTHTTHHHTTHHTHTHHHHHHHHTTHTTHHHTHHTTHTTTTTTHHHTHHH

Code:

THTHTTTHTTTTHTHTTTHTTHHHTHHTHTHTHTTTTHHTTHHTTHHHTHHHTTHHHTTTHHHTHHHHTTTHTHTHHHHTHTTTHHHTHHTHTTTHTHHHTHHHHTTHTHHTHHHTTTHTHHHTHHTTTHHHTTTTHHHTHTHHHHTHTTHHTTTTHTHTHTTHTHHTTHTTTHTTTTHHHHTHTHHHTTHHHHHTHHH

One of these sequences is from an actual coin toss experiment. The other is invented by a human. Find out which of these is which.

SOLVED BY Math_QED I want to estimate the number of fish in a lake. I catch 400 fish and give them all a red dot. I throw them back in the lake. Then I catch 400 fish again. I note that 100 of them have a red dot. How many fish are there in the lake?
SOLVED BY Ygggdrasil, mfb Unstable particles are emitted from a source and decay at a distance ##x##, a real number that has an exponential distribution with characteristic length ##\lambda##. Decay events can be observed only if they occur in a window extending from ##x=1## to ##x=20##. We observe ##6## decays at locations ##\{2,5,12,13,13,16\}##. What is ##\lambda##?
SOLVED BY mrspeedybob, PeroK I have a big box filled with balls. All balls have a number. I draw ##5## balls at random and record their number. They are: ##10##, ##50##, ##104##, ##130##, ##213##. How many balls do you expect to be in the box?
I claim I can tell the difference between coca cola and pepsi cola better than just guessing. Somebody pours 5 cups of pepsi and 5 cups of coca cola and hands them to me. I tell them which cup is which. It turns out I judged correctly 8 of the 10 cups. Do you believe my original claim?
SOLVED BY MarneMath A professor got a ticket twelve times for illegal overnight parking. All twelve tickets were given either Tuesdays or Thursdays. Is it justified for him to rent a garage on these days?
SOLVED BY QuantumQuest In a certain family, four girls take turns at washing dishes. There were four breakages. Three of them were caused by the youngest girl. Is it justified to call her clumsy?

SOLVED BY fresh_42 Given the following encoded text, find out whether this is a real text or randomly generated using some scheme. Attempting to decode the text doesn't count.

Code:

sedwhqjkbqjyedi oek xqlu tusetut jxyi junj jxqj mqi dej fqhj ev jxu fherbuc jxekwx ie oek wuj de feydji jxu vebbemydw yi qd unsuhfj vhec myayfutyq fherqrybyjo yi jxu cuqikhu ev jxu byaubyxeet jxqj qd uludj mybb esskh fherqrybyjo yi gkqdjyvyut qi q dkcruh rujmuud puhe qdt edu mxuhu puhe ydtysqjui ycfeiiyrybyjo qdt edu ydtysqjui suhjqydjo jxu xywxuh jxu fherqrybyjo ev qd uludj jxu cehu suhjqyd mu qhu jxqj jxu uludj mybb esskh q iycfbu unqcfbu yi jxu jeiiydw ev q vqyh kdryqiut seyd iydsu jxu seyd yi kdryqiut jxu jme ekjsecui xuqt qdt jqyb qhu ugkqbbo fherqrbu jxu fherqrybyjo ev xuqt ugkqbi jxu fherqrybyjo ev jqyb iydsu de ejxuh ekjsecu yi feiiyrbu jxu fherqrybyjo yi edu xqbv eh vyvjo fuhsudj ev uyjxuh xuqt eh jqyb yd ejxuh mehti jxu fherqrybyjo ev xuqt yi edu ekj ev jme ekjsecui qdt jxu fherqrybyjo ev jqyb yi qbie edu ekj ev jme ekjsecui jxuiu sedsufji xqlu ruud wylud qd qnyecqjys cqjxucqjysqb vehcqbypqjyed yd fherqrybyjo jxueho iuu fherqrybyjo qnyeci mxysx yi kiut mytubo yd iksx qhuqi ev ijkto qi cqjxucqjysi ijqjyijysi vydqdsu wqcrbydw isyudsu yd fqhjyskbqh fxoiysi qhjyvysyqb ydjubbywudsu cqsxydu buqhdydw secfkjuh isyudsu wqcu jxueho qdt fxybeiefxo je veh unqcfbu thqm ydvuhudsui qrekj jxu unfusjut vhugkudso ev uludji fherqrybyjo jxueho yi qbie kiut je tuishyru jxu kdtuhboydw cusxqdysi qdt huwkbqhyjyui ev secfbun ioijuci

SOLVED BY jbriggs444 A person is playing a game operated by a psychic, an entity presented as somehow being exceptionally skilled at predicting people's actions. It is known that the Psychic predicts people's actions correctly in approximately 99.9% of the cases. The player of the game is presented with two boxes, one transparent (labeled A) and the other opaque (labeled B). The player is permitted to take the contents of both boxes, or just the opaque box B. Box A contains a visible $1,000. The contents of box B, however, are determined as follows: At some point before the start of the game, the Psychic makes a prediction as to whether the player of the game will take just box B, or both boxes. If the Psychic predicts that both boxes will be taken, then box B will contain nothing. If the Psychic predicts that only box B will be taken, then box B will contain $1,000,000.

If the psychic predicts that the player will choose randomly, then box B will contain nothing.

By the time the game begins, and the player is called upon to choose which boxes to take, the prediction has already been made, and the contents of box B have already been determined. That is, box B contains either $0 or $1,000,000 before the game begins, and once the game begins even the Psychic is powerless to change the contents of the boxes. Before the game begins, the player is aware of all the rules of the game, including the two possible contents of box B, the fact that its contents are based on the Psychic's prediction, and knowledge of the Psychic's infallibility. The only information withheld from the player is what prediction the Psychic made, and thus what the contents of box B are.

Thank you all for participating! I hope many of you have fun with this! Don't hesitate to post any feedback in the thread!

More information:

Invented myself
gato-docs.its.txstate.edu/mathworks/DistributionOfLongestRun.pdf
Feller "An introduction to probability theory and its applications Vol1" Chapter II "Elements of Combinatorial analysis"
MacKay "Information Theory, Inference and Learning algorithms" http://www.inference.phy.cam.ac.uk/itila/p0.html
http://www.math.uah.edu/stat/urn/OrderStatistics.html
https://en.wikipedia.org/wiki/Fisher's_exact_test
Feller "An introduction to probability theory and its applications Vol1" Chapter II "Elements of Combinatorial analysis"
Feller "An introduction to probability theory and its applications Vol1" Chapter II "Elements of Combinatorial analysis"
https://en.wikipedia.org/wiki/Newcomb's_paradox

fresh_42 · May 20, 2016

2. The ##χ^2##-test on the outcome gave me no information. Therefore I considered the changes as the random variable. For a fair coin the chances are ##50:50## to change or not. That cost only one observation, but ##198## are still fine.
The first sequence resulted in ##χ^2 = 6.5##, which corresponds to an almost ##99\%## chance of not being random, whereas the second got me ##0.3##, i.e. no statement about randomness can be made by the ##χ^2##-test. So if I had no cut and paste errors, my answer is: The first sequence is hand made.
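A minimal Python sketch of a test of this kind, under the assumption that every adjacent pair of tosses is counted as one "change / no change" observation (the post does not say exactly how the 198 observations were formed). The function names and the toy input are hypothetical; for one degree of freedom the upper-tail probability can be written with the complementary error function.

```python
import math

def change_chi_square(seq: str) -> float:
    """Chi-square statistic (1 dof) for 'change' vs 'no change' between neighbours."""
    changes = sum(a != b for a, b in zip(seq, seq[1:]))
    total = len(seq) - 1            # number of adjacent pairs
    stays = total - changes
    expected = total / 2            # fair coin: change and no change equally likely
    return ((changes - expected) ** 2 + (stays - expected) ** 2) / expected

def p_value_1dof(chi2: float) -> float:
    """Upper-tail probability of a chi-square variable with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

toy = "HTHTHHTHTT"   # hypothetical toy input, not one of the sequences in the thread
stat = change_chi_square(toy)
print(stat, p_value_1dof(stat))
print(p_value_1dof(6.5))   # tail probability for the statistic quoted above
```

To apply it to the challenge, paste each 200-toss string in place of `toy`.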

member 587159 · May 20, 2016

4) If n is the total amount of fish

400/n must be equal to 100/400, therefore n = 1600. This is because you take 400 fish from the population randomly (uniformly distributed) and then you find 100 fish out of 400 that have been marked (again uniformly distributed). If you give more information about the required reliability (confidence level), I can give an interval.
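The proportion argument is the classical Lincoln-Petersen mark-recapture estimate. A short Python sketch, assuming the second catch is a uniform sample without replacement so that the number of marked fish in it is hypergeometric; the function names are illustrative only.

```python
from math import comb

def lincoln_petersen(marked: int, caught: int, recaptured: int) -> float:
    """Estimate the population from marked/n = recaptured/caught."""
    return marked * caught / recaptured

def likelihood(n: int, marked: int, caught: int, recaptured: int) -> float:
    """Hypergeometric probability of the observed recapture count if the lake holds n fish."""
    return (comb(marked, recaptured) * comb(n - marked, caught - recaptured)
            / comb(n, caught))

print(lincoln_petersen(400, 400, 100))
# Compare how plausible a few candidate lake sizes are under the same model.
for n in (1200, 1600, 2000):
    print(n, likelihood(n, 400, 400, 100))
```

Scanning the likelihood over a range of `n` is one way to turn the point estimate into an interval.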

fresh_42 · May 20, 2016

9. Just to make sure: You didn't give us a part of the Voynich manuscript, did you?

Edit:
I have taken the source text, which is 1342 bytes long, and compressed it to 718 bytes.
Then I randomly chose texts of equal length (1342 bytes) from:

New York Times → compressed 816 bytes
Washington Post → compressed 817 bytes
San Francisco Chronicle → compressed 820 bytes
a Danish tourist site → compressed 810 bytes
Helsingin Sanomat (Finnish newspaper) → compressed 804 bytes
El País (Madrid) → compressed 882 bytes

The Spanish text also had the longest sentences. Although I changed languages and content, the compressed files were all around 820 bytes long, except one which was longer. But none came even near the 718 bytes, so I didn't make any statistical test. My result: The text is likely artificial.
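A rough Python sketch of such a comparison. The post does not name the compressor, so `zlib` is an assumption here, and both sample strings are hypothetical stand-ins rather than the texts listed above.

```python
import random
import string
import zlib

def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding (maximum compression level)."""
    return len(zlib.compress(text.encode("utf-8"), 9))

rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical reference points: uniformly random letters and a highly repetitive text.
random_letters = "".join(rng.choice(string.ascii_lowercase + " ") for _ in range(1342))
repetitive = ("probability is the measure of the likelihood " * 40)[:1342]

print(compressed_size(random_letters), compressed_size(repetitive))
```

Texts of equal length can then be ranked by compressed size: the more internal structure, the smaller the output.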

micromass · May 20, 2016

fresh_42 said:

2. The ##χ^2##-test on the outcome gave me no information. Therefore I considered the changes as the random variable. For a fair coin the chances are ##50:50## to change or not. That cost only one observation, but ##198## are still fine.
The first sequence resulted in ##χ^2 = 6.5##, which corresponds to an almost ##99\%## chance of not being random, whereas the second got me ##0.3##, i.e. no statement about randomness can be made by the ##χ^2##-test. So if I had no cut and paste errors, my answer is: The first sequence is hand made.

That's interesting. I'll wait for some further analyses possibly by you or by others.

micromass · May 20, 2016

Math_QED said:

4) If n is the total amount of fish

400/n must be equal to 100/400, therefore n = 1600. This is because you take 400 fish from the population randomly (uniformly distributed) and then you find 100 fish out of 400 that have been marked (again uniformly distributed). If you give more information about the required reliability (confidence level), I can give an interval.

Very good. I will mark this as the correct answer.

micromass · May 20, 2016

fresh_42 said:

9. Just to make sure: You didn't give us a part of the Voynich manuscript, did you?

Edit:
I have taken the source text, which is 1342 bytes long, and compressed it to 718 bytes.
Then I randomly chose texts of equal length (1342 bytes) from:

New York Times → compressed 816 bytes

Washington Post → compressed 817 bytes

San Francisco Chronicle → compressed 820 bytes

a Danish tourist site → compressed 810 bytes

Helsingin Sanomat (Finnish newspaper) → compressed 804 bytes

El País (Madrid) → compressed 882 bytes

The Spanish text also had the longest sentences. Although I changed languages and content, the compressed files were all around 820 bytes long, except one which was longer. But none came even near the 718 bytes, so I didn't make any statistical test. My result: The text is likely artificial.

That's an interesting analysis. I'll be waiting for more input before I spill the beans on this one.

micromass · May 20, 2016

fresh_42 said:

9. Just to make sure: You didn't give us a part of the Voynich manuscript, did you?

Don't worry, there is a definite answer.

fresh_42 · May 20, 2016

micromass said:

That's interesting. I'll wait for some further analyses possibly by you or by others.

I've reviewed it. At a significance level of 2%, ##χ^2 = 6.5 > χ_{(98\%;1)}^2##, so the null hypothesis (randomness with a fair coin) should be rejected. I have never trusted this test anyway ... or your coin.

member 587159 · May 21, 2016

6) Define the random variable X as the number of correct guesses.
X ~ B(10,1/2)

We want to know how large the probability is that someone gives 8 correct guesses out of 10, when he has a 1/2 chance of guessing correctly. We can do this using P-values:

Our null hypothesis H0: p = 1/2
The alternative hypothesis Ha (this is the claim): Ha: p>1/2

P-value = 1 - Binomcdf(10,1/2,7) = 0,0547.

The chance of getting 8 out of 10 correct guesses with 1/2 chance is 0,0547 or 5,47%. This is a rather small percentage. Therefore, we conclude that H0 is presumably wrong and we say that Ha is better. So I would say that we believe his claim, since a 5,47% chance is not that large.
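For readers without a graphing calculator, the expression 1 - Binomcdf(10,1/2,7) is the upper tail of the binomial distribution. A small Python sketch (the function name is made up for illustration):

```python
from math import comb

def binomial_upper_tail(n: int, k: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ B(n, p), summing the terms for k, k+1, ..., n."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

print(round(binomial_upper_tail(10, 8), 4))  # 0.0547
```

The sum covers 8, 9 and 10 correct guesses, i.e. 56 of the 1024 equally likely outcomes.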

PeroK · May 21, 2016

For number 2:

I would say that the first sequence is human generated. Two reasons: First, the probability of getting another H after an H and a T after a T seems too high (for heads it's 67/107 and tails 51/92 by my counting). Second, there appear to be too many long sequences of Heads and too few Head singletons.

My guess is that this is someone trying to avoid too much HTHT and over-compensating. The giveaway, supposedly, for human generated sequences is too few long sequences of H and T. This looks like someone who knew this and over-compensated.

The second sequence appears to have a more normal correlation of H following H and T following T. The anomalies look more believable.

I know too little statistics to analyse it any deeper.

micromass · May 21, 2016

Math_QED said:

6) Define the random variable X as the number of correct guesses.
X ~ B(10,1/2)

We want to know how large the probability is that someone gives 8 correct guesses out of 10, when he has a 1/2 chance of guessing correctly. We can do this using P-values:

Our null hypothesis H0: p = 1/2
The alternative hypothesis Ha (this is the claim): Ha: p>1/2

P-value = 1 - Binomcdf(10,1/2,7) = 0,0547.

The chance of getting 8 out of 10 correct guesses with 1/2 chance is 0,0547 or 5,47%. This is a rather small percentage. Therefore, we conclude that H0 is presumably wrong and we say that Ha is better. So I would say that we believe his claim, since a 5,47% chance is not that large.

Two remarks:
1) Does it change anything if I knew beforehand that 5 cups would be pepsi and 5 would be coca cola? I believe this changes something, for example guessing correctly 9 out of 10 is then impossible.
2) You took the probability of guessing correctly 8 out of 10. But for the p-value, shouldn't you take the probability of guessing correctly 8 out of 10, and 9 out of 10 and 10 out of 10?
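Remark 1 can be explored by direct enumeration. The sketch below assumes the taster knows the 5/5 split and therefore labels exactly five cups as Pepsi, with every such labelling equally likely under pure guessing; this is the setting of Fisher's exact test. The function name is hypothetical.

```python
from math import comb

def correct_count_distribution(cups_per_brand: int = 5) -> dict:
    """Distribution of the number of correctly labelled cups under pure guessing."""
    total = comb(2 * cups_per_brand, cups_per_brand)
    dist = {}
    for hits in range(cups_per_brand + 1):          # Pepsi cups labelled correctly
        ways = comb(cups_per_brand, hits) * comb(cups_per_brand, cups_per_brand - hits)
        dist[2 * hits] = ways / total               # each Pepsi hit implies a Coke hit
    return dist

dist = correct_count_distribution()
p_at_least_8 = sum(p for correct, p in dist.items() if correct >= 8)
print(sorted(dist))      # only even numbers of correct cups can occur
print(p_at_least_8)
```

Because each mislabelled Pepsi cup forces a mislabelled Coke cup, the number of correct cups is always even, which is why 9 out of 10 cannot happen.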

micromass · May 21, 2016

PeroK said:

For number 2:

I would say that the first sequence is human generated. Two reasons: First, the probability of getting another H after an H and a T after a T seems too high (for heads it's 67/107 and tails 51/92 by my counting). Second, there appear to be too many long sequences of Heads and too few Head singletons.

My guess is that this is someone trying to avoid too much HTHT and over-compensating. The giveaway, supposedly, for human generated sequences is too few long sequences of H and T. This looks like someone who knew this and over-compensated.

The second sequence appears to have a more normal correlation of H following H and T following T. The anomalies look more believable.

I know too little statistics to analyse it any deeper.

You gave such a neat solution to the "runs" question on the probability thread. Can you figure out the number of runs of order ##n## for each ##n## in a totally random sequence and compare it with what we have?

member 587159 · May 21, 2016

micromass said:

Two remarks:
1) Does it change anything if I knew beforehand that 5 cups would be pepsi and 5 would be coca cola? I believe this changes something, for example guessing correctly 9 out of 10 is then impossible.
2) You took the probability of guessing correctly 8 out of 10. But for the p-value, shouldn't you take the probability of guessing correctly 8 out of 10, and 9 out of 10 and 10 out of 10?

I did calculate for 8 out of 10, 9 out of 10, and 10 out of 10. It's BinomCDF, I do not know whether you know this but it is a command on the graphing calculator. But in fact I calculated this. I need to think about the first remark though, since I am out of time now. I will look at it soon. Intuitively, I would say that I would have to calculate the chances for 8 out of 10 and 10 out of 10 and then add them together. This might give the correct probability. A quick calculation then gives 0,0449 and this means there is an even better chance that his claim is true. But I might be missing something.

micromass · May 21, 2016

Math_QED said:

I did calculate for 8 out of 10, 9 out of 10, and 10 out of 10. It's BinomCDF, I do not know whether you know this but it is a command on the graphing calculator. But in fact I calculated this.

Oh ok. Good to know this!

member 587159 · May 21, 2016

I edited my last post.

thephystudent · May 21, 2016

For number two (coinflips): Pick a random family of correlation functions, e.g. ##\langle x_n, x_{n+2}, x_{n+10}\rangle##; by 'family' I mean that n runs through all coinflips, where periodic boundary conditions are assumed.

The 'family' with the highest Shannon entropy will be the real one.

mrspeedybob · May 21, 2016

#5
I would expect the ball numbers (on average) to be evenly spaced through the range of possible ball numbers. The highest ball in this particular question is 213, so the average spacing between the numbers is 213/5. The total number of possible balls should include 1 average space above the highest ball, so that would make the most likely number of balls (6*213)/5, or 255.6.
Since I can't have .6 balls, I believe it's appropriate to round to the nearest integer (no pun intended), so my answer is 256.
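A quick way to probe this spacing argument is to simulate it. The Python sketch below assumes the balls are numbered 1 to N without gaps and drawn without replacement; the names and the choice N = 250 are hypothetical.

```python
import random

def spacing_estimate(largest: int, draws: int) -> float:
    """Largest observed number plus one average gap: largest * (draws + 1) / draws."""
    return largest * (draws + 1) / draws

def average_estimate(true_size: int, draws: int, trials: int, seed: int = 0) -> float:
    """Mean of the estimate over many simulated draws from a box with known size."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = rng.sample(range(1, true_size + 1), draws)
        total += spacing_estimate(max(sample), draws)
    return total / trials

print(spacing_estimate(213, 5))
# With a known box size, check how close the estimate is on average.
print(average_estimate(true_size=250, draws=5, trials=20000))
```

Under these assumptions the expected maximum of k draws is k(N+1)/(k+1), so the estimate averages N + 1 rather than N.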

PeroK · May 21, 2016

micromass said:

You gave such a neat solution to the "runs" question on the probability thread. Can you figure out the number of runs of order ##n## for each ##n## in a totally random sequence and compare it with what we have?

What I get for the first sequence for runs of H is:

##0 \ - \ 51; \ \ 1 \ - \ 12^*; \ \ 2 \ - \ 15; \ \ 3 \ - \ 6; \ \ 4 \ - \ 3; \ \ 5 \ - \ 1; \ \ 6 \ - \ 2; \ \ 8 \ - \ 2^*##

And for the second sequence;

##0 \ - \ 43; \ \ 1 \ - \ 25; \ \ 2 \ - \ 8^*; \ \ 3 \ - \ 13^*; \ \ 4 \ - \ 5; \ \ 5 \ - \ 1##

And, you would expect (on average) using 96 games as a nice number:

##0 \ - \ 48; \ \ 1 \ - \ 24; \ \ 2 \ - \ 12; \ \ 3 \ - \ 6; \ \ 4 \ - \ 3; \ \ 5 \ - \ 1.5; \ \ 6 \ - \ 0.75; \ \ 8 \ - \ 0.19##

The ones marked * are the main anomalies. The second pair of anomalies looks more plausible.

The probability of getting 12 or fewer H singletons is about ##0.002## and getting 2 or more runs of 8 or more about ##0.02##.

Whereas, the probability of getting precisely 8 and 13 for runs of 2 and 3 is ##0.06## and ##0.003## respectively. This is more likely to be the random sequence.
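A Python sketch of one way to produce such a table. It assumes each "game" is the stretch of heads before the next tail, so that two adjacent tails count as a run of length 0; how the ends of the sequence are treated is a guess, and the toy input is hypothetical.

```python
from collections import Counter

def head_run_counts(seq: str) -> Counter:
    """Count runs of H between tails; adjacent T's contribute a run of length 0.

    The final piece is not closed by a T, so it is only a boundary approximation.
    """
    return Counter(len(run) for run in seq.split("T"))

def expected_run_count(games: int, length: int) -> float:
    """Expected number of games with exactly `length` heads before a tail (fair coin)."""
    return games * 0.5 ** (length + 1)

toy = "HHTHTTHHH"   # hypothetical toy input, not one of the sequences in the thread
print(head_run_counts(toy))
print([expected_run_count(96, n) for n in range(7)])
```

The expected counts halve with each extra head, which reproduces the 48, 24, 12, ... row for 96 games.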

jbriggs444 · May 21, 2016

10) Does not ask a question. The obvious question is what one should choose and why. Or, better, what strategy one should adopt and why.

The key difficulty is the problem of self-reference. The situation we are faced with depends on the decision process we (will) choose to deal with it. A decision process which is guaranteed to yield an optimal result for a fixed situation is not guaranteed to yield an optimal result when the situation depends on the choice of decision process.

The problem statement attempts to disguise the self-reference by combining the use of a psychic (whose behavior depends on future choices) with the notion of a fixed and immutable past (which cannot depend on the future choices).

In order to remove the problem of self-reference, re-cast the problem as a two player non-zero sum game where we play against our twin. Each player has two choices, "cooperate" or "defect". The payoff matrix is:

Code:

row  Us     Twin     Result for us
  1 coop    coop     1,000,000   
  2 coop    defect           0
  3 defect  coop     1,001,000
  4 defect  defect        1000

The problem has been constructed to make mixed (aka random) strategies untenable. We can discard any such approaches out of hand.

If the psychic predictor were 100% accurate, our clone would make (or would have made) the same choice that we will. The relevant question is "what strategy, if adopted by both players, will result in the highest average payout for us?" The relevant rows are 1 and 4 and a strategy of "cooperate" is optimal.

Since the psychic predictor is only 99.9% accurate, the effect is that 99.9% of the time we will be playing our twin, but 0.1% of the time we will instead be playing our evil step brother who always does the opposite. This time a choice of cooperate is 99.9% likely to hit row 1 and 0.1% likely to hit row 2 for an average payout of $999,000. A strategy of defect is 0.1% likely to hit row 3 and 99.9% likely to hit row 4 for an average payout of $2000. A strategy of "cooperate" is still optimal.
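The averages can be checked from the payoff matrix. A minimal Python sketch, assuming the twin matches our choice with probability equal to the predictor's accuracy and does the opposite otherwise; exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction

# Payoff to us, keyed by (our choice, twin's choice), taken from the table above.
PAYOFF = {
    ("coop", "coop"): 1_000_000,
    ("coop", "defect"): 0,
    ("defect", "coop"): 1_001_000,
    ("defect", "defect"): 1_000,
}

def expected_payout(our_choice: str, accuracy: Fraction) -> Fraction:
    """Average payout when the twin copies us with probability `accuracy`."""
    other = "defect" if our_choice == "coop" else "coop"
    return (accuracy * PAYOFF[(our_choice, our_choice)]
            + (1 - accuracy) * PAYOFF[(our_choice, other)])

accuracy = Fraction(999, 1000)
for choice in ("coop", "defect"):
    print(choice, float(expected_payout(choice, accuracy)))
```

Changing `accuracy` shows how reliable the predictor has to be before cooperating stops paying off.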

PeroK · May 21, 2016

Number 5:

I'll take a wild guess at ##282.5##. If that's close I can post later how I arrived at it!

micromass · May 21, 2016

mrspeedybob said:

#5
I would expect the ball numbers (on average) to be evenly spaced through the range of possible ball numbers. The highest ball in this particular question is 213, so the average spacing between the numbers is 213/5. The total number of possible balls should include 1 average space above the highest ball, so that would make the most likely number of balls (6*213)/5, or 255.6.
Since I can't have .6 balls, I believe it's appropriate to round to the nearest integer (no pun intended), so my answer is 256.

PeroK said:

Number 5:

I'll take a wild guess at ##282.5##. If that's close I can post later how I arrived at it!

It's close, but mrspeedybob seems to be a lot closer, I believe he missed something essential though. I'm still very interested in seeing your method, so I would be very grateful if you would post it.

PeroK · May 21, 2016

Number 5:

First, my interpretation of the problem. You pick a number ##n## at random with equal likelihood from ##1## to some large number ##N##. You put that many numbered balls in the bag and draw out ##5##. The event ##X##, as described, with a highest of ##213## is observed. What is the expected value of ##n##, given event ##X##?

Note that it doesn't matter whether the event ##X## represents the order the balls came out or rearranged in order of magnitude: the calculations all differ only by a factor of ##5## throughout, which cancels out. Assume the former. Also assume that event ##X## is ball ##213## plus any four others smaller. Assuming the precise numbers in the problem will also cancel out and give the same answer.

##n## must be at least ##213##. Let ##p(X_{n})## be the probability of event ##X## given ##n##.

##p(X_{213}) = \frac{1}{213}##

##p(X_{214}) = \frac{1}{214} \times \frac{212}{213} \times \frac{211}{212} \times \frac{210}{211} \times \frac{209}{210}= \frac{1}{214} \times \frac{209}{213}##

In general:

##p(X_{n}) = \frac{1}{n} \times F_{n} = \frac{1}{n} \times \frac{n-5}{n-1} \times F_{n-1}## where ##F_{213} = 1, F_{214} = \frac{209}{213} \dots##

And:

##p(X) = \frac{1}{N - 212} \sum_{n = 213}^{N} p(X_{n})##

Now, we have:

##p(n \mid X) = \frac{p(X_n) p(n)}{p(X)} = \frac{p(X_n)}{(N-212) p(X)}##

And

##E(n) = \sum_{n = 213}^{N} n \frac{p(X_n)}{(N-212)p(X)} = \frac{1}{(N-212) p(X)} \sum_{n = 213}^{N} n {p(X_n)} = \frac{1}{(N-212)p(X)} (\sum_{n = 213}^{N} F_{n})##

##E(n) = \frac{\sum_{n = 213}^{N} F_{n}}{\sum_{n = 213}^{N} \frac{F_{n}}{n}}##

At this point I just put ##N = 3000## into a calculation and got:

##(N-212)p(X) = 0.25## and ##E(n) = 282.6##
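The final ratio of sums is easy to evaluate numerically. A Python sketch, assuming the recurrence ##F_n = \frac{n-5}{n-1} F_{n-1}## with ##F_{213} = 1## and the cutoff ##N = 3000## used above; the function and parameter names are illustrative only.

```python
def posterior_mean(highest: int, draws: int, upper: int) -> float:
    """E(n) = sum(F_n) / sum(F_n / n) for n from `highest` to `upper`."""
    f = 1.0                 # F_n at n = highest
    sum_f = 0.0             # running sum of F_n
    sum_f_over_n = 0.0      # running sum of F_n / n
    for n in range(highest, upper + 1):
        if n > highest:
            f *= (n - draws) / (n - 1)   # F_n = (n - r)/(n - 1) * F_{n-1}
        sum_f += f
        sum_f_over_n += f / n
    return sum_f / sum_f_over_n

print(round(posterior_mean(213, 5, 3000), 1))  # 282.6, as quoted above
```

Raising `upper` shows how weakly the result depends on the cutoff, since the terms fall off like the fourth power of n.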

micromass · May 21, 2016

Very impressive approach PeroK. You essentially used a Bayesian approach to this question. I like your solution a lot. I will wait a bit for other inputs, but you'll definitely get some of the credit for this problem.

PeroK · May 21, 2016

micromass said:

Very impressive approach PeroK. You essentially used a Bayesian approach to this question. I like your solution a lot. I will wait a bit for other inputs, but you'll definitely get some of the credit for this problem.

It appears that:

##\sum_{n = k}^{\infty} \frac{F_{n}}{n} = \frac{1}{r-1}## where ##k## is the highest number picked and ##r## is the number of balls picked.

So:

##E(n) = (r-1)\sum_{n = k}^{\infty} F_{n}##

Where ##F_k = 1, \ F_n = \frac{n-r}{n-1}F_{n-1}##

Not that far from a formula now!

PeroK · May 21, 2016

It also appears that:

##\sum_{n = k}^{\infty} F_{n} = \frac{k-1}{r-2}##

So:

##E(n) = \frac{(k-1)(r-1)}{r-2}##

E.g. ##k = 213, r = 5, E(n) \approx 282.7##

micromass · May 21, 2016

How did you find these formulas?

PeroK · May 21, 2016

micromass said:

How did you find these formulas?

Just computationally. Looking at them analytically now.

thephystudent · May 21, 2016

thephystudent said:

For number two (coinflips): Pick a random family of correlation functions, e.g. ##\langle x_n, x_{n+2}, x_{n+10}\rangle##; by 'family' I mean that n runs through all coinflips, where periodic boundary conditions are assumed.

The 'family' with the highest Shannon entropy will be the real one.

I tried this for a few families: for example, I made a list {n,n+30} for n from 0 to L (the length) and calculated the entropy of the list of couples. But the difference in entropy is always in the third significant digit, and which chain has the highest entropy is not the same for all chosen 'families' of tuples. It may be that averaging over all possible families will give a definite result. I don't have the time to check that for now; if someone else wants to try this, feel free to do so. But if that works, it probably wouldn't be the most elegant way to solve this problem.

EDIT: with a bit more careful look I also think the first one is hand-made, it seems to have slightly lower entropies between nearby levels especially, which is consistent with a human having a 'short time memory'.
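A small Python sketch of the pair version of this idea, assuming the entropy is taken over the empirical distribution of the couples ##(x_n, x_{n+d})## with periodic wrap-around. The function name and the toy inputs are hypothetical.

```python
import math
from collections import Counter

def pair_entropy(seq: str, offset: int) -> float:
    """Shannon entropy in bits of the pairs (x_n, x_{n+offset}), indices taken modulo len(seq)."""
    length = len(seq)
    pairs = Counter((seq[i], seq[(i + offset) % length]) for i in range(length))
    return -sum((c / length) * math.log2(c / length) for c in pairs.values())

# Toy inputs: a strictly alternating sequence versus one where all four pairs occur.
print(pair_entropy("HTHT", 1))   # 1.0
print(pair_entropy("HHTT", 1))   # 2.0
```

For an ideal fair coin the pair entropy approaches 2 bits for every offset, so values noticeably below that at small offsets hint at short-range structure.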

member 587159 · May 21, 2016

6)

Define the random variable X as the number of correct guesses. X can take any value except 9 (if you have given 9 correct values, then the 10th is automatically correct too, since you know there are 5 pepsi's and 5 regular cola's). Thus, in fact, 9 guesses are enough to know what we want to know.

We now see: X~B(9,1/2)

Null hypothesis: H0: p = 1/2
Alternative hypothesis (claim): Ha: p>1/2
x observed: 8

Using P values we get:

P-value = P(X>=8) = C(8,9)(1/2)^8*(1/2) + C(9,9)(1/2)^9 = 0,0195..

This means that the probability of someone guessing 8 out of 10 correctly (using H0) is 0,0195. This is a rather small probability. Therefore, I say that his claim is true.

micromass · May 21, 2016

Math_QED said:

We now see: X~B(9,1/2)

How do you know this?

member 587159 · May 21, 2016

micromass said:

How do you know this?

We only need 9 guesses to know the result. As you said earlier, 9 out of 10 is impossible. Is it correct?

micromass · May 21, 2016

Math_QED said:

We only need 9 guesses to know the result. As you said earlier, 9 out of 10 is impossible. Is it correct?

I don't see at all how you take into account that I know 5 are pepsi and 5 are coca cola. In your formulation, it is entirely possible that I say 8 are pepsi.

member 587159 · May 21, 2016

micromass said:

I don't see at all how you take into account that I know 5 are pepsi and 5 are coca cola. In your formulation, it is entirely possible that I say 8 are pepsi.

I supposed that the person who has to judge knows there are 5 pepsi's and 5 cola's. From where in my formulation would you say that is possible that there are 8 pepsis?

PeroK · May 21, 2016

micromass said:

How did you find these formulas?

They all reduce to variations on ##\sum \frac{1}{n(n+1)}##. For example:

##r = 3##

##\sum_{n = k}^{\infty} \frac{F_{n}}{n} = \frac{1}{k} + \frac{1}{k+1} \frac{k-2}{k} + \frac{1}{k+2} \frac{k-2}{k} \frac{k-1}{k+1} + \frac{1}{k+3} \frac{k-2}{k} \frac{k-1}{k+1} \frac{k}{k+2} \dots##

## = \frac{1}{k} + \frac{1}{k+1} \frac{k-2}{k} + (k-1)(k-2)[\frac{1}{k(k+1)(k+2)} + \frac{1}{(k+1)(k+2)(k+3)} \dots]##

## = \frac{1}{k} + \frac{1}{k+1} \frac{k-2}{k} + (k-1)(k-2)[\frac{1}{2k(k+1)}]## (Summed using partial fractions)

## = \frac{1}{2}##
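The limit can also be checked with exact arithmetic. A Python sketch using the recurrence ##F_k = 1, \ F_n = \frac{n-r}{n-1}F_{n-1}##; the value ##k = 10## is an arbitrary choice for illustration.

```python
from fractions import Fraction

def partial_sum_f_over_n(k: int, r: int, upper: int) -> Fraction:
    """Exact value of the sum of F_n / n for n = k .. upper."""
    f = Fraction(1)          # F_k = 1
    total = Fraction(0)
    for n in range(k, upper + 1):
        if n > k:
            f *= Fraction(n - r, n - 1)   # F_n = (n - r)/(n - 1) * F_{n-1}
        total += f / n
    return total

# Partial sums for r = 3 creep up towards 1/(r - 1) = 1/2 as the cutoff grows.
for upper in (100, 1_000, 10_000):
    print(upper, float(partial_sum_f_over_n(10, 3, upper)))
```

For ##r = 3## the same telescoping gives the partial sum up to ##N## as ##\frac{1}{2} - \frac{(k-1)(k-2)}{2N(N-1)}##, so the gap to ##\frac{1}{2}## shrinks like ##1/N^2##.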
