Micromass' big statistics challenge

In summary, we discussed various open-ended questions related to probability theory and statistics. Our goal was to find strategies and models to answer these questions and provide reasoning for their plausibility. We also allowed the use of outside sources, as long as they were properly referenced. Some of the questions we covered include estimating the number of fish in a lake, identifying a real coin toss experiment from a human-invented one, and determining the optimal number of seats on a train based on daily ridership. We also examined a scenario involving a person playing a game with a psychic, and discussed strategies for distinguishing between randomly generated text and real text.
  • #1
micromass
If we're having a thread about probability theory, then we must have one on statistics too! The following questions are all very open-ended and thus multiple answers may seem possible. Your goal is to find a strategy to find the answer to the questions. Furthermore, you must provide some kind of reasoning as to why your strategy is a decent one.

  • For an answer to count, not only the answer must be given but also a detailed strategy, together with an explanation of why the strategy is plausible. Don't forget to detail the model you're working with and why this model is plausible.
  • Any use of outside sources is allowed, but do not look up the question directly. For example, it is ok to go check probability books, but it is not allowed to google the exact question.
  • If you previously encountered a question and remember the solution, then you cannot participate in that particular question.
  • All mathematical methods are allowed.
  • Please reference every source you use.
  • What I feel are the best answers will be awarded on this original post. If you think you came up with a better answer, you must prove your answer is better.
Here you go:

  1. SOLVED BY mfb Every day there is one train between Mordor and Rohan. These are the numbers of people who wanted to take the train on each day:

    Day 1: 233
    Day 2: 231
    Day 3: 254
    Day 4: 212
    Day 5: 202

    Find the optimal number of seats in the train.
  2. Take the following two sequences of coin tosses:

    Code:
    THHHHTTTTHHHHTHHHHHHHHTTTHHTTHHHHHTTTTTTHHTHHTHHHTTTHTTHHHHTHTTHTTTHHTTTTHHHHHHTTTHHTTHHHTHHHHHTTTTTHTTTHHTTHTTHHTTTHHTTTHHTHHTHHTTTTTHHTHHHHHHTHTHTTHTHTTHHHTTHHTHTHHHHHHHHTTHTTHHHTHHTTHTTTTTTHHHTHHH

    Code:
    THTHTTTHTTTTHTHTTTHTTHHHTHHTHTHTHTTTTHHTTHHTTHHHTHHHTTHHHTTTHHHTHHHHTTTHTHTHHHHTHTTTHHHTHHTHTTTHTHHHTHHHHTTHTHHTHHHTTTHTHHHTHHTTTHHHTTTTHHHTHTHHHHTHTTHHTTTTHTHTHTTHTHHTTHTTTHTTTTHHHHTHTHHHTTHHHHHTHHH

    One of these sequences is from an actual coin toss experiment. The other is invented by a human. Find out which of these is which.
  3. SOLVED BY Math_QED I want to estimate the number of fish in a lake. I catch 400 fish and give them all a red dot. I throw them back in the lake. Then I catch 400 fish again. I note that 100 of them have a red dot. How many fish are there in the lake?
  4. SOLVED BY Ygggdrasil, mfb Unstable particles are emitted from a source and decay at a distance ##x##, a real number that has an exponential distribution with characteristic length ##\lambda##. Decay events can be observed only if they occur in a window extending from ##x=1## to ##x=20##. We observe ##6## decays at locations ##\{2,5,12,13,13,16\}##. What is ##\lambda##?
  5. SOLVED BY mrspeedybob, PeroK I have a big box filled with balls. All balls have a number. I draw ##5## balls at random and record their number. They are: ##10##, ##50##, ##104##, ##130##, ##213##. How many balls do you expect to be in the box?
  6. I claim I can tell the difference between coca cola and pepsi cola better than just guessing. Somebody pours 5 cups of pepsi and 5 cups of coca cola and hands them to me. I tell them which cup is which. It turns out I judged correctly 8 of the 10 cups. Do you believe my original claim?
  7. SOLVED BY MarneMath A professor got a ticket twelve times for illegal overnight parking. All twelve tickets were given either Tuesdays or Thursdays. Is it justified for him to rent a garage on these days?
  8. SOLVED BY QuantumQuest In a certain family, four girls take turns at washing dishes. There were four breakages. Three of them were caused by the youngest girl. Is it justified to call her clumsy?
  9. SOLVED BY fresh_42 Given the following encoded text, find out whether this is a real text or randomly generated using some scheme. Attempting to decode the text doesn't count.

    Code:
    sedwhqjkbqjyedi oek xqlu tusetut jxyi junj jxqj mqi dej fqhj ev jxu fherbuc jxekwx ie oek wuj de feydji jxu vebbemydw yi qd unsuhfj vhec myayfutyq fherqrybyjo yi jxu cuqikhu ev jxu byaubyxeet jxqj qd uludj mybb esskh fherqrybyjo yi gkqdjyvyut qi q dkcruh rujmuud puhe qdt edu mxuhu puhe ydtysqjui ycfeiiyrybyjo qdt edu ydtysqjui suhjqydjo jxu xywxuh jxu fherqrybyjo ev qd uludj jxu cehu suhjqyd mu qhu jxqj jxu uludj mybb esskh q iycfbu unqcfbu yi jxu jeiiydw ev q vqyh kdryqiut seyd iydsu jxu seyd yi kdryqiut jxu jme ekjsecui xuqt qdt jqyb qhu ugkqbbo fherqrbu jxu fherqrybyjo ev xuqt ugkqbi jxu fherqrybyjo ev jqyb iydsu de ejxuh ekjsecu yi feiiyrbu jxu fherqrybyjo yi edu xqbv eh vyvjo fuhsudj ev uyjxuh xuqt eh jqyb yd ejxuh mehti jxu fherqrybyjo ev xuqt yi edu ekj ev jme ekjsecui qdt jxu fherqrybyjo ev jqyb yi qbie edu ekj ev jme ekjsecui jxuiu sedsufji xqlu ruud wylud qd qnyecqjys cqjxucqjysqb vehcqbypqjyed yd fherqrybyjo jxueho iuu fherqrybyjo qnyeci mxysx yi kiut mytubo yd iksx qhuqi ev ijkto qi cqjxucqjysi ijqjyijysi vydqdsu wqcrbydw isyudsu yd fqhjyskbqh fxoiysi qhjyvysyqb ydjubbywudsu cqsxydu buqhdydw secfkjuh isyudsu wqcu jxueho qdt fxybeiefxo je veh unqcfbu thqm ydvuhudsui qrekj jxu unfusjut vhugkudso ev uludji fherqrybyjo jxueho yi qbie kiut je tuishyru jxu kdtuhboydw cusxqdysi qdt huwkbqhyjyui ev secfbun ioijuci
  10. SOLVED BY jbriggs444 A person is playing a game operated by a psychic, an entity presented as somehow being exceptionally skilled at predicting people's actions. It is known that the Psychic predicts people's actions correctly in approximately 99.9% of the cases. The player of the game is presented with two boxes, one transparent (labeled A) and the other opaque (labeled B). The player is permitted to take the contents of both boxes, or just the opaque box B. Box A contains a visible $1,000. The contents of box B, however, are determined as follows: At some point before the start of the game, the Psychic makes a prediction as to whether the player of the game will take just box B, or both boxes. If the Psychic predicts that both boxes will be taken, then box B will contain nothing. If the Psychic predicts that only box B will be taken, then box B will contain $1,000,000.

    If the psychic predicts that the player will choose randomly, then box B will contain nothing.

    By the time the game begins, and the player is called upon to choose which boxes to take, the prediction has already been made, and the contents of box B have already been determined. That is, box B contains either $0 or $1,000,000 before the game begins, and once the game begins even the Psychic is powerless to change the contents of the boxes. Before the game begins, the player is aware of all the rules of the game, including the two possible contents of box B, the fact that its contents are based on the Psychic's prediction, and knowledge of the Psychic's infallibility. The only information withheld from the player is what prediction the Psychic made, and thus what the contents of box B are.


Thank you all for participating! I hope many of you have fun with this! Don't hesitate to post any feedback in the thread!

More information:

  1. Invented myself
  2. gato-docs.its.txstate.edu/mathworks/DistributionOfLongestRun.pdf
  3. Feller "An introduction to probability theory and its applications Vol1" Chapter II "Elements of Combinatorial analysis"
  4. MacKay "Information Theory, Inference and Learning algorithms" http://www.inference.phy.cam.ac.uk/itila/p0.html
  5. http://www.math.uah.edu/stat/urn/OrderStatistics.html
  6. https://en.wikipedia.org/wiki/Fisher's_exact_test
  7. Feller "An introduction to probability theory and its applications Vol1" Chapter II "Elements of Combinatorial analysis"
  8. Feller "An introduction to probability theory and its applications Vol1" Chapter II "Elements of Combinatorial analysis"
  9. https://en.wikipedia.org/wiki/Newcomb's_paradox
 
  • #2
2. The ##χ^2##-test on the outcomes gave me no information. Therefore I considered the changes as the random variable. For a fair coin the chances are ##50:50## to change or not. That cost only one observation, but ##198## are still fine.
The first sequence resulted in ##χ^2 = 6.5##, which corresponds to an almost ##99\%## chance of not being random, whereas the second gave me ##0.3##, i.e. no statement about randomness can be made by the ##χ^2##-test. So if I made no cut-and-paste errors, my answer is: the first sequence is hand-made.
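A minimal sketch of the change-based test described here, assuming a ##χ^2## statistic with one degree of freedom on the counts of "change" versus "no change" between consecutive tosses. The function names are illustrative, and the exact counting convention used in the post is not known, so the values need not match ##6.5## and ##0.3## exactly.

```python
import math

def change_chi_square(seq: str) -> float:
    """Chi-square statistic (1 degree of freedom) for the number of changes
    between consecutive tosses, against the fair-coin expectation of 50:50."""
    transitions = len(seq) - 1
    changes = sum(a != b for a, b in zip(seq, seq[1:]))
    same = transitions - changes
    expected = transitions / 2
    return ((changes - expected) ** 2 + (same - expected) ** 2) / expected

def p_value_1dof(chi2: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# Toy input; pass the two sequences from question 2 instead.
print(change_chi_square("HHTHTTHHHT"))
print(p_value_1dof(6.5))
```

For the quoted statistic of ##6.5##, `p_value_1dof` comes out at about ##0.011##, which is where the "almost ##99\%##" comes from.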
 
  • #3
3) If n is the total number of fish,

400/n must be equal to 100/400, therefore n = 1600. This is because you take 400 fish from the population randomly (uniformly distributed) and then you find that 100 fish out of 400 have been marked (again uniformly distributed). If you give me a required confidence level, I can give an interval.
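The proportion argument can be sketched as below. The first function is the estimate from the post; the second is an assumption on my part, not something used in the thread: Chapman's bias-corrected variant with its usual approximate variance, which gives one way to attach a rough interval to the answer.

```python
def lincoln_petersen(marked: int, caught: int, recaptured: int) -> float:
    """Proportion estimate: marked / N = recaptured / caught."""
    return marked * caught / recaptured

def chapman(marked: int, caught: int, recaptured: int) -> tuple[float, float]:
    """Chapman's bias-corrected estimate and its approximate standard error."""
    m, c, r = marked, caught, recaptured
    estimate = (m + 1) * (c + 1) / (r + 1) - 1
    variance = (m + 1) * (c + 1) * (m - r) * (c - r) / ((r + 1) ** 2 * (r + 2))
    return estimate, variance ** 0.5

n_hat = lincoln_petersen(400, 400, 100)
est, se = chapman(400, 400, 100)
print(n_hat)                              # 1600.0
print(est - 1.96 * se, est + 1.96 * se)   # rough 95% interval (normal approximation)
```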
 
  • #4
9. Just to make sure: You didn't give us a part of the Voynich manuscript, did you?

Edit:
I have taken the source text, which is 1342 bytes long, and compressed it to 718 bytes.
Then I randomly chose texts of equal length (1342 bytes) from:
  1. New York Times → compressed 816 bytes
  2. Washington Post → compressed 817 bytes
  3. San Francisco Chronicle → compressed 820 bytes
  4. a Danish tourist site → compressed 810 bytes
  5. Helsingin Sanomat (Finnish newspaper) → compressed 804 bytes
  6. El País (Madrid) → compressed 882 bytes
The Spanish text had the longest sentences, too. Although I changed languages and content, the compressed files were all around 820 bytes long, except one which was longer. But none came even near the 718 bytes, so I didn't make any statistical test. My result: The text is likely artificial.
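A rough sketch of this kind of comparison, assuming `zlib` as the compressor (the post does not say which tool was used) and a stand-in sample text. The helper names are hypothetical. The baseline here is uniformly random letters rather than newspaper excerpts, which shows the other end of the scale: text with no structure barely compresses at all.

```python
import random
import string
import zlib

def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of `text`."""
    return len(zlib.compress(text.encode("utf-8"), 9))

def random_letters(length: int, seed: int = 0) -> str:
    """Baseline: lowercase letters and spaces drawn uniformly at random."""
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase + " "
    return "".join(rng.choice(alphabet) for _ in range(length))

sample = "the quick brown fox jumps over the lazy dog " * 30  # stand-in text
print(len(sample), compressed_size(sample))
print(compressed_size(random_letters(len(sample))))
```

To repeat the experiment, replace `sample` with the encoded text from question 9 and with excerpts of the same length from other sources.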
 
  • #5
fresh_42 said:
2. The ##χ^2##-test on the outcomes gave me no information. Therefore I considered the changes as the random variable. For a fair coin the chances are ##50:50## to change or not. That cost only one observation, but ##198## are still fine.
The first sequence resulted in ##χ^2 = 6.5##, which corresponds to an almost ##99\%## chance of not being random, whereas the second gave me ##0.3##, i.e. no statement about randomness can be made by the ##χ^2##-test. So if I made no cut-and-paste errors, my answer is: the first sequence is hand-made.

That's interesting. I'll wait for some further analyses possibly by you or by others.
 
  • #6
Math_QED said:
3) If n is the total number of fish,

400/n must be equal to 100/400, therefore n = 1600. This is because you take 400 fish from the population randomly (uniformly distributed) and then you find that 100 fish out of 400 have been marked (again uniformly distributed). If you give me a required confidence level, I can give an interval.

Very good. I will mark this as the correct answer.
 
  • #7
fresh_42 said:
9. Just to make sure: You didn't give us a part of the Voynich manuscript, did you?

Edit:
I have taken the source text, which is 1342 bytes long, and compressed it to 718 bytes.
Then I randomly chose texts of equal length (1342 bytes) from:
  1. New York Times → compressed 816 bytes
  2. Washington Post → compressed 817 bytes
  3. San Francisco Chronicle → compressed 820 bytes
  4. a Danish tourist site → compressed 810 bytes
  5. Helsingin Sanomat (Finnish newspaper) → compressed 804 bytes
  6. El País (Madrid) → compressed 882 bytes
The Spanish text had the longest sentences, too. Although I changed languages and content, the compressed files were all around 820 bytes long, except one which was longer. But none came even near the 718 bytes, so I didn't make any statistical test. My result: The text is likely artificial.

That's an interesting analysis. I'll be waiting for more input before I spill the beans on this one.
 
  • #8
fresh_42 said:
9. Just to make sure: You didn't give us a part of the Voynich manuscript, did you?

Don't worry, there is a definite answer.
 
  • #9
micromass said:
That's interesting. I'll wait for some further analyses possibly by you or by others.
I've reviewed it. At a significance level of 2%, ##χ^2 = 6.5 > χ_{(98\%;1)}^2##, and the null hypothesis (randomness with a fair coin) should be rejected. I have never trusted this test anyway ... or your coin.
 
  • #10
6) Define the random variable X, "correct guesses".
X ~ B(10,1/2)

We want to know how large the probability is that someone gives 8 correct guesses out of 10 when he has a 1/2 chance of guessing correctly. We can do this using p-values:

Our null hypothesis H0: p = 1/2
The alternative hypothesis Ha (this is the claim): p > 1/2

P-value = 1 - Binomcdf(10,1/2,7) = 0.0547.

The chance of getting 8 out of 10 correct guesses with a 1/2 chance is 0.0547 or 5.47%. This is a rather small percentage. Therefore, we conclude that H0 is presumably wrong and that Ha is better. So I would say that we believe his claim, since a 5.47% chance is not that large.
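The calculator expression `1 - Binomcdf(10,1/2,7)` is the upper tail ##P(X \ge 8)##. A small sketch of the same number in Python, with an illustrative function name:

```python
from math import comb

def binomial_upper_tail(n: int, k: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

print(binomial_upper_tail(10, 8))  # 56/1024 = 0.0546875
```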
 
  • #11
For number 2:

I would say that the first sequence is human generated. Two reasons: First, the probability of getting another H after an H and a T after a T seems too high (for heads it's 67/107 and tails 51/92 by my counting). Second, there appear to be too many long sequences of Heads and too few Head singletons.

My guess is that this is someone trying to avoid too much HTHT and over-compensating. The giveaway, supposedly, for human generated sequences is too few long sequences of H and T. This looks like someone who knew this and over-compensated.

The second sequence appears to have a more normal correlation of H following H and T following T. The anomalies look more believable.

I know too little statistics to analyse it any deeper.
 
  • #12
Math_QED said:
6) Define the random variable X, "correct guesses".
X ~ B(10,1/2)

We want to know how large the probability is that someone gives 8 correct guesses out of 10 when he has a 1/2 chance of guessing correctly. We can do this using p-values:

Our null hypothesis H0: p = 1/2
The alternative hypothesis Ha (this is the claim): p > 1/2

P-value = 1 - Binomcdf(10,1/2,7) = 0.0547.

The chance of getting 8 out of 10 correct guesses with a 1/2 chance is 0.0547 or 5.47%. This is a rather small percentage. Therefore, we conclude that H0 is presumably wrong and that Ha is better. So I would say that we believe his claim, since a 5.47% chance is not that large.

Two remarks:
1) Does it change anything if I knew beforehand that 5 cups would be pepsi and 5 would be coca cola? I believe this changes something, for example guessing correctly 9 out of 10 is then impossible.
2) You took the probability of guessing correctly 8 out of 10. But for the p-value, shouldn't you take the probability of guessing correctly 8 out of 10, and 9 out of 10 and 10 out of 10?
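One way to take the first remark into account, sketched under the assumption that the taster knows the 5/5 split and therefore labels exactly five cups as Pepsi. Under pure guessing, the number of correctly labelled Pepsi cups is then hypergeometric, as in Fisher's exact test from the reference list, and 8 correct cups out of 10 corresponds to 4 of the 5 Pepsi cups. The function name is illustrative.

```python
from math import comb

def p_at_least(correct_pepsi: int, pepsi: int = 5, coke: int = 5) -> float:
    """P(X >= correct_pepsi) when `pepsi` cups are labelled Pepsi at random
    and X counts the labelled cups that really are Pepsi (hypergeometric)."""
    total = comb(pepsi + coke, pepsi)
    return sum(
        comb(pepsi, x) * comb(coke, pepsi - x)
        for x in range(correct_pepsi, pepsi + 1)
    ) / total

# 8 of 10 cups correct means 4 of the 5 Pepsi cups were identified.
print(p_at_least(4))  # 26/252, about 0.103
```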
 
  • #13
PeroK said:
For number 2:

I would say that the first sequence is human generated. Two reasons: First, the probability of getting another H after an H and a T after a T seems too high (for heads it's 67/107 and tails 51/92 by my counting). Second, there appear to be too many long sequences of Heads and too few Head singletons.

My guess is that this is someone trying to avoid too much HTHT and over-compensating. The giveaway, supposedly, for human generated sequences is too few long sequences of H and T. This looks like someone who knew this and over-compensated.

The second sequence appears to have a more normal correlation of H following H and T following T. The anomalies look more believable.

I know too little statistics to analyse it any deeper.

You gave such a neat solution to the "runs" question on the probability thread. Can you figure out the number of runs of order ##n## for each ##n## in a totally random sequence and compare it with what we have?
 
  • #14
micromass said:
Two remarks:
1) Does it change anything if I knew beforehand that 5 cups would be pepsi and 5 would be coca cola? I believe this changes something, for example guessing correctly 9 out of 10 is then impossible.
2) You took the probability of guessing correctly 8 out of 10. But for the p-value, shouldn't you take the probability of guessing correctly 8 out of 10, and 9 out of 10 and 10 out of 10?

I did calculate for 8 out of 10, 9 out of 10, and 10 out of 10. BinomCDF, in case you don't know it, is a command on the graphing calculator, so this is in fact what I calculated. I need to think about the first remark though, since I am out of time now. I will look at it soon. Intuitively, I would say that I have to calculate the chances for 8 out of 10 and 10 out of 10 and then add them together. This might give the correct probability. A quick calculation then gives 0.0439, and this means there is an even better chance that his claim is true. But I might be missing something.
 
  • #15
Math_QED said:
I did calculate for 8 out of 10, 9 out of 10, and 10 out of 10. BinomCDF, in case you don't know it, is a command on the graphing calculator, so this is in fact what I calculated.

Oh ok. Good to know this!
 
  • #16
I edited my last post.
 
  • #17
For number two (coin flips): Pick a random family of correlation functions, e.g. ##\langle x_n, x_{n+2}, x_{n+10}\rangle##; by 'family' I mean that n runs through all coin flips, with periodic boundary conditions assumed.

The 'family' with the highest Shannon entropy will be the real one.
 
  • #18
#5
I would expect the ball numbers (on average) to be evenly spaced through the range of possible ball numbers. The highest ball in this particular question is 213, so the average spacing between the numbers is 213/5. The total number of possible balls should include 1 average space above the highest ball, so that would make the most likely number of balls (6*213)/5, or 255.6.
Since I can't have .6 balls, I believe it's appropriate to round to the nearest integer (no pun intended), so my answer is 256.
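The spacing argument can be written out as below, assuming the balls are numbered consecutively from 1 to N. The second function is not from the post: it is the textbook unbiased estimator for that setup, which differs from the spacing estimate by exactly 1. Whether that is the "something essential" mentioned later in the thread is not stated there.

```python
def spacing_estimate(sample: list[int]) -> float:
    """Highest number plus one average gap, as in the post: m * (k + 1) / k."""
    m, k = max(sample), len(sample)
    return m * (k + 1) / k

def unbiased_estimate(sample: list[int]) -> float:
    """Textbook unbiased estimator for labels 1..N drawn without replacement:
    m + m / k - 1."""
    m, k = max(sample), len(sample)
    return m + m / k - 1

draws = [10, 50, 104, 130, 213]
print(spacing_estimate(draws))   # about 255.6
print(unbiased_estimate(draws))  # about 254.6
```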
 
  • #19
micromass said:
You gave such a neat solution to the "runs" question on the probability thread. Can you figure out the number of runs of order ##n## for each ##n## in a totally random sequence and compare it with what we have?

What I get for the first sequence for runs of H is:

##0 \ - \ 51; \ \ 1 \ - \ 12^*; \ \ 2 \ - \ 15; \ \ 3 \ - \ 6; \ \ 4 \ - \ 3; \ \ 5 \ - \ 1; \ \ 6 \ - \ 2; \ \ 8 \ - \ 2^*##

And for the second sequence;

##0 \ - \ 43; \ \ 1 \ - \ 25; \ \ 2 \ - \ 8^*; \ \ 3 \ - \ 13^*; \ \ 4 \ - \ 5; \ \ 5 \ - \ 1##

And, you would expect (on average) using 96 games as a nice number:

##0 \ - \ 48; \ \ 1 \ - \ 24; \ \ 2 \ - \ 12; \ \ 3 \ - \ 6; \ \ 4 \ - \ 3; \ \ 5 \ - \ 1.5; \ \ 6 \ - \ 0.75; \ \ 8 \ - \ 0.19##

The ones marked * are the main anomalies. The second pair of anomalies looks more plausible.

The probability of getting 12 or fewer H singletons is about ##0.002## and getting 2 or more runs of 8 or more about ##0.02##.

Whereas, the probability of getting precisely 8 and 13 for runs of 2 and 3 is ##0.06## and ##0.003## respectively. This is more likely to be the random sequence.
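A sketch of how such run counts can be tallied and compared with the fair-coin expectation. It assumes run lengths are geometric, ##P(\text{length} = L) = (1/2)^L##, and ignores edge effects at the ends of the sequence. The function names are illustrative, and the toy string stands in for the sequences from question 2.

```python
from collections import Counter
from itertools import groupby

def run_length_counts(seq: str, symbol: str = "H") -> Counter:
    """Count maximal runs of `symbol` in `seq`, keyed by run length."""
    return Counter(
        len(list(group)) for key, group in groupby(seq) if key == symbol
    )

def expected_runs(total_runs: float, length: int) -> float:
    """Expected number of runs of exactly `length` among `total_runs` runs
    of a fair coin."""
    return total_runs * 0.5 ** length

print(run_length_counts("HHTHTTTHHH"))
print([expected_runs(48, L) for L in range(1, 6)])  # [24.0, 12.0, 6.0, 3.0, 1.5]
```

With 48 runs in total, the expected counts reproduce the 24, 12, 6, 3, 1.5 row quoted in the post.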
 
  • #20
10) Does not ask a question. The obvious question is what one should choose and why. Or, better, what strategy one should adopt and why.

The key difficulty is the problem of self-reference. The situation we are faced with depends on the decision process we (will) choose to deal with it. A decision process which is guaranteed to yield an optimal result for a fixed situation is not guaranteed to yield an optimal result when the situation depends on the choice of decision process.

The problem statement attempts to disguise the self-reference by combining the use of a psychic (whose behavior depends on future choices) with the notion of a fixed and immutable past (which cannot depend on the future choices).

In order to remove the problem of self-reference, re-cast the problem as a two player non-zero sum game where we play against our twin. Each player has two choices, "cooperate" or "defect". The payoff matrix is:

Code:
row  Us     Twin     Result for us
  1 coop    coop     1,000,000   
  2 coop    defect           0
  3 defect  coop     1,001,000
  4 defect  defect        1000

The problem has been constructed to make mixed (aka random) strategies untenable. We can discard any such approaches out of hand.

If the psychic predictor were 100% accurate, our twin would make (or would have made) the same choice that we do. The relevant question is "what strategy, if adopted by both players, will result in the highest average payout for us?" The relevant rows are 1 and 4, and a strategy of "cooperate" is optimal.

Since the psychic predictor is only 99.9% accurate, the effect is that 99.9% of the time we will be playing our twin, but 0.1% of the time we will instead be playing our evil step brother who always does the opposite. This time a choice of cooperate is 99.9% likely to hit row 1 and 0.1% likely to hit row 2, for an average payout of $999,000. A strategy of defect is 0.1% likely to hit row 3 and 99.9% likely to hit row 4, for an average payout of $2,000. A strategy of "cooperate" is still optimal.
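The two averages can be checked with exact arithmetic. This is a small sketch using the payoff table above; the dictionary layout and function name are illustrative.

```python
from fractions import Fraction

ACCURACY = Fraction(999, 1000)  # the psychic is right 99.9% of the time

# Payoff to us, indexed by (our choice, psychic's prediction).
PAYOFF = {
    ("coop", "coop"): 1_000_000,
    ("coop", "defect"): 0,
    ("defect", "coop"): 1_001_000,
    ("defect", "defect"): 1_000,
}

def expected_payout(choice: str) -> Fraction:
    """Average payout when the prediction matches `choice` with ACCURACY."""
    other = "defect" if choice == "coop" else "coop"
    return (ACCURACY * PAYOFF[(choice, choice)]
            + (1 - ACCURACY) * PAYOFF[(choice, other)])

print(expected_payout("coop"))    # 999000
print(expected_payout("defect"))  # 2000
```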
 
  • #21
Number 5:

I'll take a wild guess at ##282.5##. If that's close I can post later how I arrived at it!
 
  • #22
mrspeedybob said:
#5
I would expect the ball numbers (on average) to be evenly spaced through the range of possible ball numbers. The highest ball in this particular question is 213, so the average spacing between the numbers is 213/5. The total number of possible balls should include 1 average space above the highest ball, so that would make the most likely number of balls (6*213)/5, or 255.6.
Since I can't have .6 balls, I believe it's appropriate to round to the nearest integer (no pun intended), so my answer is 256.

PeroK said:
Number 5:

I'll take a wild guess at ##282.5##. If that's close I can post later how I arrived at it!

It's close, but mrspeedybob seems to be a lot closer. I believe he missed something essential, though. I'm still very interested in seeing your method, so I would be very grateful if you would post it.
 
  • #23
Number 5:

First, my interpretation of the problem. You pick a number ##n## at random with equal likelihood from ##1## to some large number ##N##. You put that many numbered balls in the bag and draw out ##5##. The event ##X##, as described, with a highest of ##213## is observed. What is the expected value of ##n##, given event ##X##?

Note that it doesn't matter whether the event ##X## represents the order the balls came out or rearranged in order of magnitude: the calculations all differ only by a factor of ##5## throughout, which cancels out. Assume the former. Also assume that event ##X## is ball ##213## plus any four others smaller. Assuming the precise numbers in the problem will also cancel out and give the same answer.

##n## must be at least ##213##. Let ##p(X_{n})## be the probability of event ##X## given ##n##.

##p(X_{213}) = \frac{1}{213}##

##p(X_{214}) = \frac{1}{214} \times \frac{212}{213} \times \frac{211}{212} \times \frac{210}{211} \times \frac{209}{210}= \frac{1}{214} \times \frac{209}{213}##

In general:

##p(X_{n}) = \frac{1}{n} \times F_{n} = \frac{1}{n} \times \frac{n-5}{n-1} \times F_{n-1}## where ##F_{213} = 1, F_{214} = \frac{209}{213} \dots##

And:

##p(X) = \frac{1}{N - 212} \sum_{n = 213}^{N} p(X_{n})##

Now, we have:

##p(n \mid X) = \frac{p(X_n) p(n)}{p(X)} = \frac{p(X_n)}{(N-212) p(X)}##

And

##E(n) = \sum_{n = 213}^{N} n \frac{p(X_n)}{(N-212)p(X)} = \frac{1}{(N-212) p(X)} \sum_{n = 213}^{N} n {p(X_n)} = \frac{1}{(N-212)p(X)} (\sum_{n = 213}^{N} F_{n})##

##E(n) = \frac{\sum_{n = 213}^{N} F_{n}}{\sum_{n = 213}^{N} \frac{F_{n}}{n}}##

At this point I just put ##N = 3000## into a calculation and got:

##(N-212)p(X) = 0.25## and ##E(n) = 282.6##
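The numerical step can be sketched as follows, using the recursion ##F_k = 1, \ F_n = \frac{n-r}{n-1} F_{n-1}## with ##k = 213##, ##r = 5## and the cutoff ##N = 3000## from the post. The function name and return convention are mine, not the poster's.

```python
def posterior_mean(k: int, r: int, n_max: int) -> tuple[float, float]:
    """Return (sum of F_n / n, E(n)) for a uniform prior on n in k..n_max,
    using F_k = 1 and F_n = (n - r) / (n - 1) * F_{n-1}."""
    f = 1.0
    sum_f = 0.0
    sum_f_over_n = 0.0
    for n in range(k, n_max + 1):
        if n > k:
            f *= (n - r) / (n - 1)
        sum_f += f
        sum_f_over_n += f / n
    return sum_f_over_n, sum_f / sum_f_over_n

norm, mean = posterior_mean(k=213, r=5, n_max=3000)
print(round(norm, 2), round(mean, 1))  # roughly 0.25 and 282.6
```

Here `norm` is ##(N-212)p(X)##, the sum of the ##p(X_n)##. Raising `n_max` shows how sensitive the result is to the arbitrary cutoff.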
 
  • #24
Very impressive approach PeroK. You essentially used a Bayesian approach to this question. I like your solution a lot. I will wait a bit for other inputs, but you'll definitely get some of the credit for this problem.
 
  • #25
micromass said:
Very impressive approach PeroK. You essentially used a Bayesian approach to this question. I like your solution a lot. I will wait a bit for other inputs, but you'll definitely get some of the credit for this problem.

It appears that:

##\sum_{n = k}^{\infty} \frac{F_{n}}{n} = \frac{1}{r-1}## where ##k## is the highest number picked and ##r## is the number of balls picked.

So:

##E(n) = (r-1)\sum_{n = k}^{\infty} F_{n}##

Where ##F_k = 1, \ F_n = \frac{n-r}{n-1}F_{n-1}##

Not that far from a formula now!
 
  • #26
It also appears that:

##\sum_{n = k}^{\infty} F_{n} = \frac{k-1}{r-2}##

So:

##E(n) = \frac{(k-1)(r-1)}{r-2}##

E.g. ##k = 213, r = 5, E(n) \approx 282.7##
 
  • #27
How did you find these formulas?
 
  • #28
micromass said:
How did you find these formulas?

Just computationally. Looking at them analytically now.
 
  • #29
thephystudent said:
For number two (coin flips): Pick a random family of correlation functions, e.g. ##\langle x_n, x_{n+2}, x_{n+10}\rangle##; by 'family' I mean that n runs through all coin flips, with periodic boundary conditions assumed.

The 'family' with the highest Shannon entropy will be the real one.

I tried this for a few families: for example, I made a list of pairs {n, n+30} for n from 0 to L (the length) and calculated the entropy of the list of pairs. But the difference in entropy always appears only in the third significant digit, and which chain has the higher entropy is not the same for all chosen 'families' of tuples. It may be that averaging over all possible families gives a definite result. I don't have the time to check that for now; if someone else wants to try this, feel free to do so. But even if that works, it probably wouldn't be the most elegant way to solve this problem.

EDIT: on a more careful look, I also think the first one is hand-made: it seems to have slightly lower entropies, especially between nearby levels, which is consistent with a human having a 'short-term memory'.
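A sketch of the pair-entropy calculation for one such family, assuming pairs ##(x_n, x_{n+d})## with periodic boundary conditions. The function name and the toy inputs are illustrative; the two sequences from question 2 would be passed in their place, for several lags ##d##.

```python
from collections import Counter
from math import log2

def pair_entropy(seq: str, lag: int) -> float:
    """Shannon entropy (bits) of the pairs (x_n, x_{n+lag}), with periodic
    boundary conditions so that every position contributes one pair."""
    length = len(seq)
    pairs = Counter((seq[n], seq[(n + lag) % length]) for n in range(length))
    return -sum(c / length * log2(c / length) for c in pairs.values())

print(pair_entropy("HT" * 50, lag=1))    # 1.0: only HT and TH occur
print(pair_entropy("HHTT" * 25, lag=1))  # 2.0: all four pairs equally often
```

The maximum for pairs is 2 bits, so a sequence with strong short-range correlations shows up as a value noticeably below 2 at small lags.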
 
  • #30
6)

Define the random variable X, "correct guesses". X can take any value except 9 (if you have given 9 correct values, then the 10th is automatically correct too, since you know there are 5 Pepsis and 5 regular colas). Thus, in fact, 9 guesses are enough to know what we want to know.

We now see: X ~ B(9,1/2)

Null hypothesis: H0: p = 1/2
Alternative hypothesis (claim): Ha: p > 1/2
x observed: 8

Using p-values we get:

P-value = P(X>=8) = C(9,8)(1/2)^8*(1/2) + C(9,9)(1/2)^9 = 0.0195..

This means that the probability of someone guessing 8 out of 10 correctly (under H0) is 0.0195. This is a rather small probability. Therefore, I say that his claim is true.
 
  • #31
Math_QED said:
We now see: X~B(9,1/2)

How do you know this?
 
  • #32
micromass said:
How do you know this?

We only need 9 guesses to know the result. As you said earlier, 9 out of 10 is impossible. Is it correct?
 
  • #33
Math_QED said:
We only need 9 guesses to know the result. As you said earlier, 9 out of 10 is impossible. Is it correct?

I don't see at all how you take into account that I know 5 are pepsi and 5 are coca cola. In your formulation, it is entirely possible that I say 8 are pepsi.
 
  • #34
micromass said:
I don't see at all how you take into account that I know 5 are pepsi and 5 are coca cola. In your formulation, it is entirely possible that I say 8 are pepsi.

I supposed that the person who has to judge knows there are 5 Pepsis and 5 colas. Where in my formulation would you say it is possible that there are 8 Pepsis?
 
  • #35
micromass said:
How did you find these formulas?

They all reduce to variations on ##\sum \frac{1}{n(n+1)}##. For example:

##r = 3##

##\sum_{n = k}^{\infty} \frac{F_{n}}{n} = \frac{1}{k} + \frac{1}{k+1} \frac{k-2}{k} + \frac{1}{k+2} \frac{k-2}{k} \frac{k-1}{k+1} + \frac{1}{k+3} \frac{k-2}{k} \frac{k-1}{k+1} \frac{k}{k+2} \dots##

## = \frac{1}{k} + \frac{1}{k+1} \frac{k-2}{k} + (k-1)(k-2)[\frac{1}{k(k+1)(k+2)} + \frac{1}{(k+1)(k+2)(k+3)} \dots]##

## = \frac{1}{k} + \frac{1}{k+1} \frac{k-2}{k} + (k-1)(k-2)[\frac{1}{2k(k+1)}]## (Summed using partial fractions)

## = \frac{1}{2}##
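
The same telescoping seems to work for any ##r \ge 2##. The following is a sketch under the recursion ##F_k = 1, \ F_n = \frac{n-r}{n-1}F_{n-1}## stated earlier, not a derivation given in the thread. Unrolling the recursion gives

##F_n = \frac{(k-1)!}{(k-r)!} \frac{(n-r)!}{(n-1)!}##

and, since

##\frac{(n-r)!}{n!} = \frac{1}{r-1} \left[ \frac{(n-r)!}{(n-1)!} - \frac{(n+1-r)!}{n!} \right]##

the sum telescopes:

##\sum_{n = k}^{\infty} \frac{F_{n}}{n} = \frac{(k-1)!}{(k-r)!} \cdot \frac{1}{r-1} \cdot \frac{(k-r)!}{(k-1)!} = \frac{1}{r-1}##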
 