# Odds for flipping 100 coins

1. Jan 13, 2007

### eehiram

Suppose we consider flipping 100 quarters. The odds of all heads are 1 in 2^100, which is about 1 in 10^30.
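For a sense of scale, here is a minimal Python sketch of that arithmetic, assuming fair and independent coins. The variable names are just for illustration.

```python
from fractions import Fraction
import math

n_coins = 100
outcomes = 2 ** n_coins               # number of equally likely head/tail sequences
p_all_heads = Fraction(1, outcomes)   # exact probability of 100 heads

print(outcomes)                       # exact 31-digit integer
print(math.log10(outcomes))           # a little over 30, so 2^100 is roughly 10^30
print(float(p_all_heads))             # a little under 1e-30
```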

How can we explain it, then, if this happens in real life?

Secondly, if we flip 99 coins and get all heads, what will be the odds of getting one more head with one more coin toss? I know the odds are supposed to be 50%-50%, but it's a very unlikely streak of luck to occur, so perhaps...somehow...there's an explanation for why it would be unlikely to get one more head on the next toss.

2. Jan 13, 2007

### verty

If you are flipping the coin and do it from the same height, imparting the same velocity and angle, etc, then it will always have the same result. It depends on how random your coin-flipping is. If you get 99 heads, chances are your technique is biased in favour of heads.

3. Jan 13, 2007

### d_leet

There is a non-zero probability of it happening, so it is definitely a possibility that this happens, it just isn't very likely.

I think the explanation for this is that each individual coin toss is independent of all the others, so it doesn't matter how many heads you have flipped previously; there is still only a 50% chance of getting heads when you flip the coin this time.
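The independence argument can be checked exactly. The sketch below assumes a fair coin with independent tosses (the helper `p_all_heads` is a made-up name) and computes the conditional probability of a 100th head given 99 heads so far.

```python
from fractions import Fraction

def p_all_heads(n, p=Fraction(1, 2)):
    """Probability that n independent tosses all come up heads."""
    return p ** n

streak = 99
# P(100 heads | first 99 are heads) = P(100 heads) / P(99 heads)
p_next = p_all_heads(streak + 1) / p_all_heads(streak)
print(p_next)
```

The streak itself is very unlikely, but once it has happened its probability cancels out of the ratio, which is why the next toss is still even odds under this model.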

4. Jan 13, 2007

### matt grime

The chance of getting *any* particular sequence of heads and tails is 1/2^100. Why aren't you sceptical when any of those happen? Because you're thinking about what you want to be intuitively true, perhaps.

Feynman supposedly tried to make his students think of probability properly when he taught it by walking into the lecture hall and saying: I saw license plate XYZ123 on the way into work today. What are the chances of me seeing that particular combination of letters this morning!

You should also try to distinguish between theoretical models and real life. If you do get a real life situation with 1000 heads in a row, then you might want to consider rejecting the theoretical model of a fair coin toss for the situation. There are plenty of tests you can do on a hypothesis v. data to see if the hypothesis fits the known data.

5. Jan 13, 2007

### eehiram

However, if we ignore the order of the "sequence" by tossing them all at once and not earmarking any of them, then there are other outcomes that are more likely because there's more than one chance of them occurring. I am referring to the 50% chance of getting one head and one tail when flipping two coins, to compare to something.
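The two-coin case is small enough to enumerate. This is only a sketch, assuming fair coins; it lists every ordered sequence and then groups the sequences by number of heads.

```python
from itertools import product
from collections import Counter

# All ordered outcomes for two coins: HH, HT, TH, TT
sequences = list(product("HT", repeat=2))

# Ignore the order: count how many sequences give each number of heads
head_counts = Counter(seq.count("H") for seq in sequences)

for heads, ways in sorted(head_counts.items()):
    print(heads, "head(s):", ways, "of", len(sequences), "sequences")
```

Each ordered sequence is equally likely, but "one head and one tail" covers two of the four sequences, which is where the 50% comes from.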

On the other hand, you are right that the odds of rolling a one on a die is 1/6, but so are the odds for every other number. They are all equal in their likelihood. So the same goes for 100 coins?

I simply wanted to bring up some sort of issue with an unlikely event. If I'm not mistaken, Brian Greene wrote something about 100 coin tosses in The Fabric of the Cosmos (I read it at Barnes & Noble so I don't remember and can't check it) when discussing probability and entropy. I am sure he wrote about reordering a very long novel in a random order to illustrate an entropy principle: that entropy increases because disorganized states are more likely than organized states, as more disorganized states potentially exist as possible outcomes than organized states.

6. Jan 13, 2007

### EnumaElish

"100 heads" may be seen as an extreme event, because its observance maximizes the likelihood of a biased "coin" in a Bayesian sense. If two competing priors are "random coin" and "deterministic coin" (e.g. one with two heads), and if all outcomes are heads, then the probability of the deterministic prior conditional on the observed outcomes is (I believe) maximized (trivially in the case of double heads). ("Coin" is a placeholder term for "physical binomial random device.")
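A rough sketch of that two-hypothesis comparison, in Python. The prior weight of one in a million on the two-headed coin is an arbitrary assumption chosen for illustration, and `posterior_two_headed` is a made-up helper name.

```python
from fractions import Fraction

def posterior_two_headed(prior, heads, tails=0):
    """Posterior probability of a two-headed coin versus a fair coin (Bayes' rule)."""
    like_two_headed = Fraction(1) if tails == 0 else Fraction(0)
    like_fair = Fraction(1, 2) ** (heads + tails)
    numerator = prior * like_two_headed
    return numerator / (numerator + (1 - prior) * like_fair)

prior = Fraction(1, 10 ** 6)   # assumed prior weight on the two-headed coin

for n in (0, 10, 20, 100):
    print(n, float(posterior_two_headed(prior, n)))
```

With no tails observed, every extra head doubles the odds in favour of the deterministic coin, and a single tail sends its posterior to zero.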

The max. entropy principle is apparent from the permutations in which 100 tosses of a random coin may appear: P(100 heads) = 1/2^100 which is very near zero, but P(50 heads) is a much larger number [= C(100,50)/2^100 = 0.08 approx.] because there are many combinations of 50 heads. One may think of this as a "degrees of freedom" problem. Nature has one degree of freedom when producing 100 heads (a highly organized state); but it has many more [C(100,50) = 10^29] degrees of freedom when producing 50 heads (a most disorganized state).
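The counts quoted here are easy to reproduce. A short sketch using Python's `math.comb`, assuming a fair coin:

```python
from math import comb

n = 100
total = 2 ** n            # number of equally likely ordered sequences
ways_50 = comb(n, 50)     # sequences with exactly 50 heads

p_50 = ways_50 / total    # P(exactly 50 heads), about 0.08
p_100 = 1 / total         # P(100 heads), only one sequence qualifies

print(f"{ways_50:.3e}")   # on the order of 10^29
print(f"{p_50:.4f}")
print(f"{p_100:.3e}")
```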

All this assumes that independence and randomness are "natural." 100 heads is not unlikely at all if the prior is "nature is nonrandom" (or the trials are dependent -- e.g. correlated). However, a nonrandom nature seems at odds (pardon the pun) with what we observe experimentally. OTOH, to be fair to the alternative hypothesis (of nonrandomness), a thorny problem for the 2nd Law of Thermodynamics is the innocent question "if disorganized states are so much more likely to be observed, why did the universe not begin in such a state?" More or less all of the answers (that I have seen) reduce to the anthropic principle. (Unless this is interpreted as a "trick question" and so not answered.)

Last edited: Jan 13, 2007

7. Jan 13, 2007

### eehiram

Thank you for an interesting reply. I was indeed referring to coins with a head and a tail, not two-headed coins. Hence the odds of the 100 heads outcome are much smaller than the 8% likelihood of 50 heads and 50 tails -- so much so that one wonders how such an outcome could actually occur in ordinary experience.

On the other hand, every coin of the 100 has a side with a head, so anything's possible, right? I suppose one might suspect cheating with such an outcome, but I don't know if we want to digress into that subsection of the topic of coin tossing.

8. Jan 23, 2007

### ssd

If it happens in real life then we must say that P(head in a single toss) is not 0.5 for the coin(s), i.e., the coin(s) is biased (of course we assume other things are fair, no cheating, etc.). The proof can be derived as a corollary of Bernoulli's theorem. This case is often referred to as the "moral impossibility clause of probability": events with very, very small probabilities will not happen in a single trial (here, a trial is 1 flip of 100 coins or 100 flips of 1 coin). It is like the probability of putting an airmail letter into the usual household letter box (which has an opening just larger than the letter) when thrown from 20 ft away.

Generally it is considered that one toss is independent of other tosses. Therefore getting a head at the m-th toss has the same prob. as getting a head at the 1st toss.....irrespective of what happened earlier.

9. Jan 23, 2007

### matt grime

Getting 100 heads in a row does not mean we *must* reject the hypothesis of an unbiased coin. And you cannot *prove* that you *must*.

If you wish to do a hypothesis test, then do so. But don't make nonsensical alternative hypotheses like 'H_0: the coin is fair. H_1 I've got more chance of throwing this letter through a small hole.'

Last edited: Jan 23, 2007

10. Jan 23, 2007

### ssd

Firstly, I gave no such alternative hypothesis as you mentioned. I gave the example so that one may visualize the practical impossibility of the occurrence of events with very, very small probabilities.

Well, the word 'must' may or may not be used, depending on the sense in which it is used. One may test the null hypothesis H: p=0.5 against K: p<>0.5 (p = probability of getting a head in a single toss) with a critical region of size 0.1 or 0.05 or 0.5 or 1-1/2^100 or 1/2^100 as he pleases, but the last three certainly do not make sense to a statistician. The choice of the size of the critical region is subjective. Different sizes may give different conclusions. While performing a test of hypothesis one 'must' accept the risk of two kinds of error. So, can we say that the method of hypothesis testing is an erroneous one? In general we cannot even minimize the two errors simultaneously. But whatever the choice of critical region, once it is decided, we infer on the basis of it using the sample at our disposal. What we infer is something like: H is true (accepted) against K, or H is false (not accepted) against K, at the given level. I find little difference between the two statements "H is true" and "H must be true" while making statistical inference.
In the said example of coin flipping, the hypothesis H: p=0.5 against K: p<>0.5 will be rejected for any sensible choice of the critical region using the given sample of 99 heads out of 100 tosses. That is what I meant by saying that the coin 'must' be biased.
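As a sketch of the test described above, the exact two-sided p-value for H: p=0.5 can be computed directly from binomial counts. The function name `two_sided_p_value` is made up for this example, and doubling the tail is one common convention, valid here because the null distribution is symmetric.

```python
from fractions import Fraction
from math import comb

def two_sided_p_value(heads, tosses):
    """Exact two-sided p-value for H: p = 0.5 against K: p <> 0.5."""
    extreme = max(heads, tosses - heads)
    # Number of sequences at least as lopsided as the sample, in one direction
    tail = sum(comb(tosses, k) for k in range(extreme, tosses + 1))
    # Double the tail by symmetry, capped at 1
    return min(Fraction(1), Fraction(2 * tail, 2 ** tosses))

p_value = two_sided_p_value(99, 100)
print(float(p_value))

for size in (Fraction(1, 10), Fraction(1, 20), Fraction(1, 2 ** 100)):
    print(float(size), "reject H" if p_value <= size else "do not reject H")
```

Under this convention H is rejected at sizes 0.1 and 0.05 but not at size 1/2^100, which shows how the conclusion depends on the chosen size of the critical region.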

P.S.
1/ If you raise the question of 'must' in the way you did, Bernoulli's theorem 'must' remain unacceptable ("unproved"?) in the same way as long run relative frequencies will not converge to corresponding probabilities.

2/ Can you show that the size of the critical region needed to accept the hypothesis p=0.5 against p<>0.5, when the sample is 100 heads out of 100 tosses, has ever been used in any practical application of hypothesis testing? Or, for that matter, that such a size of the critical region is used theoretically where the purpose of hypothesis testing prevails?

Last edited: Jan 23, 2007

11. Jan 23, 2007

### matt grime

'Must' without qualification implies no room for any other possibility.

12. Jan 24, 2007

### EnumaElish

Surely, in classical hypothesis testing. But suppose the statistician is a Bayesian with a rather strong prior toward "unbiasedness." Then the conclusion becomes an issue of what weights are attached to the prior "belief" vs. the data.

13. Jan 25, 2007

### ssd

Please look at my first post where I said "If... we must say ....", not that I said "coins must be biased" (although in my later post I decided to say so). That obviously was in the sense of "have to" or "ought to".

Can you support your claim with statistical theories of inference?

I am interested to learn the method by which one gets 100 heads out of 100 tosses and infers, by a statistical inference procedure, that the coin is unbiased, when the factors of subjective choice in the procedure are chosen impartially.

14. Jan 25, 2007

### ssd

Well, I don't disagree completely. But the Bayesian approach is one of the methods to accommodate 'experience' in decision making through statistical procedures. To reach a fair decision through the Bayesian approach, the decision maker should be impartial about the prior belief, i.e., he should not manipulate facts to get desired results. What I mean is, whatever he assumes, he should have a reason (experience about the concerned facts) to believe that. In the particular problem of this thread, I feel that there is no reason or requirement (I may be wrong) to use such belief when we can easily apply the classical testing procedure.

Last edited: Jan 25, 2007

15. Jan 25, 2007

### D H

A Bayesianist will come to the same conclusion as will the frequentist (that the coin is biased) unless the Bayesianist has a collapsed covariance matrix for a prior uncertainty.

By the same logic, a Bayesianist could insist that a coin will always land on edge despite evidence to the contrary if his prior "belief" (as embodied in the prior estimate and prior covariance) is that the coin always lands on edge. Bayesianists avoid this problem by starting with a non-singular covariance matrix and then ensuring the covariance matrix doesn't collapse upon update (e.g., adding process noise).
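To make the role of prior uncertainty concrete, here is a small sketch that swaps in a Beta-Bernoulli model for the covariance-matrix formulation above. This is a substitution for illustration, not the estimator described in this post, and the prior parameters are arbitrary assumptions.

```python
from fractions import Fraction

def posterior_mean(a, b, heads, tails):
    """Posterior mean of P(heads) under a Beta(a, b) prior (conjugate update)."""
    return Fraction(a + heads, a + b + heads + tails)

# Assumed priors: Beta(1, 1) is flat; Beta(1000, 1000) is tightly centred on 0.5
flat = posterior_mean(1, 1, 100, 0)
strong = posterior_mean(1000, 1000, 100, 0)

print(float(flat))     # moves almost all the way to 1
print(float(strong))   # moves only slightly above 0.5
```

Any prior with nonzero spread is eventually pulled toward the data; only a prior with all its mass on a single value (the analogue of a collapsed covariance) never moves, however many heads are observed.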

Last edited: Jan 25, 2007

16. Jan 25, 2007

### EnumaElish

What if the Bayesian believes that the coin almost always lands on its side? That the coin will land on the side with probability = 1 - epsilon?

What if the data show 99 heads out of 100, not 100/100? (Which is what I was assuming, following ssd.)

Last edited: Jan 25, 2007

17. Jan 25, 2007

### matt grime

post 8. read it. you wrote it.

yes. it is trivial. exercise for the reader. bear in mind that you have just asserted that we *must* reject model Y given that some outcome that has non-zero probability in Y has occurred.

Last edited: Jan 25, 2007

18. Jan 25, 2007

### EnumaElish

Would it be OK to say "even when one must empirically reject model Y under conventional empirical standards, one cannot exclude model Y canonically"?

19. Jan 26, 2007

### whatta

No, it's because he's thinking in terms of probability of "N heads and 100-N tails out of 100 trials", ignoring the order, which is no longer the same for every N.

But what I would like to point out here is the misconception of probability everybody seems to have these days. It is not absolute; it only works "on average". 100% probability does not mean the event will necessarily happen, and 0% does not mean it will never happen.

Furthermore, we can easily construct a situation where an event with zero probability happens with absolute necessity. For example, imagine a ray coming out of the center of a sphere in a random direction: the probability of it intersecting the sphere at any given point is 0, but an intersection obviously will happen somewhere.

20. Jan 26, 2007

### matt grime

For a finite state space, it does. Which is what we have here. I'm not sure why you bring this up now.