Probability that 1000 coin flips results in >600 tails

  • B
  • Thread starter Jonathan212
  • Start date
  • Tags
    Probability
In summary, the conversation discusses the formula for calculating the probability of getting a certain number of tails when flipping a coin N times. The formula is N! / (M! * (N - M)!) * 0.5^M * (1 - 0.5)^(N-M), where N is the number of flips and M is the desired number of tails. The conversation also mentions using the binomial distribution and the normal approximation to calculate this probability. However, there is no single formula that can be used for this calculation and the sum of the binomial distribution must be used.
  • #1
Jonathan212
198
4
What is the formula for this? Throwing a flipping coin N times, what is the probability that the number of tails results is higher than M?
 
Physics news on Phys.org
  • #2
Well, doesn't friend google answer your question ? What did you find ?
 
  • #3
This is not homework if that's what you were thinking. Sounds specialized enough to me that a google answer is not readily available. Unless you dig a lot in which case might as well derive it from scratch.
 
  • #4
Jonathan212 said:
This is not homework if that's what you were thinking. Sounds specialized enough to me that a google answer is not readily available. Unless you dig a lot in which case might as well derive it from scratch.

Try "Binomial distribution" and the "Normal approximation to the binomial distribution".
 
  • Like
Likes Jonathan212
  • #5
"flip coin probability"does the trick. That's not 'digging in deep', really o0)
 
  • Like
Likes phinds
  • #7
Jonathan212 said:
Thanks. So the answer is:

Pr(M; N, 0.5) = N! / (M! * (N - M)!) * 0.5^M * (1 - 0.5)^(N-M)

https://en.wikipedia.org/wiki/Binomial_distribution

Is this High School level in the US nowadays?

Note that ##(1 - 0.5) = 0.5##, so you can simplify that expression. That is for precisely ##M## tails in ##N## flips. To get the answer to your original question you have to take the sum from ##M = 601## to ##M = 1000##.

Note also that this is where the normal approximation is useful. Then, you can get the answer by looking up a normal distribution table.
 
  • #8
Oopsa. Excel can't deal with N > 170. Any trick to avoid the overflow of "171!" ?
 
  • #10
Got stuck now. Something is wrong. Do we want a normal distribution with a standard deviation of N * 0.5 * ( 1 - 0.5 ) and a mean of N * 0.5? Excel has the NORMDIST() function but it returns a result 4 times bigger than the binomial distribution at N = 170, M = 102.
 
  • #12
So we want sigma = sqrt( N * 0.5 * ( 1 - 0.5 ) ). And the answer for N = 1000, M = 601 is 1 in 21 million. Hurray! That's statistically significant as hell.

Looking at the error of normal versus the binomial up to N = 170 (M = floor(0.6*N+0.5)):

Image3.jpg

2015 sti 0 60

It actually gets worse after N = 140. Is it meant to?
 

Attachments

  • Image3.jpg
    Image3.jpg
    18.5 KB · Views: 731
  • #13
Jonathan212 said:
So we want sigma = sqrt( N * 0.5 * ( 1 - 0.5 ) ). And the answer for N = 1000, M = 601 is 1 in 21 million. Hurray! That's statistically significant as hell.

Looking at the error of normal versus the binomial up to N = 170:

View attachment 238253
2015 sti 0 60

It actually gets worse after N = 140. Is it meant to?

I'm taking a guess that the normal distribution, being essentially ##e^{-x^2}##, will fall off to zero faster than the binomial. So, after a certain point the relative error will increase, but relative to some very small probabilities.
 
  • #14
At N = 100, M = 51, I get 0.078 from both the binomial and the normal. Sure this is right?
 
  • #15
Jonathan212 said:
At N = 100, M = 51, I get 0.078 from both the binomial and the normal. Sure this is right?

For N =100, I would expect almost all cases to lie in the interval 40-60 tails. That's about 5% each on average. So, 8% chance of exactly 51 tails sounds about right. Don't you think?
 
  • #16
But I want M or more tails, not M.
 
  • #18
Jonathan212 said:
But I want M or more tails, not M.

Then you have to add all cases from M = 51 to 100. See post #7.

Your previous answer in post #12 may be for exactly 601 tails.
 
  • #19
Looks like there is no formula that saves you having to do the sum of the binomial. But luckily, there is for the normal in excel.
 
  • #20
Jonathan212 said:
Looks like there is no formula that saves you having to do the sum of the binomial. But luckily, there is for the normal in excel.

There should be a "cumulative" field for both.
 
  • #21
You want to be careful with this 'cumulative' output: BINOM.DIST(50,100,0.5,TRUE) gives you 0.54 , not 0.5 !
 
  • #22
Back to basics cause I don't really know what I'm doing.

N = 3 throws, possible outcomes: 000, 001, 010, 011, 100, 101, 110, 111

0 heads -> p = 1/8
1 heads -> p = 3/8
2 heads -> p = 3/8
3 heads -> p = 1/8

Binomial distribution in Excel is as expected (NO cumulative flag):

0 heads -> p = 0.125
1 heads -> p = 0.375
2 heads -> p = 0.375
3 heads -> p = 0.125

Binomial distribution in Excel with cumulative flag:

up to 0 heads -> p = 0.125
up to 1 heads -> p = 0.5
up to 2 heads -> p = 0.875
up to 3 heads -> p = 1

We want at least M heads:

M = 0 -> p = 1
M = 1 -> p = 7/8
M = 2 -> p = 4/8
M = 3 -> p = 1/8

Now how do we write the last list in terms of the binomial function of Excel? As follows it is WRONG:

= 1 - BINOMDIST( M , N , 0.5, 1 )
 
  • #23
= 1 - BINOMDIST( M-1 , N , 0.5, 1 )

[edit] unfortunately, the function isn't defined for M-1 = -1 :rolleyes:
 
  • #24
=IF(M=0,1,1 - BINOMDIST( M-1 , N , 0.5, 1 ))

Ugly but it works.
 
  • #25
PeroK said:
I'm taking a guess that the normal distribution, being essentially ##e^{-x^2}##, will fall off to zero faster than the binomial. So, after a certain point the relative error will increase, but relative to some very small probabilities.

qualitatively this can't be the whole story -- the binomial is bounded while the Gaussian is unbounded...
 
  • Like
Likes PeroK
  • #26
@Jonathan212

what are you ultimately looking for here? Something to type into excel? If you're doing it computationally, you maybe be able to just sum over 400 terms or so. If you know basic coding, there's a slick way to do this with trees. If you want something 'nice' as an approximation, that seems ok. Normal/ Gaussian approximation works though there aren't easy closed form integrals (which is what you'd want).

You can get easy upper and lower bounds on Gaussian integrals though. It may also be worthwhile to know that asymptotically for a standard normal (i.e. mean zero, std deviation of 1: so you'd need to scaled and shift your problem to 'become standard' to use this) you have, for some ##c \gt 0##
##P(X\gt c) \sim \frac{1}{c\sqrt{2 \pi}} e^{-\frac{c^2}{2}}##
(which is a very nice approximation for ##c \gt 1## -- technically it is an upper bound on the integral which has a simple proof using the 'slip-in' trick)

the bound is below
for ##c \gt 0## and standard normal r.v. ##X##, we have
##P(X\gt c) = \frac{1}{\sqrt{2 \pi}}\int_c^\infty e^\frac{-x^2}{2} dx##
##= \frac{1}{\sqrt{2 \pi}}\int_c^\infty \frac{x}{x} e^\frac{-x^2}{2} dx##
##\leq \frac{1}{\sqrt{2 \pi}}\int_c^\infty \frac{x}{c} e^\frac{-x^2}{2} dx##
##= \frac{1}{c\sqrt{2 \pi}}\int_c^\infty x e^\frac{-x^2}{2} dx##
##= \frac{1}{c\sqrt{2 \pi}} e^{-\frac{c^2}{2}}##

learning how to convert a binomial into a normal and then how to convert a normal into a standard normal, seems like a worthwhile goal for this
- - - -
Also, there are relatively straightforward proofs of the Chernoff Bound for coin tossing...
 
Last edited:
  • Like
Likes Charles Link
  • #27
what are you ultimately looking for here?

The way to assess the statistical significance of a true random number generator's observed bias. Got the answer in Excel, it is as BvU says. Only thing is, the result is a bit suspicious for high N: probability of 600 OR MORE heads in 1000 throws is 1 in 7.3 billion.
 
Last edited:
  • #28
Jonathan212 said:
The way to assess the statistical significance of a true random number generator's observed bias. Got the answer in Excel, it is as BvU says. Only thing is, the result is a bit suspicious for high N: probability of 600 OR MORE heads in 1000 throws is 1 in 19 billion.

Getting 600 or more heads in 1000 flips sounds next to impossible to me. The probablity of getting 60 or more in 100 flips is about 3%. And, more or less, you'd have to do that 10 times in a row.
 
  • #29
Look at some more numbers. All at 60% heads or more:

Probability of 15 or more heads in 25 throws = 1 in 4.7
Probability of 60 or more heads in 100 throws = 1 in 35
Probability of 150 or more heads in 250 throws = 1 in 1,062
Probability of 240 or more heads in 400 throws = 1 in 27,000
Probability of 300 or more heads in 500 throws = 1 in 220,000
Probability of 360 or more heads in 600 throws = 1 in 1,800,000
Probability of 480 or more heads in 800 throws = 1 in 1,200,000,000
 
  • #30
Jonathan212 said:
Look at some more numbers. All at 60% heads or more:

Probability of 15 or more heads in 25 throws = 1 in 4.7
Probability of 60 or more heads in 100 throws = 1 in 35
Probability of 150 or more heads in 250 throws = 1 in 1,062
Probability of 240 or more heads in 400 throws = 1 in 27,000
Probability of 300 or more heads in 500 throws = 1 in 220,000
Probability of 360 or more heads in 600 throws = 1 in 1,800,000
Probability of 480 or more heads in 800 throws = 1 in 1,200,000,000

That's the law of averages for you!
 
  • #31
Alright but such a combinatorial explosion was not expected.
 
  • #32
Jonathan212 said:
The way to assess the statistical significance of a true random number generator's observed bias. Got the answer in Excel

'true random number generator' is a bit of a loaded term. Excel isn't known to be good for pseudo random number generators.

Jonathan212 said:
Got the answer in Excel, it is as BvU says. Only thing is, the result is a bit suspicious for high N: probability of 600 OR MORE heads in 1000 throws is 1 in 19 billion.

you're could run into numeric stability issues if you're calculating and summing extremely small probabilities so extra care is needed. It sounds about right assuming a fair coin. I know the correct probability is cannot be bigger than ##\frac{2.07}{10^{9}}##. Here's why:

you know every probability ##p## satisfies ##0\leq p \leq 1##. Now using chernoff bounds, I can say, with

##S_n = X_1 + X_2 + ... + X_n##
where each ##X_k## is a indendent bernouli (coin toss) with value 1 for tails and 0 for heads.
i.e. ##S_n## is a random variable that counts the number of tails

So the expected number of tails ##= \mu = E[S_n] = n \cdot E[X_k] = 1000 \cdot 0.5 = 500##

what the Chernoff bound tells us is

## 0 \lt Pr\Big\{\text{600 or More Tails}\Big\} = Pr\Big\{\big(S_n - \mu\big) \geq 100\Big\} \leq \exp\big(\frac{-2\cdot 100^2}{1000}\big) \lt 2.07 \cdot 10^{-9}##

(note thread title says ##\gt 600 ## tails where as a more recent posting says ##\geq 600##, I went with the latter)
- - - - -

edit:
for a basic look at Chernoff Bounds for coin tossing, consider looking at pages 20-30, here:
https://ocw.mit.edu/courses/electri...ce-fall-2010/readings/MIT6_042JF10_chap19.pdf

part of the math for CS course, here:
https://ocw.mit.edu/courses/electri...tics-for-computer-science-fall-2010/readings/
 
Last edited:
  • Like
Likes Filip Larsen
  • #33
Indeed Excel's cumulative binomial runs out of steam at N = 1029, it fails above that value. But so does Excel's normal distribution at N = 1555 (the cumulative option). Btw the source of my true random data is not Excel but /dev/random or "haveged".
 
Last edited:
  • #34
The plot looks alright though.

X-axis = N
Y-axis = 1 / probability that N throws result in >=N*0.6 heads.

Image4.jpg
 

Attachments

  • Image4.jpg
    Image4.jpg
    18.8 KB · Views: 569
Last edited:
  • Like
Likes mfb
  • #35
It's 100% if you do 1000 coin flips again and again until you get greater than 600 tails.

Well, unless you're cursed. Then it's 0%, even with a double tailed coin. You get killed before you flip the coin the 1000th time.
 

Similar threads

  • Set Theory, Logic, Probability, Statistics
Replies
20
Views
1K
  • Set Theory, Logic, Probability, Statistics
Replies
3
Views
1K
  • Set Theory, Logic, Probability, Statistics
2
Replies
57
Views
2K
  • Set Theory, Logic, Probability, Statistics
Replies
29
Views
3K
  • Set Theory, Logic, Probability, Statistics
Replies
18
Views
1K
  • Set Theory, Logic, Probability, Statistics
Replies
6
Views
1K
  • Set Theory, Logic, Probability, Statistics
2
Replies
41
Views
4K
  • Set Theory, Logic, Probability, Statistics
2
Replies
45
Views
3K
  • Set Theory, Logic, Probability, Statistics
Replies
5
Views
2K
  • Set Theory, Logic, Probability, Statistics
Replies
11
Views
1K
Back
Top