Testing a coin for bias

  • Thread starter Agent Smith
  • #1
Agent Smith
TL;DR Summary
A simple hypothesis testing method that seems to give the wrong result
I have a coin. I flip it a 100 times and see that 70 of the outcomes are heads.
##H_0##: Assume coin is fair i.e. P(heads) = P(tails) = 0.5
##H_a##: The coin is biased (towards heads)
##\alpha = 0.05##

Under ##H_0##, ##\text{p value } = P(70 \text{ heads}) = ^{100}C_{70} \times 0.5^{70} \times 0.5^{30} \approx 0.0000232##
0.0000232 < 0.05
We reject ##H_0## and accept ##H_a##, the coin is biased towards heads.

Correct/incorrect/both/neither?
 
  • #2
Agent Smith said:
TL;DR Summary: A simple hypothesis testing method that seems to give the wrong result

I have a coin. I flip it a 100 times and see that 70 of the outcomes are heads.
##H_0##: Assume coin is fair i.e. P(heads) = P(tails) = 0.5
##H_a##: The coin is biased (towards heads)
##\alpha = 0.05##

Under ##H_0##, ##\text{p value } = P(70 \text{ heads}) = ^{100}C_{70} \times 0.5^{70} \times 0.5^{30} \approx 0.0000232##
0.0000232 < 0.05
We reject ##H_0## and accept ##H_a##, the coin is biased towards heads.

Correct/incorrect/both/neither?
Some basic mistakes:
1) You are using the results of your experiment to design a hypothesis test specifically to match those results. That biases the test.
2) You are ignoring that a perfectly fair coin might give other results just as "off-center", like 71 heads, 72 heads, 73 heads, ..., 70 tails, 71 tails, 72 tails, .... Those probabilities add up.
In other words, you should be applying a two-tailed test. That would include all the possibilities of a fair coin giving 70 heads or worse, and all the possibilities of at least 70 tails.
 
  • #3
Agent Smith said:
We reject ##H_0## and accept ##H_a##, the coin is biased towards heads.
OK, so I assume you know my usual criticism about this. "We reject ##H_0##" is justified by a null hypothesis significance test on this data, but not "and accept ##H_a##". There are only two justified outcomes for a null hypothesis significance test:

1) Reject the null hypothesis
2) Fail to reject the null hypothesis

No claims about accepting the null hypothesis can be made because the test is only designed to identify if the data provides strong evidence for rejecting the null. And no claims of any sort can be made for any hypothesis other than the null since they were not involved in the calculations at all.

Agent Smith said:
Under ##H_0##, ##\text{p value } = P(70 \text{ heads}) = ^{100}C_{70} \times 0.5^{70} \times 0.5^{30} \approx 0.0000232##
Here, you do not want to calculate just ##P(x=70)## (where ##x## is the number of heads flipped out of 100 flips). Instead, you want to calculate ##P(x\ge 70)\approx 0.000039## for a one-tailed test, or ##P(x\le 30 \cup x\ge 70)\approx 0.000079## for a two-tailed test.
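A quick numerical check of these two tail probabilities (a minimal sketch, assuming Python with SciPy; values rounded):

Python:
from scipy.stats import binom

n, p0, k = 100, 0.5, 70                                # flips, fair-coin frequency under H0, observed heads

p_one_tailed = binom.sf(k - 1, n, p0)                  # P(X >= 70)
p_two_tailed = p_one_tailed + binom.cdf(n - k, n, p0)  # P(X >= 70) + P(X <= 30)

print(p_one_tailed)   # ~0.000039
print(p_two_tailed)   # ~0.000079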
 
  • #4
FactChecker said:
1) You are using the results of your experiment to design a hypothesis test specifically to match those results. That biases the test.
I don’t think he is doing that. His ##H_0## is the standard one you would make without seeing the data.
 
  • #5
FactChecker said:
1) You are using the results of your experiment to design a hypothesis test specifically to match those results. That biases the test.
I formed my hypotheses before the results of my experiment. I conducted the experiment to test my hypotheses.

FactChecker said:
2) You are ignoring that a perfectly fair coin might give other results just as "off-center", like 71 heads, 72 heads, 73 heads, ..., 70 tails, 71 tails, 72 tails, .... Those probabilities add up.
I suppose you're correct. I was worried about that. The probability of HHHHT (for example) = probability of HTTTT. Not sure about that since the former is ##^{100}C_{70} \times 0.5^{70} \times 0.5^{30}## and the latter is ##^{100}C_{50} \times 0.5^{50} \times 0.5^{50}##

FactChecker said:
In other words, you should be applying a two-tailed test. That would include all the possibilities of a fair coin giving 70 heads or worse, and all the possibilities of at least 70 tails.
But I want to test my hypothesis that the coin is biased towards heads, given that I get 70 heads in 100 flips.

I think I can do a 2-tailed test like this: ##z = \frac{0.7 - 0.5}{\sqrt{\frac{0.5(1 - 0.5)}{100}}}##. Find the p-value and compare it to alpha.
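(A minimal sketch of that normal-approximation calculation, assuming Python with SciPy; doubling the upper tail gives the two-tailed p-value.)

Python:
from math import sqrt
from scipy.stats import norm

p_hat, p0, n = 0.70, 0.50, 100              # sample proportion, H0 proportion, number of flips
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # = 4.0
p_two_tailed = 2 * norm.sf(abs(z))          # ~0.000063 (normal approximation)

print(z, p_two_tailed)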

Dale said:
Here, you do not want to calculate just ##P(x=70)## (where ##x## is the number of heads flipped out of 100 flips). Instead, you want to calculate ##P(x\ge 70)\approx 0.000039##.
Thank you for the correction. That's right. I'm also trying to compute the probability of getting 70 heads in 100 flips of the coin (assuming ##H_0##)

Dale said:
OK, so I assume you know my usual criticism about this. "We reject ##H_0##" is justified by a null hypothesis significance test on this data, but not "and accept ##H_a##". There are only two justified outcomes for a null hypothesis significance test:

1) Reject the null hypothesis
2) Fail to reject the null hypothesis
Yes, you've said this n number of times now. I'm doing high school stats and so some simplifications must've been made for the benefit of the beginner.
 
  • #6
@Dale a question on Bayes' theorem. If my prior probability that the coin is biased = 0.7, can the probability of heads = 0.5? I need the probability of heads to compute P(k heads | coin is biased) = P(evidence|hypothesis). Also I feel that P(coin is biased) = P(heads for a head-biased coin) = 0.7. Wrong?
 
  • #7
Dale said:
I don’t think he is doing that. His ##H_0## is the standard one you would make without seeing the data.
His choice of the alternative hypothesis is that it is biased towards heads. I assume that was motivated by the test sample giving 70 heads. So he is ignoring the equally likely possibility that a fair coin can give more tails. He did not include that possibility (and others) in his calculations.
 
  • #8
Agent Smith said:
I'm doing high school stats and so some simplifications must've been made for the benefit of the beginner.
Fair enough. You will have to keep in mind what your teacher expects for your tests and homework. But every time you ask this here I will have to correct it so that people searching these threads get the right information.

This is a very common mistake. So I hope that when you get out of your class you will do it the right way.

Agent Smith said:
If my prior probability that the coin is biased = 0.7, can the probability of heads = 0.5?
Remember that the prior probability specifies your uncertainty. That is the probability density for all frequencies ##F##, not just one.

So, for example, if you had some good prior reason to suspect that the coin is biased towards heads you might specify that your prior is ##F\sim \beta(8,4)##. That would mean that you think it is about 50% chance that ##0.60<F<0.79##, but you also allow a 50% chance that it is outside this range.

[Attached plot: the ##\beta(8,4)## prior density over the heads frequency ##F##]
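(A minimal check of that interval probability, assuming Python with SciPy:)

Python:
from scipy.stats import beta

prior = beta(8, 4)                         # prior uncertainty over the heads frequency F
print(prior.cdf(0.79) - prior.cdf(0.60))   # roughly 0.5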
 
  • #9
FactChecker said:
His choice of the alternative hypothesis is that it is biased towards heads. I assume that was motivated by the test sample giving 70 heads. So he is ignoring the equally likely possibility that a fair coin can give more tails. He did not include that possibility (and others) in his calculations.
Sure, but for a null hypothesis significance test the alternate hypothesis never enters into the calculations and you cannot make any claims about it anyway, so it cannot produce any bias, provided you report the results correctly (which it seems that their class does not encourage).
 
  • #10
Dale said:
Fair enough. You will have to keep in mind what your teacher expects for your tests and homework. But every time you ask this here I will have to correct it so that people searching these threads get the right information.

This is a very common mistake. So I hope that when you get out of your class you will do it the right way.


Remember that the prior probability specifies your uncertainty. That is the probability density for all frequencies ##F##, not just one.

So, for example, if you had some good prior reason to suspect that the coin is biased towards heads you might specify that your prior is ##F\sim \beta(8,4)##. That would mean that you think it is about 50% chance that ##0.60<F<0.79##, but you also allow a 50% chance that it is outside this range.

[Attached plot: the ##\beta(8,4)## prior density over the heads frequency ##F##]
My question is whether a biased coin can have P(heads) = P(tails) = 0.5. Thank you for the answer though. It went over my head, but I have saved your comments in my notes, for later study.
 
  • #11
Dale said:
Sure, but for a null hypothesis significance test the alternate hypothesis never enters into the calculations and you cannot make any claims about it anyway, so it cannot produce any bias, provided you report the results correctly (which it seems that their class does not encourage).
A vague or incorrect alternative hypothesis can encourage an incorrect test (one-tailed, two-tailed, incomplete) like this example shows. The calculation done was very incomplete and omitted a lot of the relevant possibilities which should have been included as acceptable for the null hypothesis and a reasonable confidence level.
 
  • #12
Agent Smith said:
My question is whether a biased coin can have P(heads) = P(tails) = 0.5.
No. By definition that would be a fair coin.

However, a biased coin could have a sample with exactly 50% heads in that sample.
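(To put a rough number on this possibility: a sketch assuming Python with SciPy and a hypothetical heads frequency of 0.7 for the biased coin.)

Python:
from scipy.stats import binom

print(binom.pmf(50, 100, 0.7))   # ~1e-05: exactly 50 heads is very unlikely for a 0.7-biased coin
print(binom.pmf(50, 100, 0.5))   # ~0.08: for comparison, the same outcome for a fair coin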
 
  • #13
FactChecker said:
A vague or incorrect alternative hypothesis can encourage an incorrect test (one-tailed, two-tailed, incomplete) like this example shows. The calculation done was very incomplete and omitted a lot of the relevant possibilities which should have been included as acceptable for the null hypothesis and a reasonable confidence level.
In discussions with other people, some of them were kind enough to acknowledge that my method and results, as they appear in the OP, can be considered a crude, informal, imprecise way to check for a biased coin. @Dale was right to point out that I should've computed ##P(X \geq 70)## for the p-value.

Dale said:
No. By definition that would be a fair coin.

However, a biased coin could have a sample with exactly 50% heads in that sample.
But this would be highly unlikely. So if ##H## = the coin is biased, then ##P(E|H)## is very low, where ##E## = a sample with 50% heads. That would ultimately lower the posterior probability, yes?
 
  • #14
Agent Smith said:
But this would be highly unlikely. So if ##H## = the coin is biased, then ##P(E|H)## is very low, where ##E## = a sample with 50% heads. That would ultimately lower the posterior probability, yes?
Yes. If you start with a non-fair prior and you get a 50% sample, that will make your posterior more fair.

Did you see the excel spreadsheet I sent on this from the last thread? It seems like you don’t like the conjugate prior beta function stuff. Every time I mention it you say “over my head”. So instead you can use that spreadsheet and see the actual calculation directly.

Here is an updated version for this specific scenario. I assumed a prior with 0.50 prior probability between ##0.60 \le F \le 0.80## and 0.50 prior probability outside of that range. And then assumed an observation of exactly 50 heads and 50 tails.
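(Not the attached spreadsheet itself, but a minimal Python sketch of the same kind of grid calculation, assuming NumPy and SciPy; the 0.01 grid step is an arbitrary choice.)

Python:
import numpy as np
from scipy.stats import binom

F = np.linspace(0.0, 1.0, 101)                   # candidate heads frequencies
in_band = (F >= 0.60) & (F <= 0.80)              # region given 50% prior probability

prior = np.empty_like(F)
prior[in_band] = 0.5 / in_band.sum()             # 50% prior mass spread inside the band
prior[~in_band] = 0.5 / (~in_band).sum()         # 50% prior mass spread outside it

heads, tails = 50, 50                            # the assumed observation described above
likelihood = binom.pmf(heads, heads + tails, F)  # P(D | F) at each grid value

posterior = prior * likelihood
posterior /= posterior.sum()                     # dividing by P(D), the normalising sum

print(posterior[in_band].sum())                  # posterior probability that 0.60 <= F <= 0.80

With the 50/50 data the printed band probability drops well below the 0.50 it started at, i.e. the posterior has shifted toward a fair coin, which is the point made above.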
 

Attachments

  • BayesianCoinFlip.xlsx
  • #15
Agent Smith said:
I should've computed ##P(X \geq 70)## for the p-value.
No. Why are you only considering the extreme Heads cases? Do not use the results of your sample to design an alternate hypothesis that the same sample is likely to fit. I cannot stress this enough.
You should have computed ##P((\text{NumHeads} \geq 70)\ OR\ (\text{NumTails} \geq 70) )## for the p-value. The only reason you have singled out Heads is because your sample made you suspicious that it was biased towards Heads. That is not valid. You need to calculate the total probability that a fair coin would have given a result as unlikely or less likely than 70, either Heads or Tails. For a fair coin, it is just as likely that Tails would have gotten the same number.
 
  • #16
When I taught statistics I emphasized knowing how to choose between one tailed and two tailed tests. It's basic.
 
  • #17
Dale said:
Yes. If you start with a non-fair prior and you get a 50% sample, that will make your posterior more fair.

Did you see the excel spreadsheet I sent on this from the last thread? It seems like you don’t like the conjugate prior beta function stuff. Every time I mention it you say “over my head”. So instead you can use that spreadsheet and see the actual calculation directly.

Here is an updated version for this specific scenario. I assumed a prior with 0.50 prior probability between ##0.60 \le F \le 0.80## and 0.50 prior probability outside of that range. And then assumed an observation of exactly 50 heads and 50 tails.
Thank you. I do understand a bit of the ROPE (region of practical equivalence) trick :smile: you mentioned. For this particular case (biased coin hypothesis), the ROPE is a range e.g. [0.45, 0.55] for the proportion of heads you would consider to be an outcome for a fair coin. It seems there's a probability associated with this range (is this the ##\beta## distribution?). Now we conduct an experiment and say the proportion of heads = ##p##. Either ##p## will be inside the range [0.45, 0.55] or outside it. If the former, then we conclude the coin is fair (the proper expression would be the coin is "practically equivalent" to a fair coin?) and if the latter we conclude the coin is biased.

For B: the coin is biased, E: the proportion of heads ##p## (see above)
##P(B|E) = \frac{P(B) \times P(E|B)}{P(E)}##

For ##P(E|B)##, we look at the ##\beta## distribution, yes? Is ##p## inside/outside the range [0.45, 0.55]?

For ##P(E)##, we also need to know ##P(\neg B) \times P(E|\neg B)##. I don't know how to do that?

FactChecker said:
No. Why are you only considering the extreme Heads cases? Do not use the results of your sample to design an alternate hypothesis that the same sample is likely to fit. I cannot stress this enough.
You should have computed ##P((\text{NumHeads} \geq 70)\ OR\ (\text{NumTails} \geq 70) )## for the p-value. The only reason you have singled out Heads is because your sample made you suspicious that it was biased towards Heads. That is not valid. You need to calculate the total probability that a fair coin would have given a result as unlikely or less likely than 70, either Heads or Tails. For a fair coin, it is just as likely that Tails would have gotten the same number.
So you want me to calculate, for ##k## (the proportion of heads from my experiment), ##P(\text{coin is biased}|k)##, where the coin could be biased towards either heads or tails? Wouldn't that be ##2## hypotheses? Hypothesis 1: The coin is head-biased (do the computation). Hypothesis 2: The coin is tail-biased (do the computation). Use the evidence to update the prior probability.
 
  • #18
Agent Smith said:
Thank you. I do understand a bit of the ROPE (region of practical equivalence) trick :smile: you mentioned. For this particular case (biased coin hypothesis), the ROPE is a range e.g. [0.45, 0.55] for the proportion of heads you would consider to be an outcome for a fair coin.
Yes.

Agent Smith said:
It seems there's a probability associated with this range (is this the β distribution?).
Yes. This would be the prior probability ##P(H)##. If your prior probability is a ##\beta## distribution then yes you would use that distribution for the calculation. Whatever distribution you use you would integrate or sum the distribution over the ROPE (e.g. [0.45, 0.55]) to get the prior probability that the coin is fair.

In the excel spreadsheet you can just literally do SUM over the rows from 0.45 to 0.55 in the P(H) column (the prior).

Agent Smith said:
Now we conduct an experiment and say the proportion of heads = p. Either p will be inside the range [0.45, 0.55] or outside it. If the former, then we conclude the coin is fair (the proper expression would be the coin is "practically equivalent" to a fair coin?) and if the latter we conclude the coin is biased.
Not really. Let's say that your experiment was to flip the coin 3 times. Then regardless of the outcome of that experiment you are guaranteed to get a proportion of heads that is outside the range [0.45, 0.55].

So instead, what happens is that the information from the experiment is used to update your prior. Meaning, from your evidence ##E## you now calculate ##P(H|E)##. Once you have ##P(H|E)## then you re-calculate the probability of the ROPE, just like you did before. With the ##\beta## distribution you integrate over the ROPE, or with the excel spreadsheet you run SUM over the ROPE.

Agent Smith said:
For ##P(E)##, we also need to know ##P(\neg B) \times P(E|\neg B)##. I don't know how to do that?
I show how to calculate ##P(E)## in the spreadsheet. It is just a normalization constant to make sure that the result is a probability that integrates or sums to 1. You do not need ##P(\neg B) \times P(E|\neg B)##.
 
  • #19
@FactChecker & @Dale 👇
I want to test if a coin is biased/not.
H = The coin is biased
D = The data/evidence (70 heads in 100 flips)

P(H) = 0.5 (prior probability)

##P(H|D) = \frac{P(H) \times P(D|H)}{P(D)}##

For a head-biased coin, P(heads) = 0.7
mean = 0.7
standard deviation = ##\sigma_H = \sqrt {\frac{0.7 \times 0.3}{100}} = 0.0458##
##z = \frac{0.7 - 0.7}{0.0458} = 0##
P-value = 0.5 (for outcomes as/more extreme than 0.7 proportion of heads)
##P(D|\text{coin is head-biased}) = 0.5##

For tail-biased coin, P(heads) = 0.3 or P(tails) = 0.7
mean = 0.3
standard deviation ##\sigma_T = 0.0458## (the same formula, the same inputs, the same result)
##z = \frac{0.7 - 0.3}{0.0458} = 8.73##
P-value = 0 (for outcomes as/more extreme than 0.7 proportion of heads)

For fair coin, P(heads) = 0.5
mean = 0.5
standard deviation = 0.05
##z = \frac{0.7 - 0.5}{0.05} = 4##
P-value = 0.0000633721 (for outcomes as/more extreme than 0.7 proportion of heads)
##P(D|\text{coin is unbiased/fair}) = 0.0000633721##
------------------------------
##P(D|\text{coin is biased}) = P(D|\text{coin is head-biased}) + P(D|\text{coin is tail-biased})##
##P(D|\text{coin is biased}) = 0.5 + 0 = 0.5##
------------------------------
##P(D) = P(\text{coin is biased}) \times P(D|\text{coin is biased}) + P(\text{coin is unbiased/fair}) \times P(D|\text{coin is unbiased/fair})##
##P(D) = 0.5 \times 0.5 + 0.5 \times 0.0000633721 = 0.25003168605##
----
##P(H|D) = \frac{P(H) \times P(D|H)}{P(D)} = \frac{0.5 \times 0.5}{0.25003168605} = 0.99987##

Correct?
 
  • #20
Agent Smith said:
@FactChecker & @Dale 👇
I want to test if a coin is biased/not.
H = The coin is biased
D = The data/evidence (70 heads in 100 flips)

P(H) = 0.5 (prior probability)

##P(H|D) = \frac{P(H) \times P(D|H)}{P(D)}##

For a head-biased coin, P(heads) = 0.7
Are you saying that every head-biased coin has P(heads) = 0.7? What about a two-headed coin?
I'm afraid this thread is getting confusing. It's a mixture of the Bayesian approach and the traditional one. I plan to step out of this discussion. I recommend that you concentrate on understanding traditional hypothesis testing and confidence intervals until you are comfortable with those subjects.
 
  • #21
Agent Smith said:
I want to test if a coin is biased/not.
H = The coin is biased
I think there might be a bit of confusion on what probabilities are what. It is a little confusing because there are two kinds of probabilities in this problem.

One probability is the probability that the coin shows heads. This is the usual frequentist probability, so we could call it the coin frequency. In principle, this is a property of the coin itself, based on its physical characteristics and the physical way that it is flipped.

The other probability is what we use to represent our uncertainty about the frequency. This is a Bayesian probability. It is not a property of the coin, but a property of us, based on our uncertainty and our limited knowledge about the coin.

To avoid confusion, from here on I will use "frequency" to refer to the probability of the coin landing heads and "uncertainty" to refer to the probability that represents our limited knowledge. Please be aware that both are completely valid probabilities, but they are different things.

In the excel spreadsheet I sent, ##H## is the frequency, and ##P(H)## is the uncertainty. ##P(H)## is a probability distribution of our uncertainty over all possible frequencies. So it does not just have a single value; it has a value for every possible value of ##H##.

The statement "the coin is biased" means that ##H## is not in the ROPE. So "the probability that the coin is biased" is ##P(\neg (H \ \in \ ROPE))=1-P(H \ \in \ ROPE)##.

Agent Smith said:
D = The data/evidence (70 heads in 100 flips)

P(H) = 0.5 (prior probability)
So here, ##P(H)## is a probability distribution of our uncertainty at every possible frequency. It is not a single value, but rather it is a function of ##H##. This means that to describe a situation where we have a 50% probability that the coin is biased, we want a function ##P(H)## whose values sum over the ROPE to ##P(H \ \in \ ROPE)=0.5##.

I have updated the spreadsheet and placed such a function as an example in column B. Of course, you may not like that the function has a dramatic step going from in the ROPE to out of it, but that is why I like the beta distribution for this purpose.

Agent Smith said:
##P(H|D) = \frac{P(H) \times P(D|H)}{P(D)}##

For a head-biased coin, P(heads) = 0.7
mean = 0.7
standard deviation = ##\sigma_H = \sqrt {\frac{0.7 \times 0.3}{100}} = 0.0458##
##z = \frac{0.7 - 0.7}{0.0458} = 0##
P-value = 0.5 (for outcomes as/more extreme than 0.7 proportion of heads)
##P(D|\text{coin is head-biased}) = 0.5##

For tail-biased coin, P(heads) = 0.3 or P(tails) = 0.7
mean = 0.3
standard deviation ##\sigma_T = 0.0458## (the same formula, the same inputs, the same result)
##z = \frac{0.7 - 0.3}{0.0458} = 8.73##
P-value = 0 (for outcomes as/more extreme than 0.7 proportion of heads)

For fair coin, P(heads) = 0.5
mean = 0.5
standard deviation = 0.05
##z = \frac{0.7 - 0.5}{0.05} = 4##
P-value = 0.0000633721 (for outcomes as/more extreme than 0.7 proportion of heads)
##P(D|\text{coin is unbiased/fair}) = 0.0000633721##
------------------------------
##P(D|\text{coin is biased}) = P(D|\text{coin is head-biased}) + P(D|\text{coin is tail-biased})##
##P(D|\text{coin is biased}) = 0.5 + 0 = 0.5##
------------------------------
##P(D) = P(\text{coin is biased}) \times P(D|\text{coin is biased}) + P(\text{coin is unbiased/fair}) \times P(D|\text{coin is unbiased/fair})##
##P(D) = 0.5 \times 0.5 + 0.5 \times 0.0000633721 = 0.25003168605##
----
##P(H|D) = \frac{P(H) \times P(D|H)}{P(D)} = \frac{0.5 \times 0.5}{0.25003168605} = 0.99987##

Correct?
You essentially used Bayes' theorem once, based on treating ##P(H)## as a single value. What you want to do is evaluate Bayes' theorem for every possible frequency. So you use it once for each value of ##H##. That is what is calculated in column F in the spreadsheet.

Notice that with 70 heads and 30 tails, even though we started with a prior uncertainty that the coin was 50% likely to be fair, in the end the data has convinced us that there is only a 1.3% chance that the coin is fair. The data has dramatically changed our uncertainty about the fairness of the coin.
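(A minimal sketch of that per-frequency update, assuming NumPy/SciPy, a 0.01 grid, and a step prior with 50% mass on the ROPE [0.45, 0.55]; the spreadsheet's exact layout may differ, so treat the numbers as approximate.)

Python:
import numpy as np
from scipy.stats import binom

F = np.linspace(0.0, 1.0, 101)                    # candidate heads frequencies H
rope = (F >= 0.45) & (F <= 0.55)                  # "practically fair" region

prior = np.empty_like(F)
prior[rope] = 0.5 / rope.sum()                    # 50% prior probability that the coin is fair
prior[~rope] = 0.5 / (~rope).sum()                # 50% prior probability that it is biased

heads, tails = 70, 30
likelihood = binom.pmf(heads, heads + tails, F)   # Bayes' theorem applied at every frequency H

posterior = prior * likelihood / np.sum(prior * likelihood)

print(posterior[rope].sum())      # ~0.013 with this grid and prior, close to the 1.3% quoted above
print(1 - posterior[rope].sum())  # posterior probability that the coin is biased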
 

Attachments

  • BayesianCoinFlip.xlsx