Bayes, Hume, lotteries and trustworthy newspapers

  • Context: Undergrad 
  • Thread starter: haushofer
SUMMARY

This discussion centers on the application of Bayes' theorem to evaluate the probability of winning a lottery based on a newspaper report. The probability of winning is set at 1 in 10 million, while the newspaper has a 2% error rate in reporting outcomes. Using Bayes' theorem, participants explore the calculation of P(won | article contains no error) and the implications of the newspaper's reliability on the perceived probability of winning. The conversation highlights the differences between lottery outcomes and medical testing scenarios, emphasizing the importance of understanding conditional probabilities in both contexts.

PREREQUISITES
  • Understanding of Bayes' theorem and its application in probability theory
  • Familiarity with conditional probability concepts
  • Knowledge of basic statistics, particularly in interpreting probabilities
  • Experience with real-world examples of probability, such as lottery systems and medical testing
NEXT STEPS
  • Study the detailed workings of Bayes' theorem in various contexts, including medical testing and gambling scenarios
  • Explore the implications of false positives and false negatives in probability assessments
  • Investigate the relationship between independent and dependent events in probability theory
  • Learn about advanced statistical methods for evaluating the reliability of information sources
USEFUL FOR

Mathematicians, statisticians, data scientists, and anyone interested in applying probability theory to real-world scenarios, particularly in evaluating the reliability of information sources such as news reports and medical tests.

  • #31
haushofer said:
I regard the a priori probability P(H) that unicorns exist as 1 in ten million, and I regard the trustworthiness of the person as P(D|H)=0.98; if unicorns actually exist, the probability is 98% that this person saw it right. Can I now similarly conclude that it is highly likely that unicorns actually exist, i.e. P(H|D)≈1? I'd say no, but I can't pinpoint why your reasoning would fail for the unicorn case.
His reasoning doesn’t fail, but he made an unstated assumption about ##P(D|\neg H)##. That assumption is the most reasonable one for the lottery, but perhaps not for the unicorn. You should evaluate ##P(D|\neg H)## for both cases and see how they differ.
 
  • #32
haushofer said:
Imagine someone is claiming that he saw a unicorn. I know, unicorns are not seen every day, while people win lotteries every day, but imagine that I put the same numbers on this case: I regard the a priori probability P(H) that unicorns exist as 1 in ten million, and I regard the trustworthiness of the person as P(D|H)=0.98; if unicorns actually exist, the probability is 98% that this person saw it right. Can I now similarly conclude that it is highly likely that unicorns actually exist, i.e. ##P(H|D) \approx 1##? I'd say no, but I can't pinpoint why your reasoning would fail for the unicorn case.

Well, you should work through the unicorn case. The Bayesian rule is:

##P(H|D) = \frac{P(D|H) P(H)}{P(D)}##

If we assume that the data is very accurate, then ##P(D|H) \approx 1##, and so it reduces approximately to:

##P(H|D) \approx \frac{P(H)}{P(D)}##

We can also write: ##P(D) = P(D|H) P(H) + P(D|\neg H) P(\neg H)##

If we assume that ##P(D|\neg H) \gg P(H)##, and ##P(\neg H) \approx 1## then we can make another approximation:

##P(D) \approx P(D|\neg H)##

So we have: ##P(H|D) \approx \frac{P(H)}{P(D|\neg H)}##

The difference between the unicorn case and the lottery case is the relative size of ##P(D|\neg H)##
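As a rough numerical check of the approximation above, here is a minimal Python sketch that compares the exact Bayes formula with ##P(H)/P(D|\neg H)##. The function names and the three trial values for ##P(D|\neg H)## are assumptions chosen purely for illustration; the thread itself does not fix them at this point.

```python
def exact_posterior(prior, p_d_given_h, p_d_given_not_h):
    """Exact Bayes: P(H|D) = P(D|H) P(H) / P(D), with P(D) from total probability."""
    p_d = p_d_given_h * prior + p_d_given_not_h * (1.0 - prior)
    return p_d_given_h * prior / p_d


def approx_posterior(prior, p_d_given_not_h):
    """Approximation P(H|D) ~ P(H) / P(D|not H); only valid if P(D|not H) >> P(H)."""
    return prior / p_d_given_not_h


prior = 1e-7      # P(H): 1 in ten million
accuracy = 0.98   # P(D|H)

# Hypothetical values of P(D|not H), from "false reports are common"
# down to "false reports are rarer than the prior itself".
for p_false in (0.02, 1e-4, 2e-9):
    exact = exact_posterior(prior, accuracy, p_false)
    approx = approx_posterior(prior, p_false)
    print(f"P(D|~H)={p_false:.0e}  exact={exact:.3e}  approx={approx:.3e}")
```

When ##P(D|\neg H)## is much larger than ##P(H)##, the two numbers agree closely and the posterior stays tiny. When ##P(D|\neg H)## is smaller than ##P(H)##, the approximation exceeds 1 and is no longer usable, while the exact formula gives a posterior close to ##P(D|H)##.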
 
  • #33
There is a slight variant of the lottery example that makes a huge difference in the computed probability. Instead of the newspaper reporting the lottery winner with 98% accuracy, assume that there is a website where you can enter your ticket number, and it tells you, with 98% accuracy, whether you are the winner or not. Even though that sounds similar to the original problem, in this case, it's much more likely that the website is in error than that you are the actual winner.
 
  • #34
Dale said:
I agree. Fairness and random errors are the best assumptions for this problem.

It is just that your post sounded to me like you were trying to claim that the posterior odds must be 49:1 regardless of any other assumptions, simply because that is the rate of accurate calls by the paper.

My point is that your outcome requires more than just the newspaper accuracy, it also requires an assumption about ##P(D|\neg H)##. That assumption is the most natural one in my opinion and probably what was intended, but it is an additional assumption nonetheless.

Yes, the book I quoted is vague about this statement and I was a bit sloppy :P
 
  • #35
stevendaryl said:
Well, you should work through the unicorn case. The Bayesian rule is:

##P(H|D) = \frac{P(D|H) P(H)}{P(D)}##

If we assume that the data is very accurate, then ##P(D|H) \approx 1##, and so it reduces approximately to:

##P(H|D) \approx \frac{P(H)}{P(D)}##

We can also write: ##P(D) = P(D|H) P(H) + P(D|\neg H) P(\neg H)##

If we assume that ##P(D|\neg H) \gg P(H)##, and ##P(\neg H) \approx 1## then we can make another approximation:

##P(D) \approx P(D|\neg H)##

So we have: ##P(H|D) \approx \frac{P(H)}{P(D|\neg H)}##

The difference between the unicorn case and the lottery case is the relative size of ##P(D|\neg H)##
Yes, I will consider that term more closely.

I must say, working through these examples and the help here is really enlightening!
 
  • #36
stevendaryl said:
There is a slight variant of the lottery example that makes a huge difference in the computed probability. Instead of the newspaper reporting the lottery winner with 98% accuracy, assume that there is a website where you can enter your ticket number, and it tells you, with 98% accuracy, whether you are the winner or not. Even though that sounds similar to the original problem, in this case, it's much more likely that the website is in error than that you are the actual winner.
I have to think about that one.
 
  • #37
I think I worked out the lottery now, making some extra assumptions about the report. Maybe I'll put it here for completeness.

stevendaryl said:
There is a slight variant of the lottery example that makes a huge difference in the computed probability. Instead of the newspaper reporting the lottery winner with 98% accuracy, assume that there is a website where you can enter your ticket number, and it tells you, with 98% accuracy, whether you are the winner or not. Even though that sounds similar to the original problem, in this case, it's much more likely that the website is in error than that you are the actual winner.
Could you explain this in more detail?
 
  • #38
haushofer said:
I think I worked out the lottery now, making some extra assumptions about the report. Maybe I'll put it here for completeness. Could you explain this in more detail?

Imagine the following process:
  1. You call the lottery office.
  2. They spin a dial to get a real number between 0 and 1.
  3. If the number is less than 0.98, they tell you the truth about whether you won the lottery or not.
  4. If the number is between 0.98 and 1, they lie to you about it.
Now, you call up the office and ask whether you won the lottery, or not. Before you even talk to anyone, there are 4 possibilities:
  1. You won the lottery and they tell you the truth. The probability of this is ##10^{-10} \cdot 0.98##
  2. You won the lottery and they lie to you: The probability of this is ##0^{-10} \cdot 0.02##
  3. You did not win the lottery, and they tell you the truth. The probability of this is: ##(1-10^{-10}) \cdot 0.98##
  4. You did not win the lottery, and they tell you a lie. The probability of this is: ##(1-10^{-10}) \cdot 0.02##
After you talk to someone and they tell you that you won, you can eliminate possibilities 2 and 3. So that leaves 1 and 4. 4 is much more likely than 1.
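The four possibilities can be tabulated in a few lines of Python. This is only a sketch of the process described above: it uses the prior of ##10^{-10}## and the 0.98 truth rate from this post, and the variable names are made up for the example.

```python
p_win = 1e-10    # prior probability of winning, as used in this post
p_truth = 0.98   # dial lands below 0.98: the office tells the truth

# Joint probabilities of the four possibilities before the call.
joint = {
    ("won", "truth"): p_win * p_truth,              # 1: told "you won"
    ("won", "lie"): p_win * (1 - p_truth),          # 2: told "you lost"
    ("lost", "truth"): (1 - p_win) * p_truth,       # 3: told "you lost"
    ("lost", "lie"): (1 - p_win) * (1 - p_truth),   # 4: told "you won"
}

# Being told "you won" eliminates possibilities 2 and 3.
told_won = joint[("won", "truth")] + joint[("lost", "lie")]
p_won_given_told = joint[("won", "truth")] / told_won

print(f"P(told 'you won')       = {told_won:.4f}")
print(f"P(won | told 'you won') = {p_won_given_told:.2e}")
```

With these numbers almost all of the probability of hearing "you won" comes from possibility 4, so the posterior probability of actually having won remains far below one in a million.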
 
  • #39
The newspaper equivalent would be "if you don't win and they print a wrong name, they always print your name".
 
  • #40
stevendaryl said:
Imagine the following process:
  1. You call the lottery office.
  2. They spin a dial to get a real number between 0 and 1.
  3. If the number is less than 0.98, they tell you the truth about whether you won the lottery or not.
  4. If the number is between 0.98 and 1, they lie to you about it.
Now, you call up the office and ask whether you won the lottery, or not. Before you even talk to anyone, there are 4 possibilities:
  1. You won the lottery and they tell you the truth. The probability of this is ##10^{-10} \cdot 0.98##
  2. You won the lottery and they lie to you: The probability of this is ##0^{-10} \cdot 0.02##
  3. You did not win the lottery, and they tell you the truth. The probability of this is: ##(1-10^{-10}) \cdot 0.98##
  4. You did not win the lottery, and they tell you a lie. The probability of this is: ##(1-10^{-10}) \cdot 0.02##
After you talk to someone and they tell you that you won, you can eliminate possibilities 2 and 3. So that leaves 1 and 4. 4 is much more likely than 1.
Thanks, that makes sense. (you missed a 1 in option 2, btw; ##0^{-10} \rightarrow 10^{-10}## ;) )

I must say I like this lottery stuff; it shows the subtleties involved in these calculations.
 
  • #41
So let me write down how I worked out this example. I assumed three things:

* (1) the lottery is fair, so a priori every number has the same probability of winning
* (2) every winning lottery number has the same probability of being printed
* (3) every losing lottery number has the same probability of being printed

So there are no biases concerning rightly or wrongly printing. Now imagine the lottery numbers range from 1 to 10,000,000. We (as PF-forum, happy to share) drew number 42. And indeed, the newspaper reports that number 42 won!

The a priori probability that lottery number ##x## wins is denoted ##P(x)##, and the probability that the newspaper reports number ##y## as the winner is denoted ##P(paper:y)##. So we have, according to (1), ##P(42)=\frac{1}{10000000}=10^{-7}##. Also, a reliability of 98% means that ##P(paper:42 | 42)=0.98##. And finally, from (3), we can deduce that ##P(paper:42 | \neg 42) = \frac{0.02}{9999999}##; all of the remaining 9,999,999 numbers (every number besides 42) have the same probability of being printed wrongly. If I put these numbers into Bayes, I get

##P(42|\text{paper:}42) = \frac{P(\text{paper:}42|42) P(42)}{P(\text{paper:}42|42) P(42) + P(\text{paper:}42|\neg 42)P(\neg 42)}##

which gives me as answer

##P(42|\text{paper:}42) = 0.9998 = 99.98 \%##

The only thing I'm not sure of is whether it makes sense that this number exceeds the initial reliability of 98%. But this should be right, right?

-edit: resolved, made an arithmetic error in converting percentages.
 
  • #42
mfb said:
The newspaper equivalent would be "if you don't win and they print a wrong name, they always print your name".

Yes. So I guess that's why

##P(paper:42 | \neg 42) = \frac{0.02}{9999999}##

instead of

##P(paper:42 | \neg 42) = 0.02 ##.
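To make the contrast between these two choices concrete, here is a small Python sketch that plugs each value of ##P(paper:42 | \neg 42)## into Bayes' formula using exact fractions. The prior of ##10^{-7}## and the 98% reliability are the numbers from this thread; the function and variable names are assumptions for the example.

```python
from fractions import Fraction

N = 10_000_000              # number of lottery tickets
prior = Fraction(1, N)      # P(42)
hit = Fraction(98, 100)     # P(paper:42 | 42)


def posterior(p_false):
    """P(42 | paper:42) for a given P(paper:42 | not 42)."""
    return hit * prior / (hit * prior + p_false * (1 - prior))


# Errors spread evenly over the other N - 1 numbers.
spread = posterior(Fraction(2, 100) / (N - 1))

# Every error prints number 42 (the "always print your name" case).
always_42 = posterior(Fraction(2, 100))

print("errors spread evenly:", spread, "=", float(spread))
print("errors always print 42:", always_42, "=", float(always_42))
```

In the first case the posterior is exactly 49/50; in the second it falls to a few parts in a million, which is the same situation as the phone-call example.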
 
  • #43
Assume you only have one ticket and there is only one winner/number published...

If your number is in the paper your odds of having won = 49/50.

But it's also possible you won and the paper made a mistake = 1/10m x 1/50

Just add these together.
 
  • #44
haushofer said:
So let me write down how I worked out this example. I assumed three things:

* (1) the lottery is fair, so a priori every number has the same probability of winning
* (2) every winning lottery number has the same probability of being printed
* (3) every losing lottery number has the same probability of being printed

So there are no biases concerning rightly or wrongly printing. Now imagine the lottery numbers range from 1 to 10,000,000. We (as PF-forum, happy to share) drew number 42. And indeed, the newspaper reports that number 42 won!

The a priori probability that lottery number ##x## wins is denoted ##P(x)##, and the probability that the newspaper reports number ##y## as the winner is denoted ##P(paper:y)##. So we have, according to (1), ##P(42)=\frac{1}{10000000}=10^{-7}##. Also, a reliability of 98% means that ##P(paper:42 | 42)=0.98##. And finally, from (3), we can deduce that ##P(paper:42 | \neg 42) = \frac{0.02}{9999999}##; all of the remaining 9,999,999 numbers (every number besides 42) have the same probability of being printed wrongly. If I put these numbers into Bayes, I get

##P(42|\text{paper:}42) = \frac{P(\text{paper:}42|42) P(42)}{P(\text{paper:}42|42) P(42) + P(\text{paper:}42|\neg 42)P(\neg 42)}##

which gives me as answer

##P(42|\text{paper:}42) = 0.9998 = 99.98 \%##

The only thing I'm not sure of is whether it makes sense that this number exceeds the initial reliability of 98%. But this should be right, right?

It can't be as high as that if the newspaper is only 98% accurate.
 
  • #45
PeroK said:
It can't be as high as that if the newspaper is only 98% accurate.

Mmmm, so where's the mistake then you think?
 
  • #46
haushofer said:
Mmmm, so where's the mistake then you think?
You lost me, I'm afraid. If we let A be the event that you win the lottery and B the event that the newspaper prints your number, then ##P(A) = P(B) = 1/N##, where N is the number of lottery tickets. This, as others have pointed out, depends on the basic assumptions of no bias.

Then, Bayes' theorem reduces to:

##P(A|B) = P(B|A) = 0.98##

Another approach, which I would recommend, is to use a probability tree:
 
  • #47
... for example, if we complicate matters by the assumption that sometimes the paper prints an invalid lottery number. Say, 0.5% of the time. Then 1.5% of the time it prints a valid but incorrect winning number. And 98% of the time it prints the correct winning number.

A probability tree is ideal for handling that and avoids the difficulties of the Bayes formula as the number of options increases.
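As an illustration, the tree for this three-outcome newspaper can be written out branch by branch in Python. This is a sketch under extra assumptions not stated in the post: there are ten million tickets, a valid-but-wrong number is equally likely to be any ticket other than the true winner, and an invalid number never matches a real ticket. All names are made up for the example.

```python
from fractions import Fraction

N = 10_000_000                       # assumed number of valid tickets
p_correct = Fraction(980, 1000)      # paper prints the true winning number
p_valid_wrong = Fraction(15, 1000)   # paper prints a valid but wrong number
p_invalid = Fraction(5, 1000)        # paper prints a number no ticket carries

# Branch A: ticket 42 wins and the paper prints the correct number.
branch_win = Fraction(1, N) * p_correct

# Branch B: another ticket wins, the paper prints a valid wrong number,
# and that wrong number happens to be 42 (assumed uniform over N - 1 tickets).
branch_lose = Fraction(N - 1, N) * p_valid_wrong * Fraction(1, N - 1)

# Invalid prints never show 42, so they add no further branch ending in "paper:42".
p_paper_42 = branch_win + branch_lose
posterior = branch_win / p_paper_42

print("P(paper:42)      =", p_paper_42)
print("P(42 | paper:42) =", posterior, "=", float(posterior))
```

Under these assumptions the posterior comes out slightly above 0.98, because the invalid misprints can never be mistaken for a real ticket.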
 
  • #48
PeroK said:
You lost me, I'm afraid.
haushofer said:
Mmmm, so where's the mistake then you think?
Never mind, made an arithmetic error. The probability is just 0.98 if you fill in the numbers:

##P(42|\text{paper:}42) = 0.98 = 98 \%##

Intuitively, I understand it as the probability

##P(paper:42 | \neg 42) = \frac{0.02}{9999999}##

being too small to affect the initial reliability of ##P(paper:x | x)=0.98## of the newspaper.
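The 98% result can also be checked by simulation. The sketch below is only illustrative: it uses a small lottery of 100 tickets (an assumption, so that "the paper prints 42" happens often enough to count), a fixed random seed, and the three no-bias assumptions from post #41. The function name and parameters are made up for the example.

```python
import random


def simulate(n_tickets=100, accuracy=0.98, trials=200_000, seed=0):
    """Estimate P(42 won | paper prints 42) for a fair lottery with unbiased errors."""
    rng = random.Random(seed)
    my_ticket = 42
    printed_mine = 0
    printed_mine_and_won = 0
    for _ in range(trials):
        winner = rng.randrange(1, n_tickets + 1)   # fair draw from 1..n_tickets
        if rng.random() < accuracy:
            printed = winner                       # correct report
        else:
            # Wrong report: uniform over the n_tickets - 1 other numbers.
            printed = rng.randrange(1, n_tickets)
            if printed >= winner:
                printed += 1
        if printed == my_ticket:
            printed_mine += 1
            if winner == my_ticket:
                printed_mine_and_won += 1
    return printed_mine_and_won / printed_mine


estimate = simulate()
print(f"estimated P(42 won | paper prints 42) = {estimate:.3f}")
```

The estimate should land near 0.98 up to sampling noise, and the same holds for other ticket counts, since under these assumptions the result does not depend on N.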
 
