Bayes, Hume, lotteries and trustworthy newspapers

Dale · Dec 23, 2018

haushofer said:

I regard the a priori probability ##P(H)## that unicorns exist as 1 in ten million, and I regard the trustworthiness of the person as ##P(D|H)=0.98##; if unicorns actually exist, the probability is 98% that this person saw it right. Can I now similarly conclude that it is highly likely that unicorns actually exist, i.e. ##P(H|D) \approx 1##? I'd say no, but I can't pinpoint why your reasoning would fail for the unicorn case.

His reasoning doesn’t fail, but he made an unstated assumption about ##P(D|\neg H)##. That assumption is the most reasonable one for the lottery, but perhaps not for the unicorn. You should evaluate ##P(D|\neg H)## for both cases and see how they differ.
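To make that suggestion concrete, here is a minimal Python sketch (not from the thread) that evaluates ##P(H|D)## for both cases with the same prior and the same ##P(D|H)=0.98##. The lottery value of ##P(D|\neg H)## assumes a fair lottery whose wrong reports are spread evenly over the other numbers, as worked out later in the thread; the unicorn false-report rate of 0.01 is a purely hypothetical number, and the helper name `posterior` is made up for illustration.

```python
def posterior(prior: float, p_d_given_h: float, p_d_given_not_h: float) -> float:
    """P(H|D) by Bayes' rule, expanding P(D) with the law of total probability."""
    numerator = p_d_given_h * prior
    evidence = numerator + p_d_given_not_h * (1.0 - prior)
    return numerator / evidence

N = 10_000_000          # lottery size, giving the 1-in-ten-million prior
prior = 1 / N
reliability = 0.98      # P(D|H), identical in both cases

# Lottery (assumption): a wrong report names one of the N-1 other numbers uniformly.
lottery_false_report = (1 - reliability) / (N - 1)
# Unicorn (hypothetical): how often someone reports a unicorn when none exist.
unicorn_false_report = 0.01

print(posterior(prior, reliability, lottery_false_report))  # about 0.98
print(posterior(prior, reliability, unicorn_false_report))  # about 9.8e-06
```

Only ##P(D|\neg H)## differs between the two calls, and it moves the posterior by five orders of magnitude.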

stevendaryl · Dec 23, 2018

haushofer said:

Imagine someone is claiming that he saw a unicorn. I know, unicorns are not seen every day, while people win lotteries every day, but imagine that I put the same numbers on this case: I regard the a priori probability ##P(H)## that unicorns exist as 1 in ten million, and I regard the trustworthiness of the person as ##P(D|H)=0.98##; if unicorns actually exist, the probability is 98% that this person saw it right. Can I now similarly conclude that it is highly likely that unicorns actually exist, i.e. ##P(H|D) \approx 1##? I'd say no, but I can't pinpoint why your reasoning would fail for the unicorn case.

Well, you should work through the unicorn case. The Bayesian rule is:

##P(H|D) = \frac{P(D|H) P(H)}{P(D)}##

If we assume that the data is very accurate, then ##P(D|H) \approx 1##, and so it reduces approximately to:

##P(H|D) \approx \frac{P(H)}{P(D)}##

We can also write: ##P(D) = P(D|H) P(H) + P(D|\neg H) P(\neg H)##

If we assume that ##P(D|\neg H) \gg P(H)##, and ##P(\neg H) \approx 1## then we can make another approximation:

##P(D) \approx P(D|\neg H)##

So we have: ##P(H|D) \approx \frac{P(H)}{P(D|\neg H)}##

The difference between the unicorn case and the lottery case is the relative size of ##P(D|\neg H)##
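A quick numerical check of when this approximation holds, written as a hedged Python sketch (not from the thread). Both values of ##P(D|\neg H)## are illustrative assumptions: 1e-2 stands in for an unreliable unicorn report, and 2e-9 is roughly 0.02 spread over ten million lottery numbers. The function names are hypothetical.

```python
def exact_posterior(prior, p_d_given_h, p_d_given_not_h):
    """Full Bayes' rule with P(D) expanded by total probability."""
    numerator = p_d_given_h * prior
    return numerator / (numerator + p_d_given_not_h * (1 - prior))

def approx_posterior(prior, p_d_given_not_h):
    """P(H)/P(D|not H); needs P(D|H) ~ 1, P(not H) ~ 1 and P(D|not H) >> P(H)."""
    return prior / p_d_given_not_h

prior = 1e-7
cases = {
    "unicorn-like": 1e-2,  # hypothetical false-report rate, much larger than P(H)
    "lottery-like": 2e-9,  # about 0.02 / 1e7, much smaller than P(H)
}
for label, p_false in cases.items():
    print(label, exact_posterior(prior, 0.98, p_false), approx_posterior(prior, p_false))
```

In the unicorn-like case the two results agree to within the 2% that comes from ##P(D|H)=0.98## rather than 1. In the lottery-like case the approximation returns 50, which is not a probability, because the condition ##P(D|\neg H) \gg P(H)## fails there; the exact formula still gives about 0.98.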

stevendaryl · Dec 23, 2018

There is a slight variant of the lottery example that makes a huge difference in the computed probability. Instead of the newspaper reporting the lottery winner with 98% accuracy, assume that there is a website where you can enter your ticket number, and it tells you, with 98% accuracy, whether you are the winner or not. Even though that sounds similar to the original problem, in this case, it's much more likely that the website is in error than that you are the actual winner.
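The contrast can be sketched numerically in Python (an illustration, not from the thread). The only assumed difference is ##P(D|\neg H)##: for the newspaper, a wrong report names one of the other ##N-1## numbers uniformly, while for the hypothetical website an error about your own ticket simply flips the answer.

```python
N = 10_000_000
prior = 1 / N
acc = 0.98

def posterior(prior, p_d_h, p_d_not_h):
    numerator = p_d_h * prior
    return numerator / (numerator + p_d_not_h * (1 - prior))

# Newspaper: an error prints one of the N-1 other numbers, so it rarely lands on yours.
newspaper = posterior(prior, acc, (1 - acc) / (N - 1))
# Website: a loser is told "you won" whenever the site errs, i.e. 2% of the time.
website = posterior(prior, acc, 1 - acc)

print(f"newspaper: {newspaper:.4f}")  # newspaper: 0.9800
print(f"website:   {website:.2e}")    # website:   4.90e-06
```

Same 98% accuracy, but a false "you won" from the website is about 200,000 times more likely than a real win.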

haushofer · Dec 23, 2018

Dale said:

I agree. Fairness and random errors are the best assumptions for this problem.

It is just that your post sounded to me like you were trying to claim that the posterior odds must be 49:1 regardless of any other assumptions, simply because that is the rate of accurate calls by the paper.

My point is that your outcome requires more than just the newspaper accuracy, it also requires an assumption about ##P(D|\neg H)##. That assumption is the most natural one in my opinion and probably what was intended, but it is an additional assumption nonetheless.

Yes, the book I quoted is vague about this statement and I was a bit sloppy :P

haushofer · Dec 23, 2018

stevendaryl said:

The difference between the unicorn case and the lottery case is the relative size of ##P(D|\neg H)##

Yes, I will consider that term more closely.

I must say, working through these examples and the help here is really enlightening!

haushofer · Dec 23, 2018

stevendaryl said:

There is a slight variant of the lottery example that makes a huge difference in the computed probability. Instead of the newspaper reporting the lottery winner with 98% accuracy, assume that there is a website where you can enter your ticket number, and it tells you, with 98% accuracy, whether you are the winner or not. Even though that sounds similar to the original problem, in this case, it's much more likely that the website is in error than that you are the actual winner.

I have to think about that one.

haushofer · Dec 29, 2018

I think I worked out the lottery now, making some extra assumptions about the report. Maybe I'll put it here for completeness.

stevendaryl said:

There is a slight variant of the lottery example that makes a huge difference in the computed probability. Instead of the newspaper reporting the lottery winner with 98% accuracy, assume that there is a website where you can enter your ticket number, and it tells you, with 98% accuracy, whether you are the winner or not. Even though that sounds similar to the original problem, in this case, it's much more likely that the website is in error than that you are the actual winner.

Could you explain this in more detail?

stevendaryl · Dec 29, 2018

haushofer said:

I think I worked out the lottery now, making some extra assumptions about the report. Maybe I'll put it here for completeness. Could you explain this in more detail?

Imagine the following process:

* You call the lottery office.
* They spin a dial to get a real number between 0 and 1.
* If the number is less than 0.98, they tell you the truth about whether you won the lottery or not.
* If the number is between 0.98 and 1, they lie to you about it.

Now, you call up the office and ask whether you won the lottery, or not. Before you even talk to anyone, there are 4 possibilities:

* (1) You won the lottery and they tell you the truth. The probability of this is ##10^{-10} \cdot 0.98##.
* (2) You won the lottery and they lie to you. The probability of this is ##10^{-10} \cdot 0.02##.
* (3) You did not win the lottery, and they tell you the truth. The probability of this is ##(1-10^{-10}) \cdot 0.98##.
* (4) You did not win the lottery, and they lie to you. The probability of this is ##(1-10^{-10}) \cdot 0.02##.

After you talk to someone and they tell you that you won, you can eliminate possibilities 2 and 3. So that leaves 1 and 4. 4 is much more likely than 1.
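To put a number on "much more likely", here is a small Python sketch (not from the thread) that enumerates the four branches with the prior of ##10^{-10}## used above and conditions on hearing "you won". The dictionary labels are made up for readability.

```python
p_win = 1e-10     # prior probability of winning, as in the four cases above
truth = 0.98      # probability the office tells the truth

branches = {
    "won, truth": p_win * truth,                # case (1)
    "won, lie": p_win * (1 - truth),            # case (2)
    "lost, truth": (1 - p_win) * truth,         # case (3)
    "lost, lie": (1 - p_win) * (1 - truth),     # case (4)
}
# Hearing "you won" is consistent only with cases (1) and (4).
told_won = branches["won, truth"] + branches["lost, lie"]
p_won_given_told_won = branches["won, truth"] / told_won

print(p_won_given_told_won)                             # about 4.9e-09
print(branches["lost, lie"] / branches["won, truth"])   # roughly 2 x 10^8
```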

mfb · Dec 29, 2018

The newspaper equivalent would be "if you don't win and they print a wrong name, they always print your name".

haushofer · Dec 30, 2018

stevendaryl said:

Imagine the following process:

You call the lottery office.

They spin a dial to get a real number between 0 and 1.

If the number is less than 0.98, they tell you the truth about whether you won the lottery or not.

If the number is between 0.98 and 1, they lie to you about it.

Now, you call up the office and ask whether you won the lottery, or not. Before you even talk to anyone, there are 4 possibilities:

You won the lottery and they tell you the truth. The probability of this is ##10^{-10} \cdot 0.98##

You won the lottery and they lie to you: The probability of this is ##0^{-10} \cdot 0.02##

You did not win the lottery, and they tell you the truth. The probability of this is: ##(1-10^{-10}) \cdot 0.98##

You did not win the lottery, and they tell you a lie. The probability of this is: ##(1-10^{-10}) \cdot 0.02##

After you talk to someone and they tell you that you won, you can eliminate possibilities 2 and 3. So that leaves 1 and 4. 4 is much more likely than 1.

Thanks, that makes sense. (you missed a 1 in option 2, btw; ##0^{-10} \rightarrow 10^{-10}## ;) )

I must say I like this lottery stuff; it shows the subtleties involved in these calculations.

haushofer · Dec 30, 2018

So let me write down how I worked out this example. I assumed three things:

* (1) the lottery is fair, so a priori every number has the same probability of winning
* (2) every winning lottery number has the same probability of being printed
* (3) every losing lottery number has the same probability of being printed

So there are no biases concerning rightly or wrongly printing. Now imagine the lottery numbers range from 1 to 10.000.000. We (as PF-forum, happy to share) drew number 42. And indeed, the newspaper reports that number 42 won!

The a priori probability of lottery number ##x## to win is denoted as ##P(x)##, and the probability that the newspaper reports that number ##y## won is denoted as ##P(paper:y)##. So we have, according to (1), ##P(42)=\frac{1}{10000000}=10^{-7}##. Also, a reliability of 98% means that ##P(paper:42 | 42)=0.98##. And finally, from (3), we can deduce that ##P(paper:42 | \neg 42) = \frac{0.02}{9999999}##: when the paper is wrong, each of the 9.999.999 numbers other than the actual winner (42 among them) is equally likely to be printed. If I put these numbers into Bayes, I get

##P(42|\text{paper:}42) = \frac{P(\text{paper:}42|42) P(42)}{P(\text{paper:}42|42) P(42) + P(\text{paper:}42|\neg 42)P(\neg 42)}##

which gives me as answer

##P(42|\text{paper:}42) = 0.9998 = 99.98 \%##

The only thing I'm not sure of is whether it makes sense that this number exceeds the initial reliability of 98%. But this should be right, right?

Edit: resolved; I made an arithmetic error in converting percentages.
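One way to avoid percentage-conversion slips is to do the substitution with exact rational arithmetic. The Python sketch below (not from the thread) uses the standard `fractions` module and the three assumptions listed above; the variable names are illustrative only.

```python
from fractions import Fraction

N = 10_000_000
p42 = Fraction(1, N)                                # assumption (1): fair lottery
p_print_given_42 = Fraction(98, 100)                # newspaper reliability
p_print_given_not42 = Fraction(2, 100) / (N - 1)    # assumption (3): errors spread evenly

numerator = p_print_given_42 * p42
denominator = numerator + p_print_given_not42 * (1 - p42)
result = numerator / denominator

print(result)         # 49/50
print(float(result))  # 0.98
```

The factor ##(N-1)## cancels between ##P(\text{paper:}42|\neg 42)## and ##P(\neg 42)##, so the posterior comes out at exactly 98%.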

haushofer · Dec 30, 2018

mfb said:

The newspaper equivalent would be "if you don't win and they print a wrong name, they always print your name".

Yes. So I guess that's why

##P(paper:42 | \neg 42) = \frac{0.02}{9999999}##

instead of

##P(paper:42 | \neg 42) = 0.02 ##.

CWatters · Dec 30, 2018

Assume you only have one ticket and there is only one winner/number published...

If your number is in the paper, your odds of having won = 49/50.

But it's also possible you won and the paper made a mistake = 1/10m x 1/50

Just add these together.

PeroK · Dec 30, 2018

haushofer said:

##P(42|\text{paper:}42) = 0.9998 = 99.98 \%##

The only thing I'm not sure of is whether it makes sense that this number exceeds the initial reliability of 98%. But this should be right, right?

It can't be as high as that if the newspaper is only 98% accurate.

haushofer · Dec 30, 2018

PeroK said:

It can't be as high as that if the newspaper is only 98% accurate.

Mmmm, so where do you think the mistake is, then?

PeroK · Dec 30, 2018

haushofer said:

Mmmm, so where do you think the mistake is, then?

You lost me, I'm afraid. If we let A be the event that you win the lottery and B the event that the newspaper prints your number, then ##P(A) = P(B) = 1/N##, where N is the number of lottery tickets. This, as others have pointed out, depends on the basic assumptions of no bias.

Then, since ##P(A) = P(B)##, Bayes' theorem reduces to:

##P(A|B) = P(B|A) = 0.98##
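The step ##P(B) = 1/N## can be checked with the law of total probability. This Python sketch (not from the thread) assumes, as above, that a wrong print lands evenly on the ##N-1## non-winning numbers; the helper name `check` is hypothetical. It confirms ##P(B)=1/N## and ##P(A|B)=49/50## for several lottery sizes.

```python
from fractions import Fraction

def check(N):
    p_a = Fraction(1, N)                            # P(A): you hold the winning ticket
    p_b_given_a = Fraction(98, 100)                 # paper prints the true winner
    p_b_given_not_a = Fraction(2, 100) / (N - 1)    # an error happens to land on your number
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b, p_b_given_a * p_a / p_b

for N in (10, 1000, 10_000_000):
    p_b, p_a_given_b = check(N)
    print(N, p_b == Fraction(1, N), p_a_given_b)    # e.g. "10 True 49/50"
```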

Another approach, which I would recommend, is to use a probability tree:

PeroK · Dec 30, 2018

... for example, if we complicate matters by the assumption that sometimes the paper prints an invalid lottery number. Say, 0.5% of the time. Then 1.5% of the time it prints a valid but incorrect winning number. And 98% of the time it prints the correct winning number.

A probability tree is ideal for handling that and avoids the difficulties of the Bayes formula as the number of options increases.
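A sketch of that tree in Python (an illustration, not PeroK's own working). It assumes, beyond what the post states, that the 1.5% of valid-but-wrong prints are spread evenly over the ##N-1## non-winning valid numbers, in the spirit of assumption (3) earlier in the thread. Invalid prints can never show your (valid) number, so that branch drops out.

```python
from fractions import Fraction

N = 10_000_000
p_correct = Fraction(98, 100)       # prints the true winning number
p_wrong_valid = Fraction(15, 1000)  # prints a valid but incorrect number
p_invalid = Fraction(5, 1000)       # prints an invalid number
assert p_correct + p_wrong_valid + p_invalid == 1

p_yours = Fraction(1, N)
# Leaf 1: your ticket won and the paper prints it correctly.
leaf_won = p_yours * p_correct
# Leaf 2 (assumption): another ticket won and a wrong valid print lands on yours.
leaf_lost = (1 - p_yours) * p_wrong_valid / (N - 1)

posterior = leaf_won / (leaf_won + leaf_lost)
print(posterior, float(posterior))  # 196/199, about 0.985
```

Under these assumptions the answer is ##0.98/0.995##, slightly above 98%, because some of the paper's errors produce invalid numbers that could never be mistaken for your ticket.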

haushofer · Dec 30, 2018

PeroK said:

You lost me, I'm afraid.

haushofer said:

Mmmm, so where do you think the mistake is, then?

Never mind, made an arithmetic error. The probability is just 0.98 if you fill in the numbers:

##P(42|\text{paper:}42) = 0.98 = 98 \%##

Intuitively, I understand it as the probability

##P(paper:42 | \neg 42) = \frac{0.02}{9999999}##

being too small to affect the initial reliability of ##P(paper:x | x)=0.98## of the newspaper.
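The 98% result can also be checked by simulation. The Python sketch below (not from the thread) uses a small, hypothetical number of tickets so that "the paper prints our number" happens often enough to count; simulating ten million tickets would need vastly more trials. It keeps the thread's assumptions: a fair draw, and wrong reports spread uniformly over the numbers other than the winner. The function name and the choice of ticket 0 are illustrative.

```python
import random

def simulate(n_tickets, reliability=0.98, trials=200_000, seed=1):
    """Estimate P(ticket 0 won | paper prints ticket 0) by simulating many draws."""
    rng = random.Random(seed)
    ours = 0                 # hypothetical: we always hold ticket 0
    printed_ours = 0         # draws where the paper printed our number
    won_and_printed = 0      # ...and our ticket really was the winner
    for _ in range(trials):
        winner = rng.randrange(n_tickets)          # assumption (1): fair lottery
        if rng.random() < reliability:
            printed = winner                       # correct report
        else:
            # Assumption (3): a wrong report names one of the other tickets uniformly.
            printed = rng.randrange(n_tickets - 1)
            if printed >= winner:
                printed += 1
        if printed == ours:
            printed_ours += 1
            if winner == ours:
                won_and_printed += 1
    return won_and_printed / printed_ours

for n in (5, 20, 100):
    print(n, round(simulate(n), 3))   # each estimate should be close to 0.98
```

The estimate stays near 0.98 whatever the number of tickets: a smaller prior is matched by a proportionally smaller chance that an error lands on your particular number.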
