Bayes, Hume, lotteries and trustworthy newspapers

haushofer · Dec 21, 2018

Dear all,

Imagine you're participating in a lottery, with a probability of 1 in 10 million to win. Then your newspaper is giving the outcomes of the lottery, and it turns out: you've won! But being a sceptic, you doubt the reliability of the newspaper. Say, historically this newspaper publishes the wrong lottery outcome 1 in every 50 of their lottery-reports (just say; it would be pretty bad!). How would you calculate the probability that you've actually won?

So, given:

[tex] P(\text{won}) = 10^{-7}, P(\text{article contains error}) = 0.02, P(\text{article contains no error}) = 0.98[/tex]

what's

[tex] P(won | article \ contains \ no \ error)[/tex]

? The book argues:

"Why is it reasonable to believe you've won? Because although the probability of you winning is extremely small, the probability of the newspaper giving you the correct outcome is much bigger."

Somehow, I feel a bit itchy about this argument (as about a lot of other arguments in the book) and I would be tempted to use Bayes' theorem.

I know about the well-known examples of testing for illness, where a very reliable test can still give a lot of false outcomes if the illness is a priori very rare; a particular example is e.g. an HIV-test; if a priori one out of ten thousand people is infected, and the reliability of the test is such that it gives a correct result for 99% of infected people and a correct result for 99.9% of not-infected people, then Bayes theorem gives with

[tex] P(infected) = \frac{1}{10000}, \ \ \ P(positive| infected) = 0.99 , \ \ \ P(negative| infected) = 0.01 \\ P(negative | not \ infected) = 0.999 , \ \ \ P(positive | not \ infected) = 0.001[/tex]

the outcome

[tex] P(infected | positive) = 0.09 = 9 \%[/tex]
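
A quick numerical check of that figure, as a minimal sketch. The function name `posterior` and its argument names are just illustrative labels, not anything from the book:

```python
def posterior(prior, p_pos_given_h, p_pos_given_not_h):
    """Bayes' theorem for a binary hypothesis: returns P(H | positive)."""
    # Total probability of a positive result (true positives + false positives)
    evidence = p_pos_given_h * prior + p_pos_given_not_h * (1 - prior)
    return p_pos_given_h * prior / evidence

# Numbers from the HIV example above
p = posterior(prior=1 / 10000, p_pos_given_h=0.99, p_pos_given_not_h=0.001)
print(f"P(infected | positive) = {p:.3f}")
```

With these inputs the result comes out at roughly 0.09, matching the 9% quoted.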

This makes sense: out of 10,000 people we expect one infected person, while the test also gives approximately 10 false positives, making the probability ##P(infected | positive)## approximately 1 out of 11. Bayes' theorem indicates that ##P(infected | positive)## is proportional to ##P(infected)##, and intuitively I'd say this is also true for our lottery. But then again, the lottery case is different, because every participant receives the same outcome of the newspaper. So my questions are basically:

[*] How does the illness example relate to the lottery example?
[*] What's the probability of me having actually won the lottery given the positive newspaper outcome, ##P(won | article \ contains \ no \ error)##? Can I trust my newspaper given the highly improbable event of winning?
[*] What do you think about the book's argument "It's reasonable to trust the newspaper because although the probability of you winning is extremely small, the probability of the newspaper giving you the correct outcome is much bigger."?
[*] Can the lottery example indeed be used to motivate the existence of highly improbable events? I.e. is the probability for the occurrence of highly improbable events independent of the a priori probability of these events (whatever that number may be), contrary to the illness example?

Thanks in advance!

WWGD · Dec 21, 2018

But someone has to win. The fact that it was you, I don't think, makes it into one.

haushofer · Dec 21, 2018

So the mantra of "extraordinary claims need extraordinary evidence" does not really apply here, because winning a lottery is not extraordinary; we see it happening every day, probably by other people. But still I wonder how you would calculate the conditional probability of winning. Maybe I'm overlooking something very basic here.

Bayes' theorem would read

[tex] P(won | article \ contains \ no \ error) = \frac{P(contains \ no \ error | won) P(won)}{P(no \ error)}[/tex]

but what's the conditional probability ##P(contains \ no \ error | won)##? Is there anything sensible to say about this?

Klystron · Dec 21, 2018

In the example from the book the newspaper only influences the reader's knowledge of the lottery outcome. Nothing the newspaper publishes changes the outcomes. Also, the lottery outcome and the contents of the newspaper report are not independent events.

The events "lottery was held" and "newspaper was published" are independent. The outcome of the lottery is independent of the newspaper report. The information in the newspaper report of the lottery seems highly dependent on the lottery outcome even if nobody won. [Dale, thanks for clarifying. Focus on the information probabilities.]

Dale · Dec 21, 2018

haushofer said:

Imagine you're participating in a lottery, with a probability of 1 in 10 million to win. Then your newspaper is giving the outcomes of the lottery, and it turns out: you've won! But being a sceptic, you doubt the reliability of the newspaper. Say, historically this newspaper publishes the wrong lottery outcome 1 in every 50 of their lottery-reports (just say; it would be pretty bad!). How would you calculate the probability that you've actually won?

This is a pretty straightforward application of Bayes theorem. $$ P(H|D)=\frac{P(D|H) P(H)}{P(D)}$$

Here H is the hypothesis “you won the lottery” and D is the data “the newspaper reports that you won”. P(H) is easy, 1 in 10 million. P(D|H) is 0.98. The final piece would be more difficult to calculate, it is the probability of the data. That would probably best be estimated as roughly the number of tickets you bought divided by the total number of tickets sold.

haushofer said:

[tex] P(won | article \ contains \ no \ error) = \frac{P(contains \ no \ error | won) P(won)}{P(no \ error)}[/tex]

but what's the conditional probability ##P(contains \ no \ error | won)##? Is there anything sensible to say about this?

The evidence here is not "contains no error"; it is that the newspaper published that you won.

haushofer · Dec 21, 2018

Dale said:

This is a pretty straightforward application of Bayes theorem. $$ P(H|D)=\frac{P(D|H) P(H)}{P(D)}$$

Here H is the hypothesis “you won the lottery” and D is the data “the newspaper reports that you won”. P(H) is easy, 1 in 10 million. P(D|H) is 0.98. The final piece would be more difficult to calculate, it is the probability of the data. That would probably best be estimated as roughly the number of tickets you bought divided by the total number of tickets sold.

The evidence here is not "contains no error"; it is that the newspaper published that you won.

Ok, so maybe this is my confusion. But if I fill in the numbers in your notation,

$$ P(H|D)=\frac{P(D|H) P(H)}{P(D)} = \frac{9.8 \times 10^{-8}}{P(D)}$$, that would mean that only a ridiculously low probability of the data P(D) would make P(H|D), my trust in the newspaper, larger than 50%. Only if ##P(D) < 1.96 \times 10^{-7}## do we have ##P(H|D) > 0.5##. What does that mean?

-edit: wait, I now relate it to your statement "The final piece would be more difficult to calculate, it is the probability of the data. That would probably best be estimated as roughly the number of tickets you bought divided by the total number of tickets sold." Of course, the probability that the newspaper reports that I won is roughly the same as the probability that I actually won, because these two events are highly dependent. Is that it?

I guess I'm confused about the meaning of the evidence, comparing it to the illness-example.

Dale · Dec 21, 2018

haushofer said:

Of course, the probability that the newspaper reports that I won is roughly the same as the probability that I actually won, because these two events are highly dependent

Yes. In situations like this it is often convenient to write ##P(D)=P(D|H)P(H) + P(D|\neg H)P(\neg H)##. In this case ##P(D|H) \approx 1 \approx P(\neg H)## so ##P(D) \approx P(H) + P(D|\neg H)##. When I wrote the above I had thought that this sum would be dominated by ##P(D|\neg H)## but now I am not sure.

haushofer said:

I guess I'm confused about the meaning of the evidence, comparing it to the illness-example.

The evidence is just whatever observation should update your belief about the hypothesis. In this case, the newspaper report should update your belief.

Ygggdrasil · Dec 21, 2018

haushofer said:

How does the illness example relate to the lottery example?

These examples are very different. In the HIV test example, a false positive means that the test reports that you have the illness when you do not.

In the newspaper example, the newspaper making an error does not necessarily mean that it will falsely report that you won the lottery. It may falsely report that Dale has won the lottery or that I have won the lottery when in fact we have not. The probability of the paper falsely reporting that you have won the lottery is quite small: it is the probability of the report being erroneous (1/50) multiplied by the probability that your name was erroneously reported out of all possible names they could erroneously report (e.g. if there are 10 million people in the pool, then the probability of a particular individual being erroneously named as the winner is 1 in 500 million versus a 1 in 10 million chance of winning). So, while the probability of an error is high, that error is very unlikely to affect you.

The only case where the probabilities of a false positive would be similar is if the number of names in the lottery pool was less than 200,000 and the chances of winning remain at 1 in 10 million (calculation of why is left as an exercise to the reader).
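
One way to do that exercise, assuming an erroneous report names one of the ##N## people in the pool uniformly at random (an assumption the example does not state explicitly):

$$ P(\text{falsely named}) = \frac{1}{50} \cdot \frac{1}{N} \geq 10^{-7} \iff N \leq \frac{10^{7}}{50} = 200{,}000 $$

So under that assumption, only when the pool shrinks to about 200,000 names does the chance of being falsely named catch up with the 1 in 10 million chance of actually winning.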

mfb · Dec 21, 2018

We are missing a critical point from the newspaper: If they are wrong, what do they publish? A random other ticket's name out of the 10 million? Or your name in particular because the editor likes you? Or something else?
It should be clear how this influences the probability.

haushofer said:

How would you calculate the probability that you've actually won?

This is different from what people have been calculating in the first few posts. You want ##P(\text{won}|\text{newspaper publishes your name}) = \frac{P(\text{won} \& \text{newspaper publishes your name})}{P(\text{newspaper publishes your name})}##. The numerator is ##P(\text{won} \& \text{newspaper publishes your name}) = 10^{-7} \cdot 0.98## while the denominator depends on our assumption about newspaper errors: If they publish a random name then ##P(\text{newspaper publishes your name}) = P(\text{newspaper publishes your name}\&\text{you won}) + P(\text{newspaper publishes your name}\&\text{you didn't win}) = 0.98 \cdot 10^{-7} + 0.02 \cdot 10^{-7} \cdot (1-10^{-7})##. If you plug in numbers then you get a roughly 98% chance that you won, in agreement with the naive expectation from the 98% accuracy.

If you assume the newspaper will always publish your name if (a) you didn't win and (b) they make an error then your estimate will look completely different, and most likely you didn't win.
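
The two assumptions can be compared numerically. This is only a sketch: `p_name_given_error` is a made-up parameter for the chance that an erroneous report shows your name, given that you did not win.

```python
def p_won_given_named(p_win, p_correct, p_name_given_error):
    """P(won | newspaper publishes your name).

    p_name_given_error is a hypothetical parameter: the chance that an
    erroneous report names you in particular when you did not win.
    """
    named_and_won = p_win * p_correct
    named_and_lost = (1 - p_win) * (1 - p_correct) * p_name_given_error
    return named_and_won / (named_and_won + named_and_lost)

p_win, p_correct = 1e-7, 0.98

# Assumption 1: an erroneous report picks a random ticket out of 10 million
random_name = p_won_given_named(p_win, p_correct, 1e-7)
# Assumption 2: an erroneous report always shows your name
always_you = p_won_given_named(p_win, p_correct, 1.0)

print(f"random name: {random_name:.4f}")
print(f"always you:  {always_you:.2e}")
```

Under the first assumption the result is about 0.98; under the second it drops to roughly 5 in a million.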

Edit: Missing minus sign and a typo

BWV · Dec 21, 2018

Dale said:

This is a pretty straightforward application of Bayes theorem. $$ P(H|D)=\frac{P(D|H) P(H)}{P(D)}$$

Here H is the hypothesis “you won the lottery” and D is the data “the newspaper reports that you won”. P(H) is easy, 1 in 10 million. P(D|H) is 0.98. The final piece would be more difficult to calculate, it is the probability of the data. That would probably best be estimated as roughly the number of tickets you bought divided by the total number of tickets sold.

The evidence here is not "contains no error"; it is that the newspaper published that you won.

But this does not work for P(D) with an unrelated probability of the newspaper reporting you as winner. Say you bought half of the possible numbers so your odds are 50% but the newspaper randomly picks one name from the population of 10 million players - Bayes returns a number > 1. You need complementary probabilities in the numerator and denominator. So the only numbers that can go in the denominator are .98, 10^-7 and their complements

the denominator, P(D) should be

$$ P(B) = P(B|A)P(A) + P(B|\neg A)P(\neg A) $$

so the denominator is (.98)(10^-7) + (.02)(1-10^-7)? Still not sure about this part but the numbers make more sense

9.8*10^-8 / 0.02 = 4.9 * 10^-6. So chances are you did not win the lottery if the newspaper reported you the winner. The intuition is that the odds are 10^-7 that you won vs. 2% that the newspaper is wrong - just like in all the textbook examples of 99% accurate medical tests on rare conditions.

If I bought half the tickets so my lottery odds are 50%, the odds become 98%

PeterDonis · Dec 21, 2018

mfb said:

If they publish a random name then ##P(\text{newspaper publishes your name}) = P(\text{newspaper publishes your name}|\text{you won}) + P(\text{newspaper publishes your name}|\text{you didn't win}) = 0.98 \cdot 10^{-7} + 0.02 \cdot 10^7 \cdot (1-10^{-7})##. If you plug in numbers then you get a roughly 98% chance that you won

With the denominator as you write it, I don't get that, I get a minuscule chance that you won, because the denominator is hugely dominated by the term ##0.02 \cdot 10^7 \cdot (1 - 10^{-7})##. In other words, the newspaper reports are dominated by false positives.

However, I don't see where the factor ##10^7## is coming from in that term in the denominator. The chance of reporting some wrong name when you didn't win is ##0.02 \cdot (1 - 10^{-7})##. But the chance of that name being yours is 1 in whatever the total number of possible names is. If we assume that is ##10^7## names, i.e., the number of lottery players, then that extra factor describing the chance that the wrong name is yours should be ##10^{-7}##, not ##10^7##. But if we assume that the newspaper doesn't know who is actually playing the lottery and just reports a random wrong name from the entire population, the chance that the wrong name is yours would be even smaller. Either way, that would make the second term in the denominator (the one describing false positive reports of your name) much smaller than the first.

PeterDonis · Dec 21, 2018

BWV said:

chances are you did not win the lottery if the newspaper reported you the winner. The intuition is that the odds are 10^-7 that you won vs. 2% that the newspaper is wrong

This logic only works if the wrong name that the newspaper reports is always yours. But of course it won't be. The simplest assumption, as in the post of @mfb that I responded to just now, is that the wrong name will be randomly selected from some population, either the population of lottery players or the population in general. Either way, that multiplies the 2% error rate by a very small chance that the wrong name reported is yours. That is what makes it highly likely that if the newspaper reports your name as the winner, you actually did win.

Notice that in the medical test examples, this extra factor describing "which wrong name is reported" is not present; a wrong test result still applies only to you, it doesn't apply to a randomly chosen member of the population that has a very small chance of being you. That's why the reported positive results of tests whose error rates are much higher than the incidence of what they are testing for are dominated by false positives.

BWV · Dec 21, 2018

PeterDonis said:

This logic only works if the wrong name that the newspaper reports is always yours. But of course it won't be. The simplest assumption, as in the post of @mfb that I responded to just now, is that the wrong name will be randomly selected from some population, either the population of lottery players or the population in general. Either way, that multiplies the 2% error rate by a very small chance that the wrong name reported is yours. That is what makes it highly likely that if the newspaper reports your name as the winner, you actually did win.

Notice that in the medical test examples, this extra factor describing "which wrong name is reported" is not present; a wrong test result still applies only to you, it doesn't apply to a randomly chosen member of the population that has a very small chance of being you. That's why the results of tests whose error rates are much higher than the incidence of what they are testing for are dominated by false positives.

But the Bayes formula only works with the same probabilities and their complements in the numerator and denominator, so the formula only works if the odds of a false positive are (.02)(1-10^-7) (p the paper is wrong times p you did not win the lottery), which would equate to the medical test examples. If this is not applicable to the circumstances of this example then a simple application of Bayes' theorem is not the right approach.

haushofer · Dec 21, 2018

I'll read all comments tomorrow, have to spend quality time with the wife

PeterDonis · Dec 21, 2018

BWV said:

a simple application of Bayes theorem is not the right approach

I think that's correct, the simple application of Bayes' Theorem that works for the medical test examples does not work for this example. But that doesn't mean Bayes' Theorem itself doesn't work; it just means that, heuristically, you need two applications of it instead of one, because the case where the newspaper makes an error needs extra analysis.

mfb · Dec 21, 2018

@PeterDonis: I missed the minus sign of course. If the newspaper reports a random name then this part gets even smaller - unless you are called John Smith or similar.

PeroK · Dec 21, 2018

haushofer said:

Dear all,

Imagine you're participating in a lottery, with a probability of 1 in 10 million to win. Then your newspaper is giving the outcomes of the lottery, and it turns out: you've won! But being a sceptic, you doubt the reliability of the newspaper. Say, historically this newspaper publishes the wrong lottery outcome 1 in every 50 of their lottery-reports (just say; it would be pretty bad!). How would you calculate the probability that you've actually won?

The frequency argument would be: you play the lottery 10 million times. You win once. The newspaper prints the correct winner 49/50 times. So, of all the times you don't win, the newspaper prints the winning number correctly 49/50 and one time in 50 prints the wrong number, which has 1 chance in 10 million each of being your number.

So, of all the times the newspaper prints your number, 49 times you have really won and 1 time you have not.

So, it's approx 49/50 that you've won if the newspaper prints it. Not surprisingly.

Dale · Dec 21, 2018

BWV said:

the denominator, P(D) should be

That is the same thing. My D is your B and my H is your A. The denominator is just another way to write P(B)=P(D)

StoneTemplePython · Dec 21, 2018

PeroK said:

The frequency argument would be: you play the lottery 10 million times. You win once. The newspaper prints the correct winner 49/50 times. So, of all the times you don't win, the newspaper prints the winning number correctly 49/50 and one time in 50 prints the wrong number, which has 1 chance in 10 million each of being your number.

So, of all the times the newspaper prints your number, 49 times you have really won and 1 time you have not.

So, it's approx 49/50 that you've won if the newspaper prints it. Not surprisingly.

You may have something else in mind, but if you are trying to make this into a natural frequency case (ref e.g. Bayesian Knight sir David Spiegelhalter as this is a recommended way of messaging Bayesian stuff to the general public) then this needs some work.

Out of the (10MM -1) draws you lose on, the newspaper will print approximately
##199,999 \lt \big(1 - \frac{49}{50}\big)\cdot (10,000,000-1) =199999.98\lt 200,000## wrong names.

On the one time you win, your name correctly shows up ##\frac{49}{50}\cdot 1 = 0.98 \lt 1## times.

The result then is

##\frac{\text{your frequency of winning and name showing}}{\text{total times name shown}} = \frac{0.98}{\alpha \cdot 199999.98 + 0.98} \lt \frac{1}{\alpha \cdot 200,000+ 1}##

where ##\alpha \in (0,1]## and is the corrective factor (conditional probability) that maps from erroneous name in general to erroneously showing your name in particular. As pointed out by others, this is missing from the book, and in case of binary testing e.g. for HIV then ##\alpha = 1##

Dale · Dec 21, 2018

There is a slightly easier form of Bayes theorem in terms of odds. $$O(H:\neg H | D)=\frac{P(D|H)}{P(D|\neg H)}O(H:\neg H)$$ Where again D is the data “the newspaper says I won” and H is the hypothesis “I won”.

##P(D|H)## is pretty straightforward (0.98) but ##P(D|\neg H)## depends on how numbers are chosen in the mistakes.

PeroK · Dec 22, 2018

Here's a simpler approach. Over 50 days the paper prints the winning ticket. On 49 of those days it is the correct ticket number. On one day it is the wrong number. If you get those 50 people who bought those tickets together then 49 of them are lottery winners and one is not.

Therefore, if the paper prints your ticket number there is a 49/50 chance you have actually won.

It's as simple as that.

If it's not 49/50 then the statement that the paper prints the correct winning ticket 49 out of 50 days is wrong.

StoneTemplePython · Dec 22, 2018

PeroK said:

Here's a simpler approach. Over 50 days the paper prints the winning ticket. On 49 of those days it is the correct ticket number. On one day it is the wrong number. If you get those 50 people who bought those tickets together then 49 of them are lottery winners and one is not.

Therefore, if the paper prints your ticket number there is a 49/50 chance you have actually won.

It's as simple as that.

If it's not 49/50 then the statement that the paper prints the correct winning ticket 49 out of 50 days is wrong.

This is tantalizingly close to Inspection Paradox, where you are estimating things by 'polling' people who (think) they've won instead of 'polling' the newspaper entries that indicate you've won. Classic results from Inspection Paradox -- e.g. by polling riders (not buses/ bus drivers), you'd discover that on average a bus is in fact more crowded than it is on average. Importance Sampling is a valid and powerful technique, but it is also subtle.

But yes, this holds so long as the other winners are chosen uniformly at random amongst the other tickets (i.e. ##\alpha = \frac{1}{10,000,000 -1}## -- I infer these are 7 digit lottery tickets with ##10^7 = 10,000,000## possibilities).

##\frac{\text{your frequency of winning and name showing}}{\text{total times name shown}} = \frac{0.98}{\alpha \cdot 199999.98 + 0.98} = \frac{\frac{49}{50}}{\alpha \cdot \frac{1}{50}(10,000,000-1) + \frac{49}{50}} \to \frac{\frac{49}{50}}{\frac{1}{10,000,000-1} \cdot \frac{1}{50}(10,000,000-1) + \frac{49}{50}} = \frac{49}{50}##
- - - -
(the difference with the HIV test case, again is: in that case ##\alpha = 1## but ##\alpha## does not 'cancel' the large number of false positives shown here as ##\frac{1}{50}(10,000,000-1)##)
- - - -

It could very well be that the newspaper erroneously prints things from last month's horoscope, which is where you get your 'lucky' lottery ticket numbers, in which case this assumption for ##\alpha##, and the stated result, doesn't hold.
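
The cancellation in the formula above can be checked with exact fractions. This is a rough sketch: the function name and its default arguments are illustrative choices, and ##\alpha## is passed in as an explicit assumption.

```python
from fractions import Fraction

def fraction_of_true_wins(alpha, draws=10_000_000, accuracy=Fraction(49, 50)):
    """Natural-frequency ratio: (times you win and are named) / (times you are named).

    alpha: assumed probability that an erroneous report shows your name in particular.
    """
    wrong_reports = (1 - accuracy) * (draws - 1)  # erroneous reports on the draws you lose
    true_hits = accuracy * 1                      # correct reports on your one winning draw
    return true_hits / (alpha * wrong_reports + true_hits)

# Errors spread uniformly over the other tickets
uniform = fraction_of_true_wins(Fraction(1, 10_000_000 - 1))
# Binary-test-like case: every error points at you
binary = fraction_of_true_wins(Fraction(1))

print(uniform)
print(float(binary))
```

With the uniform choice of ##\alpha## the ratio is exactly 49/50, while ##\alpha = 1## gives a value of only a few in a million.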

StoneTemplePython · Dec 22, 2018

I pressed the 'like button' to the response saying "gimme a probability!" only to discover the post was deleted!

PeroK · Dec 22, 2018

StoneTemplePython said:

But yes, this holds so long as the other winners are chosen uniformly at random ...

It could very well be that the newspaper erroneously prints things from last month's horoscope, which is where you get your 'lucky' lottery ticket numbers, in which case this assumption for ##\alpha##, and the stated result, doesn't hold.

Okay, but we're solving a problem based on common assumptions about a lottery. If the lottery is fixed so only the King's relatives ever win, then you still have no chance even if the newspaper prints your number. It's bound to be just a mistake. But that's hardly the point.

PeroK · Dec 22, 2018

StoneTemplePython said:

I pressed the 'like button' to the response saying "gimme a probability!" only to discover the post was deleted!

I realized you had given me a probability!

StoneTemplePython · Dec 22, 2018

PeroK said:

Okay, but we're solving a problem based on common assumptions about a lottery. If the lottery is fixed so only the King's relatives ever win, then you still have no chance even if the newspaper prints your number. It's bound to be just a mistake. But that's hardly the point.

The issue though is we know the probability of winning -- it was supplied by the book as ##\frac{1}{10MM}##... the book was silent on ##\alpha## which is problematic. It very well could be that someone is keying this in manually and so the errors tend to be off in a peculiar way. E.g. it could be that they mess up 2s and 5s and 6s and 9s only/disproportionately. A more careful book would supply the assumption for ##\alpha##.

(Note: this actually happened fairly recently to me on a flight delay notice -- the text I got from the airline gave the correct hour for the updated flight departure but wrote 50 minutes past the hour, not the correct 20 minutes, for the new departure time. Thankfully this contradicted the message on the airline's app that I have, and I checked monitors to tie-break this. I thought this was automated, but apparently not.)

Dale · Dec 22, 2018

PeroK said:

Therefore, if the paper prints your ticket number there is a 49/50 chance you have actually won.

It's as simple as that.

It seems like it should be that simple, but it isn’t. Suppose, as an extreme case, that you have a lucky number and you always play that number, and suppose further that every time the paper gets it wrong they publish that same number. Then ##P(D|\neg H)=0.02##, so the posterior odds are 49:10^7. The paper still gets it right 49 out of 50 times, and all of the things you mention above still happen, but your odds of winning are not anywhere near 49:1.

On the other extreme, suppose that when the paper makes a mistake all mistakes are equally likely and random. Then ##P(D|\neg H)=0.02*10^{-7}## so the posterior odds are 49:1.
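
Both extremes can be run through the odds form of Bayes' theorem. This is a small sketch with a made-up helper name, `posterior_odds`, using only the numbers already given:

```python
def posterior_odds(prior, p_d_given_h, p_d_given_not_h):
    """Odds form of Bayes' theorem: O(H | D) = likelihood ratio * prior odds."""
    prior_odds = prior / (1 - prior)
    return (p_d_given_h / p_d_given_not_h) * prior_odds

prior, accuracy = 1e-7, 0.98

# Extreme 1: every mistake prints your lucky number
lucky = posterior_odds(prior, accuracy, 0.02)
# Extreme 2: mistakes are uniformly random over 10^7 numbers
uniform = posterior_odds(prior, accuracy, 0.02 * 1e-7)

print(f"lucky number: {lucky:.2e}")    # about 49 : 10^7
print(f"uniform:      {uniform:.1f}")  # about 49 : 1
```

The only thing that changes between the two runs is the assumed ##P(D|\neg H)##.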

PeroK · Dec 23, 2018

Dale said:

It seems like it should be that simple, but it isn’t. Suppose, as an extreme case, that you have a lucky number and you always play that number, and suppose further that every time the paper gets it wrong they publish that same number. Then ##P(D|\neg H)=0.02##, so the posterior odds are 49:10^7. The paper still gets it right 49 out of 50 times, and all of the things you mention above still happen, but your odds of winning are not anywhere near 49:1.

That's not the question, though. Assuming a correlation between your lottery ticket and a (random) newspaper typing error is not part of the OP's question. There are any number of unusual assumptions that would invalidate any attempt at a specific answer. The simplest, as I pointed out above, is that the lottery is rigged. Then all bets are off.

You would have the same issue if you try to analyse the medical problem with the false positives etc. You could hypothesise that the one being tested is genetically predisposed to the disease or not; or predisposed to passing or failing the tests. Then that changes everything. If you assume a different correlation for a specific patient then you get a different answer.

In any case, the problem where the newspaper prints a wrong number one time in 50 and the problem where the newspaper prints your lucky number one time in 50 are different problems.

haushofer · Dec 23, 2018

Many thanks for all the responses, you're really helping me understand this stuff.

PeroK said:

Here's a simpler approach. Over 50 days the paper prints the winning ticket. On 49 of those days it is the correct ticket number. On one day it is the wrong number. If you get those 50 people who bought those tickets together then 49 of them are lottery winners and one is not.

Therefore, if the paper prints your ticket number there is a 49/50 chance you have actually won.

It's as simple as that.

If it's not 49/50 then the statement that the paper prints the correct winning ticket 49 out of 50 days is wrong.

Ok. I'm feeling like I'm getting to the core of it. So let me now ask you the following.

Imagine someone is claiming that he saw a unicorn. I know, unicorns are not seen every day, while people win lotteries every day, but imagine that I put the same numbers on this case: I regard the a priori probability P(H) that unicorns exist as 1 in ten million, and I regard the trustworthiness of the person as P(D|H)=0.98; if unicorns actually exist, the probability is 98% that this person saw it right. Can I now similarly conclude that it is highly likely that unicorns actually exist, i.e. ##P(H|D) \approx 1##? I'd say no, but I can't pinpoint why your reasoning would fail for the unicorn case.

I found a similar case in "Bayesian reasoning for intelligent people" by Simon DeDeo, section 2 and 3. There he deduces Hume's argument for a particular case of someone claiming to have magical powers,

"That no testimony is sufficient to establish a "miracle" (see DeDeo's example for the context of this term), unless the testimony be of such a kind, that its falsehood would be more miraculous, than the fact, which it endeavours to establish"

from Bayesian reasoning. The essential point is where DeDeo goes from section 2 to 3 (observer reliability), in which he enlarges his hypothesis space by "the observer is crazy" (or delusional or whatever you want to call it). I want to do a similar calculation regarding this unicorn case and lottery case and see if I can understand it intuitively. Ultimately, that's why I'm interested in this case.

For DeDeo's notes, see

tuvalu.santafe.edu/~simon/br.pdf

Dale · Dec 23, 2018

PeroK said:

That's not the question, though. Assuming a correlation between your lottery ticket and a (random) newspaper typing error is not part of the OP's question.

I agree. Fairness and random errors are the best assumptions for this problem.

It is just that your post sounded to me like you were trying to claim that the posterior odds must be 49:1 regardless of any other assumptions, simply because that is the rate of accurate calls by the paper.

My point is that your outcome requires more than just the newspaper accuracy, it also requires an assumption about ##P(D|\neg H)##. That assumption is the most natural one in my opinion and probably what was intended, but it is an additional assumption nonetheless.
