Specific question re Bayesian statistics/analysis

  • Thread starter phinds
  • #1
phinds
Science Advisor
Insights Author
Gold Member
16,738
7,425
I am reading "Thinking Fast and Slow" (fantastic book by the way) and I ran across a statement that has me flummoxed. The justification for the statement was said to be "Bayesian analysis" so I looked into that and frankly it's just more than I want to get into so I'm wondering if someone can give me some English language insight into the justification for the statement.

I realize that the answer may well be "Hey, guy, you've just got to learn the math if you really want to understand it" and if so, then so be it, but I thought I'd take a shot.

SO

1) 85% of all cars are blue and 15% are green
2) A witness to an accident says he saw the car involved and it was green.
3) The witness is known to be 80% accurate

Now, what I would take away from that is that the preponderance of blue cars would certainly lower the probability that the car in the accident really was green from 80% to more like maybe 65% or 70%. BUT ... the book says (based on Bayesian statistics) it's 41%. I just can't see how it could be cut in half like that and would appreciate any insight anyone can give me.

Thanks
 

Answers and Replies

  • #2
12,505
6,293
To get things going:


Veritasium talks about Bayesian interference, I mean inference.
 
  • #3
mjc123
Science Advisor
Homework Helper
1,084
522
Calculate the probabilities for the four cases:
Car was blue, witness says blue
Car was blue, witness says green
Car was green, witness says blue
Car was green, witness says green
(You have to make the assumption that the reliability of the witness is independent of the car colour.)
The probability (given that the witness says green) that it was green is
P(green, says green)/{P(green, says green) + P(blue, says green)}
The result of 41% is indeed surprising, but correct.
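
For anyone who wants to check the four cases numerically, here is a minimal Python sketch. The variable names are just illustrative, and it assumes (as stated above) that the witness's 80% accuracy is the same for both colours.

```python
# Inputs taken from the problem statement in post #1.
p_green = 0.15   # share of green cars (prior)
p_blue = 0.85    # share of blue cars (prior)
accuracy = 0.80  # assumed independent of the car's true colour

# Joint probabilities of (true colour, what the witness says).
joint = {
    ("blue", "says blue"): p_blue * accuracy,
    ("blue", "says green"): p_blue * (1 - accuracy),
    ("green", "says blue"): p_green * (1 - accuracy),
    ("green", "says green"): p_green * accuracy,
}

# Condition on the witness saying "green": keep only those two cases.
says_green = joint[("green", "says green")] + joint[("blue", "says green")]
posterior_green = joint[("green", "says green")] / says_green

for case, p in joint.items():
    print(case, round(p, 3))
print("P(green | says green) =", round(posterior_green, 3))
```

The two "says green" cases come out to about 0.12 (really green) and 0.17 (really blue), so the wrong reports of green outnumber the right ones, and the ratio 0.12/0.29 is roughly 0.414.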
 
  • #4
kuruman
Science Advisor
Homework Helper
Insights Author
Gold Member
9,990
3,143
This is how I think it was calculated:
$$P=\frac{0.8\times 0.15}{0.8\times 0.15+0.2\times 0.85}=0.414$$
The numerator is the probability that the car is green and he says "green". The denominator is the total probability that he says "green", whether the car is actually green or blue.

On Edit: @mjc123 beat me to the same answer.

On second edit: The result may be surprising, but if one thinks about it for a moment, the probability of "green" should diminish as the percentage of green cars diminishes. To take it to the extreme, if the cars are 99% blue and the witness is correct 99% of the time, the probability that a car identified as green really is green is only 50%. Under the same assumptions, if the witness identifies the car as blue, the probability that the identification is correct is about 99.99%. Apparently, there is a surprising and counter-intuitive asymmetry in the witness's ability to identify car colors. Are lawyers aware of it?
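
The extreme case can be checked with a small helper. This is only a sketch: `posterior_correct` is a name made up for illustration, and it assumes two colours and a witness whose accuracy does not depend on the colour.

```python
def posterior_correct(prior, accuracy):
    """P(car really is colour X | witness says X).

    prior    -- share of cars that are colour X
    accuracy -- probability the witness reports the true colour,
                assumed to be the same for both colours
    """
    true_report = prior * accuracy                # car is X, witness says X
    false_report = (1 - prior) * (1 - accuracy)   # car is not X, witness says X
    return true_report / (true_report + false_report)

print(posterior_correct(0.15, 0.80))  # original problem, witness says green
print(posterior_correct(0.01, 0.99))  # 99% blue cars, witness says green
print(posterior_correct(0.99, 0.99))  # 99% blue cars, witness says blue
```

The three values are roughly 0.414, 0.5 and 0.9999. The same function and the same witness give very different answers only because the prior changes.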
 
  • #5
Dale
Mentor
Insights Author
2020 Award
30,858
7,454
1) 85% of all cars are blue and 15% are green
2) A witness to an accident says he saw the car involved and it was green.
3) The witness is known to be 80% accurate

Now, what I would take away from that is that the preponderance of blue cars would certainly lower the probability that the car in the accident really was green from 80% to more like maybe 65% or 70%. BUT ... the book says (based on Bayesian statistics) it's 41%. I just can't see how it could be cut in half like that
The thing is that the Bayesian approach looks at this the other way around. We are not starting with 80% belief and then decreasing that due to the fraction of green cars, instead we are starting with the 15% frequency of green cars. That is our so-called "prior", in other words without the witness statement we believe that it is 15% likely that the car was green. With no specific data on this particular event, all we can rely on is general knowledge, like the frequency of green cars.

Then, starting from that 15% generic prior belief, we acquire new data about this specific case. In this case the new data is in the form of a reliable witness. Because that witness is known to be quite reliable, after receiving his report we almost triple our belief that the car was green. His statement brings our belief up from 15% to 41%. Therefore, our prior belief is 15%, before hearing the witness, and then after hearing the witness our posterior belief increases to 41%, as it should when receiving information from a reliable witness.

The Bayesian approach is all about updating your prior belief in the face of new evidence. So the first thing that you need to do is to identify what the prior belief is. That is the thing that gets modified. The accuracy of the test is 80%, and that is unchanged in this procedure. What is changed is our belief that the car is green, and that increases dramatically.
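
One way to see the "start from the prior, then update" idea in code is the odds form of Bayes' rule, which is an equivalent way of writing the same calculation (it is not how the book presents it, as far as I know). The function name `update` is hypothetical, and the sketch assumes the witness is 80% accurate for both colours.

```python
def update(prior, likelihood_ratio):
    """Update a prior probability using the odds form of Bayes' rule."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

prior_green = 0.15

# How much more likely is the report "green" if the car really is green
# than if it is blue?  P(says green | green) / P(says green | blue)
likelihood_ratio = 0.80 / 0.20

posterior_green = update(prior_green, likelihood_ratio)
print("prior:    ", prior_green)
print("posterior:", round(posterior_green, 3))
```

The witness multiplies the odds of green by 4 (from 15:85 to 60:85), which moves the probability from 15% to about 41%. The 80% accuracy only enters through that factor of 4; it is never the starting point.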
 
  • #6
mjc123
Science Advisor
Homework Helper
1,084
522
It's not the witness's ability to identify colours that is asymmetric (as I said, we assume that is independent of colour). It is the posterior reliability of the witness's testimony. If we label the 4 cases listed in post #3 as A, B, C, D respectively, then
Prior probability of correctly identifying green as green = D/(C+D)
Prior probability of correctly identifying blue as blue = A/(A+B) assumed to be equal to previous
Posterior probability that car was blue when witness says blue = A/(A+C)
Posterior probability that car was green when witness says green = D/(B+D)
You are using different probability spaces if you compare e.g. D/(B+D) with D/(C+D).
IIRC, this was essentially the basis of Hume's argument against miracles - that they were so unlikely that, however reliable the witness, P(no miracle, wrong) is much greater than P(miracle, right). (Not that I necessarily agree with him, just using the illustration.)
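
The four ratios above can be evaluated with the numbers from post #1. A minimal sketch, using the same A, B, C, D labels and assuming the 80% accuracy applies to both colours:

```python
# Joint probabilities for the four cases of post #3.
A = 0.85 * 0.80  # car was blue,  witness says blue
B = 0.85 * 0.20  # car was blue,  witness says green
C = 0.15 * 0.20  # car was green, witness says blue
D = 0.15 * 0.80  # car was green, witness says green

ratios = {
    "P(says green | green) = D/(C+D)": D / (C + D),
    "P(says blue | blue)   = A/(A+B)": A / (A + B),
    "P(blue | says blue)   = A/(A+C)": A / (A + C),
    "P(green | says green) = D/(B+D)": D / (B + D),
}

for label, value in ratios.items():
    print(label, "=", round(value, 3))
```

The first two come out equal at 0.8 (the witness's accuracy), while the last two are roughly 0.958 and 0.414. The asymmetry is in the posteriors, which are conditioned on what the witness said, and not in the witness.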
 
  • #7
PeroK
Science Advisor
Homework Helper
Insights Author
Gold Member
2020 Award
16,218
8,235
I just can't see how it could be cut in half like that and would appreciate any insight anyone can give me.

Thanks
It wasn't cut in half. The probability the car was green before the witness statement was 15%. After the statement it went up to 41%. It was never 80% in the first place.
 
  • #8
phinds
Science Advisor
Insights Author
Gold Member
16,738
7,425
Wow. I KNEW there was some reason I liked PF :smile: Lots of excellent answers and MUCH better ways of looking at things than I was, which was clearly backwards, so now I get it.

Thanks very much to all.
 
  • #9
12,505
6,293
I liked how Veritasium explained the counterintuitive result in his video. It's common sense, but you have to think about it the right way.

I still get confused, especially when I tried to explain it to my son, who was in law school at the time. They have the defense attorney's fallacy and the prosecutor's fallacy, which are misapplications of Bayesian logic (or failures to apply it at all) used to sway the jury.

https://en.wikipedia.org/wiki/Prosecutor's_fallacy
 
  • #10
phinds
Science Advisor
Insights Author
Gold Member
16,738
7,425
I liked how Veritasium explained the counterintuitive result in his video. It's common sense, but you have to think about it the right way.
Yes, I agree. I think @Dale said it very well:
The thing is that the Bayesian approach looks at this the other way around. We are not starting with 80% belief and then decreasing that due to the fraction of green cars, instead we are starting with the 15% frequency of green cars.
.
.
.
The Bayesian approach is all about updating your prior belief in the face of new evidence. So the first thing that you need to do is to identify what the prior belief is. That is the thing that gets modified.
and of course @PeroK also pointed out my specific wrong way of looking at it:

It wasn't cut in half. The probability the car was green before the witness statement was 15%. After the statement it went up to 41%. It was never 80% in the first place.
 
