Specific question re Bayesian statistics/analysis

In summary, the conversation discussed a statement in the book "Thinking, Fast and Slow" that used Bayesian analysis to justify the probability of a car being green based on a witness statement. The conversation then delved into the different probabilities for cases where the witness is accurate or not, and how the Bayesian approach updates our prior belief in the face of new evidence.
  • #1
phinds
I am reading "Thinking, Fast and Slow" (fantastic book, by the way) and I ran across a statement that has me flummoxed. The justification for the statement was said to be "Bayesian analysis", so I looked into that, and frankly it's just more than I want to get into, so I'm wondering if someone can give me some plain-English insight into the justification for the statement.

I realize that the answer may well be "Hey, guy, you've just got to learn the math if you really want to understand it" and if so, then so be it, but I thought I'd take a shot.

SO

1) 85% of all cars are blue and 15% are green
2) A witness to an accident says he saw the car involved and it was green.
3) The witness is known to be 80% accurate

Now, what I would take away from that is that the preponderance of blue cars would certainly lower the probability that the car in the accident really was green from 80% to more like maybe 65% or 70%. BUT ... the book says (based on Bayesian statistics) it's 41%. I just can't see how it could be cut in half like that and would appreciate any insight anyone can give me.

Thanks
 
  • #2
To get things going:

[embedded video]

Veritasium talks about Bayesian interference, I mean inference.
 
  • #3
Calculate the probabilities for the four cases:
Car was blue, witness says blue
Car was blue, witness says green
Car was green, witness says blue
Car was green, witness says green
(You have to make the assumption that the reliability of the witness is independent of the car colour.)
The probability (given that the witness says green) that it was green is
P(green, says green)/{P(green, says green) + P(blue, says green)}
The result of 41% is indeed surprising, but correct.
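
For anyone who would rather see the four cases as numbers, here is a minimal sketch in Python. It uses the figures from the original post and the independence assumption stated above; the variable names are only illustrative.

```python
# Numbers from the original post. The witness's accuracy is assumed to be
# the same for blue and green cars (the independence assumption).
p_green = 0.15    # fraction of cars that are green
p_blue = 0.85     # fraction of cars that are blue
accuracy = 0.80   # probability that the witness names the true colour

# Joint probability of each (true colour, witness report) pair
joint = {
    ("blue", "says blue"): p_blue * accuracy,
    ("blue", "says green"): p_blue * (1 - accuracy),
    ("green", "says blue"): p_green * (1 - accuracy),
    ("green", "says green"): p_green * accuracy,
}

# Keep only the cases where the witness says green, then renormalise
p_says_green = joint[("blue", "says green")] + joint[("green", "says green")]
p_green_given_says_green = joint[("green", "says green")] / p_says_green

for case, p in joint.items():
    print(case, round(p, 3))
print("P(green | says green) =", round(p_green_given_says_green, 3))
```

The "blue, says green" case (0.17) is larger than the "green, says green" case (0.12), which is why the answer lands below 50%.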
 
  • #4
This is how I think it was calculated
$$P=\frac{0.8\times 0.15}{0.8\times 0.15+0.2\times 0.85}=0.414$$
The numerator is the probability that the car is green and he says "green". The denominator is the total probability that he says "green", whether the car is actually green or blue.

On Edit: @mjc123 beat me to the same answer.

On second edit: The result may be surprising, but if one thinks about it for a moment, the probability of "green" should diminish as the percentage of green cars diminishes. To take it to the extreme, if the cars are 99% blue and the witness is correct 99% of the time, the probability that a car identified as green really is green is only 50%. Under the same assumptions, if the witness identifies the car as blue, the probability of correct identification is about 99.99%. Apparently, there is a surprising and counter-intuitive asymmetry in the witness's ability to identify car colors. Are lawyers aware of it?
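
As a check on the extreme case, the same formula can be written out with 99% blue cars and a 99% accurate witness (same independence assumption as before):

$$P(\text{green}\mid\text{says green})=\frac{0.99\times 0.01}{0.99\times 0.01+0.01\times 0.99}=0.5$$

$$P(\text{blue}\mid\text{says blue})=\frac{0.99\times 0.99}{0.99\times 0.99+0.01\times 0.01}=\frac{0.9801}{0.9802}\approx 0.9999$$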
 
  • #5
phinds said:
1) 85% of all cars are blue and 15% are green
2) A witness to an accident says he saw the car involved and it was green.
3) The witness is known to be 80% accurate

Now, what I would take away from that is that the preponderance of blue cars would certainly lower the probability that the car in the accident really was green from 80% to more like maybe 65% or 70%. BUT ... the book says (based on Bayesian statistics) it's 41%. I just can't see how it could be cut in half like that
The thing is that the Bayesian approach looks at this the other way around. We are not starting with 80% belief and then decreasing that due to the fraction of green cars; instead, we are starting with the 15% frequency of green cars. That is our so-called "prior": without the witness statement, we believe that it is 15% likely that the car was green. With no specific data on this particular event, all we can rely on is general knowledge, like the frequency of green cars.

Then, starting from that 15% generic prior belief, we acquire new data about this specific case. In this case the new data is in the form of a reliable witness. Because that witness is known to be quite reliable, after receiving his report we almost triple our belief that the car was green. His statement brings our belief up from 15% to 41%. Therefore, our prior belief is 15%, before hearing the witness, and then after hearing the witness our posterior belief increases to 41%, as it should when receiving information from a reliable witness.

The Bayesian approach is all about updating your prior belief in the face of new evidence. So the first thing that you need to do is to identify what the prior belief is. That is the thing that gets modified. The accuracy of the test is 80%, and that is unchanged in this procedure. What is changed is our belief that the car is green, and that increases dramatically.
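
One way to see the update as a single step is the odds form of Bayes' rule: posterior odds = prior odds × likelihood ratio. The sketch below is only an illustration with the thread's numbers; it uses exact fractions so the arithmetic is easy to follow, and the variable names are my own.

```python
from fractions import Fraction

prior = Fraction(15, 100)     # belief that the car is green before the testimony
accuracy = Fraction(80, 100)  # the witness reports the true colour 80% of the time

# Odds in favour of green before hearing the witness: 15:85, i.e. 3/17
prior_odds = prior / (1 - prior)

# "Says green" is 0.8/0.2 = 4 times more likely if the car really is green
likelihood_ratio = accuracy / (1 - accuracy)

# The testimony multiplies the odds by the likelihood ratio
posterior_odds = prior_odds * likelihood_ratio

# Convert the odds back to a probability
posterior = posterior_odds / (1 + posterior_odds)

print(posterior_odds, posterior, float(posterior))
```

The odds go from 3:17 to 12:17, so the probability goes from 15% to 12/29, about 41%. The 80% only ever enters as the multiplier; it is never the starting belief.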
 
  • #6
It's not the witness's ability to identify colours that is asymmetric (as I said, we assume that is independent of colour). It is the posterior reliability of the witness's testimony. If we label the 4 cases listed in post #3 as A, B, C, D respectively, then
Prior probability of correctly identifying green as green = D/(C+D)
Prior probability of correctly identifying blue as blue = A/(A+B) (assumed to be equal to the previous one)
Posterior probability that car was blue when witness says blue = A/(A+C)
Posterior probability that car was green when witness says green = D/(B+D)
You are using different probability spaces if you compare e.g. D/(B+D) with D/(C+D).
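
With the numbers from the original post (85% blue, 15% green, 80% accuracy), the four cases and the four ratios work out as follows:

$$A=0.85\times 0.8=0.68,\quad B=0.85\times 0.2=0.17,\quad C=0.15\times 0.2=0.03,\quad D=0.15\times 0.8=0.12$$

$$\frac{D}{C+D}=\frac{0.12}{0.15}=0.8,\qquad \frac{A}{A+B}=\frac{0.68}{0.85}=0.8$$

$$\frac{A}{A+C}=\frac{0.68}{0.71}\approx 0.958,\qquad \frac{D}{B+D}=\frac{0.12}{0.29}\approx 0.414$$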
IIRC, this was essentially the basis of Hume's argument against miracles - that they were so unlikely that, however reliable the witness, P(no miracle, wrong) is much greater than P(miracle, right). (Not that I necessarily agree with him, just using the illustration.)
 
  • #7
phinds said:
I just can't see how it could be cut in half like that and would appreciate any insight anyone can give me.

Thanks

It wasn't cut in half. The probability the car was green before the witness statement was 15%. After the statement it went up to 41%. It was never 80% in the first place.
 
  • #8
Wow. I KNEW there was some reason I liked PF :smile: Lots of excellent answers and MUCH better ways of looking at things than I was, which was clearly backwards, so now I get it.

Thanks very much to all.
 
  • #9
I liked how Veritasium explained the counterintuitive result in his video. It's common sense, but you have to think about it the right way.

I still get confused, especially when I tried to explain it to my son, who was in law school at the time. They have the Defense Attorney's Fallacy and the Prosecutor's Fallacy, each of which misapplies (or fails to apply) Bayesian logic to sway the jury.

https://en.wikipedia.org/wiki/Prosecutor's_fallacy
 
  • #10
jedishrfu said:
I liked how Veritasium explained the counterintuitive result in his video. It's common sense, but you have to think about it the right way.
Yes, I agree. I think @Dale said it very well:
Dale said:
The thing is that the Bayesian approach looks at this the other way around. We are not starting with 80% belief and then decreasing that due to the fraction of green cars; instead, we are starting with the 15% frequency of green cars.
.
.
.
The Bayesian approach is all about updating your prior belief in the face of new evidence. So the first thing that you need to do is to identify what the prior belief is. That is the thing that gets modified.

and of course @PeroK also pointed out my specific wrong way of looking at it:

PeroK said:
It wasn't cut in half. The probability the car was green before the witness statement was 15%. After the statement it went up to 41%. It was never 80% in the first place.
 

1. What is Bayesian analysis?

Bayesian analysis is a statistical method used to update the probability of a hypothesis as new evidence or data is gathered. It is based on Bayesian probability, which treats a probability as a degree of belief in a hypothesis rather than only as the long-run frequency of an event.

2. How is Bayesian analysis different from other statistical methods?

Unlike traditional statistical methods, Bayesian analysis incorporates prior knowledge or beliefs about a hypothesis into the analysis. It also allows for the updating of probabilities as new evidence is gathered, rather than just making a binary decision based on a p-value.

3. How are prior knowledge or beliefs incorporated into Bayesian analysis?

Prior knowledge or beliefs are incorporated into Bayesian analysis through the use of prior probability distributions. These distributions represent the probabilities assigned to different values of a parameter before any data is collected. As new data is gathered, the prior distribution is updated to become the posterior distribution, which represents the updated probabilities.
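
As a rough illustration of a prior distribution turning into a posterior distribution, here is a minimal sketch that treats the witness's accuracy itself as the unknown parameter. Everything in it is an assumption made for the example: the uniform Beta(1, 1) prior, the made-up test results, and the use of the standard Beta-binomial conjugate update.

```python
# Hypothetical setup: we do not know the witness's accuracy, so we put a
# prior distribution on it and update after testing the witness.
alpha_prior, beta_prior = 1, 1   # Beta(1, 1), a uniform prior (an assumption)
correct, wrong = 8, 2            # made-up test results, for illustration only

# Conjugate update: a Beta prior combined with binomial data gives a Beta
# posterior whose parameters are the prior ones plus the observed counts.
alpha_post = alpha_prior + correct
beta_post = beta_prior + wrong

# The mean of a Beta(a, b) distribution is a / (a + b)
prior_mean = alpha_prior / (alpha_prior + beta_prior)
posterior_mean = alpha_post / (alpha_post + beta_post)

print("prior mean accuracy:", prior_mean)
print("posterior mean accuracy:", posterior_mean)
```

Here the expected accuracy moves from 0.5 under the prior to 0.75 under the posterior, which sits between the prior guess and the observed 8 out of 10.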

4. What are the advantages of using Bayesian analysis?

One of the main advantages of Bayesian analysis is its ability to incorporate prior knowledge into the analysis. This can be particularly useful in situations where prior knowledge is available or when data is limited. Additionally, Bayesian analysis allows for the continual updating of probabilities as new evidence is gathered, providing a more comprehensive and dynamic understanding of the data.

5. What are some common applications of Bayesian analysis?

Bayesian analysis has a wide range of applications in various fields, including medicine, economics, and engineering. It is commonly used in clinical trials, risk assessment, and predictive modeling. In recent years, it has also been applied to machine learning and artificial intelligence.
