Why is Bayes' theorem called "inverse probability"?

  • Context: Undergrad
  • Thread starter: iVenky
  • Tags: Inverse, Theorem

Discussion Overview

The discussion centers around the terminology of Bayes' theorem, specifically why it is referred to as "inverse probability." Participants explore the nature of forward and inverse probability problems, providing examples and seeking clarification on the concepts involved.

Discussion Character

  • Exploratory
  • Technical explanation
  • Conceptual clarification
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • Some participants explain that Bayes' theorem is called "inverse probability" because it addresses questions that are the opposite of typical probability questions, such as determining the probability of a hypothesis given observed data.
  • Examples are provided to illustrate forward probability (p(D|H)) and inverse probability (p(H|D)), emphasizing that these problems are not interchangeable without additional information from Bayes' theorem.
  • One participant discusses the concept of prior probability (p(H)), noting that it reflects a priori knowledge and can vary based on context, such as whether one believes a coin is fair or not.
  • There is mention of historical controversy regarding the assignment of prior probabilities, referencing Laplace's Principle of Insufficient Reason and its acceptance over time.
  • Participants discuss the challenges of determining p(D), the probability of observing the data, and how it can often be more complex than finding p(H).
  • Some participants introduce the concept of likelihood ratios and their application in detection theory, noting that thresholds for declaring detections can vary significantly based on context and desired probabilities.
  • There are differing views on what constitutes a sufficient likelihood ratio for declaring a detection, with some suggesting that a ratio of 1 is not necessarily indicative of a real effect.

Areas of Agreement / Disagreement

Participants express a range of views on the interpretation and application of Bayes' theorem, particularly regarding the assignment of prior probabilities and the thresholds for detection in likelihood ratios. The discussion remains unresolved with multiple competing perspectives presented.

Contextual Notes

Limitations include the dependence on definitions of prior probabilities and likelihoods, as well as the unresolved nature of how to approach problems with insufficient a priori information.

Who May Find This Useful

This discussion may be of interest to those studying probability theory, Bayesian inference, or related fields in statistics and data analysis.

iVenky
Why is Bayes' theorem called "inverse probability"?

What is the reason for calling Bayes' theorem "inverse probability"?

Any valid reason?

Thanks a lot
 


It answers questions opposite to the usual probability questions that you are probably familiar with. An example of a forward probability problem is "If a coin is fair, what is the probability of tossing H T H H T H T T T H T H H T T H"? We write this as p(D|H), the probability of obtaining the data D (sequence of tosses in this case) given the hypothesis H (the coin is fair).

Inverse probability turns it around. We seek p(H|D), which is the probability that the coin is fair (H) given that we observe the particular data sequence of heads and tails (D). The problems aren't interchangeable, and you can't get one from the other without the additional information found in Bayes's theorem.
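The forward problem is a direct computation. Here is a minimal Python sketch (the function name and the H/T string encoding are my own choices, not from the thread):

```python
from fractions import Fraction

def forward_prob(sequence, p_heads=Fraction(1, 2)):
    """p(D|H): probability of observing an exact toss sequence,
    given a coin whose per-toss probability of heads is p_heads."""
    prob = Fraction(1)
    for toss in sequence:
        prob *= p_heads if toss == "H" else 1 - p_heads
    return prob

# The 16-toss sequence from the example, for a fair coin:
print(forward_prob("HTHHTHTTTHTHHTTH"))  # (1/2)^16 = 1/65536
```

The inverse problem, p(H|D), cannot be read off from this function alone; it also needs the prior p(H) and the marginal p(D) discussed later in the thread.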
 


Awesome explanation
 


marcusl said:
It answers questions opposite to the usual probability questions that you are probably familiar with. An example of a forward probability problem is "If a coin is fair, what is the probability of tossing H T H H T H T T T H T H H T T H"? We write this as p(D|H), the probability of obtaining the data D (sequence of tosses in this case) given the hypothesis H (the coin is fair).

Inverse probability turns it around. We seek p(H|D), which is the probability that the coin is fair (H) given that we observe the particular data sequence of heads and tails (D). The problems aren't interchangeable, and you can't get one from the other without the additional information found in Bayes's theorem.


How do you solve this problem?--

I consider just 2 tosses and here's the sequence- H T

Now how would you find out P(H|D) in the above problem?

[itex]P(H|D)=\frac{P(H)P(D|H)}{P(D)}[/itex]

It's easy to see that P(D|H)=1/4 (since the sample space is {HH,HT,TH,TT})

but what about the value of P(H) and P(D)?

Thanks a lot :)
 


p(D|H) is called the likelihood, and it is a forward probability.

p(H) is called the prior probability that hypothesis H is true, or just "prior" for short. It incorporates all a priori knowledge that you have. If you have a new nickel from the US mint and believe that nickels are manufactured with even weighting, you might set this close to or equal to 1. If you are playing a betting game with a known con artist / convicted criminal, you might set this close to 0.

What to do when you have no a priori information was controversial for over 200 years. (p(H) in this case is called an ignorance prior.) Laplace, who derived inverse probability independently of Bayes, said that you should assign equal probability to every hypothesis because there is no compelling reason to do otherwise. He called this the Principle of Insufficient Reason, and it was attacked so viciously and continuously that Bayesian inference was roundly discounted until roughly the 1970's or 80's. Physicist E. T. Jaynes finally showed that Laplace was right because the equal probability prior is the only choice that maximizes information entropy. There is still some small controversy over the correct scale-invariant form for an equal probability prior in certain problems, but this is unimportant here. For your problem, if it were known that the coin had to be either fair or crooked but you had absolutely no a priori information which, you would assign p(H)=1/2.

p(D) is the probability of observing the data, and it is often harder to find than p(H). In your problem, it is the probability of seeing H T independent of any coins or other parameters. In other words, it is the probability of seeing H T over the entire universe of coin tossing experiments with all possible coins (fair, bent, weighted, two-headed and two-tailed).
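The two-toss question only becomes answerable once a universe of coins is fixed. As a minimal sketch, assume (this universe is my invention for illustration) that the coin is either fair or weighted toward heads with p(heads) = 3/4, each with ignorance prior 1/2:

```python
from fractions import Fraction

# Illustrative universe of exactly two coins (an assumption, not from the
# thread): fair (p_heads = 1/2) and weighted toward heads (p_heads = 3/4).
p_fair = Fraction(1, 2)            # prior p(H): an ignorance prior
p_weighted = 1 - p_fair

p_D_fair = Fraction(1, 2) * Fraction(1, 2)      # P(H T | fair)     = 1/4
p_D_weighted = Fraction(3, 4) * Fraction(1, 4)  # P(H T | weighted) = 3/16

# p(D): marginalize over every hypothesis in the assumed universe.
p_D = p_D_fair * p_fair + p_D_weighted * p_weighted

p_fair_given_D = p_D_fair * p_fair / p_D        # Bayes' theorem
print(p_D, p_fair_given_D)  # 7/32 4/7
```

A different assumed universe, or different priors, would give a different posterior; that dependence is exactly the point made above.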

Basically, then, there is no simple answer to your question. You can fill in numbers according to your knowledge and beliefs, getting an answer that is consistent with them. It is often pointed out that Bayesian inference has parallels to human learning and decision behavior. If you acquire additional information, you update the prior (and possibly p(D)) to get improved estimates.
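The update-as-you-learn behavior can be sketched in a few lines; the two-coin universe here is again a hypothetical chosen for illustration:

```python
from fractions import Fraction

def update(belief, p_heads, toss):
    """One Bayesian update: yesterday's posterior becomes today's prior."""
    unnorm = {h: belief[h] * (p_heads[h] if toss == "H" else 1 - p_heads[h])
              for h in belief}
    total = sum(unnorm.values())
    return {h: v / total for h, v in unnorm.items()}

# Hypothetical two-coin universe: fair vs. weighted toward heads.
p_heads = {"fair": Fraction(1, 2), "weighted": Fraction(3, 4)}
belief = {"fair": Fraction(1, 2), "weighted": Fraction(1, 2)}  # ignorance prior

for toss in "HHHH":          # each new observation sharpens the belief
    belief = update(belief, p_heads, toss)
print(belief["fair"])        # 16/97: four heads make "fair" less credible
```

Note that the normalizing `total` plays the role of p(D) for each toss, computed by summing over the hypotheses rather than known in advance.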

Problems are often reformulated as ratios to avoid the problem of finding p(D). In radar, one forms the ratio [tex]\Lambda=\frac{p(H_1|D)}{p(H_0|D)},[/tex] where H1 is the hypothesis that a received signal contains a weak echo in noise and H0 is that the signal is pure noise without an echo. Note that p(D), which is common to both, cancels from the ratio. If, in addition, [tex]p(H_1)=p(H_0)=0.5[/tex] (an ignorance prior), then [tex]\Lambda=\frac{p(D|H_1)}{p(D|H_0)}[/tex] and this is called a Likelihood Ratio Test (LRT); it is widely used in science, engineering, and statistics. A detection is declared when [tex]\Lambda\geq 1.[/tex]
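A coin-flavored stand-in for the LRT (a real radar signal model is beyond a short sketch; the observed sequence and the threshold here are invented for illustration):

```python
from fractions import Fraction

def likelihood(sequence, p_heads):
    """p(D|H): probability of an exact toss sequence for a given bias."""
    prob = Fraction(1)
    for toss in sequence:
        prob *= p_heads if toss == "H" else 1 - p_heads
    return prob

# H1: coin weighted toward heads (p = 3/4).  H0: coin fair (p = 1/2).
data = "HHHHHHHHHH"          # ten heads, a hypothetical observation
lam = likelihood(data, Fraction(3, 4)) / likelihood(data, Fraction(1, 2))

threshold = 20               # a stricter criterion than Lambda >= 1
print(lam >= threshold)      # True: Lambda = (3/2)^10, about 57.7
```

Because both likelihoods are evaluated on the same data, p(D) never has to be computed, which is the whole appeal of the ratio form.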
 


marcusl said:
A detection is declared when [tex]\Lambda\geq1[/tex]
That just means that a signal is somewhat more likely to give the observation than the background. I doubt that you can declare this as "detection". You expect it to be >=1 in ~50% of the cases just by random fluctuations - and if you can test several different hypotheses at the same time, you expect that about 50% have [itex]\Lambda\geq1[/itex] even without a real effect. If your hypotheses have free parameters, it gets even worse, and you could get [itex]\Lambda\geq1[/itex] nearly all the time.

[itex]\Lambda\geq20[/itex] begins to be interesting, something like [itex]\Lambda\geq1000[/itex] is better for a detection. And while particle physicists usually do not present their results in that way, they require more than [itex]\Lambda\geq10^6[/itex] to call it "observation".
 


mfb is right--I deleted that short post. The threshold [tex]\Lambda_0[/tex] for declaring a detection, [tex]\Lambda>\Lambda_0,[/tex] is determined by the desired probability of detection and false alarm rate, and by the signal and noise characteristics. The threshold can be related to SNR; SNR thresholds for radar detection are often in the range of 10 to 30.
 


You people are really awesome. I never understood Bayes theorem like this before.

Thanks a lot for your help
 
