Why is Bayes' theorem called "inverse probability"?

  • Context: Undergrad
  • Thread starter: iVenky
  • Tags: Inverse, Theorem

Discussion Overview

The discussion centers around the terminology of Bayes' theorem, specifically why it is referred to as "inverse probability." Participants explore the nature of forward and inverse probability problems, providing examples and seeking clarification on the concepts involved.

Discussion Character

  • Exploratory
  • Technical explanation
  • Conceptual clarification
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • Some participants explain that Bayes' theorem is called "inverse probability" because it addresses questions that are the opposite of typical probability questions, such as determining the probability of a hypothesis given observed data.
  • Examples are provided to illustrate forward probability (p(D|H)) and inverse probability (p(H|D)), emphasizing that these problems are not interchangeable without additional information from Bayes' theorem.
  • One participant discusses the concept of prior probability (p(H)), noting that it reflects a priori knowledge and can vary based on context, such as whether one believes a coin is fair or not.
  • There is mention of historical controversy regarding the assignment of prior probabilities, referencing Laplace's Principle of Insufficient Reason and its acceptance over time.
  • Participants discuss the challenges of determining p(D), the probability of observing the data, and how it can often be more complex than finding p(H).
  • Some participants introduce the concept of likelihood ratios and their application in detection theory, noting that thresholds for declaring detections can vary significantly based on context and desired probabilities.
  • There are differing views on what constitutes a sufficient likelihood ratio for declaring a detection, with some suggesting that a ratio of 1 is not necessarily indicative of a real effect.

Areas of Agreement / Disagreement

Participants express a range of views on the interpretation and application of Bayes' theorem, particularly regarding the assignment of prior probabilities and the thresholds for detection in likelihood ratios. The discussion remains unresolved with multiple competing perspectives presented.

Contextual Notes

Limitations include the dependence on definitions of prior probabilities and likelihoods, as well as the unresolved nature of how to approach problems with insufficient a priori information.

Who May Find This Useful

This discussion may be of interest to those studying probability theory, Bayesian inference, or related fields in statistics and data analysis.

iVenky
Why is Bayes' theorem called "inverse probability"?

What is the reason for calling Bayes' theorem "inverse probability"?

Any valid reason?

Thanks a lot
 


It answers questions opposite to the usual probability questions that you are probably familiar with. An example of a forward probability problem is "If a coin is fair, what is the probability of tossing H T H H T H T T T H T H H T T H"? We write this as p(D|H), the probability of obtaining the data D (sequence of tosses in this case) given the hypothesis H (the coin is fair).

Inverse probability turns it around. We seek p(H|D), which is the probability that the coin is fair (H) given that we observe the particular data sequence of heads and tails (D). The problems aren't interchangeable, and you can't get one from the other without the additional information found in Bayes's theorem.
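The forward problem is a direct computation. Here is a minimal Python sketch (the function name and the H/T string encoding are my own choices, not from the thread):

```python
from fractions import Fraction

def forward_prob(sequence, p_heads=Fraction(1, 2)):
    """p(D|H): probability of observing an exact toss sequence,
    given a coin whose per-toss probability of heads is p_heads."""
    prob = Fraction(1)
    for toss in sequence:
        prob *= p_heads if toss == "H" else 1 - p_heads
    return prob

# The 16-toss sequence from the example, for a fair coin:
print(forward_prob("HTHHTHTTTHTHHTTH"))  # (1/2)^16 = 1/65536
```

The inverse problem, p(H|D), cannot be read off from this function alone; it also needs the prior p(H) and the marginal p(D) discussed later in the thread.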
 


Awesome explanation
 


marcusl said:
It answers questions opposite to the usual probability questions that you are probably familiar with. An example of a forward probability problem is "If a coin is fair, what is the probability of tossing H T H H T H T T T H T H H T T H"? We write this as p(D|H), the probability of obtaining the data D (sequence of tosses in this case) given the hypothesis H (the coin is fair).

Inverse probability turns it around. We seek p(H|D), which is the probability that the coin is fair (H) given that we observe the particular data sequence of heads and tails (D). The problems aren't interchangeable, and you can't get one from the other without the additional information found in Bayes's theorem.


How do you solve this problem?--

I consider just 2 tosses and here's the sequence- H T

Now how would you find out P(H|D) in the above problem?

[itex]P(H|D)=\frac{P(H)P(D|H)}{P(D)}[/itex]

It's easy to see that P(D|H)=1/4 (since the sample space is {HH,HT,TH,TT})

but what about the value of P(H) and P(D)?

Thanks a lot :)
 


p(D|H) is called the likelihood, and it is a forward probability.

p(H) is called the prior probability that hypothesis H is true, or just "prior" for short. It incorporates all a priori knowledge that you have. If you have a new nickel from the US mint and believe that nickels are manufactured with even weighting, you might set this close to or equal to 1. If you are playing a betting game with a known con artist / convicted criminal, you might set this close to 0.

What to do when you have no a priori information was controversial for over 200 years. (p(H) in this case is called an ignorance prior.) Laplace, who derived inverse probability independently of Bayes, said that you should assign equal probability to every hypothesis because there is no compelling reason to do otherwise. He called this the Principle of Insufficient Reason, and it was attacked so viciously and continuously that Bayesian inference was roundly discounted until roughly the 1970's or 80's. Physicist E. T. Jaynes finally showed that Laplace was right because the equal probability prior is the only choice that maximizes information entropy. There is still some small controversy over the correct scale-invariant form for an equal probability prior in certain problems, but this is unimportant here. For your problem, if it were known that the coin had to be either fair or crooked but you had absolutely no a priori information which, you would assign p(H)=1/2.

p(D) is the probability of observing the data, and it is often harder to find than p(H). In your problem, it is the probability of seeing H T independent of any coins or other parameters. In other words, it is the probability of seeing H T over the entire universe of coin tossing experiments with all possible coins (fair, bent, weighted, two-headed and two-tailed).
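The two-toss question only becomes answerable once a universe of coins is fixed. As a minimal sketch, assume (this universe is my invention for illustration) that the coin is either fair or weighted toward heads with p(heads) = 3/4, each with ignorance prior 1/2:

```python
from fractions import Fraction

# Illustrative universe of exactly two coins (an assumption, not from the
# thread): fair (p_heads = 1/2) and weighted toward heads (p_heads = 3/4).
p_fair = Fraction(1, 2)            # prior p(H): an ignorance prior
p_weighted = 1 - p_fair

p_D_fair = Fraction(1, 2) * Fraction(1, 2)      # P(H T | fair)     = 1/4
p_D_weighted = Fraction(3, 4) * Fraction(1, 4)  # P(H T | weighted) = 3/16

# p(D): marginalize over every hypothesis in the assumed universe.
p_D = p_D_fair * p_fair + p_D_weighted * p_weighted

p_fair_given_D = p_D_fair * p_fair / p_D        # Bayes' theorem
print(p_D, p_fair_given_D)  # 7/32 4/7
```

A different assumed universe, or different priors, would give a different posterior; that dependence is exactly the point made above.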

Basically, then, there is no simple answer to your question. You can fill in numbers according to your knowledge and beliefs, getting an answer that is consistent with them. It is often pointed out that Bayesian inference has parallels to human learning and decision behavior. If you acquire additional information, you update the prior (and possibly p(D)) to get improved estimates.
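The update-as-you-learn behavior can be sketched in a few lines; the two-coin universe here is again a hypothetical chosen for illustration:

```python
from fractions import Fraction

def update(belief, p_heads, toss):
    """One Bayesian update: yesterday's posterior becomes today's prior."""
    unnorm = {h: belief[h] * (p_heads[h] if toss == "H" else 1 - p_heads[h])
              for h in belief}
    total = sum(unnorm.values())
    return {h: v / total for h, v in unnorm.items()}

# Hypothetical two-coin universe: fair vs. weighted toward heads.
p_heads = {"fair": Fraction(1, 2), "weighted": Fraction(3, 4)}
belief = {"fair": Fraction(1, 2), "weighted": Fraction(1, 2)}  # ignorance prior

for toss in "HHHH":          # each new observation sharpens the belief
    belief = update(belief, p_heads, toss)
print(belief["fair"])        # 16/97: four heads make "fair" less credible
```

Note that the normalizing `total` plays the role of p(D) for each toss, computed by summing over the hypotheses rather than known in advance.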

Problems are often reformulated as ratios to avoid the problem of finding p(D). In radar, one forms the ratio [tex]\Lambda=\frac{p(H_1|D)}{p(H_0|D)},[/tex] where H1 is the hypothesis that a received signal contains a weak echo in noise and H0 is that the signal is pure noise without an echo. Note that p(D), which is common to both, cancels from the ratio. If, in addition, [tex]p(H_1)=p(H_0)=0.5[/tex] (an ignorance prior), then [tex]\Lambda=\frac{p(D|H_1)}{p(D|H_0)}[/tex] and this is called a Likelihood Ratio Test (LRT); it is widely used in science, engineering, and statistics. A detection is declared when [tex]\Lambda\geq 1.[/tex]
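A coin-flavored stand-in for the LRT (a real radar signal model is beyond a short sketch; the observed sequence and the threshold here are invented for illustration):

```python
from fractions import Fraction

def likelihood(sequence, p_heads):
    """p(D|H): probability of an exact toss sequence for a given bias."""
    prob = Fraction(1)
    for toss in sequence:
        prob *= p_heads if toss == "H" else 1 - p_heads
    return prob

# H1: coin weighted toward heads (p = 3/4).  H0: coin fair (p = 1/2).
data = "HHHHHHHHHH"          # ten heads, a hypothetical observation
lam = likelihood(data, Fraction(3, 4)) / likelihood(data, Fraction(1, 2))

threshold = 20               # a stricter criterion than Lambda >= 1
print(lam >= threshold)      # True: Lambda = (3/2)^10, about 57.7
```

Because both likelihoods are evaluated on the same data, p(D) never has to be computed, which is the whole appeal of the ratio form.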
 


marcusl said:
A detection is declared when [tex]\Lambda\geq1[/tex]
That just means that a signal is somewhat more likely to give the observation than the background. I doubt that you can declare this as "detection". You expect it to be >=1 in ~50% of the cases just by random fluctuations - and if you can test several different hypotheses at the same time, you expect that about 50% have [itex]\Lambda\geq1[/itex] even without a real effect. If your hypotheses have free parameters, it gets even worse, and you could get [itex]\Lambda\geq1[/itex] nearly all the time.

[itex]\Lambda\geq20[/itex] begins to be interesting, something like [itex]\Lambda\geq1000[/itex] is better for a detection. And while particle physicists usually do not present their results in that way, they require more than [itex]\Lambda\geq10^6[/itex] to call it "observation".
 


mfb is right--I deleted that short post. The threshold [tex]\Lambda_0[/tex] for declaring a detection, [tex]\Lambda>\Lambda_0,[/tex] is determined by the desired probability of detection and false alarm rate, and by the signal and noise characteristics. The threshold can be related to SNR; SNR thresholds for radar detection are often in the range of 10 to 30.
 


You people are really awesome. I never understood Bayes theorem like this before.

Thanks a lot for your help
 
