Why is Bayes' theorem called inverse probability?

In summary: Bayes' theorem is a mathematical formula that gives the probability of an event given the evidence, where the evidence can be anything from experimental data to other information about the world. Inverse probability is the kind of problem that Bayes' theorem solves, and Bayesian inference is the process of using the theorem to calculate the probability of a hypothesis from known information.
  • #1
iVenky
Why is Bayes' theorem called "inverse probability"?

What is the reason for calling Bayes' theorem "inverse probability"?

Any valid reason?

Thanks a lot
 
  • #2


It answers questions opposite to the usual probability questions that you are probably familiar with. An example of a forward probability problem is "If a coin is fair, what is the probability of tossing H T H H T H T T T H T H H T T H"? We write this as p(D|H), the probability of obtaining the data D (sequence of tosses in this case) given the hypothesis H (the coin is fair).

Inverse probability turns it around. We seek p(H|D), which is the probability that the coin is fair (H) given that we observe the particular data sequence of heads and tails (D). The problems aren't interchangeable, and you can't get one from the other without the additional information found in Bayes's theorem.
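
To see the asymmetry numerically, here is a minimal Python sketch (not from the thread; the fair coin with p(heads) = 0.5 is the only assumption). The forward probability p(D|H) is a single product over the tosses, while the inverse probability p(H|D) cannot be formed from it alone.

[code]
# Forward probability: p(D|H) for the 16-toss sequence above, assuming a fair coin.
sequence = "HTHHTHTTTHTHHTTH"
p_D_given_H = 0.5 ** len(sequence)   # every fair-coin sequence of 16 tosses is equally likely
print(p_D_given_H)                   # about 1.5e-5

# The inverse question, p(fair coin | sequence), cannot be answered from this
# number alone: Bayes' theorem also requires a prior p(H) and the evidence p(D).
[/code]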
 
  • #3


Awesome explanation
 
  • #4


marcusl said:
It answers questions opposite to the usual probability questions that you are probably familiar with. An example of a forward probability problem is "If a coin is fair, what is the probability of tossing H T H H T H T T T H T H H T T H"? We write this as p(D|H), the probability of obtaining the data D (sequence of tosses in this case) given the hypothesis H (the coin is fair).

Inverse probability turns it around. We seek p(H|D), which is the probability that the coin is fair (H) given that we observe the particular data sequence of heads and tails (D). The problems aren't interchangeable, and you can't get one from the other without the additional information found in Bayes's theorem.


How do you solve this problem?

I consider just two tosses, and here's the sequence: H T

Now how would you find out P(H|D) in the above problem?

[itex]P(H|D)=\frac{P(H)P(D|H)}{P(D)}[/itex]

It's easy to see that P(D|H)=1/4 (since the sample space is {HH,HT,TH,TT})

but what about the value of P(H) and P(D)?

Thanks a lot :)
 
  • #5


p(D|H) is called the likelihood, and it is a forward probability.

p(H) is called the prior probability that hypothesis H is true, or just "prior" for short. It incorporates all a priori knowledge that you have. If you have a new nickel from the US mint and believe that nickels are manufactured with even weighting, you might set this close to or equal to 1. If you are playing a betting game with a known con artist / convicted criminal, you might set this close to 0.

What to do when you have no a priori information was controversial for over 200 years. (p(H) in this case is called an ignorance prior.) Laplace, who derived inverse probability independently of Bayes, said that you should assign equal probability to every hypothesis because there is no compelling reason to do otherwise. He called this the Principle of Insufficient Reason, and it was attacked so viciously and continuously that Bayesian inference was roundly discounted until roughly the 1970's or 80's. Physicist E. T. Jaynes finally showed that Laplace was right because the equal probability prior is the only choice that maximizes information entropy. There is still some small controversy over the correct scale-invariant form for an equal probability prior in certain problems, but this is unimportant here. For your problem, if it were known that the coin had to be either fair or crooked but you had absolutely no a priori information which, you would assign p(H)=1/2.

p(D) is the probability of observing the data, and it is often harder to find than p(H). In your problem, it is the probability of seeing H T independent of any coins or other parameters. In other words, it is the probability of seeing H T over the entire universe of coin tossing experiments with all possible coins (fair, bent, weighted, two-headed and two-tailed).

Basically, then, there is no simple answer to your question. You can fill in numbers according to your knowledge and beliefs, getting an answer that is consistent with them. It is often pointed out that Bayesian inference has parallels to human learning and decision behavior. If you acquire additional information, you update the prior (and possibly p(D)) to get improved estimates.
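
As a concrete illustration of filling in those numbers for the two-toss problem in post #4, here is a minimal Python sketch. The alternative hypothesis (a coin biased to land heads 80% of the time) and the equal 1/2 priors are assumed purely for illustration; different beliefs give different numbers.

[code]
def p_sequence(p_heads, sequence):
    """Forward probability p(D|H): probability of a toss sequence given p(heads)."""
    p = 1.0
    for toss in sequence:
        p *= p_heads if toss == "H" else 1.0 - p_heads
    return p

sequence = "HT"
prior = {"fair": 0.5, "biased": 0.5}                # ignorance prior over the two hypotheses
likelihood = {"fair": p_sequence(0.5, sequence),    # = 1/4, as computed in post #4
              "biased": p_sequence(0.8, sequence)}  # = 0.8 * 0.2 = 0.16

# p(D): marginalize over every hypothesis under consideration
p_D = sum(prior[h] * likelihood[h] for h in prior)

p_fair_given_D = prior["fair"] * likelihood["fair"] / p_D
print(p_D, p_fair_given_D)   # p(D) = 0.205, p(fair | HT) ~ 0.61
[/code]

Note that the answer depends entirely on which alternative hypotheses and priors you admit, which is exactly the point made above.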

Problems are often reformulated as ratios to avoid the problem of finding p(D). In radar, one forms the ratio [tex]\Lambda=\frac{p(H_1|D)}{p(H_0|D)},[/tex] where H1 is the hypothesis that a received signal contains a weak echo in noise and H0 is that the signal is pure noise without an echo. Note that p(D), which is common to both, cancels from the ratio. If, in addition, [tex]p(H_1)=p(H_0)=0.5[/tex] (an ignorance prior), then [tex]\Lambda=\frac{p(D|H_1)}{p(D|H_0)}[/tex] and this is called a Likelihood Ratio Test (LRT), which is widely used in science, engineering and statistics. A detection is declared when [tex]\Lambda\geq 1.[/tex]
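
A rough Python sketch of that ratio for a single received sample, assuming Gaussian noise with a known echo amplitude (the values s = 1, sigma = 1 and x = 1.3 are illustrative choices, not values from the post):

[code]
import math

def normal_pdf(x, mean, sigma):
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

s, sigma = 1.0, 1.0                         # assumed echo amplitude and noise level
x = 1.3                                     # one received sample

p_D_given_H1 = normal_pdf(x, s, sigma)      # echo present
p_D_given_H0 = normal_pdf(x, 0.0, sigma)    # noise only
ratio = p_D_given_H1 / p_D_given_H0
print(ratio)                                # about 2.2: this sample favors H1
[/code]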
 
  • #6


marcusl said:
A detection is declared when [tex]\Lambda\geq1[/tex]
That just means that a signal is somewhat more likely to give the observation than the background. I doubt that you can declare this as "detection". You expect it to be >=1 in ~50% of the cases just by random fluctuations - and if you can test several different hypotheses at the same time, you expect that about 50% have [itex]\Lambda\geq1[/itex] even without a real effect. If your hypotheses have free parameters, it gets even worse, and you could get [itex]\Lambda\geq1[/itex] nearly all the time.

[itex]\Lambda\geq20[/itex] begins to be interesting, something like [itex]\Lambda\geq1000[/itex] is better for a detection. And while particle physicists usually do not present their results in that way, they require more than [itex]\Lambda\geq10^6[/itex] to call it "observation".
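
A quick Monte-Carlo sketch of that point, continuing the Gaussian example above (the weak echo amplitude s = 0.1 is an assumed value): when the echo is small compared with the noise, [itex]\Lambda\geq1[/itex] occurs in nearly half of the noise-only trials.

[code]
import random

random.seed(0)
s, sigma, trials = 0.1, 1.0, 100_000
count = 0
for _ in range(trials):
    x = random.gauss(0.0, sigma)   # H0 is true: noise only
    # For Gaussian likelihoods, the ratio exceeds 1 exactly when x > s / 2.
    if x > s / 2:
        count += 1
print(count / trials)              # close to 0.48 with these numbers
[/code]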
 
  • #7


mfb is right--I deleted that short post. The threshold [tex]\Lambda_0[/tex] above which a detection is declared, [tex]\Lambda>\Lambda_0,[/tex] is determined by the desired probability of detection and false alarm rate, and by the signal and noise characteristics. The threshold can be related to SNR; SNR thresholds for radar detection are often in the range of 10 to 30.
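
For the Gaussian example sketched earlier, choosing the threshold from a desired false-alarm rate reduces to a one-line calculation; here is a minimal sketch (the 1e-4 false-alarm probability is an assumed design choice, not a value from the post). Thresholding [itex]\Lambda[/itex] at [itex]\Lambda_0[/itex] is equivalent to thresholding the received sample x at some x0.

[code]
from statistics import NormalDist

sigma = 1.0
p_fa = 1e-4                                     # desired false-alarm probability
# Under H0 the sample is N(0, sigma); choose x0 so that P(x > x0 | H0) = p_fa.
x0 = NormalDist(0.0, sigma).inv_cdf(1.0 - p_fa)
print(x0)                                       # about 3.7 sigma for p_fa = 1e-4
[/code]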
 
  • #8


You people are really awesome. I never understood Bayes' theorem like this before.

Thanks a lot for your help
 

1. Why is Bayes' theorem called "inverse probability"?

Bayes' theorem is often referred to as "inverse probability" because it calculates the probability of a hypothesis or cause from observed evidence. This is the inverse of the usual forward calculation, in which the cause or model is known and we compute the probability of the observations.

2. Who discovered Bayes' theorem?

Bayes' theorem was first discovered by English mathematician and Presbyterian minister Thomas Bayes in the 18th century. However, it was not published until after his death, and was later refined and popularized by French mathematician Pierre-Simon Laplace.

3. How does Bayes' theorem work?

Bayes' theorem uses conditional probability to calculate the likelihood of an event occurring based on prior knowledge or evidence. It takes into account both the prior probability of the event and the likelihood of the evidence given the event. The formula is P(A|B) = P(B|A) * P(A) / P(B), where A and B represent events and P(A) and P(B) represent their respective probabilities.
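
As a quick plug-in of numbers (reusing the two-toss values worked out earlier in the thread):

[code]
p_A = 0.5            # prior P(A): coin is fair
p_B_given_A = 0.25   # likelihood P(B|A): probability of the observed tosses if fair
p_B = 0.205          # evidence P(B): probability of the tosses over all hypotheses
p_A_given_B = p_B_given_A * p_A / p_B
print(p_A_given_B)   # about 0.61
[/code]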

4. What is the practical application of Bayes' theorem?

Bayes' theorem has many practical applications, particularly in fields such as statistics, data analysis, and machine learning. It can be used to update probabilities as new evidence is obtained, make predictions based on prior knowledge, and analyze complex systems with multiple variables.

5. Why is Bayes' theorem important?

Bayes' theorem is important because it provides a logical and mathematical framework for updating beliefs and making decisions based on evidence. It allows us to incorporate new information into our understanding of the world and make more accurate predictions. It is also the basis for many statistical and machine learning techniques used in various fields.
