B What motivates Bayes' Theorem?

  • Thread starter: Agent Smith
  • Tags: Revolution, Theorem
Summary
Bayes' theorem revolutionized probability by allowing reverse conditional probabilities to be calculated, so that probabilities can be updated in light of new evidence. Key figures in its development include Thomas Bayes and Pierre-Simon Laplace, with the theorem being fundamental to Bayesian statistics. Unlike frequentist methods, which require a sample, Bayesian approaches can work with limited or no evidence by incorporating prior beliefs. The theorem's application is crucial in scientific contexts, allowing researchers to test theories against observations. Overall, Bayes' theorem provides a powerful framework for understanding and estimating probabilities in various scenarios.
Agent Smith
TL;DR Summary: What motivates Bayes' Theorem?
As far as I know, Bayes' theorem is ##P(A|B) = \frac{P(A) \times P(B|A)}{P(A) \times P(B|A) + P(\neg A) \times P(B|\neg A)}##.

I recall someone saying Bayes' theorem revolutionized probability. Bayes himself and Laplace are supposedly key figures in this revolution. I know how to apply the theorem, but in what sense is the theorem a "revolution"?
 
jedishrfu said:
Bayes' theorem allows probabilities to be updated based on new evidence, making it a better way to estimate probabilities.

https://en.wikipedia.org/wiki/Bayes'_theorem
You may be confusing Bayesian Statistics, which is what you describe, and Bayes' Theorem, which is the essential and elementary component of probability theory quoted in the OP. The former is the Bayesian Interpretation, as given in that Wikipedia page.

Perhaps Bayes' Theorem was revolutionary, in that it allowed probabilities to be calculated in reverse. Bayesian statistics allows probabilities to be calculated in a wider range of circumstances. Most of the time, Bayesian and frequentist calculations agree.

That said, a frequentist obviously updates probabilities based on new evidence. What the Bayesian can do is start with very limited evidence or no evidence at all! The frequentist can't do that.
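To illustrate what "starting with no evidence" can look like, here is a minimal sketch assuming a uniform Beta(1, 1) prior on a coin's probability of heads. The prior, the 7 heads / 3 tails data, and the helper names are illustrative assumptions, not anything from the thread. With a Beta prior, each observed head or tail simply increments one of the two parameters.

```python
from fractions import Fraction


def beta_posterior(alpha: int, beta: int, heads: int, tails: int) -> tuple[int, int]:
    """Conjugate update of a Beta(alpha, beta) prior on the heads probability."""
    return alpha + heads, beta + tails


def beta_mean(alpha: int, beta: int) -> Fraction:
    """Mean of a Beta(alpha, beta) distribution."""
    return Fraction(alpha, alpha + beta)


# Assumed uniform prior Beta(1, 1), standing in for "no evidence at all".
prior = (1, 1)
print(beta_mean(*prior))  # 1/2, an estimate before a single toss

# Hypothetical data: 7 heads and 3 tails.
post = beta_posterior(*prior, heads=7, tails=3)
print(post, beta_mean(*post))  # (8, 4) 2/3
```

Before any toss the sketch already returns an estimate (1/2, coming purely from the prior), whereas the relative frequency heads/n is undefined for n = 0. After the data, the posterior mean 2/3 sits between the prior mean 1/2 and the relative frequency 7/10.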
 
Agent Smith said:
TL;DR Summary: What motivates Bayes' Theorem

As far as I know, Bayes' theorem is ##P(A|B) = \frac{P(A) \times P(B|A)}{P(A) \times P(B|A) + P(\neg A) \times P(B|\neg A)}##.
I've never seen it like that before. The denominator on the right is just an expansion of ##P(B)##. The simplest form of Bayes' Theorem, IMO, is:
$$P(B)P(A|B) = P(A \cap B) = P(A)P(B|A)$$
You can draw a Venn diagram to see this.
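The product-rule form can also be checked by counting on a small sample space. The sketch below assumes a fair six-sided die, with ##A## = "even" and ##B## = "at least 5"; these events and the helper names are illustrative choices. Dividing through by ##P(B)## then gives the reversed conditional probability.

```python
from fractions import Fraction

omega = set(range(1, 7))                # fair six-sided die (assumed)
A = {x for x in omega if x % 2 == 0}    # even: {2, 4, 6}
B = {x for x in omega if x >= 5}        # at least 5: {5, 6}


def prob(event: set[int]) -> Fraction:
    """Probability of an event under the uniform measure on omega."""
    return Fraction(len(event), len(omega))


def cond(event: set[int], given: set[int]) -> Fraction:
    """P(event | given), computed by restricting attention to `given`."""
    return Fraction(len(event & given), len(given))


lhs = prob(B) * cond(A, B)   # P(B)P(A|B)
mid = prob(A & B)            # P(A ∩ B)
rhs = prob(A) * cond(B, A)   # P(A)P(B|A)
print(lhs, mid, rhs)         # 1/6 1/6 1/6

# Reversing: P(A|B) = P(A)P(B|A) / P(B)
print(cond(A, B), prob(A) * cond(B, A) / prob(B))  # 1/2 1/2
```

Note that the two conditional probabilities differ here (##P(A|B) = 1/2## but ##P(B|A) = 1/3##), which is why the reversal needs the unconditional probabilities.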
 
PeroK said:
What the Bayesian can do is start with very limited evidence or no evidence at all! The frequentist can't do that.
Which one are you, a Bayesian or frequentist?
 
vela said:
Which one are you, a Bayesian or frequentist?
I'm a frequentist myself. I'm not convinced that the probabilities that can be calculated from the Bayesian Interpretation are, in all circumstances, meaningful.
 
For example, using Bayesian methods to calculate the probability that the universe is flat (or whatever). Yes, you get a number and can call it a probability, but what does it mean? As a frequentist, you have to be honest and say that there is no answer (from frequentist probability theory). And sometimes "no answer" is better and more honest than an answer that may or may not mean anything.
 
Bayes' theorem has been extrapolated into the Bayesian interpretation of probability, which in turn has been extrapolated into a whole Bayesian epistemology. This has had some influence in physics, with some ideas of Bayesian thermodynamics and Bayesian interpretations of quantum mechanics, but none of these are very popular.
 
Bayes' formula allows you to convert P(A|B) into P(B|A). E.g., when tossing a coin, you can calculate the probability of an outcome given a (null) hypothesis. But often you want to know the reverse: what is, e.g., the probability that the coin is not fair given the outcome?
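A minimal sketch of the coin example, assuming just two hypotheses: the coin is fair (heads with probability 1/2) or biased (heads with probability 4/5), each with prior probability 1/2. The biased value, the prior, and the data (8 heads in 10 tosses) are all illustrative assumptions. The binomial model gives P(outcome | hypothesis); Bayes' theorem turns it into P(not fair | outcome).

```python
from fractions import Fraction
from math import comb


def binom_likelihood(p: Fraction, heads: int, n: int) -> Fraction:
    """P(`heads` heads in n tosses | heads probability p)."""
    return comb(n, heads) * p**heads * (1 - p)**(n - heads)


def posterior_not_fair(heads: int, n: int,
                       p_biased: Fraction = Fraction(4, 5),
                       prior_biased: Fraction = Fraction(1, 2)) -> Fraction:
    """P(coin is biased | data) in an assumed two-hypothesis model."""
    like_fair = binom_likelihood(Fraction(1, 2), heads, n)
    like_biased = binom_likelihood(p_biased, heads, n)
    num = prior_biased * like_biased
    return num / (num + (1 - prior_biased) * like_fair)


post = posterior_not_fair(heads=8, n=10)
print(post, float(post))  # roughly 0.873
```

The answer depends on the assumed alternative and prior; change `p_biased` or `prior_biased` and the posterior moves, which is exactly the information a frequentist test of the null hypothesis alone does not supply.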

Or consider a suspect. Using some model, you can calculate the probability that he was at the crime scene given that he is innocent. But a prosecutor wants to know the reverse: the probability that he is innocent given that he was at the crime scene. Confusing these two is known as the prosecutor's fallacy.
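A small numerical sketch of the prosecutor's fallacy, with entirely hypothetical numbers: one culprit among 10,000 people who could have been there, an innocent person matches the evidence with probability 1/1000, and the culprit matches with certainty.

```python
from fractions import Fraction


def p_innocent_given_evidence(p_e_given_innocent: Fraction,
                              p_e_given_guilty: Fraction,
                              prior_guilty: Fraction) -> Fraction:
    """Bayes' theorem for P(innocent | evidence) with two hypotheses."""
    prior_innocent = 1 - prior_guilty
    num = prior_innocent * p_e_given_innocent
    return num / (num + prior_guilty * p_e_given_guilty)


# Hypothetical numbers, chosen only for illustration.
p = p_innocent_given_evidence(Fraction(1, 1000), Fraction(1), Fraction(1, 10_000))
print(p, float(p))  # 9999/10999, about 0.909
```

So P(evidence | innocent) = 0.001, yet under these assumptions P(innocent | evidence) is about 0.91; reading the first number as the second is the fallacy.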
 
  • #10
PeroK said:
For example, using Bayesian methods to calculate the probability that the universe is flat (or whatever). Yes, you get a number and can call it a probability, but what does it mean? As a frequentist, you have to be honest and say that there is no answer (from frequentist probability theory). And sometimes "no answer" is better and more honest than an answer that may or may not mean anything.
It means how much confidence you have given certain background information. Subjectivity doesn't deprive things of meaning.
 
  • #11
haushofer said:
It means how much confidence you have given certain background information. Subjectivity doesn't deprive things of meaning.
One could argue that science must be objective.
 
  • #12
haushofer said:
Bayes' formula allows you to convert P(A|B) into P(B|A). E.g., when tossing a coin, you can calculate the probability of an outcome given a (null) hypothesis. But often you want to know the reverse: what is, e.g., the probability that the coin is not fair given the outcome?

Or consider a suspect. Using some model, you can calculate the probability that he was at the crime scene given that he is innocent. But a prosecutor wants to know the reverse: the probability that he is innocent given that he was at the crime scene. Confusing these two is known as the prosecutor's fallacy.
In mathematics, one has to distinguish between probability and likelihood.
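One way to see the distinction is a binomial model, sketched below under illustrative assumptions (n = 10 tosses, k = 8 observed heads, and a simple midpoint-rule integral). For a fixed parameter ##p##, the probabilities ##P(k|p)## summed over all outcomes ##k## give 1. For a fixed observation, the likelihood ##L(p) = P(k|p)## viewed as a function of ##p## is not a probability distribution over ##p## and need not integrate to 1.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(k successes in n trials | success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


n, k = 10, 8  # assumed example values

# Probability: fix the parameter p, vary the outcome -> sums to 1.
total_prob = sum(binom_pmf(j, n, 0.5) for j in range(n + 1))

# Likelihood: fix the observed k, vary the parameter p over [0, 1]
# (midpoint rule) -> here the integral is 1/(n + 1), not 1.
steps = 100_000
total_like = sum(binom_pmf(k, n, (i + 0.5) / steps) for i in range(steps)) / steps

print(total_prob)  # 1.0
print(total_like)  # close to 1/11, about 0.0909
```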
 
  • #13
PeroK said:
I've never seen it like that before. The denominator on the right is just an expansion of ##P(B)##. The simplest form of Bayes' Theorem, IMO, is:
$$P(B)P(A|B) = P(A \cap B) = P(A)P(B|A)$$
You can draw a Venn diagram to see this.
Yes. I don't know why, if at all, one form is preferred over the other. The derivation is also difficult to follow.
 
  • #14
haushofer said:
Bayes' formula allows you to convert P(A|B) into P(B|A). E.g., when tossing a coin, you can calculate the probability of an outcome given a (null) hypothesis. But often you want to know the reverse: what is, e.g., the probability that the coin is not fair given the outcome?

Or consider a suspect. Using some model, you can calculate the probability that he was at the crime scene given that he is innocent. But a prosecutor wants to know the reverse: the probability that he is innocent given that he was at the crime scene. Confusing these two is known as the prosecutor's fallacy.
Is that it? We can compute "reverse probabilities"?
 
  • #15
Agent Smith said:
Yes. I don't know why, if at all, one form is preferred over the other.
I prefer simplicity and memorability.
Agent Smith said:
The derivation is also difficult to follow.
What derivation?
 
  • #16
jedishrfu said:
Bayes' theorem allows probabilities to be updated based on new evidence, making it a better way to estimate probabilities.

https://en.wikipedia.org/wiki/Bayes'_theorem
This "new evidence" is a sample, but we use samples in standard (frequentist) statistics too.
 
  • #17
PeroK said:
I prefer simplicity and memorability.

What derivation?
The derivation from your simpler formula to the more complex one I posted. I drew a Venn diagram and it meshes with what I know. Thanks.
 
  • #18
Agent Smith said:
The derivation from your simpler formula to the more complex one I posted. I drew a Venn diagram and it meshes with what I know. Thanks.
If ##A_1, \dots, A_n## are mutually exclusive events that exhaust the sample space, then for any event ##B##:
$$P(B) = P(A_1)P(B|A_1) + \dots + P(A_n)P(B|A_n)$$
A special case of this is where the events are ##A## and ##\neg A##, in which case:
$$P(B) = P(A)P(B|A) + P(\neg A)P(B|\neg A)$$
That's a general expansion of ##P(B)##, so there is no need to include that in Bayes' theorem. The simple ##P(B)## should do just as well.
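A quick numerical check of this, sketched with a hypothetical three-event partition ##A_1, A_2, A_3## (the probabilities are made up for illustration). It computes ##P(B)## by the law of total probability, applies Bayes' theorem with the plain ##P(B)##, and confirms that the expanded two-event form from the opening post (with ##A = A_1##, ##\neg A = A_2 \cup A_3##) gives the same answer.

```python
from fractions import Fraction as F

# Hypothetical partition A1, A2, A3 (mutually exclusive, exhaustive) and P(B | Ai).
prior = {"A1": F(1, 2), "A2": F(3, 10), "A3": F(1, 5)}
p_b_given = {"A1": F(1, 100), "A2": F(1, 50), "A3": F(1, 20)}
assert sum(prior.values()) == 1

# Law of total probability: P(B) = sum_i P(Ai) P(B | Ai)
p_b = sum(prior[a] * p_b_given[a] for a in prior)

# Bayes' theorem with the simple P(B) in the denominator
posterior = {a: prior[a] * p_b_given[a] / p_b for a in prior}

# Two-event form from the OP: A = A1, not-A = A2 or A3
p_a, p_not_a = prior["A1"], 1 - prior["A1"]
p_b_given_not_a = (prior["A2"] * p_b_given["A2"]
                   + prior["A3"] * p_b_given["A3"]) / p_not_a
op_form = (p_a * p_b_given["A1"]) / (p_a * p_b_given["A1"]
                                     + p_not_a * p_b_given_not_a)

print(p_b)                       # 21/1000
print(posterior["A1"], op_form)  # 5/21 5/21
```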
 
  • #19
Agent Smith said:
TL;DR Summary: What motivates Bayes' Theorem

in what sense is the theorem a "revolution"?
The main thing is that it allows you to reverse conditional probabilities. So if you know the unconditional probabilities and also the conditional probability ##P(A|B)##, then you can calculate the reversed conditional probability ##P(B|A)##.

This is important in science. Usually my theory tells me ##P(\text{observation}|\text{theory})##. But then I can use Bayes' theorem to determine ##P(\text{theory}|\text{observation})##. So I can use experiment to test my theory.
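A minimal sketch of testing a theory this way, assuming a theory T that predicts a "hit" with probability 9/10 and a rival R that predicts one with probability 1/2, starting from equal priors. The theories, their predictions, and the observation sequence are hypothetical. Each observation turns ##P(\text{observation}|\text{theory})## into an updated ##P(\text{theory}|\text{observation})##, and the posterior then serves as the prior for the next observation.

```python
from fractions import Fraction as F


def update(prior_t: F, p_obs_given_t: F, p_obs_given_r: F) -> F:
    """One application of Bayes' theorem: returns P(T | observation)."""
    num = prior_t * p_obs_given_t
    return num / (num + (1 - prior_t) * p_obs_given_r)


# Hypothetical predictions: P(hit | T) = 9/10, P(hit | R) = 1/2.
P_HIT = {"T": F(9, 10), "R": F(1, 2)}
observations = [True, True, False, True]  # hypothetical data

p_t = F(1, 2)  # assumed prior: no preference between T and R
for hit in observations:
    p_t_obs = P_HIT["T"] if hit else 1 - P_HIT["T"]
    p_r_obs = P_HIT["R"] if hit else 1 - P_HIT["R"]
    p_t = update(p_t, p_t_obs, p_r_obs)  # posterior becomes the next prior
    print(hit, p_t)
```

The single miss, which T considers unlikely, pulls the probability of T back down noticeably; updating one observation at a time gives the same final answer as applying Bayes' theorem once to all four observations.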
 
  • #20
Is there anything I should note other than that probabilities can be reversed with Bayes' theorem? The Wikipedia page says that Bayes' "real intention" was to "prove the existence of God."

For ##P(A|B) = \frac{P(A) \times P(B|A)}{P(B)}##, what happens when ##P(B) = 0##?
Thanks. The Wikipedia article on Bayes' theorem mentions the caveat that ##P(B) \ne 0##.
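Why the caveat is needed: if ##P(B) = 0## then ##P(A \cap B) \le P(B) = 0##, so the numerator ##P(A)P(B|A) = P(A \cap B)## is 0 as well and the formula reads 0/0. A simple sketch of a function (the name `bayes` is hypothetical) that refuses that case rather than returning a meaningless number:

```python
def bayes(p_a: float, p_b_given_a: float, p_b: float) -> float:
    """P(A|B) = P(A) P(B|A) / P(B); undefined when P(B) = 0."""
    if p_b == 0:
        raise ValueError("P(A|B) is undefined when P(B) = 0")
    return p_a * p_b_given_a / p_b


print(bayes(0.5, 0.2, 0.25))  # 0.4
```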
 
  • #21
Agent Smith said:
TL;DR Summary: What motivates Bayes' Theorem

I recall someone saying
Reference please? Otherwise we are chasing ghosts.
 
  • #22
  • #23
So we're going to play that game, eh? "It's somewhere in those zillion pages."

Count me out.
 
  • #24
Vanadium 50 said:
So we're going to play that game, eh? "It's somewhere in those zillion pages."

Count me out.
Wikipedia has a good article on Bayes' theorem. Most of what I wrote here comes from there.
 
  • #25
PeroK said:
What the Bayesian can do is start with very limited evidence or no evidence at all! The frequentist can't do that.
A frequentist needs a sample, at the very least. A Bayesian needs ___? 🤔
 
  • #26
PeroK said:
For example, using Bayesian methods to calculate the probability that the universe is flat (or whatever). Yes, you get a number and can call it a probability, but what does it mean? As a frequentist, you have to be honest and say that there is no answer (from frequentist probability theory). And sometimes "no answer" is better and more honest than an answer that may or may not mean anything.
Throwing darts here, but does the process work like this:
Fix a significance level ##\alpha = 0.05##
##H_0##: The universe is curved, i.e. ##\mu = \mu_0 = v##, where ##v## is a specific value. Here ##\mu## is the mean of some physical measurement, and its standard deviation ##\sigma_0 = u## is known.
##H_a##: The universe is flat, i.e. ##\mu > v##

We then make some measurements (a sample of size ##n##) and compute the sample mean ##\mu_s = x##. We find the ##\text{z score} = \frac{\mu_s - \mu_0}{\sigma_0 / \sqrt n}## and read off our ##\text{P-value}## from a z-table. If ##\text{P-value} < \alpha##, we can reject ##H_0## in favor of ##H_a## and conclude that the universe is flat.

?
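For reference, the arithmetic of the one-sided z-test described above can be sketched as follows. The numbers (##v = 10##, ##u = 2##, ##n = 25##, sample mean 10.8) are made up, and the sketch only shows the mechanics of the test; whether such a test says anything about the flatness of the universe is a separate question.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_z_test(sample_mean: float, mu0: float, sigma0: float, n: int,
                     alpha: float = 0.05) -> tuple[float, float, bool]:
    """Upper-tailed z-test of H0: mu = mu0 against Ha: mu > mu0, sigma0 known."""
    z = (sample_mean - mu0) / (sigma0 / sqrt(n))
    p_value = 1 - NormalDist().cdf(z)  # area to the right of z under N(0, 1)
    return z, p_value, p_value < alpha


# Hypothetical numbers: v = 10, u = 2, n = 25 measurements with mean x = 10.8
z, p, reject = one_sided_z_test(10.8, mu0=10.0, sigma0=2.0, n=25)
print(round(z, 3), round(p, 4), reject)  # z = 2.0, p about 0.0228, reject H0
```

`NormalDist().cdf` replaces the z-table lookup; the test itself is unchanged.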
 
  • #27
Agent Smith said:
A frequentist needs a sample, at the very least. A Bayesian needs ___? 🤔
You can read about it. The difference is more about what a probability fundamentally is. At this stage, you should focus on probability theory and your course material.

It's unfortunate that many people associate Bayes' Theorem with Bayesian statistics. This leads to esoteric debates that are of limited value at this stage. Bayes' Theorem is elementary and fundamental and shouldn't lead to a debate about Bayesian statistics.
 
  • #28
Agent Smith said:
Throwing darts here, but does the process work like this:
Fix a significance level ##\alpha = 0.05##
##H_0##: The universe is curved, i.e. ##\mu = \mu_0 = v##, where ##v## is a specific value. Here ##\mu## is the mean of some physical measurement, and its standard deviation ##\sigma_0 = u## is known.
##H_a##: The universe is flat, i.e. ##\mu > v##

We then make some measurements (a sample of size ##n##) and compute the sample mean ##\mu_s = x##. We find the ##\text{z score} = \frac{\mu_s - \mu_0}{\sigma_0 / \sqrt n}## and read off our ##\text{P-value}## from a z-table. If ##\text{P-value} < \alpha##, we can reject ##H_0## in favor of ##H_a## and conclude that the universe is flat.

?
There is only one universe, so the sample size is 1.
 
  • #29
PeroK said:
There is only one universe, so the sample size is 1.
🙂 Yes, that's correct, I forgot. What about the measurements I "used" to test the hypothesis that the universe is flat?
 
  • #30
Agent Smith said:
🙂 Yes, that's correct, I forgot. What about the measurements I "used" to test the hypothesis that the universe is flat?
The universe is either flat or it isn't. Probability doesn't apply. One could say that the distribution is everything and in this case there is either no distribution or the distribution is entirely unknown.
 
