# Understanding the P(B|A) ≡ P(A ∩ B) / P(A) formula

1. Feb 14, 2015

### s3a

1. The problem statement, all variables and given/known data
My book says:
"Let A and B be two events such that P(A) > 0. Denote by P(B|A) the probability of B given that A has occurred. Since A is known to have occurred, it becomes the new sample space replacing the original S.

From this we are led to the definition

P(B|A) ≡ P(A ∩ B) / P(A).

P(A ∩ B) ≡ P(B|A) P(A)."

2. Relevant equations
P(B|A) ≡ P(A ∩ B) / P(A) [i]
and
P(A ∩ B) ≡ P(B|A) P(A) [ii]

3. The attempt at a solution
1) Could someone please explain to me the intuition for at least equation [i]? I understand that, since event A has happened and we want a certain probability out of the probability of A having occurred, P(A) is the denominator, but I don't intuitively understand the P(A ∩ B) part/numerator of [i].

2) I suppose that [i] is not a theorem, because it is not something that can be proven (and the same goes for [ii]). Why is [i] (and likewise [ii]) not an axiom instead of a definition, though? Phrased more directly: why is it a definition, and why does it need to be defined?

I hope my questions make sense; please tell me if they don't.

I appreciate all answers, but please provide me with the most succinct answers you can, because I don't want to get more confused.

2. Feb 14, 2015

### vela

Staff Emeritus
I think it's easiest to see this using a Venn diagram. The probability of event A happening is represented by the fraction of the entire sample space U that A covers. Similarly, the probability of event B happening is represented by the fraction of the entire sample space U that B covers.

If you're given that A has happened, you only need to look at outcomes in A, so A takes over the role of U. The outcomes corresponding to event B are now represented by the intersection of A and B because you no longer care about outcomes that aren't part of event A.

#### Attached Files:

• images.png (3.5 KB)
3. Feb 15, 2015

### Ray Vickson

Think of the (limiting) relative-frequency interpretation of probability. Specifically, suppose we repeat the experiment (or whatever it is) independently, and under identical conditions, $M$ times, where $M$ is some huge number. The number of times event $A$ occurs is (approximately) $N_A = M P(A)$ and the number of times $B$ occurs is $N_B = M P(B)$. When we want the conditional probability $P(B|A)$, we look just at those particular outcomes where $A$ occurred. In how many of those particular outcomes did $B$ occur? Well, those are just the outcomes in which both $A$ and $B$ occurred, and their number is (approximately) $N_{A \cap B} = M P(A \cap B)$. Can you take it from there?
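Ray Vickson's counting argument is easy to check with a quick simulation. This is just a sketch using a made-up experiment (not from the thread): roll a fair die, with $A$ = "the roll is even" and $B$ = "the roll is at least 4".

```python
import random

random.seed(0)
M = 100_000  # a large number of independent repetitions

n_A = 0        # number of repetitions in which A occurred
n_A_and_B = 0  # number in which both A and B occurred
for _ in range(M):
    roll = random.randint(1, 6)  # one repetition of the experiment
    if roll % 2 == 0:            # A: the roll is even
        n_A += 1
        if roll >= 4:            # B: the roll is at least 4
            n_A_and_B += 1

# Among the repetitions where A occurred, the fraction where B also
# occurred estimates P(B|A); the exact value here is (2/6)/(3/6) = 2/3.
p_B_given_A = n_A_and_B / n_A
print(p_B_given_A)
```

The printed estimate lands close to 2/3, which is exactly $P(A \cap B)/P(A)$ for this die experiment.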

4. Feb 15, 2015

### Stephen Tashi

That's a good question and it highlights the difference between the concepts involved in the theory of probability and the concepts involved in applying the theory of probability to real life situations.

You are correct that conditional probability is implemented as a definition, not a theorem. This is because the axiomatic approach to probability theory only deals with the probability of events in a probability space, not with the concept of those events "actually occurring". In applying probability, we freely think of being "given" that an event in a probability space "actually occurred", but that's application, not theory. The way that theory implements the concept that an event $A$ in a probability space "actually occurs" is indirect. It doesn't give a formal definition for the statement "$A$ actually occurs". Instead it gives a formula for computing probabilities in another probability space that is defined by the condition "given $A$".

We can daydream about an approach to probability that gives a direct definition of "event $A$ actually occurs" and then proves the formula for $P(B|A)$ as a theorem. The reason that standard probability theory chickens out from this approach is that it is intellectually treacherous to mix the concept of a probabilistic event with the concept of the definite occurrence of an event.

For example, suppose $X$ is a random variable uniformly distributed on the interval [0,1]. Suppose we assume it is possible to take a random sample of $X$ and that the event $\{X = r \}$ for some number $r$ "actually occurs". Such an event has probability zero. The event $\{ X \ne r \}$ has probability 1. So assuming the event $\{ X = r \}$ "actually occurred" implies that an event with probability zero "actually occurred" and an event with probability 1 "did not actually occur".

This apparent contradiction is not a paradox in the standard treatment of mathematical probability theory simply because the standard treatment does not give any definition for the "actual occurrence" of an event in a probability space (which is the reason I keep putting such phrases in quotation marks). So if a person wants to talk about an event in a probability space "actually occurring", he is not discussing a concept that is defined within the standard treatment of mathematical probability theory.

People can debate questions like "Must an event with probability 1 always occur?". Such debates are in the realm of Philosophy or Metaphysics (or perhaps we can call them debates about "the interpretation of probability theory" since the forum frowns on philosophical discussions). There is no way to settle such questions by using the axioms and definitions of probability theory since that material doesn't treat the concept of "actual occurrence".

Axiomatic probability avoids philosophical tangles by sidestepping the concept of a probabilistic event actually occurring. The concept of actual occurrence is dealt with indirectly by means of the definition of conditional probability. As Ray Vickson points out, this definition is motivated by concepts involved in applying probability to practical situations.

5. Feb 16, 2015

"Could someone please explain to me the intuition for at least the equation?"
A demonstration.
Consider a group of 100 people, randomly selected but (magically and for the sake of easy arithmetic)
- 50 are men, 30 of whom are married, 20 not married
- 50 are women, 25 of whom are married, 25 not married

The "experiment" is this: randomly select one person from the 100. Start with these two defined events:
A: the person selected is married
B: the person selected is a woman
Then $A \cap B$ is the event "the person selected is a married woman"

I hope you agree that P(A) = 55/100 = 0.55, P(B) = 50/100 = 0.5, P($A \cap B$) = 25/100 = 0.25.

Now consider P(B | A).
By the definition of conditional probability you supplied, this means we are told the selected person is married, and we want to know, based on this, the chance that the person is a woman. Since the person is married, that person must be one of the 55 married people. Among these, 25 are women, so from first principles the probability is 25/55. But notice this:
$$P(B \mid A) = \frac{25}{55} = \dfrac{\frac{25}{100}}{\frac{55}{100}} = \dfrac{P(A \cap B)}{P(A)}$$
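As a sanity check, the demonstration above can be replayed in a few lines of Python (a sketch; the population is encoded exactly as described in the post):

```python
# The 100 people from the demonstration: 50 men (30 married, 20 not),
# 50 women (25 married, 25 not), each person a (sex, married) pair.
people = ([("man", True)] * 30 + [("man", False)] * 20
          + [("woman", True)] * 25 + [("woman", False)] * 25)
N = len(people)  # 100

# Event A: the person is married.  Event B: the person is a woman.
p_A = sum(1 for sex, married in people if married) / N
p_A_and_B = sum(1 for sex, married in people if married and sex == "woman") / N

# The definition P(B|A) = P(A ∩ B) / P(A) reproduces the
# first-principles answer 25/55.
p_B_given_A = p_A_and_B / p_A
print(p_A, p_A_and_B, p_B_given_A)  # 0.55, 0.25, and about 0.4545
```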

Please note that this is far from a rigorous justification of the general idea, but is simply a basic demonstration. But for an introduction there is no need to be concerned with philosophy or more advanced issues. There will be time for those later in your studies.

6. Mar 8, 2015

### s3a

About intuitively understanding what P(B|A) ≡ P(A ∩ B) / P(A) means, I get it now. Thanks, everyone. :)

Specifically about what Stephen Tashi/you said:
So, in short, P(B|A) ≡ P(A ∩ B) / P(A) is a definition instead of a theorem, because it doesn't give a formal definition as to what it means for A to occur, but it just assumes that A occurred?

And, in short, P(B|A) ≡ P(A ∩ B) / P(A) is a definition instead of an axiom because it doesn't just compare probabilities of events (without saying anything about occurrence), but it also assumes that event A occurred?

P.S.
Sorry for the very late response. I'm really busy with school and work, and I have to prioritize the topics that are for school (since my probability and statistics course doesn't cover proofs, and I'm just looking into them for pure learning).

P.P.S.
Sorry for the poor grammar in my opening post; I must have either been tired when writing the post, had problems with my keyboard (because sometimes the keys don't work), or both.

7. Mar 8, 2015

### Stephen Tashi

The definition of the conditional probability $P(B| A)$ doesn't assume A occurred and it doesn't assume A did not occur. It simply doesn't comment on what the statement "A occurred" would mean.

The definition of conditional probability defines the meaning of the complete phrase "The probability of B given A" without defining what the phrase "given A" would mean by itself.

This is analogous to the definition in calculus for "The limit of F(X) as X approaches A". The definition of limit ("For each epsilon greater than zero, there exists a delta such that ...") doesn't contain any language that explains what the individual phrase "X approaches A" means. When we apply calculus, we sometimes visualize a dynamic process where X changes in time and gets closer to A as time passes, but this is not part of the formal definition of limit. Likewise, when we apply probability theory we interpret "given A" to mean that A actually occurred, but this is not part of the formal definition of conditional probability.

Technically, the format of a mathematical definition stipulates that a statement using previously undefined language shall be equivalent to a statement that uses already defined language. In mathematics, it doesn't always work to pick out individual words and phrases in the statement that is being defined and expect to find the meaning of these words and phrases in the definition.

8. Mar 8, 2015

### s3a

Okay, so, the definition treats "the probability of 'B given A'" as its own entity and purposely avoids establishing the "given A" part as its own entity in order to avoid uncertain philosophical arguments, whereas the theorem approach to establishing conditional probability would have to establish the "given A" as its own entity and then base the probability of B off of the "given A" part?

What about comparing the definition approach to the axiom approach? Both the definition approach and axiom approach to establishing conditional probability seem to share the fact that they establish a relationship between probabilities of different entities without saying anything about those entities, but what is different between the two?

9. Mar 8, 2015

### Stephen Tashi

Yes.

I'm not sure what you mean by the "axiom approach".

10. Mar 10, 2015

### s3a

By the "axiom approach", I meant treating the equation P(B given A) = P(A intersection B) / P(A) as an axiom.

Is the reason why the equation P(B given A) = P(A intersection B) / P(A) is not an axiom because it has the potential to be proven/be a theorem (and axioms are fundamental, unprovable "building blocks" of a theory)?

11. Mar 10, 2015

### Stephen Tashi

For such an axiom to mean anything, one would first have to give a definition for P(B given A).

12. Mar 10, 2015

### s3a

Okay, so because the definition of P(B given A), P(B given A) triple_bar_symbol P(A intersection B) / P(A), is itself sufficient for establishing conditional probability, we don't need to develop an axiom which uses the definition of P(B given A) to establish conditional probability, even thought it would be mathematically correct to do so, right?

13. Mar 10, 2015

### Stephen Tashi

No, it wouldn't be mathematically correct to have an axiom whose content was just the content of the definition. For example, if you define pi as the ratio of the circumference of a unit circle to its diameter, then it isn't correct to have an axiom that assumes pi is the ratio of the circumference of a unit circle to its diameter. If you get into the question of whether the ratio of the circumference to the diameter of any given circle is pi, then that's a matter for an axiom or a theorem.

14. Mar 11, 2015

### Fredrik

Staff Emeritus
The finite case is easy to understand. Let X be the set of all people. Let A be the set of all people with blonde hair. Let B be the set of all Swedish people. I'll use the notation |E| for the number of elements of a subset of X. We define the function P by
$$P(E)=\frac{|E|}{|X|}$$
for all $E\subseteq X$. For each $E\subseteq X$, we call P(E) the probability of E. It's the probability that a randomly chosen element of X is an element of E. So P(A) is the probability that a randomly chosen person is blonde, and P(B) is the probability that a randomly chosen person is Swedish. The notation P(A|B) is supposed to be interpreted as the probability that a person is blonde, given that he/she is Swedish. In other words, it's supposed to be the probability that a randomly chosen Swedish person is a blonde Swedish person.
$$P(A|B)=\frac{|A\cap B|}{|B|}=\frac{\frac{|A\cap B|}{|X|}}{\frac{|B|}{|X|}} =\frac{P(A\cap B)}{P(B)}.$$
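Fredrik's finite-case computation can be mirrored directly with Python sets. This is a small sketch; the names and group memberships are invented purely for illustration.

```python
# Toy sample space X; the names and memberships are made up.
X = {"Anna", "Bjorn", "Carla", "Dmitri", "Eva", "Frank"}
A = {"Anna", "Bjorn", "Eva"}            # blonde people
B = {"Anna", "Bjorn", "Carla", "Eva"}   # Swedish people

def P(E):
    """Probability of E: the fraction of the sample space X that E covers."""
    return len(E) / len(X)

# P(A|B) computed two ways, as in the post: from counts, and via the definition.
from_counts = len(A & B) / len(B)   # |A ∩ B| / |B|
from_def = P(A & B) / P(B)          # P(A ∩ B) / P(B)
print(from_counts, from_def)  # both are 3/4, up to floating-point rounding
```

The two expressions agree because the factor $1/|X|$ cancels, which is exactly the algebra in the displayed equation above.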

Last edited: Mar 11, 2015
15. Mar 22, 2015

### s3a

Fredrik, I had already understood the intuition behind applying the formula, but thanks anyways. :)

So, Stephen Tashi, is it possible, in any way, to establish the P(B|A) = P(A ∩ B) / P(A) formula as an axiom (because I only understand how to establish it as a definition or as a theorem, and you said that for an axiom stating that P(B|A) = P(A ∩ B) / P(A) to mean anything, one would have to define P(B|A) as P(A ∩ B) / P(A), but you also said that one cannot assume a definition as the basis of an axiom)?

16. Mar 22, 2015

### Stephen Tashi

The standard way (i.e. Kolmogorov, measure theory) to treat that formula is to treat it as a definition of P(B|A). From time to time people make attempts to formulate mathematical theories in new ways. I'm sure there are attempts to formulate probability theory in a different manner. Such attempts remain little known and obscure. I don't know of any method to establish the formula as an axiom.

It is clear that if someone wants to establish $P(B|A) = P(A \cap B) / P(A)$ as an axiom or a theorem, they are required to state a definition for $P(B|A)$; otherwise the axiom or theorem has no specific meaning. If the formula is to be established as an axiom or theorem, we must state a definition for $P(B|A)$ that is different from the formula before we assume or prove the formula.

( It's interesting that the current Wikipedia article on conditional probability claims that de Finetti treats the formula as an axiom and that he treats P(B|A) as a "primitive". What that means is unclear. It's obvious that for the formula to make sense, P(B|A) must be a number of some sort. So P(B|A) must have some defined properties.)

A once well-known attempt to axiomatize probability that predated the Kolmogorov approach was Richard von Mises' theory of "collectives". Mainstream mathematicians think that the theory of collectives is logically consistent. I don't know how von Mises defined conditional probability. You can find respectable articles about the foundations of probability that exist in the twilight zone between mathematics and philosophy (assuming you consider philosophy respectable). People still comment on von Mises' work,
e.g. http://patrick.maher1.net/517/lectures/lecture9.pdf (In those notes, Maher uses "rf" to abbreviate "relative frequency").

Last edited: Mar 23, 2015
17. Mar 25, 2015

### Stephen Tashi

It's worth pointing out that your interpretation of "given" is different from interpreting "given" to mean that a particular Swedish person has been chosen. If a particular Swedish person has been chosen, he/she (by the conventions of the problem) either is blonde or isn't blonde. There is no probability (other than 0 or 1) associated with an actual person being blonde. (According to the conventions of the problem, a person isn't blonde one second and not-blonde a second later.)

There is a mathematical distinction between a quantity that is definite but unknown and a symbol that represents a random variable. For example, in a proof we may say "Let $\epsilon > 0$ be given". This statement does not associate any probability distribution with $\epsilon$; it only establishes $\epsilon$ as a definite but unknown positive number. Likewise, if we say "Let U be a uniformly distributed random variable on the interval [0,1]", this does not imply that U represents a definite but unknown number.

Letting A be the event "The person is Swedish", we can't interpret "given A" to mean "A happened" in the sense that some particular outcome in A was realized. If a particular outcome in A was realized then statements about that outcome are statements about a definite but unknown person, not statements about a random variable.

( In statistics, a similar situation arises in the interpretation of confidence intervals. For example, we may compute that there is a 0.90 probability that the sample mean (as a random variable) in an experiment will be within plus or minus 5.0 of the population mean. But this does not imply that a particular realized sample mean, say 3.64, has a probability of 0.90 of being within plus or minus 5.0 of the population mean.)

Last edited: Mar 25, 2015