My book says:
"Let A and B be two events such that P(A) > 0. Denote by P(B|A) the probability of B given that A has occurred. Since A is known to have occurred, it becomes the new sample space replacing the original S.
From this we are led to the definition
P(B|A) ≡ P(A ∩ B) / P(A).
P(A ∩ B) ≡ P(B|A) P(A)."
P(B|A) ≡ P(A ∩ B) / P(A) [i]
P(A ∩ B) ≡ P(B|A) P(A) [ii]
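A small worked instance may help connect [i] to the book's "new sample space" remark. It is a minimal sketch that assumes one roll of a fair six-sided die, with A = "the roll is even" and B = "the roll is at least 4". Both events are chosen only for illustration. It computes P(B|A) once via [i] on the original space S, and once by counting B's outcomes inside A alone. It then checks [ii].

```python
from fractions import Fraction

# Assumed example: one roll of a fair six-sided die.
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # event A: the roll is even
B = {4, 5, 6}   # event B: the roll is at least 4

def prob(event, space=S):
    """Probability of `event` when every outcome in `space` is equally likely."""
    return Fraction(len(event & space), len(space))

# Definition [i]: P(B|A) = P(A ∩ B) / P(A), computed on the original space S.
p_b_given_a = prob(A & B) / prob(A)

# "A becomes the new sample space": only B's outcomes that lie inside A count.
p_b_in_new_space = prob(B, space=A)

print(p_b_given_a, p_b_in_new_space)  # 2/3 2/3

# Multiplication form [ii]: P(A ∩ B) = P(B|A) P(A).
assert prob(A & B) == p_b_given_a * prob(A)
```

Both routes agree: 2 of the 3 outcomes in A also lie in B. That is the same as (2/6) / (3/6).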
The Attempt at a Solution
There are two things I would like to ask about:
1) Could someone please explain the intuition behind at least equation [i]? I understand that P(A) is the denominator: A is known to have occurred, and we want the probability of B measured relative to A. However, I don't intuitively understand the P(A ∩ B) part, i.e. the numerator of [i].
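One way to look at the numerator is through relative frequencies. The sketch below is a hedged illustration that assumes one fair die roll per trial, with A = "the roll is even" and B = "the roll is at least 4". It keeps only the trials in which A occurred, and among those it counts the ones in which B also occurred. Those are exactly the trials in A ∩ B. The function name `estimate_conditional` is hypothetical.

```python
import random

def estimate_conditional(trials: int, seed: int = 0) -> float:
    """Estimate P(B|A) by relative frequency (hypothetical helper).

    Assumed events for one fair die roll: A = roll is even, B = roll >= 4.
    """
    rng = random.Random(seed)
    count_a = 0        # trials where A occurred: the restricted "sample space"
    count_a_and_b = 0  # trials where A and B both occurred: A ∩ B
    for _ in range(trials):
        roll = rng.randint(1, 6)
        if roll % 2 == 0:
            count_a += 1
            if roll >= 4:
                count_a_and_b += 1
    # Dividing both counts by `trials` turns this ratio into
    # freq(A ∩ B) / freq(A), which mirrors P(A ∩ B) / P(A).
    return count_a_and_b / count_a

print(estimate_conditional(100_000))  # close to 2/3
```

The part of B that lies outside A is never counted, because those trials are discarded once we condition on A. This is why the numerator is P(A ∩ B) rather than P(B).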
2) I suppose that [i] is not a theorem, because it is not something that can be proven (and the same goes for [ii]). Why, though, is [i] (and likewise [ii]) a definition rather than an axiom? To put it more directly: why is it a definition, and why does it need to be defined at all?
I hope my questions make sense; please tell me if they don't.
I appreciate all answers, but please keep them as succinct as you can, because I don't want to get more confused.