Conditional probabilities of conditioned probabilities

WWCY · Sep 30, 2019

scottdave · Sep 30, 2019

Does the comma in P(A, B) represent intersection? If so, I think the last one would be easier to read and understand. I am currently taking a Bayesian course. A classmate shared this link, which helps to visualize the concepts. https://seeing-theory.brown.edu/bayesian-inference/index.html

FactChecker · Sep 30, 2019

It's hard for me to read the problem because I don't know what the precedence of the operations ',' and '|' is. Some added parentheses might make the question clearer.

StoneTemplePython · Sep 30, 2019

WWCY said:

Writing something like ##P(A,B|C) = P(A|B|C) \ P(B|C)## seems nonsensical, and I've seen stuff that suggests ##P(A,B|C) = P(A|B,C)\ P(B|C)##, but I don't understand what is the right answer.

Assistance is greatly appreciated.

basic choices:
use the second form
(i) ##P(A,B|C) = P(A|B,C)\ P(B|C)##

or re-label so that ##A^{'}## is defined as the probability of A conditioned on B, and then
(ii) ## P(A|B|C) \ P(B|C)\longrightarrow P(A^{'}|C) \ P(B|C) =P(A,B|C)##
which looks less nonsensical

I'd tend to go for (i), though occasionally, if there are a ton of moving parts and you are working through conditioning one 'stage' at a time (say with induction), (ii) may be preferable.
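A quick numerical sanity check of form (i), as a sketch: it assumes a hypothetical sample space of two fair dice and three events A, B, C that I made up for illustration (none of them come from the original question), and it reads the comma as intersection. Exact fractions avoid floating-point noise.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample space: two fair six-sided dice, all 36 outcomes equally likely.
omega = set(product(range(1, 7), repeat=2))

def P(event, given=None):
    """Uniform-measure probability of `event`, conditioned on `given` when it is supplied."""
    if given is None:
        given = omega
    return Fraction(len(event & given), len(given))

A = {w for w in omega if w[0] == 6}          # first die shows 6
B = {w for w in omega if w[0] + w[1] >= 10}  # the two dice sum to at least 10
C = {w for w in omega if w[1] >= 4}          # second die shows 4 or more

lhs = P(A & B, C)            # P(A,B|C), reading the comma as intersection
rhs = P(A, B & C) * P(B, C)  # P(A|B,C) P(B|C)
print(lhs, rhs)  # → 1/6 1/6
```

Both sides agree here; of course one example is not a proof, just a way to see the bookkeeping.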

there are lots of variations on notation. What about (i) don't you like?

note for (i) that the notation and idea (using indicators if needed) are the same as with conditional expectations, e.g.

##E\Big[X\big \vert Y,Z\Big]## or ##E\Big[X\big \vert Y=y,Z=z\Big]##
where you have a function defined on Y and Z. Standard function notation would be something like
##E\Big[X\big \vert Y=y,Z=z\Big]= g_X\big(y,z\big) =g\big(y,z\big)##
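To make the "function of ##y## and ##z##" point concrete, here is a small sketch that computes ##g(y,z) = E[X \mid Y=y, Z=z]## by enumeration. The joint distribution is purely hypothetical: ##Y, Z, W## are independent fair 0/1 coins and ##X = Y + Z + W##, so one would expect ##g(y,z) = y + z + 1/2##.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution: Y, Z, W are independent fair 0/1 coins, X = Y + Z + W.
# Each outcome (y, z, w) has probability 1/8.
outcomes = [((y, z, w), Fraction(1, 8)) for y, z, w in product((0, 1), repeat=3)]

def X(o):
    y, z, w = o
    return y + z + w

def g(y, z):
    """E[X | Y=y, Z=z]: probability-weighted average of X over outcomes with Y=y and Z=z."""
    matching = [(o, p) for o, p in outcomes if o[0] == y and o[1] == z]
    total = sum(p for _, p in matching)
    return sum(X(o) * p for o, p in matching) / total

for y, z in product((0, 1), repeat=2):
    print(y, z, g(y, z))
```

The conditioning values ##y, z## are just the arguments of an ordinary function, which is the same reading as ##P(A|B,C)## with the comma between the conditions.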

Stephen Tashi · Sep 30, 2019

WWCY said:

but I don't understand what is the right answer.

A question posed using notation isn't a specific question until the meaning of the notation is defined.
With certain interpretations of notation, both the equations you ask about are correct.

If you associate the English word "given" with the symbol "|" you can interpret notation like "P(A|B|C)" as a sequence of English words. However, this doesn't guarantee that the sequence of English words has a specific mathematical interpretation. The formal theory of mathematical probability is based on assigning probabilities to sets. The difficulty you face in interpreting notation involving a "|" is how to translate that notation into a statement about sets. Notation such as "A|B" isn't specific until we can translate it into statements involving only the standard operations on sets such as "##\cap##" and "##\cup##".

It's important to understand the notation for conditional probability. To do that, begin by understanding that mathematical probability is defined on a "probability space". The probability space has a set ##\Omega## and there is a function (a "probability measure") ##\mu## that gives the probability of certain subsets of ##\Omega##. With this understanding, the notation "##P(A)##" means ##\mu(A)##.

Suppose we have a problem concerning two probability spaces that have the same ##\Omega## but have different probability measures, ##\mu_1## and ##\mu_2##. In such a case, the notation "##P(A)##" is ambiguous. It might mean ##\mu_1(A)## or it might mean ##\mu_2(A)##.
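A minimal sketch of that situation on a finite ##\Omega##, assuming a hypothetical die where ##\mu_1## is the fair-die measure and ##\mu_2## is a loaded-die measure with weights I chose arbitrarily. Each measure is stored as the weight of every outcome, and the measure of a set is the sum of its outcomes' weights.

```python
from fractions import Fraction

# Hypothetical sample space: the six faces of a die.
omega = {1, 2, 3, 4, 5, 6}

# Two probability measures on the same omega, each given by the weight of every outcome.
mu_1 = {w: Fraction(1, 6) for w in omega}                                  # fair die
mu_2 = {w: Fraction(1, 2) if w == 6 else Fraction(1, 10) for w in omega}  # loaded die (assumed weights)

def measure(mu, event):
    """mu(event): add up the weights of the outcomes in the event."""
    return sum((mu[w] for w in event), Fraction(0))

A = {6}
# "P(A)" on its own does not say which of these two numbers is meant.
print(measure(mu_1, A), measure(mu_2, A))  # → 1/6 1/2
```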

The important thing to understand about the "##|##" notation is that it is used in the above situation for the purpose of distinguishing two different probability measures. In the convention for using the "##|##" notation, we write "##P(A)##" for the probability of the set ##A## using some probability measure ##\mu_1## that is "understood" or the "default" probability measure. We write "##P(A|##<something>##)##" to indicate the probability of ##A## using a different probability measure ##\mu_2##.

If we have a probability identity that applies to all probability measures, we express it in the "P" notation with the understanding that some default probability measure ##\mu_1## is being used. So when we write ##P(A \cup B) = P(A) + P(B) - P(A \cap B)##, we are saying ##\mu_1(A\cup B) = \mu_1(A) + \mu_1(B) - \mu_1(A\cap B)##.

With the understanding that "##|##<something>" indicates using a different probability measure, any identity valid for all probability measures also applies when written with the same "##|##<something>" used in the "##P(...)##" terms. For example, ##P(A \cup B|##<something>##)= P(A|##<something>##) + P(B|##<something>##) - P(A \cap B | ##<something>##)## is an abbreviation for ##\mu_2(A \cup B) = \mu_2(A) + \mu_2(B) - \mu_2(A \cap B)## where ##\mu_2## is some probability measure distinct from the default probability measure.

The convention for interpreting notation such as ##P(A|C)##, where "##C##" is a set, is as follows: there is some default probability measure ##\mu_1##. Define a different probability measure ##\mu_2## by ##\mu_2(X) = \mu_1(X \cap C)/ \mu_1(C)##. Then "##P(A|C)##" denotes ##\mu_2(A)##.
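A sketch of that construction on a finite ##\Omega##: `condition` (a hypothetical helper name) builds ##\mu_2## from ##\mu_1## and ##C## as a new table of outcome weights. The fair-die ##\mu_1## and the sets ##A, B, C## are assumptions for illustration. The asserts check that ##\mu_2## is itself a probability measure, so the inclusion-exclusion identity holds with "##|C##" attached to every term.

```python
from fractions import Fraction

def measure(mu, event):
    """mu(event): add up the weights of the outcomes in the event."""
    return sum((mu[w] for w in event), Fraction(0))

def condition(mu_1, C):
    """Return mu_2 with mu_2(X) = mu_1(X ∩ C) / mu_1(C), stored as per-outcome weights."""
    mu_1_C = measure(mu_1, C)
    if mu_1_C == 0:
        raise ValueError("cannot condition on an event of probability zero")
    return {w: (p / mu_1_C if w in C else Fraction(0)) for w, p in mu_1.items()}

# Hypothetical default measure: a fair die.
mu_1 = {w: Fraction(1, 6) for w in range(1, 7)}
A, B, C = {2, 3, 4}, {4, 5, 6}, {2, 4, 6}
mu_2 = condition(mu_1, C)

# mu_2 is a probability measure, so P(. | C) obeys the same identities as P(.).
# (In Python, A | B is set union, not "given".)
assert measure(mu_2, mu_2.keys()) == 1
assert measure(mu_2, A | B) == measure(mu_2, A) + measure(mu_2, B) - measure(mu_2, A & B)
print(measure(mu_2, A))  # P(A|C) → 2/3
```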

Instructors emphasize the distinction between ##P(A \cap C)## and ##P(A | C)##. However, this can lead to the misunderstanding that the two notations are different because they involve different sets. The two notations both involve the set ##A \cap C##. The notations indicate computing the probability of ##A \cap C## using different probability measures. "##P(A \cap C)##" denotes ##\mu_1(A \cap C)## for the default probability measure ##\mu_1##. The evaluation of "##P(A |C)##" also includes computing ##\mu_1(A \cap C)##. The significant difference is that "##P(A|C)##" indicates that a probability measure ##\mu_2## distinct from ##\mu_1## is being used to get the final answer.

To restate the above idea, we can note ##P(A| C) = P(A \cap C | C)##. This is because ##P(A \cap C | C)## is defined to be ##\mu_1( (A \cap C) \cap C)/ \mu_1(C) = \mu_1 (A \cap C)/ \mu_1(C) = P(A | C)##. Both ##P(A|C) = P(A \cap C | C)## and ##P(A \cap C)## can be regarded as computing a probability for the set ##A \cap C##. The distinction is that ##P(A \cap C)## and ##P(A|C)## assign probabilities to the set ##A \cap C## using different probability measures.
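A small numerical illustration of "same set, different measure", assuming a hypothetical fair-die ##\mu_1## with ##A = \{1,2,3\}## and ##C = \{2,4,6\}##, so ##A \cap C = \{2\}##. The helper `P` is my own naming for this sketch.

```python
from fractions import Fraction

# Hypothetical default measure mu_1: a fair die.
mu_1 = {w: Fraction(1, 6) for w in range(1, 7)}

def measure(mu, event):
    """mu(event): add up the weights of the outcomes in the event."""
    return sum((mu[w] for w in event), Fraction(0))

def P(X, C=None):
    """P(X) under mu_1, or P(X|C) = mu_1(X ∩ C) / mu_1(C) when C is given."""
    if C is None:
        return measure(mu_1, X)
    return measure(mu_1, X & C) / measure(mu_1, C)

A, C = {1, 2, 3}, {2, 4, 6}
print(P(A & C))     # mu_1 applied to A ∩ C = {2}  → 1/6
print(P(A, C))      # mu_2 applied to the same set  → 1/3
print(P(A & C, C))  # P(A ∩ C | C) equals P(A | C)  → 1/3
```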

When we think of probability problems intuitively, it is tempting to imagine that events have a single probability and that other aspects of the problem (such as probabilities "given additional information") are merely accessories that don't alter "the" unique probability of the event. It's better to realize that the same event can be assigned different probabilities by different probability measures. In one way of thinking, all probabilities are conditional probabilities. "##P(A)##" denotes the probability of the set ##A## under the conditions that establish some default probability measure ##\mu_1##.

When it comes to interpreting notation like "##P(A|B|C)##", we are on our own. I've never seen a probability textbook that develops such notation in detail. (The fact that we can write notation that looks plausible doesn't guarantee it has a unique or sensible interpretation.)

As @FactChecker says, we can worry about a distinction between "##P((A|B)|C)##" versus "##P(A|(B|C))##". However, to do that, we first have to make sense of notation like "##(A|B)##". The notation for conditional probability ##P(A|B)## doesn't define a way to interpret "##A|B##" as a set. In fact, in set theory, some people might use the notation "##A|B##" to denote the set ##\{x: x \in A \land x \notin B\}##, which is not what we want.

The technicalities of probability measures and probability spaces are complicated. Ignoring such complexities, I will make some suggestions.

WWCY said:

Writing something like ##P(A,B|C) = P(A|B|C) \ P(B|C)## seems nonsensical,

Suppose we interpret ##P(A|B|C)## to mean ##P(A |(B \cap C))##. The latter has a specific interpretation by the conventions given above: it is ##\mu_2(A)##, where ##\mu_2(X) = \mu_1(X \cap B \cap C)/ \mu_1(B \cap C)## for the default probability measure ##\mu_1##. So ##P(A |(B \cap C)) = \mu_1(A \cap B \cap C)/ \mu_1(B \cap C)##.
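One way to see why this reading is reasonable, as a sketch: if "##|C##" means "switch to the measure conditioned on ##C##", then applying "##|B##" to that measure should mean conditioning it again on ##B##. On a finite ##\Omega## that two-step conditioning gives the same measure as conditioning ##\mu_1## once on ##B \cap C##. The non-uniform weights ##w/21## and the sets ##A, B, C## below are hypothetical choices, and `condition` is a helper name I made up.

```python
from fractions import Fraction

# Hypothetical default measure mu_1 on {1, ..., 6}: outcome w has weight w/21 (weights sum to 1).
mu_1 = {w: Fraction(w, 21) for w in range(1, 7)}

def measure(mu, event):
    """mu(event): add up the weights of the outcomes in the event."""
    return sum((mu[w] for w in event), Fraction(0))

def condition(mu, C):
    """Per-outcome weights of the measure X -> mu(X ∩ C) / mu(C)."""
    mu_C = measure(mu, C)
    return {w: (p / mu_C if w in C else Fraction(0)) for w, p in mu.items()}

A, B, C = {4, 5, 6}, {1, 2, 4, 6}, {2, 3, 4, 5, 6}

once = condition(mu_1, B & C)             # mu_1 conditioned on B ∩ C
twice = condition(condition(mu_1, C), B)  # first condition on C, then on B
print(once == twice, measure(once, A))    # → True 5/6
```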

With that interpretation we can consider whether ##P(A \cap B|C) = P(A | B | C) P(B | C)##.

The left side of the equation is defined as ##\mu_1((A \cap B) \cap C)/ \mu_1(C)##.

With my suggestion, the right side is defined as ##P(A | B \cap C) P(B | C) = ( \mu_1(A \cap B \cap C)/ \mu_1(B \cap C) )\ ( \mu_1( B \cap C) / \mu_1(C))##, which reduces to the left-hand side. (Of course, we must assume none of this involves division by zero.)
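If you want to convince yourself that the cancellation works for every choice of events, not just a hand-picked one, a brute-force sketch is to loop over all triples of subsets of a small ##\Omega##. The four-point ##\Omega## and its weights ##w/10## are hypothetical; cases with ##\mu_1(B \cap C) = 0## are skipped, which (since every weight is positive) also rules out ##\mu_1(C) = 0##.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical default measure on a four-point omega, with weights 1/10, 2/10, 3/10, 4/10.
omega = (1, 2, 3, 4)
mu_1 = {w: Fraction(w, 10) for w in omega}

def measure(event):
    """mu_1(event)."""
    return sum((mu_1[w] for w in event), Fraction(0))

def cond(X, Y):
    """P(X | Y) = mu_1(X ∩ Y) / mu_1(Y)."""
    return measure(X & Y) / measure(Y)

# All 16 subsets of omega, including the empty set.
events = [frozenset(s) for s in chain.from_iterable(combinations(omega, r) for r in range(len(omega) + 1))]

checked = 0
for A in events:
    for B in events:
        for C in events:
            if measure(B & C) == 0:  # skip cases that would divide by zero
                continue
            # P(A ∩ B | C) == P(A | B ∩ C) P(B | C)
            assert cond(A & B, C) == cond(A, B & C) * cond(B, C)
            checked += 1
print(checked)  # → 2800
```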

I've seen stuff that suggests ##P(A,B|C) = P(A|B,C)\ P(B|C)##,

Interpreting that as the claim ##P((A \cap B) | C) = P(A | (B \cap C)) P(B | C)##, you should be able to translate it into a claim about expressions that involve only the probability measure ##\mu_1##.
