WWCY said:
but I don't understand what is the right answer.
A question posed using notation isn't a specific question until the meaning of the notation is defined.
With certain interpretations of notation, both the equations you ask about are correct.
If you associate the English word "given" with the symbol "|" you can interpret notation like "P(A|B|C)" as a sequence of English words. However, this doesn't guarantee that the sequence of English words has a specific mathematical interpretation. The formal theory of mathematical probability is based on assigning probabilities to sets. The difficulty you face in interpreting notation involving a "|" is how to translate that notation into a statement about sets. Notation such as "A|B" isn't specific until we can translate it into statements involving only the standard operations on sets such as "##\cap##" and "##\cup##".
It's important to understand the notation for conditional probability. To do that, begin by understanding that mathematical probability is defined on a "probability space". The probability space has a set ##\Omega## and there is a function (a "probability measure") ##\mu## that gives the probability of certain subsets of ##\Omega##. With this understanding, the notation "##P(A)##" means ##\mu(A)##.
Suppose we have a problem concerning two probability spaces that have the same ##\Omega## but have different probability measures, ##\mu_1## and ##\mu_2##. In such a case, the notation "##P(A)##" is ambiguous. It might mean ##\mu_1(A)## or it might mean ##\mu_2(A)##.
The important thing to understand about the "##|##" notation is that it is used in the above situation for the purpose of distinguishing two different probability measures. In the convention for using the "##|##" notation, we write "##P(A)##" for the probability of the set ##A## using some probability measure ##\mu_1## that is "understood" or the "default" probability measure. We write "##P(A|##<something>##)##" to indicate the probability of ##A## using a different probability measure ##\mu_2##.
If we have a probability identity that applies to all probability measures, we express it in the "P" notation with the understanding that some default probability measure ##\mu_1## is being used. So when we write ##P(A \cup B) = P(A) + P(B) - P(A \cap B)##, we are saying ##\mu_1(A\cup B) = \mu_1(A) + \mu_1(B) - \mu_1(A\cap B)##.
With the understanding that "##|##<something>" indicates using a different probability measure, any identity valid for all probability measures also applies when the same "##|##<something>" is written in every "##P(...)##" term. For example, ##P(A \cup B|##<something>##)= P(A|##<something>##) + P(B|##<something>##) - P(A \cap B | ##<something>##)## is an abbreviation for ##\mu_2(A \cup B) = \mu_2(A) + \mu_2(B) - \mu_2(A \cap B)## where ##\mu_2## is some probability measure distinct from the default probability measure.
The convention for interpreting notation such as ##P(A|C)##, where "##C##" is a set with ##\mu_1(C) > 0##, is as follows. There is some default probability measure ##\mu_1##. Define a different probability measure ##\mu_2## by ##\mu_2(X) = \mu_1(X \cap C)/ \mu_1(C)##. Then "##P(A|C)##" denotes ##\mu_2(A)##.
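To make the definition above concrete, here is a minimal sketch on a finite probability space. The fair die, the events `A` and `C`, and the helper name `conditional_measure` are all assumptions chosen for illustration; they do not come from the thread.

```python
from fractions import Fraction

# Hypothetical example: one roll of a fair six-sided die.
OMEGA = frozenset(range(1, 7))

def mu_1(event):
    """Default measure: every outcome in OMEGA has probability 1/6."""
    return Fraction(len(frozenset(event) & OMEGA), len(OMEGA))

def conditional_measure(mu, C):
    """Build the measure mu_2 with mu_2(X) = mu(X & C) / mu(C)."""
    C = frozenset(C)
    if mu(C) == 0:
        raise ValueError("conditioning event has probability zero")

    def mu_2(X):
        return mu(frozenset(X) & C) / mu(C)

    return mu_2

A = {2, 4, 6}   # "the roll is even"
C = {4, 5, 6}   # "the roll is at least 4"
mu_2 = conditional_measure(mu_1, C)

print(mu_1(A))      # P(A): the default measure applied to A
print(mu_2(A))      # P(A|C): a different measure applied to the same set A
print(mu_2(OMEGA))  # mu_2 assigns total probability 1, like any probability measure
```

The point of returning a function is that "##P(\cdot|C)##" is treated as a measure in its own right, so any set can be passed to it.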
Instructors emphasize the distinction between ##P(A \cap C)## and ##P(A | C)##. However, this can lead to the misunderstanding that the two notations are different because they involve different sets. The two notations both involve the set ##A \cap C##. The notations indicate computing the probability of ##A \cap C## using different probability measures. "##P(A \cap C)##" denotes ##\mu_1(A \cap C)## for the default probability measure ##\mu_1##. The evaluation of "##P(A |C)##" also includes computing ##\mu_1(A \cap C)##. The significant difference is that "##P(A|C)##" indicates that a probability measure ##\mu_2## distinct from ##\mu_1## is being used to get the final answer.
To restate the above idea, we can note ##P(A| C) = P(A \cap C | C)##. This is because ##P(A \cap C | C)## is defined to be ##\mu_1( (A \cap C) \cap C)/ \mu_1(C) = \mu_1 (A \cap C)/ \mu_1(C) = P(A | C)##. Both ##P(A|C) = P(A \cap C | C)## and ##P(A \cap C)## can be regarded as computing a probability for the set ##A \cap C##. The distinction is that ##P(A \cap C)## and ##P(A|C)## assign probabilities to the set ##A \cap C## using different probability measures.
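A small numerical check of this restatement, again assuming a hypothetical fair die with illustrative events `A` and `C`:

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # hypothetical: fair six-sided die

def mu_1(X):
    """Default (uniform) measure on OMEGA."""
    return Fraction(len(frozenset(X) & OMEGA), len(OMEGA))

A = frozenset({2, 4, 6})
C = frozenset({4, 5, 6})

def mu_2(X):
    """The measure written P( . | C): mu_1(X & C) / mu_1(C)."""
    return mu_1(frozenset(X) & C) / mu_1(C)

p_joint = mu_1(A & C)          # P(A ∩ C): the set A ∩ C under mu_1
p_cond = mu_2(A)               # P(A | C)
p_cond_same_set = mu_2(A & C)  # P(A ∩ C | C): the set A ∩ C under mu_2

print(p_joint, p_cond, p_cond_same_set)
```

Here `p_cond` and `p_cond_same_set` agree, while `p_joint` differs from both, even though all three are computed from the same set ##A \cap C##.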
When we think of probability problems intuitively, it is tempting to imagine that events have a single probability and that other aspects of the problem (such as probabilities "given additional information") are merely accessories that don't alter "the" unique probability of the event. It's better to realize that the same event can be assigned different probabilities by different probability measures. In one way of thinking, all probabilities are conditional probabilities. "##P(A)##" denotes the probability of the set ##A## under the conditions that establish some default probability measure ##\mu_1##.
When it comes to interpreting notation like "##P(A|B|C)##", we are on our own. I've never seen a probability textbook that develops such notation in detail. (The fact that we can write notation that looks plausible doesn't guarantee it has a unique or sensible interpretation.)
As @FactChecker says, we can worry about a distinction between "##P( (A|B) |C)##" versus "##P(A | (B|C))##". However, to do that, we first have to make sense of notation like "##(A|B)##". The notation for conditional probability ##P(A|B)## doesn't define a way to interpret "##A|B##" as a set. In fact, in set theory, some people might use the notation "##A|B##" to denote the set ##\{x: x \in A \land x \notin B\}##, which is not what we want.
The technicalities of probability measures and probability spaces are complicated. Ignoring such complexities, I will make some suggestions.
WWCY said:
Writing something like P(A,B|C)=P(A|B|C) P(B|C) seems nonsensical,
Suppose we interpret ##P(A|B|C)## to mean ##P(A |(B \cap C))##. The latter has a specific interpretation by the conventions given above. This interpretation is ##\mu_2(A)##, where ##\mu_2(X) = \mu_1(X \cap B \cap C)/ \mu_1(B \cap C)## is defined using the default probability measure ##\mu_1##, so that ##\mu_2(A) = \mu_1(A \cap B \cap C)/ \mu_1(B \cap C)##.
With that interpretation we can consider whether ##P(A \cap B|C) = P(A | B | C) P(B | C)##.
The left side of the equation is defined as ##\mu_1((A \cap B) \cap C)/ \mu_1(C)##.
With my suggestion, the right side is defined as ##P(A | B \cap C) P(B | C) = ( \mu_1(A \cap B \cap C)/ \mu_1(B \cap C) )\ ( \mu_1( B \cap C) / \mu_1(C))##, which reduces to the left-hand side because the factors ##\mu_1(B \cap C)## cancel. (Of course we must assume none of this involves division by zero.)
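The cancellation can also be checked by brute force on a small example. The sketch below assumes a hypothetical four-point space with non-uniform weights and reads ##P(A|B|C)## as ##P(A | B \cap C)##, as suggested above; the names `WEIGHTS`, `cond`, and `chain_rule_holds` are illustrative only.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical non-uniform default measure mu_1 on the space {1, 2, 3, 4}.
WEIGHTS = {1: Fraction(1, 10), 2: Fraction(2, 10),
           3: Fraction(3, 10), 4: Fraction(4, 10)}

def mu_1(X):
    """Default measure: sum of the weights of the outcomes in X."""
    return sum((WEIGHTS[w] for w in X), Fraction(0))

def cond(X, given):
    """P(X | given) = mu_1(X & given) / mu_1(given)."""
    X, given = frozenset(X), frozenset(given)
    return mu_1(X & given) / mu_1(given)

def subsets(s):
    """All subsets of s, as frozensets."""
    s = list(s)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def chain_rule_holds():
    """Check P(A & B | C) == P(A | B & C) * P(B | C) for every triple of events."""
    events = subsets(WEIGHTS)
    for A in events:
        for B in events:
            for C in events:
                if mu_1(B & C) == 0:
                    continue  # skip cases that would divide by zero
                if cond(A & B, C) != cond(A, B & C) * cond(B, C):
                    return False
    return True

print(chain_rule_holds())
```

Exact fractions are used so that equality is tested without floating-point rounding. A finite check like this is only an illustration; the algebra in the preceding paragraph is what establishes the identity in general.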
I've seen stuff that suggests P(A,B|C)=P(A|B,C) P(B|C),
Interpreting that as the claim ##P((A \cap B) | C) = P(A | (B \cap C)) P(B | C)##, you should be able to translate it into a claim about expressions that only involve the probability measure ##\mu_1##.