# Probability Theory: Need help understanding a step

## Homework Statement

Discrete random variables ##X,Y,Z## are mutually independent if for all ##x_i, y_j, z_k##,
$$P(X=x_i \wedge Y=y_j \wedge Z=z_k ) = P(X=x_i)P(Y=y_j)P(Z=z_k )$$
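To make the definition concrete, here is a small Python sketch. The names `marginals` and `is_mutually_independent` are hypothetical, and representing the joint pmf as a dict keyed by ##(x, y, z)## tuples is an assumption of this sketch. Each one-variable marginal is obtained by summing the joint pmf over the other two variables, and then the product condition is checked for every triple of values.

```python
from fractions import Fraction
from itertools import product


def marginals(joint):
    """Return the one-variable marginal pmfs of X, Y and Z.

    `joint` maps (x, y, z) tuples to probabilities (a dict-based
    representation assumed for this sketch).
    """
    px, py, pz = {}, {}, {}
    for (x, y, z), p in joint.items():
        px[x] = px.get(x, 0) + p
        py[y] = py.get(y, 0) + p
        pz[z] = pz.get(z, 0) + p
    return px, py, pz


def is_mutually_independent(joint):
    """Check P(X=x, Y=y, Z=z) == P(X=x) P(Y=y) P(Z=z) for every triple."""
    px, py, pz = marginals(joint)
    return all(
        joint.get((x, y, z), 0) == px[x] * py[y] * pz[z]
        for x, y, z in product(px, py, pz)
    )


# Three independent fair coins: every triple has probability 1/8.
coins = {t: Fraction(1, 8) for t in product((0, 1), repeat=3)}
print(is_mutually_independent(coins))   # True

# Z is a copy of X, so the three variables are not independent.
copy_z = {(x, y, x): Fraction(1, 4) for x, y in product((0, 1), repeat=2)}
print(is_mutually_independent(copy_z))  # False
```

Exact `Fraction` arithmetic is used so the equality test is not spoiled by floating-point rounding.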

I am trying to show (or trying to understand how someone has shown) that ##X,Y## are also independent as a result of ##X,Y,Z## being mutually independent.

## The Attempt at a Solution

It starts off with
$$P(X=x_i \wedge Y=y_j ) = \sum_k P(X=x_i \wedge Y=y_j \wedge Z=z_k )$$
before using the definition of mutual independence for the three variables to complete the proof. This is the step I don't understand. Why is the probability of getting results ##x_i,y_j## equal to the sum (over ##k##) of probabilities of getting results ##x_i, y_j, z_k##?
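For reference, once that first step is granted, the remaining steps presumably run along these lines (a sketch, since the full derivation is not shown here, assuming the ##z_k## range over every value ##Z## can take): substitute the mutual-independence definition into each term, pull the factors that do not depend on ##k## out of the sum, and use the fact that the probabilities of ##Z## sum to 1.

$$
\begin{aligned}
P(X=x_i \wedge Y=y_j) &= \sum_k P(X=x_i \wedge Y=y_j \wedge Z=z_k) \\
&= \sum_k P(X=x_i)P(Y=y_j)P(Z=z_k) \\
&= P(X=x_i)P(Y=y_j)\sum_k P(Z=z_k) \\
&= P(X=x_i)P(Y=y_j).
\end{aligned}
$$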

Orodruin
Staff Emeritus
Homework Helper
Gold Member
You are looking for the probability of some event A. You then need to add up the probabilities of all outcomes where A is true, in this case the outcomes where X and Y take particular values. This holds whatever value Z takes, as long as X and Y take the correct values, so you end up with a sum over the possible outcomes for Z.
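This argument can be checked by brute force on a small, made-up sample space (an assumption for illustration only): 36 equally likely rolls of two fair dice, with hypothetical random variables `X` = parity of the first die, `Y` = whether the total exceeds 7, and `Z` = the value of the second die. Counting the outcomes where X and Y take the requested values gives the same probability as summing the three-variable probabilities over every value of Z. Note that Y and Z are deliberately dependent here: the sum over Z does not rely on any independence.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample space: 36 equally likely rolls of two fair dice.
omega = list(product(range(1, 7), repeat=2))


def X(w):
    return w[0] % 2  # parity of the first die


def Y(w):
    return int(sum(w) > 7)  # 1 if the total exceeds 7


def Z(w):
    return w[1]  # value of the second die


def prob(event):
    """P(event) = (number of outcomes where event holds) / |omega|."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))


x, y = 1, 1
# Directly: add up the outcomes where X = x and Y = y.
direct = prob(lambda w: X(w) == x and Y(w) == y)
# Via Z: add up P(X = x, Y = y, Z = z) over every value z of Z.
z_values = {Z(w) for w in omega}
summed = sum(prob(lambda w, z=z: X(w) == x and Y(w) == y and Z(w) == z)
             for z in z_values)
print(direct, summed, direct == summed)  # 1/6 1/6 True
```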

Stephen Tashi
> Why is the probability of getting results ##x_i,y_j## equal to the sum (over ##k##) of probabilities of getting results ##x_i, y_j, z_k##?

It's as @Orodruin said - and the concept is significant enough to have its own name: https://en.wikipedia.org/wiki/Law_of_total_probability.

Rather than being a "law of nature", it is implicit in the definition of a probability space, which depends on the definition of a probability "measure", whose definition says it is an "additive" function when applied to disjoint measurable sets. That's an outline of the mathematical structure, which is not made clear by the Wikipedia article.
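Spelled out for the discrete case, the additivity argument goes as follows. The events ##\{Z=z_k\}## for distinct ##k## are pairwise disjoint and together cover the whole sample space (assuming the ##z_k## list every value ##Z## can take), so intersecting them with ##A = \{X=x_i, Y=y_j\}## splits ##A## into disjoint pieces, and additivity of the probability measure does the rest:

$$
A = \bigcup_k \big(A \cap \{Z=z_k\}\big), \qquad
P(A) = \sum_k P\big(A \cap \{Z=z_k\}\big) = \sum_k P(X=x_i \wedge Y=y_j \wedge Z=z_k).
$$

If ##Z## takes countably many values, this uses countable (not just finite) additivity, which is part of the definition of a probability measure.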

Orodruin
> It's as @Orodruin said - and the concept is significant enough to have its own name: https://en.wikipedia.org/wiki/Law_of_total_probability.
>
> Rather than being a "law of nature", it is implicit in the definition of a probability space, which depends on the definition of a probability "measure", whose definition says it is an "additive" function when applied to disjoint measurable sets. That's an outline of the mathematical structure, which is not made clear by the Wikipedia article.

Well said (and a bit more direct than I managed on my phone this morning). In general, I think the connection between probability theory and measure theory is typically underemphasised in introductory courses on probability (at least for non-mathematicians). Also, just for OP's reference: https://en.wikipedia.org/wiki/Measure_(mathematics)

StoneTemplePython
Gold Member
2019 Award

I tried to emphasize the role of events and (sub) additivity, but if OP did not understand the event partitioning (and union) argument, then introducing measures... is a step in the wrong direction. And it certainly is not needed for discrete random variables.
- - - - -
Another approach is to unpack the joint probability into a product of conditional probabilities. Ignoring any nits about zero-probability events, we have the identity

##P\Big(X=x_i, Y=y_j, Z= z_k\Big) = P\Big(X=x_i\Big)P\Big(Y=y_j \big \vert X = x_i\Big)P\Big( Z= z_k\big \vert X = x_i, Y = y_j \Big)##
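This identity is the definition of conditional probability, ##P(A \mid B) = P(A \cap B)/P(B)## for ##P(B) > 0##, applied twice; the positivity requirement is why zero-probability events need separate care. Writing out the two conditional factors, everything except the triple joint probability cancels:

$$
P(X=x_i)\,\frac{P(X=x_i, Y=y_j)}{P(X=x_i)}\,\frac{P(X=x_i, Y=y_j, Z=z_k)}{P(X=x_i, Y=y_j)} = P(X=x_i, Y=y_j, Z=z_k).
$$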

But we need to recall that conditional probabilities are in fact probabilities, so summing over all ##k##

##\sum_k P\Big(X=x_i, Y=y_j, Z= z_k\Big) ##
##= \sum_k P\Big(X=x_i\Big)P\Big(Y=y_j \big \vert X = x_i\Big)P\Big( Z= z_k\big \vert X= x_i, Y= y_j\Big) ##
##= P\Big(X=x_i\Big)P\Big(Y=y_j \big \vert X = x_i\Big)\cdot \sum_k P\Big( Z= z_k\big \vert X= x_i, Y= y_j\Big) ##
##= P\Big(X=x_i\Big)P\Big(Y=y_j \big \vert X = x_i\Big)\cdot 1 ##
##=P\Big(X=x_i\Big)P\Big(Y=y_j \big \vert X = x_i\Big)##
##=P\Big(X=x_i, Y=y_j\Big)##

as desired
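The derivation can be sanity-checked numerically. The sketch below uses a made-up joint pmf with no independence structure at all (the weights are arbitrary and chosen only for illustration, with every pair ##(x_i, y_j)## given positive probability so no conditional is undefined). It computes the conditional probabilities from their definitions and confirms, for every pair, that the conditional probabilities of ##Z## sum to 1 and that the whole chain collapses to the marginal ##P(X=x_i, Y=y_j)##.

```python
from fractions import Fraction

# Hypothetical joint pmf over (x, y, z); weights are arbitrary and sum to 20.
weights = {
    (0, 0, 0): 1, (0, 0, 1): 2, (0, 1, 0): 3, (0, 1, 1): 1,
    (1, 0, 0): 2, (1, 0, 1): 4, (1, 1, 0): 1, (1, 1, 2): 6,
}
total = sum(weights.values())
joint = {k: Fraction(v, total) for k, v in weights.items()}


def p(pred):
    """Probability of the event {(x, y, z) : pred(x, y, z)}."""
    return sum((q for (x, y, z), q in joint.items() if pred(x, y, z)),
               Fraction(0))


z_values = sorted({z for _, _, z in joint})

for xi in (0, 1):
    for yj in (0, 1):
        p_x = p(lambda x, y, z: x == xi)
        p_xy = p(lambda x, y, z: x == xi and y == yj)
        p_y_given_x = p_xy / p_x
        # P(Z = z_k | X = x_i, Y = y_j) for every z_k in the range of Z
        cond_z = [p(lambda x, y, z: (x, y, z) == (xi, yj, zk)) / p_xy
                  for zk in z_values]
        assert sum(cond_z) == 1           # conditional pmf of Z sums to 1
        assert p_x * p_y_given_x == p_xy  # last step of the derivation
        chain = sum(p_x * p_y_given_x * c for c in cond_z)
        assert chain == p_xy              # whole chain equals the marginal
print("all checks passed")
```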

Thanks for the responses.

I did not think to apply the law of total probability in the case of random variables. Now I see the connection.

@StoneTemplePython I did manage to grasp the idea behind your proof in the other thread, but couldn't do so for the method (just that step) presented above, hence the question. Thanks again for your time!