How to prove the chain rule for mutual information

In summary: I(X_1, X_2, ..., X_n ; Y) = \sum_{j=1}^{n} H(X_j | X_{j-1}, ..., X_1) - \sum_{j=1}^{n} H(X_j | X_{j-1}, ..., X_1, Y) = \sum_{j=1}^{n} I(X_j; Y | X_{j-1}, ..., X_1). This proves the chain rule for mutual information: the chain rules for entropy and conditional entropy expand the mutual information for multiple random variables, and the two sums combine term by term.
  • #1
Master1022
Homework Statement
Prove the chain rule for mutual information
Relevant Equations
Entropy and mutual information
Hi,

I was attempting the following problem and am a bit confused because I don't quite understand the structure/notation of how mutual information and entropy work when we have conditionals (a|b) and multiple events (a, b).

Question: Prove the chain rule for mutual information.
[tex] I(X_1, X_2, ... , X_n ; Y) = \sum_{j=1}^{n} I(X_j; Y|X_{j-1}, ..., X_1) [/tex]

Attempt:
I started by doing:

[tex] I(X_1, X_2, ... , X_n ; Y) = H(X_1, X_2, ..., X_n) - H(X_1, X_2, ... , X_n | Y) [/tex]
And I know the expression on the right can be expanded using the chain rule for entropy which is [itex] H(X_1, X_2, ..., X_n) = \sum_{j=1}^{n} H(X_j | X_{j-1}, ... , X_1) [/itex], and thus the expression becomes:
[tex] I(X_1, X_2, ... , X_n ; Y) = \sum_{j=1}^{n} H(X_j | X_{j-1}, ... , X_1) - H(X_1, X_2, ... , X_n | Y) [/tex]

Now I am thinking about how to deal with the second term. I think I should be able to use the chain rule for entropy on it as well, and then combine both sums to get something useful. However, I am stuck because I am confused by the presence of the condition and the multiple variables before the conditioning bar... Nonetheless, if I think about it from a Venn diagram standpoint, then I get to something like:
[tex] H(X_1, X_2, ... , X_n | Y) = H(X_n | Y, X_{n - 1}, ... , X_1) \cdot H(X_{n - 1}, ... , X_1) = H(X_n | Y, X_{n - 1}, ... , X_1) \prod_{j = 1}^{n-1} H(X_j | X_{j - 1}, ... , X_1) [/tex]

However, I am not sure whether this is correct. Any help would be greatly appreciated.
 
  • #2


Hi there,

It looks like you're on the right track! Let me try to break down the problem and explain it in a bit more detail.

First, let's start with the definition of mutual information. Mutual information between two random variables X and Y is defined as:

I(X;Y) = H(X) - H(X|Y)

where H(X) is the entropy of X and H(X|Y) is the conditional entropy of X given Y.
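As a quick sanity check of this definition, consider a tiny example (entropies in bits): if X is a fair coin flip and Y = X, then H(X) = 1 and H(X|Y) = 0, so I(X;Y) = 1 bit; if instead Y is independent of X, then H(X|Y) = H(X) and I(X;Y) = 0.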

Now, let's expand this definition to multiple random variables. In your attempt, you correctly used the chain rule for entropy to expand the joint entropy H(X_1, X_2, ..., X_n):

H(X_1, X_2, ..., X_n) = \sum_{j=1}^{n} H(X_j | X_{j-1}, ... , X_1)
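For example, with n = 2 this is just the familiar two-variable identity

H(X_1, X_2) = H(X_1) + H(X_2 | X_1)

and the general statement follows by peeling off one variable at a time.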

We can do the same for the conditional entropy H(X_1, X_2, ..., X_n | Y) by using the chain rule for conditional entropy, i.e. the same expansion with Y added behind every conditioning bar. This gives us:

H(X_1, X_2, ..., X_n | Y) = \sum_{j=1}^{n} H(X_j | X_{j-1}, ... , X_1, Y)
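If the conditional version looks unfamiliar, here is one way to see it (a sketch, using only the ordinary chain rule): expand the full joint entropy H(X_1, ..., X_n, Y) in two different orderings,

H(X_1, ..., X_n, Y) = H(Y) + H(X_1, X_2, ..., X_n | Y)

H(X_1, ..., X_n, Y) = H(Y) + \sum_{j=1}^{n} H(X_j | X_{j-1}, ..., X_1, Y)

where the second line puts Y first in the ordering. Subtracting H(Y) from both right-hand sides gives the conditional chain rule above. Equivalently, it is the ordinary chain rule applied to the distribution of X_1, ..., X_n conditioned on Y = y, averaged over y.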

Now, we can substitute these expanded expressions into the definition of mutual information to get:

I(X_1, X_2, ..., X_n ; Y) = \sum_{j=1}^{n} H(X_j | X_{j-1}, ... , X_1) - \sum_{j=1}^{n} H(X_j | X_{j-1}, ... , X_1, Y)

At this point, pair up the j-th terms of the two sums. By the definition of conditional mutual information, each pair collapses to a single term:

H(X_j | X_{j-1}, ... , X_1) - H(X_j | X_{j-1}, ... , X_1, Y) = I(X_j; Y | X_{j-1}, ..., X_1)

Substituting this back in gives

I(X_1, X_2, ..., X_n ; Y) = \sum_{j=1}^{n} I(X_j; Y | X_{j-1}, ..., X_1)

which is exactly the chain rule you were asked to prove. So the only ingredients are the chain rule for entropy, its conditional version, and the definition of conditional mutual information.
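Finally, if you want to sanity-check the identity numerically (not part of the proof), here is a minimal sketch in Python with NumPy. The random joint pmf, the choice of three binary X's, and the helper functions entropy, H, and cond_mi are all just illustrative assumptions, not anything from the problem:

[code]
import numpy as np

rng = np.random.default_rng(0)

def entropy(p):
    """Shannon entropy in bits of a pmf given as an array."""
    p = p.ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Random joint pmf over (X1, X2, X3, Y); axes 0, 1, 2 are the X's, axis 3 is Y.
p = rng.random((2, 2, 2, 2))
p /= p.sum()

def H(axes):
    """Joint entropy of the variables on the given axes, marginalizing out the rest."""
    drop = tuple(i for i in range(p.ndim) if i not in axes)
    return entropy(p.sum(axis=drop)) if drop else entropy(p)

# Left-hand side: I(X1,X2,X3; Y) = H(X1,X2,X3) - H(X1,X2,X3 | Y),
# with H(A | Y) computed as H(A, Y) - H(Y).
lhs = H((0, 1, 2)) - (H((0, 1, 2, 3)) - H((3,)))

# Right-hand side: sum over j of I(X_j; Y | X_{j-1}, ..., X_1),
# using I(A; B | C) = H(A, C) + H(B, C) - H(A, B, C) - H(C).
def cond_mi(a, b, cond):
    base = H(cond) if cond else 0.0
    return H((a,) + cond) + H((b,) + cond) - H((a, b) + cond) - base

rhs = sum(cond_mi(j, 3, tuple(range(j))) for j in range(3))

print(lhs, rhs)  # the two values should agree up to floating-point rounding
[/code]

Running this for any joint pmf should give matching left- and right-hand sides, which is a good way to catch sign or indexing mistakes in the algebra.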
 
