How to prove the chain rule for mutual information

In summary: I(X_1, X_2, ..., X_n ; Y) = \sum_{j=1}^{n} H(X_j | X_{j-1}, ..., X_1) - \sum_{j=1}^{n} H(X_j | X_{j-1}, ..., X_1, Y) = \sum_{j=1}^{n} I(X_j; Y | X_{j-1}, ..., X_1). This proves the chain rule for mutual information: the chain rules for entropy and conditional entropy expand the mutual information for multiple random variables, and the two sums combine term by term.
  • #1
Master1022
Homework Statement
Prove the chain rule for mutual information
Relevant Equations
Entropy and mutual information
Hi,

I was attempting the following problem and am a bit confused because I don't quite understand the structure/notation of how mutual information and entropy work when we have conditionals (a|b) and multiple events (a, b).

Question: Prove the chain rule for mutual information.
[tex] I(X_1, X_2, ... , X_n ; Y) = \sum_{j=1}^{n} I(X_j; Y|X_{j-1}, ..., X_1) [/tex]

Attempt:
I started by doing:

[tex] I(X_1, X_2, ... , X_n ; Y) = H(X_1, X_2, ..., X_n) - H(X_1, X_2, ... , X_n | Y) [/tex]
And I know the expression on the right can be expanded using the chain rule for entropy which is [itex] H(X_1, X_2, ..., X_n) = \sum_{j=1}^{n} H(X_j | X_{j-1}, ... , X_1) [/itex], and thus the expression becomes:
[tex] I(X_1, X_2, ... , X_n ; Y) = \sum_{j=1}^{n} H(X_j | X_{j-1}, ... , X_1) - H(X_1, X_2, ... , X_n | Y) [/tex]

Now I am thinking about how to deal with the second term. I think I should be able to use the chain rule for entropy on it as well, and then combine both sums to get something useful. However, I am stuck because I am confused by the presence of the condition and the multiple variables before the conditioning bar... Nonetheless, if I think about it from a Venn diagram standpoint, then I get to something like:
[tex] H(X_1, X_2, ... , X_n | Y) = H(X_n | Y, X_{n - 1}, ... , X_1) \cdot H(X_{n - 1}, ... , X_1) = H(X_n | Y, X_{n - 1}, ... , X_1) \prod_{j = 1}^{n-1} H(X_j | X_{j - 1}, ... , X_1) [/tex]

However, I am not sure whether this is correct. Any help would be greatly appreciated.
 
  • #2


Hi there,

It looks like you're on the right track! Let me try to break down the problem and explain it in a bit more detail.

First, let's start with the definition of mutual information. Mutual information between two random variables X and Y is defined as:

I(X;Y) = H(X) - H(X|Y)

where H(X) is the entropy of X and H(X|Y) is the conditional entropy of X given Y.
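As a quick sanity check of this definition, consider a tiny example (entropies in bits): if X is a fair coin flip and Y = X, then H(X) = 1 and H(X|Y) = 0, so I(X;Y) = 1 bit; if instead Y is independent of X, then H(X|Y) = H(X) and I(X;Y) = 0.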

Now, let's expand this definition to multiple random variables. In your attempt, you correctly used the chain rule for entropy to expand the joint entropy H(X_1, X_2, ..., X_n):

H(X_1, X_2, ..., X_n) = \sum_{j=1}^{n} H(X_j | X_{j-1}, ... , X_1)
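For example, with n = 2 this is just the familiar two-variable identity

H(X_1, X_2) = H(X_1) + H(X_2 | X_1)

and the general statement follows by peeling off one variable at a time.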

We can do the same for the conditional entropy H(X_1, X_2, ..., X_n | Y) by using the chain rule for conditional entropy, i.e. the same expansion with Y added behind every conditioning bar. This gives us:

H(X_1, X_2, ..., X_n | Y) = \sum_{j=1}^{n} H(X_j | X_{j-1}, ... , X_1, Y)
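If the conditional version looks unfamiliar, here is one way to see it (a sketch, using only the ordinary chain rule): expand the full joint entropy H(X_1, ..., X_n, Y) in two different orderings,

H(X_1, ..., X_n, Y) = H(Y) + H(X_1, X_2, ..., X_n | Y)

H(X_1, ..., X_n, Y) = H(Y) + \sum_{j=1}^{n} H(X_j | X_{j-1}, ..., X_1, Y)

where the second line puts Y first in the ordering. Subtracting H(Y) from both right-hand sides gives the conditional chain rule above. Equivalently, it is the ordinary chain rule applied to the distribution of X_1, ..., X_n conditioned on Y = y, averaged over y.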

Now, we can substitute these expanded expressions into the definition of mutual information to get:

I(X_1, X_2, ..., X_n ; Y) = \sum_{j=1}^{n} H(X_j | X_{j-1}, ... , X_1) - \sum_{j=1}^{n} H(X_j | X_{j-1}, ... , X_1, Y)

At this point, pair up the j-th terms of the two sums. By the definition of conditional mutual information, each pair collapses to a single term:

H(X_j | X_{j-1}, ... , X_1) - H(X_j | X_{j-1}, ... , X_1, Y) = I(X_j; Y | X_{j-1}, ..., X_1)

Substituting this back in gives

I(X_1, X_2, ..., X_n ; Y) = \sum_{j=1}^{n} I(X_j; Y | X_{j-1}, ..., X_1)

which is exactly the chain rule you were asked to prove. So the only ingredients are the chain rule for entropy, its conditional version, and the definition of conditional mutual information.
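Finally, if you want to sanity-check the identity numerically (not part of the proof), here is a minimal sketch in Python with NumPy. The random joint pmf, the choice of three binary X's, and the helper functions entropy, H, and cond_mi are all just illustrative assumptions, not anything from the problem:

[code]
import numpy as np

rng = np.random.default_rng(0)

def entropy(p):
    """Shannon entropy in bits of a pmf given as an array."""
    p = p.ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Random joint pmf over (X1, X2, X3, Y); axes 0, 1, 2 are the X's, axis 3 is Y.
p = rng.random((2, 2, 2, 2))
p /= p.sum()

def H(axes):
    """Joint entropy of the variables on the given axes, marginalizing out the rest."""
    drop = tuple(i for i in range(p.ndim) if i not in axes)
    return entropy(p.sum(axis=drop)) if drop else entropy(p)

# Left-hand side: I(X1,X2,X3; Y) = H(X1,X2,X3) - H(X1,X2,X3 | Y),
# with H(A | Y) computed as H(A, Y) - H(Y).
lhs = H((0, 1, 2)) - (H((0, 1, 2, 3)) - H((3,)))

# Right-hand side: sum over j of I(X_j; Y | X_{j-1}, ..., X_1),
# using I(A; B | C) = H(A, C) + H(B, C) - H(A, B, C) - H(C).
def cond_mi(a, b, cond):
    base = H(cond) if cond else 0.0
    return H((a,) + cond) + H((b,) + cond) - H((a, b) + cond) - base

rhs = sum(cond_mi(j, 3, tuple(range(j))) for j in range(3))

print(lhs, rhs)  # the two values should agree up to floating-point rounding
[/code]

Running this for any joint pmf should give matching left- and right-hand sides, which is a good way to catch sign or indexing mistakes in the algebra.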
 
