Mutual information: concave/convex

In summary, the mutual information I(X,Y) between two random variables X and Y can itself be written as a relative entropy, I(X,Y) = D( p(x,y) || p(x)p(y) ). Its convexity as a function of the channel p(y|x), with the input distribution p(x) held fixed, then follows from the joint convexity of D(p||q) in its two arguments.
  • #1
PetitPrince
hi everybody, :smile:

While looking at the mutual information of two variables, one finds that it is concave in p(x) for fixed p(y|x), and convex in p(y|x) for fixed p(x).

The first statement is fine, but when it comes to proving the second I get stuck. Even in proofs that are already worked out, I don't see how one can conclude the convexity of I(X,Y) as a function of p(y|x) from the convexity of the relative entropy D(p||q).

Here is the piece of the proof I didn't understand:
http://ocw.usu.edu/Electrical_and_Computer_Engineering/Information_Theory/lecture3.pdf

If you have any ideas, I'd very much appreciate them.

Thank you in advance.
 
  • #2


Hello there,

It is great to see that you are actively engaging with the concept of mutual information and its properties. The proof you shared rests on two standard facts about relative entropy, so let me try to break it down in simpler terms.

First, let's define some terms. The mutual information between two random variables X and Y is denoted by I(X,Y) and is defined as the sum of their individual entropies minus their joint entropy:

I(X,Y) = H(X) + H(Y) - H(X,Y)

where H(X) and H(Y) are the entropies of X and Y respectively, and H(X,Y) is their joint entropy.
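To make the definition concrete, here is a minimal Python sketch; the 2x2 joint distribution is made up purely for illustration and is not taken from the linked notes.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector (zero entries are skipped)."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical joint distribution p(x, y) over a 2x2 alphabet.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

p_x = p_xy.sum(axis=1)  # marginal distribution of X
p_y = p_xy.sum(axis=0)  # marginal distribution of Y

mi = entropy(p_x) + entropy(p_y) - entropy(p_xy.ravel())
print(mi)  # about 0.278 bits for this particular joint
```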

Now, the relative entropy between two probability distributions p and q is denoted by D(p||q) and is defined as follows:

D(p||q) = Σx p(x) log( p(x)/q(x) )

where the sum runs over all x in the support of p. Equivalently, D(p||q) = E_p[ log( p(X)/q(X) ) ], with the expectation taken under p.
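Here is an equally small sketch of this quantity; the two distributions p and q below are arbitrary examples.

```python
import numpy as np

def kl_divergence(p, q):
    """Relative entropy D(p||q) in bits; assumes q(x) > 0 wherever p(x) > 0."""
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

p = np.array([0.5, 0.5])
q = np.array([0.9, 0.1])
print(kl_divergence(p, q))  # positive, since p != q
print(kl_divergence(p, p))  # 0.0: the divergence vanishes exactly when p == q
```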

Now, let's look at the proof you shared. The key observation is that the mutual information is itself a relative entropy: I(X,Y) = D( p(x,y) || p(x)p(y) ), the divergence between the joint distribution of X and Y and the product of their marginals.
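You can check this identity numerically; the 2x2 joint below is the same made-up example as in the earlier sketch.

```python
import numpy as np

# Same hypothetical joint distribution as before.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)
product = np.outer(p_x, p_y)  # p(x)p(y), the "independence" reference distribution

mi_as_kl = np.sum(p_xy * np.log2(p_xy / product))
print(mi_as_kl)  # matches H(X) + H(Y) - H(X,Y), about 0.278 bits
```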

The second ingredient is that D(p||q) is jointly convex in the pair (p, q): D( t*p1 + (1-t)*p2 || t*q1 + (1-t)*q2 ) <= t*D(p1||q1) + (1-t)*D(p2||q2) for any t in [0,1]. Now hold p(x) fixed and treat the channel p(y|x) as the variable. Both arguments of the divergence above depend linearly on p(y|x): the joint is p(x,y) = p(x) p(y|x), and the product is p(x) p(y) with p(y) = Σx' p(x') p(y|x'). A jointly convex function composed with linear maps is still convex, so I(X,Y) is a convex function of p(y|x) when p(x) is fixed. That is exactly the step the lecture notes are using.
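If it helps to see that convexity in action, here is a quick numerical sanity check; the input distribution and the two channels W0 and W1 are arbitrary examples, not anything from the linked lecture.

```python
import numpy as np

def mutual_information(p_x, W):
    """I(X,Y) in bits for input distribution p_x and channel W, where W[x, y] = p(y|x)."""
    p_xy = p_x[:, None] * W            # joint distribution p(x, y)
    p_y = p_xy.sum(axis=0)             # output marginal p(y)
    ref = p_x[:, None] * p_y[None, :]  # product of the marginals p(x)p(y)
    mask = p_xy > 0
    return np.sum(p_xy[mask] * np.log2(p_xy[mask] / ref[mask]))

p_x = np.array([0.3, 0.7])
W0 = np.array([[0.9, 0.1],
               [0.2, 0.8]])
W1 = np.array([[0.5, 0.5],
               [0.6, 0.4]])

# Convexity in the channel for fixed p_x:
# I(p_x, t*W0 + (1-t)*W1) <= t*I(p_x, W0) + (1-t)*I(p_x, W1)
for t in (0.25, 0.5, 0.75):
    mixed = t * W0 + (1 - t) * W1
    lhs = mutual_information(p_x, mixed)
    rhs = t * mutual_information(p_x, W0) + (1 - t) * mutual_information(p_x, W1)
    print(t, lhs <= rhs + 1e-12)       # True for every t
```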

I hope this explanation helps you understand the proof better. If you have any further questions, please do not hesitate to ask. Keep up the good work!
 

1. What is mutual information and why is it important in science?

Mutual information is a measure of how much information two random variables share, i.e. how much knowing one of them reduces the uncertainty about the other. It is important in science because it quantifies the dependence between variables, and it can be used for feature selection and data compression.

2. How is mutual information calculated?

Mutual information is calculated from the joint probability distribution of the two variables and their individual (marginal) distributions: I(X,Y) = Σx,y p(x,y) log( p(x,y) / (p(x)p(y)) ). In words, it is the sum over all outcome pairs of the joint probability times the logarithm of the ratio of the joint probability to the product of the marginal probabilities.

3. What does it mean for mutual information to be concave or convex?

Concave and convex here do not describe the relationship between the two variables; they describe how I(X,Y) behaves as a function of the distributions that define it. For a fixed channel p(y|x), I(X,Y) is a concave function of the input distribution p(x); for a fixed input distribution p(x), it is a convex function of the channel p(y|x). These properties are what make optimizations over mutual information well behaved, for example the maximization over p(x) that defines channel capacity.

4. How is mutual information used in machine learning?

Mutual information is often used in machine learning for feature selection, as it can help identify the most relevant and informative features for a given dataset. It is also used as a similarity measure, for example normalized mutual information for comparing two clusterings, which is helpful in tasks such as clustering evaluation and classification.
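To connect this to practice, here is a hedged sketch using scikit-learn's mutual_info_classif, which estimates mutual information from samples rather than computing it exactly; the toy data below is made up.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)              # binary labels
informative = y + 0.3 * rng.normal(size=500)  # feature that carries information about y
noise = rng.normal(size=500)                  # irrelevant feature
X = np.column_stack([informative, noise])

scores = mutual_info_classif(X, y, random_state=0)
print(scores)  # the first feature should score noticeably higher than the second
```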

5. Are there any limitations to using mutual information?

Mutual information does capture non-linear dependencies, but it has practical limitations. It usually has to be estimated from finite samples, which is difficult for high-dimensional data and sensitive to binning choices, noise, and outliers. It is also symmetric, so it says nothing about the direction or causal structure of a relationship between variables.
