Mutual Information between two Gaussian distributions

In summary, the conversation discusses the calculation of mutual information for a set of data generated from a Gaussian distribution with added Gaussian noise. The mutual information is calculated using a formula involving joint and marginal probabilities and logarithms. There is confusion over whether the mutual information can be greater than 1, and it is clarified that it is bounded not by 1 but by the marginal entropies. The conversation also mentions the use of binning to perform the mutual information calculation.
  • #1
six7th
Suppose I have a Gaussian probability distribution:

[itex]N_{A}(0,1).[/itex]

A set of values is generated from this distribution, and an arbitrary amount of Gaussian noise, say [itex]N_{B}(0,0.5)[/itex], is added to each; the noisy values are then sorted from lowest to highest. These are then digitised by assigning 0 to a value below 0 and 1 to a value above 0. We end up with values such as:

-3.11359 | 0 | -6.91717 | 0 |
-3.31609 | 0 | -6.70810 | 0 |
-3.67497 | 0 | -6.13978 | 0 |
-2.73024 | 0 | -6.12173 | 0 |
-4.20178 | 0 | -6.07266 | 0 |
-3.38846 | 0 | -5.88277 | 0 |
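
For concreteness, here is a minimal sketch of this setup (not the poster's actual code; it assumes NumPy and reads [itex]N_{B}(0,0.5)[/itex] as noise with standard deviation 0.5, and the column layout is assumed):

[code]
import numpy as np

rng = np.random.default_rng(0)

a = rng.normal(loc=0.0, scale=1.0, size=1_000_000)    # values from N_A(0, 1)
b = a + rng.normal(loc=0.0, scale=0.5, size=a.size)   # add N_B(0, 0.5) noise (sd 0.5 assumed)

order = np.argsort(b)                                 # sort by the noisy values
a, b = a[order], b[order]

a_bits = (a > 0).astype(int)                          # digitise: 0 below 0, 1 above 0
b_bits = (b > 0).astype(int)

# Print the first few rows in a "value | bit | value | bit" layout
for i in range(6):
    print(f"{a[i]:.5f} | {a_bits[i]} | {b[i]:.5f} | {b_bits[i]} |")
[/code]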

To calculate the mutual information in bits between A and B from this set of data, we use the following formula:

[itex]I(A;B) = \sum_{x \in A} \sum_{y \in B}P(x,y)log_{2}(\frac{P(x,y)}{P(x)P(y)}) [/itex]

where [itex]P(x,y)[/itex] is the probability of A = x and B = y. Now in this case P(0,1) = P(1,0) = P(1,1) = 0 so we are left with

[itex]I(A;B) = P(0,0)log_{2}(\frac{P(0,0)}{P(x=0)P(y=0)})[/itex]

Now here is what I don't get. For this Gaussian distribution P(x=0) ≈ 0.5 and P(y=0) ≈ 0.5 and from this set of data P(0,0) = 1. This gives,

[itex]I(A;B) = log_{2}(\frac{1}{0.25}) = 2 [/itex]

From what I understand it should not be possible for the mutual information to be greater than 1. Where am I going wrong with this? I feel as though I am making a very basic error.

Thanks.
 
  • #2
For starters - your notation is confusing. First you have A and B as random variables with x and y as numbers, but then x and y become random variables?
 
  • #3
Hey six7th and welcome to the forums.

If you are using entropy calculations on a continuous random variable, then you are likely going to get junk answers.

Information and entropy calculations only really make sense when you are looking at finite distributions or information samples with finite alphabets.

There are results for continuous distributions, but again what entropy is in that context, how it is calculated, and what it represents is not the same as the proper measure defined for distributions with a finite sample space and alphabet.
 
  • #4
mathman said:
For starters - your notation is confusing. First you have A and B as random variables with x and y as numbers, but then x and y become random variables?

The way I have used the notation is that A and B are the distributions and x and y are the numbers that belong to each distribution.

chiro said:
Hey six7th and welcome to the forums.

If you are using entropy calculations on a continuous random variable, then you are likely going to get junk answers.

Information and entropy calculations only really make sense when you are looking at finite distributions or information samples with finite alphabets.

There are results for continuous distributions, but again what entropy is in that context, how it is calculated, and what it represents is not the same as the proper measure defined for distributions with a finite sample space and alphabet.

Thanks chiro. Surely if I generate an arbitrary but finite number of values from a Gaussian distribution then I will have a finite set of numbers, and the given equation can be used? Apologies if such a question is completely wrong; I have not formally studied information theory and I am only just getting to grips with it.
 
  • #5
If the distribution is finite in its possibilities then yes you can do that.

Are you binning the distribution in some way and if so, how?
 
  • #6
Yeah, I am binning the data. Here's a basic outline of what is being done:

  1. Generate 1 million values from a Gaussian distribution with mean 0 and standard deviation 1
  2. Add Gaussian noise to each of these values; we now have two sets of numbers, one with noise and one without
  3. Sort by the noisy distribution from lowest to highest
  4. Group the data into arbitrary bin widths
  5. Perform the mutual information calculation on each bin

The values in the first post are an example of what would be the initial bin (a rough sketch of the procedure is given below).
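
A rough sketch of steps 1–5 (not the poster's actual code; it assumes NumPy, equal-count bins of a hypothetical size, and 0.5 as the noise standard deviation) might be:

[code]
import numpy as np

def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in bits from two discrete samples,
    using the double-sum formula from the first post with empirical probabilities."""
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))
            px, py = np.mean(x == xv), np.mean(y == yv)
            if pxy > 0:
                mi += pxy * np.log2(pxy / (px * py))
    return mi

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=1_000_000)       # step 1: 1 million values from N(0, 1)
b = a + rng.normal(0.0, 0.5, size=a.size)      # step 2: add Gaussian noise (sd 0.5 assumed)

order = np.argsort(b)                          # step 3: sort by the noisy values
a_bits = (a[order] > 0).astype(int)            # digitise as in the first post
b_bits = (b[order] > 0).astype(int)

bin_size = 1000                                # step 4: arbitrary bin width (hypothetical choice)
mi_per_bin = [                                 # step 5: mutual information for each bin
    mutual_information(a_bits[i:i + bin_size], b_bits[i:i + bin_size])
    for i in range(0, len(a_bits), bin_size)
]
[/code]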
 
  • #7
You should be able to use the definitions for the distribution to calculate the entropies, including the joint entropy and the mutual information, to answer your question.
 
  • #8
The mutual information is not bounded above by 1, but by the marginal entropies. The mutual information is a measure of how many bits are transmitted about X when Y is known. If X and Y are high-dimensional, there's no reason the marginal entropies couldn't be large, the channel capacity couldn't be large, or the mutual information couldn't be large.
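
As a concrete check against the setup in the first post (reading [itex]N_{B}(0,0.5)[/itex] as noise with standard deviation 0.5): for [itex]X \sim N(0,\sigma_X^{2})[/itex] and [itex]Y = X + Z[/itex] with independent [itex]Z \sim N(0,\sigma_Z^{2})[/itex], the mutual information is

[itex]I(X;Y) = \frac{1}{2}\log_{2}\left(1 + \frac{\sigma_X^{2}}{\sigma_Z^{2}}\right),[/itex]

so with [itex]\sigma_X = 1[/itex] and [itex]\sigma_Z = 0.5[/itex] this gives [itex]\frac{1}{2}\log_{2}(5) \approx 1.16[/itex] bits, which is already greater than 1 bit.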
 

What is mutual information?

Mutual information is a measure of the shared information between two random variables. It quantifies the amount of information that one variable provides about the other.

How is mutual information calculated?

Mutual information can be calculated using the formula MI(X,Y) = H(X) + H(Y) - H(X,Y), where H(X) and H(Y) are the individual entropies of the two variables and H(X,Y) is their joint entropy.
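
A minimal sketch of this identity (assuming NumPy; the 2×2 joint table below is a made-up example):

[code]
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a discrete probability distribution."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical joint distribution P(X, Y) for two binary variables
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

h_x = entropy(p_xy.sum(axis=1))    # H(X) from the row marginal
h_y = entropy(p_xy.sum(axis=0))    # H(Y) from the column marginal
h_xy = entropy(p_xy.flatten())     # joint entropy H(X, Y)

mi = h_x + h_y - h_xy              # MI(X, Y) = H(X) + H(Y) - H(X, Y)
print(mi)                          # roughly 0.28 bits for this example
[/code]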

What does mutual information between two Gaussian distributions tell us?

Mutual information between two Gaussian distributions tells us how much information is shared between the two distributions. It is a measure of the dependence or correlation between the two variables.
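
For the jointly Gaussian case this has a closed form: if the two variables have correlation coefficient [itex]\rho[/itex], then [itex]I(X;Y) = -\frac{1}{2}\log_{2}(1-\rho^{2})[/itex] bits, which is 0 when [itex]\rho = 0[/itex] and grows without bound as [itex]|\rho| \to 1[/itex].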

How is mutual information related to other measures of dependence?

Mutual information is closely related to other measures of dependence, such as correlation and covariance. However, it takes into account both linear and non-linear relationships between variables and can capture more complex dependencies.

Can mutual information be negative?

No, mutual information is always non-negative. It is zero exactly when the two variables are independent; even a negative correlation between the variables gives a positive mutual information, since mutual information measures dependence of any sign or form.
