# Mutual Information between two Gaussian distributions

1. Jul 16, 2013

### six7th

Suppose I have a Gaussian probability distribution:

$N_{A}(0,1).$

A set of values is generated from this distribution, and an arbitrary amount of Gaussian noise, say $N_{B}(0,0.5)$, is added to each value. The pairs are then sorted by the noisy $N_{B}$ values from lowest to highest and digitised by assigning 0 to a value below 0 and 1 to a value above 0. We end up with values such as:

-3.11359 | 0 | -6.91717 | 0 |
-3.31609 | 0 | -6.70810 | 0 |
-3.67497 | 0 | -6.13978 | 0 |
-2.73024 | 0 | -6.12173 | 0 |
-4.20178 | 0 | -6.07266 | 0 |
-3.38846 | 0 | -5.88277 | 0 |

To calculate the mutual information in bits between this set of data we use the following formula:

$I(A;B) = \sum_{x \in A} \sum_{y \in B} P(x,y) \log_{2}\left(\frac{P(x,y)}{P(x)P(y)}\right)$

where $P(x,y)$ is the probability of A = x and B = y. Now in this case P(0,1) = P(1,0) = P(1,1) = 0 so we are left with

$I(A;B) = P(0,0) \log_{2}\left(\frac{P(0,0)}{P(x=0)P(y=0)}\right)$

Now here is what I don't get. For this Gaussian distribution P(x=0) ≈ 0.5 and P(y=0) ≈ 0.5 and from this set of data P(0,0) = 1. This gives,

$I(A;B) = \log_{2}\left(\frac{1}{0.25}\right) = 2$

From what I understand, it should not be possible for the mutual information to be greater than 1. Where am I going wrong with this? I feel as though I am making a very basic error.

Thanks.

2. Jul 16, 2013

### mathman

For starters - your notation is confusing. First you have A and B as random variables with x and y as numbers, but then x and y become random variables?

3. Jul 16, 2013

### chiro

Hey six7th and welcome to the forums.

If you are using entropy calculations on a continuous random variable, then you are likely to get junk answers.

Information and entropy calculations only really make sense when you are looking at finite distributions or information samples with finite alphabets.

There are results for continuous distributions, but what entropy is in that context, how it's calculated, and what it represents are not the same as for a proper measure on distributions with a finite sample space and alphabet.

4. Jul 17, 2013

### six7th

The way I have used the notation is that A and B are the distributions and x and y are the numbers that belong to each distribution.

Thanks chiro. Surely if I generate an arbitrary but finite number of values from a Gaussian distribution, then I will have a new distribution of numbers which is finite, and the given equation can be used? Apologies if such a question is completely wrong; I have not formally studied information theory and I am only just getting to grips with it.

5. Jul 17, 2013

### chiro

If the distribution is finite in its possibilities then yes you can do that.

Are you binning the distribution in some way and if so, how?

6. Jul 18, 2013

### six7th

Yeah I am binning the data, here's a basic outline of what is being done:

1. Generate 1 million values from a Gaussian distribution with mean 0 and standard deviation 1.
2. Add Gaussian noise to each of these values. We now have two distributions of numbers: one with noise and one without.
3. Sort by the noisy distribution from lowest to highest.
4. Group the data into bins of arbitrary width.
5. Perform the mutual information calculation on each bin.

The values in the first post are an example of what would be the initial bin.
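The steps above can be sketched roughly as follows. This is only an illustration under some assumptions: the 0.5 in $N_{B}(0,0.5)$ is treated as a standard deviation, the sample count is reduced from 1 million to keep it quick, the bin size of 10,000 is arbitrary, and the names `mi_bits`, `bits_a` and `bits_b` are made up for this example. The marginals are taken from the same samples as the joint probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000        # reduced from the 1 million used in the thread
noise_std = 0.5    # assumption: 0.5 is a standard deviation, not a variance

# Steps 1-2: clean values and their noisy counterparts
a = rng.normal(0.0, 1.0, size=n)
b = a + rng.normal(0.0, noise_std, size=n)

# Step 3: sort the pairs by the noisy values, lowest to highest
order = np.argsort(b)
a, b = a[order], b[order]

# Digitise: 0 for values below 0, 1 for values above 0
bits_a = (a > 0).astype(int)
bits_b = (b > 0).astype(int)

def mi_bits(x, y):
    """Mutual information in bits between two arrays of 0/1 values."""
    joint = np.zeros((2, 2))
    np.add.at(joint, (x, y), 1)               # count each (x, y) pair
    joint /= joint.sum()                      # counts -> joint probabilities
    px = joint.sum(axis=1, keepdims=True)     # marginal of x, from the joint
    py = joint.sum(axis=0, keepdims=True)     # marginal of y, from the joint
    mask = joint > 0                          # empty cells contribute nothing
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (px @ py)[mask])))

# Steps 4-5: mutual information over everything, and per bin
mi_all = mi_bits(bits_a, bits_b)
bin_size = 10_000  # arbitrary bin size for illustration
mi_per_bin = [mi_bits(bits_a[i:i + bin_size], bits_b[i:i + bin_size])
              for i in range(0, n, bin_size)]
```

In the lowest bin every noisy value is negative, so the digitised noisy variable is constant there and carries no information within that bin.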

7. Jul 18, 2013

### chiro

You should be able to use the definitions for the distribution to calculate the entropies including the joint and mutual ones to answer your question.
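As a minimal sketch of that suggestion, assuming the joint distribution is given as a small table of counts or probabilities, the entropies and the mutual information can be computed like this. The function names are made up for this example, and the `first_bin` table is just the six all-zero pairs listed in the first post.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits; zero-probability cells contribute nothing."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mutual_information_bits(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), with marginals taken from the joint table."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()      # accept counts or probabilities
    px = joint.sum(axis=1)           # marginal distribution of X
    py = joint.sum(axis=0)           # marginal distribution of Y
    return entropy_bits(px) + entropy_bits(py) - entropy_bits(joint)

# Counts for the six pairs shown in the first post: every pair is (0, 0)
first_bin = np.array([[6, 0],
                      [0, 0]])
mi_first_bin = mutual_information_bits(first_bin)

# For comparison: two fair bits that always agree
mi_copy = mutual_information_bits([[0.5, 0.0],
                                   [0.0, 0.5]])
```

With the marginals derived from the same table as the joint, P(0,0) = 1 forces P(x=0) = P(y=0) = 1, so the first bin gives 0 bits, while the perfectly correlated pair of fair bits gives 1 bit.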

8. Jul 22, 2013

### jfizzix

The mutual information is not bounded above by 1, but by the marginal entropies. The mutual information is a measure of how many bits are transmitted about X when Y is known. If X and Y are high-dimensional, there's no reason the marginal entropies couldn't be large, the channel capacity couldn't be large, or the mutual information couldn't be large.
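The bound can be written out from the standard identities. This is a short sketch using $H$ for Shannon entropy in bits and $|\mathcal{X}|$ for the number of symbols X can take, notation that does not appear earlier in the thread:

$I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)$

$0 \le I(X;Y) \le \min\left(H(X), H(Y)\right)$

$H(X) \le \log_{2}|\mathcal{X}|$

For a variable digitised to 0 or 1, the last line gives $H(X) \le 1$ bit, so values above 1 bit can only occur for larger alphabets.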