Mutual Information between two Gaussian distributions

Discussion Overview

The discussion revolves around the calculation of mutual information between two Gaussian distributions, particularly focusing on the implications of using continuous distributions in entropy calculations. Participants explore the nuances of applying mutual information formulas to finite samples derived from Gaussian distributions and the potential pitfalls of such calculations.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant presents a formula for mutual information and expresses confusion about obtaining a value greater than 1, suspecting a basic error in their reasoning.
  • Another participant points out potential confusion in notation, questioning the distinction between random variables and their corresponding values.
  • A different participant suggests that entropy calculations for continuous random variables may yield misleading results, emphasizing that such calculations are more appropriate for finite distributions.
  • One participant argues that generating a finite number of samples from a Gaussian distribution should allow for valid mutual information calculations, seeking clarification on the use of the formula.
  • Another participant inquires about the method of binning the distribution to facilitate the mutual information calculation.
  • A later reply confirms that binning the data is indeed a valid approach and outlines the process of generating values, adding noise, sorting, and grouping data into bins for analysis.
  • One participant asserts that mutual information is not inherently bounded by 1, but rather by the marginal entropies, suggesting that high-dimensional variables can lead to larger mutual information values.

Areas of Agreement / Disagreement

Participants express differing views on the appropriateness of using mutual information calculations with continuous distributions, with some asserting that it is valid under certain conditions while others caution against it. The discussion remains unresolved regarding the implications of these calculations and the proper interpretation of results.

Contextual Notes

There are limitations regarding the assumptions made about the distributions and the definitions used in the calculations. The discussion highlights the need for clarity in notation and the conditions under which mutual information can be calculated accurately.

six7th
Suppose I have a Gaussian probability distribution:

N_{A}(0,1).

A set of values is generated from this distribution, and an arbitrary amount of Gaussian noise, say N_{B}(0,0.5), is added to each value. The pairs are then sorted by the noisy value from lowest to highest. The values are then digitised by assigning 0 to a value below 0 and 1 to a value above 0. We end up with values such as:

-3.11359 | 0 | -6.91717 | 0 |
-3.31609 | 0 | -6.70810 | 0 |
-3.67497 | 0 | -6.13978 | 0 |
-2.73024 | 0 | -6.12173 | 0 |
-4.20178 | 0 | -6.07266 | 0 |
-3.38846 | 0 | -5.88277 | 0 |

To calculate the mutual information in bits for this set of data we use the following formula:

I(A;B) = \sum_{x \in A} \sum_{y \in B} P(x,y) \log_{2}\left(\frac{P(x,y)}{P(x)P(y)}\right)

where P(x,y) is the probability of A = x and B = y. Now in this case P(0,1) = P(1,0) = P(1,1) = 0 so we are left with

I(A;B) = P(0,0) \log_{2}\left(\frac{P(0,0)}{P(x=0)P(y=0)}\right)

Now here is what I don't get. For this Gaussian distribution P(x=0) ≈ 0.5 and P(y=0) ≈ 0.5 and from this set of data P(0,0) = 1. This gives,

I(A;B) = \log_{2}\left(\frac{1}{0.25}\right) = 2

From what I understand, it should not be possible for the mutual information to be greater than 1. Where am I going wrong with this? I feel as though I am making a very basic error.

Thanks.
 
For starters - your notation is confusing. First you have A and B as random variables with x and y as numbers, but then x and y become random variables?
 
Hey six7th and welcome to the forums.

If you are using entropy calculations on a continuous random variable, then you are likely to get junk answers.

Information and entropy calculations only really make sense when you are looking at finite distributions or information samples with finite alphabets.

There are results for continuous distributions, but in that context what entropy is, how it's calculated, and what it represents are not the same as for a proper measure on distributions with a finite sample space and alphabet.
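
For reference, one such result is the standard closed form for jointly Gaussian variables. As a sketch, assume B = A + N with noise N independent of A, and write \sigma_{A}^{2} and \sigma_{N}^{2} for the two variances (these symbols are introduced here and are not used elsewhere in the thread):

h(A) = \frac{1}{2}\log_{2}\left(2\pi e\,\sigma_{A}^{2}\right)

I(A;B) = h(B) - h(B|A) = \frac{1}{2}\log_{2}\left(1 + \frac{\sigma_{A}^{2}}{\sigma_{N}^{2}}\right)

The differential entropy h can be negative for small variances, which is one way it differs from the entropy of a finite alphabet, while the mutual information stays non-negative. If the 0.5 in N_{B}(0,0.5) is read as a standard deviation (an assumption; it could equally be a variance), the second formula gives \frac{1}{2}\log_{2}(5) ≈ 1.16 bits for the undigitised values.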
 
mathman said:
For starters - your notation is confusing. First you have A and B as random variables with x and y as numbers, but then x and y become random variables?

The way I have used the notation is that A and B are the distributions and x and y are the numbers that belong to each distribution.

chiro said:
Hey six7th and welcome to the forums.

If you are using entropy calculations on a continuous random variable, then you are likely to get junk answers.

Information and entropy calculations only really make sense when you are looking at finite distributions or information samples with finite alphabets.

There are results for continuous distributions, but in that context what entropy is, how it's calculated, and what it represents are not the same as for a proper measure on distributions with a finite sample space and alphabet.

Thanks chiro. Surely if I generate an arbitrary but finite number of values from a Gaussian distribution, then I will have a new, finite distribution of numbers, and the given equation can be used? Apologies if the question is completely wrong; I have not formally studied information theory and I am only just getting to grips with it.
 
If the distribution is finite in its possibilities then yes you can do that.

Are you binning the distribution in some way and if so, how?
 
Yeah, I am binning the data. Here's a basic outline of what is being done:

  1. Generate 1 million values from a Gaussian distribution of mean 0 and standard deviation 1
  2. Add Gaussian noise to each of these values. We now have two distributions of numbers: one with noise and one without
  3. Sort by the noisy distribution from lowest to highest
  4. Group the data into arbitrary bin widths
  5. Perform the mutual information calculation on each bin

The values in the first post are an example of what would be the initial bin.
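
The steps above can be sketched roughly as follows. This is a minimal illustration rather than the code actually used: it assumes NumPy, reads the noise parameter 0.5 as a standard deviation, takes the first bin to be the 1000 lowest noisy values, and digitises at 0 as in the first post. The function name `mutual_information_bits` is made up for this example.

```python
import numpy as np


def mutual_information_bits(a_bits, b_bits):
    """Plug-in estimate of I(A;B) in bits for two arrays of 0/1 integers."""
    # Joint counts: index 2*a + b maps each (a, b) pair to a cell of a 2x2 table.
    joint = np.bincount(2 * a_bits + b_bits, minlength=4).reshape(2, 2)
    joint = joint / joint.sum()
    p_a = joint.sum(axis=1, keepdims=True)  # marginal of A, from the same data
    p_b = joint.sum(axis=0, keepdims=True)  # marginal of B, from the same data
    nonzero = joint > 0                     # terms with P(x,y) = 0 contribute 0
    ratio = joint[nonzero] / (p_a * p_b)[nonzero]
    return float(np.sum(joint[nonzero] * np.log2(ratio)))


rng = np.random.default_rng(0)
n = 1_000_000
a = rng.normal(0.0, 1.0, n)              # step 1: clean values
b = a + rng.normal(0.0, 0.5, n)          # step 2: noisy values (0.5 assumed to be a std)
order = np.argsort(b)                    # step 3: sort by the noisy values
a_bits = (a[order] > 0).astype(int)      # digitise: 0 below zero, 1 above
b_bits = (b[order] > 0).astype(int)

first_bin = slice(0, 1000)               # step 4: assumed bin of the 1000 lowest values
mi_first_bin = mutual_information_bits(a_bits[first_bin], b_bits[first_bin])  # step 5
mi_all = mutual_information_bits(a_bits, b_bits)  # same estimate over all the data

print(f"first bin: {mi_first_bin:.4f} bits, all data: {mi_all:.4f} bits")
```

Because the marginals are computed from the same samples as the joint table, a bin that holds only (0,0) pairs should come out at 0 bits, and the estimate over all the data should stay below 1 bit for these binary variables.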
 
You should be able to use the definitions for the distribution to calculate the entropies including the joint and mutual ones to answer your question.
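
One way to read this advice, sketched for the bin in the first post: the marginals in the formula are sums of the joint probabilities over the same data, so they cannot be chosen separately from P(x,y). Assuming the bin really contains only (0,0) pairs:

P(x=0) = \sum_{y} P(0,y) = 1, \qquad P(y=0) = \sum_{x} P(x,0) = 1

I(A;B) = P(0,0) \log_{2}\left(\frac{P(0,0)}{P(x=0)P(y=0)}\right) = 1 \cdot \log_{2}\left(\frac{1}{1 \cdot 1}\right) = 0

The value 2 appears only when the joint probability taken from one bin is combined with the marginals ≈ 0.5 of the whole distribution.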
 
The mutual information is not bounded above by 1, but by the marginal entropies. The mutual information is a measure of how many bits are transmitted about X when Y is known. If X and Y are high-dimensional, there's no reason the marginal entropies couldn't be large, that the channel capacity couldn't be large, or that the mutual information couldn't be large.
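
A small numerical check of this bound, as a sketch assuming NumPy. The helper names `entropy_bits` and `mutual_information_from_joint` are made up for this example; the second uses the identity I(X;Y) = H(X) + H(Y) - H(X,Y). The two joint tables describe a variable copied without error, once over 4 symbols and once over 2.

```python
import numpy as np


def entropy_bits(p):
    """Shannon entropy in bits of a discrete distribution given as probabilities."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                       # 0 * log(0) is taken as 0
    return float(-np.sum(p * np.log2(p)))


def mutual_information_from_joint(joint):
    """Return (I(X;Y), H(X), H(Y)) in bits from a table of joint counts or probabilities."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()        # normalise counts to probabilities
    h_x = entropy_bits(joint.sum(axis=1))
    h_y = entropy_bits(joint.sum(axis=0))
    h_xy = entropy_bits(joint.ravel())
    return h_x + h_y - h_xy, h_x, h_y


# Y is an exact copy of X, with X uniform over 4 symbols.
mi_four, h_x4, h_y4 = mutual_information_from_joint(np.eye(4))
# Y is an exact copy of X, with X uniform over 2 symbols (a single bit).
mi_two, h_x2, h_y2 = mutual_information_from_joint(np.eye(2))

print(mi_four, min(h_x4, h_y4))
print(mi_two, min(h_x2, h_y2))
```

In both cases the mutual information equals the smaller marginal entropy, which is log_{2} of the alphabet size here. So a limit of 1 bit applies to variables digitised to 0 and 1, as in the first post, but not to variables with larger alphabets.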
 
