Calculating Mutual Information

  • Thread starter mathguy2009
In summary, the poster estimated the mutual information between two variables (die roll and table position) by plugging the observed frequencies from 100 rolls into the mutual-information formula. The discussion clarifies that this plug-in value is only an estimate: if the roll and the position are truly independent, the true mutual information is zero, and estimates from small samples tend to overestimate it.
  • #1
mathguy2009
Hi all,

I had a question about calculating mutual information, a topic to which I am very new. Consider the following hypothetical situation, in which I describe the entire process as I understand it:

Suppose I have a normal die (sides numbered 1-6) that I roll on a leaning table. I draw a line on the table so that it divides the higher and lower halves of the table. Now, suppose I want to calculate the probability that I roll a 5 on the upper half of the table. So I begin by drawing a table like this:

        1    2    3    4    5    6
High
Low

Then I perform 100 trials and record how many times each number was rolled on the higher or lower half of the table:

        1    2    3    4    5    6
High    2    5    7   10    1    8
Low    11    6    8   12   20   10

To calculate the joint probability distribution P(roll, position), I simply divide each count by the total number of rolls (100) to produce the following table:

         1      2      3      4      5      6
High   0.02   0.05   0.07   0.10   0.01   0.08
Low    0.11   0.06   0.08   0.12   0.20   0.10

To calculate the marginal probabilities, I sum the rows and columns:
P(roll) = (sum(col 1), sum(col 2), ..., sum(col 6)) = (0.13, 0.11, 0.15, 0.22, 0.21, 0.18)
P(position) = (sum(row 1), sum(row 2)) = (0.33, 0.67)

Lastly, I calculate the mutual information of the data set (for an article on mutual information, see http://en.wikipedia.org/wiki/Mutual_information) using the following formula, with a base-2 logarithm so that the answer comes out in bits:

MI(roll; position) = [itex]\sum_{roll}\sum_{position} P(roll, position)\,\log_2 \frac{P(roll, position)}{P(roll)\,P(position)}[/itex]

The number I get is 0.1206 bits...I suspect I did something wrong somewhere along the way, since this number is suspiciously small, but I cannot find my mistake. Any suggestions/corrections would be very much appreciated. Thanks in advance!
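
For concreteness, here is a minimal sketch of the same calculation in Python (the array and variable names are my own, chosen just for illustration):

[code]
import numpy as np

# Observed counts from the 100 rolls (rows: High/Low, columns: faces 1-6).
counts = np.array([[ 2,  5,  7, 10,  1,  8],
                   [11,  6,  8, 12, 20, 10]], dtype=float)

joint = counts / counts.sum()      # joint distribution, rows = position, columns = roll
p_pos = joint.sum(axis=1)          # P(position) = (0.33, 0.67)
p_roll = joint.sum(axis=0)         # P(roll) = (0.13, 0.11, 0.15, 0.22, 0.21, 0.18)

# Plug-in estimate of the mutual information in bits; zero cells are skipped.
mi = 0.0
for i in range(joint.shape[0]):
    for j in range(joint.shape[1]):
        if joint[i, j] > 0:
            mi += joint[i, j] * np.log2(joint[i, j] / (p_pos[i] * p_roll[j]))

print(round(mi, 4))   # ~0.12 bits, essentially the value quoted above
[/code]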
 
  • #2
The most fundamental mistake you are making is to confuse the idea of "data from a sample" with the idea of "parameters of a probability distribution" from which a sample was selected.

The formula in the Wikipedia article on mutual information is a formula that would be applied to a probability distribution. What you did is the natural way that one would estimate the mutual information from a sample: you used the observed frequencies of the dice results as their probabilities. When you contemplate your results you have to think this way:

I'm using a sample to estimate the true probabilities. The frequencies in the sample are unlikely to exactly match the true probabilities. What sort of distribution of errors between the estimated mutual information and the true mutual information should I expect?

Sometimes the best way to estimate things from a sample is to use a formula that does not look like the corresponding formula involving the parameters of the probability distribution. I don't know the best estimation formula for mutual information.

One way you can try to answer this is to take samples from various probability distributions using a computer simulation and see how the estimated mutual information varies from the truth when various ways of estimating it are used.
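
For example, here is a rough sketch of such a simulation, assuming a fair die and a table position that is independent of the roll with P(High) = 1/3 (my own choice of parameters, purely for illustration):

[code]
import numpy as np

rng = np.random.default_rng(0)

def plugin_mi_bits(counts):
    """Plug-in estimate of mutual information (bits) from a table of counts."""
    joint = counts / counts.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())

# True model: fair die, position independent of the roll, P(High) = 1/3.
p_roll = np.full(6, 1 / 6)
p_pos = np.array([1 / 3, 2 / 3])
true_joint = np.outer(p_pos, p_roll)   # the true mutual information is exactly 0

estimates = []
for _ in range(2000):
    sample = rng.multinomial(100, true_joint.ravel()).reshape(2, 6)
    estimates.append(plugin_mi_bits(sample))

print(np.mean(estimates), np.std(estimates))   # the mean sits noticeably above 0
[/code]

Even though the true mutual information is zero in this setup, the average of the plug-in estimates from 100-roll samples is not.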

You say that your calculation produced a number that is "suspiciously small", but if this were a problem from a textbook, one would expect the mutual information to be zero, because in a typical textbook setting the way the die lands would be independent of where it lands on the table. Why do you think the number is too small? Is the thrower attempting to manipulate how the die lands, and do you expect him to be more successful on one half of the table than the other?
 
  • #3
Thanks for the reply, Stephen. I did suspect that the die number and the position on the table were unrelated, but wasn't 100% sure. So, in a perfect world/after an infinite number of trials, the mutual information would have actually approached 0 bits, correct?

I am actually working with large amounts of data in a project in which I believe two of the random variables involved are highly related, yet I was getting smaller numbers than in this little example here. So I suspected that perhaps my calculations were incorrect; but, if I understand you correctly, my calculations were indeed correct and my interpretation of the results was wrong.

Interesting side note: could we tell whether the die is loaded, based on how much mutual information we obtain?
 
  • #4
mathguy2009 said:
So, in a perfect world/after an infinite number of trials, the mutual information would have actually approached 0 bits, correct?
That would depend on how you define a perfect world!

if I understand you correctly, my calculations were indeed correct, but my interpretation of the results was wrong.

You didn't understand me. What would it mean for your calculations to be "correct"? You seem to think there is only one "correct" way to estimate something from data. That isn't true. The definition of mutual information (which is given in terms of probabilities) certainly suggests that we can try estimating mutual information by plugging in observed frequencies in place of probabilities. But it isn't clear this is the "correct" way.

This link says that for small samples, the calculation you are using will, on the average, overestimate mutual information:

http://robotics.stanford.edu/~gal/Research/Redundancy-Reduction/Neuron_suppl/node2.html

My guess is that for large samples, your method is OK, but I haven't tried to prove that.
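
For reference, a commonly cited first-order approximation of that upward bias (a Miller–Madow-type correction) is (number of roll values − 1)(number of positions − 1)/(2N ln 2) bits, where N is the sample size. A quick check with the numbers from post #1:

[code]
import math

n_rolls, n_positions, N = 6, 2, 100
bias_bits = (n_rolls - 1) * (n_positions - 1) / (2 * N * math.log(2))
print(round(bias_bits, 3))   # about 0.036 bits of expected overestimation
[/code]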
 
  • #5


Hi there,

First of all, it's great that you are exploring mutual information and trying to apply it in a real-world scenario. Your approach to calculating the joint probability distribution and marginal probabilities is correct.

However, one point is worth clarifying: with position recorded only as "High" or "Low", both variables are discrete, so the discrete formula you used is the appropriate one, and a small value is not by itself evidence of a calculation error. If you instead wanted to treat position as a continuous variable (the actual distance up the table rather than a High/Low label), the simple double sum would no longer apply; you would need the continuous (differential) form of mutual information, or you would have to bin the positions, and the result can depend noticeably on the binning.

In either case, a value estimated from a finite sample is only an approximation of the true mutual information, so it should be interpreted with the sampling error in mind. It may also be helpful to consult with a statistician or data scientist for guidance on your specific scenario. Good luck!
 

What is mutual information?

Mutual information is a measure of the mutual dependence between two variables. It measures how much information is shared between the two variables, and is often used in data analysis and machine learning.

How is mutual information calculated?

Mutual information is calculated by taking the sum of the joint probabilities multiplied by the log of the ratio of the joint probabilities to the product of the individual probabilities. This can be expressed as MI(X,Y) = ∑∑P(x,y) * log(P(x,y) / (P(x) * P(y))).
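
As a minimal sketch (the function name and arguments are illustrative, not taken from any particular library), the same formula can be evaluated directly from a table of joint probabilities:

[code]
import numpy as np

def mutual_information(joint, base=2.0):
    """Mutual information of a joint probability table P(x, y), in units set by `base`."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)   # marginal P(x)
    py = joint.sum(axis=0, keepdims=True)   # marginal P(y)
    nz = joint > 0                          # ignore zero-probability cells
    return float((joint[nz] * np.log(joint[nz] / (px * py)[nz])).sum() / np.log(base))

# Example: the 2x6 joint distribution from post #1 gives roughly 0.12 bits.
[/code]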

What is the range of values for mutual information?

Mutual information is always non-negative. A value of 0 indicates that the two variables are independent, while larger values indicate a stronger relationship; for discrete variables it is bounded above by the smaller of the two variables' entropies.

What are some applications of mutual information?

Mutual information has many applications in fields such as data analysis, machine learning, and information theory. It is often used for feature selection, clustering, and dimensionality reduction in data analysis. In machine learning, mutual information is used for feature selection and as an evaluation metric for models. In information theory, it is used to measure the amount of information shared between two variables.

What are the limitations of mutual information?

Mutual information requires an estimate of the joint distribution of the two variables, which can be hard to obtain from limited data: estimates from small samples tend to be biased upward, and for continuous variables the result depends on how the data are binned or on the density estimator used. It also quantifies only the strength of the dependence, not its direction or functional form. (Unlike the correlation coefficient, however, it does capture non-linear relationships.)
