How to Calculate Mutual Information for a Die Roll on a Leaning Table?

mathguy2009
Hi all,

I had a question about calculating mutual information, a topic to which I am very new. Consider the following hypothetical situation, in which I describe the entire process as I understand it:

Suppose I had a normal die (sides numbered 1-6) that I roll on a leaning table. I draw a line on the table so that the line divides the higher and lower halves of the table. Now, suppose I wanted to calculate the probability that I would roll a 5 on the upper half of the table. So, I begin by drawing a table like this:

        1    2    3    4    5    6
High
Low

Then I perform 100 trials and record how many times a certain number was rolled on the higher or lower half of the table:

        1    2    3    4    5    6
High    2    5    7   10    1    8
Low    11    6    8   12   20   10

To calculate the joint probability distribution P(roll, position), I would simply divide each count by the total number of rolls (100) to produce the following table:

         1      2      3      4      5      6
High   0.02   0.05   0.07   0.10   0.01   0.08
Low    0.11   0.06   0.08   0.12   0.20   0.10

To calculate the marginal probabilities, I sum the rows and columns:
P(roll) = (sum(col 1), sum(col 2), ..., sum(col 6)) = (0.13, 0.11, 0.15, 0.22, 0.21, 0.18)
P(position) = (sum(row 1), sum(row 2)) = (0.33, 0.67)

Lastly, I calculate the mutual information of the data set (for an article on mutual information, see http://en.wikipedia.org/wiki/Mutual_information) using the following formula, with a base-2 logarithm to obtain an answer in units of bits:

$$MI(\text{roll}; \text{position}) = \sum_{\text{roll}} \sum_{\text{position}} P(\text{roll}, \text{position}) \log_2 \frac{P(\text{roll}, \text{position})}{P(\text{roll})\, P(\text{position})}$$

The number I get is 0.1206 bits...I suspect I did something wrong somewhere along the way, since this number is suspiciously small, but I cannot find my mistake. Any suggestions/corrections would be very much appreciated. Thanks in advance!
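In case it is useful, here is a minimal Python sketch of the same calculation, starting from the raw counts in the table above (numpy is assumed to be available; this is just the plug-in estimate described in this post, nothing more sophisticated):

```python
import numpy as np

# Observed counts from the 100 rolls (rows: High, Low; columns: faces 1-6)
counts = np.array([
    [ 2,  5,  7, 10,  1,  8],   # High
    [11,  6,  8, 12, 20, 10],   # Low
], dtype=float)

# Plug-in joint distribution P(roll, position): divide each count by 100
joint = counts / counts.sum()

# Marginals: P(position) from the row sums, P(roll) from the column sums
p_position = joint.sum(axis=1)   # (0.33, 0.67)
p_roll = joint.sum(axis=0)       # (0.13, 0.11, 0.15, 0.22, 0.21, 0.18)

# Mutual information in bits, skipping any empty cells
mi = 0.0
for i, p_pos in enumerate(p_position):
    for j, p_r in enumerate(p_roll):
        p = joint[i, j]
        if p > 0:
            mi += p * np.log2(p / (p_pos * p_r))

print(f"Plug-in MI estimate: {mi:.4f} bits")   # about 0.12 bits for these counts
```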
 
The most fundamental mistake you are making is to confuse the idea of "data from a sample" with the idea of "parameters of a probability distribution" from which a sample was selected.

The formula in the wikipedia article on mutual information is a formula that would be applied to a probability distribution. What you did is the natural way that one would estimate the mutual information from a sample. You used the observed frequencies of the die results as their probabilities. When you contemplate your results you have to think this way:

I'm using a sample to estimate the true probabilities. The frequencies in the sample are unlikely to exactly match the true probabilities. What sort of distribution of errors between the estimated mutual information and the true mutual information should I expect?

Sometimes the best way to estimate things from a sample is to use a formula that does not look like the corresponding formula involving the parameters of the probability distribution. I don't know the best estimation formula for mutual information.

One way you can try to answer this is to take samples from various probability distributions using a computer simulation and see how the estimated mutual information varies from the truth when various ways of estimating it are used.
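As a rough sketch of what I have in mind (Python with numpy; the sample size of 100 and the 2000 repetitions are arbitrary choices of mine): generate the face and the half of the table independently, so the true mutual information is exactly zero, and look at how far the plug-in estimate typically lands from zero.

```python
import numpy as np

rng = np.random.default_rng(0)

def plugin_mi_bits(counts):
    """Plug-in mutual information (bits) from a 2-D table of counts."""
    joint = counts / counts.sum()
    p_row = joint.sum(axis=1, keepdims=True)   # marginal over rows
    p_col = joint.sum(axis=0, keepdims=True)   # marginal over columns
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (p_row @ p_col)[nz])))

n_rolls, n_repeats = 100, 2000
estimates = []
for _ in range(n_repeats):
    # Face and position drawn independently, so the true MI is exactly 0 bits
    faces = rng.integers(0, 6, size=n_rolls)
    positions = rng.integers(0, 2, size=n_rolls)
    counts = np.zeros((2, 6))
    np.add.at(counts, (positions, faces), 1)
    estimates.append(plugin_mi_bits(counts))

print(f"mean plug-in estimate: {np.mean(estimates):.4f} bits (true value: 0)")
print(f"typical spread (std):  {np.std(estimates):.4f} bits")
```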

You say that your calculation produced a number that is "suspiciously small", but if this were a problem from a textbook, one would expect the mutual information to be zero, because in a typical textbook setting the way the die lands would be independent of where it lands on the table. Why do you think the number is too small? Is the thrower attempting to manipulate how the die lands, and do you expect him to be more successful on one half of the table than the other?
 
Thanks for the reply, Stephen. I did suspect that the die number and the position on the table were unrelated, but wasn't 100% sure. So, in a perfect world/after an infinite number of trials, the mutual information would have actually approached 0 bits, correct?

I am actually working with large amounts of data in a project in which I believe two of the random variables involved are highly related, yet I was getting smaller numbers than in this little example here. So I suspected that perhaps my calculations were incorrect; but, if I understand you correctly, my calculations were indeed correct, but my interpretation of the results was wrong.

Interesting side note: could we tell whether the die is loaded, based on how much mutual information we obtain?
 
mathguy2009 said:
So, in a perfect world/after an infinite number of trials, the mutual information would have actually approached 0 bits, correct?
That would depend on how you define a perfect world!

mathguy2009 said:
if I understand you correctly, my calculations were indeed correct, but my interpretation of the results was wrong.

You didn't understand me. What would it mean for your calculations to be "correct"? You seem to think there is only one "correct" way to estimate something from data. That isn't true. The definition of mutual information (which is given in terms of probabilities) certainly suggests that we can try estimating mutual information by plugging in observed frequencies in place of probabilities. But it isn't clear this is the "correct" way.

This link says that for small samples, the calculation you are using will, on the average, overestimate mutual information:

http://robotics.stanford.edu/~gal/Research/Redundancy-Reduction/Neuron_suppl/node2.html

My guess is that for large samples, your method is OK, but I haven't tried to prove that.
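If I recall the standard first-order result correctly (the so-called Miller-Madow correction, which is my addition here, not something from your post or that link), the plug-in estimate computed from N samples over a table with R_X rows and R_Y columns overestimates the true mutual information by roughly (R_X - 1)(R_Y - 1)/(2 N ln 2) bits when the two variables are actually independent. For your 2-by-6 table with N = 100 that works out to

$$\text{bias} \approx \frac{(2-1)(6-1)}{2 \cdot 100 \cdot \ln 2} \approx 0.036 \ \text{bits},$$

which is a noticeable fraction of the 0.1206 bits you reported, so an estimate of that size is not, by itself, strong evidence of a real dependence.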
 