
Maximizing the entropy of a constrained distribution

  1. Feb 18, 2014 #1
    It is well known that, given marginal probabilities [itex]a_{i}[/itex] and
    [itex]b_{j}[/itex], the joint probability distribution maximizing the entropy

    [tex]H(P)=-\sum_{i=1}^{m}\sum_{j=1}^{n}p_{ij}\log{}p_{ij}[/tex]

    is [tex]p_{ij}=a_{i}b_{j}[/tex]

    For m=3 and n=3, a=(0.2,0.3,0.5), b=(0.1,0.6,0.3), for example,

    [tex]\begin{equation}
    \label{eq:r1}
    P=\left(
    \begin{array}{rrr}
    0.02 & 0.12 & 0.06 \\
    0.03 & 0.18 & 0.09 \\
    0.05 & 0.30 & 0.15
    \end{array}
    \right)
    \end{equation}[/tex]
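    As a quick check of the product form, here is a minimal numerical sketch (assuming NumPy) that reproduces the matrix above and its entropy:

    [code]
    import numpy as np

    a = np.array([0.2, 0.3, 0.5])    # row marginals a_i
    b = np.array([0.1, 0.6, 0.3])    # column marginals b_j

    # With no further constraints, the maximum-entropy joint is the outer product.
    P = np.outer(a, b)
    print(P)                         # the matrix above
    print(-np.sum(P * np.log(P)))    # its entropy H(P)
    [/code]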

    Now, I have a problem where the joint probability distribution is
    constrained (much like a random walk where the path from one node to
    another is blocked). For example, let m, n, a, and b be as above with
    the constraint that

    [tex]\begin{equation}
    \label{eq:r2}
    P=\left(
    \begin{array}{rrr}
    p_{11} & 0 & p_{13} \\
    p_{21} & p_{22} & p_{23} \\
    p_{31} & p_{32} & p_{33} \\
    \end{array}
    \right)
    \end{equation}[/tex]

    Because a and b are known, it suffices to determine any 2x2 submatrix of
    the 3x3 matrix; the remaining entries then follow from the marginals.
    With the zero in place, only three entries remain free, so take x, y, and
    z as the variables by which [itex]w_{1}[/itex], [itex]w_{2}[/itex],
    [itex]v_{1}[/itex], [itex]v_{2}[/itex], and [itex]\sigma[/itex] are
    expressible, given a and b:

    [tex]\begin{equation}
    \label{eq:r3}
    P=\left(
    \begin{array}{rrr}
    x & 0 & w_{1} \\
    y & z & w_{2} \\
    v_{1} & v_{2} & \sigma \\
    \end{array}
    \right)
    \end{equation}[/tex]
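    Explicitly, the marginal constraints give

    [tex]
    w_{1}=a_{1}-x,\quad
    w_{2}=a_{2}-y-z,\quad
    v_{1}=b_{1}-x-y,\quad
    v_{2}=b_{2}-z,\quad
    \sigma=a_{3}-v_{1}-v_{2}=b_{3}-w_{1}-w_{2}
    [/tex]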

    I use this to write out the entropy and differentiate with respect to
    x, y, and z, finding that the maximum must satisfy

    [tex]\begin{equation}
    \label{eq:r4}
    \det\left(
    \begin{array}{ll}
    x & w_{1} \\
    v_{1} & \sigma
    \end{array}
    \right)=0
    \end{equation}[/tex]

    [tex]\begin{equation}
    \label{eq:r5}
    \det\left(
    \begin{array}{ll}
    y & w_{2} \\
    v_{1} & \sigma
    \end{array}
    \right)=0
    \end{equation}[/tex]

    [tex]\begin{equation}
    \label{eq:r6}
    \det\left(
    \begin{array}{ll}
    z & w_{2} \\
    v_{2} & \sigma
    \end{array}
    \right)=0
    \end{equation}[/tex]
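    A small symbolic sketch (assuming SymPy) of that differentiation, with the same variable names as above; setting each derivative to zero gives the determinant conditions:

    [code]
    import sympy as sp

    x, y, z, a1, a2, b1, b2 = sp.symbols('x y z a1 a2 b1 b2', positive=True)

    # Express the remaining cells through the marginal constraints (a3 = 1 - a1 - a2).
    w1 = a1 - x
    w2 = a2 - y - z
    v1 = b1 - x - y
    v2 = b2 - z
    sigma = (1 - a1 - a2) - v1 - v2

    H = -sum(p * sp.log(p) for p in (x, y, z, w1, w2, v1, v2, sigma))

    # Each derivative collapses to a single logarithm; it vanishes exactly when
    # x*sigma = w1*v1, y*sigma = w2*v1, z*sigma = w2*v2, i.e. the determinants above are zero.
    for var in (x, y, z):
        print(var, sp.logcombine(sp.simplify(sp.diff(H, var)), force=True))
    [/code]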

    This is a system of 3 non-linear equations that is awkward to solve
    algebraically. Ultimately, I would like to know what property a and b
    must have to ensure that P is a proper probability distribution
    (i.e. no negative elements). For now, however, I'd be thrilled if
    anybody could give me a hint as to how I could find the solutions for
    x, y, and z algebraically without these non-linear equations.

    The numeric solution for this is

    [tex]\begin{equation}
    \label{eq:r7}
    P=\left(
    \begin{array}{rrr}
    0.05000 & 0.00000 & 0.15000 \\
    0.01875 & 0.22500 & 0.05625 \\
    0.03125 & 0.37500 & 0.09375
    \end{array}
    \right)
    \end{equation}[/tex]

    But I have definitely seen systems where all solutions were complex
    and some of the probabilities ended up < 0, especially when m and n are
    larger than 3 and there is more than one zero constraint in the joint
    probability matrix.
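    (For anyone who wants to check the numbers: a minimal numerical sketch, assuming NumPy, that reproduces the table above by iterative proportional fitting rather than by solving the determinant equations.)

    [code]
    import numpy as np

    a = np.array([0.2, 0.3, 0.5])            # row marginals
    b = np.array([0.1, 0.6, 0.3])            # column marginals

    # Start from a table that is 1 on the allowed cells and 0 on the forbidden cell;
    # alternately rescaling rows and columns to match the marginals converges to the
    # maximum-entropy table with that zero pattern.
    P = np.ones((3, 3))
    P[0, 1] = 0.0                            # the constrained cell p_12

    for _ in range(1000):
        P *= (a / P.sum(axis=1))[:, None]    # match row sums
        P *= (b / P.sum(axis=0))[None, :]    # match column sums

    print(P.round(5))
    [/code]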
     
    Last edited: Feb 18, 2014
  3. Feb 19, 2014 #2

    Stephen Tashi

    Science Advisor

    I'm curious how the matrix in the numerical example can be interpreted as describing a random walk between nodes. I think of such a walk as being described by the transition matrix of a Markov chain rather than by a matrix that is a joint probability density.
     
  4. Feb 19, 2014 #3
    Yes, that's correct. I mentioned the transition matrix for a Markov
    chain because in Cover and Thomas (exercise 4.16) there is a really
    interesting observation about the eigenvalues of a matrix derived from
    the transition matrix and the maximum entropy of the transition
    matrix. What I am trying to do has nothing to do with random walks,
    but Cover and Thomas's observation, and the fact that random walks
    also have constraints such as a broken path from a node, made it
    worth mentioning.

    Here is what I need to solve. I have a prior joint probability
    distribution

    [tex]
    \begin{equation}
    \label{eq:r1}
    \left(
    \begin{array}{rrrr}
    p_{11} & p_{12} & 0 & 0 \\
    p_{21} & p_{22} & 0 & 0 \\
    0 & p_{32} & 0 & p_{34} \\
    p_{41} & p_{42} & p_{43} & p_{44} \\
    p_{51} & p_{52} & p_{53} & p_{54}
    \end{array}
    \right)
    \end{equation}
    [/tex]

    Its marginals are
    [itex](\alpha_{1},\alpha_{2},\alpha_{3},\alpha_{4},1-\sum\alpha_{i})^{T}[/itex]
    and [itex](0.2,0.3,0.4,0.1)[/itex]. I also have a posterior joint
    probability matrix

    [tex]
    \begin{equation}
    \label{eq:r2}
    \left(
    \begin{array}{rrrr}
    q_{11} & q_{12} & 0 & 0 \\
    q_{21} & q_{22} & 0 & 0 \\
    0 & q_{32} & 0 & q_{34} \\
    q_{41} & q_{42} & q_{43} & q_{44} \\
    0 & 0 & 0 & 0
    \end{array}
    \right)
    \end{equation}
    [/tex]

    where the marginals are [itex](.40,.30,.20,.10,0)^{T}[/itex] and
    [itex](\beta_{1},\beta_{2},\beta_{3},1-\sum\beta_{i})[/itex]. I want
    to populate these joint probability distributions using the principle
    of maximum entropy and find out
    [itex](\beta_{1},\beta_{2},\beta_{3})[/itex].
     
  5. Feb 21, 2014 #4

    Stephen Tashi

    Science Advisor

    This is just a random thought:

    A naive computer programmer who was given some marginal densities a[], b[] and some constraints like p[1][2] = 0 might implement sampling from a distribution with those properties by the straightforward algorithm:

    Pick i from the distribution defined by a[], pick j from the distribution defined by b[]. If p[i][j] is not constrained to be zero, return the event (i,j). If p[i][j] is constrained to be zero, repeat the process.

    The problem with that implementation is that the marginals of the realized distribution are not a[] and b[]. However, it makes me wonder if there are "doctored" marginal distributions a'[] and b'[] that the programmer could use in order to attain a[] and b[] with the above algorithm.
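    For instance, a quick simulation sketch (assuming NumPy, and reusing the marginals and the forbidden cell from post #1) makes the drift in the realized marginals visible:

    [code]
    import numpy as np

    rng = np.random.default_rng(0)
    a = np.array([0.2, 0.3, 0.5])
    b = np.array([0.1, 0.6, 0.3])
    forbidden = (0, 1)                  # the cell p[1][2] in the thread's 1-based notation

    counts = np.zeros((3, 3))
    for _ in range(200000):
        while True:
            i = rng.choice(3, p=a)      # pick i from a[]
            j = rng.choice(3, p=b)      # pick j from b[]
            if (i, j) != forbidden:     # accept anything except the forbidden cell
                break
        counts[i, j] += 1

    P_hat = counts / counts.sum()
    print(P_hat.sum(axis=1))            # realized row marginals drift away from a[]
    print(P_hat.sum(axis=0))            # realized column marginals drift away from b[]
    [/code]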

    For example, if the single constraint is p[r][s] = 0 then I think the doctored distributions would have to satisfy

    a[r] = a'[r](1 - b'[s]) / (1 - a'[r]b'[s])
    a[i] = a'[i] / (1 - a'[r]b'[s]) , for i not equal to r

    and similar equations for b'[].
     
  6. Feb 21, 2014 #5
    What do you mean by a'[r]? In my original post (before the replies), I was considering m=3 and n=3 (a 3x3 probability matrix), a=(0.2,0.3,0.5), b=(0.1,0.6,0.3), and

    [tex]\begin{equation}
    P=\left(
    \begin{array}{rrr}
    p_{11} & 0 & p_{13} \\
    p_{21} & p_{22} & p_{23} \\
    p_{31} & p_{32} & p_{33} \\
    \end{array}
    \right)
    \end{equation}
    [/tex]

    What would your doctored distributions yield here, and how?
     
  7. Feb 21, 2014 #6

    Stephen Tashi

    Science Advisor

    The constraint in the example is at p[1][2], so r = 1, s = 2.

    Suppose we use the marginal distributions a'[], b'[] to randomly select i according to a'[] and randomly select j according to b'[] until an (i,j) is selected other than (1,2).

    The probability of selecting something in the first row is the sum of the probabilities of selecting it on the 1st, 2nd, 3rd, ... try.

    The probability of selecting something from row 1 on the 1st try is a'[1](1 - b'[2])

    The probability of selecting something from row 1 on the 2nd try = the probability of failing to select anything on the 1st try and selecting something from row 1 on the second try
    = a'[1] b'[2] a'[1](1 - b'[2])

    Continuing in this manner, the probability of selecting something from the first row is given by a geometric series in powers of a'[1]b'[2]. It sums to a'[1](1 - b'[2]) / (1 - a'[1]b'[2]).

    If we want the selection process to effectively give a probability of a[1] for picking something from the first row, we must have a[1] = a'[1](1 - b'[2]) / (1 - a'[1]b'[2]).
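    Written out, that condition is

    [tex]
    a_{1}=a'_{1}(1-b'_{2})\sum_{k=0}^{\infty}\left(a'_{1}b'_{2}\right)^{k}
    =\frac{a'_{1}(1-b'_{2})}{1-a'_{1}b'_{2}}
    [/tex]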

    I'm suggesting setting up similar equations and trying to solve for the doctored distributions a'[] and b'[] from the given distributions a[] and b[].

    I don't know if the doctored distributions would yield the joint maximum entropy distribution, but intuitively they yield a joint distribution with the desired marginal distributions that results from a process where (i,j) is selected by picking i and j independently.
     
  8. Feb 22, 2014 #7
    I see what you mean. You would end up with a joint probability matrix which contains 0s in the right places and p[i][j]'s that are suggestive of the desired distribution, but the marginal probabilities would be off. Unfortunately, I think you'd need (again) some kind of maximum entropy procedure to "doctor" things so that the marginals fall into place, but that introduces the above equations ... I will continue working on this. Thanks for your help.

    P.S.: There is very little literature on this. There is an obscure article by a Czech physicist (Majernik, 2000) about finding the marginal distributions when you are in possession of the conditional probabilities. Here we have the opposite problem: we are trying to find the conditional probabilities (which immediately follow from the joint probabilities and the marginals) given the marginal probabilities and some, but not all, conditional probabilities, using MAXENT. An American mathematician, Carl Wagner, wrote a piece on this in 1992 ("Generalized Probability Kinematics"), but I am almost certain that he made a mistake, and I am trying to fix it (unsuccessfully, so far).
     
    Last edited: Feb 22, 2014
  9. Feb 22, 2014 #8

    Stephen Tashi

    Science Advisor

    In the example, the doctored marginals are determined by equations with only two unknowns, a'[1] and b'[2]. Once these are found, the other values of the doctored marginals follow, and the values of the p[i][j] are functions of the doctored marginals.
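    A minimal numerical sketch of that two-unknown system (assuming SciPy, and reusing the marginals and the forbidden cell from post #1):

    [code]
    import numpy as np
    from scipy.optimize import fsolve

    a = np.array([0.2, 0.3, 0.5])
    b = np.array([0.1, 0.6, 0.3])

    def equations(u):
        ap1, bp2 = u                            # the unknowns a'[1] and b'[2]
        D = 1.0 - ap1 * bp2                     # probability that a single try is not rejected
        return [ap1 * (1.0 - bp2) / D - a[0],   # realized row-1 marginal must equal a[1]
                bp2 * (1.0 - ap1) / D - b[1]]   # realized column-2 marginal must equal b[2]

    ap1, bp2 = fsolve(equations, [a[0], b[1]])
    D = 1.0 - ap1 * bp2

    # The other doctored values follow from a[i] = a'[i]/D and b[j] = b'[j]/D.
    a_doc = np.array([ap1, a[1] * D, a[2] * D])
    b_doc = np.array([b[0] * D, bp2, b[2] * D])

    # The joint distribution realized by the rejection scheme:
    P = np.outer(a_doc, b_doc) / D
    P[0, 1] = 0.0
    print(P.round(5))                           # agrees with the numeric table in post #1
    [/code]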
     
  10. Feb 22, 2014 #9
    thx -- will update when I have something ...
     