Maximizing the entropy of a constrained distribution

  • Context: Graduate
  • Thread starter: noowutah
  • Tags: Distribution, Entropy

Discussion Overview

The discussion revolves around maximizing the entropy of a constrained joint probability distribution, particularly in the context of a 3x3 matrix with specified marginal probabilities. Participants explore the implications of constraints on the joint distribution, methods for solving related equations, and the interpretation of the matrix in relation to random walks and Markov chains.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant states that the joint probability distribution maximizing entropy under known marginals is given by the product of the marginals, but introduces constraints that complicate the solution.
  • Another participant questions the interpretation of the constrained matrix as a random walk, suggesting that a transition matrix for a Markov chain is more appropriate.
  • A different participant references a concept from Cover and Thomas regarding the eigenvalues of a transition matrix and its relation to maximum entropy, indicating a connection to the original problem despite the focus on random walks.
  • One participant proposes a sampling method that could yield the desired marginals but raises concerns about the validity of the resulting distribution.
  • Further discussion includes the need for "doctored" marginal distributions to achieve the desired outcomes when constraints are present, with participants attempting to derive these distributions from the original marginals.

Areas of Agreement / Disagreement

Participants express differing views on the interpretation of the constrained distribution and the methods for achieving the desired marginals. There is no consensus on the best approach to solve the problem or the implications of the constraints.

Contextual Notes

Participants note that the system of equations derived from the constraints is non-linear and challenging to solve algebraically. There is also mention of potential issues with negative probabilities arising in larger systems.

noowutah
It is well known that, with known marginal probabilities a_{i} and
b_{j}, the joint probability distribution maximizing the entropy

H(P)=-\sum_{i=1}^{m}\sum_{j=1}^{n}p_{ij}\log{}p_{ij}

is p_{ij}=a_{i}b_{j}.

For m=3 and n=3, a=(0.2,0.3,0.5), b=(0.1,0.6,0.3), for example,

\begin{equation}
\label{eq:r1}
P=\left(
\begin{array}{rrr}
0.02 & 0.12 & 0.06 \\
0.03 & 0.18 & 0.09 \\
0.05 & 0.30 & 0.15
\end{array}
\right)
\end{equation}
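As a quick check, the product-of-marginals solution can be computed directly. This is a minimal sketch (variable names are my own, not from any particular library):

```python
# Unconstrained maximum-entropy joint distribution with given marginals:
# p_ij = a_i * b_j, the outer product of the two marginal vectors.
a = [0.2, 0.3, 0.5]
b = [0.1, 0.6, 0.3]
P = [[ai * bj for bj in b] for ai in a]

# Row sums recover a, column sums recover b.
row_sums = [sum(row) for row in P]
col_sums = [sum(P[i][j] for i in range(3)) for j in range(3)]
```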

Now, I have a problem where the joint probability distribution is
constrained (much like a random walk where the path from one node to
another is blocked). For example, let m, n, a, and b be as above with
the constraint that

\begin{equation}
\label{eq:r2}
P=\left(
\begin{array}{rrr}
p_{11} & 0 & p_{13} \\
p_{21} & p_{22} & p_{23} \\
p_{31} & p_{32} & p_{33} \\
\end{array}
\right)
\end{equation}

Because a and b are known, it suffices to determine any 2x2 submatrix
of the 3x3 matrix, for example (x, y, and z are the variables;
w_{1}, w_{2}, v_{1}, v_{2}, and \sigma are expressible in terms of them, given a and b)

\begin{equation}
\label{eq:r3}
P=\left(
\begin{array}{rrr}
x & 0 & w_{1} \\
y & z & w_{2} \\
v_{1} & v_{2} & \sigma \\
\end{array}
\right)
\end{equation}

I use this to write out the entropy and differentiate with respect to
x, y, and z, finding that the maximum is attained where

\begin{equation}
\label{eq:r4}
\det\left(
\begin{array}{ll}
x & w_{1} \\
v_{1} & \sigma
\end{array}
\right)=0
\end{equation}

\begin{equation}
\label{eq:r5}
\det\left(
\begin{array}{ll}
y & w_{2} \\
v_{1} & \sigma
\end{array}
\right)=0
\end{equation}

\begin{equation}
\label{eq:r6}
\det\left(
\begin{array}{ll}
z & w_{2} \\
v_{2} & \sigma
\end{array}
\right)=0
\end{equation}

This is a system of three non-linear equations, which is awkward to
solve algebraically. Ultimately, I want to know what properties a
and b must have to ensure that P is a proper probability
distribution (i.e., no negative elements). For now, however, I'd be
thrilled if anybody could give me a hint as to how I could find the
solutions for x, y, and z algebraically without these non-linear
equations.

The numeric solution for this is

\begin{equation}
\label{eq:r7}
P=\left(
\begin{array}{rrr}
0.050 & 0.000 & 0.150 \\
0.019 & 0.225 & 0.056 \\
0.031 & 0.375 & 0.094
\end{array}
\right)
\end{equation}

But I have definitely seen systems where all solutions were complex
and some of the probabilities ended up <0, especially when m and n are
larger than 3 and there is more than one zero constraint in the joint
probability matrix.
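For what it's worth, the constrained maximum can also be approached numerically without solving the determinant system, e.g. by iterative proportional fitting. This is a sketch under the assumption that the standard IPF scheme applies (it converges to the maximum-entropy joint distribution with the prescribed support and marginals), not the algebraic solution asked for above:

```python
# Iterative proportional fitting: start from a matrix with the forbidden
# cell zeroed, then alternately rescale rows to match a and columns to
# match b.  The limit has the prescribed support and marginals and
# maximizes the entropy subject to them.
a = [0.2, 0.3, 0.5]
b = [0.1, 0.6, 0.3]
P = [[0.0 if (i, j) == (0, 1) else 1.0 / 9 for j in range(3)]
     for i in range(3)]
for _ in range(500):
    for i in range(3):                          # row step: match a
        s = sum(P[i])
        P[i] = [p * a[i] / s for p in P[i]]
    for j in range(3):                          # column step: match b
        s = sum(P[i][j] for i in range(3))
        for i in range(3):
            P[i][j] *= b[j] / s
```

At convergence the row and column sums match a and b, the forbidden cell stays zero, and the 2x2 determinant conditions hold to numerical precision.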
 
stlukits said:
Now, I have a problem where the joint probability distribution is
constrained (much like a random walk where the path from one node to
another is blocked). For example, let m, n, a, and b be as above with
the constraint that

\begin{equation}
\label{eq:r2}
P=\left(
\begin{array}{rrr}
p_{11} & 0 & p_{13} \\
p_{21} & p_{22} & p_{23} \\
p_{31} & p_{32} & p_{33} \\
\end{array}
\right)
\end{equation}

I'm curious how the matrix in the numerical example can be interpreted as describing a random walk between nodes. I think of such a walk as being described by the transition matrix for a Markov chain rather than by a matrix that is a joint probability density.
 
Yes, that's correct. I mentioned the transition matrix for a Markov
chain because in Cover and Thomas (exercise 4.16) there is a really
interesting observation relating the eigenvalues of a matrix derived
from the transition matrix to its maximum entropy. What I am trying to
do has nothing to do with random walks, but Cover and Thomas's
observation, and the fact that random walks are also subject to
constraints such as a blocked path between nodes, made it worth
mentioning.
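That part of Cover and Thomas contains the well-known result that the maximum entropy rate of a process supported on a graph is the log of the largest eigenvalue of the graph's adjacency matrix. A small power-iteration sketch illustrates it (the 3-node example graph with the 1→2 edge blocked is my own, chosen to mirror the zero constraint above):

```python
import math

# Adjacency matrix of a 3-node graph with the edge 1 -> 2 removed,
# mirroring the zero cell in the joint-probability matrix above.
A = [[1, 0, 1],
     [1, 1, 1],
     [1, 1, 1]]

# Power iteration for the largest (Perron) eigenvalue of A.
v = [1.0, 1.0, 1.0]
lam = 0.0
for _ in range(200):
    w = [sum(A[i][j] * v[j] for j in range(3)) for i in range(3)]
    lam = max(w)                 # eigenvalue estimate (max-norm)
    v = [x / lam for x in w]

# Maximum entropy rate over Markov chains supported on this graph.
max_entropy_rate = math.log(lam)
```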

Here is what I need to solve. I have a prior joint probability
distribution

\begin{equation}
\label{eq:r1}
\left(
\begin{array}{rrrr}
p_{11} & p_{12} & 0 & 0 \\
p_{21} & p_{22} & 0 & 0 \\
0 & p_{32} & 0 & p_{34} \\
p_{41} & p_{42} & p_{43} & p_{44} \\
p_{51} & p_{52} & p_{53} & p_{54}
\end{array}
\right)
\end{equation}

The marginals are
(\alpha_{1},\alpha_{2},\alpha_{3},\alpha_{4},1-\sum\alpha_{i})^{T}
and (0.2,0.3,0.4,0.1); and a posterior joint probability
matrix

\begin{equation}
\label{eq:r2}
\left(
\begin{array}{rrrr}
q_{11} & q_{12} & 0 & 0 \\
q_{21} & q_{22} & 0 & 0 \\
0 & q_{32} & 0 & q_{34} \\
q_{41} & q_{42} & q_{43} & q_{44} \\
0 & 0 & 0 & 0
\end{array}
\right)
\end{equation}

where the marginals are (.40,.30,.20,.10,0)^{T} and
(\beta_{1},\beta_{2},\beta_{3},1-\sum\beta_{i}). I want
to populate these joint probability distributions using the principle
of maximum entropy and find out
(\beta_{1},\beta_{2},\beta_{3}).
 
This is just a random thought:

A naive computer programmer who was given some marginal densities a[], b[] and some constraints like p[1][2] = 0 might implement sampling from a distribution with those properties by the straightforward algorithm:

Pick i from the distribution defined by a[], pick j from the distribution defined by b[]. If p[i][j] is not constrained to be zero, return the event (i,j). If p[i][j] is constrained to be zero, repeat the process.

The problem with that implementation is that the marginals of the realized distribution are not a[] and b[]. However, it makes me wonder if there are "doctored" marginal distributions a'[] and b'[] that the programmer could use in order to attain a[] and b[] with the above algorithm.

For example, if the single constraint is p[r][s] = 0, then I think the doctored distributions would have to satisfy

a[r] = a'[r](1 - b'[s]) / (1 - a'[r]b'[s])
a[i] = a'[i] / (1 - a'[r]b'[s]), for i not equal to r

and similar equations for b'[].
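The naive algorithm, and the drift in its realized marginals, can be simulated directly. A sketch (the sample size and seed are arbitrary choices of mine):

```python
import random

# Naive rejection sampler: draw (i, j) independently from a[] and b[],
# retry whenever the forbidden cell (0, 1) comes up.  The realized row
# marginals drift away from a[], as noted above.
random.seed(0)
a = [0.2, 0.3, 0.5]
b = [0.1, 0.6, 0.3]

def draw(w):
    """Sample an index from the discrete distribution w."""
    r, acc = random.random(), 0.0
    for k, wk in enumerate(w):
        acc += wk
        if r < acc:
            return k
    return len(w) - 1

counts = [0, 0, 0]
N = 100_000
for _ in range(N):
    while True:
        i, j = draw(a), draw(b)
        if (i, j) != (0, 1):
            break
    counts[i] += 1

# Realized row-0 mass approaches a[0](1 - b[1]) / (1 - a[0]b[1]),
# which is about 0.091 here rather than the desired 0.2.
```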
 
What do you mean by a'[r]? In my original posting (before the replies), I was looking for m=3 and n=3 (3x3 probability matrix), a=(0.2,0.3,0.5), b=(0.1,0.6,0.3), and

\begin{equation}
P=\left(
\begin{array}{rrr}
p_{11} & 0 & p_{13} \\
p_{21} & p_{22} & p_{23} \\
p_{31} & p_{32} & p_{33} \\
\end{array}
\right)
\end{equation}

What would your doctored distributions yield (and how)?
 
stlukits said:
What do you mean by a'[r]? In my original posting (before the replies), I was looking for m=3 and n=3 (3x3 probability matrix), a=(0.2,0.3,0.5), b=(0.1,0.6,0.3), and

\begin{equation}
P=\left(
\begin{array}{rrr}
p_{11} & 0 & p_{13} \\
p_{21} & p_{22} & p_{23} \\
p_{31} & p_{32} & p_{33} \\
\end{array}
\right)
\end{equation}

The constraint in the example is at p[1][2]. So r = 1 , s = 2.

Suppose we use the marginal distributions a'[] ,b'[] to randomly select i according to a'[] and randomly select j according to b'[] until an (i,j) is selected other than (1,2).

The probability of selecting something in the first row is the sum of the probabilities from selecting it on the 1st, 2nd, 3rd,... try.

The probability of selecting something from row 1 on the 1st try is a'[1]( 1 - b'[2])

The probability of selecting something from row 1 on the 2nd try = the probability of failing to select anything on the 1st try and selecting something from row 1 on the second try
= a'[1] b'[2] a'[1](1 - b'[2])

Continuing in this manner, the probability of selecting something from the first row is given by a geometric series in powers of a'[1]b'[2]. It sums to a'[1](1 - b'[2]) / (1 - a'[1]b'[2]).

If we want the selection process to effectively give a probability of a[1] for picking something from the first row, we must have a[1] = a'[1](1 - b'[2]) / (1 - a'[1]b'[2]).

I'm suggesting setting up similar equations and trying to solve for the doctored distributions a'[] and b'[] from the given distributions a[] and b[].

What would your doctored distributions yield (and how)?

I don't know if the doctored distributions would yield the joint maximum entropy distribution, but intuitively they yield a joint distribution with the desired marginal distributions that results from a process where (i,j) is selected by selecting i and j independently.
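For the 3x3 example, the two equations pin down a'[1] and b'[2], and a simple fixed-point iteration finds them. A sketch: I rearranged the geometric-series equations above into the update rules in the comments, so treat that rearrangement as an assumption to check:

```python
# Doctored marginals for the single constraint p[1][2] = 0
# (r = 1, s = 2, i.e. cell (0, 1) in 0-based indexing).
a = [0.2, 0.3, 0.5]
b = [0.1, 0.6, 0.3]

# Rearranging a[r] = a'[r](1 - b'[s]) / (1 - a'[r]b'[s]) gives
#   a'[r] = a[r] / (1 - (1 - a[r]) * b'[s]),
# and symmetrically for b'[s]; iterate the pair to a fixed point.
u, v = a[0], b[1]                # initial guesses for a'[1], b'[2]
for _ in range(200):
    u = a[0] / (1 - (1 - a[0]) * v)
    v = b[1] / (1 - (1 - b[1]) * u)

D = 1 - u * v
a_doc = [u] + [ai * D for ai in a[1:]]      # remaining entries scale by D
b_doc = [b[0] * D, v, b[2] * D]

# Joint distribution realized by the rejection sampler:
# p_ij = a_doc[i] * b_doc[j] / D on the unconstrained cells.
P = [[0.0 if (i, j) == (0, 1) else a_doc[i] * b_doc[j] / D
      for j in range(3)] for i in range(3)]
```

On this example the iteration converges to a'[1] = 0.5 and b'[2] = 0.75, and the implied joint matrix reproduces the prescribed marginals. Since that joint is a product a'_i b'_j (up to normalization) on the unconstrained cells, it has exactly the product form that the entropy maximization forces there.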
 
I see what you mean. You would end up with a joint probability matrix which contains 0s in the right places and p[i][j]'s that are suggestive of the desired distribution, but the marginal probabilities would be off. Unfortunately, I think you'd need (again) some kind of maximum entropy procedure to "doctor" things so that the marginals fall into place, but that introduces the above equations ... I will continue working on this. Thanks for your help.

PS.: There is very little literature on this. There is an obscure article by a Czech physicist (Majernik, 2000) about finding the marginal distributions when you are in possession of the conditional probabilities. Here we have the opposite problem: we are trying to find the conditional probabilities (which immediately follow from the joint probabilities and the marginals) given the marginal probabilities and some, but not all, conditional probabilities, using MAXENT. An American mathematician, Carl Wagner, wrote a piece on this in 1992 ("Generalized Probability Kinematics"), but I am almost certain that he made a mistake, and I am trying to fix it (unsuccessfully, so far).
 
stlukits said:
but that introduces the above equations

In the example, the doctored marginals are determined by equations with only two unknowns, a'[1] and b'[2]. Once these are determined, the other values of the doctored marginals follow. The values of the p[i][j] are functions of the doctored marginals.
 
thx -- will update when I have something ...
 
