The Missing information function

1. Jan 18, 2015

CAF123

1. The problem statement, all variables and given/known data

Consider the missing information function $S(\left\{p_i\right\}) = -k \sum_{i=1}^r p_i \ln p_i$ where $\left\{p_i\right\}$ represents the probabilities associated with the $r$ mutually exclusive outcomes of some procedure.

i) Sketch the form of S as a function of $p_1$ for $r=2$
ii) Show that $S$ has its max value when $p_i = 1/r \forall i$
iii) Show that the value of $S$ is increased when any two of the probabilities are changed in such a way as to reduce their difference.

2. Relevant equations

3. The attempt at a solution

In i), we take $S = S(p_1, p_2)$ and regard $p_2$ as a constant. Then $p_1 \in [0,1]$ is the domain, and the plot of $S$ versus $p_1$ looks like a skewed hump with a maximum at $p_1 = 1/e$ and $S=0$ when $p_1 = 0,1$. Did I interpret the question right here? I just don't see why we restrict $r$ to 2.

ii) is fine I think. I get that $p_j = 1/e$ then $\sum_{i=1}^r p_j = 1 \Rightarrow r (1/e) = 1 = r p_j \Rightarrow p_j = 1/r$. What is the significance of 1/e though? The result wouldn't change as long as it was a constant.

In iii), not really sure where to start. I think the condition is that if $|p_i + \delta - (p_j + \gamma)| < |p_i - p_j|$ then $dS > 0$, where $\delta, \gamma$ are real numbers. But not sure how to implement this.

2. Jan 18, 2015

Staff: Mentor

(i) You cannot regard $p_2$ as constant. Exactly one of the $r$ outcomes always has to happen, so the sum over all $p_i$ has to be 1.
(ii) I don't understand your argument, but the important part of the calculation is certainly missing.
(iii) I would use this to prove (ii). Due to the fixed sum, you cannot introduce two different variables for the changes. Yes, the idea is to show that such a change increases the "missing information" with the formula you have.

3. Jan 18, 2015

CAF123

Yes, I understand, but the question asks to plot $S$ as a function of $p_1$. If $r=2$, then $S = S(p_1, p_2)$ which I am treating as a function of two variables. If I am to plot this as a function of $p_1$ only then via the constraint $p_1 + p_2 = 1$, I can obtain $S(p_1)$.
I took the derivative as shown and found $p_j = e^{-1}$. Then, using the constraint $\sum_{i=1}^r p_i = 1$: $p_i$ is a constant, so the sum gives $rp_i$, and since this is equal to one I obtain $p_i = 1/r$?
Ok, so is the equation I wrote down correct just to check? I'm not sure how to implement this into the calculation.
Thanks.

Last edited: Jan 18, 2015
4. Jan 18, 2015

TSny

I'll try to help a little while mfb is away having fun. mfb was pointing out that you can't vary $p_1$ without simultaneously varying $p_2$, due to the constraint that the $p$'s add to 1. The constraint reduces the number of independent $p$'s from $r$ to $r-1$. So, for the case of $r = 2$, you can think of $S$ as a function of one variable, say $p_1$. But then $p_2$ is now a function of $p_1$.

When finding the maximum of S, did you include the effect of the constraint? The Lagrange multiplier method might be helpful.

For (iii), without loss of generality you may take $p_i$ to be the larger of $\{p_i, p_j\}$. Let $\epsilon$ be the change in $p_j$. Is $\epsilon > 0$ or is $\epsilon < 0$? Can you express the change in $p_i$ also in terms of the same $\epsilon$?

5. Jan 18, 2015

CAF123

Hi TSny,
Ok, so this produces a graph that is a symmetric hump with a maximum at $p_1 = 1/2$, hitting the $p_1$ axis at $p_1 = 0,1$. I re-express all instances of $p_2$ by $1-p_1$.
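For reference, this shape is easy to verify numerically. A minimal sketch (assuming $k=1$ and using the $r=2$ constraint $p_2 = 1 - p_1$):

```python
import math

def S(p1, k=1.0):
    """Missing information for r = 2 with the constraint p2 = 1 - p1.
    A term with p = 0 contributes 0 (the limit p ln p -> 0)."""
    total = 0.0
    for p in (p1, 1.0 - p1):
        if p > 0.0:
            total -= k * p * math.log(p)
    return total

# Symmetric hump: S(p1) = S(1 - p1), zero at the endpoints,
# maximum at p1 = 1/2 with value k ln 2 (not at 1/e).
print(S(0.0), S(0.5), S(1.0))        # 0.0, ln 2 ≈ 0.6931, 0.0
print(abs(S(0.3) - S(0.7)) < 1e-12)  # True, by symmetry
```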

Oh yes, this gives $p_j = \exp(-\lambda/k - 1) = \text{const}$, where $\lambda$ is the multiplier, and then I input into the constraint $\sum_{i=1}^r p_i = 1$. This is $$\sum_{i=1}^r \exp(-1-\lambda/k) = \text{const} \cdot r = 1 \Rightarrow \text{const} = 1/r = p_i$$
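As a sanity check on the Lagrange-multiplier result, a quick numerical sketch (assuming $k=1$ and $r=5$, both hypothetical choices): no randomly drawn normalized distribution should beat the uniform one.

```python
import math
import random

def S(ps, k=1.0):
    # Missing information S = -k * sum p ln p (p > 0 terms only)
    return -k * sum(p * math.log(p) for p in ps if p > 0)

r = 5
uniform = [1.0 / r] * r
s_max = S(uniform)   # equals k ln r for the uniform distribution

random.seed(0)
for _ in range(1000):
    # random normalized distribution over r outcomes
    w = [random.random() for _ in range(r)]
    ps = [x / sum(w) for x in w]
    assert S(ps) <= s_max + 1e-12  # uniform is never beaten
print(abs(s_max - math.log(r)) < 1e-12)  # True
```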

I think $\epsilon$ can be greater than or less than zero. If $\epsilon < 0$ then $p_j' = p_j - \epsilon$. Then $p_i' \in (p_i - \epsilon, p_j - \epsilon)$ so as to reduce the difference between the two. If $\epsilon > 0$ then $p_i' > p_i + \epsilon < p_j - \epsilon$

6. Jan 18, 2015

TSny

OK
OK
If $p_j$ is the smaller of $\{p_i, p_j\}$, and if $\epsilon$ is the amount that you change $p_j$, then $\epsilon$ can have only one particular sign if you are going to reduce the separation between $p_j$ and $p_i$.
If you reduce $p_j$ and $p_i$ by the same amount, will the constraint condition still hold?

7. Jan 18, 2015

CAF123

I was thinking along the lines of: if $p_j$ decreases by $\epsilon$ then $p_i$ must decrease by an amount greater than $\epsilon$ so that their difference is reduced. If $p_j$ increased by $\epsilon$ then $p_i$ can increase or decrease, with an increase not bigger than $\epsilon$.
No, that would mean their difference is not reduced. That is why I left the interval open; however, I was taking $p_j$ to be larger than $p_i$ in that analysis (accidentally), contrary to the notation you gave.

8. Jan 18, 2015

TSny

Since the sum of the $p$'s must always add to 1, what must happen to $p_i$ if you increase $p_j$?

9. Jan 19, 2015

CAF123

Oh I see what you were getting at, yes sorry one must increase while the other decreases so as to not violate conservation of probability. So, if $p_i > p_j$ and we are to reduce the difference then $p_j$ must increase so that $p_j' = p_j + \epsilon, \epsilon > 0$ while $p_i' = p_i - \epsilon$. Then $\sum_k p_k = 1$ still.

So, $S' - S = -k[ (p_j + \epsilon) \ln (p_j + \epsilon) + (p_i - \epsilon)\ln (p_i - \epsilon)]$. Since $p_i > p_j$, $p_j + \epsilon > 0$ and $p_i - \epsilon \geq 0$ (i.e. if we have initially $p_i = \epsilon, p_j = 0$, assuming we can attribute a zero probability to one of the $p_k$'s). So then $S'-S>0$. Is it fine?

Maybe one thing I am not getting is why would the difference between the two probabilities change? If $p_j' = p_j + \epsilon, p_i' = p_i - \epsilon$ then one increases by some amount while the other decreases so the difference is the same.

Thanks!

10. Jan 19, 2015

Staff: Mentor

Subtract them, and see how the new difference compares with the previous difference.
Alternatively, consider this numerical example: Increase 2 by 1 and decrease 7 by 1 and see if the difference changes.

No you cannot.

11. Jan 19, 2015

CAF123

Yes, sorry, I was being silly; I was at it for too long last night and was too tired.

Yes I thought so, otherwise there would not be $r$ possible outcomes. But I believe my argument before still works except now with $p_i - \epsilon > 0$. Yes? Thanks!

12. Jan 19, 2015

Staff: Mentor

Where is your proof that S'-S>0 apart from the easy case where some parts are zero?

13. Jan 20, 2015

CAF123

$S' = -k [p_1 \ln p_1 + \dots + (p_j + \epsilon)\ln (p_j + \epsilon) + \dots + (p_i - \epsilon) \ln (p_i - \epsilon) + \dots ]$ and $S = -k[p_1 \ln p_1 + \dots ]$, so $$S' - S = k\left[p_j \ln \left( \frac{p_j}{p_j + \epsilon}\right) + p_i \ln \left(\frac{p_i}{p_i - \epsilon}\right) - \epsilon \ln \left( \frac{p_j + \epsilon}{p_i - \epsilon}\right)\right]$$ The argument of the first log is $< 1$, so the whole term is negative. The argument of the second log is $> 1$, so the whole term is positive. The argument of the last log can be $> 1$ or $< 1$. So I am not sure how to make any conclusions?
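As a numerical check that this rearrangement of $S'-S$ is right, and that the change is indeed positive, here is a minimal sketch (assuming $k=1$ and a hypothetical example distribution):

```python
import math

k = 1.0
ps = [0.1, 0.15, 0.25, 0.5]   # hypothetical distribution, sums to 1
i, j = 3, 0                    # p_i = 0.5 is the larger, p_j = 0.1 the smaller
eps = 0.05                     # move them closer by eps

def S(ps):
    return -k * sum(p * math.log(p) for p in ps)

qs = list(ps)
qs[j] += eps   # p_j' = p_j + eps
qs[i] -= eps   # p_i' = p_i - eps (constraint preserved)

direct = S(qs) - S(ps)

pi, pj = ps[i], ps[j]
rearranged = k * (pj * math.log(pj / (pj + eps))
                  + pi * math.log(pi / (pi - eps))
                  - eps * math.log((pj + eps) / (pi - eps)))

# The two forms agree, and S increases:
print(abs(direct - rearranged) < 1e-12, direct > 0)  # True True
```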

Last edited: Jan 20, 2015
14. Jan 20, 2015

Staff: Mentor

You still have $p_j < p_i$.

15. Jan 21, 2015

CAF123

Ok, so the argument of the last log can be rewritten as $$\frac{p_j}{p_i - \epsilon} + \frac{\epsilon}{p_i - \epsilon},$$ so if we are changing $p_i$ and $p_j$ so as to only reduce their difference, then the first term is $< 1$ and the second term is $< 1$. However, I still have a term (the first one) with argument $< 1$, so it is difficult to make conclusions without knowing their relative difference. Thanks.

16. Jan 21, 2015

Staff: Mentor

Then you'll have to find some other way to rewrite the expressions.
It might help (did not test it) to define $p_j = p_i + x$.

A completely different approach: Consider $S(p_i)$ as function of one independent variable (different from what you did for the other parts), show that its second derivative exists and is negative everywhere.
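The second-derivative route can be checked directly: with $p_i + p_j$ fixed by the constraint, only two terms of $S$ vary, and the second derivative of those terms is negative everywhere. A sketch (assuming $k=1$ and a hypothetical fixed sum $c$):

```python
import math

k = 1.0
c = 0.6   # p_i + p_j, held fixed by the constraint (hypothetical value)

def f(p):
    # The only terms of S that vary when p_j = p and p_i = c - p
    return -k * (p * math.log(p) + (c - p) * math.log(c - p))

def f2_analytic(p):
    # f'(p) = -k ln(p / (c - p)), so f''(p) = -k (1/p + 1/(c - p)) < 0
    return -k * (1.0 / p + 1.0 / (c - p))

h = 1e-5
for p in (0.1, 0.2, 0.3, 0.45):
    # Central finite difference agrees with the analytic second derivative
    f2_num = (f(p + h) - 2 * f(p) + f(p - h)) / h**2
    assert f2_num < 0 and abs(f2_num - f2_analytic(p)) < 1e-3
print("f'' < 0 on (0, c): S is strictly concave along the constraint")
```

Strict concavity means the maximum along the constraint sits at the midpoint $p = c/2$, i.e. equal probabilities, which is exactly the direction of part (ii).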

17. Jan 21, 2015

CAF123

Ok thanks I'll think more about your suggestion. In the meantime, I wonder if TSny could perhaps tell me what he had in mind when he was directing me in the previous method? :)

18. Jan 21, 2015

TSny

I was thinking of letting $\epsilon$ be a small, first-order quantity and considering the change $\delta S$ when $\delta p_j = \epsilon$. Note $\delta p_i$ is then $-\epsilon$. (As before, I'm thinking of $p_j$ as the smaller of the two p's.)

Once you get the result for an infinitesimal $\epsilon$, you can easily generalize to a finite change.

This approach is similar to mfb's suggestion of evaluating $\frac{\partial S}{\partial p_j}$ while treating $p_i$ as a function of $p_j$.

19. Jan 26, 2015

CAF123

Ok, so if I understand you correctly, I am using my expression for $S'-S$ in #13 and rewriting all instances of $p_i$ with $1-p_j - \sum_{k=1, k \neq i,j} p_k$ (to re-express $p_i$ in terms of $p_j$). Then take the derivative wrt $p_j$?

20. Jan 26, 2015

TSny

You want to show that $\delta S > 0$ when $p_j$ and $p_i$ are moved closer to each other. The $\delta$ operation is essentially the same as taking a differential since we are considering $\epsilon$ as small.

So, consider $\delta \left ( -k \sum_{n=1}^r p_n \ln p_n \right )$. Only two of the p's vary: pi and pj. What do you get for $\delta \left ( p_i \ln p_i \right )$ and $\delta \left ( p_j \ln p_j \right )$ when $\delta p_j = \epsilon$ and $\delta p_i = - \epsilon$?