# The Missing information function

Gold Member

## Homework Statement

[/B]
Consider the missing information function ##S(\left\{p_i\right\}) = -k \sum_{i=1}^r p_i \ln p_i## where ##\left\{p_i\right\}## represents the probabilities associated with the ##r## mutually exclusive outcomes of some procedure.

i) Sketch the form of S as a function of ##p_1## for ##r=2##
ii) Show that ##S## has its max value when ##p_i = 1/r \forall i##
iii) Show that the value of ##S## is increased when any two of the probablities are changed in such a way as to reduce their difference.

2. Homework Equations

3. The Attempt at a Solution

In i), we take ##S = S(p_1, p_2)## and regard ##p_2## as a constant. Then ##p_1 \in [0,1]## is the domain and the plot of S versus p_1 looks like a skewed hump with max at ##p_1 = 1/e## and ##S=0## when ##p_1 = 0,1##. Did I interpret the question here right? I just don't see why we restrict ##r## to 2.

ii) is fine I think. I get that ##p_j = 1/e## then ##\sum_{i=1}^r p_j = 1 \Rightarrow r (1/e) = 1 = r p_j \Rightarrow p_j = 1/r##. What is the significance of 1/e though? The result wouldn't change as long as it was a constant.

In iii), not really sure where to start. I think the condition is that if ##|p_i + \delta - (p_j + \gamma)| < |p_i - p_j|## then ##dS > 0##, where ##\delta, \gamma## are real numbers. But not sure how to implement this.

## Answers and Replies

mfb
Mentor
(i) You cannot regard p2 as constant. Exactly one of the r outcomes always has to happen, so the sum over all pi has to be 1.
(ii) I don't understand your argument, but the important part of the calculation is certainly missing.
(iii) I would use this to prove (ii). Due to the fixed sum, you cannot introduce two different variables for the changes. Yes, the idea is to show that such a change increases the "missing information" with the formula you have.

Gold Member
(i) You cannot regard p2 as constant. Exactly one of the r outcomes always has to happen, so the sum over all pi has to be 1.
Yes, I understand, but the question asks to plot ##S## as a function of ##p_1##. If ##r=2##, then ##S = S(p_1, p_2)## which I am treating as a function of two variables. If I am to plot this as a function of ##p_1## only then via the constraint ##p_1 + p_2 = 1##, I can obtain ##S(p_1)##.
(ii) I don't understand your argument, but the important part of the calculation is certainly missing.
I took the derivative as shown and found ##p_j = e^{-1}##. Then using the constraint ##\sum_{i=1}^r p_i = 1##. ##p_i## is a constant so the sum gives ##rp_i## and since this is equal to one I obtain ##p_i = 1/r##?
(iii) I would use this to prove (ii). Due to the fixed sum, you cannot introduce two different variables for the changes. Yes, the idea is to show that such a change increases the "missing information" with the formula you have.
Ok, so is the equation I wrote down correct just to check? I'm not sure how to implement this into the calculation.
Thanks.

Last edited:
TSny
Homework Helper
Gold Member
Yes, I understand, but the question asks to plot ##S## as a function of ##p_1##. If ##r=2##, then ##S = S(p_1, p_2)## which I am treating as a function of two variables. If I am to plot this as a function of ##p_1## only then via the constraint ##p_1 + p_2 = 1##, I can obtain ##S(p_1)##.

I'll try to help a little while mfb is away having fun. mfb was pointing out that you can't vary p1 without simultaneously varying p2 due to the constraint that the p's add to 1. The constraint reduces the number of independent p's from r to r-1. So, for the case of r = 2, you can think of S as a function of one variable, say p1. But then, p2 is now a function of p1.

I took the derivative as shown and found ##p_j = e^{-1}##. Then using the constraint ##\sum_{i=1}^r p_i = 1##. ##p_i## is a constant so the sum gives ##rp_i## and since this is equal to one I obtain ##p_i = 1/r##?

When finding the maximum of S, did you include the effect of the constraint? The lagrange multipler method might be helpful.

Ok, so is the equation I wrote down correct just to check? I'm not sure how to implement this into the calculation.
Thanks.

For (iii), without loss of generality you may take pi to be the larger of {pi, pj}. Let ε be the change in pj. Is ε >0 or is ε<0? Can you express the change in pi also in terms of the same ε?

Gold Member
Hi TSny,
I'll try to help a little while mfb is away having fun. mfb was pointing out that you can't vary p1 without simultaneously varying p2 due to the constraint that the p's add to 1. The constraint reduces the number of independent p's from r to r-1. So, for the case of r = 2, you can think of S as a function of one variable, say p1. But then, p2 is now a function of p1.
Ok so this produces a graph that is a symmetric hump with max at ##p_1 = 1/2## and hitting the ##p_1## axis at ##p_1 = 0,1##. I reexpress all instances of ##p_2## by ##1-p_1##.

When finding the maximum of S, did you include the effect of the constraint? The lagrange multipler method might be helpful.
Oh yes, this gives ##p_j = \exp(-\lambda/k - 1) = \text{const}##, where ##\lambda## is the multiplier and then I input into constraint ##\sum_{i=1}^r p_i = 1##. This is $$\sum_{i=1}^r \exp(-1-\lambda/k) = \text{const} \cdot r = 1 \Rightarrow \text{\const} = 1/r = p_i$$

For (iii), without loss of generality you may take pi to be the larger of {pi, pj}. Let ε be the change in pj. Is ε >0 or is ε<0? Can you express the change in pi also in terms of the same ε?
I think ##\epsilon## can be greater or smaller than zero. If ##\epsilon < 0## then ##p_j' = p_j - \epsilon##. Then ##p_i' \in (p_i - \epsilon, p_j - \epsilon)## so as to reduce the difference between the two. If ##\epsilon > 0## then ##p_i' > p_i + \epsilon < p_j - \epsilon##

TSny
Homework Helper
Gold Member
Hi TSny,

Ok so this produces a graph that is a symmetric hump with max at ##p_1 = 1/2## and hitting the ##p_1## axis at ##p_1 = 0,1##. I reexpress all instances of ##p_2## by ##1-p_1##.
OK
Oh yes, this gives ##p_j = \exp(-\lambda/k - 1) = \text{const}##, where ##\lambda## is the multiplier and then I input into constraint ##\sum_{i=1}^r p_i = 1##. This is $$\sum_{i=1}^r \exp(-1-\lambda/k) = \text{const} \cdot r = 1 \Rightarrow \text{\const} = 1/r = p_i$$
OK
I think ##\epsilon## can be greater or smaller than zero.
If pj is the smaller of {pi, pj}, and if ε is the amount that you change pj, then ε can have only one particular sign if you are going to reduce the separation between pj and pi.
##p_i' \in (p_i - \epsilon, p_j - \epsilon)##
If you reduce pj and pi by the same amount, will the constraint condition still hold?

Gold Member
If pj is the smaller of {pi, pj}, and if ε is the amount that you change pj, then ε can have only one particular sign if you are going to reduce the separation between pj and pi.
I was thinking along the lines of if ##p_j## decreases by ##\epsilon## then ##p_i## must decrease by an amount greater than epsilon so that their difference is reduced. If p_j increased by epsilon then p_i can increase or decrease, with an increase not bigger than epsilon.
If you reduce pj and pi by the same amount, will the constraint condition still hold?
No, that would mean their difference is not reduced. That is why I let the interval open, however I was taking p_j to be larger than p_i in that analysis (accidentally) contrary to the notation you gave.

TSny
Homework Helper
Gold Member
Since the sum of the p's must always add to 1, what must happen to pi if you increase pj?

Gold Member
Since the sum of the p's must always add to 1, what must happen to pi if you increase pj?
Oh I see what you were getting at, yes sorry one must increase while the other decreases so as to not violate conservation of probability. So, if ##p_i > p_j## and we are to reduce the difference then ##p_j## must increase so that ##p_j' = p_j + \epsilon, \epsilon > 0## while ##p_i' = p_i - \epsilon##. Then ##\sum_k p_k = 1## still.

So, ##S' - S =-k[ (p_j + \epsilon) \ln (p_j + \epsilon) + (p_i - \epsilon)\ln (p_i - \epsilon)]##. Since ##p_i > p_j, p_j + \epsilon > 0## and ##p_i - \epsilon \geq 0##. (i.e if we have initially, ##p_i = \epsilon, p_j = 0##, assuming we can attribute a zero probability to one of the ##p_k##'s) So then ##S'-S>0##. Is it fine?

Maybe one thing I am not getting is why would the difference between the two probabilities change? If ##p_j' = p_j + \epsilon, p_i' = p_i - \epsilon## then one increases by some amount while the other decreases so the difference is the same.

Thanks!

mfb
Mentor
Maybe one thing I am not getting is why would the difference between the two probabilities change? If ##p_j' = p_j + \epsilon, p_i' = p_i - \epsilon## then one increases by some amount while the other decreases so the difference is the same.
Subtract them, and see how the difference differs from the previous difference?
Alternatively, consider this numerical example: Increase 2 by 1 and decrease 7 by 1 and see if the difference changes.

assuming we can attribute a zero probability to one of the ##p_k##'s)
No you cannot.

Gold Member
Subtract them, and see how the difference differs from the previous difference?
Alternatively, consider this numerical example: Increase 2 by 1 and decrease 7 by 1 and see if the difference changes.
Yes, sorry I was being silly, at things for too long last night, I was too tired.

No you cannot.
Yes I thought so, otherwise there would not be ##r## possible outcomes. But I believe my argument before still works except now with ##p_i - \epsilon > 0##. Yes? Thanks!

mfb
Mentor
Where is your proof that S'-S>0 apart from the easy case where some parts are zero?

Gold Member
Where is your proof that S'-S>0 apart from the easy case where some parts are zero?

##S' = -k [p_1 \ln p_1 + \dots + (p_j + \epsilon)\ln (p_j + \epsilon) + \dots (p_i - \epsilon) \ln (p_i - \epsilon) + \dots ]## and ##S = -k[p_1 \ln p_1 + \dots ]##, so $$S' - S = k[p_j \ln ( \frac{p_j}{p_j + \epsilon}) + p_i \ln (\frac{p_i}{p_i - \epsilon}) - \epsilon \ln ( \frac{p_j + \epsilon}{p_i - \epsilon})]$$ The argument of the first log is < 1 so the whole term is negative. The argument of the second log is > 1 so the whole term is positive. The argument of the last log can be > 1 or < 1. So i am not sure how to make any conclusions?

Last edited:
mfb
Mentor
You still have pj < pi

Gold Member
You still have pj < pi
Ok, so the argument of the last log can be rewritten like $$\frac{p_j}{p_i - \epsilon} + \frac{\epsilon}{p_i - \epsilon},$$ so if we are changing the ##p_i## and ##p_j## so as to only reduce their difference, then the first term is ##< 1## and the last term is <1. However I still have a term (the first one) that has argument <1 so it is difficult to make conclusions without knowing their relative difference. Thanks.

mfb
Mentor
Then you'll have to find some other way to rewrite the expressions.
It might help (did not test it) to define ##p_j = p_i + x##.

A completely different approach: Consider ##S(p_i)## as function of one independent variable (different from what you did for the other parts), show that its second derivative exists and is negative everywhere.

Gold Member
Then you'll have to find some other way to rewrite the expressions.
It might help (did not test it) to define ##p_j = p_i + x##.

A completely different approach: Consider ##S(p_i)## as function of one independent variable (different from what you did for the other parts), show that its second derivative exists and is negative everywhere.

Ok thanks I'll think more about your suggestion. In the meantime, I wonder if TSny could perhaps tell me what he had in mind when he was directing me in the previous method? :)

TSny
Homework Helper
Gold Member
In the meantime, I wonder if TSny could perhaps tell me what he had in mind when he was directing me in the previous method? :)

I was thinking of letting ##\epsilon## be a small, first-order quantity and considering the change ##\delta S## when ##\delta p_j = \epsilon##. Note ##\delta p_i## is then ##-\epsilon##. (As before, I'm thinking of ##p_j## as the smaller of the two p's.)

Once you get the result for an infinitesimal ##\epsilon##, you can easily generalize to a finite change.

This approach is similar to mfb's suggestion of evaluating ##\frac{\partial S}{\partial p_j}## while treating ##p_i## as a function of ##p_j##.

Gold Member
I was thinking of letting ##\epsilon## be a small, first-order quantity and considering the change ##\delta S## when ##\delta p_j = \epsilon##. Note ##\delta p_i## is then ##-\epsilon##. (As before, I'm thinking of ##p_j## as the smaller of the two p's.)

Once you get the result for an infinitesimal ##\epsilon##, you can easily generalize to a finite change.

This approach is similar to mfb's suggestion of evaluating ##\frac{\partial S}{\partial p_j}## while treating ##p_i## as a function of ##p_j##.
Ok so if I understand you correctly, I am using my expression for S'-S in #13 and rewriting all instances of ##p_i## with ##1-p_j - \sum_{k=1, k \neq i,j} p_k## (to reexpress ##p_i## in terms of ##p_j).## Then take the derivative wrt ##p_j##?

TSny
Homework Helper
Gold Member
You want to show that ##\delta S > 0## when ##p_j## and ##p_i## are moved closer to each other. The ##\delta## operation is essentially the same as taking a differential since we are considering ##\epsilon## as small.

So, consider ##\delta \left ( -k \sum_{n=1}^r p_n \ln p_n \right )##. Only two of the p's vary: pi and pj. What do you get for ##\delta \left ( p_i \ln p_i \right )## and ##\delta \left ( p_j \ln p_j \right ) ## when ##\delta p_j = \epsilon## and ##\delta p_i = - \epsilon##?