# The Missing Information Function

CAF123
Gold Member

## Homework Statement

Consider the missing information function ##S(\left\{p_i\right\}) = -k \sum_{i=1}^r p_i \ln p_i## where ##\left\{p_i\right\}## represents the probabilities associated with the ##r## mutually exclusive outcomes of some procedure.

i) Sketch the form of ##S## as a function of ##p_1## for ##r=2##.
ii) Show that ##S## has its maximum value when ##p_i = 1/r \ \forall i##.
iii) Show that the value of ##S## is increased when any two of the probabilities are changed in such a way as to reduce their difference.

## Homework Equations

## The Attempt at a Solution

In i), we take ##S = S(p_1, p_2)## and regard ##p_2## as a constant. Then ##p_1 \in [0,1]## is the domain, and the plot of ##S## versus ##p_1## looks like a skewed hump with max at ##p_1 = 1/e## and ##S=0## when ##p_1 = 0,1##. Did I interpret the question right here? I just don't see why we restrict ##r## to 2.

ii) is fine, I think. I get that ##p_j = 1/e##; then ##\sum_{j=1}^r p_j = 1 \Rightarrow r (1/e) = 1 = r p_j \Rightarrow p_j = 1/r##. What is the significance of ##1/e## though? The result wouldn't change as long as it was a constant.

In iii), not really sure where to start. I think the condition is that if ##|p_i + \delta - (p_j + \gamma)| < |p_i - p_j|## then ##dS > 0##, where ##\delta, \gamma## are real numbers. But not sure how to implement this.

(i) You cannot regard ##p_2## as constant. Exactly one of the ##r## outcomes always has to happen, so the sum over all ##p_i## has to be 1.
(ii) I don't understand your argument, but the important part of the calculation is certainly missing.
(iii) I would use this to prove (ii). Due to the fixed sum, you cannot introduce two different variables for the changes. Yes, the idea is to show that such a change increases the "missing information" with the formula you have.

mfb said:
(i) You cannot regard ##p_2## as constant. Exactly one of the ##r## outcomes always has to happen, so the sum over all ##p_i## has to be 1.
Yes, I understand, but the question asks to plot ##S## as a function of ##p_1##. If ##r=2##, then ##S = S(p_1, p_2)## which I am treating as a function of two variables. If I am to plot this as a function of ##p_1## only then via the constraint ##p_1 + p_2 = 1##, I can obtain ##S(p_1)##.
(ii) I don't understand your argument, but the important part of the calculation is certainly missing.
I took the derivative as shown and found ##p_j = e^{-1}##. Then, using the constraint ##\sum_{i=1}^r p_i = 1##: since ##p_i## is a constant, the sum gives ##rp_i##, and since this is equal to one I obtain ##p_i = 1/r##?
(iii) I would use this to prove (ii). Due to the fixed sum, you cannot introduce two different variables for the changes. Yes, the idea is to show that such a change increases the "missing information" with the formula you have.
Ok, so is the equation I wrote down correct just to check? I'm not sure how to implement this into the calculation.
Thanks.

CAF123 said:
Yes, I understand, but the question asks to plot ##S## as a function of ##p_1##. If ##r=2##, then ##S = S(p_1, p_2)## which I am treating as a function of two variables. If I am to plot this as a function of ##p_1## only then via the constraint ##p_1 + p_2 = 1##, I can obtain ##S(p_1)##.

I'll try to help a little while mfb is away having fun. mfb was pointing out that you can't vary ##p_1## without simultaneously varying ##p_2## due to the constraint that the ##p##'s add to 1. The constraint reduces the number of independent ##p##'s from ##r## to ##r-1##. So, for the case of ##r = 2##, you can think of ##S## as a function of one variable, say ##p_1##. But then ##p_2## is now a function of ##p_1##.

I took the derivative as shown and found ##p_j = e^{-1}##. Then, using the constraint ##\sum_{i=1}^r p_i = 1##: since ##p_i## is a constant, the sum gives ##rp_i##, and since this is equal to one I obtain ##p_i = 1/r##?

When finding the maximum of ##S##, did you include the effect of the constraint? The Lagrange multiplier method might be helpful.

Ok, so is the equation I wrote down correct just to check? I'm not sure how to implement this into the calculation.
Thanks.

For (iii), without loss of generality you may take ##p_i## to be the larger of ##\{p_i, p_j\}##. Let ##\epsilon## be the change in ##p_j##. Is ##\epsilon > 0## or is ##\epsilon < 0##? Can you express the change in ##p_i## also in terms of the same ##\epsilon##?

Hi TSny,
TSny said:
I'll try to help a little while mfb is away having fun. mfb was pointing out that you can't vary ##p_1## without simultaneously varying ##p_2## due to the constraint that the ##p##'s add to 1. The constraint reduces the number of independent ##p##'s from ##r## to ##r-1##. So, for the case of ##r = 2##, you can think of ##S## as a function of one variable, say ##p_1##. But then ##p_2## is now a function of ##p_1##.
Ok, so this produces a graph that is a symmetric hump with max at ##p_1 = 1/2##, hitting the ##p_1## axis at ##p_1 = 0,1##. I re-express all instances of ##p_2## as ##1-p_1##.
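As a quick numerical sketch of this picture (my own check, not from the thread, taking ##k = 1##), one can tabulate ##S(p_1) = -[p_1 \ln p_1 + (1-p_1)\ln(1-p_1)]## and confirm the symmetric hump peaking at ##p_1 = 1/2##:

```python
import math

def S(p1, k=1.0):
    """Missing information for r = 2, using the constraint p2 = 1 - p1."""
    total = 0.0
    for p in (p1, 1.0 - p1):
        if p > 0:                   # p ln p -> 0 as p -> 0, so skip p = 0
            total -= k * p * math.log(p)
    return total

# Symmetric about p1 = 1/2, zero at the endpoints, maximal in the middle
assert abs(S(0.3) - S(0.7)) < 1e-12
assert S(0.0) == 0.0 and S(1.0) == 0.0
assert S(0.5) > S(0.3) > S(0.1)
print(S(0.5))  # the maximum value, equal to ln 2 for k = 1
```

The maximum sits at ##p_1 = 1/2##, not ##1/e##, precisely because the ##(1-p_1)\ln(1-p_1)## term is carried along rather than treated as constant.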

When finding the maximum of ##S##, did you include the effect of the constraint? The Lagrange multiplier method might be helpful.
Oh yes, this gives ##p_j = \exp(-\lambda/k - 1) = \text{const}##, where ##\lambda## is the multiplier, and then I put this into the constraint ##\sum_{i=1}^r p_i = 1##. This is $$\sum_{i=1}^r \exp(-1-\lambda/k) = \text{const} \cdot r = 1 \Rightarrow \text{const} = 1/r = p_i$$
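As a sanity check of this result (my own sketch, with ##k = 1##): for any ##r##, the uniform distribution ##p_i = 1/r## should beat any other distribution on ##r## outcomes, and the maximum should be ##k \ln r##:

```python
import math

def S(probs, k=1.0):
    """S({p_i}) = -k * sum(p_i ln p_i); zero-probability terms contribute 0."""
    return -k * sum(p * math.log(p) for p in probs if p > 0)

r = 4
uniform = [1.0 / r] * r
skewed = [0.7, 0.1, 0.1, 0.1]   # also sums to 1

assert abs(S(uniform) - math.log(r)) < 1e-12   # maximum is k ln r
assert S(uniform) > S(skewed)                  # uniform wins
```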

For (iii), without loss of generality you may take ##p_i## to be the larger of ##\{p_i, p_j\}##. Let ##\epsilon## be the change in ##p_j##. Is ##\epsilon > 0## or is ##\epsilon < 0##? Can you express the change in ##p_i## also in terms of the same ##\epsilon##?
I think ##\epsilon## can be greater or smaller than zero. If ##\epsilon < 0## then ##p_j' = p_j - \epsilon##. Then ##p_i' \in (p_i - \epsilon, p_j - \epsilon)## so as to reduce the difference between the two. If ##\epsilon > 0## then ##p_i' > p_i + \epsilon < p_j - \epsilon##

CAF123 said:
Hi TSny,

Ok, so this produces a graph that is a symmetric hump with max at ##p_1 = 1/2##, hitting the ##p_1## axis at ##p_1 = 0,1##. I re-express all instances of ##p_2## as ##1-p_1##.
OK
Oh yes, this gives ##p_j = \exp(-\lambda/k - 1) = \text{const}##, where ##\lambda## is the multiplier, and then I put this into the constraint ##\sum_{i=1}^r p_i = 1##. This is $$\sum_{i=1}^r \exp(-1-\lambda/k) = \text{const} \cdot r = 1 \Rightarrow \text{const} = 1/r = p_i$$
OK
I think ##\epsilon## can be greater or smaller than zero.
If ##p_j## is the smaller of ##\{p_i, p_j\}##, and if ##\epsilon## is the amount that you change ##p_j##, then ##\epsilon## can have only one particular sign if you are going to reduce the separation between ##p_j## and ##p_i##.
##p_i' \in (p_i - \epsilon, p_j - \epsilon)##
If you reduce ##p_j## and ##p_i## by the same amount, will the constraint condition still hold?

If ##p_j## is the smaller of ##\{p_i, p_j\}##, and if ##\epsilon## is the amount that you change ##p_j##, then ##\epsilon## can have only one particular sign if you are going to reduce the separation between ##p_j## and ##p_i##.
I was thinking along the lines of: if ##p_j## decreases by ##\epsilon##, then ##p_i## must decrease by an amount greater than ##\epsilon## so that their difference is reduced. If ##p_j## increased by ##\epsilon##, then ##p_i## can increase or decrease, with an increase not bigger than ##\epsilon##.
If you reduce ##p_j## and ##p_i## by the same amount, will the constraint condition still hold?
No, that would mean their difference is not reduced. That is why I left the interval open; however, I was (accidentally) taking ##p_j## to be larger than ##p_i## in that analysis, contrary to the notation you gave.

Since the sum of the ##p##'s must always add to 1, what must happen to ##p_i## if you increase ##p_j##?

TSny said:
Since the sum of the ##p##'s must always add to 1, what must happen to ##p_i## if you increase ##p_j##?
Oh I see what you were getting at, yes sorry one must increase while the other decreases so as to not violate conservation of probability. So, if ##p_i > p_j## and we are to reduce the difference then ##p_j## must increase so that ##p_j' = p_j + \epsilon, \epsilon > 0## while ##p_i' = p_i - \epsilon##. Then ##\sum_k p_k = 1## still.

So, ##S' - S =-k[ (p_j + \epsilon) \ln (p_j + \epsilon) + (p_i - \epsilon)\ln (p_i - \epsilon)]##. Since ##p_i > p_j##, ##p_j + \epsilon > 0## and ##p_i - \epsilon \geq 0##. (I.e., if we have initially ##p_i = \epsilon, p_j = 0##, assuming we can attribute a zero probability to one of the ##p_k##'s.) So then ##S'-S>0##. Is it fine?

Maybe one thing I am not getting is why would the difference between the two probabilities change? If ##p_j' = p_j + \epsilon, p_i' = p_i - \epsilon## then one increases by some amount while the other decreases so the difference is the same.

Thanks!

CAF123 said:
Maybe one thing I am not getting is why would the difference between the two probabilities change? If ##p_j' = p_j + \epsilon, p_i' = p_i - \epsilon## then one increases by some amount while the other decreases so the difference is the same.
Subtract them, and see how the difference differs from the previous difference?
Alternatively, consider this numerical example: Increase 2 by 1 and decrease 7 by 1 and see if the difference changes.

CAF123 said:
assuming we can attribute a zero probability to one of the ##p_k##'s)
No you cannot.

mfb said:
Subtract them, and see how the difference differs from the previous difference?
Alternatively, consider this numerical example: Increase 2 by 1 and decrease 7 by 1 and see if the difference changes.
Yes, sorry, I was being silly; I had been at it for too long last night and was too tired.

No you cannot.
Yes, I thought so; otherwise there would not be ##r## possible outcomes. But I believe my argument from before still works, except now with ##p_i - \epsilon > 0##. Yes? Thanks!

Where is your proof that ##S'-S>0##, apart from the easy case where some parts are zero?

mfb said:
Where is your proof that ##S'-S>0##, apart from the easy case where some parts are zero?

##S' = -k [p_1 \ln p_1 + \dots + (p_j + \epsilon)\ln (p_j + \epsilon) + \dots + (p_i - \epsilon) \ln (p_i - \epsilon) + \dots ]## and ##S = -k[p_1 \ln p_1 + \dots ]##, so $$S' - S = k\left[p_j \ln \left( \frac{p_j}{p_j + \epsilon}\right) + p_i \ln \left(\frac{p_i}{p_i - \epsilon}\right) - \epsilon \ln \left( \frac{p_j + \epsilon}{p_i - \epsilon}\right)\right]$$ The argument of the first log is ##< 1##, so that term is negative. The argument of the second log is ##> 1##, so that term is positive. The argument of the last log can be ##> 1## or ##< 1##. So I am not sure how to make any conclusions?
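The rearrangement above can be checked numerically (my own sketch, with ##k = 1## and sample values satisfying ##p_j < p_i##); the closed form agrees with a direct evaluation of ##S' - S##, and the difference comes out positive:

```python
import math

def S(probs, k=1.0):
    return -k * sum(p * math.log(p) for p in probs)

# Sample values with p_j < p_i and a difference-reducing change epsilon
pj, pi, eps, k = 0.2, 0.5, 0.1, 1.0
rest = [0.3]                     # the remaining, unchanged probabilities

before = S([pj, pi] + rest, k)
after = S([pj + eps, pi - eps] + rest, k)

# The closed form for S' - S quoted above
closed = k * (pj * math.log(pj / (pj + eps))
              + pi * math.log(pi / (pi - eps))
              - eps * math.log((pj + eps) / (pi - eps)))

assert abs((after - before) - closed) < 1e-12  # rearrangement is correct
assert after - before > 0                      # the change increases S
```

A single numerical instance proves nothing in general, of course; it only confirms that the algebra was carried out correctly.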

You still have ##p_j < p_i##.

mfb said:
You still have ##p_j < p_i##.
Ok, so the argument of the last log can be rewritten as $$\frac{p_j}{p_i - \epsilon} + \frac{\epsilon}{p_i - \epsilon},$$ so if we are changing ##p_i## and ##p_j## so as to only reduce their difference, then the first term is ##< 1## and the last term is ##< 1##. However, I still have a term (the first one) whose argument is ##< 1##, so it is difficult to make any conclusion without knowing their relative difference. Thanks.

Then you'll have to find some other way to rewrite the expressions.
It might help (did not test it) to define ##p_j = p_i + x##.

A completely different approach: Consider ##S(p_i)## as function of one independent variable (different from what you did for the other parts), show that its second derivative exists and is negative everywhere.
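This second approach can be sketched as follows (my own filling-in of the steps, not from the thread): holding all other probabilities fixed, the constraint forces ##p_j = c - p_i## for some constant ##c##, so the part of ##S## that varies is a function of ##p_i## alone:

```latex
f(p_i)   = -k\left[\, p_i \ln p_i + (c - p_i)\ln(c - p_i) \,\right],
\qquad
f'(p_i)  = -k\left[\ln p_i - \ln(c - p_i)\right],
\qquad
f''(p_i) = -k\left[\frac{1}{p_i} + \frac{1}{c - p_i}\right] < 0 .
```

Since ##f'' < 0## everywhere on ##(0, c)##, ##f## is strictly concave with its maximum at ##p_i = c/2##; moving the two probabilities toward each other moves ##p_i## toward ##c/2## and therefore increases ##S##.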

mfb said:
Then you'll have to find some other way to rewrite the expressions.
It might help (did not test it) to define ##p_j = p_i + x##.

A completely different approach: Consider ##S(p_i)## as function of one independent variable (different from what you did for the other parts), show that its second derivative exists and is negative everywhere.

Ok thanks I'll think more about your suggestion. In the meantime, I wonder if TSny could perhaps tell me what he had in mind when he was directing me in the previous method? :)

CAF123 said:
In the meantime, I wonder if TSny could perhaps tell me what he had in mind when he was directing me in the previous method? :)

I was thinking of letting ##\epsilon## be a small, first-order quantity and considering the change ##\delta S## when ##\delta p_j = \epsilon##. Note ##\delta p_i## is then ##-\epsilon##. (As before, I'm thinking of ##p_j## as the smaller of the two p's.)

Once you get the result for an infinitesimal ##\epsilon##, you can easily generalize to a finite change.

This approach is similar to mfb's suggestion of evaluating ##\frac{\partial S}{\partial p_j}## while treating ##p_i## as a function of ##p_j##.

TSny said:
I was thinking of letting ##\epsilon## be a small, first-order quantity and considering the change ##\delta S## when ##\delta p_j = \epsilon##. Note ##\delta p_i## is then ##-\epsilon##. (As before, I'm thinking of ##p_j## as the smaller of the two p's.)

Once you get the result for an infinitesimal ##\epsilon##, you can easily generalize to a finite change.

This approach is similar to mfb's suggestion of evaluating ##\frac{\partial S}{\partial p_j}## while treating ##p_i## as a function of ##p_j##.
Ok, so if I understand you correctly, I am using my expression for ##S'-S## in #13 and rewriting all instances of ##p_i## as ##1-p_j - \sum_{k=1, k \neq i,j} p_k## (to re-express ##p_i## in terms of ##p_j##). Then take the derivative with respect to ##p_j##?

You want to show that ##\delta S > 0## when ##p_j## and ##p_i## are moved closer to each other. The ##\delta## operation is essentially the same as taking a differential since we are considering ##\epsilon## as small.

So, consider ##\delta \left ( -k \sum_{n=1}^r p_n \ln p_n \right )##. Only two of the ##p##'s vary: ##p_i## and ##p_j##. What do you get for ##\delta \left ( p_i \ln p_i \right )## and ##\delta \left ( p_j \ln p_j \right )## when ##\delta p_j = \epsilon## and ##\delta p_i = - \epsilon##?
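For what it's worth, this first-order bookkeeping can be verified numerically (my own sketch, with ##k = 1##): since ##\delta(p \ln p) = (\ln p + 1)\,\delta p## to first order, the choice ##\delta p_j = \epsilon##, ##\delta p_i = -\epsilon## gives ##\delta S = k \epsilon \ln(p_i/p_j)##, which is positive when ##p_j < p_i## and ##\epsilon > 0##:

```python
import math

pj, pi, eps, k = 0.2, 0.5, 1e-6, 1.0   # sample values with pj < pi

def term(p):
    return p * math.log(p)

# Exact change in S when pj -> pj + eps and pi -> pi - eps
exact = -k * ((term(pj + eps) - term(pj)) + (term(pi - eps) - term(pi)))

# First-order prediction: delta S = k * eps * ln(pi / pj)
first_order = k * eps * math.log(pi / pj)

assert first_order > 0                   # S increases
assert abs(exact - first_order) < 1e-9   # agreement to first order in eps
```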

