MLE, Uniform Distribution, missing data


Discussion Overview

The discussion revolves around determining the maximum likelihood estimate (MLE) for the parameter k in a uniform distribution U(0,k) with missing data. Participants explore the implications of having incomplete data on the estimation process and consider various approaches, including the Expectation-Maximization (EM) algorithm.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • Some participants propose that the MLE for k should be the largest observed value, which is 3, given the data set X={1,3,*} where * is unknown.
  • Others discuss the potential bias in the MLE due to the missing data, suggesting that the estimate may underestimate k.
  • A participant suggests using the Expectation-Maximization algorithm, assuming the missing value x* is large and iterating to refine the estimate of k.
  • There is a consideration of the likelihood function and how it might be affected by the missing data, with one participant proposing a specific form for the likelihood based on conditions of x.
  • Some participants express concern about the small sample size and its impact on the reliability of the MLE, suggesting that interpolation or simulation might not significantly improve the estimate.
  • One participant indicates a shift in focus from the uniform distribution to the exponential family of distributions for better application of the EM algorithm.

Areas of Agreement / Disagreement

Participants generally agree that k must be at least 3 based on the observed data, but there is no consensus on the best approach to handle the missing data or the validity of the proposed methods. Multiple competing views remain regarding the implications of the missing data on the MLE.

Contextual Notes

Participants note limitations related to the small sample size and the potential bias introduced by missing data. There is also mention of unresolved mathematical steps in deriving the likelihood function.

Who May Find This Useful

This discussion may be useful for those interested in statistical estimation techniques, particularly in the context of uniform distributions and handling missing data, as well as those exploring the application of the Expectation-Maximization algorithm.

sopsku
I would like to determine the MLE for k in U(0,k) where U is the uniform pdf constant on the interval [0,k] and zero elsewhere. I would like this estimate in the case of missing data. To be specific, what is the MLE for k given the three draws X={1,3,*} where * is unknown.
 
sopsku said:
I would like to determine the MLE for k in U(0,k) where U is the uniform pdf constant on the interval [0,k] and zero elsewhere. I would like this estimate in the case of missing data. To be specific, what is the MLE for k given the three draws X={1,3,*} where * is unknown.

The only thing we can say for certain is that [tex]k\geq 3[/tex]. So what do you think the MLE of k would be?
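(As an illustrative aside, not part of the thread: for a complete sample of n draws from U(0,k) the likelihood is k^(-n) for k ≥ max(x) and 0 otherwise, so it is strictly decreasing in k and maximized at the sample maximum. A minimal sketch:)

```python
import numpy as np

# Observed draws from U(0, k); the third value is missing.
observed = np.array([1.0, 3.0])

# For U(0, k) the likelihood of the observed points is k**(-n) for
# k >= max(observed) and 0 otherwise, so it is maximized at the maximum.
def uniform_mle(data):
    return float(np.max(data))

k_hat = uniform_mle(observed)
print(k_hat)  # 3.0
```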
 
Yes. I think it should be the largest measured value, in this case three. Thank you for the verification.

I had tried to look at it by doing Expectation-Maximization: assume the missing value x* is large, and estimate it by its conditional expectation given the current k, namely E[x*] = k/2. If this is greater than 3, iterate again using it as my new k; if it is less than 3, then K = 3. This will ultimately always converge to the largest recorded value (= 3). Is this a valid argument?

I was troubled by the fact that I have information (additional measurement(s)) that is being ignored. I guess that means that the MLE with missing information is even more biased by the fact that this information is ignored.
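(Editorial aside: the iteration described above can be sketched under the post's own assumptions — start with a large guess for x*, replace it by its conditional mean k/2, and stop once the guess falls below the observed maximum:)

```python
def em_style_iteration(observed_max, x_star_init, max_iter=1000):
    """Sketch of the post's scheme: treat the current guess for the
    missing value as k, replace it by its conditional mean k/2, and
    stop at the observed maximum once the guess drops below it."""
    k = x_star_init
    for _ in range(max_iter):
        new_k = k / 2.0          # E[x* | x* ~ U(0, k)] = k/2
        if new_k <= observed_max:
            return observed_max  # converged to the largest recorded value
        k = new_k
    return k

print(em_style_iteration(3.0, 1000.0))  # 3.0
```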
 
sopsku said:
This will ultimately always converge to the largest recorded value (=3). Is this a valid argument?

It should be as long as your likelihood function ranges over distribution parameters for the data you actually have. I believe the MLE is biased toward underestimating k.
sopsku said:
I was troubled by the fact that I have information (additional measurement(s)) that is being ignored. I guess that means that the MLE with missing information is even more biased by the fact that this information is ignored.

What information are you ignoring? If a datum is missing, the only alternative is to interpolate or simulate it. For this you might use the sample mean, which is 2. I think the sample would be too small for method-of-moments estimation (MME), but you could try it for n = 3 if in fact the missing datum was included in the sample but lost.
 
I think I am ignoring the fact that the missing datum could be greater than 3 if the true value of K is greater than three. Perhaps the likelihood looks something like

If[x > 3, (3/k) (1/x)^3, 0] + If[x > k, ((k - 3)/k) (1/x)^3, 0]

(where k is my assumed K > 3), which is maximal for x = k = 3, implying K = 3 as above, but I am not sure of my likelihood. I am trying to stay with MLE, since I got sidetracked into this while looking at the Expectation-Maximization algorithm.
 
sopsku said:
I think I am ignoring the fact that the missing datum could be greater than 3 if the true value of K is greater than three.

Well, of course, k > 3 is possible, but I assume the sample is a good random sample, if a very small one. Given the small sample with a missing data point, I agree that this MLE is about as good as you are going to be able to achieve.

Choosing to interpolate the missing data point as I described will not change your estimate; it will simply increase its power a bit. Of course, this technique is usually used on much larger data sets (which are expensive to develop) where a few data points get "lost" somehow and you want to maximize the power of your estimate. If you were to do this on this set, it would only be as an experiment, not for statistical inference. You really can't make any inferences from this tiny data set.
 
I agree with all that you are saying. I am not really trying to "improve" the estimate. What I am interested in is a functional form for the likelihood, and I thought that if I understood the MLE for my toy n = 3, U(0,K) example I would be one step closer to this real goal. Given the functional form, I wanted to formally apply the EM algorithm. I thought the toy example would be insightful for learning about the EM algorithm, but the piecewise-continuous nature of U(0,K) has led me rather astray. I am back to using the exponential family of distributions to investigate the EM algorithm, which makes the expectation step more straightforward.

I want to thank you very much for your kind help.
 
sopsku said:
I want to thank you very much for your kind help.

You're welcome.
 
sopsku said:
I had tried to look at it from doing Expectation Maximization: Assume the missing value x* as large and estimating k by using the expectation value of x* = x*/2. If this is greater than 3 iterate again using this as my new k. If it is less than 3, then K=3. This will ultimately always converge to the largest recorded value (=3). Is this a valid argument?

As far as I understand it, the EM algorithm works like this: for the E step, since x* is U(0,k) with k >= 3, the log-likelihood given x* and the two other data points is log(1/k^3), so the expected log-likelihood is also log(1/k^3). For the M step, this is maximized at k = 3, so EM stops after one iteration.
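(A numeric sketch of that argument, added for illustration: over any grid of k ≥ 3, the expected complete-data log-likelihood −3 log k is strictly decreasing, so the M step immediately returns k = 3:)

```python
import numpy as np

k_grid = np.linspace(3.0, 10.0, 701)
# Expected complete-data log-likelihood: E[log(1/k^3)] = -3 log k,
# since the log-likelihood does not depend on the imputed x*.
expected_loglik = -3.0 * np.log(k_grid)
k_star = k_grid[np.argmax(expected_loglik)]
print(k_star)  # 3.0
```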

Also, for uniform estimation it's worth taking a look at minimum-variance estimators. Wikipedia has a good article on how the German tank problem was solved in this way.
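(For comparison, a standard result not derived in the thread: for a complete sample of n draws from continuous U(0,k), the sample maximum has E[max] = n/(n+1)·k, so the MLE is biased low, and rescaling by (n+1)/n gives the unbiased minimum-variance estimator. A quick Monte Carlo sketch:)

```python
import numpy as np

rng = np.random.default_rng(0)
true_k, n, trials = 5.0, 3, 100_000
samples = rng.uniform(0.0, true_k, size=(trials, n))
maxima = samples.max(axis=1)

mle_mean = maxima.mean()                    # biased low: E[max] = n/(n+1) * k = 3.75
umvue_mean = ((n + 1) / n * maxima).mean()  # rescaled estimator is unbiased for k = 5

print(round(mle_mean, 2), round(umvue_mean, 2))
```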
 
