Is Maximum Likelihood Estimation Valid with Limited Data?

Discussion Overview

The discussion revolves around the validity of Maximum Likelihood Estimation (MLE) when applied to limited data. Participants explore the conditions under which MLE can be effectively utilized, particularly in the context of Gaussian models and the implications of having few samples.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant suggests that MLE requires a substantial amount of data to accurately estimate the parameter θ, indicating a belief that limited data may hinder the calculation of θ_ML.
  • Another participant clarifies that while having more samples improves the reliability of the MLE estimate, it is possible to perform MLE with any number of samples, though the quality of the estimate may vary.
  • A participant presents a measurement vector model and questions why MLE seeks to maximize the probability of ε, expressing confusion over the concept of maximizing probability when it seems counterintuitive to minimize ε.
  • In response, another participant explains that maximizing the probability of ε aligns with selecting a model that best explains the observed data, using the example of coin flips to illustrate the reasoning behind choosing a model that reflects the observed outcomes.

Areas of Agreement / Disagreement

Participants express differing views on the necessity of data quantity for effective MLE application, with some asserting that limited data may not yield reliable estimates, while others argue that MLE can still be performed with fewer samples. The discussion remains unresolved regarding the implications of maximizing versus minimizing probabilities in the context of MLE.

Contextual Notes

Participants highlight potential misunderstandings regarding the interpretation of probability in MLE, particularly in relation to the Gaussian model and the implications of limited data on the estimation process.

senmeis
Hello,

I have a question about Maximum Likelihood Estimation. The typical form of MLE looks like this:

X = Hθ + W, where W is Gaussian with distribution N(0, C).
θ_ML = (H^T C^{-1} H)^{-1} H^T C^{-1} X

I think θ_ML can only be calculated after a lot of measurements are made, that is, when there are plenty of samples of H and X. Or, put differently, it is impossible to get θ_ML if only information about θ itself is known. Do I understand this correctly?
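For concreteness, the closed-form estimator above can be sketched numerically. Everything in this example (the 2-parameter θ, the 5×2 matrix H, and the noise covariance C) is invented for illustration; the point is only that the formula is computable from a single measurement vector X:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: X = H θ + W with a 2-parameter θ and 5 measurements
theta_true = np.array([1.0, -2.0])
H = rng.normal(size=(5, 2))               # known observation matrix
C = np.diag([0.5, 1.0, 0.2, 0.8, 0.3])    # covariance of the noise W ~ N(0, C)
W = rng.multivariate_normal(np.zeros(5), C)
X = H @ theta_true + W                    # one observed measurement vector

# Closed-form MLE: theta_ml = (H^T C^-1 H)^-1 H^T C^-1 X
Cinv = np.linalg.inv(C)
theta_ml = np.linalg.solve(H.T @ Cinv @ H, H.T @ Cinv @ X)
print(theta_ml)  # an estimate of theta_true; noisy with only 5 measurements
```

With so few measurements the estimate is still computable, just noisy, which is exactly the point the replies below make.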

Senmeis
 
The typical form of MLE is that you have some random variable X that depends on a parameter \theta and has a density p(x,\theta). Then, given some samples x_1,...,x_n of X, you find the value of \theta maximizing
\prod_{j=1}^{n} p(x_j, \theta)

or something to that effect (it might be different if your samples are dependent, or you have samples from different random variables etc.). You seem to have a very specific application of this to a Gaussian model. You can do the calculation with any number of samples, but the more samples you have the better odds you have that your MLE estimate is a good estimate of the real value of the parameter.
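As a quick illustration of that recipe, here is a sketch for the simplest case, X ~ N(\mu, 1), maximizing \prod_{j} p(x_j, \mu) by brute-force grid search (the sample size and grid are arbitrary illustrative choices; only the log-likelihood up to a constant is needed for the argmax):

```python
import numpy as np

rng = np.random.default_rng(1)
mu_true = 3.0
x = rng.normal(mu_true, 1.0, size=10)   # just 10 samples

# For N(mu, 1), log prod_j p(x_j, mu) = -0.5 * sum_j (x_j - mu)^2 + const,
# so maximizing the product is maximizing this sum over a grid of mu values.
grid = np.linspace(-10, 10, 20001)
loglik = np.array([-0.5 * np.sum((x - mu) ** 2) for mu in grid])
mu_ml = grid[np.argmax(loglik)]
print(mu_ml, x.mean())  # the grid maximizer matches the sample mean
```

The grid maximizer agrees with the closed-form answer (the sample mean), and the calculation goes through for any sample size; only the spread of the estimate around the true value changes.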
 
A measurement vector can be written as:

ø = Du + ε, where ε is a zero-mean Gaussian random vector.

The MLE is D_ML when P(ε) is maximum, but why maximum? I think the probability of ε should be as small as possible. I know I must be making a mistake in my understanding. Can anyone point it out?

Senmeis
 
In words, by picking D to maximize P(ε), you are saying "My choice of D indicates that the events I just witnessed were not unusual in any way", whereas if you try to minimize P(ε), you are saying "My choice of D indicates the events I just witnessed will never happen again in the history of the universe".

To give a simple example, let's say I flip one hundred coins and all of them come up heads. I then ask you for an MLE of the probability that the coin lands on heads. If you want to maximize the probability that one hundred heads come up and no tails come up, you'll end up saying "the coin has a probability of 1 of landing on heads", because if that is the case, then the probability that I get 100 heads in a row is 1. If you wanted to minimize the probability that the coin comes up heads 100 times in a row, you would tell me "the coin has a probability of 0 of landing on heads", and 100 heads coming up in a row would have a probability of 0. Which sounds more reasonable?
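The coin example can be checked numerically. The sketch below scans candidate values of p and picks the one maximizing (or minimizing) the probability of the observed 100 heads:

```python
import numpy as np

# Probability of observing 100 heads in 100 flips, as a function of p
p = np.linspace(0, 1, 1001)
likelihood = p ** 100                 # P(100 heads | heads-probability p)

p_ml = p[np.argmax(likelihood)]       # the "not unusual" choice
p_min = p[np.argmin(likelihood)]      # the "never happens again" choice
print(p_ml)   # 1.0
print(p_min)  # 0.0
```

The maximizer says the all-heads outcome was to be expected; the minimizer declares the data it just saw impossible.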
 
senmeis said:
A measurement vector can be written as:

ø = Du + ε, where ε is a zero-mean Gaussian random vector.

The MLE is D_ML when P(ε) is maximum, but why maximum? I think the probability of ε should be as small as possible. I know I must be making a mistake in my understanding. Can anyone point it out?

Senmeis

You want to find the model that most likely could have produced the data that you have. That is the goal of the MLE. If you have to choose between a model that is very unlikely to produce your data and one that is likely to give those results, you pick the more likely one. If you have tossed a coin 50 times and got 50 heads, you would pick the model that the coin is rigged for heads, not the model that the coin is fair.
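That comparison is easy to make explicit with the binomial likelihood. The candidate value 0.99 for the rigged coin below is just an illustrative choice:

```python
from math import comb

# Likelihood of 50 heads in 50 tosses under a given heads-probability p
def likelihood(p, heads=50, tosses=50):
    return comb(tosses, heads) * p**heads * (1 - p)**(tosses - heads)

print(likelihood(0.5))   # fair coin: about 8.9e-16
print(likelihood(0.99))  # heads-rigged coin: about 0.605
```

The rigged model makes the observed data roughly fifteen orders of magnitude more probable than the fair one, which is why MLE picks it.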
 
