Why is the maximum likelihood estimation accurate?


Discussion Overview

The discussion revolves around the accuracy and theoretical justification of maximum likelihood estimation (MLE) as a method for estimating parameters in statistical models. Participants explore the conditions under which MLE is considered effective, seek intuitive explanations, and question the validity of MLE in various contexts.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • Some participants express confusion about why maximizing the likelihood function provides a good estimate of the actual parameter, particularly in general cases beyond the normal distribution.
  • Others inquire about the necessary knowledge to formally prove that the maximum of the likelihood function serves as an estimator for the actual parameter.
  • A participant suggests that MLE may be used primarily due to the lack of better alternatives, raising questions about its appropriateness in classical statistics.
  • There are discussions about the relationship between MLE and Bayesian methods, particularly regarding the use of prior distributions and the implications of using maximum a posteriori (MAP) estimates.
  • One participant argues that the quality of the MLE depends on various criteria for what constitutes a "good" estimator, such as unbiasedness and consistency, and notes that MLE can be asymptotically good under certain conditions.
  • Another participant emphasizes that the MLE should be viewed as a "smart guess" rather than a definitive confidence interval for the true parameter value, highlighting the role of luck in its accuracy.
  • Concerns are raised about the validity of MLE when the underlying assumptions about the population are not met, illustrated by an example involving biased coin flips.
  • A later reply attempts to provide an intuitive understanding of MLE, suggesting that the likelihood function reflects the underlying probability density and that choosing the correct parameter maximizes the likelihood of observed samples.

Areas of Agreement / Disagreement

Participants do not reach a consensus on the effectiveness of MLE or its theoretical underpinnings. Multiple competing views are presented regarding its appropriateness, the conditions under which it is effective, and the nature of its estimation capabilities.

Contextual Notes

Limitations in the discussion include the lack of formal proofs for the claims made about MLE, varying interpretations of what constitutes a "good" estimator, and the dependence on specific assumptions about the underlying population distributions.

Avatrin
Hi
I've been googling maximum likelihood estimation. While I do understand how to compute it, I don't understand why maximizing the likelihood function will give us a good estimate of the actual parameter.

In some cases, like the normal distribution, it seems almost obvious. However, in the more general case, I don't know why it is true.

So, I have two questions:
1. How much knowledge do I need to prove that the maximum of the likelihood function is an estimator of the actual parameter?
2. Is there a relatively intuitive explanation for why this method gives us a good estimate of the actual parameter?
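For concreteness, here is a minimal sketch of the normal-distribution case mentioned above, where the likelihood can be maximized in closed form. The data, true parameter values, and random seed are made up for illustration:

```python
import numpy as np

# Made-up data: 1,000 draws from a normal distribution with
# true mean 5.0 and true standard deviation 2.0.
rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=1_000)

# For the normal distribution the likelihood can be maximized in closed
# form: the MLE of the mean is the sample mean, and the MLE of the
# variance is the uncorrected (divide-by-n) sample variance.
mu_hat = data.mean()
var_hat = ((data - mu_hat) ** 2).mean()

print(mu_hat, var_hat)   # close to, but not exactly, 5.0 and 4.0
```

The estimates land near the true values but are not exactly equal to them, which is part of what the question is about: why should the maximizer be close at all?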
 
what would you propose to use instead of maximum likelihood?
 
StoneTemplePython said:
what would you propose to use instead of maximum likelihood?
How is that relevant to anything? I am just looking for a formal proof, or, if that doesn't exist, an intuitive explanation for why it should work in the general case.
 
Avatrin said:
How is that relevant to anything? I am just looking for a formal proof, or, if that doesn't exist, an intuitive explanation for why it should work in the general case.

Actually you said

Avatrin said:
So, I have two questions:
1. How much knowledge do I need to prove that the maximum of the likelihood function is an estimator of the actual parameter?
2. Is there a relatively intuitive explanation for why this method gives us a good estimate of the actual parameter?

My question is directly related to prodding you to get an answer to the portion that I bolded (i.e. question number 2).
 
StoneTemplePython said:
Actually you said
My question is directly related to prodding you to get an answer to the portion that I bolded (i.e. question number 2).
Well, I am looking for a relatively intuitive explanation because that makes it easier to get through the formal details afterwards. That's how I got through subjects like topology.

However, if you were just prodding me for an answer... I guess that means this is a case of us using the method because no good alternative exists?
 
Avatrin said:
Well, I am looking for a relatively intuitive explanation because that makes it easier to get through the formal details afterwards. That's how I got through subjects like topology.

However, if you were just prodding me for an answer... I guess that means this is a case of us using the method because no good alternative exists?

I'll give a closely related parallel, which is basically how I think about it. Ignore the classical framework for a moment and consider the Bayesian one.

In Bayesian inference you have a prior distribution and likelihood function(s). Without going into much detail, if you have an improper, uniform prior, then your result (the posterior) just shows the effects of the likelihood function. Your result is an entire distribution -- perhaps a multivariate one -- which is not easy to compress or work with. (Some would say don't compress these distributions, but as you may guess, that can become intractable in large-scale data projects.)

What kind of summary item would you use to describe the entirety of your distribution? Obviously this is a lossy compression.

Typically people use either MAP (Maximum A Posteriori -- i.e. equivalent to maximum likelihood under these special conditions) or LMS (least mean squared error -- i.e. the expected value).

The reality is both of these are (relatively) easy to work with, though you could try to come up with something else I suppose.

In some sense this is a very simple idea: minimize a cost function or choose the most likely 'case'.
- - - -

There are some knotty interpretation issues in classical statistics that make the correct interpretation of results something different than the way most people say. The idea of a posterior distribution doesn't make sense in the classical framework. And the idea of an expected value over said non-existent distributions also doesn't. However the idea of honing in on the most likely 'explanation' does (i.e. ##\text{MAP} \to \text{Max Likelihood}##).

Last I checked there are still significant debates on how appropriate it is to use max likelihood in classical stats. But it is at least something, so people use it.
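The MAP-vs-LMS point above can be sketched numerically. With a flat prior the posterior is just the likelihood rescaled, so the MAP estimate coincides with the maximum likelihood estimate, while LMS gives the posterior mean. The coin-flip numbers and the grid approximation are made up for illustration:

```python
import numpy as np

# Made-up example: 7 heads in 10 coin flips, bias p of the coin unknown.
heads, flips = 7, 10
grid = np.linspace(0.001, 0.999, 999)            # candidate values of p

likelihood = grid**heads * (1 - grid)**(flips - heads)
prior = np.ones_like(grid)                       # flat prior on the grid
posterior = likelihood * prior                   # unnormalized posterior

# With a flat prior the posterior is the likelihood rescaled, so the
# MAP estimate coincides with the maximum likelihood estimate.
p_map = grid[np.argmax(posterior)]
p_mle = grid[np.argmax(likelihood)]
print(p_map, p_mle)                              # both ≈ 0.7 = heads/flips

# The LMS summary is the posterior mean (approximated on the grid);
# it compresses the same distribution to a different single number.
p_lms = np.sum(grid * posterior) / np.sum(posterior)
print(round(p_lms, 3))                           # ≈ 8/12, pulled toward 0.5
```

Both summaries are lossy compressions of the same posterior; which one is "better" depends on the cost function you care about, which is exactly the point being made above.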
 
Avatrin said:
I don't understand why maximizing the likelihood function will give us a good estimate of the actual parameter.

It won't necessarily give you a good estimate.

The intuitive idea of "good" can be translated into a precise mathematical definition in different ways, and different definitions of "good" imply different ways of doing things.

Some examples of different criteria for a "good" estimator are 1) unbiased 2) minimum variance 3) maximum likelihood 4) consistent.

In many situations the maximum likelihood estimator "asymptotically" has all those properties, and the maximum likelihood estimator is conceptually simple since it involves the familiar scenario of trying to find where a function attains a maximum value. That's why one often sees maximum likelihood estimators being used, but whether a maximum likelihood estimator is "good" or not depends on the particulars of a given estimation task.
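One concrete instance of these competing criteria: for a normal distribution, the MLE of the variance (dividing by n) is biased in small samples even though it is consistent, while the divide-by-(n - 1) estimator is unbiased. A simulation sketch with made-up numbers:

```python
import numpy as np

# Made-up setup: repeatedly draw small normal samples (n = 5) with true
# variance 4.0 and compute the MLE of the variance each time.
rng = np.random.default_rng(1)
true_var, n, reps = 4.0, 5, 200_000

samples = rng.normal(0.0, np.sqrt(true_var), size=(reps, n))
centered = samples - samples.mean(axis=1, keepdims=True)
var_mle = (centered ** 2).mean(axis=1)           # divide by n, not n - 1

# E[MLE] = (n - 1)/n * true_var = 3.2: biased low in small samples,
# though the bias vanishes as n grows (consistency).
print(var_mle.mean())
```

So the same estimator is simultaneously "bad" by the unbiasedness criterion and "good" by the consistency criterion, which is the point being made above.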
 
Given a sample result, you should look at the MLE as "what is your smartest guess", not as "what is the confidence interval of the true parameter value".

How good the maximum likelihood estimator is depends entirely on unmeasurable luck. Given a result, you are figuring out which population parameter would make that result most likely. If someone flipped a blank coin and told you that the result was heads, you could maximize the likelihood of that result by saying that the coin had heads on both sides. Your saying that doesn't make it true. There is no valid way to assign a probability to the accuracy of the MLE unless you know something about the entire world of all possible populations that this population is one of.

That being said, it is not smart to ignore the maximum likelihood estimator. Sometimes that is all you can do. If you have a lot of data and know enough about the population, then the MLE can be quite good. In the case of the coin toss, getting just one tail on another flip would make the likelihood of a two-headed coin 0.
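The coin example above can be sketched in a few lines; the helper function is hypothetical, written just for this illustration:

```python
# Hypothetical helper: the probability of an observed flip sequence
# given that the coin lands heads with probability p.
def seq_likelihood(p, flips):
    out = 1.0
    for f in flips:
        out *= p if f == "H" else (1.0 - p)
    return out

# After one observed head, p = 1 ("heads on both sides") maximizes
# the likelihood of what was seen...
print(seq_likelihood(1.0, ["H"]))        # 1.0
# ...but a single tail on a later flip drives that likelihood to zero.
print(seq_likelihood(1.0, ["H", "T"]))   # 0.0
print(seq_likelihood(0.5, ["H", "T"]))   # 0.25
```

With one data point the MLE is an extreme, fragile guess; more data rules out the wrong extremes, matching the "quite good with a lot of data" remark above.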
 
FactChecker said:
Given a sample result, you should look at the MLE as "what is your smartest guess", not as "what is the confidence interval of the true parameter value".

How good the maximum likelihood estimator is depends entirely on unmeasurable luck. Given a result, you are figuring out which population parameter would make that result most likely. If someone flipped a blank coin and told you that the result was heads, you could maximize the likelihood of that result by saying that the coin had heads on both sides. Your saying that doesn't make it true. There is no valid way to assign a probability to the accuracy of the MLE unless you know something about the entire world of all possible populations that this population is one of.

That being said, it is not smart to ignore the maximum likelihood estimator. Sometimes that is all you can do. If you have a lot of data and know enough about the population, then the MLE can be quite good. In the case of the coin toss, getting just one tail on another flip would make the likelihood of a two-headed coin 0.
In that case, let me change my question: Why is MLE a smart guess?

Since I originally wrote the OP, I think I have developed an intuitive understanding of why: we expect the sample distribution to reflect the underlying probability density p(x|θ), so there will be many samples where p(x|θ) is high. If we choose the correct parameter, p(x|θ) will be high where there are many samples, and so the likelihood function will be larger than if the parameter is wrong; with a wrong parameter θ₂, many samples will wind up where p(x|θ₂) is lower than if we had used the correct parameter.

However, there must be some theoretical underpinning behind the MLE that can give me a better understanding of it. Also, my reasoning above only works for analytic functions, which, while sufficient for practical applications, cannot give me the understanding I want.
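The intuition described here can be tied to the law of large numbers: the average log-likelihood over the sample approximates the expected log-density under the true distribution, and that expectation is largest at the true parameter (the usual consistency argument, via Gibbs' inequality). A simulation sketch with made-up Bernoulli data:

```python
import numpy as np

# Made-up data: 10,000 Bernoulli draws with true success probability 0.3.
rng = np.random.default_rng(2)
true_p = 0.3
data = rng.random(10_000) < true_p
n, k = len(data), int(data.sum())

# Average log-likelihood of a candidate parameter p over the sample.
# By the law of large numbers this approximates E[log p(x|p)], which
# is maximized at the true parameter (Gibbs' inequality).
def avg_loglik(p):
    return (k * np.log(p) + (n - k) * np.log(1 - p)) / n

candidates = [0.1, 0.2, 0.3, 0.4, 0.5]
best = max(candidates, key=avg_loglik)
print(best)  # 0.3 -- the candidate closest to the truth wins
```

This is the "samples pile up where the density is high" idea made quantitative: with enough data, wrong parameters assign low log-density to most of the sample and lose the maximization.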
 
You may be looking for more "theoretical underpinning" than can be formally proven. Although using the MLE seems smart, any formal proof would require knowledge of all possible statistical populations and the likelihood of each. That is a tall order. It is common in Bayesian techniques to start with the assumption of a uniform probability distribution and to adjust it as data are obtained. But that is a different assumption. Sometimes one has to do what seems smart even if it cannot be formally proven or even formally analysed.
 
Avatrin said:
So, there will be many samples where p(x|p) is high;
If you want to understand the utility of the maximum likelihood estimator intuitively, you should also try to think of situations where it would not be useful.

Consider this example. Let the unknown parameter be C. Let a family of discrete distributions have the densities given by:

Pr(X = C + 1000) = .1
Pr(X = C + k) = .01 for k = 1, 2, ..., 90

If the sample value of X is 4000, the value of the maximum likelihood estimator of C is 3000. However, if C is equal to 3000, there is a probability of 0.9 that the sample value of X will be in the range 3001 to 3090.
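A numerical check of this example; the code just enumerates candidate values of C, purely for illustration:

```python
# The distribution from the post: Pr(X = C + 1000) = 0.1 and
# Pr(X = C + k) = 0.01 for k = 1, ..., 90 (these probabilities sum to 1).
def likelihood(c, x):
    if x - c == 1000:
        return 0.10
    if 1 <= x - c <= 90:
        return 0.01
    return 0.0

x = 4000
candidates = range(x - 1000, x)        # every C that could have produced x
c_mle = max(candidates, key=lambda c: likelihood(c, x))
print(c_mle)                           # 3000: the MLE picks the single 0.1 atom

# Yet if C really were 3000, X would land in 3001..3090 with
# probability 0.9 -- far away from the observed 4000.
prob_band = sum(likelihood(3000, 3000 + k) for k in range(1, 91))
print(prob_band)
```

The MLE latches onto the single most probable atom even though 90% of the probability mass sits elsewhere, which is exactly why the example undermines the naive intuition.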
 
