Why is maximum likelihood estimation accurate?

In summary: maximizing the likelihood is not guaranteed to produce a good estimate. Its appeal is that it is conceptually simple, coincides with the Bayesian MAP estimate under a flat prior, and in many situations is asymptotically unbiased, consistent, and efficient. Whether it is actually "good" depends on the particular estimation problem, and alternatives such as the posterior mean (LMS) are sometimes preferable.
  • #1
Avatrin
Hi
I've been googling maximum likelihood estimation. While I do understand how to compute it, I don't understand why maximizing the likelihood function will give us a good estimate of the actual parameter.

In some cases, like the normal distribution, it seems almost obvious. However, in the more general case, I don't know why it is true.

So, I have two questions:
How much knowledge do I need to prove that the maximizer of the likelihood function is a good estimator of the actual parameter?
Is there a relatively intuitive explanation for why this method gives us a good estimate for the actual parameter?
 
  • #2
what would you propose to use instead of maximum likelihood?
 
  • #3
StoneTemplePython said:
what would you propose to use instead of maximum likelihood?
How is that relevant to anything? I am just looking for a formal proof, or, if that doesn't exist, an intuitive explanation for why it should work in the general case.
 
  • #4
Avatrin said:
How is that relevant to anything? I am just looking for a formal proof, or, if that doesn't exist, an intuitive explanation for why it should work in the general case.

Actually you said

Avatrin said:
So, I have two questions:
How much knowledge do I need to prove that the maximizer of the likelihood function is a good estimator of the actual parameter?
Is there a relatively intuitive explanation for why this method gives us a good estimate for the actual parameter?

My question is directly related to prodding you to get an answer to the portion that I bolded (i.e. question number 2).
 
  • #5
StoneTemplePython said:
Actually you said
My question is directly related to prodding you to get an answer to the portion that I bolded (i.e. question number 2).
Well, I am looking for a relatively intuitive explanation because that makes it easier to get through the formal details afterwards. That's how I got through subjects like topology.

However, if you were just prodding me for an answer... I guess that means this is a case of us using the method because no good alternative exists?
 
  • #6
Avatrin said:
Well, I am looking for a relatively intuitive explanation because that makes it easier to get through the formal details afterwards. That's how I got through subjects like topology.

However, if you were just prodding me for an answer... I guess that means this is a case of us using the method because no good alternative exists?

I'll give a closely related parallel, which is basically how I think about it. Ignore the classical framework for a moment and consider the Bayesian one.

In Bayesian inference you have a prior distribution and likelihood function(s). Without going into much detail, if you have an improper, uniform prior, then your result (posterior) is just showing the effects of the likelihood function. Your result is an entire distribution -- perhaps a multivariate one -- which is not easy to compress or work with. (Some would say don't compress these distributions, but this can become intractable, as you may guess, in large scale data projects.)

What kind of summary item would you use to describe the entirety of your distribution? Obviously this is a lossy compression.

Typically people use either MAP (maximum a posteriori -- i.e. equivalent to maximum likelihood under these special conditions) or LMS (least mean squared error -- i.e. the expected value).

The reality is both of these are (relatively) easy to work with, though you could try to come up with something else I suppose.

In some sense this is a very simple idea: minimize a cost function or choose the most likely 'case'.
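A minimal sketch of that choice for a coin-flip posterior (my own illustration; the specific numbers are assumptions, not from this thread): with a flat prior, the MAP summary reproduces the maximum likelihood estimate, while the LMS summary is the posterior mean.

```python
from scipy.stats import beta

# Hypothetical data: 7 heads in 10 flips (illustration only, not from the thread).
n, k = 10, 7

# With a flat Beta(1, 1) prior, the posterior for the heads-probability is Beta(k+1, n-k+1).
a, b = k + 1, (n - k) + 1
posterior = beta(a, b)

# MAP: the mode of the posterior. Under a flat prior this coincides with the MLE k/n.
map_estimate = (a - 1) / (a + b - 2)       # = k / n = 0.70

# LMS: the posterior mean, i.e. the summary minimizing expected squared error.
lms_estimate = posterior.mean()            # = (k + 1) / (n + 2) ≈ 0.67

print(map_estimate, lms_estimate)
```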
- - - -

There are some knotty interpretation issues in classical statistics that make the correct interpretation of results something different from what most people say. The idea of a posterior distribution doesn't make sense in the classical framework, and neither does the idea of an expected value over said non-existent distributions. However, the idea of homing in on the most likely 'explanation' does (i.e. ##\text{MAP} \to \text{Max Likelihood}##).

Last I checked there are still significant debates on how appropriate it is to use max likelihood in classical stats. But it is at least something, so people use it.
 
  • #7
Avatrin said:
I don't understand why maximizing the likelihood function will give us a good estimate of the actual parameter.

It won't necessarily give you a good estimate.

The intuitive idea of "good" can be translated into a precise mathematical definition in different ways, and different definitions of "good" imply different ways of doing things.

Some examples of different criteria for a "good" estimator are 1) unbiased 2) minimum variance 3) maximum likelihood 4) consistent.

In many situations the maximum likelihood estimator "asymptotically" has all those properties, and it is conceptually simple since it involves the familiar scenario of finding where a function attains its maximum value. That's why one often sees maximum likelihood estimators being used, but whether a maximum likelihood estimator is "good" or not depends on the particulars of a given estimation task.
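As a rough illustration of those asymptotic properties (a simulation sketch of my own, assuming exponential data with a made-up true rate of 2.0), the MLE scatters widely for small samples but concentrates around the true value as the sample size grows:

```python
import numpy as np

rng = np.random.default_rng(0)
true_rate = 2.0   # hypothetical "true" parameter for this illustration

# For exponential data the MLE of the rate is 1 / (sample mean).
for n in (10, 100, 10_000):
    estimates = [1.0 / rng.exponential(scale=1.0 / true_rate, size=n).mean()
                 for _ in range(1_000)]
    print(f"n={n:>6}: mean={np.mean(estimates):.3f}  std={np.std(estimates):.3f}")

# As n grows, the estimates concentrate around 2.0 (consistency); for small n
# the MLE is noticeably biased upward, illustrating that its guarantees are asymptotic.
```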
 
  • #8
Given a sample result, you should look at the MLE as "what is your smartest guess", not as "what is the confidence interval of the true parameter value".

How good the maximum likelihood estimator is depends entirely on unmeasurable luck. Given a result, you are figuring out which population parameter would make that result most likely. If someone flipped a blank coin and told you that the result was heads, you could maximize the likelihood of that result by saying that the coin had heads on both sides. Your saying that doesn't make it true. There is no valid way to assign a probability to the accuracy of the MLE unless you know something about the entire world of all possible populations that this population is one of.

That being said, it is not smart to ignore the maximum likelihood estimator. Sometimes that is all you can do. If you have a lot of data and know enough about the population, then the MLE can be quite good. In the case of the coin toss, getting just one tail on another flip would make the likelihood of a two-headed coin 0.
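A tiny sketch of that coin story in code (my own illustration of the reasoning, not something posted in the thread): after one observed head the Bernoulli likelihood is maximized at p = 1, and a single tail afterwards drives the likelihood of p = 1 to zero.

```python
import numpy as np

def likelihood(p, flips):
    """Bernoulli likelihood of heads-probability p for a list of flips (1 = heads, 0 = tails)."""
    return np.prod([p if f == 1 else 1.0 - p for f in flips])

p_grid = np.linspace(0.0, 1.0, 101)

# A single observed head: the likelihood p is maximized at p = 1 (the "two-headed coin").
print(p_grid[np.argmax([likelihood(p, [1]) for p in p_grid])])       # 1.0

# One head and one tail: p = 1 now has likelihood 0, and the MLE moves to 0.5.
print(p_grid[np.argmax([likelihood(p, [1, 0]) for p in p_grid])])    # 0.5
```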
 
  • #9
FactChecker said:
Given a sample result, you should look at the MLE as "what is your smartest guess", not as "what is the confidence interval of the true parameter value".

How good the maximum likelihood estimator is depends entirely on unmeasurable luck. Given a result, you are figuring out which population parameter would make that result most likely. If someone flipped a blank coin and told you that the result was heads, you could maximize the likelihood of that result by saying that the coin had heads on both sides. Your saying that doesn't make it true. There is no valid way to assign a probability to the accuracy of the MLE unless you know something about the entire world of all possible populations that this population is one of.

That being said, it is not smart to ignore the maximum likelihood estimator. Sometimes that is all you can do. If you have a lot of data and know enough about the population, then the MLE can be quite good. In the case of the coin toss, getting just one tail on another flip would make the likelihood of a two-headed coin 0.
In that case, let me change my question: Why is MLE a smart guess?

Since I originally wrote the OP, I think I have an intuitive understanding of why: we expect the sample distribution to reflect the underlying probability density p(x|θ). So there will be many samples where p(x|θ) is high. If we choose the correct parameter, p(x|θ) will be high where there are many samples, so the likelihood function will be larger than if the parameter were wrong, in which case many samples would wind up where p(x|θ2) is lower than under the correct parameter.

However, there must be some theoretical underpinning behind the MLE that can give me a better understanding of it. Also, my reasoning above only works for analytic functions, which, while sufficient for practical applications, cannot give me the understanding I want.
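For what it's worth, one standard way to make this intuition concrete (my own sketch, not from the thread) is via the law of large numbers: the average log-likelihood per sample converges to E[log p(X|θ)], which is largest at the true θ because the KL divergence is non-negative, so with enough data the correct parameter tends to beat any wrong one. A small simulation, assuming normal data with known variance and a made-up true mean of 0:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Hypothetical data: 100,000 draws from N(0, 1).
x = rng.normal(loc=0.0, scale=1.0, size=100_000)

# Average log-likelihood per sample under the true mean and two wrong means.
for mu in (0.0, 0.5, 1.0):
    print(mu, norm(loc=mu, scale=1.0).logpdf(x).mean())

# The average is largest at the true mean and drops as the parameter moves away,
# which is the sample version of E[log p(X|theta)] being maximized at the truth.
```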
 
  • #10
You may be looking for more "theoretical underpinning" than can be formally proven. Although using the MLE seems smart, any formal proof would require knowledge of all possible statistical populations and the likelihood of each. That is a tall order. It is common in Bayesian techniques to start with the assumption of a uniform probability distribution and to adjust it as data is obtained. But that is a different assumption. Sometimes one has to do what seems smart even if it cannot be formally proven or even formally analysed.
 
  • #11
Avatrin said:
So there will be many samples where p(x|θ) is high.
If you want to understand the utility of the maximum likelihood estimator intuitively, you should also try to think of situations where it would not be useful.

Consider this example. Let the unknown parameter be C. Let a family of discrete distributions have the densities given by:

Pr(X = C + 1000) = 0.1
Pr(X = C + k) = 0.01 for k = 1, 2, ..., 90

If the sample value of X is 4000, the value of the maximum likelihood estimator of C is 3000. However, if C is equal to 3000, there is a probability of 0.9 that the sample value of X will be in the range 3001 to 3090.
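Here is a small sketch that just reproduces the numbers in this example (my code, under the same family):

```python
def pmf(x, C):
    """Pr(X = x) for this family: mass 0.1 at C + 1000 and 0.01 at each of C+1, ..., C+90."""
    if x == C + 1000:
        return 0.10
    if C + 1 <= x <= C + 90:
        return 0.01
    return 0.0

x_obs = 4000
likelihoods = {C: pmf(x_obs, C) for C in range(2900, 4000)}
mle = max(likelihoods, key=likelihoods.get)
print(mle)                                            # 3000, the lone point with mass 0.1

# Yet if C really were 3000, a draw lands in 3001..3090 with probability 0.9.
print(sum(pmf(x, 3000) for x in range(3001, 3091)))   # 0.9 (up to float rounding)
```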
 

1. What is maximum likelihood estimation (MLE)?

Maximum likelihood estimation is a statistical method used to determine the parameters of a probability distribution by finding the set of values that maximizes the likelihood of observing the given data.
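A minimal worked sketch of that definition (my own illustration, not part of the article): for count data modeled as Poisson(λ), scanning the log-likelihood over candidate rates recovers the closed-form MLE, the sample mean.

```python
import numpy as np
from scipy.stats import poisson

counts = np.array([3, 1, 4, 2, 0, 5, 2, 3])    # hypothetical observed counts

# Poisson log-likelihood of the data for each candidate rate on a grid.
lam_grid = np.linspace(0.1, 10.0, 991)
log_likelihood = [poisson(lam).logpmf(counts).sum() for lam in lam_grid]

lam_hat = lam_grid[np.argmax(log_likelihood)]
print(lam_hat, counts.mean())   # both 2.5: the grid maximum matches the closed form x-bar
```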

2. How accurate is maximum likelihood estimation?

MLE is widely used because, under fairly general regularity conditions, it is consistent and asymptotically efficient: it uses all of the information in the data through the likelihood function, and as the sample size grows its estimates converge to the true parameter values. In any particular case, however, the accuracy of MLE depends on the quality and quantity of the data and on whether the assumed model is appropriate.

3. How does maximum likelihood estimation work?

MLE works by first assuming a probability model for the data. Then, using the observed data, the method evaluates the likelihood of that data for different values of the parameters. The parameter values that maximize the likelihood are taken as the estimate.
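A hedged sketch of that procedure in code (my own example; the normal model and data are assumptions): write down the log-likelihood of the data as a function of the parameters and hand its negative to a numerical optimizer.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
data = rng.normal(loc=3.0, scale=2.0, size=500)   # hypothetical data set

def negative_log_likelihood(params):
    mu, log_sigma = params                        # optimize log(sigma) so sigma stays positive
    return -norm(loc=mu, scale=np.exp(log_sigma)).logpdf(data).sum()

result = minimize(negative_log_likelihood, x0=[0.0, 0.0])
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])

# The numerical optimum matches the closed-form MLEs: the sample mean and the
# 1/n (not 1/(n-1)) standard deviation.
print(mu_hat, data.mean())
print(sigma_hat, data.std(ddof=0))
```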

4. What are the assumptions of maximum likelihood estimation?

MLE assumes that the data follow a specified probability distribution and that the observations are independent and identically distributed. The standard asymptotic results additionally rely on regularity conditions, such as the likelihood being a smooth (differentiable) function of continuous parameters.

5. Is maximum likelihood estimation always the best method for parameter estimation?

No, MLE may not always be the best method for parameter estimation as it assumes a specific probability distribution and may not be suitable for all types of data. Other methods, such as Bayesian estimation, may be more appropriate in certain situations.
