Is the MLE for the mean always equal to the sample mean?

  • Context: Graduate
  • Thread starter: Bipolarity
  • Tags: Mean, MLE
SUMMARY

The maximum likelihood estimator (MLE) for the mean ##\mu## maximizes the joint likelihood ##\prod^{n}_{i = 1} p(x_{i},\mu)## over a sample of independent and identically distributed points from a distribution ##p(x, \mu)##. For Gaussian, Poisson, and Bernoulli distributions, the MLE for the mean equals the sample mean ##\frac{x_{1}+x_{2}+\cdots+x_{n}}{n}##. However, this is not universally true: the Cauchy distribution is a counterexample whose mean is undefined, and for the Laplace distribution the MLE of the location parameter is the sample median. The discussion highlights the significance of exponential families and the behavior of the MLE for pathological distributions.

PREREQUISITES
  • Understanding of maximum likelihood estimation (MLE)
  • Familiarity with Gaussian, Poisson, and Bernoulli distributions
  • Knowledge of the Cauchy distribution and its properties
  • Basic calculus for maximizing functions
NEXT STEPS
  • Explore the properties of the Cauchy distribution and its implications for MLE
  • Study exponential families of distributions and their characteristics
  • Learn about the Laplace distribution and its relationship to MLE
  • Investigate local maxima in likelihood functions and their significance
USEFUL FOR

Statisticians, data scientists, and researchers interested in understanding maximum likelihood estimation, particularly in the context of various probability distributions and their properties.

Bipolarity
Suppose you have a distribution ##p(x, \mu)##.
You take a sample of n independent and identically distributed points ## (x_{1},\ldots,x_{n}) ## from ##p(x, \mu)##.

The maximum likelihood estimator (MLE) for the mean ## \mu ## is the value of ## \mu ## that maximizes the joint likelihood ## \prod^{n}_{i = 1} p(x_{i},\mu) ##. It can usually be found using calculus.

The sample mean is simply ## \frac{x_{1}+x_{2}+\cdots+x_{n}}{n} ##.
It turns out that for Gaussian, Poisson, and Bernoulli distributions, the MLE for the mean equals the sample mean. I was curious whether this is the case for ALL distributions. If so, how would I prove it? If not, what is one distribution for which this isn't the case?

Thanks!

BiP
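As a quick check of the Gaussian case (a sketch assuming unit variance for simplicity), setting the derivative of the log-likelihood to zero gives

$$\frac{d}{d\mu}\sum_{i=1}^{n}\log p(x_{i},\mu)=\sum_{i=1}^{n}(x_{i}-\mu)=0 \quad\Longrightarrow\quad \hat{\mu}=\frac{1}{n}\sum_{i=1}^{n}x_{i},$$

which is exactly the sample mean.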
 
I would start by experimenting with the uniform distribution.
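A minimal numerical sketch of that hint, assuming the Uniform(0, θ) parameterization (the seed and sample size are illustrative): the likelihood ##\theta^{-n}## for ##\theta \geq \max(x_i)## is maximized at ##\hat\theta = \max(x_i)##, so by invariance the MLE of the mean ##\theta/2## is ##\max(x_i)/2##, which generally differs from the sample mean.

```python
# Sketch, assuming X ~ Uniform(0, theta): the likelihood theta**(-n) for
# theta >= max(x) is maximized at theta_hat = max(x), so the MLE of the
# mean theta/2 is max(x)/2 -- generally not the sample mean.
import numpy as np

rng = np.random.default_rng(0)          # illustrative seed
x = rng.uniform(0.0, 10.0, size=100)    # true mean is 5
print("MLE of mean:", x.max() / 2)
print("sample mean:", x.mean())
```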
 
Consider a family of discrete densities defined by an integer parameter N, of the form ##P(X = N) = 0.5,\ P(X = N+1) = 0.5##.

Suppose we take 3 independent samples from such a distribution and get {2,2,2}.
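Working out that hint numerically (a sketch; the helper `lik` is a hypothetical name): for the sample {2, 2, 2}, both N = 1 and N = 2 maximize the likelihood, and the corresponding distribution mean N + 0.5 is 1.5 or 2.5, while the sample mean is 2.

```python
# Sketch: likelihood of integer N under P(X=N) = P(X=N+1) = 0.5
# for the sample {2, 2, 2}.  (The function name `lik` is illustrative.)
sample = [2, 2, 2]

def lik(N):
    # Each observation contributes 0.5 if it lies in {N, N+1}, else 0.
    p = 1.0
    for xi in sample:
        p *= 0.5 if xi in (N, N + 1) else 0.0
    return p

for N in range(0, 4):
    print(N, lik(N))   # N = 1 and N = 2 both give 0.125; the means N + 0.5
                       # are 1.5 and 2.5, neither equal to the sample mean 2
```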
 
Bipolarity said:
It turns out that for Gaussian, Poisson, and Bernoulli distributions, the MLE for the mean equals the sample mean. I was curious whether this is the case for ALL distributions.
You have noticed something special about so-called "exponential families" (https://en.wikipedia.org/wiki/Exponential_family). Many famous families of distributions are exponential families, but there are also plenty of famous families that aren't.
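A quick numerical check of the exponential-family case (a sketch; the rate, sample size, and seed are illustrative): maximizing the Poisson log-likelihood numerically recovers the sample mean.

```python
# Sketch: for Poisson(mu), an exponential family, the joint log-likelihood
# is maximized at the sample mean.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

rng = np.random.default_rng(0)
x = rng.poisson(lam=3.2, size=500)       # i.i.d. sample, true mean 3.2

def neg_log_lik(mu):
    return -poisson.logpmf(x, mu).sum()  # negated joint log-likelihood

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 20.0), method="bounded")
print("MLE:        ", res.x)
print("sample mean:", x.mean())          # agree to numerical tolerance
```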
 
Bipolarity said:
I was curious whether this is the case for ALL distributions.
Consider the distribution with pdf given by ##\frac 1 {\pi(1+x^2)}## for ##x \in \mathbb R##. This is the Cauchy distribution. Given a finite sample drawn from this distribution, you certainly can calculate ##\frac{\sum x_i} n##, but this has no meaning because this distribution does not have a mean. This is a pathological distribution. The mean and variance are undefined (do the integrals).
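To see that lack of a mean in action, here is a small sketch (the seed and sample size are illustrative): the running sample mean of standard Cauchy draws never settles down, unlike for distributions with a finite mean.

```python
# Sketch: the running mean of standard Cauchy samples does not converge,
# because the distribution has no mean.
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_cauchy(size=100_000)
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
print(running_mean[[99, 999, 9_999, 99_999]])  # keeps jumping around
```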
 
My example would be the Laplace distribution, also known as the double exponential (warning: there are other distributions with the same name), with pdf ## \frac{1}{2} e^{-|x - \mu|} ##. The mean is well-defined and equals ## \mu ##. The MLE based on a sample of size n is the sample median (the middle observation if n is odd, and anything between the two middle observations if n is even).
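That claim is easy to verify numerically (a sketch; the location, sample size, and seed are illustrative): up to constants the negative log-likelihood is ##\sum_i |x_i - \mu|##, and minimizing it lands on the sample median.

```python
# Sketch: for Laplace data, the negative log-likelihood is sum(|x - mu|)
# up to constants, and it is minimized at the sample median.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
x = rng.laplace(loc=1.5, size=1001)      # odd n, so the median is unique

neg_log_lik = lambda mu: np.abs(x - mu).sum()
res = minimize_scalar(neg_log_lik, bounds=(x.min(), x.max()), method="bounded")
print("MLE:   ", res.x)
print("median:", np.median(x))           # these agree
```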

To bring the Cauchy distribution into the story, we should make it a one-parameter family with pdf proportional to ## \frac{1}{1 + (x - \mu)^{2}} ##. Now we have a family of distributions indexed by ## \mu ##. The parameter ## \mu ## is the centre of symmetry of these distributions, but indeed they have no expectation value (nor a variance). Even so, the MLE based on a sample of size n is, for large n, the best you can possibly do. You must watch out for local maxima, though: there is a theorem that for large n the likelihood has one "good" global maximum and a Poisson(1)-distributed number of "bad" local maxima.
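A small sketch of those local maxima (the sample size, grid, and seed are illustrative): evaluating the Cauchy location log-likelihood on a grid for a small sample often shows more than one peak.

```python
# Sketch: the Cauchy location log-likelihood can have several local maxima.
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_cauchy(size=7)          # small samples show the effect best

mu = np.linspace(x.min() - 5, x.max() + 5, 10_001)
# log-likelihood of each mu, up to an additive constant
ll = -np.log1p((x[:, None] - mu[None, :]) ** 2).sum(axis=0)
peaks = (ll[1:-1] > ll[:-2]) & (ll[1:-1] > ll[2:])   # interior grid maxima
print(int(peaks.sum()), mu[1:-1][peaks])
```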
 
