# MLE estimator for mean always equal to the mean?

1. Sep 2, 2015

### Bipolarity

Suppose you have a distribution $p(x, \mu)$.
You take a sample of $n$ points $(x_{1}, \dots, x_{n})$, drawn independently from the same distribution $p(x, \mu)$.

The maximum likelihood estimator (MLE) for the mean $\mu$ is the value of $\mu$ that maximizes the likelihood $\prod^{n}_{i = 1} p(x_{i},\mu)$, i.e. the joint density of the sample viewed as a function of $\mu$. It is usually easy to find using calculus.

The sample mean is simply $\frac{x_{1}+x_{2}+\dots+x_{n}}{n}$.
It turns out that for Gaussian, Poisson, and Bernoulli distributions, the MLE for the mean equals the sample mean. Is this the case for ALL distributions? If so, how would I prove it? If not, what is one distribution for which it isn't the case?
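For the Poisson case, the claim can be checked numerically. The following is only a rough sketch: the counts in `xs` are made up, and a crude grid search stands in for the calculus.

```python
import math

def poisson_log_likelihood(mu, xs):
    # log of prod_i exp(-mu) * mu**x_i / x_i!
    return sum(-mu + x * math.log(mu) - math.lgamma(x + 1) for x in xs)

xs = [3, 0, 2, 5, 1, 4, 2, 3]  # made-up counts, purely illustrative
sample_mean = sum(xs) / len(xs)

# Crude grid search over mu in (0, 10] with step 0.001
grid = [k / 1000 for k in range(1, 10001)]
mu_hat = max(grid, key=lambda mu: poisson_log_likelihood(mu, xs))

print(sample_mean, mu_hat)
```

Up to the grid resolution, the maximizer should coincide with the sample mean.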

Thanks!

BiP

2. Sep 2, 2015

### micromass

I would start by trying some stuff with the uniform distribution.
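A minimal sketch of where this hint might lead, assuming the family is Uniform(0, theta) with mean theta/2 (the post does not say which parameterization is meant, so that choice and the data are assumptions):

```python
def uniform_likelihood(theta, xs):
    # Density of Uniform(0, theta) is 1/theta on [0, theta] and 0 elsewhere
    if theta <= 0 or min(xs) < 0 or max(xs) > theta:
        return 0.0
    return theta ** (-len(xs))

xs = [0.2, 0.9, 0.4, 0.7]  # made-up sample, purely illustrative

# Grid search over theta in (0, 3] with step 0.001
grid = [k / 1000 for k in range(1, 3001)]
theta_hat = max(grid, key=lambda t: uniform_likelihood(t, xs))

mle_of_mean = theta_hat / 2          # mean of Uniform(0, theta) is theta / 2
sample_mean = sum(xs) / len(xs)

print(theta_hat, mle_of_mean, sample_mean)
```

Under this assumption the likelihood is largest at the smallest theta that still covers the data, i.e. the sample maximum, so the MLE of the mean is half the largest observation rather than the sample mean.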

3. Sep 3, 2015

### Stephen Tashi

Consider a family of discrete densities defined by a parameter $N$ that have the form $p(X = N) = 0.5$ and $p(X = N+1) = 0.5$.

Suppose we take 3 independent samples from such a distribution and get {2,2,2}.
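The likelihood for this sample can be tabulated directly. This is a sketch; the range of candidate values for N is an arbitrary choice, and the mean of each distribution in the family is taken to be N + 0.5.

```python
def likelihood(N, xs):
    # p(X = N) = p(X = N + 1) = 0.5, and 0 for every other value
    result = 1.0
    for x in xs:
        result *= 0.5 if x in (N, N + 1) else 0.0
    return result

xs = [2, 2, 2]
candidates = range(-2, 7)  # arbitrary window of integer values for N

best = max(likelihood(N, xs) for N in candidates)
mle_N = [N for N in candidates if likelihood(N, xs) == best]

mle_means = [N + 0.5 for N in mle_N]  # mean of the distribution is N + 0.5
sample_mean = sum(xs) / len(xs)

print(mle_N, mle_means, sample_mean)
```

Both N = 1 and N = 2 maximize the likelihood, so the MLE of the mean is 1.5 or 2.5, while the sample mean is 2, which is not even a possible value of the mean in this family.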

4. Sep 4, 2015

### gill1109

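You have noticed something special about so-called "exponential families" (https://en.wikipedia.org/wiki/Exponential_family). In an exponential family where the sufficient statistic is x itself (Gaussian with known variance, Poisson, Bernoulli), the likelihood equation sets the mean equal to the sample mean. Many famous families of distributions are exponential families, but there are also plenty of famous families of distributions which aren't.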

5. Sep 5, 2015

### D H

Consider the distribution with pdf given by $\frac 1 {\pi(1+x^2)}$ for $x \in \mathbb R$. This is the Cauchy distribution. Given a finite sample drawn from this distribution, you certainly can calculate $\frac{\sum x_i} n$, but this has no meaning because this distribution does not have a mean. This is a pathological distribution. The mean and variance are undefined (do the integrals).

6. Sep 5, 2015

### gill1109

My example would be the Laplace distribution, aka double exponential (warning: there are more distributions with the same name), with pdf given by exp(-|x - mu|)/2. The mean is well-defined and it's mu. The MLE based on a sample of size n is the sample median (the middle observation if n is odd, and anything between the two middle observations if n is even), because maximizing the likelihood means minimizing the sum of the |x_i - mu|. The sample median is in general not equal to the sample mean.
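A small numerical check of the Laplace example, as a sketch only: the sample is made up, n is odd so the median is unique, and a grid search replaces the exact argument.

```python
import math
import statistics

def laplace_log_likelihood(mu, xs):
    # log of prod_i exp(-|x_i - mu|) / 2
    return sum(-abs(x - mu) - math.log(2.0) for x in xs)

xs = [1.0, 2.0, 2.5, 7.0, 10.0]  # made-up sample with an odd number of points

# Grid search over mu in [0, 12] with step 0.01
grid = [k / 100 for k in range(0, 1201)]
mu_hat = max(grid, key=lambda mu: laplace_log_likelihood(mu, xs))

print(mu_hat, statistics.median(xs), statistics.mean(xs))
```

Here the maximizer lands on the median, 2.5, while the sample mean is 4.5.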

To bring the Cauchy distribution into the story, we should make it a one-parameter family with pdf proportional to 1/(1 + (x - mu)^2). The parameter mu is the centre of symmetry of these distributions, but indeed they do not have an expectation value (nor a variance). Still, the MLE of mu, based on a sample of size n from this distribution, is for large n about the best you can possibly do. You must look out for local maxima, though. There is a theorem that for large n there will be one "good" global maximum of the likelihood, and an approximately Poisson-distributed number of "bad" local maxima (Reeds, 1985, gives the mean of that Poisson distribution as 1/pi).
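To see what a "bad" local maximum looks like, one can scan the Cauchy log-likelihood on a grid. This is a toy sketch, not an illustration of the large-n theorem: the five data points (a tight cluster plus one outlier) and the grid are made up.

```python
import math

def cauchy_log_likelihood(mu, xs):
    # log of prod_i 1 / (pi * (1 + (x_i - mu)^2))
    return sum(-math.log(math.pi * (1.0 + (x - mu) ** 2)) for x in xs)

xs = [-1.0, 0.0, 0.5, 1.0, 20.0]  # made-up sample: cluster near 0, outlier at 20

# Evaluate the log-likelihood on a grid from -5 to 25 with step 0.01
grid = [-5.0 + k / 100 for k in range(3001)]
vals = [cauchy_log_likelihood(mu, xs) for mu in grid]

# A grid point is a local maximum if it beats both of its neighbours
local_maxima = [
    grid[i]
    for i in range(1, len(grid) - 1)
    if vals[i] > vals[i - 1] and vals[i] > vals[i + 1]
]

# The MLE is the best of the local maxima
mu_hat = max(local_maxima, key=lambda mu: cauchy_log_likelihood(mu, xs))

print(local_maxima, mu_hat)
```

On this sample the scan should find the global maximum inside the cluster and a second, much lower local maximum close to the outlier, which is why a hill-climbing routine started in the wrong place can return a bad estimate.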