MLE of μ for X₁, ..., Xₙ with Known σᵢ²

kimkibun
Is it possible to estimate all the parameters of an n-observation sample (X₁, ..., Xₙ) with the same mean μ but different variances (σ₁², σ₂², ..., σₙ²)? If we assume the σᵢ² are known for all i in {1, ..., n}, what is the MLE of μ?
 
kimkibun said:
Is it possible to estimate all the parameters of an n-observation sample (X₁, ..., Xₙ) with the same mean μ but different variances (σ₁², σ₂², ..., σₙ²)? If we assume the σᵢ² are known for all i in {1, ..., n}, what is the MLE of μ?

It is possible to do so, but you will have to weight the variances correctly for your estimator.

We know that E[Σ Xᵢ / N] = μ, since all of the observations have the same mean.

In terms of the variance, we know that variances of independent observations add. So the unweighted mean X̄ = Σ Xᵢ / N has variance (1/N²) Σ σᵢ², where σᵢ is the standard deviation of the i-th observation. If all the σᵢ equal a common σ, this simplifies to the familiar σ²/N (I'd double-check this for yourself).

So with that said, our estimator satisfies (X̄ − μ)/√((1/N²) Σ σᵢ²) ~ N(0, 1) under the CLT assumption.
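A minimal simulation sketch of that claim (Python/NumPy; the σ values here are made up for illustration), checking that the standardized mean comes out approximately N(0, 1):

Code:
import numpy as np

rng = np.random.default_rng(0)
mu = 5.0
sigma = np.array([1.0, 2.0, 0.5, 3.0])  # known but unequal sigma_i (made up)
N = len(sigma)

# Many replications: one observation from each N(mu, sigma_i^2) per replication.
reps = 100_000
x = rng.normal(mu, sigma, size=(reps, N))

xbar = x.mean(axis=1)                    # unweighted mean of each replication
se = np.sqrt((sigma**2).sum()) / N       # sqrt of Var(xbar) = (1/N^2) * sum sigma_i^2
z = (xbar - mu) / se

print(z.mean(), z.std())                 # should be close to 0 and 1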

Now here is the thing, though: if you are using classical statistical tests, you are going to invoke the Central Limit Theorem, which assumes you have enough samples for the estimator to be normally distributed (or close enough to it) so that you can estimate the mean.

Because you have many different σᵢ's, that assumption might break down if you don't have enough samples relative to the number of distinct σᵢ, or if the σᵢ are wildly different.

There is even more to consider, but the above should give you an idea of what you can do using just the population parameters.

If you are using things like t-tests, you should start from this kind of formulation and carry the result above, that the variance of the mean estimator is the (scaled) sum of the individual variances, through the derivation. I'm guessing the S² will look similar, but you'd have to make sure.
 
I forgot to tell you that the n observations were drawn from a normal population. Anyway, is it possible that the different variances affect the maximum likelihood estimate of the mean?
 
kimkibun said:
I forgot to tell you that the n observations were drawn from a normal population. Anyway, is it possible that the different variances affect the maximum likelihood estimate of the mean?

They do. If you go through the maximum likelihood logic with different variances, you'll find that the maximum likelihood estimator for the mean is the standard weighted average:
\frac{\sum_{i=1}^N w_i x_i}{\sum_{i=1}^N w_i}
with
w_i = \frac{1}{\sigma_i^2}.
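A minimal numerical sketch of that estimator (Python/NumPy; the data and σ values are made up for illustration):

Code:
import numpy as np

x = np.array([4.8, 5.3, 4.9, 6.1])      # observations (made up)
sigma = np.array([1.0, 2.0, 0.5, 3.0])  # known standard deviations (made up)

w = 1.0 / sigma**2                      # weights w_i = 1 / sigma_i^2
mu_hat = (w * x).sum() / w.sum()        # inverse-variance weighted average
print(mu_hat)

Note how the observation with the smallest σᵢ dominates the weighted average, as it should: it carries the most information about μ.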
 
Parlyne said:
They do. If you go through the maximum likelihood logic with different variances, you'll find that the maximum likelihood estimator for the mean is the standard weighted average:
\frac{\sum_{i=1}^N w_i x_i}{\sum_{i=1}^N w_i}
with
w_i = \frac{1}{\sigma_i^2}.

Can you please show me how you get that?

The MLE for μ that I got is this:

(Σ xᵢ/σᵢ²) / (Σ 1/σᵢ²)

Is it possible to find the MLE of the parameters σᵢ²?
 
kimkibun said:
Can you please show me how you get that?

The MLE for μ that I got is this:

(Σ xᵢ/σᵢ²) / (Σ 1/σᵢ²)

Is it possible to find the MLE of the parameters σᵢ²?

Presuming Gaussian errors, the probability of measuring a value within dx_i of x_i given mean \mu and width \sigma_i is
\frac{dx_i}{\sigma_i\sqrt{2\pi}}e^{-\frac{(x_i-\mu)^2}{2\sigma_i^2}}.

The probability of the full data set is the product of the probabilities for each value, which will be
\frac{\prod_{i=1}^N dx_i}{(2\pi)^\frac{N}{2}\prod_{i=1}^N \sigma_i} e^{-\frac{1}{2}\chi^2},
with
\chi^2 = \sum_{i=1}^N\left(\frac{x_i-\mu}{\sigma_i}\right)^2.

Since \chi^2 is the only part of the probability expression which depends on \mu, maximizing the probability over \mu is equivalent to minimizing \chi^2. So, we take the derivative of \chi^2 with respect to \mu and set it to 0 as follows.
\begin{align}
0 &= \frac{d}{d\mu}\sum_{i=1}^N \frac{x_i^2 - 2\mu x_i + \mu^2}{\sigma_i^2}\\
&= \sum_{i=1}^N \frac{-2x_i + 2\mu}{\sigma_i^2}\\
&= 2\mu \sum_{i=1}^N \frac{1}{\sigma_i^2} - 2\sum_{i=1}^N \frac{x_i}{\sigma_i^2}\\
\mu &= \frac{\sum_{i=1}^N \frac{x_i}{\sigma_i^2}}{\sum_{i=1}^N \frac{1}{\sigma_i^2}}
\end{align}
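As a quick numerical sanity check of this closed form (a Python sketch using scipy.optimize; the data values are made up), minimizing \chi^2(\mu) directly lands on the same weighted average:

Code:
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([4.8, 5.3, 4.9, 6.1])      # made-up observations
sigma = np.array([1.0, 2.0, 0.5, 3.0])  # known sigma_i (made up)

def chi2(mu):
    return (((x - mu) / sigma) ** 2).sum()

numeric = minimize_scalar(chi2).x       # direct minimization of chi^2
closed_form = (x / sigma**2).sum() / (1.0 / sigma**2).sum()
print(numeric, closed_form)             # the two agree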

As for the \sigma_i: since there's a different value for each measurement, there's nothing to estimate; each is taken as exact. What you can find an estimator for is the width of the distribution of the estimate of \mu, which comes from error propagation in the equation for the MLE of \mu, taking the associated \sigma_i as the error in x_i.
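Sketching that error-propagation step explicitly (a standard result, stated here for completeness): since \hat\mu is linear in the x_i,
\mathrm{Var}(\hat\mu) = \frac{\sum_{i=1}^N w_i^2\,\sigma_i^2}{\left(\sum_{i=1}^N w_i\right)^2} = \frac{1}{\sum_{i=1}^N 1/\sigma_i^2}, \qquad w_i = \frac{1}{\sigma_i^2},
so the width of the estimator's distribution is \left(\sum_{i=1}^N \sigma_i^{-2}\right)^{-1/2}.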
 
Thank you so much, sir! I really appreciate your efforts! God bless you!
 