Distributivity/Inheritance of Max Likelihood Estimators

In summary, the Maximum Likelihood Estimator (MLE) has an "inheritance" property, usually called the invariance property: if ##m_1,m_2,\ldots,m_n## are MLEs for the parameters ##M_1,M_2,\ldots,M_n## respectively and ##f## is a function of the ##M_i##, then the MLE for ##f(M_1,M_2,\ldots,M_n)## is ##f(m_1,m_2,\ldots,m_n)##. This is commonly used to find the MLE of a derived quantity by first finding the MLE of each underlying parameter and then applying the function to those estimates. The likelihood function gives the probability of observing the sample as a function of the parameter, and the value of the parameter that maximizes it is the maximum likelihood estimate.
  • #1
WWGD
Hi,
IIRC, Maximum Likelihood Estimators (MLEs) satisfy an "inheritance" property, so that if ##m_1,m_2,..,m_n## are MLEs for ##M_1,M_2,...,M_n## respectively and f is a Random Variable of the ##M_i##, then the MLE for f is given by ##f(m_1,m_2,...,m_n)##. Is this correct? If so, is there a "standard" name for this property?

For context's sake, I am trying to find the MLE for an RV ##X/Y## (where ##Y>0##) by finding MLE(##X##) and MLE(##Y##), so that by inheritance MLE(##X/Y##) = MLE(##X##)/MLE(##Y##).
Thanks.
 
  • #2
What does "##m_i## is an MLE of ##M_i##" mean? Normally an MLE involves having some data, and the ##M_i## would be something like random variables depending on some parameter. Or are the ##M_i## the parameters of some implicit random variable here?
 
  • #3
Formally, ##m_i## is the value of the parameter of the distribution in question that maximizes the likelihood function. The likelihood function is the conditional density, conditioned on the observed random sample values. Hope I expressed myself clearly.
 
  • #4
Maybe easier to give examples. For the discrete case, say we have a population known to be Bernoulli and we take a random sample (random meaning independent and identically distributed) and obtain values ##x_1,x_2,..., x_n##. The likelihood function is then

##P(p \mid x_1, x_2, \ldots, x_n) = P(p \mid x_1)\, P(p \mid x_2) \cdots P(p \mid x_n)##
We then find the value of p that maximizes the probability of having observed the given sample values: differentiate with respect to the parameter p, set the derivative equal to 0, and solve for p. In the Bernoulli case, we find that the estimator

##\hat{p} := \frac{1}{n}\sum_{i=1}^n x_i##

is the maximum likelihood estimator for the population proportion ##p## in a Bernoulli population.
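As a sanity check on the algebra above, here is a minimal Python sketch (my own illustration, not from the thread; the simulated data and the grid are assumptions) that maximizes the Bernoulli log-likelihood numerically and compares the maximizer with the sample mean:

```python
# Minimal sketch: numerically maximize the Bernoulli log-likelihood and
# compare the maximizer with the closed-form MLE (the sample mean).
import numpy as np

rng = np.random.default_rng(1)
x = rng.binomial(1, 0.3, size=500)        # a Bernoulli(0.3) sample (illustrative)

p_grid = np.linspace(0.001, 0.999, 9999)  # candidate values of p
log_lik = x.sum() * np.log(p_grid) + (len(x) - x.sum()) * np.log(1.0 - p_grid)

p_hat_numeric = p_grid[np.argmax(log_lik)]
print("grid maximizer :", p_hat_numeric)
print("sample mean    :", x.mean())       # the closed-form MLE  p-hat
```

The grid maximizer and the sample mean agree up to the grid resolution.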
 
  • #5
WWGD said:
The likelihood function is then

##P(p \mid x_1, x_2, \ldots, x_n) = P(p \mid x_1)\, P(p \mid x_2) \cdots P(p \mid x_n)##

This notation is unclear. If ##P_i## is the parameter of the i-th Bernoulli random variable ##X_i##, then the notation ##P(p|x_i)## seems to indicate a conditional probability that ##P_i = p_i## given that ##X_i = x_i##. However, no prior distribution has been stated for the ##P_i##, so this conditional probability cannot be calculated.

WWGD said:
if ##m_1,m_2,..,m_n## are MLEs for ##M_1,M_2,...,M_n## respectively and f is a Random Variable of the ##M_i##, then the MLE for f is given by ##f(m_1,m_2,...,m_n)##.

In your example, you didn't say what the function ##f## is.

Assuming ##f## is a real-valued function, to compute a maximum likelihood estimate of ##f##, we must know the family ##\mathbb{F}_\Lambda## of joint distributions for the random variables ##X_i## that is parameterized by a single parameter ##\Lambda##. As I understand the question, it assumes we know the family ##\mathbb{F}_\overrightarrow{M}## of joint distributions for the ##X_i## that is parameterized by the ##n## parameters ##M_1,M_2,...,M_n##.

For an arbitrary function ##f(M_1,M_2,...,M_n)##, it isn't clear (to me) that ##\mathbb{F}_\Lambda## is uniquely determined just from knowing ##\mathbb{F}_\overrightarrow{M}## and also knowing that ##\lambda = f(m_1,m_2,...,m_n)##.
 
  • #6
For example, consider the case of two discrete random variables ##X_1,X_2## and the family of joint distributions ##\mathbb{F}_\overrightarrow{M}## given by

##g(X_1,X_2,M_1,M_2) = g_a(X_1,M_1)\, g_b(X_2,M_2)## with
##g_a( 0,-1) = 0.3##
##g_a( 1,-1) = 0.7##
##g_a( 0, 1) = 0.2##
##g_a( 1, 1) = 0.8##
##g_b( 0,-1) = 0.2##
##g_b( 1,-1) = 0.8##
##g_b( 0, 1) = 0.3##
##g_b( 1, 1) = 0.7##
and ##f(M_1, M_2) = (M_1)^2 + (M_2)^2 = \Lambda##

If the family ##\mathbb{F}_\Lambda## is given by the function ##h(X_1,X_2,\Lambda)##, what is the value of ##h(1,1,2)##?
 
  • #7
Stephen Tashi said:
For example, consider the case of two discrete random variables ##X_1,X_2## and the family of joint distributions ##\mathbb{F}_\overrightarrow{M}## given by

##g(X_1,X_2,M_1,M_2) = g_a(X_1,M_1)\, g_b(X_2,M_2)## with
##g_a( 0,-1) = 0.3##
##g_a( 1,-1) = 0.7##
##g_a( 0, 1) = 0.2##
##g_a( 1, 1) = 0.8##
##g_b( 0,-1) = 0.2##
##g_b( 1,-1) = 0.8##
##g_b( 0, 1) = 0.3##
##g_b( 1, 1) = 0.7##
and ##f(M_1, M_2) = (M_1)^2 + (M_2)^2 = \Lambda##

If the family ##\mathbb{F}_\Lambda## is given by the function ##h(X_1,X_2,\Lambda)##, what is the value of ##h(1,1,2)##?
But in ML estimation we are taking random samples, so that for any ##i \ne j##, ##X_i## and ##X_j## are independent.
 
  • #8
WWGD said:
But in ML estimation we are taking random samples, so that for any ##i \ne j##, ##X_i## and ##X_j## are independent.

In the example, ##X_1## and ##X_2## are independent.
 
  • #9
Stephen Tashi said:
This notation is unclear. If ##P_i## is the parameter of the i-th Bernoulli random variable ##X_i##, then the notation ##P(p|x_i)## seems to indicate a conditional probability that ##P_i = p_i## given that ##X_i = x_i##. However, no prior distribution has been stated for the ##P_i##, so this conditional probability cannot be calculated.
In your example, you didn't say what the function ##f## is.

Assuming ##f## is a real-valued function, to compute a maximum likelihood estimate of ##f##, we must know the family ##\mathbb{F}_\Lambda## of joint distributions for the random variables ##X_i## that is parameterized by a single parameter ##\Lambda##. As I understand the question, it assumes we know the family ##\mathbb{F}_\overrightarrow{M}## of joint distributions for the ##X_i## that is parameterized by the ##n## parameters ##M_1,M_2,...,M_n##.

For an arbitrary function ##f(M_1,M_2,...,M_n)##, it isn't clear (to me) that ##\mathbb{F}_\Lambda## is uniquely determined just from knowing ##\mathbb{F}_\overrightarrow{M}## and also knowing that ##\lambda = f(m_1,m_2,...,m_n)##.
No, it is a single population we're drawing from by assumption. We define a random sample ##X_1,..., X_n## as a collection of i.i.d. random variables. Since they are independent and identically distributed, they come from a single Bernoulli population with parameter p. By independence, the probability of observing ##X_1,X_2,.., X_n## from said population is

##P(p \mid X_1)\, P(p \mid X_2) \cdots P(p \mid X_n)##

This is defined to be the likelihood function associated with the sample ##X_1, X_2,.., X_n## and the value of p that maximizes the likelihood function is called the maximum likelihood estimator.
 
  • #10
WWGD said:
No, it is a single population we're drawing from by assumption. We define a random sample ##X_1,..., X_n## as a collection of i.i.d. random variables.
Your original post does not make that clear.

Since they are independent and identically distributed, they come from a single Bernoulli population with parameter p. By independence, the probability of observing ##X_1,X_2,.., X_n## from said population is

##P(p \mid X_1)\, P(p \mid X_2) \cdots P(p \mid X_n)##

That is a misleading use of the notation ##P(\cdot \mid \cdot)##, since such notation is used to indicate conditional probabilities.

If the family of probability distributions is ##g(x,p)##, then when the parameter is ##p##, the likelihood of observing the sequence ##(x_1,x_2,x_3)## is ##g(x_1,p)\, g(x_2,p)\, g(x_3,p)##.

This is defined to be the likelihood function associated with the sample and the value of p that maximizes the likelihood function is called the maximum likelihood estimator.
Yes. So, with reference to your original post, what does it mean when you ask whether "the MLE for ##f## is given by ##f(m_1,m_2,...,m_n)##" is correct?

For the concept of an MLE for ##f## to make sense, we need a family of probability distributions that is parameterized by values of ##f##. If we have such distributions, then we ask what value of ##f## maximizes the probability of the observed data.

For an arbitrary function ##f##, the given distribution ##g(x,p)## may not be sufficient to determine a family of distributions that is parameterized by the value of ##f##.

Perhaps you intend the notation ##f(m_1,m_2,...,m_n)## to mean that the ##X_i## are identically distributed, that the common distribution belongs to a single-parameter family of distributions of the form ##g(x,p)##, and that ##m_i## is the maximum likelihood estimate for ##p## based only on the ##i##-th outcome ##x_i## of ##X_i##.

If that's what your notation means, then how does it make sense to speak of an "MLE for ##f##"? We aren't given a family of distributions for the ##X_i## that is parameterized by a single parameter that is interpreted as a value of ##f##.

For example, in the case of Bernoulli random variables, suppose ##f(m_1,m_2,m_3) = (m_1)^2 + (m_2)^2 + 6 m_3##. Then ##f## can take on values in ##[0,8]##. What family of distributions defines the joint distribution of ##(X_1,X_2,X_3)## given the value of ##f##?
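A tiny numeric illustration of this objection (my own sketch; the particular parameter triples are assumptions chosen for the example): two different Bernoulli parameter triples can give the same value of ##f## yet assign different probabilities to the same observation, so the value of ##f## alone does not pick out a single joint distribution.

```python
# Two parameter triples with the same f, but different joint distributions.
def f(m1, m2, m3):
    # The function from the post above: f(m1, m2, m3) = m1^2 + m2^2 + 6*m3.
    return m1 ** 2 + m2 ** 2 + 6 * m3

triples = [(1.0, 1.0, 0.5), (0.0, 0.0, 5.0 / 6.0)]   # both give f = 5
for m1, m2, m3 in triples:
    prob_111 = m1 * m2 * m3   # P(X1=1, X2=1, X3=1) for independent Bernoullis
    print(f"f = {f(m1, m2, m3):.3f}   P(1,1,1) = {prob_111:.3f}")
```

Both triples give ##f = 5##, yet the probability of observing ##(1,1,1)## is 0.5 under the first and 0 under the second.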
 
  • #11
For what it's worth, my guess is that the actual question in context looks something like this:

##X## and ##Y## are Gaussians with means ##x## and ##y##, and standard deviation 1. I have data ##x_1,...,x_n## for ##X## and ##y_1,...,y_n## for ##Y##. This gives me an MLE for ##x## and ##y##. Does the assumed distribution of ##X/Y## as a ratio of Gaussians, together with the data ##x_1/y_1,...,x_n/y_n##, give me an MLE of ##x/y## that is equal to the ratio of the estimates I got before?

Does this look like what you actually care about?
 
  • #12
Office_Shredder said:
For what it's worth, my guess is that the actual question in context looks something like this:

##X## and ##Y## are Gaussians with means ##x## and ##y##, and standard deviation 1. I have data ##x_1,...,x_n## for ##X## and ##y_1,...,y_n## for ##Y##. This gives me an MLE for ##x## and ##y##. Does the assumed distribution of ##X/Y## as a ratio of Gaussians, together with the data ##x_1/y_1,...,x_n/y_n##, give me an MLE of ##x/y## that is equal to the ratio of the estimates I got before?

Does this look like what you actually care about?
Yes, that's it, but for any function of ##X, Y##. E.g., if ##x, y## are MLEs for ##X, Y##, is ##x^2+y^2## an MLE for ##X^2+Y^2##?
 
  • #13
Office_Shredder said:
For what it's worth, my guess is that the actual question in context looks something like this:

##X## and ##Y## are Gaussians with means ##x## and ##y##, and standard deviation 1. I have data ##x_1,...,x_n## for ##X## and ##y_1,...,y_n## for ##Y##. This gives me an MLE for ##x## and ##y##. Does the assumed distribution of ##X/Y## as a ratio of Gaussians, together with the data ##x_1/y_1,...,x_n/y_n##, give me an MLE of ##x/y## that is equal to the ratio of the estimates I got before?

Does this look like what you actually care about?

And, in that case, we are not making a maximum likelihood estimate of the function ##f(X,Y) = X/Y##. We are making a maximum likelihood estimate of the parameters ##x,y## with respect to the data and the family of two-parameter distributions for the random variable ##f(X,Y)##.

In the above situation, shall we assume we have data for the ##x_i,y_i## values individually? Or do we only know the values of the ratios? [Edit: A post that appeared while I was composing this says we know the individual ##x_i,y_i## values.]
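Here is a minimal numeric sketch of that scenario (my own illustration; I assume ##X## and ##Y## are independent unit-variance Gaussians and that the individual ##x_i, y_i## are observed): the likelihood is profiled over the ratio ##r = \mu_X/\mu_Y##, and the maximizer is compared with ##\bar{x}/\bar{y}##, which is what the invariance property predicts.

```python
# Profile the Gaussian likelihood over the ratio r = mu_x / mu_y and compare
# the maximizer with xbar / ybar.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
n = 200
x = rng.normal(2.0, 1.0, n)   # X ~ N(mu_x = 2, 1)  (illustrative data)
y = rng.normal(4.0, 1.0, n)   # Y ~ N(mu_y = 4, 1)

def profile_neg_loglik(r):
    # For fixed r, the inner maximum over mu_y has a closed form:
    # mu_y*(r) = (r * sum(x) + sum(y)) / (n * (r^2 + 1)), with mu_x = r * mu_y.
    mu_y_hat = (r * x.sum() + y.sum()) / (n * (r ** 2 + 1.0))
    mu_x_hat = r * mu_y_hat
    return 0.5 * np.sum((x - mu_x_hat) ** 2) + 0.5 * np.sum((y - mu_y_hat) ** 2)

res = minimize_scalar(profile_neg_loglik, bounds=(0.01, 10.0), method="bounded")
print("profile MLE of r :", res.x)
print("xbar / ybar      :", x.mean() / y.mean())
```

Up to optimizer tolerance, the profile maximizer equals ##\bar{x}/\bar{y}##, since the unconstrained maximum of the joint likelihood at ##(\bar{x},\bar{y})## is attainable under the constraint ##\mu_X = r\,\mu_Y## exactly when ##r = \bar{x}/\bar{y}##.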
 
  • #14
WWGD said:
Yes, that's it, but for any function of ##X, Y##. E.g., if ##x, y## are MLEs for ##X, Y##, is ##x^2+y^2## an MLE for ##X^2+Y^2##?

Are we assuming ##X## and ##Y## are independent random variables?
 
  • #15
Stephen Tashi said:
Are we assuming ##X## and ##Y## are independent random variables?
Not necessarily. The observations themselves are a random sample (independent and identically distributed), but the variables ##X, Y## are not necessarily independent of each other.
 
  • #16
Ok, I think I found the result I was looking for:

[Attached image: a textbook page stating the invariance property of maximum likelihood estimators.]
 
  • #17
Thanks, all, for your contributions. Sorry if I was unclear. My friend loaned me his book.

As an example, if ##\hat{V}## is the maximum likelihood estimator for the population variance, then its square root ##\sqrt{\hat{V}}## is the MLE for the population standard deviation.
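A quick numeric check of this example (my own sketch, assuming a normal sample): the MLE of the variance is the "1/n" sample variance, and maximizing the likelihood directly over ##\sigma## returns its square root.

```python
# Compare the square root of the variance MLE with the sigma that maximizes
# the normal likelihood directly (mu fixed at its MLE, the sample mean).
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
x = rng.normal(10.0, 3.0, 500)            # illustrative normal sample

xbar = x.mean()
v_hat = np.mean((x - xbar) ** 2)          # MLE of the variance (1/n form)

def neg_loglik_sigma(sigma):
    # Profile negative log-likelihood in sigma, with mu set to its MLE xbar.
    return len(x) * np.log(sigma) + np.sum((x - xbar) ** 2) / (2.0 * sigma ** 2)

res = minimize_scalar(neg_loglik_sigma, bounds=(1e-3, 100.0), method="bounded")
print("MLE of sigma (numeric) :", res.x)
print("sqrt of variance MLE   :", np.sqrt(v_hat))
```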
 
Last edited:
  • #18
WWGD said:
Not necessarily. The observations themselves are a random sample (independent and identically distributed), but the variables ##X, Y## are not necessarily independent of each other.

Ok. We still need to state the question precisely. For example, suppose I have the observations ##(x_i, y_i)## and want to estimate the parameters ##M_1,M_2## of the two-parameter joint distribution of the random tuple ##(X_i,Y_i)##. The maximum likelihood estimate for ##M_1,M_2## is, by definition, the pair of values ##m_1,m_2## that maximizes the likelihood of the data using the joint distribution that has those parameters.

The maximum likelihood estimation of ##M_1,M_2## using the joint distribution is a different process than making a maximum likelihood estimate of ##M_1## using the marginal distribution of ##X## and a maximum likelihood estimate of ##M_2## using the marginal distribution of ##Y##. And it may not be possible to get unique maximum likelihood estimates that way if ##X## and ##Y## are not independent.

How are we going to introduce a function ##f(X,Y)## into this scenario? The function ##f(X,Y)## is a random variable that has a two-parameter distribution also determined by the parameters ##M_1,M_2##. So we can define the maximum likelihood estimate of ##M_1,M_2## given data of the form ##f(x_i,y_i)##. But if we are also given the individual values ##(x_i,y_i)##, then the maximum likelihood estimate of ##M_1,M_2## based on both forms of the data is (also by definition) a different process than making a maximum likelihood estimate of ##M_1,M_2## based only on the data ##f_i = f(x_i,y_i)##.

Furthermore, the distribution of ##f## is not necessarily a single-parameter distribution. To claim that the maximum likelihood estimate for the parameter of ##f## is ##f(m_1,m_2)## assumes that we are dealing with a one-parameter distribution.

To make your question precise, you should clarify whether the joint distribution of ##(X_i,Y_i)## is assumed to be a single-parameter distribution. If it is a single-parameter distribution, then we will assume the distribution of ##f(X,Y)## is also a single-parameter distribution. The next point to clarify is whether the maximum likelihood estimate that involves ##f(X,Y)## is made using only the data ##f_i = f(x_i,y_i)## or whether the estimate also uses the ##(x_i,y_i)## data.
 
  • #19
WWGD said:
Ok, I think I found the result I was looking for

That clears up the question.

For example, suppose ##g(X,Y,M_1,M_2)## is a family of joint distributions for ##(X_i,Y_i)## and ##h( X,Y,h_1(M_1), h_2(M_2))## is a different way of expressing the same family of distributions.

Then if ##(m_1,m_2)## is the maximum likelihood estimate for ##M_1,M_2## based on some data, then ##(h_1(m_1), h_2(m_2))## is the maximum likelihood estimate for ##(h_1(M_1), h_2(M_2))## based on the same data.

The language used in that page:
the maximum likelihood estimate of any function ##h(\theta_1, \theta_2,..,\theta_k)##
only makes sense if ##h## is a parameter of some distribution; otherwise, what it means to make a maximum likelihood estimate of ##h## is undefined.

And the claim (the Invariance Property) relies on the fact that the family of distributions parameterized by the ##\theta_i## is the same family of distributions as the family parameterized by the values ##h(\theta_1,\ldots,\theta_k)## (or perhaps a more general relation between two families of distributions is sufficient: the two might describe different random variables, but we can interpret one set of data as applying to both, and each distribution in one family mutually implies a distribution in the other. It's complicated to state such conditions precisely!)
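For a one-to-one reparameterization, the argument can be written in one line; the following is my own sketch in generic notation, not a quote from the posted page. If ##\eta = h(\theta)## is invertible and we define the induced likelihood ##L^*(\eta) = L(h^{-1}(\eta))##, then

$$L^*\bigl(h(\hat\theta)\bigr) = L(\hat\theta) \ge L(\theta) = L^*\bigl(h(\theta)\bigr) \quad \text{for every } \theta,$$

so ##\hat\eta = h(\hat\theta)## maximizes ##L^*##. When ##h## is not one-to-one, the same conclusion holds if ##L^*(\eta)## is defined as ##\sup_{\theta\,:\,h(\theta)=\eta} L(\theta)##.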
 
Last edited:
  • #20
Update:
Apologies to all for mixing things up. I was mixing terms in my head and made some edits in my first post in case you're interested. For the tl;dr: we are dealing with random variables as estimators for population parameters; specifically, given a random sample {##X_1,...,X_n##}, we are looking for values or general formulas that maximize the likelihood of observing the given sample.
 

1. What is distributivity of maximum likelihood estimators?

In this thread, "distributivity" refers to what is usually called the invariance (or functional invariance) property of maximum likelihood estimators: if ##\hat\theta_1,\ldots,\hat\theta_k## are the MLEs of the parameters ##\theta_1,\ldots,\theta_k##, then the MLE of a function ##h(\theta_1,\ldots,\theta_k)## is ##h(\hat\theta_1,\ldots,\hat\theta_k)##.

2. How does distributivity affect the accuracy of maximum likelihood estimators?

The invariance property does not change the large-sample behaviour of the estimator: the transformed estimator ##h(\hat\theta)## inherits the consistency and asymptotic normality of the MLE. Finite-sample properties such as unbiasedness, however, are generally not preserved under nonlinear transformations.

3. Can distributivity be applied to all types of maximum likelihood estimators?

Yes, the invariance property applies to MLEs for both discrete and continuous distributions. When the function of the parameters is not one-to-one, the MLE of ##h(\theta)## is defined through the induced likelihood, but the conclusion that it equals ##h(\hat\theta)## still holds.

4. What is inheritance of maximum likelihood estimators?

"Inheritance" is the informal name used in this thread for the same invariance property: the MLE of a function of the parameters is obtained by applying that function to the MLEs of the parameters, for example taking the square root of the variance MLE to obtain the MLE of the standard deviation.

5. How does inheritance affect the computational complexity of maximum likelihood estimators?

It usually reduces the work: once the MLEs of the underlying parameters have been computed, the MLE of any function of them is obtained simply by evaluating that function, with no additional optimization over the data required.
