Distributivity/Inheritance of Max Likelihood Estimators

In summary, the Maximum Likelihood Estimator (MLE) has an "inheritance" property, usually called the invariance property: if ##m_1,m_2,\ldots,m_n## are MLEs for the parameters ##M_1,M_2,\ldots,M_n## respectively and ##f## is a function of the ##M_i##, then the MLE for ##f(M_1,M_2,\ldots,M_n)## is ##f(m_1,m_2,\ldots,m_n)##. This is commonly used to find the MLE of a derived quantity by first finding the MLE of each underlying parameter and then applying the function to those estimates. The likelihood function gives the probability of observing the sample as a function of the parameter, and the value of the parameter that maximizes it is the maximum likelihood estimate.
  • #1
WWGD
Hi,
IIRC, Maximum Likelihood Estimators (MLEs) satisfy an "inheritance" property, so that if ##m_1,m_2,..,m_n## are MLEs for ##M_1,M_2,...,M_n## respectively and f is a Random Variable of the ##M_i##, then the MLE for f is given by ##f(m_1,m_2,...,m_n)##. Is this correct? If so, is there a "standard" name for this property?

For context's sake, I am trying to find the MLE for an RV ##X/Y## (where ##Y>0##) by finding MLE(##X##) and MLE(##Y##), so that by inheritance MLE(##X/Y##) = MLE(##X##)/MLE(##Y##).
Thanks.
 
  • #2
What does "##m_i## is an MLE of ##M_i##" mean? Normally an MLE involves having some data, and the ##M_i## would be something like random variables depending on some parameter. Or are the ##M_i## the parameters of some implicit random variable here?
 
  • #3
Formally, ##m_i## is the value of the parameter of the distribution in question that maximizes the likelihood function. The likelihood function is the conditional density, conditioned on the observed random sample values. Hope I expressed myself clearly.
 
  • #4
Maybe easier to give examples. For the discrete case, say we have a population known to be Bernoulli and we take a random sample (random meaning independent and identically distributed) and obtain values ##x_1,x_2,..., x_n##. The likelihood function is then

##P(p \mid x_1, x_2, \ldots, x_n) = P(p \mid x_1)\, P(p \mid x_2) \cdots P(p \mid x_n)##
We then find the value of p that maximizes the probability of having observed the given sample values: differentiate with respect to the parameter p, set the derivative equal to 0, and solve for p. In the Bernoulli case, we find that the estimator

##\hat{p} := \frac{1}{n}\sum_{i=1}^n x_i##

is the maximum likelihood estimator for the population proportion ##p## in a Bernoulli population.
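As a sanity check on the algebra above, here is a minimal Python sketch (my own illustration, not from the thread; the simulated data and the grid are assumptions) that maximizes the Bernoulli log-likelihood numerically and compares the maximizer with the sample mean:

```python
# Minimal sketch: numerically maximize the Bernoulli log-likelihood and
# compare the maximizer with the closed-form MLE (the sample mean).
import numpy as np

rng = np.random.default_rng(1)
x = rng.binomial(1, 0.3, size=500)        # a Bernoulli(0.3) sample (illustrative)

p_grid = np.linspace(0.001, 0.999, 9999)  # candidate values of p
log_lik = x.sum() * np.log(p_grid) + (len(x) - x.sum()) * np.log(1.0 - p_grid)

p_hat_numeric = p_grid[np.argmax(log_lik)]
print("grid maximizer :", p_hat_numeric)
print("sample mean    :", x.mean())       # the closed-form MLE  p-hat
```

The grid maximizer and the sample mean agree up to the grid resolution.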
 
  • #5
WWGD said:
The likelihood function is then

##P(p \mid x_1, x_2, \ldots, x_n) = P(p \mid x_1)\, P(p \mid x_2) \cdots P(p \mid x_n)##

This notation is unclear. If ##P_i## is the parameter of the i-th Bernoulli random variable ##X_i##, then the notation ##P(p|x_i)## seems to indicate a conditional probability that ##P_i = p_i## given that ##X_i = x_i##. However, no prior distribution has been stated for the ##P_i##, so this conditional probability cannot be calculated.

WWGD said:
if ##m_1,m_2,..,m_n## are MLEs for ##M_1,M_2,...,M_n## respectively and f is a Random Variable of the ##M_i##, then the MLE for f is given by ##f(m_1,m_2,...,m_n)##.

In your example, you didn't say what the function ##f## is.

Assuming ##f## is a real-valued function, to compute a maximum likelihood estimate of ##f##, we must know the family ##\mathbb{F}_\Lambda## of joint distributions for the random variables ##X_i## that is parameterized by a single parameter ##\Lambda##. As I understand the question, it assumes we know the family ##\mathbb{F}_\overrightarrow{M}## of joint distributions for the ##X_i## that is parameterized by the ##n## parameters ##M_1,M_2,...,M_n##.

For an arbitrary function ##f(M_1,M_2,...,M_n)##, it isn't clear (to me) that ##\mathbb{F}_\Lambda## is uniquely determined just from knowing ##\mathbb{F}_\overrightarrow{M}## and also knowing that ##\lambda = f(m_1,m_2,...,m_n)##.
 
  • #6
For example, consider the case of two discrete random variables ##X_1,X_2## and the family of joint distributions ##\mathbb{F}_\overrightarrow{M}## given by

##g(X_1,X_2,M_1,M_2) = g_a(X_1,M_1)\, g_b(X_2,M_2)## with
##g_a( 0,-1) = 0.3##
##g_a( 1,-1) = 0.7##
##g_a( 0, 1) = 0.2##
##g_a( 1, 1) = 0.8##
##g_b( 0,-1) = 0.2##
##g_b( 1,-1) = 0.8##
##g_b( 0, 1) = 0.3##
##g_b( 1, 1) = 0.7##
and ##f(M_1, M_2) = (M_1)^2 + (M_2)^2 = \Lambda##

If the family ##\mathbb{F}_\Lambda## is given by the function ##h(X_1,X_2,\Lambda)##, what is the value of ##h(1,1,2)##?
 
  • #7
Stephen Tashi said:
For example, consider the case of two discrete random variables ##X_1,X_2## and the family of joint distributions ##\mathbb{F}_\overrightarrow{M}## given by

##g(X_1,X_2,M_1,M_2) = g_a(X_1,M_1)\, g_b(X_2,M_2)## with
##g_a( 0,-1) = 0.3##
##g_a( 1,-1) = 0.7##
##g_a( 0, 1) = 0.2##
##g_a( 1, 1) = 0.8##
##g_b( 0,-1) = 0.2##
##g_b( 1,-1) = 0.8##
##g_b( 0, 1) = 0.3##
##g_b( 1, 1) = 0.7##
and ##f(M_1, M_2) = (M_1)^2 + (M_2)^2 = \Lambda##

If the family ##\mathbb{F}_\Lambda## is given by the function ##h(X_1,X_2,\Lambda)##, what is the value of ##h(1,1,2)##?
But in ML estimation we are taking random samples, so that for any ##i \ne j##, ##X_i## and ##X_j## are independent.
 
  • #8
WWGD said:
But in ML estimation we are taking random samples, so that for any ##i \ne j##, ##X_i## and ##X_j## are independent.

In the example, ##X_1## and ##X_2## are independent.
 
  • #9
Stephen Tashi said:
This notation is unclear. If ##P_i## is the parameter of the i-th Bernoulli random variable ##X_i##, then the notation ##P(p|x_i)## seems to indicate a conditional probability that ##P_i = p_i## given that ##X_i = x_i##. However, no prior distribution has been stated for the ##P_i##, so this conditional probability cannot be calculated.
In your example, you didn't say what the function ##f## is.

Assuming ##f## is a real-valued function, to compute a maximum likelihood estimate of ##f##, we must know the family ##\mathbb{F}_\Lambda## of joint distributions for the random variables ##X_i## that is parameterized by a single parameter ##\Lambda##. As I understand the question, it assumes we know the family ##\mathbb{F}_\overrightarrow{M}## of joint distributions for the ##X_i## that is parameterized by the ##n## parameters ##M_1,M_2,...,M_n##.

For an arbitrary function ##f(M_1,M_2,...,M_n)##, it isn't clear (to me) that ##\mathbb{F}_\Lambda## is uniquely determined just from knowing ##\mathbb{F}_\overrightarrow{M}## and also knowing that ##\lambda = f(m_1,m_2,...,m_n)##.
No, it is a single population we're drawing from by assumption. We define a random sample ##X_1,..., X_n## as a collection of i.i.d. random variables. Since they are independent and identically distributed, they come from a single Bernoulli population with parameter p. By independence, the probability of observing ##X_1,X_2,.., X_n## from said population is

##P(p \mid X_1)\, P(p \mid X_2) \cdots P(p \mid X_n)##

This is defined to be the likelihood function associated with the sample ##X_1, X_2,.., X_n## and the value of p that maximizes the likelihood function is called the maximum likelihood estimator.
 
  • #10
WWGD said:
No, it is a single population we're drawing from by assumption. We define a random sample ##X_1,..., X_n## as a collection of i.i.d. random variables.
Your original post does not make that clear.

Since they are independent and identically distributed, they come from a single Bernoulli population with parameter p. By independence, the probability of observing ##X_1,X_2,.., X_n## from said population is

##P(p \mid X_1)\, P(p \mid X_2) \cdots P(p \mid X_n)##

That is a misleading use of the notation ##P(\cdot \mid \cdot)##, since such notation is used to indicate conditional probabilities.

If the family of probability distributions is ##g(x,p)##, then when the parameter is ##p##, the likelihood of observing the sequence ##(x_1,x_2,x_3)## is ##g(x_1,p)\, g(x_2,p)\, g(x_3,p)##.

This is defined to be the likelihood function associated with the sample and the value of p that maximizes the likelihood function is called the maximum likelihood estimator.
Yes. So, with reference to your original post, what does it mean when you ask whether "the MLE for ##f## is given by ##f(m_1,m_2,...,m_n)##" is correct?

For the concept of an MLE for ##f## to make sense, we need a family of probability distributions that is parameterized by values of ##f##. If we have such distributions, then we ask what value of ##f## maximizes the probability of the observed data.

For an arbitrary function ##f##, the given distribution ##g(x,p)## may not be sufficient to determine a family of distributions that is parameterized by the value of ##f##.

Perhaps you intend the notation ##f(m_1,m_2,...,m_n)## to mean that the ##X_i## are identically distributed, that the common distribution belongs to a single-parameter family of distributions of the form ##g(x,p)##, and that ##m_i## is the maximum likelihood estimate for ##p## based only on the ##i##-th outcome ##x_i## of ##X_i##.

If that's what your notation means, then how does it make sense to speak of an "MLE for ##f##"? We aren't given a family of distributions for the ##X_i## that is parameterized by a single parameter that is interpreted as a value of ##f##.

For example, in the case of Bernoulli random variables, suppose ##f(m_1,m_2,m_3) = (m_1)^2 + (m_2)^2 + 6 m_3##. Then ##f## can take on values in ##[0,8]##. What family of distributions defines the joint distribution of ##(X_1,X_2,X_3)## given the value of ##f##?
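A tiny numeric illustration of this objection (my own sketch; the particular parameter triples are assumptions chosen for the example): two different Bernoulli parameter triples can give the same value of ##f## yet assign different probabilities to the same observation, so the value of ##f## alone does not pick out a single joint distribution.

```python
# Two parameter triples with the same f, but different joint distributions.
def f(m1, m2, m3):
    # The function from the post above: f(m1, m2, m3) = m1^2 + m2^2 + 6*m3.
    return m1 ** 2 + m2 ** 2 + 6 * m3

triples = [(1.0, 1.0, 0.5), (0.0, 0.0, 5.0 / 6.0)]   # both give f = 5
for m1, m2, m3 in triples:
    prob_111 = m1 * m2 * m3   # P(X1=1, X2=1, X3=1) for independent Bernoullis
    print(f"f = {f(m1, m2, m3):.3f}   P(1,1,1) = {prob_111:.3f}")
```

Both triples give ##f = 5##, yet the probability of observing ##(1,1,1)## is 0.5 under the first and 0 under the second.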
 
  • #11
For what it's worth, my guess is that the actual question in context looks something like this:

##X## and ##Y## are Gaussians with means ##x## and ##y##, and standard deviation 1. I have data ##x_1,...,x_n## for ##X## and ##y_1,...,y_n## for ##Y##. This gives me an MLE for ##x## and ##y##. Does the assumed distribution of ##X/Y## as a ratio of Gaussians, together with the data ##x_1/y_1,...,x_n/y_n##, give me an MLE of ##x/y## that is equal to the ratio of the estimates I got before?

Does this look like what you actually care about?
 
  • #12
Office_Shredder said:
For what it's worth, my guess is that the actual question in context looks something like this:

##X## and ##Y## are Gaussians with means ##x## and ##y##, and standard deviation 1. I have data ##x_1,...,x_n## for ##X## and ##y_1,...,y_n## for ##Y##. This gives me an MLE for ##x## and ##y##. Does the assumed distribution of ##X/Y## as a ratio of Gaussians, together with the data ##x_1/y_1,...,x_n/y_n##, give me an MLE of ##x/y## that is equal to the ratio of the estimates I got before?

Does this look like what you actually care about?
Yes, that's it, but for any function of ##X, Y##. E.g., if ##x, y## are MLEs for ##X, Y##, is ##x^2+y^2## an MLE for ##X^2+Y^2##?
 
  • #13
Office_Shredder said:
For what it's worth, my guess is that the actual question in context looks something like this:

##X## and ##Y## are Gaussians with means ##x## and ##y##, and standard deviation 1. I have data ##x_1,...,x_n## for ##X## and ##y_1,...,y_n## for ##Y##. This gives me an MLE for ##x## and ##y##. Does the assumed distribution of ##X/Y## as a ratio of Gaussians, together with the data ##x_1/y_1,...,x_n/y_n##, give me an MLE of ##x/y## that is equal to the ratio of the estimates I got before?

Does this look like what you actually care about?

And, in that case, we are not making a maximum likelihood estimate of the function ##f(X,Y) = X/Y##. We are making a maximum likelihood estimate of the parameters ##x,y## with respect to the data and the family of two-parameter distributions for the random variable ##f(X,Y)##.

In the above situation, shall we assume we have data for the ##x_i,y_i## values individually? Or do we only know the values of the ratios? [Edit: A post that appeared while I was composing this says we know the individual ##x_i,y_i## values.]
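Here is a minimal numeric sketch of that scenario (my own illustration; I assume ##X## and ##Y## are independent unit-variance Gaussians and that the individual ##x_i, y_i## are observed): the likelihood is profiled over the ratio ##r = \mu_X/\mu_Y##, and the maximizer is compared with ##\bar{x}/\bar{y}##, which is what the invariance property predicts.

```python
# Profile the Gaussian likelihood over the ratio r = mu_x / mu_y and compare
# the maximizer with xbar / ybar.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
n = 200
x = rng.normal(2.0, 1.0, n)   # X ~ N(mu_x = 2, 1)  (illustrative data)
y = rng.normal(4.0, 1.0, n)   # Y ~ N(mu_y = 4, 1)

def profile_neg_loglik(r):
    # For fixed r, the inner maximum over mu_y has a closed form:
    # mu_y*(r) = (r * sum(x) + sum(y)) / (n * (r^2 + 1)), with mu_x = r * mu_y.
    mu_y_hat = (r * x.sum() + y.sum()) / (n * (r ** 2 + 1.0))
    mu_x_hat = r * mu_y_hat
    return 0.5 * np.sum((x - mu_x_hat) ** 2) + 0.5 * np.sum((y - mu_y_hat) ** 2)

res = minimize_scalar(profile_neg_loglik, bounds=(0.01, 10.0), method="bounded")
print("profile MLE of r :", res.x)
print("xbar / ybar      :", x.mean() / y.mean())
```

Up to optimizer tolerance, the profile maximizer equals ##\bar{x}/\bar{y}##, since the unconstrained maximum of the joint likelihood at ##(\bar{x},\bar{y})## is attainable under the constraint ##\mu_X = r\,\mu_Y## exactly when ##r = \bar{x}/\bar{y}##.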
 
  • #14
WWGD said:
Yes, that's it, but for any function of ##X, Y##. E.g., if ##x, y## are MLEs for ##X, Y##, is ##x^2+y^2## an MLE for ##X^2+Y^2##?

Are we assuming ##X## and ##Y## are independent random variables?
 
  • #15
Stephen Tashi said:
Are we assuming ##X## and ##Y## are independent random variables?
Not necessarily. The observations themselves are a random sample (independent and identically distributed), but the variables ##X, Y## are not necessarily independent of each other.
 
  • #16
Ok, I think I found the result I was looking for:

[Attached image: a textbook page stating the invariance property of maximum likelihood estimators.]
 
  • #17
Thanks, all, for your contributions. Sorry if I was unclear. My friend loaned me his book.

As an example, if ##\hat{V}## is the maximum likelihood estimator for the population variance, then its square root ##\sqrt{\hat{V}}## is the MLE for the population standard deviation.
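A quick numeric check of this example (my own sketch, assuming a normal sample): the MLE of the variance is the "1/n" sample variance, and maximizing the likelihood directly over ##\sigma## returns its square root.

```python
# Compare the square root of the variance MLE with the sigma that maximizes
# the normal likelihood directly (mu fixed at its MLE, the sample mean).
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
x = rng.normal(10.0, 3.0, 500)            # illustrative normal sample

xbar = x.mean()
v_hat = np.mean((x - xbar) ** 2)          # MLE of the variance (1/n form)

def neg_loglik_sigma(sigma):
    # Profile negative log-likelihood in sigma, with mu set to its MLE xbar.
    return len(x) * np.log(sigma) + np.sum((x - xbar) ** 2) / (2.0 * sigma ** 2)

res = minimize_scalar(neg_loglik_sigma, bounds=(1e-3, 100.0), method="bounded")
print("MLE of sigma (numeric) :", res.x)
print("sqrt of variance MLE   :", np.sqrt(v_hat))
```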
 
Last edited:
  • #18
WWGD said:
Not necessarily. The observations themselves are a random sample (independent and identically distributed), but the variables ##X, Y## are not necessarily independent of each other.

Ok. We still need to state the question precisely. For example, suppose I have the observations ##(x_i, y_i)## and want to estimate the parameters ##M_1,M_2## of the two-parameter joint distribution of the random tuple ##(X_i,Y_i)##. The maximum likelihood estimate for ##M_1,M_2## is, by definition, the pair of values ##m_1,m_2## that maximizes the likelihood of the data using the joint distribution that has those parameters.

The maximum likelihood estimation of ##M_1,M_2## using the joint distribution is a different process than making a maximum likelihood estimate of ##M_1## using the marginal distribution of ##X## and a maximum likelihood estimate of ##M_2## using the marginal distribution of ##Y##. And it may not be possible to get unique maximum likelihood estimates that way if ##X## and ##Y## are not independent.

How are we going to introduce a function ##f(X,Y)## into this scenario? The function ##f(X,Y)## is a random variable that has a two-parameter distribution also determined by the parameters ##M_1,M_2##. So we can define the maximum likelihood estimate of ##M_1,M_2## given data of the form ##f(x_i,y_i)##. But if we are also given the individual values ##(x_i,y_i)##, then the maximum likelihood estimate of ##M_1,M_2## based on both forms of the data is (also by definition) a different process than making a maximum likelihood estimate of ##M_1,M_2## based only on the data ##f_i = f(x_i,y_i)##.

Furthermore, the distribution of ##f## is not necessarily a single-parameter distribution. To claim that the maximum likelihood estimate for the parameter of ##f## is ##f(m_1,m_2)## assumes that we are dealing with a one-parameter distribution.

To make your question precise, you should clarify whether the joint distribution of ##(X_i,Y_i)## is assumed to be a single-parameter distribution. If it is a single-parameter distribution, then we will assume the distribution of ##f(X,Y)## is also a single-parameter distribution. The next point to clarify is whether the maximum likelihood estimate that involves ##f(X,Y)## is made using only the data ##f_i = f(x_i,y_i)## or whether the estimate also uses the ##(x_i,y_i)## data.
 
  • #19
WWGD said:
Ok, I think I found the result I was looking for

That clears up the question.

For example, suppose ##g(X,Y,M_1,M_2)## is a family of joint distributions for ##(X_i,Y_i)## and ##h( X,Y,h_1(M_1), h_2(M_2))## is a different way of expressing the same family of distributions.

Then if ##(m_1,m_2)## is the maximum likelihood estimate for ##M_1,M_2## based on some data, then ##(h_1(m_1), h_2(m_2))## is the maximum likelihood estimate for ##(h_1(M_1), h_2(M_2))## based on the same data.

The language used in that page:
the maximum likelihood estimate of any function ##h(\theta_1, \theta_2,..,\theta_k)##
only makes sense if ##h## is a parameter of some distribution; otherwise, what it means to make a maximum likelihood estimate of ##h## is undefined.

And the claim (the Invariance Property) relies on the fact that the family of distributions parameterized by the ##\theta_i## is the same family of distributions as the family parameterized by the values ##h(\theta_1,\ldots,\theta_k)## (or perhaps a more general relation between two families of distributions is sufficient: the two might describe different random variables, but we can interpret one set of data as applying to both, and each distribution in one family mutually implies a distribution in the other. It's complicated to state such conditions precisely!)
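For a one-to-one reparameterization, the argument can be written in one line; the following is my own sketch in generic notation, not a quote from the posted page. If ##\eta = h(\theta)## is invertible and we define the induced likelihood ##L^*(\eta) = L(h^{-1}(\eta))##, then

$$L^*\bigl(h(\hat\theta)\bigr) = L(\hat\theta) \ge L(\theta) = L^*\bigl(h(\theta)\bigr) \quad \text{for every } \theta,$$

so ##\hat\eta = h(\hat\theta)## maximizes ##L^*##. When ##h## is not one-to-one, the same conclusion holds if ##L^*(\eta)## is defined as ##\sup_{\theta\,:\,h(\theta)=\eta} L(\theta)##.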
 
Last edited:
  • #20
Update:
Apologies to all for mixing things up. I was mixing terms in my head and made some edits in my first post in case you're interested. For the tl;dr: we are dealing with random variables as estimators for population parameters; specifically, given a random sample {##X_1,...,X_n##}, we are looking for values or general formulas that maximize the likelihood of observing the given sample.
 

1. What is distributivity of maximum likelihood estimators?

In this thread, "distributivity" refers to what is usually called the invariance (or functional invariance) property of maximum likelihood estimators: if ##\hat\theta_1,\ldots,\hat\theta_k## are the MLEs of the parameters ##\theta_1,\ldots,\theta_k##, then the MLE of a function ##h(\theta_1,\ldots,\theta_k)## is ##h(\hat\theta_1,\ldots,\hat\theta_k)##.

2. How does distributivity affect the accuracy of maximum likelihood estimators?

The invariance property does not change the large-sample behaviour of the estimator: the transformed estimator ##h(\hat\theta)## inherits the consistency and asymptotic normality of the MLE. Finite-sample properties such as unbiasedness, however, are generally not preserved under nonlinear transformations.

3. Can distributivity be applied to all types of maximum likelihood estimators?

Yes, the invariance property applies to MLEs for both discrete and continuous distributions. When the function of the parameters is not one-to-one, the MLE of ##h(\theta)## is defined through the induced likelihood, but the conclusion that it equals ##h(\hat\theta)## still holds.

4. What is inheritance of maximum likelihood estimators?

"Inheritance" is the informal name used in this thread for the same invariance property: the MLE of a function of the parameters is obtained by applying that function to the MLEs of the parameters, for example taking the square root of the variance MLE to obtain the MLE of the standard deviation.

5. How does inheritance affect the computational complexity of maximum likelihood estimators?

It usually reduces the work: once the MLEs of the underlying parameters have been computed, the MLE of any function of them is obtained simply by evaluating that function, with no additional optimization over the data required.
