What Does the Likelihood Function Tell Us?

Arman777 · Mar 8, 2021

I am trying to understand the meaning of the likelihood function, or of the value of the likelihood itself. Suppose we toss a coin six times and obtain the sample

$$\hat{x} = \{H,H,T,T,T,T\}$$

In this case, the likelihood function can be written as

$$L(\theta|\hat{x}) = C(6,2)\theta^2(1-\theta)^4 = 15\theta^2(1-\theta)^4$$

The function looks like this:

[Plot of ##L(\theta|\hat{x})## against ##\theta##]

Now, my question is: what does this graph tell us? We can say that ##L(\theta|\hat{x})## is maximum at ##\theta \approx 0.333##.
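As a rough numerical check of where the maximum sits, here is a minimal Python sketch that evaluates the likelihood from the formula above on a grid of ##\theta## values. The function name `likelihood` and the grid spacing of 0.001 are illustrative choices, not anything from the thread.

```python
from math import comb

def likelihood(theta, heads=2, tosses=6):
    """Binomial likelihood of `heads` heads in `tosses` tosses."""
    tails = tosses - heads
    return comb(tosses, heads) * theta**heads * (1 - theta)**tails

# Evaluate on a grid of theta values in [0, 1] (spacing 0.001 is an arbitrary choice)
grid = [i / 1000 for i in range(1001)]
theta_hat = max(grid, key=likelihood)

print(theta_hat)               # grid point closest to 1/3
print(likelihood(theta_hat))   # height of the curve at that point
```

The grid only locates the maximum to within its spacing; the exact maximizer for 2 heads in 6 tosses is ##\theta = 2/6 = 1/3##.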

In general, it seems to me that the likelihood is a way of obtaining the parameters that best describe the distribution of the given data. If we change the parameters that describe the distribution, we change the likelihood.

We know that for a fair coin ##\theta_{True} = 0.5##. So my question is: does the likelihood mean that, for this data (##\hat{x}##), the best parameter describing the fairness of the coin is ##0.3333## (i.e., where the likelihood is maximum)? We know that if the coin is truly fair, then for a large sample we will recover the true ##\theta## parameter (i.e., ##\theta \rightarrow \theta_{True}## as ##N \rightarrow \infty##).

In general, I am trying to understand the meaning of the likelihood. Is the likelihood a way to obtain the ##\theta## parameter that best describes the distribution for the given data? And, depending on the given data, this ##\theta## may not be the true ##\theta## (i.e., ##\theta \neq \theta_{True}## for the given data).

(For a Gaussian, ##\theta = \{\mu,\sigma\}##; for a Poisson distribution, ##\theta = \lambda##; and for a binomial distribution, ##\theta = p##, the probability of success in one trial.)

FactChecker · Mar 8, 2021

The likelihood function gives the probability of getting that sample result from a distribution that actually has that parameter value. It should not be said that it is the probability that the parameter is that value because the parameter is not a random variable. The parameter is some fixed value and it may easily be different from the value that maximizes the likelihood function. Of course, it would take some strange luck if the actual parameter value is one that makes the sample result very improbable, compared to other parameter values.

Arman777 · Mar 8, 2021

FactChecker said:

The likelihood function gives the probability of getting that sample result from a distribution that actually has that parameter value. It should not be said that it is the probability that the parameter is that value because the parameter is not a random variable. The parameter is some fixed value and it may easily be different from the value that maximizes the likelihood function. Of course, it would take some strange luck if the actual parameter value is one that makes the sample result very improbable, compared to other parameter values.

So, for instance, ##L(\theta = 0.3| \{H,H,T,T,T,T\}) = 15(0.3)^2(0.7)^4 = 0.324135##.

So this means that if the parameter of the distribution were ##0.3##, the probability of obtaining ##\{H,H,T,T,T,T\}## would be ##0.324...##?
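One detail worth checking numerically: the factor ##C(6,2) = 15## in the likelihood from the first post counts every ordering of 2 heads and 4 tails, so ##15\theta^2(1-\theta)^4## is the probability of "2 heads in 6 tosses", while the probability of one specific ordered sequence is ##\theta^2(1-\theta)^4##. The sketch below, with the hypothetical helper `sequence_probability`, enumerates all 64 sequences for ##\theta = 0.3## to show both numbers.

```python
from itertools import product
from math import comb

theta = 0.3  # assumed probability of heads

def sequence_probability(seq, theta):
    """Probability of one specific ordered sequence of independent tosses."""
    p = 1.0
    for toss in seq:
        p *= theta if toss == "H" else 1 - theta
    return p

observed = "HHTTTT"
p_exact_sequence = sequence_probability(observed, theta)

# Sum over every ordered sequence of 6 tosses containing exactly 2 heads
p_two_heads = sum(
    sequence_probability(seq, theta)
    for seq in product("HT", repeat=6)
    if seq.count("H") == 2
)

print(p_exact_sequence)                          # about 0.021609
print(p_two_heads)                               # about 0.324135
print(comb(6, 2) * theta**2 * (1 - theta)**4)    # same as p_two_heads
```

Since the two versions differ only by a constant factor that does not depend on ##\theta##, they are maximized at the same ##\theta##.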

Stephen Tashi · Mar 8, 2021

Arman777 said:

In general, it seems to me that the likelihood is a way of obtaining the parameters that best describe the distribution of the given data.

In a probabilistic situation there is no way of estimating the parameters of a distribution that is guaranteed to work. In most probability models (such as your example) there is not even a method of estimating parameters that is known to have the highest probability of getting the correct answer. So what it means for an estimate to give the "best parameters" does not have a simple definition.

The Theory of Statistical Estimation uses precise definitions to define various concepts of "best" estimators. These concepts include the terms "minimum variance", "unbiased", "maximum likelihood" and "asymptotically efficient". If you study these concepts, you'll begin to appreciate the technicality and sophistication that is required in order to give "best parameters" an unambiguous meaning.

The maximum (or maxima) of the likelihood function are, by definition, the values of the parameters that (according to a model employing a particular family of distributions) make the data most probable. These values are not necessarily the parameter values that are most probably the true values. If your mathematical model for the problem takes a Bayesian approach, it may be possible to say something about the probability that an estimate is correct. Otherwise, using the maximum likelihood method to estimate a parameter is a subjective decision.
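To make the gap between "maximizes the likelihood" and "is the true value" concrete, here is a small sketch under the assumption that the coin really is fair and is tossed 6 times. It uses the standard result that the maximum likelihood estimate for ##k## heads in ##n## tosses is ##k/n##, and tabulates how often each estimate occurs. The variable names are illustrative.

```python
from fractions import Fraction
from math import comb

n = 6
true_theta = Fraction(1, 2)  # assumed: the coin really is fair

# The ML estimate for k heads in n tosses is k/n; tabulate its sampling distribution
mle_distribution = {
    Fraction(k, n): comb(n, k) * true_theta**k * (1 - true_theta)**(n - k)
    for k in range(n + 1)
}

for estimate, prob in mle_distribution.items():
    print(f"theta_ML = {estimate}: probability {prob}")

# Probability that the ML estimate lands exactly on the true value
p_hit = mle_distribution[true_theta]
print(p_hit)  # 5/16
```

Under these assumptions the estimate equals the true value only 5 times in 16, and an estimate of 1/3 (as in the original example) turns up with probability 15/64 even though the coin is fair.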

The famous advocate of the maximum likelihood method was Ronald Fisher, who gave various philosophical arguments in its favor.

Arman777 · Mar 8, 2021

Stephen Tashi said:

The maximum (or maxima) of the likelihood function are, by definition, the values of the parameters that (according to a model employing a particular family of distributions) make the data most probable. These values are not necessarily the parameter values that are most probably the true values.

That's what I said, right?

stevendaryl · Mar 9, 2021

FactChecker said:

The likelihood function gives the probability of getting that sample result from a distribution that actually has that parameter value.

I looked it up, and you are right. But I have to say that the notation is confusing: ##L(\theta | r)## seems like it should mean the conditional probability of ##\theta## given ##r##, when it actually means the conditional probability of ##r## given ##\theta##.

Arman777 · Mar 9, 2021

stevendaryl said:

I looked it up, and you are right. But I have to say that the notation is confusing: ##L(\theta | r)## seems like it should mean the conditional probability of ##\theta## given ##r##, when it actually means the conditional probability of ##r## given ##\theta##.

I guess in general ##L(\theta|r) = p(r|\theta)##

FactChecker · Mar 9, 2021

The concept of conditional probability and the notation are about the probability of one event, given that another event occurred. It is misguided to consider the value of ##\theta## to be an event, since ##\theta## is not a random variable. So you are suggesting a concept and associated notation that is easily misleading. A better notation than ##p(r|\theta)## would be ##p_{\theta}(r)##.

Stephen Tashi · Mar 9, 2021

Arman777 said:

I guess in general ##L(\theta|r) = p(r|\theta)##

No. The notation "##L(\theta | r)##" should not be used to denote the likelihood function.

Yes, you are correct that the description of the likelihood function ##L(\theta,r)## in common language is that it is "the probability of the data ##r## when the value of the parameter is ##\theta##".

However, in mathematics, there is a technical definition for the "|" notation and the concept of conditional probability. The use of notation like ##p(r|\theta)## assumes there is a probability space where a joint distribution for ##r## and ##\theta## exists. The joint distribution and its marginal distributions are used to define the conditional distributions ##p(r|\theta)## and ##p(\theta|r)##.

In your example, the model for the problem is that the given data ##r## is an outcome from a distribution with an unknown parameter ##\theta##, not a parameter that will be selected from a probability distribution. So it assumes ##\theta## is an unknown number, not a random variable. Your example does not mention anything about a joint probability distribution for ##(\theta,r)##.

If you find it natural to think of the unknown parameter ##\theta## as having a probability distribution, then you have Bayesian tendencies and you could study the Bayesian approach to modeling coin tossing.
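For a taste of what the Bayesian approach looks like on this example, here is a minimal sketch. It assumes a uniform prior on ##\theta##, i.e. Beta(1, 1), which is my choice for illustration and not something specified in the thread. With a Beta prior and coin-toss data the posterior is again a Beta distribution, with the head and tail counts added to the prior parameters.

```python
from fractions import Fraction

# Assumed prior: Beta(1, 1), i.e. uniform on [0, 1] (an illustrative choice)
prior_a, prior_b = 1, 1
heads, tails = 2, 4  # the data from the original example

# Conjugate update: add the observed counts to the prior parameters
post_a = prior_a + heads
post_b = prior_b + tails

# Mean and mode of a Beta(a, b) distribution (mode formula valid for a, b > 1)
posterior_mean = Fraction(post_a, post_a + post_b)
posterior_mode = Fraction(post_a - 1, post_a + post_b - 2)

print(post_a, post_b)      # 3 5
print(posterior_mean)      # 3/8
print(posterior_mode)      # 1/3
```

With this particular prior the posterior mode coincides with the maximum likelihood value 1/3, while the posterior mean is 3/8. A different prior would give different numbers, which is exactly the extra modeling input the Bayesian approach requires.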

stevendaryl · Mar 9, 2021

FactChecker said:

The concept of conditional probability and the notation are about the probability of one event, given that another event occurred. It is misguided to consider the value of ##\theta## to be an event, since ##\theta## is not a random variable. So you are suggesting a concept and associated notation that is easily misleading. A better notation than ##p(r|\theta)## would be ##p_{\theta}(r)##.

I tend to read ##P(A|B)## as "the probability of A being true, under the assumption that B is true". That often makes sense in contexts in which B is not a random variable. And also, in Bayesian probability, anything with an unknown value can be treated as a random variable.

Arman777 · Mar 9, 2021

stevendaryl said:

P(A|B) as "the probability of A being true, under the assumption that B is true"

That is what it says in most articles. In general, this is what I have understood:

i) There is only one likelihood function for given data and a given distribution.

Let's suppose we have a data set of coin toss given as

$$\hat{x} = \{T,T,H,T,H,T\}$$

One can try to find the ##\theta## parameter that maximises the likelihood, which is given by

$$\theta_{ML} = \arg\max_{\theta} L(\theta | \hat{x})$$

ii) This ##\theta_{ML}## is not necessarily the true parameter of the distribution. It is just the parameter that best describes the distribution for the given ##\hat{x}##.

Example:

For this case ##L(\theta = 0.3| \{T,T,H,T,H,T\}) > L(\theta = 0.5| \{T,T,H,T,H,T\})##

We cannot tell whether the coin is fair or not; we can only tell that, for this data, this is the best parameter.
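For reference, the maximization can be worked out by hand for ##k## heads in ##n## tosses (here ##k = 2##, ##n = 6##). Taking the logarithm, which does not move the location of the maximum,

$$\ln L(\theta) = \text{const} + k\ln\theta + (n-k)\ln(1-\theta)$$

$$\frac{d}{d\theta}\ln L(\theta) = \frac{k}{\theta} - \frac{n-k}{1-\theta} = 0 \quad\Rightarrow\quad \theta_{ML} = \frac{k}{n} = \frac{2}{6} = \frac{1}{3}$$

The comparison in the example can also be made explicit. For the specific sequence, the binomial coefficient cancels in the ratio:

$$\frac{L(0.3)}{L(0.5)} = \frac{(0.3)^2(0.7)^4}{(0.5)^6} = \frac{0.021609}{0.015625} \approx 1.38$$

So this data is only about 1.38 times more probable under ##\theta = 0.3## than under ##\theta = 0.5##, which is weak evidence against a fair coin.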

FactChecker · Mar 9, 2021

stevendaryl said:

I tend to read ##P(A|B)## as "the probability of A being true, under the assumption that B is true". That often makes sense in contexts in which B is not a random variable. And also, in Bayesian probability, anything with an unknown value can be treated as a random variable.

IMO, that opens up a can of worms regarding the "prior distribution" of B that is not appropriate for the level of this question and is difficult to make rigorous, even at the most advanced level.

Stephen Tashi · Mar 9, 2021

Arman777 said:

ii) This ##\theta_{ML}## is not necessarily the true parameter of the distribution. It is just the parameter that best describes the distribution for the given ##\hat{x}##.

You correctly understand that the maximum likelihood estimate for a parameter need not be its true value.

It's unclear whether you understand that "best parameter" can have different meanings. If you are just beginning to study statistics, the different meanings of "best" won't be covered in the early chapters.

Stephen Tashi · Mar 9, 2021

stevendaryl said:

I tend to read ##P(A|B)## as "the probability of A being true, under the assumption that B is true".

That's the usual way of translating notation into common language, but it may not be technically correct if the notation for conditional probability is obeyed.

For example, if we are given the function ##g(x,y) = x + 2y ## then is ##g(3 |1)## acceptable notation for "##g(x,y)## evaluated at ##x=3## given that ##y = 1##"? The usual notation for that concept is simply "##g(3,1)##".

stevendaryl · Mar 9, 2021

Stephen Tashi said:

For example, if we are given the function ##g(x,y) = x + 2y ## then is ##g(3 |1)## acceptable notation for "##g(x,y)## evaluated at ##x=3## given that ##y = 1##"?

I don't want to belabor this, but I certainly didn't say that ##g(x|y)## should mean "##g(x,y)## evaluated at ##x=3## given that ##y = 1##". I said that the meaning of ##P(x|y)## should be "the probability of ##x## given that ##y## is true". My suggestion only applies when (1) we're talking about probabilities, and (2) the ##x## and ##y## are to be interpreted as claims that can be true or false.

Arman777 · Mar 10, 2021

Stephen Tashi said:

If you are just beginning to study statistics, the different meanings of "best" won't be covered in the early chapters.

Yeah, I am really a beginner. But I understand the idea.

Stephen Tashi · Mar 10, 2021

stevendaryl said:

I said that the meaning of ##P(x|y)## should be "the probability of ##x## given that ##y## is true".

I agree. That is the correct way to translate the notation to ordinary speech.

But from a mathematical point of view, that translation of the "P(...)" notation is often ambiguous. Translating "P" as "the probability" assumes we are in a context where "the probability" is unambiguously given by some probability measure function ##g(...)##. But if we are in a context where several different probability distributions are involved (or no distribution has been asserted) we often take the liberty of using the "P(...)" notation ambiguously. For example, to write "P( it rains on Monday)" seems to be unambiguous if the event "it rains on Monday" is well defined. But the same event can be a member of different probability spaces and can be assigned different probabilities in those spaces.

With respect to the topic of this thread, the likelihood function is not a probability distribution. So use of conditional probability notation within its arguments is not appropriate. If we wouldn't use "##g(x|y)##" notation when talking about a generic function of two variables, we shouldn't use "##L(x|y)##" type notation for the likelihood function.
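One way to see that the likelihood is not a probability distribution over ##\theta## is to check its normalization. The sketch below uses the binomial example from the start of the thread (2 heads in 6 tosses) and exact fractions. The value ##\theta = 3/10## is an arbitrary illustrative choice, and the integral over ##\theta## uses the standard Beta integral ##\int_0^1 \theta^k(1-\theta)^{n-k}\,d\theta = k!\,(n-k)!/(n+1)!##.

```python
from fractions import Fraction
from math import comb, factorial

n, k = 6, 2  # 6 tosses, 2 heads, as in the original example

# For a fixed theta, summing the binomial probabilities over all possible
# head counts gives 1: as a function of the data it is a distribution.
theta = Fraction(3, 10)  # arbitrary illustrative value
total_over_data = sum(
    comb(n, j) * theta**j * (1 - theta)**(n - j) for j in range(n + 1)
)

# For fixed data, integrate the likelihood over theta in [0, 1] using the
# Beta integral: k! (n-k)! / (n+1)!
area_over_theta = comb(n, k) * Fraction(
    factorial(k) * factorial(n - k), factorial(n + 1)
)

print(total_over_data)   # 1
print(area_over_theta)   # 1/7
```

The same function sums to 1 over the data but integrates to 1/7 over ##\theta##, so it cannot be read as a density for ##\theta## without further (Bayesian) assumptions.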

There are topics where being picky about notation is unimportant and other topics where being picky promotes understanding. I think the notation for conditional probability is a topic where studying the implications of notation helps students. For example, contrary to most students' first impressions, the function arguments in the notations ##P(A \cap B)## and ##P(A|B)## denote the same set. The difference in the two notations indicates two different probability measures are to be used in assigning probability to that set.
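A small sketch of that last point, using a fair six-sided die as an assumed example (the events `A` and `B` and the function names are mine). The set being measured is ##A \cap B## in both cases; only the measure changes.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}   # outcomes of one fair die roll (assumed example)
A = {2, 4, 6}                # "the roll is even"
B = {4, 5, 6}                # "the roll is greater than 3"

def P(event):
    """Uniform probability measure on omega."""
    return Fraction(len(event & omega), len(omega))

def P_given_B(event):
    """A different measure: P restricted to B and renormalized."""
    return P(event & B) / P(B)

target = A & B                    # the same set {4, 6} in both notations
joint = P(target)                 # P(A ∩ B)
conditional = P_given_B(target)   # P(A | B)

print(sorted(target))   # [4, 6]
print(joint)            # 1/3
print(conditional)      # 2/3
```

Both numbers are assigned to the set {4, 6}; they differ because `P` and `P_given_B` are different probability measures.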
