# How Likelihood Functions work

I'm getting a bit lost on some of the basics. So a Likelihood function determines the plausibility of parameters given the observed random variables. This is fine and all, but something seems a bit off. The observed random variables themselves must be generated from a probability distribution as well. So the logic becomes circular. Is there something I'm not seeing?

tnich
Homework Helper
Random variables can be generated from probability distributions, but also from physical processes. If you first generate values of random variables from a probability distribution and then find likelihoods of distribution parameters based on those values, then yes, you have created a circular process. It is not circular, though, if you measure some physical process and use likelihood functions to help construct a mathematical model of the process.

FallenApple
Thanks, that really cleared up all of the confusion.

Stephen Tashi
So the logic becomes circular. Is there something I'm not seeing?

It isn't clear what line of reasoning you're thinking about when you say "the logic".

So a Likelihood function determines the plausibility of parameters given the observed random variables.
What is your definition of "plausibility"? The likelihood function does not determine the "probability" of the parameters given the observed random variables - if that's what you're thinking. It also does not determine the "likelihood" of the parameters. It's better to think of the likelihood function as giving the likelihood of the data for given values of the parameters - as opposed to the likelihood of the parameters for given values of the data.

If we are considering a family of probability distributions and each member of the family is specified by giving specific values to some parameters, then the likelihood function gives the "likelihood" of the data as a function of the parameters and the data. The phrase "likelihood of the data" is used instead of "probability of the data" because it is incorrect to say that evaluating a probability density function produces a probability. Evaluating a probability density function, in the case of a continuous distribution, gives a "probability density", not a "probability". For example, the probability density of a random variable U that is uniformly distributed on [0,1] is the constant function f(x) = 1. The fact that f(1/3) = 1 does not imply that the probability that the value 1/3 occurs is 1; for a continuous distribution, the probability of any single value is 0. "Likelihood of" is a way to say "probability density of".
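To make the density-versus-probability point concrete, here is a minimal Python sketch. The normal model, the three data values and the function names are all assumptions chosen for illustration, not anything from the posts above. It evaluates the likelihood of the same data under two parameter settings and shows that the result can exceed 1, so it cannot be read as a probability.

```python
import math

def normal_pdf(x, mu, sigma):
    """Probability density of a normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def likelihood(data, mu, sigma):
    """Product of densities of independent observations.

    The data are held fixed; mu and sigma are the arguments we vary.
    """
    result = 1.0
    for x in data:
        result *= normal_pdf(x, mu, sigma)
    return result

# Hypothetical measurements, invented for this example.
data = [0.9, 1.1, 1.0]

narrow = likelihood(data, mu=1.0, sigma=0.1)  # tight model around 1.0
wide = likelihood(data, mu=1.0, sigma=1.0)    # much more spread out

print(narrow, wide)
print(narrow > 1.0)  # a likelihood value is a density, so it may exceed 1
```

The data never change between the two calls; only the parameters do. That is the sense in which the likelihood is a function of the parameters for given data.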

One procedure for estimating parameters from given values of data is to use the values of the parameters that maximize the value of the likelihood function. It should be emphasized that (like many things in statistics - e.g. hypothesis testing) this is a procedure - i.e. merely one procedure out of several possible procedures, not a technique that can be proven to be the unique optimal way to do things. If your remark about "the logic becomes circular" indicates skepticism about a proof that maximum likelihood estimation is optimal, your skepticism is correct. However, if you are studying a respectable textbook, I doubt the textbook says that the maximum likelihood estimation procedure is an optimal way to estimate parameters in all cases. There can be theorems along those lines, but they deal with specific cases - and they have to define the specific function we are trying to optimize.
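The maximization procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data values are invented, the model is assumed to be normal with a known standard deviation of 1, and a crude grid search stands in for a real optimizer. For this particular model the maximum likelihood estimate of the mean is known to coincide with the sample mean, which gives something to check the search against.

```python
import math

def log_likelihood(data, mu, sigma=1.0):
    """Log of the likelihood for independent normal observations.

    Working with the log turns the product of densities into a sum,
    and it has its maximum at the same parameter value.
    """
    const = math.log(sigma * math.sqrt(2.0 * math.pi))
    return sum(-0.5 * ((x - mu) / sigma) ** 2 - const for x in data)

# Hypothetical measurements, invented for this example.
data = [2.1, 1.9, 2.4, 2.0, 1.6]

# Candidate values of mu on a grid from 0.00 to 4.00 in steps of 0.01.
candidates = [i / 100 for i in range(0, 401)]

# Maximum likelihood estimate: the candidate with the largest log-likelihood.
mu_hat = max(candidates, key=lambda mu: log_likelihood(data, mu))

sample_mean = sum(data) / len(data)
print(mu_hat, sample_mean)
```

Nothing in this sketch shows that the estimate is "optimal" in any broader sense; it only carries out the procedure of picking the parameter value under which the observed data have the highest density.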
