Understanding Maximum Likelihood Estimation: Unpacking the Basics

Summary
A likelihood function assesses the plausibility of parameters based on observed random variables, but confusion arises when considering the origin of these variables, as they must also come from a probability distribution, leading to a perceived circular logic. However, this circularity can be avoided when the random variables are derived from physical processes rather than solely from probability distributions. The likelihood function does not provide the probability of parameters given the data; instead, it calculates the likelihood of the data for specific parameter values. Maximum likelihood estimation is just one method for estimating parameters, and while it can be effective, it is not universally optimal and lacks definitive proof of correctness. Ultimately, caution is advised, as there is no absolute logic guaranteeing the accuracy of the estimated parameters.
FallenApple
I'm getting a bit lost on some of the basics. So a Likelihood function determines the plausibility of parameters given the observed random variables. This is fine and all, but something seems a bit off. The observed random variables themselves must be generated from a probability distribution as well. So the logic becomes circular. Is there something I'm not seeing?
 
tnich
FallenApple said:
I'm getting a bit lost on some of the basics. So a Likelihood function determines the plausibility of parameters given the observed random variables. This is fine and all, but something seems a bit off. The observed random variables themselves must be generated from a probability distribution as well. So the logic becomes circular. Is there something I'm not seeing?
Random variables can be generated from probability distributions, but also from physical processes. If you first generate values of random variables from a probability distribution and then find likelihoods of distribution parameters based on those values, then yes, you have created a circular process. It is not circular, though, if you measure some physical process and use likelihood functions to help construct a mathematical model of the process.
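To make the distinction concrete, here is a minimal sketch in Python (using NumPy and SciPy; the normal model and the numbers are made up for illustration). It shows the "circular" scenario: data generated from a distribution with known parameters, with maximum likelihood then recovering those parameters. With measured data the code would look the same, but `data` would come from the physical process and the model would be a hypothesis about it.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# "Circular" case: generate data from a known distribution...
true_mu, true_sigma = 2.0, 0.5
data = rng.normal(true_mu, true_sigma, size=1000)

# ...then estimate the parameters back by maximum likelihood.
mu_hat, sigma_hat = stats.norm.fit(data)  # returns the MLEs (loc, scale)
print(f"true: mu={true_mu}, sigma={true_sigma}")
print(f"MLE:  mu={mu_hat:.3f}, sigma={sigma_hat:.3f}")
```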
 
FallenApple
tnich said:
Random variables can be generated from probability distributions, but also from physical processes. If you first generate values of random variables from a probability distribution and then find likelihoods of distribution parameters based on those values, then yes, you have created a circular process. It is not circular, though, if you measure some physical process and use likelihood functions to help construct a mathematical model of the process.
Thanks, that really cleared up all of the confusion.
 
FallenApple said:
So the logic becomes circular. Is there something I'm not seeing?

It isn't clear what line of reasoning you're thinking about when you say "the logic".

FallenApple said:
So a Likelihood function determines the plausibility of parameters given the observed random variables.
What is your definition of "plausibility"? The likelihood function does not determine the "probability" of the parameters given the observed random variables - if that's what you're thinking. It also does not determine the "likelihood" of the parameters. It's better to think of the likelihood function as giving the likelihood of the data for given values of the parameters - as opposed to the likelihood of the parameters for given values of the data.
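As a quick illustration of that reading (a sketch assuming a normal model with known standard deviation 1 and made-up data, none of which comes from the posts above): the data are held fixed, and the likelihood varies as a function of the parameter.

```python
import numpy as np
from scipy import stats

# Fixed, made-up observed data; a N(mu, 1) model is assumed.
data = np.array([1.2, 0.8, 1.5, 1.1])

def likelihood(mu):
    # Likelihood of the data for a given value of the parameter mu:
    # the product of the model densities evaluated at the data.
    return np.prod(stats.norm.pdf(data, loc=mu, scale=1.0))

print(likelihood(0.0))   # smaller: mu = 0 fits this data poorly
print(likelihood(1.15))  # larger: mu near the sample mean fits better
```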

If we are considering a family of probability distributions, and each member of the family is specified by giving specific values to some parameters, then the likelihood function gives the "likelihood" of the data as a function of the parameters and the data. The phrase "likelihood of the data" is used instead of "probability of the data" because it is incorrect to say that evaluating a probability density function produces a probability. Evaluating a probability density function, in the case of a continuous distribution, gives a "probability density", not a "probability". For example, the probability density of a random variable U that is uniformly distributed on [0,1] is the constant function f(x) = 1. The fact that f(1/3) = 1 does not imply that the probability that the value 1/3 occurs is 1; for a continuous distribution, the probability of any single exact value is 0. "Likelihood of" is a way to say "probability density of".
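A short sketch of that distinction, using SciPy (one way among many to evaluate these quantities):

```python
from scipy import stats

U = stats.uniform(loc=0, scale=1)  # U uniformly distributed on [0, 1]

# Evaluating the density at a point gives a density, not a probability:
print(U.pdf(1/3))   # 1.0, yet P(U = 1/3) is 0

# Probabilities come from integrating the density over an interval:
print(U.cdf(0.4) - U.cdf(0.3))  # P(0.3 <= U <= 0.4) = 0.1
```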

One procedure for estimating parameters from given values of data is to use the values of the parameters that maximize the value of the likelihood function. It should be emphasized that (like many things in statistics - e.g. hypothesis testing) this is a procedure - i.e. merely one procedure out of several possible procedures, not a technique that can be proven to be the unique optimal way to do things. If your remark about "the logic becomes circular" indicates skepticism about a proof that maximum likelihood estimation is optimal, your skepticism is correct. However, if you are studying a respectable textbook, I doubt the textbook says that the maximum likelihood estimation procedure is an optimal way to estimate parameters in all cases. There can be theorems along those lines, but they deal with specific cases - and they have to define the specific function we are trying to optimize.
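For concreteness, here is one minimal sketch of the procedure (the exponential model, the simulated data, and the choice of optimizer are all illustrative assumptions, not anything from the discussion above): maximizing the likelihood is done numerically by minimizing the negative log-likelihood.

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(1)
data = rng.exponential(scale=2.0, size=500)  # true rate = 1/2

def neg_log_likelihood(rate):
    # Negative log-likelihood of an Exponential(rate) model for the data.
    return -np.sum(stats.expon.logpdf(data, scale=1.0 / rate))

# Maximizing the likelihood = minimizing the negative log-likelihood.
result = optimize.minimize_scalar(neg_log_likelihood,
                                  bounds=(1e-6, 10.0), method="bounded")
print(result.x)           # numerical MLE of the rate
print(1.0 / data.mean())  # closed-form MLE (1 / sample mean), for comparison
```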
 
Good question. The short answer is that there is no circular logic because there is no hard "logic" at all that applies.

The maximum likelihood estimator allows you to determine the model parameter that makes the given data most likely. There is no formal "logic" that will say that the parameter is correct. So you are wise to be cautious. If you also obtain a confidence interval for the parameter, you can see how unusual it would be to get that data if the true parameter were, in fact, outside of that interval. Even then, you would have no hard logic to say whether it is in or out of the confidence interval -- only hypothetical probabilities.
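As a rough sketch of that idea (the normal model, simulated data, and t-based interval here are illustrative assumptions): the interval is computed from the data and quantifies how unusual the data would be under parameter values outside it; it does not prove where the true parameter lies.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
data = rng.normal(10.0, 3.0, size=100)  # stand-in for measured data

mu_hat = data.mean()                        # MLE of the mean
se = data.std(ddof=1) / np.sqrt(len(data))  # standard error of the mean

# 95% confidence interval for the mean, based on the t distribution.
lo, hi = stats.t.interval(0.95, len(data) - 1, loc=mu_hat, scale=se)
print(f"MLE: {mu_hat:.2f}, 95% CI: ({lo:.2f}, {hi:.2f})")
```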
 