Why is likelihood function defined as such?

In summary, the likelihood function is written [itex] f(\vec{x}|\theta) [/itex] but is read as a function of [itex]\theta[/itex] with the data vector [itex]\vec{x}[/itex] held fixed. In frequentist statistics, no probability distribution is assumed for the parameters of a distribution; when we don't know these parameters, we say they have "definite but unknown values". If you want to compute the posterior distribution [itex] f(\theta|\vec{x}) [/itex] of the parameters, you must become a Bayesian.
  • #1
CantorSet
Hi everyone,

This is not a homework question but something I thought of while reading.

In the method of maximum likelihood estimation, they're trying to maximize the likelihood function
[itex] f(\vec{x}| \theta ) [/itex] with respect to [itex]\theta[/itex]. But shouldn't the likelihood function be defined as [itex] f(\theta| \vec{x} ) [/itex] since we are GIVEN the data vector [itex] \vec{x} [/itex] while [itex] \theta [/itex] is the unknown parameter?
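For concreteness, a standard textbook example (coin flips; not specific to any distribution mentioned above): for [itex]n[/itex] independent flips [itex]x_1, \dots, x_n \in \{0,1\}[/itex] with heads probability [itex]\theta[/itex], the joint density is

[tex] f(\vec{x}|\theta) = \prod_{i=1}^{n} \theta^{x_i} (1-\theta)^{1-x_i}. [/tex]

The same formula is read as "probability of the data" when [itex]\theta[/itex] is fixed, and as "likelihood of [itex]\theta[/itex]" when the data are fixed.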
 
  • #2
CantorSet said:
But shouldn't the likelihood function be defined as [itex] f(\theta| \vec{x} ) [/itex] since we are GIVEN the data vector [itex] \vec{x} [/itex] while [itex] \theta [/itex] is the unknown parameter?

Philosophically, I think you're right. But in frequentist statistics (the usual kind taught in introductory statistics courses), no probability distribution is ever assumed for the parameters of a distribution. When we don't know these parameters, we say they have "definite but unknown values". If you don't have a "prior" probability distribution for the parameters, you can't compute the "posterior" distribution [itex] f(\theta|\vec{x})[/itex].

My personal view of frequentist statistics is this: The commonsense person's outlook is "I have certain ideas. Given the data, what is the probability that my ideas are true?" Frequentist statistics answers "I assumed certain ideas and calculated the probability of the data. Based on that probability, I will stipulate some decisions." This evades the question! Things are done backwards in comparison to what people naturally want to know.

The authoritative terminology used ("statistical significance", "rejection" of a hypothesis, "confidence" intervals) makes many laymen think that they are getting information about the probability that some idea is true given the data. But if you look under the hood at what's happening, what you are getting is a quantification of the probability of the data based on the assumption that certain ideas are true.

I'm not trying to say that frequentist statistics isn't effective. But how to apply it requires empirical trial and error. People observe that certain methods work in certain types of situations.

If you want to compute [itex] f(\theta|\vec{x})[/itex] you must become a Bayesian. That's my favorite kind of statistics.
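Here is a minimal numerical sketch of the contrast, assuming Bernoulli data and a uniform prior on the parameter (all data values here are made up for illustration):

[code]
import numpy as np

# Hypothetical coin-flip data and a grid over the parameter theta.
data = np.array([1, 0, 1, 1, 0, 1, 1, 1])
theta = np.linspace(0.001, 0.999, 999)

# Likelihood f(x | theta), viewed as a function of theta with the data fixed.
heads, n = data.sum(), data.size
likelihood = theta**heads * (1 - theta)**(n - heads)

# Uniform ("flat") prior f(theta) = 1 on (0, 1): the ingredient
# that frequentist statistics declines to assume.
prior = np.ones_like(theta)

# Posterior f(theta | x) is proportional to likelihood * prior;
# normalize it so it integrates to 1 over the grid.
posterior = likelihood * prior
posterior /= np.trapz(posterior, theta)

print("MLE (peak of the likelihood):", theta[np.argmax(likelihood)])
print("posterior mean:", np.trapz(theta * posterior, theta))
[/code]

With a flat prior the posterior is proportional to the likelihood, so both peak at the same [itex]\theta[/itex]; an informative prior would pull the posterior away from the maximum likelihood estimate.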
 
  • #3
CantorSet said:
Hi everyone,

This is not a homework question but something I thought of while reading.

In the method of maximum likelihood estimation, they're trying to maximize the likelihood function
[itex] f(\vec{x}| \theta ) [/itex] with respect to [itex]\theta[/itex]. But shouldn't the likelihood function be defined as [itex] f(\theta| \vec{x} ) [/itex] since we are GIVEN the data vector [itex] \vec{x} [/itex] while [itex] \theta [/itex] is the unknown parameter?

Generally that would be written: [itex] L(\theta|\vec{x}) = f(\vec{x}|\theta) [/itex]. The likelihood of [itex]\theta[/itex] given the data is, numerically, the density of the data evaluated under the parameter value [itex]\theta[/itex].

The expression [itex] f(\vec{x}|\theta)[/itex] is just a conditional probability (a conditional density, in the continuous case) when [itex]\theta[/itex] is held fixed.
 
  • #4
Thanks for the responses, guys.
 
  • #5


The likelihood function is defined this way because it represents the probability (or density) of the observed data under a specific set of parameters: it measures how plausible it is that the data we collected would have been generated by those parameters. This is why it is commonly written [itex] f(\vec{x}|\theta) [/itex], where [itex]\vec{x}[/itex] represents the data and [itex]\theta[/itex] represents the parameters.

We maximize the likelihood function with respect to [itex]\theta[/itex] because we want the parameter values that make our observed data most probable. The maximizer is called the maximum likelihood estimate, and it allows us to make inferences about the underlying population from which the data were drawn.

In short, the likelihood is the probability of obtaining the observed data viewed as a function of the parameters, and maximizing it picks out the parameter values most consistent with that data.
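A minimal sketch of the mechanics, assuming normally distributed data with known variance (the grid search below stands in for a general-purpose optimizer):

[code]
import numpy as np

# Hypothetical data: normal with unknown mean mu and known sigma = 1.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=100)

# Candidate values for the unknown parameter.
mu_grid = np.linspace(0.0, 4.0, 4001)

# Log-likelihood of each candidate mu (additive constants dropped):
# log f(x | mu) = -0.5 * sum_i (x_i - mu)^2 + const.
log_lik = np.array([-0.5 * np.sum((x - mu) ** 2) for mu in mu_grid])

# The maximum likelihood estimate is the mu that maximizes log f(x | mu).
mu_hat = mu_grid[np.argmax(log_lik)]
print("grid-search MLE:", mu_hat)
print("closed-form MLE (sample mean):", x.mean())
[/code]

For this model the MLE has a closed form (the sample mean), so the grid search is only there to make the "maximize over [itex]\theta[/itex]" step explicit.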
 

1. What is the purpose of a likelihood function?

The likelihood function measures how probable the observed data are under a given statistical model. It lets us compare candidate parameter values by how well each one accounts for the data we actually observed.

2. How is the likelihood function defined?

The likelihood function is defined as the probability of observing the given data, assuming a specific set of parameter values in the statistical model. It is typically denoted L(θ | x), where θ represents the parameters and x represents the data; numerically, L(θ | x) = f(x | θ), read as a function of θ with x held fixed.

3. Why is the likelihood function defined in terms of parameters?

The likelihood function is defined in terms of parameters because it allows us to explore the range of possible values for the parameters and determine which values are most likely to produce the observed data. This helps us to estimate the most plausible values for the parameters.
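For instance (illustrative numbers): with 7 heads in 10 flips, L(0.5 | x) = 0.5^10 ≈ 0.00098, while L(0.7 | x) = 0.7^7 × 0.3^3 ≈ 0.0022, so θ = 0.7 is better supported by these data than θ = 0.5.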

4. How is the likelihood function used in statistical inference?

The likelihood function is used in statistical inference to make inferences about the population based on a sample of data. It is used to calculate the likelihood of different parameter values and determine which values are most supported by the observed data. This helps us to make conclusions about the population based on the sample.

5. How does the likelihood function relate to maximum likelihood estimation?

The likelihood function is essential in maximum likelihood estimation, as it is used to find the maximum likelihood estimate of the parameters. This is done by finding the values of the parameters that maximize the likelihood function, resulting in the most likely set of parameter values that would produce the observed data.
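A worked instance (the standard Bernoulli case): with [itex]n[/itex] independent coin flips and [itex]s = \sum_i x_i[/itex] heads, the log-likelihood is

[tex] \ell(\theta) = s \log\theta + (n - s)\log(1-\theta), [/tex]

and setting [itex] \ell'(\theta) = s/\theta - (n-s)/(1-\theta) = 0 [/itex] gives [itex] \hat{\theta} = s/n [/itex], the sample proportion: the parameter value under which the observed number of heads is most probable.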
