Why is the likelihood function defined as such?

  • Context: Graduate
  • Thread starter: CantorSet
  • Tags: Function, Likelihood

Discussion Overview

The discussion revolves around the definition and interpretation of the likelihood function in the context of maximum likelihood estimation. Participants explore the philosophical and statistical implications of defining the likelihood function as f(𝑥|θ) versus f(θ|𝑥), considering frequentist and Bayesian perspectives.

Discussion Character

  • Debate/contested

Main Points Raised

  • Some participants question why the likelihood function is defined as f(𝑥|θ) instead of f(θ|𝑥), arguing that since the data vector 𝑥 is given, it seems more logical to express the likelihood in terms of the unknown parameter θ.
  • One participant explains that in frequentist statistics, parameters are treated as having "definite but unknown values," and no prior distribution for parameters is assumed, which complicates the computation of f(θ|𝑥).
  • Another participant expresses a philosophical view that frequentist statistics may not align with how people intuitively think about probability, suggesting that it answers the question in a way that feels backward.
  • There is a mention that to compute f(θ|𝑥), one must adopt a Bayesian approach, which is favored by some participants.
  • One participant notes that f(𝑥|θ) is just a conditional probability (when f = P) and that the likelihood is conventionally written L(θ|𝑥) = f(𝑥|θ), i.e., the same expression read as a function of θ with the data held fixed.

Areas of Agreement / Disagreement

Participants express differing views on the definition and interpretation of the likelihood function, with no consensus reached on the preferred approach or its implications.

Contextual Notes

Participants highlight the philosophical differences between frequentist and Bayesian statistics, noting that the lack of a prior distribution in frequentist methods affects the interpretation of likelihood.

CantorSet
Hi everyone,

This is not a homework question but something I thought of while reading.

In the method of maximum likelihood estimation, one maximizes the likelihood function $f(\vec{x} \mid \theta)$ with respect to $\theta$. But shouldn't the likelihood function be defined as $f(\theta \mid \vec{x})$, since we are GIVEN the data vector $\vec{x}$ while $\theta$ is the unknown parameter?
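To make concrete what maximizing $f(\vec{x} \mid \theta)$ over $\theta$ means while the data stay fixed, here is a minimal numerical sketch (not part of the original post), assuming normally distributed data with unknown mean and standard deviation; the data are simulated purely for illustration:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Made-up data vector x, held fixed throughout the optimization.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=200)

def neg_log_likelihood(theta, data):
    """Negative log of f(data | theta) for theta = (mu, log_sigma)."""
    mu, log_sigma = theta
    return -np.sum(norm.logpdf(data, loc=mu, scale=np.exp(log_sigma)))

# Maximizing the likelihood in theta = minimizing the negative log-likelihood.
result = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]), args=(x,))
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(f"MLE: mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")
```

The data vector never changes during the search; only $\theta = (\mu, \sigma)$ does, which is exactly the sense in which the likelihood is treated as a function of $\theta$.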
 
CantorSet said:
But shouldn't the likelihood function be defined as $f(\theta \mid \vec{x})$, since we are GIVEN the data vector $\vec{x}$ while $\theta$ is the unknown parameter?

Philosophically, I think you're right. But in frequentist statistics (the usual kind taught in introductory statistics courses) no probability distribution is ever assumed for the parameters of a distribution. When we don't know these parameters, we say they have "definite but unknown values". If you don't have a "prior" probability distribution for the parameters, you can't compute the "posterior" distribution $f(\theta \mid \vec{x})$.
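To spell out where the prior enters (a standard statement, not part of the original reply), Bayes' theorem in the thread's notation reads

$$f(\theta \mid \vec{x}) = \frac{f(\vec{x} \mid \theta)\, f(\theta)}{\int f(\vec{x} \mid \theta')\, f(\theta')\, d\theta'},$$

so without a prior $f(\theta)$ the right-hand side, and hence the posterior, cannot be evaluated.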

My personal view of frequentist statistics is this: The commonsense person's outlook is "I have certain ideas. Given the data, what is the probability that my ideas are true?" Frequentist statistics answers "I assumed certain ideas and calculated the probability of the data. Based on that probability, I will stipulate some decisions." This evades the question! Things are done backwards compared to what people naturally want to know.

The authoritative terminology used ("statistical significance", "rejection" of a hypothesis, "confidence" intervals) makes many laymen think that they are getting information about the probability that some idea is true given the data. But if you look under the hood at what's happening, what you are getting is a quantification of the probability of the data under the assumption that certain ideas are true.

I'm not trying to say that frequentist statistics isn't effective. But knowing how to apply it requires empirical trial and error. People observe that certain methods work in certain types of situations.

If you want to compute $f(\theta \mid \vec{x})$ you must become a Bayesian. That's my favorite kind of statistics.
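As an illustration (not from the thread) of how a prior makes $f(\theta \mid \vec{x})$ computable, here is a minimal Beta-Bernoulli sketch in Python; the coin-flip data and the Beta(1, 1) prior are assumptions made up for the example:

```python
import numpy as np
from scipy import stats

# Made-up coin-flip data: 1 = heads, 0 = tails.
x = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])

# Prior Beta(a, b) on theta; Beta is conjugate to the Bernoulli likelihood,
# so the posterior f(theta | x) is again a Beta distribution.
a, b = 1.0, 1.0                          # uniform prior (an assumption)
a_post = a + x.sum()                     # a + number of heads
b_post = b + len(x) - x.sum()            # b + number of tails
posterior = stats.beta(a_post, b_post)

print(f"Posterior mean of theta: {posterior.mean():.3f}")
print(f"95% credible interval:   {posterior.interval(0.95)}")
print(f"MLE for comparison:      {x.mean():.3f}")
```

With a flat prior the posterior mode coincides with the MLE here, but the posterior is a full distribution over $\theta$, which is what the expression $f(\theta \mid \vec{x})$ asks for.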
 
CantorSet said:
But shouldn't the likelihood function be defined as $f(\theta \mid \vec{x})$, since we are GIVEN the data vector $\vec{x}$ while $\theta$ is the unknown parameter?

Generally that would be written $L(\theta \mid \vec{x}) = f(\vec{x} \mid \theta)$: the same expression, but read as a function of $\theta$ with the data $\vec{x}$ held fixed.

The expression $f(\vec{x} \mid \theta)$ is just a conditional probability when $f = P$.
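As a concrete instance (not in the original reply): for $n$ independent Bernoulli($\theta$) observations, the same product serves as the conditional probability of the data given $\theta$ and, viewed as a function of $\theta$ with the data fixed, as the likelihood:

$$L(\theta \mid \vec{x}) = f(\vec{x} \mid \theta) = \prod_{i=1}^{n} \theta^{x_i} (1 - \theta)^{1 - x_i}, \qquad x_i \in \{0, 1\}.$$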
 
Thanks for the responses, guys.
 
