Why is likelihood function defined as such?

CantorSet
Hi everyone,

This is not a homework question but something I thought of while reading.

In the method of maximum likelihood estimation, they're trying to maximize the likelihood function
f(\vec{x}| \theta ) with respect to \theta. But shouldn't the likelihood function be defined as f(\theta| \vec{x} ) since we are GIVEN the data vector \vec{x} while \theta is the unknown parameter?
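To make this concrete, here is a minimal sketch of maximum likelihood estimation for a Bernoulli parameter, using hypothetical coin-flip data and a simple grid search (the data and grid resolution are illustrative assumptions, not from the thread):

```python
import math

# Hypothetical data: 10 coin flips, 7 heads (1 = heads, 0 = tails).
x = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]

def log_likelihood(theta, data):
    """Log of f(x | theta) for i.i.d. Bernoulli(theta) observations,
    read as a function of theta with the data held fixed."""
    return sum(math.log(theta) if xi else math.log(1 - theta) for xi in data)

# Grid search over theta in (0, 1): the data stay fixed, theta varies.
grid = [i / 1000 for i in range(1, 1000)]
theta_hat = max(grid, key=lambda t: log_likelihood(t, x))

print(theta_hat)  # 0.7, the sample proportion of heads
```

The point is that f(\vec{x}|\theta) is evaluated at the observed \vec{x} and then treated as a function of \theta alone; maximizing it recovers the familiar estimate (here, the sample proportion).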
 
CantorSet said:
But shouldn't the likelihood function be defined as f(\theta| \vec{x} ) since we are GIVEN the data vector \vec{x} while \theta is the unknown parameter?

Philosophically, I think you're right. But in frequentist statistics (the usual kind taught in introductory statistics courses) no probability distribution is ever assumed for the parameters of a distribution. When we don't know these parameters, we say they have "definite but unknown values". If you don't have a "prior" probability distribution for the parameters, you can't compute the "posterior" distribution f(\theta|\vec{x}).

My personal view of frequentist statistics is this: The commonsense person's outlook is "I have certain ideas. Given the data, what is the probability that my ideas are true?" Frequentist statistics answers "I assumed certain ideas and calculated the probability of the data. Based on that probability, I will stipulate some decisions." This evades the question! Things are done backwards in comparison to what people naturally want to know.

The authoritative terminology used ("statistical significance," "rejection" of a hypothesis, "confidence" intervals) makes many laymen think that they are getting information about the probability that some idea is true given the data. But if you look under the hood at what's happening, what you are getting is a quantification of the probability of the data based on the assumption that certain ideas are true.

I'm not trying to say that frequentist statistics isn't effective. But how to apply it requires empirical trial and error. People observe that certain methods work in certain types of situations.

If you want to compute f(\theta|\vec{x}) you must become a Bayesian. That's my favorite kind of statistics.
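As a minimal sketch of the Bayesian version, again with hypothetical coin-flip counts: given a prior on \theta, you normalize f(\vec{x}|\theta) f(\theta) to obtain the posterior f(\theta|\vec{x}). With a flat prior, the posterior mode coincides with the maximum likelihood estimate (the data and grid here are illustrative assumptions):

```python
# Hypothetical data summary: 7 heads in 10 flips of a Bernoulli(theta) coin.
heads, flips = 7, 10

grid = [i / 1000 for i in range(1, 1000)]   # theta values in (0, 1)
prior = [1.0] * len(grid)                   # uniform ("flat") prior
lik = [t**heads * (1 - t)**(flips - heads) for t in grid]

# Bayes: posterior is proportional to likelihood * prior; normalize over the grid.
unnorm = [l * p for l, p in zip(lik, prior)]
z = sum(unnorm)
posterior = [u / z for u in unnorm]

# Posterior mode (MAP estimate).
theta_map = max(zip(grid, posterior), key=lambda tp: tp[1])[0]
print(theta_map)  # 0.7 -- matches the MLE because the prior is flat
```

With an informative prior the two estimates would differ, which is exactly the philosophical split described above.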
 
CantorSet said:
In the method of maximum likelihood estimation, they're trying to maximize the likelihood function
f(\vec{x}| \theta ) with respect to \theta. But shouldn't the likelihood function be defined as f(\theta| \vec{x} ) since we are GIVEN the data vector \vec{x} while \theta is the unknown parameter?

Generally that would be written: L(\theta|\vec{x}) = f(\vec{x}|\theta). The notation L(\theta|\vec{x}) emphasizes that the likelihood is read as a function of the parameter \theta with the data \vec{x} held fixed, even though numerically it is the same expression as the density of the data given the parameter.

The expression f(\vec{x}| \theta) is just a conditional probability when f = P.
 
Thanks for the responses, guys.
 