Why is likelihood function defined as such?

CantorSet
Hi everyone,

This is not a homework question but something I thought of while reading.

In the method of maximum likelihood estimation, they're trying to maximize the likelihood function
f(\vec{x}| \theta ) with respect to \theta. But shouldn't the likelihood function be defined as f(\theta| \vec{x} ) since we are GIVEN the data vector \vec{x} while \theta is the unknown parameter?
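To make this concrete, here is a minimal sketch of maximum likelihood estimation for a Bernoulli parameter, using hypothetical coin-flip data and a simple grid search (the data and grid resolution are illustrative assumptions, not from the thread):

```python
import math

# Hypothetical data: 10 coin flips, 7 heads (1 = heads, 0 = tails).
x = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]

def log_likelihood(theta, data):
    """Log of f(x | theta) for i.i.d. Bernoulli(theta) observations,
    read as a function of theta with the data held fixed."""
    return sum(math.log(theta) if xi else math.log(1 - theta) for xi in data)

# Grid search over theta in (0, 1): the data stay fixed, theta varies.
grid = [i / 1000 for i in range(1, 1000)]
theta_hat = max(grid, key=lambda t: log_likelihood(t, x))

print(theta_hat)  # 0.7, the sample proportion of heads
```

The point is that f(\vec{x}|\theta) is evaluated at the observed \vec{x} and then treated as a function of \theta alone; maximizing it recovers the familiar estimate (here, the sample proportion).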
 
CantorSet said:
But shouldn't the likelihood function be defined as f(\theta| \vec{x} ) since we are GIVEN the data vector \vec{x} while \theta is the unknown parameter?

Philosophically, I think you're right. But in frequentist statistics (the usual kind taught in introductory statistics courses) no probability distribution is ever assumed for the parameters of a distribution. When we don't know these parameters, we say they have "definite but unknown values". If you don't have a "prior" probability distribution for the parameters, you can't compute the "posterior" distribution f(\theta|\vec{x}).

My personal view of frequentist statistics is this: The commonsense person's outlook is "I have certain ideas. Given the data, what is the probability that my ideas are true?" Frequentist statistics answers "I assumed certain ideas and calculated the probability of the data. Based on that probability, I will stipulate some decisions." This evades the question! Things are done backwards in comparison to what people naturally want to know.

The authoritative terminology used ("statistical significance," "rejection" of a hypothesis, "confidence" intervals) makes many laymen think that they are getting information about the probability that some idea is true given the data. But if you look under the hood at what's happening, what you are getting is a quantification of the probability of the data based on the assumption that certain ideas are true.

I'm not trying to say that frequentist statistics isn't effective. But how to apply it requires empirical trial and error. People observe that certain methods work in certain types of situations.

If you want to compute f(\theta|\vec{x}) you must become a Bayesian. That's my favorite kind of statistics.
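As a minimal sketch of the Bayesian version, again with hypothetical coin-flip counts: given a prior on \theta, you normalize f(\vec{x}|\theta) f(\theta) to obtain the posterior f(\theta|\vec{x}). With a flat prior, the posterior mode coincides with the maximum likelihood estimate (the data and grid here are illustrative assumptions):

```python
# Hypothetical data summary: 7 heads in 10 flips of a Bernoulli(theta) coin.
heads, flips = 7, 10

grid = [i / 1000 for i in range(1, 1000)]   # theta values in (0, 1)
prior = [1.0] * len(grid)                   # uniform ("flat") prior
lik = [t**heads * (1 - t)**(flips - heads) for t in grid]

# Bayes: posterior is proportional to likelihood * prior; normalize over the grid.
unnorm = [l * p for l, p in zip(lik, prior)]
z = sum(unnorm)
posterior = [u / z for u in unnorm]

# Posterior mode (MAP estimate).
theta_map = max(zip(grid, posterior), key=lambda tp: tp[1])[0]
print(theta_map)  # 0.7 -- matches the MLE because the prior is flat
```

With an informative prior the two estimates would differ, which is exactly the philosophical split described above.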
 
CantorSet said:
In the method of maximum likelihood estimation, they're trying to maximize the likelihood function
f(\vec{x}| \theta ) with respect to \theta. But shouldn't the likelihood function be defined as f(\theta| \vec{x} ) since we are GIVEN the data vector \vec{x} while \theta is the unknown parameter?

Generally that would be written: L(\theta|\vec{x}) = f(\vec{x}|\theta). The notation L(\theta|\vec{x}) emphasizes that the likelihood is read as a function of the parameter \theta with the data \vec{x} held fixed, even though numerically it is the same expression as the density of the data given the parameter.

The expression f(\vec{x}| \theta) is just a conditional probability when f = P.
 
Thanks for the responses, guys.
 