Why is the Maximum Likelihood Function a product?

In summary: Consider the case where I have 10 quarters, flip them, and get only one head. The MLE simply states that the maximizing estimate for p is 1/10. Similarly, with 10 quarters and 7 heads, the maximizing estimate is 7/10. The MLE simply maximizes the likelihood over the parameter of a distribution, based on a sufficient statistic; it would be unreasonable to use 9/10 if the data showed only 1 head in the trial. Nor does dismissing the combinations have practical consequences for the estimate: with 1,000,000 quarters and 700,000 heads, p is still estimated as 7/10.
  • #1
pluviosilla
Why is the Maximum Likelihood function a product?

Explanations of how the Maximum Likelihood function is constructed usually just mention that events are independent and so the probability of several such events is just the product of the separate probabilities. I get the logic w.r.t. independent events. What I don't get is how we can just assume events are independent and (what appears to me in my confusion to be) the disregard of combinations in calculating the probability.

For example, suppose I want to know the probability of 4 heads and 7 tails in 11 coin flips. I may NOT exclude the binomial coefficient from the calculation.

Correct: [itex]P(4, 7) = \binom{N}{k}\cdot p^kq^{N-k} = \binom{11}{4}\cdot p^4q^{7} [/itex]

Incorrect: [itex]P(4, 7) = p^4q^{7}[/itex] <<< gives the probability of one specific sequence of 4 heads and 7 tails

The reasoning, I take it, is that we take the product of the respective probabilities of each bit of data in the sample statistic, then use calculus to determine which population parameters maximize this probability, ignoring combinations because they just constitute a constant coefficient that does not affect the location of the maximum.
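As a concrete check of that reasoning (my own working, not taken from the lectures): the log-likelihood is ##\ln L(p) = \ln\binom{N}{k} + k\ln p + (N-k)\ln(1-p)##, and setting its derivative ##\frac{k}{p} - \frac{N-k}{1-p}## to zero gives ##\hat{p} = k/N##. The coefficient term is constant in ##p##, so its derivative is zero and it drops out regardless of its value.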

QUESTION: How can we simply assume events are independent?
QUESTION: Am I right to assume we ignore combinations because they only add a constant coefficient which does not affect the calculation of the maximum?
QUESTION: Are there no practical consequences (problems) arising from dismissing the combinations? With a large sample size we get a probability that is extremely (and artificially) small.

Examples of MLE discussion include the marvelous on-line lectures in econometrics by Ben Lambert here and here. This guy gives very accessible, very lucid lectures, but I am still puzzled by the aforementioned points.

Disclaimer: this is NOT homework and I am NOT enrolled in anyone's class.
 
  • #2
pluviosilla said:
Disclaimer: this is NOT homework and I am NOT enrolled in anyone's class.

Even so, you might get a better answer by presenting it as homework. I must admit, I don't really understand what is confusing you. What don't you get about the probability for 11 coin flips? If you add up all the probabilities for 0 heads and 11 tails; 1 head and 10 tails; ...; 11 heads and no tails, you get 1, so how can all these probabilities be too low?
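A quick numeric check of that point (my sketch, not part of the reply): for any fixed per-flip probability ##p##, the eleven probabilities sum to 1, so they cannot all be "too low".

[code]
# Verify that the binomial probabilities over k = 0..11 sum to 1 for a fixed p.
from math import comb

p = 0.5   # per-flip probability of heads; any value in [0, 1] works
N = 11    # number of flips
total = sum(comb(N, k) * p**k * (1 - p)**(N - k) for k in range(N + 1))
print(total)  # 1.0, up to floating-point rounding
[/code]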
 
  • #3
1. You don't have to assume i.i.d. data for MLE. You simply need to be able to write a reasonable joint distribution for the data; the maximization then inherently has some extra steps that account for the dependency.
2. Maximizing over the parameter has nothing to do with the distribution of the data, at least not in the sense you are talking about. If I have 10 quarters, flip them, and get only one head, the MLE simply states that my maximizing estimate for p is 1/10. Similarly, if I have 10 quarters and 7 heads, the MLE states my maximizing estimate is 7/10 for p. The MLE simply maximizes the likelihood over the parameter of a distribution, based on some sufficient statistic. It would be unreasonable to use 9/10 if my data showed that I got only 1 head in my trial.
3. The combinations are going to exist for every possible point on the likelihood function, and where the likelihood function peaks is independent of the sample size: if I have 1,000,000 quarters and get heads 700,000 times, my p parameter is still going to be estimated as 7/10 (the sketch below illustrates this). When I say independent, I don't mean that the sample size doesn't factor into the estimate, but rather that the sample size doesn't move the peak of the likelihood function.
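A minimal sketch of point 3 (my code, using an illustrative grid search; nothing here is from the post itself). The binomial coefficient is constant in ##p##, so the log-likelihood below drops it, and both sample sizes peak at the same place.

[code]
# Grid-search the binomial log-likelihood (coefficient omitted: it is
# constant in p) for two sample sizes with the same proportion of heads.
from math import log

def p_hat(heads, flips):
    grid = [i / 1000 for i in range(1, 1000)]
    return max(grid, key=lambda p: heads * log(p) + (flips - heads) * log(1 - p))

print(p_hat(7, 10))               # 0.7
print(p_hat(700_000, 1_000_000))  # 0.7 -- same peak, much larger sample
[/code]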
 
  • #4
pluviosilla said:
For example, suppose I want to know the probability of 4 heads and 7 tails in 11 coin flips. I may NOT exclude the binomial coefficient from the calculation.

What calculation are you doing?

Correct: [itex]P(4, 7) = \binom{N}{k}\cdot p^kq^{N-k} = \binom{11}{4}\cdot p^4q^{7}[/itex]
Incorrect: [itex]P(4, 7) = p^4q^{7}[/itex] <<< gives the probability of one specific sequence of 4 heads and 7 tails

The value of p in [0,1] that maximizes the function ##p^4(1-p)^7## is the same value that maximizes the function ##\binom{11}{4}\cdot p^4(1-p)^7## or the function ##98p^4(1-p)^7## or any other positive constant times ##p^4(1-p)^7##.
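A numeric illustration of this point (my sketch, not the poster's): with or without the coefficient, the curve peaks at the same place, ##p = 4/11##.

[code]
# The argmax of p^4 (1-p)^7 is unchanged by the constant factor C(11, 4).
from math import comb

grid = [i / 10000 for i in range(1, 10000)]
bare = max(grid, key=lambda p: p**4 * (1 - p)**7)
scaled = max(grid, key=lambda p: comb(11, 4) * p**4 * (1 - p)**7)
print(bare, scaled)  # both print 0.3636, i.e. 4/11 to grid precision
[/code]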

It is technically correct to include the binomial coefficient in a calculation of the maximum likelihood estimate for p when the given data is "7 heads out of 11". It just happens that including it as a factor doesn't change the answer.

There is substance to your general question. The scenario of maximum likelihood estimation is that we pick a family of distributions defined by parameters and estimate the parameters by selecting those that maximize the likelihood of "the data". This raises the question: what exactly is "the data"?

In an actual experiment of coin flips there would be a specific sequence of heads and tails that would be "the data" if it were recorded. However, the person performing the experiment might give an incomplete description of the result by saying "heads came up 7 out of 11 times". An incompetent technician might even report something like "I remember the first two tosses came up heads and the last one did also, but I've forgotten what happened on the others".

To me, it seems possible to have a situation where the maximum likelihood estimator for one description of "the data" would give different results than the maximum likelihood estimator for a different description of "the data".

Suppose the outcome of 11 independent tosses of a coin is called a "primo" if a prime number of heads comes up and a "non-primo" if a composite number of heads comes up. If "the data" for the experiment is that "the outcome was a primo", we could try to do a maximum likelihood estimate for p. In that situation, I think we would have to consider the binomial coefficients in our calculations.
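A hedged sketch of what that estimate would look like (my construction from the description above; the post gives no code): the likelihood of "primo" sums binomial terms over the prime head-counts, so the coefficients no longer cancel out of the maximization.

[code]
# Likelihood that 11 tosses yield a prime number of heads, as a function of p.
from math import comb

primes = [2, 3, 5, 7, 11]

def primo_likelihood(p):
    return sum(comb(11, k) * p**k * (1 - p)**(11 - k) for k in primes)

grid = [i / 1000 for i in range(1, 1000)]
print(max(grid, key=primo_likelihood))
# Prints 0.999, the top of the grid: the likelihood keeps rising toward p = 1,
# since 11 heads is itself prime, so p = 1 makes a "primo" outcome certain.
[/code]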

In your example, it just happens that an incomplete description of 11 tosses gives the same maximum likelihood estimate for p as a complete description of the sequence of 11 tosses.
 

1. Why is the Maximum Likelihood Function a product?

The maximum likelihood function (MLF), better known as the likelihood function, is maximized in order to estimate the parameters of a probability distribution from observed data. When the observations are independent, it is the product of the individual probabilities of the data points, which is why it is referred to as a "product".

2. What is the purpose of using a product in the MLF?

Taking the product combines the probabilities of the individual data points into a single overall likelihood for the observed sample. Every data point enters through its own probability, so the resulting estimate reflects all of the data rather than a single summary of it.
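One practical wrinkle with the product form (my sketch, not part of the original FAQ; it also bears on the question above about the probability becoming extremely small): for large samples the raw product underflows, so in practice one maximizes its logarithm, a sum with the same argmax.

[code]
# The raw likelihood product becomes astronomically small for large samples;
# the log-likelihood (a sum) is numerically safe and peaks at the same p.
from math import log, prod

flips = [1] * 700 + [0] * 300   # 700 heads (1) and 300 tails (0)
p = 0.7

likelihood = prod(p if x else 1 - p for x in flips)
log_likelihood = sum(log(p) if x else log(1 - p) for x in flips)
print(likelihood)      # ~5e-266; a somewhat larger sample underflows to 0.0
print(log_likelihood)  # about -610.9, safe to compute and maximize
[/code]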

3. How does the product in the MLF affect the estimation of parameters?

Because every data point contributes its own factor to the product, the estimate uses all of the information in the sample. By construction, the estimated parameters are the values under which the observed data is most probable, which is what makes maximum likelihood a standard and reliable method for parameter estimation.

4. Can the MLF be used for any type of data?

It can be used whenever a probability distribution can be written down for the data as a function of the parameters; when the data points are independent, the likelihood factorizes into the product described above. For example, it applies to data modeled by a normal, binomial, or exponential distribution.
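An illustration for one of these families (my sketch, using simulated data): for exponential data the maximum likelihood estimate has the closed form ##\hat{\lambda} = 1/\bar{x}##, and a grid search on the log-likelihood agrees with it.

[code]
# Exponential MLE: closed form lambda_hat = n / sum(x) versus a grid search
# on the log-likelihood n*log(lambda) - lambda*sum(x).
import random
from math import log

random.seed(0)
data = [random.expovariate(2.0) for _ in range(1000)]  # true rate lambda = 2

n, s = len(data), sum(data)
closed_form = n / s

grid = [i / 100 for i in range(1, 1000)]  # candidate rates 0.01 .. 9.99
grid_mle = max(grid, key=lambda lam: n * log(lam) - lam * s)
print(closed_form, grid_mle)  # both close to 2.0
[/code]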

5. How is the MLF different from other statistical methods?

Maximum likelihood differs from many other estimation methods in that it works with the full probability model for the individual data points, rather than with summaries such as the mean or sum of the data alone. It also applies to any data for which a probability distribution can be specified, whereas some other methods are limited to particular types of data.
