# Improving intuition on applying the likelihood ratio test

1. Dec 12, 2016

I am trying to better understand the likelihood ratio test and have found a few helpful resources that explicitly solve problems, but I was curious whether you have any more to recommend, perhaps links that work out full problems and also explain the theory nicely. Similar links you have found illuminating for the Wald and Lagrange multiplier tests would also be of much interest!

2. Dec 16, 2016

### chiro

A likelihood ratio is just two probabilities taken with respect to each other. You have the probability of the data under one hypothesis and the probability of the same data under another hypothesis.

It would be easier to assess the log likelihood and to understand how the logarithm changes as the probability increases.

You should find that as the probability decreases, the negative of the log of the likelihood increases. A small likelihood ratio therefore gives a massive chi-squared statistic, which means the model under test is unlikely to fit the data you have.

Just remember that the likelihood is the probability of getting the sample data given a particular parameter value [you take sample data and you estimate the parameter as the value that makes that data most probable].

The chi-square distribution is a statistical result about the log-likelihood ratio [in large samples, $-2$ times the log of the ratio is approximately chi-square distributed]. The intuition behind interpreting the actual value is that a higher likelihood corresponds to more evidence for a hypothesis, and you are in essence using the two likelihoods to compare [relatively] how much better one hypothesis is than the other.

If the log-likelihood is confusing, then just think about when the different probabilities are greater or less than each other and how each one is either close to zero or close to one.
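To make the comparison concrete, here is a minimal sketch of a likelihood ratio test for the mean of normal data. It assumes a known standard deviation, a null value `mu0`, and made-up sample values; the function names are illustrative and not taken from any of the linked resources. With one restricted parameter the statistic is compared against a chi-square distribution with 1 degree of freedom, whose upper tail can be written with `math.erfc`.

```python
import math

def normal_loglik(data, mu, sigma):
    """Log-likelihood of i.i.d. Normal(mu, sigma^2) observations."""
    n = len(data)
    ss = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - ss / (2 * sigma ** 2)

def lrt_normal_mean(data, mu0, sigma):
    """Test H0: mu = mu0 against H1: mu unrestricted, with sigma assumed known."""
    mu_hat = sum(data) / len(data)              # MLE of mu under H1
    ll_null = normal_loglik(data, mu0, sigma)   # log-likelihood under H0
    ll_alt = normal_loglik(data, mu_hat, sigma) # maximized log-likelihood under H1
    # -2 log(likelihood ratio); clipped at 0 to guard against rounding.
    stat = max(-2.0 * (ll_null - ll_alt), 0.0)
    # Chi-square with 1 degree of freedom: P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

data = [4.9, 5.6, 5.1, 6.2, 5.8, 5.4, 6.0, 5.3]  # hypothetical sample
stat, p_value = lrt_normal_mean(data, mu0=5.0, sigma=0.5)
print(f"-2 log LR = {stat:.3f}, p-value = {p_value:.4f}")
```

The further the sample mean sits from `mu0`, the smaller the likelihood under the null relative to the alternative, and the larger the statistic becomes.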

3. Dec 16, 2016

Thank you for the response. I guess my questions largely lie in how one constructs the probability distributions themselves. For example, in the first link, they state that the likelihood evaluated at the maximum likelihood estimate of $\mu$ is given by $L(\overline{x}) = \prod_{i=1}^{n} \frac {1}{\sqrt{2\pi \overline{\sigma}^2}} e^{-\frac{(x_i -\overline{x})^2}{2\overline{\sigma}^2}}$. Why these are valid maximum likelihood estimates is not very clear to me, though.

4. Dec 17, 2016

### chiro

MLE is an optimization problem: you find the parameter value at which the probability of the sample data is greatest.
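As a sketch of that optimization for the normal case, assume the observations $x_1, \dots, x_n$ are independent draws from a Normal distribution with unknown mean $\mu$ and a fixed standard deviation $\sigma$ (the symbols $\ell$ and $\hat{\mu}$ are notation introduced here). Taking the log of the likelihood turns the product into a sum:

$$\ell(\mu) = \log L(\mu) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2$$

Setting the derivative with respect to $\mu$ to zero gives

$$\frac{d\ell}{d\mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0 \quad\Longrightarrow\quad \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i = \overline{x}$$

and since $\frac{d^2\ell}{d\mu^2} = -\frac{n}{\sigma^2} < 0$, this stationary point is a maximum. Under these assumptions the sample mean is the maximum likelihood estimate of $\mu$.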

The probability distributions are either estimated from the data, constructed from assumptions, or taken from the standard distributions studied in statistical inference.

5. Dec 20, 2016

### ChrisVer

6. Dec 21, 2016

### chiro

Do you know the Central Limit Theorem? This will be useful to understand a lot of normal distribution statistics.

With MLE you start out with a likelihood function that is either derived or flat out assumed. The derivation is done on first principles of probability modeling [a good example is a binomial distribution for counts of independent events or a Poisson for rates].
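As a small illustration of the first principles route, assume $n$ independent trials that each succeed with the same probability $p$. Counting the ways to get $k$ successes gives the binomial likelihood, and a brute-force scan over candidate values of $p$ shows where it peaks. The counts `k` and `n` below are hypothetical, and the grid search is only a sketch, not an efficient method.

```python
import math

def binomial_loglik(p, k, n):
    """Log-likelihood of success probability p after k successes in n independent trials."""
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log(1 - p)

k, n = 7, 20                                  # hypothetical data: 7 successes in 20 trials
grid = [i / 1000 for i in range(1, 1000)]     # candidate values of p strictly inside (0, 1)
p_best = max(grid, key=lambda p: binomial_loglik(p, k, n))

print("grid maximizer:", p_best)
print("closed form k/n:", k / n)
```

The grid maximizer lands on the observed proportion $k/n$, which is the closed-form maximum likelihood estimate for this model.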

You will need to give us more information before we can assess whether the likelihood is derived from first principles or just assumed.

7. Dec 24, 2016

I am aware of the Central Limit Theorem. So it appears you assume a model and continue adjusting/adding parameters such that your model matches observations?

8. Dec 24, 2016

### chiro

For MLE you assume that every sample point follows a distribution, build the likelihood from that, and then find the values of the parameters [that you are estimating] which maximize it.

It's a lot like optimizing a cost function or some other attribute - here you are maximizing the probability of the sample with respect to the parameter you are estimating.
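To show the optimization view in code, here is a minimal sketch that maximizes a Poisson log-likelihood numerically. The Poisson model, the counts, the search interval, and the helper `golden_section_max` are all assumptions made for illustration; in practice a library optimizer would usually be used instead.

```python
import math

def poisson_loglik(lam, counts):
    """Log-likelihood of rate lam for i.i.d. Poisson counts."""
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in counts)

def golden_section_max(f, lo, hi, tol=1e-8):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)   # left interior point
        d = a + inv_phi * (b - a)   # right interior point
        if f(c) > f(d):
            b = d                   # maximum lies in [a, d]
        else:
            a = c                   # maximum lies in [c, b]
    return (a + b) / 2

counts = [2, 4, 3, 5, 1, 3, 2, 4]   # hypothetical observed counts
lam_hat = golden_section_max(lambda lam: poisson_loglik(lam, counts), 0.01, 20.0)

print("numerical MLE:", lam_hat)
print("sample mean:  ", sum(counts) / len(counts))
```

For the Poisson model the numerical maximizer should agree with the sample mean, which is the closed-form answer, so the search is only there to make the "maximize the probability of the sample" step explicit.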

I mention the CLT because it says that, given a large enough sample, the sampling distribution of many estimators [sample means in particular] can be approximated by a Normal distribution, and most large-sample statistics just assume that and use the Normal for statistical inference.
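A quick simulation can make this tangible. The sketch below assumes exponential data with mean 1 (a skewed distribution, chosen only for illustration), draws many samples, and checks how closely the sample means follow the Normal approximation with standard error $1/\sqrt{n}$. The sample size, number of repetitions, and seed are arbitrary choices.

```python
import math
import random
import statistics

random.seed(0)                      # fixed seed so the run is repeatable
n, reps = 50, 2000                  # sample size and number of simulated samples

# Each entry is the mean of n exponential draws with true mean 1 and true sd 1.
means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
         for _ in range(reps)]

center = statistics.fmean(means)    # should be close to the true mean, 1
spread = statistics.stdev(means)    # should be close to 1 / sqrt(n)
se = 1.0 / math.sqrt(n)

# Fraction of sample means within 1.96 standard errors of the true mean;
# the Normal approximation predicts roughly 0.95.
coverage = sum(abs(m - 1.0) <= 1.96 * se for m in means) / reps

print(f"mean of means = {center:.3f}, sd of means = {spread:.3f}, 1/sqrt(n) = {se:.3f}")
print(f"share within 1.96 se = {coverage:.3f}")
```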

The likelihood is often chosen by thinking about the process itself and deriving a likelihood function based on its attributes. You can instead just estimate the distribution and update it from the data, but that lacks the grounding of a first principles approach, where you deduce the likelihood from beliefs and ideas that give context to the data rather than using the data by itself.