# Stats question: likelihood function for a ratio

1. Feb 2, 2012

### Zoe-b

1. The problem statement, all variables and given/known data
According to genetic linkage theory, observed frequencies of four phenotypes
resulting from crossing tomato plants are in the ratio $\frac{9}{16}+a : \frac{3}{16}-a : \frac{3}{16}-a : \frac{1}{16}+a$.
In 1931, J.W. MacArthur reported the following frequencies:

| Phenotype | Observed frequency |
|---|---|
| Tall, cut-leaf | 926 |
| Tall, potato-leaf | 288 |
| Dwarf, cut-leaf | 293 |
| Dwarf, potato-leaf | 104 |
| Total | 1611 |

Write down the likelihood of $a$ given these observations. Find the maximum likelihood estimate of $a$, use it to calculate expected frequencies for the four phenotypes
and compare them with the observed frequencies. Does genetic linkage theory look
plausible?
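As a short worked check (not part of the original problem), the ratio defines a valid probability distribution for any admissible $a$: the four terms always sum to one, and requiring each term to be positive restricts the range of $a$. Setting $a=0$ gives the plain 9:3:3:1 ratio.
$$\left(\tfrac{9}{16}+a\right)+\left(\tfrac{3}{16}-a\right)+\left(\tfrac{3}{16}-a\right)+\left(\tfrac{1}{16}+a\right)=\tfrac{16}{16}=1$$
$$\tfrac{3}{16}-a>0 \quad\text{and}\quad \tfrac{1}{16}+a>0 \quad\Longrightarrow\quad -\tfrac{1}{16}<a<\tfrac{3}{16}$$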
2. Relevant equations
The likelihood of a data set $(x_1, \dots, x_n)$ occurring is the product $\prod_i f_X(x_i)$, if the $X_i$ are independent. Then I could find the maximum likelihood estimate etc. as usual.
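The same product rule covers this problem if each plant is treated as one independent observation whose probability is that of its phenotype. A minimal Python sketch (the outcome labels and probabilities are made up purely for illustration) shows that the product over individual observations and the version grouped by counts give the same log-likelihood:

```python
import math
from collections import Counter

def log_lik_per_observation(observations, prob):
    """ln of the product f(x_1) * ... * f(x_n) for independent discrete observations."""
    return sum(math.log(prob[x]) for x in observations)

def log_lik_from_counts(observations, prob):
    """Same quantity with identical factors grouped: sum over outcomes of n_k * ln(p_k)."""
    counts = Counter(observations)
    return sum(n_k * math.log(prob[k]) for k, n_k in counts.items())

# Hypothetical three-outcome example, for illustration only.
prob = {"B": 0.5, "C": 0.3, "D": 0.2}
observations = ["B", "C", "B", "D", "B", "C"]

print(log_lik_per_observation(observations, prob))
print(log_lik_from_counts(observations, prob))
```

So for categorical data only the counts matter, which is why the four phenotype totals are enough to write down the likelihood.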

3. The attempt at a solution
Basically I just cannot start this problem at all. I have only ever found a likelihood before as a function producing a single value; here I seem to want the probability that the ratio is b:c:d:e, which I have no idea how to do. Also, surely I need to know something about the variance of a? Then at least I could write down a probability for each part of the ratio?
Sorry if I'm not making much sense, or if this is obvious, but I have spent a long time on this with no luck. Google hasn't helped at all either.

2. Feb 3, 2012

### lanedance

Maybe let's simplify: see how we would do it for a two-outcome case, and then build up from there.

So let's say you have a discrete probability distribution with two outcomes, B and C, with probabilities $P(B)=b$ and $P(C)=c=1-b$.

Now let's say you roll the die n times. The number of B events then follows a binomial distribution, and the probability of getting k B events, given the probability b, is
$$P(N_B=k|b) = \frac{n!}{k!(n-k)!}(1-b)^{n-k}b^k$$

Viewed as a function of b with k fixed, this is just the likelihood of the parameter b, so now
$$L(b|N_B=k) = P(N_B=k|b) = \frac{n!}{k!(n-k)!}(1-b)^{n-k}b^k$$

Taking the logarithm
$$\ln\{L(b|N_B=k)\} = \ln\left(\frac{n!}{k!(n-k)!}\right)+(n-k)\ln(1-b)+k\ln(b)$$

Differentiating with respect to b and setting the derivative to zero to find the MLE gives
$$-\frac{n-k}{1-b} +\frac{k}{b}=0$$

Note that taking the logarithm and differentiating makes the multiplicative constant $\frac{n!}{k!(n-k)!}$ fall away: it doesn't change the form of the likelihood function, it just normalises it. Rearranging gives
$$k(1-b)=(n-k)b$$

Giving
$$b=\frac{k}{n}$$

which is what we would have guessed anyway, but hopefully you can build from there.
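A quick numerical sanity check of this result, as a Python sketch: evaluate the binomial log-likelihood on a grid of b values and confirm the maximum sits at k/n. The values n = 20 and k = 7 are arbitrary choices for illustration.

```python
import math

def binom_log_likelihood(b, n, k):
    """ln L(b | N_B = k) = ln C(n, k) + (n - k) ln(1 - b) + k ln(b)."""
    return math.log(math.comb(n, k)) + (n - k) * math.log(1 - b) + k * math.log(b)

n, k = 20, 7                                   # arbitrary illustrative values
grid = [i / 1000 for i in range(1, 1000)]      # b in (0, 1), step 0.001
b_best = max(grid, key=lambda b: binom_log_likelihood(b, n, k))

print(b_best, k / n)  # the grid maximum coincides with k/n
```

Dropping the `math.log(math.comb(n, k))` term would not change `b_best`, which is the point above about the constant falling away.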

Last edited: Feb 3, 2012
3. Feb 3, 2012

### lanedance

The tricky part will be finding the probabilities, and this is only a start, but let's call the four outcomes B, C, D, E; e.g. getting B=b means that out of n trials you found b "Tall, cut-leaf" plants.

Let's deal first with just the B information, the first observation. We can treat this as a binomial distribution with n trials and b successes:
$$P(B=b|a) = \frac{n!}{b!(n-b)!}(1-\frac{9}{16} - a)^{n-b}(\frac{9}{16} + a)^b$$
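A sketch of this first factor in Python, on the log scale because for n = 1611 the binomial coefficient is far too large for a float. By the two-outcome result, this factor on its own is maximised where $\frac{9}{16}+a = b/n$, i.e. $a = b/n - \frac{9}{16}$; the grid of a values is an arbitrary illustrative choice.

```python
import math

def log_p_B_given_a(b, n, a):
    """ln P(B = b | a) for a binomial with n trials and success probability 9/16 + a."""
    p = 9 / 16 + a
    return math.log(math.comb(n, b)) + (n - b) * math.log(1 - p) + b * math.log(p)

n, b = 1611, 926  # MacArthur's total and 'Tall, cut-leaf' count

# Search a over a range where 9/16 + a stays inside (0, 1).
grid = [i / 10000 for i in range(-500, 1501)]   # a from -0.05 to 0.15
a_best = max(grid, key=lambda a: log_p_B_given_a(b, n, a))

print(a_best, b / n - 9 / 16)  # B information alone points to a = b/n - 9/16
```

This uses only the B count; the full likelihood also uses C, D and E, so its maximiser need not coincide with this value.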

Now let's look at adding the C information. However, the C outcome is not independent of the B outcome, so I'm thinking you can stack them up using conditional probability:
$$P(B=b,C=c|a) = P(C=c|B=b,a)P(B=b|a)$$

Note that if B and C were independent, $P(C=c|B=b,a)=P(C=c|a)$ and this would reduce to a simple product of the marginal probabilities, but that is not the case here: $P(C=c|B=b,a)\neq P(C=c|a)$.

We just found $P(B=b|a)$. Now, given B=b, can you find $P(C=c|B=b,a)$?

Then you can repeat this for D; E needs no extra factor, since e = n - b - c - d is fixed once b, c and d are known. Stacking up the last piece of information gives
$$P(B=b,C=c,D=d|a) = P(D=d|C=c,B=b,a)P(C=c|B=b,a)P(B=b|a)$$
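One way to check this stacking numerically, as a Python sketch: each conditional is again binomial on the plants not yet accounted for, with the remaining probabilities renormalised. For instance $P(C=c|B=b,a)$ is binomial with $n-b$ trials and success probability $(\frac{3}{16}-a)/(1-\frac{9}{16}-a)$. The product of the three factors should then equal the multinomial probability $\frac{n!}{b!\,c!\,d!\,e!}p_B^b p_C^c p_D^d p_E^e$. The helper names, the $p_B, \dots, p_E$ notation and the test value a = 0.01 are illustrative assumptions.

```python
import math

def log_binom_pmf(k, n, p):
    """ln of the binomial probability of k successes in n trials."""
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log(1 - p)

def log_stacked(b, c, d, n, a):
    """ln P(B=b, C=c, D=d | a) built from the conditional factors."""
    pB, pC, pD = 9/16 + a, 3/16 - a, 3/16 - a
    lp = log_binom_pmf(b, n, pB)                            # P(B=b | a)
    lp += log_binom_pmf(c, n - b, pC / (1 - pB))            # P(C=c | B=b, a)
    lp += log_binom_pmf(d, n - b - c, pD / (1 - pB - pC))   # P(D=d | C=c, B=b, a)
    return lp  # e = n - b - c - d is then fixed, so there is no extra factor

def log_multinomial(counts, probs):
    """ln of n! / (n_1! ... n_k!) * prod p_i^n_i."""
    n = sum(counts)
    return math.lgamma(n + 1) + sum(
        k * math.log(p) - math.lgamma(k + 1) for k, p in zip(counts, probs))

a = 0.01  # arbitrary value inside the valid range
counts = [926, 288, 293, 104]
probs = [9/16 + a, 3/16 - a, 3/16 - a, 1/16 + a]

print(log_stacked(926, 288, 293, sum(counts), a))
print(log_multinomial(counts, probs))
```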

Last edited: Feb 3, 2012
4. Feb 3, 2012

### I like Serena

Hi Zoe-b!

Can you deduce from your ratio what the probability of each of your phenotypes is (as a function of a)?

Suppose you have a specific set of those plants, with $n_B$, $n_C$, $n_D$ and $n_E$ of each type in total.
What would the probability of that specific set be?

Note that this probability, viewed as a function of a, defines your likelihood.
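Following this hint, the probability of a specific set of counts is multinomial, and dropping the constant $\frac{n!}{n_B!\,n_C!\,n_D!\,n_E!}$ leaves $\ell(a) = n_B\ln(\frac{9}{16}+a) + (n_C+n_D)\ln(\frac{3}{16}-a) + n_E\ln(\frac{1}{16}+a)$. Below is a minimal Python sketch, assuming all four counts are positive (true for MacArthur's data). It finds the MLE by bisection on $\ell'(a)$, which is just one choice (any 1-D root finder or optimiser would do), and then computes the expected frequencies $n\,p_i(\hat a)$.

```python
import math

# MacArthur's (1931) counts, in the order of the ratio 9/16+a : 3/16-a : 3/16-a : 1/16+a
names = ["Tall, cut-leaf", "Tall, potato-leaf", "Dwarf, cut-leaf", "Dwarf, potato-leaf"]
counts = [926, 288, 293, 104]

def probs(a):
    """Phenotype probabilities as functions of a (valid for -1/16 < a < 3/16)."""
    return [9/16 + a, 3/16 - a, 3/16 - a, 1/16 + a]

def log_likelihood(a):
    """Multinomial log-likelihood of a, without the constant n!/(n_B! n_C! n_D! n_E!)."""
    return sum(k * math.log(p) for k, p in zip(counts, probs(a)))

def score(a):
    """d/da of log_likelihood."""
    nB, nC, nD, nE = counts
    return nB / (9/16 + a) - (nC + nD) / (3/16 - a) + nE / (1/16 + a)

def mle(tol=1e-12):
    # The log-likelihood is concave on (-1/16, 3/16), so the score falls from
    # +infinity to -infinity there and bisection finds its single root.
    lo, hi = -1/16 + 1e-9, 3/16 - 1e-9
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

a_hat = mle()
n = sum(counts)
expected = [n * p for p in probs(a_hat)]

print(f"a_hat = {a_hat:.5f}")
for name, obs, exp in zip(names, counts, expected):
    print(f"{name:20s} observed {obs:5d}   expected {exp:8.1f}")
```

The printed table puts the observed and expected frequencies side by side, which is the comparison the problem asks for.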

5. Feb 3, 2012

### lanedance

I'm not too sure whether the variance of a makes sense here, but at the end of the day you will have the likelihood function for a, which should help you decide whether the value of a that supports the theory is reasonable given the data.
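One way to use the likelihood function for this, sketched in Python: the relative likelihood $R(a) = L(a)/L(\hat a)$ equals 1 at the MLE and shows how much less well other values of $a$ explain the data; $a = 0$ corresponds to the plain 9:3:3:1 ratio. The grid resolution and the particular $a$ values printed are arbitrary choices, and where to draw the line for "plausible" remains a judgment call.

```python
import math

counts = [926, 288, 293, 104]  # MacArthur's (1931) counts

def log_likelihood(a):
    """Multinomial log-likelihood of a, up to an additive constant."""
    probs = [9/16 + a, 3/16 - a, 3/16 - a, 1/16 + a]
    return sum(k * math.log(p) for k, p in zip(counts, probs))

# Locate the maximum on a fine grid strictly inside -1/16 < a < 3/16.
grid = [-1/16 + i * 1e-5 for i in range(1, 25000)]
a_hat = max(grid, key=log_likelihood)
ll_max = log_likelihood(a_hat)

def relative_likelihood(a):
    """R(a) = L(a) / L(a_hat), computed on the log scale to avoid underflow."""
    return math.exp(log_likelihood(a) - ll_max)

for a in [-0.02, -0.01, 0.0, 0.01, 0.02]:
    print(f"a = {a:+.2f}   R(a) = {relative_likelihood(a):.3f}")
```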