# Hidden Markov Model

1. Mar 8, 2015

### bowlbase

1. The problem statement, all variables and given/known data
Consider an HMM with two possible states, “R” and “G” (for “regulatory” and “gene” sequences respectively). Each state emits one character, chosen from the alphabet {A,C,G,T}.

The transition probabilities of this HMM are:
aRG = aGR = 1/4
aRR = aGG = 3/4
The emission probabilities are:
eR(A) = eR(C) = eR(G) = eR(T) = 1/4
eG(A) = eG(T) = 2/10
eG(C) = eG(G) = 3/10

Assume that the initial state of the HMM is “R” or “G” with equal probabilities. Given a sequence S = ACGT and an HMM path π = RGGR, calculate the probability Pr(S, π) of the sequence and the path.

2. Relevant equations
$$P(S,\pi) = \prod_{i=1} a_{\pi_{i-1}\pi_i}\, e_{\pi_i}(x_i)$$
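
Written out for a four-character sequence, and assuming the product runs over every position of the sequence (the upper limit is not stated above), the formula would expand as sketched below. Here π_0 is only a label for whatever state precedes the first emission; one common reading treats the first factor as the initial-state probability, but that is an assumption, not something the formula itself settles.

$$P(S,\pi) = a_{\pi_0\pi_1} e_{\pi_1}(x_1) \cdot a_{\pi_1\pi_2} e_{\pi_2}(x_2) \cdot a_{\pi_2\pi_3} e_{\pi_3}(x_3) \cdot a_{\pi_3\pi_4} e_{\pi_4}(x_4)$$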

3. The attempt at a solution

We discussed this equation in class but never actually used it or spent time describing how to wield it. I just don't see how the information I'm given comes together in that equation.

Thanks for any help.

2. Mar 9, 2015

### bowlbase

I think that I have found a way to do this, but I want to make sure it is correct.

I believe the equation is telling me to multiply two probabilities for each position (the probability of emitting that character of ACGT from R or G, and the probability of the transition between R and G) and then multiply all of those together. I'm just not sure about the order.

So, I have S=ACGT and pi = RGGR

ACGT
RGGR

The first probability is 1/2 for either R or G. Then A, being in R, has probability 1/4. And then there is just a 1/4 probability that R->G for the next character.

I'm not sure whether the 1/2 should be included from the beginning or not, because of how the last character works. But I have

(1/4*1/4)(3/10*3/4)(3/10*1/4)(1/4)

So, I've multiplied each emission probability by the probability of the transition that follows it (R->G, G->G, G->R), then multiplied all of those together. The last emission probability is not paired with a transition because no state follows it. Though, I wonder if I should also multiply by 1/2, since the first state was chosen with probability 1/2 between R and G.

Is this the correct method?

Thanks.
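
The arithmetic in this attempt can be checked with exact fractions. The following is a minimal sketch, assuming the usual convention that the path probability starts with the initial-state probability and then alternates emission and transition factors; the names `INIT`, `TRANS`, `EMIT`, and `joint_probability` are illustrative only and do not come from the thread.

```python
from fractions import Fraction as F

# Model parameters copied from the problem statement.
INIT = {"R": F(1, 2), "G": F(1, 2)}
TRANS = {
    ("R", "R"): F(3, 4), ("R", "G"): F(1, 4),
    ("G", "R"): F(1, 4), ("G", "G"): F(3, 4),
}
EMIT = {
    "R": {c: F(1, 4) for c in "ACGT"},
    "G": {"A": F(2, 10), "T": F(2, 10), "C": F(3, 10), "G": F(3, 10)},
}

def joint_probability(seq, path):
    """P(seq, path), assuming the first factor is the initial-state probability."""
    if len(seq) != len(path):
        raise ValueError("sequence and path must have the same length")
    # First position: initial-state probability times its emission.
    p = INIT[path[0]] * EMIT[path[0]][seq[0]]
    # Remaining positions: transition into the state, then its emission.
    for prev, cur, ch in zip(path, path[1:], seq[1:]):
        p *= TRANS[(prev, cur)] * EMIT[cur][ch]
    return p

with_start = joint_probability("ACGT", "RGGR")
# The product written in the post omits the initial 1/2 factor.
without_start = with_start / INIT["R"]

print(with_start)     # 27/204800
print(without_start)  # 27/102400
```

Under that assumption, the product in the post evaluates to 27/102400, and including the initial 1/2 halves it to 27/204800. The order of the factors does not matter, since multiplication is commutative.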

3. Mar 10, 2015

### haruspex

It isn't entirely clear, but judging from the equation you are given, it looks like the RGGR path starts at the second state (1), and the first emitted character comes just after entry to that state. (Consider the i=1 term in the product.) If so, you need to consider (and sum over) the sequences RRGGR and GRGGR.
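
The sum over the two extended paths can be sketched as follows. This assumes that the extra leading state emits nothing and is R or G with probability 1/2 each, which is one way to read the problem's "initial state" wording and is not confirmed in the thread; `start`, `trans`, `emit`, and `prob_given_pi0` are hypothetical names.

```python
from fractions import Fraction as F

# Parameters from the problem statement.
trans = {"RR": F(3, 4), "RG": F(1, 4), "GR": F(1, 4), "GG": F(3, 4)}
emit = {
    "R": {c: F(1, 4) for c in "ACGT"},
    "G": {"A": F(2, 10), "T": F(2, 10), "C": F(3, 10), "G": F(3, 10)},
}
# Assumption: the silent state pi_0 is drawn with the initial probabilities.
start = {"R": F(1, 2), "G": F(1, 2)}

def prob_given_pi0(pi0, path, seq):
    """Product of a_{pi_{i-1} pi_i} * e_{pi_i}(x_i) for i = 1..len(seq)."""
    p = F(1)
    prev = pi0
    for state, ch in zip(path, seq):
        p *= trans[prev + state] * emit[state][ch]
        prev = state
    return p

via_R = prob_given_pi0("R", "RGGR", "ACGT")  # extended path RRGGR
via_G = prob_given_pi0("G", "RGGR", "ACGT")  # extended path GRGGR
total = start["R"] * via_R + start["G"] * via_G

print(via_R, via_G, total)  # 81/409600 27/409600 27/204800
```

With these assumptions the two leading transitions contribute (1/2)(3/4) + (1/2)(1/4) = 1/2 overall, so the total matches what one gets by simply placing a factor of 1/2 in front of the four-state product.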