
#1
Jan 3, 2013, 04:10 AM

P: 21

Hi,
I am trying to teach myself Hidden Markov Models, using this text as material: www.cs.sjsu.edu/~stamp/RUA/HMM.pdf. The introduction with the example was reasonable, but now I have trouble understanding some of the derivation. I can follow the math and use the formulas to get results, but I also want to understand the meaning behind them.

One question that arises with hidden Markov models is determining the likelihood of the observed sequence O given the model, as written below (Section 4, page 6 in the text):

b = probability of an observation in a given state
a = probability of transiting from one state to another
[itex]\pi[/itex] = starting probability
O = observation sequence
X = state sequence
[itex]\lambda[/itex] = hidden Markov model

[itex]P(O \mid X, \lambda) = b_{x_0}(O_0) \, b_{x_1}(O_1) \cdots b_{x_{T-1}}(O_{T-1})[/itex]

[itex]P(X \mid \lambda) = \pi_{x_0} a_{x_0,x_1} a_{x_1,x_2} \cdots a_{x_{T-2},x_{T-1}}[/itex]

[itex]P(O, X \mid \lambda) = \frac{P(O \cap X \cap \lambda)}{P(\lambda)}[/itex]

[itex]P(O \mid X, \lambda) \, P(X \mid \lambda) = \frac{P(O \cap X \cap \lambda)}{P(X \cap \lambda)} \, \frac{P(X \cap \lambda)}{P(\lambda)} = \frac{P(O \cap X \cap \lambda)}{P(\lambda)}[/itex]

[itex]P(O, X \mid \lambda) = P(O \mid X, \lambda) \, P(X \mid \lambda)[/itex]

[itex]P(O \mid \lambda) = \sum\limits_{X} P(O, X \mid \lambda)[/itex]

[itex]P(O \mid \lambda) = \sum\limits_{X} P(O \mid X, \lambda) \, P(X \mid \lambda)[/itex]

My question is: why do I get the likelihood of the observed sequence by summing over all possible state sequences? Can someone please explain it differently?
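For what it's worth, the last formula can be checked by brute force: enumerate every state sequence X and add up P(O | X, λ) · P(X | λ). This is only a sketch; the model numbers below are made up for illustration and are not from the text.

```python
import itertools

# Toy HMM (hypothetical numbers, just to illustrate the formulas):
# two hidden states, two observation symbols.
pi = [0.6, 0.4]                 # pi[i]   = P(x_0 = i)
A  = [[0.7, 0.3], [0.4, 0.6]]   # A[i][j] = a_{i,j}, transition probability
B  = [[0.9, 0.1], [0.2, 0.8]]   # B[i][k] = b_i(k), observation probability

O = [0, 1, 0]                   # an observation sequence
T = len(O)
N = len(pi)

# P(O | lambda) = sum over ALL state sequences X of P(O | X, lambda) * P(X | lambda)
likelihood = 0.0
for X in itertools.product(range(N), repeat=T):
    # P(X | lambda) = pi_{x_0} * a_{x_0,x_1} * ... * a_{x_{T-2},x_{T-1}}
    p_X = pi[X[0]]
    for t in range(1, T):
        p_X *= A[X[t - 1]][X[t]]
    # P(O | X, lambda) = b_{x_0}(O_0) * b_{x_1}(O_1) * ... * b_{x_{T-1}}(O_{T-1})
    p_O_given_X = 1.0
    for t in range(T):
        p_O_given_X *= B[X[t]][O[t]]
    likelihood += p_X * p_O_given_X

print(likelihood)
```

With these numbers the sum runs over the 2³ = 8 possible state sequences; the point is only that every term in the sum is one P(O, X | λ).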



#2
Jan 3, 2013, 11:59 AM

Sci Advisor
P: 3,175

[itex]P(A) = \sum_{i=1}^N P(A \cap s_i) = \sum_{i=1}^N P(A \mid s_i) P(s_i)[/itex]

The inconvenient thing about reading applied probability is understanding which spaces of outcomes are being discussed. Authors often don't make this clear, and usually several different spaces of outcomes are involved: conditional probabilities in a given article refer to events in a different probability space than unconditional probabilities. In the above example, the probability [itex]P(A \mid s_i)[/itex] is the probability of a set of outcomes, and this set contains the same outcomes ("points") as the set [itex]A \cap s_i[/itex], but we are considering this set of outcomes to be in a different probability space. The "whole space" for [itex]P(A \mid s_i)[/itex] is the set [itex]s_i[/itex] instead of [itex]S[/itex].

My guess about the article you linked is that you should think of the whole space of outcomes as all possible sequences of information that give both the sequence of states of the Markov process and the sequence of observations. So a "point" in the whole space is a sequence (i.e. a vector) with both kinds of information. When you draw a circle A that represents a particular sequence of observations, it contains more than one point, because it contains all the vectors that agree with the sequence of observations and have otherwise arbitrary sequences of states of the Markov process.
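The identity at the top is just the law of total probability, which is easy to verify numerically. A minimal sketch, with made-up probabilities for a three-set partition:

```python
# Partition the whole space S into s_1, s_2, s_3 (their probabilities sum to 1)
# and choose conditional probabilities P(A | s_i); all numbers are hypothetical.
p_s = [0.5, 0.3, 0.2]           # P(s_i)
p_A_given_s = [0.1, 0.6, 0.9]   # P(A | s_i)

# Law of total probability: P(A) = sum_i P(A | s_i) * P(s_i)
p_A = sum(pa * ps for pa, ps in zip(p_A_given_s, p_s))
print(p_A)  # 0.05 + 0.18 + 0.18 = 0.41
```

Each term P(A | s_i) P(s_i) is the same number as P(A ∩ s_i); what changes between the two forms is only which set is treated as the "whole space".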



#3
Jan 10, 2013, 09:15 AM

P: 21

That’s how I thought about it as well. I tried to think about it like the total probability theorem: picture the events [itex]S_1[/itex] through [itex]S_n[/itex] composing the universal set. Like you said, one can define a smaller set [itex]A[/itex] which comprises outcomes in some or all of the events [itex]S_1[/itex] through [itex]S_n[/itex]. If the events [itex]S_i[/itex] are mutually exclusive and don't have outcomes in common, then the sets [itex]S_i \cap A[/itex] have none in common either, and you get:

[itex]Pr(S_1 \cap A) + Pr(S_2 \cap A) + \cdots + Pr(S_n \cap A) = Pr(A)[/itex]

I tried to connect this with the formula for the likelihood. I hoped that this would explain to me why summing over all X results in [itex]P(O \mid \lambda)[/itex]. But as you mentioned above, a single event here represents a sequence rather than a single point. In the example with total probability, all events compose the universal set, since [itex]Pr(S_1) + Pr(S_2) + \cdots + Pr(S_n) = 1[/itex], so in turn it makes sense that the [itex]Pr(S_i)[/itex] get eliminated. But I can't associate this with the formula

[itex]P(O \mid \lambda) = \sum\limits_{X} P(O \mid X, \lambda) \, P(X \mid \lambda)[/itex]

since there are multiple dependencies.
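The analogy does go through: the state sequences X play exactly the role of the partition [itex]S_1, \ldots, S_n[/itex], because the probabilities P(X | λ) of all possible sequences sum to 1. A quick numerical check, with hypothetical transition numbers:

```python
import itertools

# Check that the state sequences partition the space: summed over every
# possible sequence X of length T, P(X | lambda) = 1 (made-up HMM numbers).
pi = [0.6, 0.4]                 # initial state probabilities
A  = [[0.7, 0.3], [0.4, 0.6]]   # transition probabilities (rows sum to 1)
T, N = 3, 2

total = 0.0
for X in itertools.product(range(N), repeat=T):
    p = pi[X[0]]
    for t in range(1, T):
        p *= A[X[t - 1]][X[t]]
    total += p

print(total)  # sums to 1, like Pr(S_1) + ... + Pr(S_n)
```

So summing P(O | X, λ) P(X | λ) over X is the same pattern as summing P(A | S_i) P(S_i) over i; the "multiple dependencies" are all packed into the fixed condition λ.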



#4
Jan 10, 2013, 12:18 PM

Sci Advisor
P: 3,175

As I see the space, a point (an "outcome") in the space looks like [itex](o_1, o_2, o_3, \ldots, o_T, x_1, x_2, \ldots, x_T)[/itex], where the [itex]o_i[/itex] give a sequence of observations and the [itex]x_i[/itex] give a sequence of states. Asking for the probability of an event [itex]O[/itex] asks for the probability of the set of outcomes which have the form [itex](o_1, o_2, \ldots, o_T, *, *, \ldots)[/itex], where the [itex]o_i[/itex] are the particular observations that define the event [itex]O[/itex] and the '*' are any sequence of states. So it seems natural to me that you sum over all possible sequences of states.
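That picture can also be checked numerically: treat each full vector (observations plus states) as one outcome, verify that the whole space has probability 1, and then sum only the outcomes whose observation half matches O. The HMM numbers below are made up for illustration.

```python
import itertools

# An outcome is a full vector (o_1..o_T, x_1..x_T). The event "we saw O" is the
# set of outcomes whose observation half equals O, states left free ('*').
pi = [0.6, 0.4]                 # initial state probabilities (hypothetical)
A  = [[0.7, 0.3], [0.4, 0.6]]   # transition probabilities
B  = [[0.9, 0.1], [0.2, 0.8]]   # observation probabilities
O  = (0, 1, 0)                  # the particular observations defining the event
T, N, M = 3, 2, 2               # sequence length, #states, #observation symbols

def p_outcome(obs, X):
    """Joint probability of one full outcome (obs, X) given the model."""
    p = pi[X[0]] * B[X[0]][obs[0]]
    for t in range(1, T):
        p *= A[X[t - 1]][X[t]] * B[X[t]][obs[t]]
    return p

# Summing p_outcome over the WHOLE space gives 1 ...
whole = sum(p_outcome(obs, X)
            for obs in itertools.product(range(M), repeat=T)
            for X in itertools.product(range(N), repeat=T))

# ... while summing only over the outcomes (O, *, *, ...) gives P(O | lambda).
p_O = sum(p_outcome(O, X) for X in itertools.product(range(N), repeat=T))
print(whole, p_O)
```

The second sum is exactly "fill in the '*' slots in every possible way", which is why summing over all state sequences recovers the probability of the event O.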



#5
Jan 21, 2013, 02:12 PM

P: 21



