Hidden Markov Models - Likelihood of the Observed Sequence

In summary: the author is trying to teach himself Hidden Markov Models but is having trouble understanding the derivation. He can follow the math and use the formulas to get results, but he also wants to understand the meaning behind them. One question that arises with Hidden Markov Models is how to determine the likelihood of the observed sequence. The introductory example in the linked text was reasonable, but the author has trouble understanding some of the derivation.
  • #1
SamTaylor
Hi,

I am trying to teach myself Hidden Markov Models. I am using the text
"www.cs.sjsu.edu/~stamp/RUA/HMM.pdf" as material. The introduction with
the example was reasonable, but now I have trouble understanding
some of the derivation.

I can follow the math and use the formulas to get results but
I also want to understand the meaning behind it.

One question that arises with hidden Markov models is how to determine
the likelihood of the observed sequence O given the model,
as written below:

b = probability of an observation given the state
a = probability of transiting from one state to another
[itex]\pi[/itex] = starting probability
O = observation sequence
X = state sequence
[itex]\lambda[/itex] = hidden Markov model

( Section 4, Page 6 in the Text )

[itex]P(O | X, \lambda) = b_{x_0}(O_0) \, b_{x_1}(O_1) \cdots b_{x_{T-1}}(O_{T-1})[/itex]
[itex]P(X|\lambda) = \pi_{x_0} a_{x_0,x_1} a_{x_1,x_2} \cdots a_{x_{T-2},x_{T-1}}[/itex]
[itex]P(O, X|\lambda) = \frac{P(O \cap X \cap \lambda)}{P( \lambda)}[/itex]
[itex]P(O | X, \lambda) *P(X|\lambda) = \frac{P(O \cap X \cap \lambda)}{P(X \cap \lambda)} \frac{P(X \cap \lambda)}{P(\lambda)}=\frac{P(O \cap X \cap \lambda)}{P(\lambda)}[/itex]
[itex]P(O, X|\lambda) = P(O | X, \lambda) * P(X|\lambda) [/itex]
[itex]P(O | \lambda) = \sum\limits_{X} P(O,X|\lambda)[/itex]
[itex]P(O | \lambda) = \sum\limits_{X} P(O | X, \lambda) * P(X|\lambda)[/itex]
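The last formula can be checked numerically by brute force: enumerate every state sequence X, compute [itex]P(O | X, \lambda) \, P(X|\lambda)[/itex], and add them up. A minimal sketch, using a tiny hypothetical 2-state, 2-symbol model (all numbers below are made up for illustration):

```python
import itertools

def brute_force_likelihood(pi, a, b, O):
    # P(O | lambda) = sum over ALL state sequences X of
    # P(O | X, lambda) * P(X | lambda).
    # Exponential in len(O), so only usable for tiny examples.
    N = len(pi)
    total = 0.0
    for X in itertools.product(range(N), repeat=len(O)):
        p = pi[X[0]] * b[X[0]][O[0]]          # pi_{x_0} * b_{x_0}(O_0)
        for t in range(1, len(O)):
            p *= a[X[t - 1]][X[t]] * b[X[t]][O[t]]
        total += p
    return total

# Hypothetical model (numbers invented for illustration):
pi = [0.6, 0.4]
a = [[0.7, 0.3], [0.4, 0.6]]   # a[i][j] = P(x_{t+1}=j | x_t=i)
b = [[0.9, 0.1], [0.2, 0.8]]   # b[i][k] = P(O_t=k | x_t=i)
print(brute_force_likelihood(pi, a, b, [0, 1, 0]))
```

One sanity check: summing this quantity over every possible observation sequence of the same length gives 1, which is one way to see that the state sequences really do partition the probability.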

My question is: why do I get the likelihood of the observed sequence
by summing over all possible state sequences? Can someone please
explain it differently?
 
  • #2
SamTaylor said:
My question is: why do I get the likelihood of the observed sequence
by summing over all possible state sequences? Can someone please
explain it differently?

You could start by thinking of a simpler situation. Think of the Venn diagram approach to probability spaces. You have the whole space of outcomes ("points") S, which is a big circle. Inside S, there is an event (a set of points) defined by a smaller circle A. Create a collection of events that partitions S into the subsets s1, s2, ..., sN. (You can imagine drawing an irregular grid on S and letting s1, s2, ... be the "squares" of the grid.)

[itex] P(A) = \sum_{i=1}^N P(A \cap s_i) = \sum_{i=1}^N P(A | s_i) P(s_i) [/itex]
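A quick numeric sanity check of this identity, with a made-up three-set partition (all numbers hypothetical):

```python
# A partition s_1, s_2, s_3 of S, with arbitrary conditional probabilities.
p_s = [0.5, 0.3, 0.2]          # P(s_i); a partition, so these sum to 1
p_A_given_s = [0.1, 0.6, 0.9]  # P(A | s_i), chosen arbitrarily

# P(A) = sum_i P(A | s_i) * P(s_i)
p_A = sum(pa * ps for pa, ps in zip(p_A_given_s, p_s))
print(p_A)  # 0.05 + 0.18 + 0.18 = 0.41
```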

The inconvenient thing about reading applied probability is understanding what spaces of outcomes are being discussed. Authors often don't make this clear. Usually several different spaces of outcomes are involved. Conditional probabilities in a given article refer to events in a different probability space than unconditional probabilities. In the above example, the probability [itex] P(A | s_i) [/itex] is the probability of a set of outcomes, and this set contains the same outcomes ("points") as the set [itex] A \cap s_i [/itex], but we are considering this set of outcomes to be in a different probability space. The "whole space" for [itex] P(A | s_i) [/itex] is the set [itex] s_i [/itex] instead of [itex] S [/itex].

My guess about the article you linked is that you should think of the whole space of outcomes as all possible sequences of information that tell both the sequence of states of the Markov process and the sequence of observations. So a "point" in the whole space is a sequence (i.e. a vector) with both kinds of information. When you draw a circle A that represents a particular sequence of observations, it contains more than one point, because it contains all the vectors that agree with the sequence of observations and have otherwise arbitrary sequences of states of the Markov process.
 
  • #3
That’s how I thought about it as well. I tried to think about it like
the law of total probability. When you look at the attached picture you can
see the events [itex]S_n[/itex] which compose the universal set.
Like you said, one can define a smaller set [itex]A[/itex] which comprises
outcomes in some or all of the events [itex]S_1[/itex] through [itex]S_n[/itex].
If the events [itex]S_i[/itex] are mutually exclusive, so that the sets
[itex]S_i \cap A[/itex] have no outcomes in common, you get:
[itex]Pr(S_1 \cap A) + Pr(S_2 \cap A) + \cdots + Pr(S_n \cap A) = Pr(A)[/itex]

I tried to connect this with the formula for the likelihood. I hoped that
this would explain to me why adding up over all X results in [itex]P(O|\lambda)[/itex].
But as you mentioned above, a single event here represents a sequence
rather than a single point. In the total probability example, all the events
together compose the universal set, which results in 1.

Since [itex]Pr(S_1) + Pr(S_2) + \cdots + Pr(S_n) = 1[/itex].

So it makes sense that Pr(S) gets eliminated.

But I can't associate this with the formula [itex]P(O|\lambda)=\sum\limits_{X} P(O|X,\lambda) P(X|\lambda)[/itex]
since there are multiple dependencies.
 

Attachments

  • venn_diagram.jpg (Venn diagram of the partition)
  • #4
SamTaylor said:
But I can't associate this with the formula [itex]P(O|\lambda)=\sum\limits_{X} P(O|X,\lambda) P(X|\lambda)[/itex]
since there are multiple dependencies.

I don't understand what you mean by multiple dependencies. Since all the terms in the expression have the "[itex] | \lambda[/itex]" in them, we can say that we are in the probability space defined by the subset [itex] \lambda [/itex] and, with that understanding, leave out the [itex] | \lambda [/itex] from the formula.

As I see the space, a point (an "outcome") in the space looks like [itex] (o_1,o_2,o_3,...,o_T, x_1,x_2,...,x_T) [/itex] where the [itex] o_i [/itex] give a sequence of observations and the [itex] x_i [/itex] give a sequence of states.

Asking for the probability of an event [itex] O [/itex] asks for the probability of the set of outcomes which have the form [itex] (o_1,o_2,...,o_T, *,*,...)[/itex], where the [itex] o_i [/itex] are the particular observations that define the event [itex] O [/itex] and the '*' are any sequence of states. So it seems natural to me that you sum over all possible sequences of states.
 
  • #5
Asking for the probability of an event [itex] O [/itex] asks for the probability of the set of outcomes which have the form [itex] (o_1,o_2,...,o_T, *,*,...)[/itex], where the [itex] o_i [/itex] are the particular observations that define the event [itex] O [/itex] and the '*' are any sequence of states. So it seems natural to me that you sum over all possible sequences of states.

Ok, that way it's pretty clear to me. I don't know why it didn't come to my mind that way right away. However, the [itex] \lambda [/itex] is still a bit confusing to me in the derivation. Looking at the quote above, one could have written down the formula right away, without the derivation.
 

1. What is a Hidden Markov Model (HMM)?

A Hidden Markov Model (HMM) is a statistical model that is used to model sequential data, where the underlying process is assumed to be a Markov process with unobserved (hidden) states. It is commonly used in speech recognition, natural language processing, and bioinformatics.

2. How does a Hidden Markov Model work?

A Hidden Markov Model works by assuming that the underlying process can be modeled as a Markov process with a finite set of hidden states. The model then estimates the probability of transitioning between these states and the probability of observing certain outcomes based on the current state. This allows for the prediction of future states and outcomes, even when the current state is not directly observable.
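This generative story can be sketched in a few lines: at each step, emit a symbol from the current hidden state, then draw the next hidden state from the transition probabilities. The model numbers below are invented purely for illustration.

```python
import random

def sample_hmm(pi, a, b, T, seed=0):
    # Walk the hidden chain for T steps, emitting one symbol per step.
    rng = random.Random(seed)
    x = rng.choices(range(len(pi)), weights=pi)[0]  # initial state ~ pi
    states, obs = [], []
    for _ in range(T):
        states.append(x)
        obs.append(rng.choices(range(len(b[x])), weights=b[x])[0])
        x = rng.choices(range(len(a[x])), weights=a[x])[0]  # next state
    return states, obs

# Hypothetical 2-state, 2-symbol model (numbers made up):
pi = [0.6, 0.4]
a = [[0.7, 0.3], [0.4, 0.6]]
b = [[0.9, 0.1], [0.2, 0.8]]
states, obs = sample_hmm(pi, a, b, 5)
print(obs)  # only the observations are visible; the states stay hidden
```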

3. What is the likelihood of a Hidden Markov Model?

The likelihood of a Hidden Markov Model refers to the probability of observing a specific sequence of outcomes, given a set of parameters and assumptions about the hidden states and transitions. It is typically calculated using the forward algorithm, which takes into account all possible paths of hidden states that could lead to the observed outcomes.
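A minimal sketch of the forward ("alpha") recursion, which by itself suffices for the likelihood: it computes P(O | λ) in O(N²T) time instead of enumerating all N^T state sequences. The model numbers are hypothetical, for illustration only.

```python
def forward_likelihood(pi, a, b, O):
    # alpha_t(i) = P(O_0..O_t, x_t = i | lambda), built up left to right.
    N = len(pi)
    alpha = [pi[i] * b[i][O[0]] for i in range(N)]      # base case, t = 0
    for t in range(1, len(O)):
        alpha = [sum(alpha[j] * a[j][i] for j in range(N)) * b[i][O[t]]
                 for i in range(N)]                      # recursion step
    return sum(alpha)  # P(O | lambda) = sum_i alpha_{T-1}(i)

# Hypothetical 2-state, 2-symbol model (numbers made up):
pi = [0.6, 0.4]
a = [[0.7, 0.3], [0.4, 0.6]]
b = [[0.9, 0.1], [0.2, 0.8]]
print(forward_likelihood(pi, a, b, [0, 1, 0]))
```

As a check, summing this likelihood over every possible observation sequence of a fixed length gives 1, exactly as the total-probability argument in the thread predicts.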

4. How is the likelihood of a Hidden Markov Model used?

The likelihood of a Hidden Markov Model is commonly used in parameter estimation, model selection, and prediction. By comparing the likelihoods of different models, one can determine which model best fits the observed data. The likelihood can also be used to predict future outcomes or to estimate the probability of certain events occurring.

5. What are some applications of Hidden Markov Models?

Hidden Markov Models have a wide range of applications in various fields, including speech recognition, natural language processing, bioinformatics, finance, and many others. They are often used to analyze and predict sequential data, such as audio signals, text, and genetic sequences, and have been shown to perform well in these tasks.
