Derivation of Bayesian inference in a Hidden Markov model

In summary: in a hidden Markov model the hidden state evolves as a Markov chain and determines the observables, but the observables do not feed back into the state. The state itself is never observed directly, which is why it is called a hidden Markov model.
  • #1
zyh
Hi everyone, I have a problem while learning Bayesian tracking in a Hidden Markov model. First, the Hidden Markov model is described here:
http://en.wikipedia.org/wiki/Recursive_Bayesian_estimation

Second, the problem I encountered is in the following formula ONE:


[tex]$P(X_{i}|y_{0},\ldots,y_{i-1})=\int P(X_{i},X_{i-1}|y_{0},\ldots,y_{i-1})dX_{i-1}$[/tex]

and then get formula TWO:
[tex]$=\int P(X_{i}|X_{i-1},y_{0},\ldots,y_{i-1})P(X_{i-1}|y_{0},\ldots,y_{i-1})dX_{i-1}$[/tex]

and then formula THREE:
[tex]$=\int P(X_{i}|X_{i-1})P(X_{i-1}|y_{0},\ldots,y_{i-1})dX_{i-1}$[/tex]

How do I get from formula TWO to formula THREE? Why does
[tex]$P(X_{i}|X_{i-1},y_{0},\ldots,y_{i-1})=P(X_{i}|X_{i-1})$[/tex] hold?
Thank you very much for reading my post, any suggestion is appreciated.
 
  • #2
Well, it's hard to tell exactly what's going on with your notation, but it looks to me like what you want is the probability of the current state of a Markov chain given all previous states (I don't see anything "hidden" in what you've written). In that case, the way you go from step 2 to step 3 is by application of the Markov assumption. I.e., part of the definition of a Markov chain is that, given the previous state, the current state is conditionally independent of all other previous states. Which is to say [itex]P(x_n|x_{n-1}, \ldots, x_0) = P(x_n|x_{n-1}) [/itex]. So, you get from step 2 to step 3 by simply substituting this into the expression. Again, this equation is part of the definition of a Markov chain.
 
  • #3
Thank you quadraphonics for your help.
I learned this topic, the "Hidden Markov model", from the book "Computer Vision: A Modern Approach".
Here are the two "assumptions":
  1. only the immediate past matters:

    This formula is the same as you said:

    [tex]$P(X_{i}|X_{1},\ldots X_{i-1})=P(X_{i}|X_{i-1})$[/tex]
  2. measurements depend only on the current state

    [tex]$P(Y_{i},Y_{j},\ldots Y_{k}|X_{i})=P(Y_{i}|X_{i})P(Y_{j},\ldots,Y_{k}|X_{i})$[/tex]

Why can I get from step 2 to step 3 by simply substituting "assumption one" into the expression? They are different expressions.

Can
[tex]$P(X_{i}|X_{i-1},y_{0},\ldots,y_{i-1})=P(X_{i}|X_{i-1})$[/tex] be proved from "assumption one"?
 
  • #4
Okay, so the [itex]y_i[/itex]'s are observations, and the [itex]X_i[/itex]'s are (hidden) states, right? To demonstrate the result you're after, I'd re-write [itex]P(X_i | X_{i-1}, y_0, \ldots, y_{i-1})[/itex] using Bayes' Theorem, and then apply the assumptions to simplify the result until you get down to [itex]P(X_i|X_{i-1})[/itex].
 
  • #5
quadraphonics said:
Okay, so the [itex]y_i[/itex]'s are observations, and the [itex]X_i[/itex]'s are (hidden) states, right? To demonstrate the result you're after, I'd re-write [itex]P(X_i | X_{i-1}, y_0, \ldots, y_{i-1})[/itex] using Bayes' Theorem, and then apply the assumptions to simplify the result until you get down to [itex]P(X_i|X_{i-1})[/itex].

Thanks.
Yes, you are right: [tex]y_i[/tex] is the observation from hidden state [tex]X_i[/tex], and I want to simplify the result to get down to [tex]P(X_i|X_{i-1})[/tex]. But I can't get the proof.
 
  • #6
Well, why don't you show us what you have so far on the proof, and where you get stuck?
 
  • #7
Hi, quadraphonics.
I want to prove the equation below from the "two assumptions" given earlier.
[tex]$P(X_{i}|X_{i-1},y_{0},\ldots,y_{i-1})=P(X_{i}|X_{i-1})$[/tex]

I am stuck on the proof; so far I don't have any ideas.
 
  • #8
Well, why don't you try my suggestion of rewriting the left-hand side using Bayes' theorem, and then see if you can apply the assumptions to get what you want?
 
  • #9
In a discrete-time Markov process, the state [itex]X_n[/itex] at time [itex]t_n[/itex] depends only on the state at the previous time step, [itex]X_{n-1}[/itex]. In a hidden Markov process, the state [itex]X_n[/itex] directly determines the observables [itex]Y_n[/itex] at that time, but [i]the observables do not affect the state[/i]. There is no feedback. The observables are not a part of the state proper. That is why it is called a hidden Markov model.
 
  • #10
quadraphonics said:
Well, why don't you try my suggestion of rewriting the left-hand side using Bayes' theorem, and then see if you can apply the assumptions to get what you want?
Thanks. Frankly, I have tried using Bayes' theorem several times to prove the equation, but I failed.

[tex]P(X_{i}|X_{i-1},y_{0},\ldots,y_{i-1})=\frac{P(X_{i},X_{i-1},y_{0},\ldots,y_{i-1})}{P(X_{i-1},y_{0},\ldots,y_{i-1})}=?[/tex]

or
[tex]P(X_{i}|X_{i-1},y_{0},\ldots,y_{i-1})=\frac{P(X_{i-1},y_{0},\ldots,y_{i-1}|X_{i})P(X_{i})}{P(X_{i-1},y_{0},\ldots,y_{i-1})}=?[/tex]
 
  • #11
D H said:
In a discrete-time Markov process, the state [itex]X_n[/itex] at time [itex]t_n[/itex] depends only on the state at the previous time step, [itex]X_{n-1}[/itex]. In a hidden Markov process, the state [itex]X_n[/itex] directly determines the observables [itex]Y_n[/itex] at that time, but [i]the observables do not affect the state[/i]. There is no feedback. The observables are not a part of the state proper. That is why it is called a hidden Markov model.
Thanks, D H. Do you mean that, since "the observables do not affect the state",
it is obvious that [tex]$P(X_{i}|X_{i-1},y_{0},\ldots,y_{i-1})=P(X_{i}|X_{i-1})$[/tex]? Because [itex]X_i[/itex] depends only on [itex]X_{i-1}[/itex], and [itex]y_0,\ldots,y_{i-1}[/itex] cannot affect [itex]X_i[/itex].

Is my thinking right?
 
  • #12
Yes. That's why I put "the observables do not affect the state" in italics. That is what it means for the state to be hidden.
 
  • #13
OK, Thank you everyone, my problem is totally solved!
 
  • #14
Err, not so fast people. The point here was to *prove* that the observables don't affect the state (i.e., the current state is conditionally independent of the previous observables, given the previous state). We know this should be the case, but having D H assure us that it is doesn't add up to a proof. Let's also note that the observables *do* affect the probability of being in a particular state if you *don't* know the previous state, so the situation is not quite as simple as it might appear.

As far as the derivation goes, you need to be a little bit tricky with the application of Bayes' Theorem. Specifically, you want all of the terms to still be conditioned on [itex]X_{i-1}[/itex]. I.e.:

[tex]
P(X_i|X_{i-1}, y_0, \ldots, y_{i-1}) = \frac{P(y_0, \ldots, y_{i-1}|X_{i-1}, X_i)P(X_i|X_{i-1})}{P(y_0, \ldots, y_{i-1}|X_{i-1})}
[/tex]
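With this rewrite, everything reduces to showing that the likelihood term in the numerator does not actually depend on [itex]X_i[/itex]; once that is established, it cancels with the denominator:

[tex]
P(X_i|X_{i-1}, y_0, \ldots, y_{i-1}) = \frac{P(y_0, \ldots, y_{i-1}|X_{i-1})P(X_i|X_{i-1})}{P(y_0, \ldots, y_{i-1}|X_{i-1})} = P(X_i|X_{i-1})
[/tex]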
 
  • #15
Thanks for your hint.

I found the trick, called the "conditional Bayes' theorem": http://en.wikipedia.org/wiki/Bayes'_theorem#Extensions_of_Bayes.27_theorem

[tex]P(A|B\cap C)=\frac{{P(A\cap B\cap C)}}{{P(B\cap C)}}=\frac{{P(B|A\cap C)\, P(A|C)\, P(C)}}{{P(C)\, P(B|C)}}=\frac{{P(B|A\cap C)\, P(A|C)}}{{P(B|C)}}\,.[/tex]

So, since each [itex]y_i[/itex] depends only on [itex]x_i[/itex], we get:
[tex]P(y_{0},\ldots,y_{i-1}|X_{i-1},X_{i})=P(y_{0},\ldots,y_{i-1}|X_{i-1})[/tex]

I think it's a full proof now. Thank you very much, I really appreciate your help, and I have gained a lot of knowledge.
 
  • #16
Hi, quadraphonics. I'm still confused: why does the formula below hold?

[tex]P(y_{0},\ldots,y_{i-1}|X_{i-1},X_{i})=P(y_{0},\ldots,y_{i-1}|X_{i-1})[/tex]

It still needs to be proved, but I can't find its proof. Can someone help me? Thanks.
 
  • #17
This is a consequence of the two assumptions that go into an HMM. First is the Markov assumption on the hidden variable [itex]X_i[/itex]:

[tex]
P(X_i | X_1 , \ldots, X_{i-1}, Y_1, \ldots, Y_{i-1}) = P(X_i | X_{i-1})
[/tex]

I.e., the current state is conditionally independent of all previous states and observables, given the previous state. You can also flip this around and show that the current state is independent of all future states and observations, given the next state. The next assumption is a similar one on the observable process [itex]Y_i[/itex]:

[tex]
P(Y_i | X_1 , \ldots, X_{N}, Y_1, \ldots, Y_{i-1}, Y_{i+1}, \ldots, Y_{N}) = P(Y_i | X_i)
[/tex]


where [itex]N[/itex] is the number of observations. That is, the current observable is conditionally independent of everything else, given the current state. To prove the relation in question, first consider a simple case with just two observations:

[tex]
P(Y_1, Y_2 | X_2, X_3) = P(Y_2|X_2, X_3, Y_1)P(Y_1|X_2, X_3)
[/tex]

We want to show that [itex] X_3 [/itex] can be eliminated from the LHS. The first term on the RHS is independent of both [itex]X_3 [/itex] and [itex] Y_1 [/itex] according to the second HMM assumption. However, let's go ahead and leave [itex] Y_1 [/itex] in place, since we don't want to eliminate it. To eliminate [itex]X_3 [/itex] from the second term, introduce [itex]X_1 [/itex] and then integrate it back out:

[tex]
P(Y_1 | X_2, X_3) = \sum_{x_1} P(Y_1 | x_1, X_2, X_3) P(x_1 | X_2, X_3)
[/tex]

Now, the first term on the RHS is independent of [itex] X_2[/itex] and [itex]X_3[/itex], again owing to the second HMM assumption. But, again, let's leave [itex] X_2[/itex] in anyhow. The second RHS term is independent of [itex]X_3[/itex], according to the first HMM assumption. If you put that all back together, you should see how to prove the relation.
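Putting those pieces back together, and using only the two HMM assumptions above, the extra state drops out as desired:

[tex]
P(Y_1, Y_2 | X_2, X_3) = P(Y_2|X_2, Y_1)\sum_{x_1} P(Y_1 | x_1, X_2) P(x_1 | X_2) = P(Y_2|X_2, Y_1)P(Y_1|X_2) = P(Y_1, Y_2|X_2)
[/tex]

The same argument, applied one observation at a time, removes [itex]X_i[/itex] from [itex]P(y_0, \ldots, y_{i-1}|X_{i-1}, X_i)[/itex] in the general case.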
 
  • #19
Thank you quadraphonics.
Because I didn't receive the notification email, I'm sorry for the late response. I will read your post carefully.
 
  • #20
Yes, I have read your derivation, in which [tex]X_3[/tex] is eliminated by decomposing into a sum over [tex]X_1[/tex] and marginalizing it back out. That's great!
[tex]P(Y_1, Y_2 | X_2, X_3) =P(Y_1, Y_2 | X_2) [/tex]

Note that you state the two HMM assumptions in slightly different notation from mine in the third post. My original question in my first post can be proved easily from your assumption one, but I think the two forms of the assumptions are equivalent.

Thanks for helping!
 

1. What is a Hidden Markov Model (HMM)?

A Hidden Markov Model is a probabilistic model used to model sequential data where the underlying system is assumed to be a Markov process, meaning that the next state depends only on the current state and not on any previous states. In HMMs, the state of the system is unobservable (hidden) and can only be inferred from the observed data.

2. What is the Bayesian inference approach in HMMs?

Bayesian inference is a statistical method used to update beliefs about a system based on prior knowledge and observed data. In HMMs, the Bayesian inference approach is used to estimate the hidden states of the system given the observed data and prior knowledge about the system's parameters.

3. How is the Hidden Markov Model represented mathematically?

A Hidden Markov Model is represented by a set of states, a set of observations, a transition matrix that defines the probabilities of transitioning between states, an emission matrix that defines the probability of each observation given each state, and an initial state distribution that defines the probabilities of starting in each state.
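As a concrete illustration, here is a minimal sketch of these pieces for a toy two-state model (the states, observations, and probability values below are made up for illustration, not taken from this thread):

[code]
import numpy as np

# Hypothetical two-state HMM used only to illustrate the representation.
states = ["rainy", "sunny"]          # hidden states X
observations = ["umbrella", "none"]  # possible observations Y

pi = np.array([0.6, 0.4])            # initial state distribution P(X_0)
A = np.array([[0.7, 0.3],            # transition matrix: A[i, j] = P(X_t = j | X_{t-1} = i)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],            # emission matrix: B[i, k] = P(Y_t = k | X_t = i)
              [0.2, 0.8]])
[/code]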

4. What is the derivation of the Bayesian inference in HMMs?

The derivation of Bayesian inference in HMMs involves using Bayes' theorem to calculate the posterior probability of the hidden states given the observed data and prior knowledge about the system. This involves calculating the likelihood of the observed data given the hidden states, the prior probability of the hidden states, and the evidence or marginal likelihood. Bayes' theorem is then used to update the prior probability to the posterior probability.
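As a rough sketch of that recursion in code (assuming the discrete pi, A, and B arrays from the example above; this is an illustrative implementation, not a standard library routine), the prediction step marginalizes over the previous state exactly as in formula THREE, and the update step multiplies by the likelihood of the new observation and renormalizes:

[code]
import numpy as np

def forward_filter(obs_seq, pi, A, B):
    """Return the filtered distributions P(X_i | y_0, ..., y_i) for a discrete HMM."""
    belief = pi * B[:, obs_seq[0]]      # update with the first observation
    belief /= belief.sum()              # normalize by the evidence P(y_0)
    beliefs = [belief.copy()]
    for y in obs_seq[1:]:
        predicted = A.T @ belief        # predict: sum over x_{i-1} of P(X_i|x_{i-1}) P(x_{i-1}|y_0..y_{i-1})
        belief = predicted * B[:, y]    # update: multiply by the likelihood P(y_i | X_i)
        belief /= belief.sum()          # normalize by the marginal likelihood of y_i
        beliefs.append(belief.copy())
    return np.array(beliefs)

# Example usage with the toy matrices above: forward_filter([0, 0, 1], pi, A, B)
[/code]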

5. How is the Viterbi algorithm used in HMMs?

The Viterbi algorithm is used in HMMs to find the most likely sequence of hidden states that could have generated a given sequence of observations. This algorithm uses dynamic programming to efficiently calculate the most probable path through the HMM by considering all possible paths and choosing the one with the highest probability. This is useful for tasks such as speech recognition and part-of-speech tagging.
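For instance, here is a minimal Viterbi sketch over the same kind of discrete matrices (again an illustrative implementation with made-up names, working in log probabilities to avoid numerical underflow):

[code]
import numpy as np

def viterbi(obs_seq, pi, A, B):
    """Return the most likely hidden-state sequence for obs_seq in a discrete HMM."""
    n_states, T = len(pi), len(obs_seq)
    log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    score = np.zeros((T, n_states))            # best log-probability of any path ending in each state
    back = np.zeros((T, n_states), dtype=int)  # argmax back-pointers for reconstructing the path
    score[0] = log_pi + log_B[:, obs_seq[0]]
    for t in range(1, T):
        cand = score[t - 1][:, None] + log_A   # cand[i, j]: best path ending at i, extended by i -> j
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_B[:, obs_seq[t]]
    path = [int(score[-1].argmax())]           # best final state
    for t in range(T - 1, 0, -1):              # backtrack to recover the full state sequence
        path.append(int(back[t, path[-1]]))
    return path[::-1]
[/code]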
