Basic markov processes questions

mxbob468 · Jan 18, 2014

I'm taking my first Markov processes class and I'm a little confused. This is in relation to a HW problem that I've been wrestling with for the past couple of hours. The task is to find all of the stationary distributions of a Markov process (finite state space).

I know that if my process were irreducible then it would have a unique distribution but it's not; there are 3 communication classes. First question: am I correct in guessing that there should be 3 distinct stationary distributions? One for each class?

I know that I can blunt-force-trauma compute the stationary distributions by raising the transition matrix to some large power and inspecting. I know this is a bad idea in general and this only works because it's a 5x5 matrix. I know in the case wherein the stationary distribution would be unique the alternative is to find all the eigenvectors corresponding to eigenvalue 1. Second question: is this also true for my case, wherein the distributions aren't unique?

I went ahead and computed these eigenvectors and for the longest time was pulling my hair out because they didn't match the rows in the asymptotic transition matrix. Then I remembered degenerate eigenvalues => degenerate eigenspace => linear combinations of eigenvectors (long time since I've had a linear algebra class). So now I can match the eigenvectors computed by MATLAB to the rows in the asymptotic transition matrix. Third question (maybe related to first): if I could not compute the asymptotic transition matrix how would I know whether I've found all the stationary distributions of the process? How do I even know I've found the right ones? Ie those that match the rows in the asymp transition matrix.

Thanks for any help anyone can give me

Ray Vickson · Jan 18, 2014

mxbob468 said:

I'm taking my first Markov processes class and I'm a little confused. This is in relation to a HW problem that I've been wrestling with for the past couple of hours. The task is to find all of the stationary distributions of a Markov process (finite state space).

I know that if my process were irreducible then it would have a unique distribution but it's not; there are 3 communication classes. First question: am I correct in guessing that there should be 3 distinct stationary distributions? One for each class?

I know that I can blunt-force-trauma compute the stationary distributions by raising the transition matrix to some large power and inspecting. I know this is a bad idea in general and this only works because it's a 5x5 matrix. I know in the case wherein the stationary distribution would be unique the alternative is to find all the eigenvectors corresponding to eigenvalue 1. Second question: is this also true for my case, wherein the distributions aren't unique?

I went ahead and computed these eigenvectors and for the longest time was pulling my hair out because they didn't match the rows in the asymptotic transition matrix. Then I remembered degenerate eigenvalues => degenerate eigenspace => linear combinations of eigenvectors (long time since I've had a linear algebra class). So now I can match the eigenvectors computed by MATLAB to the rows in the asymptotic transition matrix. Third question (maybe related to first): if I could not compute the asymptotic transition matrix how would I know whether I've found all the stationary distributions of the process? How do I even know I've found the right ones? Ie those that match the rows in the asymp transition matrix.

Thanks for any help anyone can give me

What do you mean by a "communication class"? Is that what the rest of the world would call a class of intercommunicating states (= just plain "class")? I will assume so.

Now whether or not you have a single steady-state distribution, or several, depends on more details than you have supplied. For example, you could have two transient classes (that do not communicate with each other) which both feed into a single persistent class. In this case there is a unique steady-state distribution with zero probability on all the transient states and positive probabilities on the persistent states; these latter can be obtained by treating the persistent class as a little Markov chain all on its own (which it would be, anyway, if we start from one of the persistent states at time 0). Or, you could have a single transient class feeding two disjoint persistent classes, and in that case there are several "steady states", depending on the starting state at t=0. If we start from any state in one of the persistent classes, the steady-state would be the same (that is, would not depend on where we start---as long as it is in the class). If we start from a transient state the resulting steady-state distribution would be a mixture of the two persistent-state distributions, with mixing coefficients equal to the probabilities of going from the initial state to either of the two persistent classes.

When I used to teach this stuff I would rarely, if ever, refer my students to eigenvalue methods, because one can set up the necessary equations as simple linear systems---once one has first clarified the structure of the chain by looking at the transition diagram. For example, if f(i,j) is the passage probability from transient state i to persistent state j, we have f(i,j) = f(i,j') for all j' in the same (persistent) class as j---so we might as well speak of f(i,S) = probability of eventually reaching persistent class S from transient state i. Then the steady-state probability of state j in S, starting from transient state i is p(i,j) = f(i,S)* p(j), where p(j) = steady-state probability of state j (found as if S were a little Markov chain all on its own). You can easily develop linear equations for the probabilities f(i,S), i in T (the transient class), so simple high-school equation solving is enough.

All of what I said above assumes a finite number of states; things can become a lot trickier in countable-state chains.

mxbob468 · Jan 19, 2014

Ray Vickson said:

What do you mean by a "communication class"? Is that what the rest of the world would call a class of intercommunicating states (= just plain "class")? I will assume so.

Well I didn't make it up on my own? I got the term from here.

Ray Vickson said:

Or, you could have a single transient class feeding two disjoint persistent classes, and in that case there are several "steady states", depending on the starting state at t=0.

This is in fact exactly what I have.

Ray Vickson said:

If we start from any state in one of the persistent classes, the steady-state would be the same (that is, would not depend on where we start---as long as it is in the class).If we start from a transient state the resulting steady-state distribution would be a mixture of the two persistent-state distributions, with mixing coefficients equal to the probabilities of going from the initial state to either of the two persistent classes.

This part I don't understand. My understanding is that stationary/steady-state distributions are joint distributions ie vectors whose entries sum to 1. Are you saying that if I start in some state wp1 then over time my joint distribution will evolve into some, possibility different, joint distribution and that's what you're calling the steady-state? That makes sense of course but I don't see how it helps me find the joint distribution (ie row vector by which I left multiply my transition matrix) that is invariant under the transition (in this instance where I have 3 different classes).

Ray Vickson said:

For example, if f(i,j) is the passage probability from transient state i to persistent state j, we have f(i,j) = f(i,j') for all j' in the same (persistent) class as j---so we might as well speak of f(i,S) = probability of eventually reaching persistent class S from transient state i.

Why is the bolded part true?

Ray Vickson said:

Then the steady-state probability of state j in S, starting from transient state i is p(i,j) = f(i,S)* p(j), where p(j) = steady-state probability of state j (found as if S were a little Markov chain all on its own).

I'm not sure I understand what you mean by steady-state probability here.

Ray Vickson said:

All of what I said above assumes a finite number of states; things can become a lot trickier in countable-state chains.

I'm sure that's the case.

Ray Vickson · Jan 19, 2014

mxbob468 said:

Well I didn't make it up on my own? I got the term from here.

OK, but I have read numerous papers and books about Markov chains over many decades and have never before seen or heard that term. It appears the author may be making up his/her own nomenclature. Anyway, let's continue.
This is in fact exactly what I have.
This part I don't understand. My understanding is that stationary/steady-state distributions are joint distributions ie vectors whose entries sum to 1. Are you saying that if I start in some state wp1 then over time my joint distribution will evolve into some, possibility different, joint distribution and that's what you're calling the steady-state? That makes sense of course but I don't see how it helps me find the joint distribution (ie row vector by which I left multiply my transition matrix) that is invariant under the transition (in this instance where I have 3 different classes).

Why is the bolded part true?

The quantity f(i,j) = probability that j is visited eventually, starting from i. If j' and j are in the same closed class, then (with probability 1) j is eventually visited if and only if j' is eventually visited, so f(i,j) = f(i,j').I'm not sure I understand what you mean by steady-state probability here.

A steady-state probability distribution ##\pi_i = (\pi_{i,j},j=1,2,\ldots,n)## (for starting state i) is one that satisfies
[tex] \pi_{i,j} = \sum_{k} \pi_{i,k} P_{kj}, [/tex]
which can be written as
[tex] \mathbf{p}_i = \mathbf{p}_i \mathbf{P}[/tex]
where ##\mathbf{p}_i = (\pi_{i,1},\pi_{i,2}, \ldots, \pi_{i,n})## is a row-vector, and ##\mathbf{P}## is the one-step transition matrix. When there is a single closed class (possibly several transient classes) the steady-state vector is the same for all starting states so we can drop the i subscript and just write ##(\pi_1, \pi_2, \ldots, \pi_n)##. Note that a steady-state exists, even for a periodic chain (even though a limiting distribution does not). I will show you an example of this below.
I'm sure that's the case.

The quantities ##f_{i,j} = f(i,j) \equiv \Pr\{ \exists\: t \geq 1: \: X_t = j|X_0= i\}## are the "visit" probabilities to state j; they can be obtained from the first-passage distributions, as follows. Let ##T_{i,j}## be the (possibly defective random variable), which is the first time that state j is reached, starting from state i. (The random variable ##T## is defective if ##\Pr(T < \infty) < 1##.) In other words, let
[tex] T_{i,j} = \cases{ \text{smallest }n \text{ such that }X_n = j ,& \text{ if such }n\text{ exists} \\
\infty & \text{ otherwise} } [/tex]
The first-passage probability distribution from i to j is
[tex] f_{i,j}(n) = \Pr \{ T_{i,j} = n \}, n = 1,2, \ldots [/tex]
and the attainment or visit probability is
[tex] f_{i,j} = \Pr \{ T_{i,j} < \infty \} = \sum_{n=1}^{\infty} f_{i,j}(n).[/tex]
These satisfy the linear equations
[tex] f_{i,j} = p_{i,j} + \sum_{k: k \neq j} p_{i,k} f_{k,j}, \forall \: i \neq j [/tex]
For fixed terminal state j these are linear equations in ##f_{1,j}, f_{2,j} \ldots, f_{j-1,j}, f_{j+1,j}, \ldots, f_{n,j}##.

Note that those linear equations do not have a unique solution. The simple solution ##f_{i,j} = 1 \, \forall i,j## always satisfies the equations, but is not always the solution that respects the probabilistic definitions. To get unique solutions consistent with probabilistic meanings we need to use additional information about the class structure of the MC. For example, if state j is transient and state i is persistent, you cannot get ever get from i to j, so f(i,j) = 0. If i and j are in different closed classes, then f(i,j) = 0. If i and j are in the same closed class then f(i,j) = 1. (These come from theorems about the T_{i,j}, not from the linear equations.) Finally, as I said already, if j and j' are in the same closed class then f(i,j) = f(i,j'); this follows most easily from "j is attained if and only if j' is attained", but it can also be proved from the linear equations.

Let's bring this all together in an example:
[tex] \mathbf{P} = \pmatrix{1/8 & 3/8 & 1/4 & 0 & 1/ 8& 1/8\\
5/16 & 3/16 & 1/16 & 1/8 & 3/16 & 1/8 \\
0 & 0 & 1/4 & 3/4 & 0 & 0\\
0 & 0 & 1/2 & 1/2 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0} [/tex]
Class T1 ={1,2} is transient, while classes C2 = {3,4} and C3 = {5,6} are persistent (closed). Class C2 is "ergodic", so the 2x2 submatrix on rows/columns 3,4 has a limiting distribution ##(\pi_3,\pi_4) = (2/5,3/5)##. Class C3 is periodic, so its submatrix (on rows/columns 5,6) does not have a limit; however, it has a stationary (long-run) distribution ##(\pi_5,\pi_6) = (1/2,1/2)##. This makes sense, since if we start in class C3 then in the long run we spend 50% of the time in each of states 5 and 6. If P_3 is the submatrix of rows/colums 5,6, we have ##(\pi_5,\pi_6) = (\pi_5, \pi_6) P_3##.

Using a computer algebra program such as Maple we can quite easily get a closed-form expression for the matrix ##\mathbf{P}^n##; some of its elements will be different for even or odd n, because of the periodic states 5 and 6. Anyway, we have
[tex] \mathbf{P}_e \equiv \lim_{n \to \infty} \mathbf{P}^{2n} =
\pmatrix{0 & 0 & 7/38 & 21/76 & 69/247 & 257/988 \\
0 & 0 & 31/190 & 93/380 & 66/247 & 321/988 \\
0 & 0 & 2/5 & 3/5 & 0 & 0 \\
0 & 0 & 2/5 & 3/5 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 }[/tex]
and
[tex] \mathbf{P}_o \equiv \lim_{n \to \infty} \mathbf{P}^{2n+1} = \pmatrix{0 & 0 & 7/38 & 21/76 & 257/988 & 69/247 \\
0 & 0 & 31/190 & 93/380 & 321/988 & 66/247 \\
0 & 0 & 2/5 & 3/5 & 0 & 0 \\
0 & 0 & 2/5 & 3/5 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 }[/tex]
The steady-state matrix is
[tex] \mathbf{Q} = \frac{1}{2} \mathbf{P}_e + \frac{1}{2} \mathbf{P}_o
=\pmatrix{0 & 0 & 7/38 & 21/76 & 41/152 & 41/152 \\
0 & 0 & 31/190 & 93/380 & 45/152 & 45/152 \\
0 & 0 & 2/5 & 3/5 & 0 & 0 \\
0 & 0 & 2/5 & 3/5 & 0 & 0 \\
0 & 0 & 0 & 0 & 1/2 & 1/2 \\
0 & 0 & 0 & 0 & 1/2 & 1/2 }[/tex]
You can verify directly that ##\mathbf{Q} = \mathbf{Q \, P}##.

So, how on Earth would we get the elements of Q without using a computer algebra package to raise P to high powers? Note first that the elements of Q in rows/columns 3..6 are exactly as stated before: we have solutions of the equations
[tex]q_3 = q_3 P_{3,3} + q_4 P_{4,3}\\ q_4 = q_3 P_{3,4} + q_4 P_{4,4}[/tex]
etc. For the transient states we have
[tex]f_{13} = f_{14} = 35/76, \: f_{23} = f_{24} = 31/76 \\
f_{15} = f_{16} = 41/76, \: f_{25} = f_{26} = 45/76.[/tex]
We can get these from the previous equations. For example,
[tex] f_{13} = (1/4) + (1/8)\,f_{13}+(3/8)\, f_{14}\\
f_{14} = 0 + (1/4) \, f_{34} + (1/8)\,f_{13}+(3/8)\, f_{14}\\
f_{23} = (1/16) + (1/8) \, f_{34} + (5/16)\,f_{23} + (3/16) \, f_{24}\\
f_{24} = (1/8) + (1/16) \, f_{43} + (5/16)\,f_{23} + (3/16) \, f_{24}[/tex]
Since we know ##f_{34} = f_{43} = 1## the ##f_{13}, f_{14}## equations are exactly the same, as are the ##f_{23}, f_{24}## equations. That establishes ##f_{13} = f_{14}## and ##f_{23} = f_{24}## if you do not believe it already. Similarly, ##f_{15} = f_{16}, f_{25} = f_{26}## follows from the equations and using ##f_{56} = f_{65} = 1##.

Finally, for transient starting states we have ##q_{1,1} = q_{1,2} = q_{2,1} = q_{2,2} = 0## because states 1 and 2 are transient, so will eventually be "left" and so will not be occupied for any positive long-run fraction of time. In addition we have
[tex] q_{1,3} = f_{13} q_{3,3}, \;q_{1,4} = f_{14}q_{4,4} [/tex]
etc. You can demonstrate this directly from the numerical results, but it is also simple to see from probabilistic considerations: the expected long-run fraction of time we are in state 3 equals the probability we get to class C2, times the long-run fraction of time we are in state 3, given we are in class C2. When translated, that gives the result stated already.

OK, that's enough for now.

Basic markov processes questions

1. What is a Markov process?

2. What are the basic components of a Markov process?

3. How is a Markov process different from a Markov chain?

4. What are some real-world applications of Markov processes?

5. How is the stability of a Markov process determined?

Similar threads

Hot Threads

Recent Insights