Basic Markov processes questions

In summary: if you cannot compute the asymptotic transition matrix, you cannot determine for sure whether you've found all the stationary distributions of the process. However, if you can match the eigenvectors computed by MATLAB to the rows of the asymptotic transition matrix, then you've found all the stationary distributions.
  • #1
mxbob468
I'm taking my first Markov processes class and I'm a little confused. This is in relation to a HW problem that I've been wrestling with for the past couple of hours. The task is to find all of the stationary distributions of a Markov process (finite state space).

I know that if my process were irreducible then it would have a unique stationary distribution, but it's not irreducible; there are 3 communication classes. First question: am I correct in guessing that there should be 3 distinct stationary distributions? One for each class?

I know that I can blunt-force-trauma compute the stationary distributions by raising the transition matrix to some large power and inspecting. I know this is a bad idea in general and only works here because it's a 5x5 matrix. I know that in the case where the stationary distribution is unique, the alternative is to find the eigenvectors corresponding to eigenvalue 1 (left eigenvectors of the transition matrix, since stationary distributions are row vectors). Second question: is this also true for my case, where the distributions aren't unique?

I went ahead and computed these eigenvectors and for the longest time was pulling my hair out because they didn't match the rows in the asymptotic transition matrix. Then I remembered: degenerate eigenvalues => degenerate eigenspace => linear combinations of eigenvectors (it's been a long time since I've had a linear algebra class). So now I can match the eigenvectors computed by MATLAB to the rows in the asymptotic transition matrix. Third question (maybe related to the first): if I could not compute the asymptotic transition matrix, how would I know whether I've found all the stationary distributions of the process? How do I even know I've found the right ones, i.e., those that match the rows of the asymptotic transition matrix?
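For concreteness, here's the kind of thing I mean, as a NumPy sketch (the matrix below is just a placeholder, not my actual HW matrix):

[code]
import numpy as np

# Placeholder 5x5 row-stochastic matrix (NOT my actual HW matrix):
# state 0 is absorbing, {1,2} is a closed periodic pair, {3,4} are transient.
P = np.array([
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.2, 0.2, 0.2, 0.2, 0.2],
    [0.5, 0.0, 0.0, 0.5, 0.0],
])

# Brute force: raise P to a large (even) power and inspect the rows.
print(np.linalg.matrix_power(P, 1000).round(6))

# Eigenvector route: stationary rows pi satisfy pi P = pi, i.e. they are
# eigenvectors of P.T with eigenvalue 1. For a repeated eigenvalue, the
# returned basis vectors may be linear combinations of the stationary
# distributions -- exactly the mismatch described above.
w, V = np.linalg.eig(P.T)
for k in np.where(np.isclose(w, 1.0))[0]:
    v = np.real(V[:, k])
    print(v / v.sum())  # normalize; may mix classes, per the comment above
[/code]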

Thanks for any help anyone can give me
 
  • #2
mxbob468 said:
The task is to find all of the stationary distributions of a Markov process (finite state space). ... there are 3 communication classes. First question: am I correct in guessing that there should be 3 distinct stationary distributions? ... Second question: is this also true for my case, where the distributions aren't unique? ... Third question: if I could not compute the asymptotic transition matrix, how would I know whether I've found all the stationary distributions of the process?

What do you mean by a "communication class"? Is that what the rest of the world would call a class of intercommunicating states (= just plain "class")? I will assume so.

Now whether or not you have a single steady-state distribution, or several, depends on more details than you have supplied. For example, you could have two transient classes (that do not communicate with each other) which both feed into a single persistent class. In this case there is a unique steady-state distribution with zero probability on all the transient states and positive probabilities on the persistent states; these latter can be obtained by treating the persistent class as a little Markov chain all on its own (which it would be, anyway, if we start from one of the persistent states at time 0).

Or, you could have a single transient class feeding two disjoint persistent classes, and in that case there are several "steady states", depending on the starting state at t=0. If we start from any state in one of the persistent classes, the steady-state would be the same (that is, would not depend on where we start---as long as it is in the class). If we start from a transient state the resulting steady-state distribution would be a mixture of the two persistent-state distributions, with mixing coefficients equal to the probabilities of going from the initial state to either of the two persistent classes.

When I used to teach this stuff I would rarely, if ever, refer my students to eigenvalue methods, because one can set up the necessary equations as simple linear systems---once one has first clarified the structure of the chain by looking at the transition diagram. For example, if f(i,j) is the passage probability from transient state i to persistent state j, we have f(i,j) = f(i,j') for all j' in the same (persistent) class as j---so we might as well speak of f(i,S) = probability of eventually reaching persistent class S from transient state i. Then the steady-state probability of state j in S, starting from transient state i is p(i,j) = f(i,S)* p(j), where p(j) = steady-state probability of state j (found as if S were a little Markov chain all on its own). You can easily develop linear equations for the probabilities f(i,S), i in T (the transient class), so simple high-school equation solving is enough.
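In template form (with T the transient class and S a persistent class), the equations I have in mind are
[tex] f(i,S) = \sum_{k \in S} p_{i,k} + \sum_{k \in T} p_{i,k}\, f(k,S), \qquad i \in T, [/tex]
since in one step we either enter S directly, stay in T (and must still reach S later), or enter a different persistent class (which contributes zero).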

All of what I said above assumes a finite number of states; things can become a lot trickier in countable-state chains.
 
  • #3
Ray Vickson said:
What do you mean by a "communication class"? Is that what the rest of the world would call a class of intercommunicating states (= just plain "class")? I will assume so.

Well I didn't make it up on my own? I got the term from here.

Ray Vickson said:
Or, you could have a single transient class feeding two disjoint persistent classes, and in that case there are several "steady states", depending on the starting state at t=0.

This is in fact exactly what I have.

Ray Vickson said:
If we start from any state in one of the persistent classes, the steady-state would be the same (that is, would not depend on where we start---as long as it is in the class). If we start from a transient state the resulting steady-state distribution would be a mixture of the two persistent-state distributions, with mixing coefficients equal to the probabilities of going from the initial state to either of the two persistent classes.

This part I don't understand. My understanding is that stationary/steady-state distributions are joint distributions, i.e., vectors whose entries sum to 1. Are you saying that if I start in some state w.p. 1 then over time my joint distribution will evolve into some, possibly different, joint distribution and that's what you're calling the steady-state? That makes sense of course, but I don't see how it helps me find the joint distribution (i.e., the row vector by which I left-multiply my transition matrix) that is invariant under the transition (in this instance where I have 3 different classes).
Ray Vickson said:
For example, if f(i,j) is the passage probability from transient state i to persistent state j, we have f(i,j) = f(i,j') for all j' in the same (persistent) class as j---so we might as well speak of f(i,S) = probability of eventually reaching persistent class S from transient state i.

Why is the bolded part true? (That f(i,j) = f(i,j') for all j' in the same persistent class as j.)
Ray Vickson said:
Then the steady-state probability of state j in S, starting from transient state i is p(i,j) = f(i,S)* p(j), where p(j) = steady-state probability of state j (found as if S were a little Markov chain all on its own).

I'm not sure I understand what you mean by steady-state probability here.

Ray Vickson said:
All of what I said above assumes a finite number of states; things can become a lot trickier in countable-state chains.

I'm sure that's the case.
 
  • #4
mxbob468 said:
Well I didn't make it up on my own? I got the term from here.

OK, but I have read numerous papers and books about Markov chains over many decades and have never before seen or heard that term. It appears the author may be making up his/her own nomenclature. Anyway, let's continue.
mxbob468 said:
This is in fact exactly what I have.

mxbob468 said:
This part I don't understand. My understanding is that stationary/steady-state distributions are joint distributions, i.e., vectors whose entries sum to 1. ...

mxbob468 said:
Why is the bolded part true?

The quantity f(i,j) = probability that j is visited eventually, starting from i. If j' and j are in the same closed class, then (with probability 1) j is eventually visited if and only if j' is eventually visited, so f(i,j) = f(i,j').

mxbob468 said:
I'm not sure I understand what you mean by steady-state probability here.

A steady-state probability distribution ##\pi_i = (\pi_{i,j},j=1,2,\ldots,n)## (for starting state i) is one that satisfies
[tex] \pi_{i,j} = \sum_{k} \pi_{i,k} P_{kj}, [/tex]
which can be written as
[tex] \mathbf{p}_i = \mathbf{p}_i \mathbf{P}[/tex]
where ##\mathbf{p}_i = (\pi_{i,1},\pi_{i,2}, \ldots, \pi_{i,n})## is a row-vector, and ##\mathbf{P}## is the one-step transition matrix. When there is a single closed class (possibly several transient classes) the steady-state vector is the same for all starting states so we can drop the i subscript and just write ##(\pi_1, \pi_2, \ldots, \pi_n)##. Note that a steady-state exists, even for a periodic chain (even though a limiting distribution does not). I will show you an example of this below.
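In the meantime, here is a minimal numerical sketch of that point (NumPy; the two-state periodic chain here reappears as states 5 and 6 in the example below):

[code]
import numpy as np

# Find a row vector p with p = p P and sum(p) = 1, for the two-state
# periodic chain.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])

n = P.shape[0]
A = np.vstack([P.T - np.eye(n), np.ones((1, n))])  # (P^T - I) p = 0 and 1.p = 1
b = np.concatenate([np.zeros(n), [1.0]])
p, *_ = np.linalg.lstsq(A, b, rcond=None)
print(p)  # [0.5 0.5]: a stationary distribution exists despite periodicity
[/code]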
mxbob468 said:
I'm sure that's the case.

The quantities ##f_{i,j} = f(i,j) \equiv \Pr\{ \exists\: t \geq 1: \: X_t = j|X_0= i\}## are the "visit" probabilities to state j; they can be obtained from the first-passage distributions, as follows. Let ##T_{i,j}## be the (possibly defective) random variable giving the first time that state j is reached, starting from state i. (A random variable ##T## is defective if ##\Pr(T < \infty) < 1##.) In other words, let
[tex] T_{i,j} = \cases{ \text{smallest }n \text{ such that }X_n = j ,& \text{ if such }n\text{ exists} \\
\infty & \text{ otherwise} } [/tex]
The first-passage probability distribution from i to j is
[tex] f_{i,j}(n) = \Pr \{ T_{i,j} = n \}, n = 1,2, \ldots [/tex]
and the attainment or visit probability is
[tex] f_{i,j} = \Pr \{ T_{i,j} < \infty \} = \sum_{n=1}^{\infty} f_{i,j}(n).[/tex]
These satisfy the linear equations
[tex] f_{i,j} = p_{i,j} + \sum_{k: k \neq j} p_{i,k} f_{k,j}, \forall \: i \neq j [/tex]
For fixed terminal state j these are linear equations in ##f_{1,j}, f_{2,j} \ldots, f_{j-1,j}, f_{j+1,j}, \ldots, f_{n,j}##.

Note that those linear equations do not have a unique solution. The simple solution ##f_{i,j} = 1 \, \forall i,j## always satisfies the equations, but is not always the solution that respects the probabilistic definitions. To get unique solutions consistent with the probabilistic meanings we need to use additional information about the class structure of the MC. For example, if state j is transient and state i is persistent, you can never get from i to j, so f(i,j) = 0. If i and j are in different closed classes, then f(i,j) = 0. If i and j are in the same closed class, then f(i,j) = 1. (These come from theorems about the ##T_{i,j}##, not from the linear equations.) Finally, as I said already, if j and j' are in the same closed class then f(i,j) = f(i,j'); this follows most easily from "j is attained if and only if j' is attained", but it can also be proved from the linear equations.
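One way to mechanize the preceding paragraph is sketched below (NumPy; hit_probabilities is just an illustrative name, and the sketch assumes every state outside transient and closed belongs to some other closed class, so its visit probability is 0):

[code]
import numpy as np

def hit_probabilities(P, transient, closed):
    # Solve f(i,S) = sum_{k in S} p_ik + sum_{k in T} p_ik f(k,S), i in T,
    # i.e. (I - P_TT) f = P_TS 1. States in other closed classes contribute 0.
    T, S = list(transient), list(closed)
    A = np.eye(len(T)) - P[np.ix_(T, T)]
    b = P[np.ix_(T, S)].sum(axis=1)
    return np.linalg.solve(A, b)  # f(i,S) for i in T, in the given order
[/code]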

Let's bring this all together in an example:
[tex] \mathbf{P} = \pmatrix{1/8 & 3/8 & 1/4 & 0 & 1/8 & 1/8\\
5/16 & 3/16 & 1/16 & 1/8 & 3/16 & 1/8 \\
0 & 0 & 1/4 & 3/4 & 0 & 0\\
0 & 0 & 1/2 & 1/2 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0} [/tex]
Class T1 = {1,2} is transient, while classes C2 = {3,4} and C3 = {5,6} are persistent (closed). Class C2 is "ergodic", so the 2x2 submatrix on rows/columns 3,4 has a limiting distribution ##(\pi_3,\pi_4) = (2/5,3/5)##. Class C3 is periodic, so its submatrix (on rows/columns 5,6) does not have a limit; however, it has a stationary (long-run) distribution ##(\pi_5,\pi_6) = (1/2,1/2)##. This makes sense, since if we start in class C3 then in the long run we spend 50% of the time in each of states 5 and 6. If ##P_3## is the submatrix on rows/columns 5,6, we have ##(\pi_5,\pi_6) = (\pi_5, \pi_6) P_3##.
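A quick numerical check of these two sub-chain distributions (a NumPy sketch):

[code]
import numpy as np

P2 = np.array([[0.25, 0.75],
               [0.50, 0.50]])              # class C2 = {3,4}
print(np.array([2/5, 3/5]) @ P2)           # -> [0.4 0.6], so pi = (2/5, 3/5)

P3 = np.array([[0.0, 1.0],
               [1.0, 0.0]])                # class C3 = {5,6}, periodic
print(np.array([0.5, 0.5]) @ P3)           # -> [0.5 0.5], so pi = (1/2, 1/2)
[/code]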

Using a computer algebra program such as Maple we can quite easily get a closed-form expression for the matrix ##\mathbf{P}^n##; some of its elements will be different for even or odd n, because of the periodic states 5 and 6. Anyway, we have
[tex] \mathbf{P}_e \equiv \lim_{n \to \infty} \mathbf{P}^{2n} =
\pmatrix{0 & 0 & 7/38 & 21/76 & 69/247 & 257/988 \\
0 & 0 & 31/190 & 93/380 & 66/247 & 321/988 \\
0 & 0 & 2/5 & 3/5 & 0 & 0 \\
0 & 0 & 2/5 & 3/5 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 }[/tex]
and
[tex] \mathbf{P}_o \equiv \lim_{n \to \infty} \mathbf{P}^{2n+1} = \pmatrix{0 & 0 & 7/38 & 21/76 & 257/988 & 69/247 \\
0 & 0 & 31/190 & 93/380 & 321/988 & 66/247 \\
0 & 0 & 2/5 & 3/5 & 0 & 0 \\
0 & 0 & 2/5 & 3/5 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 }[/tex]
The steady-state matrix is
[tex] \mathbf{Q} = \frac{1}{2} \mathbf{P}_e + \frac{1}{2} \mathbf{P}_o
=\pmatrix{0 & 0 & 7/38 & 21/76 & 41/152 & 41/152 \\
0 & 0 & 31/190 & 93/380 & 45/152 & 45/152 \\
0 & 0 & 2/5 & 3/5 & 0 & 0 \\
0 & 0 & 2/5 & 3/5 & 0 & 0 \\
0 & 0 & 0 & 0 & 1/2 & 1/2 \\
0 & 0 & 0 & 0 & 1/2 & 1/2 }[/tex]
You can verify directly that ##\mathbf{Q} = \mathbf{Q \, P}##.
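For anyone without Maple, the limits can be approximated numerically instead (a NumPy sketch; 200 is just a conveniently large even power):

[code]
import numpy as np

P = np.array([
    [1/8,  3/8,  1/4,  0,   1/8,  1/8 ],
    [5/16, 3/16, 1/16, 1/8, 3/16, 1/8 ],
    [0,    0,    1/4,  3/4, 0,    0   ],
    [0,    0,    1/2,  1/2, 0,    0   ],
    [0,    0,    0,    0,   0,    1   ],
    [0,    0,    0,    0,   1,    0   ],
])
Pe = np.linalg.matrix_power(P, 200)   # ~ limit of even powers
Po = np.linalg.matrix_power(P, 201)   # ~ limit of odd powers
Q = (Pe + Po) / 2
print(np.allclose(Q, Q @ P))          # True: Q = Q P
print(Q[0] * 152)                     # ~ [0 0 28 42 41 41], i.e. row 1 of Q
[/code]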

So, how on Earth would we get the elements of Q without using a computer algebra package to raise P to high powers? Note first that the elements of Q in rows/columns 3..6 are exactly as stated before: we have solutions of the equations
[tex]q_3 = q_3 P_{3,3} + q_4 P_{4,3}\\ q_4 = q_3 P_{3,4} + q_4 P_{4,4}[/tex]
etc., together with the normalization ##q_3 + q_4 = 1## (and similarly ##q_5 + q_6 = 1## for class C3), since the balance equations alone do not pin down the scale. For the transient states we have
[tex]f_{13} = f_{14} = 35/76, \: f_{23} = f_{24} = 31/76 \\
f_{15} = f_{16} = 41/76, \: f_{25} = f_{26} = 45/76.[/tex]
We can get these from the previous equations. For example,
[tex] f_{13} = (1/4) + (1/8)\,f_{13}+(3/8)\, f_{23}\\
f_{14} = 0 + (1/4) \, f_{34} + (1/8)\,f_{14}+(3/8)\, f_{24}\\
f_{23} = (1/16) + (1/8) \, f_{43} + (5/16)\,f_{13} + (3/16) \, f_{23}\\
f_{24} = (1/8) + (1/16) \, f_{34} + (5/16)\,f_{14} + (3/16) \, f_{24}[/tex]
Since we know ##f_{34} = f_{43} = 1##, the equations for ##(f_{13}, f_{23})## become exactly the same as those for ##(f_{14}, f_{24})##. That establishes ##f_{13} = f_{14}## and ##f_{23} = f_{24}## if you do not believe it already. Similarly, ##f_{15} = f_{16}## and ##f_{25} = f_{26}## follow from the equations and ##f_{56} = f_{65} = 1##.
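In matrix form this is just a 2x2 solve (a NumPy sketch matching the numbers above):

[code]
import numpy as np

# Solve the f-equations for the transient states T = {1,2} and class
# C2 = {3,4}: (I - P_TT) f = P_TS 1, after substituting f_34 = f_43 = 1.
P_TT = np.array([[1/8,  3/8 ],
                 [5/16, 3/16]])
b_C2 = np.array([1/4 + 0, 1/16 + 1/8])   # one-step entry into {3,4}
f_C2 = np.linalg.solve(np.eye(2) - P_TT, b_C2)
print(f_C2 * 76)  # -> [35. 31.]: f13 = f14 = 35/76, f23 = f24 = 31/76
[/code]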

Finally, for transient starting states we have ##q_{1,1} = q_{1,2} = q_{2,1} = q_{2,2} = 0## because states 1 and 2 are transient, so will eventually be "left" and so will not be occupied for any positive long-run fraction of time. In addition we have
[tex] q_{1,3} = f_{13} q_{3,3}, \;q_{1,4} = f_{14}q_{4,4} [/tex]
etc. You can demonstrate this directly from the numerical results, but it is also simple to see from probabilistic considerations: the expected long-run fraction of time we are in state 3 equals the probability we get to class C2, times the long-run fraction of time we are in state 3, given we are in class C2. When translated, that gives the result stated already.
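Putting the pieces together for row 1 of Q (same sketch style):

[code]
import numpy as np

# Row 1 of Q from the pieces: q_{1,j} = f(1,S) * pi_S(j).
f_C2, f_C3 = 35/76, 41/76               # visit probabilities found above
pi_C2 = np.array([2/5, 3/5])            # stationary over states 3,4
pi_C3 = np.array([1/2, 1/2])            # stationary over states 5,6
row1 = np.concatenate([[0, 0], f_C2 * pi_C2, f_C3 * pi_C3])
print(row1 * 152)                       # -> [0 0 28 42 41 41], matching Q
[/code]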

OK, that's enough for now.
 

1. What is a Markov process?

A Markov process is a mathematical model used to describe a system that changes over time. It is a type of stochastic process, which means that it involves random variables. In a Markov process, the future state of the system depends only on the current state, and not on any previous states.

2. What are the basic components of a Markov process?

The basic components of a Markov process are the state space, transition probabilities, and initial state. The state space is the set of all possible states that the system can be in. The transition probabilities represent the likelihood of the system moving from one state to another. The initial state is the starting point of the system.
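For example, in code (a minimal sketch with made-up numbers):

[code]
import numpy as np

states = ["sunny", "rainy"]                    # state space
P = np.array([[0.9, 0.1],                      # transition probabilities
              [0.5, 0.5]])                     # each row sums to 1
initial = np.array([1.0, 0.0])                 # initial state distribution
print(initial @ np.linalg.matrix_power(P, 3))  # distribution after 3 steps
[/code]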

3. How is a Markov process different from a Markov chain?

Conventionally, a Markov chain is a discrete-time model, while the term Markov process is often reserved for continuous-time models (usage varies between authors). The distinction concerns the time index, not the number of states: either kind of model can have a finite or an infinite state space.

4. What are some real-world applications of Markov processes?

Markov processes have a wide range of applications in various fields, including finance, economics, biology, and physics. They are commonly used in stock market analysis, predicting weather patterns, studying the spread of diseases, and modeling chemical reactions. They are also used in machine learning and artificial intelligence algorithms.

5. How is the stability of a Markov process determined?

The stability of a Markov process is determined by the behavior of the system over time. If the system reaches a steady state, where the probabilities of being in each state remain constant, then the process is considered stable. If the system does not reach a steady state, then it is considered unstable.
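For example, one can check this numerically by iterating the state distribution until it stops changing (a sketch with a made-up two-state chain):

[code]
import numpy as np

# Iterate p -> p P until the distribution stops changing (steady state).
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
p = np.array([1.0, 0.0])
for _ in range(100):
    p_next = p @ P
    if np.allclose(p_next, p):
        break
    p = p_next
print(p)  # converged distribution, ~[0.833 0.167]
[/code]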
