I How can I compute Markov transition probabilities from given data?

  • Thread starter user366312
  • Start date

user366312

Gold Member
66
2
Summary
I know how to mathematically calculate the probability of various Markov Properties.
But, how can I calculate Markov probability from data?
Suppose, I have a Markov Chain as follows:

##S=\{1, 2\}##

##
\alpha = \begin{bmatrix}
0.5&0.5\end{bmatrix}
##

##
P =
\begin{bmatrix}
0.5&0.5\\
0&1
\end{bmatrix}
##

And, the following data regarding 5 steps of Markov Chain taken 12 times:



Markov Chain:
     Steps    1    2    3    4    5
     --------------------------------
       1      2    2    1    1    1
       2      1    1    1    1    1
       3      1    1    1    1    1
       4      1    1    1    1    1
       5      2    2    2    1    1
       6      2    2    2    1    1
       7      1    1    1    1    1
       8      1    1    1    1    1
       9      1    1    1    1    1
      10      2    2    2    2    2
      11      1    1    1    1    1
      12      1    1    1    1    1
How can I calculate,say, ##P(X_1 = 1|X_0 = 1)## and ##P(X_5 = 2|X_2 = 1)## from this table?

Kindly, explain your answer.
 

FactChecker

Science Advisor
Gold Member
2018 Award
4,868
1,673
In the data, count the number of transitions from one state directly to another (or to itself). That should give you probabilities for a transition matrix. From that, the rest follows. The data you show doesn't seem to match the probability matrix, P. Maybe there is not enough data, but the difference is large.

I don't understand why you are asking about ##P(X_1=1|X_0=1)##. Are you sure that you really understand the transition matrix, P?
 

user366312

Gold Member
66
2
In the data, count the number of transitions from one state directly to another (or to itself). That should give you probabilities for a transition matrix. From that, the rest follows. The data you show doesn't seem to match the probability matrix, P. Maybe there is not enough data, but the difference is large.
I used the following code:

R code:
alpha <- c(1, 1) / 2
mat <- matrix(c(1 / 2, 0, 1 / 2, 1), byrow=TRUE, nrow = 2, ncol = 2) # Different than yours

chainSim <- function(alpha, mat, n)
{
    out <- numeric(n) # create a numeric vector of size-n
    out[1] <- sample(1:2, 1, prob = alpha) # create 1 random number
    for(i in 2:n)
    {
        #set.seed(1)
        out[i] <- sample(1:2, 1, prob = mat[out[i - 1], ])
    }
    out
}

set.seed(1)
# Doing once
#sss <- chainSim(alpha, mat, 1 + 5)
#sss
# [1] 2 2 2 2 2 2

# Doing 100 times
sim <- replicate(chainSim(alpha, mat, 1 + 5), n = 12)
t(sim)
# rowMeans(sim - 1)

# mean(sim[2, sim[1, ] == 1] == 1)
# [1] 0.4583333

I don't understand why you are asking about ##P(X_1=1|X_0=1)##.
Coz, that is what I was told in my class to find from the simulation.

Are you sure that you really understand the transition matrix, P?
I think so. Coz, I passed the MidTerm test.
 

FactChecker

Science Advisor
Gold Member
2018 Award
4,868
1,673
I don't see anything wrong with the code, but your results have many more transitions from 2 to 2 than I get from 2 to one (9 versus 3). I thought those two transitions should have equal probability. I have tried a couple of seeds and no seed at all and seem to keep getting more transitions 2 -> 2 than 2->1. I don't understand that. I will have to defer to anyone who understands this better.

CORRECTION: From the data, I thought that 1 was supposed to be the absorbing state. That is wrong. From the definition of P, 2 is the absorbing state and 1 transitions to 1 and 2 with equal probabilities.
 
Last edited:

FactChecker

Science Advisor
Gold Member
2018 Award
4,868
1,673
I made a couple of 1000 simulation runs (using different seeds) and keep getting about twice as many 2->2 transitions than 2->1 transitions. So there is definitely something that I don't understand.

CORRECTION: From the data, I thought that 1 was supposed to be the absorbing state. That is wrong. From the definition of P, 2 is the absorbing state and 1 transitions to 1 and 2 with equal probabilities.
 
Last edited:

FactChecker

Science Advisor
Gold Member
2018 Award
4,868
1,673
I think that your code defining mat is wrong: mat <- matrix(c(1 / 2, 0, 1 / 2, 1), byrow=TRUE, nrow = 2, ncol = 2)
That gives probabilities [0.5, 1.0] in chainSim where it should be using [0.5, 0.5].
I changed it to mat <- matrix(c(1 / 2, 1/2, 0, 1), byrow=TRUE, nrow = 2, ncol = 2) and got transition results that make sense. 2 is an absorbing state and the transitions 1->1 and 1->2 are equally probable. Looking at your code, that seems to be what you intended.
 

user366312

Gold Member
66
2
..., that seems to be what you intended.
Okay. Thanks.

Could you now shed some light on finding ##P(X_1=1|X_0=1)##?

The exact question was like the following:

Estimate P(X1 = 1|X0 = 1) from simulation. Compare your result with the exact probability you calculated.
 

FactChecker

Science Advisor
Gold Member
2018 Award
4,868
1,673
Okay. Thanks.

Could you now shed some light on finding ##P(X_1=1|X_0=1)##?

The exact question was like the following:
To do that, I recommend greatly increasing the number of runs. Then count how many times the initial state is 1, #initialState1. In those cases, also count how many runs have the second state of 1, #secondState1. the estimated probability would be #secondState1/#initialState1

I set the R code to do 100000 runs and used this code at the end to do the counting:

R code:
numInitial=which(sim[1,]==1)
N1= length(numInitial)
print(N1)
numSecond = which(sim[2,numInitial]==1)
N2 = length(numSecond)
print(N2)
prob = N2/N1
print(prob)
 

Want to reply to this thread?

"How can I compute Markov transition probabilities from given data?" You must log in or register to reply here.

Physics Forums Values

We Value Quality
• Topics based on mainstream science
• Proper English grammar and spelling
We Value Civility
• Positive and compassionate attitudes
• Patience while debating
We Value Productivity
• Disciplined to remain on-topic
• Recognition of own weaknesses
• Solo and co-op problem solving
Top