# I How can I compute Markov transition probabilities from given data?

#### user366312

Gold Member
Summary
I know how to mathematically calculate the probability of various Markov Properties.
But, how can I calculate Markov probability from data?
Suppose, I have a Markov Chain as follows:

$S=\{1, 2\}$

$\alpha = \begin{bmatrix} 0.5&0.5\end{bmatrix}$

$P = \begin{bmatrix} 0.5&0.5\\ 0&1 \end{bmatrix}$

And, the following data regarding 5 steps of Markov Chain taken 12 times:

Markov Chain:
     Steps    1    2    3    4    5
--------------------------------
1      2    2    1    1    1
2      1    1    1    1    1
3      1    1    1    1    1
4      1    1    1    1    1
5      2    2    2    1    1
6      2    2    2    1    1
7      1    1    1    1    1
8      1    1    1    1    1
9      1    1    1    1    1
10      2    2    2    2    2
11      1    1    1    1    1
12      1    1    1    1    1
How can I calculate,say, $P(X_1 = 1|X_0 = 1)$ and $P(X_5 = 2|X_2 = 1)$ from this table?

Related Set Theory, Logic, Probability, Statistics News on Phys.org

#### FactChecker

Gold Member
2018 Award
In the data, count the number of transitions from one state directly to another (or to itself). That should give you probabilities for a transition matrix. From that, the rest follows. The data you show doesn't seem to match the probability matrix, P. Maybe there is not enough data, but the difference is large.

I don't understand why you are asking about $P(X_1=1|X_0=1)$. Are you sure that you really understand the transition matrix, P?

#### user366312

Gold Member
In the data, count the number of transitions from one state directly to another (or to itself). That should give you probabilities for a transition matrix. From that, the rest follows. The data you show doesn't seem to match the probability matrix, P. Maybe there is not enough data, but the difference is large.
I used the following code:

R code:
alpha <- c(1, 1) / 2
mat <- matrix(c(1 / 2, 0, 1 / 2, 1), byrow=TRUE, nrow = 2, ncol = 2) # Different than yours

chainSim <- function(alpha, mat, n)
{
out <- numeric(n) # create a numeric vector of size-n
out <- sample(1:2, 1, prob = alpha) # create 1 random number
for(i in 2:n)
{
#set.seed(1)
out[i] <- sample(1:2, 1, prob = mat[out[i - 1], ])
}
out
}

set.seed(1)
# Doing once
#sss <- chainSim(alpha, mat, 1 + 5)
#sss
#  2 2 2 2 2 2

# Doing 100 times
sim <- replicate(chainSim(alpha, mat, 1 + 5), n = 12)
t(sim)
# rowMeans(sim - 1)

# mean(sim[2, sim[1, ] == 1] == 1)
#  0.4583333

I don't understand why you are asking about $P(X_1=1|X_0=1)$.
Coz, that is what I was told in my class to find from the simulation.

Are you sure that you really understand the transition matrix, P?
I think so. Coz, I passed the MidTerm test.

#### FactChecker

Gold Member
2018 Award
I don't see anything wrong with the code, but your results have many more transitions from 2 to 2 than I get from 2 to one (9 versus 3). I thought those two transitions should have equal probability. I have tried a couple of seeds and no seed at all and seem to keep getting more transitions 2 -> 2 than 2->1. I don't understand that. I will have to defer to anyone who understands this better.

CORRECTION: From the data, I thought that 1 was supposed to be the absorbing state. That is wrong. From the definition of P, 2 is the absorbing state and 1 transitions to 1 and 2 with equal probabilities.

Last edited:

#### FactChecker

Gold Member
2018 Award
I made a couple of 1000 simulation runs (using different seeds) and keep getting about twice as many 2->2 transitions than 2->1 transitions. So there is definitely something that I don't understand.

CORRECTION: From the data, I thought that 1 was supposed to be the absorbing state. That is wrong. From the definition of P, 2 is the absorbing state and 1 transitions to 1 and 2 with equal probabilities.

Last edited:

#### FactChecker

Gold Member
2018 Award
I think that your code defining mat is wrong: mat <- matrix(c(1 / 2, 0, 1 / 2, 1), byrow=TRUE, nrow = 2, ncol = 2)
That gives probabilities [0.5, 1.0] in chainSim where it should be using [0.5, 0.5].
I changed it to mat <- matrix(c(1 / 2, 1/2, 0, 1), byrow=TRUE, nrow = 2, ncol = 2) and got transition results that make sense. 2 is an absorbing state and the transitions 1->1 and 1->2 are equally probable. Looking at your code, that seems to be what you intended.

#### user366312

Gold Member
..., that seems to be what you intended.
Okay. Thanks.

Could you now shed some light on finding $P(X_1=1|X_0=1)$?

The exact question was like the following:

Estimate P(X1 = 1|X0 = 1) from simulation. Compare your result with the exact probability you calculated.

#### FactChecker

Gold Member
2018 Award
Okay. Thanks.

Could you now shed some light on finding $P(X_1=1|X_0=1)$?

The exact question was like the following:
To do that, I recommend greatly increasing the number of runs. Then count how many times the initial state is 1, #initialState1. In those cases, also count how many runs have the second state of 1, #secondState1. the estimated probability would be #secondState1/#initialState1

I set the R code to do 100000 runs and used this code at the end to do the counting:

R code:
numInitial=which(sim[1,]==1)
N1= length(numInitial)
print(N1)
numSecond = which(sim[2,numInitial]==1)
N2 = length(numSecond)
print(N2)
prob = N2/N1
print(prob)

"How can I compute Markov transition probabilities from given data?"

### Physics Forums Values

We Value Quality
• Topics based on mainstream science
• Proper English grammar and spelling
We Value Civility
• Positive and compassionate attitudes
• Patience while debating
We Value Productivity
• Disciplined to remain on-topic
• Recognition of own weaknesses
• Solo and co-op problem solving