## Can you Combinie two transition probability matrices?

Hello,

How would you/or is it possible to combine two transitional probability matrices?

Say for example I have 2 matrices that both measure the probability of moving from one state to another (the states are 1, 2 and 3)

Here is a picture

http://dl.dropbox.com/u/54057365/All/TPM1and2.JPG

Can these be combined into one probability matrix? Do you just take the average of each element?

To be more specific

Say for example I record the speed of a car each second on a road. Say I do this twice and get two TPMs with speed as the state. Each TPM has different probabilities.

So I would have two matrices like so

http://dl.dropbox.com/u/54057365/All/tpmbig.JPG

Can you combine them into one and assume that the resulting TPM is the "average" TPM for speed along that stretch of road?

Thank you

Many thanks

John
 PhysOrg.com science news on PhysOrg.com >> Galaxies fed by funnels of fuel>> The better to see you with: Scientists build record-setting metamaterial flat lens>> Google eyes emerging markets networks
 Hey bradyj7 and welcome to the forums. The main characteristics for a markov chain is that you have to following basic probability axioms. Apart from that, there are really no restrictions for your transition matrix (every row has to sum up to 1 and the matrix must be square) As for things related to the process, as long as the process exhibits the properties of first order conditional independence, then you are set. Mathematically there is no problem with finding the scalar average of the two matrices: in other words set your "average" transition matrix to 1/2 x (A+B) where A is the first matrix and B is the second. Since this process will create a matrix with that preserves the row property (sum equals 1) as well as the other Kolmogorov axioms, the resulting matrix is a proper transition matrix. Having said the above though, you need to be very careful about doing something like this because you need a deep understanding of the process to see what kinds of things you can do to get what you call an 'average' transition matrix. It might be helpful to know a bit more about the process before you do something like this. Also it would actually be wiser to create a system that is conditional on the car type and then take it from there. This means that you have to formulate a probability model that is conditional on the car type in which you should get distribution for your new stochastic process based on the above information. How much background do you with probability/statistics?

Hello Chiro,

Thank you for taking the time understand the question and your detailed response.

I'm currently doing a diploma in statistics and have an undergraduate degree in engineering/maths so I would have an understanding of the basic concepts.

 Mathematically there is no problem with finding the scalar average of the two matrices: in other words set your "average" transition matrix to 1/2 x (A+B) where A is the first matrix and B is the second. Since this process will create a matrix with that preserves the row property (sum equals 1) as well as the other Kolmogorov axioms, the resulting matrix is a proper transition matrix.
I thought that this would be OK. However, when I did it, the rows in the resulting 'average' matrix do not sum to 1. Take for example, these two matrices

http://dl.dropbox.com/u/54057365/All/TPM1and2.JPG

In the second matrix, no speed value for 2 was recorded hence the row (row 2) sums to 0. The reason why is because the speed is recorded every second and for this particular trip the recorded values were 1 and the next one was 3.

So if I take the average of these two matrices, I would get

0.65 0.05 0.3
0.25 0.25 0 ......Row 2 does not sum to 1?
0.1 0.25 0.85

Essentially, what I am doing is driving a car along a street and recording the speed every second. Then creating frequency and transition probability matrices. Then generating a speed profile of a car on the street using a markov chain.

The same car drives the route at the same time each day. Would this cover this requirement?

 This means that you have to formulate a probability model that is conditional on the car type in which you should get distribution for your new stochastic process based on the above information.
Thank you

John

## Can you Combinie two transition probability matrices?

The first matrix with a zero row in 2nd column is not a valid transition matrix.
 Hello, I was thinking that this was the reason. I have prepared a little spread sheet of my method if you wouldn't mind looking at it? http://dl.dropbox.com/u/54057365/All/tpm3.JPG A sample of recorded values are in Col A and you will see a frequency a matrix and transitional probability matrix. As you pointed out row 2 is incorrect. However the next time the car drives the route a value of 2 could be recorded. And in order to get the average of the two matrices then they would need to be the same size, correct? Could you suggest a solution to this? Many thanks John
 Hi Chiro, Would you have any further thoughts on this? Kind regards John

 Quote by bradyj7 Hi Chiro, Would you have any further thoughts on this? Kind regards John

Sorry dude, the timezone here in Australia is different and I forgot you made another reply (didn't come up in my inbox).

Whenever you have the case where you don't have any probability data, basically you stick with the approach of getting frequency information for each 'atomic' event, which in this case corresponds to each 'cell' in your matrix.

Once you get the frequency information you just divide by the relevant total and that is your empirical probability based on data.

If its not based on data, then it will just be probability that obeys the Kolmogorov axioms (i.e. probability between 0 and 1 and all probabilities in each row add up to 1).

From what you have said, the averaging idea is a good idea but only if there is an equal amount of data in each transition matrix. Let me explain:

Since all of the data is talking about the exact same process, then each matrix represents an 'independent' collection of probability information for the same process.

This means that you can do a special kind of average to combine the two matrices and based on what they represent, this is not only ok but expected.

Lets illustrate the point with a simple stochastic process (not markov but the same principles apply):

Lets have experiment information A = {1,2,3,4,5} A_N = 15 and B = {3,6,11,14,15} A_N = 29. Now they are all 'measurements' from the same experiment but they are partitioned into sets A and B.

Now if we wanted to figure out the probability info taking into account both A and B we would calculate P(x) = [A(x) + B(x)]/(A_N + B_N).

But what if we had P_A(x) that represented a probability for data in A and P_B(x) that represents data in B?

We know P_A(x) = A(x)/A_N and P_B(x) = B(x)/B_N. Now our goal is to find P(x) in terms of P_A(x) and P_B(x) (which in your case is the two transition matrices).

P(x) = [A(x) + B(x)]/(A_N + B_N) = A(x)/[A_N + B_N] + B(x)/[A_N + B_N]

We now use the substitution A(x)/[A_N + B_N] = A(x)/A_N x [1 - B_N/(A_N+B_N)] and B(x)/[A_N+B_N] = B(x)/B_N x [1 - A_N/(A_N+B_N)].

But we have A(x)/A_N as our P_A(x) and B(x)/B_N as our P_B(x) which means that we can calculate P(x) as:

P(x) = P_A(x) x [1 - B_N/(A_N+B_N)] + P_B(x) x [1 - A_N/(A_N+B_N)]

This is what you have to do. Basically your A_N and B_N are the 'total histogram counts' for each row and your P_A(x) and P_B(x) correspond to probability information in each cell of your matrix.

The way this works is that if one matrix has a lot of data compared to the other, it will be weighted more which if you think about it makes sense: imagine if one matrix had 10000 bits of data and your second matrix had 2.

The above is not only the how but the why: because all the data is part of the same process all I am doing is show to go from partitioned data (your two matrices) to unpartitioned data (you're 'averaged' matrix).

If you have any other questions I'll do my best to answer them.

Hello Chiro,

Its cool, I'm based in Ireland so we are in opposite time zones

You said that

 From what you have said, the averaging idea is a good idea but only if there is an equal amount of data in each transition matrix.
Do you mean by this that each matrix would need to have the same number transition states i.e 1,2,3,4,5 or that it matrix would need to created from the same number of 'recorded values'?

I think that you refer to this again here

 The way this works is that if one matrix has a lot of data compared to the other, it will be weighted more which if you think about it makes sense: imagine if one matrix had 10000 bits of data and your second matrix had 2.
Am I correct?

Which ever the case may be, can I just say that for each experiment there will always be 20 recorded values (if this is what you are referring to). But I cannot say for certain that each experiment will contain at least one occurrance of the states 1,2,3,4,5. These are speed values and are dependent on how fast an individual accelerates. For instance somebody speed value could go from 1 to 2 to 3 but for somebody who accelerates fast it may go from 1 to 3.

Just so I understand what your proposing, I put together another two examples. Both examples have 20 recorded values. Example 1 has at at least one occurance of each state and all probabilities sum to 1. Example 2 has no occurances of state 4.

http://dl.dropbox.com/u/54057365/All/TPM1.JPG
http://dl.dropbox.com/u/54057365/All/TPM4.JPG

So that I can be certain that I understand you correctly would you mind posting what you would calculate as being the 'average' of these two matrices based on what you explained in the previous post?

Many thanks

John
 I think I may have babbled on a lot more than usual. I'll say the specifics: Consider your excel spreadsheets: http://dl.dropbox.com/u/54057365/All/TPM1.JPG http://dl.dropbox.com/u/54057365/All/TPM4.JPG Now consider the formula we derived for calculating the final probability transition matrix: P(x) = P_A(x) x [1 - B_N/(A_N+B_N)] + P_B(x) x [1 - A_N/(A_N+B_N)] In your particular problem: P(x) represents the probability in your given row that you are working on where x is the cell in that row. P_A(x) represents the probability value in the same row at cell x for your first matrix. P_B(x) is the same thing for your second matrix. A_N is the total number of counts for the same row we are dealing with in matrix A B_N is the total number of counts for the same row we are dealing with in matrix B Now as an example of actual values: Lets look at the first row for both matrices: A_N = 6, B_N = 2, P_A(1) = 0.17, P_B(1) = 0 P(1) for this row = 0.17 x (1 - 2/(6+2)) + 0 = 0.17 x 3/4 = 0.1275 Because the first matrix has more data it is weighted more. You just apply the same formula for each cell in the given row. You can use a spreadsheet to do this for and everything will be calculated instantly. If you study the formula for a little bit you will see that it is basically a 'weighted average' of the two matrices where the one that has the most data is given a bigger weight (closer to 1). If both matrices have the exact same amount of data the weight will be exactly half since B_N/(A_N+B_N) = A_N/(A_N+B_N) = 1/2 and (1 - 1/2) = 1/2.
 Hello Chiro, Thanks very much for the help, I understand it all now! Cheers John
 Hello Chiro, I have one further question about the formula for calculating the weighted average of the matrices, if you don't mind? What would the formula be for 3 matrices? I've attempted it below, but I'm unsure what goes in the place that I've marked with **?**, is it P_A(x) or P_B(x), or does it matter? or is the equation incomplete? P(x) = P_A(x) x [1 - B_N/(A_N+B_N+C_N)] + P_B(x) x [1 - A_N/(A_N+B_N+C_N)] + P_C(x) x [1 - **?**/(A_N+B_N+C_N)] Thank you John

 Quote by bradyj7 Hello Chiro, I have one further question about the formula for calculating the weighted average of the matrices, if you don't mind? What would the formula be for 3 matrices? I've attempted it below, but I'm unsure what goes in the place that I've marked with **?**, is it P_A(x) or P_B(x), or does it matter? or is the equation incomplete? P(x) = P_A(x) x [1 - B_N/(A_N+B_N+C_N)] + P_B(x) x [1 - A_N/(A_N+B_N+C_N)] + P_C(x) x [1 - **?**/(A_N+B_N+C_N)] Thank you John
Hey John.

Do you want the basic idea for N matrices or just the special case of N = 3?

I'd recommend to discuss the general formula although there will be a few extra symbols here and there.

The idea is exactly the same in that you calculate the 'weights' of the data of each matrix with respect to the total data in 'each' matrix. Because we had two data points we know that w1 + w2 = 1 which is why we have the (1 - blah).

In the general sense we just use a simple formula to calculate weights and then multiply the P_blah(x) terms by the weights.

 Hello Chiro, Yes, it would great if you could describe the basic formula for N matrices. That way I'll be able to apply it to different numbers of matrices and I'll also have a better understanding. Thank you John
 Well the basic idea is that you weight everything by the number of data points in one row of one matrix relative to the total across all rows. Lets define a bit of notation. Let N be the number of matrices you have. Let N_(i,j) be the number of data points for the ith matrix for the jth row. Let S_(i,j)(x) be the frequency count for the ith matrix in the jth row for the xth element. From this we have P(i,j)(x) is the probability for the ith matrix for the jth row at the xth column. We can write this as: P(i,j)(x) = S_(i,j)(x) / N_(i,j) Now we have all of this information at hand (exactly like in your spreadsheet). Let P(r,x) be the 'combined' probability matrix for row r and column x corresponding to our final probability we wish to calculate. This is given as: P(r,x) = Sum (i = 1 to N) w_i P(i,r,x) and our job is to determine w_i. Note that we do this on a row by row basis and not on a matrix by matrix basis unless each row in each matrix has the exact same amount of data: other-wise its important to make sure you do this on a row by row basis. Now it turns out calculating w_i is really easy. All we have to do is consider the weight of the data of one matrix with respect to all the data in all the matrices. Lets introduce the following variable T where: T = Sum (i = 1 to N) N_(i,r) where r is the same row we are dealing with in our calculation. w_i = N_(i,r) / T. That is it! All I have done is figured out the weight of the data of one matrix with respect to everything in total for a particular row and used that to denote the combination of how the matrix contributes to the overall data set. Now take these formulas apply them across all matrices and all rows and you will get a final matrix that represents all the data you have collected.
 Hello Chiro, Thank you for this. I understand the theory behind what you are saying, but I'm struggling to see how you derive a specific formula for N = some number from this. Would you mind posting the formula for N=3 so I can compare it to N=2? I really appreciate your help. John

 Quote by bradyj7 Hello Chiro, Thank you for this. I understand the theory behind what you are saying, but I'm struggling to see how you derive a specific formula for N = some number from this. Would you mind posting the formula for N=3 so I can compare it to N=2? I really appreciate your help. John
Ok lets see how we go.

Lets assume we are using the standard convention that has been used in this thread for N's P(x)'s and so on.

First lets look at the N = 2 case.

If we have N_A, N_B, P_A(x), P_B(x) and lets say that our row is fixed with N = 3 and T = N_A + N_B. Now we have

w_1 = N_A/(N_A+N_B) and w_2 = N_B(N_A+N_B)

But we know that N_A/(N_A+N_B) + N_B(N_A+N_B) = 1 using some simple algebra.

This means that w_1 = 1 - N_B/(N_A+N_B) and w_2 = 1 - N_A/(N_A+N_B) which gives our formula for the N = 2 case. The derivation was a little different and was in a different form, but this is why it was the case.

Now for N = 3.

Lets define N_A, N_B, and N_C in the usual way and P_A(x), P_B(x) and P_C(x) in the usual way and T = N_A + N_B + N_C.

w_1 = N_A/T, w_2 = N_B/T, w_3 = N_C/T

Now if you wanted the same 'form' as the early derivation we do the same thing by writing things in terms of 1 - blah in the same way for N = 2.

So we know N_A/T + N_B/T + N_C/T = 1 which means we can use the following substitutions:

w_1 = 1 - (N_B + N_C)/T, w_2 = 1 - (N_A + N_C)/T, w_3 = 1 - (N_A + N_B)/T and then use those weights like we did in the very first derivation of N = 2 case.

So basically the link is that in the first derivation we looked at things in terms of complementary weights by using the fact that all weights sum up to 1 and in the second case we just use the simple idea of calculating a weight to be the fraction of the data in one matrix for one row relative to the total data for all matrices in that row. By using the algebraic trick of writing weight_blah = 1 - rest of weights we get different expressions that mean mathematically and intuitively the same thing.
 Hello Chiro, Thank you for explaining this to me. It took me a few times going through it to understand it, but I get it now and can write the formula for an value of N. Result. Could I ask you another question, if you don't mind? It is in relation to Markov Chain states. You have already seen from my earlier posts that I am recording the speed of a car as it travels down a street and I am creating a transition probability matrix of the probability of moving from one speed to another and modelling this with a markov chain. The output is ultimately a speed - Time profile of the car. I've created a very simple spreadsheet to illustrate the process (These are not real values, I made this up to keep the values small) http://dl.dropbox.com/u/54057365/All/forum1.JPG I understand all this. My question is to do with the following: I've found a paper online where somebody is doing the same process (i.e modelling the speed time profile of a car). However, they have used speed AND acceleration as the states in the Markov chain. As far as I know, they compute the acceleration directly from the recorded speed values. Here are two figures illustrating their process. http://dl.dropbox.com/u/54057365/All/mw1.JPG http://dl.dropbox.com/u/54057365/All/mw2.JPG So my question is, if they did compute the acceleration directly from the recorded speed values, would that not make one of the states redundant as you calculate one directly from the other? They are recorded at fixed time intervals (i.e 1 sec) and they're both scalar quantities. What benefit, if any, would using acceleration as a state also bring to the model? I hope that this is enough information for you to understand it. I would be grateful for comments that you could make on this. Thank you John