# Total number of possible n-state sequences

1. Jul 15, 2012

### AryanK

Hi everyone,

I'm doing an investigation of markov properties and in an example I have made the following transition matrix:

http://img152.imageshack.us/img152/1584/matrixki.png [Broken]

If all the probabilities were above zero, finding the total number of possible 4-state sequences (i.e. ACBA, BACB etc.) would've been very simple. However, P(C|B)=0 (I must have at least one zero-probability option at this point) and I don't know how to find the number of total possible 4-state sequences without counting them one by one. Any help?

Thanks,
AryanK

Last edited by a moderator: May 6, 2017
2. Jul 15, 2012

### chiro

Hey AryanK and welcome to the forums.

If you want to find for example the probability of starting at A and ending at C after n steps you simply calculate A^n where A is your transition matrix which means you multiply A*A*..*A with n multiplies.

If you want to find a particular sequence, simply multiply the right probabilities together.

For example lets look at ACBA. A to C has probability 1/2. C to B has probability 1/2 and B to A has probability 2/3. Since each transition is independent, you find the probability through multiplication which gives 1/2*1/2*2/3 = 1/6.

Same idea for the other ones.

3. Jul 15, 2012

### AryanK

Hi and thanks for the prompt reply. :)

Yes, I understand how to calculate the probability of a specific sequence. However, what I'm interested in is how to calculate the number of all possible distinct sequences the matrix can produce. I'm interested in this as I'm trying to figure out a method/model to find the most likely n-state sequence of a given transition matrix.

For example, take the following matrix:
http://img404.imageshack.us/img404/5540/test2xt.png [Broken]

Here I know that the total number of possible 4-state sequences are 3*3*3*3, as there are four states, and there are 3 possibilities in each state. However, in the original matrix, P(C|B)=0, therefore whenever a sequence contains B, it has only 2 possible options for the next state instead of 3. This should decrease the total number of possible sequences, but I don't know how to calculate the new total number.

Sorry if this sounds confusing. I've always had problems describing math-related stuff! :D

Last edited by a moderator: May 6, 2017
4. Jul 15, 2012

### chiro

Well the easiest way for this case is to count the number of times a B occurs. For this we use a binomial model where one state is B and the other state is not B (i.e A,C,).

Since we are only interested in the transitions, it means the last realization doesn't count and we are only interested in the first three. So we change the binomial from n = 4 to n = 3 and apply the results. For 3 B's we get 2*2*2 as an example. Since order doesn't matter we can use this to find out probabilistically how many B's we get for 3 B's, 2 B's 1 B, and no B's.

This means we get a probability for no B's, 1 B, 2 B's, and 3 B's. This is easy to calculate. Then by mapping a B to 2 and a non-B to 3 we get our answer.

So for 2*2*2 (ie BBBX for some X = {A,B,C}) the probability is given by the binomial distribution which tells us the chance of this happening. Same with the other ones.

5. Jul 15, 2012

### haruspex

The number of possible 4-state sequences will be 34, less those which contain a consecutive pair BC. 9 start with BC, 9 have a central BC, 9 finish with BC. Those 27 only contain one repeat: BCBC. So we have 34 - 26 = 55.
However, not sure that helps much wrt finding the most common sequence. Maybe you were intending to construct a new matrix for transitions between 4-state sequences. 55 is still rather a large number, perhaps.
I hope chiro's post helps - I can't follow it.

6. Jul 16, 2012

### haruspex

More thoughts...
For the odds of the different sequences, compute the single state odds in the usual way then extend one state at a time: P(AB) = P(A)P(B|A).
If the offered example is much simplified from the real problem, it may be that the total number of possible sequences makes it unmanageable to compute the exact odds of every sequence. In that case, can discontinue some sequences if the odds fall below a threshold. E.g. In the example, the most likely sequence must have odds at least 1/55, so can discard any sequence as soon as it falls below that.

7. Jul 17, 2012

### AryanK

Thanks for the replies! You have both suggested pretty smart methods that I failed to think of. :D

I'm basically trying to figure out a less-intense way of finding the most likely sequence, because calculating the odds for every single sequence can be suicidal in more complex cases.

@Harusoex: Yeah, I've tried setting some limitations to eliminate a couple of candidates, but I still have to find a way to narrow things even further. I've actually found an algorithm (Viterbi) which does something similar to what I want, but I'm still trying to find a simpler way, so wish me luck! :)

Last edited: Jul 17, 2012
8. Jul 17, 2012

### SW VandeCarr

OK. Initially ignore there are just three possible states in a sequence of four and say there are 4!=24 sequences including repeats. Forbidden sequences are BC, AA, BB and CC. Each of these can occur three possuble ways in a sequence of four. Therefore, if we subtract sequences with forbidden two state sequences, there are 12 remaining possible 4 state sequences.

Last edited: Jul 17, 2012
9. Jul 17, 2012

### haruspex

With no constraints there are 81 possible 4-state sequences.
Why are AA, BB and CC forbidden?

10. Jul 17, 2012

### haruspex

Viterbi is for hidden Markov models, no?
I wondered if there was a higher threshold you could use for discarding partial sequences early. So I tried constructing a pathological example. Consider M states that behave exactly the same - i.e. for any other given state S, each has the same transition probabilities to and from S. A further N states, Ci, form a deterministic chain C1->C2->..->CN. Each of the M has a very small chance p of transiting to C1. CN has an evens chance of transiting to C1, otherwise equally likely to any of the M.
At any instant, prob of being in a (particular one of the) M states is 1/(M+2pNM), and for each of the N it's 2p/(1+2pN). For the N-state sequences, prob of C1...CN is also 2p/(1+2pN). For X.C1...CN-1, where X is one of the M, it's p/(M+2pNM). Set p << 1/2M.
Thus, for the first state in the sequence, C1 looks poor, but the C1...CN sequence is the single most likely of length N.

11. Jul 18, 2012

### AryanK

@SW VandeCarr: Thanks for the reply! Well, in this case repetition is allowed, so that automatically leads to many more sequences than the number you've suggested. I think haruspex's method is pretty persuasive. :)

@haruspex: Well, markov models aren't deterministic, as down the road they are random. However, the most likely sequence IS a deterministic chain, so that chain must somehow be defined. Another idea that came to my mind is that since the most likely sequence is directly related to the matrix itself, it could be possible to get the most likely 2-state sequences of each row (in this case AC, BA, and CB), remove any other combination, and state (in a formal fashion) that the most likely can be generated using only those sequences.

Sometimes I wish it would suffice to say "just look at the matrix and follow the states with highest probabilities"!

12. Jul 18, 2012

### SW VandeCarr

OK. You did say sequences of states, not state transitions, but if you're only interested in the number of 4 state paths (not probability) involving only state transitions, it seems consecutively repeating states can be ignored or collapsed to one state. BTW my model failed to take into account states like ABAB where two of the states repeat non- consecutively. Your formulation requires one state to repeat, so that's taken care if we think of the repeat as A' etc. To cover repeat pairs, we add all pairs of the matrix to the number of combinations. Then we have 4!+9=33. Remove the 12 "illegal" sequences (including those with BC) and we have 21 paths. Note, any sequences "stuck'' in states eventually become "unstuck" with increasing probability over time.

EDIT: Just food for thought since you still haven't gotten your answer. BTW if you used the method in post 5 to exclude AA,BB and CC in addition to BC, it seems you would exclude 27x4=108 sequences out of 81 (less repeat pairs)..

Last edited: Jul 18, 2012
13. Jul 19, 2012

### haruspex

No. 9 start with AA, 9 have AA in the middle, and 9 finish with AA. But now you have more repeats than in the BC case: AAAx and xAAA. That comes to 5 repeating cases, one of them (AAAA) being repeated twice: 27-6 = 21. Eliminating all with AA, BB or CC gives 6 more repeats: AABB etc. 81-3*21 + 6 = 24. But of course, there's a much easier way of working that out.
If you want to eliminate those with BC as well then there are more repeats to worry about.