## Can you combine two transition probability matrices?

It's basically the same as a discrete-time Markov chain, except that the transition probabilities are functions of continuous time.

So if you had, say, three states in continuous time, a continuous-time Markov model would have a 3x3 transition matrix, but each entry would be a function of a continuous variable (time) instead of a constant.
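To make "each entry is a function of time" concrete, here is a minimal sketch. It assumes a time-homogeneous chain, for which the transition matrix is P(t) = exp(Qt) for a rate (generator) matrix Q whose rows sum to zero. The rates in `Q` are made up for illustration, and the truncated Taylor series is just one simple way to approximate the matrix exponential for a small matrix.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_matrix(q, t, terms=60):
    """Approximate P(t) = exp(Q t) with a truncated Taylor series."""
    n = len(q)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]          # current series term, starts at I
    qt = [[q[i][j] * t for j in range(n)] for i in range(n)]
    for k in range(1, terms):
        term = mat_mul(term, qt)               # (Qt)^k / (k-1)!
        term = [[x / k for x in row] for row in term]   # now (Qt)^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


# Hypothetical rates for a three-state chain; each row sums to zero.
Q = [[-0.5, 0.3, 0.2],
     [0.1, -0.4, 0.3],
     [0.2, 0.2, -0.4]]

P = transition_matrix(Q, 2.0)
for row in P:
    print([round(x, 4) for x in row])
```

Each row of `P` is a probability distribution over the three states after 2 time units; changing `t` changes every entry, which is the sense in which the matrix is a function of time.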

In terms of what it's used for: it's used when the time parameter is continuous rather than discrete. That's the major difference. If you had phenomena based on continuous time and you could solve for the right transition matrix (where its form is not too complicated), then it would be a lot better to use that.

But your model is not Markovian, which means that you can't really use this framework. One reason I suggest not going down the continuous path is that a non-Markovian continuous-time model will be really complicated. So the suggestion was to use a discrete binary branch tree where the number of intervals is low enough to keep it simple, but high enough to make the model useful as an analytic tool.

 Okay, thank you for explaining that and for your help.
 Hi Chiro, A bit off the topic, but just in relation to discrete markov chains, is there a statistical test for the sample size of measurements to ensure that the transition probability matrix (that you construct) for the process significantly represents the process that you are trying to model?
There is a general area of statistics concerned with sample size and power calculations for a variety of statistical tests. Probabilities are a little different, though, because you are more interested in whether the distribution based on your sample is good enough to be used as a population model, and most statistical tests are concerned with estimating specific parameters (like the mean) or distribution-free quantities (like the median). One thing you can do is treat your probabilities as proportions and then consider the interval that corresponds to each probability for a given sample size. Your probabilities are proportions, just as in a Bernoulli trial, so you can look at the power and sample size calculations for getting one particular proportion within some region, and then look at how that applies to the distribution and the individual probabilities.
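As a rough illustration of treating one transition probability as a proportion, the sketch below uses the standard normal-approximation (Wald) interval and the matching sample size formula. The function names and the example numbers are hypothetical, and the normal approximation is only reasonable when the count of transitions out of a state is fairly large.

```python
import math
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


def sample_size_for_margin(p_guess, margin, confidence=0.95):
    """Smallest n so the interval half-width is at most `margin`."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z * z * p_guess * (1 - p_guess) / margin ** 2)


# Hypothetical: 30 of the 100 observed transitions out of state A went to state B.
low, high = proportion_interval(30, 100)
print(round(low, 3), round(high, 3))

# Worst case p = 0.5, target margin of +/- 0.05 at 95% confidence.
n_needed = sample_size_for_margin(0.5, 0.05)
print(n_needed)
```

For a transition matrix this would be applied row by row, with `n` the number of observed transitions out of that row's state, so rarely visited states are the ones that drive the required sample size.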

Quote by chiro: Basically what you would have to do is break it up into a small number of intervals (which you have done) and then consider all the branches to get a complete set of conditional distributions. So instead of basing your conditional distribution on a continuous variable, you base it on a discrete one. In other words, you restrict your parking and journey times to fit into "bins" and then you look at the conditional distribution for each of the branches. For example, if you allow the smallest time interval to be ten minutes, then you consider conditional distributions for total times in terms of lumps of these intervals. So if you have n of these intervals, you will get 2^n branches. Some branches may have zero probability, but in general you will have 2^n individual branches corresponding to all the possibilities that you can take. So an example might be P(Total Journey Time = 30 minutes| First 10 = Travel, Second 10 = Travel, Last Ten = Park) and any other attributes you need. To count up total journey time, you sum up all the positive branches (i.e. all the intervals in which you are on a journey), and for parking you do the same with the parking intervals. What this will look like is a dependent binomial variable, and what you do is estimate these probabilities from your sample. From this you will have a distribution for n intervals given a history of what you did, and by considering whatever subset of these probabilities you wish, you can find things like the expectation.
Hello Chiro,

Can I ask you a couple of questions regarding this post?

To help visualise the problem, here is a sample of raw data.

https://dl.dropbox.com/u/54057365/All/rawdata.JPG

So the first thing is to bin the journey and parking times. So say I select 5 minutes as the bin size, I'll have 288 bins. This is a lot of bins but I presume 5 minutes is reasonable for short journeys and short parking times (trip to shop etc).

Apologies, I don't want to come across as stupid but I don't fully understand what you mean by branches? and 2^288 is a lot of branches?

You provided an example

 So an example might be P(Total Journey Time = 30 minutes| First 10 = Travel, Second 10 = Travel, Last Ten = Park) and any other attributes you need.
Could you explain what you mean by "travel" and "park", a car would be parked during a journey??

From the sample data above, I have "binned" the journeys with a journey time of 30-35 minutes.

https://dl.dropbox.com/u/54057365/All/rawdata1.JPG

Would it be possible to show me by way example what you mean by branches and an example of a probability calculation? I would really appreciate it and it would help me understand it better.

I just can't visualise how to calculate (or generate) a journey time or parking time now that we are assuming that they are dependent events. Assuming independence was easier!

Are we generating or calculating a journey time given start time distance and the previous journey time and parking time?

I like the idea of calculating the probabilities first and then getting the expected values but I just can't grasp what probabilities to calculate and how to do it. I know basic stuff!

Thank you for your time, I know that I have asked a lot of questions on this forum and I am grateful for the help.

John

So by branches, I mean that at each step, chronologically, you have two options (park next or keep moving, based on what you're doing now). Think of it like modelling whether a stock goes up or down given its current state and its complete history. In terms of estimating probabilities and getting confidence intervals, what you are doing is estimating a probability distribution with n probabilities, and this is equivalent to estimating the parameter of a binomial if you only have two outcomes per branch point: park or keep moving. Two choices means you only have one free probability per branch point. You can use the standard frequentist results to get the variance of the estimator, and you can also use programs to calculate the power and sample size for a given significance level, amongst other things. Something like this: http://biostat.mc.vanderbilt.edu/wik...owerSampleSize If you have a distribution with more than two outcomes, you will want to use the multinomial. The way I am suggesting you start off is to consider the branch model and then simplify as much as possible without losing the actual nature of your model (i.e. don't make it so simple that its accuracy becomes useless). With your binned data, I'm a little confused, since you have both parking and journey data in a trip: I was under the impression you could only be travelling or parked in any one indivisible event, so maybe you could clarify what is going on. The parameters like journey and parking times will be based on your branch distribution, but you can throw in some kind of residual term to take care of the finer details (say your branch size is 30 minutes, but you want to take variation into account to get probabilities for 35 minutes or 46 minutes given that the time is between 30 and 60 minutes). 
As for the distance, it would probably make sense for this to be a function of your actual times (even if it's an approximation), and if you had more data (like location: for example whether you are in a busy city or a wide open space) then you would incorporate this as well. So now we have to look at calculating a branch. If you have the data divided up into the branches, then it's just a matter of getting the mean and variance of the Bernoulli data, and this gives the basic frequentist interval, for which you can use a normal distribution if your sample size is big enough. So let's say you have a branch for the time t = [120,150) minutes, where your branch size is 30 minutes. You will have quite a few possible histories leading up to this time level (with a 30 minute interval there are four earlier intervals, so 2^4 or 16 possibilities), so you will have 16 different sets of data saying how frequently you park or move in those next 30 minutes. You can calculate this from your data by generating a table full of 1's and 0's, and these are used to estimate your probabilities. Now, as you have mentioned, this will be a lot of data, but if you have 30 minute intervals for 12 hours in total that's 2^24, or just under 17 million, probabilities for the entire distribution, which isn't actually too bad (a computer will calculate and deal with this data pretty quickly). For fine-grained detail, as mentioned above, you can add a distribution for getting values within that particular range. With this technique you can adjust the coarse and fine-grained distributions together (e.g. change the interval from 30 to 40 minutes but then change the fine-grained one to something with more complexity).
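A minimal sketch of "calculating a branch" might look like the following. It assumes each day is already stored as a tuple of 1's and 0's (one entry per bin) and estimates P(next bin = 1 | history) for every history that actually appears in the data, together with the Bernoulli standard error. The records and the function name are hypothetical.

```python
import math
from collections import defaultdict


def branch_estimates(records, depth):
    """Estimate P(bin[depth] = 1 | first `depth` bins) for each observed history.

    Returns {history: (p_hat, standard_error, n_records)}.
    """
    counts = defaultdict(lambda: [0, 0])   # history -> [n_records, n_ones]
    for rec in records:
        if len(rec) <= depth:
            continue                        # record too short for this branch level
        history = tuple(rec[:depth])
        counts[history][0] += 1
        counts[history][1] += rec[depth]
    estimates = {}
    for history, (n, ones) in counts.items():
        p_hat = ones / n
        std_err = math.sqrt(p_hat * (1 - p_hat) / n)
        estimates[history] = (p_hat, std_err, n)
    return estimates


# Hypothetical binned days (1 and 0 mark the two activities in each bin).
records = [
    (1, 1, 1, 0, 0),
    (1, 1, 0, 0, 1),
    (1, 1, 1, 1, 0),
    (1, 0, 0, 1, 1),
    (1, 1, 1, 0, 0),
]

est = branch_estimates(records, 2)
for history, (p_hat, std_err, n) in sorted(est.items()):
    print(history, round(p_hat, 3), round(std_err, 3), n)
```

Only histories that occur in the data get an estimate, so although there are 2^depth possible branches in principle, the table that has to be stored is bounded by the number of records. A standard error of zero on a branch with very few records is a sign of too little data rather than of certainty.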
Hello Chiro, Thank you for explaining this. I've spent the day reading examples of binomial distributions and this is a good method. I understand what you're suggesting, well most of it :) Just to clarify the issue with the data: each row represents the travel for one car per day. https://dl.dropbox.com/u/54057365/All/rawdata.JPG So the journey time is the length of the journey (minutes) and the parking time (minutes) is the time between the end of journey 1 and the start of journey 2. The 1st and 2nd rows in the sample only made 2 journeys that day, whereas the 3rd row had a 3rd journey. It had a 4th journey too but I left it out so it would fit on the screen. Altogether I have 20,000 rows (days of travel). In my mind, I am treating a journey as an event, and naturally I am thinking that at the end of the event there is only one option, which is to park. Are you considering a journey as an event here? I'm not sure if I'm following you correctly. So when you say bin the data, do you mean bin the journey times (for example, sort the rows from small to large), work out the probabilities, and then do the same for the parking times? Should I keep the order of the journeys as they are (Journey 1, Journey 2, Journey 3 etc.), or just treat them as one list? Like this https://dl.dropbox.com/u/54057365/All/rawdatalist.JPG and then bin them (by journey time first) like this (the journey time column is sorted)? https://dl.dropbox.com/u/54057365/Al...listsorted.JPG If you treat them as one list, will you not lose the relationships between successive journeys? For example, a 2nd journey could be the return leg of a trip and could be of similar length and time. I have a couple of questions about the remainder of the post, but if it is okay I'll wait until I understand these points before continuing. Thanks for your time, John
The way I envisage this is that if you have journey data, you calculate the branch data by recoding it so that you get a 1 or a 0 for each interval, and these become your bins. So let's say you have a journey that goes for 1 hour and 20 minutes and then goes into parking for an hour. If you broke up this part of the day, you would have 3 bins marked as journey and then another two bins (all half an hour in length) marked as parking. This would result in a record with 1 1 1 0 0, and you can then take this data, build all the conditional frequencies, and estimate the parameter for the probability as well as the confidence interval. So in terms of your conditional distributions you will have something like P(X4|X1=a,X2=b,X3=c), and the record above would be a realization with a, b, c all equal to 1. This is the simplest bin structure, with a fixed interval size. These give all the conditional distributions for each bin (i.e. at any time you not only have a specific journey/parking history, but you can for example sum up the journey/parking times and do probability/statistics with them). With this you can then use a computer and get distributions for whatever you want (for example, the distribution of all parking times in some interval given a history). If you want finer control, you can add a distribution within each cell or you can have some kind of hierarchical scheme (i.e. with multiple levels), but that's going to be up to you. More resolution means more complexity, so add complexity if you need it; if you don't, it's a good idea to review the model, and if it is adequate for its use then you know where you stand.
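The recoding step described above can be sketched as follows. This assumes each day is given as an ordered list of (activity, minutes) segments and that each segment is rounded up to a whole number of bins, which is what the 1 hour 20 minute example implies; that rounding rule and the names used here are assumptions for illustration only.

```python
import math


def encode_day(segments, bin_minutes=30):
    """Turn ordered (kind, minutes) segments into a list of 1's (journey) and 0's (park).

    Assumption: each segment is rounded UP to a whole number of bins.
    """
    bins = []
    for kind, minutes in segments:
        n_bins = math.ceil(minutes / bin_minutes)
        bins.extend([1 if kind == "journey" else 0] * n_bins)
    return bins


# The example from the post: an 80 minute journey followed by 60 minutes parked.
record = encode_day([("journey", 80), ("park", 60)])
print(record)  # [1, 1, 1, 0, 0]
```

Rounding every segment up lets the encoded day drift away from clock time when there are many short segments, so with real data it may be better to lay the segments on a clock timeline and label each bin by whichever activity covers most of it.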
Hey, oh I understand what you mean now regarding the bins. This is what I was thinking a few days ago but I wasn't sure if it was the same thing. I've done this with some sample data. The bin sizes are 15 minutes. https://dl.dropbox.com/u/54057365/All/sampletable.JPG Would you mind me asking a couple of questions about the conditional frequencies and probabilities? Would it be possible to show me an example of a conditional frequency and how to calculate it? Apologies if this is basic but I've never done something like this before. For example, say you wanted to build the conditional frequency for parking time given a journey time of 15 minutes? Is this what you're suggesting for the conditional probabilities? Do you work out the conditional frequency, then the probability, and then the expected value? So if you had 3 journeys you would repeat this process 3 times, calculating the expected journey time and parking time? You suggest making the distances a function of journey times instead of generating distances? Currently the copula function generates a start time in the morning, the number of trips and the total distance travelled during the day. I'm kinda keen to keep the copula function. I'm guessing that if I calculate the individual journey distances as a function of time then they probably won't add up to the total distance generated by the copula. I'm nearly there; I think I'll be confident putting this into practice with a large dataset once I know how to calculate the probabilities and expected values. Thanks again for your time, John
If you have journey times and parking times with their lengths and you convert this data to the binned branch structure, then you will have a complete distribution for any kind of binned journey and parking history that is accurate to that bin size, and you can increase the accuracy by adding more data within each bin. So you will have a tonne of entries with, say, n bins (n columns) and a 1 or a 0 in each cell corresponding to whether that bin was used for parking or journey. Also, before I forget, another point to note is that if the bin size is too large you may have a lot of events going on in the one bin, and you have to think about that possibility. So say you wanted a conditional distribution given that you spend at least k units (i.e. bins) parking. Then instead of using all the records, you just select the records that meet that criterion. Now if you have multiple records, you can form a sub-distribution for this data given the variation in this subset as opposed to the whole data set. So let's say you have a database and you say "I want a conditional distribution that represents the constraint that you must park in at least three bins". You use your database or statistical software to return this data (and this is why something like SQL is useful if you are doing complicated stuff that your statistical software or Excel can't do easily) and you get your constrained data. Then, just like any distribution, you find the unique entries and their frequencies, and this new distribution is your conditional distribution P(X|There are at least three binned parking slots). If you calculate the probabilities using the frequencies in this new subset of data you will get a distribution, but the state-space is now this subset and not the whole thing. 
So basically all the process really is, is starting with a complete set of data that represents the global distribution of data and then getting the right slice of that data and calculating its distribution. Each conditional distribution refers to a particular slice and that distribution becomes its own random variable which has events that are a subset of the universal random variable that describes all of your data. You can then do all the probability stuff, expectations, variances whatever but the only thing that has changed is the state-space: conditional ones always look at a subset of the more general state-space and that's all that is different.
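As a small sketch of "get the right slice, then calculate its distribution", the code below filters hypothetical binned records down to those with at least three parking bins and then uses relative frequencies within that slice as the conditional distribution. It follows the coding from earlier in the thread (1 = journey, 0 = parking); the records themselves are invented.

```python
from collections import Counter


def conditional_distribution(records, condition):
    """Relative frequency of each unique record among those satisfying `condition`."""
    subset = [tuple(r) for r in records if condition(r)]
    if not subset:
        return {}
    total = len(subset)
    return {rec: count / total for rec, count in Counter(subset).items()}


def at_least_three_parking_bins(record):
    """Constraint from the post: the record has three or more parking (0) bins."""
    return sum(1 for b in record if b == 0) >= 3


def expected_journey_bins(dist):
    """Expectation of the number of journey (1) bins under a distribution over records."""
    return sum(prob * sum(rec) for rec, prob in dist.items())


# Hypothetical binned days, 1 = journey, 0 = parking.
records = [
    (1, 0, 0, 0, 1),
    (1, 1, 0, 0, 1),
    (1, 0, 0, 0, 1),
    (0, 0, 0, 1, 1),
    (1, 1, 1, 0, 0),
]

dist = conditional_distribution(records, at_least_three_parking_bins)
for rec, prob in dist.items():
    print(rec, round(prob, 3))
print(expected_journey_bins(dist))
```

The probabilities are divided by the size of the slice, not by the size of the whole data set, which is the change of state-space described above. Multiplying the expected number of bins by the bin width gives an expected time in minutes.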
Thanks again. I'm going to use bin sizes of 5 minutes to get a good level of detail and to ensure that events are not being missed. Could we just do one example, just to be sure? This sample data has a bin size of 15 minutes. https://dl.dropbox.com/u/54057365/All/sampletable.JPG So say we take your example: P(X|There are at least three binned parking slots, i.e. 45 minutes). Then this data would be returned from the database. https://dl.dropbox.com/u/54057365/All/examplesample.JPG Would you mind demonstrating how you would calculate the probability and the expected value using the frequencies? I understand that this is very basic but it's all new to me and it would be really helpful to see an example. One last question: do you think that the time of day needs to be factored into the problem, or is it already factored in, in a way, because ultimately you are working out the parking times and journey times given one or the other? Thanks for your time
So in your example, you have four records. Now get all the different possibilities that exist (i.e. all the unique records) and these become your possibilities (i.e. your events) in your probability space. In this case it seems that P(Bin_n = 1) = 1 for n > 3 (i.e. the 4th bin and after), so this implies P(Bin_n = 0) = 0 for those same bins. You have empty cells and I don't know how you handle them, but you need to handle those. Then to get probabilities you simply take all the unique records and find the frequency of each in comparison to the others (just like you do with a normal set of unconditional data). So you get those unique records and then you start to see where the variation is: the example above shows that for this data set there is no variation for n > 3 bins, but in general you should see lots of variation. Once you have your records you find out where the variation is, and these become the random variables in your joint distribution. To find the frequencies you look at what the variables are and you get the computer to count the frequencies for a particular attribute. So if you want to find P(Attribute1 = 1) then you count the number of times this occurs and divide it by the total number of records in the database, and that's your frequency. It's up to you to decide what an event is and what a random variable corresponds to: it can be an atomic one (can't be divided up any further) or it can be a conglomerate one (made up of things that can be divided), but the way to get probabilities is to fix an event definition, search for how many times it occurs in your data set, and divide by the total number of items in the data set. That is your probability for that event.
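The counting described above can be sketched as below. The four records are invented stand-ins for the slice in the screenshot (which is not reproduced here), chosen so that every bin from the 4th onward is always 1, and the code assumes equal-length records with no empty cells.

```python
def bin_frequencies(records):
    """P(Bin_n = 1) for each bin, estimated as count / number of records."""
    n_records = len(records)
    width = len(records[0])
    return [sum(rec[i] for rec in records) / n_records for i in range(width)]


def varying_bins(records):
    """Zero-based indices of bins that are not constant across the records."""
    return [i for i, p in enumerate(bin_frequencies(records)) if 0.0 < p < 1.0]


# Hypothetical slice of four records; the 4th and 5th bins are always 1.
records = [
    (0, 0, 0, 1, 1),
    (0, 1, 0, 1, 1),
    (1, 0, 0, 1, 1),
    (0, 0, 1, 1, 1),
]

freqs = bin_frequencies(records)
print(freqs)
print(varying_bins(records))
```

Bins with frequency exactly 0 or 1 carry no variation in this slice, so only the remaining bins need to be treated as random variables in the joint distribution.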
Hello Chiro, I was talking to you about a Markov chain model of vehicle velocities a few months ago. I'm making the model but I was wondering if you could comment on something. You have probably forgotten, so just to recap first: I am recording the velocity of a car every second as it makes journeys. I have about 1000 journeys in total. I created a transition probability matrix of the probabilities of transitioning from one velocity to another. I intend to use the model to generate a synthetic velocity-time profile for a car. So here is an example of an actual velocity profile for a car. https://dl.dropbox.com/u/54057365/All/vel1.JPG and this is an example of a velocity profile generated from the Markov chain. https://dl.dropbox.com/u/54057365/All/vel2.JPG Visually you will notice that the actual profile is much smoother than the one generated by the Markov chain. You can see in the actual profile that when a car is accelerating to a high velocity the profile is smooth as the velocity increases, but in the generated cycle the velocity fluctuates as if the car is slowing down and speeding up while accelerating to a high speed. When you compute summary statistics for the profiles, such as the average velocity and the standard deviation of the velocity, they appear similar. But I'm curious about the jagged appearance of the generated profile. Could you offer any insight as to what is happening? Appreciate your comments. John
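For reference, the kind of model described in this post can be sketched as follows: count transitions between consecutive velocity states, normalise each row into probabilities, then sample a synthetic profile step by step. The tiny journeys, the integer velocity states and the function names are all hypothetical; this is a first-order chain, which may or may not match the poster's exact setup.

```python
import random
from collections import defaultdict


def estimate_transitions(sequences):
    """Row-normalised transition probabilities from observed state sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    matrix = {}
    for state, row in counts.items():
        total = sum(row.values())
        matrix[state] = {nxt: c / total for nxt, c in row.items()}
    return matrix


def generate_profile(matrix, start, length, rng=None):
    """Sample a synthetic sequence; stops early at a state with no observed exits."""
    rng = rng or random.Random(0)
    profile = [start]
    while len(profile) < length and profile[-1] in matrix:
        row = matrix[profile[-1]]
        states = list(row)
        weights = [row[s] for s in states]
        profile.append(rng.choices(states, weights=weights, k=1)[0])
    return profile


# Hypothetical journeys, velocity already rounded to integer states.
journeys = [
    [0, 1, 2, 3, 3, 2, 1, 0],
    [0, 1, 1, 2, 3, 2, 2, 1, 0],
]

P = estimate_transitions(journeys)
profile = generate_profile(P, start=0, length=20)
print(profile)
```

A first-order chain like this conditions only on the current velocity, so it has no memory of whether the car was speeding up or slowing down a second earlier. That could be one contributor to the jagged look, and including the previous velocity (or the acceleration) in the state is one way to test that idea.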
With regards to the jaggedness, typically this sort of thing is treated as a time series, and there are all kinds of analyses and techniques for that, including ways to "smooth" the data. The simplest one is known as the moving average: http://en.wikipedia.org/wiki/Moving_average There are, of course, many more, but the MA is a basic one. As for the summary statistics, you might want to specify the constraints under which you want to calculate them (for example specific periods, conditional constraints, etc.). If you don't add constraints, then you just use integration: you can "bin" the values into small intervals with a probability attached to each, and then use numerical integration techniques to get both the mean and the variance of this approximated distribution. Depending on the technique, the numerical scheme can also smooth the values so that you get a kind of interpolation rather than an average or some other approximation. As an example, let's say you bin the velocities into bins of interval size 0.05, and the value for the bin corresponding to [1,1.05) is 0.03. If 0.03 is the density f(x) on that bin, the mid-point rule for xf(x)dx over this interval is [1.05+1]/2 * 0.03 * (1.05-1). If 0.03 is instead the probability of the whole bin, the width is already accounted for and the contribution is just [1.05+1]/2 * 0.03. In general, you could use any quadrature scheme you wish, and depending on the requirements and the data some might be better than others: it just depends on the problem and your needs. There are other techniques to calculate the mean of a general signal or function without any probability information, if you want to pursue this as well, and you can use the same formulation to get the variance of a signal. By that I mean doing it with just the v(t) data, instead of with a PDF of v like you do with the normal xf(x)dx. I can dig up the thread if you want.
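Both ideas in this reply can be sketched in a few lines. The first function is a plain (unweighted) moving average; the second computes the mean and variance of a binned distribution with the mid-point rule, assuming the value attached to each bin is the probability of the whole bin (so no extra width factor is needed). The sample numbers are invented.

```python
def moving_average(values, window):
    """Simple moving average; the result has len(values) - window + 1 points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    smoothed = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]   # slide the window by one
        smoothed.append(running / window)
    return smoothed


def binned_mean_variance(edges, probs):
    """Mid-point estimate of mean and variance from bin edges and bin probabilities."""
    mids = [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]
    mean = sum(m * p for m, p in zip(mids, probs))
    second_moment = sum(m * m * p for m, p in zip(mids, probs))
    return mean, second_moment - mean ** 2


# Hypothetical jagged velocity samples, smoothed with a 3-point window.
smoothed = moving_average([1, 3, 2, 6, 4], 3)
print(smoothed)

# Two hypothetical velocity bins, [1.00, 1.05) and [1.05, 1.10).
mean, variance = binned_mean_variance([1.0, 1.05, 1.10], [0.4, 0.6])
print(mean, variance)
```

A wider window gives a smoother profile but also flattens genuine accelerations, so the window length is something to tune against the measured profiles rather than fix in advance.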
 Thanks very much for your help Chiro. I was thinking of smoothing the data, I will investigate the different methods. I'll try MA and I've used Kernel smoothing before so I'll try that too.
Hi Chiro, I have been working away on the travel pattern model that you helped me with a lot a few weeks ago. I need to bin the data at 5 minute intervals. It's important that the output journey start/stop times have a resolution of 5 minutes, because that matters in electricity grid modelling. A problem that I have been having is that I have no observations in the dataset for a lot of the conditional probabilities. It was suggested to me that a Bayesian probability approach could overcome this. I don't know much about Bayesian probability but have been reading up on it over the last week; however, I have not been able to find an example of what I am trying to do. Say for example I have the starting times of journeys in the morning. I binned the data into 15 minute bins so I have 96 bins in total (4*24=96). So for example a journey start time of 08:05 am would be in bin number 29. As an example here is the data for bin numbers 28-50 (8am until 12.30pm). https://dl.dropbox.com/u/54057365/All/bin.JPG I've calculated the frequency density of the bins in the last column. Would you know how I could do the following (this was suggested to me): taking a Dirichlet prior distribution over the density of each bin for a multinomial model, you estimate the parameters. This way you get a non-zero probability for each bin. Each parameter is basically some prior parameter plus the frequency of the data in that bin. Would you know if this could be done in Excel? Appreciate your comments. Regards John
Bayesian probability is a fancy way of saying that your parameters have a distribution: in other words, your parameters are random variables. That's all it is. It is the most general formulation, and it is useful not just in an abstract way but in a very applied way as well. I don't know how you can do the Dirichlet prior calculations in Excel, but you could always create the function from the definition, using either a VBA routine or a single formula entry in the spreadsheet. The Wikipedia page has the Dirichlet PDF: http://en.wikipedia.org/wiki/Dirichlet_distribution If you write a VBA function or some similar routine to calculate it, then you can calculate probabilities and moments (like the expectation and variance).
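For the specific suggestion in the question (a Dirichlet prior on multinomial bin probabilities), the posterior is again a Dirichlet whose parameters are the prior parameters plus the bin counts, so the smoothed probabilities have a closed form. The sketch below assumes a symmetric prior with the same `alpha` for every bin; the counts and the choice `alpha = 1` are hypothetical, and the function names are illustrative only.

```python
def dirichlet_posterior_mean(counts, alpha=1.0):
    """Posterior mean of each bin probability under a symmetric Dirichlet(alpha) prior."""
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]


def dirichlet_posterior_variance(counts, alpha=1.0):
    """Posterior variance of each bin probability (Dirichlet marginal variance)."""
    params = [c + alpha for c in counts]
    a0 = sum(params)
    return [a * (a0 - a) / (a0 ** 2 * (a0 + 1)) for a in params]


# Hypothetical journey-start counts for five bins; two bins have no observations.
counts = [5, 0, 3, 0, 2]

probs = dirichlet_posterior_mean(counts, alpha=1.0)
print([round(p, 4) for p in probs])
print([round(v, 5) for v in dirichlet_posterior_variance(counts, alpha=1.0)])
```

Because the posterior mean is just (count + alpha) / (total count + alpha * number of bins), the same calculation is a single cell formula in a spreadsheet. Smaller values of `alpha` pull the empty bins closer to zero, so the choice of prior matters most exactly where the data are sparse.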