## Can you combine two transition probability matrices?

There really is no difference between your way and their way.

Instead of having before and after velocities as you do, they have a before velocity and an acceleration, and the after velocity is implied through the relationship after velocity = before velocity + acceleration * delta_time.

If delta_time is known and constant, you could convert your matrix to theirs very easily using a spreadsheet, and if you used the same kind of resolution as they did, it would be more or less the same kind of system (the resolution they use is very high since they use a massive matrix).

They are not creating any redundancy, and yes, in terms of the information given in the transition matrix, having two velocities is exactly the same as having a fixed delta_time, one velocity, and an acceleration for that delta_time.
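
As a rough illustration of that conversion, here is a minimal Python sketch. It assumes a small velocity grid, a constant `dt`, and a made-up 3x3 matrix; the function name `to_acceleration_form` and all the numbers are hypothetical and are not taken from either matrix discussed in this thread.

```python
def to_acceleration_form(velocities, transition, dt):
    """Re-index P(v_after | v_before) as P(acceleration | v_before).

    Assumes a constant time step dt, so each (v_before, v_after) pair
    maps to exactly one acceleration (v_after - v_before) / dt.
    """
    result = {}
    for i, v_before in enumerate(velocities):
        row = {}
        for j, v_after in enumerate(velocities):
            accel = (v_after - v_before) / dt
            # Same probability, just filed under an acceleration label.
            row[accel] = row.get(accel, 0.0) + transition[i][j]
        result[v_before] = row
    return result


# Hypothetical velocity grid and transition matrix (each row sums to 1).
velocities = [0.0, 10.0, 20.0]
transition = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
accel_form = to_acceleration_form(velocities, transition, dt=2.0)
```

The probabilities are only relabelled, so each row still sums to 1. That is the sense in which the two representations carry the same information.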

In terms of benefit, that is a good question.

I can't really answer this to be honest. I don't really know anything about the domain of these kinds of experiments and what people are trying to do.

Since you are doing the experiment, it would help if you think about what these kinds of experiments are trying to achieve.

For example, let's say we wanted to do these to get an idea of fuel use in a car. Fuel use might be directly correlated with acceleration: if you accelerate a lot, you tend to use more fuel than if you just coast at a certain velocity. In this case, having things in terms of acceleration might help you build an inferential statement to test.

The above is just a guess and I really don't know anything about cars in any major detail, but I think the statement should give you an idea that could help you come up with your own answer.

Hello Chiro,

You helped me out a lot with a probability question months ago. I was wondering if I could ask you another question?

I'm trying to work out a conditional probability. I have hundreds of measurements of two variables: (1) Start Time and (2) Journey Time. I've created a frequency table that shows the number of occurrences of each variable pair in my data set. For example, there were 5 journeys that started at 8am and lasted 5 minutes.

How would I work out the probable journey time given a start time?

P(JT | ST) = P(JT ∩ ST)/P(ST)

For example, for 9am:

P(JT | 9am) = ?

Thanks for your help

John


For P(JT and ST), you simply divide the appropriate cell by the total count (for this data it is 1528, shown in the bottom right-hand corner of the table).

To calculate P(ST), you take the row total for that start time (the sum on the right-hand side) and divide it by the total count (again 1528 for this example).

To generate the actual distribution, you do this for all cells of JT and ST. You don't have to use individual cells: you can group cells together. If you wish to do this, you need to make sure you include all the information and that you don't use overlapping cells. If you do use overlapping cells, your events will no longer be mutually exclusive, which will distort your distribution (and you could end up with invalid probabilities, such as a total greater than 1).

So once you have picked your divisions for JT and ST, calculate P(JT=y|ST=x) = P(JT=y,ST=x)/P(ST=x) and then store this result somewhere.

For your example, the y value can be any JT value, but in practice we want JT to be constrained. If you don't constrain JT to some specific y or set of values, you'll get a probability of 1: allowing all possible values of JT means using the whole sample space U, so P(JT in U|ST=x) = P(U,ST=x)/P(ST=x) = P(ST=x)/P(ST=x) = 1.

For now let's assume y corresponds to JT < 4. We calculate this as:

P(JT<4|ST=9am) = P(JT<4,ST=9am)/P(ST=9am). Now P(JT<4,ST=9am) = (5+18)/1528 and P(ST=9am) = 121/1528. This implies P(JT<4|ST=9am) = 23/121.

For the P(X,Y) part (i.e. P(X AND Y)), you simply add up all the corresponding frequency cells for these events and divide by the total. The reason is that X AND Y corresponds to both X and Y being satisfied. For P(X=x), you divide the appropriate marginal frequency count by the total frequency count.
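
The same arithmetic can be sketched in a few lines of Python. Only the counts quoted in the calculation above are used (5 + 18 joint cells, a 9am row total of 121, and a grand total of 1528); the helper name `conditional_probability` is just illustrative.

```python
from fractions import Fraction


def conditional_probability(joint_count, condition_count, total):
    """P(A | B) = P(A and B) / P(B), with each term built from raw counts."""
    p_joint = Fraction(joint_count, total)          # P(A and B)
    p_condition = Fraction(condition_count, total)  # P(B)
    return p_joint / p_condition


total = 1528                 # grand total of the frequency table
count_9am = 121              # row total for ST = 9am
count_jt_lt4_9am = 5 + 18    # cells with JT < 4 in the 9am row

p = conditional_probability(count_jt_lt4_9am, count_9am, total)
print(p)  # 23/121
```

Note that the grand total cancels, so in practice you can divide the cell counts by the row total directly.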

Hello Chiro,

Thank you for your reply and for taking the time to explain this to me. I'm just trying to understand the theory. Is this type of problem a conditional expectation problem, as described here: http://en.wikipedia.org/wiki/Conditional_expectation

Can I expand the problem a little? Say I wanted to calculate the expected journey time given journey start time and distance travelled, P(JT | ST, D).

I've expanded the frequency table with actual measurements. For example, there were 25 journeys that started at 8am, were 3 miles long, and had a journey time of 5 minutes. The journey times are rounded to the nearest 5 minutes.

To calculate the expected journey time given that the start time is 8am and the distance is 3 miles, would the calculation be (taking the middle point in each gap to minimize the error):

(5/2*3 + (5 + 5/2)*25 + (10 + 5/2)*33 + (15 + 5/2)*7 + (20 + 5/2)*2) / (3 + 25 + 33 + 7 + 2)

This appears to give a reasonable answer, but I'm not sure if it is correct.

I understand most of your explanation, but would you mind doing a sample calculation using the method that you have described? The total number of observations is 17,711.

Thanks very much

John

The problem you posed in your earlier post (the start time and journey time table) is simply a calculation of conditional probability. In other words, to figure out, say, P(A), we use (number of times A happens)/(total number of observations).

Conditional expectation has the same interpretation and definition as ordinary expectation, except that you are dealing with a conditional probability distribution rather than an unconditional one. It is still an ordinary probability distribution, but conditioning restricts the probability to a subset of the probability space rather than the whole space.

To understand things consider P(A|U) = P(A and U)/P(U) = P(A)/1 = P(A). If instead of U we used some small subset, we would be considering A with respect to a small subset B instead of U. In other words, we are considering an event with respect to an existing subset and you can draw Venn diagrams to appreciate this in more detail.

For the conditional expectation, you will have to set ST and D to specific values. They can be sets of values (for example ST < 4), but they have to be specific.

You use the method from my last response to calculate the distribution.

Once you have this distribution, use the usual expectation methods. You will have a probability for every value of JT, and just like ordinary expectation, you calculate the sum over t of j(t)p(t), where j(t) is the JT value at t and p(t) is the conditional probability of that value, calculated as above.

Think of it this way: you have P(JT=x|ST,D), and you are adding P(JT=1|ST,D)*1 + P(JT=2|ST,D)*2 + ... and so on. This will give you a conditional expectation for JT given the ST and D restrictions.
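
Here is a minimal Python sketch of that procedure. It uses the five counts John quoted for ST = 8am and D = 3 miles (3, 25, 33, 7, 2) and his choice of bin midpoints as the JT values. Whether the midpoints are the right representative values is his assumption, not something the method fixes.

```python
# Counts for ST = 8am, D = 3 miles, keyed by the assumed JT midpoint (minutes).
counts = {2.5: 3, 7.5: 25, 12.5: 33, 17.5: 7, 22.5: 2}

# Conditional distribution P(JT = jt | ST = 8am, D = 3 miles):
# each cell divided by the total count for this (ST, D) combination.
n = sum(counts.values())
conditional_dist = {jt: c / n for jt, c in counts.items()}

# Conditional expectation: sum of value * conditional probability.
expected_jt = sum(jt * p for jt, p in conditional_dist.items())
print(round(expected_jt, 2))
```

This gives the same number as the single weighted-average fraction in John's post, because dividing every count by the same total and then summing is the same as summing and then dividing.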

Hey bradyj7.

I'm just wondering if you could write down your question in mathematical form. It seems like you want to find some kind of expectation or sum of expectations.

Also you might want to look at this result: http://en.wikipedia.org/wiki/Law_of_total_expectation

I gotta take off now, but I'll take a closer look later and wait for your response.

Hi Chiro,

Thank you for taking the time to look at the problem. I'm not sure exactly how to express the question in mathematical form, but you are right in saying that it is a sum of expectations problem. If you need me to provide more detail on any part of the simulation, I can; I just didn't want to clog up the thread with unnecessary information.

As I described, the simulation begins by generating the 4 random variables (modelled with the copula function), as demonstrated in part 1 of the example: https://dl.dropbox.com/u/54057365/All/pic%20sim.JPG

The key thing is that this part generates the "Departure time from home in the morning" and the "Arrival time home in the evening". It also generates the number of journeys during the day and the total distance travelled during the day.

In part 2, the distances of the individual journeys are generated from distributions. I believe the method is called "stick-breaking construction" in probability theory.

Now the hard part. The simulation constructs the start time and end time of the journeys during the day. It does this using the initial simulated "departure time from home" value and two expectation tables that I have created using the collected data:

1. The expected journey time given journey start time and distance: E(JT | ST, D)
2. The expected parking time given journey end time and the time already parked during the day: E(PT | stop time, time already parked during the day)

So taking the example that I prepared, I presume the formula looks like this. Given an 8am departure time and 3 journey distances (12, 10, 18 miles):

8am + E(JT | 8am, 12 miles) + E(PT | 8.30am, 0) + E(JT | 11am, 10 miles) + E(PT | 11.30am, 2.5 hours) + E(JT | 6pm, 18 miles) = ?

So to recap, my question is: theoretically, should the sum of the expectations above and the "Departure time in the morning" equal the generated "Arrival time home in the evening" (i.e. 7pm)? The problem that I am having is that they do not equal, but I'm not sure if they should.

I appreciate your advice.

Thanks

John

It might be helpful if you can tell me the criteria that you ultimately want to find. I'm assuming it has to do with the parking time since you want it to be less than 24 hours, but maybe you could just outline the criteria for parking (between certain hours, constraints on a parking "session", etc).

Typically, if you have a specific distribution (even if it's a complicated multi-variable joint distribution) and you want to calculate, say, a probability or expectation under a constraint, the first thing is to outline what the constraint is in the simplest terms and then work towards the definition rather than the reverse.

So I recommend you just outline what your end goal is for your current problem, so we can work backwards from there to the mathematical constraints.

This sounds like a good idea.

Basically my end goal is to develop a Monte Carlo simulation approach for modelling the power demands of electric cars. Obviously the times during the day at which the cars will be plugged in are when they are parked. The whole purpose of the simulation is to be able to simulate the travel patterns of cars in order to determine when they are parked and likely to charge.

The reason why it is a Monte Carlo simulation is that vehicle travel behaviour is stochastic in nature, and I wanted a tool to be able to generate stochastic travel patterns so the realistic demands on the grid can be determined.

The reason why I used the copula function is that the 4 variables "Departure time from home", "Arrival time home", the number of journeys, and the total distance travelled during the day are correlated. Here is the correlation matrix:

[Correlation matrix image]

Departure time and arrival time refer to the departure time from and arrival time at the home. As an example, Depart time on day 1 is negatively correlated with Distance Day 1 (-0.38); this implies that the later a person leaves home, the shorter the distance travelled.

I am happy with the theory up to this point.

Basically now I'm trying to work out a way to determine the start times and end times of journeys during the day and the parking times between the journeys. So at this point I have the following:

1. The start time of the first journey
2. The number of journeys
3. The total distance travelled

So next it simulates journey distances as described in the previous post.

I thought that creating expectation tables for journey times and parking times, using the large number of actual observations in the dataset, would be a logical approach to "piecing together" the movements during the day. The expected journey times are reasonable given the journey start time and distance to travel.

It is difficult to determine the factors that influence the parking time of a car. So I thought the end time of a journey and the time already parked during the day were reasonable. For example, if the journey end time is at night, then one would assume that the parking time will be long. The reason why I am using the "time already parked during the day" is that this prevents the parking time going over 24 hours. The expected parking times become very large (>24 hours) if you do not use this constraint.

So that is it. I'm open to any changes in the structure that you could suggest.

One option that I was thinking of was, at the beginning of the simulation, not generating an "Arrival time home". Instead, take the arrival time home as being the sum of the expected journey times and parking times during the day. Can you do this?

Another option would be to scrap the expected parking time table and just generate random parking times depending on certain inputs?

I have a very large database of travel patterns, so I have the capability of generating all sorts of distributions or expectations etc.

Thanks for your time

John

I will have a closer look later on, but one thing you definitely want to think about is whether journey times affect subsequent parking times and whether parking times affect subsequent journey times.

We assume that time of day affects parking times and journey times (and you have data for it). If the actual journey times can be considered independent of the parking times, then you can add up independent random variables corresponding to the different parking and journey times and look at the properties of that total random variable (including its expectation).

However, if they are not independent, you will need a more complex Markov-style model in which the next random variable (if you were travelling, you now park, and if you were parked, you are now travelling) depends on the state of the previous one. With the right Markov model you can look at a distribution not only for a given number of parking and journey events, but also for a restricted period of time (so at some point you have an event where the parking time, the journey time, or both are 0).

In the independent case, you find the expectation of the total by summing the individual expectations, but if there are strong dependencies, this will not work.
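
To make the Markov-style idea a little more concrete, here is a rough Python sketch of a day that alternates between "journey" and "parked" states, where each duration is drawn conditionally on the current clock time and the time already parked. The two samplers are purely hypothetical placeholders (a uniform draw around an assumed 30 mph, and a uniform parking time capped by the hours left in the day). In a real model they would be replaced by conditional distributions estimated from the travel data.

```python
import random


def sample_journey_time(start_time, distance, rng):
    # Hypothetical placeholder: about 30 mph with +/-20% noise, in hours.
    # start_time is unused here; a fitted model would condition on it.
    return (distance / 30.0) * rng.uniform(0.8, 1.2)


def sample_parking_time(end_time, parked_so_far, rng):
    # Hypothetical placeholder: depends on the current state (clock time
    # and time already parked) and never runs past the end of the day.
    limit = max(0.0, min(24.0 - end_time, 24.0 - parked_so_far))
    return 0.5 * rng.uniform(0.0, limit)


def simulate_day(depart_time, distances, rng):
    """Alternate journey and parking events; return (events, arrival_time)."""
    clock = depart_time
    parked_so_far = 0.0
    events = []
    for k, distance in enumerate(distances):
        jt = sample_journey_time(clock, distance, rng)
        events.append(("journey", clock, jt))
        clock += jt
        if k < len(distances) - 1:  # no parking event after the last journey
            pt = sample_parking_time(clock, parked_so_far, rng)
            events.append(("parked", clock, pt))
            clock += pt
            parked_so_far += pt
    return events, clock


rng = random.Random(42)
events, arrival_time = simulate_day(8.0, [12, 10, 18], rng)
```

In a structure like this, the arrival time home falls out of the simulated chain rather than being generated separately, and repeating the simulation many times gives a distribution for it rather than a single expected value.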

Hi Chiro,

Thank you for your advice, I had not thought of the problem like that. I honestly don't know whether they should be treated as independent or dependent events. One would assume that they are dependent. What is your opinion?

So are you saying that if we assume that they are independent events, then the current sum of expectations method is okay provided that they are not strongly correlated? And the arrival time home calculated using the sum of expectations method does not theoretically have to equal the arrival time home that was generated using the copula function?

This might sound like a stupid question, but to investigate whether they are correlated, would I just calculate the Pearson correlation between the journey times and subsequent parking times?
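
For reference, a Pearson correlation between paired journey times and subsequent parking times could be computed along these lines. This is only a sketch: the function name `pearson` and the example numbers are made up and are not values from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical paired observations: journey time (minutes) and the
# parking time (minutes) that immediately followed that journey.
journey_times = [5, 10, 15, 20, 30, 45]
parking_times = [20, 35, 30, 60, 240, 480]

r = pearson(journey_times, parking_times)
```

Keep in mind that Pearson correlation only measures linear association, so a value near zero does not by itself show that the two quantities are independent.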

I don't quite follow this bit:

"then basically what you can do is add up independent variables corresponding to different parking and journey times and then look at the properties that total random variable (where you can look at expectation)."

Have I not done this by creating the journey time and parking time expectation tables?

With regards to the second option, I believe that they are actually dependent events and would be extremely interested in learning more about this method, if you would care to explain it in more detail. Personally I think that this is a better option. I would like it to be as realistic as possible.

I only know the basics of Markov chains, but I'm sure I could follow a complex model.

Just in relation to the expected journey time and parking time tables, I think that the expected journey times are quite reasonable given the journey start time and distance (obviously journey time is related to distance). The expected parking times are not very reasonable.

I have the start times, end times, dates, distances, and parking times of many, many journeys, so I can generate pretty much any distribution.

Look forward to hearing your suggestion.