# Can you combine two transition probability matrices?

Tags: combine, matrices, probability, transition
 P: 4,542 You should be able to do this, provided the algorithm treats the distribution definitions correctly. Mixed distributions occur quite frequently (particularly in insurance statistics). If you want to model, say, a multivariate distribution where one variable is discrete and the other continuous, you do the same sort of thing as with a normal Cartesian product. As an example, say you have a normal distribution and a discrete uniform distribution with, say, five outcomes: the Cartesian product of these sets would look like a "staircase normal", where you would have five normal distributions side by side, each being one slice for the corresponding discrete event. Provided your algorithm has treated the data correctly, this won't be a problem at all. In fact, if this is done correctly, all later statistical techniques should work properly. You would have to check that the algorithm is able to treat the (multivariate) distribution function as it should if it deals with either mixed distributions (continuous and discrete in the same distribution) or distributions where you have a mixture of discrete and continuous random variables taken in all possible combinations.
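A minimal sketch of the "staircase normal" described above. It assumes five equally likely discrete outcomes and the same standard normal in every slice; both are illustrative choices, and the function name is hypothetical.

```python
import random
from collections import defaultdict


def sample_staircase_normal(n, levels=5, mu=0.0, sigma=1.0, seed=0):
    """Draw n (k, x) pairs: k is discrete uniform on 1..levels, x is normal.

    The two components are drawn independently, so the joint distribution
    is the Cartesian product of the two marginals (an assumption made here).
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        k = rng.randint(1, levels)   # discrete component
        x = rng.gauss(mu, sigma)     # continuous component
        pairs.append((k, x))
    return pairs


# Group the continuous values by discrete outcome: one "slice" per outcome.
slices = defaultdict(list)
for k, x in sample_staircase_normal(5000):
    slices[k].append(x)

for k in sorted(slices):
    values = slices[k]
    print(k, len(values), round(sum(values) / len(values), 3))
```

Each key of `slices` holds one normal slice; plotting the five histograms side by side would give the staircase picture.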
 P: 122 Hello Chiro, Can I ask you a quick question? It is a bit unusual. In my previous post I posted a graph of a fitted copula and raw data for two variables, number of journeys and distance travelled. This was it: https://dl.dropbox.com/u/54057365/All/copula.JPG This is the raw data on its own: https://dl.dropbox.com/u/54057365/All/data.JPG Would you be able to explain what makes the copula generate the values between the red "lines"? One might expect the generated blue data to fall in "lines" too, like the red raw data. I think I know the answer myself: is it because one of the variables is continuous, and those points between the lines actually do model the correlation structure? That is not a very good explanation though, is it? Thanks, John
 P: 122 Hello Chiro, I have a problem which I'm finding difficult to solve. Hopefully you could offer some insight. I have a copula function that generates the total distance travelled in a day (e.g. 40 km) and the number of journeys (e.g. 4). The question is how to calculate the distances of the individual journeys. Originally, I was doing the following: sample distance x1 from the distribution of journey distances made on days where the total distance travelled was 40 km, f(x1 | distances on days where total distance travelled = 40). Then sample distance x2 from f(x2 | distances on days where total distance travelled = 40 - x1). Then sample x3 from f(x3 | distances on days where total distance travelled = 40 - x1 - x2). Then x4 = total distance - x1 - x2 - x3. The problem with this is that the journey distances don't "make sense". For example, if you travel 2 km to the shop or to work, chances are that your next journey will be 2 km in order to return home. But this is not always the case; you could stop off on the way. For example, x1 could be 5 km, x2 could be 2 km and then x3 could be 7 km (5 + 2). Could you suggest a better approach? Would there be a way to look at the relationships between consecutive journey distances, and some way of sampling them? I have all the data. Appreciate your comments, J
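For reference, the sequential scheme described in the post can be sketched roughly as follows. The pool of observed distances is made up, the function name is hypothetical, and two details are assumptions not stated in the post: conditioning on the observed daily total *closest* to the remaining distance, and capping each draw at the remaining distance.

```python
import random


def split_day(total_km, n_journeys, distances_by_day_total, seed=0):
    """Split a daily total into n_journeys distances, one draw at a time."""
    rng = random.Random(seed)
    remaining = float(total_km)
    journeys = []
    for _ in range(n_journeys - 1):
        # Condition on days whose total is closest to what is still left.
        nearest_total = min(distances_by_day_total,
                            key=lambda t: abs(t - remaining))
        draw = rng.choice(distances_by_day_total[nearest_total])
        draw = min(draw, remaining)   # never exceed the remaining distance
        journeys.append(draw)
        remaining -= draw
    journeys.append(remaining)        # last journey takes whatever is left
    return journeys


# Made-up pool: daily total (km) -> journey distances seen on such days.
pool = {
    10: [2.0, 3.0, 5.0],
    20: [4.0, 6.0, 10.0],
    40: [5.0, 8.0, 12.0, 15.0],
}
journeys = split_day(40, 4, pool)
print(journeys, sum(journeys))
```

Each draw depends only on the distance still to be covered, not on the previous journey, which is exactly why the out-and-back pattern mentioned in the post does not appear.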
 P: 4,542 For this you will need to consider what the distribution of an individual journey is, as given by the data. So you will need to look at the conditional expectation of the journey distance over all possible journeys in a single day (you have mentioned four), and this is basically E_y[E_x[X|Y]] = E[X], which is known as the law of total expectation: http://en.wikipedia.org/wiki/Law_of_total_expectation So you are trying to find E[X] from all the conditional information relative to the choice of Y (which is the number of journeys in one day, given your data), and the formulas for this are just the formulas for expectation (if the data is in an Excel spreadsheet, convert it to a binned PDF and use that formula).
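A small numerical check of the law of total expectation on binned data may help here. The records are made up, and Y is taken to be the number of journeys on the day each journey belongs to, which is one reading of the post.

```python
from collections import defaultdict

# Made-up records: (journeys made that day, distance of one journey in km).
records = [
    (2, 5.0), (2, 5.0),
    (3, 4.0), (3, 6.0), (3, 2.0),
    (4, 2.0), (4, 3.0), (4, 2.0), (4, 3.0),
]

# Bin the journey distances X by the value of Y.
by_y = defaultdict(list)
for y, x in records:
    by_y[y].append(x)

n = len(records)
# Inner expectation E[X | Y = y] and the weight P(Y = y), both from the data.
cond_mean = {y: sum(xs) / len(xs) for y, xs in by_y.items()}
p_y = {y: len(xs) / n for y, xs in by_y.items()}

# Outer expectation over Y recovers the plain mean of X.
total_expectation = sum(p_y[y] * cond_mean[y] for y in by_y)
overall_mean = sum(x for _, x in records) / n

print(cond_mean)
print(total_expectation, overall_mean)
```

The conditional means in `cond_mean` are the quantities of interest for each number of journeys; the last line only confirms that weighting them by P(Y = y) gives back E[X].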
 P: 122 Hello Chiro, Could I ask you a question? You have been very helpful in the past. I'm trying to compare the overall similarity of journeys based on some statistics, for example average velocity, acceleration, etc. Each journey has, for example, 4 measurable attributes with equal weights. I have some baseline statistics and some comparative statistics from other journeys. The objective is to determine a measure of how similar the other journeys are to the baseline journey. Would you be able to suggest a suitable measure? Can you take an average of the percentage differences? Could you use the norm of the difference vector, i.e. form (journey1 - base) and take the norm of this vector? Appreciate your comments
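The two candidate measures in the question can be written down directly. This is only a sketch with made-up attribute values; the attribute order and units are assumptions.

```python
import math

# Made-up statistics for four equally weighted attributes, in this order:
# mean velocity (km/h), mean acceleration (m/s^2), max velocity (km/h), stops.
baseline = [10.0, 2.0, 50.0, 4.0]
journey1 = [11.0, 2.0, 45.0, 5.0]

# Measure 1: average of the absolute percentage differences.
pct_diffs = [abs(j - b) / abs(b) * 100.0 for j, b in zip(journey1, baseline)]
mean_pct_diff = sum(pct_diffs) / len(pct_diffs)

# Measure 2: Euclidean norm of the difference vector (journey1 - base).
diff_norm = math.sqrt(sum((j - b) ** 2 for j, b in zip(journey1, baseline)))

print(mean_pct_diff, diff_norm)
```

Note that the raw norm mixes units, so the attribute with the largest numerical scale dominates it, whereas the percentage version is scale-free but undefined when a baseline value is zero.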
 P: 4,542 I would recommend a couple of things in this instance. The first would involve a two-sample t-test, or one of its non-parametric forms, to test whether each pair of parameters (i.e. baseline vs. other journey) shows a statistically significant difference. You should look into techniques like Bonferroni, or other mechanisms used for multiple comparisons: if you run, say, four pairwise tests, the significance level for each would be alpha/4. The other thing I would recommend is doing a chi-square test (Pearson's chi-square goodness-of-fit) on the parameters, considering each attribute as a random variable. I would personally start out by looking at two-sample t-tests and their non-parametric equivalents. I would also look at ANOVAs (also check the non-parametric versions if you need to) to test whether all groups of journeys have the same parameter as the baseline. So do the ANOVA first and then do the pairwise comparisons after that, while thinking about whether you should apply a Bonferroni correction to the alpha values (i.e. the probability threshold used to reject or retain the hypothesis that they are the same).
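A rough sketch of the pairwise comparisons with a Bonferroni-corrected threshold of alpha/4. To stay within the Python standard library, it swaps in a two-sided permutation test on the difference in means as the non-parametric stand-in for the t-test; the samples and attribute names are made up.

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed - 1e-12:   # tolerance for float round-off
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Made-up samples per attribute: (baseline journeys, other journeys).
attributes = {
    "mean_velocity": ([41, 43, 39, 44, 40, 42, 38, 45],
                      [40, 44, 41, 43, 39, 42, 40, 44]),
    "max_velocity": ([82, 85, 80, 88, 84, 83, 81, 86],
                     [84, 83, 86, 81, 85, 82, 87, 80]),
    "mean_acceleration": ([1.1, 1.3, 1.2, 1.0, 1.4, 1.2, 1.1, 1.3],
                          [1.1, 1.3, 1.2, 1.0, 1.4, 1.2, 1.1, 1.3]),
    "duration_min": ([10, 11, 12, 13, 14, 15, 16, 17],
                     [30, 31, 32, 33, 34, 35, 36, 37]),
}

alpha = 0.05
threshold = alpha / len(attributes)   # Bonferroni: alpha / 4

results = {name: permutation_p_value(base, other)
           for name, (base, other) in attributes.items()}
rejected = {name: p < threshold for name, p in results.items()}

for name in attributes:
    print(name, round(results[name], 4), rejected[name])
```

An attribute is flagged as different from the baseline only when its p-value falls below the corrected threshold, which keeps the overall false-positive rate across the four tests at or below alpha.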
 P: 122 Hello Chiro, Could I ask you a question? You have been very helpful in the past. I am trying to quantify the difference between two discrete distributions. I have been reading online and there seem to be a few different ways, such as a Kolmogorov-Smirnov test and a chi-squared test. My first question is: which of these is the correct method for comparing the distributions below? The distributions are discrete distributions with 24 bins. My second question: it is pretty obvious from looking at the distributions that they will be statistically significantly different, but is there a method to quantify how different they are? I'm not sure, but a percentage or a distance perhaps? I've been told that if you use a two-sample Kolmogorov-Smirnov test, the p-value is a measure of how different the distributions are. Is that correct? http://www.mathworks.co.uk/help/stats/kstest2.html I appreciate your help and comments. Kind Regards
 P: 4,542 What attribute specifically are you trying to see the difference in? The chi-square test acts a lot like a 2-norm (think of Pythagoras' theorem) for an n-dimensional vector, in that you get an analog of "distance" between two vectors. If you know some kind of attribute (even if it's qualitative, you can find a way to give a quantitative description with further clarification), then you can mould a norm or a test statistic in that manner.
 P: 122 Hi, Well, I developed a model which simulates car journeys. The distribution of the arrival times home in the evening simulated by the model is "different" from the distribution of arrival times home observed in real-world data. The model appears to be not that accurate. What I would ideally like to say is that the distribution produced by the model is some percentage different from the real-world distribution. Would a chi-squared or Kolmogorov-Smirnov test quantify the difference? What would you recommend in this case? Can these tests be used for discrete data? The times are rounded to the nearest hour. What would you think of summing up the pointwise absolute values of the differences between the two distributions? Would that be a good idea? abs(Data_bin1_model - Data_bin1_data) + abs(Data_bin2_model - Data_bin2_data) + ... + abs(Data_bin24_model - Data_bin24_data) I'd prefer to use a statistical test if a suitable one is available. Thank you for your help.
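The sum of absolute bin differences proposed in the post is easy to compute once both histograms are normalised to proportions. Half of that sum is known as the total variation distance and always lies between 0 (identical) and 1 (no overlap), which gives one way to read it as a percentage. The 24 hourly counts below are made up.

```python
def total_variation_distance(model_counts, data_counts):
    """Half the sum of absolute differences between bin proportions."""
    if len(model_counts) != len(data_counts):
        raise ValueError("both histograms need the same number of bins")
    model_total = sum(model_counts)
    data_total = sum(data_counts)
    abs_sum = sum(abs(m / model_total - d / data_total)
                  for m, d in zip(model_counts, data_counts))
    return 0.5 * abs_sum


# Made-up arrival-time histograms, one bin per hour of the day.
data_counts = [2, 1, 1, 1, 1, 2, 4, 8, 10, 6, 4, 5,
               6, 6, 8, 14, 30, 52, 40, 25, 14, 9, 6, 4]
model_counts = [3, 2, 1, 1, 1, 2, 3, 6, 8, 7, 6, 6,
                7, 8, 10, 18, 28, 40, 36, 28, 18, 11, 7, 5]

distance = total_variation_distance(model_counts, data_counts)
print(f"{100 * distance:.1f}% of the probability mass is placed differently")
```

This is a descriptive distance rather than a statistical test, so it says how far apart the two histograms are but not whether the gap could be due to sampling noise.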
 P: 4,542 I think you will want to go with something like a Pearson Chi-square Goodness-Of-Fit test given what you have said above.
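A minimal sketch of the Pearson chi-square goodness-of-fit statistic for two 24-bin histograms. It assumes the model's simulated counts play the role of the observed counts and the real-world counts define the expected proportions; the counts are made up and the function name is hypothetical.

```python
def chi_square_statistic(observed_counts, reference_counts):
    """Pearson X^2 = sum((O - E)^2 / E), with E scaled to the observed total."""
    if len(observed_counts) != len(reference_counts):
        raise ValueError("both histograms need the same number of bins")
    if any(r <= 0 for r in reference_counts):
        raise ValueError("merge bins so that every expected count is positive")
    n_observed = sum(observed_counts)
    n_reference = sum(reference_counts)
    statistic = 0.0
    for o, r in zip(observed_counts, reference_counts):
        expected = n_observed * r / n_reference
        statistic += (o - expected) ** 2 / expected
    return statistic


# Made-up arrival-time histograms, one bin per hour of the day.
data_counts = [2, 1, 1, 1, 1, 2, 4, 8, 10, 6, 4, 5,
               6, 6, 8, 14, 30, 52, 40, 25, 14, 9, 6, 4]
model_counts = [3, 2, 1, 1, 1, 2, 3, 6, 8, 7, 6, 6,
                7, 8, 10, 18, 28, 40, 36, 28, 18, 11, 7, 5]

x2 = chi_square_statistic(model_counts, data_counts)
degrees_of_freedom = len(data_counts) - 1   # 24 bins -> 23
print(x2, degrees_of_freedom)
```

In practice, bins with very small expected counts are usually merged with their neighbours before the statistic is computed, which would also reduce the degrees of freedom.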
 P: 122 Hi, I'm really struggling with this. Is the p-value from the chi-squared test the percentage difference between the two distributions? Why did you choose the chi-squared test over the KS test? Thank you
 P: 4,542 It's not a percentage difference but a probability corresponding to the observed deviation: p-value = P(chi-square > x), where x is the observed value of the test statistic (i.e. the X^2 statistic) and the chi-square variable has the degrees of freedom of the test. Basically, the larger the deviation between the two distributions, the larger the test statistic and the smaller the p-value, so the less plausible it is that the two distributions are equal.
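To make the relation p-value = P(chi-square > x) concrete, here is a sketch of the chi-square upper-tail probability for integer degrees of freedom, using the standard closed-form series so that only the Python standard library is needed. The function name is hypothetical, and a statistics package would normally be used instead.

```python
import math


def chi2_sf(x, df):
    """P(chi-square variable with df degrees of freedom > x), integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(chi) * sum_r chi^(2r-1) / (1*3*...*(2r-1)),
    # with chi = sqrt(x), phi the standard normal density, r = 1..(df-1)/2.
    chi = math.sqrt(x)
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(chi / math.sqrt(2)) + 2 * phi * total


# With 24 bins the test has 23 degrees of freedom: a larger X^2 statistic
# gives a smaller p-value.
for statistic in (10.0, 23.0, 35.172, 60.0):
    print(statistic, chi2_sf(statistic, 23))
```

The printed values fall as the statistic grows, which is the behaviour described in the post: the p-value is a tail probability under the hypothesis of equal distributions, not a percentage difference.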
