Can you combine two transition probability matrices? by bradyj7 Tags: combine, matrices, probability, transition

#55
Nov3-12, 01:30 AM

P: 4,570

You should be able to do this, provided the algorithm treats the distribution definitions in the right manner.
Mixed distributions occur quite frequently (particularly in insurance statistics). If you want to model, say, a multivariate distribution where one variable is discrete and the other continuous, then you do the same sort of thing as a normal Cartesian product. As an example, let's say you have a normal distribution and a discrete uniform distribution with five outcomes: the Cartesian product of these sets would look like a "staircase normal", where you would have five normal distributions side by side, each being one slice for the corresponding discrete event.
Provided your algorithm has treated the data correctly, this won't be a problem at all. In fact, if this is done correctly, all later statistical techniques should work properly. You would have to check that the actual algorithm is able to treat the (multivariate) distribution function as it should if it deals with either mixed distributions (continuous and discrete in the same distribution) or distributions where you have a mixture of discrete and continuous random variables in all possible combinations of both.
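As a rough illustration of the "staircase normal" idea, here is a minimal Python sketch. It assumes the discrete variable is uniform on five levels and independent of a standard normal variable; the names `LEVELS`, `joint_density` and `sample_mixed` are made up for this example and do not come from any particular library.

```python
import math
import random
from collections import defaultdict

LEVELS = (1, 2, 3, 4, 5)  # assumed support of the discrete uniform variable

def joint_density(k, x):
    """Density of (K, X) with K uniform on LEVELS and X ~ N(0, 1), independent."""
    if k not in LEVELS:
        return 0.0
    normal_pdf = math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)
    return normal_pdf / len(LEVELS)  # each slice carries 1/5 of the total mass

def sample_mixed(n, seed=0):
    """Draw n pairs (k, x): one discrete level and one continuous value."""
    rng = random.Random(seed)
    return [(rng.choice(LEVELS), rng.gauss(0.0, 1.0)) for _ in range(n)]

# Group the continuous values by discrete level: one normal "slice" per level.
slices = defaultdict(list)
for k, x in sample_mixed(10_000):
    slices[k].append(x)
slice_means = {k: sum(xs) / len(xs) for k, xs in sorted(slices.items())}
```

Each entry of `slices` is one step of the staircase, so any technique applied afterwards has to work on the pair (k, x) rather than on either variable alone.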



#56
Nov5-12, 04:39 AM

P: 122

Hello Chiro,
Can I ask you a quick question? It is a bit unusual. In my previous post I posted a graph of a fitted copula and the raw data for 2 variables, number of journeys and distance travelled. This was it: https://dl.dropbox.com/u/54057365/All/copula.JPG This is the raw data on its own: https://dl.dropbox.com/u/54057365/All/data.JPG
Would you be able to explain what makes the copula generate the values between the red "lines"? One might expect the generated blue data to fall in "lines" as well, like the red raw data. I think I know the answer myself: is it because one of the variables is continuous, and those points between the lines actually do model the correlation structure? That is not a very good explanation though, is it?
Thanks
John



#57
Nov12-12, 04:36 AM

P: 122

Hello Chiro,
I have a problem which I'm finding difficult to solve. Hopefully you could offer some insight. I have a copula function that generates the total distance travelled in a day (e.g. 40 km) and the number of journeys (e.g. 4). The question is how to calculate the distances of the individual journeys.
Originally, I was doing the following:
Sample distance x1 from the distribution of journey distances made on days where the total distance travelled was 40 km: f(x1 | distances on days where total distance travelled = 40)
Then sample distance x2 from f(x2 | distances on days where total distance travelled = 40 - x1)
Then sample x3 from f(x3 | distances on days where total distance travelled = 40 - x1 - x2)
Then x4 = total distance - x1 - x2 - x3
The problem with this is that the journey distances don't "make sense". For example, if you travel 2 km to the shop or to work, chances are that your next journey will be 2 km in order to return home. But this is not always the case; you could stop off on the way. For example, x1 could be 5 km, x2 could be 2 km and then x3 could be 7 km (5 + 2).
Could you suggest a better approach? Would there be a way to look at the relationships between consecutive journey distances, and some way of sampling them? I have all the data.
Appreciate your comments
J
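For reference, the sequential scheme described in this post could be sketched roughly as follows in Python. This is only an illustration under assumptions: the `pools` dictionary is invented data, the "closest day total" lookup stands in for whatever conditioning the real data allows, and the function name `sample_journeys` is hypothetical.

```python
import random

# Hypothetical data: day total (km) -> individual journey distances (km)
# observed on days with that total.
pools = {
    40: [5, 15, 20, 10, 30, 2, 8],
    30: [5, 10, 15, 20, 3],
    20: [2, 5, 10, 8],
    10: [1, 2, 5, 3],
    5: [1, 2, 3],
}

def sample_journeys(total, n_journeys, pools, rng):
    """Draw n_journeys distances that add up to total, one after another."""
    remaining = float(total)
    journeys = []
    for _ in range(n_journeys - 1):
        # Condition on the distance still to be covered: use the pool whose
        # day total is closest to it (an assumption of this sketch).
        key = min(pools, key=lambda t: abs(t - remaining))
        # Only keep distances that leave something for the later journeys.
        candidates = [d for d in pools[key] if d < remaining]
        x = rng.choice(candidates) if candidates else remaining / 2
        journeys.append(float(x))
        remaining -= x
    journeys.append(remaining)  # the last journey takes whatever is left
    return journeys

rng = random.Random(1)
day = sample_journeys(40, 4, pools, rng)
```

The sketch makes the weakness visible: each draw only sees the remaining distance, so nothing ties a journey to the one before it (for example, an outbound trip and the matching return trip).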



#58
Nov12-12, 10:55 PM

P: 4,570

For this you will need to consider what the distribution is for an individual journey, as given by the data.
So you will need to look at conditional expectation with regard to the expected journey for all possible journeys in a single day (you have mentioned four), and this is basically E_Y[E_X[X | Y]] = E[X], which is known as the law of total expectation: http://en.wikipedia.org/wiki/Law_of_total_expectation
So you are trying to find E[X] for all possible conditional information relative to the choice of Y (which is the number of journeys in one day according to your data). The formulas for this are just the formulas for expectation (and if this is data in an Excel spreadsheet, then convert it to a binned PDF and use that formula).
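A small numerical check of the law of total expectation may help here. The sketch below uses invented records of (number of journeys that day, distance of one journey) and estimates P(Y = y) as the share of records with that y; all names and numbers are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical records: (journeys made that day, distance of one journey in km).
records = [
    (2, 10.0), (2, 12.0),
    (3, 8.0), (3, 4.0), (3, 9.0),
    (4, 5.0), (4, 2.0), (4, 7.0), (4, 6.0),
]

# Bin the journey distances by the value of Y.
by_y = defaultdict(list)
for y, x in records:
    by_y[y].append(x)

n = len(records)
p_y = {y: len(xs) / n for y, xs in by_y.items()}                 # P(Y = y)
e_x_given_y = {y: sum(xs) / len(xs) for y, xs in by_y.items()}   # E[X | Y = y]

# E_Y[E_X[X | Y]]: weight each conditional mean by P(Y = y).
e_x_total = sum(p_y[y] * e_x_given_y[y] for y in by_y)
# E[X] computed directly from all records.
e_x_direct = sum(x for _, x in records) / n
```

Both routes give the same mean, while `e_x_given_y` shows how the expected journey distance changes with the number of journeys in the day.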



#59
Jan25-13, 04:12 PM

P: 122

Hello Chiro,
Could I ask you a question? You have been very helpful in the past. I'm trying to compare the overall similarity of journeys based on some statistics, for example average velocity, acceleration, etc. Each journey has, for example, 4 measurable attributes with equal weights. I have some baseline statistics and some comparative statistics from other journeys. The objective is to determine a measure of how similar the other journeys are to the baseline journey.
Would you be able to suggest a suitable measure? Can you take an average of the percentage differences? Could you use the difference vector (journey1 - base) and take the norm of this vector?
Appreciate your comments
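The two candidate measures mentioned in this post can be written down in a few lines of Python. This is a sketch with invented attribute values; dividing the difference by the baseline before taking the norm is an assumption made here so that attributes with different units carry equal weight.

```python
import math

# Hypothetical attribute vectors (4 attributes, same order in both).
base = [50.0, 1.2, 30.0, 0.8]
journey1 = [55.0, 1.2, 27.0, 1.0]

def mean_abs_pct_diff(journey, base):
    """Average of |journey_i - base_i| / |base_i| over all attributes."""
    return sum(abs(j - b) / abs(b) for j, b in zip(journey, base)) / len(base)

def relative_diff_norm(journey, base):
    """Euclidean norm of the difference vector, scaled attribute by attribute."""
    return math.sqrt(sum(((j - b) / b) ** 2 for j, b in zip(journey, base)))

avg_pct = mean_abs_pct_diff(journey1, base)
norm = relative_diff_norm(journey1, base)
```

Both functions assume no baseline attribute is zero. A value of 0 means the journey matches the baseline exactly, and larger values mean less similar.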



#60
Jan25-13, 06:58 PM

P: 4,570

I would recommend a couple of things in this instance.
The first would involve a two-sample t-test, or one of its non-parametric forms, to test whether each pair of parameters (i.e. baseline vs. other journey) shows a statistically significant difference. You should look into techniques like Bonferroni, or other mechanisms used for multiple sets of comparisons: for example, with four pairwise tests each one would use a significance level of alpha/4.
The other thing I would recommend is doing a chi-square test (Pearson's chi-square goodness of fit) on the parameters, by considering each attribute to be a random variable.
I would personally start out by looking at two-sample t-tests and the non-parametric equivalents first. I would also look at ANOVAs (also check the non-parametric versions if you need to) to test whether all groups of journeys have the same parameter as the baseline. So do the ANOVA first and then do the pairwise comparisons after that, while thinking about whether you should apply a Bonferroni correction to the alpha values (i.e. the probability used to reject or accept the hypothesis that they are the same/different).
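To make the alpha/4 rule concrete, here is a short Python sketch of a Bonferroni correction. The p-values and attribute names are invented; in practice they would come from whichever two-sample test was run on each attribute.

```python
ALPHA = 0.05  # overall significance level (assumed)

# Hypothetical p-values, one per attribute (baseline vs. other journey).
p_values = {
    "mean_velocity": 0.003,
    "mean_acceleration": 0.040,
    "max_velocity": 0.012,
    "idle_fraction": 0.300,
}

def bonferroni_reject(p_values, alpha):
    """Reject each hypothesis only if its p-value is at most alpha / m."""
    threshold = alpha / len(p_values)
    return {name: p <= threshold for name, p in p_values.items()}

decisions = bonferroni_reject(p_values, ALPHA)
```

With four tests the per-test threshold is 0.0125, so a p-value of 0.04 that would pass on its own at the 5% level is no longer counted as significant.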



#61
Feb15-13, 03:39 AM

P: 122

Hello Chiro,
Could I ask you a question? You have been very helpful in the past. I am trying to quantify the difference between two discrete distributions. I have been reading online and there seem to be a few different ways, such as a Kolmogorov-Smirnov test and a chi-squared test.
My first question is: which of these is the correct method for comparing the distributions below? The distributions are discrete distributions with 24 bins.
My second question: it is pretty obvious from looking at the distributions that they will be statistically significantly different, but is there a method to quantify how different they are? I'm not sure, but a percentage or a distance perhaps?
I've been told that if you use a two-sample Kolmogorov-Smirnov test, the p-value will be a measure of how different the distributions are. Is that correct? http://www.mathworks.co.uk/help/stats/kstest2.html
I appreciate your help and comments
Kind Regards



#62
Feb15-13, 04:06 AM

P: 4,570

What attribute specifically are you trying to see the difference in?
The chi-square test acts a lot like a 2-norm (think of Pythagoras' theorem) for an n-dimensional vector, in the way that you get an analog of "distance" between two vectors. If you know some kind of attribute (even if it's qualitative, you can find a way to give a quantitative description with further clarification), then you can mould a norm or a test statistic in that manner.



#63
Feb16-13, 06:50 AM

P: 122

Hi,
Well, I developed a model which simulates car journeys. The distribution of the arrival times home in the evening simulated by the model is "different" from the distribution of arrival times home observed in real-world data. The model appears to be not that accurate.
What I ideally would like to say is that the distribution produced by the model is some percentage different from the real-world distribution. Would a chi-squared or Kolmogorov-Smirnov test quantify the difference? What would you recommend in this case? Can these tests be used for discrete data? The times are rounded to the nearest hour.
What would you think of summing up the pointwise absolute values of the differences between the two distributions? Would that be a good idea?
abs(Data_bin1_model - Data_bin1_data) + abs(Data_bin2_model - Data_bin2_data) + ... + abs(Data_bin24_model - Data_bin24_data)
I'd prefer to use a statistical test if a suitable one is available.
Thank you for your help.
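The sum of absolute differences proposed in this post can be turned into a number between 0 and 1 by first converting both sets of bin counts to proportions and then halving the sum; this is the total variation distance. The Python sketch below uses invented 24-bin counts and a made-up function name.

```python
def total_variation(counts_a, counts_b):
    """Half the sum of |p_i - q_i| after normalising both histograms."""
    if len(counts_a) != len(counts_b):
        raise ValueError("histograms must have the same number of bins")
    total_a, total_b = sum(counts_a), sum(counts_b)
    return 0.5 * sum(
        abs(a / total_a - b / total_b) for a, b in zip(counts_a, counts_b)
    )

# Hypothetical arrival-time counts per hour of day (24 bins each).
model_counts = [0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6,
                8, 10, 14, 20, 35, 60, 80, 50, 25, 12, 6, 2]
data_counts = [0, 0, 0, 0, 0, 0, 2, 3, 3, 5, 6, 8,
               10, 12, 18, 30, 55, 70, 60, 35, 15, 8, 3, 0]

tv = total_variation(model_counts, data_counts)
percent_different = 100.0 * tv
```

A value of 0 means the two histograms have identical shape and 1 (100%) means they do not overlap at all. It is a descriptive distance, not a significance test.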



#64
Feb16-13, 08:51 PM

P: 4,570

I think you will want to go with something like a Pearson chi-square goodness-of-fit test, given what you have said above.
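For reference, the Pearson statistic is X^2 = sum over bins of (O_i - E_i)^2 / E_i. The Python sketch below assumes the real-world distribution plays the role of the expected counts and the model output the observed counts; that assignment, and the function name, are assumptions of this example rather than something stated in the thread.

```python
def pearson_chi_square(observed, expected):
    """Pearson X^2 statistic; expected is rescaled to the observed total."""
    if len(observed) != len(expected):
        raise ValueError("both histograms need the same number of bins")
    scale = sum(observed) / sum(expected)
    stat = 0.0
    for o, e in zip(observed, expected):
        e_scaled = e * scale
        if e_scaled <= 0:
            # Bins with zero expected count must be merged with neighbours first.
            raise ValueError("expected count must be positive in every bin")
        stat += (o - e_scaled) ** 2 / e_scaled
    return stat

# Small hypothetical example with 3 bins.
x2 = pearson_chi_square([10, 20, 30], [20, 20, 20])
degrees_of_freedom = 3 - 1  # bins minus one when no parameters are estimated
```

With 24 hourly bins and no estimated parameters the statistic would be compared against a chi-square distribution with 23 degrees of freedom.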




#65
Feb18-13, 05:30 PM

P: 122

Hi,
I'm really struggling with this. Is the p-value from the chi-squared test the percentage difference between the 2 distributions? Why did you choose the chi-squared test over the KS test? Thank you



#66
Feb18-13, 08:07 PM

P: 4,570

It's not a percentage difference but a probability: p-value = P(chi-square > x), where x is the observed value of the test statistic (i.e. the X^2 test statistic).
Basically, the larger the deviation between the two distributions, the larger the test statistic and the smaller the p-value, i.e. the less plausible it is that the two distributions are equal.
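The tail probability P(chi-square > x) can be evaluated without a statistics package, because for integer degrees of freedom it has a finite closed form. The Python sketch below is one way to do that; the function name `chi2_sf` is made up here, and a statistics library would normally be the more robust choice.

```python
import math

def chi2_sf(x, df):
    """P(chi-square with df degrees of freedom > x), for integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2))
    #   + sqrt(2/pi) * exp(-x/2) * sum_{r=1}^{(df-1)/2} x^(r-1/2) / (1*3*...*(2r-1))
    total = 0.0
    term = math.sqrt(x)  # r = 1 term: x^(1/2) / 1
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    tail = math.sqrt(2 / math.pi) * math.exp(-x / 2) * total
    return math.erfc(math.sqrt(x / 2)) + tail

# The p-value shrinks as the test statistic grows (23 df, as for 24 bins).
p_small_deviation = chi2_sf(20.0, 23)
p_large_deviation = chi2_sf(60.0, 23)
```

Because the p-value also depends on the sample size through the statistic, two histograms with the same shape difference give a smaller p-value when more journeys are observed, which is why it cannot be read as a percentage difference.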

