Can you combine two transition probability matrices?

SUMMARY

This discussion centers on the methodology for combining two transition probability matrices (TPMs) in the context of Markov chains. The primary approach discussed involves calculating the average of the two matrices using the formula 1/2 x (A + B), where A and B represent the individual matrices. However, it is crucial to ensure that both matrices are valid transition matrices, meaning each row must sum to 1. The conversation emphasizes the importance of understanding the underlying data and the conditions under which averaging is appropriate, particularly when the matrices may represent different amounts of data.
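
A minimal sketch of this averaging in Python, assuming both inputs are already valid row-stochastic matrices over the same state space; the weights n_A and n_B are hypothetical observation counts used only to illustrate the "different amounts of data" case:

```python
import numpy as np

def combine_tpms(A, B, n_A=1, n_B=1):
    """Combine two transition probability matrices by a (weighted) average.

    A, B     : square arrays whose rows each sum to 1
    n_A, n_B : hypothetical observation counts behind each matrix, used as
               weights; equal counts reduce to the plain average 1/2*(A + B).
    """
    A, B = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
    # Both inputs must be valid transition matrices (rows summing to 1).
    for name, M in (("A", A), ("B", B)):
        if not np.allclose(M.sum(axis=1), 1.0):
            raise ValueError(f"rows of {name} must each sum to 1")
    w = n_A / (n_A + n_B)
    # A convex combination of row-stochastic matrices is row-stochastic.
    return w * A + (1 - w) * B

A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.7, 0.3], [0.4, 0.6]])
print(combine_tpms(A, B))            # simple average, 1/2*(A + B)
print(combine_tpms(A, B, 300, 100))  # weighted towards the larger sample
```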

PREREQUISITES
  • Understanding of Markov chains and their properties
  • Familiarity with transition probability matrices (TPMs)
  • Knowledge of probability axioms, particularly Kolmogorov's axioms
  • Basic statistical concepts, including empirical probability calculations
NEXT STEPS
  • Study the derivation and application of the weighted average formula for combining multiple transition matrices
  • Learn about the implications of first-order conditional independence in Markov processes
  • Explore advanced topics in stochastic processes and their applications in real-world scenarios
  • Investigate methods for handling missing data in transition probability matrices
USEFUL FOR

Data scientists, statisticians, and researchers involved in modeling stochastic processes, particularly those working with Markov chains and transition probability matrices.

  • #61
Hello Chiro,

Could I ask you a question? You have been very helpful in the past.

I am trying to quantify the difference between two discrete distributions. I have been reading online and there seem to be a few different methods, such as a Kolmogorov-Smirnov test and a chi-squared test.

My first question is which of these is the correct method for comparing the distributions below?

The distributions are discrete distributions with 24 bins.

My second question: it is pretty obvious from looking at the distributions that they will be statistically significantly different, but is there a method to quantify how different they are, perhaps as a percentage or a distance?

I've been told that if you use a two-sample Kolmogorov-Smirnov test, the p-value serves as a measure of how different the distributions are. Is that correct?

http://www.mathworks.co.uk/help/stats/kstest2.html

I appreciate your help and comments

Kind Regards

https://dl.dropbox.com/u/54057365/All/phy.JPG
 
  • #62
What attribute specifically are you trying to see the difference in?

The chi-square test acts a lot like a 2-norm (think of Pythagoras' theorem) for an n-dimensional vector, in the sense that you get an analog of "distance" between two vectors.

If you know some kind of attribute (even if it's qualitative, you can find a way to give it a quantitative description with further clarification), then you can mould a norm or a test statistic to match it.
 
  • #63
Hi,

Well, I developed a model which simulates car journeys. The distribution of evening arrival times home simulated by the model is "different" from the distribution of arrival times observed in real-world data. The model does not appear to be that accurate.

What I would ideally like to say is that the distribution produced by the model is some percentage different from the real-world distribution.

Would a chi-squared or Kolmogorov-Smirnov test quantify the difference?

What would you recommend in this case?

Can these tests be used for discrete data? The times are rounded to the nearest hour.

What would you think of summing up the pointwise absolute values of the differences between the two distributions? Would that be a good idea?

abs(Data_bin1_model - Data_bin1_data) + abs(Data_bin2_model - Data_bin2_data) + ... + abs(Data_bin24_model - Data_bin24_data)
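
A minimal sketch of this pointwise sum in Python, assuming the 24 bins have been normalised to probabilities (the arrays below are placeholders rather than the actual journey data):

```python
import numpy as np

# Placeholder 24-bin distributions, each normalised to sum to 1.
model = np.full(24, 1 / 24)
data = np.random.default_rng(0).dirichlet(np.ones(24))

# Sum of pointwise absolute differences; 0 for identical distributions, at most 2.
l1_distance = np.abs(model - data).sum()

# Half of this sum is the total variation distance, which lies between 0 and 1
# and can be read as the percentage of probability mass on which the two disagree.
tv_distance = 0.5 * l1_distance
print(l1_distance, tv_distance)
```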

I'd prefer to use a statistical test if a suitable one is available.

Thank you for your help.
 
  • #64
I think you will want to go with something like a Pearson Chi-square Goodness-Of-Fit test given what you have said above.
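
A minimal sketch of such a test in Python, assuming the real-world data are observed counts per hourly bin and the model supplies expected counts for the same 24 bins with the same total; the counts below are placeholders:

```python
import numpy as np
from scipy.stats import chisquare

rng = np.random.default_rng(0)

# Placeholder counts for the 24 hourly bins: observed = real-world journeys,
# expected = model prediction scaled to the same total number of journeys.
observed = rng.multinomial(1000, np.full(24, 1 / 24))
expected = np.full(24, 1000 / 24)

# Pearson X^2 statistic and its p-value (chi-square, 24 - 1 = 23 degrees of freedom).
stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(stat, p_value)
```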
 
  • #65
Hi,

I'm really struggling with this. Is the p-value from the chi-squared test the percentage difference between the two distributions? Why did you choose the chi-squared test over the KS test?

Thank you
 
  • #66
It's not a percentage difference but a probability associated with the deviation: p-value = P(chi-square > x), where x is the observed value of the X^2 test statistic.

Basically, the larger the deviation, the larger the test statistic, and so the smaller the p-value and the less plausible it is that the two distributions are equal.
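
A minimal sketch of that relationship, assuming 23 degrees of freedom (24 bins minus 1) and a hypothetical observed statistic x:

```python
from scipy.stats import chi2

df = 23    # 24 bins - 1
x = 35.0   # hypothetical observed value of the X^2 test statistic

# p-value = P(chi-square with df degrees of freedom > x); the survival
# function gives exactly this upper-tail probability.
p_value = chi2.sf(x, df)
print(p_value)  # larger x (bigger deviation) -> smaller p-value
```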
 
