Finding conditional and joint probabilities from a table of data

SUMMARY

This discussion focuses on calculating conditional and joint probabilities from the output of a Markov chain simulation in R. The provided code simulates a two-state Markov chain from a transition matrix and an initial distribution. The quantities asked about include P(X1=1|X0=1) and P(X5=2|X0=1,X2=1). The reply recommends counting matching runs with the 'length' function rather than applying 'mean' to the simulated state values, because averaging the state labels 1 and 2 gives a number above 1, which cannot be a probability. The conversation highlights the importance of correctly interpreting simulation output to derive meaningful probabilities.

PREREQUISITES
  • Understanding of Markov Chains and their properties
  • Familiarity with R programming language and its syntax
  • Knowledge of conditional and joint probability concepts
  • Experience with matrix operations in R
NEXT STEPS
  • Learn about Markov Chain Monte Carlo (MCMC) methods for advanced simulations
  • Explore R's 'dplyr' package for data manipulation and probability calculations
  • Study the 'table' function in R for frequency counts in probability calculations
  • Investigate the 'ggplot2' package for visualizing Markov Chain simulation results
USEFUL FOR

Data scientists, statisticians, and R programmers who are interested in probabilistic modeling and simulation techniques, particularly those working with Markov Chains and their applications in data analysis.

user366312
TL;DR
Finding conditional and joint probabilities from a table of data generated by Markov Chain simulation.
Let,

Code:
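    alpha <- c(1, 1) / 2                                     # initial distribution of X0
    mat <- matrix(c(1 / 2, 0, 1 / 2, 1), nrow = 2, ncol = 2) # filled by column: rows are (1/2, 1/2) and (0, 1)

    # Simulate one path X0, ..., X(n-1) of the chain
    chainSim <- function(alpha, mat, n)
    {
      out <- numeric(n)
      out[1] <- sample(1:2, 1, prob = alpha)                 # draw X0
      for(i in 2:n)
        out[i] <- sample(1:2, 1, prob = mat[out[i - 1], ])   # draw the next state from the current state's row
      out
    }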
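The thread does not show how `sim` was produced. One plausible way, assumed here, is to call `chainSim` once per run with `n = 6` (X0 plus 5 steps) and bind the runs as columns, which `replicate()` does. The seed, the row names and the 20000-run check are illustrative choices, not part of the original. The larger run shows the estimate of P(X1 = 1 | X0 = 1) settling near `mat[1, 1] = 1/2`, which 10 runs are too few to show.

```r
# Setup copied from the post: state 2 is absorbing (row 2 of mat is (0, 1))
alpha <- c(1, 1) / 2
mat <- matrix(c(1 / 2, 0, 1 / 2, 1), nrow = 2, ncol = 2)

chainSim <- function(alpha, mat, n) {
  out <- numeric(n)
  out[1] <- sample(1:2, 1, prob = alpha)                # draw X0
  for (i in 2:n)
    out[i] <- sample(1:2, 1, prob = mat[out[i - 1], ])  # draw next state
  out
}

set.seed(1)  # arbitrary seed, only for reproducibility

# Assumed construction: 10 runs of 6 states each, one run per column
sim <- replicate(10, chainSim(alpha, mat, 6))
rownames(sim) <- paste0("X", 0:5)   # illustrative labels: row k + 1 holds X_k

# Larger run (size chosen arbitrarily) to estimate P(X1 = 1 | X0 = 1) by counting
big <- replicate(20000, chainSim(alpha, mat, 6))
pHat <- length(which(big[1, ] == 1 & big[2, ] == 1)) / length(which(big[1, ] == 1))
```

Because state 2 is absorbing, no column of `sim` should ever go from 2 back to 1, which is a quick sanity check on the simulation.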
Suppose the following is the result of a 5-step Markov chain simulation repeated 10 times (each column is one run; rows 1 to 6 hold X0 to X5):

Code:
> sim
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    2    1    1    2    2    2    1    1    1     2
[2,]    2    1    2    2    2    2    2    1    1     2
[3,]    2    1    2    2    2    2    2    1    2     2
[4,]    2    2    2    2    2    2    2    1    2     2
[5,]    2    2    2    2    2    2    2    2    2     2
[6,]    2    2    2    2    2    2    2    2    2     2

What would be the values of the following?

  1. $P(X_1=1 \mid X_0=1)$
  2. $P(X_2=1 \mid X_0=1)$
  3. $P(X_5=2 \mid X_2=1)$
  4. $P(X_1=1, X_3=1)$
  5. $P(X_5=2 \mid X_0=1, X_2=1)$
  6. $E(X_2)$
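For comparison with any simulated estimates, the population values of these six quantities follow from `alpha` and `mat`. The sketch below uses the Markov property and the fact that state 2 is absorbing, so from state 1 the chain stays in 1 with probability 1/2 at each step and never returns. With only 10 runs, a table like the one above will not reproduce these numbers exactly.

```latex
\begin{aligned}
P &= \begin{pmatrix} 1/2 & 1/2 \\ 0 & 1 \end{pmatrix}, \qquad
\alpha = \left(\tfrac12, \tfrac12\right), \qquad
(P^k)_{11} = \left(\tfrac12\right)^k, \quad (P^k)_{21} = 0 \\
P(X_1=1 \mid X_0=1) &= P_{11} = \tfrac12 \\
P(X_2=1 \mid X_0=1) &= (P^2)_{11} = \tfrac14 \\
P(X_5=2 \mid X_2=1) &= 1 - (P^3)_{11} = 1 - \tfrac18 = \tfrac78 \\
P(X_1=1,\, X_3=1) &= P(X_1=1)\,(P^2)_{11}
  = \left(\alpha_1 P_{11} + \alpha_2 P_{21}\right)\tfrac14 = \tfrac14 \cdot \tfrac14 = \tfrac{1}{16} \\
P(X_5=2 \mid X_0=1,\, X_2=1) &= P(X_5=2 \mid X_2=1) = \tfrac78 \quad \text{(Markov property)} \\
P(X_2=1) &= \alpha_1 (P^2)_{11} + \alpha_2 (P^2)_{21} = \tfrac12 \cdot \tfrac14 + 0 = \tfrac18 \\
E(X_2) &= 1 \cdot P(X_2=1) + 2 \cdot P(X_2=2) = 1 \cdot \tfrac18 + 2 \cdot \tfrac78 = \tfrac{15}{8}
\end{aligned}
```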
I tried them as follows:

  1. mean(sim[2, sim[1, ] == 1] == 1)
  2. mean(sim[3, sim[1, ] == 1] == 1)
  3. mean(sim[6, sim[3, ] == 1] == 2)
  4. mean(sim[4, ] == 1 && sim[2, ]== 1)
  5. ?
  6. c(1,2) * mean(sim[2, ])
What would be the solution of (5)?

Am I correct with the rest?
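A possible way to compute all six from the table, including (5), is sketched below. It assumes, as the attempts above do, that row k + 1 of `sim` holds X_k, and it re-enters the table by hand so the snippet runs on its own. The helper names `state`, `countRuns` and `condProb` are hypothetical. It counts runs with `length(which(...))`. Two changes from the attempts: (4) uses the elementwise `&` instead of `&&`, which is meant for single TRUE/FALSE values (older R versions silently used only the first element of each vector, and R 4.3.0 onward raises an error), and (6) averages the X2 row, which is row 3 rather than row 2. For (5), the two conditions are combined with `&`.

```r
# The 6 x 10 table from the post, entered row by row (rows X0..X5, one column per run)
sim <- matrix(c(
  2, 1, 1, 2, 2, 2, 1, 1, 1, 2,
  2, 1, 2, 2, 2, 2, 2, 1, 1, 2,
  2, 1, 2, 2, 2, 2, 2, 1, 2, 2,
  2, 2, 2, 2, 2, 2, 2, 1, 2, 2,
  2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
  2, 2, 2, 2, 2, 2, 2, 2, 2, 2
), nrow = 6, byrow = TRUE)

# Hypothetical helper: state X_k is stored in row k + 1
state <- function(k) sim[k + 1, ]

# Hypothetical helper: number of runs (columns) where a logical condition holds
countRuns <- function(cond) length(which(cond))

# Empirical P(A | B) = #(A and B) / #B, using elementwise &
condProb <- function(a, b) countRuns(a & b) / countRuns(b)

p1 <- condProb(state(1) == 1, state(0) == 1)                 # P(X1=1 | X0=1)
p2 <- condProb(state(2) == 1, state(0) == 1)                 # P(X2=1 | X0=1)
p3 <- condProb(state(5) == 2, state(2) == 1)                 # P(X5=2 | X2=1)
p4 <- countRuns(state(1) == 1 & state(3) == 1) / ncol(sim)   # P(X1=1, X3=1)
p5 <- condProb(state(5) == 2, state(0) == 1 & state(2) == 1) # P(X5=2 | X0=1, X2=1)
e6 <- mean(state(2))                                         # E(X2): average of the state values
```

On this table the estimates come out to 0.6, 0.4, 1, 0.1, 1 and 1.8. With only 10 runs they are rough: (3) and (5) each rest on just the two runs with X2 = 1. Because state 2 is absorbing, X2 = 1 forces X0 = 1, so the conditioning sets in (3) and (5) are the same runs and the two estimates always agree.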
 
Are you getting meaningful probabilities between 0 and 1 from your code?
I guess I wasn't clear in Post #8 of a similar thread of yours.
I think that your use of 'mean' is wrong here. It will give you the average of a lot of values of 1 and 2, which will be over 1. That cannot be a probability. You must count the number of entries, not their values. You can do that by using the 'length' function, as I showed in Post #8 of the other thread.

PS. I just noticed that this thread is several days old, so my answer here might already be known and understood by the OP.
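The distinction drawn above can be checked on one row of the table. The sketch below uses the X2 row (row 3 of `sim`) typed in by hand; the variable names are illustrative. `mean()` of the raw states averages the labels 1 and 2 and so gives a number above 1, whereas `mean()` of a logical comparison such as `x2 == 1` treats TRUE as 1 and FALSE as 0 and therefore already returns a proportion, matching the count obtained with `length()`.

```r
# X2 row of the table in the question (row 3 of sim)
x2 <- c(2, 1, 2, 2, 2, 2, 2, 1, 2, 2)

meanOfStates <- mean(x2)                             # average of state labels, not a probability
propViaMean  <- mean(x2 == 1)                        # mean of a logical vector is a proportion
propViaCount <- length(which(x2 == 1)) / length(x2)  # count-based version using length()
```

Here `meanOfStates` is 1.8, while both proportions are 0.2.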
 
