Joint probability of partitioned vectors

Click For Summary
SUMMARY

The discussion centers on the joint probability of partitioned vectors as introduced in Bishop's "Pattern Recognition and Machine Learning." Specifically, the notation p(Xa, Xb) is clarified, establishing that it is equivalent to p(X) since X is defined as the combination of Xa and Xb. The relationship between joint probability and conditional/marginal probabilities is also highlighted, with the equation p(Xa, Xb) = p(X) = p(Xa | Xb) * p(Xb) serving as a key takeaway. The distinction between lowercase "p" for probability density and uppercase "P" for probability is emphasized.

PREREQUISITES
  • Understanding of multivariate Gaussian distributions
  • Familiarity with conditional and marginal probabilities
  • Knowledge of vector partitioning concepts
  • Basic grasp of probability density functions
NEXT STEPS
  • Study the derivation of marginal and conditional probabilities in multivariate distributions
  • Explore the implications of partitioned vectors in machine learning contexts
  • Learn about the properties of probability density functions and their applications
  • Examine examples of joint probability calculations in practical scenarios
USEFUL FOR

Students and professionals in statistics, data science, and machine learning who are looking to deepen their understanding of probability theory, particularly in the context of multivariate distributions and vector partitioning.

scinoob
Messages
17
Reaction score
0
Hi everybody, I apologize if this question is too basic but I did 1 hour of solid Google searching and couldn't find an answer and I'm stuck.

I'm reading Bishop's Pattern Recognition and Machine Learning and in the second chapter he introduces partitioned vectors. Say, if X is a D-dimensional vector, it can be partitioned like:

X = [Xa, Xb] where Xa is the first M components of X and Xb is the remaining D-M components of X.

I have no problem with this simple concept. Later in the same chapter he talks about conditional and marginal multivariate Gaussian distributions and he uses the notation p(Xa, Xb). I'm trying to understand how certain integrals involving this notation are expanded but I'm actually struggling to understand even this expression. It seems to suggest that we're denoting the joint probability of the components of Xa and the components of Xb. But those are just the components of X anyway!

What is the difference between P(Xa, Xb) and P(X)?

It will be more helpful for me if we considered a more concrete example. Say, X = [X1, X2, X3, X4] and Xa = [X1, X2] while Xb = [X3, X4]. Now, the joint probability P(X) would simply be P(X1, X2, X3, X4), right? What is P(Xa, Xb) in this case?

Thanks in advance!
 
Physics news on Phys.org
My guess: in later chapters he discusses Xa and Xb as separate entities.
 
scinoob said:
Hi everybody, I apologize if this question is too basic but I did 1 hour of solid Google searching and couldn't find an answer and I'm stuck.

I'm reading Bishop's Pattern Recognition and Machine Learning and in the second chapter he introduces partitioned vectors. Say, if X is a D-dimensional vector, it can be partitioned like:

X = [Xa, Xb] where Xa is the first M components of X and Xb is the remaining D-M components of X.

I have no problem with this simple concept. Later in the same chapter he talks about conditional and marginal multivariate Gaussian distributions and he uses the notation p(Xa, Xb). I'm trying to understand how certain integrals involving this notation are expanded but I'm actually struggling to understand even this expression. It seems to suggest that we're denoting the joint probability of the components of Xa and the components of Xb. But those are just the components of X anyway!

What is the difference between P(Xa, Xb) and P(X)?

It will be more helpful for me if we considered a more concrete example. Say, X = [X1, X2, X3, X4] and Xa = [X1, X2] while Xb = [X3, X4]. Now, the joint probability P(X) would simply be P(X1, X2, X3, X4), right? What is P(Xa, Xb) in this case?

Thanks in advance!
There is no difference between p(Xa, Xb) and p(X), because X = (Xa, Xb). It starts to get interesting when we introduce marginal and conditional probability densities e.g. p(Xa | Xb) and p(Xb). Obviously, p(Xa, Xb) = p(X) = p(Xa | Xb) . p(Xb)

You get p(Xb) from p(X) by integrating out over Xa.

NB use small "p" for probability density. Use capital "P" for probability.
 

Similar threads

  • · Replies 9 ·
Replies
9
Views
4K
  • · Replies 5 ·
Replies
5
Views
2K
  • · Replies 50 ·
2
Replies
50
Views
5K
  • · Replies 5 ·
Replies
5
Views
4K
  • · Replies 1 ·
Replies
1
Views
3K
  • · Replies 1 ·
Replies
1
Views
7K
  • · Replies 1 ·
Replies
1
Views
2K
  • · Replies 2 ·
Replies
2
Views
1K
  • · Replies 11 ·
Replies
11
Views
4K
  • · Replies 1 ·
Replies
1
Views
2K