Variance captured by a coordinate axis

The discussion focuses on measuring how much of a dataset's variance is captured by specific coordinates, particularly when comparing two sets of data points, A and B. After applying principal component analysis (PCA) to A, the user wants to determine how well the first three principal components of A represent the variance in B. The difficulty is that the principal components of A may not align with the variance structure of B. A reply suggests evaluating the variance captured by each coordinate by treating the coordinates as separate random variables. Understanding this relationship is important for correctly interpreting the differences between datasets A and B.
simpleton
Hi all,

Note: The text below is the motivation for my question. To jump to the question immediately, please skip to the line that says HI!

I have a set of data points, let's call it A. I ran principal component analysis (PCA) on it and kept the top 3 principal components so that I could plot the data points in 3D.

Now, I have another set of data points, let's call it B, and I want to see how B differs from A. To do so, I want to plot B along the top 3 principal components of A. However, this coordinate system may be unfair to B, because most of the variance of B may not be captured by the first 3 principal components of A. Therefore, I want to measure how much of the variance of B is captured by the first 3 principal components of A. Since the principal components of A need not be eigenvectors of B's covariance matrix, I cannot simply read off the corresponding eigenvalues (equivalently, the squared singular values of the centered data matrix), as one does in PCA.
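A minimal sketch of one way to get that number, assuming NumPy and that the rows of A and B are data points with the same coordinates. The function name `variance_captured_by_pcs_of` is hypothetical. Instead of eigenvalues of B, it projects the centered B onto A's top-k principal directions (which are orthonormal) and compares the variance of the projection with B's total variance.

```python
import numpy as np

def variance_captured_by_pcs_of(A, B, k=3):
    """Fraction of B's total variance lying in the span of A's top-k principal directions."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Principal directions of A: right singular vectors of the centered data matrix,
    # ordered by decreasing singular value.
    A_centered = A - A.mean(axis=0)
    _, _, Vt = np.linalg.svd(A_centered, full_matrices=False)
    W = Vt[:k].T  # columns are orthonormal directions, shape (d, k)
    # Center B on its own mean, then express it in A's principal coordinates.
    B_centered = B - B.mean(axis=0)
    projected = B_centered @ W  # shape (n_B, k)
    # Because the columns of W are orthonormal, the variances of the projected
    # coordinates add up to B's variance inside their span.
    captured = projected.var(axis=0, ddof=1).sum()
    total = B_centered.var(axis=0, ddof=1).sum()  # trace of B's covariance matrix
    return captured / total

# Hypothetical usage with synthetic data (no real dataset from the thread).
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 10))
B = rng.normal(size=(150, 10))
print(f"{variance_captured_by_pcs_of(A, B):.2%} of B's variance lies in A's top-3 PC space")
```

A value near 1 suggests the 3D plot in A's coordinates shows almost all of B's spread; a value near 0 suggests most of it is hidden. Since the numerator and denominator use the same `ddof`, the ratio does not depend on that choice.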

Therefore, my question is:

HI! (If you have been reading this post in its entirety, please ignore this marker.)

Suppose you are given a matrix M of data points. How do you measure how much of the dataset's variance is captured by a particular coordinate of M?

As an example, suppose all my points have the form (a, 1), where the values of a are all distinct. Then the first coordinate captures 100% of the variance while the second coordinate captures 0% of the variance.
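To make the example concrete, here is a small sketch using only the standard library. It assumes a coordinate's share is its variance divided by the sum of all coordinate variances; the function name `variance_share_per_coordinate` and the sample values of a are hypothetical.

```python
from statistics import pvariance

def variance_share_per_coordinate(points):
    """Return, for each coordinate, its share of the total variance of the points."""
    columns = list(zip(*points))  # one tuple of values per coordinate
    variances = [pvariance(col) for col in columns]
    total = sum(variances)
    if total == 0:
        raise ValueError("all points are identical; total variance is zero")
    return [v / total for v in variances]

# The example from the post: points of the form (a, 1) with distinct a.
points = [(a, 1) for a in (0.5, 2.0, 3.5, 7.0)]
print(variance_share_per_coordinate(points))  # [1.0, 0.0]
```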
 
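As in your example: consider the coordinates separately and treat each one as a random variable. The share of the variance captured by a coordinate is then its variance divided by the total variance, i.e. the sum of the variances of all coordinates.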
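Written out, the suggestion amounts to the following, assuming the share is measured relative to the total variance (the trace of the covariance matrix). The symbols $\Sigma$, $r_j$, and $u_i$ are introduced here for illustration only.

```latex
% \Sigma: covariance matrix of the data (rows of M are points, d coordinates)
r_j = \frac{\operatorname{Var}(x_j)}{\sum_{k=1}^{d} \operatorname{Var}(x_k)}
    = \frac{\Sigma_{jj}}{\operatorname{tr}\Sigma},
\qquad \sum_{j=1}^{d} r_j = 1.

% Same idea in rotated coordinates: for orthonormal directions u_1, \dots, u_k,
% \operatorname{Var}(u_i^\top x) = u_i^\top \Sigma u_i, so the share of B's variance
% in their span is
r(u_1, \dots, u_k) = \frac{\sum_{i=1}^{k} u_i^\top \Sigma_B \, u_i}{\operatorname{tr}\Sigma_B}.
```

For the points $(a, 1)$, $\Sigma_{22} = 0$, so $r_1 = 1$ and $r_2 = 0$. Taking $u_1, u_2, u_3$ to be the top principal components of A and $\Sigma_B$ the covariance of B gives the quantity asked about in the motivation; if the $u_i$ happen to be eigenvectors of $\Sigma_B$, each term $u_i^\top \Sigma_B u_i$ reduces to the corresponding eigenvalue, which is the usual PCA calculation.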