Variance captured in coordinate axis.

SUMMARY

This discussion focuses on measuring the variance captured by specific coordinates in a dataset when comparing two sets of data points, A and B, using principal component analysis (PCA). The user seeks to understand how to quantify the variance of dataset B in relation to the top three principal components derived from dataset A. The challenge arises from the fact that the principal components of A may not align with the eigenvectors of B, complicating the direct application of PCA techniques. The user illustrates this with an example where the first coordinate captures all variance while the second captures none.

PREREQUISITES
  • Understanding of Principal Component Analysis (PCA)
  • Familiarity with eigenvalues and eigenvectors
  • Basic knowledge of variance and its calculation
  • Experience with data representation in multi-dimensional spaces
NEXT STEPS
  • Research methods for measuring variance in datasets, such as the Coefficient of Determination (R²)
  • Explore techniques for projecting data points onto principal components
  • Learn about the implications of dimensionality reduction on data interpretation
  • Investigate alternative dimensionality reduction techniques, such as t-SNE or UMAP
USEFUL FOR

Data scientists, statisticians, and analysts interested in advanced data visualization techniques and variance analysis in multi-dimensional datasets.

simpleton
Hi all,

Note: The text below is the motivation for my question. To jump to the question immediately, please skip to the line that says HI!

I have a set of data points, let's call it A, and I ran principal component analysis to get the top 3 principal components to be able to represent the data points as a 3D plot.

Now, I have another set of data points, let's call it B, and I want to see how B differs from A. To do so, I want to plot B along the top 3 principal components of A. However, this coordinate system may be unfair to B, because most of the variance of B may not be captured in the first 3 principal components of A. Therefore, I want to be able to measure how much of the variance of B is captured in the first 3 principal components of A. Since the principal components of A may not be eigenvectors of the covariance matrix of B, I cannot simply read off the eigenvalue associated with each principal component (the square of the corresponding singular value), as one does in PCA.

Therefore, my question is:

HI! <---- For those who have been reading this post in its entirety, please ignore this

Suppose you are given a matrix M of data points. How do you measure how much variance in the dataset is captured in a particular coordinate of M?

As an example, suppose all my points are of the form (a,1) for different values of a and all a are distinct. Then the first coordinate will capture 100% of the variance while the second coordinate will capture 0% of the variance.
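
A minimal sketch of that example, assuming NumPy and a matrix with one data point per row. The function name `variance_share_per_coordinate` and the sample values of a are made up for illustration. It uses the fact that the total variance of a dataset is the sum of the variances of its coordinates (the trace of the covariance matrix):

```python
import numpy as np

def variance_share_per_coordinate(M):
    """Fraction of the total variance carried by each column of M (rows = points)."""
    M = np.asarray(M, dtype=float)
    per_coord = M.var(axis=0, ddof=1)   # sample variance of each coordinate
    total = per_coord.sum()             # total variance = trace of the covariance matrix
    if total == 0:
        raise ValueError("all points are identical; total variance is zero")
    return per_coord / total

# Points of the form (a, 1) with distinct, arbitrarily chosen values of a
a = np.array([0.5, 1.0, 2.0, 3.5, 7.0])
M = np.column_stack([a, np.ones_like(a)])
shares = variance_share_per_coordinate(M)
```

Here `shares` should come out as 1 for the first coordinate and 0 for the second, matching the 100% / 0% split described above.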
 
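As in your example: consider the coordinates separately and treat each coordinate as its own random variable. The variance of each one, compared with the sum of the variances over all coordinates, tells you how much that coordinate captures.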
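
Applied to the original motivation, one way to read this suggestion is: express B in the coordinates given by the top principal components of A, then take the variance of each new coordinate and divide by the total variance of B. The sketch below assumes NumPy, one data point per row, and PCA done via SVD of the centred data; the helper names and the synthetic data are hypothetical, not from the thread.

```python
import numpy as np

def top_components(A, k=3):
    """Top-k principal directions of A as rows of a (k, d) array."""
    A = np.asarray(A, dtype=float)
    centered = A - A.mean(axis=0)
    # Rows of Vt are orthonormal directions, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return Vt[:k]

def variance_captured(B, components):
    """Fraction of B's total variance lying along each given direction."""
    B = np.asarray(B, dtype=float)
    centered = B - B.mean(axis=0)            # variance is measured about B's own mean
    scores = centered @ components.T         # coordinates of B along each direction
    per_axis = scores.var(axis=0, ddof=1)    # one random variable per coordinate
    total = centered.var(axis=0, ddof=1).sum()
    return per_axis / total

# Synthetic data: A varies mostly in the first axes, B mostly in the last ones
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 2.0, 0.5, 0.1])
B = rng.normal(size=(150, 5)) * np.array([0.1, 0.5, 2.0, 3.0, 5.0])

pcs = top_components(A, k=3)
frac_A = variance_captured(A, pcs)   # one fraction per principal component
frac_B = variance_captured(B, pcs)
```

Because the components are orthonormal, the entries of `frac_B` sum to at most 1, and `frac_B.sum()` is the share of B's variance that survives in the 3D plot. In this synthetic setup that share should be small for B and close to 1 for A, which is the "unfair coordinate system" situation described in the question.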
 
