Is Calculating Variance Reduction in PCA Accurate?

Discussion Overview

The discussion revolves around the calculation and interpretation of variance reduction in Principal Component Analysis (PCA), specifically addressing the use of correlation versus covariance matrices, the implications of equal eigenvalues, and the interpretation of component correlations and loadings.

Discussion Character

  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant suggests that PCA should be calculated from the covariance matrix rather than the correlation matrix.
  • Another participant counters that PCA can be performed on the correlation matrix, particularly when variables are measured in different units, and emphasizes the need to interpret results accordingly.
  • Concerns are raised about the interpretation of equal eigenvalues for the 4th and 5th components, with one participant suggesting this may indicate high correlation or potential data issues.
  • A participant notes that principal components (PCs) are not correlated with each other by definition, although they may appear correlated after rotation of loadings.
  • Questions are posed regarding the calculation of component correlations for PC1 and the meaning of component loading measures.

Areas of Agreement / Disagreement

Participants express disagreement regarding the use of correlation versus covariance matrices for PCA, with no consensus reached on the best approach. Additionally, there is uncertainty about the implications of equal eigenvalues and the nature of correlations between components.

Contextual Notes

Some assumptions about data characteristics and the interpretation of PCA results remain unresolved, particularly regarding the implications of using different matrices and the significance of equal eigenvalues.

Philip Wong
Hi guys, there are several things about PCA (principal component analysis) that I hope someone can go over with me, correcting me where I'm wrong.

Say I've done a PCA on a correlation matrix and the eigenvalues are: 2.37, 1.18, 0.58, 0.28, 0.28.

1) If I then make a reduced-space plot using the first 2 PCs, is this how I calculate the proportion of total variance being thrown away?

The sum of all eigenvalues is: 4.59
The proportion of variance kept by the first 2 PCs is: (2.37 + 1.18)/4.59 = 0.7734. So about 23% (0.23) of the variance is being thrown away?


2) Going back to the eigenvalues above, should I pay more attention to interpreting the 4th and 5th components (both = 0.28)? My reasoning is that the closer the eigenvalues of any pair of components, the more the components are correlated (i.e. the higher their covariance). Since the 4th and 5th components have the same eigenvalue, that would mean they are highly correlated. It seems rather unusual to have equal eigenvalues, so I might want to go back and look at my original data, since I might have a type 1 error for the 4th and 5th components (i.e. they might be the same sample entered twice).
Is my interpretation correct?

3) Let's say everything was correct (i.e. the 4th and 5th components really are separate data sets that give the same eigenvalue). How do I calculate the component correlations for PC1?

Do I use the following formula: (eigenvalue for PC1)/(n-1), where n-1 is the degrees of freedom? That is, 2.37/(5-1) = 2.37/4 = 0.5925. A value of 0.5925 is relatively high for a correlation (which only goes up to 1), so the components for PC1 are relatively correlated.

4) Lastly, what do component loadings measure?


I might have several more questions relating to PCA and PCO that I'll add later, but for now can somebody please go over the questions above with me?

thanks!
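
For question 1, the usual bookkeeping is that each eigenvalue is the variance carried by its component, so the share kept by the first k PCs is the sum of the first k eigenvalues divided by the sum of all of them, and the share thrown away is one minus that. The sketch below is only an illustration using the eigenvalues quoted in the post; the function name `variance_split` is made up for this example, and it assumes the eigenvalues are listed in descending order. Note that the five values as listed add up to 4.69 rather than 4.59, which gives roughly 76% kept and 24% discarded for the first two PCs.

```python
# Eigenvalues as quoted in the post (PCA on a correlation matrix),
# assumed to be sorted from largest to smallest.
eigenvalues = [2.37, 1.18, 0.58, 0.28, 0.28]


def variance_split(eigs, n_keep):
    """Return (retained, discarded) shares of total variance.

    `variance_split` is a hypothetical helper for this illustration:
    retained  = sum of the first n_keep eigenvalues / sum of all eigenvalues
    discarded = 1 - retained
    """
    total = sum(eigs)
    retained = sum(eigs[:n_keep]) / total
    return retained, 1.0 - retained


total = sum(eigenvalues)
retained, discarded = variance_split(eigenvalues, 2)

print(f"sum of eigenvalues = {total:.2f}")
print(f"retained by first 2 PCs = {retained:.4f}")
print(f"discarded = {discarded:.4f}")
```

For a correlation matrix of 5 variables the eigenvalues would normally sum to exactly 5 (the trace), so the quoted values are presumably rounded or incomplete; the ratio is computed the same way either way.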
 

For starters, you need to calculate the PCA from the covariance matrix, not the correlation matrix.
 


You are wrong. You can indeed calculate the principal components from the correlation matrix, and in some cases it is even advisable. When your variables are measured in different units, you can't make meaningful linear combinations out of them. When you work from the correlation matrix, you are doing the analysis on standardized, non-dimensional variables, so the PCs are also non-dimensional. However, you need to take that into account when you interpret the results: getting the PCs from the covariance matrix and from the correlation matrix yields different results.
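
To make the covariance-versus-correlation point concrete, here is a small NumPy sketch. The data are synthetic and purely hypothetical (three variables on deliberately different scales), and `pca_eigen` is a helper name invented for this example. It illustrates two things: the correlation matrix is the covariance matrix of the standardized variables, and the two decompositions give different eigenvalues, with the covariance version dominated by whichever variable has the largest scale.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 samples, 3 variables on very different scales.
X = rng.normal(size=(200, 3)) * np.array([1.0, 10.0, 100.0])
X[:, 1] += 5.0 * X[:, 0]  # make the first two variables correlated


def pca_eigen(matrix):
    """Eigen-decompose a symmetric matrix; eigenvalues in descending order."""
    vals, vecs = np.linalg.eigh(matrix)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]


cov = np.cov(X, rowvar=False)        # PCA on raw (dimensional) variables
corr = np.corrcoef(X, rowvar=False)  # PCA on standardized variables

cov_vals, cov_vecs = pca_eigen(cov)
corr_vals, corr_vecs = pca_eigen(corr)

# Standardizing each column and taking the covariance gives the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
cov_of_standardized = np.cov(Z, rowvar=False)

print("covariance eigenvalues: ", cov_vals)
print("correlation eigenvalues:", corr_vals)
print("sum of correlation eigenvalues:", corr_vals.sum())  # equals the number of variables
```

In the covariance version the first component is almost entirely the large-scale third variable, which is the practical reason for preferring the correlation matrix when units differ.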
 


Philip Wong, the PCs are never correlated with each other. That's one of the restrictions when you do a PCA. They might be correlated after you do a rotation on the loadings. Getting PCs with equal eigenvalues (variances) is just a coincidence. Those two principal components are the same only if the loadings (the variable coefficients) are exactly the same in both linear combinations.
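
A quick numerical check of the statement that unrotated PCs are uncorrelated, again on hypothetical synthetic data. The sketch also touches on question 4 under one common convention, which is an assumption here since usage varies between texts: if "loadings" means the eigenvectors scaled by the square root of their eigenvalues, then for a correlation-matrix PCA each loading equals the correlation between an original variable and a component. (The reply above uses "loadings" for the raw coefficients, i.e. the eigenvectors themselves.)

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical correlated data: 500 samples, 4 variables.
A = rng.normal(size=(4, 4))
X = rng.normal(size=(500, 4)) @ A

# Standardize so that the PCA is effectively on the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(Z, rowvar=False)

vals, vecs = np.linalg.eigh(R)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

# Component scores: each column is one PC evaluated on every sample.
scores = Z @ vecs
score_cov = np.cov(scores, rowvar=False)

# Scaled loadings (one convention): eigenvector * sqrt(eigenvalue).
loadings = vecs * np.sqrt(vals)

# Direct correlation between variable i and PC j, for comparison.
var_pc_corr = np.array(
    [[np.corrcoef(Z[:, i], scores[:, j])[0, 1] for j in range(4)]
     for i in range(4)]
)

print("off-diagonal score covariances ~ 0:")
print(np.round(score_cov, 6))
print("max |loading - corr(variable, PC)|:",
      np.abs(loadings - var_pc_corr).max())
```

The covariance matrix of the scores comes out diagonal, with the eigenvalues on the diagonal, so two components sharing an eigenvalue have equal variance but zero correlation with each other.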
 
