- #1
Philip Wong
Hi guys, there are several things about PCA (principal component analysis) that I hope someone can go over with me, correcting me where I'm wrong.
Say I've done a PCA on a correlation matrix and the eigenvalues are: 2.37, 1.18, 0.58, 0.28, 0.28.
1) If I then do a reduced-space plot using the first 2 PCs, is this how I calculate the proportion of total variance being thrown away?
The sum of all eigenvalues is: 4.59
Proportion of variance thrown away is: (2.37 + 1.18)/4.59 = 0.7734. So about 13% (0.13) of the data is being thrown away?
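For checking the arithmetic in question 1, here is a minimal sketch in Python, assuming the usual convention that each eigenvalue's share of the total is the proportion of variance carried by that component. The variable names are made up for illustration. Note that the eigenvalues as quoted add up to 4.69 rather than 4.59, and that for a PCA on a correlation matrix of 5 variables the eigenvalues would normally sum to 5, so the quoted values may be rounded or mistyped.

```python
# Eigenvalues exactly as quoted in the post.
eigenvalues = [2.37, 1.18, 0.58, 0.28, 0.28]

# Total variance is the sum of all eigenvalues.
total = sum(eigenvalues)

# Variance kept by a plot on the first two components.
kept = sum(eigenvalues[:2])

proportion_kept = kept / total
proportion_discarded = 1 - proportion_kept

print(f"total variance       = {total:.2f}")
print(f"proportion kept      = {proportion_kept:.4f}")
print(f"proportion discarded = {proportion_discarded:.4f}")
```

On these numbers the first two components keep roughly 76% of the variance, so the part thrown away is the remainder, roughly 24%, rather than 13%.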
2) Going back to the eigenvalues I worked out above, I think I should pay more attention to interpreting the 4th and 5th components (both = 0.28), because the closer the eigenvalues of any pair of components, the more correlated they are (i.e. the higher the covariance). Since the 4th and 5th components give the same eigenvalue, that would mean they are highly correlated. It seems rather unusual to have equal eigenvalues, so I might want to go back and look at my original data, in case I have a type 1 error for the 4th and 5th components (i.e. they might be the same sample entered twice).
Is my interpretation correct?
3) Let's say everything was correct (i.e. the 4th and 5th components really are separate data sets giving the same eigenvalue). How do I calculate the component correlations for PC1?
Do I use the following formula: (eigenvalue for PC1)/(n-1), where n-1 is the degrees of freedom? I.e. 2.37/(5-1) = 2.37/4 = 0.5925. 0.5925 is relatively high for a correlation (because it only goes up to 1), therefore the components for PC1 are relatively correlated.
4) Lastly, what do component loadings measure?
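For questions 3 and 4, a small sketch may help pin down the terms. It assumes the common convention for a PCA on a correlation matrix, where the loading of a variable on a component is the eigenvector entry times the square root of that component's eigenvalue, which is then the correlation between the variable and the component. The eigenvectors for the 5-variable case are not given above, so the example uses a hypothetical 2-variable correlation matrix with an assumed correlation of r = 0.6, whose eigen-decomposition is known in closed form.

```python
import math

# Hypothetical 2-variable correlation matrix; r = 0.6 is an assumed value.
r = 0.6
R = [[1.0, r], [r, 1.0]]

# Closed-form eigen-decomposition of a 2x2 correlation matrix.
eigenvalues = [1 + r, 1 - r]
s = 1 / math.sqrt(2)
eigenvectors = [[s, s], [s, -s]]  # one unit-length eigenvector per component


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


# Sanity check: R v = lambda v for each eigenpair.
for lam, v in zip(eigenvalues, eigenvectors):
    Rv = matvec(R, v)
    assert all(abs(a - lam * b) < 1e-12 for a, b in zip(Rv, v))

# loadings[k][i] = correlation between variable i and component k.
loadings = [[math.sqrt(lam) * x for x in v]
            for lam, v in zip(eigenvalues, eigenvectors)]

# Squared loadings on one component add up to that component's eigenvalue.
ss_per_component = [sum(x * x for x in row) for row in loadings]

# Squared loadings of one variable across all components add up to 1.
ss_per_variable = [sum(loadings[k][i] ** 2 for k in range(2)) for i in range(2)]

# Different components have orthogonal eigenvectors.
dot = sum(a * b for a, b in zip(eigenvectors[0], eigenvectors[1]))

print("loadings:", loadings)
print("sum of squares per component:", ss_per_component)
print("sum of squares per variable:", ss_per_variable)
print("dot product of eigenvectors:", dot)
```

Two things fall out of this. The eigenvalue divided by n-1 is not a correlation; the correlations with PC1 are the individual loadings. And the eigenvectors are orthogonal, so the component scores are uncorrelated with each other however close their eigenvalues are.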
I might have several more questions relating to PCA and PCO that I'll add later, but for now can somebody please go over the questions above with me!
thanks!