Principal Components and the Residual Matrix

  • Context: Undergrad 
  • Thread starter: Jrb599
  • Tags: Components, Matrix

Discussion Overview

The discussion revolves around the relationship between principal components, the residual matrix, and the necessity of standardizing data in the context of principal component analysis (PCA). Participants explore the implications of standardization on the accuracy of the residual matrix calculations.

Discussion Character

  • Exploratory, Technical explanation, Conceptual clarification

Main Points Raised

  • One participant asserts that if all principal components are used to reconstruct the original data, the residual matrix should be zero, contingent on the data being standardized.
  • Another participant questions whether the eigenvalues of the principal component eigenvectors were considered, noting their relation to the variance of unstandardized random variables.
  • A participant confirms they accounted for the eigenvalues but still encountered issues with their calculations.
  • There is a suggestion to check the eigenvalues for standardized data and to calculate the PCA matrix's inverse to revert to the original space.
  • A later reply indicates that the initial issues were resolved once the participant realized their program was performing mean-centering.

Areas of Agreement / Disagreement

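The thread does not settle, in general terms, whether standardization is required for the residual matrix to vanish. The original poster's discrepancy was resolved once they found that their program was mean-centering the data, which points to centering rather than a lack of standardization as the source of the mismatch.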

Contextual Notes

Participants mention eigenvalues and mean-centering, suggesting that data preprocessing affects the reconstruction. The role of the eigenvalues is raised but not worked through in the discussion.

Jrb599
I've been reading about principal components and residual matrices.

It's my understanding that if you use every principal component to recalculate your original data, then the residual matrix should be 0.

Therefore, I created a fake dataset of two random variables and calculated the principal components.


When I compute eigenvector1,1*princomp1,1 + eigenvector1,2*princomp1,2, I get var 1.
Similarly, when I compute eigenvector2,1*princomp2,1 + eigenvector2,2*princomp2,2, I get var 2.

So the residual matrix is 0, which is what I wanted. However, this is only true when I standardize the data.

If I don't standardize the data, the two formulas I listed above aren't true.

What is throwing me for a loop is none of the papers I read said anything about standardizing the data, but it looks like the data must be standardized for this to hold. I don't want to make any assumptions so I thought I would ask. Is this correct?
 
Hey Jrb599 and welcome to the forums.

Did you take into account the eigenvalues for the principal component eigenvectors?

The eigenvalues represent the variance along each component, which is related to the variances of the unstandardized random variables.
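
A small numerical check of the eigenvalue point, offered as a sketch rather than as anyone's actual program: for two variables the covariance matrix is 2x2, so its eigen-decomposition can be written in closed form. The helper `pca_2d`, the seed, and the fake data below are all assumptions made for illustration. The check is that each eigenvalue equals the sample variance of the scores on that component, and that for standardized data the eigenvalues sum to the number of variables (2 here) without each having to equal 1.

```python
import math
import random
from statistics import mean, stdev, variance


def pca_2d(xs, ys):
    """Closed-form PCA of two variables; returns (eigenvalues, scores).

    Hypothetical helper for this sketch, not a library function.
    """
    mx, my = mean(xs), mean(ys)
    xc = [x - mx for x in xs]          # mean-centered variable 1
    yc = [y - my for y in ys]          # mean-centered variable 2
    n = len(xs)
    # Sample covariance matrix [[a, b], [b, c]]
    a = sum(u * u for u in xc) / (n - 1)
    c = sum(w * w for w in yc) / (n - 1)
    b = sum(u * w for u, w in zip(xc, yc)) / (n - 1)
    # Rotation angle that diagonalizes a symmetric 2x2 matrix
    theta = 0.5 * math.atan2(2 * b, a - c)
    vecs = [(math.cos(theta), math.sin(theta)),
            (-math.sin(theta), math.cos(theta))]
    # For a unit eigenvector v, the eigenvalue is v^T C v
    lams = [a * v[0] ** 2 + 2 * b * v[0] * v[1] + c * v[1] ** 2 for v in vecs]
    # Scores: centered data projected onto each eigenvector
    scores = [[u * v[0] + w * v[1] for u, w in zip(xc, yc)] for v in vecs]
    return lams, scores


random.seed(0)
# Fake, unstandardized data (illustrative values only)
xs = [random.gauss(10, 3) for _ in range(200)]
ys = [0.5 * x + random.gauss(-4, 1) for x in xs]

lam_raw, scores_raw = pca_2d(xs, ys)

# Standardize each variable: subtract the mean, divide by the sample std. dev.
mx, sx = mean(xs), stdev(xs)
my, sy = mean(ys), stdev(ys)
zx = [(x - mx) / sx for x in xs]
zy = [(y - my) / sy for y in ys]
lam_std, _ = pca_2d(zx, zy)

print("raw eigenvalues:         ", lam_raw)
print("variance of raw scores:  ", [variance(s) for s in scores_raw])
print("standardized eigenvalues:", lam_std, "sum:", sum(lam_std))
```

The eigenvalues change with the scaling of the variables, but they only describe how the variance is spread across components. They do not enter the reconstruction formula from the first post, which uses the eigenvectors and the scores.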
 
Hi Chiro,

Thanks for the response. Yeah, I've taken the eigenvalues into account, and I still can't get it to work.
 
Just out of curiosity, what eigenvalues do you get from PCA for the standardized data? Are they unit length?

Also, you should calculate the PCA matrix and get its inverse to go from PCA space back to the original space, since PCA is a linear transformation from the original space to the new space.

Try this to get the original random variables if you are in the initial PCA space.
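
Written out, the inverse step looks as follows. The notation is an assumption introduced here, not taken from the thread: $X$ is the $n \times p$ data matrix, $\mu$ the vector of column means, $\mathbf{1}$ a column of $n$ ones, $V$ the $p \times p$ matrix whose columns are the unit-length eigenvectors of the covariance matrix, and $T$ the matrix of scores. Under the usual convention that the scores are computed from mean-centered data:

$$
\begin{aligned}
T &= (X - \mathbf{1}\mu^{\mathsf T})\,V, \\
V^{-1} &= V^{\mathsf T} \quad \text{(because } V^{\mathsf T} V = I\text{)}, \\
X &= T\,V^{\mathsf T} + \mathbf{1}\mu^{\mathsf T}.
\end{aligned}
$$

So with all $p$ components kept, the residual $X - (T V^{\mathsf T} + \mathbf{1}\mu^{\mathsf T})$ is zero whether or not the data were standardized, provided the mean is added back. If the scores were instead computed from standardized data, the column standard deviations $s_1, \dots, s_p$ also have to be undone: $X = (T V^{\mathsf T})\,\operatorname{diag}(s_1, \dots, s_p) + \mathbf{1}\mu^{\mathsf T}$.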
 
Chiro - I realized the program I was using was still doing mean-centering. It's working now.
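
To illustrate why hidden mean-centering produces exactly this symptom, here is a minimal sketch on a fake, unstandardized two-variable dataset. The variable names, the seed, and the closed-form 2x2 eigen-decomposition are assumptions for illustration and are not the program used in the thread. The scores are computed from mean-centered data, so multiplying them by the eigenvectors returns the centered variables. Compared against the raw variables, the residual is then the column mean rather than zero. Standardized data already has zero mean, which would explain why the formulas appeared to work only in that case.

```python
import math
import random
from statistics import mean

random.seed(1)
# Fake, unstandardized two-variable dataset (illustrative values only)
x1 = [random.gauss(50, 8) for _ in range(100)]
x2 = [0.3 * a + random.gauss(5, 2) for a in x1]

m1, m2 = mean(x1), mean(x2)
c1 = [a - m1 for a in x1]   # mean-centered, NOT scaled
c2 = [b - m2 for b in x2]

# Sample covariance matrix entries
n = len(x1)
s11 = sum(a * a for a in c1) / (n - 1)
s22 = sum(b * b for b in c2) / (n - 1)
s12 = sum(a * b for a, b in zip(c1, c2)) / (n - 1)

# Eigenvectors of a symmetric 2x2 matrix via its rotation angle
theta = 0.5 * math.atan2(2 * s12, s11 - s22)
e1 = (math.cos(theta), math.sin(theta))    # first eigenvector
e2 = (-math.sin(theta), math.cos(theta))   # second eigenvector

# Scores: centered data projected onto each eigenvector
pc1 = [a * e1[0] + b * e1[1] for a, b in zip(c1, c2)]
pc2 = [a * e2[0] + b * e2[1] for a, b in zip(c1, c2)]

# Reconstruction from ALL components: eigenvector entries times scores
r1 = [e1[0] * p + e2[0] * q for p, q in zip(pc1, pc2)]
r2 = [e1[1] * p + e2[1] * q for p, q in zip(pc1, pc2)]

# Largest residual for var 1 when the mean is NOT added back
resid_no_mean = max(abs(a - r) for a, r in zip(x1, r1))
# Largest residual over both variables when the mean IS added back
resid_with_mean = max(
    max(abs(a - (r + m1)) for a, r in zip(x1, r1)),
    max(abs(b - (r + m2)) for b, r in zip(x2, r2)),
)

print("mean of var 1:              ", m1)
print("residual without mean added:", resid_no_mean)
print("residual with mean added:   ", resid_with_mean)
```

In this sketch the first residual matches the mean of var 1 and the second is zero up to floating-point error, with no standardization involved.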
 
