Mean centering of the covariance matrix in PCA

SUMMARY

The discussion focuses on the role of mean centering in Principal Component Analysis (PCA). Mean centering the data matrix before applying PCA improves the interpretation of variance within the dataset. Each element of the covariance matrix is calculated as Cov(X, Y) = sum_i (X_i - Xmean)(Y_i - Ymean) / (N - 1), so the covariance matrix itself is unchanged by pre-centering; what centering changes is that the transformation is anchored at the centroid of the data rather than the origin, which improves the explanatory power of the principal components. The relationship between eigenvalues and eigenvectors is also highlighted, with the observation that the largest eigenvalue should decrease when mean centering is applied.

PREREQUISITES
  • Understanding of Principal Component Analysis (PCA)
  • Familiarity with covariance matrix calculations
  • Knowledge of eigenvalues and eigenvectors
  • Basic concepts of linear transformations in multivariate statistics
NEXT STEPS
  • Explore the mathematical foundations of PCA and its applications
  • Learn about covariance matrix properties and their implications in PCA
  • Study the relationship between eigenvalues, eigenvectors, and variance in PCA
  • Investigate the role of singular value decomposition (SVD) in PCA
USEFUL FOR

This discussion is beneficial for data scientists, statisticians, and machine learning practitioners who are looking to deepen their understanding of PCA and its implementation in data analysis.

physical101
Hi all,
I thought I posted this last night, but I have received no notification of it being moved and can't find it in the list of threads I have started.

I was wondering if you could help me understand how principal component analysis (PCA) works a little better. I have often read that to get the best results using PCA you should mean centre the variables within your matrix first. I thought, however, that one method of calculating the principal components was the covariance matrix method, where the eigenvalues and eigenvectors give you the directions of greatest variance within the matrix. I also assumed that the elements of the covariance matrix are calculated using the following formula:

Cov(X, Y) = sum_i (X_i - Xmean)(Y_i - Ymean) / (N - 1)

If I subtracted the mean from the original data matrix, would it even matter? I would get the same covariance matrix regardless, using the above calculation.
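A quick numpy check of this point (a minimal sketch with illustrative synthetic data, not from the thread): because the formula subtracts the means internally, the covariance matrix comes out identical whether or not the data are pre-centred.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 3))   # 100 samples, 3 variables, non-zero mean

X_centered = X - X.mean(axis=0)                     # subtract the column means

# np.cov expects variables in columns with rowvar=False; divisor is N-1 by default
C_raw = np.cov(X, rowvar=False)
C_centered = np.cov(X_centered, rowvar=False)

print(np.allclose(C_raw, C_centered))               # True: identical covariance matrices
```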

I hope someone can help.

Thanks
 
Well, I've been thinking about it all day. In the original transformation of the data, multiplication by the eigenvectors carries out a linear transform. In the uncentred case this transform is done with vectors coming from the origin, at zero in all dimensions; in the mean-centred case the vectors come from the centroid of the data, which is now at zero. I can only imagine that this affects the transformation in such a way that it better explains the variance within the dataset, since the covariance matrix won't change and hence neither will the eigenvectors. Do you think I am getting close?
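A minimal sketch of that reasoning, assuming synthetic data and numpy's eigh for the eigendecomposition: the eigenvectors are the same either way, and the scores of raw and centred data differ only by a constant offset, namely the mean projected onto the eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(loc=10.0, scale=3.0, size=(200, 2))
mu = X.mean(axis=0)

# The covariance matrix (and hence its eigenvectors) is the same either way
C = np.cov(X, rowvar=False)
eigvals, V = np.linalg.eigh(C)          # columns of V are the eigenvectors

# Scores: projecting raw vs. mean-centred data onto the same eigenvectors
scores_raw = X @ V
scores_centered = (X - mu) @ V

# The two sets of scores differ only by the projected mean (the origin offset)
print(np.allclose(scores_raw - scores_centered, mu @ V))   # True
```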
 
I think mean subtraction is recommended because Var[X] <= E[X^2], i.e. the second moment about the mean is never larger than the raw second moment. For the n-dimensional case I find it easier to understand PCA in more general terms using the (reduced) singular value decomposition, where we write the data matrix X as
X = Y + PDQ'
where Y is some predefined matrix (e.g. the row-repeated column means), D is square diagonal, and P and Q are rectangular matrices with P'P = I = Q'Q. The columns of P and Q are the eigenvectors corresponding to the non-zero eigenvalues of (X - Y)(X - Y)' and (X - Y)'(X - Y) respectively, and the diagonal of D holds the square roots of those non-zero eigenvalues. This representation works for both low-dimensional many-data and high-dimensional few-data problems.
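A short numerical illustration of this representation (a sketch under the definitions above, with made-up data): take the reduced SVD of X - Y and check that the columns of Q are eigenvectors of (X - Y)'(X - Y), with eigenvalues given by the squared singular values.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
Y = np.tile(X.mean(axis=0), (X.shape[0], 1))      # row-repeated column means

# Reduced SVD of the centred matrix: X - Y = P D Q'
P, d, Qt = np.linalg.svd(X - Y, full_matrices=False)

# Columns of Q are eigenvectors of (X - Y)'(X - Y) with eigenvalues d**2
M = (X - Y).T @ (X - Y)
for k in range(len(d)):
    q = Qt[k]                                      # k-th right singular vector
    print(np.allclose(M @ q, (d[k] ** 2) * q))     # True for each non-zero singular value
```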

I'm not sure if there is a simple relation between the eigenvectors for Y=0 vs Y=Xbar, but it should be possible to show that the largest eigenvalue decreases.
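One way to see that last claim: column centring multiplies the data by the projection H = I - 11'/n, and an orthogonal projection cannot increase the largest singular value (hence the largest eigenvalue of X'X). A quick empirical check, as a sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(loc=4.0, size=(100, 5))            # data with a non-zero mean
Xc = X - X.mean(axis=0)                           # mean-centred copy

# The largest eigenvalue of X'X is the squared largest singular value of X
s_raw = np.linalg.svd(X, compute_uv=False)[0]
s_centered = np.linalg.svd(Xc, compute_uv=False)[0]

print(s_centered <= s_raw)                        # True: centring cannot increase it
```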
 
