Principal component analysis and data compression in Machine Learning

AI Thread Summary
Data compression using Principal Component Analysis (PCA) can be approached in two ways: applying SVD directly to the data matrix or to its covariance matrix. The first method, X_k = X * W_k from SVD of X, retains the original data structure, while the second method, X_compress = X * W_k from the covariance matrix, may lose information about the actual values of X. Whitened PCA on raw data is typically preferred to address scaling issues, while PCA on the covariance matrix does not involve whitening and focuses on variability. The choice between these methods depends on the context and the desired outcome, with the covariance matrix approach emphasizing the variability of the dataset. Ultimately, understanding the implications of each method is crucial for effective data compression in machine learning.
Wille
TL;DR Summary
I wonder how to accurately perform data compression on the m x n matrix X using PCA. I have seen both X_k=X*W_k (k = new number of dimensions < n) where W comes from [U,S,W]=svd(X), but I have also seen X_compress=X*W_k with W from [U,S,W]=svd((1/m)*X^T*X), i.e. the svd performed on the covariance matrix of X. Which is correct? When I do these two techniques I do not get the same W.
I wonder how to accurately perform data compression on the m x n matrix X using PCA. Each row is a data point, and each column is a feature.
So m data points with n features. If I would like to go down to k < n dimensions, what is the correct way of doing so? How do I accurately create the matrix W_k, which consists of the first k columns of the matrix W, and then create the compressed data X_k=X*W_k?

I have seen two approaches:
One is X_k=X*W_k, where W comes from [U,S,W]=svd(X)
(this is from Wikipedia https://en.wikipedia.org/wiki/Principal_component_analysis )

but I have also seen
X_compress=X*W_k with W from [U,S,W]=svd((1/m)*X^T*X), i.e. the svd performed on the covariance matrix of X.
(seen in an online Machine Learning course)

Which is correct? When I do these two techniques I do not get the same W, i.e. not the same result for X_k=X*W_k.
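For reference, here is a minimal MATLAB/Octave sketch of both routes side by side (untested; it assumes the columns of X are mean-centered first, which PCA requires, and that k <= min(m,n)):

Code:
Xc = X - repmat(mean(X, 1), size(X, 1), 1);   % center each feature (column)

% Route 1: SVD of the centered data matrix, Xc = U*S*W1'
[U, S, W1] = svd(Xc, 'econ');
Xk1 = Xc * W1(:, 1:k);                        % m x k compressed scores

% Route 2: SVD of the sample covariance matrix, C = W2*D*W2'
m = size(Xc, 1);
C = (Xc' * Xc) / m;                           % the 1/m convention from the course
[W2, D, V2] = svd(C);
Xk2 = Xc * W2(:, 1:k);                        % agrees with Xk1 up to column sign flips

If the same centered X is used, both routes give the same columns of W up to sign (the right singular vectors of Xc are the eigenvectors of Xc'*Xc, and the 1/m factor does not change them), so a mismatch usually comes from forgetting to center or from svd's sign conventions.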

Thanks.
 
Wouldn't using the covariance matrix lose the information about the actual X values?
 
In Matlab, the PCA function whitens the data to have unit variance, whereas PCACOV (by acting on the covariance matrix) preserves the original scaling - could this be the issue?
 
BWV said:
In Matlab, the PCA function whitens the data to have unit variance, whereas PCACOV (by acting on the covariance matrix) preserves the original scaling - could this be the issue?
I don't know. Do you mean the svd function? Matlab or not, which of the two approaches is correct to use?
 
Wille said:
I don't know. Do you mean the svd function? Matlab or not, which of the two approaches is correct to use?
Neither is wrong; it just depends on the context. Typically, PCA on raw data is whitened first, as differences in scale (say, different units) can create problems. PCA on the cov matrix is of course not whitened (otherwise you would be doing it on the correlation matrix). Not sure why you would use the cov matrix for data compression as you would lose the original data and just retain a diagonalized cov matrix.
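To illustrate the scaling point in the sense described here (each feature rescaled to unit variance), a hedged sketch: running the SVD on z-scored columns gives the same directions as diagonalizing the correlation matrix, so the two descriptions coincide. Variable names are illustrative.

Code:
mu = mean(X, 1);
sd = std(X, 0, 1);
Z  = (X - repmat(mu, size(X, 1), 1)) ./ repmat(sd, size(X, 1), 1);   % z-score each column

[Uz, Sz, Wz] = svd(Z, 'econ');     % directions from the standardized data
[Wr, Dr, Vr] = svd(corrcoef(X));   % directions from the correlation matrix
% Wz and Wr agree up to column sign flips.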
 
BWV said:
Not sure why you would use the cov matrix for data compression as you would lose the original data and just retain a diagonalized cov matrix
Exactly. You would lose the information about the actual values of the X variables. You would only have information about how they vary from their means. In other words, you would have information about the shape of the scattered data, but not know where the data is centered. I guess you could add back in a matrix of means, but I think that would unnecessarily complicate things.
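A hedged sketch of that idea: keep the column means alongside the compressed scores, and the data can be approximately reconstructed (names are illustrative):

Code:
mu = mean(X, 1);                                  % stored alongside the compressed data
Xc = X - repmat(mu, size(X, 1), 1);
[U, S, W] = svd(Xc, 'econ');
Wk = W(:, 1:k);

Xk      = Xc * Wk;                                % compressed representation, m x k
X_recon = Xk * Wk' + repmat(mu, size(X, 1), 1);   % rank-k approximation of X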
 
BWV said:
Neither is wrong; it just depends on the context. Typically, PCA on raw data is whitened first, as differences in scale (say, different units) can create problems. PCA on the cov matrix is of course not whitened (otherwise you would be doing it on the correlation matrix). Not sure why you would use the cov matrix for data compression as you would lose the original data and just retain a diagonalized cov matrix.
Ok. I found this explanation:
https://www.quora.com/Why-does-PCA-...to-get-the-principal-components-of-features-X

It says:
"This is because covariance matrix accounts for variability in the dataset, and variability of the dataset is a way to summarize how much information we have in the data (Imagine a variable with all same values as its observations, then the variance is 0, and intuitively speaking, there’s not too much information from this variable because every observation is the same). The diagonal elements of the covariance matrix stand for variability of each variable itself, and off-diagonal elements in covariance matrix represents how variables are correlated with each other.

Ultimately we want our transformed variables to contain as much as information (or equivalently, account for as much variability as possible)."

I.e., the author suggests that using the covariance matrix is the way to do PCA.
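In that spirit, the usual variance-based way to choose k from the singular values would look roughly like this (a sketch, assuming Xc is the centered data as above; the 95% target is just an example):

Code:
[U, S, W] = svd(Xc, 'econ');
eigvals   = diag(S).^2;                  % proportional to the component variances
explained = cumsum(eigvals) / sum(eigvals);
k = find(explained >= 0.95, 1);          % smallest k retaining ~95% of the variance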
 