Principal component analysis and data compression in Machine Learning


Discussion Overview

The discussion revolves around the application of Principal Component Analysis (PCA) for data compression on an m x n matrix. Participants explore different methods for deriving the transformation matrix W_k and the implications of using the raw data versus the covariance matrix in this context.

Discussion Character

  • Debate/contested
  • Technical explanation
  • Mathematical reasoning

Main Points Raised

  • One participant questions how to accurately create the matrix W_k for data compression using PCA, noting two different approaches: one using SVD on the raw data and the other on the covariance matrix.
  • Another participant raises a concern that using the covariance matrix may lose information about the actual values of the data points.
  • It is noted that in Matlab, the PCA function whitens the data to have unit variance, while the PCACOV function preserves the original scaling, which may affect the results.
  • Some participants argue that neither approach is inherently wrong, but the choice depends on the context, with PCA on raw data typically requiring whitening to address scale differences.
  • There is a discussion about the implications of using the covariance matrix for data compression, suggesting that it may lead to losing the original data and retaining only a diagonalized covariance matrix.
  • A later reply references an external explanation that supports using the covariance matrix for PCA, emphasizing its role in accounting for variability in the dataset.

Areas of Agreement / Disagreement

Participants express differing views on the appropriateness of using the covariance matrix versus raw data for PCA, with no consensus reached on which method is definitively correct for data compression.

Contextual Notes

Participants highlight limitations regarding the loss of information when using the covariance matrix and the effects of data scaling on PCA results, but these aspects remain unresolved within the discussion.

Wille
TL;DR
I wonder how to accurately perform data compression on the m x n matrix X using PCA. I have seen both X_k=X*W_k (k = new number of dimensions < n) where W comes from [U,S,W]=svd(X), but I have also seen X_compress=X*W_k with W from [U,S,W]=svd((1/m)*X^T*X), i.e. the svd performed on the covariance matrix of X. Which is correct? When I do these two techniques I do not get the same W.
I wonder how to accurately perform data compression on the m x n matrix X using PCA. Each row is a data point, and each column is a feature.
So m data points with n features. If I want to go down to k < n dimensions, what is the correct way of doing so? How do I accurately create the matrix W_k, which is the first k columns of the matrix W, and then create the compressed data X_k=X*W_k?

I have seen two approaches:
One is X_k=X*W_k, where W comes from [U,S,W]=svd(X)
(this is from Wikipedia https://en.wikipedia.org/wiki/Principal_component_analysis )

but I have also seen
X_compress=X*W_k with W from [U,S,W]=svd((1/m)*X^T*X), i.e. the svd performed on the covariance matrix of X.
(seen in an online Machine Learning course)

Which is correct? When I do these two techniques I do not get the same W, i.e. not the same result for X_k=X*W_k.
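
For concreteness, a minimal sketch of the two recipes side by side (the sizes and variable names are made up, just for illustration):

Code:
% Illustrative sketch of the two approaches described above
X = randn(100, 5);                     % m = 100 data points, n = 5 features
k = 2;                                 % target number of dimensions
m = size(X, 1);

% Approach 1: SVD of the data matrix itself
[U1, S1, W1] = svd(X, 'econ');
X_k1 = X * W1(:, 1:k);

% Approach 2: SVD of (1/m)*X'*X
[U2, S2, W2] = svd((1/m) * (X' * X));
X_k2 = X * W2(:, 1:k);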

Thanks.
 
Wouldn't using the covariance matrix lose the information about the actual X values?
 
In Matlab, the PCA function whitens the data to have unit variance, whereas PCACOV (by acting on the covariance matrix) preserves the original scaling - could this be the issue?
 
BWV said:
In Matlab, the PCA function whitens the data to have unit variance, whereas PCACOV (by acting on the covariance matrix) preserves the original scaling - could this be the issue?
I don't know. Do you mean the svd function? Matlab or not, which of the two approaches is correct to use?
 
Wille said:
I don't know. Do you mean the svd function? Matlab or not, which of the two approaches is correct to use?
Neither is wrong - it just depends on the context - typically PCA on raw data is whitened first, as differences in scale (say, different units) can create problems. PCA on the cov matrix is of course not whitened (otherwise you would be doing it on the correlation matrix). Not sure why you would use the cov matrix for data compression, as you would lose the original data and just retain a diagonalized cov matrix.
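
For what it's worth, a minimal sketch (assuming the data is mean-centered first; names are illustrative) of how the two routes relate: the right singular vectors of the centered data are also the eigenvectors of its covariance matrix, so the columns should agree up to sign.

Code:
% Sketch: after mean-centering, both routes give the same directions
% (up to a possible sign flip of each column)
X  = randn(100, 5);
Xc = X - mean(X, 1);                   % center each column (implicit expansion)
m  = size(Xc, 1);

[~, ~, Wa] = svd(Xc, 'econ');          % right singular vectors of centered data
[~, ~, Wb] = svd((1/m) * (Xc' * Xc));  % eigenvectors of the covariance matrix

max(max(abs(abs(Wa' * Wb) - eye(5))))  % close to zero if the columns agree

If X is not centered, or if some columns come out with flipped signs, the two W matrices will differ, which may explain the discrepancy in the first post.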
 
BWV said:
Not sure why you would use the cov matrix for data compression, as you would lose the original data and just retain a diagonalized cov matrix.
Exactly. You would lose the information of the actual values of the X variables. You would only have information about how they vary from their means. In other words, you would have information about the shape of the scattered data, but not know where the data is centered. I guess you could add back in a matrix of means, but I think that would unnecessarily complicate things.
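
A rough sketch of what adding the means back would look like (assuming the SVD is done on centered data; names are illustrative):

Code:
% Sketch: compress, then reconstruct an approximation by adding the means back
X  = randn(100, 5);
k  = 2;
mu = mean(X, 1);                 % 1 x n row of column means
Xc = X - mu;                     % centered data
[~, ~, W] = svd(Xc, 'econ');
Wk = W(:, 1:k);
X_k      = Xc * Wk;              % compressed representation (m x k)
X_approx = X_k * Wk' + mu;       % approximation of the original X from k components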
 
BWV said:
Neither is wrong - it just depends on the context - typically PCA on raw data is whitened first, as differences in scale (say, different units) can create problems. PCA on the cov matrix is of course not whitened (otherwise you would be doing it on the correlation matrix). Not sure why you would use the cov matrix for data compression, as you would lose the original data and just retain a diagonalized cov matrix.
Ok. I found this explanation:
https://www.quora.com/Why-does-PCA-...to-get-the-principal-components-of-features-X

It says:
"This is because covariance matrix accounts for variability in the dataset, and variability of the dataset is a way to summarize how much information we have in the data (Imagine a variable with all same values as its observations, then the variance is 0, and intuitively speaking, there’s not too much information from this variable because every observation is the same). The diagonal elements of the covariance matrix stand for variability of each variable itself, and off-diagonal elements in covariance matrix represents how variables are correlated with each other.

Ultimately we want our transformed variables to contain as much as information (or equivalently, account for as much variability as possible)."

I.e., the author suggests that using the covariance matrix is the way to do PCA.
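
As a rough illustration of "accounting for variability", a minimal sketch (names are illustrative) of the fraction of total variance captured by the first k components, read from the covariance eigenvalues:

Code:
% Sketch: fraction of total variance captured by the first k components
X = randn(100, 5);
k = 2;
C = cov(X);                      % sample covariance (cov centers the data itself)
[~, S, ~] = svd(C);
lambda    = diag(S);             % eigenvalues = variance along each component
explained = sum(lambda(1:k)) / sum(lambda)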
 
