Principal component analysis and data compression in Machine Learning

In summary, the conversation discusses the correct way to perform data compression on an m x n matrix X using PCA. Two approaches are compared: taking W from the SVD of X itself, and taking W from the SVD of the covariance matrix of X. Neither is wrong; which one is appropriate depends on the context and on the preprocessing applied (mean-centering and scaling/whitening). Working from the covariance matrix captures the variability in the data, but on its own it loses information about the actual values of the X variables, i.e. where the data is centered. Ultimately, the right approach depends on the specific situation.
  • #1
Wille
TL;DR Summary
I wonder how to accurately perform data compression on the m x n matrix X using PCA. I have seen both X_k=X*W_k (k = new number of dimensions < n) where W comes from [U,S,W]=svd(X), but I have also seen X_compress=X*W_k with W from [U,S,W]=svd((1/m)*X^T*X), i.e. the svd performed on the covariance matrix of X. Which is correct? When I do these two techniques I do not get the same W.
I wonder how to accurately perform data compression on the m x n matrix X using PCA. Each row is a data point, and each column is a feature.
So m data points with n features. If I want to go down to k < n dimensions, what is the correct way of doing so? How do I accurately create the matrix W_k, which is the first k columns of the matrix W, and then create the compressed data X_k = X*W_k?

I have seen two approaches:
One is X_k = X*W_k, where W comes from [U,S,W] = svd(X)
(this is from Wikipedia https://en.wikipedia.org/wiki/Principal_component_analysis )

but I have also seen
X_compress=X*W_k with W from [U,S,W]=svd((1/m)*X^T*X), i.e. the svd performed on the covariance matrix of X.
(seen in an online Machine Learning course)

Which is correct? When I do these two techniques I do not get the same W, i.e. not the same result for X_k=X*W_k.

Thanks.
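For concreteness, here is a minimal NumPy sketch of the two approaches on made-up toy data; assuming X is mean-centered first, both give the same principal directions up to the sign of each column:

Code:
import numpy as np

# Toy data: m = 100 points, n = 5 features (made up for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))
m, n = X.shape
k = 2  # target number of dimensions

# Mean-center the data; the covariance matrix (1/m)*X^T*X assumes centered X.
Xc = X - X.mean(axis=0)

# Approach 1: SVD of the data matrix itself (numpy returns V^T, i.e. W^T).
U1, S1, Wt1 = np.linalg.svd(Xc, full_matrices=False)
W1 = Wt1.T
X_k1 = Xc @ W1[:, :k]

# Approach 2: SVD of the covariance matrix (symmetric, so SVD = eigendecomposition).
C = (Xc.T @ Xc) / m
U2, S2, Wt2 = np.linalg.svd(C)
W2 = Wt2.T
X_k2 = Xc @ W2[:, :k]

# Up to a sign flip of each column, the two W's (and the compressed data) agree.
signs = np.sign(np.sum(W1[:, :k] * W2[:, :k], axis=0))
print(np.allclose(W1[:, :k], W2[:, :k] * signs))  # True
print(np.allclose(X_k1, X_k2 * signs))            # True

If X is not centered, (1/m)*X^T*X is no longer the covariance matrix, which is one likely reason the two recipes can give a different W.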
 
  • #2
Wouldn't using the covariance matrix lose the information about the actual X values?
 
  • #3
In Matlab, the PCA function whitens the data to have unit variance, whereas PCACOV (by acting on the covariance matrix) preserves the original scaling - could this be the issue?
 
  • #4
BWV said:
In Matlab, the PCA function whitens the data to have unit variance, whereas PCACOV (by acting on the covariance matrix) preserves the original scaling - could this be the issue?
I don't know. Do you mean the svd function? Matlab or not, which of the two approaches is correct to use?
 
  • #5
Wille said:
I don't know. Do you mean the svd function? Matlab or not, which of the two approaches is correct to use?
Neither is wrong - it just depends on the context. Typically PCA on raw data is whitened first, as differences in scale (say, different units) can create problems. PCA on the cov matrix is of course not whitened (otherwise you would be doing it on the correlation matrix). Not sure why you would use the cov matrix for data compression, as you would lose the original data and just retain a diagonalized cov matrix.
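To illustrate the standardization point, a small NumPy sketch with made-up data: dividing each centered column by its standard deviation before the SVD amounts to doing PCA on the correlation matrix instead of the covariance matrix, and it changes the principal directions when the features have very different scales.

Code:
import numpy as np

rng = np.random.default_rng(1)
# Toy data where the features have wildly different scales (different "units").
X = rng.normal(size=(200, 3)) * np.array([1.0, 10.0, 100.0])

Xc = X - X.mean(axis=0)
Xs = Xc / Xc.std(axis=0)  # standardized: every feature has unit variance

# Covariance of the standardized data is the correlation matrix of X.
print(np.allclose((Xs.T @ Xs) / X.shape[0], np.corrcoef(X, rowvar=False)))  # True

# The principal directions differ: on raw data the large-scale feature dominates.
_, _, Wt_raw = np.linalg.svd(Xc, full_matrices=False)
_, _, Wt_std = np.linalg.svd(Xs, full_matrices=False)
print(Wt_raw[0])  # first direction, original scaling (dominated by the third feature)
print(Wt_std[0])  # first direction, after standardization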
 
  • #6
BWV said:
Not sure why you would use the cov matrix for data compression as you would lose the original data and just retain a diagonalized cov matrix
Exactly. You would lose the information about the actual values of the X variables. You would only have information about how they vary from their means. In other words, you would have information about the shape of the scattered data, but not know where the data is centered. I guess you could add back in a matrix of means, but I think that would unnecessarily complicate things.
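A quick sketch (toy data, NumPy) of what "adding back in a matrix of means" would look like: the column means have to be stored alongside the compressed scores to recover an approximation of the original X.

Code:
import numpy as np

rng = np.random.default_rng(2)
# Toy data whose columns have nonzero means.
X = rng.normal(size=(50, 4)) + np.array([10.0, -5.0, 0.0, 3.0])

mu = X.mean(axis=0)          # store this alongside the compressed data
Xc = X - mu
_, _, Wt = np.linalg.svd(Xc, full_matrices=False)
W_k = Wt[:2].T               # first k = 2 principal directions
X_k = Xc @ W_k               # compressed representation, m x k

# Reconstruction: project back and add the means; without mu the location is lost.
X_approx = X_k @ W_k.T + mu
print(np.linalg.norm(X - X_approx) / np.linalg.norm(X))  # relative reconstruction error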
 
  • #7
BWV said:
Neither is wrong - it just depends on the context. Typically PCA on raw data is whitened first, as differences in scale (say, different units) can create problems. PCA on the cov matrix is of course not whitened (otherwise you would be doing it on the correlation matrix). Not sure why you would use the cov matrix for data compression, as you would lose the original data and just retain a diagonalized cov matrix.
Ok. I found this explanation:
https://www.quora.com/Why-does-PCA-...to-get-the-principal-components-of-features-X

It says:
"This is because covariance matrix accounts for variability in the dataset, and variability of the dataset is a way to summarize how much information we have in the data (Imagine a variable with all same values as its observations, then the variance is 0, and intuitively speaking, there’s not too much information from this variable because every observation is the same). The diagonal elements of the covariance matrix stand for variability of each variable itself, and off-diagonal elements in covariance matrix represents how variables are correlated with each other.

Ultimately we want our transformed variables to contain as much as information (or equivalently, account for as much variability as possible)."

I.e., the author suggests that using the covariance matrix is the way to do the PCA.
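That criterion can be made concrete: the eigenvalues of the covariance matrix (equivalently, the squared singular values of the centered X divided by m) measure the variance captured by each component. A small NumPy sketch with toy data:

Code:
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))  # toy data
Xc = X - X.mean(axis=0)

_, S, _ = np.linalg.svd(Xc, full_matrices=False)
var_per_component = S**2 / Xc.shape[0]     # eigenvalues of the covariance matrix
explained = var_per_component / var_per_component.sum()
print(explained)              # fraction of total variance captured by each component
print(np.cumsum(explained))   # e.g. choose k where the cumulative fraction reaches ~0.95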
 

1. What is Principal Component Analysis (PCA)?

Principal Component Analysis (PCA) is a statistical method used to reduce the dimensionality of a dataset while retaining as much information as possible. It works by identifying the most important features or variables in a dataset and creating a new set of variables, called principal components, that are a linear combination of the original features. This helps to simplify and compress large datasets, making it easier to analyze and visualize data.

2. How does PCA help with data compression in Machine Learning?

PCA helps with data compression in Machine Learning by reducing the number of features or variables in a dataset while retaining most of the information. This not only reduces the storage space needed for the data, but it also speeds up the computation time for algorithms since they have fewer dimensions to work with. Additionally, PCA can help to improve the performance of Machine Learning models by reducing the effects of overfitting and multicollinearity.
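As a rough, illustrative storage count (the sizes here are made up): keeping the k-dimensional scores plus the projection matrix and the column means takes far fewer entries than keeping X when k is much smaller than n.

Code:
m, n, k = 100_000, 300, 20              # made-up sizes
original = m * n                        # entries stored for X
compressed = m * k + n * k + n          # scores X_k, projection W_k, column means
print(original, compressed, compressed / original)  # 30000000 2006300 ~0.067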

3. What are the benefits of using PCA for data compression in Machine Learning?

There are several benefits of using PCA for data compression in Machine Learning. These include reducing the complexity and size of large datasets, improving the performance of Machine Learning models, and speeding up the computation time for algorithms. PCA can also help to identify important features in a dataset and remove redundant or irrelevant features, leading to better data understanding and more accurate predictions.

4. Are there any limitations to using PCA for data compression in Machine Learning?

While PCA can be a powerful tool for data compression in Machine Learning, it does have some limitations. One of the main limitations is that it assumes a linear relationship between the original features and the principal components. This may not always be the case for complex datasets. Additionally, PCA works best for continuous numerical data and may not be as effective for categorical or discrete data. Careful consideration and testing should be done before applying PCA to a dataset.

5. How do you implement PCA for data compression in Machine Learning?

There are several steps to implementing PCA for data compression in Machine Learning. First, the data needs to be preprocessed, including scaling and standardizing the features. Then, the covariance matrix or correlation matrix is calculated. Next, the eigenvectors and eigenvalues of the matrix are found, and the principal components are created. Finally, the dataset can be transformed and reduced to the desired number of dimensions using the principal components. This process can be done using various programming libraries and tools, such as scikit-learn in Python or the PCA function in R.
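A short sketch of those steps in Python, assuming NumPy and scikit-learn are available (toy data; the manual eigendecomposition route and scikit-learn's PCA agree up to the sign of each component):

Code:
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 6)) @ rng.normal(size=(6, 6))  # toy data
k = 3

# Manual route: center, form the covariance matrix, eigendecompose, project.
Xc = X - X.mean(axis=0)
C = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)     # ascending order for symmetric matrices
order = np.argsort(eigvals)[::-1]        # sort components by decreasing variance
W_k = eigvecs[:, order[:k]]
X_manual = Xc @ W_k

# Library route: scikit-learn handles centering and ordering internally.
X_sklearn = PCA(n_components=k).fit_transform(X)

# The two agree up to the sign of each component.
signs = np.sign(np.sum(X_manual * X_sklearn, axis=0))
print(np.allclose(X_manual, X_sklearn * signs))  # True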
