Principal component analysis and data compression in Machine Learning

In summary, the conversation discusses the correct way to perform data compression on an m x n matrix X using PCA. Two approaches are compared: taking W from the SVD of X itself, and taking W from the SVD of the covariance matrix of X. Neither is wrong; which one is appropriate depends on the context and on the preprocessing applied (mean-centering and scaling/whitening). Working from the covariance matrix captures the variability in the data, but on its own it loses information about the actual values of the X variables, i.e. where the data is centered. Ultimately, the right approach depends on the specific situation.
  • #1
Wille
TL;DR Summary
I wonder how to accurately perform data compression on the m x n matrix X using PCA. I have seen both X_k=X*W_k (k = new number of dimensions < n) where W comes from [U,S,W]=svd(X), but I have also seen X_compress=X*W_k with W from [U,S,W]=svd((1/m)*X^T*X), i.e. the svd performed on the covariance matrix of X. Which is correct? When I do these two techniques I do not get the same W.
I wonder how to accurately perform data compression on the m x n matrix X using PCA. Each row is a data point, and each column is a feature.
So m data points with n features. If I want to go down to k < n dimensions, what is the correct way of doing so? How do I accurately create the matrix W_k, which is the first k columns of the matrix W, and then create the compressed data X_k = X*W_k?

I have seen two approaches:
One is X_k = X*W_k, where W comes from [U,S,W] = svd(X)
(this is from Wikipedia https://en.wikipedia.org/wiki/Principal_component_analysis )

but I have also seen
X_compress=X*W_k with W from [U,S,W]=svd((1/m)*X^T*X), i.e. the svd performed on the covariance matrix of X.
(seen in an online Machine Learning course)

Which is correct? When I do these two techniques I do not get the same W, i.e. not the same result for X_k=X*W_k.

Thanks.
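For concreteness, here is a minimal NumPy sketch of the two approaches on made-up toy data; assuming X is mean-centered first, both give the same principal directions up to the sign of each column:

Code:
import numpy as np

# Toy data: m = 100 points, n = 5 features (made up for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))
m, n = X.shape
k = 2  # target number of dimensions

# Mean-center the data; the covariance matrix (1/m)*X^T*X assumes centered X.
Xc = X - X.mean(axis=0)

# Approach 1: SVD of the data matrix itself (numpy returns V^T, i.e. W^T).
U1, S1, Wt1 = np.linalg.svd(Xc, full_matrices=False)
W1 = Wt1.T
X_k1 = Xc @ W1[:, :k]

# Approach 2: SVD of the covariance matrix (symmetric, so SVD = eigendecomposition).
C = (Xc.T @ Xc) / m
U2, S2, Wt2 = np.linalg.svd(C)
W2 = Wt2.T
X_k2 = Xc @ W2[:, :k]

# Up to a sign flip of each column, the two W's (and the compressed data) agree.
signs = np.sign(np.sum(W1[:, :k] * W2[:, :k], axis=0))
print(np.allclose(W1[:, :k], W2[:, :k] * signs))  # True
print(np.allclose(X_k1, X_k2 * signs))            # True

If X is not centered, (1/m)*X^T*X is no longer the covariance matrix, which is one likely reason the two recipes can give a different W.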
 
  • #2
Wouldn't using the covariance matrix lose the information about the actual X values?
 
  • #3
In Matlab, the PCA function whitens the data to have unit variance, whereas PCACOV (by acting on the covariance matrix) preserves the original scaling - could this be the issue?
 
  • #4
BWV said:
In Matlab, the PCA function whitens the data to have unit variance, whereas PCACOV (by acting on the covariance matrix) preserves the original scaling - could this be the issue?
I don't know. Do you mean the svd function? Matlab or not, which of the two approaches is correct to use?
 
  • #5
Wille said:
I don't know. Do you mean the svd function? Matlab or not, which of the two approaches is correct to use?
Neither is wrong - it just depends on the context. Typically PCA on raw data is whitened first, as differences in scale (say, different units) can create problems. PCA on the cov matrix is of course not whitened (otherwise you would be doing it on the correlation matrix). Not sure why you would use the cov matrix for data compression, as you would lose the original data and just retain a diagonalized cov matrix.
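To illustrate the standardization point, a small NumPy sketch with made-up data: dividing each centered column by its standard deviation before the SVD amounts to doing PCA on the correlation matrix instead of the covariance matrix, and it changes the principal directions when the features have very different scales.

Code:
import numpy as np

rng = np.random.default_rng(1)
# Toy data where the features have wildly different scales (different "units").
X = rng.normal(size=(200, 3)) * np.array([1.0, 10.0, 100.0])

Xc = X - X.mean(axis=0)
Xs = Xc / Xc.std(axis=0)  # standardized: every feature has unit variance

# Covariance of the standardized data is the correlation matrix of X.
print(np.allclose((Xs.T @ Xs) / X.shape[0], np.corrcoef(X, rowvar=False)))  # True

# The principal directions differ: on raw data the large-scale feature dominates.
_, _, Wt_raw = np.linalg.svd(Xc, full_matrices=False)
_, _, Wt_std = np.linalg.svd(Xs, full_matrices=False)
print(Wt_raw[0])  # first direction, original scaling (dominated by the third feature)
print(Wt_std[0])  # first direction, after standardization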
 
  • #6
BWV said:
Not sure why you would use the cov matrix for data compression as you would lose the original data and just retain a diagonalized cov matrix
Exactly. You would lose the information about the actual values of the X variables. You would only have information about how they vary from their means. In other words, you would have information about the shape of the scattered data, but not know where the data is centered. I guess you could add back in a matrix of means, but I think that would unnecessarily complicate things.
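A quick sketch (toy data, NumPy) of what "adding back in a matrix of means" would look like: the column means have to be stored alongside the compressed scores to recover an approximation of the original X.

Code:
import numpy as np

rng = np.random.default_rng(2)
# Toy data whose columns have nonzero means.
X = rng.normal(size=(50, 4)) + np.array([10.0, -5.0, 0.0, 3.0])

mu = X.mean(axis=0)          # store this alongside the compressed data
Xc = X - mu
_, _, Wt = np.linalg.svd(Xc, full_matrices=False)
W_k = Wt[:2].T               # first k = 2 principal directions
X_k = Xc @ W_k               # compressed representation, m x k

# Reconstruction: project back and add the means; without mu the location is lost.
X_approx = X_k @ W_k.T + mu
print(np.linalg.norm(X - X_approx) / np.linalg.norm(X))  # relative reconstruction error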
 
  • #7
BWV said:
Neither is wrong - it just depends on the context. Typically PCA on raw data is whitened first, as differences in scale (say, different units) can create problems. PCA on the cov matrix is of course not whitened (otherwise you would be doing it on the correlation matrix). Not sure why you would use the cov matrix for data compression, as you would lose the original data and just retain a diagonalized cov matrix.
Ok. I found this explanation:
https://www.quora.com/Why-does-PCA-...to-get-the-principal-components-of-features-X

It says:
"This is because covariance matrix accounts for variability in the dataset, and variability of the dataset is a way to summarize how much information we have in the data (Imagine a variable with all same values as its observations, then the variance is 0, and intuitively speaking, there’s not too much information from this variable because every observation is the same). The diagonal elements of the covariance matrix stand for variability of each variable itself, and off-diagonal elements in covariance matrix represents how variables are correlated with each other.

Ultimately we want our transformed variables to contain as much as information (or equivalently, account for as much variability as possible)."

I.e., the author suggests that using the covariance matrix is the way to do the PCA.
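That criterion can be made concrete: the eigenvalues of the covariance matrix (equivalently, the squared singular values of the centered X divided by m) measure the variance captured by each component. A small NumPy sketch with toy data:

Code:
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))  # toy data
Xc = X - X.mean(axis=0)

_, S, _ = np.linalg.svd(Xc, full_matrices=False)
var_per_component = S**2 / Xc.shape[0]     # eigenvalues of the covariance matrix
explained = var_per_component / var_per_component.sum()
print(explained)              # fraction of total variance captured by each component
print(np.cumsum(explained))   # e.g. choose k where the cumulative fraction reaches ~0.95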
 

1. What is Principal Component Analysis (PCA)?

Principal Component Analysis (PCA) is a statistical method used to reduce the dimensionality of a dataset while retaining as much information as possible. It works by identifying the most important features or variables in a dataset and creating a new set of variables, called principal components, that are a linear combination of the original features. This helps to simplify and compress large datasets, making it easier to analyze and visualize data.

2. How does PCA help with data compression in Machine Learning?

PCA helps with data compression in Machine Learning by reducing the number of features or variables in a dataset while retaining most of the information. This not only reduces the storage space needed for the data, but it also speeds up the computation time for algorithms since they have fewer dimensions to work with. Additionally, PCA can help to improve the performance of Machine Learning models by reducing the effects of overfitting and multicollinearity.
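As a rough, illustrative storage count (the sizes here are made up): keeping the k-dimensional scores plus the projection matrix and the column means takes far fewer entries than keeping X when k is much smaller than n.

Code:
m, n, k = 100_000, 300, 20              # made-up sizes
original = m * n                        # entries stored for X
compressed = m * k + n * k + n          # scores X_k, projection W_k, column means
print(original, compressed, compressed / original)  # 30000000 2006300 ~0.067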

3. What are the benefits of using PCA for data compression in Machine Learning?

There are several benefits of using PCA for data compression in Machine Learning. These include reducing the complexity and size of large datasets, improving the performance of Machine Learning models, and speeding up the computation time for algorithms. PCA can also help to identify important features in a dataset and remove redundant or irrelevant features, leading to better data understanding and more accurate predictions.

4. Are there any limitations to using PCA for data compression in Machine Learning?

While PCA can be a powerful tool for data compression in Machine Learning, it does have some limitations. One of the main limitations is that it assumes a linear relationship between the original features and the principal components. This may not always be the case for complex datasets. Additionally, PCA works best for continuous numerical data and may not be as effective for categorical or discrete data. Careful consideration and testing should be done before applying PCA to a dataset.

5. How do you implement PCA for data compression in Machine Learning?

There are several steps to implementing PCA for data compression in Machine Learning. First, the data needs to be preprocessed, including scaling and standardizing the features. Then, the covariance matrix or correlation matrix is calculated. Next, the eigenvectors and eigenvalues of the matrix are found, and the principal components are created. Finally, the dataset can be transformed and reduced to the desired number of dimensions using the principal components. This process can be done using various programming libraries and tools, such as scikit-learn in Python or the PCA function in R.
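A short sketch of those steps in Python, assuming NumPy and scikit-learn are available (toy data; the manual eigendecomposition route and scikit-learn's PCA agree up to the sign of each component):

Code:
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 6)) @ rng.normal(size=(6, 6))  # toy data
k = 3

# Manual route: center, form the covariance matrix, eigendecompose, project.
Xc = X - X.mean(axis=0)
C = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)     # ascending order for symmetric matrices
order = np.argsort(eigvals)[::-1]        # sort components by decreasing variance
W_k = eigvecs[:, order[:k]]
X_manual = Xc @ W_k

# Library route: scikit-learn handles centering and ordering internally.
X_sklearn = PCA(n_components=k).fit_transform(X)

# The two agree up to the sign of each component.
signs = np.sign(np.sum(X_manual * X_sklearn, axis=0))
print(np.allclose(X_manual, X_sklearn * signs))  # True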
