Principal Component Analysis-related problem

In summary, the goal of the conversation is to reduce a set of 100 samples, each from a 200,000-dimensional space, to a smaller set of samples without changing the dimension. This is done by performing an eigendecomposition of the 100x100 matrix A'*A, whose eigenvectors give the principal components. To reduce the set to 25 samples, the eigenvectors associated with the 25 largest eigenvalues are selected and, together with the original matrix A, used to form a new 200,000x25 matrix B; A is not discarded, but is needed to compute B.
  • #1
liengen

Homework Statement


I have 100 samples. Each sample is from a 200,000-dimensional space. My goal is to reduce the number of samples (but not the dimension). The samples (column vectors) are represented in the 200,000x100 matrix A.

Homework Equations





The Attempt at a Solution


I do an eigendecomposition of the 100x100 matrix A'*A. The eigenvectors are then the principal components, and I can disregard those associated with very small eigenvalues.

So, let's say that I want to reduce my original set to 25 samples, hence to a new 200,000x25 matrix B. What I do not understand is how I'm supposed to compute this matrix.
 
  • #2
Do I just use the eigenvectors of A'*A belonging to the 25 largest eigenvalues? If so, what do I do with the original matrix A? I mean, how do I get from the 100x100 eigenvector matrix to a 200,000x25 matrix?
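
A minimal NumPy sketch of the construction described in the thread summary: take the eigenvectors of A'*A belonging to the 25 largest eigenvalues and multiply A by them. The random A and the shrunken first dimension are placeholders so the snippet runs quickly; the comments show the correspondence to the 200,000x100 problem.

```python
import numpy as np

# Placeholder data standing in for the 100 samples in a 200,000-dimensional space.
# (First dimension shrunk so the sketch runs quickly; the algebra is identical.)
rng = np.random.default_rng(0)
n_dim, n_samples, k = 2000, 100, 25     # in the thread: 200000, 100, 25
A = rng.standard_normal((n_dim, n_samples))

# Eigendecomposition of the small 100x100 Gram matrix A'A.
gram = A.T @ A                          # shape (100, 100)
eigvals, eigvecs = np.linalg.eigh(gram) # eigh returns eigenvalues in ascending order

# Keep the eigenvectors belonging to the k largest eigenvalues.
idx = np.argsort(eigvals)[::-1][:k]
V_k = eigvecs[:, idx]                   # shape (100, 25)

# Map back through A: each column of B is a combination of the original samples.
B = A @ V_k                             # shape (2000, 25) -- 200,000x25 in the thread
print(B.shape)
```

This works because, writing the thin SVD A = U S V', the eigenvectors of A'*A are the columns of V; so A V_k = U_k S_k spans the leading subspace of the data without ever forming the huge 200,000x200,000 covariance matrix. Normalizing each column of B by the square root of its eigenvalue would additionally make the columns orthonormal.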
 

1) What is Principal Component Analysis (PCA)?

Principal Component Analysis (PCA) is a statistical technique used to reduce the dimensionality of a dataset. It does this by identifying patterns and relationships between variables and compressing them into a smaller number of principal components while still retaining most of the original information.

2) When should I use PCA?

PCA is often used when working with high-dimensional datasets, such as in machine learning and data mining. It can be used for data visualization, data compression, feature extraction, and noise reduction. It is also useful when dealing with multicollinearity (high correlation between variables) in a dataset.

3) How does PCA work?

PCA works by calculating the covariance matrix of the dataset and finding the eigenvectors and eigenvalues of this matrix. The eigenvectors represent the principal components, which are new orthogonal axes that capture the most variation in the data. The eigenvalues indicate the amount of variation captured by each principal component.
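
As a concrete illustration of the steps just described, here is a short NumPy sketch on a small made-up dataset: center the data, form the covariance matrix, and take its eigendecomposition. The data and sizes are illustrative only.

```python
import numpy as np

# Hypothetical dataset: 50 observations of 3 variables, with some correlation.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
X[:, 2] = 0.8 * X[:, 0] + 0.2 * X[:, 2]

# Center the data, then compute the covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)          # shape (3, 3)

# Eigenvectors = principal components (orthogonal directions);
# eigenvalues = variance captured along each direction.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]       # sort from largest to smallest eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
print(eigvals)                          # variance captured by each component
```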

4) What is the goal of PCA?

The goal of PCA is to reduce the number of variables while still retaining as much of the original information as possible. This can help simplify the data and make it easier to interpret or visualize.

5) How do I interpret the results of a PCA?

The results of a PCA can be interpreted by looking at the values of the eigenvalues and eigenvectors. The eigenvalues represent the amount of variation captured by each principal component, and the eigenvectors represent the direction of that variation. Additionally, PCA can be used for data visualization to see how the data is clustered or separated along the principal components.
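
A short, self-contained sketch (same kind of made-up data as above) showing the two quantities usually inspected when interpreting a PCA: the fraction of variance explained by each component, and the data projected onto the leading components, which is what a 2-D PCA scatter plot displays.

```python
import numpy as np

# Hypothetical data used only for illustration: 50 observations, 3 variables.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
Xc = X - X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Fraction of the total variance captured by each principal component.
explained = eigvals / eigvals.sum()
print(explained)

# Coordinates of each observation in the plane of the first two components.
scores = Xc @ eigvecs[:, :2]
print(scores.shape)                     # (50, 2)
```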
