Principal component analysis and greatest variation


Homework Help Overview

The discussion revolves around principal component analysis (PCA) applied to a dataset with two variables, x and y. Participants are tasked with calculating the sample mean, covariance matrix, and performing PCA to identify the component that explains the greatest variation.

Discussion Character

  • Exploratory, conceptual clarification, mathematical reasoning, problem interpretation

Approaches and Questions Raised

  • Participants discuss the calculations for the sample mean and covariance matrix, with one participant seeking verification of their results. Others provide insights into the steps needed for PCA, including finding eigenvectors and eigenvalues, and the significance of the largest eigenvalue in determining the principal component.

Discussion Status

Some participants have provided guidance on the steps to perform PCA, including the importance of the covariance matrix and the relationship between eigenvalues and variance. There is an ongoing exploration of the correct approach to identifying the principal component, with multiple interpretations being discussed.

Contextual Notes

Participants are working under the constraints of a homework assignment, which may limit the depth of exploration. There is an emphasis on understanding the mathematical concepts behind PCA without providing complete solutions.

visharad
Problem - Given the following table
x y
15 50
26 46
32 44
48 43
57 40

a) Find the sample mean
b) Find the covariance matrix
c) Perform principal component analysis and find a size index which explains the greatest variation.

My attempt
a) n = 5
xbar = Sum(x)/n = 35.6
ybar = Sum(y)/n = 44.6
Sample mean = [35.6 44.6]

b) I calculated Var(X) = (1/n) * Sum[(X - Xbar)^2] = 228.24
Var(Y) = (1/n) * Sum[(Y - Ybar)^2] = 11.04
Cov(X,Y) = (1/n) * Sum[(X - Xbar)(Y - Ybar)] = -48.16

I made a 2x2 matrix whose principal diagonal elements are Var(X) and Var(Y); each of the two off-diagonal elements equals Cov(X, Y).
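A quick NumPy sketch can check parts (a) and (b); note it uses the same 1/n normalization as above (`bias=True`), whereas many texts and `np.cov`'s default use 1/(n-1):

```python
import numpy as np

# Data from the problem table
x = np.array([15, 26, 32, 48, 57], dtype=float)
y = np.array([50, 46, 44, 43, 40], dtype=float)

# a) Sample mean vector
mean = np.array([x.mean(), y.mean()])
print(mean)  # [35.6 44.6]

# b) Covariance matrix with 1/n normalization (bias=True)
C = np.cov(x, y, bias=True)
print(C)
# [[228.24 -48.16]
#  [-48.16  11.04]]
```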

Please see if there is any mistake in my solutions to parts a and b.
I have no idea how to answer part c. Could you help?
 
visharad said:
Problem - Given the following table
x y
15 50
26 46
32 44
48 43
57 40

a) Find the sample mean
b) Find the covariance matrix
c) Perform principal component analysis and find a size index which explains the greatest variation.

I have no idea how to answer part c. Could you help?

To get your covariance matrix, I assume you subtracted the means of x and y. Once you have this matrix, you need to find the two eigenvectors and their eigenvalues. The vector with the largest eigenvalue will be your principal component vector. From your data, I can tell which one that is. Can you? Note the eigenvectors are mutually orthogonal, which is the goal of PCA. You want to extract the independent vectors that describe the data. The data can be compressed by simply eliminating any vectors whose eigenvalues are too small to make much difference in your analysis. You don't need to do this here IMO.

The next step is to create a feature matrix where your eigenvectors are the row vectors. The transpose of this is your solution. Note that x accounts for over 95% of the total variance in your two component system, so I think your teacher would want you to discard y and retain x in a reduced 1 component system (your size index). I just wanted to take you through the final steps as if you had more than one component.
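The eigendecomposition steps described above can be sketched in NumPy against the covariance matrix from part (b):

```python
import numpy as np

# Covariance matrix from part (b), 1/n normalization
C = np.array([[228.24, -48.16],
              [-48.16,  11.04]])

# eigh handles symmetric matrices: eigenvalues come back in
# ascending order, with orthonormal eigenvectors as columns
vals, vecs = np.linalg.eigh(C)

# The principal component is the eigenvector paired with the
# largest eigenvalue; its eigenvalue over the trace (total
# variance) is the fraction of variance it explains
pc1 = vecs[:, -1]
explained = vals[-1] / vals.sum()
print(vals)       # smaller and larger eigenvalue
print(explained)  # roughly 0.996 of the total variance
```

With these numbers the first component carries about 99.6% of the total variance, so a single size index along pc1 summarizes the data well.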
 
SW VandeCarr said:
To get your covariance matrix, I assume you subtracted the means of x and y. Once you have this matrix, you need to find the two eigenvectors and their eigenvalues. The vector with the largest eigenvalue will be your principal component vector. From your data, I can tell which one that is. Can you? Note the eigenvectors are mutually orthogonal, which is the goal of PCA. You want to extract the independent vectors that describe the data. The data can be compressed by simply eliminating any vectors whose eigenvalues are too small to make much difference in your analysis. You don't need to do this here IMO.

The next step is to create a feature matrix where your eigenvectors are the row vectors. The transpose of this is your solution. Note that x accounts for over 95% of the total variance in your two component system, so I think your teacher would want you to discard y and retain x in a reduced 1 component system (your size index). I just wanted to take you through the final steps as if you had more than one component.

Don't you mean the eigenvectors/eigenvalues of M^T M, where M is the matrix with the x, y entries? Sorry, I am kind of rusty; I have not seen this in a while.
 
Bacle2 said:
Don't you mean the eigenvectors/eigenvalues of M^T M, where M is the matrix with the x, y entries? Sorry, I am kind of rusty; I have not seen this in a while.

It's the variance-covariance matrix (usually just called the covariance matrix). The trace of this matrix is the total variance. What matrices were you referring to? The new matrices are constructed from the eigenvectors obtained from the original covariance matrix.
 
Yes, I was referring to the variance-covariance matrix. The way I remember it, we ran m tests on n subjects, calculated the means, adjusted/normalized the data, and then computed the variance-covariance matrix, to which we applied the whole process. Thanks for the refresher.
 
