Principal component analysis and greatest variation

Summary
The discussion revolves around performing principal component analysis (PCA) on a dataset with given x and y values. The sample mean was calculated correctly as [35.6, 44.6], and the covariance matrix was derived with variances of 228.24 for x and 11.04 for y, along with a covariance of -48.16. To complete PCA, the next steps involve finding the eigenvectors and eigenvalues of the covariance matrix, with the principal component being the eigenvector corresponding to the largest eigenvalue. It was noted that x accounts for over 95% of the total variance, suggesting that y could be discarded for a reduced one-component system. The conversation emphasizes the importance of understanding the variance-covariance matrix in PCA.
visharad
Problem - Given the following table
x y
15 50
26 46
32 44
48 43
57 40

a) Find the sample mean
b) Find the covariance matrix
c) Perform principal component analysis and find a size index which explains the greatest variation.

My attempt
a) n = 5
xbar = Sum(x)/n = 35.6
ybar = Sum(y)/n = 44.6
Sample mean = [35.6 44.6]

b) I calculated Var(X) = (1/n) * Sum[(X - Xbar)^2] = 228.24
Var(Y) = (1/n) * Sum[(Y - Ybar)^2] = 11.04
Cov(X, Y) = (1/n) * Sum[(X - Xbar)(Y - Ybar)] = -48.16

I made a 2x2 matrix whose diagonal elements are Var(X) and Var(Y); each of the two off-diagonal elements equals Cov(X, Y).

Please see if there is any mistake in my solutions to parts a and b.
I have no idea how to answer part c. Could you help?
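For reference, parts (a) and (b) can be checked numerically. A minimal NumPy sketch, assuming the 1/n population convention used above (note that np.cov divides by n-1 by default, so bias=True is needed to match):

```python
import numpy as np

x = np.array([15, 26, 32, 48, 57], dtype=float)
y = np.array([50, 46, 44, 43, 40], dtype=float)

mean = np.array([x.mean(), y.mean()])  # sample mean [35.6, 44.6]
C = np.cov(x, y, bias=True)            # 2x2 covariance matrix with 1/n scaling

print(mean)  # [35.6 44.6]
print(C)     # [[228.24 -48.16]
             #  [-48.16  11.04]]
```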
 
visharad said:
c) Perform principal component analysis and find a size index which explains the greatest variation.
...
I have no idea how to answer part c. Could you help?

To get your covariance matrix, I assume you subtracted the means of x and y. Once you have this matrix, you need to find its two eigenvectors and their eigenvalues. The eigenvector with the largest eigenvalue will be your principal component vector. From your data, I can tell which one that is. Can you? Note that the eigenvectors are mutually orthogonal, which is the goal of PCA: you want to extract the independent directions that describe the data. The data can be compressed by simply eliminating any vectors whose eigenvalues are too small to make much difference in your analysis. You don't need to do this here, IMO.
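As a concrete illustration of that eigenvector step, here is a NumPy sketch using the covariance matrix computed earlier (np.linalg.eigh is appropriate because the matrix is symmetric):

```python
import numpy as np

# Covariance matrix from part (b)
C = np.array([[228.24, -48.16],
              [-48.16,  11.04]])

eigvals, eigvecs = np.linalg.eigh(C)     # eigh returns eigenvalues in ascending order
idx = np.argsort(eigvals)[::-1]          # reorder to descending
eigvals, eigvecs = eigvals[idx], eigvecs[:, idx]

# Columns of eigvecs are the (mutually orthogonal) principal directions.
print(eigvals)                       # approximately [238.44, 0.84]
print(eigvals[0] / eigvals.sum())    # first component explains ~99.6% of the variance
```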

The next step is to create a feature matrix in which your eigenvectors are the row vectors. The transpose of this is your solution. Note that x accounts for over 95% of the total variance in your two-component system, so I think your teacher would want you to discard y and retain x in a reduced one-component system (your size index). I just wanted to take you through the final steps as if you had more than one component.
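The feature-matrix construction described above might be sketched like this (variable names are illustrative, not from the thread):

```python
import numpy as np

data = np.array([[15, 50], [26, 46], [32, 44], [48, 43], [57, 40]], dtype=float)
centered = data - data.mean(axis=0)            # subtract the column means

C = centered.T @ centered / len(data)          # same covariance matrix as before
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]

feature = eigvecs[:, order].T                  # eigenvectors stacked as row vectors
scores = centered @ feature.T                  # data expressed in the new basis

# Keeping only the first row of `feature` gives the reduced one-component
# representation (the "size index"); its variance is the largest eigenvalue.
size_index = centered @ feature[0]
print(size_index)
```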
 
SW VandeCarr said:
Once you have this matrix, you need to find its two eigenvectors and their eigenvalues. The eigenvector with the largest eigenvalue will be your principal component vector.
...

Don't you mean the eigenvectors/eigenvalues of M^T M, where M is the matrix with the x, y entries? Sorry, I am kind of rusty; I have not seen this in a while.
 
Bacle2 said:
Don't you mean the eigenvectors/eigenvalues of M^T M, where M is the matrix with the x, y entries? Sorry, I am kind of rusty; I have not seen this in a while.

It's the variance-covariance matrix (usually just called the covariance matrix). The trace of this matrix is the total variance. What matrices were you referring to? The new matrices are constructed from the eigenvectors obtained from the original covariance matrix.
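To make the connection concrete: the variance-covariance matrix is (1/n) M_c^T M_c for the mean-centered data matrix M_c, and its trace is the total variance. A short NumPy sketch:

```python
import numpy as np

M = np.array([[15, 50], [26, 46], [32, 44], [48, 43], [57, 40]], dtype=float)
Mc = M - M.mean(axis=0)           # mean-centered data matrix

C = Mc.T @ Mc / len(M)            # variance-covariance matrix, 1/n scaling
print(np.trace(C))                # total variance: 228.24 + 11.04 = 239.28
```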
 
Yes, I was referring to the variance-covariance matrix. The way I remember it, we ran m tests on n subjects, calculated the means, adjusted/normalized the data, and then computed the variance-covariance matrix, to which we applied the whole process. Thanks for the refresher.
 
