Is a Subset of the Eigenvector Matrix in PCA Equivalent to a Submanifold?

  • Context: Graduate
  • Thread starter: emmasaunders12
  • Tags: Manifold, PCA

Discussion Overview

The discussion revolves around the relationship between eigenvectors in Principal Component Analysis (PCA) and the concept of manifolds and submanifolds. Participants explore whether the eigenvector matrix itself represents a manifold, if subsets of eigenvectors can define submanifolds, and the implications of projecting data into lower-dimensional spaces.

Discussion Character

  • Exploratory
  • Technical explanation
  • Conceptual clarification
  • Debate/contested

Main Points Raised

  • Some participants propose that the matrix P, composed of eigenvectors, represents the manifold learned during PCA, while others suggest that the manifold is actually the result of projecting data into a lower-dimensional space.
  • There is a discussion on whether subsets of the eigenvector matrix can define submanifolds, with some arguing that taking a subset of columns from P leads to a submanifold of the column space.
  • Participants question if projecting data using only a single eigenvector or a reduced version of an eigenvector can still define a submanifold, with varying interpretations of what constitutes a valid projection.
  • Some participants clarify that the dimensionality of the learned space is determined by the number of eigenvectors used, and that taking a subset of data entries does not necessarily change the dimensionality of the subspace.
  • Concerns are raised about the validity of projections when elements are removed from eigenvectors, with some suggesting that if the same entries are removed from each eigenvector, the interpretation of the learned manifold may change.

Areas of Agreement / Disagreement

Participants express differing views on whether the eigenvectors define the manifold or if the projection results in a new manifold. The discussion remains unresolved, with multiple competing interpretations and no consensus reached on the definitions and implications of manifolds and submanifolds in the context of PCA.

Contextual Notes

Limitations in the discussion include potential confusion over notation and the definitions of manifolds versus submanifolds. There are also unresolved questions regarding the conditions necessary for a subset of eigenvectors to still represent a valid manifold or submanifold.

emmasaunders12
Hi all,

Could anyone please clarify something for me? PCA of a data matrix X results in a lower dimensional representation Y through a linear projection into the lower dimensional domain, i.e. Y = PX, where the rows of P are the eigenvectors of the covariance matrix of X. From a pure terminology point of view, is it correct to state that P is the manifold learned during PCA, or is Y the manifold?
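For concreteness, here is a minimal NumPy sketch of the setup I mean (all names, sizes, and the choice of k are purely illustrative):

```python
import numpy as np

# Illustrative data: rows are features, columns are observations,
# so that Y = P X maps each column of X to its PCA coordinates.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 500))
Xc = X - X.mean(axis=1, keepdims=True)   # centre each feature

C = np.cov(Xc)                           # 10 x 10 covariance matrix of X
eigvals, eigvecs = np.linalg.eigh(C)     # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]        # re-sort: largest eigenvalue first
P = eigvecs[:, order].T                  # rows of P are the eigenvectors

k = 3
Y = P[:k] @ Xc                           # k-dimensional representation Y = P X
```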

Also, if P is the manifold, would a subset of P, for example P(:,10:end-10), provide a submanifold?

Help appreciated

Thanks

Emma
 
Sorry, the notation should read: if P is the manifold, would a subset of P, for example P(all rows, subset of columns), provide a submanifold, i.e. a subset of the components that make up the eigenvectors?

Any help appreciated

Thanks
 
Your notation seems to be a bit confused. P is a matrix, but it projects a data vector onto a manifold. The learned manifold would be the image of the matrix P, which is the column space of P. Taking a subset of columns would lead to a submanifold of this column space.

The idea behind PCA is to do dimensionality reduction by considering only the eigenvectors with the highest eigenvalues, which are taken to be the most important ones. This gives you a submanifold of the data space in which the largest proportion of the variance occurs.
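A minimal sketch of that reduction step, reusing the names from the snippet above (the choice k = 2 is arbitrary):

```python
# Keep only the k eigenvectors with the largest eigenvalues.
k = 2
P_k = P[:k]          # shape (k, 10): rows are the top-k eigenvectors

Y = P_k @ Xc         # k-dimensional coordinates of each data vector
X_hat = P_k.T @ Y    # reconstruction: every column of X_hat lies in the
                     # k-dimensional subspace spanned by the top-k eigenvectors
```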
 
Thanks for the help. So are you stating that the manifold is the actual result of the projection into the lower dimensional domain, i.e. the weights or loadings in PCA space? I am trying to understand the use of PCA for manifold learning and its relation to the eigenvectors. Keeping the same eigenvectors and projecting new data into a lower dimensional domain would define a new manifold but with the same basis vectors? Is this correct? So the eigenvectors don't really define the manifold?

Thanks for any insight

Emma
 
Hi Emma.

The eigenvectors form the basis of the manifold which you are projecting into. By taking all eigenvectors, you just project back into the same space you started with but in a different basis. By taking a subset of the eigenvectors, you are projecting into a subspace of the original data space. If you choose only those eigenvectors with the highest eigenvalues, this subspace should "explain most of the variance" in the data, meaning that most of the variability in the data occurs within this lower dimensional subspace.
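As a rough check of the "explains most of the variance" claim, the explained-variance ratio can be read off the eigenvalues (a sketch, reusing the earlier names):

```python
lam = eigvals[order]                   # eigenvalues sorted in descending order
explained = lam[:k].sum() / lam.sum()  # fraction of total variance kept
print(f"top-{k} eigenvectors explain {explained:.1%} of the variance")
```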
 
Hi, thanks, you're making it much clearer for me. One further point: consider just the first eigenvector. If I were to take a certain number of elements from it and project the data, would this be equivalent to defining a submanifold of the previous manifold?
 
I'm not sure what you mean by "take a certain number of elements" from the eigenvector. To do dimensionality reduction, you choose a certain number of eigenvectors, not elements of eigenvectors.

If you just choose 1 eigenvector, you would project into a one-dimensional manifold. You would end up just taking the dot product of the data vector with the eigenvector, which gives a scalar value.
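In code, with the earlier names, the one-eigenvector case is just:

```python
v1 = P[0]        # first eigenvector (the first row of P)
x = Xc[:, 0]     # one centred data vector
score = v1 @ x   # dot product: a single scalar coordinate
```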
 
Yes, thanks, I understand that. In a paper I have read, however, they take a reduced version of the eigenvector: for example, if the first principal component has elements [1,2,3,4,5,6,7,8,9,10], a reduced version would be [4,5,6,7]. Would this then define a submanifold within the original manifold?

Thanks

Emma
 
I don't think I understand. The projection via a matrix transformation would not be well defined in this case, since the data vector dimensions don't match the projection matrix dimensions.

Also, the first principal component is the dot product of the data vector with the first eigenvector - it only has 1 element.
 
  • #10
Yes, but the subset of elements chosen from the eigenvector is also taken from the data matrix. So it's just a sample of the original variables. Does that make sense?
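Roughly, the setup I mean is the following (the indices are purely illustrative):

```python
keep = np.array([3, 4, 5, 6])   # indices of the retained variables
v1_reduced = P[0][keep]         # "reduced version" of the first eigenvector
x_reduced = Xc[keep, 0]         # the same entries taken from a data vector
score = v1_reduced @ x_reduced  # projection using only those variables
```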
 
  • #11
Ok. Well the dimensionality of the learned space is just the number of eigenvectors used, so taking a subset of the data vector won't change the dimension of the subspace.
 
  • #12
But will it give me a submanifold within the manifold, or are there necessary conditions that define a manifold/submanifold?

Thanks for your help
 
  • #13
Provided that the data entries come from a probability distribution which covers the whole of the real numbers, restricting which ones are linearly combined to generate the learned manifold won't restrict the space you project into - it will be the same manifold regardless of whether you take all the data entries or just some.

It's worth mentioning here that "manifold" is a pretty strong term to describe the space learned by PCA. You simply learn a linear subspace of the vector space the data is in. This might technically be a manifold, but in a pretty boring sense. In other words, a manifold is "a topological space that resembles Euclidean space near each point", whereas the learned space here is itself a Euclidean space.
 
  • #14
Thanks, madness, for clarifying; this is where my initial confusion lies. If PCA learns the space, don't the eigenvectors define the manifold, since new data projected into it will fall in a different position? Or are you stating that this would then be a new manifold? One final point: working in the space defined by PCA, can I assume Euclidean operations are valid, i.e. that distances between points have a definite meaning?
 
  • #15
Well, thinking about it more carefully, if you're randomly removing elements from the eigenvectors (which are the basis vectors for the learned manifold), then there are some problems with interpreting the learned manifold. If we interpret it as setting certain elements of the data vector to zero, then that's ok; otherwise the concept of the projected manifold doesn't make sense any more.

If you don't remove the same entries from each eigenvector, the projection won't make sense anymore. If you do remove the same entries from each eigenvector, then you're looking at a lower dimensional data space to begin with, and therefore the submanifolds of that space will also be different.
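One way to make the second case concrete (a sketch; the kept indices are arbitrary, and the point is that PCA is simply redone in the smaller space):

```python
keep = np.array([3, 4, 5, 6])              # remove the same entries everywhere
Xc_sub = Xc[keep]                          # restricted, 4-dimensional data space

C_sub = np.cov(Xc_sub)                     # covariance in the smaller space
vals, vecs = np.linalg.eigh(C_sub)
P_sub = vecs[:, np.argsort(vals)[::-1]].T  # rows: eigenvectors of the small space
Y_sub = P_sub[:2] @ Xc_sub                 # projection inside the smaller space
```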
 
  • #16
Hi Madness, yes, certain elements of the data matrix, and hence of the eigenvectors, are considered to be zero. Is this equivalent to a submanifold, in comparison to the full manifold when the zeros were not present?
 
