Is a Subset of the Eigenvector Matrix in PCA Equivalent to a Submanifold?

  • Context: Graduate
  • Thread starter: emmasaunders12
  • Tags: Manifold, PCA

Discussion Overview

The discussion revolves around the relationship between eigenvectors in Principal Component Analysis (PCA) and the concept of manifolds and submanifolds. Participants explore whether the eigenvector matrix itself represents a manifold, if subsets of eigenvectors can define submanifolds, and the implications of projecting data into lower-dimensional spaces.

Discussion Character

  • Exploratory
  • Technical explanation
  • Conceptual clarification
  • Debate/contested

Main Points Raised

  • Some participants propose that the matrix P, composed of eigenvectors, represents the manifold learned during PCA, while others suggest that the manifold is actually the result of projecting data into a lower-dimensional space.
  • There is a discussion on whether subsets of the eigenvector matrix can define submanifolds, with some arguing that taking a subset of columns from P leads to a submanifold of the column space.
  • Participants question if projecting data using only a single eigenvector or a reduced version of an eigenvector can still define a submanifold, with varying interpretations of what constitutes a valid projection.
  • Some participants clarify that the dimensionality of the learned space is determined by the number of eigenvectors used, and that taking a subset of data entries does not necessarily change the dimensionality of the subspace.
  • Concerns are raised about the validity of projections when elements are removed from eigenvectors, with some suggesting that if the same entries are removed from each eigenvector, the interpretation of the learned manifold may change.

Areas of Agreement / Disagreement

Participants express differing views on whether the eigenvectors define the manifold or if the projection results in a new manifold. The discussion remains unresolved, with multiple competing interpretations and no consensus reached on the definitions and implications of manifolds and submanifolds in the context of PCA.

Contextual Notes

Limitations in the discussion include potential confusion over notation and the definitions of manifolds versus submanifolds. There are also unresolved questions regarding the conditions necessary for a subset of eigenvectors to still represent a valid manifold or submanifold.

emmasaunders12
Hi all,

Could anyone please clarify something for me? PCA of a data matrix X results in a lower dimensional representation Y through a linear projection into the lower dimensional domain, i.e. Y = PX, where the rows of P are the eigenvectors of the covariance matrix of X. From a pure terminology point of view, is it correct to state that P is the manifold learned during PCA, or is Y the manifold?
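For concreteness, here is a minimal NumPy sketch of the setup I mean (all names, sizes, and the choice of k are purely illustrative):

```python
import numpy as np

# Illustrative data: rows are features, columns are observations,
# so that Y = P X maps each column of X to its PCA coordinates.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 500))
Xc = X - X.mean(axis=1, keepdims=True)   # centre each feature

C = np.cov(Xc)                           # 10 x 10 covariance matrix of X
eigvals, eigvecs = np.linalg.eigh(C)     # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]        # re-sort: largest eigenvalue first
P = eigvecs[:, order].T                  # rows of P are the eigenvectors

k = 3
Y = P[:k] @ Xc                           # k-dimensional representation Y = P X
```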

Also, if P is the manifold, would a subset of P, for example P(:,10:end-10), provide a submanifold?

Help appreciated

Thanks

Emma
 
Sorry, the notation should read: if P is the manifold, would a subset of P, for example P(all rows, subset of columns), provide a submanifold, i.e. a subset of the components that make up the eigenvectors?

Any help appreciated

Thanks
 
Your notation seems to be a bit confused. P is a matrix, but it projects a data vector onto a manifold. The learned manifold would be the image of the matrix P, which is the column space of P. Taking a subset of columns would lead to a submanifold of this column space.

The idea behind PCA is to do dimensionality reduction by considering only the eigenvectors with the highest eigenvalues, which are taken to be the most important ones. This gives you a submanifold of the data space in which the largest proportion of the variance occurs.
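A minimal sketch of that reduction step, reusing the names from the snippet above (the choice k = 2 is arbitrary):

```python
# Keep only the k eigenvectors with the largest eigenvalues.
k = 2
P_k = P[:k]          # shape (k, 10): rows are the top-k eigenvectors

Y = P_k @ Xc         # k-dimensional coordinates of each data vector
X_hat = P_k.T @ Y    # reconstruction: every column of X_hat lies in the
                     # k-dimensional subspace spanned by the top-k eigenvectors
```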
 
Thanks for the help. So are you stating that the manifold is the actual result of the projection into the lower dimensional domain, i.e. the weights or loadings in PCA space? I am trying to understand the use of PCA for manifold learning and its relation to the eigenvectors. Keeping the same eigenvectors and projecting new data into a lower dimensional domain would define a new manifold but with the same basis vectors? Is this correct? So the eigenvectors don't really define the manifold?

Thanks for any insight

Emma
 
Hi Emma.

The eigenvectors form the basis of the manifold which you are projecting into. By taking all eigenvectors, you just project back into the same space you started with but in a different basis. By taking a subset of the eigenvectors, you are projecting into a subspace of the original data space. If you choose only those eigenvectors with the highest eigenvalues, this subspace should "explain most of the variance" in the data, meaning that most of the variability in the data occurs within this lower dimensional subspace.
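As a rough check of the "explains most of the variance" claim, the explained-variance ratio can be read off the eigenvalues (a sketch, reusing the earlier names):

```python
lam = eigvals[order]                   # eigenvalues sorted in descending order
explained = lam[:k].sum() / lam.sum()  # fraction of total variance kept
print(f"top-{k} eigenvectors explain {explained:.1%} of the variance")
```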
 
Hi, thanks, you're making it much clearer for me. One further point: consider just the first eigenvector. If I were to take a certain number of elements from it and project the data, would this be equivalent to defining a submanifold of the previous manifold?
 
I'm not sure what you mean by "take a certain number of elements" from the eigenvector. To do dimensionality reduction, you choose a certain number of eigenvectors, not elements of eigenvectors.

If you just choose 1 eigenvector, you would project into a one-dimensional manifold. You would end up just taking the dot product of the data vector with the eigenvector, which gives a scalar value.
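In code, with the earlier names, the one-eigenvector case is just:

```python
v1 = P[0]        # first eigenvector (the first row of P)
x = Xc[:, 0]     # one centred data vector
score = v1 @ x   # dot product: a single scalar coordinate
```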
 
Yes, thanks, I understand that. In a paper I have read, however, they take a reduced version of the eigenvector: for example, if the first principal component has elements [1,2,3,4,5,6,7,8,9,10], a reduced version would be [4,5,6,7]. Would this then define a submanifold within the original manifold?

Thanks

Emma
 
I don't think I understand. The projection via a matrix transformation would not be well defined in this case, since the data vector dimensions don't match the projection matrix dimensions.

Also, the first principal component is the dot product of the data vector with the first eigenvector - it only has 1 element.
 
  • #10
Yes, but the subset of elements chosen from the eigenvector is also taken from the data matrix. So it's just a sample of the original variables. Does that make sense?
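Roughly, the setup I mean is the following (the indices are purely illustrative):

```python
keep = np.array([3, 4, 5, 6])   # indices of the retained variables
v1_reduced = P[0][keep]         # "reduced version" of the first eigenvector
x_reduced = Xc[keep, 0]         # the same entries taken from a data vector
score = v1_reduced @ x_reduced  # projection using only those variables
```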
 
  • #11
Ok. Well the dimensionality of the learned space is just the number of eigenvectors used, so taking a subset of the data vector won't change the dimension of the subspace.
 
  • #12
But will it give me a submanifold within the manifold, or are there necessary conditions that define a manifold/submanifold?

Thanks for your help
 
  • #13
Provided that the data entries come from a probability distribution which covers the whole of the real numbers, restricting which ones are linearly combined to generate the learned manifold won't restrict the space you project into - it will be the same manifold regardless of whether you take all the data entries or just some.

It's worth mentioning here that "manifold" is a pretty strong term to describe the space learned by PCA. You simply learn a linear subspace of the vector space the data is in. This might technically be a manifold, but in a pretty boring sense. In other words, a manifold is "a topological space that resembles Euclidean space near each point", whereas the learned space here is itself a Euclidean space.
 
  • #14
Thanks, madness, for clarifying; this is where my initial confusion lies. If PCA learns the space, don't the eigenvectors define the manifold, since new data projected into it will fall in a different position? Or are you stating that this would then be a new manifold? One final point: working in the space defined by PCA, can I assume Euclidean operations are valid, i.e. that distances between points have a definite meaning?
 
  • #15
Well, thinking about it more carefully, if you're randomly removing elements from the eigenvectors (which are the basis vectors for the learned manifold), then there are some problems with interpreting the learned manifold. If we interpret it as setting certain elements of the data vector to zero, then that's ok; otherwise the concept of the projected manifold doesn't make sense any more.

If you don't remove the same entries from each eigenvector, the projection won't make sense anymore. If you do remove the same entries from each eigenvector, then you're looking at a lower dimensional data space to begin with, and therefore the submanifolds of that space will also be different.
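One way to make the second case concrete (a sketch; the kept indices are arbitrary, and the point is that PCA is simply redone in the smaller space):

```python
keep = np.array([3, 4, 5, 6])              # remove the same entries everywhere
Xc_sub = Xc[keep]                          # restricted, 4-dimensional data space

C_sub = np.cov(Xc_sub)                     # covariance in the smaller space
vals, vecs = np.linalg.eigh(C_sub)
P_sub = vecs[:, np.argsort(vals)[::-1]].T  # rows: eigenvectors of the small space
Y_sub = P_sub[:2] @ Xc_sub                 # projection inside the smaller space
```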
 
  • #16
Hi Madness, yes, certain elements of the data matrix, and hence of the eigenvectors, are considered to be zero. Is this equivalent to a submanifold, in comparison to the full manifold when the zeros were not present?
 
