PCA Manifold Submanifold

  1. Nov 30, 2014 #1
    Hi all,

    Could anyone please clarify something for me? PCA of a data matrix X results in a lower-dimensional representation Y through a linear projection to the lower-dimensional domain, i.e. Y = PX, where the rows of P are the eigenvectors of the covariance matrix of X. From a pure terminology point of view, is it correct to say that P is the manifold learnt during PCA, or is Y the manifold?
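A minimal sketch of the projection Y = PX described in the question, assuming NumPy, one sample per column of X (so X is d × n), and mean-centring before projecting. X itself is generally not square, so the rows of P are taken as eigenvectors of its covariance matrix. The function name pca_projection is hypothetical.

```python
import numpy as np

def pca_projection(X, k):
    """Return (P, Y, mean) for data X of shape (d, n), one sample per column.

    P has shape (k, d): its rows are the k leading eigenvectors of the
    covariance matrix of X. Y = P @ (X - mean) has shape (k, n).
    """
    mean = X.mean(axis=1, keepdims=True)       # per-variable mean, shape (d, 1)
    Xc = X - mean                              # centre each variable
    C = Xc @ Xc.T / (X.shape[1] - 1)           # d x d sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)       # ascending eigenvalues, eigenvectors in columns
    order = np.argsort(eigvals)[::-1]          # sort by decreasing eigenvalue
    P = eigvecs[:, order[:k]].T                # keep the k leading eigenvectors as rows
    Y = P @ Xc                                 # lower-dimensional representation
    return P, Y, mean

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 200))                  # 5 variables, 200 samples
P, Y, mean = pca_projection(X, k=2)
print(P.shape, Y.shape)                        # (2, 5) (2, 200)
```

In this convention P is a fixed linear map, while Y holds the coordinates of each sample after projection.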

    Also, if P is the manifold, would a subset of P, for example P(:,10:end-10), provide a submanifold?

    Help appreciated


  3. Dec 1, 2014 #2
    Sorry, the notation should read: Also, if P is the manifold, would a subset of P, for example P(all rows, subset of columns), provide a submanifold, i.e. a subset of the components that make up the eigenvectors?

    Any help appreciated

  4. Dec 1, 2014 #3
    Your notation seems to be a bit confused. P is a matrix; it is not the manifold itself, but it projects a data vector onto one. The learned manifold is the subspace spanned by the eigenvectors: with the eigenvectors as the rows of P, that is the row space of P (equivalently, the column space of P^T). Taking a subset of the eigenvectors would lead to a submanifold of this space.
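To make the row/column bookkeeping concrete: with the eigenvectors stored as the rows of P (the convention in the question), the learned subspace is spanned by those rows, and P.T @ P is the orthogonal projector onto it. This sketch assumes NumPy and uses a random orthonormal basis (manufactured with a QR factorization) as a stand-in for real eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 6, 3
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # columns of Q: an orthonormal basis of R^d
P = Q[:, :k].T                                # k x d; rows play the role of eigenvectors

proj_full = P.T @ P                           # projector onto the span of all k rows
P_sub = P[:2, :]                              # keep only the first two eigenvectors (rows)
proj_sub = P_sub.T @ P_sub                    # projector onto the smaller span

x = rng.normal(size=d)
x_sub = proj_sub @ x                          # a point in the 2-D subspace
# The smaller subspace lies inside the larger one, so projecting again changes nothing.
print(np.allclose(proj_full @ x_sub, x_sub))  # True
```

Dropping rows (eigenvectors) gives a nested subspace; dropping columns of P would instead change which data variables are used.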

    The idea behind PCA is to do dimensionality reduction by only considering the eigenvectors with the highest eigenvalues, which are considered to be the most important ones. This gives you a submanifold of the data space in which the largest proportion of the variance occurs.
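The "largest proportion of the variance" can be quantified as the cumulative sum of the sorted covariance eigenvalues divided by their total. A sketch assuming NumPy; the per-axis spreads in scales are made up purely to produce data with a few dominant directions.

```python
import numpy as np

rng = np.random.default_rng(2)
scales = np.array([5.0, 2.0, 0.5, 0.1])            # hypothetical per-axis spreads
X = scales[:, None] * rng.normal(size=(4, 1000))   # 4 variables, 1000 samples (columns)

Xc = X - X.mean(axis=1, keepdims=True)
cov = Xc @ Xc.T / (X.shape[1] - 1)
eigvals = np.linalg.eigvalsh(cov)[::-1]            # eigenvalues in descending order
explained = np.cumsum(eigvals) / eigvals.sum()     # fraction of variance kept by the top k
for k, frac in enumerate(explained, start=1):
    print(f"k={k}: {frac:.3f}")
```

With data like this, the first one or two eigenvectors already carry most of the variance, which is the usual justification for discarding the rest.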
  5. Dec 2, 2014 #4
    Thanks for the help. So are you saying that the manifold is the actual result of the projection to the lower-dimensional domain, i.e. the projected values (the component scores) in PCA space? I am trying to understand the use of PCA for manifold learning and its relation to the eigenvectors. Would keeping the same eigenvectors and projecting new data into the lower-dimensional domain define a new manifold, but with the same basis vectors? Is this correct? So the eigenvectors don't really define the manifold?

    Thanks for any insight

  6. Dec 2, 2014 #5
    Hi Emma.

    The eigenvectors form a basis of the manifold you are projecting into. By taking all the eigenvectors, you just project back into the same space you started with, but in a different basis. By taking a subset of the eigenvectors, you are projecting into a subspace of the original data space. If you choose only those eigenvectors with the highest eigenvalues, this subspace should "explain most of the variance" in the data, meaning that most of the variability in the data occurs within this lower-dimensional subspace.
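A quick check of both statements, assuming NumPy and made-up data: using all the eigenvectors is an orthogonal change of basis that can be undone exactly, while keeping only the leading ones loses whatever lies in the discarded directions.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(4, 50)) * np.array([[3.0], [1.0], [0.5], [0.2]])  # 4 variables, 50 samples
Xc = X - X.mean(axis=1, keepdims=True)
eigvals, eigvecs = np.linalg.eigh(Xc @ Xc.T / (X.shape[1] - 1))
P_all = eigvecs[:, ::-1].T                 # all eigenvectors as rows, largest eigenvalue first

Y_all = P_all @ Xc                         # same data, expressed in the eigenvector basis
print(np.allclose(P_all.T @ Y_all, Xc))    # True: only a change of basis, nothing lost

P_2 = P_all[:2]                            # keep the two leading eigenvectors
X_hat = P_2.T @ (P_2 @ Xc)                 # project into the 2-D subspace and map back
print(np.linalg.norm(Xc - X_hat) > 0)      # True: the discarded directions are lost
```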
  7. Dec 2, 2014 #6
    Hi, thanks, you're making it much clearer for me. One further point: consider just the first eigenvector. If I were to take a certain number of elements from it and project the data, would this be equivalent to defining a submanifold of the previous manifold?
  8. Dec 2, 2014 #7
    I'm not sure what you mean by "take a certain amount of elements" from the eigenvector. To do dimensionality reduction, you choose a certain number of eigenvectors, not elements of eigenvectors.

    If you just choose 1 eigenvector, you would project into a one-dimensional manifold. You would end up just taking the dot product of the data vector with the eigenvector, which gives a scalar value.
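With a single eigenvector the projection really is just a dot product, giving one number per sample. A tiny sketch assuming NumPy; v is a made-up unit vector standing in for the first eigenvector, and the data are not centred, for brevity.

```python
import numpy as np

v = np.array([0.6, 0.8])            # made-up unit vector standing in for the first eigenvector
X = np.array([[2.0, 0.0, -1.0],
              [1.0, 5.0,  2.0]])    # three data vectors, one per column
scores = v @ X                      # one dot product per sample
print(scores.shape)                 # (3,)
```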
  9. Dec 2, 2014 #8
    Yes, thanks, I understand that. In a paper I have read, however, they take a reduced version of the eigenvector: for example, if the first principal component has elements [1,2,3,4,5,6,7,8,9,10], a reduced version would be [4,5,6,7]. Would this then define a submanifold within the original manifold?


  10. Dec 2, 2014 #9
    I don't think I understand. The projection via a matrix transformation would not be well defined in this case, since the data vector dimensions don't match the projection matrix dimensions.

    Also, the first principal component is the dot product of the data vector with the first eigenvector: it only has 1 element.
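The dimension mismatch can be seen directly. This sketch assumes NumPy and reuses the illustrative vector [1, ..., 10] from the thread (not normalised) with a hypothetical all-ones data vector.

```python
import numpy as np

v = np.arange(1.0, 11.0)          # the example "eigenvector" [1, 2, ..., 10] (not normalised)
x = np.ones(10)                   # a hypothetical 10-dimensional data vector
v_sub = v[3:7]                    # the reduced version [4, 5, 6, 7]

print(v @ x)                      # 55.0: the full projection is a single number
try:
    v_sub @ x                     # 4 weights against 10 data entries
except ValueError:
    print("shape mismatch:", v_sub.shape, x.shape)
```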
  11. Dec 2, 2014 #10
    Yes, but the same subset of elements chosen from the eigenvector is also taken from the data matrix. So it's just a sample of the original variables. Does that make sense?
  12. Dec 3, 2014 #11
    OK. Well, the dimensionality of the learned space is just the number of eigenvectors used, so taking a subset of the data vector won't change the dimension of the subspace.
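A sketch of that point, assuming NumPy and random orthonormal rows standing in for k eigenvectors: removing the same entries from the eigenvectors and the data still yields k projected values, although the truncated rows are in general no longer orthonormal. The index set keep is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
d, k = 10, 2
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
P = Q[:, :k].T                      # stand-in for k orthonormal eigenvectors (rows), shape (k, d)
x = rng.normal(size=d)              # one centred data vector

keep = np.arange(3, 7)              # hypothetical subset of variables (entries 4..7, 1-based)
y_full = P @ x                      # k numbers
y_subset = P[:, keep] @ x[keep]     # same entries dropped from eigenvectors and data: still k numbers
print(y_full.shape, y_subset.shape) # (2,) (2,)

# The truncated rows have shrunk, so they are no longer orthonormal.
print(np.allclose(P[:, keep] @ P[:, keep].T, np.eye(k)))  # False
```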
  13. Dec 3, 2014 #12
    But will it give me a submanifold within the manifold, or are there necessary conditions that define a manifold/submanifold?

    Thanks for your help
  14. Dec 3, 2014 #13
    Provided that the data entries come from a probability distribution that covers the whole of the real numbers, restricting which entries are linearly combined to generate the learned manifold won't restrict the space you project into: it will be the same manifold regardless of whether you take all the data entries or just some.

    It's worth mentioning here that "manifold" is a pretty strong term to describe the space learned by PCA. You simply learn a linear subspace of the vector space the data is in. This might technically be a manifold, but in a pretty boring sense. Recall that a manifold is "a topological space that resembles Euclidean space near each point", whereas the learned space here is itself a Euclidean space.
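Because the learned space is an ordinary linear subspace, distances measured in PCA coordinates behave as Euclidean distances. A sketch assuming NumPy and random orthonormal rows as stand-in eigenvectors: the distance between projected coordinates equals the distance between the projected points in the original data space, and never exceeds the original distance.

```python
import numpy as np

rng = np.random.default_rng(5)
d, k = 5, 2
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
P = Q[:, :k].T                                  # orthonormal rows standing in for k eigenvectors

a, b = rng.normal(size=d), rng.normal(size=d)   # two centred data points
dist_coords = np.linalg.norm(P @ a - P @ b)     # distance measured in PCA coordinates
dist_proj = np.linalg.norm(P.T @ P @ (a - b))   # distance between their projections in data space
print(np.isclose(dist_coords, dist_proj))       # True: orthonormal rows preserve lengths
print(dist_coords <= np.linalg.norm(a - b))     # True: projecting never increases distance
```

This relies on the eigenvectors being orthonormal; if the basis vectors are altered (for example truncated), that equality no longer holds in general.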
  15. Dec 4, 2014 #14
    Thanks, madness, for clarifying; this is where my initial confusion lies. If PCA learns the space, don't the eigenvectors define the manifold? New data projected into it will fall in a different position, but are you saying that this would then be a new manifold? One final point: working in the space defined by PCA, can I assume Euclidean operations are valid, i.e. that distances between points have a definite meaning?
  16. Dec 4, 2014 #15
    Well, thinking about it more carefully, if you're randomly removing elements from the eigenvectors (which are the basis vectors for the learned manifold), then there are some problems with interpreting the learned manifold. If we interpret it as certain elements of the data vector being zero, then that's OK; otherwise, the concept of the projected manifold doesn't make sense any more.

    If you don't remove the same entries from each eigenvector, the projection won't make sense anymore. If you do remove the same entries from each eigenvector, then you're looking at a lower dimensional data space to begin with, and therefore the submanifolds of that space will also be different.
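The two consistent readings above (setting certain data entries to zero, or removing the same entries from every eigenvector and from the data) give identical projected coordinates. A sketch assuming NumPy, random orthonormal stand-in eigenvectors, and a hypothetical set of dropped indices.

```python
import numpy as np

rng = np.random.default_rng(6)
d, k = 8, 3
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
P = Q[:, :k].T                           # stand-in eigenvectors as rows, shape (k, d)
x = rng.normal(size=d)

drop = np.array([0, 1, 6, 7])            # hypothetical entries to discard
keep = np.setdiff1d(np.arange(d), drop)  # the remaining entries, sorted

x_zeroed = x.copy()
x_zeroed[drop] = 0.0                     # reading 1: those data entries are zero
y_zeroed = P @ x_zeroed                  # project with the full eigenvectors

y_removed = P[:, keep] @ x[keep]         # reading 2: drop the same entries everywhere
print(np.allclose(y_zeroed, y_removed))  # True: the zeroed entries contribute nothing
```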
  17. Dec 5, 2014 #16
    Hi Madness, yes, certain elements of the data matrix, and hence of the eigenvectors, are considered to be zero. Is this equivalent to a submanifold, compared with the full manifold when the zeros were not present?