# PCA Manifold Submanifold

1. Nov 30, 2014

### emmasaunders12

Hi all,

Could anyone please clarify something for me. PCA of a data matrix X results in a lower-dimensional representation Y through a linear projection onto the lower-dimensional domain, i.e. Y = PX, where the rows of P are the eigenvectors of the covariance matrix of X. From a pure terminology point of view, is it correct to state that P is the manifold learnt during PCA, or is Y the manifold?
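For concreteness, here is a minimal sketch in Python/NumPy of the projection I mean (the toy data and the observations-as-columns layout are just for illustration):

```python
import numpy as np

# Toy data: 5-dimensional observations, one per column (illustrative layout).
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 100))
Xc = X - X.mean(axis=1, keepdims=True)  # centre each variable

# Eigendecomposition of the covariance matrix of X.
cov = Xc @ Xc.T / (Xc.shape[1] - 1)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order

# Rows of P are the eigenvectors with the k largest eigenvalues.
k = 2
P = eigvecs[:, ::-1][:, :k].T

# Lower-dimensional representation: Y = P X.
Y = P @ Xc
print(Y.shape)  # (2, 100)
```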

Also, if P is the manifold, would a subset of P, for example P(:, 10:end-10), provide a submanifold?

Help appreciated

Thanks

Emma

2. Dec 1, 2014

### emmasaunders12

Sorry, the notation should read: if P is the manifold, would a subset of P, for example P(all rows, subset of columns), provide a submanifold, i.e. a subset of the components that make up the eigenvectors?

Any help appreciated

Thanks

3. Dec 1, 2014

### madness

Your notation seems to be a bit confused. P is a matrix, but it projects a data vector onto a manifold. The learned manifold would be the span of the chosen eigenvectors - in your notation, the row space of P (equivalently, the column space of P^T). Taking a subset of the eigenvectors would give a submanifold of this subspace.

The idea behind PCA is to do dimensionality reduction by only considering the eigenvectors with the highest eigenvalues, which are considered to be the most important ones. This gives you a submanifold of the data space in which the largest proportion of the variance occurs.
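In code terms (a toy sketch, nothing specific to your data), the "highest eigenvalues" criterion is usually read off as an explained-variance fraction:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 200))
Xc = X - X.mean(axis=1, keepdims=True)

cov = Xc @ Xc.T / (Xc.shape[1] - 1)
eigvals = np.linalg.eigvalsh(cov)[::-1]  # sort descending

# Proportion of total variance captured by the top-k eigenvectors.
k = 2
explained = eigvals[:k].sum() / eigvals.sum()
print(f"top-{k} eigenvectors explain {explained:.1%} of the variance")
```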

4. Dec 2, 2014

### emmasaunders12

Thanks for the help. So are you stating that the manifold is the actual result of the projection into the lower-dimensional domain, i.e. the weights or loadings in PCA space? I am trying to understand the use of PCA for manifold learning and its relation to the eigenvectors. Keeping the same eigenvectors and projecting new data into a lower-dimensional domain would define a new manifold but with the same basis vectors? Is this correct? So the eigenvectors don't really define the manifold?

Thanks for any insight

Emma

5. Dec 2, 2014

### madness

Hi Emma.

The eigenvectors form the basis of the manifold which you are projecting into. By taking all eigenvectors, you just project back into the same space you started with but in a different basis. By taking a subset of the eigenvectors, you are projecting into a subspace of the original data space. If you choose only those eigenvectors with the highest eigenvalues, this subspace should "explain most of the variance" in the data, meaning that most of the variability in the data occurs within this lower dimensional subspace.
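A small sketch of the "same space, different basis" point (toy data, assumed centred): with all eigenvectors kept, P is orthogonal, so projecting and then changing basis back recovers the data exactly.

```python
import numpy as np

rng = np.random.default_rng(2)
Xc = rng.standard_normal((4, 50))
Xc -= Xc.mean(axis=1, keepdims=True)

# All eigenvectors of the (scaled) covariance matrix: an orthogonal matrix.
_, eigvecs = np.linalg.eigh(Xc @ Xc.T)
P = eigvecs.T

Y = P @ Xc              # same points, expressed in the eigenvector basis
X_back = P.T @ Y        # change of basis undone
print(np.allclose(X_back, Xc))  # True
```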

6. Dec 2, 2014

### emmasaunders12

Hi, thanks, you're making it much clearer for me. One further point: consider just the first eigenvector. If I were to take a certain number of elements from it and project the data, would this be equivalent to defining a submanifold of the previous manifold?

7. Dec 2, 2014

### madness

I'm not sure what you mean by "take a certain amount of elements" from the eigenvector. To do dimensionality reduction, you choose a certain number of eigenvectors, not elements of eigenvectors.

If you just choose 1 eigenvector, you would project into a one-dimensional manifold. You would end up just taking the dot product of the data vector with the eigenvector, which gives a scalar value.
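For instance (u here is just an illustrative unit vector standing in for the first eigenvector):

```python
import numpy as np

x = np.array([1.0, 2.0, 0.5, -1.0, 3.0])  # one data vector
u = np.array([0.6, 0.8, 0.0, 0.0, 0.0])   # unit-norm "first eigenvector"

y = u @ x   # dot product: a single scalar coordinate on the 1-D manifold
print(y)    # 2.2
```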

8. Dec 2, 2014

### emmasaunders12

Yes thanks, I understand that. In a paper I have read, however, they take a reduced version of the eigenvector: for example, if the first principal component has elements [1,2,3,4,5,6,7,8,9,10], a reduced version would be [4,5,6,7]. Would this then define a submanifold within the original manifold?

Thanks

Emma

9. Dec 2, 2014

### madness

I don't think I understand. The projection via a matrix transformation would not be well defined in this case, since the data vector dimensions don't match the projection matrix dimensions.

Also, the first principal component is the dot product of the data vector with the first eigenvector - it only has one element.

10. Dec 2, 2014

### emmasaunders12

Yes, but the subset of elements chosen from the eigenvector is also taken from the data matrix, so it's just a sample of the original variables. Does that make sense?

11. Dec 3, 2014

### madness

OK. Well, the dimensionality of the learned space is just the number of eigenvectors used, so taking a subset of the data vector won't change the dimension of the subspace.

12. Dec 3, 2014

### emmasaunders12

But will it give me a submanifold within the manifold, or are there necessary conditions that define a manifold/submanifold?

13. Dec 3, 2014

### madness

Provided that the data entries come from a probability distribution which covers the whole of the real numbers, restricting which ones are linearly combined to generate the learned manifold won't restrict the space you project into - it will be the same manifold regardless of whether you take all the data entries or just some.
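A quick sketch of that (the truncation indices are illustrative): whether all entries or only a subset are combined, each data vector is still mapped to a single scalar, so the learned space is one-dimensional either way.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((10, 100))   # 10-dimensional data, one column each
u = rng.standard_normal(10)
u /= np.linalg.norm(u)               # a unit vector standing in for an eigenvector

y_full = u @ X                       # combine all entries: 100 scalars
y_trunc = u[3:7] @ X[3:7, :]         # combine a subset of entries: still 100 scalars
print(y_full.shape, y_trunc.shape)   # (100,) (100,)
```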

It's worth mentioning here that "manifold" is a pretty strong term to describe the space learned by PCA. You simply learn a linear subspace of the vector space the data is in. This might technically be a manifold, but in a pretty boring sense. In other words, a manifold is "a topological space that resembles Euclidean space near each point", whereas the learned space here is a Euclidean space.

14. Dec 4, 2014

### emmasaunders12

Thanks madness for clarifying; this is where my initial confusion lay. If PCA learns the space, don't the eigenvectors define the manifold, since new data projected into it will fall in a different position? Or are you stating that this would then be a new manifold? One final point: working in the space defined by PCA, can I assume Euclidean operations are valid, i.e. that distances between points have a definite meaning?

15. Dec 4, 2014