PCA and variance on a particular axis

In summary, the thread discusses how to calculate the variance along a given direction for a set of 3D points, and the relationship between Euclidean geometry and the mean and variance. One reply suggests treating the problem geometrically, using quadratic forms, orthogonal transformations, and orthoprojections, and recommends books on n-dimensional geometry and multivariate analysis. It also cautions about the statistical interpretation of these methods.
  • #1
Asuralm
Hi All:

Given a set of 3D points, it's very easy to calculate the covariance matrix and get the principal axes, and each eigenvalue is the variance along the corresponding principal axis. My problem is this: given an arbitrary direction, how do I calculate the variance of the data along that direction?

Can anybody help me with this please?

Thanks
 
  • #2
Speaking of Applicable Geometry

How much do you know about the relationship between euclidean geometry and mean/variance?

Your question really concerns the values taken by a quadratic form: for a unit vector [itex]\vec{u}[/itex] and covariance matrix [itex]\Sigma[/itex], the variance along [itex]\vec{u}[/itex] is [itex]\vec{u}^T \, \Sigma \, \vec{u}[/itex]. Many statistical manipulations (and many useful properties of normal distributions) arise in a geometrically natural manner from manipulating quadratic forms by orthogonal transformations and orthoprojections, together with some notions from affine geometry such as convexity. For example, taking the mean of n variables [itex]x_1, \, x_2, \dots, x_n[/itex], where we think of this data as the vector [itex]\vec{x} = x_1 \, \vec{e}_1 + \dots + x_n \, \vec{e}_n[/itex], corresponds to taking the orthoprojection (defined using the standard Euclidean inner product) onto the one-dimensional subspace spanned by [itex]\vec{e}_1 + \vec{e}_2 + \dots + \vec{e}_n[/itex]. If we adopt a new orthonormal basis including the unit vector [itex]\vec{f}_n = \frac{1}{\sqrt{n}} \, \left( \vec{e}_1 + \vec{e}_2 + \dots + \vec{e}_n \right)[/itex], this orthoprojection amounts to forgetting all but the last component [itex]\sqrt{n} \, \overline{x}[/itex], which agrees (up to a constant multiple) with the arithmetic mean.
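The orthoprojection described above can be checked numerically. Here is a minimal sketch in plain Python; the sample values and the function names are made up for illustration and are not from Kendall's books. It also shows where the variance comes in: the squared length of the part of the data vector orthogonal to the all-ones direction equals n times the population variance.

```python
import math

def mean_component(x):
    """Component of x along f_n = (1, ..., 1) / sqrt(n)."""
    n = len(x)
    f = [1.0 / math.sqrt(n)] * n
    return sum(xi * fi for xi, fi in zip(x, f))

def residual_sq_norm(x):
    """Squared length of x minus its orthoprojection onto span(f_n)."""
    n = len(x)
    c = mean_component(x)
    # c * f_n has every entry equal to the arithmetic mean of x.
    proj = [c / math.sqrt(n)] * n
    return sum((xi - pi) ** 2 for xi, pi in zip(x, proj))

x = [2.0, 4.0, 9.0]  # hypothetical data
n = len(x)
mean = sum(x) / n
pop_var = sum((xi - mean) ** 2 for xi in x) / n

# The two numbers on each line should agree up to rounding.
print(mean_component(x), math.sqrt(n) * mean)
print(residual_sq_norm(x), n * pop_var)
```

So in this picture the mean is read off from the component along [itex]\vec{f}_n[/itex], and the variance from the length of what is left over.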

See M. G. Kendall, A Course in the Geometry of n Dimensions, Dover reprint, and then try the same author's book Multivariate Analysis.

I must add a caution: do you see why principal component analysis (PCA) is essentially a method for "lying with statistics"? That is, the geometric (or if you prefer, linear algebraic) manipulations of your data set are mathematically valid, but the statistical interpretation is almost always extremely dubious. Fortunately, my remark about the role of Euclidean geometry in mathematical statistics holds true for many more legitimate statistical methods, some discussed in the first book by Kendall cited above.
 
  • #3
Thanks for your question!

To calculate the variance along a particular axis, project the data onto that axis and take the variance of the projected values. Concretely: normalize the axis to a unit vector u, subtract the mean of the data from each point, take the dot product of u with each centered point, square it, and average over all data points. Without the centering step you get the mean square of the projections, not the variance. Equivalently, if C is the covariance matrix, the variance along u is u^T C u. Here is an example calculation:

Let's say we have a set of 3D points: (1, 2, 3), (4, 5, 6), (7, 8, 9).
And we want to find the variance along the axis represented by the vector (1, 1, 1).
First, we need to normalize this vector to have a length of 1: (1/sqrt(3), 1/sqrt(3), 1/sqrt(3)).
Next, we subtract the mean of the points, (4, 5, 6), from each point: (-3, -3, -3), (0, 0, 0), (3, 3, 3).
Then, we can calculate the squared dot product of the normalized vector with each centered point:
(1/sqrt(3) * (-3) + 1/sqrt(3) * (-3) + 1/sqrt(3) * (-3))^2 = (-9/sqrt(3))^2 = 27
(1/sqrt(3) * 0 + 1/sqrt(3) * 0 + 1/sqrt(3) * 0)^2 = 0
(1/sqrt(3) * 3 + 1/sqrt(3) * 3 + 1/sqrt(3) * 3)^2 = (9/sqrt(3))^2 = 27
Now we sum these values and divide by the number of data points to get the variance:
(27 + 0 + 27)/3 = 18
So the variance along the axis represented by (1, 1, 1) for this set of points is 18. (If you want the sample variance instead, divide by n - 1 = 2, which gives 27.)
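
The same kind of calculation can be sketched in plain Python. This is only an illustration: the function names are my own, and it assumes the population convention (divide by n, not n - 1). Note that variance is measured about the mean, so the points are centered before projecting; the mean of the squared raw projections would be the second moment rather than the variance. The sketch computes the result two ways, directly from the projections and from the quadratic form u^T C u.

```python
import math

def unit(direction):
    """Scale a direction vector to length 1."""
    norm = math.sqrt(sum(d * d for d in direction))
    return [d / norm for d in direction]

def mean_point(points):
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]

def directional_variance(points, direction):
    """Population variance of the points projected onto `direction`."""
    u = unit(direction)
    m = mean_point(points)
    proj = [sum((p[k] - m[k]) * u[k] for k in range(len(u))) for p in points]
    return sum(t * t for t in proj) / len(points)

def covariance(points):
    """Population covariance matrix as a list of rows."""
    n, dim = len(points), len(points[0])
    m = mean_point(points)
    return [[sum((p[i] - m[i]) * (p[j] - m[j]) for p in points) / n
             for j in range(dim)] for i in range(dim)]

def quadratic_form(cov, direction):
    """u^T C u for the unit vector u along `direction`."""
    u = unit(direction)
    dim = len(u)
    return sum(u[i] * cov[i][j] * u[j] for i in range(dim) for j in range(dim))

points = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
# Both lines print approximately 18 (up to floating-point rounding).
print(directional_variance(points, (1, 1, 1)))
print(quadratic_form(covariance(points), (1, 1, 1)))
```

For these three points, which all lie on a line parallel to (1, 1, 1), a perpendicular direction such as (1, -1, 0) gives a variance of zero.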

I hope this helps! Let me know if you have any further questions.
 

1. What is PCA and how is it used in scientific research?

PCA (Principal Component Analysis) is a statistical technique used to reduce the dimensionality of a high-dimensional dataset. It is often used in scientific research to identify patterns and relationships among variables and to visualize the data in a lower dimensional space.

2. How does PCA use the variance in a dataset to reduce its dimensionality?

PCA works by finding the orthogonal directions of maximum variance in a dataset and projecting the data onto the leading ones. PCA does not reduce the variance as a goal: the rotation to principal axes leaves the total variance unchanged, and dropping the trailing components discards only the variance along those directions. The result has fewer dimensions while retaining as much of the total variance as possible.
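
As a rough illustration of the "directions of maximum variance" idea, here is a small 2D sketch in plain Python. The data points are hypothetical, the helper names are my own, and it uses the closed-form eigenvalues of a symmetric 2x2 matrix rather than a general PCA routine. It checks that the two eigenvalues add up to the total variance, and that scanning directions in 1-degree steps never finds a variance above the larger eigenvalue.

```python
import math

def covariance_2d(points):
    """Entries (a, b, c) of the population covariance [[a, b], [b, c]]."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    a = sum((p[0] - mx) ** 2 for p in points) / n
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    c = sum((p[1] - my) ** 2 for p in points) / n
    return a, b, c

def eigenvalues_2x2(a, b, c):
    """Eigenvalues of [[a, b], [b, c]], largest first."""
    mid = (a + c) / 2
    rad = math.hypot((a - c) / 2, b)
    return mid + rad, mid - rad

def variance_along(a, b, c, theta):
    """u^T C u for the unit vector u = (cos theta, sin theta)."""
    ux, uy = math.cos(theta), math.sin(theta)
    return a * ux * ux + 2 * b * ux * uy + c * uy * uy

points = [(0, 0), (1, 1), (2, 1), (3, 3), (4, 3)]  # hypothetical data
a, b, c = covariance_2d(points)
lam1, lam2 = eigenvalues_2x2(a, b, c)
scan = [variance_along(a, b, c, math.pi * k / 180) for k in range(180)]

print(lam1 + lam2, a + c)     # total variance is the same in either basis
print(max(scan), lam1)        # best scanned direction is close to lam1
print(lam1 / (lam1 + lam2))   # share of variance kept by the first component
```

The last line is the fraction of the total variance that survives if only the first principal component is kept.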

3. What is the significance of the variance on a particular axis in PCA?

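The variance along a particular axis measures how much of the data's variability is captured by that axis. For a principal axis it equals the corresponding eigenvalue of the covariance matrix, and dividing it by the total variance (the sum of all eigenvalues) gives the fraction of variability that axis explains. The higher the variance, the more that axis contributes to explaining the variability in the data.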

4. How can the variance on a particular axis be interpreted in a PCA plot?

In a PCA plot, the axes represent the principal components, with the first component capturing the most variance in the data and subsequent components capturing decreasing amounts of variance. Therefore, the variance on a particular axis can be interpreted as the contribution of that axis to the overall variability in the data.

5. Can PCA be used to identify the most important variables in a dataset?

Yes, to an extent. PCA can point to influential variables through the loadings of each variable on the principal components: variables with larger absolute loadings on a component contribute more to that component. Loadings depend on how the variables are scaled, so the data is usually standardized before they are compared.
