Principal Component Analysis vs Factor Analysis vs regression

SUMMARY

The discussion focuses on the distinctions between Principal Component Analysis (PCA) and Factor Analysis (FA), highlighting that both techniques generate factors that represent independent and normalized random variables. The user notes that while PCA and FA yield similar results, the differences lie in the layout of results and the calculation of factor loadings. Additionally, the relationship between these analyses and multiple linear regression is explored, with the user expressing confusion regarding the calculation of regression slopes from PCA factor loadings, particularly under varying dispersion conditions.

PREREQUISITES
  • Understanding of Principal Component Analysis (PCA)
  • Familiarity with Factor Analysis (FA)
  • Knowledge of multiple linear regression techniques
  • Basic proficiency in statistical software, such as Excel or R
NEXT STEPS
  • Research the mathematical foundations of PCA and FA, focusing on their algorithms and applications.
  • Explore the differences in factor loading calculations between PCA and FA.
  • Learn about Singular Value Decomposition (SVD) and its role in PCA.
  • Investigate the implications of data dispersion on regression analysis and factor analysis outcomes.
USEFUL FOR

Statisticians, data analysts, and researchers interested in multivariate statistics, particularly those looking to deepen their understanding of PCA, FA, and their applications in regression analysis.

lalbatros
I just tried a statistical Excel add-in and found PCA and FA quite interesting.
However, I don't see what the differences between these two analyses are, except for the layout of the results.
Additionally, I see that there is a link with multiple regression, but I don't see it precisely enough.

I see that both PCA and FA produce factors.
I understand factors as independent and normalized random variables from which the variability of the dataset can be reproduced.
I also appreciate the ranking of the factors according to their contribution to the global variance.
But where are the differences between PCA and FA?
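
For reference, one common way to write down the two models (a sketch in textbook notation, not necessarily what the add-in computes; the symbols R, V, Λ, L and Ψ are introduced here and do not come from the add-in's output): PCA decomposes the whole correlation matrix, while FA with k common factors models only the shared part and leaves a diagonal matrix of unique variances.

```latex
% PCA: exact eigendecomposition of the correlation matrix R
R = V \Lambda V^{\top}, \qquad \text{loadings} = V \Lambda^{1/2}

% FA with k common factors: shared variance plus unique variances
R \approx L L^{\top} + \Psi, \qquad \Psi = \operatorname{diag}(\psi_1, \dots, \psi_p)
```

Under this reading, the two sets of loadings would coincide only if the unique variances in Ψ were negligible, which may be why the outputs look so similar on some datasets.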

The comparison with multiple linear regression left me uneasy too.
I first thought I would be able to calculate the slope of a simple linear regression from the PCA factor loadings.
This works rather well when dispersion is small, but there are large differences when dispersion is large.
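
The observation above can be reproduced with a small experiment. The following is a minimal sketch under stated assumptions: the data are synthetic, the true slope of 0.5 and the noise levels are arbitrary choices, the function name `slopes` is hypothetical, and PCA is done on the covariance matrix (the add-in may well use the correlation matrix instead). It compares the ordinary least squares slope with the slope of the first principal axis.

```python
import numpy as np

def slopes(noise_sd, n=5000, seed=0):
    """Return (OLS slope, slope of first principal axis) for synthetic data."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = 0.5 * x + rng.normal(scale=noise_sd, size=n)  # assumed true slope: 0.5

    cov = np.cov(x, y)                 # 2x2 sample covariance matrix
    ols = cov[0, 1] / cov[0, 0]        # OLS slope of y on x

    # eigh returns eigenvalues in ascending order, so the last column
    # is the direction of largest variance (the first principal axis).
    _, eigvecs = np.linalg.eigh(cov)
    v = eigvecs[:, -1]
    pca = v[1] / v[0]                  # slope of that axis in the (x, y) plane
    return ols, pca

ols_small, pca_small = slopes(noise_sd=0.05)
ols_large, pca_large = slopes(noise_sd=2.0)
print("small dispersion:", ols_small, pca_small)
print("large dispersion:", ols_large, pca_large)
```

The two slopes agree when the scatter is small and drift apart when it is large. This is expected: OLS minimizes vertical distances to the line, whereas the first principal axis minimizes perpendicular distances, so the two coincide only when the points lie almost exactly on a line.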

Digging more into the details,
- I observed small differences in the "factor loadings" between PCA and FA
- I had difficulty reproducing all the details by SVD of the correlation matrix, but the numbers are close
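
As a point of comparison for the SVD check, here is a minimal sketch of how PCA loadings are usually obtained from the correlation matrix. The dataset is synthetic and hypothetical, and the convention "loadings = eigenvectors scaled by the square root of the eigenvalues" is an assumption about what the add-in reports; other tools print the unscaled eigenvectors, and the sign of each column is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 200 observations of 3 variables driven by one latent factor.
latent = rng.normal(size=(200, 1))
X = latent @ np.array([[1.0, 0.8, 0.5]]) + rng.normal(scale=0.6, size=(200, 3))

R = np.corrcoef(X, rowvar=False)       # 3x3 correlation matrix

# For a symmetric positive semi-definite matrix, the singular values
# returned by SVD are its eigenvalues, sorted in descending order.
U, s, _ = np.linalg.svd(R)
loadings = U * np.sqrt(s)              # scale column j by sqrt(s[j])

# The same eigenvalues from an SVD of the standardized data itself.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
_, sz, _ = np.linalg.svd(Z, full_matrices=False)
eig_from_data = sz**2 / (len(Z) - 1)

print("eigenvalues:", s)
print("share of variance:", s / s.sum())
print("loadings:\n", loadings)
```

With all components kept, `loadings @ loadings.T` reproduces the correlation matrix exactly. Small mismatches against an add-in's numbers are often due to sign flips, a different scaling convention, or the add-in fitting an FA model with unique variances rather than a plain PCA.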

Could some of you provide me with helpful comments?
Web links to well-written descriptions and/or comparisons of the methods would be highly appreciated too.
 