Regression on extracted factors

  • Context: Undergrad
  • Thread starter: Adel Makram
  • Tags: Factors, Regression
SUMMARY

The discussion focuses on performing regression analysis on a dependent variable ##y## using a set of independent variables ##x_1##, ##x_2##...##x_m##, and the challenges faced when employing factor analysis (FA) and singular value decomposition (SVD). The user contemplates using FA to reduce correlated variables into uncorrelated factors ##v_1##, ##v_2##...##v_k## but struggles with the inverse transformation of the factor loading matrix ##F##. The consensus suggests that stepwise linear regression is preferable, as it maintains interpretability by using a limited number of understandable independent variables, with recommendations for bidirectional regression algorithms to manage correlated variables effectively.

PREREQUISITES
  • Understanding of regression analysis techniques, specifically stepwise regression.
  • Familiarity with factor analysis (FA) and singular value decomposition (SVD).
  • Knowledge of linear algebra, particularly matrix operations and transformations.
  • Statistical significance testing and residual analysis in regression models.
NEXT STEPS
  • Learn about stepwise regression algorithms, including forward, backward, and bidirectional methods.
  • Study the principles of factor analysis and its applications in data reduction.
  • Explore singular value decomposition (SVD) and its use in dimensionality reduction.
  • Investigate the interpretation of regression coefficients and residuals in statistical modeling.
USEFUL FOR

Data scientists, statisticians, and researchers involved in regression modeling and data analysis, particularly those dealing with correlated independent variables and seeking to enhance model interpretability.

Adel Makram
My initial objective is to regress a dependent variable ##y## on a given set of independent variables ##x_1##, ##x_2##, ... ##x_m##. Suppose I am dealing with a data set of ##n## samples. I found that the variables are correlated, so I decided to do factor analysis to represent the variables with a smaller number ##k<m## of uncorrelated factors ##v##.
I would just like to know how to regress ##y## on ##v_1##, ##v_2##, ... ##v_k## for each data sample, so that the model takes the form ##y=b_0+b_1 v_1+...+b_k v_k##.
I know the factor loading matrix represents the variables ##x## as a linear combination of the factors ##v##, in the form ##x=Fv## where ##F## is the factor loading matrix, but how does this help in my case? I assume I need the opposite: to represent ##v## in terms of the given ##x##. I thought of extracting ##v## from ##x## by inverting the transformation, but ##F## is not a square matrix, so it cannot be inverted.
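(One standard linear-algebra workaround, not spelled out in the thread: when ##F## is a tall ##m \times k## matrix with full column rank, the Moore-Penrose pseudoinverse gives the least-squares solution of ##x = Fv##, i.e. ##\hat v = F^{+} x##. A minimal numpy sketch with made-up dimensions:)

```python
import numpy as np

# Hypothetical loading matrix F (m=4 observed variables, k=2 factors),
# so x = F v maps factor scores to observed variables.
rng = np.random.default_rng(0)
F = rng.normal(size=(4, 2))        # m x k, not square, so no ordinary inverse
v_true = rng.normal(size=(2,))     # factor scores for one sample
x = F @ v_true                     # observed variables for that sample

# F cannot be inverted, but its Moore-Penrose pseudoinverse gives the
# least-squares solution of x = F v.
v_hat = np.linalg.pinv(F) @ x
print(np.allclose(v_hat, v_true))  # exact recovery in this noise-free case
```

In a real factor-analysis setting the observed ##x## also contains specific variance not captured by ##Fv##, so ##\hat v## is only a least-squares estimate of the factor scores, not an exact recovery as in this toy example.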
 
I thought about the following too: instead of doing factor analysis, I could take the SVD (singular value decomposition) of the original ##n \times m## data matrix. Thereafter, I reduce it to a truncated form ##U_k S_k V_k^T##, where ##V_k## is an ##m \times k## matrix, so the reduced coordinates of the samples are the rows of ##U_k S_k = XV_k##. Then I can do the regression straight from the set of ##(v_i, y_i)## where ##i=1...n##. I am not sure whether this would be a convenient method! And even if I do that, will the number of extracted factors in factor analysis correspond to the number of retained singular values in the reduced form of the data matrix here?
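(The SVD idea above can be sketched in a few lines of numpy. Everything here is synthetic: the data matrix is built to have rank ##k##, and ##k## is chosen by hand rather than read off the singular-value spectrum.)

```python
import numpy as np

# Synthetic n x m data matrix of (nearly) rank k, with correlated columns.
rng = np.random.default_rng(1)
n, m, k = 100, 6, 2
X = rng.normal(size=(n, k)) @ rng.normal(size=(k, m))
beta = rng.normal(size=(m,))
y = X @ beta + 0.01 * rng.normal(size=n)

# Truncated SVD: keep the top k singular directions.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
Z = X @ Vt[:k].T                     # n x k reduced coordinates of each sample

# Ordinary least squares of y on the k reduced coordinates plus an intercept.
A = np.column_stack([np.ones(n), Z])
b, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ b
print(np.corrcoef(y, y_hat)[0, 1])   # close to 1, since X is (nearly) rank k
```

Because ##X## is rank ##k## by construction, ##X\beta## lies in the span of the ##k## scores and the reduced regression loses almost nothing; with real data the choice of ##k## involves a bias/variance trade-off.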
 
You may want to reconsider your decision to use FA or SVD just because the ##x_i##s are correlated. Independent variables are almost always correlated to some extent, yet stepwise regression can still be used. The disadvantage of FA and SVD is that you end up with factors that are combinations of all your ##x_i##s and whose interpretation is obscure. I think it is better to use those techniques only when your goal is to formulate abstract factors and general concepts from the data.

The advantage of stepwise linear regression over FA is that the final model is in terms of a limited number of ##x_i##s, all of which are understandable. Forward stepwise linear regression first introduces the most statistically significant variable. Then it removes the influence of that variable from all the other variables, leaving residuals. Next it considers the variable with the most statistically significant residuals and includes it in the model only if that is statistically justified. It continues in that manner until there are no more statistically significant residuals to include in the model. That process keeps correlated variables out of the model unless there is still something remaining that the later variable is needed to explain. There are algorithms for forward, backward, and bidirectional regression. I recommend bidirectional.
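(A minimal sketch of the forward-selection idea described above: repeatedly pick the unused variable most correlated with the current residual and keep it only if the fit improves enough. Real stepwise procedures use F-tests or information criteria; the fixed ##R^2##-gain threshold here is a simplified stand-in.)

```python
import numpy as np

def forward_stepwise(X, y, min_r2_gain=0.01):
    """Greedy forward selection: add the variable most correlated with
    the current residual; stop when the R^2 gain falls below a threshold."""
    n, m = X.shape
    selected, resid, r2 = [], y - y.mean(), 0.0
    while len(selected) < m:
        remaining = [j for j in range(m) if j not in selected]
        # Candidate: the unused variable most correlated with the residual.
        corrs = [abs(np.corrcoef(X[:, j], resid)[0, 1]) for j in remaining]
        best = remaining[int(np.argmax(corrs))]
        trial = selected + [best]
        A = np.column_stack([np.ones(n), X[:, trial]])
        b, *_ = np.linalg.lstsq(A, y, rcond=None)
        new_resid = y - A @ b
        new_r2 = 1 - new_resid.var() / y.var()
        if new_r2 - r2 < min_r2_gain:   # inclusion not justified: stop
            break
        selected, resid, r2 = trial, new_resid, new_r2
    return selected

# Two strongly correlated predictors plus one independent one.
rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)      # nearly a copy of x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 2 * x1 + 0.5 * x3 + 0.1 * rng.normal(size=n)
print(forward_stepwise(X, y))           # one of the correlated pair, plus x3
```

This illustrates the point above: once one of the correlated pair is in the model, the other explains almost nothing of the remaining residual, so it stays out.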
 