Regression on extracted factors

In summary, the thread discusses regression methods for dealing with correlated variables. The original poster considers using factor analysis to reduce the number of variables but is unsure how to express the extracted factors in terms of the original variables; they also raise SVD as an alternative and ask how the number of retained factors compares between the two. A reply suggests stepwise linear regression instead, which introduces the most significant variable and then removes its influence from the remaining variables, and recommends the bidirectional variant to keep correlated variables from being included unnecessarily.
  • #1
Adel Makram
My initial objective is to regress a dependent variable ##y## on a given set of independent variables ##x_1##, ##x_2##, ... ##x_m##. Suppose I am dealing with a data set of ##n## samples. I found that the variables are correlated, so I decided to do factor analysis to represent them with a smaller number ##k<m## of uncorrelated factors ##v##.
I would just like to know how to regress ##y## on ##v_1##, ##v_2## ... ##v_k## for each data sample, so that the model takes the form ##y=b_0+b_1 v_1+...+b_k v_k##.
I know the factor loading matrix represents each variable ##x## as a linear combination of the factors ##v##, in the form ##x=Fv## where ##F## is the factor loading matrix, but how does this help in my case? I assume I need the opposite: to represent ##v## in terms of the given ##x##. I thought of extracting ##v## from ##x## by the inverse transformation, but ##F## is not a square matrix, so it cannot be inverted.
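Here is a minimal numpy sketch of this setup, with synthetic placeholder data. Since ##F## is ##m \times k## rather than square, the sketch uses the Moore-Penrose pseudo-inverse, which yields ##v = (F^T F)^{-1} F^T x##, the least-squares solution of ##x = Fv## and one standard factor-score estimate:

```python
import numpy as np

# Synthetic stand-ins: X is n x m (samples x variables), F is m x k (loadings).
rng = np.random.default_rng(0)
n, m, k = 100, 6, 2
F = rng.normal(size=(m, k))            # placeholder for a fitted loading matrix
X = rng.normal(size=(n, k)) @ F.T      # data generated so that x = F v holds

# F is not square, so use the Moore-Penrose pseudo-inverse:
# v = (F^T F)^{-1} F^T x is the least-squares solution of x = F v.
V = X @ np.linalg.pinv(F).T            # n x k matrix of estimated factor scores

# Regress y on the scores: y = b0 + b1 v1 + ... + bk vk
y = rng.normal(size=n)                 # placeholder response
A = np.column_stack([np.ones(n), V])
b, *_ = np.linalg.lstsq(A, y, rcond=None)
print(b)                               # b[0] is b0; b[1:] are b1 ... bk
```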
 
  • #2
I thought about the following too: instead of doing factor analysis, I could do an SVD (singular value decomposition) of the original ##m \times n## data matrix. Thereafter, I reduce the matrix to the truncated form ##USV^T##, where ##V## is the ##n## x ##k## matrix. Then I can do the regression straight from the set of ##(v_i,y_i)## where ##i=1...n##. I am not sure whether this would be a convenient method! And even if I do that, will the number of factors extracted by factor analysis correspond to the number of eigenvalues kept in the reduced form of the data matrix here?
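Concretely, the computation would look something like this (a numpy sketch with synthetic placeholder data, reading the data matrix as ##m \times n## so that the top ##k## rows of ##V^T## give one score vector per sample):

```python
import numpy as np

# Synthetic stand-ins: X is the m x n data matrix (variables x samples),
# as described above; y holds the n responses.
rng = np.random.default_rng(1)
m, n, k = 6, 100, 2
X = rng.normal(size=(m, n))
y = rng.normal(size=n)

# SVD, then keep the k largest singular values: the reduced form U S V^T.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Vk = Vt[:k, :].T                       # n x k: one score vector per sample

# Regression straight from the pairs (v_i, y_i), i = 1 ... n.
A = np.column_stack([np.ones(n), Vk])
b, *_ = np.linalg.lstsq(A, y, rcond=None)
print(b)
```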
 
  • #3
You may want to reconsider your decision to use FA or SVD just because the ##x_i##s are correlated. Independent variables are almost always correlated to some extent, yet stepwise regression can still be used. The disadvantage of FA and SVD is that you end up with factors that are combinations of all your ##x_i##s and whose interpretation is obscure. I think those techniques are better reserved for when your goal is to formulate abstract factors and general concepts from the data.

The advantage of stepwise linear regression over FA is that the final model is in terms of a limited number of the ##x_i##s, all of which are understandable. Forward stepwise linear regression would first introduce the most statistically significant variable. Then it would remove the influence of that variable from all the other variables, leaving residuals. Then it would consider the variable that explains the residuals most significantly and include it in the model only if it is statistically justified. It continues in that manner until there are no more statistically significant residuals to include in the model. That process keeps correlated variables out of the model unless there is still something remaining that the later variable is needed to explain. There are algorithms for forward, backward, and bidirectional regression. I recommend bidirectional.
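To illustrate, here is a minimal sketch of the forward pass only (the bidirectional variant would also re-test and drop variables that lose significance). It uses statsmodels for the OLS fits; statsmodels does not ship a stepwise routine, so the selection loop, the 0.05 threshold, and the synthetic data are all placeholder choices:

```python
import numpy as np
import statsmodels.api as sm

# Synthetic data with a deliberately correlated pair of predictors.
rng = np.random.default_rng(2)
n, m = 200, 6
X = rng.normal(size=(n, m))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=n)   # column 1 nearly duplicates column 0
y = 2.0 * X[:, 0] + rng.normal(size=n)

selected, remaining = [], list(range(m))
while remaining:
    # Fit y on the current model plus each candidate; keep the candidate
    # whose coefficient has the smallest p-value, if it clears the threshold.
    pvals = {}
    for j in remaining:
        A = sm.add_constant(X[:, selected + [j]])
        pvals[j] = sm.OLS(y, A).fit().pvalues[-1]
    best = min(pvals, key=pvals.get)
    if pvals[best] > 0.05:
        break
    selected.append(best)
    remaining.remove(best)

# The correlated twin (column 1) is typically excluded: once column 0 is in
# the model, column 1 has nothing significant left to explain.
print("selected columns:", selected)
```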
 

What is "Regression on extracted factors"?

"Regression on extracted factors" is a statistical method used to analyze the relationship between a dependent variable and a set of independent variables that have been extracted from a larger set of variables. It involves identifying the most important factors that contribute to the variation in the dependent variable and using them to build a regression model.

How is "Regression on extracted factors" different from traditional regression?

The main difference between "Regression on extracted factors" and traditional regression is that the former involves a data-reduction step in which a smaller set of derived variables is used for the regression analysis. This can simplify the regression model and improve its accuracy by removing irrelevant or redundant variables.

What are the benefits of using "Regression on extracted factors"?

There are several benefits to using "Regression on extracted factors" including improved model performance, increased interpretability of results, and the ability to handle multicollinearity (high correlation between independent variables). It can also help to reduce the risk of overfitting and improve the generalizability of the model.

What is the process for performing "Regression on extracted factors"?

The process for performing "Regression on extracted factors" involves several steps including data preprocessing, factor extraction using techniques such as principal component analysis or factor analysis, selecting the most important factors based on criteria such as eigenvalues or factor loadings, and building a regression model using the selected factors. The model can then be evaluated and refined as needed.
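As a rough illustration of those steps, here is a principal-component-regression sketch in Python with scikit-learn; the standardization, the eigenvalue-greater-than-one cutoff, and the synthetic data are placeholder choices rather than part of the method's definition:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

# Synthetic placeholder data: n samples of m correlated-or-not predictors.
rng = np.random.default_rng(3)
n, m = 150, 8
X = rng.normal(size=(n, m))
y = rng.normal(size=n)

# 1. Preprocessing: standardize each column.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Factor extraction via PCA.
pca = PCA().fit(Xs)

# 3. Select components by an eigenvalue criterion (keep eigenvalue > 1).
k = int(np.sum(pca.explained_variance_ > 1.0))
scores = pca.transform(Xs)[:, :k]      # the "extracted factors"

# 4. Build the regression model on the selected factor scores.
model = LinearRegression().fit(scores, y)
print(k, model.intercept_, model.coef_)
```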

When is "Regression on extracted factors" most useful?

"Regression on extracted factors" is most useful in situations where there are a large number of variables available but only a few of them are likely to have a significant impact on the dependent variable. This method can help to simplify the analysis and improve the accuracy and interpretability of the results.
