Regression on extracted factors

  • Context: Undergrad
  • Thread starter: Adel Makram
  • Tags: Factors, Regression
SUMMARY

The discussion focuses on performing regression analysis on a dependent variable ##y## using a set of independent variables ##x_1##, ##x_2##...##x_m##, and the challenges faced when employing factor analysis (FA) and singular value decomposition (SVD). The user contemplates using FA to reduce correlated variables into uncorrelated factors ##v_1##, ##v_2##...##v_k## but struggles with the inverse transformation of the factor loading matrix ##F##. The consensus suggests that stepwise linear regression is preferable, as it maintains interpretability by using a limited number of understandable independent variables, with recommendations for bidirectional regression algorithms to manage correlated variables effectively.

PREREQUISITES
  • Understanding of regression analysis techniques, specifically stepwise regression.
  • Familiarity with factor analysis (FA) and singular value decomposition (SVD).
  • Knowledge of linear algebra, particularly matrix operations and transformations.
  • Statistical significance testing and residual analysis in regression models.
NEXT STEPS
  • Learn about stepwise regression algorithms, including forward, backward, and bidirectional methods.
  • Study the principles of factor analysis and its applications in data reduction.
  • Explore singular value decomposition (SVD) and its use in dimensionality reduction.
  • Investigate the interpretation of regression coefficients and residuals in statistical modeling.
USEFUL FOR

Data scientists, statisticians, and researchers involved in regression modeling and data analysis, particularly those dealing with correlated independent variables and seeking to enhance model interpretability.

Adel Makram
My initial objective is to regress a dependent variable ##y## on a given set of independent variables ##x_1##, ##x_2##, ... ##x_m##. Suppose I am dealing with a data set of ##n## samples. I found that the variables are correlated, so I decided to do factor analysis to represent the variables with a smaller number ##k<m## of uncorrelated factors ##v##.
I would just like to know how to regress ##y## on ##v_1##, ##v_2##, ... ##v_k## for each data sample, so that the model takes the form ##y=b_0+b_1 v_1+...+b_k v_k##.
I know the factor loading matrix represents the variables ##x## as a linear combination of the factors ##v##, in the form ##x=Fv## where ##F## is the factor loading matrix, but how does this help in my case? I assume I need the opposite: to represent ##v## in terms of the given ##x##. I thought of extracting ##v## from ##x## by inverting the transformation, but ##F## is not a square matrix, so it cannot be inverted.
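(One standard linear-algebra workaround, not spelled out in the thread: when ##F## is a tall ##m \times k## matrix with full column rank, the Moore-Penrose pseudoinverse gives the least-squares solution of ##x = Fv##, i.e. ##\hat v = F^{+} x##. A minimal numpy sketch with made-up dimensions:)

```python
import numpy as np

# Hypothetical loading matrix F (m=4 observed variables, k=2 factors),
# so x = F v maps factor scores to observed variables.
rng = np.random.default_rng(0)
F = rng.normal(size=(4, 2))        # m x k, not square, so no ordinary inverse
v_true = rng.normal(size=(2,))     # factor scores for one sample
x = F @ v_true                     # observed variables for that sample

# F cannot be inverted, but its Moore-Penrose pseudoinverse gives the
# least-squares solution of x = F v.
v_hat = np.linalg.pinv(F) @ x
print(np.allclose(v_hat, v_true))  # exact recovery in this noise-free case
```

In a real factor-analysis setting the observed ##x## also contains specific variance not captured by ##Fv##, so ##\hat v## is only a least-squares estimate of the factor scores, not an exact recovery as in this toy example.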
 
I thought about the following too: instead of doing factor analysis, I could take the SVD (singular value decomposition) of the original ##n \times m## data matrix. Thereafter, I reduce it to a truncated form ##U_k S_k V_k^T##, where ##V_k## is an ##m \times k## matrix, so the reduced coordinates of the samples are the rows of ##U_k S_k = XV_k##. Then I can do the regression straight from the set of ##(v_i, y_i)## where ##i=1...n##. I am not sure whether this would be a convenient method! And even if I do that, will the number of extracted factors in factor analysis correspond to the number of retained singular values in the reduced form of the data matrix here?
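(The SVD idea above can be sketched in a few lines of numpy. Everything here is synthetic: the data matrix is built to have rank ##k##, and ##k## is chosen by hand rather than read off the singular-value spectrum.)

```python
import numpy as np

# Synthetic n x m data matrix of (nearly) rank k, with correlated columns.
rng = np.random.default_rng(1)
n, m, k = 100, 6, 2
X = rng.normal(size=(n, k)) @ rng.normal(size=(k, m))
beta = rng.normal(size=(m,))
y = X @ beta + 0.01 * rng.normal(size=n)

# Truncated SVD: keep the top k singular directions.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
Z = X @ Vt[:k].T                     # n x k reduced coordinates of each sample

# Ordinary least squares of y on the k reduced coordinates plus an intercept.
A = np.column_stack([np.ones(n), Z])
b, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ b
print(np.corrcoef(y, y_hat)[0, 1])   # close to 1, since X is (nearly) rank k
```

Because ##X## is rank ##k## by construction, ##X\beta## lies in the span of the ##k## scores and the reduced regression loses almost nothing; with real data the choice of ##k## involves a bias/variance trade-off.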
 
You may want to reconsider your decision to use FA or SVD just because the ##x_i##s are correlated. Independent variables are almost always correlated to some extent, yet stepwise regression can still be used. The disadvantage of FA and SVD is that you end up with factors that are combinations of all your ##x_i##s and whose interpretation is obscure. I think it is better to use those techniques only when your goal is to formulate abstract factors and general concepts from the data.

The advantage of stepwise linear regression over FA is that the final model is in terms of a limited number of ##x_i##s, all of which are understandable. Forward stepwise linear regression first introduces the most statistically significant variable. Then it removes the influence of that variable from all the other variables, leaving residuals. Next it considers the variable with the most statistically significant residuals and includes it in the model only if that is statistically justified. It continues in that manner until there are no more statistically significant residuals to include in the model. That process keeps correlated variables out of the model unless there is still something remaining that the later variable is needed to explain. There are algorithms for forward, backward, and bidirectional regression. I recommend bidirectional.
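(A minimal sketch of the forward-selection idea described above: repeatedly pick the unused variable most correlated with the current residual and keep it only if the fit improves enough. Real stepwise procedures use F-tests or information criteria; the fixed ##R^2##-gain threshold here is a simplified stand-in.)

```python
import numpy as np

def forward_stepwise(X, y, min_r2_gain=0.01):
    """Greedy forward selection: add the variable most correlated with
    the current residual; stop when the R^2 gain falls below a threshold."""
    n, m = X.shape
    selected, resid, r2 = [], y - y.mean(), 0.0
    while len(selected) < m:
        remaining = [j for j in range(m) if j not in selected]
        # Candidate: the unused variable most correlated with the residual.
        corrs = [abs(np.corrcoef(X[:, j], resid)[0, 1]) for j in remaining]
        best = remaining[int(np.argmax(corrs))]
        trial = selected + [best]
        A = np.column_stack([np.ones(n), X[:, trial]])
        b, *_ = np.linalg.lstsq(A, y, rcond=None)
        new_resid = y - A @ b
        new_r2 = 1 - new_resid.var() / y.var()
        if new_r2 - r2 < min_r2_gain:   # inclusion not justified: stop
            break
        selected, resid, r2 = trial, new_resid, new_r2
    return selected

# Two strongly correlated predictors plus one independent one.
rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)      # nearly a copy of x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 2 * x1 + 0.5 * x3 + 0.1 * rng.normal(size=n)
print(forward_stepwise(X, y))           # one of the correlated pair, plus x3
```

This illustrates the point above: once one of the correlated pair is in the model, the other explains almost nothing of the remaining residual, so it stays out.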
 