Limitations of Multivariate Linear Regression


Discussion Overview

The discussion revolves around the limitations and considerations of using multivariate linear regression, particularly in the context of independent variables that may exhibit curvilinear relationships with the dependent variable or among themselves. Participants explore various scenarios and methodologies related to regression analysis, including transformations and stepwise regression techniques.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • Some participants note that multivariate linear regression assumes a linear relationship between the dependent variable and the independent variables, and that ideally the independent variables are independent of one another.
  • There is a suggestion that transformations of the independent variables can be applied if their relationship to the dependent variable is nonlinear, allowing for continued use of linear regression.
  • Participants discuss the implications of collinearity, with some arguing that while collinearity can complicate the interpretation of regression results, multivariate linear regression can still be valid under certain conditions.
  • One participant mentions that if curvilinear relationships exist, generalized linear models may be more appropriate, depending on the nature of the relationships involved.
  • Concerns are raised about the stability and accuracy of fit parameters when independent variables are correlated, with suggestions to drop one of the correlated predictors to improve interpretability.
  • There is a discussion about the use of stepwise regression techniques, with differing opinions on their application and the importance of data splitting for model validation.
  • Some participants advocate for a Bayesian perspective, suggesting that the assumptions underlying multivariate linear regression may not hold if there are curvilinear relationships or non-Gaussian noise in the data.
  • One participant challenges the claim that Gaussian errors are a traditional regression assumption, noting that assuming them adds one more item to the model's assumptions.

Areas of Agreement / Disagreement

Participants express multiple competing views regarding the application of multivariate linear regression in the presence of curvilinear relationships and collinearity. The discussion remains unresolved, with no consensus on the best approach to take under these conditions.

Contextual Notes

Limitations include the potential instability of fit parameters in the presence of collinearity, the need for transformations when relationships are nonlinear, and the implications of using stepwise regression without proper validation. The discussion highlights the complexity of assumptions required for multivariate linear regression.

fog37
TL;DR: Understand the possible limitations of multivariate linear regression
Hello,

With multivariate linear regression, there is a single dependent variable ##y## and multiple independent variables ##x_1##, ##x_2##, ##x_3##, etc.
There is a linear, weighted relationship between ##y## and the various ##x## variables:
$$ y = c_1 x_1 + c_2 x_2 + c_3 x_3 $$
The independent variables are ideally totally independent from each other. Otherwise we run into the problem of collinearity. However, multivariate linear regression can still be used if pairs of independent variables are linearly related...

What happens if we discover that one or two of the independent variables ##x## have a curvilinear correlation with the dependent variable ##y## while the others have a linear correlation? Or if there is curvilinear correlation between the independent variables themselves?
Should multivariate linear regression still be used?

Thank you!
 
You can use transformations of the ##x_i## variables if their relationship to ##y## looks nonlinear. The new transformed variables can then be used in the linear regression. In fact, it is common to include terms like ##c_i x_i x_k##. (Because you are allowed to transform the ##x_i## variables, the term "linear" refers more to how the ##c_i## coefficients appear than to how the ##x_i## variables appear.)
Also, there are stepwise regression techniques that will not include an ##x_i## variable unless it gives a statistically significant improvement to the fit. So stepwise linear regression can still be applied even if the ##x_i## variables are strongly dependent. That is a good thing, since total independence among many variables is uncommon in applications.
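To make the transformation idea concrete, here is a minimal sketch (in Python with numpy; the thread itself contains no code, and all names and data here are illustrative): the regression stays linear in the coefficients even though the columns are nonlinear functions of the original variables.

```python
# Sketch of the transformation point above: fit y on log(x1), x2, and an
# interaction term x1*x2. Illustrative names and data, not from the thread.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(1, 10, n)
x2 = rng.uniform(1, 10, n)

# Simulated data: y is curvilinear in x1 and includes an interaction term.
y = 2.0 * np.log(x1) + 0.5 * x2 + 0.3 * x1 * x2 + rng.normal(0, 0.1, n)

# Design matrix of transformed columns; the model is still linear in the
# unknown coefficients c_i, which is what "linear" regression requires.
X = np.column_stack([np.log(x1), x2, x1 * x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # approximately [2.0, 0.5, 0.3]
```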
 
fog37 said:
Summary:: Understand the possible limitations of multivariate linear regression

What happens if we discover that one or two of the independent variables ##x## have a curvilinear correlation with the dependent variable ##y## while the others have a linear correlation?
This can be handled in a couple of different ways depending on the relationship. If the dependent variable is a linear combination of functions of the predictors, ##y=c_1 f_1(x_1, \ldots)+c_2 \ldots##, then you can still use multivariate linear regression. If the dependent variable is some function of a linear combination of the predictors, ##y=f(c_1 x_1 + c_2 \ldots)##, then you can use a generalized linear model.
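A rough sketch of the second case Dale describes (assuming Python with numpy and statsmodels, neither of which the thread mentions; the choice of ##f = \exp## and all names are mine): a generalized linear model with a log link recovers the coefficients inside the nonlinearity.

```python
# Case 2 from the post above: y = f(c1*x1 + c2*x2 + ...), here with f = exp,
# fitted as a GLM with a log link. (Case 1 is plain least squares on
# transformed columns, as in the earlier sketch.) Illustrative only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
X = sm.add_constant(rng.normal(size=(n, 2)))  # intercept plus two predictors
beta = np.array([0.2, 0.5, -0.3])

y = np.exp(X @ beta) + rng.normal(0, 0.05, n)  # f(linear combination) + noise

glm = sm.GLM(y, X, family=sm.families.Gaussian(link=sm.families.links.Log()))
print(glm.fit().params)  # close to beta
```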

fog37 said:
Or if there is curvilinear correlation between the independent variables themselves?
Should multivariate linear regression still be used?
This is tricky. When you do this, the result of the overall regression is valid; however, the estimates of the fit parameters ##c_i## for the correlated predictors are unstable and inaccurate. So there are things that you can do with that model, but there are lots of problematic inferences you can make too. When I have a model with strong collinearity in the predictors I usually try to drop one of the predictors from the model. You will lose a little ##R^2##, but there is less danger in interpreting the results.
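The instability Dale describes is easy to see in a small simulation (a sketch in plain numpy, not from the thread): two nearly identical predictors make the individual coefficients swing wildly, while dropping one restores a stable, interpretable estimate.

```python
# Two nearly collinear predictors: the overall fit is fine, but the
# individual coefficients c_i are unstable. Illustrative sketch only.
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(0, 0.01, n)          # x2 is almost a copy of x1
y = 1.0 * x1 + 1.0 * x2 + rng.normal(0, 1.0, n)

X = np.column_stack([x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)                 # individual c_i wander far from (1, 1)
print(np.linalg.cond(X))    # a huge condition number flags the problem

# Dropping one predictor, as suggested above, stabilizes the estimate:
coef1, *_ = np.linalg.lstsq(X[:, :1], y, rcond=None)
print(coef1)                # about 2, absorbing both effects
```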
 
fog37 said:
Summary:: Understand the possible limitations of multivariate linear regression

Hello,

With multivariate linear regression, there is a single dependent variable ##y## and multiple independent variables ##x_1##, ##x_2##, ##x_3##, etc.
There is a linear, weighted relationship between ##y## and the various ##x## variables:
$$ y = c_1 x_1 + c_2 x_2 + c_3 x_3 $$
The independent variables are ideally totally independent from each other. Otherwise we run into the problem of collinearity. However, multivariate linear regression can still be used if pairs of independent variables are linearly related...

Collinearity would imply that one variable is a linear combination of the other two. The variables can be correlated (i.e. not independent) without being collinear, in which case multivariate linear regression should still do ok.

fog37 said:
What happens if we discover that one or two of the independent variables ##x## have a curvilinear correlation with the dependent variable ##y## while the others have a linear correlation? Or if there is curvilinear correlation between the independent variables themselves?
Should multivariate linear regression still be used?

Thank you!

It might help to take a Bayesian perspective. Performing multivariate linear regression is equivalent to assuming that the data follow a linear-Gaussian model in which the predicted variable is a linear combination of the regressors corrupted by additive Gaussian noise. If in fact there is some curvilinear relationship or non-Gaussian noise in the data, then multivariate linear regression is no longer the optimal method. If we knew the form of the curvilinear relationship then we could fit a model to the data which reflects that structure we believe to be present. If we don't know the form of the curvilinear relationship then various "no free lunch" theorems tell us that there is no one optimal method.
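The equivalence madness mentions can be checked numerically (a sketch in Python, assuming numpy and scipy; not part of the original post): minimizing squared error and maximizing the Gaussian log-likelihood over the coefficients give the same estimate.

```python
# Least squares coincides with the maximum-likelihood estimate under the
# linear-Gaussian model described above. Illustrative sketch.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n = 300
X = rng.normal(size=(n, 2))
beta = np.array([1.5, -0.7])
y = X @ beta + rng.normal(0, 0.5, n)

# Least-squares solution
ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Negative Gaussian log-likelihood in the coefficients (up to constants
# and the noise variance, which do not change the argmax)
def neg_loglik(b):
    r = y - X @ b
    return 0.5 * np.sum(r ** 2)

mle = minimize(neg_loglik, x0=np.zeros(2)).x
print(ols, mle)  # identical up to solver tolerance
```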
 
Dale said:
This is tricky. When you do this the result of the overall regression is valid, however the estimates of the fit parameters ##c_i## for the correlated predictors are unstable and inaccurate.
If some ##x_i## variables are linearly dependent, there are trade-offs that allow multiple answers, but any of those answers is correct and accurate. If there are variables that are strongly correlated, there are trade-offs, but some are statistically better predictors of the dependent variable than others. A stepwise regression algorithm would include the better predictors and only include others if they were still statistically significant in reducing the remaining SSE.
Dale said:
So there are things that you can do with that model, but there are lots of problematic inferences you can make too.
Yes, but there are always dangers in interpreting a regression. Correlation does not imply causation.
Dale said:
When I have a model with strong collinearity in the predictors I usually try to drop one of the predictors from the model. You will lose a little ##R^2##, but there is less danger in interpreting the results.
That is what a backward elimination stepwise regression would do in a very methodical way.
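For concreteness, a backward elimination pass might look like the following sketch (Python with numpy and statsmodels; the significance threshold and all names are my assumptions, not anything prescribed in the thread): start from the full model and repeatedly drop the predictor with the largest p-value until every remaining one is significant.

```python
# A rough sketch of backward elimination, one common stepwise scheme.
import numpy as np
import statsmodels.api as sm

def backward_eliminate(X, y, alpha=0.05):
    cols = list(range(X.shape[1]))
    while cols:
        res = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
        pvals = res.pvalues[1:]      # skip the intercept
        worst = int(np.argmax(pvals))
        if pvals[worst] <= alpha:
            break                    # everything left is significant
        cols.pop(worst)              # drop the least significant predictor
    return cols

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(0, 1.0, 200)
print(backward_eliminate(X, y))  # typically keeps columns [0, 2]
```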
 
FactChecker said:
That is what a backward elimination stepwise regression would do in a very methodical way.
That is indeed one approach, but personally I prefer to do the elimination before the regression using any relevant non-statistical problem-specific knowledge I have to inform the model. If you use stepwise regression then you need to split your data into one pool for generating the model and a separate pool for testing it. Too many people don't do that when they use stepwise regression.
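A minimal sketch of the splitting discipline described above (plain numpy; the kept columns are stand-ins for whatever a stepwise pass actually selects): do all selection on the training pool, then judge the chosen model only on held-out data.

```python
# Select the model on one pool of data, evaluate it on a separate pool.
import numpy as np

rng = np.random.default_rng(5)
n = 400
X = rng.normal(size=(n, 5))
y = 1.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1.0, n)

idx = rng.permutation(n)
train, test = idx[: n // 2], idx[n // 2 :]

# ... run stepwise selection on (X[train], y[train]) only ...
keep = [0, 1]  # pretend these columns survived selection on the training pool

coef, *_ = np.linalg.lstsq(X[train][:, keep], y[train], rcond=None)
resid = y[test] - X[test][:, keep] @ coef
print(np.mean(resid ** 2))  # honest out-of-sample error estimate
```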
 
madness said:
Collinearity would imply that one variable is a linear combination of the other two. The variables can be correlated (i.e. not independent) without being collinear, in which case multivariate linear regression should still do ok.
It might help to take a Bayesian perspective. Performing multivariate linear regression is equivalent to assuming that the data follow a linear-Gaussian model in which the predicted variable is a linear combination of the regressors corrupted by additive Gaussian noise. If in fact there is some curvilinear relationship or non-Gaussian noise in the data, then multivariate linear regression is no longer the optimal method. If we knew the form of the curvilinear relationship then we could fit a model to the data which reflects that structure we believe to be present. If we don't know the form of the curvilinear relationship then various "no free lunch" theorems tell us that there is no one optimal method.
"Performing multivariate linear regression is equivalent to assuming that the data follow a linear-Gaussian model in which the predicted variable is a linear combination of the regressors corrupted by additive Gaussian noise."

No. The assumption of Gaussian errors is not one of the traditional regression assumptions. If you make that assumption you are adding one more item to your assumptions about the relationship.
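The point above is easy to demonstrate by simulation (a sketch in plain numpy; my construction, not anything posted in the thread): with heavy-tailed, decidedly non-Gaussian noise, the least-squares coefficient estimates are still unbiased, consistent with the Gauss-Markov theorem, which does not require Gaussian errors.

```python
# Least squares does not need Gaussian errors to be unbiased: average the
# estimates over many simulated fits with Laplace (heavy-tailed) noise.
import numpy as np

rng = np.random.default_rng(6)
beta = np.array([1.0, -2.0])
estimates = []
for _ in range(2000):
    X = rng.normal(size=(50, 2))
    y = X @ beta + rng.laplace(0, 1.0, 50)   # decidedly non-Gaussian noise
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    estimates.append(coef)
print(np.mean(estimates, axis=0))  # close to [1.0, -2.0]
```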
 
