Limitations of Multivariate Linear Regression


Discussion Overview

The discussion revolves around the limitations and considerations of using multivariate linear regression, particularly in the context of independent variables that may exhibit curvilinear relationships with the dependent variable or among themselves. Participants explore various scenarios and methodologies related to regression analysis, including transformations and stepwise regression techniques.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • Some participants note that multivariate linear regression assumes a linear relationship between the dependent variable and the independent variables, and that ideally the independent variables are independent of one another.
  • There is a suggestion that transformations of the independent variables can be applied if their relationship to the dependent variable is nonlinear, allowing for continued use of linear regression.
  • Participants discuss the implications of collinearity, with some arguing that while collinearity can complicate the interpretation of regression results, multivariate linear regression can still be valid under certain conditions.
  • One participant mentions that if curvilinear relationships exist, generalized linear models may be more appropriate, depending on the nature of the relationships involved.
  • Concerns are raised about the stability and accuracy of fit parameters when independent variables are correlated, with suggestions to drop one of the correlated predictors to improve interpretability.
  • There is a discussion about the use of stepwise regression techniques, with differing opinions on their application and the importance of data splitting for model validation.
  • Some participants advocate for a Bayesian perspective, suggesting that the assumptions underlying multivariate linear regression may not hold if there are curvilinear relationships or non-Gaussian noise in the data.
  • One participant challenges the claim that Gaussian errors are a traditional regression assumption, noting that assuming them adds one more item to the model's assumptions.

Areas of Agreement / Disagreement

Participants express multiple competing views regarding the application of multivariate linear regression in the presence of curvilinear relationships and collinearity. The discussion remains unresolved, with no consensus on the best approach to take under these conditions.

Contextual Notes

Limitations include the potential instability of fit parameters in the presence of collinearity, the need for transformations when relationships are nonlinear, and the implications of using stepwise regression without proper validation. The discussion highlights the complexity of assumptions required for multivariate linear regression.

fog37
TL;DR: Understand the possible limitations of multivariate linear regression
Hello,

With multivariate linear regression, there is a single dependent variable ##y## and multiple independent variables ##x_1##, ##x_2##, ##x_3##, etc.
There is a linear, weighted relationship between ##y## and the various ##x## variables:
$$ y = c_1 x_1 + c_2 x_2 + c_3 x_3 $$
The independent variables are ideally totally independent from each other. Otherwise we run into the problem of collinearity. However, multivariate linear regression can still be used if pairs of independent variables are linearly related...

What happens if we discover that one or two of the independent variables ##x## have a curvilinear correlation with the dependent variable ##y## while the others have a linear correlation? Or if there is curvilinear correlation between the independent variables themselves?
Should multivariate linear regression still be used?

Thank you!
 
You can use transformations of the ##x_i## variables if their relationship to ##y## looks nonlinear. The new transformed variables can then be used in the linear regression. In fact, it is common to include terms like ##c_i x_i x_k##. (Because you are allowed to transform the ##x_i## variables, the term "linear" refers more to how the ##c_i## coefficients appear than to how the ##x_i## variables appear.)
Also, there are stepwise regression techniques that will not include an ##x_i## variable unless it gives a statistically significant improvement to the fit. So stepwise linear regression can still be applied even if the ##x_i## variables are strongly dependent. That is a good thing, since total independence among many variables is uncommon in applications.
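To make the transformation idea concrete, here is a minimal sketch (in Python with numpy; the thread itself contains no code, and all names and data here are illustrative): the regression stays linear in the coefficients even though the columns are nonlinear functions of the original variables.

```python
# Sketch of the transformation point above: fit y on log(x1), x2, and an
# interaction term x1*x2. Illustrative names and data, not from the thread.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(1, 10, n)
x2 = rng.uniform(1, 10, n)

# Simulated data: y is curvilinear in x1 and includes an interaction term.
y = 2.0 * np.log(x1) + 0.5 * x2 + 0.3 * x1 * x2 + rng.normal(0, 0.1, n)

# Design matrix of transformed columns; the model is still linear in the
# unknown coefficients c_i, which is what "linear" regression requires.
X = np.column_stack([np.log(x1), x2, x1 * x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # approximately [2.0, 0.5, 0.3]
```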
 
fog37 said:
Summary:: Understand the possible limitations of multivariate linear regression

What happens if we discover that one or two of the independent variables ##x## have a curvilinear correlation with the dependent variable ##y## while the others have a linear correlation?
This can be handled in a couple of different ways depending on the relationship. If the dependent variable is a linear combination of functions of the predictors, ##y=c_1 f_1(x_1, \ldots)+c_2 \ldots##, then you can still use multivariate linear regression. If the dependent variable is some function of a linear combination of the predictors, ##y=f(c_1 x_1 + c_2 \ldots)##, then you can use a generalized linear model.
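A rough sketch of the second case Dale describes (assuming Python with numpy and statsmodels, neither of which the thread mentions; the choice of ##f = \exp## and all names are mine): a generalized linear model with a log link recovers the coefficients inside the nonlinearity.

```python
# Case 2 from the post above: y = f(c1*x1 + c2*x2 + ...), here with f = exp,
# fitted as a GLM with a log link. (Case 1 is plain least squares on
# transformed columns, as in the earlier sketch.) Illustrative only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
X = sm.add_constant(rng.normal(size=(n, 2)))  # intercept plus two predictors
beta = np.array([0.2, 0.5, -0.3])

y = np.exp(X @ beta) + rng.normal(0, 0.05, n)  # f(linear combination) + noise

glm = sm.GLM(y, X, family=sm.families.Gaussian(link=sm.families.links.Log()))
print(glm.fit().params)  # close to beta
```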

fog37 said:
Or if there is curvilinear correlation between the independent variables themselves?
Should multivariate linear regression still be used?
This is tricky. When you do this, the result of the overall regression is valid; however, the estimates of the fit parameters ##c_i## for the correlated predictors are unstable and inaccurate. So there are things that you can do with that model, but there are lots of problematic inferences you can make too. When I have a model with strong collinearity in the predictors I usually try to drop one of the predictors from the model. You will lose a little ##R^2##, but there is less danger in interpreting the results.
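The instability Dale describes is easy to see in a small simulation (a sketch in plain numpy, not from the thread): two nearly identical predictors make the individual coefficients swing wildly, while dropping one restores a stable, interpretable estimate.

```python
# Two nearly collinear predictors: the overall fit is fine, but the
# individual coefficients c_i are unstable. Illustrative sketch only.
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(0, 0.01, n)          # x2 is almost a copy of x1
y = 1.0 * x1 + 1.0 * x2 + rng.normal(0, 1.0, n)

X = np.column_stack([x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)                 # individual c_i wander far from (1, 1)
print(np.linalg.cond(X))    # a huge condition number flags the problem

# Dropping one predictor, as suggested above, stabilizes the estimate:
coef1, *_ = np.linalg.lstsq(X[:, :1], y, rcond=None)
print(coef1)                # about 2, absorbing both effects
```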
 
fog37 said:
Summary:: Understand the possible limitations of multivariate linear regression

Hello,

With multivariate linear regression, there is a single dependent variable ##y## and multiple independent variables ##x_1##, ##x_2##, ##x_3##, etc.
There is a linear, weighted relationship between ##y## and the various ##x## variables:
$$ y = c_1 x_1 + c_2 x_2 + c_3 x_3 $$
The independent variables are ideally totally independent from each other. Otherwise we run into the problem of collinearity. However, multivariate linear regression can still be used if pairs of independent variables are linearly related...

Collinearity would imply that one variable is a linear combination of the other two. The variables can be correlated (i.e. not independent) without being collinear, in which case multivariate linear regression should still do ok.

fog37 said:
What happens if we discover that one or two of the independent variables ##x## have a curvilinear correlation with the dependent variable ##y## while the others have a linear correlation? Or if there is curvilinear correlation between the independent variables themselves?
Should multivariate linear regression still be used?

Thank you!

It might help to take a Bayesian perspective. Performing multivariate linear regression is equivalent to assuming that the data follow a linear-Gaussian model in which the predicted variable is a linear combination of the regressors corrupted by additive Gaussian noise. If in fact there is some curvilinear relationship or non-Gaussian noise in the data, then multivariate linear regression is no longer the optimal method. If we knew the form of the curvilinear relationship then we could fit a model to the data which reflects that structure we believe to be present. If we don't know the form of the curvilinear relationship then various "no free lunch" theorems tell us that there is no one optimal method.
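The equivalence madness mentions can be checked numerically (a sketch in Python, assuming numpy and scipy; not part of the original post): minimizing squared error and maximizing the Gaussian log-likelihood over the coefficients give the same estimate.

```python
# Least squares coincides with the maximum-likelihood estimate under the
# linear-Gaussian model described above. Illustrative sketch.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n = 300
X = rng.normal(size=(n, 2))
beta = np.array([1.5, -0.7])
y = X @ beta + rng.normal(0, 0.5, n)

# Least-squares solution
ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Negative Gaussian log-likelihood in the coefficients (up to constants
# and the noise variance, which do not change the argmax)
def neg_loglik(b):
    r = y - X @ b
    return 0.5 * np.sum(r ** 2)

mle = minimize(neg_loglik, x0=np.zeros(2)).x
print(ols, mle)  # identical up to solver tolerance
```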
 
Dale said:
This is tricky. When you do this the result of the overall regression is valid, however the estimates of the fit parameters ##c_i## for the correlated predictors are unstable and inaccurate.
If some ##x_i## variables are linearly dependent, there are trade-offs that allow multiple answers, but any of those answers is correct and accurate. If there are variables that are strongly correlated, there are trade-offs, but some are statistically better predictors of the dependent variable than others. A stepwise regression algorithm would include the better predictors and only include others if they were still statistically significant in reducing the remaining SSE.
Dale said:
So there are things that you can do with that model, but there are lots of problematic inferences you can make too.
Yes, but there are always dangers in interpreting a regression. Correlation does not imply causation.
Dale said:
When I have a model with strong collinearity in the predictors I usually try to drop one of the predictors from the model. You will lose a little ##R^2##, but there is less danger in interpreting the results.
That is what a backward elimination stepwise regression would do in a very methodical way.
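For concreteness, a backward elimination pass might look like the following sketch (Python with numpy and statsmodels; the significance threshold and all names are my assumptions, not anything prescribed in the thread): start from the full model and repeatedly drop the predictor with the largest p-value until every remaining one is significant.

```python
# A rough sketch of backward elimination, one common stepwise scheme.
import numpy as np
import statsmodels.api as sm

def backward_eliminate(X, y, alpha=0.05):
    cols = list(range(X.shape[1]))
    while cols:
        res = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
        pvals = res.pvalues[1:]      # skip the intercept
        worst = int(np.argmax(pvals))
        if pvals[worst] <= alpha:
            break                    # everything left is significant
        cols.pop(worst)              # drop the least significant predictor
    return cols

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(0, 1.0, 200)
print(backward_eliminate(X, y))  # typically keeps columns [0, 2]
```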
 
FactChecker said:
That is what a backward elimination stepwise regression would do in a very methodical way.
That is indeed one approach, but personally I prefer to do the elimination before the regression using any relevant non-statistical problem-specific knowledge I have to inform the model. If you use stepwise regression then you need to split your data into one pool for generating the model and a separate pool for testing it. Too many people don't do that when they use stepwise regression.
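A minimal sketch of the splitting discipline described above (plain numpy; the kept columns are stand-ins for whatever a stepwise pass actually selects): do all selection on the training pool, then judge the chosen model only on held-out data.

```python
# Select the model on one pool of data, evaluate it on a separate pool.
import numpy as np

rng = np.random.default_rng(5)
n = 400
X = rng.normal(size=(n, 5))
y = 1.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1.0, n)

idx = rng.permutation(n)
train, test = idx[: n // 2], idx[n // 2 :]

# ... run stepwise selection on (X[train], y[train]) only ...
keep = [0, 1]  # pretend these columns survived selection on the training pool

coef, *_ = np.linalg.lstsq(X[train][:, keep], y[train], rcond=None)
resid = y[test] - X[test][:, keep] @ coef
print(np.mean(resid ** 2))  # honest out-of-sample error estimate
```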
 
madness said:
Collinearity would imply that one variable is a linear combination of the other two. The variables can be correlated (i.e. not independent) without being collinear, in which case multivariate linear regression should still do ok.
It might help to take a Bayesian perspective. Performing multivariate linear regression is equivalent to assuming that the data follow a linear-Gaussian model in which the predicted variable is a linear combination of the regressors corrupted by additive Gaussian noise. If in fact there is some curvilinear relationship or non-Gaussian noise in the data, then multivariate linear regression is no longer the optimal method. If we knew the form of the curvilinear relationship then we could fit a model to the data which reflects that structure we believe to be present. If we don't know the form of the curvilinear relationship then various "no free lunch" theorems tell us that there is no one optimal method.
"Performing multivariate linear regression is equivalent to assuming that the data follow a linear-Gaussian model in which the predicted variable is a linear combination of the regressors corrupted by additive Gaussian noise."

No. The assumption of Gaussian errors is not one of the traditional regression assumptions. If you make that assumption you are adding one more item to your assumptions about the relationship.
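The point above is easy to demonstrate by simulation (a sketch in plain numpy; my construction, not anything posted in the thread): with heavy-tailed, decidedly non-Gaussian noise, the least-squares coefficient estimates are still unbiased, consistent with the Gauss-Markov theorem, which does not require Gaussian errors.

```python
# Least squares does not need Gaussian errors to be unbiased: average the
# estimates over many simulated fits with Laplace (heavy-tailed) noise.
import numpy as np

rng = np.random.default_rng(6)
beta = np.array([1.0, -2.0])
estimates = []
for _ in range(2000):
    X = rng.normal(size=(50, 2))
    y = X @ beta + rng.laplace(0, 1.0, 50)   # decidedly non-Gaussian noise
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    estimates.append(coef)
print(np.mean(estimates, axis=0))  # close to [1.0, -2.0]
```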
 
