When is Linear Model not Good despite r^2 close to 1?

  • Thread starter: Bacle
  • Tags: Linear Model
AI Thread Summary
Linear models can be ineffective even with high r² values due to non-linear residual distributions, such as parabolic or cubic shapes. Overfitting can occur when the number of dimensions exceeds the number of data points, making high correlation misleading. Adjusted r² is preferred over r² as it accounts for the number of parameters in the model. If adjusted r² is low, exploring polynomial models or adding interaction terms may be necessary, but it could indicate that the chosen regressors do not adequately explain the dependent variable. Tools like Eureqa can assist in finding better-fitting models by evaluating adjusted r² and AIC values.
Bacle
Hi, All:
I was reading about cases in which linear models in least-squares regression were found to be ineffective, despite values of r and r^2 being close to 1 (obviously, the two go together).
I think the issue has to do with the distribution of the residuals being distinctly non-linear (and definitely not normal), e.g., having a histogram that looks like a parabola, a cubic, etc.
Just curious whether someone knows of examples and/or results in this respect, and of what other checks can be made to see whether a linear model makes sense for a data set. The checks I know of are the lack-of-fit sum-of-squares F-test and inference for regression (with H0: slope is zero).
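A minimal sketch of the situation described above, using hypothetical simulated data (the coefficients and sample size are made up for illustration): a straight line fit to data with a genuine quadratic trend can still score r^2 near 1, while the residuals retain an obvious parabolic pattern.

```python
import numpy as np

# Hypothetical data: a quadratic trend plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 0.3 * x**2 + rng.normal(0, 0.5, size=x.size)

# Ordinary least-squares straight-line fit.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

# r^2 of the linear fit: high, despite the mis-specified model.
ss_res = np.sum(residuals**2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(f"r^2 of straight-line fit = {r2:.3f}")

# Fitting a parabola to the residuals recovers the missing
# curvature (quadratic coefficient near the true 0.3) -- the
# pattern a residual plot would show visually.
quad_coef = np.polyfit(x, residuals, 2)[0]
print(f"quadratic coefficient of residuals = {quad_coef:.3f}")
```

Plotting `residuals` against `x` would make the parabolic shape immediately visible, which is the usual graphical companion to the formal lack-of-fit test.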

Thanks.
 
Another way: suppose there is overfitting, i.e., not enough data points for the number of dimensions. If you have 100 data points but are using a model with 100 different parameters, it doesn't matter how good your correlation is.
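A quick sketch of that point, with a smaller hypothetical example (20 points, 20 regressors): when the parameter count matches the number of data points, least squares can reproduce pure noise exactly, so in-sample r^2 is essentially 1 even though there is no relationship at all.

```python
import numpy as np

# Hypothetical data: response is pure noise, unrelated to X.
rng = np.random.default_rng(1)
n = 20
X = rng.normal(size=(n, n))   # 20 points, 20 regressors
y = rng.normal(size=n)        # noise only

# With as many parameters as points, the fit is exact.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
r2 = 1 - np.sum(resid**2) / np.sum((y - y.mean()) ** 2)
print(f"in-sample r^2 = {r2:.6f}")  # numerically 1

# A fresh draw of noise exposes the overfit: the fitted model
# has no predictive value out of sample.
y_new = rng.normal(size=n)
resid_new = y_new - X @ beta
r2_new = 1 - np.sum(resid_new**2) / np.sum((y_new - y_new.mean()) ** 2)
print(f"out-of-sample r^2 = {r2_new:.3f}")
```

The out-of-sample r^2 is typically negative here, which is why holdout data (or an adjusted statistic) matters more than the in-sample correlation.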
 
A high R^{2} is not the only important statistic to check. I prefer adjusted R^{2}, because the more parameters you add, the more plain R^{2} tends to inflate.
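A small sketch of that inflation, with hypothetical noise-only data: plain r^2 climbs as junk regressors are added, while adjusted r^2 (which charges a degrees-of-freedom penalty, adj r^2 = 1 - (1 - r^2)(n - 1)/(n - p - 1)) stays near zero.

```python
import numpy as np

def adjusted_r2(r2, n, p):
    # p = number of regressors, excluding the intercept.
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical data: y is pure noise; every regressor is junk.
rng = np.random.default_rng(2)
n = 50
y = rng.normal(size=n)

r2s, adjs = [], []
for p in (1, 5, 20):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1 - np.sum(resid**2) / np.sum((y - y.mean()) ** 2)
    r2s.append(r2)
    adjs.append(adjusted_r2(r2, n, p))
    print(f"p = {p:2d}: r^2 = {r2:.3f}, adjusted r^2 = {adjs[-1]:.3f}")
```

Under the null, r^2 grows roughly like p/(n - 1) even for meaningless regressors, which is exactly the inflation the adjustment corrects for.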
 
Thanks, Pyrrhus:

What do I do then if the adjusted R^2 is low? Do I start considering linear models in two or more variables, or do I consider quadratic, cubic, etc. models?
 
You could try adding square terms and interaction terms, but if the r-squared is still low it might just be that the regressors don't do a good job of explaining the dependent variable.
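A sketch of that expansion, with hypothetical data built so that the response truly depends on an interaction (y ≈ 1 + 2·x1·x2): the plain linear model in x1 and x2 fits poorly, while adding square and interaction columns to the design matrix recovers the fit.

```python
import numpy as np

# Hypothetical data: the signal lives entirely in the x1*x2 interaction.
rng = np.random.default_rng(3)
n = 200
x1, x2 = rng.normal(size=(2, n))
y = 1.0 + 2.0 * x1 * x2 + rng.normal(0, 0.3, size=n)

def fit_r2(X, y):
    # Least-squares fit; return r^2 of the fitted model.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - np.sum(resid**2) / np.sum((y - y.mean()) ** 2)

# Plain linear model vs. the same model with square and
# interaction terms appended to the design matrix.
linear = np.column_stack([np.ones(n), x1, x2])
expanded = np.column_stack([np.ones(n), x1, x2, x1**2, x2**2, x1 * x2])

r2_linear = fit_r2(linear, y)
r2_expanded = fit_r2(expanded, y)
print(f"linear terms only:      r^2 = {r2_linear:.3f}")
print(f"with squares/interaction: r^2 = {r2_expanded:.3f}")
```

If the expanded model's r^2 stayed low as well, that would point to the second possibility above: the chosen regressors simply don't explain the dependent variable.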
 
Try this free tool, Eureqa (http://creativemachines.cornell.edu/eureqa), developed at Cornell. I've used it in my own research, rating fits by adjusted r^2 and Akaike Information Criterion (AIC) values.
 
Excellent, thanks!
 