When is a Linear Model not Good despite r^2 close to 1?

In summary, the thread discusses how linear models in least-squares regression can be inadequate even when r and r^2 are close to 1. The distribution of the residuals and other checks, such as the lack-of-fit sum-of-squares F-test and inference for the regression slope, help determine whether a linear model suits a data set. Adjusted R^2 and Akaike Information Criterion (AIC) values are also important statistics to consider; if the adjusted R^2 is low, other models, such as quadratic or cubic ones, may need to be considered. A recommendation is made for "Eureqa", a program developed at Cornell, to assist with finding the best fit for a data set.
  • #1
Bacle
Hi, All:
I was reading about cases in which linear models in least-squares regression were found to be ineffective, despite values of r and r^2 being close to 1 (obviously, the two go together).
I think the issue has to do with the residuals: instead of looking like random, normally distributed noise, their distribution shows a distinct pattern, e.g., a residual plot that traces a parabola, a cubic, etc.
I am curious whether anyone knows of examples and/or results in this respect, and what other checks can be made to see whether a linear model makes sense for a data set. The checks I know of are the lack-of-fit sum-of-squares F-test and inference for regression (with H0: slope = 0).

Thanks.
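A minimal sketch (in Python, not from the original post) of the situation described above: a straight line fitted to mildly curved data gives r^2 very close to 1, yet the residuals trace a parabola rather than random noise. The data here are simulated purely for illustration.
[code]
# Minimal sketch: fit a straight line to mildly curved data and inspect r^2
# and the residual pattern.  Data are simulated for illustration only.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 0.15 * x**2 + rng.normal(scale=0.3, size=x.size)  # mild curvature

slope, intercept = np.polyfit(x, y, 1)       # ordinary least-squares line
residuals = y - (intercept + slope * x)

ss_res = np.sum(residuals**2)
ss_tot = np.sum((y - y.mean())**2)
print(f"r^2 = {1 - ss_res / ss_tot:.4f}")    # typically > 0.99 here

# Mean residual in the left, middle, and right thirds of the x-range:
# positive at both ends and negative in the middle -- a parabolic pattern
# that a well-specified linear model should not show.
thirds = np.array_split(residuals, 3)
print([round(float(t.mean()), 3) for t in thirds])
[/code]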
 
  • #2
Another way: suppose there is overfitting, or not enough data points for the number of dimensions. If you have 100 data points but are fitting a model with 100 different dimensions, it doesn't matter how good your correlation is.
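A short sketch (Python, simulated data, not from the post) of this overfitting point: with as many free parameters as observations, least squares can reproduce pure noise almost exactly, so a near-perfect in-sample r^2 carries no information.
[code]
# Sketch of the overfitting point: 100 observations, 100 random regressors,
# response is pure noise with no relation to the regressors.
import numpy as np

rng = np.random.default_rng(1)
n = 100
X = rng.normal(size=(n, n))     # as many free parameters as data points
y = rng.normal(size=n)          # pure noise

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
r2 = 1 - np.sum((y - fitted)**2) / np.sum((y - y.mean())**2)
print(f"in-sample r^2 = {r2:.6f}")   # essentially 1.0 despite zero real signal
[/code]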
 
  • #3
A high [itex] R^{2} [/itex] is not the only important statistic to check. I prefer the adjusted [itex] R^{2} [/itex], because adding more parameters tends to inflate the ordinary [itex] R^{2} [/itex].
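For reference, a small sketch of the adjusted R^2 being recommended here; the function name and the example numbers are illustrative only.
[code]
# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1), where n is the number
# of observations and p the number of regressors (excluding the intercept).
def adjusted_r_squared(r_squared, n, p):
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - p - 1)

# The same raw R^2 = 0.95 looks far less impressive when it took 20 regressors
# on only 30 observations to achieve it.
print(adjusted_r_squared(0.95, n=100, p=2))   # about 0.949
print(adjusted_r_squared(0.95, n=30, p=20))   # about 0.839
[/code]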
 
  • #4
Thanks, Pyrrhus:

What do I do then if the adjusted R^2 is low? Do I start considering linear models in two or more variables, or do I consider quadratic, cubic, etc. models?
 
  • #5
You could try adding squared terms and interaction terms, but if the R-squared is still low, it might just be that the regressors don't do a good job of explaining the dependent variable.
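A sketch of this suggestion using the statsmodels formula interface; the DataFrame df and the column names y, x1, x2 are hypothetical placeholders, and the simulated data exist only to make the snippet runnable.
[code]
# Hypothetical example: compare a plain linear fit with one that adds a
# squared term and an interaction term, using adjusted R^2 to judge whether
# the extra terms earn their keep.  Column names y, x1, x2 are placeholders.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
df = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
df["y"] = 1 + 2 * df["x1"] + 0.5 * df["x1"]**2 + df["x1"] * df["x2"] \
          + rng.normal(size=200)

plain     = smf.ols("y ~ x1 + x2", data=df).fit()
augmented = smf.ols("y ~ x1 + x2 + I(x1**2) + x1:x2", data=df).fit()

# If adjusted R^2 stays low even with the extra terms, the regressors may
# simply not explain the dependent variable well.
print(plain.rsquared_adj, augmented.rsquared_adj)
[/code]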
 
  • #6
Try this incredible free tool, Eureqa (http://creativemachines.cornell.edu/eureqa), developed at Cornell. I've used it in my own research, rating fits by adjusted R^2 and Akaike Information Criterion (AIC) values.
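Eureqa is a stand-alone tool, but the same idea of ranking candidate fits by adjusted R^2 and AIC can be sketched with statsmodels; the candidate models and simulated data below are illustrative only (lower AIC indicates a better fit-versus-complexity trade-off).
[code]
# Rank candidate polynomial fits by adjusted R^2 and AIC (lower AIC is better).
# The candidate set and simulated data are illustrative only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = np.linspace(0, 10, 80)
y = 1 + 0.5 * x + 0.2 * x**2 + rng.normal(scale=1.0, size=x.size)

candidates = {
    "linear":    sm.add_constant(np.column_stack([x])),
    "quadratic": sm.add_constant(np.column_stack([x, x**2])),
    "cubic":     sm.add_constant(np.column_stack([x, x**2, x**3])),
}
for name, X in candidates.items():
    fit = sm.OLS(y, X).fit()
    print(f"{name:10s} adj R^2 = {fit.rsquared_adj:.4f}  AIC = {fit.aic:.1f}")
[/code]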
 
  • #7
Excellent, thanks!
 

1. What is a linear model and what does r^2 represent?

A linear model is a statistical method that describes the relationship between a dependent variable and one or more independent variables with a straight-line (linear) equation. The r^2 value, also known as the coefficient of determination, represents the proportion of the variation in the dependent variable that is explained by the independent variable(s).
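In symbols, using the standard definition (residual sum of squares over total sum of squares):

[tex] R^{2} = 1 - \frac{\sum_i \left( y_i - \hat{y}_i \right)^2}{\sum_i \left( y_i - \bar{y} \right)^2} [/tex]

so R^2 = 1 means the fitted values reproduce the observations exactly, and R^2 = 0 means the model does no better than predicting the mean.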

2. Can a linear model have a high r^2 value but still not be a good fit?

Yes, a linear model can have a high r^2 value but still not be a good fit. This can occur when the data does not follow a linear trend and the model is unable to accurately capture the relationship between the variables.

3. What are some reasons for a linear model to not be good despite a high r^2 value?

There are several reasons why a linear model may not be a good fit despite a high r^2 value. These include: outliers or influential data points, non-linear relationships between variables, and omitted variables that are important in explaining the variation in the data.

4. How can I determine if a linear model is not a good fit despite a high r^2 value?

To determine if a linear model is not a good fit despite a high r^2 value, you can examine the residual plots and check for patterns or heteroscedasticity (unequal variance). You can also perform hypothesis tests, such as the F-test, to assess the overall significance of the model.
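A sketch of these checks (a residual-versus-fitted plot, the overall F-test, and a Breusch-Pagan test for unequal variance) using statsmodels and matplotlib; the simulated data and variable names are illustrative only.
[code]
# Residual diagnostics for a fitted OLS model: a residual-versus-fitted plot,
# the overall F-test, and a Breusch-Pagan test for heteroscedasticity.
# The simulated data (variance growing with x) are illustrative only.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
x = np.linspace(1, 10, 100)
y = 3 + 2 * x + rng.normal(scale=0.5 * x)       # error variance grows with x
X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

plt.scatter(fit.fittedvalues, fit.resid)        # a funnel shape suggests unequal variance
plt.axhline(0.0)
plt.xlabel("fitted values")
plt.ylabel("residuals")
plt.show()

print(f"overall F-test p-value:  {fit.f_pvalue:.4g}")
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(fit.resid, X)
print(f"Breusch-Pagan p-value:   {lm_pvalue:.4g}")   # small value => heteroscedasticity
[/code]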

5. Is it possible to improve a linear model that has a high r^2 value but is not a good fit?

Yes, it is possible to improve a linear model that has a high r^2 value but is not a good fit. This can be done by transforming the data, adding additional variables, or considering alternative models, such as polynomial regression or non-linear regression. It is important to assess the assumptions of the model and make adjustments accordingly.
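A sketch of those improvement routes (a log transformation of the response and a quadratic term) compared side by side with statsmodels; the data-generating process here is purely illustrative.
[code]
# Compare a plain linear fit, a log-transformed response, and a quadratic fit.
# The exponential data-generating process is illustrative only.  Note that
# R^2 computed on a transformed response is not directly comparable to R^2 on
# the original scale, so residual plots remain the more reliable check.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = np.linspace(1, 10, 120)
y = np.exp(0.4 * x) * rng.lognormal(sigma=0.1, size=x.size)   # exponential growth

linear    = sm.OLS(y,         sm.add_constant(x)).fit()
log_model = sm.OLS(np.log(y), sm.add_constant(x)).fit()
quadratic = sm.OLS(y,         sm.add_constant(np.column_stack([x, x**2]))).fit()

for name, fit in [("linear", linear), ("log(y) ~ x", log_model),
                  ("quadratic", quadratic)]:
    print(f"{name:12s} adj R^2 = {fit.rsquared_adj:.3f}")
[/code]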
