# Determining the best fit regression for a set of data

In summary, a quick test for the "best fit" regression type is to plot the residuals against the predictor variable: if the plot shows a curved pattern, the relationship is curvilinear rather than linear.

Is there a test one can perform to quickly determine what type of regression (linear vs. non-linear) will best fit the relationship between two variables?

i.e. How can one quickly determine the most probable relationship between two variables (like a sort of "probability of fit" test)?

(Linear vs. non-linear...)

One can always find a better-fitting curve than a previously given one by changing its functional form, if no other restriction is imposed (sounds contradictory?), unless the given curve already passes through all of the observed points.

I think that the easiest and quickest way to determine whether the relationship is linear is to plot the residuals against the predictor variable. If the plot looks like a curve, then the relationship is curvilinear.
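A minimal sketch of this residual check, using hypothetical data with a quadratic relationship (the data and seed here are made up for illustration):

```python
import numpy as np

# Hypothetical data with a quadratic relationship
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 1.0 + 0.5 * x**2 + rng.normal(0, 2, size=x.size)

# Fit a straight line, then inspect the residuals
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

# A curved (U-shaped) pattern in residuals vs. x suggests the linear
# fit is inadequate; plot with e.g. plt.scatter(x, residuals)
```

Here the residuals are positive at both ends of the x range and negative in the middle, which is exactly the curved pattern that signals a curvilinear relationship.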

You can fit nonlinear curves using linear regression.

http://en.wikipedia.org/wiki/Linear_regression
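The point being that "linear" refers to linearity in the coefficients, not in x. A sketch with hypothetical data, fitting a quadratic curve by ordinary least squares:

```python
import numpy as np

# "Linear" regression is linear in the coefficients, not in x.
# Fit y = b0 + b1*x + b2*x^2 by ordinary least squares.
rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 40)
y = 2.0 - 1.0 * x + 0.5 * x**2 + rng.normal(0, 0.3, size=x.size)

# Design matrix with columns [1, x, x^2]
X = np.column_stack([np.ones_like(x), x, x**2])
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
# coeffs recovers estimates of (b0, b1, b2)
```

Any basis that is nonlinear in x (polynomials, logs, splines) can be handled the same way, as long as the model stays linear in the unknown coefficients.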

Linear regression will give the maximum likelihood fit if your noise is Gaussian. If you make a histogram of your estimation errors, it should give you some idea of the statistics of the noise distribution. There are of course more advanced statistical tests.
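A sketch of that histogram check on hypothetical data (for a formal test one would use something like a Shapiro-Wilk test instead):

```python
import numpy as np

# Hypothetical data: inspect the distribution of fit residuals
rng = np.random.default_rng(2)
x = np.linspace(0, 1, 200)
y = 3.0 * x + rng.normal(0, 0.1, size=x.size)

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

# Histogram counts give a rough picture of the error distribution;
# roughly bell-shaped, symmetric counts are consistent with Gaussian noise.
counts, edges = np.histogram(residuals, bins=10)
```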

anisotropic said:
Is there a test one can perform to quickly determine what type of regression (linear vs. non-linear) will best fit the relationship between two variables?

i.e. How can one quickly determine the most probable relationship between two variables (like a sort of "probability of fit" test)?

(Linear vs. non-linear...)

There are two basic approaches to answering this question:

1. The traditional statistics approach is to make various assumptions about the distributions of the data, errors, etc., and calculate some diagnostic summary of the model.

2. The error resampling method involves holding out data to test the model, either once, as in an out-of-sample test, or multiple times, as in k-fold cross-validation or bootstrapping. Note that with error resampling, the model will not necessarily improve (per the test) simply because the model becomes more complex.
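The second approach can be sketched as follows: a minimal k-fold cross-validation comparing polynomial degrees on hypothetical data (in practice one would use e.g. scikit-learn's KFold rather than rolling one's own):

```python
import numpy as np

# Hypothetical data with a nonlinear underlying relationship
rng = np.random.default_rng(3)
x = rng.uniform(0, 5, 60)
y = np.sin(x) + rng.normal(0, 0.2, size=x.size)

def cv_mse(x, y, degree, k=5):
    """Mean held-out squared error of a degree-d polynomial fit."""
    idx = rng.permutation(x.size)
    folds = np.array_split(idx, k)
    errs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coeffs = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coeffs, x[test])
        errs.append(np.mean((y[test] - pred) ** 2))
    return np.mean(errs)

# Compare held-out error across model complexities
scores = {d: cv_mse(x, y, d) for d in (1, 3, 9)}
```

With data like this, the cubic fit scores better than the straight line on held-out data, illustrating the point above: the test error does not reward complexity for its own sake.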

For a longer explanation of error resampling, see:

http://matlabdatamining.blogspot.com/2008/03/validating-predictive-models.html

-Will Dwinnell
http://matlabdatamining.blogspot.com/


Not mentioned yet is that there is a theorem (I forget its name) that gives a method for striking the best balance between the variance of the estimate and the variance of the error of the fit. If you use a higher-order fit, it will give a very low error variance, but the fitted curve will differ a lot from one trial to another. Conversely, if you use a lower-order fit, there may be a lot of error variance, but the fit will stay relatively constant from one trial to the next.

John Creighto said:
Not mentioned yet is that there is a theorem (I forget its name) that gives a method for striking the best balance between the variance of the estimate and the variance of the error of the fit. If you use a higher-order fit, it will give a very low error variance, but the fitted curve will differ a lot from one trial to another. Conversely, if you use a lower-order fit, there may be a lot of error variance, but the fit will stay relatively constant from one trial to the next.

I imagine that you are referring to the bias-variance trade-off? The total of these components appears in error resampling: as model complexity increases, total error decreases (as we reduce one component faster than the other increases) until the optimal fit, after which overfitting sets in and the test error begins to increase again (as we begin to trade one component of error for the other). I have yet to find a non-academic who actually calculates the values of these components separately.

-Will Dwinnell
http://matlabdatamining.blogspot.com/
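The variance half of the trade-off described above can be demonstrated directly: refit on repeated noisy samples and compare how much the fitted curve varies for a low-degree versus a high-degree polynomial (data and degrees here are illustrative):

```python
import numpy as np

# Refit on repeated noisy samples of the same underlying curve and
# measure how much the fitted curve varies from trial to trial.
rng = np.random.default_rng(4)
x = np.linspace(-1, 1, 30)
truth = np.sin(np.pi * x)
grid = np.linspace(-1, 1, 100)

def fit_variance(degree, trials=200):
    preds = []
    for _ in range(trials):
        y = truth + rng.normal(0, 0.3, size=x.size)
        preds.append(np.polyval(np.polyfit(x, y, degree), grid))
    preds = np.array(preds)
    # average pointwise variance of the fitted curve across trials
    return preds.var(axis=0).mean()

low, high = fit_variance(2), fit_variance(10)
# The degree-10 fit varies far more from sample to sample than degree-2,
# even though each individual degree-10 fit has smaller residuals.
```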


## 1. How do you determine the best fit regression for a set of data?

The best fit regression for a set of data is determined by analyzing the relationship between the independent and dependent variables and choosing the regression model that best represents this relationship.

## 2. What are the different types of regression models that can be used?

There are several types of regression models that can be used, including linear regression, polynomial regression, logistic regression, and exponential regression. The choice of model depends on the nature of the data and the type of relationship between the variables.

## 3. What is the significance of the regression coefficient in determining the best fit?

The regression coefficient, also known as the slope, represents the change in the dependent variable for a unit change in the independent variable. It is important in determining the best fit as it indicates the strength and direction of the relationship between the variables.

## 4. How do you evaluate the goodness of fit for a regression model?

The goodness of fit for a regression model can be evaluated by calculating the coefficient of determination (R-squared). This measures the proportion of variation in the dependent variable that is explained by the independent variable. A higher R-squared value indicates a better fit for the model.
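A sketch of the R-squared calculation on hypothetical, nearly linear data:

```python
import numpy as np

# Computing R^2 for a simple linear fit (hypothetical data)
rng = np.random.default_rng(5)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, size=x.size)

slope, intercept = np.polyfit(x, y, 1)
pred = slope * x + intercept

ss_res = np.sum((y - pred) ** 2)      # unexplained variation
ss_tot = np.sum((y - y.mean()) ** 2)  # total variation
r_squared = 1 - ss_res / ss_tot
# Close to 1 here, because the data are nearly linear
```

Note that R-squared alone cannot distinguish a good model from an overfit one, which is why the resampling methods discussed above complement it.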

## 5. What are some potential limitations of using regression analysis?

Some potential limitations of using regression analysis include the assumption of a linear relationship between variables, the influence of outliers on the results, and the need for a large sample size. It is important to carefully consider these limitations when using regression to determine the best fit for a set of data.
