Why linear in linear regression?

In summary: in linear regression, one estimates parameters in which the model is supposed to be linear. Strictly speaking, neither ##y(\theta_0)## nor ##y(\theta_0,...,\theta_n)## is a linear function of the parameters but rather an affine one, and "affine linear" is often abbreviated to "linear". This is partly for convenience, but not only: in contexts where both terms are needed, the distinction matters.
  • #1
schniefen
TL;DR Summary
It appears to be common to say linear regression, but is this correct?
In linear regression, one estimates parameters in which the dependent variable is supposed to be linear, for instance

##y=\theta_0 e^x+\epsilon \ ,##

or

##y=\theta_0+\theta_1 x_1+\theta_2x_2+...+\theta_n x_n+\epsilon \ . ##
Is it not true that neither ##y(\theta_0)## nor ##y(\theta_0,...,\theta_n)## is a linear function, but rather an affine function of the parameters?
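As an aside, the first model can be fit numerically even though ##e^x## is nonlinear, precisely because it is linear in ##\theta_0##. A minimal NumPy sketch, using a made-up true value ##\theta_0 = 2## and synthetic noise (both purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data for the first model, y = theta0 * e^x + eps, with a made-up
# true value theta0 = 2.0 (purely illustrative).
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * np.exp(x) + 0.05 * rng.standard_normal(x.size)

# The model is linear in theta0, so ordinary least squares applies directly:
# the design matrix is a single column holding e^x.
A = np.exp(x)[:, None]
theta0_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
```

The point is that the nonlinearity sits entirely in the known function ##e^x##; the unknown ##\theta_0## enters linearly, so the ordinary least-squares machinery works unchanged.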
 
  • #2
Affine linear is often abbreviated by calling it linear. That is partly for convenience, and if ##\epsilon=0## in your examples, then it is linear. However, it's not only convenience. E.g. if we consider the tangent line at ##x=2## to the curve ##y=x^2##, then it is an affine linear object in the surrounding ##x,y##-plane. But we also speak of the tangent space at ##x=2##. In that case we have implicitly identified the point ##(2,4)## with the origin of the tangent space, and all of a sudden the affine linear line in the plane becomes a linear line in the tangent space.

So, as long as both terms are not needed in a given context, affine linear is often just called linear.
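To make the tangent-line example concrete: the curve ##y=x^2## has slope ##y'(2)=4##, so the tangent line at ##x=2## is ##y = 4 + 4(x-2) = 4x - 4##, an affine function of ##x## because of the constant term. Substituting the tangent-space coordinates ##u = x-2##, ##v = y-4## (i.e. moving the origin to ##(2,4)##) gives ##v = 4u##, which is linear.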
 
  • #3
Could you explain how the tangent line at ##x=2## is "an affine linear object in the surrounding ##x,y##-plane"? An affine transformation is a linear transformation plus a translation.

If ##\epsilon \neq 0##, are then both terms necessary?
 
  • #4
schniefen said:
Could you explain how the tangent line at ##x=2## is "an affine linear object in the surrounding ##x,y##-plane"? An affine transformation is a linear transformation plus a translation.

If ##\epsilon \neq 0##, are then both terms necessary?
Both terms are necessary if you have a global coordinate system and you perform geometry. Then the distinction is reasonable. If you only want to say: not curved, then linear will do.
 
  • #5
schniefen said:
TL;DR Summary: It appears to be common to say linear regression, but is this correct?

In linear regression, one estimates parameters in which the dependent variable is supposed to be linear, for instance

##y=\theta_0 e^x+\epsilon \ ,##

or

##y=\theta_0+\theta_1 x_1+\theta_2x_2+...+\theta_n x_n+\epsilon \ . ##
Is it not true that neither ##y(\theta_0)## nor ##y(\theta_0,...,\theta_n)## is a linear function, but rather an affine function of the parameters?
In regression you make an assumption about the form of the functional relationship between your response and any predictor(s). Linear regression doesn't require that function to be linear in the predictors, only that it be linear in the parameters to be estimated.

This expression would qualify as a functional form we'd lump into linear regression, since for given values of the predictors it is a linear function of the betas:

## y = \beta_0 + \beta_1 x_1^2 + \beta_2 \frac{x_2}{x_2^2 + 5} ##

This one, by contrast, is not a linear function of the parameters:

## y = \beta_0 e^{\beta_1 x_1 + \beta_2 x_2} ##

You don't need to worry about the affine stuff.
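To illustrate the first point numerically: the model above is wildly nonlinear in ##x_1## and ##x_2##, yet ordinary least squares recovers the betas because they enter linearly. A sketch with NumPy, using made-up coefficient values chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.uniform(-2.0, 2.0, n)
x2 = rng.uniform(-2.0, 2.0, n)

# Made-up "true" coefficients, chosen only for illustration.
b = np.array([1.0, 0.5, -3.0])

# Design matrix for the first model: nonlinear in x1, x2 but linear in the betas.
X = np.column_stack([np.ones(n), x1**2, x2 / (x2**2 + 5.0)])
y = X @ b + 0.01 * rng.standard_normal(n)

# Ordinary least squares recovers the betas despite the nonlinear predictors.
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Each nonlinear expression in the predictors is just a fixed column of numbers once the data are in hand, so the fit is an ordinary linear problem in the betas. No such trick rescues the second model, where ##\beta_1## and ##\beta_2## sit inside the exponential.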
 

1. What does "linear" in linear regression refer to?

The term "linear" in linear regression refers to the assumption that there is a linear relationship between the independent variables (also known as predictors or features) and the dependent variable (the outcome you are trying to predict). This means the equation that models the relationship is linear with respect to the coefficients and the predictor variables. Essentially, the dependent variable is expected to be a straight-line function of each independent variable, holding the other variables constant.

2. Why is it important for the relationship to be linear in linear regression?

It's important for the relationship to be linear in linear regression because the mathematical methods used to calculate the regression line, specifically the least squares technique, are based on this linearity assumption. This linearity allows for simpler computation and interpretation of the coefficients. If the relationship is not linear, linear regression might not provide an accurate or meaningful model, leading to poor predictions and insights.
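The least squares technique mentioned above picks the coefficients minimizing the sum of squared residuals. With design matrix ##X## and response vector ##y##,

##\hat\beta = \arg\min_\beta \|y - X\beta\|^2 = (X^\top X)^{-1} X^\top y \ ,##

assuming ##X^\top X## is invertible. This closed form exists precisely because the model is linear in ##\beta##; a model that is nonlinear in its parameters generally requires iterative optimization instead.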

3. Can linear regression be used for non-linear relationships?

Linear regression can be used for non-linear relationships, but it requires transforming the data or the relationship into a linear form. This can be done by adding polynomial or interaction terms, or by applying a function transformation to the dependent or independent variables. However, strictly speaking, once these transformations are applied, the model is no longer a "pure" linear regression model; rather, it's a linear regression of transformed variables.
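A brief sketch of the polynomial-term approach, with a made-up quadratic relationship (coefficients chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-3.0, 3.0, 100)

# Made-up quadratic relationship: y = 1 + 2x + 0.5x^2 + noise (illustrative).
y = 1.0 + 2.0 * x + 0.5 * x**2 + 0.1 * rng.standard_normal(x.size)

# Adding a squared column transforms the problem back into one that is
# linear in the coefficients, so ordinary least squares applies.
X = np.column_stack([np.ones_like(x), x, x**2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
```

The relationship between ##y## and ##x## is curved, but the relationship between ##y## and the columns ##(1, x, x^2)## is linear in the coefficients, which is all linear regression needs.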

4. How do you check if the relationship is linear in linear regression?

To check if the relationship is linear in linear regression, you can start by plotting the data points and visually inspecting if a straight line describes the relationship well. Additionally, you can use residual plots, where you plot the residuals (differences between observed and predicted values) against the predicted values or one of the independent variables. If the residuals display no particular pattern (i.e., they're randomly dispersed around the horizontal axis), it suggests that the linearity assumption holds.
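The residual check can be sketched numerically. Here is a hedged example (synthetic data, made-up quadratic relationship) where a straight-line fit to curved data leaves an obvious pattern in the residuals:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(-3.0, 3.0, 200)
y = x**2 + 0.1 * rng.standard_normal(x.size)  # truly quadratic relationship

# Fit a straight line anyway and look at the residuals.
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef

# For a correctly specified model the residuals look patternless; here they
# track x^2 almost perfectly, flagging a violated linearity assumption.
r = np.corrcoef(resid, x**2)[0, 1]
```

In practice one would plot `resid` against the fitted values or a predictor rather than compute a single correlation, but the idea is the same: structure left in the residuals means structure missing from the model.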

5. What happens if you incorrectly assume linearity in your regression model?

If you incorrectly assume linearity in your regression model when the underlying relationship is not linear, several issues can arise. The model might have poor predictive performance, particularly for prediction outside the range of the training data. The residuals may show patterns, indicating model inadequacy, and the interpretation of the coefficients could be misleading, leading to incorrect conclusions about the relationship between the variables.
