Error in (Multi)linear Regression

In summary, the conversation discusses varying opinions on the conditions necessary to justify the use of multi-linear regression in data modeling. Some authors require normal and i.i.d. errors, while others only require i.i.d. errors with a mean of 0. The assumption of normality is used to find the distribution of coefficients and determine reliable confidence intervals. However, in practice, multi-linear models work well in many situations and it can be difficult to know if the errors truly follow a normal distribution. The conversation also briefly touches on situations where non-linear models may be more appropriate, such as in the case of measuring the height of a falling object with some noise.
  • #1
WWGD
Science Advisor
Gold Member
7,010
10,469
Hi,
I keep reading varying accounts on conditions needed to " justify" the use of ( multi) linear regression to model data.

Specifically, I have seen several authors require errors to be normal, i.i.d , whilr others only require the errors be i.i.d with mean 0. Just where is the assumption of normality used to justify the use of linear models? I know of Gauss Mark of, but this seems too strong. I've heard that it is used to find the distribution of the coefficients and determine reliable confidence intervals for the coefficients? If so, do you suggest a source? If not, can you explain?
 
Physics news on Phys.org
  • #2
I guess the question is what does "justify" mean. If the errors are zero mean iid normal and equally distributed, then I think the multi linear model is an unbiased estimator of the actual values. If the errors are otherwise distributed then a different model (i.e. different coefficients) might be a better choice.

But in practice, a multi linear model works pretty well in lots of other situations, and it's often hard to know that the errors actually look like.
 
  • Like
Likes WWGD
  • #3
Thank you. Just curious, as an aside, do you know of situations that cannot be modeled (log)linearly; with linearity meaning linearity in the coefficients?
 
  • #4
You just mean give a scenario where a non linear model is better? Sure, suppose you drop an object, and write down the time and the height at those times as it falls. The height measurement has some noise to it. Then the right model for the height is going to be something like ##h(t)=-\frac{g}{2}t^2+h_0## if it starts at a height of ##h_0##.

I feel like there's a good chance I did not understand the question.
 

1. What is an error in (multi)linear regression?

An error in (multi)linear regression refers to the difference between the actual values and the predicted values of the dependent variable. It measures the accuracy of the regression model in predicting the outcome.

2. How is error calculated in (multi)linear regression?

Error in (multi)linear regression is typically calculated using the root mean squared error (RMSE) or the mean absolute error (MAE) formula. These formulas take into account the differences between the actual and predicted values and provide a single value to represent the overall error of the model.

3. What causes error in (multi)linear regression?

Error in (multi)linear regression can be caused by a variety of factors, including incorrect assumptions about the relationship between the independent and dependent variables, outliers in the data, and the presence of multicollinearity (high correlation) among the independent variables.

4. How can error in (multi)linear regression be minimized?

To minimize error in (multi)linear regression, it is important to carefully select and preprocess the data, choose appropriate regression techniques, and validate the model using techniques such as cross-validation. It is also important to check for and address any violations of the assumptions of linear regression.

5. What is the significance of error in (multi)linear regression?

The error in (multi)linear regression is an important measure of the accuracy and reliability of the regression model. A low error indicates a good fit between the model and the data, while a high error may suggest that the model needs to be improved or that there are underlying issues with the data. Minimizing error is crucial for making accurate predictions and drawing reliable conclusions from the regression analysis.

Similar threads

  • Set Theory, Logic, Probability, Statistics
Replies
8
Views
1K
  • Set Theory, Logic, Probability, Statistics
Replies
3
Views
849
  • Set Theory, Logic, Probability, Statistics
Replies
3
Views
480
  • Set Theory, Logic, Probability, Statistics
Replies
30
Views
2K
  • Set Theory, Logic, Probability, Statistics
Replies
4
Views
1K
Replies
8
Views
2K
  • Set Theory, Logic, Probability, Statistics
Replies
7
Views
478
  • Set Theory, Logic, Probability, Statistics
Replies
8
Views
2K
  • Set Theory, Logic, Probability, Statistics
Replies
19
Views
2K
  • Other Physics Topics
Replies
3
Views
1K
Back
Top