What is the difference between random error and residual?

In summary: the random error is part of the model itself, while the residual is what is left over after fitting the model to data. The simple linear regression model aims to estimate the values of β0 and β1 based on observed data; the estimates are denoted b0 and b1, respectively. The Greek letters represent the true values, while the Latin letters represent the estimated values. The random error, εi, refers to unobserved factors in the model and is assumed to be uncorrelated with X and to have an expected value of 0. The residual, ei, is the difference between the observed value of the dependent variable and the fitted value, and the least-squares estimates are chosen to minimize the sum of squared residuals.
  • #1
kingwinner
1) "Simple linear regression model: Yi = β0 + β1Xi + εi , i=1,...,n where n is the number of data points, εi is random error
We want to estimate β0 and β1 based on our observed data. The estimates of β0 and β1 are denoted by b0 and b1, respectively."


I don't understand the difference between β0, β1 and b0, b1.
For example, when we see a scatter plot with a least-squares line of best fit, say, y = 8 + 5x, then β0 = 8 and β1 = 5, right? What are b0 and b1 all about? Why do we need to introduce b0, b1?


2) "Simple linear regression model: Yi = β0 + β1Xi + εi , i=1,...,n where n is the number of data points, εi is random error
Fitted value of Yi for each Xi is: Yi hat = b0 + b1Xi
Residual = vertical deviations = Yi - Yi hat = ei
where Yi is the actual observed value of Y, and Yi hat is the value of Y predicted by the model"


Now I don't understand the difference between random error (εi) and residual (ei). What is the meaning of εi? How are εi and ei different?

Thanks for explaining!
 
  • #2
The Greek letters are for the true value of each parameter. The Latin letters are for the estimated values. The values of the former do not depend on your sample. The values of the latter are sample-specific.
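
To make this concrete, here is a minimal Python sketch (the "true" values β0 = 8 and β1 = 5 are made up for illustration): each simulated sample gives different estimates b0, b1, while the underlying β0, β1 never change.

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1 = 8.0, 5.0          # true (population) parameters: fixed, never observed directly

for sample in range(3):
    x = rng.uniform(0, 10, size=50)       # 50 data points
    eps = rng.normal(0, 2, size=50)       # random error, E[eps] = 0
    y = beta0 + beta1 * x + eps           # data generated by the true model
    b1, b0 = np.polyfit(x, y, 1)          # least-squares estimates from this sample
    print(f"sample {sample}: b0 = {b0:.3f}, b1 = {b1:.3f}")
```

Each run of the loop prints slightly different b0 and b1, which is exactly the sense in which the estimates are sample-specific.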
 
  • #3
To answer the question on error and observables...

ε in this regression model refers to the unobserved error of the data. In the true model, it represents all the factors that the experimenter cannot see or account for. In many models, it is assumed that ε is uncorrelated with X and that E[ε] = 0.


A residual (e) is something different. It is exactly as you defined it: Yi - Yi hat, the difference between the observed value of the dependent variable and the fitted (predicted) value. The residuals are what you actually work with in the regression: the least-squares coefficients are chosen to minimize the sum of squared residuals.
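
A quick numerical illustration of the distinction (again a sketch with made-up true values): the errors εi are drawn when the data are generated but are never observable in practice, whereas the residuals ei = Yi - Yi hat are computed from the fitted line. Note that the residuals from a least-squares fit sum to essentially zero by construction, while the unseen errors generally do not in a finite sample.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)
eps = rng.normal(0, 2, size=50)        # random errors: unobservable in practice
y = 8.0 + 5.0 * x + eps                # observed data (true model assumed for illustration)

b1, b0 = np.polyfit(x, y, 1)           # estimated coefficients
y_hat = b0 + b1 * x                    # fitted values
e = y - y_hat                          # residuals: computable from the data

print("sum of residuals:", e.sum())    # ~0 by construction of least squares
print("sum of errors:   ", eps.sum())  # generally not 0 in a finite sample
```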
 

1. What is the purpose of a linear regression model?

A linear regression model is used to model the relationship between a dependent variable and one or more independent variables. It allows us to predict the value of the dependent variable based on the values of the independent variables.

2. How is a linear regression model calculated?

A linear regression model is calculated by finding the line of best fit that minimizes the sum of the squared differences between the observed values and the predicted values. This is typically done using a method called least squares.
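
For a single predictor the least-squares solution has a closed form: b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² and b0 = ȳ - b1·x̄. A minimal sketch (using made-up data) that computes these directly and checks them against NumPy's fit:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([12.8, 18.1, 23.2, 27.9, 33.4])   # made-up observations

x_bar, y_bar = x.mean(), y.mean()
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)   # slope
b0 = y_bar - b1 * x_bar                                             # intercept

print(b0, b1)
print(np.polyfit(x, y, 1))   # returns [slope, intercept]; should match (b1, b0)
```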

3. What is the difference between simple linear regression and multiple linear regression?

In simple linear regression, there is only one independent variable, while in multiple linear regression, there are several. This lets the model account for the effects of more than one predictor on the dependent variable at the same time.

4. How do you interpret the coefficients in a linear regression model?

The slope coefficients in a linear regression model indicate the change in the predicted value of the dependent variable for a one-unit change in the corresponding independent variable (holding the other variables constant in multiple regression). A positive coefficient means the dependent variable tends to increase as that independent variable increases, and a negative coefficient means it tends to decrease. The intercept is the predicted value when all independent variables are zero.

5. What are some limitations of linear regression models?

Linear regression models assume that there is a linear relationship between the dependent and independent variables, which may not always be the case. They also assume that the errors are independent and normally distributed with constant variance, and that there is no multicollinearity (high correlation) among the independent variables. Additionally, linear regression may not be suitable for modeling non-linear relationships.
