What is the difference between random error and residual?

Summary:
In simple linear regression, β0 and β1 represent the true parameters of the model, while b0 and b1 are the estimated values based on sample data. The random error (εi) accounts for unobserved factors affecting the dependent variable, while the residual (ei) is the difference between the observed value (Yi) and the predicted value (Yi hat). The residual reflects the model's accuracy and is used to minimize the sum of squared deviations in regression analysis. Understanding these distinctions is crucial for interpreting regression results effectively. Accurate estimation and error analysis are essential for robust statistical modeling.
1) "Simple linear regression model: Yi = β0 + β1Xi + εi , i=1,...,n where n is the number of data points, εi is random error
We want to estimate β0 and β1 based on our observed data. The estimates of β0 and β1 are denoted by b0 and b1, respectively."


I don't understand the difference between β0, β1 and b0, b1.
For example, when we see a scatter plot with a least-squares line of best fit, say, y = 8 + 5x, then β0 = 8, β1 = 5, right? What are b0 and b1 all about? Why do we need to introduce b0, b1?


2) "Simple linear regression model: Yi = β0 + β1Xi + εi , i=1,...,n where n is the number of data points, εi is random error
Fitted value of Yi for each Xi is: Yi hat = b0 + b1Xi
Residual = vertical deviations = Yi - Yi hat = ei
where Yi is the actual observed value of Y, and Yi hat is the value of Y predicted by the model"


Now I don't understand the difference between random error (εi) and residual (ei). What is the meaning of εi? How are εi and ei different?

Thanks for explaining!
 
The Greek letters denote the true values of the parameters; the Latin letters denote the estimated values. The values of the former do not depend on your sample. The values of the latter are sample-specific.
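
If it helps to see this concretely, here is a minimal simulation sketch (in Python with NumPy; the thread itself uses no code, so treat the details as illustrative). The true parameters β0 = 8 and β1 = 5 are fixed once, but each fresh sample produces slightly different estimates b0 and b1:

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1 = 8.0, 5.0                # true parameters: fixed, unknown in practice

for sample in range(3):
    x = rng.uniform(0, 10, size=50)    # draw a fresh sample of 50 points
    eps = rng.normal(0, 2, size=50)    # random error eps_i, with E[eps] = 0
    y = beta0 + beta1 * x + eps        # the true model: Y_i = beta0 + beta1*X_i + eps_i
    b1, b0 = np.polyfit(x, y, 1)       # least-squares estimates (polyfit returns slope first)
    print(f"sample {sample}: b0 = {b0:.2f}, b1 = {b1:.2f}")
```

Each run prints b0 and b1 near 8 and 5 but never exactly equal to them: the estimates move with the sample, while β0 and β1 never do.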
 
To answer the question on error and observables...

ε in this regression model refers to the unobserved error in the data. In the true model, it represents all the factors affecting Y that the experimenter cannot see or account for. In many models, it is assumed that ε is uncorrelated with X and that E[ε] = 0.


A residual (e) is something different. It is exactly as you defined it: Yi − Yi hat, the difference between the observed value of the dependent variable and the value predicted by the fitted model. The residuals are fundamentally what the regression works with: least squares chooses b0 and b1 to minimize the sum of squared residuals.
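
To make the ε-versus-e distinction concrete, here is a short sketch in the same vein (again Python/NumPy, purely illustrative). Because we simulate the data ourselves, we can inspect both the true errors εi, which a real experimenter never observes, and the residuals ei, which can always be computed from the fit:

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1 = 8.0, 5.0
x = rng.uniform(0, 10, size=50)
eps = rng.normal(0, 2, size=50)        # eps_i: the random error, unobservable in real data
y = beta0 + beta1 * x + eps            # observed Y_i

b1, b0 = np.polyfit(x, y, 1)           # fitted coefficients b0, b1
y_hat = b0 + b1 * x                    # fitted values Y_i hat
e = y - y_hat                          # residuals e_i = Y_i - Y_i hat

print(f"sum of residuals: {e.sum():.2e}")                      # essentially 0, forced by least squares
print(f"corr(e, eps):     {np.corrcoef(e, eps)[0, 1]:.3f}")    # high but not 1
```

Note that the residuals sum to (essentially) zero by construction, a property of any least-squares fit with an intercept, whereas the true errors generally do not. The residuals are estimates of the errors, not the errors themselves.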
 