Linear regression and random variables

Discussion Overview

The discussion revolves around the concepts of linear regression models, correlation, and the nature of random variables involved in regression analysis. Participants explore the implications of treating both independent and dependent variables as random variables, the assumptions underlying linear regression, and the interpretation of regression outputs.

Discussion Character

  • Exploratory
  • Technical explanation
  • Conceptual clarification
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • Some participants propose that in linear regression, the pairs ##(x,y)## represent realizations of a bivariate random variable ##Z=(X,Y)##, while others argue that the independent variable ##X## is not treated as a random variable in standard models.
  • It is noted that the model can be expressed as ##y = \beta_1 x + \beta_0 + \epsilon##, where ##\epsilon## is a normally distributed error term, leading to discussions on whether both ##X## and ##Y## should be considered random variables.
  • Some participants emphasize that the regression model estimates the mean of ##Y## given ##X##, while others question whether the model is intended to estimate the actual value of ##Y## or just its mean.
  • There is a discussion about the distinction between deterministic and random components in regression models, particularly in contexts where the relationship between variables may not be repeatable.
  • Participants express uncertainty about the terminology and the implications of including or excluding the error term in the regression model.
  • Some participants suggest that the regression model can be viewed as a statistical model that incorporates randomness, while others caution against overgeneralizing this interpretation.
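
The two views of ##X## raised above can be compared in a small simulation. The following is only a sketch under stated assumptions: the parameter values (`beta0=1.0`, `beta1=2.0`, `sigma=0.5`), the helper name `simulate_and_fit`, and the choice of NumPy are illustrative and do not come from the thread. It generates ##y = \beta_1 x + \beta_0 + \epsilon## once with fixed, experimenter-chosen ##x## values and once with ##x## drawn at random, then fits a line by ordinary least squares in both cases.

```python
import numpy as np

def simulate_and_fit(x, beta0=1.0, beta1=2.0, sigma=0.5, rng=None):
    """Draw y = beta1*x + beta0 + eps with normal eps, then fit a line by OLS.

    All parameter values are illustrative assumptions, not values from the thread.
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.normal(0.0, sigma, size=x.shape)   # zero-mean normal error term
    y = beta1 * x + beta0 + eps
    slope, intercept = np.polyfit(x, y, deg=1)   # coefficients, highest degree first
    return slope, intercept

rng = np.random.default_rng(0)
n = 2000

# View 1: x is a fixed design chosen by the experimenter (not random).
x_fixed = np.linspace(0.0, 10.0, n)

# View 2: x is itself a realization of a random variable X.
x_random = rng.normal(5.0, 3.0, size=n)

slope_fixed, intercept_fixed = simulate_and_fit(x_fixed, rng=rng)
slope_random, intercept_random = simulate_and_fit(x_random, rng=rng)

print("fixed x :", slope_fixed, intercept_fixed)
print("random x:", slope_random, intercept_random)
```

In both cases the fitted line estimates the same conditional relationship, because the fit only uses the ##x## values that were actually observed. Whether ##X## is called a random variable matters for how the sampling process is described, not for how the least-squares line is computed in this sketch.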

Areas of Agreement / Disagreement

Participants do not reach a consensus on whether both ##X## and ##Y## should be treated as random variables in regression analysis. There are competing views on the interpretation of regression outputs, particularly regarding whether they estimate the mean of ##Y## or the actual values of ##Y##.

Contextual Notes

Participants note that the assumptions of linear regression imply that the independent variable ##X## does not have random errors, which complicates the interpretation of ##(X,Y)## as bivariate normal random variables. Additionally, there is ambiguity regarding the terminology used to describe the relationship between the regression model and the underlying statistical properties of the data.
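
As a point of reference for the bivariate interpretation, here is a standard textbook result rather than something stated in the thread. If ##(X,Y)## is assumed to be bivariate normal with means ##\mu_X, \mu_Y##, standard deviations ##\sigma_X, \sigma_Y## and correlation ##\rho## (symbols introduced here for illustration), then the conditional mean of ##Y## is linear in ##x##:

$$E(Y \mid X = x) = \mu_Y + \rho \frac{\sigma_Y}{\sigma_X}\,(x - \mu_X)$$

Matching this to ##y = \beta_1 x + \beta_0 + \epsilon## gives

$$\beta_1 = \rho \frac{\sigma_Y}{\sigma_X}, \qquad \beta_0 = \mu_Y - \beta_1 \mu_X$$

so under that assumption the regression line describes the mean of ##Y## given ##X = x##, which is compatible with conditioning on the observed ##x## values and treating them as fixed.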

  • #31
DrDu said:
Of course outliers are an issue! But first one has to define what an outlier is. An outlier may violate ##E(\epsilon_i)=0##. OLS is sensitive to this; it is not a robust method. A single outlier of this kind may lead to a slope estimate arbitrarily far away from the true one.
Yes. My point was that, where the expected values of the estimators are concerned, I should not worry about outliers at other ##x## values, since it is the entire distribution at those ##x## values that determines those expected values.
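
The sensitivity of OLS to a single point that violates ##E(\epsilon_i)=0## can be illustrated numerically. This is a minimal sketch with made-up data: the true slope of 2, the noise level, the shift sizes, and the helper name `ols_slope` are all assumptions chosen for illustration. One observation at the largest ##x## is shifted upward by increasing amounts and the line is refitted each time.

```python
import numpy as np

rng = np.random.default_rng(1)

# Clean data from y = 2x + 1 + eps (illustrative values, not from the thread).
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=x.size)

def ols_slope(x, y):
    """Return the least-squares slope of y on x."""
    return np.polyfit(x, y, deg=1)[0]

slope_clean = ols_slope(x, y)

# Shift a single observation (the one at the largest x) by growing amounts.
slopes_with_outlier = {}
for shift in (10.0, 100.0, 1000.0):
    y_bad = y.copy()
    y_bad[-1] += shift            # this point no longer has a zero-mean error
    slopes_with_outlier[shift] = ols_slope(x, y_bad)

print("clean slope:", slope_clean)
for shift, slope in slopes_with_outlier.items():
    print(f"shift {shift:7.1f} -> slope {slope:.3f}")
```

Because the OLS slope is a linear function of the ##y## values, the change in the estimate grows in proportion to the size of the shift, so one bad point can move the slope as far as desired. This is distinct from occasional large but zero-mean errors, which add variance without changing the expected value of the estimator.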
 
