- #1
Kyouran
- TL;DR Summary
- Linear regression
I'm not a statistician, but this has been bothering me for a bit. Suppose we have the simple model
Y = a + bX + U
where Y, X and U are taken to be random variables representing the dependent variable, the independent (explanatory) variable and the error term respectively.
In the case of a stochastic regressor X, we can write the expected value of the dependent variable as E[Y] = a + bE[X] (assuming the expected value of the error term U is zero). Since E[X] is a constant parameter of the distribution of X, E[Y] is also a constant, so this works: taking the expected value simply yields a relationship between the means of two distributions. So we can take X and Y each to have a fixed distribution, and each observation then constitutes a random draw from these two fixed distributions. However, in time series analysis we sometimes encounter a similar model, e.g. Y_t = a + bX_t + U_t, where the same situation is viewed as a stochastic process rather than as a relation between two random variables.
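To make the stochastic-regressor case concrete, here is a minimal simulation sketch of what I mean. The parameter values, the normal distributions for X and U, and the helper name `simulate_stochastic` are all my own assumptions for illustration; nothing in the model requires them. It only checks that the sample mean of Y settles near a + bE[X] when every observation is drawn from the same pair of fixed distributions.

```python
import random
import statistics


def simulate_stochastic(a, b, mu_x, n, seed=0):
    """Draw n observations of (X, Y) with Y = a + b*X + U.

    Assumed for illustration: X ~ Normal(mu_x, 1) and U ~ Normal(0, 1),
    independent of each other.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(mu_x, 1.0) for _ in range(n)]
    us = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [a + b * x + u for x, u in zip(xs, us)]
    return xs, ys


# Hypothetical parameter values, chosen only for the example.
a, b, mu_x = 2.0, 3.0, 5.0
xs, ys = simulate_stochastic(a, b, mu_x, n=100_000)

mean_y = statistics.fmean(ys)
expected_mean_y = a + b * mu_x  # E[Y] = a + b*E[X], since E[U] = 0

print(f"sample mean of Y: {mean_y:.3f}")
print(f"a + b*E[X]:       {expected_mean_y:.3f}")
```

With a large sample the two printed numbers should be close, which is all the "two fixed distributions" picture claims.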
With a deterministic regressor, however, trying the same gives E[Y] = a + bx, and unless x is fixed this can't be correct, since it would imply that the mean of Y changes with x, so Y cannot be a random variable with a single, fixed distribution. Thus in the case of a deterministic regressor we need at least n random variables Y_1, ..., Y_n, one for each chosen value x_i, i.e. a model Y_i = a + bx_i + U_i.
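The deterministic-regressor case can be sketched the same way. Again, the parameter values, the three fixed x values, the normal error term, and the helper name `draw_y` are assumptions I picked for illustration. For each fixed x_i the experiment is repeated many times, and the sample mean of Y_i is compared with a + bx_i.

```python
import random
import statistics


def draw_y(a, b, x, reps, rng):
    """Repeat the experiment `reps` times at one fixed x.

    Assumed for illustration: U ~ Normal(0, 1).
    """
    return [a + b * x + rng.gauss(0.0, 1.0) for _ in range(reps)]


# Hypothetical parameters and fixed design points.
a, b = 2.0, 3.0
x_values = [0.0, 1.0, 2.0]
rng = random.Random(1)

# One sample mean per design point: each Y_i has its own distribution.
means_by_x = {
    x: statistics.fmean(draw_y(a, b, x, 50_000, rng)) for x in x_values
}

for x, m in means_by_x.items():
    print(f"x = {x}: sample mean of Y_i = {m:.3f}, a + b*x = {a + b * x:.3f}")
```

Each x_i gives a different mean, so the Y_i cannot all share one fixed distribution, which is the point of the paragraph above.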
So, my questions here are regarding whether I'm viewing this correctly or not:
1) Should I view linear regression with stochastic regressors as being a stochastic process where each observation i corresponds to a realization of a different random variable Y_i in this process, or should I view it (as I argued above) as a simple relation between k + 1 distributions (k being the number of regressors) where each observation is just a different realization from the same set of random variables? In other words, Y = a + bX + U vs Y_i = a + bX_i + U_i ? If both views are possible, are there any implications of choosing one view over another?
2) Is my analysis of deterministic regressors here correct?