Linear regression with error term

In summary: with a stochastic regressor ##X##, taking expectations of ##Y = a + bX + U## yields a relation between the means of two fixed distributions, ##E[Y] = a + bE[X]##. With a deterministic regressor, each chosen value of ##x## gives ##Y## a different distribution, so the model must be read as ##Y_i = a + bx_i + U_i##, one response distribution per design point. The thread discusses whether regression with stochastic regressors should be viewed as a relation between fixed distributions or as a stochastic process.
  • #1
Kyouran
TL;DR Summary
Linear regression
I'm not a statistician, but this has been bothering me for a bit. Suppose we have the simple model

##Y = a + bX + U##

where ##Y##, ##X##, and ##U## are taken to be random variables representing the dependent variable, the explanatory (independent) variable, and the error term respectively.

In the case of a stochastic regressor ##X##, we can write the expected value of the dependent variable as ##E[Y] = a + bE[X]## (assuming the expected value of the error term ##U## is zero). Since ##E[X]## is a constant parameter of the distribution, ##E[Y]## is also a constant, and this works: taking the expected value simply yields a relationship between the means of two distributions. So we can take ##X## and ##Y## each to have a fixed distribution, and each observation then constitutes a random draw from these two fixed distributions. However, in time series analysis we sometimes encounter a similar-looking model, e.g. ##Y_t = a + bX_t + U_t##, where the same situation is viewed as a stochastic process rather than as a relation between two random variables.
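A quick numpy sketch of this "fixed distributions" view (the numbers here are made up, purely to illustrate): each observation is one draw of the pair ##(X, Y)##, and the sample means then obey ##E[Y] = a + bE[X]##.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 2.0, 0.5                       # illustrative intercept and slope
n = 100_000

X = rng.normal(10.0, 3.0, size=n)     # stochastic regressor: draws from one fixed distribution
U = rng.normal(0.0, 1.0, size=n)      # error term with E[U] = 0
Y = a + b * X + U                     # each observation is one draw of the pair (X, Y)

# The relation between the two distribution means: E[Y] = a + b*E[X]
print(Y.mean(), a + b * X.mean())     # both approximately 7.0
```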

With a deterministic regressor, however, trying the same gives ##E[Y] = a + bx##, and unless ##x## is fixed this can't be correct, since it would imply that ##Y## has a changing mean and thus cannot be a single random variable with one fixed distribution. So in the case of a deterministic regressor we need at least ##n## distributions ##Y_1, \dots, Y_n##, one for each value of ##x## chosen, i.e. a model ##Y_i = a + bx_i + U_i##.
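The same kind of sketch for the fixed-design case (again with made-up numbers): each chosen ##x_i## carries its own response distribution ##Y_i##, with its own mean ##a + bx_i##.

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = 2.0, 0.5
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # deterministic design points chosen by the experimenter

# Each x_i has its own response distribution Y_i = a + b*x_i + U_i;
# draw many replicates of each Y_i to see the separate means.
reps = 10_000
Y = a + b * x + rng.normal(0.0, 1.0, size=(reps, x.size))
print(Y.mean(axis=0))                     # ~ [2.5, 3.0, 3.5, 4.0, 4.5], one mean per x_i
```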

So, my questions here are regarding whether I'm viewing this correctly or not:

1) Should I view linear regression with stochastic regressors as a stochastic process, where each observation ##i## corresponds to a realization of a different random variable ##Y_i## in this process, or should I view it (as I argued above) as a simple relation between ##k + 1## distributions (##k## being the number of regressors), where each observation is just a different realization of the same set of random variables? In other words, ##Y = a + bX + U## vs. ##Y_i = a + bX_i + U_i##? If both views are possible, are there any implications of choosing one over the other?

2) Is my analysis of deterministic regressors here correct?
 
  • #2
##E[Y] = a + bx## works with deterministic ##x##; are you equating deterministic with stationary? The latter can be obtained by differencing. For example, if you model Apple's stock price (##y##) relative to the S&P 500 price (##x##) by ##y = a + bx + e##, this does not have a stable mean, but if you do the same in first differences (the % change in price, forgetting dividends for the moment) then you do have a stable mean.
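A small simulation sketch of the point (simulated prices, not real market data; the coefficients are made up):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 252
# Simulated daily % changes: the stock's return loads on the index's return
x_ret = rng.normal(0.0003, 0.01, size=n)                      # index returns
y_ret = 0.0001 + 1.2 * x_ret + rng.normal(0.0, 0.01, size=n)  # stock returns

# Price levels (cumulative products) drift and have no stable mean,
# but the first differences (the % changes) do.
x_price = 100 * np.cumprod(1 + x_ret)
y_price = 50 * np.cumprod(1 + y_ret)

b, a = np.polyfit(x_ret, y_ret, 1)    # regression in first differences
print(a, b)                           # roughly 0.0001 and 1.2
```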
 
  • #3
BWV said:
##E[Y] = a + bx## works with deterministic ##x##; are you equating deterministic with stationary? The latter can be obtained by differencing. For example, if you model Apple's stock price (##y##) relative to the S&P 500 price (##x##) by ##y = a + bx + e##, this does not have a stable mean, but if you do the same in first differences (the % change in price, forgetting dividends for the moment) then you do have a stable mean.
I'm aware of differencing; I'm more interested in the subtle theoretical difference between treating the explanatory variable as a single random variable which takes on many different values and treating it as part of a stochastic process. AFAIK, in the case of differencing, one could say you are trying to create a single random variable with a stable mean from a stochastic process. So it's more of a deeply theoretical question.
 
  • #4
@Kyouran I think that being able to do multivariate linear regression analysis with consideration of stochastic processes is likely to help you land a job in a decision support group at a major financial institution ##\dots##
 
  • #5
Kyouran said:
I'm aware of differencing; I'm more interested in the subtle theoretical difference between treating the explanatory variable as a single random variable which takes on many different values and treating it as part of a stochastic process. AFAIK, in the case of differencing, one could say you are trying to create a single random variable with a stable mean from a stochastic process. So it's more of a deeply theoretical question.
In my stock example both ##Y## and ##X## can be represented by the stochastic process of geometric Brownian motion, but other processes may not work. It would need to be a martingale and have normally distributed error terms.
 
  • #6
Kyouran said:
##Y = a + bX + U##

where ##Y##, ##X##, and ##U## are taken to be random variables representing the dependent variable, the explanatory (independent) variable, and the error term respectively.

In the usual sort of linear regression (linear least squares) ##X## is not considered a random variable. For example, in an experiment, the experimenter may systematically set ##X## to a sequence of (say) voltages ##X = 1.0, 2.0, 3.0, 4.0, 5.0## and measure the value of ##Y## at those voltages.

Of course, one may speak of the mean of any set of numbers. And one may imagine "picking a number at random" from any set of numbers. However, imagining doing it may have nothing to do with the mathematical model that involves the set of numbers.

Kyouran said:
So we can take ##X## and ##Y## each to have a fixed distribution, and each observation then constitutes a random draw from these two fixed distributions.

It's unlikely that you can model an experiment where ##X## is set systematically that way. For example, if you pick 20 random voltages from the set ##\{1.0,2.0,3.0,4.0,5.0\}## you are unlikely to reproduce an experiment where the experimenter decided to use each voltage exactly 4 times.
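That claim is easy to check by counting: the chance that 20 uniform random picks from the five voltages land exactly four on each is tiny.

```python
from math import comb

# Probability that 20 uniform picks from {1.0, ..., 5.0} use each
# voltage exactly 4 times: multinomial count over 5**20 outcomes
ways = comb(20, 4) * comb(16, 4) * comb(12, 4) * comb(8, 4) * comb(4, 4)
print(ways / 5**20)   # about 0.0032
```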

There are types of regression where both ##X## and ##Y## are modeled as random variables, in the sense that each are measurements that include random errors. (e.g. "total least squares regression")

Kyouran said:
However, in time series analysis we sometimes encounter a similar-looking model, e.g. ##Y_t = a + bX_t + U_t##, where the same situation is viewed as a stochastic process rather than as a relation between two random variables.

What does "X_t" represent? There are types of time series where ##X_t## is a random variable that is a function of values ##X_{t-1}, X_{t-2},...## that happened previously, plus some random disturbance that happens at time ##t##.

A stochastic process assigns probabilities to "trajectories". For example, a stochastic process model of a stock's price over 100 days assigns a probability to each possible graph of that stock's price over the 100 days.

You can imagine picking a day at random and then picking (from all the possible graphs) a price on that day at random. However, for a reasonably sophisticated stochastic process, there need not be any simple relation between the mean day selected, the mean stock value selected, and the constants involved in the model. Perhaps you are thinking of a very simple model such as ##Y(t) = A + tB + U(t)##, where ##A, B## are constants and ##U(t)## is a random variable.
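For that simple model, a sketch of what "assigning probabilities to trajectories" looks like (made-up constants; each sampled row is one possible 100-day graph):

```python
import numpy as np

rng = np.random.default_rng(3)
A, B = 100.0, 0.5
t = np.arange(100)

# Each row is one sampled trajectory of Y(t) = A + t*B + U(t)
trajectories = A + B * t + rng.normal(0.0, 5.0, size=(10, t.size))
print(trajectories.shape)            # (10, 100): 10 sampled 100-day graphs
print(trajectories[:, 50].mean())    # values at day 50 scatter around A + 50*B = 125
```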
 
  • #7
Stephen Tashi said:
In the usual sort of linear regression (linear least squares) ##X## is not considered a random variable.
In classical linear regression this is indeed the case, but it is easy to think of examples from e.g. the social sciences where ##X## is random yet measured without error (and hence no total least squares is required). Take for example the age of a randomly selected individual from a population, the number of years they have studied, their wage, etc. You are still randomly selecting the individual, so the regressor here is stochastic.

Stephen Tashi said:
What does "X_t" represent? There are types of time series where ##X_t## is a random variable that is a function of values ##X_{t-1}, X_{t-2},...## that happened previously, plus some random disturbance that happens at time ##t##.
What I meant was regressing one time series onto another, i.e. ##Y_t## onto ##X_t##. Mathematically the formula looks pretty much the same as that for a regression between ##X## and ##Y##, but at a deeper level they seem to be completely different things.

Let's say that you have a population and select from each individual ##i## their wage ##Y_i## and their age ##X_i##. If you select ##n## individuals, then you could say you have ##2n## different random variables ##Y_1, \dots, Y_n## and ##X_1, \dots, X_n##. However, it is clear that ##Y_1 = Y_2 = \dots = Y_n = Y##, as they are from the same population, and the same goes for the ##X_i##: they can be seen as realizations of the same ##X##. Because of this, we end up with ##n## equations to solve for the parameters ##a## and ##b##. Also note that ##i## was just a dummy index; I could have used ##t## instead, and mathematically it would have looked exactly like the formula I gave for the time series example.
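A sketch of this first case with made-up numbers: ##n## draws from one joint (age, wage) distribution give ##n## equations, from which ##a## and ##b## can be estimated.

```python
import numpy as np

rng = np.random.default_rng(4)
a, b = 10.0, 0.8                       # hypothetical intercept and slope
n = 500

age = rng.uniform(20, 65, size=n)      # randomly selected individuals -> stochastic regressor
wage = a + b * age + rng.normal(0.0, 5.0, size=n)

# n realizations of the same pair (X, Y), fitted by ordinary least squares
b_hat, a_hat = np.polyfit(age, wage, 1)
print(a_hat, b_hat)                    # close to 10.0 and 0.8
```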

Now let's see how this works with two time series. You select at each time the price of stock 1, ##X_t##, and stock 2, ##Y_t##. If you select ##n## times, you again have ##Y_1, \dots, Y_n## and ##X_1, \dots, X_n## just as in the previous paragraph, but clearly this time we cannot just assume ##Y_1 = Y_2 = \dots = Y_n##, nor that ##X_1 = X_2 = \dots = X_n##.

In the first case, we can estimate a relation between the expected values ##E[Y]## and ##E[X]## because we have a total of ##n## data points for two random variables, ##X## and ##Y##. In addition, we find that for the estimated parameters to be unbiased, we simply need ##E[U|X] = 0##. In the second case, ##Y_1## is not necessarily the same as ##Y_2##, and we end up with only one data point for each pair of random variables ##(X_t, Y_t)##. All we can say is that for some specific value of ##t##, ##E[Y_t] = a_t + b_t E[X_t]##; not only are the true ##a_t## and ##b_t## parameters different for each pair ##(X_t, Y_t)##, but we also have only one data point for each such pair. This seems to defeat the whole purpose behind linear regression. So it seems to me we have to start making assumptions about the underlying process that generates these random variables.
Stephen Tashi said:
A stochastic process assigns probabilities to "trajectories". For example, a stochastic process model of a stocks price over 100 days assigns a probability to each possible graph of that stocks price over the 100 days.

You can imagine picking a day at random and then picking (from all the possible graphs) a price on that day at random. However, for a reasonably sophisticated stochastic process, there need not be any simple relation between the mean day selected, the mean stock value selected, and the constants involve in the model. Perhaps you are thinking of a very simple model such as
##Y(t) = A + tB + U(t)## where ##A,B## are constants and ##U(t)## is a random variable.
For the very simple model here, ##t## would be a deterministic regressor (i.e. you choose the times). But in this case you have the problem that there is a different distribution ##Y_t## for each value of ##t##. This implies that you only have a single data point to estimate anything about the distribution ##Y_t##: with ##m## times you have ##m## data points for ##m## distributions. So you have to start making assumptions about the relation between ##Y_t## and ##Y_{t-1}##, or you will get nowhere. In fact, assuming a fixed ##A## and ##B## immediately leads to the following relation between ##Y_t## and ##Y_{t-1}## (apply differencing):

##Y_t = Y_{t-1} + B + U_t - U_{t-1}##

But then you'd have to start justifying that assumption, and it is not as trivial as in the case with the population ages and wages.
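The differencing step itself is easy to verify numerically (made-up constants): under a fixed ##A## and ##B##, the first difference of the trend model has the constant mean ##B##.

```python
import numpy as np

rng = np.random.default_rng(5)
A, B = 100.0, 0.5
t = np.arange(200)
U = rng.normal(0.0, 1.0, size=t.size)
Y = A + B * t + U                # trend model: a different distribution Y_t at each t

dY = np.diff(Y)                  # Y_t - Y_{t-1} = B + U_t - U_{t-1}
print(dY.mean())                 # ~ 0.5 = B: the changing mean becomes a constant one
```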
 
  • #8
Kyouran said:
Let's say that you have a population and select from each individual ##i## their wage ##Y_i## and their age ##X_i##. If you select ##n## individuals, then you could say you have ##2n## different random variables ##Y_1, \dots, Y_n## and ##X_1, \dots, X_n##. However, it is clear that ##Y_1 = Y_2 = \dots = Y_n = Y##, as they are from the same population,

If you are assuming a linear regression model of the form ##Y = AX + B + U##, then the random variable ##Y## measured when ##X = 1## does not have the same distribution as the random variable ##Y## measured when ##X = 2##.

In what you call stochastic regression, you have ##n## data points selected from the joint distribution of the random vector ##(X,Y)##. If you consider all ##n## data points to be data for a single "random variable", then "random variable" must refer to a random vector.

From the joint distribution, we can compute the marginal distribution of ##Y## and we can compute conditional distributions for things like ##(Y|x=2)##.

You can consider all the ##Y## measurements to be from the same population, if you consider the marginal distribution of ##Y## to be the distribution of that population. However the marginal distribution of ##Y## is not specified in a linear regression model.

The linear regression model ##Y_i = AX_i + B + U_i## does not specify the joint distribution of ##(X,Y)##. Hence it does not specify the marginal distribution of ##Y##. It does specify conditional distributions, like the distribution of ##Y## given that ##X = 2##.
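A sketch of that distinction (the distribution chosen for ##X## below is an arbitrary assumption, exactly the part the regression model leaves unspecified):

```python
import numpy as np

rng = np.random.default_rng(6)
A, B = 0.5, 2.0
n = 200_000

# A marginal for X must be assumed; the regression model does not supply it
X = rng.choice([1.0, 2.0, 3.0], size=n, p=[0.6, 0.3, 0.1])
Y = A * X + B + rng.normal(0.0, 1.0, size=n)

print(Y[X == 2.0].mean())   # conditional mean (Y | X = 2): ~ A*2 + B = 3.0, fixed by the model
print(Y.mean())             # marginal mean: ~ A*E[X] + B = 2.75 here, but it shifts with X's law
```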

Kyouran said:
In the second case, ##Y_1## is not necessarily the same as ##Y_2##, and we end up with only one data point for each pair of random variables ##(X_t, Y_t)##

Kyouran said:
So it seems to me we have to start making assumptions about the underlying process that generates these random variables.

Yes, sufficient data to analyze a stochastic process requires either data from multiple trajectories of the process (not the situation of one ##(X,Y)## pair per time ##t## that you described) or some assumption that the formula for generating a trajectory can be estimated from segments of a single trajectory (e.g. stationary random processes, Box-Jenkins models, etc.).
 
  • #9
@Stephen Tashi Yes, in the example where one draws data from a population, you can consider having joint, marginal, and conditional distributions (in that specific case one is mostly interested in ##E[Y|x]##, not so much ##E[Y]##, even though the latter is well-defined there), whereas in the time series example the underlying structure is different (e.g. a stochastic process), yet linear regression can be applied to both.

I suppose sometimes it may be just a point of view as to how everything is considered, but this discussion did help my understanding of it all. Thanks!
 

What is linear regression with error term?

Linear regression with error term is a statistical method used to analyze the relationship between two or more variables. It assumes that the relationship between the variables can be represented by a straight line. The error term, also known as the residual, represents the difference between the actual values and the predicted values by the regression line.

What is the purpose of the error term in linear regression?

The error term in linear regression serves as a measure of the variability of the data points around the regression line. It helps to identify the accuracy of the regression model and determine if the model is a good fit for the data. A smaller error term indicates a better fit of the model.

How is the error term calculated in linear regression?

The error term is calculated by taking the difference between the actual value of the dependent variable and the predicted value by the regression line. This difference is then squared to remove any negative values and to give more weight to larger errors. The sum of the squared errors is then divided by the number of data points to calculate the mean squared error.
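A minimal sketch of that computation (illustrative numbers):

```python
import numpy as np

y_actual = np.array([2.1, 3.9, 6.2, 7.8])    # observed values of the dependent variable
y_pred = np.array([2.0, 4.0, 6.0, 8.0])      # values predicted by the regression line

residuals = y_actual - y_pred                # the error terms
mse = np.mean(residuals ** 2)                # squared to drop signs, then averaged
print(mse)                                   # 0.025
```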

What is the significance of the error term in linear regression?

The error term is significant in linear regression as it helps to assess the accuracy of the model and determine the level of confidence in the relationship between the variables. It also helps to identify any outliers or influential data points that may affect the overall results of the regression analysis.

How can the error term be minimized in linear regression?

The error term can be minimized in linear regression by finding the best-fit line that minimizes the sum of the squared errors. This can be achieved by using various techniques such as ordinary least squares, maximum likelihood estimation, or gradient descent. Additionally, removing outliers or influential data points can also help to reduce the error term.
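For ordinary least squares with a single regressor, the minimizing line even has a closed form (a sketch with illustrative data):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.3, 2.9, 3.6, 4.1, 4.4])

# Closed-form OLS estimates that minimize the sum of squared errors
b = np.cov(x, y, bias=True)[0, 1] / np.var(x)   # slope = cov(x, y) / var(x)
a = y.mean() - b * x.mean()                     # intercept
print(a, b)
```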
