Is the estimator for regression through the origin consistent?

Summary
The discussion focuses on the consistency of the OLS estimator for regression through the origin, specifically addressing the formula for the estimator \tilde{\beta_1} and the conditions under which it is consistent. The derived formula shows that \tilde{\beta_1} can be expressed as a combination of the true parameter \beta_1 and a term involving the error term u_i. Participants emphasize the importance of assumptions regarding the error terms, including their independence and zero conditional mean, which are crucial for establishing consistency. The instructor's point about the denominator converging to a constant is highlighted as a key aspect of proving consistency, although participants express confusion about how this occurs in practice. Overall, the conversation reflects a struggle to fully grasp the theoretical underpinnings of OLS estimators in this context.
slakedlime

Homework Statement


Any help on this would be immensely appreciated! I am having trouble interpreting what my instructor is trying to say.

Consider a simple linear regression model: y_i = \beta_0 + \beta_1x_i + u_i

(a) In regression through the origin, the intercept is assumed to be equal to zero. For this model, derive the formula for the OLS estimator. Let's denote this estimator by \tilde{\beta_1}.

(b) Show that \tilde{\beta_1} is consistent (assuming \beta_0=0)

The Attempt at a Solution



For part (a), I derived:

We know that y_i = \beta_1x_i + u. Hence:

\tilde{\beta_1} = \frac{\sum_{i=1}^{n}x_iy_i}{\sum_{i=1}^{n}x_i^2}

If we want to show the error term, u, we plug in y_i = \beta_1x_i + u:

\tilde{\beta_1}
= \frac{\sum_{i=1}^{n}x_i(\beta_1x_i + u)}{\sum_{i=1}^{n}x_i^2}

= \frac{\sum_{i=1}^{n}(\beta_1x_i^2 + ux_i)}{\sum_{i=1}^{n}x_i^2}
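(For reference, the estimator above comes from minimizing the sum of squared residuals \sum_{i=1}^{n}(y_i - bx_i)^2 over the slope b; a quick sketch:

\frac{d}{db}\sum_{i=1}^{n}(y_i - bx_i)^2 = -2\sum_{i=1}^{n}x_i(y_i - bx_i) = 0 \Rightarrow \tilde{\beta_1} = \frac{\sum_{i=1}^{n}x_iy_i}{\sum_{i=1}^{n}x_i^2},

provided \sum_{i=1}^{n}x_i^2 > 0.)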

For part (b), I would have to prove that \tilde{\beta_1} converges in probability to \beta_1. My instructor hinted that this involves showing that the denominator of the formula derived in part (a) converges to a constant. I don't see how this can be possible, since the denominator grows without bound as n increases.

The only possible answer I can see is the following:

\tilde{\beta_1}
= \frac{\sum_{i=1}^{n}\beta_1x_i^2}{\sum_{i=1}^{n}x_i^2} + \frac{\sum_{i=1}^{n}ux_i}{\sum_{i=1}^{n}x_i^2}
= \beta_1 + \frac{\sum_{i=1}^{n}ux_i}{\sum_{i=1}^{n}x_i^2}

So if u = 0, then the OLS estimator for regression through the origin is consistent, given that \beta_0 = 0. This would show that any inconsistency in the estimator \tilde{\beta_1} is due to the error (u).

However, I think that my answer is incorrect because I haven't applied any convergence methods (law of large numbers, Slutsky's theorem, continuous mapping theorem).

Any help would be immensely appreciated. If someone could shed light on what my instructor is saying, that would be great too. Thank you!
 
You need u_i for i = 1,2, ...,n, not just a single u that applies to every point. Recall also: there are assumptions about the u_i; what are they?

RGV
 
Ray Vickson said:
You need u_i for i = 1,2, ...,n, not just a single u that applies to every point. Recall also: there are assumptions about the u_i; what are they?

RGV

Sorry, I forgot to add the subscript for the u_i. I might not have explicitly mentioned this before, but I am deriving the OLS estimator for regression through the origin.

\tilde{\beta_1}
= \frac{\sum_{i=1}^{n}x_i(\beta_1x_i + u_i)}{\sum_{i=1}^{n}x_i^2}
= \frac{\sum_{i=1}^{n}(\beta_1x_i^2 + u_ix_i)}{\sum_{i=1}^{n}x_i^2}
= \frac{\sum_{i=1}^{n}\beta_1x_i^2}{\sum_{i=1}^{n}x_i^2} + \frac{\sum_{i=1}^{n}u_ix_i}{\sum_{i=1}^{n}x_i^2}
= \beta_1 + \frac{\sum_{i=1}^{n}u_ix_i}{\sum_{i=1}^{n}x_i^2}

The OLS first-order condition tells us that the sample covariance between the regressors and the OLS residuals is zero. Hence it can be written as:
\sum_{i=1}^{n}x_i\hat{u_i} = 0

As n goes to infinity, the \hat{u_i} will converge to the u_i. We also know that the average of the sample residuals is zero. If this is true, the numerator of the second term in the last line of my derivation would be 0. I think this would make \tilde{\beta_1} a consistent estimator? I don't know what this has to do with my instructor's point about the denominator converging to a constant, however.
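One way to read the instructor's hint, sketched under assumptions the problem does not state explicitly (random sampling of (x_i, u_i), E(u_i|x_i) = 0, and 0 < E(x_i^2) < \infty), is to divide the numerator and denominator of the second term by n so that both become sample averages:

\tilde{\beta_1} = \beta_1 + \frac{\frac{1}{n}\sum_{i=1}^{n}x_iu_i}{\frac{1}{n}\sum_{i=1}^{n}x_i^2}

By the law of large numbers, \frac{1}{n}\sum_{i=1}^{n}x_iu_i converges in probability to E(x_iu_i) = 0, while \frac{1}{n}\sum_{i=1}^{n}x_i^2 converges in probability to E(x_i^2), a finite positive constant. Slutsky's theorem then gives \tilde{\beta_1} converging in probability to \beta_1 + 0/E(x_i^2) = \beta_1. The denominator that "converges to a constant" is the average \frac{1}{n}\sum_{i=1}^{n}x_i^2, not the raw sum \sum_{i=1}^{n}x_i^2.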
 
I just realized that there are no \hat{u_i}, since regression through the origin means that there cannot be any sample-level error variables. Hence these are missing from the formula I derived in part (a). According to Wikipedia, "The error is a random variable with a mean of zero conditional on the explanatory variables." [Link]

Hence, the u_i are i.i.d. Also, for any given value of x, the average of the unobservables is the same and therefore must equal the average value of u in the entire population. Provided that var(x) \neq 0 and E(u_i|x) = 0, x_i and u_i have zero covariance. Hence the first order condition for the covariance means that \sum x_iu_i = 0.

Why is it insufficient to assume that \sum u_i = 0? That is, if the population error is negligible, shouldn't the OLS estimator for regression through the origin be consistent?
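A short way to see why a condition on the u_i alone is not enough (a sketch using the decomposition above; the average that has to vanish involves the products x_iu_i, not the u_i by themselves):

\frac{1}{n}\sum_{i=1}^{n}x_iu_i converges in probability to E(x_iu_i) = Cov(x_i, u_i) + E(x_i)E(u_i),

so even when E(u_i) = 0, the limit is nonzero whenever x_i and u_i are correlated, and \tilde{\beta_1} is then inconsistent. The zero conditional mean assumption E(u_i|x_i) = 0 is what forces E(x_iu_i) = E(x_i E(u_i|x_i)) = 0.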
 
slakedlime said:
I just realized that there are no \hat{u_i}, since regression through the origin means that there cannot be any sample-level error variables. Hence these are missing from the formula I derived in part (a). According to Wikipedia, "The error is a random variable with a mean of zero conditional on the explanatory variables." [Link]

Hence, the u_i are i.i.d. Also, for any given value of x, the average of the unobservables is the same and therefore must equal the average value of u in the entire population. Provided that var(x) \neq 0 and E(u_i|x) = 0, x_i and u_i have zero covariance. Hence the first order condition for the covariance means that \sum x_iu_i = 0.

Why is it insufficient to assume that \sum u_i = 0? That is, if the population error is negligible, shouldn't the OLS estimator for regression through the origin be consistent?

The Wiki statement you quote is misleading. In the underlying statistical model the random variables u_i have nothing at all to do with the x_i values, and their distribution is completely unconnected with x or beta or any other problem aspects. However, they are assumed to be iid, as you said. What ELSE do we assume about their distribution? All of this is elementary prob and stats 101, and does not need the law of large numbers or Slutsky's theorem or the continuous mapping theorem, or any other "advanced" notions.

RGV
 
We assume that the error terms (u_i) follow a normal distribution. Hence, in a sufficiently large sample (as n approaches infinity), the sum of the errors should converge to 0, so \sum x_iu_i = 0. Are there other assumptions we have to make?
 
slakedlime said:
We assume that the error terms (u_i) follow a normal distribution. Hence, in a sufficiently large sample (as n approaches infinity), the sum of the errors should converge to 0, so \sum x_iu_i = 0. Are there other assumptions we have to make?

Usually we do NOT start by assuming the u_i follow a normal distribution, because some important properties can be obtained without such an assumption. However, when it comes to making inferences (confidence intervals and the like), at that point we make a normality assumption. Anyway, even if you do assume a normal distribution, is that enough? Don't you need to say a bit more?

Your statement that \sum x_iu_i = 0 is, of course, absolutely false. We control (or at least measure) the x_i, but we have no control at all over the u_i, so that sum may or may not equal zero in any particular sample.

At this point I am abandoning this conversation, as I have already said enough. All this material appears in books and in on-line notes, tutorials and the like.

RGV
 
Thank you for all of your help, Ray. I really appreciate it. All that I've managed to gather is that the expected value of the error term is zero; the expected value of the error term conditional on X is zero; the variance of the error term is constant across all values of the independent variable X (though this homoscedasticity does not affect the consistency of the estimator); and the independent variable X and the error term are uncorrelated. Like you've pointed out, we might also assume that the error term is normally distributed. I can't find any more assumptions about the population error term anywhere, no matter how many Google searches I run or how much I look through my book. Maybe it's an implicit assumption that isn't intuitive to me.

I guess the answer to how all of this should impact the consistency of the estimator is obvious, but I'm having a lot of difficulty grasping it. My instructor keeps saying that the denominator for \tilde{\beta_1} should converge to a constant and that there should be some average values lurking around in this equation. I don't see how that can happen for an OLS estimator for regression through the origin.

If anyone can provide suggestions, I would really appreciate it. If the answer is too obvious and I should be able to get it, then I really do apologize. I've looked through textbooks, run Google searches, and asked my friends, but we are stumped. Maybe I don't know where to start looking.
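For anyone who wants a numerical sanity check, here is a minimal simulation sketch (the distributions and the true slope \beta_1 = 2 are arbitrary illustrative choices, not from the problem) showing \tilde{\beta_1} settling near \beta_1 as n grows:

import numpy as np

rng = np.random.default_rng(0)
beta1 = 2.0  # true slope; the intercept is zero (regression through the origin)

for n in (10, 100, 1_000, 10_000, 100_000):
    x = rng.normal(1.0, 2.0, size=n)             # regressor
    u = rng.normal(0.0, 1.0, size=n)             # error drawn independently of x, so E(u|x) = 0
    y = beta1 * x + u                            # y_i = beta1*x_i + u_i with beta0 = 0
    beta1_tilde = np.sum(x * y) / np.sum(x**2)   # OLS through the origin
    print(n, beta1_tilde)

Both \frac{1}{n}\sum x_iu_i and \frac{1}{n}\sum x_i^2 are sample averages: the first settles near 0 and the second near E(x_i^2) = 1^2 + 2^2 = 5, so the printed estimates tighten around 2 as n increases.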
 
