Is the estimator for regression through the origin consistent?

In summary, the conversation is about deriving the OLS estimator for regression through the origin, with a discussion of the assumptions on the error term and of the estimator's consistency. The key step is to rewrite the estimator in terms of sample averages, so that by the law of large numbers the denominator converges to a constant and the estimator converges in probability to the true slope.
  • #1
slakedlime

Homework Statement


Any help on this would be immensely appreciated! I am having trouble interpreting what my instructor is trying to say.

Consider a simple linear regression model: [itex]y_i = \beta_0 + \beta_1x_i + u[/itex]

(a) In regression through the origin, the intercept is assumed to be equal to zero. For this model, derive the formula for the OLS estimator. Let's denote this estimator by [itex]\tilde{\beta_1}[/itex].

(b) Show that [itex]\tilde{\beta_1}[/itex] is consistent (assuming [itex]\beta_0 = 0[/itex]).

The Attempt at a Solution



For part (a), I derived:

We know, [itex]y_i = \beta_1x_i + u[/itex]. Hence:

[itex]\tilde{\beta_1} = \frac{\sum_{i=1}^{n}x_iy_i}{\sum_{i=1}^{n}x_i^2}[/itex]

If we wanted to show the error term, u, we plug in [itex]y_i = \beta_1x_i + u[/itex]:

[itex]\tilde{\beta_1}[/itex]
[itex]= \frac{\sum_{i=1}^{n}x_i(\beta_1x_i + u)}{\sum_{i=1}^{n}x_i^2}[/itex]

[itex]= \frac{\sum_{i=1}^{n}(\beta_1x_i^2 + ux_i)}{\sum_{i=1}^{n}x_i^2}[/itex]
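
For completeness, the estimator formula above is the solution of the least-squares problem for the no-intercept model; a minimal derivation sketch:

[itex]\min_{b}\sum_{i=1}^{n}(y_i - bx_i)^2 \implies -2\sum_{i=1}^{n}x_i(y_i - \tilde{\beta_1}x_i) = 0 \implies \tilde{\beta_1} = \frac{\sum_{i=1}^{n}x_iy_i}{\sum_{i=1}^{n}x_i^2}[/itex]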

For part (b), I would have to prove that [itex]\tilde{\beta_1}[/itex] converges in probability to [itex]\beta_1[/itex]. My instructor hinted that this involves showing that the denominator of the formula derived in part (a) converges to a constant. I don't see how this is possible, since the denominator grows without bound as n increases.

The only possible answer I can see is the following:

[itex]\tilde{\beta_1}[/itex]
[itex]= \frac{\sum_{i=1}^{n}\beta_1x_i^2}{\sum_{i=1}^{n}x_i^2} + \frac{\sum_{i=1}^{n}ux_i}{\sum_{i=1}^{n}x_i^2}[/itex]
[itex]= \beta_1 + \frac{\sum_{i=1}^{n}ux_i}{\sum_{i=1}^{n}x_i^2}[/itex]

So if u = 0, then the OLS estimator for regression through the origin is consistent given that [itex]\beta_0 = 0[/itex]. This would show that any inconsistency in the estimator [itex]\tilde{\beta_1}[/itex] is due to the error (u).

However, I think that my answer is incorrect because I haven't applied any convergence methods (law of large numbers, Slutsky's theorem, continuous mapping theorem).

Any help would be immensely appreciated. If someone could shed light on what my instructor is saying, that would be great too. Thank you!
 
  • #2
You need u_i for i = 1,2, ...,n, not just a single u that applies to every point. Recall also: there are assumptions about the u_i; what are they?

RGV
 
  • #3
Ray Vickson said:
You need u_i for i = 1,2, ...,n, not just a single u that applies to every point. Recall also: there are assumptions about the u_i; what are they?

RGV

Sorry, I forgot to add the subscript for the [itex]u_i[/itex]. I might not have explicitly mentioned this before, but I am deriving the OLS estimator for regression through the origin.

[itex]\tilde{\beta_1}[/itex]
[itex]= \frac{\sum_{i=1}^{n}x_i(\beta_1x_i + u_i)}{\sum_{i=1}^{n}x_i^2}[/itex]
[itex]= \frac{\sum_{i=1}^{n}(\beta_1x_i^2 + u_ix_i)}{\sum_{i=1}^{n}x_i^2}[/itex]
[itex]= \frac{\sum_{i=1}^{n}\beta_1x_i^2}{\sum_{i=1}^{n}x_i^2} + \frac{\sum_{i=1}^{n}u_ix_i}{\sum_{i=1}^{n}x_i^2}[/itex]
[itex]= \beta_1 + \frac{\sum_{i=1}^{n}u_ix_i}{\sum_{i=1}^{n}x_i^2}[/itex]

The OLS first order condition tells us that the sample covariance between the regressors and OLS residuals is zero. Hence the first order condition can be written as:
[itex]\sum_{i=1}^{n}x_i\hat{u}_i = 0[/itex]

As n tends to infinity, all of the [itex]\hat{u_i}[/itex] will converge to the [itex]u_i[/itex]. We also know that the average of the sample residuals is zero. If this is true, the numerator of the second term in the last line of my derivation would be 0. I think this would make [itex]\tilde{\beta_1}[/itex] a consistent estimator? I don't know what this has to do with my instructor's point about the denominator converging to a constant, however.
 
  • #4
I just realized that there are no [itex]\hat{u_i}[/itex], since regression through the origin means that there cannot be any sample-level error variables. Hence these are missing from the formula I derived in part (a). According to Wikipedia, "The error is a random variable with a mean of zero conditional on the explanatory variables." [Link]

Hence, [itex]u_i[/itex] are i.i.d. Also, for any given value of x, the average of the unobservables is the same and therefore must equal the average value of u in the entire population. Provided that var(x) ≠ 0 and E(u_i|x) = 0, x_i and u_i have zero covariance. Hence the first order condition for the covariance means that Ʃx_iu_i = 0.

Why is it insufficient to assume that Ʃu_i = 0? That is, if the population error is negligible, shouldn't the OLS estimator for regression through the origin be consistent?
 
  • #5
slakedlime said:
I just realized that there are no [itex]\hat{u_i}[/itex], since regression through the origin means that there cannot be any sample-level error variables. Hence these are missing from the formula I derived in part (a). According to Wikipedia, "The error is a random variable with a mean of zero conditional on the explanatory variables." [Link]

Hence, [itex]u_i[/itex] are i.i.d. Also, for any given value of x, the average of the unobservables is the same and therefore must equal the average value of u in the entire population. Provided that var(x) ≠ 0 and E(u_i|x) = 0, x_i and u_i have zero covariance. Hence the first order condition for the covariance means that Ʃx_iu_i = 0.

Why is it insufficient to assume that Ʃu_i = 0? That is, if the population error is negligible, shouldn't the OLS estimator for regression through the origin be consistent?

The Wiki statement you quote is misleading. In the underlying statistical model the random variables u_i have nothing at all to do with the x_i values, and their distribution is completely unconnected with x or beta or any other problem aspects. However, they are assumed to be iid, as you said. What ELSE do we assume about their distribution? All of this is elementary prob and stats 101, and does not need the law of large numbers or Slutsky's theorem or the continuous mapping theorem, or any other "advanced" notions.

RGV
 
  • #6
We assume that the error terms (u_i) follow a normal distribution. Hence, in a sufficiently large sample (as n approaches infinity), the sum of the errors should converge to 0. Hence, Ʃx_iu_i = 0. Are there other assumptions we have to make?
 
  • #7
slakedlime said:
We assume that the error terms (u_i) follow a normal distribution. Hence, in a sufficiently large sample (as n approaches infinity), the sum of the errors should converge to 0. Hence, Ʃx_iu_i = 0. Are there other assumptions we have to make?

Usually we do NOT start by assuming the u_i follow a normal distribution, because some important properties can be obtained without such an assumption. However, when it comes to making inferences (confidence intervals and the like), then, at that point, we make a normality assumption. Anyway, even if you do assume a normal distribution, is that enough? Don't you need to say a bit more?

Your statement that Ʃx_iu_i = 0 is, of course, absolutely false. We control (or at least measure) the x_i, but we have no control at all over the u_i, so that sum may or may not equal zero in any particular sample.

At this point I am abandoning this conversation, as I have already said enough. All this material appears in books and in on-line notes, tutorials and the like.

RGV
 
  • #8
Thank you for all of your help Ray. I really appreciate it. All that I've managed to gather is the following: the expected value of the error term is zero; the expected value of the error term conditional on X is zero; the variance of the error term is constant for all values of the independent variable X (but this homoscedasticity does not affect the consistency of the estimator); and the independent variable X and the error term are uncorrelated. Like you've pointed out, we might also assume that the error term is normally distributed. I can't find any more assumptions about the population error term anywhere, no matter how many Google searches I run or how much I look through my book. Maybe it's an implicit assumption that isn't intuitive to me.

I guess the answer to how all of this should impact the consistency of the estimator is obvious, but I'm having a lot of difficulty grasping it. My instructor keeps saying that the denominator for [itex]\tilde{\beta_1}[/itex] should converge to a constant and that there should be some average values lurking around in this equation. I don't see how that can happen for an OLS estimator for regression through the origin.

If anyone can provide suggestions, I would really appreciate it. If the answer is too obvious and I should be able to get it, then I really do apologize. I've looked through textbooks, run Google searches and asked my friends, but we are stumped. Maybe I don't know where to start looking.
 

1. What is a regression through the origin?

Regression through the origin is a statistical method used to estimate the relationship between two variables where the intercept is set to zero. This means that the line of best fit will pass through the origin (0,0) on a graph, and there is no constant term in the regression equation.
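
Concretely, using the notation from the thread above, the regression-through-the-origin model and its OLS estimator are

[itex]y_i = \beta_1x_i + u_i, \qquad \tilde{\beta_1} = \frac{\sum_{i=1}^{n}x_iy_i}{\sum_{i=1}^{n}x_i^2}[/itex]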

2. What does it mean for an estimator to be consistent?

An estimator is consistent if as the sample size increases, the estimated value approaches the true value of the population parameter. In the case of regression through the origin, this means that as more data points are added to the analysis, the estimates for the slope of the line will become more accurate.
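
One quick way to see this is by simulation. The following is a minimal sketch, not part of the original thread: the normal draws, the true slope of 2.0, the seed, and the sample sizes are all illustrative assumptions. As n grows, the no-intercept OLS estimate settles down near the true slope.

[code]
import numpy as np

# Sketch: simulate y_i = beta_1 * x_i + u_i (no intercept) and watch
# the regression-through-the-origin estimate approach the true slope.
rng = np.random.default_rng(0)   # arbitrary seed for reproducibility
beta1 = 2.0                      # assumed true slope (illustrative)

for n in (100, 1_000, 10_000, 100_000):
    x = rng.normal(size=n)            # regressor with E[x^2] > 0
    u = rng.normal(size=n)            # errors with E[u | x] = 0
    y = beta1 * x + u
    beta1_tilde = (x @ y) / (x @ x)   # sum(x_i*y_i) / sum(x_i^2)
    print(f"n = {n:>6}: beta1_tilde = {beta1_tilde:.4f}")
[/code]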

3. How is the consistency of an estimator for regression through the origin determined?

The consistency of an estimator for regression through the origin can be established using the law of large numbers, which states that as the sample size increases, a sample average converges to the corresponding population mean. Writing the numerator and denominator of the estimator as sample averages (divide both by n) pins down the limit of the estimated slope, which turns out to be the true population slope.
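
In symbols (a sketch of the standard argument, using the decomposition derived in the thread and the assumptions E(u_i|x_i) = 0 and 0 < E(x_i^2) < ∞):

[itex]\tilde{\beta_1} = \beta_1 + \frac{\frac{1}{n}\sum_{i=1}^{n}x_iu_i}{\frac{1}{n}\sum_{i=1}^{n}x_i^2} \xrightarrow{p} \beta_1 + \frac{E(x_iu_i)}{E(x_i^2)} = \beta_1[/itex]

since E(u_i|x_i) = 0 implies E(x_iu_i) = 0. This is exactly the instructor's hint: once divided by n, the denominator [itex]\frac{1}{n}\sum_{i=1}^{n}x_i^2[/itex] converges to the constant E(x_i^2), and Slutsky's theorem lets us take the limit of the ratio.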

4. What factors can affect the consistency of an estimator for regression through the origin?

There are a few factors that can affect the consistency of an estimator for regression through the origin. Consistency requires that the error term have zero mean conditional on the regressor and that the regressor have a finite, nonzero second moment. A larger sample size generally brings the estimate closer to the true slope, while heavy-tailed error distributions or outliers can slow that convergence. Correlation between the regressor and the error, including the case where the true intercept is not actually zero, makes the estimator inconsistent.
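
As a concrete instance of that last point (an assumption-violating case, sketched here for illustration rather than taken from the thread): if the true model has an intercept [itex]\beta_0 \neq 0[/itex] but the line is still forced through the origin, the same law-of-large-numbers argument gives

[itex]\tilde{\beta_1} \xrightarrow{p} \beta_1 + \beta_0\frac{E(x)}{E(x^2)}[/itex]

which differs from [itex]\beta_1[/itex] whenever E(x) ≠ 0.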

5. Why is it important for the estimator to be consistent in regression through the origin?

Consistency is important for any statistical estimator, as it ensures that the estimated values are close to the true population values. In regression through the origin, consistency is important because it allows for accurate estimation of the relationship between variables without the influence of a constant term. This can be especially useful in situations where the intercept has no practical meaning or is not relevant to the analysis.
