Having trouble understanding variance of OLS estimator

  • #1
chevrox
So in computing the variance-covariance matrix for β-hat in an OLS model, we arrive at

VarCov(β-hat) = (σ_ε)^2 E{[X'X]^(-1)}

However, I don't see how X can be considered non-stochastic and how we can simply drop the expectation sign to get

VarCov(β-hat) = (σ_ε)^2 [X'X]^(-1)

I'm accepting this as true (since it's so written in the text), but it feels like a leap of faith: if this is true, the elements of the VarCov matrix are expressed in terms of sample statistics and are therefore stochastic. I thought that the variance of an estimator of a parameter, if the estimator is consistent, should itself be a deterministic parameter and should not depend on the sample observations (besides the sample size n), like the variances we see when using the Cramér-Rao lower bound to assess efficiency. Likely I'm misunderstanding something here; any pointers would be greatly appreciated!
 
  • #2
You haven't clearly stated a mathematical question. Is X supposed to be the matrix of independent variables? If so, they aren't considered stochastic when you compute the regression so that it minimizes the squared error in predicting the dependent variables, which are "Y" by tradition. If you have data of the form (X, Y) and there are "errors" in both X and Y, you should use a "total least squares" model instead.
 
  • #3
Hey chevrox and welcome to the forums.

Like Stephen Tashi, I am going to wait for clarification of what your variables are, but I did want to comment on one thing you said:

chevrox said:
I thought that the variance of an estimator of a parameter, if the estimator is consistent, should itself be a deterministic parameter and should not depend on the sample observations (besides the sample size n), like the variances we see when using the Cramér-Rao lower bound to assess efficiency. Likely I'm misunderstanding something here; any pointers would be greatly appreciated!

For a consistent estimator that should definitely be the case: the variance should 'shrink' as the sample size grows. If the variance does not do this, then your estimate doesn't get 'better' with a larger sample, and it becomes rather pointless to do statistics with any kind of sample using that estimator.
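To illustrate the shrinking variance, here is a minimal simulation sketch (my own illustration, not from the thread; the design, coefficients, and error scale are made up): the Monte Carlo variance of the slope estimate drops as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)
beta = np.array([2.0, -1.0])   # hypothetical true coefficients
sigma = 1.0                    # error standard deviation

def slope_variance(n, reps=2000):
    """Monte Carlo variance of the slope estimate for sample size n."""
    X = np.column_stack([np.ones(n), np.linspace(0.0, 1.0, n)])  # fixed design
    slopes = np.empty(reps)
    for r in range(reps):
        y = X @ beta + sigma * rng.standard_normal(n)        # redraw errors
        slopes[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]  # refit, keep slope
    return slopes.var()

print(slope_variance(20))    # larger
print(slope_variance(200))   # roughly ten times smaller
```

With this evenly spaced design, Var(slope) = σ²/Sxx and Sxx grows roughly linearly in n, so tenfold more data gives roughly a tenth of the variance.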
 
  • #4
Thanks for the replies! Yes, X is the n×k matrix of explanatory variables such that y = Xβ + ε. I think I understand it now. The variables in X do not necessarily follow a stochastic process, and even if they do, all the variability of y is attributed to ε in the model: the independent variables affect the dependent variable only through their observed values, not through the distribution those values were drawn from, so X is treated as non-stochastic. Meanwhile β-hat does not lose its consistency, since E(β-hat) = β (conditional on X, i.e., treating X as fixed) and Var(β-hat) → 0, even though Var(β-hat) varies from sample to sample.
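A quick sketch of this "fixed X" view (illustrative numbers, not from the thread): draw X once, hold it fixed, redraw only ε, and the average of β-hat across replications lands on β.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 50, 5000
beta = np.array([1.0, 0.5])                               # hypothetical truth
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])  # drawn once, then fixed

est = np.empty((reps, 2))
for r in range(reps):
    y = X @ beta + rng.standard_normal(n)       # only the errors are redrawn
    est[r] = np.linalg.solve(X.T @ X, X.T @ y)  # OLS via normal equations

print(est.mean(axis=0))   # close to [1.0, 0.5]
```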
 
  • #5


It is understandable to have difficulty understanding the variance of the OLS estimator. The formula you have provided is the standard formula for the variance-covariance matrix of the OLS estimator, but it can be confusing because it involves both stochastic and non-stochastic elements.

First, it is important to understand that in the classical OLS setup the independent variables (X) are treated as non-stochastic: the design matrix is regarded as fixed across hypothetical repeated samples, while the dependent variable (Y) varies through the error term (ε). Under this convention X contributes no randomness to the analysis, and all expectations are taken with X held fixed.

Next, the variance-covariance matrix of β-hat depends on the error term (ε) and on the design matrix (X). Since the errors are assumed homoskedastic with constant variance σ_ε², the only source of randomness in β-hat is ε. And because X is treated as fixed, (X'X)^(-1) is just a constant matrix, so the expectation sign can be dropped, leaving the deterministic variance-covariance matrix σ_ε²(X'X)^(-1).
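This can be checked numerically. Below is a minimal sketch (not from the thread; the design matrix, coefficients, and σ are illustrative): for a fixed X, the Monte Carlo covariance of β-hat across repeated error draws matches σ_ε²(X'X)^(-1).

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma = 40, 2.0
beta = np.array([0.5, 0.1])                                   # hypothetical truth
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])  # fixed design

# Theoretical matrix: sigma^2 (X'X)^{-1}
analytic = sigma**2 * np.linalg.inv(X.T @ X)

# Monte Carlo: redraw epsilon many times, refit, take the sample covariance
reps = 20000
Y = X @ beta + sigma * rng.standard_normal((reps, n))  # one sample per row
est = Y @ X @ np.linalg.inv(X.T @ X)                   # beta-hat for each row
empirical = np.cov(est, rowvar=False)

print(analytic)
print(empirical)   # agrees with analytic up to simulation noise
```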

It is important to note that, for a given fixed design X and known error variance σ_ε², the elements of σ_ε²(X'X)^(-1) are constants, not random variables: they are determined entirely by the design and the error variance, and they do not vary from sample to sample.

Regarding your concern that the variance of an estimator should be a deterministic parameter: that is exactly what σ_ε²(X'X)^(-1) is, given X. What does vary from sample to sample is the estimated variance-covariance matrix, because in practice σ_ε² is unknown and is replaced by the residual-based estimate s². The matrix you actually compute from data, s²(X'X)^(-1), is therefore a sample statistic: the OLS fit is based on a sample rather than the entire population, and s² differs across samples.

Overall, it is important to understand the assumptions and properties of the OLS model in order to fully grasp the variance of the estimator. I would suggest reviewing the assumptions of the OLS model and the derivation of the variance-covariance matrix to gain a better understanding. Additionally, consulting with a statistician or taking a course on regression analysis may also be helpful in understanding this topic.
 

1. What is the OLS estimator?

The OLS (Ordinary Least Squares) estimator is a statistical method used to estimate the parameters of a linear regression model. It calculates the best fitting line by minimizing the sum of squared errors between the predicted and actual values.
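As a concrete illustration (hypothetical data, not from the thread), the coefficients can be computed directly from the normal equations, β-hat = (X'X)^(-1)X'y:

```python
import numpy as np

# Hypothetical data: y roughly linear in x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

X = np.column_stack([np.ones_like(x), x])      # design matrix with intercept
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # normal equations: (X'X)^{-1} X'y

print(beta_hat)   # [intercept, slope] ≈ [1.04, 1.99]
```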

2. What is variance in statistics?

Variance is a measure of how spread out a set of data points are from the mean. It is calculated by taking the average of the squared differences between each data point and the mean.
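For example (made-up numbers, using the population form that divides by the number of observations):

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
mean = data.mean()                         # 5.0
variance = ((data - mean) ** 2).mean()     # average squared deviation
print(variance)                            # 4.0
```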

3. Why is understanding the variance of the OLS estimator important?

Understanding the variance of the OLS estimator is important because it helps us assess the accuracy and reliability of our regression model. A high variance indicates that the estimated coefficients are not stable, and small changes in the data can greatly affect the results.

4. How is the variance of the OLS estimator calculated?

The variance-covariance matrix of the OLS coefficients is estimated as s²(X'X)^(-1), where s² is the residual sum of squares divided by the degrees of freedom (n − k). The standard error of each coefficient is the square root of the corresponding diagonal element of this matrix.
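A sketch of this computation (simulated data, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # illustrative design
y = X @ np.array([1.0, 2.0]) + rng.standard_normal(n)   # simulated response

k = X.shape[1]
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
s2 = resid @ resid / (n - k)               # s^2 = RSS / (n - k)
cov = s2 * np.linalg.inv(X.T @ X)          # estimated VarCov(beta-hat)
se = np.sqrt(np.diag(cov))                 # standard error of each coefficient
print(se)
```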

5. What factors can affect the variance of the OLS estimator?

Several factors can affect the variance of the OLS estimator, including the sample size, the variability of the predictors, the error variance, multicollinearity among the predictor variables, and the presence of outliers in the data. Adding predictors that are highly correlated with each other (including interaction terms built from them) also inflates the variance of the affected coefficients.
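A small sketch (illustrative numbers, not from the thread) of how near-collinearity inflates a coefficient's variance: the diagonal element of (X'X)^(-1) for the x1 coefficient blows up when a second regressor nearly duplicates x1.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
x1 = rng.standard_normal(n)

def coef_variance_factor(x2):
    """Diagonal element of (X'X)^{-1} for the x1 coefficient (its variance up to sigma^2)."""
    X = np.column_stack([np.ones(n), x1, x2])
    return np.linalg.inv(X.T @ X)[1, 1]

independent = rng.standard_normal(n)              # unrelated second regressor
collinear = x1 + 0.05 * rng.standard_normal(n)    # nearly duplicates x1

print(coef_variance_factor(independent))   # small
print(coef_variance_factor(collinear))     # far larger
```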
