# How do you implement the Dickey-Fuller test?

• I
Hi there,

I've recently start learning methods for determining whether or not time series are stationary. The first method I'm trying to learn is the 'Dickey-Fuller Test'. This test uses a time series modeled by an AR(1) process. The key is to find whether or not this process contains a unit root. If it contains a unit root, the series is said to be non-stationary.

While I'm understanding most of the derived equations, I'm inexperienced in hypothesis testing. Thus I'm struggling with the part when we actually implement the Dickey-Fuller test. There do not seem to be many resources that outline the iterative process for conducting the test.

I've outlined my question in full here:

http://imgur.com/HMWtn59

I appreciate any help!

Stephen Tashi
I commiserate with you about how sketchy explanations of the test on the web are. I don't know the answer, but I'll make a guess. Lets say your time series data is ## x_1, x_2, x_3,...## and we are assuming an ##AR(1)## model. I think you do a least squares fit of an equation of the form ## y = A w ## to your data ## (y_t, w_t ) ## where ##y_t = x_t - x_{t-1}## and ##w_t =x_{t-1}##. This isn't the usual kind of linear least squares fit because the equation being fit isn't ##y = Aw + B##.

The Dickey-Fuller statistic is ##(A-1)## divided by "##se(A)##. It seems "##se(A)##" is supposed to abbreviate "the standard error of "##A##". However, ##A## is a constant, so it's hard to see why it has any "standard error". Perhaps " ##se(A)##" is supposed to denote an estimate of the standard deviation between the predicted values and actual values. If that is the case then ##se(A)## is an estimated standard deviation computed from the data values ##( (x_t -x_{t-1}) - Ax_{t-1})##. That raises the question of which method of estimating the standard deviation is implied by the term "standard error" - e.g. divide by N or divide by N-1 ? You'll have to figure out that vocabulary exercise.

I commiserate with you about how sketchy explanations of the test on the web are. I don't know the answer, but I'll make a guess. Lets say your time series data is ## x_1, x_2, x_3,...## and we are assuming an ##AR(1)## model. I think you do a least squares fit of an equation of the form ## y = A w ## to your data ## (y_t, w_t ) ## where ##y_t = x_t - x_{t-1}## and ##w_t =x_{t-1}##. This isn't the usual kind of linear least squares fit because the equation being fit isn't ##y = Aw + B##.

The Dickey-Fuller statistic is ##(A-1)## divided by "##se(A)##. It seems "##se(A)##" is supposed to abbreviate "the standard error of "##A##". However, ##A## is a constant, so it's hard to see why it has any "standard error". Perhaps " ##se(A)##" is supposed to denote an estimate of the standard deviation between the predicted values and actual values. If that is the case then ##se(A)## is an estimated standard deviation computed from the data values ##( (x_t -x_{t-1}) - Ax_{t-1})##. That raises the question of which method of estimating the standard deviation is implied by the term "standard error" - e.g. divide by N or divide by N-1 ? You'll have to figure out that vocabulary exercise.

Hi Stephen,

I appreciate the response! After doing further investigation, I believe you are correct in the sense that the parameter must be solved via a least squares estimates. I am currently attempting to find a method for a solving a polynomial y(x) = a0 + a1*x, where I make the assumption a0 = 0. However, I'm having a tough time in how I solve this least squares problem while being able to make the assumption a0 = 0...

I will let you know if I find out anything else!

chiro
Hey tomizzo.

Is there any reason you can't fit the data and do an inference on a0?

For a simple linear regression you can technically set a0 = 0 and see the effect it has on the data (that has to follow this constraint if it is set to zero).

chiro