How do you implement the Dickey-Fuller test?

AI Thread Summary
The discussion focuses on implementing the Dickey-Fuller test to determine the stationarity of time series data modeled by an AR(1) process. Key points include the necessity of fitting a least squares model to the data and calculating the Dickey-Fuller statistic as (A-1) divided by the standard error of A. There is confusion regarding the standard error's definition and how to estimate it, particularly whether to divide by N or N-1. Participants also discuss the possibility of constraining the intercept parameter in the regression model and the implications for the analysis. Overall, the conversation highlights the complexities and challenges in applying the Dickey-Fuller test effectively.
tomizzo
Hi there,

I've recently started learning methods for determining whether a time series is stationary. The first method I'm trying to learn is the Dickey-Fuller test. This test models the time series as an AR(1) process, and the key question is whether that process contains a unit root. If it does, the series is non-stationary.
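To see what the unit root does, here is a minimal NumPy sketch. It assumes an AR(1) written as ##x_t = \rho x_{t-1} + e_t## with standard normal shocks ##e_t##. The name `simulate_ar1`, the symbol ##\rho##, and the starting value are illustrative choices and do not come from the thread. With ##\rho = 1## the recursion is just a cumulative sum of shocks, i.e. a random walk.

```python
import numpy as np

def simulate_ar1(rho, n, seed=0):
    """Simulate x_t = rho * x_{t-1} + e_t with standard normal shocks e_t.

    Hypothetical helper for illustration. The path starts at x_0 = e_0,
    so rho = 1 reproduces the cumulative sum of the shocks (a random walk).
    """
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = e[0]
    for t in range(1, n):
        x[t] = rho * x[t - 1] + e[t]
    return x

stationary = simulate_ar1(0.5, 5000)  # |rho| < 1: keeps returning towards 0
unit_root = simulate_ar1(1.0, 5000)   # rho = 1: random walk, no fixed level

# For rho = 0.5 the sample variance should be near 1 / (1 - 0.5**2), about 1.33;
# for rho = 1 the variance of x_t grows linearly with t, so it has no fixed value.
print(stationary.var(), unit_root.var())
```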

While I understand most of the derived equations, I'm inexperienced in hypothesis testing, so I'm struggling with the part where we actually carry out the Dickey-Fuller test. There don't seem to be many resources that outline the step-by-step process for conducting it.

I've outlined my question in full here:

http://imgur.com/HMWtn59

I appreciate any help!
 
I commiserate with you about how sketchy explanations of the test on the web are. I don't know the answer, but I'll make a guess. Let's say your time series data is ## x_1, x_2, x_3,...## and we are assuming an ##AR(1)## model. I think you do a least squares fit of an equation of the form ## y = A w ## to your data ## (y_t, w_t ) ## where ##y_t = x_t - x_{t-1}## and ##w_t =x_{t-1}##. This isn't the usual kind of linear least squares fit because the equation being fit isn't ##y = Aw + B##.
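Here is a minimal NumPy sketch of that fit through the origin. The helper name `fit_through_origin` is hypothetical. Minimising ##\sum_t (y_t - A w_t)^2## over ##A## gives the closed form ##A = \sum_t w_t y_t / \sum_t w_t^2##. Because ##y_t## is the differenced series, ##A## here estimates ##\rho - 1##, where ##\rho## is the AR(1) coefficient in ##x_t = \rho x_{t-1} + e_t## (notation assumed for this sketch).

```python
import numpy as np

def fit_through_origin(x):
    """Least-squares fit of y = A*w with no intercept (hypothetical helper).

    y_t = x_t - x_{t-1} and w_t = x_{t-1}; minimising sum((y_t - A*w_t)**2)
    over A gives A = sum(w*y) / sum(w*w).
    """
    x = np.asarray(x, dtype=float)
    y = np.diff(x)   # y_t = x_t - x_{t-1}
    w = x[:-1]       # w_t = x_{t-1}
    return np.dot(w, y) / np.dot(w, w)

# A noise-free AR(1) path with rho = 0.5, so A = rho - 1:
print(fit_through_origin([1.0, 0.5, 0.25, 0.125]))  # -0.5
```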

The Dickey-Fuller statistic is ##(A-1)## divided by ##se(A)##. It seems ##se(A)## is supposed to abbreviate "the standard error of ##A##". However, ##A## is a constant, so it's hard to see why it has any "standard error". Perhaps ##se(A)## is supposed to denote an estimate of the standard deviation between the predicted values and the actual values. If that is the case, then ##se(A)## is an estimated standard deviation computed from the residuals ##(x_t - x_{t-1}) - A x_{t-1}##. That raises the question of which method of estimating the standard deviation the term "standard error" implies, e.g. dividing by N or by N-1. You'll have to sort out that piece of vocabulary.
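One correction to the guess above. With ##y_t = x_t - x_{t-1}##, the fitted ##A## already estimates ##\rho - 1##, so the statistic for that differenced regression is ##A/se(A)##. That value is numerically identical to ##(\hat\rho - 1)/se(\hat\rho)## when you regress ##x_t## on ##x_{t-1}## instead. Here ##se## is the ordinary least-squares standard error of the estimated coefficient, ##\sqrt{s^2 / \sum_t w_t^2}##. In that formula ##s^2## is the residual sum of squares divided by ##n - 1##, where ##n## is the number of residuals and one parameter has been fitted. The sketch below covers the no-constant case only, and `dickey_fuller_stat` is a hypothetical name.

```python
import numpy as np

def dickey_fuller_stat(x):
    """Plain Dickey-Fuller statistic, no constant and no lagged differences.

    Regresses dx_t = x_t - x_{t-1} on x_{t-1} through the origin and returns
    (A_hat, se(A_hat), A_hat / se(A_hat)). A_hat estimates rho - 1.
    """
    x = np.asarray(x, dtype=float)
    y = np.diff(x)                        # dx_t
    w = x[:-1]                            # x_{t-1}
    n = len(y)
    a_hat = np.dot(w, y) / np.dot(w, w)   # least squares through the origin
    resid = y - a_hat * w
    s2 = np.dot(resid, resid) / (n - 1)   # one fitted parameter -> n - 1 d.o.f.
    se = np.sqrt(s2 / np.dot(w, w))       # standard error of the slope estimate
    return a_hat, se, a_hat / se

rng = np.random.default_rng(0)
walk = np.cumsum(rng.standard_normal(500))  # simulated series with a unit root
a_hat, se, tau = dickey_fuller_stat(walk)
# Reject the unit root only if tau is below the Dickey-Fuller critical value
# (roughly -1.95 at the 5% level for this no-constant case in large samples).
print(a_hat, se, tau)
```

The statistic is compared with Dickey-Fuller tables rather than Student's t. If statsmodels is available, `statsmodels.tsa.stattools.adfuller(x, maxlag=0, regression="n", autolag=None)` should, as far as I know, give the same statistic as its first return value. Older versions spell the option `"nc"`.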
 

Hi Stephen,

I appreciate the response! After further investigation, I believe you are correct that the parameter must be found with a least squares estimate. I am currently trying to find a method for fitting the polynomial y(x) = a0 + a1*x under the assumption a0 = 0. However, I'm having a tough time working out how to solve this least squares problem while imposing a0 = 0...
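In matrix form, imposing a0 = 0 turns out to be simple. You drop the column of ones from the design matrix and solve the remaining one-parameter least squares problem, which gives a1 = sum(x*y)/sum(x*x). Below is a minimal NumPy sketch that uses `numpy.linalg.lstsq` for both versions; `fit_line` is a hypothetical helper.

```python
import numpy as np

def fit_line(x, y, intercept=True):
    """Least-squares fit of y = a0 + a1*x (hypothetical helper).

    With intercept=False the column of ones is dropped from the design
    matrix, which imposes a0 = 0 (a fit through the origin).
    Returns (a0, a1) as plain floats.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if intercept:
        X = np.column_stack([np.ones_like(x), x])  # columns: 1, x
        (a0, a1), *_ = np.linalg.lstsq(X, y, rcond=None)
    else:
        X = x[:, None]                             # single column: x
        (a1,), *_ = np.linalg.lstsq(X, y, rcond=None)
        a0 = 0.0
    return float(a0), float(a1)

print(fit_line([1, 2, 3], [3, 5, 7]))                   # approximately (1.0, 2.0)
print(fit_line([1, 2, 3], [3, 5, 7], intercept=False))  # a1 = 34/14, about 2.43
```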

I will let you know if I find out anything else!
 
Hey tomizzo.

Is there any reason you can't fit the data and then do inference on a0?

For a simple linear regression you can set a0 = 0 and see what effect that has on the fit (with a0 fixed at zero, the fitted line is forced to pass through the origin).
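Here is a sketch of what that inference could look like in an ordinary regression; the helper `intercept_t_stat` is hypothetical. It fits ##y = a_0 + a_1 x## by ordinary least squares and computes the t-statistic ##\hat a_0 / se(\hat a_0)## from the usual covariance matrix ##s^2 (X^T X)^{-1}##, where ##s^2## is the residual sum of squares divided by ##n - 2##. In a textbook regression you would compare this with a Student t distribution on ##n - 2## degrees of freedom. If the regressor is ##x_{t-1}## from a series with a unit root, the statistic no longer follows that distribution, so standard t tables should not be used there.

```python
import numpy as np

def intercept_t_stat(x, y):
    """OLS fit of y = a0 + a1*x; returns (a0_hat, se(a0_hat), a0_hat / se).

    Hypothetical helper. Uses cov = s^2 (X^T X)^{-1} with s^2 = RSS / (n - 2).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    s2 = resid @ resid / (n - k)           # residual variance estimate
    cov = s2 * np.linalg.inv(X.T @ X)      # covariance of (a0_hat, a1_hat)
    se_a0 = np.sqrt(cov[0, 0])
    return beta[0], se_a0, beta[0] / se_a0

print(intercept_t_stat([0, 1, 2, 3, 4], [1.0, 2.9, 5.2, 7.1, 8.8]))
```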
 
Also, with regard to your AR(1) time series: have you looked at the results for general AR(p) processes on when the process has a constant mean and variance?

AR(1) is the simplest case, and its recurrence relation (a very simple one) gives you a condition on whether the process will "converge" or not.

You don't need the full time-series machinery of shift (lag) operators and operator algebra, which you do need for a general AR process. Instead, you can expand the recurrence into a sum and then find the condition on the coefficient under which it converges.

If you expand the recurrence relation out to an explicit form it should become clearer with regards to what I'm talking about.
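Here is that expansion for ##x_t = \rho x_{t-1} + e_t##. For illustration it assumes a fixed starting value ##x_0## and independent shocks ##e_t## with mean 0 and variance ##\sigma^2## (the symbols ##\rho## and ##\sigma## are introduced here):

$$x_t = \rho^t x_0 + \sum_{k=0}^{t-1} \rho^k e_{t-k}, \qquad E[x_t] = \rho^t x_0, \qquad \operatorname{Var}(x_t) = \sigma^2 \sum_{k=0}^{t-1} \rho^{2k}$$

For ##|\rho| < 1## the mean tends to 0 and the variance tends to ##\sigma^2/(1-\rho^2)##. For ##\rho = 1## the variance is ##t\sigma^2##, which grows without bound. That is the unit-root case the Dickey-Fuller test is designed to detect.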
 