GARCH fitting to binary data / latent data

ARDE · May 30, 2012

Dear all

I am trying to fit a simple ARCH(1)/GARCH(1,1) model to a set of binary data, i.e. I assume a latent GARCH process that is only observed at two values, a and b say (whenever it crosses or hits those thresholds). I found some ideas on fitting a censored GARCH (by SX Wei, for example, see attachment), but this appears to me more complicated than my setup. I can't believe that this question has never been worked on, but I did not find a suitable paper after some hours of searching. It would be fantastic to have a suitable R package or general guidance.
Thank you very much for any ideas or hints.
Regards

ARDE

chiro · May 30, 2012

Hey ARDE and welcome to the forums.

I did a quick Google search and got this:

http://cran.r-project.org/web/views/Finance.html

Hopefully this will give you a leg up.

ARDE · May 30, 2012

Hi chiro,

thanks a lot!
But I am afraid none of the R packages discussed on that web page accepts binary (discrete) inputs.

ARDE

viraltux · May 30, 2012

Hi ARDE,

I think this paper hardly relates to what you are asking; it only studies linear constraints in a GARCH model, and within those constraints the data are still not discrete.

GARCH models assume an underlying distribution for ε_t (usually Normal or Student's t), and the shock enters through a multiplicative model [itex]a_t = \sigma_t \epsilon_t[/itex]

So trying to apply a GARCH model directly to a stream of 1s and 0s does not make much sense; that's probably why there is no package in R dealing with that.
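For reference, the GARCH(1,1) case of this multiplicative model is usually written as below. The coefficient names α_0, α_1, β_1 are a common textbook convention and an assumption here, not notation fixed earlier in this thread.

```latex
\begin{aligned}
a_t &= \sigma_t \epsilon_t, \qquad \epsilon_t \ \text{i.i.d. with mean } 0 \text{ and variance } 1,\\
\sigma_t^2 &= \alpha_0 + \alpha_1 a_{t-1}^2 + \beta_1 \sigma_{t-1}^2,
\qquad \alpha_0 > 0,\ \alpha_1 \ge 0,\ \beta_1 \ge 0 .
\end{aligned}
```

ARCH(1) is the special case β_1 = 0. When α_1 + β_1 < 1 the process has finite unconditional variance α_0 / (1 − α_1 − β_1).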

So what I am guessing is that you have a problem, you think that this idea is the solution, and then wonder how to do it, but maybe if you tell us about the problem itself we can better comment on your idea.

ARDE · May 30, 2012

Hi viraltux,

thanks for your reply!
I have a series (or rather a panel) of data on N = 10,000 patients and their doctor visits over approx. 10 years. So, for each patient I see e.g. 000001100001110100000001... with a 1 indicating a doctor visit in that week. Many authors model such a series as a first-order Markov process. But when I do so, I am not satisfied with the clusters of doctor visits that I get (they have too many gaps and lack any long-range dependence). So I played around with a simple GARCH(1,1), taking only its absolute values and cutting it off at some threshold b that represents the doctor visits (the 1s). I get a surprisingly good fit for my 1s, and the underlying GARCH has a nice interpretation as latent health status.
What I am looking for is a systematic way to use the information of my 0-1 data in order to fit the GARCH (as they clearly give us at least some information on it).
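To make the construction concrete, here is a minimal Python sketch of the thresholded process as described above: a latent GARCH(1,1) path whose absolute value is cut at b to give the 0/1 visit series. All parameter values (omega, alpha, beta, b, 520 weeks) are made up for illustration, and Normal innovations are an assumption. The helper `visit_probability` only gives the chance of a visit conditional on a known σ_t; since σ_t depends on the unobserved a_{t-1}, this is not yet a likelihood for the binary data.

```python
from math import erfc, sqrt

import numpy as np


def simulate_thresholded_garch(n, omega, alpha, beta, b, seed=0):
    """Simulate a latent GARCH(1,1) path a_t and the binary series 1{|a_t| >= b}."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)               # assumed Normal innovations
    a = np.empty(n)
    sigma2 = np.empty(n)
    sigma2[0] = omega / (1.0 - alpha - beta)   # start at the unconditional variance
    a[0] = np.sqrt(sigma2[0]) * eps[0]
    for t in range(1, n):
        sigma2[t] = omega + alpha * a[t - 1] ** 2 + beta * sigma2[t - 1]
        a[t] = np.sqrt(sigma2[t]) * eps[t]
    visits = (np.abs(a) >= b).astype(int)      # 1 = doctor visit in that week
    return a, sigma2, visits


def visit_probability(sigma, b):
    """P(|a_t| >= b | sigma_t) for Normal innovations: 2 * (1 - Phi(b / sigma))."""
    return erfc(b / (sigma * sqrt(2.0)))


# Hypothetical parameters, roughly 10 years of weekly data for one patient.
a, sigma2, visits = simulate_thresholded_garch(
    n=520, omega=0.1, alpha=0.15, beta=0.8, b=2.0
)
print("".join(map(str, visits[:60])))
print("share of visit weeks:", visits.mean())
```

Changing b moves the overall share of 1s, while alpha and beta control how strongly the 1s cluster.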

ARDE

viraltux · May 30, 2012

Correct me if I am wrong, but you want to infer the underlying health status of a patient based on the stream of 0s and 1s, right?

OK, if this is so, a GARCH model is definitely not the way to go. GARCH estimates the volatility in a model. If, for example, you had a patient with all 1s, 111111111111111, the series would have no volatility at all, and the GARCH model would not differentiate this case from the case 0000000000000000. And that probably is baaaaaad...

So you stated the situation, but before we go on, maybe you want to detail exactly what you want to get out of the data? Health status? Chances of more visits in the future? ... Sorry for so many questions, but in my few days on PF I have already gained some experience in solving the wrong problem.
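A tiny Python check of that point, for a model applied directly to the 0/1 stream: both constant series have zero sample variance, and after removing the mean they are the same series. The series length of 16 is arbitrary.

```python
import numpy as np

always_visits = np.ones(16)    # 1111111111111111
never_visits = np.zeros(16)    # 0000000000000000

# Sample variance of each constant series.
print(np.var(always_visits), np.var(never_visits))   # 0.0 0.0

# After demeaning, the two series are indistinguishable.
same_after_demeaning = np.array_equal(
    always_visits - always_visits.mean(),
    never_visits - never_visits.mean(),
)
print(same_after_demeaning)   # True
```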

ARDE · May 31, 2012

Hi viraltux,

thanks for the discussion. I attached a simple picture to illustrate my ideas and questions. And no, I don't want to extract the latent health status from the data explicitly (although this is a nice by-product), nor will I make predictions. I want to model the clusters of doctor visits (the 1s in the stream) and give them a plausible underlying process. I still think the GARCH is a good candidate because of its clustering and its autocovariance structure (doctor visits after some weeks without a visit still belong to the same illness). Also, I like the fact that the GARCH "overshoots" the doctor threshold (b) very clearly from time to time, because not all illnesses are equally serious (in the data we only see the doctor visit, but you may have had a cold or a heart attack...).

Thanks,

ARDE

viraltux · May 31, 2012

Hi ARDE,

GARCH is definitely not the way to go. The seemingly good fit you get is spurious: if you check the significance of the model parameters, you will see it is terrible. The only reason the plot looks good is that the resulting model does little more than bet "if no doctor today, then no doctor tomorrow; if doctor today, then doctor tomorrow."

Volatility is the finance term for what statisticians call the standard deviation (the square root of the variance), and the variance of 0000000 is the same as that of 1111111111: zero. What you call high and low volatility is an estimate of the constant level of the volatility process, which has nothing to do with the volatility itself.

The underlying mathematical model of GARCH has nothing to do with your problem, and I actually agree with the authors you mention that a Markov process might be the way to go. Now, it seems that this approach does not quite work for you, but there are many different models based on Markov processes.

Anyway, since there is obvious autoregressive behavior, and given the features of your problem and the nature of your complaints, I would suggest you check the following models:
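As a baseline, the first-order Markov fit that those authors use amounts to counting week-to-week transitions. Here is a minimal Python sketch using the example string ARDE posted; the function name is made up for illustration.

```python
from collections import Counter


def transition_probabilities(bits):
    """Estimate P(next | previous) for a 0/1 string by counting adjacent pairs."""
    counts = Counter(zip(bits[:-1], bits[1:]))
    probs = {}
    for prev in "01":
        total = counts[(prev, "0")] + counts[(prev, "1")]
        for nxt in "01":
            # NaN if the previous state never occurs before another week.
            probs[(prev, nxt)] = counts[(prev, nxt)] / total if total else float("nan")
    return probs


weeks = "000001100001110100000001"   # example series from earlier in the thread
probs = transition_probabilities(weeks)
for (prev, nxt), p in sorted(probs.items()):
    print(f"P({nxt} | {prev}) = {p:.3f}")
```

Under such a chain the lengths of visit runs and of gaps are geometrically distributed, which is one way to see why it cannot reproduce long-range clustering.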

Threshold Autoregressive Models
http://en.wikipedia.org/wiki/SETAR_(model)

Markov Switching Models
http://en.wikipedia.org/wiki/Markov_switching_multifractal

You will have to adjust some assumptions, but they are a better shot than GARCH. I think you should definitely drop that idea.
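To give a flavour of the regime-switching idea in its simplest form, here is a Python sketch of a two-state hidden Markov chain (healthy / ill) with a different weekly visit probability in each regime. This is a plain hidden Markov model, not the Markov switching multifractal from the link, and every name and parameter value below is a made-up assumption.

```python
import numpy as np


def simulate_regime_switching_visits(
    n, p_stay_healthy, p_stay_ill, p_visit_healthy, p_visit_ill, seed=0
):
    """Simulate a hidden 0/1 regime (0 = healthy, 1 = ill) and 0/1 doctor visits."""
    rng = np.random.default_rng(seed)
    state = np.empty(n, dtype=int)
    state[0] = 0                                   # start healthy (assumption)
    for t in range(1, n):
        stay = p_stay_ill if state[t - 1] == 1 else p_stay_healthy
        # Keep the regime with probability `stay`, otherwise switch.
        state[t] = state[t - 1] if rng.random() < stay else 1 - state[t - 1]
    # The visit probability depends only on the current hidden regime.
    p_visit = np.where(state == 1, p_visit_ill, p_visit_healthy)
    visits = (rng.random(n) < p_visit).astype(int)
    return state, visits


# Hypothetical values: long healthy spells, shorter illness spells with many visits.
state, visits = simulate_regime_switching_visits(
    n=520, p_stay_healthy=0.97, p_stay_ill=0.85,
    p_visit_healthy=0.02, p_visit_ill=0.6,
)
print("".join(map(str, visits[:60])))
```

Because visits within an illness spell are only probable, not certain, the 1s cluster with gaps inside a cluster, which a first-order chain on the visits alone does not allow for.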
