
GARCH fitting to binary data / latent data

  1. May 30, 2012 #1
    Dear all

    I am trying to fit a simple ARCH(1)/GARCH(1,1) model to a set of binary data, i.e. I assume a latent GARCH process that is only observed at the values a and b, say (whenever it crosses or hits those thresholds). I found some ideas on fitting a censored GARCH model (by S. X. Wei, for example; see attachment), but that setup appears more complicated than mine. I don't believe this question has never been worked on, but after some hours of searching I did not find a suitable paper. A suitable R package, or some general guidance, would be fantastic.
    Thank you very much for any ideas or hints.


    Attached Files:

  3. May 30, 2012 #2



    Hey ARDE and welcome to the forums.

    I did a quick google search and got this:


    Hopefully this will give you a leg up.
  4. May 30, 2012 #3
    Hi chiro,

    thanks a lot!
    But I am afraid none of the R packages discussed on that web page accepts binary (discrete) inputs.

  5. May 30, 2012 #4
    Hi ARDE,

    I think that paper barely relates to what you are asking; it only studies linear constraints in a GARCH model, and even within those constraints the data are not discrete.

    GARCH models have an underlying [itex]\epsilon_t[/itex] distribution (usually Normal or Student's t) where the impact follows the multiplicative model [itex]a_t = \sigma_t \epsilon_t[/itex].

    So trying to apply a GARCH model directly to a stream of 1s and 0s makes little sense, which is probably why there is no R package dealing with that.
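To illustrate what the multiplicative model above generates, here is a minimal GARCH(1,1) simulation. The parameter values (omega, alpha, beta) are arbitrary choices for demonstration, not fitted values:

```python
import numpy as np

# Illustrative GARCH(1,1): a_t = sigma_t * eps_t,
# sigma_t^2 = omega + alpha * a_{t-1}^2 + beta * sigma_{t-1}^2.
def simulate_garch(n, omega=0.1, alpha=0.1, beta=0.8, seed=0):
    rng = np.random.default_rng(seed)
    a = np.zeros(n)
    sigma2 = np.zeros(n)
    sigma2[0] = omega / (1.0 - alpha - beta)   # unconditional variance
    a[0] = np.sqrt(sigma2[0]) * rng.standard_normal()
    for t in range(1, n):
        sigma2[t] = omega + alpha * a[t - 1] ** 2 + beta * sigma2[t - 1]
        a[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
    return a, sigma2

a, sigma2 = simulate_garch(1000)
```

The output a is continuous-valued with clustered volatility, which is exactly why a raw stream of 0s and 1s does not match the model's assumptions.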

    So my guess is that you have a problem, you think this idea is the solution, and you are now wondering how to carry it out. Maybe if you tell us about the problem itself we can comment better on your idea.
  6. May 30, 2012 #5
    Hi viraltux,

    thanks for your reply!
    I have a series (or a panel) of N = 10,000 patients and their doctor visits over approximately 10 years. For each patient I see, e.g., 000001100001110100000001..., with a 1 indicating a doctor visit in that week. Many authors model such a series as a first-order Markov process. But when I do so, I am not satisfied with the clusters of doctor visits that I get (they have too many gaps and lack any long-range dependence). So I played around with a simple GARCH(1,1), taking only its absolute values and cutting them off at some threshold b representing the doctor visits (the 1s). I get a surprisingly good fit for my 1s, and the underlying GARCH has a nice interpretation as a latent health status.
    What I am looking for is a systematic way to use the information in my 0-1 data to fit the GARCH (since the data clearly give us at least some information about it).
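The thresholding idea described above could be sketched as follows. The GARCH parameters and the threshold b are hypothetical, chosen only to show the mechanism:

```python
import numpy as np

# Sketch: simulate a latent GARCH(1,1) "health" process and record a
# doctor visit (1) whenever |a_t| exceeds a threshold b.
# All parameter values below are hypothetical.
rng = np.random.default_rng(42)
n, omega, alpha, beta, b = 520, 0.1, 0.15, 0.8, 2.0   # ~10 years of weeks

a = np.zeros(n)
sigma2 = np.full(n, omega / (1.0 - alpha - beta))     # start at unconditional var
a[0] = np.sqrt(sigma2[0]) * rng.standard_normal()
for t in range(1, n):
    sigma2[t] = omega + alpha * a[t - 1] ** 2 + beta * sigma2[t - 1]
    a[t] = np.sqrt(sigma2[t]) * rng.standard_normal()

visits = (np.abs(a) > b).astype(int)   # the observed 0/1 stream
```

Because volatility clusters, exceedances of b also cluster, which is what makes the simulated visit pattern look realistic; the open question in the thread is how to estimate the latent parameters from the 0/1 stream alone.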

  7. May 30, 2012 #6
    Correct me if I am wrong, but you want to infer the underlying health status of a patient based on the stream of 0s and 1s, right?

    OK, if that is so, a GARCH model is definitely not the way to go. GARCH estimates volatility in a model; if, for example, you had a patient with all 1s (111111111111111), the series would have no volatility at all, and the GARCH model would not distinguish that case from 0000000000000000. And that is probably baaaaaad...

    So you have stated the situation, but before we go on, maybe you want to detail exactly what you want to get out of the data: health status? The chance of more visits in the future? ... Sorry for so many questions, but in a few days on PF I have gained some experience solving the wrong problem :tongue:
    Last edited: May 30, 2012
  8. May 31, 2012 #7
    Hi viraltux,

    thanks for the discussion. I attached a simple picture to illustrate my ideas and questions. And no, I don't want to extract the latent health status explicitly from the data (although that is a nice by-product), nor will I make predictions. I want to model the clusters of doctor visits (the 1s in the stream) and give them a plausible underlying process. I still think the GARCH is a good candidate because of its clustering and its autocovariance structure (doctor visits after some weeks without a visit still belong to the same illness). I also like the fact that the GARCH "overshoots" the doctor threshold b from time to time very clearly, because not all illnesses are equally serious (in the data we only see the doctor visit, but it may have been a cold or a heart attack...).



    Attached Files:

  9. May 31, 2012 #8
    Hi ARDE,

    GARCH is definitely not the way to go; the seemingly good fit you get is spurious. If you check the significance of the model parameters, you'll see it is terrible, and the only reason you get a good-looking plot is that the resulting model does little more than bet "if no doctor today, then no doctor tomorrow; and if doctor today, then doctor tomorrow."

    Volatility is a finance term for what statisticians call variance, so the variance of 0000000 is the same as that of 1111111111, namely none. What you call high and low volatility is the estimated constant level of the volatility process, which has nothing to do with the volatility itself.
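The point above can be checked in one line: a constant stream has zero sample variance, so a variance-based criterion cannot tell these two patients apart.

```python
import numpy as np

# Both constant streams have zero variance, regardless of their level.
all_zeros = np.zeros(15)
all_ones = np.ones(15)
print(np.var(all_zeros), np.var(all_ones))  # prints: 0.0 0.0
```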

    The underlying mathematical model of GARCH has nothing to do with your problem, and I actually agree with the authors you mention that a Markov process might be the way to go. It seems that this approach does not quite work for you, but there are many different models based on Markov chains.

    Anyway, since there is an obvious autoregressive behavior, and given the features of your problem and the nature of your complaints, I would suggest you check the following models:

    Threshold Autoregressive Models

    Markov Switching Models

    You will have to adjust some assumptions, but they are a better bet than GARCH; I think you should definitely drop that idea.
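A Markov-switching model of the kind suggested above could be sketched as a two-state hidden chain ("healthy"/"ill") whose state drives the weekly visit probability. All transition and emission probabilities here are hypothetical:

```python
import numpy as np

# Minimal Markov-switching sketch for the 0/1 visit stream: a hidden
# state follows a Markov chain, and the visit probability depends on it.
# All probabilities below are hypothetical illustration values.
rng = np.random.default_rng(7)
n = 520                                 # ~10 years of weeks
p_stay = np.array([0.98, 0.85])         # P(stay healthy), P(stay ill)
p_visit = np.array([0.01, 0.60])        # P(visit | healthy), P(visit | ill)

state = np.zeros(n, dtype=int)          # 0 = healthy, 1 = ill
for t in range(1, n):
    if rng.random() < p_stay[state[t - 1]]:
        state[t] = state[t - 1]         # remain in the current state
    else:
        state[t] = 1 - state[t - 1]     # switch state

visits = (rng.random(n) < p_visit[state]).astype(int)
```

Because illness spells persist, visits cluster, and weeks without a visit inside a spell still belong to the same illness, which addresses the "too many gaps" complaint about the plain first-order Markov model. Estimation for such hidden-Markov-type models is available in packages such as hmmlearn (Python) or via statsmodels' Markov switching models, though adapting them to this exact setup is left to the reader.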
    Last edited: May 31, 2012