GARCH fitting to binary data / latent data

In summary, ARDE is seeking guidance on fitting a simple ARCH(1)/GARCH(1,1) model to binary data recording weekly doctor visits of about 10,000 patients over roughly 10 years. They have found work on fitting a censored GARCH but are looking for a more straightforward approach, and they hope for a suitable R package or general advice on how to use the 0-1 data to fit a latent GARCH process, interpreted as the patients' underlying health status. viraltux argues that GARCH is the wrong tool here and suggests threshold autoregressive or Markov switching models instead.
  • #1
ARDE
Dear all

I am trying to fit a simple ARCH(1)/GARCH(1,1) model to a set of binary data; that is, I assume a latent GARCH process that is only observed at the thresholds a and b, say (whenever it crosses or hits them). I found some ideas on fitting a censored GARCH (by SX Wei, for example; see attachment), but this seems more complicated than my setup. I doubt this question has never been worked on, but I did not find a suitable paper after some hours of searching. It would be fantastic to find a suitable R package or some general guidance.
Thank you very much for any ideas or hints.
Regards

ARDE
 

Attachments

  • Censored_GARCH.pdf
  • #2
Hey ARDE, and welcome to the forums.

I did a quick Google search and got this:

http://cran.r-project.org/web/views/Finance.html

Hopefully this will give you a leg up.
 
  • #3
Hi chiro,

thanks a lot!
But I am afraid none of the R packages discussed on that web page accept binary (discrete) inputs.

ARDE
 
  • #4
Hi ARDE,

I think this paper hardly relates to what you are asking: it only studies linear constraints in a GARCH model, and within those constraints the data are not discrete.

GARCH models have an underlying distribution for the innovations [itex]\epsilon_t[/itex] (usually Normal or Student-t), and the shock follows the multiplicative model [itex]a_t = \sigma_t \epsilon_t[/itex].
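For reference, the usual GARCH(1,1) recursion behind this multiplicative form is written out below. The parameter names ω, α and β are standard textbook notation rather than symbols used in the thread, and ARCH(1) is the special case β = 0.

```latex
\begin{aligned}
a_t &= \sigma_t \epsilon_t, && \epsilon_t \overset{\text{iid}}{\sim} N(0,1) \text{ or a standardized Student-}t,\\
\sigma_t^2 &= \omega + \alpha\, a_{t-1}^2 + \beta\, \sigma_{t-1}^2, && \omega > 0,\ \alpha, \beta \ge 0,\ \alpha + \beta < 1.
\end{aligned}
```

The condition α + β < 1 ensures a finite unconditional variance, ω / (1 − α − β).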

So trying to apply a GARCH model directly to a stream of 1s and 0s makes little sense; that's probably why there is no R package dealing with it.

So my guess is that you have a problem, you think this idea is the solution, and now you wonder how to implement it. If you tell us about the problem itself, we can comment on your idea better.
 
  • #5
Hi viraltux,

thanks for your reply!
I have a series (or rather a panel) of data on N = 10,000 patients and their doctor visits over approx. 10 years. So, for each patient I see e.g. 000001100001110100000001..., with a 1 indicating a doctor visit in that week. Many authors model such a series as a first-order Markov process. But when I do so, I am not satisfied with the clusters of doctor visits I get (they have too many gaps and lack any long-range dependence). So I played around with a simple GARCH(1,1), taking only its absolute values and cutting them off at some threshold b, where exceeding b represents a doctor visit (a 1). I get a surprisingly good fit for my 1s, and the underlying GARCH has a nice interpretation as a latent health status.
What I am looking for is a systematic way to use the information in my 0-1 data to fit the GARCH (as the data clearly carry at least some information about it).

ARDE
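The exploratory procedure described in post #5 (simulate a GARCH(1,1), take absolute values, record a 1 whenever the threshold b is exceeded) can be sketched as below. This is a minimal illustration only: the function names, the Gaussian innovations, the start-up at the unconditional variance, and the parameter values are all assumptions, not ARDE's actual settings.

```python
import math
import random


def simulate_latent_garch(n, omega, alpha, beta, seed=None):
    """Simulate a GARCH(1,1) path a_t = sigma_t * eps_t with Gaussian eps_t.

    Returns (shocks a_t, conditional variances sigma_t^2), each of length n.
    """
    if alpha + beta >= 1:
        raise ValueError("need alpha + beta < 1 for a finite unconditional variance")
    rng = random.Random(seed)
    sigma2 = omega / (1 - alpha - beta)  # seed the recursion with the unconditional variance
    a_prev = 0.0
    shocks, variances = [], []
    for _ in range(n):
        sigma2 = omega + alpha * a_prev ** 2 + beta * sigma2
        a_prev = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        shocks.append(a_prev)
        variances.append(sigma2)
    return shocks, variances


def threshold_visits(shocks, b):
    """Map the latent path to weekly visit indicators: 1 when |a_t| exceeds b."""
    return [1 if abs(a) > b else 0 for a in shocks]


if __name__ == "__main__":
    # Hypothetical parameters: about 10 years of weekly data for one patient.
    shocks, _ = simulate_latent_garch(52 * 10, omega=0.05, alpha=0.15, beta=0.8, seed=1)
    visits = threshold_visits(shocks, b=1.5)
    print("".join(map(str, visits[:60])))
```

Because large shocks raise the next week's variance, exceedances of b tend to come in clusters, which is the feature ARDE finds attractive compared with a first-order Markov chain.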
 
  • #6
Correct me if I am wrong, but you want to infer the underlying health status of a patient from the stream of 0s and 1s, right?

OK, if so, a GARCH model is definitely not the way to go. GARCH models volatility: if, for example, you had a patient with all 1s (111111111111111), there would be no volatility at all, and the GARCH model would not distinguish this case from 0000000000000000. And that is probably baaaaaad...

You have described the situation, but before we go on, could you detail exactly what you want to get out of the data? The health status? The chances of more visits in the future? ... Sorry for so many questions, but in my few days on PF I have already gained some experience solving the wrong problem :tongue:
 
  • #7
Hi viraltux,

thanks for the discussion. I attached a simple picture to illustrate my ideas and questions. And no, I don't want to extract the latent health status from the data explicitly (although it is a nice by-product), nor will I make predictions. I want to model the clusters of doctor visits (the 1s in the stream) and give them a plausible underlying process. I still think the GARCH is a good candidate because of its clustering and its autocovariance structure (doctor visits after some weeks without a visit may still belong to the same illness). Also, I like the fact that the GARCH clearly "overshoots" the doctor threshold (b) from time to time, because not all illnesses are equally serious (in the data we only see the doctor visit, but you may have had a cold or a heart attack...).

Thanks,

ARDE
 

Attachments

  • GARCH.pdf
  • #8
Hi ARDE,

GARCH is definitely not the way to go. The seemingly good fit you get is spurious: if you check the significance of the model parameters, you'll see it is terrible, and the only reason you get a good-looking fit in the plot is that the resulting model does little more than bet "if no doctor today, then no doctor tomorrow; if doctor today, then doctor tomorrow."

Volatility is the finance term for what statisticians call variance, so the variance of 0000000 is the same as that of 1111111111: zero. What you call high and low volatility is really an estimate of the constant level of the process, which has nothing to do with the volatility itself.
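A quick numerical check of the 0000000 versus 1111111111 point, as a minimal sketch using Python's standard `statistics` module: both constant streams have zero variance, and only their level (mean) differs. The stream length of 15 weeks is arbitrary.

```python
from statistics import mean, pvariance

# Two constant visit streams like those in the thread: never visits vs. always visits.
never = [0] * 15
always = [1] * 15

# Both streams have exactly zero (population) variance, so variance alone cannot
# distinguish them; only the level (mean) differs.
print("variance:", pvariance(never), pvariance(always))
print("mean:", mean(never), mean(always))
```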

The underlying mathematical model of GARCH has nothing to do with your problem, and I actually agree with the authors you mention that a Markov process might be the way to go. It seems that the basic approach does not quite work for you, but there are many different models built on Markov processes.
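As a baseline, the first-order Markov chain mentioned in the thread can be fitted to one patient's stream by counting week-to-week transitions (these are the maximum-likelihood estimates). This is a minimal sketch using the example stream from post #5; the function name is hypothetical, and for a panel one would presumably pool the counts across patients.

```python
from collections import Counter


def markov_transition_probs(stream):
    """Maximum-likelihood estimates for a first-order two-state Markov chain.

    `stream` is a string of '0'/'1' characters (one per week). Returns
    (P(next=1 | now=0), P(next=1 | now=1)); a probability whose conditioning
    state never occurs before the last week is returned as NaN.
    """
    counts = Counter(zip(stream, stream[1:]))  # counts of (now, next) pairs
    n00, n01 = counts[("0", "0")], counts[("0", "1")]
    n10, n11 = counts[("1", "0")], counts[("1", "1")]
    p01 = n01 / (n00 + n01) if n00 + n01 else float("nan")
    p11 = n11 / (n10 + n11) if n10 + n11 else float("nan")
    return p01, p11


p01, p11 = markov_transition_probs("000001100001110100000001")
print(f"P(1|0) = {p01:.3f}, P(1|1) = {p11:.3f}")
# Under this chain, runs of 1s are geometric with mean 1 / (1 - P(1|1)).
print(f"implied mean run of visits: {1 / (1 - p11):.2f} weeks")
```

For the example stream this gives P(1|0) = 4/17 ≈ 0.235 and P(1|1) = 3/6 = 0.5. Since run lengths under such a chain are geometric and its autocorrelation decays exponentially, it cannot produce the long-range dependence ARDE is missing, which is one reason to look at the richer Markov-based models suggested below.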

Anyway, since there is obvious autoregressive behavior, and given the features of your problem and the nature of your complaints, I would suggest checking the following models:

Threshold Autoregressive Models
http://en.wikipedia.org/wiki/SETAR_(model)

Markov Switching Models
http://en.wikipedia.org/wiki/Markov_switching_multifractal

You will have to adjust some assumptions, but they are a better shot than GARCH. I think you should definitely drop that idea.
 

1. What is GARCH fitting and how does it apply to binary data?

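GARCH (Generalized Autoregressive Conditional Heteroskedasticity) fitting is a statistical method for modelling the time-varying volatility (conditional variance) of a time series. It is widely used in finance to model and forecast the volatility of asset returns. A standard GARCH model does not apply directly to binary data: as the thread points out, a constant 0-1 stream such as 0000000 or 1111111 has zero variance either way. The approach discussed here instead treats the binary series as a thresholded view of a latent GARCH process, recording a 1 whenever the absolute value of the latent process exceeds a threshold b.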
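Under the thresholded-latent-GARCH view described in this thread (a visit y_t = 1 is recorded whenever |a_t| exceeds b), and assuming Gaussian innovations purely for illustration, the probability of a visit given the latent conditional standard deviation follows directly from a_t = σ_t ε_t, with Φ the standard normal CDF:

```latex
y_t = \mathbf{1}\{|a_t| > b\}, \qquad
P(y_t = 1 \mid \sigma_t) = P\!\left(|\epsilon_t| > \tfrac{b}{\sigma_t}\right) = 2\left[1 - \Phi\!\left(\tfrac{b}{\sigma_t}\right)\right].
```

Since σ_t itself depends on the unobserved past values of a_t, this is a conditional probability only; the probability of a whole 0-1 sequence still has to average over the latent path.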

2. How does GARCH fitting differ from traditional regression methods?

GARCH fitting differs from ordinary regression methods, which typically assume a constant error variance, in that it models heteroskedasticity (time-varying variance) explicitly. This matters for financial data, where volatility clusters over time. In a GARCH model, the conditional variance depends on past squared shocks (the ARCH terms) and on its own past values (the GARCH terms).

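3. Can GARCH be used to fit data other than financial returns?

Standard GARCH models are designed for continuous-valued series and are most commonly applied to financial returns. Count and binary series need modified models; as ARDE notes in this thread, none of the R packages listed in the CRAN Finance task view accepts binary inputs directly.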

4. How is GARCH fitting performed on latent data?

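For a latent GARCH process that is observed only through thresholds, as in this thread, fitting would typically use maximum likelihood: assume a distribution for the innovations, then choose the GARCH parameters that maximize the probability of the observed 0-1 sequence. Because that probability requires integrating over the unobserved latent path, it generally has no closed form, so simulation-based approximations such as particle filters are typically used.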
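One way to approximate that likelihood is a particle filter (sequential Monte Carlo); this technique is swapped in here as an illustration and is not taken from the thread. Because each week's observation depends only on |a_t|, every particle can be weighted by the exact visit probability 2[1 − Φ(b/σ_t)] (or its complement) and then given a new ε_t drawn from the standard normal truncated to the observed side of the threshold. The sketch below assumes Gaussian innovations, a single patient, and a start at the unconditional variance with a_0 = 0; all function names and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf() and inv_cdf()
CLIP = 1e-12               # keeps probabilities strictly inside (0, 1) for inv_cdf()


def draw_eps_given_visit(c, visit, rng):
    """Draw eps ~ N(0, 1) conditioned on |eps| > c (visit) or |eps| <= c (no visit)."""
    upper = STD_NORMAL.cdf(c)
    if visit:
        # Inverse-CDF draw from the upper tail (c, inf), then a random sign.
        u = rng.uniform(upper, 1.0)
        e = STD_NORMAL.inv_cdf(min(max(u, CLIP), 1.0 - CLIP))
        return e if rng.random() < 0.5 else -e
    # Inverse-CDF draw from the central interval [-c, c].
    u = rng.uniform(1.0 - upper, upper)
    return STD_NORMAL.inv_cdf(min(max(u, CLIP), 1.0 - CLIP))


def latent_garch_loglik(visits, b, omega, alpha, beta, n_particles=500, seed=0):
    """Particle-filter estimate of log P(visits | b, omega, alpha, beta).

    Latent model (Gaussian innovations assumed):
        sigma2_t = omega + alpha * a_{t-1}**2 + beta * sigma2_{t-1},
        a_t = sqrt(sigma2_t) * eps_t,  visit_t = 1 if |a_t| > b else 0.
    """
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1 and b > 0):
        return -math.inf
    rng = random.Random(seed)
    a_prev = [0.0] * n_particles
    s2_prev = [omega / (1.0 - alpha - beta)] * n_particles  # unconditional variance
    loglik = 0.0
    for y in visits:
        # Deterministic variance update per particle, then the cut-off on eps.
        s2 = [omega + alpha * a * a + beta * s for a, s in zip(a_prev, s2_prev)]
        cuts = [b / math.sqrt(v) for v in s2]
        # Exact probability of this week's observation given each particle's past.
        if y:
            w = [2.0 * (1.0 - STD_NORMAL.cdf(c)) for c in cuts]
        else:
            w = [2.0 * STD_NORMAL.cdf(c) - 1.0 for c in cuts]
        total = sum(w)
        if total <= 0.0:
            return -math.inf  # observation impossible under every particle
        loglik += math.log(total / n_particles)
        # Resample in proportion to the weights, then draw a_t consistent with y.
        idx = rng.choices(range(n_particles), weights=w, k=n_particles)
        s2_prev = [s2[i] for i in idx]
        a_prev = [math.sqrt(s2[i]) * draw_eps_given_visit(cuts[i], y, rng) for i in idx]
    return loglik


if __name__ == "__main__":
    weeks = [int(ch) for ch in "000001100001110100000001"]
    for alpha in (0.05, 0.15, 0.30):
        ll = latent_garch_loglik(weeks, b=1.0, omega=0.05, alpha=alpha, beta=0.6)
        print(f"alpha={alpha:.2f}  loglik={ll:.3f}")
```

The estimate is noisy, so when comparing parameter values (on a grid, or by summing log-likelihoods over patients) it helps to reuse the same seed. Note also that dividing the latent process by b leaves the 0-1 data unchanged and turns ω into ω/b², so only ω/b² is identified; fixing b = 1 is a natural normalization.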

5. What are some common limitations of GARCH fitting for binary data?

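One limitation of applying GARCH to binary data is that a 0-1 series carries little direct information about volatility: constant streams such as 0000000 and 1111111 both have zero variance. As argued in this thread, the parameter estimates of such a fit may also be poorly determined, and a good-looking fit can be spurious if the model does little more than predict that tomorrow's outcome equals today's. Finally, no standard R package fits a GARCH to 0-1 data directly, so the likelihood of a thresholded latent GARCH has to be built by hand.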
