# Quantile function for leptokurtotic sampling distribution

1. Oct 30, 2012

### ephedyn

I have a research problem that requires me to find the 95th percentile of a sampling distribution like this:

http://www.nuclearphynance.com/User Files/9231/1minVARCLOSE.png

My first question is: what would be a mathematically sound way to calculate something like this, with several clusters around the multiples of 10, gaps between them, and significant leptokurtosis?

More specifically, this comes from quantitative finance, where there is the idea of "value at risk" (VaR): one wants the loss threshold that the one-day loss will exceed with only 5% probability, i.e. the 95th percentile of the loss distribution. The naive approach is to fit a normal distribution to the historical losses, then read the 95th percentile off that fitted distribution. But I have noticed that this is not meaningful for a leptokurtotic distribution.
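To make the complaint concrete, here is a quick sketch comparing the fitted-normal quantile with the empirical one. A Student-t sample stands in for my real losses, and all the numbers are illustrative:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
# Hypothetical stand-in for heavy-tailed 1-minute losses: Student-t with
# 5 degrees of freedom (the real loss series would go here instead)
losses = rng.standard_t(df=5, size=100_000)

# Naive approach: fit a normal, read the percentile off the fitted curve
mu, sigma = losses.mean(), losses.std(ddof=1)
var_normal = NormalDist(mu, sigma).inv_cdf(0.99)

# Empirical percentile of the data itself
var_empirical = np.quantile(losses, 0.99)
```

Note that at moderate percentiles a normal fit can even overstate the quantile; the divergence shows up further out in the tail, which is why this looks at the 99th percentile rather than the 95th.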

The second question I have is less statistics than quant finance, but I'll ask nonetheless in case someone has an idea how to approach it. In particular, observe that my losses are sampled every 1 minute. This means I would end up calculating a 1-minute VaR, which again is not very meaningful, because one usually wants to know the worst case over, say, one day. A common approach seems to be to scale the losses into expected 1-day losses, i.e. multiply the losses by a factor of 7*60 (7 hours * 60 minutes in a trading day), before taking the VaR. While this works well in practice if you start with daily returns and want the 1-month VaR, 1-week VaR, etc., I realized that nonlinear behavior is more pronounced at high frequency (1 second, 1 minute, etc.), so it is not sound to scale the losses by a factor of 7*60.

How should I deal with this? Perhaps I should throw out the VaR technique altogether for intraday returns?

2. Oct 30, 2012

### chiro

Hey ephedyn.

You have two major choices. The first is whether to fit an existing parametric distribution to this loss data or to use the empirical distribution (i.e. your data itself). The second is whether to bring in additional attributes of the underlying process that may have generated this data, in order to decide what kind of analysis and model constraints you wish to use.

Personally I would not want to fit a normal to something like this, because looking at the data it is just ridiculously over-simplified and wrong.

If you use the empirical distribution (i.e. the stuff in your sample data), you need to choose a resolution for your bins. Finer resolutions give more accuracy but are more computationally intensive, depending on how much data is behind that graph.

That said, you can easily get an empirical PDF (take the frequencies and normalize them by dividing by the total number of observations), and then your quantile function just builds a CDF and finds the first value at which the CDF equals, or comes closest from the left to, the target probability.
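For instance, a minimal sketch of that binned empirical quantile (the function name, bin count, and test data are my own illustrative choices):

```python
import numpy as np

def binned_quantile(losses, q, n_bins=1000):
    """Quantile from a binned empirical distribution: histogram the data,
    normalize the frequencies into a PDF, accumulate into a CDF, and take
    the first bin whose CDF reaches q (scanning from the left)."""
    counts, edges = np.histogram(losses, bins=n_bins)
    pdf = counts / counts.sum()        # empirical PDF
    cdf = np.cumsum(pdf)               # empirical CDF
    i = np.searchsorted(cdf, q)        # first bin with CDF >= q
    return edges[i + 1]                # conservative: right edge of that bin

rng = np.random.default_rng(1)
sample = rng.standard_t(df=5, size=50_000)   # stand-in for the loss data
q95 = binned_quantile(sample, 0.95)          # close to np.quantile(sample, 0.95)
```

Returning the right edge of the bin errs on the conservative side; with fine enough bins the difference from the exact empirical quantile is at most one bin width.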

Now, if you want to actually analyze this in the context of asset/liability modelling, or anything else that pays attention to the process characteristics (actual or assumed), then this is going to be a lot more advanced.

The more detail you add and the more complex the assumptions, the more complex the model and the analyses become, and I don't know enough about your situation, either process-wise or financially, to give you any specific advice on this matter.

Quantile calculation in general is very easy, but actually making sense of stuff in a way that looks at what is going on in a particular context is another thing entirely.

Depending on what assets/liabilities are being modelled, you might want to look at textbooks, journal papers, and other resources in the financial or actuarial fields. Actuaries (as well as financial professionals) have to deal with asset-liability modelling so that the insurance firm doesn't become insolvent, and as a result they have developed a lot of material, mathematical and otherwise, that addresses these sorts of issues.

3. Oct 30, 2012

### ephedyn

Ah, thanks for the quick reply. I'm glad you pointed out that I can do this nonparametrically. I feel really stupid now. :shy: Is there any drawback to choosing a large number of bins besides computational cost? So besides that, I guess the only problem I still need to solve is the nonlinearity of the returns and scaling them so they make sense on a 1-day basis...

I couldn't find good references on this topic. I suspect most of it is kept proprietary.

Regarding the situation: This is from a project for our introductory risk management class and I'm just supposed to come up with my own trading strategy (it can be something basic like buy-and-hold multiple assets, and we are allowed to use data of any instrument) but use the various risk measures that we've covered, including the VaR, to evaluate the strategy.

So I decided to write a strategy that basically earns rebates by providing liquidity (which explains the big spike near 0): I just keep being the counterparty to anyone who wants to trade immediately, and try to offset the net positions accumulated that way, so that I eventually just earn the rebate from the exchange. This explains why most of my negative losses (i.e. profits) are consistently near 0.

The reason for the strange distribution curve is that I tried to make my backtest realistic by including exchange rate, rebates, commissions and leverage in the strategy. The minimum tick size * leverage explains why the losses are clustered around the multiples of 10, and the small variations within each cluster are mainly a result of the exchange rate, commissions and rebates. I recognize that the issue with this strategy is that I still have to be informed about the true value of the instrument(s) for which I'm supplying liquidity; otherwise I will just be adversely selected against by more informed participants. So various circumstances can still land me profits/losses far from the mode.

Then of course my methodology is not perfect, because there are some dynamics of participating at the bid-ask spread that dramatically change the outcome...

I left most of these details out of my earlier post because they just make things more confusing.

4. Oct 30, 2012

### chiro

The bin size is really just a computational trade-off with the resolution of information in the distribution.

If you just want to calculate some simple probabilities and moments, then once you have the binned PDF I would equate the computation of such properties roughly to working with, say, non-integrable distributions or large discrete distributions (where continuous approximations don't exist or aren't known).
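To make that concrete, here is a rough sketch of computing moments straight from a binned PDF, with each bin represented by its midpoint (the Student-t data is just an illustrative stand-in):

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.standard_t(df=10, size=200_000)   # illustrative leptokurtic sample

counts, edges = np.histogram(data, bins=2000)
pdf = counts / counts.sum()                  # binned empirical PDF
mid = 0.5 * (edges[:-1] + edges[1:])         # midpoint of each bin

mean = np.sum(mid * pdf)
var = np.sum((mid - mean) ** 2 * pdf)
kurt = np.sum((mid - mean) ** 4 * pdf) / var ** 2   # = 3 for a normal, > 3 here
```

With a fine enough binning the midpoint approximation to the moments is very close to computing them from the raw sample.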

Computers nowadays do a good job of computing lots of stuff pretty quickly so even if you had say 100,000 bins you'd still get the results pretty quickly.

5. Oct 30, 2012

### twofish-quant

I'd avoid the temptation to try to reduce the risk measure into a single number. There is very interesting stuff going on here, and I'd deal with the probability distribution as a probability distribution. Without understanding what is going on, scaling things is dangerous, and I'd rather not even try.

The main question you need to think through is: what exactly do you want to know? If you are trying to test trading strategies, then you really want to back-test everything with the full probability distribution. Anything else is dangerous.

The other thing you need to do is question whether your model is in fact a realistic representation of the actual market. Tick sizes on US exchanges are typically 0.01. If you have so much leverage that you start seeing this sort of periodicity, then the regulators and the exchanges won't let you trade. Stock markets are typically levered at a maximum of 10:1 and usually much lower. Typically in stock markets the liquidity providers are not very highly leveraged; rather, they make money by holding large amounts of capital and executing trades in large volumes.

You can get massive leverage in commodities and foreign exchange, but they have extremely tiny tick sizes.

One thing to remember is that people think pretty carefully about the rules. The rules are set up so that you end up with "reasonable" looking probability distributions. If you end up with bizarre probability distributions then most people will just say "too weird" and refuse to trade.

When you are providing liquidity there is really no such thing as the "true value" of the instrument. If you are doing short term trading the value that you are providing is that you are serving as the middleman between buyers and sellers and collecting a small difference. The "true value" is determined by the ultimate buyer and seller and that number happens at time scales that are far, far different than the ones you are looking at.

6. Oct 30, 2012

### twofish-quant

Something that also occurs to me is that you could try this as an approach.

Start by turning this into a discrete distribution over ticks (i.e. P(0) = p0, P(-1) = p-1, P(+1) = p+1). Assume the one-minute increments are independent; then, with some simple math, you can figure out the cumulative distribution in terms of ticks. The point is that if you let this run for a day, the ticks ought to be dense enough that you can approximate the distribution with a continuous function, and then you can figure out percentiles.
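A sketch of that idea with made-up minute probabilities (0.90 chance of no tick move, 0.05 each way; 7 h * 60 min = 420 minutes per day — all of these numbers are illustrative):

```python
import numpy as np

# Hypothetical per-minute P&L in ticks: mostly 0 (capturing the rebate),
# occasionally one tick either way. Probabilities are made up.
p_minute = np.array([0.05, 0.90, 0.05])      # pmf over ticks -1, 0, +1

# Daily P&L = sum of 420 independent minutes: convolve the pmf with itself
# 420 times (the support grows by one tick on each side per convolution).
p_day = np.array([1.0])
for _ in range(420):
    p_day = np.convolve(p_day, p_minute)

support = np.arange(-420, 421)               # daily P&L in ticks
cdf = np.cumsum(p_day)
var95 = support[np.searchsorted(cdf, 0.95)]  # 95th-percentile daily move
```

As the central limit theorem suggests, 420 convolutions of even this spiky pmf already look smooth and close to normal, which is exactly the "dense enough to approximate with a continuous function" point above.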

One other way of thinking about this is to look at the envelope of the function so you have

f_real(x) = f_smooth(x) * spikes

You can do math with f_smooth(x).

I think the basic problem here is that your resolution is too high, and you've magnified things enough so that you are seeing the individual atoms or pixels rather than the overall distribution.

7. Oct 30, 2012

### BWV

8. Oct 30, 2012

### twofish-quant

One other thing is to step back and look at the business aspects of this and remember that "which risk measure to use" is as much a business problem as a mathematical one.

When you are calculating "VaR", you actually aren't interested in knowing the worst case scenario, and VaR is in fact a *terrible* measure for determining the worst case. The way trading desks work is that they all have trading limits. The millisecond you hit the trading limit, the system automatically liquidates your positions and you've lost. *Assuming* (and these are very big assumptions) that the market hasn't moved and you can liquidate your positions, your loss is limited to your trading limits.

VaR is a way of estimating where to set the trading limits. VaR doesn't take into account losses beyond the trading limit, since in "normal" situations your positions are limited: you've liquidated once you hit those limits. Much of the day-to-day work of risk management is setting daily limits, and on "normal" days the assumption that you can liquidate at the value of the trading limit is valid.

Also, in a lot of situations where the tails aren't important, you can approximate with a Gaussian. Gaussians are useful because you can do the math in your head, and in most routine situations you need a decent answer quickly. *If* you are in a situation in which you can and will liquidate once you hit a barrier, then the long tail is not important, and Gaussians are a pretty decent way of dealing with that situation. If you are not in that situation, then you could end up blowing up the world. Using Gaussian measures in situations where they were not appropriate was one of the major factors in the financial crisis.

If you have a risk distribution that looks "weird" then the standard assumptions and practices that people use will break down, and it's likely that you won't be allowed to trade that situation. I think that if you run a simulation and look at "daily" returns, it will turn out that the peaks will smooth out and you'll get something that is tradeable.