Quantiles of a log-multivariate-normal-distributed set.

  • Thread starter mtal

Let [itex]X[/itex] be a set of [itex]N[/itex] lognormal prices (in dollars), meaning
[tex]\log(X) = Y \sim MN(\mu_Y , \Sigma_Y) ,[/tex]
i.e. the log of [itex]X[/itex] follows a multivariate normal distribution.

Imagine now that one wants to compute various quantiles for this set, e.g. 2.5%, 50% and 97.5%, and does this by simulating 100k draws from the distribution above.
You then get, say, a [itex]100k \times N[/itex] matrix. To get the total value, you'd find these three quantiles for each of the [itex]N[/itex] prices, giving an [itex]N \times 3[/itex] matrix of quantiles, and then simply add up each of the three columns, giving three numbers that represent the quantiles for the whole set.

So, my question is:
How would one go about transforming the information for the quantiles for the whole set from log-dollars to dollars?
I am of course aware of the exponential function, so what I'm asking is how and where in this process do I use it?


My main idea is this:
The three final quantiles, 2.5%, 50% and 97.5%, represent the log-quantiles of the set as a whole, say [itex]V_{2.5\%}, V_{50\%}[/itex] and [itex]V_{97.5\%}[/itex]. They also have their log/exp counterparts, e.g. [itex]V_{50\%} = \log(X_{50\%})[/itex].
Now the difference between, for example, the 50% and 2.5% quantiles, [itex]V_{50\%} - V_{2.5\%}[/itex], could then be represented as
[tex]\log(X_{50\%}) - \log(X_{2.5\%}) = \log\left(\frac{X_{50\%}}{X_{2.5\%}}\right),[/tex]

meaning that the log difference can be interpreted as a log-proportional difference. Thus one could just exponentiate and invert this value to get

[tex]\left[\exp\left(\log\left(\frac{X_{50\%}}{X_{2.5\%}}\right)\right)\right]^{-1} = \frac{X_{2.5\%}}{X_{50\%}}[/tex]

which gives the proportional difference between the quantiles in dollars instead of log-dollars. With this information, the 2.5% quantile can be obtained by multiplying this ratio, [itex]X_{2.5\%}/X_{50\%}[/itex], by the mean of the [itex]N[/itex] prices.

Or am I way off here?

Any help is greatly appreciated.
First, let me make sure I have something clear. You have these N prices, and you just want to know, for each price, where the 2.5, 50, and 97.5 percentiles are? You're not paying any attention to the correlations among them, right? If so, the fact that there are N of them isn't important, because you can just treat each one on its own.

If all you want is quantiles based on a large (100k) sample, you don't need to muck around with logs and exponentials at all, and indeed, you need not know anything about the underlying distribution. For a given stock, sort your 100k prices so that [itex]x_k[/itex] is the k-th smallest. The 2.5 percentile is [itex]\frac{x_{2500} + x_{2501}}{2}[/itex], and the 50 and 97.5 percentiles are analogous: [itex]\frac{x_{50000} + x_{50001}}{2}[/itex] and [itex]\frac{x_{97500} + x_{97501}}{2}[/itex]. That's it. You're done.
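A minimal sketch of this sort-and-average recipe, written in Python rather than R since nothing here depends on R. The function name `empirical_percentile` and the simulated stock (log-price ~ N(3.0, 0.2²)) are hypothetical. With K draws it takes k = p·K (k = 2500 for the 2.5 percentile of 100k draws) and averages the k-th and (k+1)-th smallest values.

```python
import random

def empirical_percentile(sample, p):
    """Percentile p (0 < p < 1) of a sample: sort the values and average the
    k-th and (k+1)-th smallest, with k = p * len(sample) (1-based positions).
    Assumes p * len(sample) is a whole number between 1 and len(sample) - 1."""
    xs = sorted(sample)
    k = round(p * len(xs))
    # 1-based x_k and x_{k+1} are xs[k-1] and xs[k] with 0-based indexing
    return (xs[k - 1] + xs[k]) / 2

random.seed(0)
# Hypothetical single stock: log-price ~ N(3.0, 0.2^2), 100k simulated prices
prices = [random.lognormvariate(3.0, 0.2) for _ in range(100_000)]
for p in (0.025, 0.5, 0.975):
    print(f"{p:.3f} percentile: {empirical_percentile(prices, p):.3f}")
```

No distributional assumption enters; the price of that generality is that the tail percentiles need a large sample to be stable.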

If you want to make use of the fact that they're lognormal, then you can do something different that won't require such a large sample size. (This would be useful, for instance, if you were dealing with real data, such as daily closing prices, where the quantity is limited.) Let's call the price of stock i on day j [itex]p_{ij}[/itex]. Take the (natural) logs of all those prices -- let's call those [itex]\pi_{ij}[/itex]. For each i, calculate the mean and standard deviation of the [itex]\pi_{i.}[/itex]. Now the 2.5 percentile is at [itex]\pi=\mu -1.95996 \sigma[/itex], the 50 at [itex]\mu[/itex], and the 97.5 at [itex]\mu + 1.95996 \sigma[/itex]. To get the percentiles for p, just use [itex]p=e^\pi[/itex].
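The lognormal shortcut can be sketched the same way, assuming each stock's log-price is normal. `lognormal_percentiles` and the 250 simulated "closing prices" are hypothetical; `statistics.NormalDist().inv_cdf(0.975)` supplies the 1.95996 factor.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def lognormal_percentiles(prices, probs=(0.025, 0.5, 0.975)):
    """Percentiles of one stock's price, assuming its log-price is normal."""
    logs = [math.log(p) for p in prices]   # pi_ij = ln p_ij for this stock i
    mu, sigma = mean(logs), stdev(logs)    # sample mean and SD of the logs
    z = NormalDist()                       # standard normal; z.inv_cdf(0.975) is about 1.95996
    # Percentile on the log scale is mu + z_q * sigma; map back to dollars with exp
    return {q: math.exp(mu + z.inv_cdf(q) * sigma) for q in probs}

random.seed(0)
# Hypothetical data: 250 daily closes with log-price ~ N(3.0, 0.2^2)
closes = [random.lognormvariate(3.0, 0.2) for _ in range(250)]
for q, value in lognormal_percentiles(closes).items():
    print(f"{q:.3f} percentile: {value:.3f}")
```

Only two numbers per stock are estimated, which is why a few hundred observations may be enough here; the tail percentiles are, of course, only as good as the normality assumption on the logs.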

A big advantage of this second approach is that you can take the correlations between your different stocks into account. All you need to do is calculate the [itex]N(N-1)/2[/itex] distinct covariances of the log-prices (alongside the [itex]N[/itex] means and [itex]N[/itex] variances) from your sample data, and you then have a complete description of the multivariate lognormal distribution. This can be used, for instance, to predict the behavior of a portfolio of stocks. (Of course, the predictions will only be as reliable as the assumption that the log-prices are normally distributed.)
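A sketch of estimating those parameters from a table of prices (rows = days, columns = stocks), assuming no missing values. `log_mean_and_cov` and the sample numbers are hypothetical. It returns the mean vector of the log-prices and fills the symmetric covariance matrix from its N variances and N(N-1)/2 distinct covariances.

```python
import math

def log_mean_and_cov(prices):
    """prices[j][i] = price of stock i on day j. Returns the mean vector and the
    sample covariance matrix of the log-prices (the log-scale MVLN parameters)."""
    logs = [[math.log(p) for p in row] for row in prices]
    n_days, n_stocks = len(logs), len(logs[0])
    mu = [sum(row[i] for row in logs) / n_days for i in range(n_stocks)]
    cov = [[0.0] * n_stocks for _ in range(n_stocks)]
    for a in range(n_stocks):
        for b in range(a, n_stocks):  # diagonal (variances) plus upper triangle only
            s = sum((row[a] - mu[a]) * (row[b] - mu[b]) for row in logs)
            cov[a][b] = cov[b][a] = s / (n_days - 1)  # mirror to keep it symmetric
    return mu, cov

# Hypothetical closing prices for 2 stocks over 4 days
days = [[10.0, 52.0], [10.4, 51.1], [10.1, 52.5], [10.8, 50.9]]
mu, cov = log_mean_and_cov(days)
print(mu)
print(cov)
```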

EDIT: Reading your post again, I think this is still not exactly what you want. But I don't know what you DO want. What exactly do you mean by the 2.5 percentile for a list of N prices? I know what the percentile is for one scalar random variable, but how do you define a percentile for a list of random variables?
Thanks for the reply!

These prices are estimates from a statistical model and are indeed correlated.

About the second approach you mentioned: I hadn't considered the multivariate lognormal (MVLN) distribution before, since I didn't know it existed. I did some searching and found that R (the statistical program I use) has a function that simulates data from the MVLN distribution given a mean vector of logs and a covariance matrix of logs. That is exactly what I was looking for!
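The R function isn't named here, so as a language-neutral illustration, this is one standard way such a simulator can work: factor the log-scale covariance matrix with a Cholesky decomposition, turn independent standard normals into a draw from MN(μ, Σ), then exponentiate each component. `cholesky`, `draw_mvln` and the two-stock parameters are hypothetical.

```python
import math
import random

def cholesky(cov):
    """Lower-triangular L with L L^T = cov; cov must be symmetric positive definite."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L

def draw_mvln(mu, cov, k, rng=random):
    """k draws (rows) from the multivariate lognormal whose logs are MN(mu, cov)."""
    L = cholesky(cov)
    n = len(mu)
    draws = []
    for _ in range(k):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]              # independent N(0, 1)
        y = [mu[i] + sum(L[i][j] * z[j] for j in range(i + 1))  # y = mu + L z ~ MN(mu, cov)
             for i in range(n)]
        draws.append([math.exp(v) for v in y])                    # x = exp(y), componentwise
    return draws

# Hypothetical log-scale parameters for two stocks
rng = random.Random(42)
mu = [3.0, 4.0]
cov = [[0.04, 0.03], [0.03, 0.09]]
x = draw_mvln(mu, cov, 100_000, rng)
print(x[:3])
```

The result is a K × N matrix like the 100k × N one from the opening post, except that it is already in dollars.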

But is that the case then? If the log of the data follows a multivariate normal, does the data itself follow the MVLN?
That is in fact the definition.
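Written out, with [itex]\mu[/itex] and [itex]\Sigma[/itex] the mean vector and covariance matrix of the logs, and the exponential applied componentwise. The moment formulas on the second line are standard results added here for reference:

[tex]Y = \log(X) \sim MN(\mu, \Sigma) \iff X = e^{Y} \sim \text{MVLN}(\mu, \Sigma),[/tex]
[tex]E[X_i] = \exp\left(\mu_i + \tfrac{1}{2}\Sigma_{ii}\right), \qquad \operatorname{Cov}(X_i, X_j) = \exp\left(\mu_i + \mu_j + \tfrac{1}{2}(\Sigma_{ii} + \Sigma_{jj})\right)\left(e^{\Sigma_{ij}} - 1\right).[/tex]

By linearity, the mean of the total [itex]\sum_i X_i[/itex] is the sum of the [itex]E[X_i][/itex]; quantiles have no such additivity.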

I still don't get what you're trying to do, though. It seems like you want to have ONE number for the 2.5 percentile of (say) 23 stock prices. I just don't understand what that means. How could the 2.5 percentile of 23 stock prices be a single number? Now, if you had a portfolio containing 23 stocks, i.e. a weighted sum of the stock prices, and you wanted the 2.5 percentile of the portfolio, that I would understand.
Yes, I do have the equivalent of a portfolio, in that I'm trying to estimate the behaviour of the price of the set as a whole (i.e. the behaviour of the sum of the prices).

What I mean by the "x percentile of N prices" is that I would make some K draws from the multivariate normal distribution, resulting in K vectors of length N. Then for each of the N prices calculate the corresponding quantiles, then sum each column of quantiles. This would result in three different quantiles for the set as a whole. Right?
Ah, I see. But that won't actually give you the correct quantiles for the sum of the prices. If you want to do that, you need to:

(1) Draw K vectors of length N.
(2) Sum up each of the K vectors, to get a single vector of length K.
(3) Sort the vector from (2) and locate the quantiles.

What you're proposing is something like (1), (3), (2), but that will not give the right answer. The quantile of the sums is not the sum of the quantiles.
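A sketch comparing the two orders of operations on simulated prices. The draws come from a hypothetical one-factor model (four stocks, every pair of log-prices with correlation 0.5), chosen only so the example is self-contained; in practice you'd use draws from your own fitted MVLN. `percentile` uses the same sort-and-average idea described earlier in the thread.

```python
import math
import random

def percentile(sample, p):
    """Average of the k-th and (k+1)-th smallest values, k = p * len(sample)."""
    xs = sorted(sample)
    k = round(p * len(xs))
    return (xs[k - 1] + xs[k]) / 2

# Step (1): K draws of N correlated lognormal prices. Hypothetical one-factor model:
# log-price i = mu[i] + s[i] * (sqrt(rho)*Z0 + sqrt(1-rho)*Zi), so each pair of
# log-prices has correlation rho.
rng = random.Random(7)
K, mu, s, rho = 100_000, [3.0, 3.5, 4.0, 2.5], [0.2, 0.3, 0.25, 0.4], 0.5
draws = []
for _ in range(K):
    z0 = rng.gauss(0.0, 1.0)  # common factor shared by all stocks in this draw
    row = []
    for m, sd in zip(mu, s):
        y = m + sd * (math.sqrt(rho) * z0 + math.sqrt(1 - rho) * rng.gauss(0.0, 1.0))
        row.append(math.exp(y))
    draws.append(row)

probs = (0.025, 0.5, 0.975)
# Order (1), (2), (3): total each draw, then take quantiles of the totals
totals = [sum(row) for row in draws]
quantile_of_sums = {p: percentile(totals, p) for p in probs}
# Order (1), (3), (2): quantiles per price, then add them up
columns = list(zip(*draws))
sum_of_quantiles = {p: sum(percentile(col, p) for col in columns) for p in probs}
for p in probs:
    print(f"{p:.3f}: quantile of sums {quantile_of_sums[p]:8.2f}, "
          f"sum of quantiles {sum_of_quantiles[p]:8.2f}")
```

In this example the 2.5% quantile of the totals should come out above the sum of the 2.5% quantiles, and the 97.5% quantile below the sum of the 97.5% quantiles. The two orders agree for every quantile when the prices are comonotonic (all increasing functions of one common random variable), but not in general.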
Thanks a lot!

I was so focused on this whole exp-log idea that I somehow never thought of adding up the rows instead of the columns.

And thanks for pointing out where I went wrong, you've been tons of help!
