Quantiles of a log-multivariate-normal-distributed set.

  • Context: Graduate 
  • Thread starter: mtal
  • Tags: Set

Discussion Overview

The discussion revolves around computing quantiles for a set of prices whose logarithms follow a multivariate normal distribution (i.e. the prices are multivariate lognormal). Participants explore methods for transforming log-quantiles back to dollar values and the implications of correlations among the prices in the context of statistical modeling.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant describes a method for estimating quantiles by simulation from a multivariate normal distribution and expresses uncertainty about how to convert log-quantiles to dollar values using the exponential function.
  • Another participant suggests that if the goal is to find quantiles for each price independently, one could simply sort the simulated prices without needing to consider the underlying distribution.
  • There is a proposal to use the properties of the lognormal distribution to calculate quantiles based on the mean and standard deviation of the log-transformed prices, emphasizing the potential to account for correlations among prices.
  • Some participants clarify that if the log of the data follows a multivariate normal distribution, then the data itself follows a multivariate lognormal distribution, which is acknowledged as a definition.
  • One participant expresses confusion about how to define a single percentile for multiple stock prices, suggesting a need for clarification on the concept of quantiles in this context.
  • Another participant points out that the method proposed for calculating quantiles of the sum of prices is incorrect, emphasizing the need to sum the vectors before determining quantiles.

Areas of Agreement / Disagreement

Participants generally agree on the properties of the multivariate lognormal distribution but disagree on the correct method for calculating quantiles for the sum of prices versus individual prices. The discussion remains unresolved regarding the best approach to define and compute the quantiles for a set of correlated prices.

Contextual Notes

Limitations include the need for clarity on how to define quantiles for a set of correlated random variables and the implications of using different methods for calculating quantiles based on the underlying distribution.

mtal
Hello,

Let X be a set of N lognormal prices (in dollars), meaning
\log(X) = Y \sim MN(\mu_Y , \Sigma_Y) ,
i.e. the log of X follows a multivariate normal distribution.
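To make the setup concrete, here is a minimal standard-library Python sketch of the simulation step (an illustration, not the poster's code). It draws Y from MN(\mu_Y, \Sigma_Y) using a Cholesky factor of \Sigma_Y and then applies exp coordinate-wise, which is the natural place for the exponential if you sample the logs. The function names `cholesky` and `draw_prices`, and the assumption that \Sigma_Y is symmetric positive definite, are mine; NumPy or R would do the same job more efficiently.

```python
import math
import random


def cholesky(sigma):
    """Lower-triangular L with L L^T == sigma (assumes sigma is symmetric positive definite)."""
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L


def draw_prices(mu, sigma, k, seed=None):
    """Draw k price vectors X = exp(Y), Y ~ MN(mu, sigma); returns a k-by-N list of lists."""
    rng = random.Random(seed)
    L = cholesky(sigma)
    n = len(mu)
    draws = []
    for _ in range(k):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Correlated log-prices: y = mu + L z
        y = [mu[i] + sum(L[i][j] * z[j] for j in range(i + 1)) for i in range(n)]
        # Exponentiate each coordinate to go from log-dollars to dollars
        draws.append([math.exp(v) for v in y])
    return draws
```

For example, `draw_prices([math.log(100), math.log(50)], [[0.04, 0.01], [0.01, 0.09]], 100_000, seed=1)` would return a 100k-by-2 table of dollar prices (the parameters are arbitrary).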

Imagine now that one wants to compute various quantiles for this set, e.g. 2.5%, 50% and 97.5%, and does this by simulating 100k draws from the distribution above.
You then get, say, a 100k \times N matrix. To get the total value, you'd find these three quantiles for each of the N prices, resulting in an N \times 3 matrix of quantiles, and then simply add up each of the three columns, giving three numbers that represent the total quantiles for the whole set.

So, my question is:
How would one go about transforming the information for the quantiles for the whole set from log-dollars to dollars?
I am of course aware of the exponential function, so what I'm asking is how and where in this process I should use it.

------------------------------------------------------------------------------------------------------------------------------------------------------

My main idea is this:
The three final quantiles, 2.5%, 50% and 97.5%, represent the log-quantiles of the set as a whole, say V_{2.5\%}, V_{50\%} and V_{97.5\%}. They also have their log/exp counterparts, e.g. V_{50\%} = \log(X_{50\%}).
Now, the difference between, for example, the 50% and 2.5% quantiles, V_{50\%} - V_{2.5\%}, could then be represented as
\log(X_{50\%}) - \log(X_{2.5\%}) = \log\left(\frac{X_{50\%}}{X_{2.5\%}}\right),

meaning that the log difference can be interpreted as a log-proportional difference. Thus one could just exponentiate and invert this value to get

\left[ \exp\left( \log\left( \frac{X_{50\%}}{X_{2.5\%}} \right) \right) \right]^{-1} = \frac{X_{2.5\%}}{X_{50\%}}

which gives the proportional difference of the quantiles in dollars instead of log-dollars. With this information, the 2.5% quantile can be obtained by multiplying the ratio X_{2.5\%}/X_{50\%} by the mean of the N prices.
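A short derivation may help locate the exponential (standard probability, not part of the original post). For a single price X_i = e^{Y_i} the conversion is exact, because exp is strictly increasing:

P\left(X_i \le e^{q}\right) = P\left(Y_i \le q\right) \quad\Longrightarrow\quad Q_{X_i}(p) = \exp\left(Q_{Y_i}(p)\right).

The same step does not carry over to the total, because exponentiating a sum of logs gives a product of prices rather than their sum:

\exp\left(\sum_{i=1}^{N} Y_i\right) = \prod_{i=1}^{N} X_i \neq \sum_{i=1}^{N} X_i,

so exponentiating summed log-quantiles relates to the product of the prices, not to their dollar total.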

Or am I way off here?

Any help is greatly appreciated.
 
First, let me make sure I have something clear. You have these N prices, and you just want to know, for each price, where the 2.5, 50, and 97.5 percentiles are? You're not paying any attention to the correlations among them, right? In that case the fact that there are N of them isn't important, because you can just handle each one on its own.

If all you want is quantiles based on a large (100k) sample, you don't need to muck around with logs and exponentials at all, and indeed, you need not know anything about the underlying distribution. For a given stock, sort your 100k prices in ascending order. The 2.5 percentile is \frac{x_{2500} + x_{2501}}{2}, and the 50 and 97.5 percentiles are analogous (\frac{x_{50000} + x_{50001}}{2} and \frac{x_{97500} + x_{97501}}{2}). That's it. You're done.
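The sorting recipe can be written as a short Python sketch. The helper name `empirical_percentile` is hypothetical, and the simulated prices in the usage lines are arbitrary draws from `random.lognormvariate`. It assumes p·n is a whole number, as it is for the percentiles and sample size discussed here.

```python
import random


def empirical_percentile(samples, p):
    """Percentile p (0 < p < 1) of a sample: sort ascending and average the two
    order statistics straddling p * n. Assumes p * n is an integer in [1, n - 1]."""
    xs = sorted(samples)
    m = round(p * len(xs))  # e.g. 2500 for p = 0.025 and n = 100_000
    # The 1-based x_m and x_{m+1} are xs[m - 1] and xs[m] in 0-based indexing
    return (xs[m - 1] + xs[m]) / 2


# Usage: 100k simulated prices for one stock (arbitrary lognormal parameters)
rng = random.Random(0)
prices = [rng.lognormvariate(4.6, 0.2) for _ in range(100_000)]
lo, med, hi = (empirical_percentile(prices, p) for p in (0.025, 0.5, 0.975))
```

No logs or exponentials appear anywhere, which is the point of the post: the sample quantiles are read off the dollar prices directly.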

If you want to make use of the fact that they're lognormal, then you can do something different that won't require such a large sample size. (This would be useful, for instance, if you were dealing with real data, such as daily closing prices, where the quantity is limited.) Let's call the price of stock i on day j p_{ij}. Take the (natural) logs of all those prices -- let's call those \pi_{ij}. For each i, calculate the mean and standard deviation of the \pi_{i.}. Now the 2.5 percentile is at \pi=\mu -1.95996 \sigma, the 50 at \mu, and the 97.5 at \mu + 1.95996 \sigma. To get the percentiles for p, just use p=e^\pi.
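A sketch of this parametric route for one stock, again in plain Python (the function name and the default percentile list are assumptions for illustration). `statistics.NormalDist.inv_cdf` supplies the normal quantiles, so the 1.95996 multiplier falls out automatically, and the final `math.exp` maps each log-percentile back to dollars.

```python
import math
from statistics import NormalDist, fmean, stdev


def lognormal_percentiles(prices, ps=(0.025, 0.5, 0.975)):
    """Fit a normal distribution to the logs of one stock's prices and map its
    percentiles back to dollars with exp (valid because exp is increasing)."""
    logs = [math.log(p) for p in prices]
    mu, sigma = fmean(logs), stdev(logs)  # sample mean and (n-1) standard deviation of the logs
    fitted = NormalDist(mu, sigma)
    # inv_cdf(0.975) lies about 1.95996 standard deviations above mu
    return [math.exp(fitted.inv_cdf(p)) for p in ps]
```

Because only a mean and a standard deviation are estimated, this works with far fewer observations than the sorting approach, at the cost of relying on the lognormal assumption.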

A big advantage of this second approach is that you can pay attention to the correlations between your different stocks. All you need to do is calculate the N(N-1)/2 pairwise covariances of the logs from your sample data; together with the means and variances, you now have a complete description of the multivariate lognormal distribution. This can be used, for instance, to predict the behavior of a portfolio of stocks. (Of course, the predictions will only be as reliable as the assumption that the logs are normal.)
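Estimating those inputs might look like the sketch below. The function is hypothetical; it stores one row per day (the transpose of the p_{ij} indexing above) and uses the usual n-1 divisor. Only the diagonal and one triangle are computed, since the off-diagonal part holds N(N-1)/2 distinct covariances.

```python
import math


def log_mean_and_covariance(prices):
    """prices: list of days, each a list of N stock prices.
    Returns the mean vector and N-by-N sample covariance matrix (n-1 divisor) of the logs."""
    logs = [[math.log(p) for p in day] for day in prices]
    n_days, n = len(logs), len(logs[0])
    mu = [sum(row[i] for row in logs) / n_days for i in range(n)]
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # diagonal (variances) plus upper triangle (covariances)
            c = sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in logs) / (n_days - 1)
            cov[i][j] = cov[j][i] = c  # the matrix is symmetric
    return mu, cov
```

The returned `mu` and `cov` are exactly the mean vector and covariance matrix of logs that a multivariate lognormal simulator takes as input.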

EDIT: Reading your post again, I think this is still not exactly what you want. But I don't know what you DO want. What exactly do you mean by the 2.5 percentile for a list of N prices? I know what the percentile is for one scalar random variable, but how do you define a percentile for a list of random variables?
 
Thanks for the reply!

These prices are estimates from a statistical model and are indeed correlated.

About the second approach you mentioned: the multivariate lognormal distribution hadn't occurred to me, since I didn't know it existed. I did some searching and found that R (the statistical program I use) has a function that simulates data from the MVLN distribution given a mean vector and a covariance matrix of the logs. That is exactly what I was looking for!

But is that the case then? If the log of the data follows a multivariate normal, does the data itself follow the MVLN?
 
mtal said:
But is that the case then? If the log of the data follows a multivariate normal, does the data itself follow the MVLN?
That is in fact the definition.

I still don't get what you're trying to do, though. It seems like you want to have ONE number for the 2.5 percentile of (say) 23 stock prices. I just don't understand what that means. How could the 2.5 percentile of 23 stock prices be a single number? Now, if you had a portfolio containing 23 stocks, i.e. a weighted sum of the stock prices, and you wanted the 2.5 percentile of the portfolio, that I would understand.
 
Yes, I do have the equivalent of a portfolio, in that I'm trying to estimate the behaviour of the price of the set as a whole (i.e. the behaviour of the sum of the prices).

What I mean by the "x percentile of N prices" is that I would make some K draws from the multivariate normal distribution, resulting in K vectors of length N. Then for each of the N prices calculate the corresponding quantiles, then sum each column of quantiles. This would result in three different quantiles for the set as a whole. Right?
 
Ah, I see. But that won't actually give you the correct quantiles for the sum of the prices. If you want to do that, you need to:

(1) Draw K vectors of length N.
(2) Sum up each of the K vectors, to get a single vector of length K.
(3) Sort the vector from (2) and locate the quantiles.

What you're proposing is something like (1), (3), (2), but that will not give the right answer. The quantile of the sums is not the sum of the quantiles.
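To illustrate, here is a minimal Python sketch comparing the two orders for just two prices. Everything in it is a hypothetical example: the means, volatilities, correlation 0.2, and function names are my assumptions, not from the thread. The draws are exponentiated before summing, which is where exp belongs if you sample the logs. The two numbers coincide only in special cases, e.g. when the prices are perfectly positively dependent (comonotonic); with imperfect correlation the lower tail of the sum is typically less extreme than the sum of the lower quantiles.

```python
import math
import random


def percentile(xs, p):
    """Sorted-sample percentile: average of the two order statistics straddling p * n."""
    xs = sorted(xs)
    m = round(p * len(xs))
    return (xs[m - 1] + xs[m]) / 2


def draw_two_prices(k, mu, s, rho, rng):
    """k draws of two lognormal prices whose logs are bivariate normal
    (means mu, standard deviations s, correlation rho)."""
    out = []
    for _ in range(k):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        y1 = mu[0] + s[0] * z1
        y2 = mu[1] + s[1] * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        out.append((math.exp(y1), math.exp(y2)))  # back to dollars before any summing
    return out


rng = random.Random(42)
K, p = 100_000, 0.025
draws = draw_two_prices(K, mu=(math.log(100), math.log(50)), s=(0.3, 0.4), rho=0.2, rng=rng)

# Steps (1), (2), (3): sum each draw, then take the quantile of the K totals.
quantile_of_sum = percentile([a + b for a, b in draws], p)

# Steps (1), (3), (2): per-price quantiles first, then add them up.
sum_of_quantiles = percentile([a for a, _ in draws], p) + percentile([b for _, b in draws], p)

print(f"quantile of the sum:  {quantile_of_sum:.2f}")
print(f"sum of the quantiles: {sum_of_quantiles:.2f}")
```

For a weighted portfolio, the only change would be summing `w1 * a + w2 * b` in step (2).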
 
Thanks a lot!

I was so focused on this whole exp-log idea that I somehow never thought of adding up the rows instead of the columns.

And thanks for pointing out where I went wrong, you've been tons of help!
 
