Simplex point picking

1. Aug 21, 2014

stlukits

I have an application where I need to pick a probability distribution $(x_{1},\ldots,x_{n})$ uniformly at random from the simplex of all points whose coordinates add up to 1, i.e. $$\sum_{i=1}^{n}x_{i}=1.$$ Surprisingly, I didn't find much about simplex point picking on the internet, but http://en.wikipedia.org/wiki/User:Skinnerd/Simplex_Point_Picking appears to address this issue. Skinnerd suggests picking the individual members of $(y_{1},\ldots,y_{n})$ independently from a uniform distribution over the interval $(0,1)$ and then taking $$x_{i}=\frac{\ln{}y_{i}}{\sum{}\ln{}y_{i}}.$$ So far so good (although, why does he need the minus sign in his $x_{i}=-\ln{}y_{i}$?).
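The recipe above can be sketched in a few lines of Python. This is only an illustration using the standard library; the function name `sample_simplex` is made up for this example. Note that $-\ln y_{i}$ is a positive (exponentially distributed) number, and that in the normalized ratio the minus signs cancel, so both forms give the same $x_{i}$.

```python
import math
import random


def sample_simplex(n, rng=random):
    """Draw one point (x_1, ..., x_n) with x_i >= 0 and sum(x) == 1."""
    # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1]
    # and the logarithm is always defined.
    e = [-math.log(1.0 - rng.random()) for _ in range(n)]
    total = sum(e)
    # Normalizing the n positive values puts the point on the simplex.
    return [v / total for v in e]


rng = random.Random(0)  # seeded only to make the example repeatable
point = sample_simplex(5, rng)
print(point, sum(point))
```

Each call returns one point; the coordinates sum to 1 up to floating-point rounding.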

My question is: what is the distribution of $x_{i}$ over the interval $(0,1)$, i.e. what is the probability $P(a<x<b)$ that one of these coordinates is in $(a,b)\subseteq{}(0,1)$?

2. Aug 22, 2014

haruspex

If $X_i = x$, that leaves a hyperpyramid $\sum_{j\neq i}X_j = 1 - x$. Can't you make the p.d.f. of $X_i$ proportional to the volume of that?

3. Aug 24, 2014

stlukits

volume of n-dimensional simplex

Great idea! I am a little confused about terminology. Hyperpyramid at http://physicsinsights.org/pyramids-1.html seems to mean that the height of the pyramid is the same as the side of the base -- which is not what we want here. We want something more like a generalization for $n$ dimensions of a pentatope, see http://mathworld.wolfram.com/Pentatope.html. Mathworld advises on the volume of a simplex in $n$ dimensions at http://mathworld.wolfram.com/Cayley-MengerDeterminant.html. What haruspex is suggesting, as I see it, is that

$$P(0<x<b)=S(\sqrt{2})-S(\sqrt{2}(1-b))$$

where $S(z)$ is the volume of a simplex in $n$ dimensions whose side length is $z$. In our case, $z=\sqrt{2}$ because $x_{1}+\ldots{}+x_{n}=1$.
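For reference, the volume of a regular simplex has a closed form, so $S(z)$ does not require evaluating a Cayley-Menger determinant. A regular $k$-dimensional simplex with edge length $z$ has volume $\frac{z^{k}}{k!}\sqrt{\frac{k+1}{2^{k}}}$. The sketch below assumes that formula; the function name is hypothetical. One caveat: the set $x_{1}+\ldots{}+x_{n}=1$ with nonnegative coordinates is an $(n-1)$-dimensional simplex, so $k=n-1$ there.

```python
import math


def regular_simplex_volume(k, side):
    """Volume of a regular k-dimensional simplex with the given edge length."""
    return side ** k / math.factorial(k) * math.sqrt((k + 1) / 2 ** k)


# For n = 3 the probability simplex is an equilateral triangle (k = 2)
# with side sqrt(2); its area is mathematically sqrt(3)/2.
print(regular_simplex_volume(2, math.sqrt(2)))
```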

4. Aug 24, 2014

haruspex

Seems that simplex is the word I should have used.
Not sure that's quite what I was saying. For a start, there should be a ratio of volumes in there.
I think I'm saying the p.d.f. is $f(x) = S_{n-1}((1-x)\sqrt{2})/S_{n}(\sqrt{2})$, or maybe the subscripts should be $n$, $n+1$. You'd then need to integrate that to get the interval probability.
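One way to sidestep the question of subscripts is to use only the scaling of the slice volume and then normalize. As a sketch, assuming $n\geq 2$: with $x_{i}=x$ fixed, the remaining $n-1$ coordinates sum to $1-x$ and form an $(n-2)$-dimensional simplex whose side is proportional to $1-x$, so its volume is proportional to $(1-x)^{n-2}$. Requiring the density to integrate to 1 over $(0,1)$ gives

$$f(x)=\frac{(1-x)^{n-2}}{\int_{0}^{1}(1-t)^{n-2}\,dt}=(n-1)(1-x)^{n-2},$$

and integrating over $(a,b)\subseteq(0,1)$,

$$P(a<x_{i}<b)=\int_{a}^{b}(n-1)(1-x)^{n-2}\,dx=(1-a)^{n-1}-(1-b)^{n-1}.$$

For $n=2$ this reduces to $b-a$, as expected for a point uniform on a line segment.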

5. Aug 28, 2014

stlukits

Yes, indeed, it should be a ratio, not a difference. Thanks, haruspex!
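The ratio form can be checked numerically. As a sketch, assuming the ratio of volumes scales as $(1-x)^{n-2}$ and is normalized to integrate to 1, the interval probability becomes $(1-a)^{n-1}-(1-b)^{n-1}$. The code below compares that value with an empirical frequency obtained from the logarithm-based sampler; the function names and the choice $n=4$, $(a,b)=(0.1,0.3)$ are only for illustration.

```python
import math
import random


def interval_probability(n, a, b):
    """Closed form (1-a)^(n-1) - (1-b)^(n-1) for one coordinate."""
    return (1 - a) ** (n - 1) - (1 - b) ** (n - 1)


def empirical_probability(n, a, b, trials, rng):
    """Fraction of sampled points whose first coordinate lies in (a, b)."""
    hits = 0
    for _ in range(trials):
        # -ln(u) with u in (0, 1], normalized to sum to 1.
        e = [-math.log(1.0 - rng.random()) for _ in range(n)]
        x1 = e[0] / sum(e)
        if a < x1 < b:
            hits += 1
    return hits / trials


rng = random.Random(1)  # seeded only to make the example repeatable
exact = interval_probability(4, 0.1, 0.3)
estimate = empirical_probability(4, 0.1, 0.3, 100_000, rng)
print(exact, estimate)
```

The two printed numbers should agree to within sampling error, which shrinks roughly like one over the square root of the number of trials.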