# Generate a uniform random vector

1. May 17, 2012

Hi all, I'm trying to generate uniform random vectors with n dimensions.
To be more precise, each of the elements of the vector must be a uniform distributed variable in [0,1]
The constraint is that the sum of the elements of the vector must be 1.
I tried different solutions for over a week but i can't get it working.
Can you help me, please?

Last edited: May 17, 2012
2. May 17, 2012

### chiro

Hey 3HeadedMonkey and welcome to the forums.

There are many ways to do this, but one that I think is simple an intuitive is to generate N independent U(0,1) realizations and then normalize all elements by dividing by the total sum of all elements.

In terms of generation, the actual mechanisms can be complicated but you can use anything from R to QBASIC to generate a random number from U(0,1). You can also use a random number table which is just a printed table of random numbers.

Also another good source of random numbers is http://www.random.org

In R (which you can download for free) the command for generating a realization from a uniform distribution is runif() and you can type help(runif) in R to give the command line arguments.

3. May 17, 2012

Hi chiro, thanks for the reply. If you generate N variables and then you divide by the total sum you no longer have N uniform random variables. If you try, you can see that the plot of the frequencies of a single variable is no longer "flat". I could try to use R, but only a question: can I easily use it by command line? I need this vector to be generated automatically many times by a C++ program

4. May 17, 2012

### chiro

What you are saying is true, but the thing is you will never truly have N independent U(0,1) variables with your constraint. Because you have introduce a dependency between observations, there will always be a dependency not only between the variables, but also between the definitions of the variables as well.

You can use R like a console: you type in one command and it will spit out the output. You type one command at a time and you can store results in variables. You don't need to declare variables like you do in C or C++ and it's a lot easier than using C or C++.

5. May 17, 2012

### AlephZero

We can't guess what you really want to do here, but think about the case where N = 1 and it should be obvious that what you desribed in your OP is impossible.

If there is a constraint involving N variables, you can only make N-1 indepedent choices. You probably need to think what that means in terms of the problem you want to solve, before you get into the details of writing code.

6. May 17, 2012

Take the case with N=2
Generate v[0] as uniform[0,1]
Calculate v[1] as 1-v[0]
Result: 2 uniform variables (not independent but uniform)
If you plot the frequencies of the two variables, the plots are "flat". I mean each of the two variables is uniformly distributed. Obviously v[1] is uniform by definition. Then v[1] is uniform too but "mirrored" with respect to 0.5.
I think that this could be valid with more than 2 variables.

I thought that someone could know some well known mathemagical formula to generate this kind of vectors ;-)

7. May 17, 2012

### AlephZero

It doesn't work for N > 2, because the sum of N-1 uniformly distributed variables is not uniformly distributed. In fact, using the central limit theorem, the sum will approximate to a normal distribution for large N.

8. May 17, 2012

ok this is what i feared. Thank you anyway

9. May 17, 2012

### mathman

In order to get a unit N dimensional unit vector, uniform on the surface of an N sphere, you need to be careful. The suggestion that you take N independent uniformly distributed between -1 and 1 is incorrect since the distribution will not be the same as uniform on the N sphere. To illustrate let N = 2, then you would be picking points uniform on a square so that directions along the diagonals would be preferred.

To avoid this use a rejection technique, and keep vectors only if the original length is < 1. The main problem with this approach is that it is inefficient for higher dimensions. Alternatively one could convert to the N dimensional analog of spherical coordinates and choose angles uniform in the appropriate domain.

10. May 18, 2012

### ImaLooser

That seems like the easy way to do it.

11. May 18, 2012

### Stephen Tashi

I don't know if this is relevant to the goal of the original post, but in post #4 of the thread, https://www.physicsforums.com/showthread.php?t=587264, the member Petawa gives a link to a paper that deals with selecting a random subspace in N dimensions. "Some Remarks On A Lemma Of Ran Raz" by Milman and Wagner.

12. May 18, 2012

Yes, easy but wrong. In fact if you look at the plot of sin and cos you see that there are more points in the codomain when sin and cos come near to 1.
My original idea was to find a distribution (not uniform) to generate these angles such that the result of sin and cos would be uniform. I don't know if it is possible or not

13. May 18, 2012

Thank you, i could try to look at this later, now i returned to use the normalization (much easier than the angles generation and equally bad )

14. May 18, 2012

### AlephZero

If the sum of the elements must be 1, the subspace isn't a sphere.

If N = 2 it is a straight line joining (1,0) and (0,1), which gives a simple interpretation of why post #6 "works" for that special case.
If N = 3 it is the surface of an equilateral triangle with vertices (1,0,0), (0,1,0) and (0,0,1). Maybe you really want a uniform distribution of points on the area of the triangle? If so, it's fairly clear that the x y and z coordinate are not an easy way to measure that. It would probably be better to choose the random variables using two parameters that describe the surface, then transform the point back to (x,y,z).

For N = 4 the points fill the volume of an "equilateral" tetrahedron embedded in 4-D space, and similarly for higher values of N.

15. May 18, 2012

I have to think about this. If I want this, I can do it by repeatedly folding the hyperspace. I already coded it and it works. In this case the single elements are not uniform. But maybe the right thing to do.
I explain better the problem:
This vector is the input of an heuristic optimization algorithm, the random restarts are useful to find the global maximum. I want to restart in very random points (uniformly). So, I think that sampling the hyper-tetrahedron uniformly could be the best solution. Do you agree?

16. May 18, 2012

### mathman

You are right. I didn't make myself clear. For some angles it will be uniform in angle. For others it will be uniform in some trig. function. Essentially one needs to know how dxdydz... converts to generalized spherical coordinates.

17. May 18, 2012

I tried to "uniformize" the resulting variables by generating angles with a distribution such that there were more angles near π/4, but it didn't work properly. I think it was because each angle uses a certain combination of cos and sin in the conversion to cartesian coordinates. Maybe I should generate each single angle with a different distribution depending on the sin and cos in the hyperspherical coordinates formula of that angle? This would be a suicide :tongue:

18. May 18, 2012

### viraltux

If you divide/multiply/add/subtract any constant to a uniform random variable it is still a uniform random variable and still "flat", am I missing something?

Also, be aware that the random generator in C++ is not good for statistical purposes.

19. May 18, 2012

### chiro

You can't generalize like that.

Random number generators for many applications that are statistically sound can be implemented as code in any language and this includes C++. You just have to get the right algorithm with the right properties and use that.

Any RNG from a computer of any sort will always be pseudo-random, but that doesn't mean that the algorithm and the results it generates are useless.

If you really want random data that is as 'random' as you can get, then you would measure some kind of physical noise and other interference like is done at www.random.org which is a good source for random data.

20. May 19, 2012

Yes, but the sum of N random variables is not a constant
By normalizing you get each variable as the result of a uniform random variable divided for a sum of uniform random variables.

21. May 19, 2012

### viraltux

You're absolutely right Chiro, I'll explain myself, the C++ default RNG is a linear congruential generator which is very fast and useful for any non-statistically critical application. This kind of RNG perform poorly with tests checking for randomness and specially so in random vectors.

Which takes me to 3HeadedMonkey's issue in this thread and the flatness when dividing a uniform variable by a constant; since you divide by the sum this will be different each time and ,since some sums values will be more common than others, the resulting distribution for each dimension is not uniform anymore.

Oh, a random point, I see, then I disagree, if what you truly want is a random vector, your condition "the sum of the components is one" is not going to give you that. This way of sampling will favor the vector (1/N,...,1/N), and neither having uniform distributions in their components will give you that. You need these conditions for anything else?

For a uniform random vector of length one, that is, a vector equally likely to point at any direction, you need to sample a vector which components follow a standard normal distribution $x_i \sim N(0,1)$ and then divide it by the length of the sampled vector $\sqrt{x_1^2+x_2^2+...+x_n^2}$

Last edited: May 19, 2012
22. May 19, 2012

It's not due to the poor randomness of the C random(). It's because you divide a random variable by the SUM of N random variables. The variance changes (this may be also because one variable of the sum at the denominator is the variable at the numerator). I tried it even with BOOST libraries.

I need this condition because each variable of the vector is a probability and all the probabilties of the vector must sum to 1.
ok i like this solution I already thought about using the normal distribution in some way but i couldn't find how. Later I will try this solution, thank you!

Last edited: May 19, 2012
23. May 19, 2012

### viraltux

Oh no, no, you're right, I didn't imply that! I definitely need to polish my writing skills.

Oh I see, then divide the vector sampled this way by the components sum $x_1+x_2+...+x_n$ instead the vector's length.

You're welcome

Last edited: May 19, 2012
24. May 19, 2012

### AKG

This is not possible for $n>2$. If $x_i \sim \mathrm{Unif}(0,1)$ for $i = 1, \dots, n$, then $1-x_n \sim \mathrm{Unif}(0,1)$ as well. But if they sum to 1, then this means $x_1+\dots +x_{n-1} \sim \mathrm{Unif}(0,1)$. But

$$\mathbb{E}[x_1+\dots x_{n-1}] = \frac{n-1}{2} \neq \frac 12$$

25. May 21, 2012

### bpet

That's impossible but if you mean to take points uniformly over the area of the region x1+...+xn=1, then it should be possible by modifying a simplex-generating algorithm: just take n-1 uniform random variables, sort them in order and take the successive differences xk = uk-u(k-1) with u0=0 and un=1.