Generate a uniform random vector

In summary, to generate uniform random vectors with n dimensions, each element must be a uniform distributed variable in [0,1]. To be more precise, you can generate N independent realizations and then normalize all elements by dividing by the total sum of all elements. Alternatively, you can use a random number table.
  • #1
3HeadedMonkey
11
0
Hi all, I'm trying to generate uniform random vectors with n dimensions.
To be more precise, each of the elements of the vector must be a uniform distributed variable in [0,1]
The constraint is that the sum of the elements of the vector must be 1.
I tried different solutions for over a week but i can't get it working.
Can you help me, please?
 
Last edited:
Physics news on Phys.org
  • #2
3HeadedMonkey said:
Hi all, I'm trying to generate uniform random vectors with n dimensions.
To be more precise, each of the elements of the vector must be a uniform distributed variable.
The constraint is that the sum of the elements of the vector must be 1.
I tried different solutions for over a week but i can't get it working.
Can you help me, please?

Hey 3HeadedMonkey and welcome to the forums.

There are many ways to do this, but one that I think is simple an intuitive is to generate N independent U(0,1) realizations and then normalize all elements by dividing by the total sum of all elements.

In terms of generation, the actual mechanisms can be complicated but you can use anything from R to QBASIC to generate a random number from U(0,1). You can also use a random number table which is just a printed table of random numbers.

Also another good source of random numbers is http://www.random.org

In R (which you can download for free) the command for generating a realization from a uniform distribution is runif() and you can type help(runif) in R to give the command line arguments.
 
  • #3
chiro said:
Hey 3HeadedMonkey and welcome to the forums.

There are many ways to do this, but one that I think is simple an intuitive is to generate N independent U(0,1) realizations and then normalize all elements by dividing by the total sum of all elements.

In terms of generation, the actual mechanisms can be complicated but you can use anything from R to QBASIC to generate a random number from U(0,1). You can also use a random number table which is just a printed table of random numbers.

Also another good source of random numbers is
In R (which you can download for free) the command for generating a realization from a uniform distribution is runif() and you can type help(runif) in R to give the command line arguments.

Hi chiro, thanks for the reply. If you generate N variables and then you divide by the total sum you no longer have N uniform random variables. If you try, you can see that the plot of the frequencies of a single variable is no longer "flat". I could try to use R, but only a question: can I easily use it by command line? I need this vector to be generated automatically many times by a C++ program
 
  • #4
3HeadedMonkey said:
Hi chiro, thanks for the reply. If you generate N variables and then you divide by the total sum you no longer have N uniform random variables. If you try, you can see that the plot of the frequencies of a single variable is no longer "flat". I could try to use R, but only a question: can I easily use it by command line? I need this vector to be generated automatically many times by a C++ program

What you are saying is true, but the thing is you will never truly have N independent U(0,1) variables with your constraint. Because you have introduce a dependency between observations, there will always be a dependency not only between the variables, but also between the definitions of the variables as well.

You can use R like a console: you type in one command and it will spit out the output. You type one command at a time and you can store results in variables. You don't need to declare variables like you do in C or C++ and it's a lot easier than using C or C++.
 
  • #5
We can't guess what you really want to do here, but think about the case where N = 1 and it should be obvious that what you desribed in your OP is impossible.

If there is a constraint involving N variables, you can only make N-1 indepedent choices. You probably need to think what that means in terms of the problem you want to solve, before you get into the details of writing code.
 
  • #6
AlephZero said:
We can't guess what you really want to do here, but think about the case where N = 1 and it should be obvious that what you desribed in your OP is impossible.

If there is a constraint involving N variables, you can only make N-1 indepedent choices. You probably need to think what that means in terms of the problem you want to solve, before you get into the details of writing code.

Take the case with N=2
Generate v[0] as uniform[0,1]
Calculate v[1] as 1-v[0]
Result: 2 uniform variables (not independent but uniform)
If you plot the frequencies of the two variables, the plots are "flat". I mean each of the two variables is uniformly distributed. Obviously v[1] is uniform by definition. Then v[1] is uniform too but "mirrored" with respect to 0.5.
I think that this could be valid with more than 2 variables.

I thought that someone could know some well known mathemagical formula to generate this kind of vectors ;-)
 
  • #7
3HeadedMonkey said:
I think that this could be valid with more than 2 variables.

It doesn't work for N > 2, because the sum of N-1 uniformly distributed variables is not uniformly distributed. In fact, using the central limit theorem, the sum will approximate to a normal distribution for large N.
 
  • #8
It doesn't work for N > 2, because the sum of N-1 uniformly distributed variables is not uniformly distributed. In fact, using the central limit theorem, the sum will approximate to a normal distribution for large N.

ok this is what i feared. Thank you anyway :smile:
 
  • #9
In order to get a unit N dimensional unit vector, uniform on the surface of an N sphere, you need to be careful. The suggestion that you take N independent uniformly distributed between -1 and 1 is incorrect since the distribution will not be the same as uniform on the N sphere. To illustrate let N = 2, then you would be picking points uniform on a square so that directions along the diagonals would be preferred.

To avoid this use a rejection technique, and keep vectors only if the original length is < 1. The main problem with this approach is that it is inefficient for higher dimensions. Alternatively one could convert to the N dimensional analog of spherical coordinates and choose angles uniform in the appropriate domain.
 
  • #10
mathman said:
Alternatively one could convert to the N dimensional analog of spherical coordinates and choose angles uniform in the appropriate domain.

That seems like the easy way to do it.
 
  • #11
I don't know if this is relevant to the goal of the original post, but in post #4 of the thread, https://www.physicsforums.com/showthread.php?t=587264, the member Petawa gives a link to a paper that deals with selecting a random subspace in N dimensions. "Some Remarks On A Lemma Of Ran Raz" by Milman and Wagner.
 
  • #12
That seems like the easy way to do it.
Yes, easy but wrong. In fact if you look at the plot of sin and cos you see that there are more points in the codomain when sin and cos come near to 1.
My original idea was to find a distribution (not uniform) to generate these angles such that the result of sin and cos would be uniform. I don't know if it is possible or not
 
  • #13
Stephen Tashi said:
I don't know if this is relevant to the goal of the original post, but in post #4 of the thread, [], the member Petawa gives a link to a paper that deals with selecting a random subspace in N dimensions. "Some Remarks On A Lemma Of Ran Raz" by Milman and Wagner.

Thank you, i could try to look at this later, now i returned to use the normalization (much easier than the angles generation and equally bad :wink:)
 
  • #14
If the sum of the elements must be 1, the subspace isn't a sphere.

If N = 2 it is a straight line joining (1,0) and (0,1), which gives a simple interpretation of why post #6 "works" for that special case.
If N = 3 it is the surface of an equilateral triangle with vertices (1,0,0), (0,1,0) and (0,0,1). Maybe you really want a uniform distribution of points on the area of the triangle? If so, it's fairly clear that the x y and z coordinate are not an easy way to measure that. It would probably be better to choose the random variables using two parameters that describe the surface, then transform the point back to (x,y,z).

For N = 4 the points fill the volume of an "equilateral" tetrahedron embedded in 4-D space, and similarly for higher values of N.
 
  • #15
AlephZero said:
Maybe you really want a uniform distribution of points on the area of the triangle?

I have to think about this. If I want this, I can do it by repeatedly folding the hyperspace. I already coded it and it works. In this case the single elements are not uniform. But maybe the right thing to do.
I explain better the problem:
This vector is the input of an heuristic optimization algorithm, the random restarts are useful to find the global maximum. I want to restart in very random points (uniformly). So, I think that sampling the hyper-tetrahedron uniformly could be the best solution. Do you agree?
 
  • #16
3HeadedMonkey said:
Yes, easy but wrong. In fact if you look at the plot of sin and cos you see that there are more points in the codomain when sin and cos come near to 1.
My original idea was to find a distribution (not uniform) to generate these angles such that the result of sin and cos would be uniform. I don't know if it is possible or not

You are right. I didn't make myself clear. For some angles it will be uniform in angle. For others it will be uniform in some trig. function. Essentially one needs to know how dxdydz... converts to generalized spherical coordinates.
 
  • #17
mathman said:
You are right. I didn't make myself clear. For some angles it will be uniform in angle. For others it will be uniform in some trig. function. Essentially one needs to know how dxdydz... converts to generalized spherical coordinates.

I tried to "uniformize" the resulting variables by generating angles with a distribution such that there were more angles near π/4, but it didn't work properly. I think it was because each angle uses a certain combination of cos and sin in the conversion to cartesian coordinates. Maybe I should generate each single angle with a different distribution depending on the sin and cos in the hyperspherical coordinates formula of that angle? This would be a suicide :tongue:
 
  • #18
3HeadedMonkey said:
Hi chiro, thanks for the reply. If you generate N variables and then you divide by the total sum you no longer have N uniform random variables. If you try, you can see that the plot of the frequencies of a single variable is no longer "flat". I could try to use R, but only a question: can I easily use it by command line? I need this vector to be generated automatically many times by a C++ program

If you divide/multiply/add/subtract any constant to a uniform random variable it is still a uniform random variable and still "flat", am I missing something?

Also, be aware that the random generator in C++ is not good for statistical purposes.
 
  • #19
viraltux said:
If you divide/multiply/add/subtract any constant to a uniform random variable it is still a uniform random variable and still "flat", am I missing something?

Also, be aware that the random generator in C++ is not good for statistical purposes.

You can't generalize like that.

Random number generators for many applications that are statistically sound can be implemented as code in any language and this includes C++. You just have to get the right algorithm with the right properties and use that.

Any RNG from a computer of any sort will always be pseudo-random, but that doesn't mean that the algorithm and the results it generates are useless.

If you really want random data that is as 'random' as you can get, then you would measure some kind of physical noise and other interference like is done at www.random.org which is a good source for random data.
 
  • #20
viraltux said:
If you divide/multiply/add/subtract any constant to a uniform random variable it is still a uniform random variable and still "flat", am I missing something?

Yes, but the sum of N random variables is not a constant :wink:
By normalizing you get each variable as the result of a uniform random variable divided for a sum of uniform random variables.
 
  • #21
chiro said:
You can't generalize like that.

Random number generators for many applications that are statistically sound can be implemented as code in any language and this includes C++. You just have to get the right algorithm with the right properties and use that.

You're absolutely right Chiro, I'll explain myself, the C++ default RNG is a linear congruential generator which is very fast and useful for any non-statistically critical application. This kind of RNG perform poorly with tests checking for randomness and specially so in random vectors.

Which takes me to 3HeadedMonkey's issue in this thread and the flatness when dividing a uniform variable by a constant; since you divide by the sum this will be different each time and ,since some sums values will be more common than others, the resulting distribution for each dimension is not uniform anymore.

3HeadedMonkey said:
I have to think about this. If I want this, I can do it by repeatedly folding the hyperspace. I already coded it and it works. In this case the single elements are not uniform. But maybe the right thing to do.
I explain better the problem:
This vector is the input of an heuristic optimization algorithm, the random restarts are useful to find the global maximum. I want to restart in very random points (uniformly). So, I think that sampling the hyper-tetrahedron uniformly could be the best solution. Do you agree?

Oh, a random point, I see, then I disagree, if what you truly want is a random vector, your condition "the sum of the components is one" is not going to give you that. This way of sampling will favor the vector (1/N,...,1/N), and neither having uniform distributions in their components will give you that. You need these conditions for anything else?

For a uniform random vector of length one, that is, a vector equally likely to point at any direction, you need to sample a vector which components follow a standard normal distribution [itex] x_i \sim N(0,1) [/itex] and then divide it by the length of the sampled vector [itex]\sqrt{x_1^2+x_2^2+...+x_n^2}[/itex]
 
Last edited:
  • #22
viraltux said:
Which takes me to 3HeadedMonkey's issue in this thread and the flatness when dividing a uniform variable by a constant; since you divide by the sum this will be different each time and ,since some sums values will be more common than others, the resulting distribution for each dimension is not uniform anymore.

It's not due to the poor randomness of the C random(). It's because you divide a random variable by the SUM of N random variables. The variance changes (this may be also because one variable of the sum at the denominator is the variable at the numerator). I tried it even with BOOST libraries.

You need these conditions for anything else?

I need this condition because each variable of the vector is a probability and all the probabilties of the vector must sum to 1.
For a uniform random vector of length one, that is, a vector equally likely to point at any direction, you need to sample a vector which components follow a standard normal distribution [itex] x_i \sim N(0,1) [/itex] and then divide it by the length of the sampled vector [itex]\sqrt{x_1^2+x_2^2...+x_n^2}[/itex]

ok i like this solution :smile: I already thought about using the normal distribution in some way but i couldn't find how. Later I will try this solution, thank you!
 
Last edited:
  • #23
3HeadedMonkey said:
It's not due to the poor randomness of the C random(). It's because you divide a random variable by the product of N random variables. The variance changes (this may be also because one variable of the sum at the denominator is the variable at the numerator). I tried it even with BOOST libraries.

Oh no, no, you're right, I didn't imply that! I definitely need to polish my writing skills. :wink:

3HeadedMonkey said:
I need this condition because each variable of the vector is a probability and all the probabilities of the vector must sum to 1.

Oh I see, then divide the vector sampled this way by the components sum [itex]x_1+x_2+...+x_n[/itex] instead the vector's length.

3HeadedMonkey said:
ok i like this solution :smile: I already thought about using the normal distribution in some way but i couldn't find how. Later I will try this solution, thank you!

You're welcome :smile:
 
Last edited:
  • #24
This is not possible for [itex]n>2[/itex]. If [itex]x_i \sim \mathrm{Unif}(0,1)[/itex] for [itex]i = 1, \dots, n[/itex], then [itex]1-x_n \sim \mathrm{Unif}(0,1)[/itex] as well. But if they sum to 1, then this means [itex]x_1+\dots +x_{n-1} \sim \mathrm{Unif}(0,1)[/itex]. But

[tex]\mathbb{E}[x_1+\dots x_{n-1}] = \frac{n-1}{2} \neq \frac 12[/tex]
 
  • #25
3HeadedMonkey said:
Hi all, I'm trying to generate uniform random vectors with n dimensions.
To be more precise, each of the elements of the vector must be a uniform distributed variable in [0,1]
The constraint is that the sum of the elements of the vector must be 1.
I tried different solutions for over a week but i can't get it working.
Can you help me, please?

That's impossible but if you mean to take points uniformly over the area of the region x1+...+xn=1, then it should be possible by modifying a simplex-generating algorithm: just take n-1 uniform random variables, sort them in order and take the successive differences xk = uk-u(k-1) with u0=0 and un=1.
 
  • #26
bpet said:
That's impossible but if you mean to take points uniformly over the area of the region x1+...+xn=1, then it should be possible by modifying a simplex-generating algorithm: just take n-1 uniform random variables, sort them in order and take the successive differences xk = uk-u(k-1) with u0=0 and un=1.

it sounds good!
 
  • #27
It seems that a variety of problems are being discussed here. I'll add another.

Let S be the set of all discrete probability distributions that can be defined on n outcomes. Find a probability distribution on S that has maximum entropy. (This asks to find the probability distribution of maximum entropy with density f(p1,p2,...pn) that gives "the probability of the probability distribution p1,p2,..pn", rather than asking to find an single probability distribution (p1,p2,..pn) with maximum entropy. )
 

What is a uniform random vector?

A uniform random vector is a set of random numbers that are equally likely to occur within a specified range. This means that each element in the vector has the same probability of being selected.

Why is generating a uniform random vector important?

Generating a uniform random vector is important because it is a fundamental tool in statistical analysis and simulation. It allows for the creation of a large, representative sample of data that can be used to make inferences about a larger population.

How do you generate a uniform random vector?

To generate a uniform random vector, you can use a random number generator that is programmed to produce values within a specified range. This can be done using software or coding languages like Python or R.

What is the range of values in a uniform random vector?

The range of values in a uniform random vector depends on the parameters set by the user. It can be any range of numbers, such as 0 to 1, -10 to 10, or 1 to 100.

Can a uniform random vector contain duplicate values?

Yes, a uniform random vector can contain duplicate values. Since each element in the vector is selected independently, it is possible for the same value to be selected multiple times. However, the likelihood of this happening decreases as the size of the vector increases.

Similar threads

  • Set Theory, Logic, Probability, Statistics
Replies
5
Views
474
  • Set Theory, Logic, Probability, Statistics
Replies
3
Views
1K
  • Set Theory, Logic, Probability, Statistics
Replies
13
Views
2K
Replies
0
Views
349
  • Set Theory, Logic, Probability, Statistics
Replies
2
Views
1K
  • Set Theory, Logic, Probability, Statistics
Replies
8
Views
1K
  • Programming and Computer Science
Replies
1
Views
632
  • Set Theory, Logic, Probability, Statistics
Replies
1
Views
1K
  • Set Theory, Logic, Probability, Statistics
Replies
7
Views
2K
  • Set Theory, Logic, Probability, Statistics
Replies
2
Views
1K
Back
Top