How do I randomly generate a set of numbers that sum up to one?

In summary, the conversation discussed the need to randomly generate a set of numbers between zero and one that sum up to one, based on a belief about the range each proportion must lie in. The suggested solutions included rescaling randomly generated numbers and using the Dirichlet distribution with appropriate parameters. The conversation also touched on the possibility of using Bayesian inference and the Chebyshev inequality to control the spread of the generated numbers.
  • #1
danacland
I teach cost-benefit analysis, which requires me to teach Monte Carlo simulation for sensitivity analysis. I use Excel. I understand how to generate a number with a uniform, triangular, normal or other distribution, but I don't know how to randomly generate a set of numbers between zero and one which sum up to one.

Here is the exact application. Suppose I have an estimate of the proportions of Blacks, Whites, Hispanics, and Asians in a given population, let's say 0.2, 0.5, 0.2, 0.1. In my cost-benefit analysis there is some impact I'm estimating that depends on these proportions. For example, suppose I'm estimating the total number of people who will sign up for Medicaid benefits, and I have an estimate of the sign-up rate for each race/ethnic group, so the total number of sign-ups depends on the weighted average of sign-up rates. In Monte Carlo sensitivity analysis I want to vary each of these parameters over some believable range. Let's say I have reason to believe that the race/ethnic proportions may not be exactly 0.2, 0.5, 0.2, 0.1, but that each of them lies in a range 0.05 above or below those numbers, so my ranges are [0.15, 0.25], [0.45, 0.55], [0.15, 0.25], [0.05, 0.15]. If I naively tell Excel to choose four numbers randomly, one from within each of those ranges, they are extremely unlikely to sum up to one. If I tell Excel to choose a number from each of the first three ranges and subtract their sum from one to get the fourth number, it is possible the first three will sum to more than one.

Ultimately what I need is to be able to randomly generate a set of proportions that sum up to one when I have some belief about the range each proportion must lie in. I have no idea how to think about how to do this, but it must come up a lot. I (and my students) will be hugely grateful for a solution.

Dan Acland, Goldman School of Public Policy, UC Berkeley.
 
  • #2
Welcome to PF;

From your criteria, you cannot get a set of proportions that add up to one - unless you make one of them dependent on the others.

In your example, you could randomly generate the first three, and make the last one whatever makes the four sum to one.
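A minimal Python sketch of this dependent-fourth approach. The retry loop is my addition, not part of the post: it handles the case post #1 worries about, where the first three draws leave no valid value for the fourth. The ranges are the ones from post #1.

```python
import random

# Ranges for the first three proportions; the fourth is derived.
ranges = [(0.15, 0.25), (0.45, 0.55), (0.15, 0.25)]
low4, high4 = 0.05, 0.15

def draw_proportions():
    """Draw three proportions, derive the fourth, retry until it is in range."""
    while True:
        p = [random.uniform(lo, hi) for lo, hi in ranges]
        p4 = 1.0 - sum(p)
        if low4 <= p4 <= high4:
            return p + [p4]

props = draw_proportions()
```

Note that the rejection step slightly distorts the distributions of the first three draws, since combinations whose sum falls outside [0.85, 0.95] are discarded.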
 
  • #3
An obvious solution is to generate all four numbers randomly then rescale to get the desired total. Now, that won't generate values with exactly the distribution fed in, but I gather those distributions are only plucked out of the air anyway, so that shouldn't matter.
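A minimal Python sketch of this generate-and-rescale idea, using the ranges from post #1:

```python
import random

ranges = [(0.15, 0.25), (0.45, 0.55), (0.15, 0.25), (0.05, 0.15)]

# Draw each proportion independently from its range, then rescale so they sum to one.
draws = [random.uniform(lo, hi) for lo, hi in ranges]
total = sum(draws)
props = [d / total for d in draws]
```

As the post notes, the rescaling changes the distributions: a rescaled value can also drift slightly outside its original range when the raw total is far from one.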
 
  • #4
Hi Dan, I suggest following the Bayesian procedure with a multinomial model and a Dirichlet prior. You can set your Dirichlet prior in terms of its shape parameters; the prior can correspond to your [.2, .5, .2, .1] vector with parameters set to suit this need. Then you update the Dirichlet distribution with multinomial data (e.g. with randomly generated class counts, according to your needs) to obtain a posterior that is slightly different from [.2, .5, .2, .1].
Another, more direct way is to sample directly from the prior, e.g. in Python:
numpy.random.dirichlet([2., 5., 2., 1.], 2)
which generates 2 vectors, e.g.:
array([[ 0.09636368, 0.53846125, 0.20418588, 0.16098919],
       [ 0.19053245, 0.69141272, 0.11662014, 0.00143469]])

Notice the rows sum to unity. The concentration around the original values is driven by the magnitude of the Dirichlet parameters, e.g.
numpy.random.dirichlet(numpy.array([2., 5., 2., 1.]) * 1e5, 2)
array([[ 0.20068111, 0.49879339, 0.20027888, 0.10024661],
       [ 0.20021036, 0.49957287, 0.19975537, 0.10046141]])

In R you can proceed similarly. Presumably, composing an equivalent function in Excel wouldn't be too hard.
 
  • #5
haruspex said:
An obvious solution is to generate all four numbers randomly then rescale to get the desired total. Now, that won't generate values with exactly the distribution fed in, but I gather those distributions are only plucked out of the air anyway, so that shouldn't matter.

This gets my vote. It's crude but simple, and appropriate for what you are doing.

Good for you for giving us so much detail.
 
  • #6
@Danacland: how did you get on?
 
  • #7
Randomly generate positive numbers, then divide each one by their total.
 
  • #8
Thanks for these responses. The sample-and-scale approach occurred to me. I think a simulation would give me a sense of how badly it would violate my ranges.

@Camillio: I was unaware of the Dirichlet distribution. It looks like the right answer, though if I use the simpler, direct approach you suggest, it looks like I can't specify the range I believe the true proportions lie in around the initial alphas. If I ran a simulation I could probably get a sense of how much spread the Dirichlet distribution generates around the alphas, and for most of the rough-and-ready policy analysis stuff I teach, this would probably be fine.

My sense is that the two-stage procedure you proposed is a way to get the spread I want. Is that right? Unfortunately I don't really follow the steps you outline. Is that because I don't know anything about Bayesian inference? Is there somewhere a moderately bone-headed economist could get a quick and dirty introduction to the kind of procedure you are outlining? Or can you explain it to me in "layman's" terms without taking up too much of your time?
 
  • #9
Well, I'm not immediately sure how to guarantee the values don't exceed the limits. Still, there are some ways to generate them so that, with high probability, they do not:
1) Notice that you have 4 classes [itex]X_1, ..., X_4[/itex] with parameters [itex]\alpha_1,...,\alpha_4[/itex], each with mean value [itex]\mathbb{E}[X_i] = \alpha_i / \sum \alpha_i[/itex]. You can exploit the Chebyshev inequality and set the parameters' values so high that the probability of the sampled values exceeding your limits is adequately small.
2) Consider [.2, .5, .2, .1] to be the mean of a Dirichlet distribution with [itex]\alpha = (20, 50, 20, 10)[/itex]. Then generate (uniform) random vectors with elements' values from (-5, 5), update [itex]\alpha[/itex] (i.e. add the random vector to it) and calculate the estimate.

The former case produces vectors with a "non-uniform" distribution: values close to the original ones will be more frequent. However, in both cases you will need to check whether the final estimates are within your constraints. In the latter case, e.g. the extreme draw [25, 55, 25, 5] leads to means [.227, .5, .227, .045], whose last value is below the allowed difference of 0.05.
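Option 1 above can be sketched in NumPy. The scale factor and the explicit rejection check are my choices for illustration, not from the post; the target vector and the ±0.05 tolerance are from post #1.

```python
import numpy as np

rng = np.random.default_rng()
target = np.array([0.2, 0.5, 0.2, 0.1])
scale = 1000.0   # larger scale -> tighter concentration around target
tol = 0.05

def draw_within_limits():
    """Sample Dirichlet(target * scale); reject draws outside target +/- tol."""
    while True:
        p = rng.dirichlet(target * scale)
        if np.all(np.abs(p - target) <= tol):
            return p

p = draw_within_limits()
```

With scale = 1000 the per-component standard deviation is roughly 0.01-0.016, so the ±0.05 check almost never rejects; the loop is just a safety net.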
 
  • #10
First, take your random numbers to be the deviations from your average values (the average values should sum to 1). You now want to create a list of deviation values that sum to zero.

The algorithm works like this: once you have chosen a number, the next number must lie in the intersection of what can still be corrected (the negative of the sum of the maximum remaining deviations) and the deviation range allowed for that value.

Code:
import numpy as np
from numpy.random import uniform

limits = np.array([(-.05, .05), (-.05, .05), (-.05, .05), (-.05, .05)])

deviations = []
for n, l in enumerate(limits):
    # The next deviation must stay inside its own limits AND inside what
    # the remaining deviations can still cancel out.
    lo = max(l[0], -limits[n + 1:, 1].sum() - sum(deviations))
    hi = min(l[1], -limits[n + 1:, 0].sum() - sum(deviations))
    deviations.append(uniform(lo, hi))

result = np.array([0.2, 0.5, 0.2, 0.1]) + np.array(deviations)
print(result)

For symmetric intervals the deviations are uniformly distributed, according to my histogram. Sorry, I am a bit lazy today; tell me if the Python gives you trouble. By the way, maybe you should shuffle the order of the values first and then unshuffle them afterwards, otherwise the last value might get less variation (but it might not matter due to some statistical magic).
 
  • #11
That's smart, 0xDEADBEEF! I'd however add one more thing: choose the indices of the deviations list in random order, otherwise the result will be biased by the shrinkage of the sequence of the distributions' supports. Another (easier) possibility is to shuffle the deviations with numpy.random.shuffle.

EDIT: Ouch, I'm even lazier... now I've read your post to its end, where you mention shuffling...
 
  • #12
That sounds like a rather brilliantly simple solution, 0xDEADBEEF. Thanks to all.
 
  • #13
0xDEADBEEF said:
maybe you should shuffle the order of the values first, and then unshuffle them afterwards otherwise the last value might get less variation (but it might not matter due to some statistical magic).
No such magic, I fear. The first few values generated will follow their chosen (uniform?) distributions. As you proceed through the list, the accumulated total, being a sum of uniforms (an Irwin-Hall distribution), trends towards Gaussian, and this distorts the distributions available for the later selections. Shuffling will restore fairness, ideally with each of the n! orderings equally often.
It's still not clear to me whether the result is better or worse than post-scaling. But given the crudeness of supposing uniform distributions in the first place, I can't see that it'll matter much.
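The shuffle-then-unshuffle fix can be sketched like this. `sequential_deviations` is my re-implementation of the loop from post #10, wrapped in a random permutation so no position systematically gets the squeezed last slot:

```python
import numpy as np

rng = np.random.default_rng()
target = np.array([0.2, 0.5, 0.2, 0.1])
limits = np.array([(-0.05, 0.05)] * 4)

def sequential_deviations(limits):
    """Draw each deviation so that the remaining ones can still cancel the sum."""
    devs = []
    for n, (lo, hi) in enumerate(limits):
        rest_lo = -limits[n + 1:, 1].sum()   # most negative the remainder can reach
        rest_hi = -limits[n + 1:, 0].sum()   # most positive the remainder can reach
        s = sum(devs)
        devs.append(rng.uniform(max(lo, rest_lo - s), min(hi, rest_hi - s)))
    return np.array(devs)

# Shuffle the positions, run the sequential scheme, then undo the shuffle.
order = rng.permutation(len(target))
devs_shuffled = sequential_deviations(limits[order])
devs = np.empty_like(devs_shuffled)
devs[order] = devs_shuffled
result = target + devs
```

The deviations sum to zero by construction (the last draw is forced to exactly cancel the running total), so `result` always sums to one and each component stays within its ±0.05 band.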
 

1. How do I randomly generate a set of numbers that sum up to one?

Several methods were discussed: generate candidate numbers and rescale them so they sum to one, sample directly from a Dirichlet distribution, or draw each number sequentially within bounds chosen so that the remaining numbers can still bring the total to exactly one.

2. Can I use any type of numbers for this random generation?

In principle any real numbers can sum to one, including negative ones. For proportions, however, each number must also lie between zero and one; that is exactly what the Dirichlet distribution guarantees, and what the rescaling and sequential methods preserve when the inputs are positive.

3. How precise will the generated numbers be?

The spread of the generated numbers depends on the method and its parameters. With a Dirichlet distribution, for example, larger concentration parameters produce draws that cluster more tightly around the target proportions, while smaller parameters produce more spread. Choose parameters that match the uncertainty you want to model.

4. Is there a limit to the number of numbers that can be generated?

No, there is no limit to the number of numbers that can be generated as long as they add up to one. However, the larger the number of numbers, the more complex the calculations may become, so it is important to consider the computational resources available when choosing the number of numbers to generate.

5. Can I control the distribution of the generated numbers?

Yes, you can control the distribution of the generated numbers by choosing the method and its parameters. For example, a Dirichlet distribution's parameters set both the mean proportions and how tightly draws concentrate around them, while generating uniform numbers and rescaling gives a flatter spread. It is important to understand how the chosen distribution shapes the generated numbers.
