How do I randomly generate a set of numbers that sum up to one?


by danacland
Tags: monte carlo, random numbers
danacland
#1
Nov15-12, 04:45 PM
P: 3
I teach cost-benefit analysis, which requires me to teach Monte Carlo simulation for sensitivity analysis. I use Excel. I understand how to generate a number from a uniform, triangular, normal, or other distribution, but I don't know how to randomly generate a set of numbers between zero and one that sum to one.

Here is the exact application. Suppose I have an estimate of the proportions of blacks, whites, Hispanics, and Asians in a given population, let's say 0.2, 0.5, 0.2, 0.1. In my cost-benefit analysis there is some impact I'm estimating that depends on these proportions. For example, suppose I'm estimating the total number of people who will sign up for Medicaid benefits, and I have an estimate of the sign-up rate for each race/ethnic group, so the total number of sign-ups depends on the weighted average of sign-up rates. In Monte Carlo sensitivity analysis I want to vary each of these parameters over some believable range. Let's say I have reason to believe that the race/ethnic proportions may not be exactly 0.2, 0.5, 0.2, 0.1, but that each of them lies within 0.05 above or below those numbers, so my ranges are [0.15,0.25], [0.45,0.55], [0.15,0.25], [0.05,0.15]. If I naively tell Excel to choose four numbers randomly, one from within each of those ranges, they are extremely unlikely to sum to one. If I tell Excel to choose a number from each of the first three ranges and subtract their sum from one to get the fourth number, it is possible the first three will sum to more than 1.

Ultimately what I need is to be able to randomly generate a set of proportions that sum up to one when I have some belief about the range each proportion must lie in. I have no idea how to think about how to do this, but it must come up a lot. I (and my students) will be hugely grateful for a solution.

Dan Acland, Goldman School of Public Policy, UC Berkeley.
Simon Bridge
#2
Nov15-12, 06:31 PM
Homework
Sci Advisor
HW Helper
Thanks ∞
PF Gold
P: 11,121
Welcome to PF;

From your criteria, you cannot get a set of proportions that add up to one - unless you make one of them dependent on the others.

In your example, you could randomly generate the first three, and make the last one whatever makes the four sum to one.
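A quick way to see how this plays out with the ranges from the question is to simulate it. The sketch below assumes uniform draws within each range (the thread does not fix a distribution) and uses NumPy; the seed and the number of draws are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for repeatability

# (low, high) range for each of the four groups, taken from the question.
ranges = np.array([(0.15, 0.25), (0.45, 0.55), (0.15, 0.25), (0.05, 0.15)])

n_draws = 100_000
# Draw the first three proportions independently and uniformly in their ranges.
first_three = rng.uniform(ranges[:3, 0], ranges[:3, 1], size=(n_draws, 3))
# The fourth proportion is whatever makes each row sum to one.
fourth = 1.0 - first_three.sum(axis=1)

# Count how often the dependent value leaves its own range.
outside = (fourth < ranges[3, 0]) | (fourth > ranges[3, 1])
print("share of draws with the fourth value outside its range:", outside.mean())
```

Every row sums to one by construction, but the dependent value absorbs the combined error of the other three, so it can land anywhere in [-0.05, 0.25] rather than in [0.05, 0.15].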
haruspex
#3
Nov15-12, 09:47 PM
Homework
Sci Advisor
HW Helper
Thanks ∞
P: 9,216
An obvious solution is to generate all four numbers randomly then rescale to get the desired total. Now, that won't generate values with exactly the distribution fed in, but I gather those distributions are only plucked out of the air anyway, so that shouldn't matter.
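As a minimal sketch of this sample-and-rescale idea, assuming uniform draws within the ranges given in the question (the variable names are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed

ranges = np.array([(0.15, 0.25), (0.45, 0.55), (0.15, 0.25), (0.05, 0.15)])

# Five independent draws of all four proportions, one row per draw.
raw = rng.uniform(ranges[:, 0], ranges[:, 1], size=(5, 4))
# Divide each row by its own total so that it sums to one.
scaled = raw / raw.sum(axis=1, keepdims=True)

print(scaled)
print("row sums:", scaled.sum(axis=1))
```

In a spreadsheet the same thing amounts to one random cell per group plus a second row that divides each cell by the row total.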

camillio
#4
Nov16-12, 09:17 AM
P: 74

Hi Dan, I suggest a Bayesian procedure with a multinomial model and a Dirichlet prior. You set the prior through the Dirichlet's shape parameters, chosen so that the prior mean matches your [.2, .5, .2, .1] vector. You then update the Dirichlet distribution with multinomial data (e.g. randomly generated class counts, according to your needs) to obtain a posterior whose mean differs slightly from [.2, .5, .2, .1].
Another, more direct way is to sample straight from the prior, e.g. in Python:
import numpy
numpy.random.dirichlet([2., 5., 2., 1.], 2)
which generates 2 vectors, e.g.:
array([[ 0.09636368, 0.53846125, 0.20418588, 0.16098919],
[ 0.19053245, 0.69141272, 0.11662014, 0.00143469]])

Notice that the rows sum to unity. The concentration around the original values is driven by the magnitude of the Dirichlet parameters, e.g.
numpy.random.dirichlet(numpy.array([2., 5., 2., 1.]) * 1e5, 2)
array([[ 0.20068111, 0.49879339, 0.20027888, 0.10024661],
[ 0.20021036, 0.49957287, 0.19975537, 0.10046141]])

In R, you can proceed similarly. Presumably, composing an equivalent function in Excel wouldn't be too hard.
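One route that carries over to a spreadsheet is the standard gamma construction: draw one independent Gamma(alpha_i, 1) value per group and divide each by the row total, which yields a Dirichlet(alpha) draw. The sketch below shows this in NumPy. An Excel version would presumably pass RAND() through an inverse gamma function such as GAMMA.INV, which is an assumption that has not been tested here.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
alpha = np.array([2.0, 5.0, 2.0, 1.0])

# One independent Gamma(alpha_i, 1) draw per group and per row.
gammas = rng.gamma(shape=alpha, scale=1.0, size=(100_000, alpha.size))
# Normalising each row gives a Dirichlet(alpha) sample.
proportions = gammas / gammas.sum(axis=1, keepdims=True)

print(proportions[:2])
print("row sums:", proportions[:2].sum(axis=1))
print("sample mean:", proportions.mean(axis=0))
```

The sample mean should sit close to alpha / sum(alpha), that is, [0.2, 0.5, 0.2, 0.1].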
ImaLooser
#5
Nov18-12, 12:56 AM
P: 571
Quote by haruspex:
An obvious solution is to generate all four numbers randomly then rescale to get the desired total. Now, that won't generate values with exactly the distribution fed in, but I gather those distributions are only plucked out of the air anyway, so that shouldn't matter.
This gets my vote. It's crude but simple, and appropriate for what you are doing.

Good for you for giving us so much detail.
Simon Bridge
#6
Nov18-12, 01:16 AM
Homework
Sci Advisor
HW Helper
Thanks ∞
PF Gold
P: 11,121
@Danacland: how did you get on?
lavinia
#7
Nov18-12, 08:32 AM
Sci Advisor
P: 1,716
Randomly generate positive numbers, then divide each one by their total.
danacland
#8
Nov19-12, 04:39 PM
P: 3
Thanks for these responses. The sample-and-scale approach occurred to me. I think a simulation would give me a sense of how badly it would violate my ranges.
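Such a simulation might look like the sketch below. It assumes uniform draws within each range before rescaling; `sample_and_scale` is a hypothetical helper name, not something from the thread.

```python
import numpy as np

def sample_and_scale(ranges, n_draws, rng):
    """Draw each proportion uniformly in its range, then rescale rows to sum to one."""
    raw = rng.uniform(ranges[:, 0], ranges[:, 1], size=(n_draws, len(ranges)))
    return raw / raw.sum(axis=1, keepdims=True)

ranges = np.array([(0.15, 0.25), (0.45, 0.55), (0.15, 0.25), (0.05, 0.15)])
rng = np.random.default_rng(0)  # arbitrary seed

props = sample_and_scale(ranges, 100_000, rng)
# True wherever a rescaled value has left its stated range.
violates = (props < ranges[:, 0]) | (props > ranges[:, 1])

print("violation rate per group:", violates.mean(axis=0))
print("share of draws with any violation:", violates.any(axis=1).mean())
print("largest overshoot per group:", (props - ranges[:, 1]).max(axis=0))
```

Because the raw totals lie between 0.8 and 1.2, rescaling can push a value that started near an edge of its range slightly past it.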

@Camillio: I was unaware of the Dirichlet distribution. It looks like the right answer, though if I use the simpler, direct approach you suggest, it looks like I can't specify the range I believe the true proportions lie in around the initial alphas. If I ran a simulation I could probably get a sense of how much spread the Dirichlet distribution generates around the alphas, and for most of the rough-and-ready policy analysis stuff I teach, this would probably be fine.
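The spread can also be read off directly. For a Dirichlet with parameters alpha and total alpha_0 = sum(alpha), each component has mean m_i = alpha_i / alpha_0 and variance m_i (1 - m_i) / (alpha_0 + 1). The sketch below compares that formula with a simulation for a few totals; the totals tried and the helper name `dirichlet_std` are illustrative assumptions.

```python
import numpy as np

def dirichlet_std(means, concentration):
    """Standard deviation of each component of a Dirichlet whose
    parameters are alpha = means * concentration."""
    means = np.asarray(means, dtype=float)
    return np.sqrt(means * (1.0 - means) / (concentration + 1.0))

means = np.array([0.2, 0.5, 0.2, 0.1])
rng = np.random.default_rng(0)  # arbitrary seed

for concentration in (10, 100, 1000, 10000):
    draws = rng.dirichlet(means * concentration, size=50_000)
    # Share of draws in which all four values stay within 0.05 of their mean.
    inside = (np.abs(draws - means) <= 0.05).all(axis=1).mean()
    print(concentration, dirichlet_std(means, concentration).round(4),
          "share inside +/-0.05:", round(float(inside), 3))
```

The total alpha_0 acts as the dial for spread: multiplying all alphas by the same factor keeps the means and shrinks the standard deviations roughly by the square root of that factor.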

My sense is that the two-stage procedure you proposed is a way to get the spread I want. Is that right? Unfortunately I don't really follow the steps you outline. Is that because I don't know anything about Bayesian inference? Is there somewhere a moderately bone-headed economist could get a quick and dirty introduction to the kind of procedure you are outlining? Or can you explain it to me in "layman's" terms without taking up too much of your time?
camillio
#9
Nov20-12, 10:36 AM
P: 74
Well, offhand I'm not sure how to guarantee that your values don't exceed the limits. Still, there are ways to generate them so that, with high probability, they do not:
1) Notice that you have 4 classes [itex]X_1, ..., X_4[/itex] with parameters [itex]\alpha_1,...,\alpha_4[/itex], each with mean value [itex]\mathbb{E}[X_i] = \alpha_i / \sum_j \alpha_j[/itex]. You can exploit the Chebyshev inequality and set the parameters' values so high that the probability of a sampled value exceeding your limits is adequately small.
2) Consider [.2, .5, .2, .1] to be the mean values of a Dirichlet distribution with [itex]\alpha = (20, 50, 20, 10)[/itex]. Then you can generate (uniform) random vectors with elements in (-5, 5), update [itex]\alpha[/itex] (i.e. add the random vector to it), and calculate the estimate [itex]\alpha_i / \sum_j \alpha_j[/itex].

The former case produces vectors with a "non-uniform" distribution: values close to the original ones will be more frequent. However, in both cases you will need to check whether the final estimates are within your constraints. In the latter case, for example, the extreme case [25, 55, 25, 5] leads to [.227, .5, .227, .045]. The last value is below its lower limit of 0.05.
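Option 1 can be made concrete as follows. Chebyshev gives P(|X_i - m_i| >= eps) <= Var[X_i] / eps^2, and for a Dirichlet Var[X_i] = m_i (1 - m_i) / (alpha_0 + 1) with alpha_0 the sum of the parameters. The sketch below solves for the smallest alpha_0 that meets a tolerated probability `delta`; the value of `delta` is a hypothetical choice, not something given in the thread.

```python
import numpy as np

means = np.array([0.2, 0.5, 0.2, 0.1])
half_width = 0.05   # allowed deviation from each mean (from the question)
delta = 0.05        # hypothetical tolerated probability of leaving a range

# Require m_i * (1 - m_i) / ((alpha_0 + 1) * eps**2) <= delta for every i;
# the component with the largest m_i * (1 - m_i) is the binding one.
alpha_0 = (means * (1 - means)).max() / (delta * half_width**2) - 1
alpha = means * alpha_0
print("total concentration needed:", alpha_0)
print("alpha:", alpha)

# Check against a simulation.
rng = np.random.default_rng(0)  # arbitrary seed
draws = rng.dirichlet(alpha, size=100_000)
exceed = (np.abs(draws - means) > half_width).mean(axis=0)
print("empirical exceedance per component:", exceed)
```

The bound holds per component and is conservative, so the simulated exceedance rates should come out well below `delta`. Bounding the chance that any of the four values leaves its range would need a smaller per-component `delta`, for instance `delta / 4`.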
0xDEADBEEF
#10
Nov20-12, 02:12 PM
P: 824
First, take your random numbers to be the deviations from your average values (the average values should sum to 1). You now want to create a list of deviations that sum to zero.

The algorithm works like this: once you have chosen a number, the next number you choose must lie in the intersection of what can still be corrected (the negative of the sum of the maximum remaining deviations) and the deviation allowed for this value.

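import random
import numpy as np

# Allowed (low, high) deviation of each proportion from its average value.
limits = np.array([(-.05, .05), (-.05, .05), (-.05, .05), (-.05, .05)])

deviations = []
for n, (low, high) in enumerate(limits):
    so_far = sum(deviations)
    # Total deviation the values still to be drawn can contribute.
    rest_low, rest_high = limits[n+1:, 0].sum(), limits[n+1:, 1].sum()
    # Stay inside this value's own range and keep a total of zero reachable.
    deviations.append(random.uniform(max(low, -rest_high - so_far),
                                     min(high, -rest_low - so_far)))

result = np.array([0.2, 0.5, 0.2, 0.1]) + np.array(deviations)
print(result)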
My histogram says that for symmetric intervals the deviations are uniformly distributed. Sorry, I am a bit lazy today. Tell me if the Python gives you trouble. By the way, maybe you should shuffle the order of the values first and unshuffle them afterwards; otherwise the last value might get less variation (but it might not matter due to some statistical magic).
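The shuffle-then-unshuffle idea can be sketched as a function. This is one possible reading of the suggestion, not the poster's own code; `zero_sum_deviations` is a hypothetical name, and the `rng` argument only needs `shuffle` and `uniform` methods (the standard `random` module or a `random.Random` instance both work).

```python
import random
import numpy as np

def zero_sum_deviations(limits, rng=random):
    """Draw one deviation per (low, high) pair so that the deviations sum to
    zero, visiting the values in a random order."""
    limits = np.asarray(limits, dtype=float)
    order = list(range(len(limits)))
    rng.shuffle(order)                 # random visiting order
    shuffled = limits[order]
    drawn = []
    for n, (low, high) in enumerate(shuffled):
        so_far = sum(drawn)
        rest_low = shuffled[n + 1:, 0].sum()
        rest_high = shuffled[n + 1:, 1].sum()
        # Stay inside this value's range and keep a zero total reachable.
        drawn.append(rng.uniform(max(low, -rest_high - so_far),
                                 min(high, -rest_low - so_far)))
    deviations = np.empty(len(limits))
    deviations[order] = drawn          # put each deviation back in its original slot
    return deviations

means = np.array([0.2, 0.5, 0.2, 0.1])
limits = [(-0.05, 0.05)] * 4
proportions = means + zero_sum_deviations(limits, random.Random(0))
print(proportions, proportions.sum())
```

The limits must admit a zero total (the sum of the lows at most zero and the sum of the highs at least zero), which holds for symmetric ranges like the ones here.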
camillio
#11
Nov21-12, 04:20 AM
P: 74
That's smart, 0xDEADBEEF! I'd add one more thing, however: choose the indices of the deviations list in a random order, otherwise the result will be biased due to shrinkage of the sequence of the distributions' supports. Another (easier) possibility could be to shuffle the deviations with numpy.random.shuffle.

EDIT: Ouch, I'm even lazier... I've now read your post to the end, where you mention shuffling...
danacland
#12
Nov27-12, 12:56 PM
P: 3
That sounds like a rather brilliantly simple solution, 0xDEADBEEF. Thanks to all.
haruspex
#13
Nov28-12, 12:40 AM
Homework
Sci Advisor
HW Helper
Thanks ∞
P: 9,216
Quote by 0xDEADBEEF:
maybe you should shuffle the order of the values first, and then unshuffle them afterwards otherwise the last value might get less variation (but it might not matter due to some statistical magic).
No such magic, I fear. The first few generated will be according to their chosen (uniform?) distributions. As you proceed through the list, the accumulated total will become a binomial distribution, trending towards Gaussian. This will distort the distributions for the later selections. Shuffling will restore fairness, ideally with each of the n! orderings equally often.
It's still not clear to me whether the result is better or worse than post-scaling. But given the crudeness of supposing uniform distributions in the first place, I can't see that it'll matter much.
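Whether the order matters for a particular set of ranges can be checked empirically. The sketch below runs the sequential method in a fixed order for many draws at once (a vectorised NumPy variant written for this check, not code from the thread) and summarises each position separately so the positions can be compared side by side.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
limits = np.array([(-0.05, 0.05)] * 4)
n_draws = 50_000

dev = np.zeros((n_draws, len(limits)))
for n, (low, high) in enumerate(limits):
    so_far = dev[:, :n].sum(axis=1)
    rest_low = limits[n + 1:, 0].sum()
    rest_high = limits[n + 1:, 1].sum()
    lower = np.maximum(low, -rest_high - so_far)
    upper = np.minimum(high, -rest_low - so_far)
    upper = np.maximum(upper, lower)   # guard against rounding at the edges
    dev[:, n] = rng.uniform(lower, upper)

print("std by position: ", dev.std(axis=0))
print("min by position: ", dev.min(axis=0))
print("max by position: ", dev.max(axis=0))
# A coarse histogram per position shows the shape of each distribution.
for n in range(dev.shape[1]):
    counts, _ = np.histogram(dev[:, n], bins=5, range=(-0.05, 0.05))
    print("position", n, counts)
```

Changing `limits` to unequal or asymmetric ranges shows how the picture depends on the inputs.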

