# Micromass' big simulation challenge


#### micromass

Staff Emeritus
Homework Helper
Let's do some simulations. Below are 5 simulation problems that require a computer to solve. Use any language you want. Post the answers (including graphics) and code here!

• Any use of outside sources is allowed, but do not look up the question directly. For example, it is ok to go check probability books, but it is not allowed to google the exact question.
• All programming languages are allowed.
• Please reference every source you use.

1. SOLVED BY Ibix A clinic has three doctors. Patients come into the clinic at random, starting at 9 a.m., according to a Poisson process with time parameter 10 minutes: that is, the time after opening at which the first patient appears follows an exponential distribution with expectation 10 minutes, and then, after each patient arrives, the waiting time until the next patient is independently exponentially distributed, also with expectation 10 minutes. When a patient arrives, he or she waits until a doctor is available. The amount of time spent by each doctor with each patient is a random variable, uniformly distributed between 5 and 20 minutes. The office stops admitting new patients at 4 p.m. and closes when the last patient is with the doctor.

Find the median and 50% interval for the following: the number of patients coming to the office, the number of patients having to wait, the average wait, and the closing time of the office. Graph the distributions.
2. SOLVED BY mfb Write a program to simulate data from the standard normal distribution. The ONLY randomization routine you can use is one which automatically tosses a coin and gives you heads or tails. Throughout the program, you cannot know what the probability of heads is (only that it is not 0 or 1). Provide proof that your program simulates the normal distribution.
3. SOLVED BY I_am_learning, Ygggdrasil Simulate the distribution from Challenge 5 of the probability challenge:
In a town of 1001 inhabitants, a person tells a rumor to two distinct people, the "first generation". These repeat the performance and, generally, each person tells the rumor to two people chosen at random without regard to the past development (a rumor cannot be told to the person from whom it was heard). Find the probability ##P(r)## that generations ##1, 2, \ldots, r## will not contain the person who started the rumor.
4. SOLVED BY I_am_learning Repeat (3), but now suppose that a person can tell the rumor to the person who told him the rumor. Do the results of (3) change significantly?
5. SOLVED BY Ibix The newsstand buys papers for 33 dollars each and sells them for 50 dollars each. Newspapers not sold at the end of the day are sold as scrap for 5 dollars each. Newspapers can be purchased in bundles of 10; thus the newsstand can buy 50, 60, and so on. There are three types of newsdays: 'good', 'fair' and 'poor', having probabilities 0.35, 0.45 and 0.20, respectively. The distribution of newspapers demanded on each of these days is given in the table below. Problem: compute the optimal number of papers the newsstand should purchase.
$$\begin{array}{||c||c|c|c||} \hline \hline \text{Demand} & \text{Good day} & \text{Fair day} & \text{Poor day}\\ \hline \hline 40 & 0.03 & 0.1 & 0.44\\ \hline 50 & 0.05 & 0.18 & 0.22\\ \hline 60 & 0.15 & 0.4 & 0.16\\ \hline 70 & 0.2 & 0.2 & 0.12\\ \hline 80 & 0.35 & 0.08 & 0.06\\ \hline 90 & 0.15 & 0.04 & 0\\ \hline 100 & 0.07 & 0 & 0\\ \hline \hline \end{array}$$

Thank you all for participating! I hope many of you have fun with this! Don't hesitate to post any feedback in the thread!

1. Gelman et al "Bayesian Data Analysis"
2. Invented myself
3. Feller "An introduction to probability theory and its applications Vol1" Chapter II "Elements of Combinatorial analysis"
4. Feller "An introduction to probability theory and its applications Vol1" Chapter II "Elements of Combinatorial analysis"
5. http://www.slideshare.net/praveshnegi/chp-2-simulation-examples

That rumor will haunt us forever; just like the ring.

Challenge 3 and 4:
Python:
__author__ = "I_am_learning"

import random

def rumorReturns(N,r): #returns the generation when the rumor returns back to the original person, otherwise returns -1 if it doesn't return
peoples = list(range(0, N, 1)) #Index is person, value is that person's source of rumor
first_man_index = random.randrange(0,N) #Index of first man
peoples[first_man_index]=first_man_index #the source of first person is himself

current_gen = [first_man_index]
generation = 1
while (generation<=r):
new_gen = []
for person in current_gen: #each person in the current gen will spread the rumor
while(True):
new_men = random.sample(range(N),2) #spread rumor to two random person

#Use this for problem 3
if new_men[0] not in [person, peoples[person]] and new_men[1] not in [person, peoples[person]]: #repeat until non of them is the current person or his source

#Use this for problem 4
#if new_men[0] not in [person] and new_men[1] not in [person]: #repeat until non of them is the current person; you can't rumor yourself
break
new_gen += new_men #add these new persons into new_generation bucket
peoples[new_men[0]] = person #mark this person as the source of rumor for both
peoples[new_men[1]] = person #mark this person as the source of rumor
if first_man_index in new_men:
#print("Got back the the first man")
return generation
generation += 1
current_gen = new_gen

return -1

Repeat = 10000 #How many times to repeat the experiment; larger number gives better answer,
ans = 0
for i in range(Repeat):
ans += 1 if rumorReturns(N=1001,r=5) > 0 else 0

print("The probability that the rumor returns back to the original source is: {0}".format(ans/Repeat))

For N=1001, and r=5
Problem 3: 0.0564
Problem 4: 0.0571
So, there appears to be a slight increase in the probability.

Can you perhaps provide some kind of histogram for different values of ##r## for both questions?

I'm not used to the specifics of the code, but to be sure, in the following line:

new_men = random.sample(range(N), 2)  # spread the rumor to two random people

Are you sure you're picking two different elements here?

micromass said:
I'm not used to the specifics of the code, but to be sure, in the following line:

new_men = random.sample(range(N), 2)  # spread the rumor to two random people

Are you sure you're picking two different elements here?
Yes, Python's random.sample function draws samples without replacement, so the two chosen people are always distinct.
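A quick check of that behavior (a hypothetical snippet, not part of the original submission):

```python
import random

# random.sample draws without replacement, so the two people chosen
# in each call are always distinct.
pairs = [random.sample(range(1001), 2) for _ in range(10000)]
all_distinct = all(a != b for a, b in pairs)
print(all_distinct)  # True
```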

I have included the histogram as requested for problem 3 and problem 4. They look quite the same.

micromass
I realized I had a bug in my previous submission. When the rumor spreads to the next generation, there is a chance (which grows as the generations progress) that the same person is counted multiple times as a next-generation candidate to spread the rumor, because he was randomly selected by multiple parents. I have fixed this bug by using a set (no repetition) to store the candidates for the next generation. Below is the updated code (including the plotting part) and the new histograms. This change limits the growth of the rumor compared to the previous, buggy version.

Python:
__author__ = "Rajendra Adhikari"
from matplotlib import pyplot
import random
import numpy

def rumorReturns(N,r): #returns the genertion when the rumor returns back to the original person, otherwise returns -1 if it doesn't return
peoples = list(range(0, N, 1)) #Index is person, value is that person's source of rumor
first_man_index = random.randrange(0,N) #Index of first man
peoples[first_man_index]=first_man_index #the source of first person is himself

current_gen = set([first_man_index])
generation = 1
while (generation<=r):
new_gen = set()
for person in current_gen: #each person in the current gen will spread the rumor
while(True):
new_men = random.sample(range(N),2) #spread rumor to two random person

#Use this for problem 3
#if new_men[0] not in [person, peoples[person]] and new_men[1] not in [person, peoples[person]]: #repeat until non of them is the current person or his source

#Use this for problem 4
if new_men[0] not in [person] and new_men[1] not in [person]: #repeat until non of them is the current person; you can't rumor yourself
break
new_gen |= set(new_men) #union these new persons into new_generation bucket
peoples[new_men[0]] = person #mark this person as the source of rumor for both
peoples[new_men[1]] = person #mark this person as the source of rumor
if first_man_index in new_men:
#print("Got back the the first man")
return generation
generation += 1
current_gen = new_gen

return -1

Repeat = 1000 #How many times to repeat the experiment; larger number gives better answer,
final_r = 13

probability = []
label = []
for k in range(final_r):
r = k+1
label += [str(r)]
ans = 0
for i in range(Repeat):
ans += 1 if rumorReturns(N=1001,r=r) > 0 else 0
probability += [ans/Repeat]
print("The probability for r={0} is {1}".format(r,ans/Repeat))

rs = numpy.array(range(1,14))
pyplot.bar(rs,probability)
pyplot.xticks(rs+0.4,label)
pyplot.xlabel('r')
pyplot.ylabel('probability')
pyplot.title('P4: Probability of the rumor going full circle (N=1001)')

pyplot.show()

The graph I generate for #3 is quite different from the one above. For example, P(r) never reaches 1. Here's my semi-analytic, semi-numeric solution made while thinking of the answer to the probability challenge:

Building off of an approach similar to @mfb in post #91:

Let N(r) be the number of inhabitants in generation r. There are N(r) inhabitants who know the rumor and cannot pass it on to themselves, and n + 1 - N(r) inhabitants who do not currently know the rumor (let's call these inhabitants naive).

For case 2 (naive inhabitants), they have N(r) opportunities to hear the rumor in the next round. The probability that one rumor spreader passes the rumor to one particular individual is 2/n, so the probability that a naive individual does not hear the rumor is ##(1-2/n)^N##. Writing p = 1 - 2/n, the expected number of naive individuals who hear the rumor in the next generation is ##(n+1-N)(1 - p^N)##.

For case 1 (rumor spreaders), they have N - 1 opportunities to hear the rumor, since a spreader cannot pass the rumor to himself. Therefore, the probability that a spreader hears the rumor during the next round is ##1 - p^{N-1}##, and the expected number of spreaders who become part of generation r+1 is ##N(1-p^{N-1})##.

In total ##N(r+1) = (n+1-N) (1-p^N)+N (1-p^{N-1})## which simplifies to ##N(r+1) = (n+1) - (n+1 + \frac{2N}{n-2})p^N##

We can express the fraction of inhabitants in generation r, F(r) = N(r)/(n+1), through the following recurrence relation: $$F(r+1) = 1-\left(1+\frac{2F(r)}{n-2}\right)\left(\frac{n-2}{n}\right)^{(n+1)F(r)}$$

Using the recurrence relation and F(1) = 2/(n+1) (generation 1 contains exactly two people), I can plot F(r) for various values of n:

Solving the recurrence relation analytically would give a solution to the probability challenge, but I'm not sure how to do that.
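For what it's worth, the recurrence can be iterated numerically in a few lines (a Python sketch; the function name `next_fraction` is mine, and `n`, `F` follow the notation above):

```python
def next_fraction(F, n):
    # One step of the recurrence F(r+1) = 1 - (1 + 2F/(n-2)) * ((n-2)/n)**((n+1)*F)
    p = (n - 2) / n
    return 1 - (1 + 2 * F / (n - 2)) * p ** ((n + 1) * F)

n = 1000  # a town of n + 1 = 1001 inhabitants
F = 2 / (n + 1)  # generation 1 contains exactly two people
for r in range(2, 16):
    F = next_fraction(F, n)
print(F)  # approaches a fixed point near 0.8
```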

micromass said:
• Write a program to simulate data from the standard normal distribution. The ONLY randomization program you can use is a program which automatically tosses a coin and gives you head or tails. Throughout the program, you cannot know what the probability of heads is (only that it is not 0 or 1). Provide proof that your program simulates the normal distribution.
If we don't care about efficiency, in C++ reduced to the relevant parts:
Code:
//here we get our coin flip, heads=true, tails=false
bool biasedcoinflip() {
    ...
}

//here we get a fair coin flip, independent of the bias of biasedcoinflip():
//we interpret "HT" as heads and "TH" as tails, otherwise we repeat.
bool faircoinflip() {
    while(1) {
        bool flip1 = biasedcoinflip();
        bool flip2 = biasedcoinflip();
        if( flip1 && !flip2 ) return true;
        if( !flip1 && flip2 ) return false;
    }
}

double getstandardnormvalue() {
    double precision = 0.01; //we could also leave that up to the user if we want
    int maxN = 1/(precision*precision);
    double sum = 0;
    for(int n = 0; n < maxN; n++) {
        if(faircoinflip()) sum++; //note: the function must be called here
    }
    return (2*sum - maxN)/sqrt(maxN);
}
Proof: HT and TH have the same probability, assuming subsequent coin tosses are independent. The sum follows a binomial distribution which (for large maxN) approximates a Gaussian with mean maxN/2 and variance maxN/4. The numerator follows a Gaussian with mean 0 and variance maxN, the denominator scales it to a variance of 1. We get a step size of 0.01, which can be reduced arbitrarily to improve the precision.

The Gaussian part is horribly inefficient, and the HT/TH part is not optimal either, but the question didn't ask for the fastest solution.
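The HT/TH construction is von Neumann's classic extractor, and it is easy to sanity-check by simulation. Here is a rough Python sketch of the same idea (the bias of 0.9 is an arbitrary choice for the demonstration, not from the post):

```python
import random

BIAS = 0.9  # arbitrary; the extractor must not depend on knowing it

def biasedcoinflip():
    return random.random() < BIAS  # True = heads

def faircoinflip():
    # Von Neumann extractor: HT -> heads, TH -> tails, otherwise retry.
    while True:
        flip1, flip2 = biasedcoinflip(), biasedcoinflip()
        if flip1 != flip2:
            return flip1

heads = sum(faircoinflip() for _ in range(100000))
print(heads / 100000)  # close to 0.5 despite the 0.9 bias
```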

micromass
Ygggdrasil said:
For example, P(r) never reaches 1.

Somehow I find that intuitively very unlikely...

micromass said:
Somehow I find that intuitively very unlikely...
Even if everyone knows the rumor in round x, there will be some small fraction of inhabitants who by chance do not hear the rumor in round x+1.

Or is this a first passage problem, and you're asking about when the originator will first re-hear the rumor?

I understood P(r) as the probability that a given person is told the rumor in this generation, independent of the history.
It doesn't surprise me: in generation r-1 we have at most all people telling the rumor to others, so of course we can never guarantee that a given person will hear it.

Ygggdrasil said:
Even if everyone knows the rumor in round x, there will be some small fraction of inhabitants who by chance do not hear the rumor in round x+1.

I agree but that was not what was asked:

Or is this a first passage problem, and you're asking about when the originator will first re-hear the rumor?

Correct!

micromass said:
Correct!

Ah, ok. I misread the definition of P(r) in this thread. P(r) was defined differently in the probability challenge thread which was the source of confusion.

Ygggdrasil said:
Ah, ok. I misread the definition of P(r) in this thread. P(r) was defined differently in the probability challenge thread which was the source of confusion.

Your solution was very interesting, even though it didn't answer the question. It showed that the later generations would all be about 3/4 of the maximal size. I'll credit you as well for this fun piece of insight!

So I had a crack at #1, the doctors' office. Here's the code:
Python:
from __future__ import division

import math, random

# Constants
MEANTIME = 10  # Mean time between arrivals
CLOSINGTIME = 60*7  # Number of minutes surgery is open in a day
MINCONSULT = 5  # Minimum consultation time
CONSULTRANGE = 15  # Range of consultation times (=max time-MINCONSULT)

ITERATIONS = 10000  # Number of loops round the simulation
DISPLAYSTEP = 50  # Number of loops between printed iteration numbers
OUTFILE = "C:\\Projects\\Physics\\Micromass simulation 1\\histos.dat"

# Empty arrays to hold results from each run of the sims
patientCount = []  # the amount of patients coming to the office
waitCount = []     # the amount of patients having to wait
averageWait = []   # the average wait
closingTime = []   # the closing time of the office

# Simulation!
for i in xrange(ITERATIONS):
    if (i+1)//DISPLAYSTEP*DISPLAYSTEP == i+1: print "Iteration: "+str(i+1)
    # Initialise the first time the docs are free to zero minutes past 9am,
    # the latest arrival to zero minutes past 9am, and the number of patients
    # to zero
    docsBecomeFree = [0, 0, 0]
    lastArrival = 0
    patients = 0
    waited = 0
    totalWait = 0
    while True:
        # Generate an arrival - but if they arrived after closing time,
        # terminate the sim and they're out of luck
        lastArrival -= MEANTIME*math.log(random.random())
        if lastArrival > CLOSINGTIME:
            break
        patients += 1
        # Find when the next doc is free and calculate the waiting time for
        # this patient
        nextDocFree = min(docsBecomeFree)
        waitTime = max(0, nextDocFree-lastArrival)
        totalWait += waitTime
        # Find which doc was first available and make them busy to the end of
        # this patient's consulting time
        for doc in range(len(docsBecomeFree)):
            if docsBecomeFree[doc] == nextDocFree:
                consultTime = MINCONSULT+random.random()*CONSULTRANGE
                if waitTime > 0:
                    waited += 1
                    docsBecomeFree[doc] = nextDocFree+consultTime
                else:
                    docsBecomeFree[doc] = lastArrival+consultTime
                break
    # Day has ended - add the MI to the logs
    patientCount.append(patients)
    waitCount.append(waited)
    averageWait.append(totalWait/patients)
    closingTime.append(max(docsBecomeFree+[CLOSINGTIME])-CLOSINGTIME)

# Analysis of results
def round2(val):
    # Round the value to 2dp
    return int(val*100)/100

def analyseArray(outfile, title, data, binWidth):
    # Sort the array into increasing order
    data.sort()
    # Get the quartiles - this is only exact if len(data) is a multiple of 4,
    # but the definition is ambiguous in other cases anyway.
    firstQuartile = data[len(data)//4]
    median = data[len(data)//2]
    thirdQuartile = data[3*len(data)//4]
    print title+" ("+str(len(data))+" iterations)"
    print "Median: "+str(round2(median))
    print "50% interval: "+str(round2(firstQuartile))+" -- "+str(round2(thirdQuartile))
    print
    # Determine the histogram
    minVal = data[0]
    nBins = 1+int((data[-1]-data[0])/binWidth)
    histo = [0 for i in range(nBins)]
    for d in data:
        bin = min(nBins-1, int((d-minVal)/binWidth))
        histo[bin] += 1
    # Dump the histogram to the file
    print >> outfile, "Bin\t"+title
    for bin in range(nBins):
        binEdge = bin*binWidth+minVal
        print >> outfile, str(round2(binEdge))+"\t"+str(100*histo[bin]/len(data))
    print >> outfile, "\n"

fp = file(OUTFILE, "w")
analyseArray(fp, "Amount of patients coming to the office", patientCount, 1)
analyseArray(fp, "Amount of patients having to wait", waitCount, 1)
analyseArray(fp, "Average wait", averageWait, 0.25)
analyseArray(fp, "Closing time of the office", closingTime, 1)
fp.close()
Basically it keeps track of the last arrival, each time adding ##-10\ln r## minutes, where ##r\sim U(0,1)##. It also keeps track of when the three doctors are finished with their current patient queue, and adds this patient on to the end of the first to become free, with a consultation time of c, ##c\sim U(5,20)##. I've taken "50% interval" to mean the range from the 25th to 75th percentile. The results I got are:

Amount of patients coming to the office (10000 iterations)
Median: 42.0
50% interval: 37.0 -- 46.0

Amount of patients having to wait (10000 iterations)
Median: 5.0
50% interval: 3.0 -- 9.0

Average wait (minutes - 10000 iterations)
Median: 0.46
50% interval: 0.21 -- 0.86

Closing time of the office (minutes after 4pm - 10000 iterations)
Median: 5.96
50% interval: 0.0 -- 11.2

Histograms:
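As an aside, the ##-10\ln r## step used for arrivals is inverse-transform sampling of the exponential distribution; a quick self-contained check (a sketch, separate from the simulation code above):

```python
import math
import random

# If U ~ U(0,1), then -10*ln(U) is exponentially distributed with mean 10:
# P(-10 ln U > t) = P(U < exp(-t/10)) = exp(-t/10).
samples = [-10 * math.log(random.random()) for _ in range(200000)]
mean = sum(samples) / len(samples)
print(mean)  # close to the theoretical mean of 10
```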

micromass
:)

Does the newsstand in problem (5) know in advance whether a day is Good, Fair or Poor? If not, what is the point of making three groups; if yes, isn't it just the same problem with 3 different numbers?

mfb said:
Does the news stand in problem (5) know in advance if a day is Good, Fair or Poor?

No

micromass said:
No
Problem 5 looks like a standard Decision and Risk problem, which is usually solved by Monte Carlo simulation. Is the use of Monte Carlo allowed?

Chestermiller said:
Problem 5 looks like a standard Decision and Risk problem, which is usually solved by Monte Carlo simulation. Is the use of Monte Carlo allowed?

Yep, as long as it's a simulation!

Problem 5 can be done with pen and paper (unless I'm missing something) - old school computing. Is that cheating, or shall I post a photo?

I make it 60

micromass
Ibix said:
Problem 5 can be done with pen and paper (unless I'm missing something) - old school computing. Is that cheating, or shall I post a photo?

I make it 60
Yes. I agree that it is much simpler than requiring Monte Carlo. There is enough information in the table to determine by hand the fraction of the time that the demand is any of the values shown. I realized that last night after my Monte Carlo post.

Chet

mfb said:
Proof: HT and TH have the same probability, assuming subsequent coin tosses are independent. The sum follows a binomial distribution which (for large maxN) approximates a Gaussian with mean maxN/2 and variance maxN/4. The numerator follows a Gaussian with mean 0 and variance maxN, the denominator scales it to a variance of 1. We get a step size of 0.01, which can be reduced arbitrarily to improve the precision.

The Gaussian part is horribly inefficient, and the HT/TH part is not optimal either, but the question didn't ask for the fastest solution.
Here is a sketch of an improved system, simply replacing mfb's binomial approximation with a Box-Muller transformation. It is still horribly inefficient if the bias is large since one of HH or TT will become much more probable than HT or TH. I should note that I haven't actually tested this...
C:
#include <limits.h>  /* for ULONG_MAX */
#include <math.h>    /* for sqrt, log, sin, cos and M_PI */

double nextRandom;
bool hasNextRandom = false;

double getstandardnormvalue() {
    if(hasNextRandom) {
        hasNextRandom = false;
        return nextRandom;
    } else {
        // Box-Muller method for generating two random Gaussians - https://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform
        // Create two random unsigned integers
        unsigned long random1 = 0, random2 = 0;
        for(unsigned int i = 0; i < 8*sizeof(long); ++i) {
            if(faircoinflip()) random1 |= 1ul << i;
            if(faircoinflip()) random2 |= 1ul << i;
        }
        // Scale the random numbers to 0-1 and 0-2pi
        double u1 = ((double)random1)/ULONG_MAX;
        double u2 = 2*M_PI*((double)random2)/ULONG_MAX;  //note: uses the second integer
        // Generate the two random variables
        nextRandom = sqrt(-2*log(u1))*sin(u2);
        hasNextRandom = true;
        return sqrt(-2*log(u1))*cos(u2);
    }
}

If the bias is large, then the amount of information we can get per coin flip is small.
The entropy of a coin flip (p strictly between 0 and 1) is ##-p \log_2 p - (1-p) \log_2 (1-p)##.
For p=1/2, it is 1 bit per flip, and we need on average 4 flips per fair bit, so we waste 3/4 of our information.
For p=0.9, it is 0.469 bit per flip, and we need on average 2/(2*0.1*0.9)=11.1 flips, or 5.21 bits, and waste 80%.
For p=0.99, it is 0.0808 bit per flip, and we need on average 2/(2*0.01*0.99)=101 flips, or 8.16 bits, and waste 88%.
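Those expected flip counts are straightforward to verify by simulation; a small sketch (the function name is mine):

```python
import random

def flips_per_fair_bit(p, trials=100000):
    # Average number of biased flips consumed per fair bit produced
    # by the HT/TH scheme; theory says 2/(2*p*(1-p)).
    total = 0
    for _ in range(trials):
        while True:
            total += 2  # one HT/TH attempt costs two flips
            a = random.random() < p
            b = random.random() < p
            if a != b:  # HT or TH: a fair bit is produced
                break
    return total / trials

print(flips_per_fair_bit(0.9))  # theory: 2/(2*0.1*0.9) = 11.1
```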

Yeah, if we know p we certainly can do better, but what if we do not know p? We need events that have the same probability independently of p. We can also accept HHTT and TTHH and similar sequences to improve the probability to stop earlier, but I'm not sure how far we get with that.

Here's a stab at 5. Buy 60? Total at-risk dollars per day is lowest, at about 305.

Jimster41 said:
Here's a stab at 5. Buy 60? Total at risk dollars per day is lowest at @305
I would maximize the expectation value. Risk averages out over multiple days.

mfb said:
I would maximize the expectation value. Risk averages out over multiple days.
That's how I did it, and I came to the same answer as
@Jimster41.

mfb said:
I would maximize the expectation value. Risk averages out over multiple days.

Ibix said:
That's how I did it, and I came to the same answer as
@Jimster41.

Dollars at risk (maybe not the best term) are, for each combination of buying level and demand outcome, all the dollars I either lost because I bought papers I couldn't sell and had to recycle instead, or failed to make because there was demand for more papers than I bought.

For Challenge 5, I found 60 papers to be optimal as well.
Code:
%matlab script
%given probability data
%column 1 is demand
P1 = [ 40, .03, .1, .44;
50, .05, .18, .22;
60, .15, .4, .16;
70, .2, .2, .12;
80, .35, .08, .06;
90, .15, .04, 0;
100, .07, 0, 0];

%probability of type of day relating to columns 2-4 of P1
P2 = [.35, .45, .2];

%combined probability for demand - column 1 is demand column 2 is prob.
P = [P1(:,1),P1(:,2)*P2(1)+P1(:,3)*P2(2)+P1(:,4)*P2(3)];

%fixed parameters
cost = -33;
sell = 50;
scrap = 5;

%purchase options
Purchase = [0:10:100]';

%initial cost
Profit = Purchase*cost;

%7 cases for demand: 40:10:100
for k = 1:7
    sold = min(P(k,1),Purchase);
    Profit = Profit + P(k,2)*(sold*sell + (Purchase-sold)*scrap);
end

index = find(Profit == max(Profit));
Optimal = Purchase(index);

%output results
fprintf('Optimal number is %g', Optimal)
fprintf('. Expected profit is $%g', round(Profit(index),2))

%plot results
plot(Purchase,Profit)
title('Expected Profit from Newspaper Sales')
xlabel('Papers Purchased')
ylabel('Profit')

Ibix said:
Problem 5 can be done with pen and paper (unless I'm missing something) - old school computing. Is that cheating, or shall I post a photo?

I make it 60
Here are (slightly enhanced) photos of my "program":

Part 1 - calculate expected sales:

Part 2 - calculate profit from various purchases given expected sales:

I did more work than I needed - I could have just done the 60 and 70, as it was obvious from the functional form of the profit that the answer had to be one of the two.

The expectation value of sold newspapers alone doesn't work. It happens to give the right answer here, but there are other cases where it fails. Imagine selling 100 newspapers 50% of the time and 20 newspapers 50% of the time. The expectation value is 60, but every newspaper beyond 20 gives $17 profit or $28 loss with equal probability, so you should buy 20. Your calculation would assume that you always sell 60 newspapers if you buy 60, which is clearly not true.

This question doesn't need any fancy calculations, just go through all the options and calculate the expected profit for each one, use the one where the expected profit is maximal, done.

The fraction of the time that the demand is 40 newspapers is:

f(40)=(0.03)(0.35)+(0.1)(0.45)+(0.44)(0.2)=0.1435

Similarly,

f(50)=0.1425
f(60)=0.2645
f(70)=0.1840
f(80)=0.1705
f(90)=0.0705
f(100)=0.0245

Let g(D,N) represent the profit on a given day if the number of papers ordered is N and the demand is D. So, the average profit per day is

$$P(N)=\sum_{D\in\{40,\,50,\,\ldots,\,100\}}f(D)\,g(D,N)$$

This needs to be maximized with respect to N.
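Carrying out that maximization numerically takes only a few lines; here is a Python sketch using the table and the prices from the problem statement (variable names are mine):

```python
demands = [40, 50, 60, 70, 80, 90, 100]
day_probs = {'good': 0.35, 'fair': 0.45, 'poor': 0.20}
table = {  # P(demand | day type), the columns of the table above
    'good': [0.03, 0.05, 0.15, 0.20, 0.35, 0.15, 0.07],
    'fair': [0.10, 0.18, 0.40, 0.20, 0.08, 0.04, 0.00],
    'poor': [0.44, 0.22, 0.16, 0.12, 0.06, 0.00, 0.00],
}
# f(D): overall probability of each demand level
f = [sum(day_probs[d] * table[d][i] for d in day_probs) for i in range(len(demands))]

def expected_profit(N):
    # g(D, N) = 50*min(D, N) + 5*max(N - D, 0) - 33*N
    return sum(fD * (50 * min(D, N) + 5 * max(N - D, 0) - 33 * N)
               for fD, D in zip(f, demands))

best = max(range(40, 101, 10), key=expected_profit)
print(best, expected_profit(best))  # the optimum is 60 papers
```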
