Micromass' big simulation challenge

  • Challenge
  • Thread starter micromass
  • Start date
  • #1
22,089
3,296
Let's do some simulations. Below are 5 simulation problems that require a computer to solve. Use any language you want. Post the answers (including graphics) and code here!

  • Any use of outside sources is allowed, but do not look up the question directly. For example, it is ok to go check probability books, but it is not allowed to google the exact question.
  • All programming languages are allowed.
  • Please reference every source you use.

  1. SOLVED BY Ibix A clinic has three doctors. Patients come into the clinic at random, starting at 9a.m., according to a Poisson process with time parameter 10 minutes: that is, the time after opening at which the first patient appea follows an exponential distribution with expectation 10 minutes and then, after each patient arrices the waiting time until the next patient is indepdnently exponentially distributed, also with expectation 10 minutes. When a patient arrices, he or she waits until a doctor is available. The amount of time spent by each doctor with each patient is a random variable, uniformly distributed between 5 and 20 minutes. The office stops admitting new patients at 4m and closes when the last patient is with the doctor.

    Find the median and 50% interval for the following: the amount of patients coming to the office, the amount of patients having to wait, the average wait, the closing time of the office. Graph the distributions.

  2. SOLVED BY mfb Write a program to simulate data from the standard normal distribution. The ONLY randomization program you can use is a program which automatically tosses a coin and gives you head or tails. Throughout the program, you cannot know what the probability of heads is (only that it is not 0 or 1). Provide proof that your program simulates the normal distribution.

  3. SOLVED BY I_am_learning, Ygggdrasil Simulate the distribution from Challenge 5 of the probability challenge:
    In a town of 1001 inhabitants, a person tells a rumor to two distinct people, the "first generation". These repeat the performance and generally, each person tells a rumor to two people at random without regard to the past development (a rumor cannot be told to the person who told him this rumor). Find the probability ##P(r)## that the generation ##1,2,...,r## will not contain the person who started the rumor.

  4. SOLVED BY I_am_learning Repeat (3) but now suppoe that a person can tell the rumor to the person who told him the rumor. Do the result of 3 change significantly?

  5. SOLVED BY Ibix The newsstand buys papers for 33 dollars each and sells them for 50 dollars each. Newspaper not sold at the end of the day are sold as scrap for 5 dollars each. Newspaper can be purchased in bundles of 10. thus the newsstand can buy 50, 60, and so on. There are three types of newsdays: ‘good’, ‘fair’ and ‘poor’ having probabilities 0.35, 0.45 and 0.20, respectively. The distribution of newspapers demanded on each of these days is given below in the table. Problem: Compute the optimal number of papers the newsstand should purchase.
$$\begin{array}{||c||c|c|c||}
\hline
\hline
\text{Demand} & \text{Good day} & \text{Fair day} & \text{Poor day}\\
\hline
\hline
40 & 0.03 & 0.1 & 0.44\\
\hline
50 & 0.05 & 0.18 & 0.22\\
\hline
60 & 0.15 & 0.4 & 0.16\\
\hline
70 & 0.2 & 0.2 & 0.12\\
\hline
80 & 0.35 & 0.08 & 0.06\\
\hline
90 & 0.15 & 0.04 & 0\\
\hline
100 & 0.07 & 0 & 0\\
\hline
\hline
\end{array}$$

Thank you all for participating! I hope many of you have fun with this! Don't hesitate to post any feedback in the thread!

More information:
  1. Gelman et al "Bayesian Data Analysis"
  2. Invented myself
  3. Feller "An introduction to probability theory and its applications Vol1" Chapter II "Elements of Combinatorial analysis"
  4. Feller "An introduction to probability theory and its applications Vol1" Chapter II "Elements of Combinatorial analysis"
  5. http://www.slideshare.net/praveshnegi/chp-2-simulation-examples
 
Last edited:
  • Like
Likes mheslep and Greg Bernhardt

Answers and Replies

  • #2
TheBlackAdder
Gold Member
97
56
That rumor will haunt us forever; just like the ring.
 
  • #3
677
16
Challenge 3 and 4:
Python:
__author__ = "I_am_learning"

import random

def rumorReturns(N,r): #returns the generation when the rumor returns back to the original person, otherwise returns -1 if it doesn't return
    peoples = list(range(0, N, 1)) #Index is person, value is that person's source of rumor
    first_man_index = random.randrange(0,N) #Index of first man
    peoples[first_man_index]=first_man_index #the source of first person is himself

    current_gen = [first_man_index]
    generation = 1
    while (generation<=r):
        new_gen = []
        for person in current_gen: #each person in the current gen will spread the rumor
            while(True):
                new_men = random.sample(range(N),2) #spread rumor to two random person

                #Use this for problem 3
                if new_men[0] not in [person, peoples[person]] and new_men[1] not in [person, peoples[person]]: #repeat until non of them is the current person or his source

                #Use this for problem 4
                #if new_men[0] not in [person] and new_men[1] not in [person]: #repeat until non of them is the current person; you can't rumor yourself
                    break
            new_gen += new_men #add these new persons into new_generation bucket
            peoples[new_men[0]] = person #mark this person as the source of rumor for both
            peoples[new_men[1]] = person #mark this person as the source of rumor
            if first_man_index in new_men:
                #print("Got back the the first man")
                return generation
        #Advance the generation
        generation += 1
        current_gen = new_gen

    return -1

Repeat = 10000 #How many times to repeat the experiment; larger number gives better answer,
ans = 0
for i in range(Repeat):
    ans += 1 if rumorReturns(N=1001,r=5) > 0 else 0

print("The probability that the rumor returns back to the original source is: {0}".format(ans/Repeat))
For N=1001, and r=5
Problem 3: 0.0564
Problem 4: 0.0571
So, there appears to be a slight increase in the probability.
 
Last edited:
  • #4
22,089
3,296
Can you perhaps provide some kind of histogram for different numbers of ##r## for both questions?
 
  • #5
22,089
3,296
I'm not used to the specifics of the code, but to be sure, in the following line:

new_men =random.sample(range(N),2)#spread rumor to two random person

Are you sure you're picking two different elements here?
 
  • #6
677
16
I'm not used to the specifics of the code, but to be sure, in the following line:

new_men =random.sample(range(N),2)#spread rumor to two random person

Are you sure you're picking two different elements here?
Yes, the python random.sample library generates random samples without replacement (i.e. unique samples)
 
  • #7
677
16
I have included the histogram as requested for problem 3 and problem 4. They look quite the same.
micromass_rumor_p3.png
micromass_rumor_p4.png
 
  • Like
Likes micromass
  • #8
677
16
I realized I have a bug in my previous submission. When rumor is spread to next generation, there is a chance (which grows higher as generation increases) that the same person will be counted multiple times as a next gen candidate to spread rumor (because he was randomly selected by multiple parents). I have fixed this bug by using a set (no repetition) to store candidates for next generation. Below is the updated code (including plotting part), and new histograms.This change limits the growth of rumor compared to previous buggy version.

Python:
__author__ = "Rajendra Adhikari"
from matplotlib import pyplot
import random
import numpy

def rumorReturns(N,r): #returns the genertion when the rumor returns back to the original person, otherwise returns -1 if it doesn't return
    peoples = list(range(0, N, 1)) #Index is person, value is that person's source of rumor
    first_man_index = random.randrange(0,N) #Index of first man
    peoples[first_man_index]=first_man_index #the source of first person is himself

    current_gen = set([first_man_index])
    generation = 1
    while (generation<=r):
        new_gen = set()
        for person in current_gen: #each person in the current gen will spread the rumor
            while(True):
                new_men = random.sample(range(N),2) #spread rumor to two random person

                #Use this for problem 3
                #if new_men[0] not in [person, peoples[person]] and new_men[1] not in [person, peoples[person]]: #repeat until non of them is the current person or his source

                #Use this for problem 4
                if new_men[0] not in [person] and new_men[1] not in [person]: #repeat until non of them is the current person; you can't rumor yourself
                    break
            new_gen |= set(new_men) #union these new persons into new_generation bucket
            peoples[new_men[0]] = person #mark this person as the source of rumor for both
            peoples[new_men[1]] = person #mark this person as the source of rumor
            if first_man_index in new_men:
                #print("Got back the the first man")
                return generation
        #Advance the generation
        generation += 1
        current_gen = new_gen

    return -1

Repeat = 1000 #How many times to repeat the experiment; larger number gives better answer,
final_r = 13

probability = []
label = []
for k in range(final_r):
    r = k+1
    label += [str(r)]
    ans = 0
    for i in range(Repeat):
        ans += 1 if rumorReturns(N=1001,r=r) > 0 else 0
    probability += [ans/Repeat]
    print("The probability for r={0} is {1}".format(r,ans/Repeat))

rs = numpy.array(range(1,14))
pyplot.bar(rs,probability)
pyplot.xticks(rs+0.4,label)
pyplot.xlabel('r')
pyplot.ylabel('probability')
pyplot.title('P4: Probability of the rumor going full circle (N=1001)')

pyplot.show()
micromass_rumor_p3_set.png

micromass_rumor_p4_set.png
 
Last edited:
  • #9
Ygggdrasil
Science Advisor
Insights Author
Gold Member
3,241
3,318
The graph I generate for #3 is quite different from the one above. For example, P(r) never reaches 1. Here's my semi-analytic, semi-numeric solution made while thinking of the answer to the probability challenge:

Building off of an approach similar to @mfb in post #91:

Let N(r) be the number of inhabitants in generation r. There are N(r) inhabitants who know the rumor and cannot pass it on to themselves, and n + 1 - N(r) inhabitants who do not currently know the rumor (let's call these inhabitants naive).

For case 2 (naive inhabitants), they have N(r) opportunities to hear the rumor in the next round. The probability that one rumor spreader passes the rumor to one particular individual is 2/n. Therefore, the probability that a naive individual does not hear the rumor is ##(1-2/n)^N##. The expected number of naive individuals to hear the rumor in the next generation is ##(n+1-N) (1 - p^N)##, where p = 1-2/n
.
For case 1 (rumor spreaders), they have N - 1 opportunities to hear the rumor as they cannot spread the rumor to themselves. Therefore, the probability that they hear the rumor during the next round is ##1 - p^{N-1}##, and the expected number of rumor spreaders who become part of generation r+1 is ##N(1-p^{N-1})##.

In total ##N(r+1) = (n+1-N) (1-p^N)+N (1-p^{N-1})## which simplifies to ##N(r+1) = (n+1) - (n+1 + \frac{2N}{n-2})p^N##

We can express, the fraction of inhabitants part of generation r, F(r), through the following recurrence relation: $$F(r+1) = 1-\left(1+\frac{2F(r)}{n-2}\right)\left(\frac{n-2}{n}\right)^{(n+1)F(r)}$$

Using the recurrence relation and F(1) = 2, I can plot F(r) for various values of n:
problem5.png


Solving the recurrence relation analytically would give a solution to the probability challenge, but I'm not sure how to do that.
 
  • #10
35,259
11,510
  • Write a program to simulate data from the standard normal distribution. The ONLY randomization program you can use is a program which automatically tosses a coin and gives you head or tails. Throughout the program, you cannot know what the probability of heads is (only that it is not 0 or 1). Provide proof that your program simulates the normal distribution.
If we don't care about efficiency, in C++ reduced to the relevant parts:
Code:
//here we get our coin flip, heads=true, tails=false
bool biasedcoinflip() {
  ...
}

//here we get a fair coin flip, independent of the bias of biasedcoinflip():
//we interpret "HT" as heads and "TH" as tails, otherwise we repeat.
bool faircoinflip() {
  while(1) {
    bool flip1=biasedcoinflip();
    bool flip2=biasedcoinflip();
    if( flip1 && !flip2 ) return true;
    if( !flip1 && flip2 ) return false;
  }
}

double getstandardnormvalue() {
  double precision=0.01; //we could also leave that up to the user if we want
  int maxN=1/(precision*precision);
  double sum=0;
  for(int n=0;n<maxN;n++) {
   if(faircoinflip) sum++;
  }
  return (2*sum-maxN)/sqrt(maxN);
}
Proof: HT and TH have the same probability, assuming subsequent coin tosses are independent. The sum follows a binomial distribution which (for large maxN) approximates a Gaussian with mean maxN/2 and variance maxN/4. The numerator follows a Gaussian with mean 0 and variance maxN, the denominator scales it to a variance of 1. We get a step size of 0.01, which can be reduced arbitrarily to improve the precision.

The Gaussian part is horribly inefficient, and the HT/TH part is not optimal either, but the question didn't ask for the fastest solution.
 
  • Like
Likes micromass
  • #11
22,089
3,296
For example, P(r) never reaches 1.
Somehow I find that intuitively very unlikely...
 
  • #12
Ygggdrasil
Science Advisor
Insights Author
Gold Member
3,241
3,318
Somehow I find that intuitively very unlikely...
Even if everyone knows the rumor in round x, there will be some small fraction of inhabitants who by chance do not hear the rumor in round x+1.

Or is this a first passage problem, and you're asking about when the originator will first re-hear the rumor?
 
  • #13
35,259
11,510
I understood P(r) as probability that a given person gets told the rumor in this generation, independent of the history.
It doesn't surprise me. In generation r-1, we have at most all people telling the rumors to others, so of course we can never guarantee that a given person will hear the rumor.
 
  • #14
22,089
3,296
Even if everyone knows the rumor in round x, there will be some small fraction of inhabitants who by chance do not hear the rumor in round x+1.
I agree but that was not what was asked:

Or is this a first passage problem, and you're asking about when the originator will first re-hear the rumor?
Correct!
 
  • #15
Ygggdrasil
Science Advisor
Insights Author
Gold Member
3,241
3,318
Correct!
Ah, ok. I misread the definition of P(r) in this thread. P(r) was defined differently in the probability challenge thread which was the source of confusion.
 
  • #16
22,089
3,296
Ah, ok. I misread the definition of P(r) in this thread. P(r) was defined differently in the probability challenge thread which was the source of confusion.
Your solution was very interesting though, even though it didn't answer the question. It showed that the later generations would all be about 3/4th of the maximal size. I'll credit you as well for this fun piece of insight!
 
  • #19
Ibix
Science Advisor
Insights Author
2020 Award
7,396
6,481
So I had a crack at #1, the doctors' office. Here's the code:
Python:
from __future__ import division

import math,random

# Constants
MEANTIME=10  # Mean time between arrivals
CLOSINGTIME=60*7  # Number of minutes surgery is open in a day
MINCONSULT=5  # Minimum consultation time
CONSULTRANGE=15  # Range of consultation times (=max time-MINCONSULT)

ITERATIONS=10000  # Number of loops round the simulation
DISPLAYSTEP=50  # Number of loops between printed iteration numbers
OUTFILE="C:\\Projects\\Physics\\Micromass simulation 1\\histos.dat"

# Empty arrays to hold results from each run of the sims
patientCount=[]  # the amount of patients coming to the office
waitCount=[]  # the amount of patients having to wait
averageWait=[]  # the average wait
closingTime=[]  # the closing time of the office.

# Simulation!
for i in xrange(ITERATIONS):
   if (i+1)//DISPLAYSTEP*DISPLAYSTEP==i+1: print "Iteration: "+str(i+1)
   # Initialise the first time the docs are free to zero minutes past 9am,
   # the latest arrival to zero minutes past 9am, and the number of patients
   # to zero
   docsBecomeFree=[0,0,0]
   lastArrival=0
   patients=0
   waited=0
   totalWait=0
   while True:
     # Generate an arrival - but if they arrived after closing time,
     # terminate the sim and they're out of luck
     lastArrival-=MEANTIME*math.log(random.random())
     if lastArrival>CLOSINGTIME:
       break
     patients+=1
     # Find when the next doc is free and calculate the waiting time for
     # this patient
     nextDocFree=min(docsBecomeFree)
     waitTime=max(0,nextDocFree-lastArrival)
     totalWait+=waitTime
     # Find which doc was first available and make them busy to the end of
     # this patient's consulting time
     for doc in range(len(docsBecomeFree)):
       if docsBecomeFree[doc]==nextDocFree:
         consultTime=MINCONSULT+random.random()*CONSULTRANGE
         if waitTime>0:
           waited+=1
           docsBecomeFree[doc]=nextDocFree+consultTime
         else:
           docsBecomeFree[doc]=lastArrival+consultTime
         break
   # Day has ended - add the MI to the logs
   patientCount.append(patients)
   waitCount.append(waited)
   averageWait.append(totalWait/patients)
   closingTime.append(max(docsBecomeFree+[CLOSINGTIME])-CLOSINGTIME)

# Analysis of results
def round2(val):
   # Round the value to 2dp
   return int(val*100)/100
def analyseArray(outfile,title,data,binWidth):
   # Sort the array into increasing order
   data.sort()
   # Get the quartiles - this is only exact if len(data) is a multiple of 4,
   # but the definition is ambiguous in other cases anyway.
   firstQuartile=data[len(data)//4]
   median=data[len(data)//2]
   thirdQuartile=data[3*len(data)//4]
   print title+" ("+str(len(data))+" iterations)"
   print "Median: "+str(round2(median))
   print "50% interval: "+str(round2(firstQuartile))+" -- "+str(round2(thirdQuartile))
   print
   # Determine the histogram
   minVal=data[0]
   nBins=1+int((data[-1]-data[0])/binWidth)
   histo=[0 for i in range(nBins)]
   for d in data:
     bin=min(nBins-1,int((d-minVal)/binWidth))
     histo[bin]+=1
   # Dump the histogram to the file
   print >> outfile,"Bin\t"+title
   for bin in range(nBins):
     binEdge=bin*binWidth+minVal
     print >> outfile,str(round2(binEdge))+"\t"+str(100*histo[bin]/len(data))
   print >> outfile,"\n"
  
fp=file(OUTFILE,"w")
analyseArray(fp,"Amount of patients coming to the office",patientCount,1)
analyseArray(fp,"Amount of patients having to wait",waitCount,1)
analyseArray(fp,"Average wait",averageWait,0.25)
analyseArray(fp,"Closing time of the office",closingTime,1)
fp.close()
Basically it keeps track of the last arrival, each time adding ##-10\ln r## minutes, where ##r\sim U(0,1)##. It also keeps track of when the three doctors are finished with their current patient queue, and adds this patient on to the end of the first to become free, with a consultation time of c, ##c\sim U(5,20)##. I've taken "50% interval" to mean the range from the 25th to 75th percentile. The results I got are:

Amount of patients coming to the office (10000 iterations)
Median: 42.0
50% interval: 37.0 -- 46.0

Amount of patients having to wait (10000 iterations)
Median: 5.0
50% interval: 3.0 -- 9.0

Average wait (minutes - 10000 iterations)
Median: 0.46
50% interval: 0.21 -- 0.86

Closing time of the office (minutes after 4pm - 10000 iterations)
Median: 5.96
50% interval: 0.0 -- 11.2

Histograms:
graph1.png
graph2.png
graph3.png
graph4.png
 
Last edited:
  • Like
Likes micromass
  • #20
35,259
11,510
:)

Does the news stand in problem (5) know in advance if a day is Good, Fair or Poor? If no, where is the point in making three groups, if yes, is it just the same problem with 3 different numbers?
 
  • #21
22,089
3,296
Does the news stand in problem (5) know in advance if a day is Good, Fair or Poor?
No
 
  • #22
20,867
4,546
No
Problem 5 looks like a standard Decision and Risk problem, which is usually solved by Monte Carlo simulation. Is the use of Monte Carlo allowed?
 
  • #23
22,089
3,296
Problem 5 looks like a standard Decision and Risk problem, which is usually solved by Monte Carlo simulation. Is the use of Monte Carlo allowed?
Yep, as long as it's a simulation!
 
  • #24
Ibix
Science Advisor
Insights Author
2020 Award
7,396
6,481
Problem 5 can be done with pen and paper (unless I'm missing something) - old school computing. Is that cheating, or shall I post a photo?

I make it 60
 
Last edited:
  • Like
Likes micromass
  • #25
20,867
4,546
Problem 5 can be done with pen and paper (unless I'm missing something) - old school computing. Is that cheating, or shall I post a photo?

I make it 60
Yes. I agree that it is much simpler than requiring Monte Carlo. There is enough information in the table to determine by hand the fraction of the time that the demand is any of the values shown. I realized that last night after my Monte Carlo post.

Chet
 

Related Threads on Micromass' big simulation challenge

Replies
105
Views
9K
Replies
116
Views
12K
Replies
62
Views
6K
Replies
35
Views
4K
Replies
77
Views
8K
Replies
74
Views
7K
Replies
42
Views
5K
Replies
101
Views
9K
Replies
83
Views
8K
Replies
57
Views
6K
Top