# How to program random processes

#### Mdmguyon

Summary
I want to be able to determine the average number of trials it takes for an event to happen when the other events can't repeat and there is a limit to how many events can be seen.
A simple example would be to take a deck of 52 cards and deal them out until the ace of spades is seen except that I'd quit after I saw 20 cards. I'm not very sophisticated in programming. I use GW Basic, which has been generally adequate, and I have much experience with Excel and Lotus spreadsheets. In GW Basic, I successfully wrote a program that would give me the answer to the same question with 10 cards, but I want to be able to write a program that would handle thousands of different events with differing probabilities without having to list each one. That there are only a few categories of events gives me hope that this is simple. Might a spreadsheet be able to do this?


#### mfb

Mentor
What is the average of "5, 10, not seen"?
If you have a single card it is easy. It has a 1/52 probability to appear at place N for every N, pen and paper can give you an exact solution. What that means for the average depends on how you handle "not seen".
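mfb's pen-and-paper point can be made concrete for the card example. The sketch below uses exact fractions. Because the question does not say how an unseen ace should count, it computes two conventions, both assumptions: treating "not seen" as 20 cards dealt, and averaging only over the deals in which the ace does appear.

```python
from fractions import Fraction

DECK = 52    # cards in the deck
LIMIT = 20   # stop dealing after this many cards

# The ace of spades is equally likely to sit at any position 1..52.
p = Fraction(1, DECK)

# Probability that the ace appears within the first LIMIT cards.
p_seen = LIMIT * p

# Convention A (assumed): an unseen ace counts as LIMIT cards dealt,
# i.e. the average of min(position, LIMIT).
mean_capped = sum(n * p for n in range(1, LIMIT + 1)) + LIMIT * (1 - p_seen)

# Convention B (assumed): average position only over deals where the ace was seen.
mean_if_seen = sum(n * p for n in range(1, LIMIT + 1)) / p_seen

print(f"P(seen within {LIMIT}) = {p_seen} = {float(p_seen):.4f}")
print(f"mean cards dealt, capped at {LIMIT} = {mean_capped} = {float(mean_capped):.4f}")
print(f"mean position given seen = {mean_if_seen} = {float(mean_if_seen):.4f}")
```

Under the first convention the answer is 425/26, about 16.35 cards; under the second it is exactly 10.5, the midpoint of positions 1 to 20.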

#### DEvens

Gold Member
It's not clear what you want to accomplish. You want to handle thousands of different events without listing them?

As mfb suggests, you also need to be very careful about exactly what you want to calculate. Maybe what you want is to actually scan through the possibility tree and work out the result analytically. Or maybe what you want is to roll a lot of random numbers and estimate the result on that basis. Either way, you need to carefully think about what you want to calculate, and how to do it.

If you choose the random number thing, the buzz-phrase to search on is "Monte Carlo methods."

Depending on the level of sophistication, the random number generator built into the typical spreadsheet program may not be adequate. You should think carefully about what requirements you have with regard to your random numbers. You can do some things with the generator built into MS Excel, but not a lot. You should be able to easily google up some random number algorithms and find one that satisfies your requirements.

If you are comfortable with GWBasic, then the Visual Basic that comes with MS Excel should be easy to pick up.

If you do the random number thing, you also need to think carefully about things like the statistical uncertainty in your answer. There are a bunch of different ways you can approach that.
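A minimal Monte Carlo sketch of the card example with its statistical uncertainty attached, assuming an unseen ace counts as 20 cards dealt (one possible convention; the question leaves it open). It uses Python's `random.Random`, a seeded Mersenne Twister generator, so runs are reproducible, and reports the standard error of the mean (sample standard deviation divided by the square root of the number of trials). `cards_dealt` and `estimate` are illustrative names.

```python
import random
import statistics


def cards_dealt(rng, deck_size=52, limit=20):
    """Deal one shuffled deck; return how many cards were dealt (capped at `limit`)."""
    cards = list(range(deck_size))   # card 0 stands in for the ace of spades
    rng.shuffle(cards)
    for n, card in enumerate(cards[:limit], start=1):
        if card == 0:
            return n
    return limit                     # ace not seen: counted as `limit` cards (assumed convention)


def estimate(trials=100_000, seed=12345):
    """Monte Carlo mean of cards_dealt together with its standard error."""
    rng = random.Random(seed)        # seeded generator, so the run can be repeated exactly
    results = [cards_dealt(rng) for _ in range(trials)]
    mean = statistics.mean(results)
    sem = statistics.stdev(results) / len(results) ** 0.5   # standard error of the mean
    return mean, sem


if __name__ == "__main__":
    mean, sem = estimate()
    print(f"mean cards dealt = {mean:.3f} +/- {sem:.3f}")
```

With 100,000 trials the standard error is about two hundredths of a card, small enough to compare against the exact value of 425/26 (about 16.35) that pen and paper gives for this convention.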

#### FactChecker

Gold Member
2018 Award
You might be able to clarify what you want to program if you make a flowchart diagram or write some "pseudocode" that describes what you want the program to do.

#### Mdmguyon

I may have been misleading in the first description of my problem. It is average time that I want to measure, not the average number of trials. It's a randomized slideshow in which no picture can repeat and each picture stays on the screen for differing durations. A simplified version would be a slideshow that has 3000 pictures, 2900 of which stay on the screen for 20 seconds and 100 of which stay on the screen for 5 minutes. There are 10 pictures which would stop the slideshow and the slideshow will stop after 2 hours and 40 minutes. What is the average duration of the slideshow? The only thing I can think of is to list each of 3000 possibilities in a randomized program and arrange for each picture that shows to not be able to repeat, so that I can run many trials and see what the average duration is, but that's too much trouble. Is there a way to do a randomized program and tell the number of each type of picture to be reduced by 1 each time one shows?
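One way to do exactly that, sketched in Python under stated assumptions: keep only a count of unshown pictures per duration category, draw a category with probability proportional to its count, and subtract 1 from it. The 10 stop pictures are assumed to be spread at random over all 3000, so each newly shown picture is a stop picture with probability 10 divided by the number of pictures still unshown. A stop picture's own display time is assumed to count, and the total is cut off at 2 hours 40 minutes. `simulate_show` and `average_duration` are hypothetical names.

```python
import random


def simulate_show(rng, counts=(2900, 100), durations=(20, 300),
                  n_stop=10, max_time=160 * 60):
    """Run one slideshow without repeats, tracking only how many pictures of each kind are left."""
    counts = list(counts)        # unshown pictures left in each duration category
    remaining = sum(counts)      # total unshown pictures
    elapsed = 0
    while elapsed < max_time:
        # pick a category with probability proportional to how many of its pictures are left
        k = rng.choices(range(len(counts)), weights=counts)[0]
        counts[k] -= 1           # that picture can no longer be shown
        elapsed += durations[k]
        # no stop picture has shown yet, so the n_stop stop pictures are spread
        # uniformly over the `remaining` unshown pictures, this one included
        if rng.random() < n_stop / remaining:
            break
        remaining -= 1
    return min(elapsed, max_time)   # the show is cut off at 2 h 40 min


def average_duration(n_shows=5000, seed=1):
    """Average simulate_show over many independent shows."""
    rng = random.Random(seed)
    return sum(simulate_show(rng) for _ in range(n_shows)) / n_shows


if __name__ == "__main__":
    print(f"average duration: {average_duration():.0f} s")
```

Adding more kinds of pictures only means adding entries to `counts` and `durations`; the 3000 pictures never have to be listed one by one.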

#### Mdmguyon

A question I have that is indirectly related is that I want to find a randomized slideshow that allows all pictures to always have the same chance of occurring, no matter how many times they’ve shown. It seems like that would be simpler to program. The only ones I'm familiar with, including the one I'm using (MegaView), don't allow any picture to repeat until all pictures have shown. I've read that many versions of software are sold only with the agreement that they not be "reverse engineered." I don't necessarily even know what that means, but I assume it means that the program can be altered. Might there be a way to get into MegaView's program and eliminate the part that keeps pictures from repeating? One way in which this is relevant to my original question is that I'd want to use fewer pictures, so the mathematics might be easier, both because the numbers would be smaller and I wouldn't have to tell a randomized program to eliminate any picture that had shown.


#### mastrofoffi

It's not clear (at least to me) what you want to obtain.
You said the first example was misleading, hence referring to the example you gave in #5, your problem seems to be the following:
you have a set of $N$ pictures which are to be displayed, in a random order, without repetitions in a given time interval $T$ and to each of them is assigned a displaying time $t_i$, with $i=1,2,\ldots, N$; in addition you have the constraint that a number $M$ out of the $N$ pictures will end the slideshow before the established time $T$, and you want to know what is the average time a picture was displayed.

Is my interpretation correct?
If it is, may I ask what exactly is the purpose of it? I could see a reason in the card example, but here I'm kind of missing the point I guess.

One flaw/problem that immediately comes to mind is what would happen if, let's say, you have 40 seconds left and a 10-minute picture shows up. Do you just cut it?

I didn't understand at all the last comment about the image viewer; you do not need to really 'do' the slideshow thing if all you want is an estimate of the average permanence time of a picture: you just need to write a process which models the actual experiment you want to do.
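Written out in the notation above, and assuming (as the original poster confirms in his reply) that a show ends at the first stop picture or at time $T$, whichever comes first: let $\pi$ be the random display order, $t_{\pi(k)}$ the display time of the $k$-th picture shown, and $\tau$ the position of the first of the $M$ stop pictures. Counting the stop picture's own time (an assumption; the thread treats this detail both ways), the duration of one show is

$$D = \min\!\left(\sum_{k=1}^{\tau} t_{\pi(k)},\; T\right),$$

and the quantity wanted is its expectation $\mathbb{E}[D]$. The $\min$ is what cuts off a long picture that starts shortly before $T$; leaving out the stop picture's time means summing only up to $\tau - 1$.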

#### mfb

Mentor
~20 lines in Python, ~1 s per 1000 simulated shows; shuffling the list of 3000 images is probably the step that takes the longest. If you allow repetition you can skip that step and make the program even easier. An analytic solution would be too messy, but simulating 100,000 shows is easy.

Just simulate it. You'll learn a very useful tool, as a quick simulation is often the best approach.
> I've read that many versions of software are sold only with the agreement that they not be "reverse engineered." I don't necessarily even know what that means, but I assume it means that the program can be altered. Might there be a way to get into MegaView's program and eliminate the part that keeps pictures from repeating?
There might be, but that is a very complicated process and probably not what you want. There are thousands of image display programs; some will allow repetition. Did you check the documentation to see whether there is a way to change it in the program?
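If repetition is allowed, the bookkeeping disappears, as mfb says: every draw is independent. A sketch under these assumptions: each draw is a long (300 s) picture with probability 100/3000 and a stop picture with probability 10/3000, independently of each other; the stop picture's own time counts; and the show is cut off at 2 hours 40 minutes. The function names are hypothetical.

```python
import random


def simulate_show_with_repeats(rng, p_long=100 / 3000, p_stop=10 / 3000,
                               short_time=20, long_time=300, max_time=160 * 60):
    """Run one slideshow in which every picture has the same chance on every draw."""
    elapsed = 0
    while elapsed < max_time:
        # each draw is independent, so no record of shown pictures is needed
        elapsed += long_time if rng.random() < p_long else short_time
        if rng.random() < p_stop:     # the draw was one of the stop pictures
            break
    return min(elapsed, max_time)     # the show is cut off at 2 h 40 min


def average_duration_with_repeats(n_shows=10_000, seed=2):
    rng = random.Random(seed)
    return sum(simulate_show_with_repeats(rng) for _ in range(n_shows)) / n_shows


if __name__ == "__main__":
    print(f"average duration with repeats: {average_duration_with_repeats():.0f} s")
```

Comparing this average with a no-repeat simulation shows how much the no-repeat rule matters for a slideshow of this size.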

#### FactChecker

Gold Member
2018 Award
Here is a Perl program that you might modify to do what you want.
Starter Perl program to display random photos:
```perl
# Define the top directory containing your photos
$photoDir = 'C:\Users\hollimb\Pictures\Big Bend 2005';
# Define the time to display each photo
$displayTimeSeconds = 5;

# ===================== BEGIN PROGRAM CODE =============
chdir $photoDir;                 # go to directory of photos
@photos = `dir /b /s *.jpg`;     # get list of photos into an array
chomp @photos;                   # remove the newline from the end of the photo file names
$numPhotos = scalar(@photos);    # get the number of photos ($#photos is the last index, one less)
while( 1 ){                      # start an infinite loop to display the photos
    $random = int(rand($numPhotos));   # get a random index (0 .. $numPhotos-1) into the array of photos
    $photo = $photos[$random];         # get the name of the selected photo

    $cmd = qq/start "myPhotoDisplay" "$photo"/;   # define DOS command to display the photo
    `$cmd`;                            # call DOS to display photo
    sleep $displayTimeSeconds;         # sleep for time to display photo
    $tasklist = `tasklist`;            # get task list
    # loop to kill all photo display programs using their PIDs
    while( $tasklist =~ /Microsoft\.Photos\.exe\s+(\d+)/ig ){
        $pid = $1;                     # save the PID
        `taskkill /F /pid $pid`;       # kill the task with that PID
    }
}
```

#### Mdmguyon

mastrofoffi, I want to determine the expected average duration of the slideshow. It can end in 2 ways: one of the 10 designated pictures shows, or 2 hours and 40 minutes elapse. If a picture shows that would take it past 2 hours and 40 minutes, the show would end at 2 hours and 40 minutes, which is another complicating feature that makes me even more pessimistic that the average duration can simply be calculated.

mfb, I made a reasonable attempt to find image viewing software that would allow pictures to repeat and didn't find one. If writing a simulation that doesn't allow pictures to repeat is that simple, maybe I can do it in GW Basic. I'm not familiar with Python. I think I've tried to download it a few times and failed.

Thanks, FactChecker. If I don't find an easier solution, I'll try to learn Perl and try to use what you wrote.

#### FactChecker

Science Advisor
Gold Member
2018 Award

> I may have been misleading in the first description of my problem. It is average time that I want to measure, not the average number of trials. It's a randomized slideshow in which no picture can repeat and each picture stays on the screen for differing durations. A simplified version would be a slideshow that has 3000 pictures, 2900 of which stay on the screen for 20 seconds and 100 of which stay on the screen for 5 minutes. There are 10 pictures which would stop the slideshow and the slideshow will stop after 2 hours and 40 minutes. What is the average duration of the slideshow? The only thing I can think of is to list each of 3000 possibilities in a randomized program and arrange for each picture that shows to not be able to repeat, so that I can run many trials and see what the average duration is, but that's too much trouble. Is there a way to do a randomized program and tell the number of each type of picture to be reduced by 1 each time one shows?

Is this what you want?
Perl program for simulation of 500 runs of slideshows:

```perl
$numberOfSimulations = 500;

foreach $simulation (1..$numberOfSimulations){
    # make random set of 10 termination photos
    foreach $termNumber (1..10){
        $index = int(rand(3000));
        $stopHere[$index] = 'y';
    }
    # Begin to run one simulation
    while(1){
        # Generate a random photo number
        $randomPhotoIndex = int(rand(3000));
        # Skip any photos already displayed
        if( $alreadyDisplayed[$randomPhotoIndex] ){next}
        # Record that this photo is displayed
        $alreadyDisplayed[$randomPhotoIndex] = 'y';
        # Add time for this photo display
        if( $randomPhotoIndex < 2900 ){
            $time += 20;
        }else{
            $time += 5*60;
        }
        # If this is a "stop photo", end this one simulation
        if( $stopHere[$randomPhotoIndex] ){last}
        # If maximum of 2 hours, 40 minutes is exceeded,
        #  set time at max and end this simulation
        if( $time > (2*60+40)*60 ){
            $time = (2*60+40)*60;
            last;
        }
    }
    # Save and print result of one simulation
    $averageTime += $time/$numberOfSimulations;
    print "Simulation $simulation: time=$time\n";
    # Clear data from one simulation, including the record of displayed photos
    #  (otherwise later simulations would keep skipping photos shown in earlier ones)
    $time = 0;
    undef @stopHere;
    undef @alreadyDisplayed;
}
# Print final average from all simulations
print "averageTime=$averageTime\n";
$ans = <STDIN>;   # wait for the Enter key before exiting
```
Here is the last bit of data from a run:

Result of a run (last few simulations and total average):

```
Simulation 474: time=25340
Simulation 475: time=5700
Simulation 476: time=9600
Simulation 477: time=9600
Simulation 478: time=9600
Simulation 479: time=4360
Simulation 480: time=1740
Simulation 481: time=4580
Simulation 482: time=9600
Simulation 483: time=2380
Simulation 484: time=6560
Simulation 485: time=8740
Simulation 486: time=9600
Simulation 487: time=3480
Simulation 488: time=9420
Simulation 489: time=6520
Simulation 490: time=340
Simulation 491: time=9600
Simulation 492: time=7880
Simulation 493: time=200
Simulation 494: time=320
Simulation 495: time=1660
Simulation 496: time=5760
Simulation 497: time=9600
Simulation 498: time=4740
Simulation 499: time=2120
Simulation 500: time=1340
averageTime=5892.83999999998
```

Result of 5000 runs:

```
Simulation 4995: time=8460
Simulation 4996: time=720
Simulation 4997: time=2320
Simulation 4998: time=3980
Simulation 4999: time=5800
Simulation 5000: time=8940
averageTime=5731.8080000001
```

Result of 50,000 runs:

```
Simulation 49998: time=4760
Simulation 49999: time=2080
Simulation 50000: time=7060
averageTime=5749.37720000011
```

#### mfb

Mentor
That agrees with my simulation. I don't know if the 10 stop pictures have a specific duration, I just let the simulation stop as soon as such a picture comes up. Just randomly selecting images is probably faster than shuffling 3000 images each time. Ah, whatever, 100,000 simulations are fast and lead to a negligible uncertainty.
Python:

```python
import random

imagetimes=[20 for i in range(0,3000)]
for i in range(0,10):
    imagetimes[i]=-1   # "stop" images
for i in range(10,110):
    imagetimes[i]=300  # long images

repetitions=100000
stoptime=2*3600+40*60

totaltime=0

for i in range(0,repetitions):
    time=0
    random.shuffle(imagetimes)
    for imagetime in imagetimes:
        # end the show at a "stop" image (its own time is not counted) or once past stoptime;
        # the recorded time is not capped, so it can exceed stoptime by up to one image
        if(imagetime==-1 or time>stoptime):
            totaltime+=time
            #print("Stopped show after "+str(time)+" seconds")
            break
        time+=imagetime

avgtime=totaltime/repetitions
print("avg time: "+str(avgtime))
```
To be adjusted depending on how exactly the stop images work.

```
$ python slideshow.py
avg time: 5750
```

#### .Scott

Homework Helper
Here is some pseudo code. My "random" returns a value from 0 to less than 1.

The key difference between this code and the previous examples is the way I pick the cards. You specified that there were 10 of the 3000 cards that were stoppers. I assumed they could be any ten and that they would cause the presentation to stop after they were shown. If they should cause the presentation to stop without being shown, then reverse the order of the first two statements in the while loop.

Code:

```
max_time = 160*60
all_trial_times = 0
max_trial_count = 10000
for trial = 1 to max_trial_count
    set image_sets = { {count=2900, time=20}, {count=100, time=300} }
    images_remaining = image_sets[0].count + image_sets[1].count
    total_time = 0
    while total_time < max_time
        total_time = total_time + pick_a_card()
        if (images_remaining*random() < 10) break
        images_remaining = images_remaining - 1
    end while
    if (total_time > max_time) total_time = max_time
    all_trial_times = all_trial_times + total_time
next trial
average_time = all_trial_times/max_trial_count

function pick_a_card()
    card_num = images_remaining*random()
    for each image in image_sets
        card_num = card_num - image.count
        if (card_num < 0)
            image.count = image.count - 1
            return image.time
        end if
    end for
end function
```

#### FactChecker

Science Advisor
Gold Member
2018 Award
@mfb, Good job. Just out of curiosity about how much variation there might be, I ran my program for a million simulations a few times and got averages of 5747.9, 5743.7, 5753.1, 5749.3, 5748.5. I agree that there are some details, such as whether the time of a terminal photo is included, that need to be sorted out. The fact that we get such similar results makes me confident that both programs are correct but may differ in those details. I was a little surprised at the amount of variation in the million-simulation averages.

#### FactChecker

Science Advisor
Gold Member
2018 Award
I think I had an error in my version. I included the full time of a termination photo even if the total exceeded 2 hr 40 min. After fixing that, I made another 5 runs of a million simulations and got these averages: 5749.6, 5749.9, 5742.8, 5749.3, 5749.5. So there is very little variation except for one outlier (I don't know what happened there). The corrected code swaps the two termination tests, so the time cap is applied before the stop-photo test. I always allow some display of a termination photo. Corrected code with swapped termination tests:

```perl
        # If maximum of 2 hours, 40 minutes is exceeded,
        #  set time at max and end this simulation
        if( $time > (2*60+40)*60 ){
            $time = (2*60+40)*60;
            last;
        }
        # If this is a "stop photo", end this one simulation
        if( $stopHere[$randomPhotoIndex] ){last}
```

#### mfb

Mentor
The standard deviation should be of the order of 2000-3000 seconds. With 1 million simulations you expect 2-3 seconds spread of the mean. Looks like most of your simulations were unusually close together.
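mfb's 2 to 3 second figure is the standard error of the mean: the spread $\sigma$ of a single show's duration divided by the square root of the number $n$ of simulated shows in one run. With his order-of-magnitude estimate for $\sigma$ and $n = 10^6$:

$$\sigma_{\text{mean}} = \frac{\sigma}{\sqrt{n}} \approx \frac{2000\ \text{to}\ 3000\ \text{s}}{\sqrt{10^{6}}} = 2\ \text{to}\ 3\ \text{s}.$$

By the same rule, the 500-run average earlier in the thread is uncertain by roughly 100 s, which fits how far it sits from the million-run values.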

#### FactChecker

Gold Member
2018 Award
> The standard deviation should be of the order of 2000-3000 seconds. With 1 million simulations you expect 2-3 seconds spread of the mean. Looks like most of your simulations were unusually close together.
Ok. With only 5 data points, I will not be surprised if there is some clustering of them by luck. I might try some more runs before I look for any mistake.

EDIT: I ran it 5 more times and the spread was larger: 5747.2, 5752.9, 5755.5, 5742.6, 5751.8
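A quick way to compare these runs with mfb's estimate, as a sketch: take the ten million-simulation averages reported after the fix, compute their sample standard deviation, and set it beside $\sigma/\sqrt{n}$ for $\sigma$ = 2000 to 3000 s. With only ten runs the sample spread is itself rough, so this is a sanity check rather than a test.

```python
import math
import statistics

# Averages of one million simulated shows each, copied from the runs reported after the fix
run_averages = [5749.6, 5749.9, 5742.8, 5749.3, 5749.5,   # first five runs
                5747.2, 5752.9, 5755.5, 5742.6, 5751.8]   # five more runs (EDIT)

spread = statistics.stdev(run_averages)   # sample standard deviation of the ten averages

# Expected spread: sigma / sqrt(n) with mfb's sigma of 2000-3000 s and n = 10**6 shows
predicted = [sigma / math.sqrt(10**6) for sigma in (2000, 3000)]

print(f"observed spread of run averages: {spread:.1f} s")
print(f"predicted spread: {predicted[0]:.1f} to {predicted[1]:.1f} s")
```

For these numbers the sample spread comes out at about 4 s, somewhat above the 2 to 3 s estimate; with ten runs, that observed spread is quite uncertain.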
