# Probability of event occurring - Poisson distribution?

I am the keeper of records for my local Volunteer Fire Dept. I have now collected data for each of our incident calls from the last 3 years and have made some _very_ basic stabs at interesting statistics which you can see at:
http://hondovfd.org/statistics.php

We have about 500 calls a year -- a bit over 40 a month or around 1.3 per day. But as you can see from the graphs at the bottom of that page -- which are just about the full extent of my Excel skills -- they are not randomly distributed over the days of the week or hours of the day. More interesting to all our responders is how they are distributed by number per day. My "Calls per Day" graph seems to show a sorta-exponential decay from 1 per day to 8 (our all-time high during a snowstorm when our little section of Interstate turned into a Bumper Car arena). However, we can go for up to a week with nada, and then break the drought with 3 or 4 in an afternoon.

So the question is: how do I characterize the likelihood of getting a certain number of calls on any particular day, with the number 0 being of special interest? I think I should be able to compare against a Poisson distribution to see how un-random things are, but my eyes roll into the back of my head about a quarter of the way through the wiki page. Can anyone point me to some other explanations and examples, or have better thoughts on the approach?

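Just use the POISSON function in Excel. Set the third argument (cumulative) to 0, i.e. FALSE, so that it returns the probability of exactly that many calls rather than the running total.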
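For anyone who would rather see what the function is doing, here is a minimal Python sketch of the same calculation. The function name `poisson_pmf` is made up for illustration, and the rate of 1.3 is simply the "around 1.3 per day" figure from the question, so treat the printed values as approximate.

```python
import math

def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k calls in a day, given a mean of `rate` calls per day."""
    return math.exp(-rate) * rate**k / math.factorial(k)

rate = 1.3  # assumed: the rough calls-per-day figure from the question

# Same quantity as Excel's POISSON(k, rate, FALSE) for k = 0..8
for k in range(9):
    print(f"{k} calls: {poisson_pmf(k, rate):.4f}")

# The zero-call day is the simplest case: P(0) = exp(-rate)
p_quiet_day = poisson_pmf(0, rate)
print(f"Chance of a day with no calls: {p_quiet_day:.4f}")
```

With a rate of 1.3 the zero-call probability works out to a bit over a quarter, assuming calls really do arrive independently at a steady rate.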

Oh, huh... thanks.

So what I did was get a Poisson value for each integer from 0 to 8 using my average calls-per-day rate of 1.39, i.e. POISSON(N, 1.39, FALSE) for N = 0 to 8, and then plotted those values against the real data I have -- err, after fixing a mistake in my original, how come no one told me my percentages only added up to 72%? -- and it all looks like it's pretty random. Except we are a couple of percent more likely to get 3 runs a day and less likely to get 2... plus two intriguingly fat-tailed points out there at 7 and 8 per day.
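The comparison described above can be sketched in Python along these lines. The tallies in `days_with_n_calls` are made-up placeholders, not the department's data, and the helper names are hypothetical; the point is only the shape of the calculation, including a check that the observed percentages add up to 100.

```python
import math

def poisson_pmf(k, rate):
    """Probability of exactly k calls in a day at a mean of `rate` calls per day."""
    return math.exp(-rate) * rate**k / math.factorial(k)

def to_percentages(tallies):
    """Convert raw tallies to percentages of the total, so they sum to 100."""
    total = sum(tallies)
    return [100.0 * t / total for t in tallies]

# PLACEHOLDER tallies: number of days that had 0, 1, ..., 8 calls.
# Include the zero-call days; leaving them out is one way to end up short of 100%.
days_with_n_calls = [270, 380, 240, 130, 50, 18, 4, 2, 1]

# Mean calls per day from the tallies (the post above got 1.39 from the real data)
rate = sum(k * n for k, n in enumerate(days_with_n_calls)) / sum(days_with_n_calls)

observed_pct = to_percentages(days_with_n_calls)
expected_pct = [100.0 * poisson_pmf(k, rate) for k in range(len(days_with_n_calls))]

for k, (obs, exp) in enumerate(zip(observed_pct, expected_pct)):
    print(f"{k} calls: observed {obs:5.1f}%   Poisson {exp:5.1f}%")

print(f"observed total: {sum(observed_pct):.1f}%")
```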

[Image: observed calls per day plotted against the Poisson values, http://www.etantdonnes.com/TMP/image011.gif]

Too bad. I was hoping for some publishable results (hah) -- but at least I've learned a bit more arithmetic now.

Thanks again!

The fit looks pretty good. The small discrepancies you observed seem to be from correlation between events; for example the snow storm would have made individual events more likely on that day.

It would be interesting to see the results with days classified according to their risk, e.g. icy vs normal vs hot/dry weather.

Yup, as I understand it, that fat-tail sort of distribution indicates non-random correlation between events, and your storm hypothesis is probably correct. Of course I'm basing that on data for about 4 days out of 1000. Is there a way to determine how significant a deviation is? (I guess, modulo the amount of data used to start with...) I was trying to figure out how I could plot a power-law relationship, like the Poisson, for comparison, but got stuck again.
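One rough way to put a number on the fat tail is to ask how many very busy days a pure Poisson process would be expected to produce. The sketch below assumes the rate of 1.39 from the earlier post and a round 1000 days; `poisson_tail` is a made-up helper name, and the threshold of 7 calls is an assumption about which days count as the tail.

```python
import math

def poisson_pmf(k, rate):
    """Probability of exactly k calls in a day at a mean of `rate` calls per day."""
    return math.exp(-rate) * rate**k / math.factorial(k)

def poisson_tail(k_min, rate):
    """P(k_min or more calls in a day) = 1 - P(fewer than k_min)."""
    return 1.0 - sum(poisson_pmf(k, rate) for k in range(k_min))

rate = 1.39     # calls per day, from the earlier post
n_days = 1000   # round figure used above

p_busy = poisson_tail(7, rate)
expected_busy_days = n_days * p_busy

print(f"P(7 or more calls in a day) = {p_busy:.5f}")
print(f"Expected such days in {n_days} days = {expected_busy_days:.2f}")
```

Under these assumptions the expected number comes out at less than one day per 1000, so if the roughly four days mentioned above are the 7- and 8-call days, that is more than independent arrivals would suggest. With so few days involved this is a hint rather than a formal test.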

Unfortunately, except for the date and time which could be indicative, I don't have weather correlations for the calls.

Another thing I should try is to break out the different types of calls... We have about 40% medical and 20% vehicle accidents (which are medical with added traffic -- you'd think the Sheriff would do traffic but they like to measure skid marks so it's the FD that stands out there with the SLOW signs). When it comes down to it there's less than 10% that even have a fire involved somehow. Also, the data I'm working with is what we send to the National Fire Information Something-or-Other database and it doesn't have an easy way to determine how significant each event is, so a type: "Structure Fire" with action: "Fire Control" could be anything from, A) someone climbing through an open window to take a pot of caramelized eggs off the stove, to B) everyone in the region saving half of a house belonging to a guy who spilled lacquer thinner in his garage and then closed it all up to take his kid to school.

Maybe I should be satisfied being able to say that things is mostly random...

"Of course I'm basing that on data for about 4 days out of 1000."
That's not necessarily a bad thing; in fact, if the outliers are all explainable by foreseeable factors such as weather, then that would be a very good result. A perfectly valid and useful model could be along the lines of "if the weather is normal, the number of calls has the Poisson distribution, otherwise anything can happen."

"Is there a way to determine how significant a deviation is?"
The chi-square test is probably the most appropriate here (implemented as CHITEST in Excel; it tests the difference between the observed and expected number of events). You may need to group the days with 5 or more calls into a single bin so that the frequency in every bin is high enough to be tested accurately.
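The test described above can be sketched in plain Python as follows. Everything here is illustrative: the tallies are made-up placeholders rather than the department's data, the function names are hypothetical, and the degrees of freedom are taken as bins minus 2 on the assumption that the rate is estimated from the same data (Excel's CHITEST uses bins minus 1 for a single row, so its answer will differ slightly).

```python
import math

def poisson_pmf(k, rate):
    """Probability of exactly k calls in a day at a mean of `rate` calls per day."""
    return math.exp(-rate) * rate**k / math.factorial(k)

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1.

    Uses Q(a+1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a+1) with z = x/2,
    starting from Q(1, z) = exp(-z) or Q(1/2, z) = erfc(sqrt(z)).
    """
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    while a < df / 2.0:
        q += z**a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def poisson_chi_square(day_tallies, rate, last_bin):
    """Chi-square test of day tallies against Poisson(rate).

    day_tallies[k] is the number of days with exactly k calls.
    Days with last_bin or more calls are pooled into one group.
    """
    n_days = sum(day_tallies)
    observed = list(day_tallies[:last_bin]) + [sum(day_tallies[last_bin:])]
    probs = [poisson_pmf(k, rate) for k in range(last_bin)]
    probs.append(1.0 - sum(probs))           # P(last_bin or more calls)
    expected = [n_days * p for p in probs]   # expected day counts, not fractions
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 2   # bins - 1, minus 1 more because rate came from the data
    return stat, df, chi2_sf(stat, df)

# PLACEHOLDER tallies for 0..8 calls per day; substitute the real ones.
tallies = [270, 380, 240, 130, 50, 18, 4, 2, 1]
rate = sum(k * n for k, n in enumerate(tallies)) / sum(tallies)

stat, df, p_value = poisson_chi_square(tallies, rate, last_bin=5)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value:.3f}")
```

A small p-value (conventionally below 0.05) would suggest the tallies disagree with the Poisson model; a large one means the test found no detectable disagreement. If SciPy is available, `scipy.stats.chisquare(observed, expected, ddof=1)` should give the same statistic and p-value.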

When I put my two tiny distributions into CHITEST() I get "1". Maybe that means I'm in perfect agreement with myself? The Excel "help" says:
"CHITEST returns the probability for a χ² statistic and degrees of freedom, df"
however there is only one return value so I'm not sure what I'm looking at or for....
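The single number CHITEST returns is that probability (the p-value): the chance of seeing a disagreement at least this large if the expected values were the true model, so a value near 1 means no detectable disagreement. One possible reason for getting exactly 1, which is only a guess since the thread doesn't show the inputs, is feeding in percentages or fractions instead of day counts. The sketch below uses made-up placeholder numbers to show that the chi-square statistic computed from fractions is smaller than the one from counts by a factor of the number of days, which pushes the p-value toward 1.

```python
def chi_square_stat(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over the bins."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# PLACEHOLDER numbers, not the department's data: days per bin (0, 1, 2, 3, 4+ calls)
observed_days = [270, 380, 240, 130, 75]
expected_days = [273, 379, 264, 122, 57]
n_days = sum(observed_days)

stat_counts = chi_square_stat(observed_days, expected_days)

# The same comparison expressed as fractions of all days
observed_frac = [o / n_days for o in observed_days]
expected_frac = [e / n_days for e in expected_days]
stat_fracs = chi_square_stat(observed_frac, expected_frac)

print(f"statistic from counts:    {stat_counts:.3f}")
print(f"statistic from fractions: {stat_fracs:.5f}")  # smaller by a factor of n_days
```

If that is what happened, rerunning the test on the actual number of days in each bin (observed) against the Poisson probability times the total number of days (expected) should give a more informative result.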

Excel "help" (using the word in its broadest sense) does not, and wiki seems to assume that I already know what they are trying to explain. I guess I'd better try to read that Stats book I've stored away all these years.

But thanks for the attempt anyway.