Distribution fitting

1. Dec 8, 2011

Mark J.

Hi
I am stuck on a distribution-fitting problem.
I collected field data on bus arrivals over one week, and now I want to fit a statistical distribution to these data.
I know that in theory it should be Poisson or something similar, but to do a correct step-by-step analysis I need to rule out other distributions and narrow down the most probable ones until only one remains.
Can you suggest any good links or literature on this?
A similar problem solved step by step in the literature would be very helpful :)

2. Dec 8, 2011

PatrickPowers

The way statistics works is that you first have some reason to think the data follow a particular distribution, and then you run a test that could show your hypothesis is unlikely. Simply trying a bunch of distributions to see which one fits best is not a good way to go about it, unless you have a supervisor who thinks it's a good idea, in which case it is the greatest thing in the world.

Trying a bunch of distributions and seeing which one fits best is likely to give you a random and meaningless result. Only try to fit a distribution if there is a good reason for it to be correct.
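Once you do have a reason to hypothesize a Poisson model, one standard test that "could show your hypothesis is unlikely" is Pearson's chi-square goodness-of-fit test. Below is a minimal sketch using only the Python standard library. The function names and the counts are hypothetical (not real bus data), and chi2_sf uses closed-form expressions that are valid only for integer degrees of freedom. SciPy's scipy.stats.chisquare (with its ddof argument) does the same job if you have SciPy installed.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for X ~ chi-square with a positive integer df (closed forms)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = math.exp(-x / 2)
        total = term
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in x
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def chi_square_poisson_test(counts: list[int], max_bin: int):
    """Pearson chi-square test of H0: counts ~ Poisson(lambda), lambda estimated.

    Bins are 0, 1, ..., max_bin - 1 plus one pooled tail bin for >= max_bin.
    The chi-square approximation is rough when expected counts are small
    (a common rule of thumb asks for about 5 or more per bin).
    """
    if max_bin < 2:
        raise ValueError("need max_bin >= 2 so that df >= 1")
    n = len(counts)
    lam = sum(counts) / n  # maximum-likelihood estimate of the rate
    observed = [0] * (max_bin + 1)
    for c in counts:
        observed[min(c, max_bin)] += 1
    expected = [n * poisson_pmf(k, lam) for k in range(max_bin)]
    expected.append(n - sum(expected))  # n * P(X >= max_bin)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - 1  # one df lost for the total, one for lambda
    return lam, stat, df, chi2_sf(stat, df)


# Hypothetical counts of buses per fixed interval (not real data).
counts = [2, 3, 1, 4, 2, 0, 3, 2, 5, 1, 2, 3, 2, 1, 4, 3, 2, 2, 1, 3]
lam, stat, df, p = chi_square_poisson_test(counts, max_bin=5)
print(f"lambda={lam:.2f} chi2={stat:.3f} df={df} p={p:.3f}")
```

A small p-value is evidence against the Poisson hypothesis; a large one only means the data do not contradict it. That asymmetry is exactly why the hypothesis should come first.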

3. Dec 8, 2011

chiro

Hey MarkJ and welcome to the forums.

PatrickPowers makes some very good points. You need to put the data in the context of what you are doing. It's not just a matter of running a goodness-of-fit test to see which distribution fits best.

Poisson distributions are used here because their foundational assumptions (events occur independently and at a constant average rate) match those of modelling rates of physical phenomena, including your bus example. If, however, independence of events is not guaranteed, you will need a more complicated model that takes this into account.
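For reference, here is a minimal statement of that model, assuming arrivals form a homogeneous Poisson process with constant rate λ. The symbols N(t) (number of arrivals in the interval (0, t]) and T_i (gap between successive arrivals) are introduced here for illustration; the results themselves are the standard ones:

```latex
\begin{aligned}
P\bigl(N(t) = k\bigr) &= \frac{e^{-\lambda t}(\lambda t)^k}{k!}, \qquad k = 0, 1, 2, \dots \\
\mathbb{E}[N(t)] &= \operatorname{Var}[N(t)] = \lambda t \\
P(T_i > t) &= e^{-\lambda t}, \qquad \mathbb{E}[T_i] = 1/\lambda
\end{aligned}
```

Two practical consequences follow. Counts per fixed interval should have a variance close to their mean, and gaps between successive buses should look exponential. If either fails badly, the independence or constant-rate assumption is suspect.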

With that in mind, you need to look at the process itself and decide how to analyze the data based on it.

In terms of modelling these kinds of problems, you should look at Markov models. One book that covers this topic in depth is "Introduction to Probability Models" by Sheldon M. Ross. If your models have more complicated behaviour, where independence is violated and the criteria are more general, you will need to read this book or something like it.

The book itself is pretty comprehensive in content, but if you need worked examples, you could probably get a Schaum's solved-problems book in the right area, or another book with a comprehensive solutions manual.

4. Dec 17, 2011

Mark J.

I totally agree with you. Thank you for your answer. Can you give me some further guidance on making theoretical assumptions about this bus-arrival process? I had a look at Sheldon Ross, and it is a very good book indeed. Could you please give me some advice on theoretical arguments I can rely on?

5. Dec 17, 2011

chiro

If you are dealing with rates where each measurement is independent of the next, it's probably a good idea to a) plot a histogram of the data to see whether it "looks like" a Poisson distribution, and then b) if it does, use a valid estimator to estimate the rate parameter from the mean and/or variance.
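Steps a) and b) can be sketched in a few lines of standard-library Python. This is an illustration under assumptions: the counts are hypothetical numbers of buses per fixed interval, and the function name is made up. For a Poisson model, the maximum-likelihood estimator of the rate is the sample mean (the sample variance is an alternative moment estimator, since mean and variance are equal under Poisson). The sketch prints a text histogram next to the counts a fitted Poisson would predict.

```python
import math
from collections import Counter


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def histogram_vs_poisson(counts: list[int]):
    """Return the estimated rate and (k, observed, expected) rows."""
    n = len(counts)
    lam_hat = sum(counts) / n  # sample mean: the MLE of a Poisson rate
    freq = Counter(counts)
    rows = [(k, freq.get(k, 0), n * poisson_pmf(k, lam_hat))
            for k in range(max(counts) + 1)]
    return lam_hat, rows


# Hypothetical counts of buses per fixed interval (not real data).
counts = [2, 3, 1, 4, 2, 0, 3, 2, 5, 1, 2, 3, 2, 1, 4, 3, 2, 2, 1, 3]
lam_hat, rows = histogram_vs_poisson(counts)
print(f"estimated rate per interval: {lam_hat:.2f}")
print(" k  obs  expected")
for k, obs, exp_count in rows:
    # Text histogram: one '#' per interval that had exactly k arrivals.
    print(f"{k:2d} {obs:4d} {exp_count:9.2f}  {'#' * obs}")
```

With a week of real data you would choose the interval length to match the question being asked (e.g. arrivals per 10 minutes), since λ is a rate per interval.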

If it is more complicated, with events conditional on prior events, you will have to build some kind of Markov chain model for the rate process.
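As one purely illustrative starting point (not the only way to build such a model), you could label each time interval with a state and estimate a first-order Markov chain's transition probabilities by counting consecutive pairs of states. The state coding below (1 if at least one bus arrived in the interval, else 0), the sequence, and the function name are all hypothetical.

```python
def estimate_transition_matrix(states: list[int], n_states: int) -> list[list[float]]:
    """MLE of a first-order Markov chain's transition matrix.

    Entry [i][j] estimates P(next state = j | current state = i) as
    (number of i -> j transitions) / (number of transitions out of i).
    Rows for states that are never left are filled with zeros.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical sequence: 1 if at least one bus arrived in the interval, else 0.
states = [0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1]
for i, row in enumerate(estimate_transition_matrix(states, n_states=2)):
    print(f"from state {i}: " + "  ".join(f"{p:.2f}" for p in row))
```

If successive intervals were independent, the rows would be roughly equal; a noticeable difference between P(1 | 0) and P(1 | 1) hints that arrivals depend on what just happened (e.g. bunching).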

I would start with the independence assumption. Compute a histogram, the first couple of moments, and some other exploratory statistics to see whether the Poisson is a reasonably good model for your data.
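Here is a minimal sketch of those exploratory statistics using Python's statistics module, again on hypothetical counts per interval. Because a Poisson variable's variance equals its mean, the variance-to-mean ratio (index of dispersion) is a quick check. The function name is made up for illustration; the dispersion statistic (n - 1)·s²/x̄ is the standard index-of-dispersion statistic, approximately chi-square with n - 1 degrees of freedom when the counts are Poisson.

```python
import statistics


def exploratory_summary(counts: list[int]) -> dict:
    """First two moments plus a Poisson dispersion check for count data."""
    mean = statistics.mean(counts)
    if mean == 0:
        raise ValueError("all counts are zero; dispersion is undefined")
    var = statistics.variance(counts)  # sample variance (n - 1 denominator)
    n = len(counts)
    return {
        "n": n,
        "min": min(counts),
        "max": max(counts),
        "mean": mean,
        "variance": var,
        # Variance-to-mean ratio: close to 1 for Poisson data; > 1 suggests
        # overdispersion (e.g. bunching), < 1 suggests more regular arrivals.
        "dispersion_index": var / mean,
        # Index-of-dispersion statistic: approximately chi-square with
        # n - 1 degrees of freedom when the counts are Poisson.
        "dispersion_statistic": (n - 1) * var / mean,
    }


# Hypothetical counts of buses per fixed interval (not real data).
counts = [2, 3, 1, 4, 2, 0, 3, 2, 5, 1, 2, 3, 2, 1, 4, 3, 2, 2, 1, 3]
for name, value in exploratory_summary(counts).items():
    print(f"{name:>20}: {value}")
```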