# Parametric versus nonparametric distributions

1. Nov 5, 2012

Hi there,

I'm working on a simulation of the travel patterns of cars. There are many variables and conditional probabilities in the model.

My question is: is there anything wrong with fitting nonparametric distributions to all of the variables (both continuous and discrete)? The software I'm using fits many (50+) different parametric distributions to the data and ranks them in order of best fit. Some are very good fits but some are not. I can't check every single distribution in a simulation, so would it be reasonable to fit nonparametric distributions throughout?

I believe it is called fitting an Ogive distribution http://www.vosesoftware.com/ModelRi...ntinuous_distributions/Ogive_distribution.htm

Thanks

2. Nov 5, 2012

### Stephen Tashi

There are no theorems in mathematics that answer those questions. So there is no proof that a given distribution is wrong or that it is reasonable. Applying math to moderately complicated real world problems almost always involves making assumptions. Some people make these assumptions in an organized manner and use mathematics to deduce the proper method from them. Other people simply make assumptions as they go along. They make a long sequence of arbitrary decisions about what methods they will use. I can't take the latter kind of analysis seriously unless the person doing the work can show that the method, applied to one set of data, worked to predict another set that wasn't involved in the original analysis.

There are several approaches you can take to investigate your question in a practical manner.

The first thing you should ask is whether there is any reasonable physical model for what causes a distribution and, if so, what parameters are involved in that model.

You can try a "bootstrap" approach. Pick an important bottom line result of your project - for example, perhaps it is the total yearly usage of electricity by electric cars or the distribution of miles per year driven by drivers of electric cars, etc. Take the data you have and repeatedly pick a smaller subset of it at random and apply your methods. See how sensitive your bottom line result is to this random variation in the data that is used. If your bottom line is extremely sensitive to random selection of the input data then I'd suspect that either your methods are impractical or you are dealing with a problem that is too sensitive to inputs to be reliably analyzed. (Making quantitative conclusions from "bootstrap" methods isn't straightforward mathematically. But you don't have to know how to do that to get a "feel" for how sensitive your methods are to the input data.)
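A minimal sketch of that subset check, under some loud assumptions: the data are synthetic (drawn from an arbitrary lognormal, not real trip records), and `bottom_line` is a hypothetical stand-in for whatever the whole simulation would output. Here it is just the mean trip length.

```python
import random
import statistics

def bottom_line(sample):
    """Hypothetical bottom-line result: mean miles per trip."""
    return statistics.fmean(sample)

def subsample_spread(data, fraction=0.5, repeats=200, seed=0):
    """Recompute the bottom line on random subsets and summarise the spread."""
    rng = random.Random(seed)
    k = max(1, int(len(data) * fraction))
    # Each repeat draws k points without replacement and reruns the method.
    results = [bottom_line(rng.sample(data, k)) for _ in range(repeats)]
    return statistics.fmean(results), statistics.stdev(results)

# Synthetic stand-in for observed trip lengths (assumption, not real data).
data_rng = random.Random(42)
trips = [data_rng.lognormvariate(3.0, 0.6) for _ in range(500)]

centre, spread = subsample_spread(trips)
print(f"bottom line across subsets: {centre:.2f} +/- {spread:.2f}")
```

If the spread is large relative to the precision you need from the result, that is the warning sign described above. In a real project `bottom_line` would rerun the distribution fitting and the simulation on each subset, not just take a mean.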

You can also see how sensitive your bottom line results are to using various methods to fit distributions to all of the data. For example, would you get a drastically different number for the total yearly usage of electricity by electric cars if you used an ogive for a particular distribution than if you used a lognormal distribution?
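As a rough illustration of that comparison, the sketch below fits the same synthetic data two ways and compares a simulated total. Everything here is an assumption for illustration: the data are made up, the "ogive" is a simplified piecewise-linear empirical CDF through the sorted observations (not necessarily the exact definition your software uses), and the lognormal is fitted from the mean and standard deviation of the logged data.

```python
import math
import random
import statistics

def sample_lognormal_fit(data, n, rng):
    """Fit a lognormal from the mean/std of log(data), then draw n values."""
    logs = [math.log(x) for x in data]
    mu, sigma = statistics.fmean(logs), statistics.stdev(logs)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]

def sample_ogive(data, n, rng):
    """Draw n values by inverting a piecewise-linear empirical CDF."""
    xs = sorted(data)
    out = []
    for _ in range(n):
        pos = rng.random() * (len(xs) - 1)   # fractional index into sorted data
        i = min(int(pos), len(xs) - 2)       # guard against rounding at the top
        out.append(xs[i] + (pos - i) * (xs[i + 1] - xs[i]))
    return out

# Synthetic stand-in for one observed variable (assumption, not real data).
rng = random.Random(1)
observed = [rng.lognormvariate(3.0, 0.6) for _ in range(500)]

n_sim = 20000
total_lognormal = sum(sample_lognormal_fit(observed, n_sim, rng))
total_ogive = sum(sample_ogive(observed, n_sim, rng))
rel_diff = abs(total_lognormal - total_ogive) / total_ogive
print(f"lognormal total: {total_lognormal:.0f}")
print(f"ogive total:     {total_ogive:.0f}")
print(f"relative difference: {rel_diff:.1%}")
```

One design point worth noticing: this ogive can never produce a value outside the observed minimum and maximum, whereas the lognormal has an unbounded right tail. If the bottom line depends on rare long trips, the two choices can diverge even when both look like good fits in the bulk of the data.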

3. Nov 5, 2012