Determining the required sample size for data collection

1. Jan 12, 2012

Mark J.

Hi.
Regarding bus inter-arrival times: how do we determine how much empirical data we must collect? In other words, how can we determine the sample size required to get a statistically significant representation?

Regards

2. Jan 12, 2012

SW VandeCarr

The obvious problem I have with this is that you are not describing a random process. If by "inter-arrival times" you mean the time windows where passengers can most conveniently transfer, this would involve the variability around expected arrival times. What hypothesis are you testing?

Last edited: Jan 12, 2012
3. Jan 13, 2012

Mark J.

Hi, by inter-arrival time I mean the interval between the arrival of one bus and the arrival of the next bus at the same bus stop.
My hypothesis is that it is exponentially distributed, as most of the literature describes it, but strangely the distribution that fits the empirical data I collected is lognormal.
Maybe I am doing something wrong, but I am not quite sure.

Regards

4. Jan 13, 2012

SW VandeCarr

You really haven't described a hypothesis. One hypothesis is that, on average, the bus is on time or early. The alternative hypothesis might be that the bus, on average, is late. This is a useful way to phrase it since it allows you to use the normal or lognormal distribution based on the Central Limit Theorem (CLT). You would calculate the sample mean and variance of the time between arrivals in the usual way. You would also specify an ideal or scheduled time interval, say 15 minutes (use 0.25 hr). If the curve is skewed to the right, as I expect if this is a realistic situation, you would do a log transform to obtain a hopefully symmetric curve for hypothesis testing. Note that the ideal (on time) value would be to the left of the mean on the transformed curve. You want to evaluate the difference between the log of the ideal (on time) value and the transformed mean using the normal assumptions.
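As a rough sketch of the test described above, the following computes a one-sided z-type statistic on log-transformed intervals. The data are simulated and purely hypothetical, the function name is made up for illustration, and the normal critical value is an approximation that assumes a reasonably large sample. Note that on the log scale the comparison is really about the mean of the logs (the median, if the data are lognormal), not the arithmetic mean.

```python
import math
import random
import statistics


def one_sided_log_test(intervals_hr, scheduled_hr, alpha=0.05):
    """H0: typical interval <= scheduled; H1: typical interval > scheduled.

    Works on the log scale, so "typical" means the mean of the logs.
    """
    logs = [math.log(x) for x in intervals_hr]
    n = len(logs)
    mean_log = statistics.mean(logs)
    # Standard error of the mean of the log intervals
    se = statistics.stdev(logs) / math.sqrt(n)
    z = (mean_log - math.log(scheduled_hr)) / se
    # One-sided critical value from the standard normal distribution
    critical = statistics.NormalDist().inv_cdf(1 - alpha)
    return z, critical, z > critical


random.seed(1)
# Hypothetical sample: 100 lognormal intervals with a median of 0.35 hr
sample = [random.lognormvariate(math.log(0.35), 0.4) for _ in range(100)]
z, critical, reject = one_sided_log_test(sample, scheduled_hr=0.25)
print(f"z = {z:.2f}, critical = {critical:.3f}, reject H0: {reject}")
```

For small samples, a t critical value with n - 1 degrees of freedom would be the more cautious choice.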

Don't be concerned with the idea that the initial distribution might have been exponential. If it was, your calculated standard deviation should be close to your calculated mean, since both equal the reciprocal of the rate parameter for an exponential distribution. Note, I don't fully understand your problem, so I sort of invented one that seems to fit your concerns.
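One quick sanity check for the exponential assumption: an exponential distribution with rate λ has mean and standard deviation both equal to 1/λ, so the ratio of sample standard deviation to sample mean (the coefficient of variation) should be near 1. The sketch below illustrates this on simulated data; the rate of 4 buses per hour is an assumption chosen only for the example, and a ratio near 1 is a necessary sign, not proof, of an exponential fit.

```python
import random
import statistics


def coefficient_of_variation(intervals):
    """Sample standard deviation divided by sample mean."""
    return statistics.stdev(intervals) / statistics.mean(intervals)


random.seed(0)
rate = 4.0  # hypothetical: 4 buses per hour, i.e. a mean gap of 0.25 hr
simulated = [random.expovariate(rate) for _ in range(5000)]
cv = coefficient_of_variation(simulated)
print(f"mean = {statistics.mean(simulated):.3f} hr, CV = {cv:.3f}")
```

A coefficient of variation well below 1 on real data would suggest arrivals that are more regular than a Poisson process, which is plausible for buses running to a timetable.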

EDIT: If this is to be your hypothesis, the sample size is a secondary consideration. 100 arrival time intervals are more than adequate. There should be at least 30 for a one-sided test at alpha = 0.05. The key thing is to frame your problem so you can do hypothesis testing.
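If the goal is instead to estimate the mean interval to within a chosen margin of error, a common textbook approximation is n = (z·s/E)², where s is a standard deviation from a pilot sample and E is the acceptable margin. The sketch below assumes this normal approximation; the function name and the pilot values (s = 5 minutes, E = 1 minute) are hypothetical and not taken from the thread.

```python
import math
import statistics


def required_sample_size(sd, margin, confidence=0.95):
    """Smallest n so that the confidence interval half-width is <= margin.

    Uses the normal approximation n = (z * sd / margin)^2.
    """
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * sd / margin) ** 2)


# Hypothetical pilot estimate: sd of 5 minutes, target margin of 1 minute
print(required_sample_size(sd=5, margin=1))  # 97
print(required_sample_size(sd=5, margin=2))  # 25
```

Halving the margin roughly quadruples the required number of observations, which is why the precision target matters more than any fixed rule of thumb.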

Last edited: Jan 13, 2012
5. Jan 18, 2012

Mark J.

I am working on a problem of modeling bus arrival data, so I gathered data from different bus stops. I computed general descriptive statistics and ran some tests in Minitab, but a different distribution fits at each stop: Weibull, logistic, lognormal. After applying a Box-Cox transformation, my data fit a normal distribution.
I have the feeling that I am missing something, or maybe doing something wrong along the way.
Some advice would be appreciated, or maybe a good book on these topics.
Regards
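For readers who want to see what the Box-Cox step does outside Minitab, here is a minimal sketch that picks the transformation parameter λ by maximizing the profile log-likelihood over a coarse grid. The data are simulated lognormal gaps (an assumption for illustration only), the function names are made up, and the grid search is a simplification of what statistical packages do. A value of λ near 0 corresponds to a plain log transform, which is consistent with lognormal data.

```python
import math
import random
import statistics


def box_cox(x, lam):
    """Box-Cox transform of a positive value x."""
    if abs(lam) < 1e-12:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def profile_log_likelihood(data, lam):
    """Profile log-likelihood of lambda under a normal model, up to a constant."""
    n = len(data)
    transformed = [box_cox(x, lam) for x in data]
    var = statistics.pvariance(transformed)  # maximum-likelihood variance
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(x) for x in data)


def best_lambda(data, grid=None):
    """Grid value of lambda with the highest profile log-likelihood."""
    if grid is None:
        grid = [i / 10 for i in range(-20, 21)]  # -2.0 to 2.0 in steps of 0.1
    return max(grid, key=lambda lam: profile_log_likelihood(data, lam))


random.seed(2)
# Hypothetical gaps in minutes, lognormal with median exp(2.5) ≈ 12 minutes
gaps = [random.lognormvariate(2.5, 0.8) for _ in range(1000)]
lam_hat = best_lambda(gaps)
print(f"estimated lambda = {lam_hat}")
```

If different stops give different λ estimates, that by itself is not a sign of an error; it may simply reflect that the interval distribution differs from stop to stop.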