Determine the required sample size of collection

Mark J. · Jan 12, 2012

Hi.
Regarding bus inter-arrival times: how do we determine how much empirical data we must collect, and what sample size is required to get a statistically significant representation?

Regards

SW VandeCarr · Jan 12, 2012

The obvious problem I have with this is that you are not describing a random process. If by "inter-arrival times" you mean the time windows where passengers can most conveniently transfer, this would involve the variability around expected arrival times. What hypothesis are you testing?

Mark J. · Jan 13, 2012

Hi, by inter-arrival times I mean the interval between the arrival of one bus and the next at the same bus stop.
My hypothesis is that it is exponential, as it is usually described in most of the literature, but strangely the distribution that fits my empirical data is log-normal.
Maybe I am doing something wrong, but I am not quite sure.

Regards

SW VandeCarr · Jan 13, 2012

You really haven't described a hypothesis. One hypothesis is that, on average, the bus is on time or early. The alternative hypothesis might be that the bus, on average, is late. This is a useful way to phrase it since it allows you to use the normal or lognormal distribution based on the Central Limit Theorem (CLT). You would calculate the sample mean and variance of the time between arrivals in the usual way. You would also specify an ideal or scheduled time interval, say 15 minutes (use 0.25 hr). If the curve is skewed to the right, as I expect if this is a realistic situation, you would do a log transform to obtain a hopefully symmetric curve for hypothesis testing. Note that the ideal (on time) value, log-transformed in the same way, would be to the left of the mean on the transformed curve. You want to evaluate the difference between the transformed ideal (on time) value and the transformed mean using the normal assumptions.
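A minimal sketch of the test described above, using only the Python standard library. The function name `log_scale_z_test` and the sample values are made up for illustration, and the normal approximation is assumed to be adequate. Note that on the log scale this compares the geometric mean of the intervals with the scheduled value, not the arithmetic mean.

```python
import math
from statistics import NormalDist, mean, stdev


def log_scale_z_test(intervals_hr, scheduled_hr):
    """One-sided test on the log scale.

    H0: mean of log(interval) <= log(scheduled)  (on time or early)
    H1: mean of log(interval) >  log(scheduled)  (late)
    Returns the z statistic and the one-sided p-value.
    """
    logs = [math.log(x) for x in intervals_hr]
    n = len(logs)
    standard_error = stdev(logs) / math.sqrt(n)
    z = (mean(logs) - math.log(scheduled_hr)) / standard_error
    p_value = 1.0 - NormalDist().cdf(z)
    return z, p_value


# Hypothetical inter-arrival times in hours (not real data).
sample = [0.22, 0.31, 0.27, 0.40, 0.25, 0.29, 0.35, 0.24, 0.33, 0.28]
z, p = log_scale_z_test(sample, scheduled_hr=0.25)
print(f"z = {z:.2f}, one-sided p = {p:.4f}")
```

With only ten made-up values a t distribution would be the more careful choice; the normal version is shown because it matches the "normal assumptions" wording above.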

Don't be concerned with the idea that the initial distribution might have been exponential. If it was, your calculated standard deviation should be close to your sample mean, since for an exponential distribution both equal the reciprocal of the rate parameter. Note, I don't fully understand your problem, so I sort of invented one that seems to fit your concerns.
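As a quick check on the exponential idea: for an exponential distribution the standard deviation equals the mean, so the coefficient of variation (standard deviation divided by mean) of the intervals should be near 1. The sketch below uses simulated data as a stand-in for real measurements; the helper name, the seed, and the rate of 4 buses per hour are assumptions chosen for illustration.

```python
import random
from statistics import mean, stdev


def coefficient_of_variation(intervals):
    """Sample standard deviation divided by sample mean."""
    return stdev(intervals) / mean(intervals)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated exponential intervals, rate 4 per hour (mean 0.25 hr).
exponential_like = [rng.expovariate(4.0) for _ in range(5000)]

# Simulated lognormal intervals with a modest spread on the log scale.
lognormal_like = [rng.lognormvariate(-1.4, 0.3) for _ in range(5000)]

cv_exp = coefficient_of_variation(exponential_like)
cv_logn = coefficient_of_variation(lognormal_like)
print(f"exponential-like CV: {cv_exp:.2f}")
print(f"lognormal-like CV:   {cv_logn:.2f}")
```

A coefficient of variation well below 1 in the observed data would suggest the arrivals are more regular than an exponential model implies, which is plausible for buses running to a timetable.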

EDIT: If this is to be your hypothesis, the sample size is a secondary consideration. 100 arrival time intervals is more than adequate. As a rule of thumb, there should be at least 30 for a one-sided test at alpha = 0.05. The key thing is to frame your problem so you can do hypothesis testing.
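If you would rather compute a sample size than rely on a rule of thumb, one common approach (not the only one) is to choose the margin of error you can accept for the mean interval and use n = (z·σ/E)², where σ is a standard deviation estimated from a pilot sample. The function name and the numbers below are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Smallest n so a two-sided confidence interval for the mean
    has half-width at most `margin`, assuming a known sigma and an
    approximately normal sample mean."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return math.ceil((z * sigma / margin) ** 2)


# Hypothetical pilot estimate: sigma = 5 minutes, target margin = 1 minute.
n_required = sample_size_for_mean(sigma=5.0, margin=1.0)
print(n_required)  # 97
```

Halving the margin roughly quadruples the required number of intervals, so it is worth deciding on the precision you actually need before collecting more data.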

Mark J. · Jan 18, 2012

I am working on a problem of modeling bus arrival data, so I gathered data from different bus stops. I computed general descriptive statistics and ran some tests in Minitab, but a different distribution fits at each stop, such as Weibull, logistic, or lognormal. After applying a Box-Cox transformation, my data fit a normal distribution.
I have the feeling that I am missing something, or maybe doing something wrong along the way.
Any advice would be appreciated, or maybe a recommendation for a good book on this topic.
Regards
