Determine the required sample size of collection

  • Thread starter: Mark J.
  • Tags: Sample size
Mark J.
Hi.
Regarding bus inter-arrival times: how do we determine how much empirical data we must collect, and how can we determine the required sample size from this collection in order to get a statistically significant representation?

Regards
 
The obvious problem I have with this is that you are not describing a random process. If by "inter-arrival times" you mean the time windows where passengers can most conveniently transfer, this would involve the variability around expected arrival times. What hypothesis are you testing?
 
Hi, by inter-arrival times I mean the interval between the arrivals of the first and second bus at the same bus stop.
My hypothesis is that it is exponential, as it is usually described in most of the literature, but strangely the distribution that fits the empirical data I collected is log-normal.
Maybe I am doing something wrong, but I am not quite sure.

Regards
 
You really haven't described a hypothesis. One hypothesis is that on average, the bus is on time or early. The alternative hypothesis might be that the bus on average is late. This is a useful way to phrase it, since the Central Limit Theorem (CLT) then lets you treat the sample mean (of the raw or the log-transformed data) as approximately normal. You would calculate the sample mean and variance of the time between arrivals in the usual way. You would also specify an ideal or scheduled time interval, say 15 minutes (use 0.25 hr). If the curve is skewed to the right, as I expect if this is a realistic situation, you would do a log transform to obtain a hopefully symmetric curve for hypothesis testing. Note that if the buses tend to run late, the transformed ideal (on-time) value would be to the left of the mean on the transformed curve. You want to evaluate the difference between the transformed ideal (on-time) value and the transformed mean using the normal assumptions.
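A minimal sketch of the test described above, assuming the intervals are recorded in hours and that a normal approximation is acceptable for the mean of the log-transformed data. The function name `one_sided_log_test` and the sample values are made up for illustration. Note that a mean on the log scale corresponds to the geometric mean of the raw intervals, not the arithmetic mean.

```python
import math
from statistics import NormalDist, mean, stdev

def one_sided_log_test(intervals_hr, scheduled_hr=0.25, alpha=0.05):
    """One-sided z-test on log-transformed intervals.

    H0: the mean log interval is <= log(scheduled_hr) (on time or early).
    H1: the mean log interval is >  log(scheduled_hr) (late).
    Returns (z, p_value, reject_h0).
    """
    logs = [math.log(x) for x in intervals_hr]
    n = len(logs)
    # Standard error of the mean on the log scale.
    se = stdev(logs) / math.sqrt(n)
    z = (mean(logs) - math.log(scheduled_hr)) / se
    # Upper-tail probability under the standard normal.
    p_value = 1.0 - NormalDist().cdf(z)
    return z, p_value, p_value < alpha

# Hypothetical observations (hours between consecutive buses), illustration only.
observed = [0.2, 0.3, 0.4, 0.35, 0.25, 0.5] * 10
z, p, reject = one_sided_log_test(observed, scheduled_hr=0.25)
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")
```

With fewer than about 30 intervals, a t distribution would be the more cautious choice than the normal approximation used here.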

Don't be concerned with the idea that the initial distribution might have been exponential. If it was, your calculated standard deviation should be close to your sample mean, since for an exponential distribution both equal the reciprocal of the rate parameter. Note that I don't fully understand your problem, so I more or less invented one that seems to fit your concerns.
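As a quick sanity check on the exponential assumption: for an exponential distribution the mean and the standard deviation are both 1/rate, so the coefficient of variation (standard deviation divided by mean) should be close to 1. The sketch below uses simulated data with an assumed rate of 4 buses per hour; the helper name `coefficient_of_variation` is just for illustration.

```python
import random
from statistics import mean, stdev

def coefficient_of_variation(intervals):
    """Sample standard deviation divided by sample mean."""
    return stdev(intervals) / mean(intervals)

# Simulated inter-arrival times, assuming 4 buses per hour (hypothetical rate).
random.seed(1)
rate = 4.0
sample = [random.expovariate(rate) for _ in range(5000)]

cv = coefficient_of_variation(sample)
print(f"mean = {mean(sample):.3f} hr, sd = {stdev(sample):.3f} hr, CV = {cv:.3f}")
```

A CV well below 1 on real data suggests the intervals are more regular than an exponential model allows, which is common for scheduled services.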

EDIT: If this is to be your hypothesis, the sample size is a secondary consideration. One hundred arrival-time intervals are more than adequate; there should be at least 30 for a one-sided test at alpha = 0.05. The key thing is to frame your problem so that you can do hypothesis testing.
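If the goal is instead to estimate the mean interval to within a chosen margin of error, the standard formula n = (z * sd / margin)^2 gives a rough sample size. This is a sketch under the assumptions that a pilot sample supplies the standard deviation and that the normal approximation holds; `required_sample_size` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def required_sample_size(pilot_sd, margin, confidence=0.95):
    """Smallest n so that a two-sided confidence interval for the mean
    has half-width <= margin, using the normal approximation."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.ceil((z * pilot_sd / margin) ** 2)

# Pilot standard deviation of 5 minutes, desired margin of 1 minute.
print(required_sample_size(pilot_sd=5.0, margin=1.0))  # prints 97
```

Halving the margin roughly quadruples the required number of observations, so the pilot standard deviation matters more than the choice between 95% and 99% confidence.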
 
I am working on a problem of modeling bus arrival data, so I gathered data from different bus stops. I computed general descriptive statistics and ran some tests in Minitab, but a different distribution fits at every stop: Weibull, logistic, log-normal. After applying a Box-Cox transformation, my data fit a normal distribution.
I have the feeling that I am missing something, or maybe doing something wrong along the way.
Some advice would be appreciated, or maybe a good book on this topic.
Regards
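One way to compare candidate distributions at a given stop on a common footing is to fit each by maximum likelihood and compare AIC values (lower is better). The sketch below does this for the exponential and log-normal cases only, on simulated data; the function names and the simulated parameters are assumptions for illustration, not the Minitab procedure used above.

```python
import math
import random

def exponential_loglik(x):
    """Maximized log-likelihood of an exponential fit (rate = 1 / mean)."""
    n = len(x)
    rate = n / sum(x)
    return n * math.log(rate) - rate * sum(x)

def lognormal_loglik(x):
    """Maximized log-likelihood of a log-normal fit (MLE of mu and sigma^2)."""
    n = len(x)
    logs = [math.log(v) for v in x]
    mu = sum(logs) / n
    var = sum((v - mu) ** 2 for v in logs) / n
    return -sum(logs) - 0.5 * n * math.log(2 * math.pi * var) - 0.5 * n

def aic(loglik, n_params):
    """Akaike information criterion."""
    return 2 * n_params - 2 * loglik

def compare_fits(x):
    """AIC for each candidate model; the smaller value indicates the better fit."""
    return {
        "exponential": aic(exponential_loglik(x), 1),
        "lognormal": aic(lognormal_loglik(x), 2),
    }

# Simulated intervals (hours) with a log-normal shape, illustration only.
random.seed(0)
simulated = [random.lognormvariate(math.log(0.25), 0.3) for _ in range(500)]
print(compare_fits(simulated))
```

If different stops favor different families by only a small AIC margin, the data may not distinguish between them, and a single flexible family could be a reasonable modeling choice.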
 