Determining the required sample size for data collection

  • Context: Undergrad 
  • Thread starter: Mark J.
  • Tags: Sample size

Discussion Overview

The discussion revolves around determining the required sample size for collecting empirical data on bus inter-arrival times, with a focus on achieving a statistically significant representation. Participants explore the nature of the data, potential distributions, and hypotheses related to bus arrival patterns.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant questions the randomness of the inter-arrival times and asks for clarification on the hypothesis being tested.
  • Another participant defines inter-arrival times as the interval between the first and second bus arrivals and proposes that the distribution should be exponential, although empirical data suggests a log-normal fit.
  • A different participant suggests framing the problem with alternative hypotheses regarding bus punctuality and discusses the use of normal or log-normal distributions based on the Central Limit Theorem (CLT).
  • One reply states that a sample of 100 inter-arrival intervals is more than adequate, and that at least 30 are needed for a one-sided test at alpha = 0.05.
  • One participant shares their experience of gathering data from various bus stops, noting different fitting distributions and expressing uncertainty about their analysis process, including the use of Box-Cox transformation to achieve normality.

Areas of Agreement / Disagreement

Participants express differing views on the appropriate distribution for bus inter-arrival times, with some supporting the exponential hypothesis and others noting the log-normal fit. There is no consensus on the best approach to framing the hypothesis or determining the sample size needed for statistical significance.

Contextual Notes

Participants highlight potential limitations in their understanding of the data and the hypotheses being tested, as well as the variability in fitting distributions across different bus stops. There are unresolved questions regarding the appropriateness of the transformations applied to the data.

Who May Find This Useful

This discussion may be useful for researchers or practitioners involved in transportation modeling, statistical analysis of arrival times, or those interested in hypothesis testing methodologies.

Mark J.
Hi.
Regarding bus inter-arrival times: how do we determine how much empirical data we must collect, and how can we determine the required sample size in order to get a statistically significant representation?

Regards
 
Mark J. said:
Hi.
Regarding bus inter-arrival times: how do we determine how much empirical data we must collect, and how can we determine the required sample size in order to get a statistically significant representation?

Regards

The obvious problem I have with this is that you are not describing a random process. If by "inter-arrival times" you mean the time windows where passengers can most conveniently transfer, this would involve the variability around expected arrival times. What hypothesis are you testing?
 
Hi, by inter-arrival times I mean the interval between the arrival of the first and the second bus at the same bus stop.
My hypothesis is that the distribution is exponential, as it is usually described in most of the literature, but strangely the distribution that fits the empirical data I collected is log-normal.
Maybe I am doing something wrong, but I am not quite sure.

Regards
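
One way to make the "exponential or log-normal?" question concrete is to fit both by maximum likelihood and compare their AIC values. The sketch below is only an illustration under stated assumptions: the data are simulated rather than real bus observations, the function names are hypothetical, and AIC is one of several reasonable comparison criteria (a goodness-of-fit test such as the one Minitab reports is another).

```python
import math
import random
import statistics

def exponential_aic(intervals):
    """AIC of an exponential fit; the MLE of the rate is 1 / sample mean."""
    n = len(intervals)
    rate = 1 / statistics.mean(intervals)
    log_lik = n * math.log(rate) - rate * sum(intervals)
    return 2 * 1 - 2 * log_lik  # one fitted parameter

def lognormal_aic(intervals):
    """AIC of a log-normal fit; the MLEs are the mean and variance of log(x)."""
    logs = [math.log(x) for x in intervals]
    n = len(logs)
    mu = statistics.mean(logs)
    var = statistics.pvariance(logs, mu)
    log_lik = -sum(logs) - 0.5 * n * math.log(2 * math.pi * var) - 0.5 * n
    return 2 * 2 - 2 * log_lik  # two fitted parameters

# Hypothetical data: simulated gaps in minutes, not real observations.
rng = random.Random(1)
regular_gaps = [rng.lognormvariate(math.log(15), 0.3) for _ in range(2000)]
random_gaps = [rng.expovariate(1 / 15) for _ in range(2000)]

for name, gaps in [("scheduled-like", regular_gaps), ("memoryless", random_gaps)]:
    ln_aic, exp_aic = lognormal_aic(gaps), exponential_aic(gaps)
    better = "log-normal" if ln_aic < exp_aic else "exponential"
    print(f"{name}: exponential AIC={exp_aic:.1f}, log-normal AIC={ln_aic:.1f}, lower: {better}")
```

Buses that run to a timetable rarely produce very short gaps, which is one plausible reason an exponential model (whose density is highest at zero) can lose out to a log-normal one on real data.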
 
Mark J. said:
Hi, by inter-arrival times I mean the interval between the arrival of the first and the second bus at the same bus stop.
My hypothesis is that the distribution is exponential, as it is usually described in most of the literature, but strangely the distribution that fits the empirical data I collected is log-normal.
Maybe I am doing something wrong, but I am not quite sure.

Regards

You really haven't described a hypothesis. One hypothesis is that, on average, the bus is on time or early. The alternative hypothesis might be that the bus is, on average, late. This is a useful way to phrase it, since it allows you to use the normal or lognormal distribution based on the Central Limit Theorem (CLT). You would calculate the sample mean and variance of the time between arrivals in the usual way. You would also specify an ideal or scheduled time interval, say 15 minutes (use 0.25 hr). If the curve is skewed to the right, as I expect if this is a realistic situation, you would do a log transform to obtain a hopefully symmetric curve for hypothesis testing. Note that the ideal (on time) value, log-transformed in the same way, would be to the left of the mean on the transformed curve. You want to evaluate the difference between the transformed ideal (on time) value and the transformed mean using the normal assumptions.
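
The test described above could be sketched as follows. This is a minimal illustration, not a definitive procedure: it assumes a large-sample normal (z) approximation rather than a t-test, the function name `one_sided_log_test` and the simulated data are hypothetical, and a test on the log scale is really a statement about the median (geometric mean) interval rather than the arithmetic mean.

```python
import math
import random
import statistics
from statistics import NormalDist

def one_sided_log_test(intervals, scheduled, alpha=0.05):
    """Test H0: mean log-interval <= log(scheduled) against H1: it is larger.

    Returns (z statistic, one-sided p-value, reject H0?).
    Uses a normal approximation, so it assumes a reasonably large sample.
    """
    logs = [math.log(x) for x in intervals]
    n = len(logs)
    mean_log = statistics.mean(logs)
    std_err = statistics.stdev(logs) / math.sqrt(n)
    z = (mean_log - math.log(scheduled)) / std_err
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value, p_value < alpha

# Hypothetical data: 100 simulated intervals in hours for a service that
# tends to run behind a 0.25 hr schedule.
rng = random.Random(0)
sample = [rng.lognormvariate(math.log(0.35), 0.4) for _ in range(100)]

z, p_value, reject = one_sided_log_test(sample, scheduled=0.25)
print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject}")
```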

Don't be concerned with the idea that the initial distribution might have been exponential. If it was, your calculated standard deviation should be close to your sample mean, since both estimate the reciprocal of the rate parameter. Note, I don't fully understand your problem, so I sort of invented one that seems to fit your concerns.
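
For an exponential distribution with rate λ, the mean and the standard deviation are both 1/λ, so the ratio of sample standard deviation to sample mean (the coefficient of variation) should be near 1. A quick check along those lines, using simulated data and a hypothetical helper name:

```python
import random
import statistics

def coefficient_of_variation(intervals):
    """Sample standard deviation divided by sample mean."""
    return statistics.stdev(intervals) / statistics.mean(intervals)

# Hypothetical data: simulated exponential gaps with a mean of 15 minutes.
rng = random.Random(0)
simulated = [rng.expovariate(1 / 15) for _ in range(5000)]

cv = coefficient_of_variation(simulated)
print(f"mean = {statistics.mean(simulated):.2f}, "
      f"sd = {statistics.stdev(simulated):.2f}, cv = {cv:.2f}")
```

A coefficient of variation well below 1 on real data suggests the gaps are more regular than a memoryless process would produce.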

EDIT: If this is to be your hypothesis, the sample size is a secondary consideration. 100 arrival time intervals is more than adequate. There should be at least 30 for a one-sided test at alpha = 0.05. The key thing is to frame your problem so you can do hypothesis testing.
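
If a sample size does need to be justified, the usual normal-approximation formula for a one-sided test is n = ((z_alpha + z_beta) * sigma / delta)^2. The sketch below assumes that formula applies; `sigma` (standard deviation on the analysis scale), `delta` (smallest shift worth detecting) and the 80% power target are inputs you would have to choose, and the numbers used here are purely illustrative.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, delta, alpha=0.05, power=0.80):
    """Observations needed for a one-sided z-test to detect a shift of delta."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)       # quantile for the target power
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

# Illustrative inputs: log-scale standard deviation 0.4, detectable shift 0.1.
n_needed = required_sample_size(sigma=0.4, delta=0.1)
print(n_needed)
```

Halving `delta` roughly quadruples the required sample, which is why the size of the effect you care about matters more than any fixed rule of thumb.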
 
I am working on a problem of modeling bus arrival data, so I gathered data from different bus stops. I computed the general descriptive statistics and ran some tests in Minitab, but a different distribution fits at every stop: Weibull, logistic, log-normal. After applying a Box-Cox transformation, my data fit a normal distribution.
I have the feeling that I am missing something, or maybe doing something wrong along the way.
Any advice would be appreciated, or maybe a recommendation for a good book on these topics.
Regards
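
For reference, the Box-Cox transform is (x^λ - 1)/λ for λ ≠ 0 and log(x) for λ = 0, with λ usually chosen to maximise a normal profile log-likelihood. The sketch below is a plain grid search written for illustration, not Minitab's implementation; the data are simulated and the function names are hypothetical. An estimated λ near 0 means the transform is essentially a log, which would be consistent with a log-normal fit.

```python
import math
import random
import statistics

def box_cox(values, lam):
    """Box-Cox transform: log(x) when lam is 0, else (x**lam - 1) / lam."""
    if abs(lam) < 1e-12:
        return [math.log(x) for x in values]
    return [(x ** lam - 1) / lam for x in values]

def box_cox_log_likelihood(values, lam):
    """Profile log-likelihood of lam, assuming transformed data are normal."""
    n = len(values)
    variance = statistics.pvariance(box_cox(values, lam))
    log_sum = sum(math.log(x) for x in values)
    return -0.5 * n * math.log(variance) + (lam - 1) * log_sum

def best_lambda(values, grid=None):
    """Grid search for the lam that maximises the profile log-likelihood."""
    if grid is None:
        grid = [i / 20 for i in range(-40, 41)]  # -2.0 to 2.0, step 0.05
    return max(grid, key=lambda lam: box_cox_log_likelihood(values, lam))

# Hypothetical data: simulated log-normal gaps in minutes.
rng = random.Random(2)
gaps = [rng.lognormvariate(math.log(15), 0.5) for _ in range(2000)]

lam_hat = best_lambda(gaps)
print(f"estimated lambda: {lam_hat:.2f}")
```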
 
