Modelling Taxy Duration Distribution & Deriving Probability of Occurrence

iambasil · Sep 20, 2005

Hiya,

I have been looking at a problem, and just can’t get my head round the right way to tackle it.

I have a whole heap of data for an airline showing the taxy duration of each of their flights (this is the time taken to go from the gate to actual lift off). As I see it, there are 3 variables that can affect this time:
1. Flight number (this takes account of both departure station and time of day)
2. Aircraft type
3. Month of year

Now, having had a look at the data, it does not follow a simple normal or chi-squared distribution – it actually varies. Some airports have two runways – one miles from the terminal and the other right next to it – hence you get two clusters.

What I need to have as an output (monthly) is:
Flight number: (set by user)
Significance: (set by user) – probability that a taxy duration will fall within the output taxy duration (below)
Taxy duration: (derived)
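
For illustration only, one simple way to picture that output is an empirical quantile: sort the observed durations for a flight number and read off the value that the chosen fraction of them fall at or below. This is a minimal sketch, not a method proposed in the thread; the function name `taxy_time_for_probability` and the sample durations are hypothetical.

```python
import math


def taxy_time_for_probability(durations, probability):
    """Return the smallest observed duration d such that at least
    `probability` of the observations are <= d (an empirical quantile)."""
    if not durations:
        raise ValueError("need at least one observation")
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    ordered = sorted(durations)
    # k-th smallest value, with k = ceil(probability * n)
    k = math.ceil(probability * len(ordered))
    return ordered[k - 1]


# Hypothetical taxy durations (minutes) for one flight number in one month,
# with two clusters such as two runways might produce.
sample = [8, 9, 9, 10, 10, 11, 21, 22, 23, 24]
print(taxy_time_for_probability(sample, 0.5))
```

Because it makes no assumption about the shape of the distribution, a quantile of this kind still behaves sensibly when the data fall into two clusters.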

What I have is years' worth of data. The data obviously changes year to year, so whilst I want the month of year variable considered, it must also be in relation to how the previous month compared to the previous month a year ago (or some other way, e.g. deviation of month from annual mean?).

From a statistical perspective, how would I assess what type of distribution the data falls into, and how would I handle the sample to be used (sample size/the different variables)?

Would really appreciate your help.

The purpose of analysis is for improved aircraft fuel management – currently a simple average taxy time of the previous month (by flight number) is used.

Thanks,

Basil

EnumaElish · Sep 21, 2005

What you want to do is to run a regression with taxiing duration on the left hand side and all other variables that may affect that duration on the right hand side. Since you don't really care about a "theory" that explains that relationship (am I correct that what you are really looking for is an empirical relationship?) I would start with "the kitchen sink" on the right hand side. That is, put in every conceivable variable on the right hand side and maximize the R-squared, and/or the total F statistic for overall significance; i.e., maximize "equation significance" as opposed to "individual variable significance" (measured by individual t-statistic values). And, you want to start with the Ordinary Least Squares regression technique -- you can always refine it (graduate to more complex techniques if needed) further down the road.

iambasil · Sep 22, 2005

Thank you so much for the response.

Sorry for being naive on this, but I don't quite understand what you mean by 'on the right' - are there any examples on the web of something similar being done?

Basil

EnumaElish · Sep 22, 2005

Check this out. In a simple (single x) regression equation y = a + b x + u, y is the dependent or explained variable "on the left," and x is the independent or explanatory variable "on the right."

"a" and "b" are model coefficients (parameters) to be estimated; "u" is the error term.

A multiple regression equation has multiple variables on the right: y = a + b₁x₁ + ... + b_kx_k + u.

For example, x₁ may be departure station, x₂ may be time of day, x₃ may be type of equipment, somehow numericized (e.g. large = 1, small = 0; x₄ might then be coded jet = 1, prop = 0, etc.), x₅ may be month of the year, and x₆ might be "the number of pizza slices served in the airport pizzeria during the flight day."
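
As a rough sketch of how such a dummy-coded multiple regression could be fitted by Ordinary Least Squares, the snippet below solves the normal equations in plain Python. The function name `ols_fit`, the two 0/1 variables and the durations are all hypothetical; in practice a statistics package would normally do this step.

```python
def ols_fit(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    Each entry of `rows` lists the right-hand-side variables for one flight;
    an intercept column is added here. Returns [a, b1, ..., bk]."""
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("right-hand-side variables are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical data: (large = 1 / small = 0, jet = 1 / prop = 0) per flight,
# with the observed taxy duration in minutes on the left-hand side.
rows = [(0, 0), (1, 0), (0, 1), (1, 1)]
durations = [10, 15, 12, 17]
a, b_large, b_jet = ols_fit(rows, durations)
print(a, b_large, b_jet)

# Predicted duration for a large jet: a + b_large * 1 + b_jet * 1
print(a + b_large + b_jet)
```

Each coefficient on a 0/1 variable reads as the estimated change in taxy duration, in minutes, when that variable switches from 0 to 1 with the others held fixed.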
