I have been looking at a problem, and just can’t get my head round the right way to tackle it.

I have a whole heap of data for an airline showing the taxi duration of each of their flights (this is the time taken to go from the gate to actual lift-off). As I see it, there are three variables that can affect this time:

1. Flight number (this takes account of both departure station and time of day)

2. Aircraft type

3. Month of year

Now, having had a look at the data, I can see that it does not follow a simple normal or chi-squared distribution – the shape varies from airport to airport. Some airports have two runways – one miles from the terminal and the other right next to it – so the taxi times fall into two clusters.

What I need to have as an output (monthly) is:

Flight number: (set by user)

Significance: (set by user) – probability that a taxi duration will fall within the output taxi duration (below)

Taxi duration: (derived)
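
To make the required output concrete, here is a rough Python sketch of what I think I am after: an empirical quantile, i.e. the smallest observed taxi duration that covers the chosen probability. It makes no assumption about the shape of the distribution, so two clusters are not a problem. The function name and the sample figures are made up for illustration; this is not how the airline currently does it.

```python
import math

def taxi_time_for_probability(durations, probability):
    """Return the smallest observed duration d such that at least
    `probability` of the observations are <= d (an empirical quantile)."""
    if not durations:
        raise ValueError("no observations")
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    ordered = sorted(durations)
    # k-th smallest value, where k = ceil(probability * n)
    k = math.ceil(probability * len(ordered))
    return ordered[k - 1]

# Made-up taxi times (minutes) for one flight number in one month:
# one cluster for the near runway, one for the far runway.
sample = [7, 8, 8, 9, 9, 10, 19, 20, 21, 22]
print(taxi_time_for_probability(sample, 0.5))
print(taxi_time_for_probability(sample, 0.9))
```

With data in two clusters like this, a small change in the chosen probability can move the answer from one cluster to the other, which a simple average would hide.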

What I have is years' worth of data. The data obviously changes from year to year, so whilst I want the month-of-year variable considered, it must also take account of how the previous month compared to the same month a year earlier (or be handled some other way, e.g. deviation of the month from the annual mean?).
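
One possible reading of that year-on-year idea, sketched in Python: take the target month from last year and scale it by how the most recent month compares with the same month a year earlier. This is only my assumption about how the adjustment might work, and the function names and figures are invented for illustration.

```python
from statistics import mean

def year_on_year_ratio(recent_month, same_month_year_before):
    """Ratio of the mean taxi time in the most recent month to the mean
    taxi time in the same calendar month one year earlier."""
    return mean(recent_month) / mean(same_month_year_before)

def adjusted_estimate(target_month_last_year, recent_month, same_month_year_before):
    """Scale last year's mean for the target month by the year-on-year ratio."""
    ratio = year_on_year_ratio(recent_month, same_month_year_before)
    return mean(target_month_last_year) * ratio

# Made-up taxi times in minutes.
feb_last_year = [10, 10]
feb_this_year = [12, 13]
mar_last_year = [20, 20]
print(adjusted_estimate(mar_last_year, feb_this_year, feb_last_year))
```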

From a statistical perspective, how would I assess what type of distribution the data falls into, and how would I handle the sample to be used (sample size/the different variables)?

Would really appreciate your help.

The purpose of the analysis is improved aircraft fuel management – currently a simple average of the previous month's taxi times (by flight number) is used.
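
For reference, the current method amounts to something like the following Python sketch. The record layout (flight number, year, month, minutes) and the flight numbers are hypothetical, chosen only to show the calculation.

```python
from collections import defaultdict
from statistics import mean

def previous_month_average(records, year, month):
    """Mean taxi time per flight number for the month before (year, month).

    `records` is an iterable of (flight_number, year, month, minutes) tuples.
    """
    # Step back one month, wrapping January back to December of the prior year.
    prev_year, prev_month = (year - 1, 12) if month == 1 else (year, month - 1)
    by_flight = defaultdict(list)
    for flight, rec_year, rec_month, minutes in records:
        if (rec_year, rec_month) == (prev_year, prev_month):
            by_flight[flight].append(minutes)
    return {flight: mean(values) for flight, values in by_flight.items()}

# Made-up records: (flight number, year, month, taxi minutes)
records = [
    ("XX101", 2023, 12, 10),
    ("XX101", 2023, 12, 20),
    ("XX202", 2023, 12, 8),
    ("XX101", 2024, 1, 30),
]
print(previous_month_average(records, 2024, 1))
```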

Thanks,

Basil