Modelling Taxy Duration Distribution & Deriving Probability of Occurrence

In summary: in a regression analysis, the dependent variable (taxy duration) goes on the left-hand side and the variables used to explain it go on the right. Estimating the regression equation gives the model coefficients (parameters). A multiple regression handles several explanatory variables in a single equation.
  • #1
iambasil
Hiya,

I have been looking at a problem, and just can’t get my head round the right way to tackle it.

I have a whole heap of data for an airline showing the taxy duration of each of their flights (this is the time taken to go from the gate to actual lift off). As I see it, there are 3 variables that can affect this time:
1. Flight number (this takes account of both departure station and time of day)
2. Aircraft type
3. Month of year

Now, having had a look at the data, it does not follow a simple normal or chi-squared distribution – the shape actually varies. Some airports have two runways – one miles from the terminal and the other right next to it – hence you get two clusters.

What I need to have as an output (monthly) is:
Flight number: (set by user)
Significance: (set by user) – probability that a taxy duration will fall within the output taxy duration (below)
Taxy duration: (derived)
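
One way to read the requested output is as an empirical quantile: for a given flight number, the smallest past taxy duration that covers the chosen share of observations. The sketch below is only an illustration under that assumption. It is not a method proposed in the thread, and the function name, the flight numbers, and the durations are all hypothetical. It makes no assumption about the shape of the distribution, so two clusters are not a problem.

```python
import math

def taxy_time_for_coverage(records, flight_number, coverage):
    """Return the smallest observed duration t such that at least
    `coverage` of the flight's past taxy durations are <= t."""
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    durations = sorted(d for f, d in records if f == flight_number)
    if not durations:
        raise ValueError(f"no data for flight {flight_number}")
    # Number of observations that must lie at or below the answer
    k = math.ceil(coverage * len(durations))
    return durations[k - 1]

# Hypothetical records: (flight number, taxy duration in minutes).
# XX101 has two clusters, as with a near and a far runway.
records = [
    ("XX101", 9), ("XX101", 10), ("XX101", 11), ("XX101", 10), ("XX101", 24),
    ("XX101", 25), ("XX101", 9), ("XX101", 26), ("XX101", 11), ("XX101", 10),
    ("XX202", 7), ("XX202", 8),
]

print(taxy_time_for_coverage(records, "XX101", 0.9))  # prints 25
```

With a coverage of 0.5 the same call returns 10, which shows how far a mid-range figure can sit below the 90% figure when the data has two clusters.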

What I have is years' worth of data. The data obviously changes year to year, so whilst I want the month of year variable considered, it must also be in relation to how the previous month compared to the same month a year ago (or some other way, e.g. deviation of month from annual mean?).

From a statistical perspective, how would I assess what type of distribution the data falls into, and how would I handle the sample to be used (sample size/the different variables)?

Would really appreciate your help.

The purpose of the analysis is improved aircraft fuel management – currently a simple average taxy time of the previous month (by flight number) is used.

Thanks,

Basil
 
  • #2
What you want to do is run a regression with taxiing duration on the left-hand side and all other variables that may affect that duration on the right-hand side. Since you don't really care about a "theory" that explains that relationship (am I correct that what you are really looking for is an empirical relationship?), I would start with "the kitchen sink" on the right-hand side. That is, put in every conceivable variable on the right-hand side and maximize the R-squared and/or the total F statistic for overall significance; i.e., maximize "equation significance" as opposed to "individual variable significance" (measured by individual t-statistic values). And you want to start with the Ordinary Least Squares regression technique; you can always refine it (graduate to more complex techniques if needed) further down the road.
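
For reference, the two "equation significance" measures mentioned above have standard textbook definitions. The symbols here are assumptions introduced for illustration: n is the number of flights in the sample, k the number of right-hand-side variables (excluding the intercept), y_i the observed taxy duration, \hat{y}_i the fitted value, and \bar{y} the sample mean.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}
```

Note that R-squared never decreases when a variable is added, whereas the F statistic divides by k and so can fall when a variable adds little.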
 
  • #3
Thank you so much for the response.

Sorry for being naive on this, but I don't quite understand what you mean by 'on the right' - are there any examples on the web of something similar being done?

Basil
 
  • #4
Check this out. In a simple (single x) regression equation y = a + b x + u, y is the dependent or explained variable "on the left," and x is the independent or explanatory variable "on the right."

"a" and "b" are model coefficients (parameters) to be estimated; "u" is the error term.

A multiple regression equation has multiple variables on the right: y = a + b1x1 + ... + bkxk + u.

For example, x1 may be departure station, x2 may be time of day, x3 may be type of equipment (somehow numericized, e.g. large = 1, small = 0), x4 might then be coded jet = 1, prop = 0, etc., x5 may be month of the year, and x6 might be "the number of pizza slices served in the airport pizzeria during the flight day." :smile:
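
The 0/1 coding described above can be sketched in a few lines. This is a minimal illustration, not a full analysis: it uses only two coded variables (large and jet), the eight data rows are made up, and the helper names `ols_fit` and `r_squared` are hypothetical. It solves the normal equations directly so that it runs with no third-party libraries; in practice a statistics package would normally be used.

```python
def ols_fit(X, y):
    """Ordinary Least Squares via the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    n, k = len(X), len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular (collinear regressors)")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def r_squared(X, y, coef):
    """Share of the variation in y explained by the fitted equation."""
    fitted = [sum(c * x for c, x in zip(coef, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Made-up rows: (large = 1 / small = 0, jet = 1 / prop = 0, taxy minutes)
rows = [(0, 0, 7), (0, 0, 9), (1, 0, 10), (1, 0, 12),
        (0, 1, 9), (0, 1, 11), (1, 1, 12), (1, 1, 14)]
X = [[1.0, large, jet] for large, jet, _ in rows]  # leading 1.0 is the intercept "a"
y = [minutes for _, _, minutes in rows]

coef = ols_fit(X, y)       # [a, b_large, b_jet]
r2 = r_squared(X, y, coef)
print(coef, r2)
```

Each coefficient on a 0/1 variable reads as the estimated number of extra minutes associated with that category, holding the other variable fixed.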
 

1. What is the purpose of modelling taxy duration distribution?

In this thread, the purpose of modelling the taxy duration distribution is to understand how long an aircraft typically takes to go from the gate to lift-off. The stated goal is improved aircraft fuel management, replacing the simple previous-month average taxy time (by flight number) that is currently used.

2. How is taxy duration data collected for modelling?

In this thread, the data is the airline's own historical record: several years of taxy durations, one per flight. These observations form a distribution of taxy durations, which can then be modelled using statistical techniques.

3. What factors are typically considered when modelling taxy duration distribution?

The original poster names three factors: flight number (which takes account of both departure station and time of day), aircraft type, and month of year. The runway used also matters: at airports with one runway far from the terminal and another right next to it, the durations fall into two clusters.

4. How can the probability of occurrence be derived from a taxy duration distribution model?

The probability that a taxy duration falls within a given range is the area under the curve of the taxy duration distribution over that range. Conversely, for a chosen probability, the matching taxy duration is the corresponding quantile of the distribution.
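
Written out, the area-under-the-curve statement takes the following form. The symbols are assumptions introduced for illustration: T is the taxy duration, f its probability density, F its cumulative distribution function, and t_p the duration that is not exceeded with probability p.

```latex
P(a \le T \le b) = \int_a^b f(t)\,dt = F(b) - F(a),
\qquad
t_p = \inf\{\, t : F(t) \ge p \,\}
```

The second expression is the quantile: given a user-set probability p, it returns the taxy duration to report.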

5. What are some potential applications of modelling taxy duration distribution?

The application discussed in this thread is aircraft fuel management: a taxy duration tied to a user-set probability could replace the simple previous-month average (by flight number) that is currently used.
