I need help with a problem

1. Jul 21, 2011

thetexan

This isn't homework, but it is a problem I don't have the training to solve.

I'm an air traffic controller. We have a database of 5840 hours of traffic during a 365-day year. Each of these 5840 numbers represents the amount of traffic we worked during that hour. They range from 0 to 72 operations per hour. I have to determine a number representing the average busy hour for the year.

So I know it won't be 0 and it won't be 72, and it won't be an average of all 5840 numbers, since that would only give an average hourly count.

To give you an idea, about 40% of the traffic hours have fewer than 10 operations. Maybe 10% have over 60 and the rest are between 10 and 60. But what I need is the average busy hour. In other words, I have to first determine what would be considered a busy range, then get an average of that.

Here is the question. Won't this need some sort of statistical analysis? Bell curve and all that. Standard deviations and the rest? All of which I'm unfamiliar with. I thought that I could just arbitrarily state that 'busy' is anything between 40 and 60. Anything over that is 'heavy', so I will average the 40-60 range. But isn't there some way to make these determinations mathematically that isn't arbitrary?

I need help. Maybe you guys can give me a clue.

thanks,
tex

2. Jul 21, 2011

pmsrw3

There is no definite way of doing this. You have to define what you mean by "busy", and statistics is not going to tell you how to do it. That said, I would start by plotting a histogram. See if it looks approximately bell-shaped. If so, a reasonable definition of "busy" might be "more than 1 standard deviation above average" (but only if that turns out to be a number that a reasonable professional would call "busy"). If you have the data in Excel, use the data analysis toolkits. They can do histograms and averages and standard deviations.
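As a rough sketch of that suggestion, the Python below computes the mean and standard deviation of a list of hourly counts, treats "busy" as more than one standard deviation above the mean, and averages the hours that qualify. The `hourly_counts` values and the function name are made up for illustration, and the one-standard-deviation cutoff is only the candidate definition floated in this post, not an established standard.

```python
import statistics

def average_busy_hour(hourly_counts, n_std=1.0):
    """Return (threshold, mean of the counts strictly above the threshold)."""
    mean = statistics.fmean(hourly_counts)
    std = statistics.pstdev(hourly_counts)  # population standard deviation
    threshold = mean + n_std * std
    busy = [c for c in hourly_counts if c > threshold]
    if not busy:
        # No hour exceeds the threshold, so there is nothing to average
        return threshold, None
    return threshold, statistics.fmean(busy)

# Hypothetical hourly counts, invented for illustration only
hourly_counts = [0, 0, 2, 5, 8, 8, 12, 15, 20, 33, 41, 55, 60, 72]
threshold, busy_average = average_busy_hour(hourly_counts)
print(f"busy threshold: {threshold:.1f}")
print(f"average busy hour: {busy_average:.1f}")
```

Changing `n_std` moves the cutoff, which makes it easy to check whether the resulting number is one a working controller would actually call busy.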

3. Jul 21, 2011

gsal

maybe you can use some mundane parameters to define what 'busy' is...for example...how many controllers do you have around? how many do you use during the light hours? busy hours? heavy hours? maybe the upper range of 'busy' is defined as the maximum amount of traffic that can be handled by 3 controllers, before you have to call a 4th controller?

...I have no idea what I am talking about, though, just rambling...

4. Jul 22, 2011

Stephen Tashi

thetexan,

I agree with the other responses that you must define "busy". The way to do this is ask yourself "What am I trying to accomplish?" (which you may know, but you haven't revealed).

For example, some cases that I could make up would be:

case 1: You have been asked to design a scenario for an air traffic control simulation to be used to train new controllers. You want a scenario that represents a typical "busy" shift. You want to know how much air traffic to put in this scenario.

case 2: You are writing your resume and it will contain the sentence "On a typical busy shift, I handle X planes per hour". You want to know what number to use for X.

5. Jul 22, 2011

thetexan

This is what I'm thinking, and I'll explain why...

In electronics, the root mean square (RMS) value for a sine wave representing alternating current is 0.707 times the peak voltage. That represents the 'perceived' voltage for a given peak value. How is that determined mathematically? It isn't the 'heaviest' value but represents a type of average. Would something like this be analogous to what I need?

Here is a histogram of one month's count. It is sorted from lowest to highest values. The actual histogram is the dark left half of the image. The lighter right half is a duplicate mirror image of the left half.

This forms a curve that should have some mathematical definition. It seems to me that I'm looking for something akin to an RMS value and that it would fall somewhere in the pink region (I'm just guessing).

Am I thinking right? Is something like RMS what I'm really looking for? Is there a statistical method that can solve this or, like you say, do I need to define what is 'busy'? It seems that if I 'define' busy then it is arbitrary and unprovable. There must be some way to determine some mathematical statistical values.

I don't know. I trust you math whizzes to know for sure.

tex

Last edited: Jul 22, 2011
6. Jul 22, 2011

pmsrw3

Good thinking! Standard deviation is in fact RMS value. (Actually, it's the RMS value of the difference from the mean, but that's closely related to the overall RMS value.)
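To spell out how the two are related (a short derivation, assuming $s$ is the population standard deviation, i.e. dividing by $n$ rather than $n-1$, and $\overline{x}$ is the mean of the values $x_1, \dots, x_n$):

$$\frac{1}{n}\sum_{i=1}^{n}(x_i-\overline{x})^2 = \frac{1}{n}\sum_{i=1}^{n}x_i^2 - \overline{x}^2 \quad\Longrightarrow\quad \sqrt{\frac{1}{n}\sum_{i=1}^{n}x_i^2} = \sqrt{{\overline{x}}^2+s^2}$$

The left-hand side of the second equation is the overall RMS value, so it is always at least as large as the mean and exceeds it by more when the data are more spread out.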

Could you explain your graph? Is this really a histogram? It doesn't look like one. In a histogram of "amount of traffic", the x-axis should be amount of traffic in an hour, ranging from 0 to 72, and the y-axis should be the number of hours in which that much traffic was handled. See http://en.wikipedia.org/wiki/Histogram for some examples.

On this graph, what's the x-axis? What's the y-axis?

Last edited by a moderator: May 5, 2017
7. Jul 22, 2011

pmsrw3

Ah, I think I've figured it out. You're plotting what is known as the cumulative distribution function, or CDF, except that you have swapped the x and y axes. You have amount of traffic on the y and total number of hours with that much OR LESS traffic on the x, right? (Except I'm puzzled why it only goes up to 700, rather than 5840.) In that case, doing my best to differentiate it in my head, you have a roughly bell-shaped curve, but somewhat skew, with a long tail leading out to higher number of hours.

8. Jul 22, 2011

thetexan

The left dark half of the diagram is a histogram (a series of columnar bars representing traffic count for each of 720 hours in a 30-day period), sorted in ascending value, of one month's traffic. The x axis represents the hours of count for 30 days (720). The y axis represents the actual traffic count. We can disregard any zero count. So the actual curve I'm interested in starts at about hour 231.

The right light gray half of the diagram is simply a mirror image of the left half shown to complete the curve.

Again, disregard the zero values. We are only concerned with counts at or above 1, which start at about hour 231.

tex

Last edited: Jul 22, 2011
9. Jul 22, 2011

pmsrw3

Well, but this can't be right. For instance, your "histogram" has a point at about x = 700h, y = 33. So, according to the explanation you just gave, there were 700 hours that month in which the count was 33. There were also around 700 hours in which the count was 34, 35, etc. Given that there are a total of 720 hours in a month, this is clearly impossible.

Are you SURE that this graph shows the # of hours when the count was equal to y, rather than the number of hours when the count was y or less? If it's the latter, then it's not a histogram -- it's a CDF.

Sorry to harp on this, but this is not a trivial difference.

10. Jul 22, 2011

thetexan

I guess I'm confused. Maybe it isn't a histogram. It's just a graph where each vertical bar represents a single hour's worth of traffic. The graph looks ragged because there may be many hours that share a single value, such as 33, so you get the stair-step effect.

But this is for sure. This represents 720 hours, and I'm only interested in the curve from hour 231 to 720, since 1 to 230 are zero hours (mostly the midnight shift) and we don't care about those in determining a busy hour.

So, again, each hour has its own vertical bar representing the count for that hour. I shouldn't have called it a histogram. I'm sorry for the confusion.

tex

11. Jul 22, 2011

pmsrw3

OK, I get it. There's a single vertical bar for each of the 720 hours. The height of the bar is the activity during that hour. The bars, rather than being shown in order of time, are shown in order of amount of activity. And then, just in case the graph isn't sufficiently confusing or doesn't take up enough space, the whole thing is mirrored on the right :-)

So, this is, as I thought, an inverted CDF plot. The bar at 369, for instance, seems to go up to 8. That means that there were 369 hours in which the activity was 8 or less.

This seems a strange and confusing way of plotting activity levels to me, but I'm guessing that it's a standard way of plotting the data in your business? If so, and if the folks you usually talk to are used to looking at graphs like this, I understand why you present the data so.
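The reading of the sorted-bar plot as an inverted CDF can be checked numerically. In the sketch below, the position at which a given traffic level ends in the sorted list is exactly the number of hours with that much traffic or less. The `counts` values and the helper name are hypothetical, chosen only to illustrate the idea.

```python
from bisect import bisect_right

def hours_at_or_below(sorted_counts, level):
    """Number of hours whose count is <= level (sorted_counts must be ascending)."""
    return bisect_right(sorted_counts, level)

# Hypothetical counts for a handful of hours, in time order
counts = [0, 12, 8, 0, 33, 8, 5, 20, 0, 8]

# Sorting gives the bar heights, left to right, of the sorted plot
sorted_counts = sorted(counts)

n_le_8 = hours_at_or_below(sorted_counts, 8)
print(n_le_8, "of", len(counts), "hours had a count of 8 or less")
print("empirical CDF at 8:", n_le_8 / len(counts))
```

Dividing the position by the total number of hours gives the usual CDF value between 0 and 1.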

12. Jul 22, 2011

thetexan

Figure 1 shows the bar graph for 720 hours (30 days with hour number 1 chronologically thru hour 720 on the x axis) and the traffic count for each of those hours.

Figure 2 shows the bar graph for about 490 hours and their traffic. 490 hours after removing all zero hours. This graph shows only hours with any traffic. The x axis represents hour units.

Figure 3 shows fig 2 with a mirror image to help visualize a bell type curve. Hopefully this explains the graph.

tex

13. Jul 22, 2011

pmsrw3

The problem is that this is a bell curve that is artificially created by the manipulation of mirroring the data. Making the curve look bell-shaped in this way decreases one's ability to determine whether activity actually follows a bell-shaped distribution. If you really want to visualize your data in a way that allows you to determine whether they follow a bell-shaped distribution, you should plot a histogram.

14. Jul 22, 2011

thetexan

Ok, forget the bell curve.

Forget everything I've said up to this point. I don't understand what the problem is with the graph. It says what it says, as described.

Consider a set of values represented in figure 2.

Is there a way to statistically find a value that represents what I would call an average 'busy' session? As I asked before...would an RMS-type calculation be what I need? If so, can someone please tell me how to do that?

tex

15. Jul 22, 2011

pmsrw3

There is not a way to statistically find a value that represents what you would call an average "busy" session, unless you define what you (this means you personally, not you generically) would call a "busy" session. There is no obvious reason I can think of why the RMS value would be that, but you seem to think it is, and it's really your call.

To calculate RMS activity, calculate the average activity, $\overline{x}$, and the standard deviation of activity, $s$. You can do this in Excel, or you can use a formula such as the one at http://www.mathmotivation.com/symbolic/standard-deviation.html, or you can use one of the many calculators you'll find on the web. Then the RMS value of activity is

$$\sqrt{{\overline{x}}^2+s^2}$$
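As a minimal sketch of that calculation, the Python below computes the RMS two ways: directly from the squared values, and from the mean and standard deviation using the formula above. The `counts` list and the function names are hypothetical. One assumption worth flagging: the two results agree exactly only when $s$ is the population standard deviation (dividing by $n$). With the sample standard deviation (dividing by $n-1$) the formula is a close approximation when there are hundreds of hours.

```python
import math
import statistics

def rms(values):
    """Root mean square: square root of the mean of the squared values."""
    return math.sqrt(statistics.fmean([v * v for v in values]))

def rms_from_mean_and_std(values):
    """RMS via sqrt(mean^2 + s^2), with s the population standard deviation."""
    mean = statistics.fmean(values)
    s = statistics.pstdev(values)
    return math.sqrt(mean ** 2 + s ** 2)

# Hypothetical non-zero hourly counts, for illustration only
counts = [3, 8, 8, 12, 20, 33]

direct = rms(counts)
via_formula = rms_from_mean_and_std(counts)
print(f"direct RMS:        {direct:.3f}")
print(f"from mean and std: {via_formula:.3f}")
```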

Last edited by a moderator: Apr 26, 2017
16. Jul 22, 2011

thetexan

Ok, I made what I believe is a correct histogram.....thus

The x axis represents traffic count. The y axis represents the number of non-zero hours (490) that had that value of traffic for that hour. For example there were 33 hours with a count of 8 during the month....there were 15 hours with a value of 7.
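For anyone wanting to reproduce a histogram like this from raw hourly data, here is a small sketch. It counts how many hours had each non-zero traffic value and prints a text bar per value. The `hourly_counts` list and the function name are invented for illustration and are not the actual month's data.

```python
from collections import Counter

def traffic_histogram(hourly_counts):
    """Map each non-zero traffic count to the number of hours with that count."""
    return Counter(c for c in hourly_counts if c > 0)

# Hypothetical hourly counts; zero hours are dropped, as in the post above
hourly_counts = [0, 7, 8, 8, 0, 12, 8, 7, 20, 0]
histogram = traffic_histogram(hourly_counts)

# x axis: traffic count; bar length: number of hours with that count
for count in sorted(histogram):
    print(f"{count:3d} | {'#' * histogram[count]}")
```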

Maybe this helps.

tex

17. Jul 22, 2011

thetexan

pmsrw3,

I appreciate all of your ideas and help. I have no idea what I want. It just seemed to me, as a math neophyte, that something similar to RMS with electricity (determining voltage) was something like what I needed. A couple of posts earlier, you thought it might be possible.

Maybe 'busy' is a hangup. Maybe first I need to figure out a statistical average. It seems to me I need the average traffic that occurs over the most number of hours, if that makes sense. This would be different from a simple average I would think.

The simple average comes out to 13.93 per hour.

tex

18. Jul 22, 2011

pmsrw3

That looks reasonably bell-shaped. A normal distribution would be a useful, though crude, approximation.