Calculating Averages: Not so simple after all?

  • Thread starter: Astro
AI Thread Summary
The discussion centers around the confusion in calculating averages using two different methods, which yield different results. Method #1 calculates the overall average based on total class hours and classes, while Method #2 averages the low and high averages to find the overall average. The discrepancy arises because the two methods are not mathematically equivalent, as demonstrated by an example of average speeds for a round trip. Additionally, participants discuss how averaging subsets of data can lead to different results due to unequal sample sizes and the nature of statistical sampling. The conversation highlights the importance of understanding the context and methodology behind average calculations.
Astro
Maybe you math guys & gals will have more luck than the physics people. Basically, I calculated an average two different ways and got two different answers. Both answers should be the same, and I don't understand why they are not. My question is: why are they not the same, or why is one of them wrong? To see my data and work, please see the attachment. In particular, the problem is: why does the method #1 answer not equal the method #2 answer?

Thanks.
 

Neither of them looks right. First off, the average should be:

(total # of class hours) / (total # of classes)

right? Secondly, I don't see how the next step follows in either of your approaches.
 
Some weeks are busier than others. That means more classes and more hours. The least busy week is what I have termed the 'low average'. The busiest a week can ever be is what I have termed the 'high average'. On average, a week is busier than the least busy week and less busy than the busiest week.

I need all three types of averages for my project. If I only wanted to find the overall average I would approach it like you suggested. In fact, that's exactly what I did in method #1. The only difference is that I am more specific in my terminology. Instead of saying average = (total # of class hours) / (total # of classes), I said average = (total # average of class hours) / (total # average of classes). That extra word is important because, without it, it's not clear whether you're trying to calculate the low, high, or overall average. If you take a careful look at my table you will see that I have indicated that some classes do not run every week. If every class did run every week then, yes, I would have answered the question exactly as you suggested.

Since I had calculated the 'low average' and 'high average' already, I thought it might be easier to average the two to find the overall average instead of doing the whole calculation from scratch. (low+high)/2 is what I did in method #2.

What I can't explain is why method #1 and method #2 give different answers. That's what I need help with. It became obvious that the two methods are not mathematically equivalent but I really don't understand why not. To me, it seems that they should be.

Anyway, thanks for trying. You're the only one who has posted a reply so far, and I appreciate it.

~ Astro ~
 
Astro said:
I thought it might be easier to average the two to find the overall average instead of doing the whole calculation from scratch. (low+high)/2 is what I did in method #2.

What I can't explain is why method #1 and method #2 give different answers.

Can you explain why you think they should be the same?


There's a classic (pseudo)paradox: while driving to my friend's house, I averaged 45 miles per hour. While driving home, I averaged 55 miles per hour. What was my average speed for the round trip?

Average speed is, of course, (total distance) / (total time). If the distance between our houses is X, then:

Total distance = 2X
Total time = (time there) + (time back) = X/45 + X/55 = 100X/2475 = 4X/99.
Average speed = 2X / (4X/99) = 99/2 = 49.5 MPH.

Of course, if you're not thinking, you might simply take the (unweighted) average of 45 and 55 and get 50... but since this computation is not computing (total distance) / (total time), there's no reason to think you should get the right answer.

(Of course, taking the appropriate weighted average of the two speeds, with each speed weighted by the time spent driving at it, does work.)
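The round-trip arithmetic can be checked with a short Python sketch. This is not from the thread: the function names are hypothetical, and the distance is set to 1 by default only because it cancels out of the result.

```python
from fractions import Fraction

def round_trip_average(speed_out, speed_back, distance=Fraction(1)):
    """Average speed as (total distance) / (total time) for a there-and-back trip."""
    time_out = distance / speed_out
    time_back = distance / speed_back
    return (2 * distance) / (time_out + time_back)

def time_weighted_average(speed_out, speed_back, distance=Fraction(1)):
    """Average of the two speeds, each weighted by the time spent at that speed."""
    time_out = distance / speed_out
    time_back = distance / speed_back
    return (speed_out * time_out + speed_back * time_back) / (time_out + time_back)

true_avg = round_trip_average(Fraction(45), Fraction(55))
naive_avg = (Fraction(45) + Fraction(55)) / 2          # unweighted average of the speeds
weighted_avg = time_weighted_average(Fraction(45), Fraction(55))

print(float(true_avg))      # 49.5
print(float(naive_avg))     # 50.0
print(float(weighted_avg))  # 49.5
```

The slower leg takes longer, so it gets more weight, which is why the correct figure sits below the unweighted 50.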
 
Thanks Hurkyl!

Yes, I think I understand now. Your example helped quite a bit although I had to think about it for a while.

(See new attachment for more info.)
 

Hi,

I am facing a similar problem. What I am trying to do is average a series of values by dividing them into different sets. For example, let us take 8 values, 1, 2, 3, 4, 6, 1, 2, and 3. When I average ((1+2+3)/3 + (4+6)/2 + (1+2+3)/3)/3, I get 3 as the result. Whereas when I average (1+2+3+4+6+1+2+3)/8, I get 2.75 as the result. Why is this so? I get a different average each time I divide these values into different sets. While 2.75 is the correct answer, is there a way to find a correction factor which can be applied to these other averages to arrive at the correct average?

Regards,
Santosh
 

What you're doing is actually very interesting because it is very similar to how statistics works.

You could think of taking the average of all of the numbers as finding the true average of that population of numbers.

Taking the average of all small subsets is similar to statistical sampling. If you kept taking the average of small sets (for example (1+2+3)/3 = 2) and repeated that process with different (randomly chosen) sets, you would expect the average of those averages to approach the population average of 2.75.

You don't get that because you haven't taken enough samples.
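One way to check the sampling idea without any randomness is to enumerate every possible equal-sized sample rather than drawing a few. The sketch below is an illustration, not part of the thread; the sample size of 3 and the variable names are assumptions.

```python
from fractions import Fraction
from itertools import combinations

values = [1, 2, 3, 4, 6, 1, 2, 3]
population_mean = Fraction(sum(values), len(values))

sample_size = 3  # assumed size; every sample has the same number of values

# combinations() picks by position, so repeated values count as separate items.
sample_means = [
    Fraction(sum(sample), sample_size)
    for sample in combinations(values, sample_size)
]
mean_of_means = sum(sample_means) / len(sample_means)

print(len(sample_means))       # 56
print(float(mean_of_means))    # 2.75
print(float(population_mean))  # 2.75
```

Every value appears in the same number of samples, so no value is over- or under-represented and the average of the sample averages lands exactly on the population average.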

Another thing is the samples you take aren't of the same size, so things get weighted unfairly.
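As for a correction factor, one candidate is to weight each group's average by the number of values in that group. Here is a minimal Python sketch using the grouping from Santosh's post; the helper `mean` and the variable names are hypothetical.

```python
from fractions import Fraction

def mean(values):
    """Plain (unweighted) average, kept exact with Fraction."""
    return Fraction(sum(values), len(values))

groups = [[1, 2, 3], [4, 6], [1, 2, 3]]

group_means = [mean(g) for g in groups]           # 2, 5, 2
unweighted = mean(group_means)                    # treats every group as equally large

# Weight each group mean by its group size, then divide by the total count.
total_count = sum(len(g) for g in groups)
weighted = sum(len(g) * mean(g) for g in groups) / total_count

overall = mean([v for g in groups for v in g])    # average of all 8 values directly

print(float(unweighted))  # 3.0
print(float(weighted))    # 2.75
print(float(overall))     # 2.75
```

Multiplying a group's average by its size just recovers that group's sum, so the weighted version is the same calculation as (sum of all values) / (number of values).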

But without thinking about that too much, try doing this (sorry I don't have a picture):

Draw a number line at the top of a page from 0 to 3

Now we know that the overall average will be 2 (by adding 1+2+3 and dividing by 3), so draw a line directly down from the 2. You can see that the line is directly between 1 and 3, and you can draw an inverted triangle leading to these numbers. From this you can visually see that when you take an unweighted average of numbers that increment by the same amount, the result is exactly in the middle of the numbers.

Now draw an inverted pyramid for the average of 1 and 2, which is 1.5. Again it's directly between the numbers. Now draw a pyramid showing the average of that 1.5 and 3. So the line you draw is directly between 1.5 and 3, at 2.25 instead of 2. See how things get skewed higher?

This is visually what is occurring when you take smaller sets and average their averages. What happens to the picture when the smaller sets all have the same size?
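The same picture can be put in numbers. The sketch below is only an illustration: the helper `mean` is hypothetical, and the four values 1 to 4 in the equal-size case are an assumption added to get two groups of the same size.

```python
from fractions import Fraction

def mean(values):
    """Plain (unweighted) average, kept exact with Fraction."""
    return Fraction(sum(values), len(values))

# Unequal groups: {1, 2} and {3}. The lone 3 counts as much as 1 and 2 together.
flat = mean([1, 2, 3])
nested = mean([mean([1, 2]), 3])

print(float(flat))    # 2.0
print(float(nested))  # 2.25

# Equal groups: {1, 2} and {3, 4}. Every value carries the same weight.
flat_equal = mean([1, 2, 3, 4])
nested_equal = mean([mean([1, 2]), mean([3, 4])])

print(float(flat_equal))    # 2.5
print(float(nested_equal))  # 2.5
```

With equal-sized groups the skew disappears, because averaging the group averages gives each original value the same share of the final result.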

I hope that makes sense (wish I had a pic!).

Edit: you could draw the line up and then the pyramids won't be inverted. Just not the way I drew it on my white board. :P
 