Calculating mean from 5 number summary

Discussion Overview

The discussion revolves around the possibility of calculating the mean from a 5-number summary, which includes the minimum, first quartile (Q1), median, third quartile (Q3), and maximum values of a dataset. Participants explore methods for estimating the mean based on these summary statistics, discussing both theoretical and practical aspects.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • Some participants propose that it should be possible to estimate the mean from the 5-number summary by considering percentiles and using calculus.
  • Others argue that an exact calculation of the mean from the 5-number summary is not feasible, suggesting that it can only be estimated.
  • A specific formula for a rough estimate is presented: (min + max + 2(Q1 + median + Q3)) / 8.
  • One participant shares a dataset for which the formula gives 153153.875 while the actual mean is 80041.24331, showing how rough the estimate can be.
  • Another participant mentions that if the distribution is fairly symmetrical, the mean and median may be approximately equal, but this does not resolve the issue of calculating the mean directly from the summary.
  • Concerns are raised about the influence of extreme values (outliers) on the mean and the effectiveness of the estimation method.

Areas of Agreement / Disagreement

Participants generally agree that an exact calculation of the mean from the 5-number summary is not possible, but there is disagreement on the effectiveness and accuracy of various estimation methods proposed.

Contextual Notes

Limitations include the dependence on the distribution shape and the potential influence of outliers on the mean, which complicates the estimation process.

Banaticus
It seems like it should be possible to calculate the mean (usual average) from a 5-number summary of a set of numbers (min, first quartile or Q1, median, third quartile or Q3, and max). You should be able to calculate roughly what a percentile is, then by taking each discrete percentile and then taking the average of those hundred numbers... or better yet by using calculus and taking every percentile point, and the average of every point, you should be able to come really close to the mean, if not compute it directly. I, however, don't know math well enough to do that, nor do I remember any calculus.

In one data set that I'm looking at, the min, Q1, median, Q3, and max are: 0, 3900, 18882, 50145.5, 1250000
And the mean is: 46172.04545 or just under Q3.
How can the mean be calculated from those 5 numbers?
 
You can't exactly. A rough estimate (analogous to trapezoid rule for integral approximation) is:
(min + max + 2(Q1+median+Q3))/8.
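
One way to read the "analogous to trapezoid rule" remark, as a sketch: assume the mean is written as the integral of the quantile function Q(p) over p from 0 to 1, and that Q is known only at p = 0, 1/4, 1/2, 3/4, 1 (the five summary numbers). The symbols Q(p) and h are introduced here for illustration and do not appear in the thread. The trapezoid rule with step h = 1/4 then gives the formula above:

```latex
\begin{aligned}
\bar{x} &= \int_0^1 Q(p)\,dp \\
        &\approx \frac{h}{2}\Bigl[\,Q(0) + 2\,Q(\tfrac14) + 2\,Q(\tfrac12) + 2\,Q(\tfrac34) + Q(1)\Bigr],
        \qquad h = \tfrac14 \\
        &= \frac{\min + \max + 2\,(Q_1 + \text{median} + Q_3)}{8}
\end{aligned}
```

The approximation treats Q as a straight line between neighbouring summary points, which is a poor assumption in any quarter of the data where the values are strongly skewed.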
 
Banaticus said:
It seems like it should be possible to calculate the mean (usual average) from a 5-number summary of a set of numbers ...
Did you mean estimate? Surely you can see (as mathman pointed out) that you can't get an exact calculation based on just summary numbers.
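
A small sketch of why no exact formula can exist: two made-up datasets (not from the thread) that share the same 5-number summary but have different means. The quartiles are computed with Python's `statistics.quantiles` using the "inclusive" method, which is one of several quartile conventions; the helper name `five_number_summary` is just for illustration.

```python
from statistics import mean, quantiles


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the 'inclusive' quartile method."""
    s = sorted(data)
    q1, med, q3 = quantiles(s, n=4, method="inclusive")
    return (s[0], q1, med, q3, s[-1])


# Hypothetical datasets chosen so that only the values between the
# quartile positions differ.
a = [0, 0, 2, 2, 4, 4, 6, 6, 8]
b = [0, 2, 2, 4, 4, 6, 6, 8, 8]

summary_a = five_number_summary(a)
summary_b = five_number_summary(b)

print(summary_a == summary_b)  # same summary for both datasets
print(mean(a), mean(b))        # but the means are 32/9 and 40/9
```

Since the summary is identical and the means differ, any function of the five numbers alone has to be wrong for at least one of the two datasets.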
 
mathman said:
You can't exactly. A rough estimate (analogous to trapezoid rule for integral approximation) is:
(min + max + 2(Q1+median+Q3))/8.
Hmm, ok, thanks. I have one data set where the 5-number summary is: 0, 29496, 68552, 124280, 780575. The mean is 80041.24331 and that approximation comes up with 153153.875, so I guess it's a really rough estimate.
phinds said:
Did you mean estimate?
Sure, why not. If I clearly don't know what I'm talking about, feel free to attempt to fill in the gaps. :)
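
For anyone who wants to check the arithmetic, here is a minimal sketch of mathman's estimate applied to both 5-number summaries quoted in the thread. The function name `trapezoid_mean_estimate` is made up for this example.

```python
def trapezoid_mean_estimate(minimum, q1, median, q3, maximum):
    """Rough mean estimate: (min + max + 2(Q1 + median + Q3)) / 8."""
    return (minimum + maximum + 2 * (q1 + median + q3)) / 8


# Second dataset from the thread (actual mean reported as 80041.24331).
est_second = trapezoid_mean_estimate(0, 29496, 68552, 124280, 780575)
print(est_second)  # 153153.875

# First dataset from the thread (actual mean reported as 46172.04545).
est_first = trapezoid_mean_estimate(0, 3900, 18882, 50145.5, 1250000)
print(est_first)   # 174481.875
```

In both cases the estimate lands far above the reported mean, which fits the "really rough estimate" remark.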
 
Banaticus said:
It seems like it should be possible to calculate the mean (usual average) from a 5-number summary of a set of numbers (min, first quartile or Q1, median, third quartile or Q3, and max). You should be able to calculate roughly what a percentile is, then by taking each discrete percentile and then taking the average of those hundred numbers... or better yet by using calculus and taking every percentile point, and the average of every point, you should be able to come really close to the mean, if not compute it directly. I, however, don't know math well enough to do that, nor do I remember any calculus.

In one data set that I'm looking at, the min, Q1, median, Q3, and max are: 0, 3900, 18882, 50145.5, 1250000
And the mean is: 46172.04545 or just under Q3.
How can the mean be calculated from those 5 numbers?
You cannot. But if the distribution is fairly symmetrical (like the familiar bell curve), the mean and the median are approximately equal.
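
A quick illustration of the symmetry point, using two made-up samples (not the thread's data): in the symmetric one the mean and median coincide, while a single large value in the other pulls the mean far away from the median.

```python
from statistics import mean, median

# Hypothetical samples for illustration only.
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
right_skewed = [1, 2, 3, 4, 5, 6, 7, 8, 900]

print(mean(symmetric), median(symmetric))        # 5 and 5
print(mean(right_skewed), median(right_skewed))  # 104 and 5
```

The datasets in this thread look like the second case: the reported means (46172.04545 and 80041.24331) sit well above the medians (18882 and 68552), so the median is not a good stand-in for the mean here.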
 
Banaticus said:
Hmm, ok, thanks. I have one data set where the 5-number summary is: 0, 29496, 68552, 124280, 780575. The mean is 80041.24331 and that approximation comes up with 153153.875, so I guess it's a really rough estimate.

Sure, why not. If I clearly don't know what I'm talking about, feel free to attempt to fill in the gaps. :)
The last number (780575) swamps the other four.
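
To put a number on "swamps", here is a sketch that splits the estimate for that summary into the contribution of each of the five values, using the weights 1/8, 2/8, 2/8, 2/8, 1/8 implied by the formula. The variable names are chosen for this example only.

```python
summary = (0, 29496, 68552, 124280, 780575)  # min, Q1, median, Q3, max
weights = (1 / 8, 2 / 8, 2 / 8, 2 / 8, 1 / 8)

# Contribution of each summary value to the estimate.
contributions = [w * x for w, x in zip(weights, summary)]
estimate = sum(contributions)
max_share = contributions[-1] / estimate

print(contributions)  # [0.0, 7374.0, 17138.0, 31070.0, 97571.875]
print(estimate)       # 153153.875
print(round(max_share, 2))
```

The maximum alone contributes 97571.875, which is already more than the actual mean of 80041.24331, even though it carries only one eighth of the weight.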
 
