Calculating mean from 5 number summary

In summary: The conversation discusses whether the mean can be calculated from a 5-number summary of a set of numbers (min, Q1, median, Q3, max). It is suggested that by using calculus and taking every percentile point, an estimate of the mean can be obtained. However, the replies point out that the mean cannot be calculated exactly from the summary, and that the suggested formula gives only a rough estimate. For one of the poster's data sets the estimate is far above the true mean, because a single very large maximum dominates it.
  • #1
Banaticus
It seems like it should be possible to calculate the mean (usual average) from a 5-number summary of a set of numbers (min, first quartile or Q1, median, third quartile or Q3, and max). You should be able to calculate roughly what a percentile is, then by taking each discrete percentile and then taking the average of those hundred numbers... or better yet by using calculus and taking every percentile point, and the average of every point, you should be able to come really close to the mean, if not compute it directly. I, however, don't know math well enough to do that, nor do I remember any calculus.

In one data set that I'm looking at, the min, Q1, median, Q3, and max are: 0, 3900, 18882, 50145.5, 1250000
And the mean is: 46172.04545 or just under Q3.
How can the mean be calculated from those 5 numbers?
 
  • #2
You can't exactly. A rough estimate (analogous to trapezoid rule for integral approximation) is:
(min + max + 2(Q1+median+Q3))/8.
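
A minimal Python sketch of this estimate follows. The function names `trapezoid_mean_estimate` and `interpolated_percentile_average` are made up for illustration and do not come from any library. The first function is the formula above. The second tries the idea from post #1: it assumes (and this is only an assumption) that the values change linearly between the five summary points, then averages 100 percentile points along those straight lines. Under that assumption the two approaches agree, which is why the formula is analogous to the trapezoid rule.

```python
def trapezoid_mean_estimate(minimum, q1, median, q3, maximum):
    """Rough mean estimate from a five-number summary (trapezoid rule)."""
    return (minimum + maximum + 2 * (q1 + median + q3)) / 8


def interpolated_percentile_average(minimum, q1, median, q3, maximum, steps=100):
    """Average `steps` evenly spaced percentile points, assuming (hypothetically)
    that values change linearly between the five summary points."""
    values = [minimum, q1, median, q3, maximum]
    total = 0.0
    for k in range(steps):
        p = (k + 0.5) / steps          # midpoint of the k-th percentile band
        i = min(int(p * 4), 3)         # which quarter of the data p falls in
        frac = p * 4 - i               # position within that quarter, 0..1
        total += values[i] + frac * (values[i + 1] - values[i])
    return total / steps


# Five-number summary from post #1
summary = (0, 3900, 18882, 50145.5, 1250000)
print(trapezoid_mean_estimate(*summary))          # 174481.875
print(interpolated_percentile_average(*summary))  # approximately the same value
```

The mean reported in post #1 for that data set is 46172.04545, so for strongly skewed data the estimate can be far too high.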
 
  • #3
Banaticus said:
It seems like it should be possible to calculate the mean (usual average) from a 5-number summary of a set of numbers ...
Did you mean estimate? Surely you can see (as mathman pointed out) that you can't get an exact calculation based on just summary numbers.
 
  • #4
mathman said:
You can't exactly. A rough estimate (analogous to trapezoid rule for integral approximation) is:
(min + max + 2(Q1+median+Q3))/8.
Hmm, ok, thanks. I have one data set where the 5-number summary is: 0, 29496, 68552, 124280, 780575. The mean is 80041.24331 and that approximation comes up with 153153.875, so I guess it's a really rough estimate.
phinds said:
Did you mean estimate?
Sure, why not. If I clearly don't know what I'm talking about, feel free to attempt to fill in the gaps. :)
 
  • #5
Banaticus said:
It seems like it should be possible to calculate the mean (usual average) from a 5-number summary of a set of numbers ... How can the mean be calculated from those 5 numbers?
You cannot. But if the distribution is fairly symmetrical (like the familiar bell curve), the mean and the median are approximately equal.
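
A small Python illustration of that point, using made-up numbers that are not from the thread: in a symmetric set the mean and median coincide, while a single large outlier moves the mean and leaves the median alone.

```python
from statistics import mean, median

# Made-up illustrative data, not from the thread
symmetric = [10, 20, 30, 40, 50]
skewed = [10, 20, 30, 40, 1000]   # same values except one large outlier

print(mean(symmetric), median(symmetric))  # mean 30, median 30
print(mean(skewed), median(skewed))        # mean 220, median 30
```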
 
  • #6
Banaticus said:
Hmm, ok, thanks. I have one data set where the 5-number summary is: 0, 29496, 68552, 124280, 780575. The mean is 80041.24331 and that approximation comes up with 153153.875, so I guess it's a really rough estimate.

Sure, why not. If I clearly don't know what I'm talking about, feel free to attempt to fill in the gaps. :)
The last number (780575) swamps the other four.
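
One way to see this, as a sketch that assumes the remark refers to the estimate from post #2: the maximum carries a weight of only 1/8 in that formula, yet it accounts for most of the result.

```python
# Five-number summary quoted above
minimum, q1, median, q3, maximum = 0, 29496, 68552, 124280, 780575

estimate = (minimum + maximum + 2 * (q1 + median + q3)) / 8
max_term = maximum / 8                 # the maximum's contribution to the estimate
share = max_term / estimate            # fraction of the estimate due to the maximum

print(estimate)          # 153153.875
print(max_term)          # 97571.875
print(round(share, 2))   # 0.64
```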
 

1. How do you calculate the mean from a 5 number summary?

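The mean cannot be calculated exactly from a 5 number summary, because the five numbers (minimum, lower quartile, median, upper quartile, and maximum) do not determine the rest of the data. A rough estimate, analogous to the trapezoid rule, is (min + max + 2(Q1 + median + Q3))/8, but it can be far off for skewed data: for the summary 0, 29496, 68552, 124280, 780575 it gives 153153.875, while the true mean of that data set is 80041.24331.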

2. Can the mean be calculated if the 5 number summary is missing one or more values?

No. Even with all five numbers the mean can only be estimated, not calculated exactly, because the mean takes into account every value in the data set. With fewer summary values the estimate becomes cruder still.

3. How does the mean differ from the median in a 5 number summary?

The mean is the average of all values in the data set and is not itself one of the five summary numbers, while the median is the middle value of the data and is part of the summary. The median is less affected by extreme values, making it a better measure of central tendency for skewed data.

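4. What does the mean represent, and is it part of a 5 number summary?

The mean is the sum of all values in the data set divided by the total number of values. It is not one of the five summary numbers. It is a separate measure of central tendency that gives an idea of the "typical" value in the set, although a few extreme values can pull it far from the median.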

5. Can the mean be used to compare data sets with different 5 number summaries?

Yes, the mean can be used to compare data sets with different 5 number summaries. However, it is important to also consider the range and distribution of the data in addition to the mean when making comparisons.
