Mean with standard deviation or Median with IQR?

Homework Help Overview

The discussion revolves around determining the most accurate representation of a dataset concerning phone call times for a call center agent. The original poster has calculated both the median with interquartile range (IQR) and the mean with standard deviation, noting significant differences between the two measures. The context involves understanding the implications of skewed data distributions on the choice between these statistical measures.

Discussion Character

  • Exploratory, Assumption checking, Conceptual clarification

Approaches and Questions Raised

  • The original poster attempts to justify whether the mean or median is a better representation of the data, considering the skewness and presence of outliers. Participants question the relevance of the measures of spread (standard deviation and IQR) in this context and discuss the implications of data distribution on the choice of average.

Discussion Status

Participants are exploring the appropriateness of using mean versus median in the context of the dataset. Some guidance has been offered regarding the characteristics of the measures, but there is no explicit consensus on which representation is superior. The discussion remains open with various interpretations being considered.

Contextual Notes

The original poster notes that the assignment requires calculating both measures and justifying the choice of the better representation without making further conclusions. There is an acknowledgment of outliers affecting the mean and a question about the impact on standard deviation.

3vo
Hi guys,

I hope someone is able to help me with this; I'm currently stuck on a problem.

1. I was given some data (in continuous, grouped form) regarding phone call times for a call center agent and asked to represent the data using the most accurate form of average.
I initially calculated an estimate of the median and IQR using interpolation, followed by estimation of the mean with standard deviation using midpoints for each group.

The answers however were very different.
Median 3.44 with IQR 9.5
Mean 8.6 with standard deviation of 14.5


I now need to justify which is the better representation of the data, mean or median?
My ogive graph seems to indicate that the distribution of the data is wide and uneven. Based on this, I was always under the impression that if the distribution is skewed it is better to use the median, as it is not affected by outliers. However, I was informed by someone that the mean was the more accurate representation in this case. Any ideas why this may be?

Can anyone explain to me what to do with the standard deviation value or the IQR? I understand they are both measures of spread, but what do they tell me about the data? All my textbooks keep reiterating that they are measures of spread without explaining what to do with them regarding accuracy.


2. Below is a copy of the data table I've compiled

t group        mid(x)   f     c.f   fx     fx²
0 ≤ t < 2      1        80    80    80     80
2 ≤ t < 4      3        53    133   159    477
4 ≤ t < 6      5        19    152   95     475
6 ≤ t < 10     8        22    174   176    1408
10 ≤ t < 20    15       31    205   465    6975
20 ≤ t < 30    25       16    221   400    10000
30 ≤ t < 60    45       15    236   675    30375
Total          -        236   236   2050   49790
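
For anyone wanting to check the figures, here is a minimal Python sketch of the two estimates described in the post: mean and standard deviation from class midpoints, and median and quartiles by linear interpolation within a class. The variable names and the `quantile` helper are illustrative, and the population form of the variance (dividing by n) is an assumption, since the post does not say which form was used.

```python
import math

# Class boundaries and frequencies copied from the table in the post.
bounds = [(0, 2), (2, 4), (4, 6), (6, 10), (10, 20), (20, 30), (30, 60)]
freqs = [80, 53, 19, 22, 31, 16, 15]

n = sum(freqs)
mids = [(lo + hi) / 2 for lo, hi in bounds]
sum_fx = sum(f * x for f, x in zip(freqs, mids))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, mids))

mean = sum_fx / n
# Assumed population form: variance = (sum of f*x^2) / n - mean^2
sd = math.sqrt(sum_fx2 / n - mean ** 2)


def quantile(p):
    """Estimate the p-quantile by linear interpolation inside its class."""
    target = p * n
    cum = 0
    for (lo, hi), f in zip(bounds, freqs):
        if cum + f >= target:
            return lo + (target - cum) / f * (hi - lo)
        cum += f
    return bounds[-1][1]


median = quantile(0.5)
q1, q3 = quantile(0.25), quantile(0.75)
iqr = q3 - q1

print(f"mean={mean:.2f} sd={sd:.2f} median={median:.2f} iqr={iqr:.2f}")
```

Run on the table, this gives a mean of about 8.69, a median of about 3.43 and an IQR of about 9.49, in line with the post. The standard deviation comes out near 11.6 rather than 14.5; the square root of Σfx²/n on its own is about 14.5, so the quoted figure may not have had the squared mean subtracted, though the post does not show that step.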


3. I believe that, due to the uneven distribution, the median may be the better representation for this data. However, I have also noticed that the spread is very wide, and I understand that the median is more to do with central tendency. I'm unsure whether this would then disregard the high frequency in the first group, and whether that may explain why the median is not the best representation for wide distributions rather than just uneven ones?
 

Attachments

  • photo 1.jpg
  • photo 2.jpg

IMHO, it makes no sense to ask what is the best way to represent data without first understanding how the representation will be used to make decisions.
 
Hi Haruspex,

Thank you for your reply.

The main part of my assignment brief was to show I was able to calculate an estimate for both mean and median with measures of spread.

However for the final part I only need to justify which of the two averages is the better representation for this data as a whole. No further conclusions or decisions would be made or required from this data.

I'm now stuck on which of two measures best represents this data set.

My understanding (and please correct me if I am wrong) is that 68% of the values are less than one SD from the mean value, and 95% are less than 2 SD.
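
The 68% and 95% figures are properties of a normal distribution, so for skewed data they are worth checking rather than assuming. Below is a rough sketch of that check on the grouped table from the first post, assuming calls are spread evenly within each class (the same assumption as an ogive drawn with straight segments). `count_below` is an illustrative helper, and the population form of the standard deviation is assumed.

```python
import math

# Class boundaries and frequencies from the table in the first post.
bounds = [(0, 2), (2, 4), (4, 6), (6, 10), (10, 20), (20, 30), (30, 60)]
freqs = [80, 53, 19, 22, 31, 16, 15]
n = sum(freqs)

mids = [(lo + hi) / 2 for lo, hi in bounds]
mean = sum(f * x for f, x in zip(freqs, mids)) / n
# Assumed population form of the standard deviation.
sd = math.sqrt(sum(f * x * x for f, x in zip(freqs, mids)) / n - mean ** 2)


def count_below(t):
    """Estimated number of calls shorter than t (linear within each class)."""
    total = 0.0
    for (lo, hi), f in zip(bounds, freqs):
        if t >= hi:
            total += f
        elif t > lo:
            total += f * (t - lo) / (hi - lo)
    return total


share_1sd = (count_below(mean + sd) - count_below(mean - sd)) / n
share_2sd = (count_below(mean + 2 * sd) - count_below(mean - 2 * sd)) / n

print(f"within 1 SD: {share_1sd:.1%}, within 2 SD: {share_2sd:.1%}")
```

On these assumptions roughly 87% of calls fall within one SD of the mean and about 94% within two, so the normal-distribution percentages do not describe this data set well.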

Looking at my cumulative frequency graph, I can see that most of the data does fall within one SD of the mean value of 8. However, everything I've read, either online or in my textbook, also seems to suggest that if the distribution is at all uneven one should always use the median. I've also calculated that outliers are present above the 25.25 value, and this would normally affect the mean value. Would it also have an impact on SD, or is SD resistant to outliers?
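
On the outlier question, here is a sketch of the usual upper fence, Q3 + 1.5 × IQR, together with a rough look at how much the longest calls contribute to the standard deviation. It uses the grouped table from the first post; the helper names are illustrative, and treating the whole 30 ≤ t < 60 class as the "long calls" is a simplification chosen for the example.

```python
# Class boundaries and frequencies from the table in the first post.
bounds = [(0, 2), (2, 4), (4, 6), (6, 10), (10, 20), (20, 30), (30, 60)]
freqs = [80, 53, 19, 22, 31, 16, 15]
n = sum(freqs)


def quantile(p):
    """Estimate the p-quantile by linear interpolation inside its class."""
    target = p * n
    cum = 0
    for (lo, hi), f in zip(bounds, freqs):
        if cum + f >= target:
            return lo + (target - cum) / f * (hi - lo)
        cum += f
    return bounds[-1][1]


q1, q3 = quantile(0.25), quantile(0.75)
upper_fence = q3 + 1.5 * (q3 - q1)

# Squared deviations from the mean, class by class, using midpoints.
mids = [(lo + hi) / 2 for lo, hi in bounds]
mean = sum(f * x for f, x in zip(freqs, mids)) / n
sq_dev = [f * (x - mean) ** 2 for f, x in zip(freqs, mids)]

# Share of the total squared deviation that comes from the top class alone.
top_share = sq_dev[-1] / sum(sq_dev)

print(f"upper fence: {upper_fence:.2f}, top-class share: {top_share:.1%}")
```

The fence comes out near 25.2, matching the 25.25 in the post to rounding. The top class holds 15 of the 236 calls (about 6%) yet accounts for a little over 60% of the summed squared deviations, which illustrates that the standard deviation is not resistant to outliers.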

I understand the median gives a better idea of central tendency than the mean and is resistant to the presence of outliers. Would this be enough justification to use the median as a better representation than the mean?
 