MHB Which is better? Mean with standard deviation or Median with IQR?

AI Thread Summary
The discussion centers on the choice between using the mean with standard deviation or the median with interquartile range (IQR) for data representation. The mean is sensitive to outliers, making the median a preferred option when data is skewed or uneven, as indicated by the user's ogive graph. However, some participants argue that the mean can still be useful depending on the context and distribution of the data. The standard deviation and IQR serve as measures of spread, with the former considering all data points and the latter focusing on the central portion, thus providing insights into potential outliers. Ultimately, the choice depends on the data's characteristics and the presence of outliers.
3vo
Hi guys,

I hope someone is able to help, I'm currently stuck on a problem.

I'm having trouble justifying which representation is more accurate for my data: mean with standard deviation, or median with IQR.
I've calculated both averages for my data, but I was advised by someone that the mean with standard deviation was a better representation. I don't understand how this could be the case, as my ogive graph seems to indicate that the distribution is wide and uneven. I was under the impression that if the distribution is uneven, you should always use the median over the mean, as the median is not affected by outliers.

Why in this case would the mean be better than the median? Or is the median the better representation, and if so, why?
My mean value was 8.6 with a standard deviation of 14.5 whilst the median was 3.44 with IQR of 9.5.

TBH I don't know what to do with the standard deviation and IQR values. I understand they are measures of spread, but what do they mean to the data? Would appreciate if anyone could also explain what I do with them so I can justify which would be the better representation.

I've included some links to a copy of my data table and ogive graph if that helps.

Any advice given would be appreciated. Thanks in advance.

P.s. Please correct me if I've done my ogive graph wrong, first real attempt at it.

Data table:
http://www.talkstats.com/attachment.php?attachmentid=3930&d=1385031179

Ogive:
http://www.talkstats.com/attachment.php?attachmentid=3931&d=1385031208

Calculation of standard deviation, in case I've gone wrong:
http://www.talkstats.com/attachment.php?attachmentid=3932&d=1385031219
 
Welcome to MHB, 3vo! Before saying whether it is better to use the mean and variance or the median and interquartile range, it is important to remember that the latter exist for any PDF and the former do not. As an example, consider the so-called Cauchy distribution, which in the case of symmetry around x=0 has the form...

$\displaystyle f(x) = \frac{1}{\pi} \frac{1}{1+x^{2}}\ (1)$

If you try to find the mean and variance in the standard fashion, you obtain...

$\displaystyle \mu = \frac{1}{\pi}\ \int_{- \infty}^{+ \infty} \frac{x}{1+x^{2}}\ dx\ (2)$

$\displaystyle \sigma^{2} = \frac{1}{\pi}\ \int_{- \infty}^{+ \infty} \frac{x^{2}}{1+x^{2}}\ dx\ (3)$

... and neither of the integrals converges...
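To see why, here is a short worked check (a sketch using standard antiderivatives, added for illustration) that truncates the integrals in (2) and (3) at a cutoff $R$ and lets $R$ grow:

$$\int_{0}^{R} \frac{x}{1+x^{2}}\,dx = \frac{1}{2}\ln\left(1+R^{2}\right) \to \infty, \qquad \int_{-R}^{R} \frac{x^{2}}{1+x^{2}}\,dx = 2\left(R - \arctan R\right) \to \infty \qquad (R \to \infty)$$

The positive half of (2) diverges (and by symmetry so does the negative half), so the mean is undefined, and (3) diverges outright. The median and quartiles, by contrast, come straight from the cumulative distribution function:

$$F(x) = \frac{1}{2} + \frac{1}{\pi}\arctan x, \qquad F(-1) = \frac{1}{4}, \quad F(0) = \frac{1}{2}, \quad F(1) = \frac{3}{4}$$

So for this distribution the median is $0$ and the IQR is $1-(-1)=2$, even though it has no mean or variance.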

Kind regards

$\chi$ $\sigma$
 
Hi 3vo, (Wave)

Welcome to MHB!

I've been trying to make sense of the table you were given for a bit now and think I finally understand what is going on, but this isn't how I have normally seen these problems presented. You are essentially supposed to use the midpoint of each range as the value of $x$ for that range; then you can multiply by the corresponding probability to get the mean and S.D.

You aren't given probabilities though; you are given frequencies, so we are calculating probabilities indirectly. You can think of the mean as:

$$\sum_{i=1}^{k}x_i \left( \frac{f_i}{n} \right)$$ or, as in the problem, $$\sum_{i=1}^{k}\frac{f_i x_i}{n}=\frac{\sum_{i=1}^{k}f_i x_i}{n}$$ where $k$ is the number of ranges and $n=\sum_{i=1}^{k}f_i$ is the total number of observations.
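The midpoint calculation can be sketched in a few lines of Python. This is only an illustration: the table values below are made up (the real table is behind the attachment link), and the function name `grouped_mean_sd` is hypothetical. It weights each midpoint $x_i$ by its frequency $f_i$ and divides by the total number of observations $n$.

```python
from math import sqrt

def grouped_mean_sd(midpoints, freqs):
    """Mean and (population) S.D. of grouped data, using class midpoints."""
    n = sum(freqs)  # total number of observations
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    # Frequency-weighted average of squared deviations from the mean
    variance = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / n
    return mean, sqrt(variance)

# Hypothetical table: ranges 0-2, 2-4, 4-6 with frequencies 1, 2, 1
midpoints = [1, 3, 5]
freqs = [1, 2, 1]
mean, sd = grouped_mean_sd(midpoints, freqs)
print(mean, sd)
```

Because every value in a range is treated as if it sat at the midpoint, the result is an approximation to the mean and S.D. of the raw data.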

Anyway, the above isn't too important. It's just interesting to see how this is presented to you. I think I got the table now. :)

You are correct that the mean is easily affected by outliers, so in those cases we usually use the median instead. This is also true when the data is skewed left or right. For a normal distribution, 3 S.D. to the left and right of the mean covers about 99.7% of the data. For your data set, one S.D. to the left of the mean ($8.6 - 14.5 = -5.9$) is already outside of the possible values for $t$, so this value only has meaning on one side of the mean. It does imply skewed data though.

It depends on what information can be presented with your choice; for example, can we state that $0 \le t < 60$? I would probably go with the median and IQR, because this shows where most of the data is contained, but I don't know all the details of what your teacher expects.

Hope this helps some, even if it's not that definitive.
 
Since this has been posted in BASIC Probability and Statistics, we should just discuss what each of the statistics does, and the situations when each statistic is better to be used.

We need to remember that to calculate the mean, we need to add all the scores, then divide by the number of scores. This means that EVERY score is taken into account.
The mean could then be considered your "ideal" or "expected" score. It also gives us an idea of what happens in the "centre" of the distribution.
Obviously not every score is going to be this ideal score, and the mean only tells us what happens in the centre, so we would like a way to measure the "spread" of the data as well.
A deviation is the difference between the observed score and the expected (mean) score. You would probably think that a measure of spread from the mean could be found just by averaging all the deviations, but the problem is that positive and negative deviations cancel each other out (deviations from the mean always sum to exactly zero), thereby losing information. So to get around this, we square each deviation first (thereby making everything nonnegative) and then average. This gives the "variance". Then the standard deviation is found by taking the square root of the variance (which undoes the squaring and returns us to the original units).
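As a small illustration of those steps, here is a sketch in Python with made-up scores. It assumes the plain "average" described above, i.e. dividing by the number of scores (the population variance); many courses divide by one less than that for a sample instead.

```python
from math import sqrt

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up scores, for illustration only
n = len(scores)

mean = sum(scores) / n                    # add all the scores, divide by how many
deviations = [x - mean for x in scores]   # observed score minus mean
sum_dev = sum(deviations)                 # positives and negatives cancel

# Square each deviation, then average: the variance
variance = sum(d ** 2 for d in deviations) / n
sd = sqrt(variance)                       # square root of the variance

print(mean, sum_dev, variance, sd)
```

The raw deviations add up to zero, which is why they cannot be averaged directly, while the squared deviations give a usable measure of spread.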

As for the median, we need to note that it is found by putting the data in order and then finding the middle score. This splits the data into two halves. Also note that the exact value of most scores does not affect the median, only their position in the ordering. This means that the median is not affected by outliers the way the mean is.
We can then go further and find the median of each of the halves of the data, giving us quartiles (quarters of the data). The difference between the upper (3rd) quartile and the lower (1st) quartile gives us the InterQuartile Range (IQR), another measure of spread. The IQR also gives us a way to flag outliers: by a common convention, any value more than 1.5 x IQR below the lower quartile or more than 1.5 x IQR above the upper quartile is considered an outlier.
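The "median of each half" procedure and the 1.5 x IQR check can be sketched as follows. The function names and the data are hypothetical, and one convention is assumed: when the number of scores is odd, the middle score is left out of both halves (other textbooks include it, which shifts the quartiles slightly).

```python
from statistics import median

def quartiles_by_halves(data):
    """Return (Q1, median, Q3) by taking the median of each half of the data."""
    xs = sorted(data)
    if len(xs) < 2:
        raise ValueError("need at least two values")
    half = len(xs) // 2
    lower = xs[:half]    # bottom half (middle score excluded when n is odd)
    upper = xs[-half:]   # top half
    return median(lower), median(xs), median(upper)

def outliers_by_iqr(data, k=1.5):
    """Values more than k x IQR below Q1 or above Q3."""
    q1, _, q3 = quartiles_by_halves(data)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

data = [1, 2, 3, 4, 5, 6, 7, 8, 100]   # made-up data with one extreme value
print(quartiles_by_halves(data))
print(outliers_by_iqr(data))
```

Here the quartiles are 2.5 and 7.5, so the IQR is 5 and anything below -5 or above 15 is flagged, which picks out the 100.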

So to answer your question, when is it better to use each pair of statistics, you really should always determine both sets, and then determine if you have outliers. If you do, use the median and IQR. If not, use the mean and SD.
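That rule of thumb could be written down roughly like this. It is a sketch, not a definitive procedure: `summarize` is a hypothetical helper, the data are made up, and it relies on Python's `statistics.quantiles` with the "inclusive" method, which may give slightly different quartiles from the median-of-halves approach.

```python
from statistics import mean, median, pstdev, quantiles

def summarize(data):
    """Pick a pair of statistics based on the 1.5 x IQR outlier check."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    has_outliers = any(x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr for x in data)
    if has_outliers:
        return ("median and IQR", median(data), iqr)
    return ("mean and SD", mean(data), pstdev(data))

with_outlier = summarize([1, 2, 3, 4, 5, 6, 7, 8, 100])
without_outlier = summarize([2, 4, 4, 5, 6, 7, 8])
print(with_outlier)
print(without_outlier)
```

The first data set contains an extreme value and is summarized by the median and IQR; the second has none and is summarized by the mean and SD.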
 
Thanks to all those that replied!
 