Finding Quartiles for Ungrouped Data: Correct Method and Precision

  • Thread starter: stpmmaths
  • Tags: Data
stpmmaths
From the question, is the way to find Lower Quartiles and Upper Quartiles correct? I have seen books taking the 3rd and 8th (from the question) as Lower Quartiles and Upper Quartiles respectively. Which should be the correct Quartiles?
 

Attachment: DSC00198.jpg
stpmmaths said:
From the question, is the way to find Lower Quartiles and Upper Quartiles correct? I have seen books taking the 3rd and 8th (from the question) as Lower Quartiles and Upper Quartiles respectively. Which should be the correct Quartiles?

Are you asking how to calculate quartiles or how to interpret them? First, there's no such thing as an eighth quartile. By definition, quartiles partition the ordered data into four sets of equal size. The generic term is quantile, which covers any number of equal divisions of the data points. The first quartile contains the lowest 25% of the data points, the last quartile the highest 25%. All data values must be ranked, putting them into correspondence with the integers 1 through n, where n is the total number of data points. If k is a chosen rank with k \leq n, then:

P[X < x] \leq k/n; P[X \geq x] \geq 1 - (k/n)

So by the first inequality, if x is the 5th highest point out of 100 data points, then k=95 and P=0.95, which is the 95th percentile. It seems you want the upper quartile (top 25%) and the lower quartile (bottom 25%). The meaning of the term 75th percentile is that 75% of all data points are less than the lowest data point of the upper quartile.
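As a small illustration of the percentile reading above, here is a sketch that counts how many points lie strictly below a chosen value. The data set (the integers 1 through 100) and the function name `fraction_below` are made up for this example.

```python
def fraction_below(data, x):
    """Return the fraction of values in data that are strictly less than x."""
    return sum(1 for v in data if v < x) / len(data)

# Hypothetical data: 100 distinct values, 1..100.
data = list(range(1, 101))

# 96 is the 5th highest value; exactly 95 values lie below it.
print(fraction_below(data, 96))  # 0.95

# 76 is the lowest point of the top 25%; 75% of the points lie below it.
print(fraction_below(data, 76))  # 0.75
```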
 
stpmmaths said:
Based on the attachment https://www.physicsforums.com/attachment.php?attachmentid=44365&d=1330184818, is this the correct way to interpret quartile?

I'm having a hard time reading it, but to establish quartiles it's the number of data points and their ranks that matter, not their actual values. So if n=15, the median rank satisfies k/n = 0.5; solving for k gives 7.5. For a quartile, k/n = 0.25, so k = 3.75. The lower boundary of the upper quartile would then be 15 - 3.75 = 11.25. This would include your top four ranked values, which would be your last four data points in counting order: the 12th, 13th, 14th and 15th data points.
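The rank arithmetic for n=15 can be checked with a few lines of Python. This is only a sketch of the calculation described in the post; the variable names are chosen for the example.

```python
import math

n = 15  # number of ranked data points, as in the post above

median_rank = 0.5 * n                  # k/n = 0.5  ->  k = 7.5
quartile_width = 0.25 * n              # k/n = 0.25 ->  k = 3.75
upper_boundary = n - quartile_width    # lower boundary of the upper quartile

# Integer ranks strictly above the boundary belong to the upper quartile.
first_rank = math.floor(upper_boundary) + 1
upper_ranks = list(range(first_rank, n + 1))

print(median_rank, quartile_width, upper_boundary)  # 7.5 3.75 11.25
print(upper_ranks)  # [12, 13, 14, 15]
```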

If you type out what you're doing, I can tell you more. You seem to be doing it correctly. For an even number of values, some people use k+1, as you have, so that quantile boundaries do not fall on data points. The rank of your median is then 5.5 and the quartile boundaries would be calculated using 2.75. So 5.5 - 2.75 = 2.75. Your answer could be this or 2.25. I'm not sure which.
 
There are 10 data values in my attached example.

{51, 55, 57, 61, 62, 67, 70, 72, 73, 74}

Q1 = 56.5
Q3 = 72.25

But
Even-sized population

Consider an ordered population of 10 data values {3, 6, 7, 8, 8, 10, 13, 15, 16, 20}.

The rank of the first quartile is 10×(1/4) = 2.5, which rounds up to 3, meaning that 3 is the rank in the population (from least to greatest values) at which approximately 1/4 of the values are less than the value of the first quartile. The third value in the population is 7.
The rank of the second quartile (same as the median) is 10×(2/4) = 5, which is an integer, while the number of values (10) is an even number, so the average of both the fifth and sixth values is taken—that is (8+10)/2 = 9, though any value from 8 through to 10 could be taken to be the median.
The rank of the third quartile is 10×(3/4) = 7.5, which rounds up to 8. The eighth value in the population is 15.

from http://en.wikipedia.org/wiki/Quantile

Using that method, my data would give
Q1 = 57
Q3 = 72 instead
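The two sets of answers can be reproduced side by side. The sketch below assumes the first pair comes from taking the rank p×(n+1) and interpolating linearly between neighbouring values, and the second from the rounded-up rank p×n described in the quoted text. The function names are hypothetical, and the code assumes 0 < p < 1.

```python
import math

def quartile_interpolated(data, p):
    """Value at rank p*(n+1), interpolating linearly between neighbours."""
    xs = sorted(data)
    n = len(xs)
    rank = min(max(p * (n + 1), 1), n)  # 1-based rank, clamped to [1, n]
    lo = math.floor(rank)
    frac = rank - lo
    if lo >= n:
        return xs[-1]
    return xs[lo - 1] + frac * (xs[lo] - xs[lo - 1])

def quartile_rounded_up(data, p):
    """Value at rank p*n rounded up; averages two values if p*n is an integer."""
    xs = sorted(data)
    rank = p * len(xs)
    if rank == int(rank):
        k = int(rank)
        return (xs[k - 1] + xs[k]) / 2
    return xs[math.ceil(rank) - 1]

data = [51, 55, 57, 61, 62, 67, 70, 72, 73, 74]

print(quartile_interpolated(data, 0.25), quartile_interpolated(data, 0.75))
# 56.5 72.25
print(quartile_rounded_up(data, 0.25), quartile_rounded_up(data, 0.75))
# 57 72
```

Both conventions are in common use, so neither pair of answers is wrong; they simply answer slightly different definitions of a quartile.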
 
stpmmaths said:
There are 10 data values in my attached example.

{51, 55, 57, 61, 62, 67, 70, 72, 73, 74}

Q1 = 56.5
Q3 = 72.25

But
Q1 = 57
Q3 = 72 instead

As far as I know, with sparse data like this, you can't be very precise in placing quantile boundaries in terms of extrapolations of the actual data values. All you can say is the median falls between 62 and 67. The quartile boundaries fall on 57 and 72. If you use k+1 and center the rank distribution on the median, using 2.75 ranks as the quartile width, then 57 will fall into the second quartile while 72 will fall into the third quartile when strictly observing the boundaries 2.75 and 8.25. With n=10+1, you can't be more precise than that, IMO. Note I'm using Q4 for the quartile with the highest values and Q1 as the one with the lowest values, as you did.
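The quartile membership of 57 and 72 can be checked by comparing their ranks against the rank boundaries 2.75, 5.5 and 8.25. This is a sketch only; it takes the boundaries as multiples of n+1 = 11, and the helper name `quartile_of_rank` is invented for the example.

```python
import bisect

data = [51, 55, 57, 61, 62, 67, 70, 72, 73, 74]
n = len(data)

# Rank boundaries between quartiles with n + 1 = 11: 2.75, 5.5, 8.25.
boundaries = [p * (n + 1) for p in (0.25, 0.5, 0.75)]

def quartile_of_rank(rank):
    """Return 1..4, the quartile that a 1-based rank falls into."""
    return bisect.bisect_right(boundaries, rank) + 1

for value in (57, 72):
    rank = data.index(value) + 1  # 1-based rank in the sorted data
    print(value, rank, quartile_of_rank(rank))
# 57 3 2
# 72 8 3
```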
 
SW VandeCarr said:
As far as I know, with sparse data like this, you can't be very precise in placing quantile boundaries in terms of extrapolations of the actual data values. All you can say is the median falls between 62 and 67. The quartile boundaries fall on 57 and 72. If you use k+1 and center the rank distribution on the median, using 2.75 ranks as the quartile width, then 57 will fall into the second quartile while 72 will fall into the third quartile when strictly observing the boundaries 2.75 and 8.25. With n=10+1, you can't be more precise than that, IMO. Note I'm using Q4 for the quartile with the highest values and Q1 as the one with the lowest values, as you did.

I made a mistake with the "k+1" adjustment for even n. It should be n+1 of course.
 