Help me understand skewness in QQ-plots please

  • Thread starter: bremenfallturm
AI Thread Summary
Understanding skewness in QQ plots involves recognizing how the plotted points relate to the reference line for the normal distribution. In a left-skewed distribution, the points in the lower tail fall below the line, meaning the observed values there are smaller than a normal distribution would predict. This can seem counterintuitive, but it is exactly what a long left tail looks like. The reply confirms this first reading and rejects the original poster's alternative idea that the axis scaling pushes the smaller values below the line: the QQ plot reflects the distribution's shape directly.
bremenfallturm
TL;DR Summary
I am trying to understand how QQ plots work, but I have a hard time interpreting skewness. Specifically, it seems to be "the other way around" from what I expect. See the post for an explanation.

Let me explain.

From what I understand, in a QQ plot we divide both the normal distribution (typically ##N(0,1)##) and the dataset into ##n## quantiles, where ##n## is the number of datapoints. We sort the dataset and plot each datapoint against the corresponding quantile of the normal distribution. For example, if we have 10 ordered datapoints ##a_1 \le a_2 \le \dots \le a_{10}## and have computed 10 normal quantiles ##n_1 < n_2 < \dots < n_{10}##, we would plot the pairs ##(n_1, a_1), (n_2, a_2), \dots##, with the normal quantiles on the horizontal axis and the data on the vertical axis.
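The construction described above can be sketched in Python with only the standard library. This is a minimal sketch, not the exact procedure of any particular plotting package: the function name `qq_pairs` is hypothetical, and the plotting positions ##(i - 0.5)/n## are one common convention among several for choosing which normal quantiles to use.

```python
from statistics import NormalDist


def qq_pairs(data):
    """Pair each sorted data point with a standard normal quantile.

    Hypothetical helper. Assumes plotting positions p_i = (i - 0.5) / n,
    one common convention; other conventions shift the quantiles slightly.
    """
    n = len(data)
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    # Normal quantile for the i-th smallest observation, i = 1..n
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    # (x, y) = (theoretical normal quantile, observed sample quantile)
    return list(zip(theoretical, sorted(data)))


for x, y in qq_pairs([5.0, 1.0, 3.0]):
    print(f"{x:+.3f}  {y}")
```

For the three values above, the sorted data 1.0, 3.0, 5.0 are paired with normal quantiles that are symmetric around 0, the middle one being exactly 0.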

Now, here is where I don't understand how we interpret the skewness.
Consider, for example, the left-skewed case (https://anasrana.github.io/ems-practicals/qq-plot.html):
[Image: QQ plot for the left-skewed case]

If I look at the plot, my first intuition is this: it looks to me that all points below the line (the points between -4 and around -1 of the normal distribution's quantiles) are smaller than expected. This is because they are below the line. Therefore, the points would be drawn from a distribution where smaller values are more probable. Of course, looking at the actual distribution, we can see that it is the other way around.
My second idea is then this: if we have many large datapoints (i.e. in the image above), the graph axes are going to be scaled such that the smaller values fall below the line, and thus, we have a distribution with a tendency towards large datapoints. Does any of this make sense? Could you help me deepen my understanding?
 
bremenfallturm said:
If I look at the plot, my first intuition is this: it looks to me that all points below the line (the points between -4 and around -1 of the normal distribution's quantiles) are smaller than expected. This is because they are below the line. Therefore, the points would be drawn from a distribution where smaller values are more probable.
True.
bremenfallturm said:
Of course, looking at the actual distribution, we can see that it is the other way around.
No, it isn't. It's exactly as expected.
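A small numerical check of this reply, as a sketch under stated assumptions: the sample is hypothetical, built from the quantiles ##\ln p## of a negated exponential distribution, which has a long tail towards small values (left skew). The reference line is drawn through the first and third quartiles, a common convention but an assumption here, as are the plotting positions ##(i - 0.5)/n##. The helper names `qq_points` and `reference_line` are hypothetical.

```python
import math
from statistics import NormalDist, quantiles

STD_NORMAL = NormalDist(0.0, 1.0)


def qq_points(data):
    """Theoretical normal quantiles (x) and sorted observations (y).

    Assumes plotting positions (i - 0.5) / n.
    """
    n = len(data)
    xs = [STD_NORMAL.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return xs, sorted(data)


def reference_line(data):
    """Slope and intercept of a line through the first and third quartiles."""
    q1, _, q3 = quantiles(data, n=4)  # sample quartiles of the data
    z1, z3 = STD_NORMAL.inv_cdf(0.25), STD_NORMAL.inv_cdf(0.75)
    slope = (q3 - q1) / (z3 - z1)
    intercept = q1 - slope * z1
    return slope, intercept


# Hypothetical left-skewed sample: ln(p) is the quantile function of -Exp(1),
# so the values have a long tail towards negative numbers.
n = 20
left_skewed = [math.log((i - 0.5) / n) for i in range(1, n + 1)]

xs, ys = qq_points(left_skewed)
slope, intercept = reference_line(left_skewed)

# Residual = observed value minus what the reference line predicts there.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print(f"lowest point: observed {ys[0]:.2f}, "
      f"line predicts {intercept + slope * xs[0]:.2f}")
```

The lowest point lies well below the line: the observed minimum is much smaller than a normal distribution matched to the middle of the data would predict, which is the long left tail. Points near the centre sit slightly above the line, giving the concave bend typical of left skew.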
bremenfallturm said:
My second idea is then this: if we have many large datapoints (i.e. in the image above), the graph axes are going to be scaled such that the smaller values fall below the line, and thus, we have a distribution with a tendency towards large datapoints. Does any of this make sense?
No, none of this makes sense.
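One way to see why axis scaling cannot produce the pattern, assuming the reference line is fitted to the data (for example through the quartiles): rescaling the data to new units, ##y \mapsto c\,y + d## with ##c > 0##, moves the reference line along with it, so every point's offset from the line is multiplied by ##c## and no point changes sides. The sketch below checks this numerically on a hypothetical left-skewed sample (quantiles ##\ln p## of a negated exponential distribution), with assumed plotting positions ##(i - 0.5)/n##; the helper name `qq_residuals` is hypothetical.

```python
import math
from statistics import NormalDist, quantiles

STD_NORMAL = NormalDist(0.0, 1.0)


def qq_residuals(data):
    """Observed quantile minus quartile-line prediction, per sorted point."""
    n = len(data)
    xs = [STD_NORMAL.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    q1, _, q3 = quantiles(data, n=4)
    z1, z3 = STD_NORMAL.inv_cdf(0.25), STD_NORMAL.inv_cdf(0.75)
    slope = (q3 - q1) / (z3 - z1)
    intercept = q1 - slope * z1
    return [y - (intercept + slope * x) for x, y in zip(xs, sorted(data))]


n = 20
left_skewed = [math.log((i - 0.5) / n) for i in range(1, n + 1)]
# Same data after a hypothetical change of units and origin: y -> 100 * y + 7
rescaled = [100 * v + 7 for v in left_skewed]

original = qq_residuals(left_skewed)
transformed = qq_residuals(rescaled)

# Each residual is just multiplied by 100, so the above/below pattern is unchanged.
same_pattern = all(
    math.isclose(t, 100 * r, rel_tol=1e-9, abs_tol=1e-7)
    for r, t in zip(original, transformed)
)
print(same_pattern)
```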
 