MHB Visual illustration of Pearson correlation coefficient r

The discussion focuses on the calculation of the Pearson correlation coefficient using a sample dataset with five data points. A visual illustration was created to depict the relationship between the variables x and y, including color-coded mean lines for clarity. The covariance formula and the standard deviation calculations for both x and y were correctly applied in the context of the diagram. It was confirmed that the interpretation of the correlation coefficient and its formula were accurate. The formula for r can be simplified, emphasizing the relationship between the covariance and the standard deviations of the datasets.
dhiraj
Based on my understanding of the Pearson correlation coefficient, I have created a visual illustration, and I would like to know whether this understanding is correct.

Say I have a sample with 5 data points:-

x y
8 6
16 8
20 16
28 12
32 20

My goal is to calculate Pearson correlation coefficient between x and y.

This is what the diagram I created looks like:-

View attachment 6472

I have done appropriate color coding.

So in this case the covariance between x and y is:-

[math]cov(x,y) = \frac {\sum d_x d_y}{n-1} [/math]

[math]d_x[/math] and [math]d_y[/math] are the deviations (not standard deviations) of each point from [math]\bar{x}[/math] and [math]\bar{y}[/math] respectively. These mean lines are shown in the diagram (red line for [math]\bar{x}[/math] and green line for [math]\bar{y}[/math]).
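As a quick numeric check of the covariance formula, here is a minimal Python sketch using the five data points from this post. The variable names (`x_bar`, `d_x`, `cov_xy`, and so on) are chosen here for illustration and simply mirror the symbols above.

```python
# Sample data from the table in this post
x = [8, 16, 20, 28, 32]
y = [6, 8, 16, 12, 20]
n = len(x)

# Means: these correspond to the red (x_bar) and green (y_bar) lines
x_bar = sum(x) / n
y_bar = sum(y) / n

# Signed deviations of each point from the mean lines
d_x = [xi - x_bar for xi in x]
d_y = [yi - y_bar for yi in y]

# Sample covariance: sum of products of deviations over n - 1
cov_xy = sum(dx * dy for dx, dy in zip(d_x, d_y)) / (n - 1)

print("x_bar =", x_bar, "y_bar =", y_bar)
print("cov(x, y) =", round(cov_xy, 4))
```

For this dataset the means are 20.8 and 12.4, the sum of the products of deviations is 182.4, and the covariance comes out to about 45.6.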

Pearson correlation coefficient [math] r = \frac{cov(x,y)}{S_x S_y} [/math]

Based on the diagram, standard deviations of x and y are:-

[math]S_x = \sqrt{ \frac{\sum d_x^2}{n-1} }[/math]

[math]S_y = \sqrt{ \frac{\sum d_y^2}{n-1} }[/math]
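The two standard deviations and the resulting value of r can be checked numerically as well. This is only a sketch for the sample data in this post; the names are illustrative, and `statistics.stdev` from the Python standard library is used purely as a cross-check because it also divides by n - 1.

```python
import math
import statistics

# Sample data from the table in this post
x = [8, 16, 20, 28, 32]
y = [6, 8, 16, 12, 20]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
d_x = [xi - x_bar for xi in x]
d_y = [yi - y_bar for yi in y]

# Sample standard deviations, written exactly as in the formulas above
s_x = math.sqrt(sum(d * d for d in d_x) / (n - 1))
s_y = math.sqrt(sum(d * d for d in d_y) / (n - 1))

# Cross-check against the standard library's sample standard deviation
assert math.isclose(s_x, statistics.stdev(x))
assert math.isclose(s_y, statistics.stdev(y))

# Pearson r as covariance divided by the product of standard deviations
cov_xy = sum(dx * dy for dx, dy in zip(d_x, d_y)) / (n - 1)
r = cov_xy / (s_x * s_y)

print("S_x =", round(s_x, 4))
print("S_y =", round(s_y, 4))
print("r   =", round(r, 4))
```

With these five points, S_x is about 9.55, S_y is about 5.73, and r is about 0.83, which indicates a fairly strong positive linear relationship.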

So replacing these in the formula for the correlation coefficient we get:-
[math] r = \frac {\sum d_x d_y} { (n-1) \sqrt{ \frac{\sum d_x^2}{n-1} } \sqrt{ \frac{\sum d_y^2}{n-1} } } [/math]

Is this interpretation correct with respect to the diagram I have shown? I know the signs of [math]d_x[/math] and [math]d_y[/math] depend on which side of the mean lines [math]\bar{x}[/math] and [math]\bar{y}[/math] each point's [math]x[/math] and [math]y[/math] values fall.
 

Hi dhiraj!

It's all correct.
And note that the formula for $r$ can be simplified to:
$$ r = \frac {\sum d_x d_y} {\sqrt{ \sum d_x^2 } \sqrt{ \sum d_y^2 }}$$
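To spell out why the simplification works, using the same symbols as above: each standard deviation contributes a factor of $1/\sqrt{n-1}$, so their product contributes $1/(n-1)$, which cancels the $n-1$ in the covariance.

$$ S_x S_y = \sqrt{ \frac{\sum d_x^2}{n-1} } \sqrt{ \frac{\sum d_y^2}{n-1} } = \frac{\sqrt{ \sum d_x^2 } \sqrt{ \sum d_y^2 }}{n-1} $$

$$ r = \frac{cov(x,y)}{S_x S_y} = \frac{\dfrac{\sum d_x d_y}{n-1}}{\dfrac{\sqrt{ \sum d_x^2 } \sqrt{ \sum d_y^2 }}{n-1}} = \frac {\sum d_x d_y} {\sqrt{ \sum d_x^2 } \sqrt{ \sum d_y^2 }} $$

For the five sample points, $\sum d_x d_y = 182.4$, $\sum d_x^2 = 364.8$ and $\sum d_y^2 = 131.2$, so

$$ r = \frac{182.4}{\sqrt{364.8}\,\sqrt{131.2}} \approx 0.834 $$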
 