dhiraj
Based on my understanding of the Pearson correlation coefficient, I have created a visual illustration, and I would like to know whether this understanding is correct.
Say I have a sample of 5 data points:
x y
8 6
16 8
20 16
28 12
32 20
My goal is to calculate the Pearson correlation coefficient between x and y.
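For reference, the sample means of the two columns (which set the positions of the mean lines in the diagram) work out to:

[math]\bar{x} = \frac{8 + 16 + 20 + 28 + 32}{5} = \frac{104}{5} = 20.8, \qquad \bar{y} = \frac{6 + 8 + 16 + 12 + 20}{5} = \frac{62}{5} = 12.4[/math]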
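This is what the diagram I created looks like:
[Diagram (attachment 6472): the five data points with the mean lines, red for [math]\bar{x}[/math] and green for [math]\bar{y}[/math]]
I have color-coded the elements of the diagram accordingly.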
So in this case the sample covariance between x and y is:
[math]cov(x,y) = \frac {\sum d_x d_y}{n-1} [/math]
Here [math]d_x = x - \bar{x}[/math] and [math]d_y = y - \bar{y}[/math] are the deviations (not standard deviations) of each point from [math]\bar{x}[/math] and [math]\bar{y}[/math] respectively. These mean lines are shown in the diagram (red line for [math]\bar{x}[/math] and green line for [math]\bar{y}[/math]).
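As a quick numerical check of the covariance step, here is a minimal Python sketch (standard library only; the variable names are my own choice, not from any library) that computes the signed deviations for the five points and the sample covariance with the [math]n-1[/math] denominator:

```python
# Sample data from the table above
xs = [8, 16, 20, 28, 32]
ys = [6, 8, 16, 12, 20]
n = len(xs)

# Means: the red (x-bar) and green (y-bar) lines in the diagram
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Signed deviations d_x = x - x_bar and d_y = y - y_bar for each point
d_x = [x - x_bar for x in xs]
d_y = [y - y_bar for y in ys]

# Sample covariance: sum of the products of deviations divided by n - 1
cov_xy = sum(dx * dy for dx, dy in zip(d_x, d_y)) / (n - 1)

for x, y, dx, dy in zip(xs, ys, d_x, d_y):
    print(f"({x:2}, {y:2})  d_x = {dx:6.1f}  d_y = {dy:5.1f}  d_x*d_y = {dx * dy:7.2f}")
print(f"cov(x, y) = {cov_xy:.1f}")
```

The products are positive for (8, 6), (16, 8) and (32, 20), which lie below-left or above-right of both mean lines, and slightly negative for (20, 16) and (28, 12), which lie in the other two quadrants. Together they give [math]\sum d_x d_y = 182.4[/math] and cov(x, y) = 182.4 / 4 = 45.6.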
The Pearson correlation coefficient is [math] r = \frac{cov(x,y)}{S_x S_y} [/math]
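The definition [math] r = \frac{cov(x,y)}{S_x S_y} [/math] can be sketched end to end in Python as follows. This is a minimal illustration, assuming the sample (n - 1) versions of both the covariance and the standard deviations; `pearson_r` is a hypothetical helper name, not a library function:

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation r = cov(x, y) / (S_x * S_y), using n - 1 throughout."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Signed deviations from the two means
    d_x = [x - x_bar for x in xs]
    d_y = [y - y_bar for y in ys]
    # Sample covariance and sample standard deviations
    cov_xy = sum(dx * dy for dx, dy in zip(d_x, d_y)) / (n - 1)
    s_x = math.sqrt(sum(dx * dx for dx in d_x) / (n - 1))
    s_y = math.sqrt(sum(dy * dy for dy in d_y) / (n - 1))
    return cov_xy / (s_x * s_y)

xs = [8, 16, 20, 28, 32]
ys = [6, 8, 16, 12, 20]
print(f"r = {pearson_r(xs, ys):.4f}")
```

For this sample it gives r of roughly 0.834, a fairly strong positive linear relationship.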
Based on the diagram, the sample standard deviations of x and y are:
[math]S_x = \sqrt{ \frac{\sum d_x^2}{n-1} }[/math]
[math]S_y = \sqrt{ \frac{\sum d_y^2}{n-1} }[/math]
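A small Python sketch of these two formulas for the same data (standard library only; variable names are my own), again dividing by [math]n-1[/math]:

```python
import math

xs = [8, 16, 20, 28, 32]
ys = [6, 8, 16, 12, 20]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sample standard deviations: square root of the summed squared deviations over n - 1
s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
print(f"S_x = {s_x:.4f}, S_y = {s_y:.4f}")
```

Here [math]\sum d_x^2 = 364.8[/math] and [math]\sum d_y^2 = 131.2[/math], so [math]S_x = \sqrt{91.2} \approx 9.55[/math] and [math]S_y = \sqrt{32.8} \approx 5.73[/math]. The result should match `statistics.stdev` from the Python standard library, which also uses the n - 1 denominator.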
Substituting these into the formula for the correlation coefficient, we get:
[math] r = \frac {\sum d_x d_y} { (n-1) \sqrt{ \frac{\sum d_x^2}{n-1} } \sqrt{ \frac{\sum d_y^2}{n-1} } } [/math]
Is this interpretation correct with respect to the diagram I have shown? I know the signs of [math]d_x[/math] and [math]d_y[/math] depend on which side of [math]\bar{x}[/math] and [math]\bar{y}[/math] the values [math]x[/math] and [math]y[/math] fall.
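A worked simplification of the expression above, using only the definitions already given: since [math]\sqrt{\frac{\sum d_x^2}{n-1}}\sqrt{\frac{\sum d_y^2}{n-1}} = \frac{\sqrt{\sum d_x^2}\sqrt{\sum d_y^2}}{n-1}[/math], the [math](n-1)[/math] factors cancel and

[math] r = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2}\,\sqrt{\sum d_y^2}} [/math]

With this sample's deviations, [math]\sum d_x d_y = 182.4[/math], [math]\sum d_x^2 = 364.8[/math] and [math]\sum d_y^2 = 131.2[/math], so

[math] r = \frac{182.4}{\sqrt{364.8 \times 131.2}} \approx \frac{182.4}{218.77} \approx 0.834 [/math]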