MHB Visual illustration of Pearson correlation coefficient r

SUMMARY

The discussion focuses on the calculation and visual representation of the Pearson correlation coefficient (r) using a sample of five data points. The user correctly identifies the formulas for covariance and standard deviations, and demonstrates an understanding of how to compute r using the deviations from the mean. The simplified formula for r is confirmed as r = (Σd_x d_y) / (√Σd_x² * √Σd_y²), validating the user's interpretation of the diagram created for this purpose.

PREREQUISITES
  • Understanding of Pearson correlation coefficient
  • Knowledge of covariance and standard deviation calculations
  • Familiarity with statistical notation and formulas
  • Ability to interpret graphical data representations
NEXT STEPS
  • Explore advanced statistical concepts such as Spearman's rank correlation coefficient
  • Learn how to visualize data distributions using scatter plots
  • Study the implications of correlation versus causation in data analysis
  • Investigate the use of Python libraries like NumPy and Pandas for statistical calculations
USEFUL FOR

Statisticians, data analysts, and students in quantitative fields who are looking to deepen their understanding of correlation analysis and its graphical representations.

dhiraj
Based on what I have understood about the Pearson correlation coefficient, I have created a visual illustration. I would like to know whether my understanding is correct.

Say I have a sample with 5 data points:-

x y
8 6
16 8
20 16
28 12
32 20

My goal is to calculate the Pearson correlation coefficient between x and y.

This is what the diagram I created looks like:-

View attachment 6472

I have done appropriate color coding.

So in this case the covariance between x and y is:-

[math]cov(x,y) = \frac {\sum d_x d_y}{n-1} [/math]

[math]d_x[/math] and [math]d_y[/math] are the deviations (not standard deviations) from [math]\bar{x}[/math] and [math]\bar{y}[/math] respectively. These mean lines are shown in the diagram (red line for [math]\bar{x}[/math] and green line for [math]\bar{y}[/math]).

Pearson correlation coefficient [math] r = \frac{cov(x,y)}{S_x S_y} [/math]

Based on the diagram, standard deviations of x and y are:-

[math]S_x = \sqrt{ \frac{\sum d_x^2}{n-1} }[/math]

[math]S_y = \sqrt{ \frac{\sum d_y^2}{n-1} }[/math]

So replacing these in the formula for the correlation coefficient we get:-
[math] r = \frac {\sum d_x d_y} { (n-1) \sqrt{ \frac{\sum d_x^2}{n-1} } \sqrt{ \frac{\sum d_y^2}{n-1} } } [/math]

Is this interpretation correct with respect to the diagram I have shown? I know the signs of [math]d_x[/math] and [math]d_y[/math] depend on which side of [math]\bar{x}[/math] and [math]\bar{y}[/math] the values [math]x[/math] and [math]y[/math] fall.
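
As a numerical sanity check on the formulas above, here is a minimal Python sketch that applies them to the five sample points. It uses only the standard library. The variable names (`d_x`, `cov_xy`, `s_x`, `s_y`) are just illustrative choices that mirror the notation in this post, not anything taken from the diagram.

```python
import math

# The five sample points from the post
x = [8, 16, 20, 28, 32]
y = [6, 8, 16, 12, 20]
n = len(x)

# Sample means (the red and green mean lines in the diagram)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Deviations of each point from the means; the sign depends on
# which side of the mean the value falls
d_x = [xi - x_bar for xi in x]
d_y = [yi - y_bar for yi in y]

# Sample covariance and sample standard deviations (n - 1 in the denominator)
cov_xy = sum(a * b for a, b in zip(d_x, d_y)) / (n - 1)
s_x = math.sqrt(sum(a * a for a in d_x) / (n - 1))
s_y = math.sqrt(sum(b * b for b in d_y) / (n - 1))

# Pearson correlation coefficient r = cov(x, y) / (S_x * S_y)
r = cov_xy / (s_x * s_y)

print(f"x_bar = {x_bar}, y_bar = {y_bar}")
print(f"cov(x, y) = {cov_xy:.4f}")
print(f"S_x = {s_x:.4f}, S_y = {s_y:.4f}")
print(f"r = {r:.4f}")
```

For this sample the means are 20.8 and 12.4, and r comes out at roughly 0.83, which is a fairly strong positive correlation.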
 

Hi dhiraj!

It's all correct.
And note that the formula for $r$ can be simplified to:
$$ r = \frac {\sum d_x d_y} {\sqrt{ \sum d_x^2 } \sqrt{ \sum d_y^2 }}$$
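
To spell out why the simplification works (a sketch of the algebra, using only the symbols already defined in the question): each square root in the denominator contributes a factor of $1/\sqrt{n-1}$, and together these cancel the $(n-1)$ in front:

$$ (n-1) \sqrt{ \frac{\sum d_x^2}{n-1} } \sqrt{ \frac{\sum d_y^2}{n-1} } = (n-1) \, \frac{\sqrt{ \sum d_x^2 } \sqrt{ \sum d_y^2 }}{n-1} = \sqrt{ \sum d_x^2 } \sqrt{ \sum d_y^2 } $$

For the five points in the question, $\bar{x} = 20.8$ and $\bar{y} = 12.4$, which gives $\sum d_x d_y = 182.4$, $\sum d_x^2 = 364.8$ and $\sum d_y^2 = 131.2$, so

$$ r = \frac {182.4} {\sqrt{364.8} \sqrt{131.2}} \approx 0.834 $$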
 
