Visual illustration of Pearson correlation coefficient r

In summary, the Pearson correlation coefficient is a measure of the linear relationship between two variables. It is calculated using the formula r = cov(x,y) / (S_x * S_y), where cov(x,y) is the sample covariance between x and y, and S_x and S_y are the sample standard deviations of x and y respectively. The thread illustrates the concept with a color-coded diagram that marks the mean lines of x and y. Because the (n-1) factors cancel, the formula simplifies to r = sum(d_x d_y) / (sqrt(sum(d_x^2)) * sqrt(sum(d_y^2))), where d_x and d_y are the deviations of x and y from their means. The signs of d_x and d_y depend on which side of the mean lines x and y fall. Overall, the reply in the thread confirms that this understanding is correct.
  • #1
dhiraj
Based on my understanding of the Pearson correlation coefficient, I have created a visual illustration. I would like to know whether this understanding is correct.

Say I have a sample with 5 data points:-

x y
8 6
16 8
20 16
28 12
32 20

My goal is to calculate Pearson correlation coefficient between x and y.

This is what the diagram I created looks like:-

View attachment 6472

I have done appropriate color coding.

So in this case the covariance between x and y is:-

\(\displaystyle cov(x,y) = \frac {\sum d_x d_y}{n-1} \)

\(\displaystyle d_x\) and \(\displaystyle d_y\) are the deviations (not the standard deviations) from \(\displaystyle \bar{x}\) and \(\displaystyle \bar{y}\) respectively. These mean lines are shown in the diagram (red line for \(\displaystyle \bar{x}\) and green line for \(\displaystyle \bar{y}\)).

Pearson correlation coefficient \(\displaystyle r = \frac{cov(x,y)}{S_x S_y} \)

Based on the diagram, standard deviations of x and y are:-

\(\displaystyle S_x = \sqrt{ \frac{\sum d_x^2}{n-1} }\)

\(\displaystyle S_y = \sqrt{ \frac{\sum d_y^2}{n-1} }\)

So replacing these in the formula for the correlation coefficient we get:-
\(\displaystyle r = \frac {\sum d_x d_y} { (n-1) \sqrt{ \frac{\sum d_x^2}{n-1} } \sqrt{ \frac{\sum d_y^2}{n-1} } } \)

Is this interpretation correct with respect to the diagram I have shown? I know the signs of \(\displaystyle d_x\) and \(\displaystyle d_y\) depend on which side of the mean lines \(\displaystyle \bar{x}\) and \(\displaystyle \bar{y}\) the values \(\displaystyle x\) and \(\displaystyle y\) fall.
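The formulas in this post can be checked numerically against the five data points given above. The following is a minimal sketch in plain Python, not part of the original post; the variable names (`d_x`, `cov_xy`, `s_x`, `s_y`) are chosen here only to mirror the symbols in the formulas.

```python
import math

# The five sample points from the post.
x = [8, 16, 20, 28, 32]
y = [6, 8, 16, 12, 20]
n = len(x)

# Sample means (the red and green mean lines in the diagram).
x_bar = sum(x) / n
y_bar = sum(y) / n

# Deviations from the means; their signs depend on which side
# of the mean line each point falls.
d_x = [xi - x_bar for xi in x]
d_y = [yi - y_bar for yi in y]

# Sample covariance and sample standard deviations (n - 1 denominator).
cov_xy = sum(a * b for a, b in zip(d_x, d_y)) / (n - 1)
s_x = math.sqrt(sum(a * a for a in d_x) / (n - 1))
s_y = math.sqrt(sum(b * b for b in d_y) / (n - 1))

# Pearson correlation coefficient.
r = cov_xy / (s_x * s_y)
print(round(r, 3))  # prints 0.834
```

For this sample the means are 20.8 and 12.4, the covariance is 45.6, and r comes out at roughly 0.834, a fairly strong positive linear relationship.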
 

Attachments

  • Correlation.png
  • #2
Hi dhiraj!

It's all correct.
And note that the formula for $r$ can be simplified to:
$$ r = \frac {\sum d_x d_y} {\sqrt{ \sum d_x^2 } \sqrt{ \sum d_y^2 }}$$
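The simplification works because the two square roots in the denominator each contribute a factor of $1/\sqrt{n-1}$, which together cancel the leading $(n-1)$. Written out as a short derivation using the same symbols as the original post:

$$ r = \frac {\sum d_x d_y} { (n-1) \sqrt{ \frac{\sum d_x^2}{n-1} } \sqrt{ \frac{\sum d_y^2}{n-1} } } = \frac {\sum d_x d_y} { (n-1) \cdot \frac{1}{n-1} \sqrt{ \sum d_x^2 } \sqrt{ \sum d_y^2 } } = \frac {\sum d_x d_y} {\sqrt{ \sum d_x^2 } \sqrt{ \sum d_y^2 }} $$

For the five data points in post #1, $\sum d_x d_y = 182.4$, $\sum d_x^2 = 364.8$ and $\sum d_y^2 = 131.2$, so

$$ r = \frac{182.4}{\sqrt{364.8}\,\sqrt{131.2}} \approx 0.834 $$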
 

1. What is a visual illustration of Pearson correlation coefficient r?

A visual illustration of Pearson correlation coefficient r is a graphical representation of the strength and direction of the relationship between two quantitative variables. It is often depicted as a scatter plot, with the data points forming a pattern that indicates the strength and direction of the correlation.

2. How is Pearson correlation coefficient r calculated?

Pearson correlation coefficient r is calculated by dividing the covariance of the two variables by the product of their standard deviations. The resulting value ranges from -1 to 1, with a value of 1 indicating a perfect positive linear correlation, 0 indicating no linear correlation, and -1 indicating a perfect negative linear correlation.
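The two extremes of that range can be reproduced with a small helper. This is an illustrative sketch only: the function name `pearson_r` and the sample data are assumptions made for this example, and the function uses the simplified form of the formula (sums of deviations, with the n-1 factors already cancelled).

```python
import math

def pearson_r(x, y):
    """Pearson r from deviations about the means (simplified form)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        # r is undefined when either variable has zero variance.
        raise ValueError("r is undefined for a constant variable")
    return s_xy / math.sqrt(s_xx * s_yy)

x = [1, 2, 3, 4]
r_up = pearson_r(x, [2 * v + 1 for v in x])   # points lie exactly on a rising line
r_down = pearson_r(x, [-v for v in x])        # points lie exactly on a falling line
print(r_up, r_down)  # approximately 1 and -1, up to floating-point rounding
```

Any data set whose points lie exactly on a non-horizontal straight line gives r of 1 or -1, regardless of the slope's magnitude.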

3. What does a strong or weak correlation look like in a visual illustration of Pearson correlation coefficient r?

A strong correlation in a visual illustration of Pearson correlation coefficient r is represented by a tightly clustered pattern of data points that follow a clear linear trend. A weak correlation, on the other hand, is represented by a scattered pattern of data points with no clear trend.

4. How can a visual illustration of Pearson correlation coefficient r be used in scientific research?

A visual illustration of Pearson correlation coefficient r can be used to visualize the relationship between two variables and determine the strength and direction of the correlation. This information can then be used to make predictions, identify trends, and inform further research.

5. Are there any limitations to using a visual illustration of Pearson correlation coefficient r?

Yes, there are some limitations to using a visual illustration of Pearson correlation coefficient r. It only measures linear relationships between two variables and does not take into account any potential non-linear relationships. Additionally, correlation does not imply causation, so it is important to interpret the results carefully and not make assumptions about causality based on correlation alone.
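The linearity limitation is easy to demonstrate. In the sketch below (an illustrative example with made-up data, and a hypothetical helper named `pearson_r`), y is completely determined by x through y = x^2, yet r is zero because the positive and negative products of deviations cancel.

```python
import math

def pearson_r(x, y):
    """Pearson r computed from sums of deviations about the means."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# x values symmetric about zero; y depends on x exactly, but not linearly.
x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]

r = pearson_r(x, y)
print(r)  # prints 0.0 even though y is a deterministic function of x
```

A scatter plot of these points would show a clear parabola, which is why looking at the plot alongside r is worthwhile.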
