When is it inappropriate to report Pearson correlation coefficients?

Discussion Overview

The discussion revolves around the appropriateness of reporting Pearson correlation coefficients in the context of logarithmic relationships in scientific data. Participants explore the implications of using Pearson's R for non-linear data, particularly when the data exhibits logarithmic characteristics, and seek guidance on alternative correlation metrics and transformations.

Discussion Character

  • Debate/contested
  • Technical explanation
  • Mathematical reasoning

Main Points Raised

  • One participant expresses concern about the misleading nature of Pearson correlation coefficients when applied to logarithmic data, questioning their appropriateness even if the logarithmic relationship is emphasized.
  • Another participant suggests that Pearson's R is best interpreted when there is an approximately linear relationship, recommending transformations to achieve linearity before calculating the coefficient.
  • A participant inquires about the antilog transformation and its implications for data values, expressing confusion about the potential for extremely large numbers resulting from the transformation.
  • Further clarification is provided regarding the transformation process, with examples illustrating how to handle zero values and the potential for using log-linear transformations to facilitate correlation calculations.
  • Participants discuss the possibility of using log-log transformations if both datasets are non-linear, although no consensus is reached on the best approach.

Areas of Agreement / Disagreement

Participants do not reach a consensus on the appropriateness of reporting Pearson correlation coefficients for logarithmic data, with multiple competing views on the necessity and method of transformation before applying the correlation metric.

Contextual Notes

Participants highlight limitations regarding the interpretation of Pearson's R in non-linear contexts and the challenges posed by zero values in the data, indicating that the discussion remains open to further exploration of appropriate statistical methods.

Bacat
I am writing a paper publishing scientific data. My background is chemistry but I have taken a couple of stat classes. In my opinion, one of the biggest deficiencies in modern research publications is the improper use of statistics. I hope to avoid making this mistake, but I need your help.

Most of my data has logarithmic relationships (i.e., one set of observations (say, mass) plotted against another set of observations (say, temperature) is best fit with a logarithmic regression line). If I compute Pearson correlation coefficients for these, I get values around 0.80. However, when I plot the data I can clearly see that it is not linear; it is logarithmic. If I fit regression lines to the data, I get better R^2 values with a logarithmic fit than with a linear fit. Reporting the Pearson number could be misleading because it is a measure of linearity in the data, but I am trying to show that there is a high correlation without claiming that the data is linear.
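A minimal sketch of the situation described above, assuming hypothetical data that follow y = 2 + 5 ln(x) exactly (these are not the poster's measurements). Pearson's R on the raw pairs comes out well below 1 even though the relationship is perfectly deterministic, while R computed after log-transforming x is 1. The helper `pearson_r` is written out by hand so the example needs only the standard library.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data with an exactly logarithmic relationship: y = 2 + 5 * ln(x).
x = [1, 2, 4, 8, 16, 32, 64, 128]
y = [2 + 5 * math.log(v) for v in x]

r_raw = pearson_r(x, y)                         # R on the untransformed pairs
r_log = pearson_r([math.log(v) for v in x], y)  # R after log-transforming x

print(f"x vs y:     R = {r_raw:.3f}")
print(f"ln(x) vs y: R = {r_log:.3f}")
```

With these particular values the raw R is roughly 0.85, in the same range as the values mentioned above, which is why a high raw R on its own says little about whether the relationship is linear.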

Can someone with a firm background in statistics theory answer the following questions?

1) Is it appropriate to publish Pearson correlation coefficients for logarithmic data if it is emphasized that the relationship is logarithmic?

2) Does the Pearson number have meaning in the context of logarithmic relationships?

3) Is there a better correlation metric for logarithmic data? Please note that some of the data points are zero. This makes transforming to a logarithmic scale difficult, but maybe there is a trick I'm missing?

Your help is very much appreciated.
 
It depends on whether you are plotting original data or transformed data. If the plot of the original data is non-linear, you would generally want to transform one or both variables to achieve approximate linearity before calculating Pearson's R.

You can get an R value for any data, but it is best interpreted when there is an approximately linear relation between X1 and X2. When these statistics are reported, it is always in terms of the specific transformation(s) used.
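One way to follow that reporting convention is to compute R for each candidate transformation and keep the label next to the number. The sketch below uses a hypothetical helper, `correlations_by_transform` (not from any library), that tries identity and base-10 log transforms of each variable and skips any log transform that would hit a zero or negative value, since the logarithm is undefined there.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def log10_or_none(values):
    """Base-10 logs of the values, or None if any value is <= 0 (log undefined)."""
    if any(v <= 0 for v in values):
        return None
    return [math.log10(v) for v in values]


def correlations_by_transform(x1, x2):
    """Pearson's R for each identity/log10 combination, keyed by a label
    such as 'log-linear' (X1 log-transformed, X2 left linear)."""
    options1 = {"linear": list(x1), "log": log10_or_none(x1)}
    options2 = {"linear": list(x2), "log": log10_or_none(x2)}
    results = {}
    for name1, t1 in options1.items():
        for name2, t2 in options2.items():
            if t1 is None or t2 is None:
                continue  # skip: a zero or negative value makes the log undefined
            results[f"{name1}-{name2}"] = pearson_r(t1, t2)
    return results


results = correlations_by_transform([1, 10, 100, 1000], [0, 1, 2, 3])
for label, r in results.items():
    print(f"{label} (X1-X2): R = {r:.3f}")
```

With the example values used in this thread, only the linear-linear and log-linear combinations survive, because X2 contains a zero; the log-linear R is 1.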

If there are zero values in the variable that is already linear, leave that variable as it is and instead log-transform the other variable (the one whose values behave like antilogs), and report the result as such.

For example, if the antilogs are 1, 10, 100, 1000, use the base-10 log transform: 0, 1, 2, 3. The choice of base only rescales the transformed values, so it does not change Pearson's R.
This would be reported as log-linear transformed data for the Pearson R.
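Because log_b(x) = ln(x) / ln(b), changing the base multiplies every transformed value by the same positive constant, and Pearson's R is unchanged by such a rescaling. The sketch below checks this on hypothetical, deliberately imperfect values (not data from the thread): the log-linear R is the same whether the log is base 2, e, or 10.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x1 = [1, 3, 12, 90, 700]  # hypothetical values on an antilog-like scale
x2 = [0, 1, 2, 3, 4]      # contains a zero, so it is left untransformed

# Log-linear R with three different bases for the log of X1.
r_by_base = {
    "log2": pearson_r([math.log2(v) for v in x1], x2),
    "ln": pearson_r([math.log(v) for v in x1], x2),
    "log10": pearson_r([math.log10(v) for v in x1], x2),
}
for base, r in r_by_base.items():
    print(f"{base}: R = {r:.6f}")
```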
 
Thanks for the response!

I'm not familiar with the antilog transform. I think I should make the following transformation:

Data -> Transformed Data

0 -> 1
1 -> 10
2 -> 100
etc.

Is this correct?

Won't this lead to some enormous numbers later? For example:
17 -> 100,000,000,000,000,000

I think I must be doing it wrong...
 
Sorry. I wasn't sure what form your original data was in. For example, was X1 already in log form and still non-linear?

Suppose X1 is 1, 10, 100, 1000 and X2 is 0, 1, 2, 3

The best thing is to transform X1 to 0, 1, 2, 3 with a log-linear transform for X1 on X2. This eliminates the problem with 0 in the X2 data. In this example, of course, R=1.

The antilog transform would be transforming X2 to the antilogs (1, 10, 100, 1000). The main problem is that this is not conveniently presented on a graph with a single linear scale. However, R can still be calculated with the same result (R = 1). You can write the transformed values as powers of 10, and you will get a straight-line graph if both axes use the same scaling.
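The two routes in this example can be compared directly. This sketch uses the example values from the post (X1 = 1, 10, 100, 1000 and X2 = 0, 1, 2, 3) and a hand-written `pearson_r` helper; it computes R once with X1 log-transformed and once with X2 replaced by its antilogs 10**X2. Both give R = 1, and the zero in X2 causes no trouble in either route.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x1 = [1, 10, 100, 1000]
x2 = [0, 1, 2, 3]

# Route 1 (log-linear): log10-transform X1 and keep X2, so its zero stays usable.
r_log_linear = pearson_r([math.log10(v) for v in x1], x2)

# Route 2 (antilog): raise 10 to each X2 value and keep X1 as it is.
antilogs = [10 ** v for v in x2]  # [1, 10, 100, 1000]
r_antilog = pearson_r(x1, antilogs)

print(f"log-linear: R = {r_log_linear:.6f}")
print(f"antilog:    R = {r_antilog:.6f}")
```

The antilog route is why the numbers get large: an X2 value of 17 becomes 10**17. That is harmless for computing R, but awkward to show on a single linear axis.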

If both data sets were non-linear, whether increasing or decreasing, a log-log transform might be tried. The choice of base again does not change R.
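A sketch of the log-log case, assuming both variables are strictly positive and follow a power law y = a·x^b, so that log y = log a + b·log x is a straight line. The data are hypothetical (y = 3x², not from the thread).

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical power-law data: both variables positive, y = 3 * x**2.
x = [1, 2, 3, 4, 5, 6]
y = [3 * v ** 2 for v in x]

r_raw = pearson_r(x, y)  # R on the curved, untransformed data
r_loglog = pearson_r([math.log10(v) for v in x], [math.log10(v) for v in y])

print(f"raw:     R = {r_raw:.4f}")
print(f"log-log: R = {r_loglog:.4f}")
```

If either variable contains zeros, the log-log transform is undefined for those points, which brings back the problem raised in the original question.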
 