When is it inappropriate to report Pearson correlation coefficients?

In summary, the original poster is writing a paper on scientific data and wants to use statistics properly. They ask whether it is appropriate to publish Pearson correlation coefficients for logarithmic data and whether a better correlation metric exists. The reply suggests transforming the data to achieve approximate linearity, using a log-linear or log-log transform depending on the data. An antilog transform can be used instead when zeros prevent taking logs, although the result is harder to present on a graph.
  • #1
Bacat
I am writing a paper publishing scientific data. My background is chemistry but I have taken a couple of stat classes. In my opinion, one of the biggest deficiencies in modern research publications is the improper use of statistics. I hope to avoid making this mistake, but I need your help.

Most of my data has logarithmic relationships (i.e., one set of observations (say, mass) plotted against another set of observations (say, temperature) is best fit by a logarithmic regression curve). If I compute Pearson correlation coefficients for these, I get values around 0.80. However, when I plot the data I can clearly see that it is not linear; it is logarithmic. If I fit regression lines to the data, I get better R^2 values with a logarithmic fit than with a linear fit. Reporting the Pearson number could be misleading because it measures linear association... but I am trying to show that there is a high correlation without claiming that the data is linear.
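A minimal Python sketch of the comparison described above, using hypothetical noise-free data that follow y = 2 + 3 ln(x) (an assumed example, not the poster's data). It fits a least-squares line to y against x and to y against ln(x), and shows that the R^2 of each straight-line fit is simply the square of Pearson's r for the variables used in that fit.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared_of_line(x, y):
    """R^2 of the ordinary least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Hypothetical data following y = 2 + 3*ln(x) exactly
x = [1, 2, 5, 10, 20, 50, 100, 200, 500, 1000]
y = [2 + 3 * math.log(v) for v in x]
log_x = [math.log(v) for v in x]

print(f"linear fit (y vs x):         R^2 = {r_squared_of_line(x, y):.3f}")
print(f"logarithmic fit (y vs ln x): R^2 = {r_squared_of_line(log_x, y):.3f}")
print(f"Pearson r(x, y)    = {pearson_r(x, y):.3f}")
print(f"Pearson r(ln x, y) = {pearson_r(log_x, y):.3f}")
```

With these hypothetical values the straight-line fit gives a noticeably lower R^2 even though the logarithmic relationship is exact, which is the situation the question describes.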

Can someone with a firm background in statistics theory answer the following questions?

1) Is it appropriate to publish Pearson correlation coefficients for logarithmic data if it is emphasized that the relationship is logarithmic?

2) Does the Pearson number have meaning in the context of logarithmic relationships?

3) Is there a better correlation metric for logarithmic data? Please note that some of the data points are zero. This makes transforming to a logarithmic basis difficult...but maybe there is a trick I'm missing?

Your help is very much appreciated.
 
  • #2
It depends on whether you are plotting original data or transformed data. If the plot of the original data is nonlinear, you would generally want to transform one or both variables to achieve approximate linearity before calculating Pearson's R.

You can get an R value for any data, but it's best interpreted when there is an approximately linear relation between X1 and X2. When these stats are reported, it's always in terms of the specific transformation(s) used.

If there are zero values in the variable measured on a linear scale (the one you would otherwise take logs of), apply the antilog transform to the other, log-scale variable instead, and report it as such.

For example, if the antilogs are 1, 10, 100, 1000, use the base-10 log transform: 0, 1, 2, 3. The choice of base does not affect Pearson's R, since logs in different bases differ only by a constant factor.
This would be reported as log-linear transformed data for the Pearson R.
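A small Python check of the base-10 example, plus a hypothetical, slightly noisy second variable (the values 0, 1.2, 1.9, 3.1 are invented for illustration). Because log_b(x) = ln(x) / ln(b), changing the base only rescales the transformed values by a positive constant, which leaves Pearson's R unchanged.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


antilogs = [1, 10, 100, 1000]
print([round(math.log10(v), 12) for v in antilogs])  # [0.0, 1.0, 2.0, 3.0]

# Hypothetical second variable that is only approximately linear in log10(x)
other = [0.0, 1.2, 1.9, 3.1]

for name, log_fn in [("base 2", math.log2), ("base e", math.log), ("base 10", math.log10)]:
    transformed = [log_fn(v) for v in antilogs]
    print(f"{name:>7}: R = {pearson_r(transformed, other):.6f}")
```

All three bases print the same R (about 0.995 for these made-up values).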
 
  • #3
Thanks for the response!

I'm not familiar with the antilog transform. I think I should make the following transformation:

Data -> Transformed Data

0 -> 1
1 -> 10
2 -> 100
etc.

Is this correct?

Won't this lead to some enormous numbers later? For example:
17 -> 100,000,000,000,000,000

I think I must be doing it wrong...
 
  • #4
Sorry. I wasn't sure what form your original data was in. For example, was X1 already in log form and still non-linear?

Suppose X1 is 1, 10, 100, 1000 and X2 is 0, 1, 2, 3

The best thing is to transform X1 to 0, 1, 2, 3 with a log-linear transform for X1 on X2. This eliminates the problem with 0 in the X2 data. In this example, of course, R=1.

The antilog transform would be transforming X2 to the antilogs (1, 10, 100, 1000). The main problem is that this is not conveniently presented on a graph with a single linear scale, because the values span several orders of magnitude. However, R can still be calculated, with the same result (R=1). On the graph you can label the axes in powers of 10; you will get a straight line if both axes use the same scaling.
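A Python sketch comparing the two options for the example X1 = 1, 10, 100, 1000 and X2 = 0, 1, 2, 3: log-transforming X1, or antilog-transforming X2 (which sidesteps log(0)). Both give R = 1 here, while R on the untransformed pair is lower.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x1 = [1, 10, 100, 1000]
x2 = [0, 1, 2, 3]  # contains a zero, so log(x2) is undefined

r_raw = pearson_r(x1, x2)                                  # untransformed
r_log_linear = pearson_r([math.log10(v) for v in x1], x2)  # log10(X1) vs X2
r_antilog = pearson_r(x1, [10 ** v for v in x2])           # X1 vs 10**X2

print(f"raw:        R = {r_raw:.3f}")
print(f"log-linear: R = {r_log_linear:.3f}")
print(f"antilog:    R = {r_antilog:.3f}")
```

Because the antilog values span several orders of magnitude, Pearson's R computed on them can be dominated by the largest points, which is worth keeping in mind for wide ranges such as 0 to 17.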

If both variables were nonlinear (increasing or decreasing), a log-log transform might be tried (the base only rescales the axes, so any base gives the same R).
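A Python sketch of the log-log idea, using hypothetical data from the power law y = 5 x^1.5 (an assumed example). After taking logs of both variables the relationship is exactly linear, R is 1, and the slope of the log-log least-squares line recovers the exponent.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ols_slope(x, y):
    """Slope of the ordinary least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


# Hypothetical power-law data: both variables increase nonlinearly together
x = [1, 2, 4, 8, 16, 32]
y = [5 * v ** 1.5 for v in x]

log_x = [math.log(v) for v in x]
log_y = [math.log(v) for v in y]

print(f"raw:     R = {pearson_r(x, y):.4f}")
print(f"log-log: R = {pearson_r(log_x, log_y):.4f}")
print(f"log-log slope (exponent) = {ols_slope(log_x, log_y):.4f}")
```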
 
  • #5


As a scientist, you should use appropriate statistical methods so that your data are accurately represented and interpreted. For logarithmic data, reporting Pearson correlation coefficients on the untransformed values may be inappropriate, because this measure quantifies linear association.

1) Pearson's r computed on untransformed logarithmic data should not be presented as the strength of the logarithmic relationship. It measures only how well a straight line fits, so it understates how closely the data follow a logarithmic curve and can mislead readers. If r is reported, it should be computed on the log-transformed variable(s), with the transformation stated.

2) On the raw data, the Pearson correlation coefficient still has its usual meaning (the strength of the best straight-line fit), but it is not a measure of how well a logarithmic curve fits. After a log transform that makes the relationship approximately linear, r does measure the strength of the logarithmic relationship.

3) A rank-based metric such as the Spearman correlation coefficient is well suited to this situation. It does not require a linear relationship: it measures the strength of any monotonic relationship, including a logarithmic one, and because it depends only on ranks it is unchanged by a log (or any other monotonic) transformation. Zeros therefore need no special handling; repeated values, such as several zeros, are tied and are conventionally assigned their average rank.
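A minimal Python sketch of Spearman's rho, computed the standard way as Pearson's r applied to ranks, with tied values given their average rank. The data are hypothetical: x includes zeros, and y = ln(1 + x) is an assumed monotonic, log-shaped curve. No log transform of x is needed, so the zeros cause no difficulty.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r applied to average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data including zeros; y is a monotonic, log-shaped function of x
x = [0, 0, 1, 3, 10, 30, 100, 300, 1000]
y = [math.log1p(v) for v in x]

print(f"Pearson r    = {pearson_r(x, y):.3f}")
print(f"Spearman rho = {spearman_rho(x, y):.3f}")
```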

In conclusion, carefully consider which statistical methods to use when analyzing and reporting data. For logarithmic relationships, either report Pearson's r for suitably transformed data (stating the transformation) or use a rank-based metric such as Spearman's rho. Consulting a statistician or the methods literature may also help you represent the data in your paper accurately.
 

1. When is it considered inappropriate to report Pearson correlation coefficients?

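Pearson correlation coefficients can be misleading when the data do not meet the assumptions behind them: most importantly, when the relationship between the variables is not linear, or when outliers are present, since a single extreme point can dominate r. Normality matters mainly for the usual significance test and confidence interval for r, which assume approximately bivariate normal data; r itself can still be computed as a descriptive statistic without it.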
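A short Python illustration of the outlier point, using made-up numbers: ten points lying exactly on a rising line give r = 1, and adding a single extreme point is enough to flip the sign of r.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = list(range(1, 11))
y = list(range(1, 11))  # exactly linear, so r = 1
x_out = x + [11]
y_out = y + [-50]       # one hypothetical outlier

print(f"without outlier: r = {pearson_r(x, y):.3f}")
print(f"with outlier:    r = {pearson_r(x_out, y_out):.3f}")
```

In this made-up case r drops from 1 to roughly -0.35.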

2. How can I determine if my data meets the assumptions for reporting Pearson correlation coefficients?

To determine if your data meets the assumptions for reporting Pearson correlation coefficients, you can conduct a normality test, visually inspect the scatterplot for linearity, and check for outliers using statistical methods such as box plots or z-scores.
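A minimal sketch of the z-score check in Python, using only the standard `statistics` module. The threshold is a parameter, and the cut-offs tried here (2.5 and 3) are assumptions rather than fixed rules. The example also shows a known caveat: with the sample standard deviation, no point in a sample of size n can have |z| larger than (n - 1)/sqrt(n), so in small samples a threshold of 3 may never flag anything.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Indices of values whose |z| (using the sample standard deviation) exceeds threshold."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [i for i, v in enumerate(values) if abs(v - mean) / sd > threshold]


# Made-up values: 1..10 plus one extreme point at index 10
values = list(range(1, 11)) + [-50]
n = len(values)

print("flagged at |z| > 3.0:", zscore_outliers(values, 3.0))
print("flagged at |z| > 2.5:", zscore_outliers(values, 2.5))
print(f"largest possible |z| for n = {n}: {(n - 1) / n ** 0.5:.3f}")
```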

3. Can I still report Pearson correlation coefficients if my data does not meet the assumptions?

If your data do not meet these assumptions, do not report Pearson correlation coefficients as if they did. Instead, you can transform the data toward linearity and report r for the transformed variables, or use non-parametric (rank-based) measures such as Spearman's rank correlation or Kendall's tau.
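For completeness, a plain-Python sketch of Kendall's tau in its tau-b form, which adjusts for ties. It is O(n^2), which is fine for modest sample sizes; SciPy's `scipy.stats.kendalltau` provides a tested implementation. The data are hypothetical and include a zero, which, as with any rank-based measure, needs no special handling.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b: (concordant - discordant) pairs, adjusted for ties."""
    n = len(x)
    score = 0     # concordant minus discordant pairs
    untied_x = 0  # pairs not tied in x
    untied_y = 0  # pairs not tied in y
    for i in range(n):
        for j in range(i + 1, n):
            dx = (x[i] > x[j]) - (x[i] < x[j])  # sign of x[i] - x[j]
            dy = (y[i] > y[j]) - (y[i] < y[j])  # sign of y[i] - y[j]
            score += dx * dy
            untied_x += dx != 0
            untied_y += dy != 0
    return score / math.sqrt(untied_x * untied_y)


# Hypothetical data: monotonic but strongly nonlinear, including a zero
x = [0, 1, 3, 10, 30, 100]
y = [0.0, 0.7, 1.4, 2.4, 3.4, 4.6]
print(f"tau-b = {kendall_tau_b(x, y):.3f}")
```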

4. Are there any other situations where reporting Pearson correlation coefficients is considered inappropriate?

Yes. Caution is also warranted when the sample size is small (a common rule of thumb is fewer than 30 observations, though there is no sharp cutoff), because r is then estimated imprecisely; when one of the variables has little variability, which limits how large r can be; or when a third variable influences both of the variables being correlated, in which case r reflects that confounding rather than a direct relationship.
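Two of these concerns can be quantified with standard formulas, sketched here in Python. The Fisher z-transformation gives an approximate 95% confidence interval for r (it assumes roughly bivariate normal data), which shows how imprecise r is at small n; the first-order partial correlation estimates the x-y correlation after removing the linear effect of a third variable z. The r values and sample sizes below are hypothetical.

```python
import math


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a Pearson r from n pairs via Fisher's z (needs n > 3)."""
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y controlling for z, from the three pairwise r values."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# The same hypothetical r = 0.80 observed at two sample sizes
for n in (10, 100):
    lo, hi = fisher_ci(0.80, n)
    print(f"n = {n:3d}: 95% CI for r = ({lo:.2f}, {hi:.2f})")

# Hypothetical case where x and y are both strongly related to z
print(f"partial r = {partial_r(0.85, 0.9, 0.9):.3f}")
```

Here the interval at n = 10 is far wider than at n = 100, and the apparent x-y correlation of 0.85 shrinks to about 0.21 once z is accounted for.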

5. What should I do if I have already reported Pearson correlation coefficients but later discover that my data does not meet the assumptions?

If you have already reported Pearson correlation coefficients but later realize that your data does not meet the assumptions, you should acknowledge this in your report and consider conducting additional analyses using appropriate methods. It is important to be transparent about any limitations or assumptions in your data and analysis.
