When is it inappropriate to report Pearson correlation coefficients?

Bacat · Aug 29, 2010

I am writing a paper publishing scientific data. My background is chemistry but I have taken a couple of stat classes. In my opinion, one of the biggest deficiencies in modern research publications is the improper use of statistics. I hope to avoid making this mistake, but I need your help.

Most of my data has logarithmic relationships (i.e., one set of observations (say, mass) plotted against another set of observations (say, temperature) is best fit with a logarithmic regression line). If I compute Pearson correlation coefficients for these, I get values around 0.80. However, when I plot the data I can see clearly that it is not linear; it is logarithmic. If I fit regression lines to the data, I get better R^2 values with a logarithmic fit than with a linear fit. Reporting the Pearson number could be misleading because it is a measure of linearity in the data, but I am trying to show that there is a high correlation without claiming that the data is linear.

Can someone with a firm background in statistics theory answer the following questions?

1) Is it appropriate to publish Pearson correlation coefficients for logarithmic data if it is emphasized that the relationship is logarithmic?

2) Does the Pearson number have meaning in the context of logarithmic relationships?

3) Is there a better correlation metric for logarithmic data? Please note that some of the data points are zero. This makes transforming to a logarithmic basis difficult...but maybe there is a trick I'm missing?

Your help is very much appreciated.

SW VandeCarr · Aug 30, 2010

It depends on whether you are plotting original data or transformed data. If the plot of the original data is nonlinear, you would generally want to transform one or both variables to achieve approximate linearity before calculating Pearson's R.

You can get an R value for any data, but it's best interpreted when there is an approximately linear relation between X1 and X2. When these stats are reported, it's always in terms of the specific transformation(s) used.

If there are zero values in the originally linear variable, leave that variable alone and apply the log transform to the other variable (the one holding the antilogs), and report it as such.

For example, if the antilogs are 1, 10, 100, 1000, use the base-10 log transform: 0, 1, 2, 3. The choice of base does not change R, because logs in different bases differ only by a constant factor.
This would be reported as log-linear transformed data for the Pearson R.
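
A minimal sketch of that log transform in plain Python, using the example values above. The `pearson` helper is written out here for illustration only (it is not from any particular library), and the variable names are assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x1 = [1, 10, 100, 1000]   # the variable holding the antilogs
x2 = [0, 1, 2, 3]         # the variable that contains a zero; left untransformed

r_raw = pearson(x1, x2)                            # R on the original data
r_log = pearson([math.log10(v) for v in x1], x2)   # R after log10-transforming x1

print(f"raw: {r_raw:.2f}, log-transformed: {r_log:.2f}")
```

On the raw values R comes out at about 0.82 even though the relationship is perfectly logarithmic; after transforming X1 it is 1. Only X1 is logged, so the zero in X2 never enters a logarithm.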

Bacat · Aug 30, 2010

Thanks for the response!

I'm not familiar with the antilog transform. I think I should make the following transformation:

Data -> Transformed Data

0 -> 1
1 -> 10
2 -> 100
etc..

Is this correct?

Won't this lead to some enormous numbers later? For example:
17 -> 100,000,000,000,000,000

I think I must be doing it wrong...

SW VandeCarr · Aug 30, 2010

Sorry. I wasn't sure what form your original data was in. For example, was X1 already in log form and still non-linear?

Suppose X1 is 1, 10, 100, 1000 and X2 is 0, 1, 2, 3.

The best thing is to transform X1 to 0, 1, 2, 3 with a log-linear transform for X1 on X2. This eliminates the problem with 0 in the X2 data. In this example, of course, R=1.

The antilog transform would be transforming X2 to the antilogs (1,10,100,1000). The main problem is that this is not conveniently presented on a graph with a single linear scale. However R can still be calculated with the same result (R=1). Obviously, you can use powers of 10. You will get a linear graph if both axes use the same scaling.
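
As a rough sketch of the antilog route, assuming the same example values, X2 can be raised to powers of 10 and correlated with the untransformed X1. The `pearson` helper is a hand-written illustration, not a library function.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x1 = [1, 10, 100, 1000]
x2 = [0, 1, 2, 3]

# Antilog transform: 0 -> 1, 1 -> 10, 2 -> 100, 3 -> 1000
x2_antilog = [10 ** v for v in x2]

r_antilog = pearson(x1, x2_antilog)
print(f"R after antilog transform of X2: {r_antilog:.2f}")
```

The transformed values grow very quickly (17 would map to 10**17), which is why the log transform of X1 is usually the more convenient choice for plotting.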

If both data sets were nonlinear, increasing or decreasing, a log-log transform might be tried (the choice of base does not affect R).
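
A hedged sketch of the log-log case. The data below are made up for illustration (a power law, y = 3x^2), and both variables must be strictly positive for this transform to be defined. The `pearson` helper is hand-written, not a library call.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data following a power law, y = 3 * x**2
x = [1, 2, 4, 8, 16]
y = [3 * v ** 2 for v in x]

r_raw = pearson(x, y)

# Log-log transform: log(y) = log(3) + 2 * log(x), which is linear in log(x)
log_x = [math.log10(v) for v in x]
log_y = [math.log10(v) for v in y]
r_loglog = pearson(log_x, log_y)

print(f"raw: {r_raw:.3f}, log-log: {r_loglog:.3f}")
```

The raw R is below 1 because the curve is not a straight line; after logging both axes the points fall on a line and R is 1 up to rounding.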

blue_raver22 · Sep 6, 2010

As a scientist, it is important to use appropriate statistical methods in order to accurately represent and interpret data. In the case of logarithmic data, it may be inappropriate to report Pearson correlation coefficients, as this measure is specifically designed for linear relationships.

1) It is not appropriate to publish Pearson correlation coefficients for logarithmic data if the goal is to show a logarithmic relationship. This can be misleading to readers, as the correlation coefficient does not accurately reflect the strength of the logarithmic relationship.

2) The Pearson correlation coefficient does not have a meaningful interpretation in the context of logarithmic relationships. This measure is designed for linear relationships and may not accurately reflect the strength of a logarithmic relationship.

3) There are alternative correlation metrics that are better suited for logarithmic data, such as the Spearman rank correlation coefficient. Because it is computed on the ranks of the observations, it requires only a monotonic relationship, not a linear one, and it reflects how consistently one variable increases with the other. It needs no log transform, so zeros in the data do not prevent the calculation, although many tied values (for example, repeated zeros) can affect the result.
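
A small illustrative sketch of the Spearman idea, using made-up data and hand-written helpers (`pearson`, `average_ranks`, `spearman` are names chosen here, not a library API). Spearman's coefficient is computed as Pearson's R on the ranks, with tied values given their average rank.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation: Pearson's R applied to the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Hypothetical logarithmic data; note that y contains a zero (log10(1) = 0)
x = [1, 2, 5, 10, 50, 100, 1000]
y = [math.log10(v) for v in x]

r_pearson = pearson(x, y)
rho = spearman(x, y)
print(f"Pearson: {r_pearson:.3f}, Spearman: {rho:.3f}")
```

Since y increases strictly with x, the Spearman value is 1, while Pearson's R on the untransformed data is lower because the relationship is curved.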

In conclusion, it is important to carefully consider the appropriate statistical methods to use when analyzing and reporting data. In the case of logarithmic relationships, it is best to use a measure that does not assume linearity on the original scale, such as Spearman's coefficient, or to state clearly which transformation was applied before computing Pearson's R. Consultation with a statistician or further research on appropriate methods may be beneficial in accurately representing the data in your paper.
