Bacat
I am writing a paper that publishes scientific data. My background is in chemistry, but I have taken a couple of statistics classes. In my opinion, one of the biggest deficiencies in modern research publications is the improper use of statistics. I hope to avoid making this mistake, but I need your help.
Most of my data follow logarithmic relationships, i.e., one set of observations (say, mass) plotted against another set of observations (say, temperature) is best fit by a logarithmic regression curve. If I compute Pearson correlation coefficients for these pairs, I get values around 0.80. However, the plots clearly show that the relationship is not linear but logarithmic, and when I fit regression lines I get a higher R² for a logarithmic fit than for a linear fit. Reporting the Pearson coefficient could therefore be misleading, because it measures linear association. What I want to show is that the two variables are strongly related, without claiming that the relationship is linear.
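To make the comparison concrete, here is a minimal Python sketch (standard library only) using hypothetical, noise-free data generated from mass = 2 + 3 ln(T). The data and the helper names `pearson_r` and `fit_r2` are illustrative assumptions, not the actual measurements. It shows that the R² of a straight-line fit is simply the square of Pearson's r on the raw data, while the R² of the logarithmic fit y = a + b ln(x) is the square of Pearson's r computed between ln(x) and y.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fit_r2(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b, R^2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot

# Hypothetical data: mass follows 2 + 3*ln(temperature) exactly (no noise).
temps = [1, 2, 5, 10, 20, 50, 100, 200]
mass = [2 + 3 * math.log(t) for t in temps]
log_temps = [math.log(t) for t in temps]

_, _, r2_linear = fit_r2(temps, mass)   # straight line in T
_, _, r2_log = fit_r2(log_temps, mass)  # straight line in ln(T)

print(f"Pearson r (raw T):   {pearson_r(temps, mass):.3f}")
print(f"R^2 linear fit:      {r2_linear:.3f}")  # equals the square of r above
print(f"R^2 logarithmic fit: {r2_log:.3f}")     # 1.000 for this noise-free data
print(f"Pearson r (ln T):    {pearson_r(log_temps, mass):.3f}")
```

In other words, Pearson's r computed on (ln T, mass) is the coefficient that matches a logarithmic fit, whereas r on the raw values only describes the best straight line in T.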
Can someone with a firm background in statistical theory answer the following questions?
1) Is it appropriate to publish Pearson correlation coefficients for logarithmic data if I emphasize that the relationship is logarithmic?
2) Does the Pearson coefficient have any meaning in the context of logarithmic relationships?
3) Is there a better correlation metric for logarithmic data? Please note that some of the data points are zero, which makes a log transform difficult because log(0) is undefined. Is there a trick I'm missing?
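On the zero problem in question 3, here is a minimal sketch, assuming the zeros occur in the variable you would log-transform (e.g. T = 0). `math.log(0)` raises an error, so ln(T) simply does not exist for those points. One option, named here as a swapped-in technique rather than anything from the post, is Spearman's rank correlation: it is Pearson's r computed on ranks, so it measures monotonic (not only linear) association and gives exactly the same value for T, ln(T), or ln(1 + T). The data and the helper names (`average_ranks`, `spearman_rho`) are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical data containing a zero temperature.
temps = [0, 1, 2, 5, 10, 20, 50, 100]
mass = [0.0, 0.6, 0.5, 1.9, 2.4, 3.0, 3.8, 4.5]

try:
    math.log(temps[0])
except ValueError:
    print("log(0) is undefined, so ln(T) cannot be formed for every point")

# Any strictly increasing transform leaves the ranks, and hence rho, unchanged.
rho_raw = spearman_rho(temps, mass)
rho_log1p = spearman_rho([math.log1p(t) for t in temps], mass)
print(f"{rho_raw:.3f} {rho_log1p:.3f}")  # prints 0.976 0.976
```

The common workaround of using ln(T + c) instead of ln(T) avoids the error, but the result then depends on the arbitrary constant c. The rank-based coefficient sidesteps that choice, at the cost of saying nothing about the shape of the curve.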
Your help is very much appreciated.