How accurate is the correlation coefficient for log-log data?

  • Thread starter Old Guy
In summary, the conversation involves using linear regression and the correlation coefficient on a data set that is linear on a log-log plot. The slope was calculated by taking logs of the x's and y's, and the resulting slope appears to be correct. The question is how good of a fit it is, and there was a discussion about whether the correlation coefficient calculation is valid due to the transformations. It was concluded that the R^2 statistic should be appropriate for the transformed data.
  • #1
Old Guy
I am familiar with linear regression and the correlation coefficient. My current problem involves a data set that is pretty linear on a log-log plot. I have calculated the slope by taking logs of all my x's and y's, and doing the linear regression on the transformed data set. The resulting slope appears to be correct, and I'm happy with that.

The question is, how good a fit do I have on the data? I planned to simply calculate the correlation coefficient on the transformed data, but a coworker challenged this - said that the transformations alter the measurements in a way that makes the typical correlation coefficient calculation invalid.

Is that correct? And if it is, what is the correct measure of goodness of fit for log-log data? Thanks.
 
  • #2

If it's linear with a log-log transform, then the correlation is based on a log-log data set with transformed expectations. As long as it's clear what the data is, I don't see a problem.

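EDIT: I forgot to answer your second question. If you have a linear plot in a standard regression on transformed data, the [tex]R^2[/tex] statistic should be appropriate for the transformed data (but only for the transformed data). That is, it describes how well log y is explained by log x, not how well the back-transformed curve fits y in the original units.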
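
To make the "only for the transformed data" point concrete, here is a minimal sketch in plain Python. The data values and the helper names `fit_line` and `r_squared` are made up for illustration and are not from the thread. It fits a line to (log x, log y), reports [tex]R^2[/tex] in log space, and then separately computes an [tex]R^2[/tex]-style number for the back-transformed curve in the original units, so the two can be compared.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x. Returns (a, b, r)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx                   # slope
    a = my - b * mx                 # intercept
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return a, b, r

def r_squared(ys, preds):
    """1 - SS_res / SS_tot for observations ys and predictions preds."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data, roughly following y = 2 * x**1.5 with some scatter.
xs = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
ys = [2.1, 5.5, 16.8, 43.0, 131.0, 355.0]

log_x = [math.log(x) for x in xs]
log_y = [math.log(y) for y in ys]

# Fit in log space: log y = a + b * log x, i.e. y = exp(a) * x**b.
a, b, r = fit_line(log_x, log_y)
r2_log = r ** 2  # goodness of fit for the transformed data

# Same model evaluated in the original units.
preds = [math.exp(a) * x ** b for x in xs]
r2_orig = r_squared(ys, preds)

print(f"slope b = {b:.3f}, prefactor exp(a) = {math.exp(a):.3f}")
print(f"R^2 on (log x, log y): {r2_log:.5f}")
print(f"R^2 on (x, y) for the back-transformed curve: {r2_orig:.5f}")
```

The two numbers generally differ: the log-space fit weights relative errors, while the original-units figure is dominated by the largest y values. When reporting either one, it helps to say which scale it was computed on.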
 

Related to How accurate is the correlation coefficient for log-log data?

1. What is "goodness of fit"?

Goodness of fit describes how well a given model or distribution fits a set of data. It is often used to evaluate the validity of a hypothesis or to compare multiple models to determine which one best fits the observed data.

2. How is goodness of fit calculated?

Goodness of fit is typically assessed with a statistical test, such as the chi-square test or the Kolmogorov-Smirnov test, or with a summary statistic such as R^2 for a regression. The tests compare the observed data to the values expected under the given model and provide a measure of the level of agreement between the two.
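
As a small illustration of comparing observed data to expected values, the sketch below computes the Pearson chi-square statistic, the sum of (observed - expected)^2 / expected over all categories. The counts are hypothetical (120 rolls of a die assumed to be fair) and the function name `chi_square_statistic` is just for this example.

```python
def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for faces 1..6 over 120 rolls.
observed = [18, 22, 16, 25, 19, 20]
# A fair die would give 120 / 6 = 20 rolls per face on average.
expected = [20] * 6

stat = chi_square_statistic(observed, expected)
print(f"chi-square statistic = {stat:.2f} with {len(observed) - 1} degrees of freedom")
```

The statistic is then compared with a chi-square distribution with (number of categories - 1) degrees of freedom. For 5 degrees of freedom the 5% critical value is about 11.07, so a statistic well below that gives no reason to reject the model.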

3. What does a good fit look like?

A good fit means that the observed data closely match the values expected under the model being tested. How this shows up depends on the measure: an R^2 close to 1 or a large p-value in a goodness-of-fit test points to a good fit, whereas for a test statistic such as chi-square it is a small value that does so. A good fit is consistent with the hypothesis being tested, but it does not prove it.

4. What does a poor fit look like?

A poor fit means that the observed data do not match the values expected under the model being tested, for example an R^2 near 0, a small p-value, or a large chi-square statistic. This suggests that the model is not a good description of the data and raises doubts about the validity of the hypothesis being tested.

5. How is goodness of fit used in scientific research?

Goodness-of-fit measures are an important tool in scientific research, particularly in fields such as biology, psychology, and the social sciences. They allow researchers to evaluate how well their models fit the observed data and to make informed decisions about the validity of their hypotheses. They are also commonly used in data analysis and statistical modeling to compare different models and determine which one best fits the data.
