Is the Correlation Coefficient an Informative Indicator in Real-World Datasets?

Discussion Overview

The discussion centers on the correlation coefficient as an indicator of relationships between variables in real-world datasets. Participants explore whether the correlation coefficient can be informative, particularly in cases where variables may be unrelated or influenced by multiple factors. The scope includes theoretical considerations and practical examples, with references to datasets and statistical properties.

Discussion Character

  • Exploratory
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant questions the informativeness of the correlation coefficient, asking for examples of datasets where the coefficient is close to 1 despite the variables being unrelated.
  • Another suggests examining the correlation between nearly constant random variables, providing a hypothetical example involving weather and sports outcomes.
  • A participant proposes creating synthetic datasets to illustrate various correlation scenarios, including spurious correlations between independent random walks and cases where one variable is a monotone function of the other yet the correlation coefficient is well short of 1 or -1.
  • There is acknowledgment that while it is possible to create datasets demonstrating these properties, finding real-world examples remains challenging.
  • One participant notes that there are cultural reasons why datasets of nearly constant random variables are rarely published, so the absence of such datasets does not imply the absence of the underlying real-world data.
  • Stock price data is mentioned as having properties similar to those discussed in the examples provided.

Areas of Agreement / Disagreement

Participants express differing views on the correlation coefficient's informativeness, with some proposing theoretical examples while others emphasize the difficulty in finding real-world datasets that fit the criteria. The discussion remains unresolved regarding the overall utility of the correlation coefficient.

Contextual Notes

Participants highlight limitations in finding real-world datasets that exhibit the discussed properties, indicating a potential gap between theoretical examples and practical data availability.

alex.kin.
Hi,

Are you aware of any dataset (in R or elsewhere) consisting of a sample from two variables where the correlation coefficient is (approximately) equal to 1, but the variables refer to completely unrelated things, e.g. one measuring something that happens on Earth and the other something on a distant planet?

or

a case where a parameter causally affects a measure, but because other such 'causal' parameters also exist, a sample from the respective two variables has a correlation coefficient far from 1 or -1?

My question is: is the correlation coefficient really an informative indicator?

Any suggestions?

Thanks, alex
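
The second scenario can be sketched in R with synthetic data. This is only an illustration under assumed numbers: x drives y with slope 1, and nine other independent causes (the hypothetical matrix `others`) are added on top. The seed and sample size are arbitrary.

```r
# Hypothetical setup: x causally drives y with slope 1, but nine other
# independent causes (the columns of `others`) also add to y.
set.seed(42)                      # arbitrary seed, for reproducibility only
n <- 10000
x <- rnorm(n)
others <- matrix(rnorm(n * 9), nrow = n)
y <- x + rowSums(others)

# Sample correlation between the cause x and the measure y
r <- cor(x, y)

# With unit variances everywhere, theory gives 1 / sqrt(1 + 9)
rho <- 1 / sqrt(10)
print(c(sample = r, theory = rho))

# The causal slope is still recovered by a simple regression
slope <- unname(coef(lm(y ~ x))["x"])
print(slope)
```

In this sketch the correlation sits near 0.32 even though every unit change in x moves y by one unit, so a modest coefficient says little about the strength of the causal effect itself.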
 
Perhaps you should look at the correlation between two nearly constant "random" variables. Something like X = 1 if there are at least 100 sunny days this year and Y = 1 if the Cubs don't win the World Series this year. (I suppose you need a little variation to prevent the covariances from being 0.)
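
A small R sketch of this idea, using made-up ten-year records (the vectors below are hypothetical, not real weather or baseball data). It shows how a single coinciding exception in two nearly constant indicators drives the correlation to 1, and how moving that exception by one year changes the result completely.

```r
# Hypothetical ten-year records: 1 = event happened, 0 = it did not.
sunny   <- c(1, 1, 1, 1, 1, 1, 1, 0, 1, 1)
no_cubs <- c(1, 1, 1, 1, 1, 1, 1, 0, 1, 1)  # the single 0 falls in the same year

# Identical indicator vectors give a correlation of 1
r_same <- cor(sunny, no_cubs)

# Move the single exception in the second series by one year
no_cubs_shifted <- c(1, 1, 1, 1, 1, 1, 1, 1, 0, 1)
r_shifted <- cor(sunny, no_cubs_shifted)

print(c(same_year = r_same, shifted = r_shifted))
```

With the exceptions in the same year the coefficient is 1; with them one year apart it is -1/9. The whole result hinges on one observation.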
 
alex.kin. said:
Hi,

Are you aware of any dataset (in R or elsewhere) consisting of a sample from two variables where the correlation coefficient is (approximately) equal to 1, but the variables refer to completely unrelated things, e.g. one measuring something that happens on Earth and the other something on a distant planet?

or

a case where a parameter causally affects a measure, but because other such 'causal' parameters also exist, a sample from the respective two variables has a correlation coefficient far from 1 or -1?

My question is: is the correlation coefficient really an informative indicator?

Any suggestions?

Thanks, alex

You could create your own data, for example:

Code:
> z <- rnorm(1000)
> w <- rnorm(1000)
> # spurious correlation between two independent random walks
> cor(cumsum(z), cumsum(w))
[1] 0.6556251
> # perfectly dependent (monotone increasing), but correlation less than 1
> cor(exp(z), exp(2*z))
[1] 0.8726321
> # perfectly dependent (monotone decreasing), but correlation is almost zero
> cor(exp(z), exp(-2*z))
[1] -0.08543019

Other measures of dependence, such as rank correlation, have much nicer properties: they are unchanged by strictly increasing transformations of either variable.
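
As a rough check of the rank correlation remark, the same transformed variables can be passed to `cor()` with `method = "spearman"` or `method = "kendall"`. The seed is an arbitrary choice, and the sketch assumes the simulated values contain no ties.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
z <- rnorm(1000)

# Pearson correlation of a monotone increasing pair: below 1
pearson_inc  <- cor(exp(z), exp(2 * z))

# Rank correlations depend only on the ordering of the values
spearman_inc <- cor(exp(z), exp(2 * z),  method = "spearman")
spearman_dec <- cor(exp(z), exp(-2 * z), method = "spearman")
kendall_dec  <- cor(exp(z), exp(-2 * z), method = "kendall")

print(c(pearson_inc = pearson_inc, spearman_inc = spearman_inc,
        spearman_dec = spearman_dec, kendall_dec = kendall_dec))
```

Both rank measures report 1 for the increasing pair and -1 for the decreasing pair, because each variable is a monotone function of the other and ranks ignore the nonlinearity.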
 
Stephen Tashi said:
Perhaps you should look at the correlation between two nearly constant "random" variables. Something like X = 1 if there are at least 100 sunny days this year and Y = 1 if the Cubs don't win the World Series this year. (I suppose you need a little variation to prevent the covariances from being 0.)

Thanks Stephen,

I know that it is possible to create such a dataset; however, so far I haven't found any real-world example in the data I have access to.
 
alex.kin. said:
Thanks Stephen,

I know that it is possible to create such a dataset; however, so far I haven't found any real-world example in the data I have access to.

You'll have to distinguish between the existence of data and the existence of datasets. There are cultural reasons why people would not bother to publish a dataset of nearly constant random variables. This doesn't mean that the data isn't "real world".
 
alex.kin. said:
Thanks Stephen,

I know that it is possible to create such a dataset; however, so far I haven't found any real-world example in the data I have access to.

Stock price data has properties very similar to the examples described in post #3.
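
To see why price-like series behave this way, here is a hedged R sketch that treats prices as simple random walks. That is a modelling assumption, not a claim about any particular stock, and the seed and sizes are arbitrary. It compares the correlation of many pairs of independent walks with the correlation of their underlying steps.

```r
set.seed(123)                     # arbitrary seed, for reproducibility only
n_steps <- 1000                   # length of each simulated series
n_pairs <- 200                    # number of independent pairs to simulate

cor_steps <- numeric(n_pairs)     # correlation of the increments
cor_walks <- numeric(n_pairs)     # correlation of the cumulated series

for (i in seq_len(n_pairs)) {
  z <- rnorm(n_steps)
  w <- rnorm(n_steps)             # independent of z by construction
  cor_steps[i] <- cor(z, w)
  cor_walks[i] <- cor(cumsum(z), cumsum(w))
}

# Typical size of the correlation in each case
print(summary(abs(cor_steps)))
print(summary(abs(cor_walks)))
```

The steps are uncorrelated up to sampling noise, while the cumulated series routinely show large correlations of either sign. With real price data, one common precaution is to compute correlations on returns (differences of log prices) rather than on price levels.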
 
