Andre
I'm helping a friend write a study, and we stumbled upon the question of statistical relevance. We have to investigate the correlation of a certain data set of n=67 data points with many other data sets, and we wondered at what correlation value one can speak of statistical relevance. So we ducked into Wikipedia but found no simple, straightforward answer; instead we soon found ourselves following links deeper and deeper into a maze of math.
However, having juggled with Excel a lot, it was pretty simple to run umpteen (say 5000) correlations of Gaussian-distributed random number sets (the null hypothesis) and get the standard deviation σ of the resulting correlation coefficients. This turned out to be 12.3%, so I guess it is safe to say that if you want a statistical relevance of 95% certainty of some real correlation with 67 data pairs, the correlation needs to be greater than 3σ, or 3 × 12.3% ≈ 37%. Right?
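For anyone who wants to repeat the experiment outside a spreadsheet, here is a minimal sketch in plain Python. The function names, the seed, and the use of the population standard deviation are my own choices for illustration, not what the spreadsheet did. It also prints the empirical two-sided 95% threshold for |r|, because for a roughly normal distribution 95% corresponds to about 2σ, while 3σ is closer to 99.7%.

```python
import math
import random
import statistics

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def null_correlations(n, trials, rng):
    """Correlations of `trials` pairs of independent Gaussian samples of size n."""
    out = []
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
        out.append(pearson_r(xs, ys))
    return out

rng = random.Random(12345)  # arbitrary seed, only for reproducibility
rs = null_correlations(n=67, trials=5000, rng=rng)

# Spread of r when there is no real correlation (the null hypothesis).
sigma = statistics.pstdev(rs)

# Empirical threshold that 95% of the null |r| values stay below.
abs_sorted = sorted(abs(r) for r in rs)
threshold_95 = abs_sorted[int(0.95 * len(abs_sorted))]

print(f"sigma of r under the null, n=67: {sigma:.3f}")
print(f"95% of null |r| values fall below: {threshold_95:.3f}")
print(f"for comparison, 2*sigma = {2 * sigma:.3f}, 3*sigma = {3 * sigma:.3f}")
```

If the 95% threshold lands near 2σ rather than 3σ, then the 37% cut-off is stricter than a 95% criterion requires.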
So I got curious what varying the sample size would do to this σ under the null hypothesis, so I ran the same spreadsheet for a couple of larger and smaller sample sizes, plotted the results, and found this:
So that trendline is unbelievably close to simply σ=1/√n
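The trend can be checked with a short sketch like the one below, again in plain Python with hypothetical helper names and arbitrarily chosen sample sizes. Besides 1/√n it also prints 1/√(n−1), which is the exact standard deviation of r for independent, normally distributed data; for n=67 that gives 1/√66 ≈ 0.123. The two curves are nearly indistinguishable except at small n.

```python
import math
import random
import statistics

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def null_sigma(n, trials, rng):
    """Standard deviation of r over `trials` pairs of independent Gaussian samples."""
    rs = []
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
        rs.append(pearson_r(xs, ys))
    return statistics.pstdev(rs)

rng = random.Random(2024)  # arbitrary seed, only for reproducibility
results = {}
for n in (10, 30, 67, 150):  # sample sizes chosen for illustration
    results[n] = null_sigma(n, trials=2000, rng=rng)
    print(f"n={n:4d}  simulated={results[n]:.4f}  "
          f"1/sqrt(n)={1 / math.sqrt(n):.4f}  "
          f"1/sqrt(n-1)={1 / math.sqrt(n - 1):.4f}")
```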
Of course, such simple relations were usually found a century or two ago already, so my question is: whom should we cite if we use this formula in the study to justify what we consider statistically relevant?