Originally posted by zoobyshoe
I still need to understand how much better the woman's subjects did. A small margin, even three times in a row, isn't convincing enough, for me. I don't understand the meta-statistics technique, obviously. My recollection from the show, once again, seems to be twisted because I came away with the impression that her results were so far above chance that there was no disputing what they indicated.
Zooby, here's a brief discussion of statistical techniques in experimental design:
When it comes to experimental testing of hypotheses, the important measure of statistical discrimination is the p-value. The experiment is set up to test a null hypothesis. The null hypothesis is typically conservative and generates predictions that the experiment aims to empirically contradict. If the null hypothesis is empirically refuted, an alternate hypothesis (formulated before the experiment) is adopted as an explanation for the observed data.
In the case of the Schlitz experiments, the null hypothesis would be "there is no psi phenomenon; therefore, we should not observe a statistically significant correlation between the subjects' EDAs and the experimental observation periods"; the alternate would be "there is some psi phenomenon occurring; therefore, we should observe a statistically significant correlation between the subjects' EDAs and the observation periods." "Statistically significant" here means that the correlation is far stronger than we would expect from chance alone, or (equivalently) that there is a very low probability of observing such a strong correlation if the underlying processes really were random.
The null hypothesis is accepted or rejected on the basis of the p-value, which is the probability that the correlation between subjects' EDAs and observation periods would be at least as strong as it was observed to be if the EDA fluctuations really were random and not aided by some psychic ability. So, a high p-value means that the experimental data is consistent with the null hypothesis, whereas a low p-value indicates that the null hypothesis is inadequate to explain the data. What constitutes a sufficiently low p-value to reject the null hypothesis is decided before the experiment is conducted; typically accepted threshold values for rejection are .05 and .01.
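For concreteness, here's a rough sketch (in Python, with invented numbers; the real experiments' data and analysis were of course more involved) of how a p-value for a correlation between EDA readings and observation periods could be computed with a simple permutation test:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: mean EDA level recorded during each epoch, plus a flag
# marking which epochs were "stare" (observation) periods. These numbers are
# invented purely to illustrate the procedure.
eda = rng.normal(loc=5.0, scale=1.0, size=32)
stare = np.array([1, 0] * 16)           # alternating stare / no-stare epochs
eda[stare == 1] += 0.4                   # build a small effect into the demo data

# Test statistic: difference in mean EDA between stare and no-stare epochs.
observed = eda[stare == 1].mean() - eda[stare == 0].mean()

# Null hypothesis: the stare/no-stare labels are irrelevant to EDA.
# Under the null, shuffling the labels should produce differences at least
# as large as the observed one reasonably often.
n_perm = 10_000
count = 0
for _ in range(n_perm):
    shuffled = rng.permutation(stare)
    diff = eda[shuffled == 1].mean() - eda[shuffled == 0].mean()
    if diff >= observed:                 # one-tailed: effect in the predicted direction
        count += 1

p_value = count / n_perm
print(f"observed difference = {observed:.3f}, one-tailed p = {p_value:.4f}")
```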
It is important to note that highly significant p-values can occur even if the difference between the observed data and the expected parameters seems trivial on the surface, provided the sample size of collected data is large enough. For instance, suppose we conduct an experiment to test whether a certain coin is fair or biased (null hypothesis = fair). If we flip it 100 times and see that it comes up heads 52 times, we will conclude that this is well within the range expected around 50 heads and that we can't reject the null hypothesis with any confidence; that is, there is a fairly high probability that even a perfectly fair coin will come up heads 52 or more times in 100 flips, so the observed deviation from the expected value of 50 is not statistically significant. However, if we flip the coin 10 trillion times and see that it still comes up heads 52% of the time, we will have an extraordinarily low p-value, since there is a very low probability that a perfectly fair coin will land this far from 50% heads over so many trials; that is, the observed deviation is highly statistically significant and we can reject the null hypothesis with confidence.
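To put numbers on the coin example, here's a sketch using SciPy's binomial distribution; the second case uses a normal approximation because the sample is so large that the exact calculation is pointless:

```python
from scipy.stats import binom, norm

# Case 1: 52 heads in 100 flips of a supposedly fair coin.
# Two-tailed p-value: probability of a deviation at least this far from 50.
n, k = 100, 52
p_small = 2 * binom.sf(k - 1, n, 0.5)    # P(heads >= 52), doubled for two tails
print(f"100 flips, 52 heads: p ~ {p_small:.3f}")        # roughly 0.76 -- nowhere near significant

# Case 2: 52% heads over 10 trillion flips.
# Normal approximation: z = (observed proportion - 0.5) / sqrt(0.25 / n)
n_big = 10_000_000_000_000
z = (0.52 - 0.5) / (0.25 / n_big) ** 0.5
p_big = 2 * norm.sf(z)
print(f"10 trillion flips, 52% heads: z ~ {z:.0f}, p ~ {p_big:.3g}")
# The p-value is so small it underflows to zero in floating point --
# the null hypothesis of a fair coin is untenable.
```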
In Schlitz's original paper, she recorded a p-value of p < .005, meaning that there is less than a 0.5% chance that she would have recorded the correlation data she did if her subjects' EDAs really were fluctuating randomly, with no connection to the observation periods, rather than responding to some kind of psychic influence. In the Schlitz/Wiseman paper, Schlitz recorded p = 0.04. It is worth noting that the first p-value was calculated using a one-tailed test while the second used a two-tailed test; the gist is that two-tailed tests are more conservative, since for the same data they produce higher p-values than one-tailed tests.
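Here's a quick illustration (a sketch, not a recreation of either paper's actual analysis) of why a two-tailed test is more conservative: for a symmetric test statistic, the two-tailed p-value is simply twice the one-tailed one.

```python
from scipy.stats import norm

# Suppose an experiment yields a test statistic z = 2.0 standard deviations
# above what the null hypothesis predicts.
z = 2.0

# One-tailed: probability of a result at least this extreme in the
# predicted direction only.
p_one = norm.sf(z)

# Two-tailed: probability of a result at least this far from the null
# in *either* direction, so the p-value doubles.
p_two = 2 * norm.sf(abs(z))

print(f"one-tailed p = {p_one:.4f}")   # about 0.023 -- clears a .05 threshold
print(f"two-tailed p = {p_two:.4f}")   # about 0.046 -- only barely clears it
```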
-----
Here's a brief introduction to how meta-analysis works:
The technique of meta-analysis involves performing statistical analysis on a hypothesis using data pooled from experiments that have already been run. These experiments should all test the same hypothesis and use fundamentally equivalent experimental designs.
The advantage of meta-analysis is that it can be conducted on a vast array of data, allowing one to establish extremely significant p-values for even minuscule effects (see this old thread for an example of meta-analysis supporting psi phenomena). The obvious disadvantage of meta-analysis is that the designs of the separate experiments are not all completely identical. There is also the problem of the file drawer effect: papers that establish positive results tend to be published, while those that do not tend to be 'stuffed into the file drawer,' never to be accessible to analysis. The file drawer effect can itself be addressed statistically, however, by calculating how many papers that failed to establish positive results would have to have been conducted but never published in order to negate the conclusions of a given meta-analysis. Showing that number to be very high minimizes the likelihood that the file drawer effect is really salient grounds for doubting a particular meta-analysis.
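As a rough sketch of the mechanics (using invented p-values, not figures from the actual psi literature), here is one common way to combine studies, Stouffer's method of summing z-scores, together with Rosenthal's "fail-safe N" estimate of how many unpublished null results it would take to wash the combined result out:

```python
from scipy.stats import norm

# Hypothetical one-tailed p-values from k independent studies of the same effect.
p_values = [0.04, 0.20, 0.03, 0.15, 0.01, 0.30, 0.02]
k = len(p_values)

# Stouffer's method: convert each p-value to a z-score and combine them.
z_scores = [norm.isf(p) for p in p_values]        # isf = inverse survival function
z_combined = sum(z_scores) / k ** 0.5
p_combined = norm.sf(z_combined)
print(f"combined z = {z_combined:.2f}, combined p = {p_combined:.5f}")

# Rosenthal's fail-safe N: how many averaged-null (z = 0) studies would have to be
# sitting in file drawers before the combined result rises back to p = .05?
z_crit = norm.isf(0.05)
n_failsafe = sum(z_scores) ** 2 / z_crit ** 2 - k
print(f"fail-safe N ~ {n_failsafe:.0f} unpublished null studies")
```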
-----
Originally posted by zoobyshoe
To complicate that point, Fz+ turned me on to Chaos a few weeks ago and I've been reading about it. It has convinced me that statistics are the wrong way to go about understanding what is going on in a system; that the laws of averages are not applicable in explaining dynamical systems.
So when I asked what might explain the woman's results, what I meant was: how could the fact anyone seems to be able to do it be explained? What is the mechanism? I don't think believing you can do it gives you the ability to do it. It may, however, give you the confidence to use a preexisting ability if it exists. Likewise not believing could cause you not to use it.
I agree that statistics aren't going to uncover the precise mechanisms of how psi works, if it exists. They are an invaluable tool, however, for establishing the existence of psi in the first place.
I'm not very knowledgeable on chaos theory myself; however, whatever it says, it can't negate the usefulness of statistics in describing, if not explaining, sensitive and dynamic systems. Physics on the quantum scale is exquisitely sensitive and even, as far as we can tell, non-deterministic. Yet on the macro scale of classical physics we observe regular, deterministic behavior, thanks to the macroscopic statistical tendencies of all those tiny, unpredictable quantum particles.
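A toy numerical illustration of that last point (nothing to do with actual quantum mechanics, just the statistical principle): average enough independent random fluctuations and the macroscopic quantity becomes effectively deterministic.

```python
import numpy as np

rng = np.random.default_rng(42)

# Each "particle" contributes a random value; the macroscopic observable is the mean.
for n_particles in (10, 10_000, 10_000_000):
    # Repeat the "measurement" a few times to see how much the macro value wanders.
    macro_values = [rng.normal(0.0, 1.0, size=n_particles).mean() for _ in range(5)]
    spread = max(macro_values) - min(macro_values)
    print(f"{n_particles:>10,} particles: macro spread across runs ~ {spread:.6f}")

# The spread shrinks roughly like 1/sqrt(n): wildly unpredictable micro behavior,
# but an almost perfectly repeatable macroscopic average.
```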