# Precognition paper to be published in mainstream journal

A test for precognition should be simple, shouldn't it?

I propose the following:

The test subject must accurately* predict a future event. The event must be something that is otherwise considered unpredictable (or of such low odds that no other method could determine its occurrence accurately).

*Accuracy is defined here in relation to the complexity of the prediction. See following examples.

Example 1
Task - A person predicts the outcome of a number of rolls of a fair die.
Accuracy Required - Due to the nature of the task, the person must predict the exact result of each roll.
Additional Requirements - The die must be rolled a number of times to ensure the probability of simply guessing the outcome correctly each time is as low as possible. The recommendation is 20 rolls as a start.
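To give a sense of scale for why 20 rolls is enough, here's a quick sanity check (my own illustration in Python, not part of the original proposal): the chance of blind-guessing every roll shrinks geometrically with the number of rolls.

```python
from fractions import Fraction

# Chance of guessing every roll of a fair six-sided die over n rolls.
def guess_all_prob(n):
    return Fraction(1, 6) ** n

print(float(guess_all_prob(20)))  # about 2.7e-16
```

At 20 rolls, a lucky-guess explanation is astronomically unlikely.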

Example 2
Task - A person predicts a seemingly random event, in this case we'll use a car crash.
Accuracy Required - The event must be described in enough detail that a random person could match the description to the crash should it occur, without any details being left vague or open to interpretation. "A car will crash on the M4 tomorrow" is not a valid prediction. "A blue Ford will crash into a red Hyundai near junction 10 on the M4 tomorrow" is acceptable, but more detail would be preferred.
Additional Requirements - As above, the event must clearly match the description given in order to be considered an accurate prediction of said event.

As you can see, all you need to do is describe a future event in enough detail for us to clearly identify it when it occurs. Simple.

FlexGunship
Gold Member
A test for precognition should be simple, shouldn't it?

I propose the following: [...] Simple.

I think the idea is that this is an unconscious response. And that it is uncontrollable by the individual. Specifically, they are saying that psychological tests are functional even if causality is reversed.

Examples of standard tests:
1. Show a scary picture -> heart rate increases
2. Show a boring picture -> heart rate steady

Examples of precognition tests:
1. Heart rate increases -> show a scary picture
2. Heart rate steady -> show a boring picture

The important fact is that whether a scary picture or a boring picture is being shown is predetermined and NOT based on the heart rate. It's quite a claim!

Last edited:
collinsmark
Homework Helper
Gold Member
In one experiment, students were shown a list of words and then asked to recall words from it, after which they were told to type words that were randomly selected from the same list. Spookily, the students were better at recalling words that they would later type.
Spooky my a**. All they've said there is that a student has shown they remembered a word, and then, when asked to type some words later, that is one of the ones they typed. Would you believe it.
I think you might be misinterpreting the experiment. The article is ambiguous and not well written on this point, but here is how the experiment was apparently done (I'll try to summarize it):

The entire process for each participant was done in private on a computer. There were a total of 100 participants.

1. A list of 48 common words is given to the participant to remember. The word list and word order are identical for all test subjects. I'll call this word list the "super-set."
2. The test subject is then asked to recall as many words as they can from the super-set. I'll call this list of a test subject's recalled words the "recalled-set."
3. The computer randomly generates a subset of 24 words from the super-set. This list of words is called the "practice-word-set" (the draft version of the paper calls them the 24 "practice words"). Participants then perform some exercises on each word, such as clicking on each word with the mouse, categorizing each word (all words from the super-set are either foods, animals, occupations, or clothes), and typing each practice word.
4. I'll call the remaining 24 words from the super-set that are not in the practice-word-set the "control-word-set" (the paper calls them "control words").
5. A measure is calculated called a "weighted differential recall (DR) score," ranging from -100% to 100%, which correlates the recalled-set to the practice-word-set and control-word-set. A positive DR% means the words from the recalled-set had a higher percentage of "practice words" than "control words." A negative DR% means the words from the recalled-set had a higher percentage of "control words" than "practice words." A 0 DR% means that the participant chose an equal number of words from both sets.
The DR score was calculated as follows:
P: number of words in both the recalled-set and the practice-word-set.
C: number of words in both the recalled-set and the control-word-set.
DR% = 100 x [(P - C) x (P + C)]/576

{Edit: Here's an example: 10 practice words recalled, 8 control words recalled. DR% = 100% x [(10-8)(10+8)]/576 = 6.25%}
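That formula is easy to sanity-check numerically. Here's a quick check (my own illustration in Python, not from the paper):

```python
# Weighted differential recall (DR) score as defined above.
# The super-set has 48 words, so the normalizing constant is 24^2 = 576.
def dr_percent(p, c):
    return 100.0 * (p - c) * (p + c) / 576.0

print(dr_percent(10, 8))   # 6.25  (the example above)
print(dr_percent(24, 0))   # 100.0 (the maximum possible score)
print(dr_percent(8, 10))   # -6.25 (bias toward control words)
```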
There was also a 25-person control group. For this group, the procedure was the same except that the participants did not do any practice exercises and were never shown the randomly generated practice-word-set. The practice-word-set was still generated, however, and used to calculate a DR% score for comparison.

Results:
Mean DR% score:
Main group: 2.27%
Control group: 0.26%

A variation of the experiment was performed with a slight change in how the super-set of words was originally presented to the participants. In this version of the experiment the sample size was much smaller: only 50 participants. There was also a 25-participant control session.

Mean DR% score:
Main group: 4.21%
Control group: Not given in the paper, but only mentioned as, "DR% scores from the control sessions did not differ significantly from zero."

For details, here's a link to where I gathered this:
http://dbem.ws/FeelingFuture.pdf

I'd like to see the experiment reproduced with a larger sample size. And why does the paper not give the control group's mean DR% in the second experiment? Perhaps because none of the DR% scores in the whole experiment differ significantly from 0? For now, I am not impressed.

Last edited:
Ivan Seeking
Staff Emeritus
Gold Member
Flex and Jared, you guys are discussing the wrong paper. You're discussing the crackpot Radin paper that Ivan posted. He was thinking of an older unrelated paper.

http://www.apa.org/pubs/journals/psp/index.aspx [Broken] 0022-3514/10/$12.00 DOI: 10.1037/a0021524
This article may not exactly replicate the final version published in the APA journal. It is not the copy of record.
Feeling the Future: Experimental Evidence for Anomalous Retroactive Influences on Cognition and Affect
Daryl J. Bem
Cornell University

Last edited by a moderator:
Ivan Seeking
Staff Emeritus
Science Advisor
Gold Member
Actually, I didn't even link it, I just quoted from the paper linked in the op.

I think you might be misinterpreting the experiment
No misinterpretation about it, that is what the article said.

53% means you are only 3% over the expected 50/50 odds of guesswork. Without a much larger test group that 3% doesn't mean anything. It could simply be a statistical anomaly.

Any of you seen the Derren Brown episode where he flips a coin ten times in a row and it comes out a head each time?

The test group is too small and this 3% doesn't show anything. If I sat in a room and flipped a coin 100 times, calling heads each time, heads and tails are equally likely on every toss, so although you'd expect an even spread of heads vs tails, there is a chance that you get more heads than tails, which would show me as being correct >50% of the time. But there's nothing precognitive about that.

Also, as per the Derren Brown experiment, I could flip a coin ten times, call heads before each toss, and have every toss come out heads. Again, nothing precognitive there, despite what it looks like. As a note, DB spent 8 hours stood in front of a camera flipping the coin until it came out heads ten times in a row (they showed this at the end). He used it to explain something in a show (he made out it was extremely likely to happen, to help with what he was trying to get the audience to do), but the point of showing the 8 hours' worth of attempts at the end was to demonstrate that it is possible, however unlikely, for heads to come out ten times in a row.

collinsmark
Homework Helper
Gold Member
Considering the experiment involving the word memorization followed by the "practice" typing of a random subset of words, now I am kinda' impressed (but not jumping out of my seat or anything).

I just created a C# program to simulate Daryl J. Bem's experiment in order to analyze the statistics. Basically, the program simulates the experiment, except without any human interaction, so we can rule out any human influences. This way one can compare the paper's reported DR% against simulated DR% values.

When simulating 100 participants in a given experiment, and repeating the experiment 5000 times, the mean DR% was very close to 0 as expected, but the standard deviation of the mean DR% was only 1.097%. The paper's reported DR% (for the first trial of 100 participants) was 2.27%. That's over two standard deviations better than expected. That could be significant.

For the second trial with 50 participants, repeating the experiment 5000 times, the simulated mean was (of course) almost 0, and the standard deviation of the mean DR% was 1.54%. The actual experiment apparently had a DR% of 4.21%. That's about 2.7 standard deviations away from what is expected.

So, the numbers in this experiment might be somewhat statistically significant. But I still would be curious to see how it turns out with a larger sample set.

I've attached the code below. Please forgive my poor coding, I wasn't putting a whole lot of time into this.
Code:
//Written by Collins Mark.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace Precognition_tester
{
class Program
{
static void Main(string[] args)
{
int NumLoops = 5000;  // <== number of experiments
int SampleSize = 100; // <== number of participants in each experiment.
double memoryMean = 18.4; // <== average number of words recalled.
double memoryStDev = 5;   // <== standard deviation of number of words
//     recalled (I had to guess at this one)
int ItemsPerCat = 12;
int i;
Random uniRand = new Random();

// Load the category lists.
List<string> foodList = new List<string>();
foodList.Add("HotDogs");
foodList.Add("Hamburgers");
foodList.Add("Waffles");
foodList.Add("IceCream");
foodList.Add("Coffee");
foodList.Add("Pizza");
foodList.Add("Guinness");
foodList.Add("SausageEggAndCheeseBiscuit");
foodList.Add("Toast");
foodList.Add("Salad");
foodList.Add("Taco");
foodList.Add("Steak");

List<string> animalList = new List<string>();
animalList.Add("Cat");
animalList.Add("Dog");
animalList.Add("Snake");
animalList.Add("Whale");
animalList.Add("Bee");
animalList.Add("Spider");
animalList.Add("Elephant");
animalList.Add("Mongoose");
animalList.Add("Wambat");
animalList.Add("Bonobo");
animalList.Add("Hamster");
animalList.Add("Human");

List<string> occupationsList = new List<string>();
occupationsList.Add("Engineer");
occupationsList.Add("Plumber");
occupationsList.Add("TalkShowHost");
occupationsList.Add("Doctor");
occupationsList.Add("Janitor");
occupationsList.Add("Prostitute");
occupationsList.Add("Cook");
occupationsList.Add("Theif");
occupationsList.Add("Pilot");
occupationsList.Add("Maid");
occupationsList.Add("Nanny");
occupationsList.Add("Bartender");

List<string> clothesList = new List<string>();
clothesList.Add("Shirt");
clothesList.Add("Shoes");
clothesList.Add("Jacket");
clothesList.Add("Undershorts");
clothesList.Add("Socks");
clothesList.Add("Jeans");
clothesList.Add("Wristwatch");
clothesList.Add("Cap");
clothesList.Add("Sunglasses");
clothesList.Add("Overalls");
clothesList.Add("LegWarmers");
clothesList.Add("Bra");

// Add elements to superset without clustering
List<string> superset = new List<string>();
for (i = 0; i < ItemsPerCat; i++)
{
superset.Add(foodList[i]);
superset.Add(animalList[i]);
superset.Add(occupationsList[i]);
superset.Add(clothesList[i]);
}

mainLoop(
NumLoops,
SampleSize,
ItemsPerCat,
memoryMean,
memoryStDev,
superset,
foodList,
animalList,
occupationsList,
clothesList,
uniRand);
}

// This is the big, main loop.
static void mainLoop(
int NumLoops,
int SampleSize,
int ItemsPerCat,
double memoryMean,
double memoryStDev,
List<string> superset,
List<string> foodList,
List<string> animalList,
List<string> occupationsList,
List<string> clothesList,
Random uniRand)
{
// Report something to the screen,
Console.WriteLine("Simulating {0} experiments of {1} participants each", NumLoops, SampleSize);
Console.WriteLine("...Calculating...");

// Create list of meanDR of separate experiments.
List<double> meanDRlist = new List<double>();

// Loop through main big loop
for (int mainCntr = 0; mainCntr < NumLoops; mainCntr++)
{
// create Array of participant's DR's for a given experiment.
List<double> DRarray = new List<double>();

//Loop through each participant in one experiment.
for (int participant = 0; participant < SampleSize; participant++)
{
// Reset parameters.
int P = 0; // number of practice words recalled.
int C = 0; // number of control words recalled.
double DR = 0; // weighted differential recall (DR) score.

// Create recalled set.
List<string> recalledSet = new List<string>();
createRecalledSet(
recalledSet,
superset,
memoryMean,
memoryStDev,
uniRand);

// Create random practice set.
List<string> practiceSet = new List<string>();
createPracticeSet(
practiceSet,
foodList,
animalList,
occupationsList,
clothesList,
ItemsPerCat,
uniRand);

// Compare recalled count to practice set.
foreach (string strTemp in recalledSet)
{
if (practiceSet.Contains(strTemp))
P++;
else
C++;
}

// Compute weighted differential recall (DR) score
DR = 100.0 * (P - C) * (P + C) / 576.0;

// Record DR in list.
DRarray.Add(DR);

// Report output.
//Console.WriteLine("DR%: {0}", DR);
}
// record mean DR.
double meanDR = DRarray.Average();
meanDRlist.Add(meanDR);

// Report Average DR.
//Console.WriteLine("Experiment {0}, Sample size: {1}, mean DR: {2}", mainCntr, SampleSize, meanDR);
}
// Finished looping.

// Calculate mean of meanDR
double finalMean = meanDRlist.Average();

// Calculate standard deviation of meanDR
double finalStDev = 0;
foreach (double dTemp in meanDRlist)
{
finalStDev += (dTemp - finalMean) * (dTemp - finalMean);
}
finalStDev = finalStDev / NumLoops;
finalStDev = Math.Sqrt(finalStDev);

// Report final results.
Console.WriteLine(" ");
Console.WriteLine("Participants per experiment: {0}", SampleSize);
Console.WriteLine("Number of separate experiments: {0}", NumLoops);
Console.WriteLine("mean of the mean DR% from all experiments: {0}", finalMean);
Console.WriteLine("Standard deviation of the mean DR%: {0}", finalStDev);
Console.ReadLine();
}

static double Gaussrand(double unirand1, double unirand2)
{
return (Math.Sqrt(-2 * Math.Log(unirand1)) * Math.Cos(2 * Math.PI * unirand2));
}

static void createRecalledSet(List<string> recalledSet, List<string> superSet, double mean, double stdev, Random unirand)
{
// Determine how many words were recalled. (random)
double unirand1 = unirand.NextDouble();
double unirand2 = unirand.NextDouble();
while (unirand1 == 0.0) unirand1 = unirand.NextDouble();
while (unirand2 == 0.0) unirand2 = unirand.NextDouble();

double gaussrand = Gaussrand(unirand1, unirand2);
gaussrand *= stdev;
gaussrand += mean;
int recalledCount = (int)gaussrand;
if (recalledCount > superSet.Count) recalledCount = superSet.Count;

// Create temporary superset and copy elements over.
List<string> tempSuperSet = new List<string>();
foreach (string strTemp in superSet)
{
tempSuperSet.Add(strTemp);
}

// Randomize temporary superset.
shuffleList(tempSuperSet, unirand);

// Copy over first recalledCount items to recalledSet.
for (int i = 0; i < recalledCount; i++)
{
recalledSet.Add(tempSuperSet[i]);
}
}

static void createPracticeSet(
List<string> practiceList,
List<string> foodList,
List<string> animalList,
List<string> occupationsList,
List<string> clothesList,
int itemsPerCat,
Random uniRand)
{
List<string> tempFoodList = new List<string>();
List<string> tempAnimalList = new List<string>();
List<string> tempOccupationsList = new List<string>();
List<string> tempClothesList = new List<string>();

// load temporary lists.
foreach (string strTemp in foodList) tempFoodList.Add(strTemp);
foreach (string strTemp in animalList) tempAnimalList.Add(strTemp);
foreach (string strTemp in occupationsList) tempOccupationsList.Add(strTemp);
foreach (string strTemp in clothesList) tempClothesList.Add(strTemp);

// Shuffle temporary lists
shuffleList(tempFoodList, uniRand);
shuffleList(tempAnimalList, uniRand);
shuffleList(tempOccupationsList, uniRand);
shuffleList(tempClothesList, uniRand);

// Load practice list
for (int i = 0; i < itemsPerCat / 2; i++)
{
practiceList.Add(tempFoodList[i]);
practiceList.Add(tempAnimalList[i]);
practiceList.Add(tempOccupationsList[i]);
practiceList.Add(tempClothesList[i]);
}

// Shuffle practice list
shuffleList(practiceList, uniRand);
}

// method to shuffle lists.
static void shuffleList(List<string> list, Random unirand)
{
List<string> shuffledList = new List<string>();
while (list.Count() > 0)
{
int indexTemp = unirand.Next(list.Count());
shuffledList.Add(list[indexTemp]);
list.RemoveAt(indexTemp);
}
foreach (string strTemp in shuffledList) list.Add(strTemp);
}
}
}

Last edited:
Evo
Mentor
What are you talking about? This is what I linked.
© 2010 American Psychological Association http://www.apa.org/pubs/journals/psp/index.aspx [Broken] 0022-3514/10/$12.00 DOI: 10.1037/a0021524
This article may not exactly replicate the final version published in the APA journal. It is not the copy of record.
Feeling the Future: Experimental Evidence for
Anomalous Retroactive Influences on Cognition and Affect
Daryl J. Bem
Cornell University

Actually, I didn't even link it, I just quoted from the paper linked in the op.
This is what you posted https://www.physicsforums.com/showpost.php?p=2982604&postcount=3

From the cited paper, this is what I saw quite some time ago [probably around 2002 or 2003]. I have mentioned it but was never able to find a valid reference for this work.

The trend is exemplified by several recent “presentiment” experiments, pioneered by Radin (1997), in which physiological indices of participants’ emotional arousal were monitored as participants viewed a series of pictures on a computer screen. Most of the pictures were emotionally neutral, but a highly arousing negative or erotic image was displayed on randomly selected trials. As expected, strong emotional arousal occurred when these images appeared on the screen, but the remarkable finding is that the increased arousal was observed to occur a few seconds before the picture appeared, before the computer has even selected the picture to be displayed. The presentiment effect has also been demonstrated in an fMRI experiment that monitored brain activity (Bierman & Scholte, 2002) and in experiments using bursts of noise rather than visual images as the arousing stimuli (Spottiswoode & May, 2003). A review of presentiment experiments prior to 2006 can be found in Radin (2006, pp. 161–180). Although there has not yet been a formal meta-analysis of presentiment studies, there have been 24 studies with human participants through 2009, of which 19 were in the predicted direction and about half were statistically significant. Two studies with animals are both positive, one marginally and the other substantially so (D. I. Radin, personal communication, December 20, 2009)...

Last edited by a moderator:
FlexGunship
Gold Member
<whisper>Umm... so was I talking about the wrong thing or not? :uhh:</whisper>

collinsmark
Homework Helper
Gold Member
No misinterpretation about it, that is what the article said.

53% means you are only 3% over the expected 50/50 odds of guesswork. Without a much larger test group that 3% doesn't mean anything. It could simply be a statistical anomaly.

Any of you seen the Derren Brown episode where he flips a coin ten times in a row and it comes out a head each time?

The test group is too small and this 3% doesn't show anything. If I sat in a room and flipped a coin 100 times, calling heads each time, heads and tails are equally likely on every toss, so although you'd expect an even spread of heads vs tails, there is a chance that you get more heads than tails, which would show me as being correct >50% of the time. But there's nothing precognitive about that.
Also, as per the Derren Brown experiment, I could flip a coin ten times, call heads before each toss, and have every toss come out heads. Again, nothing precognitive there, despite what it looks like.
Yes, if you were to flip a fair coin ten times in a single experiment, the likelihood of the coin coming up all heads on a given experiment is 1/2^10, or about 1 chance in 1024. If that happened on the first experimental attempt, it would be a statistical fluke: not at all impossible, but very unlikely. And if an experimenter did not know whether the coin was fair, he might take that as evidence against the coin being fair, meriting further trials. But I'm not sure how the analogy applies to this set of experiments. Are you suggesting that the author of the study repeated the experiment perhaps hundreds of times, each with 50 or 100 people (many thousands or tens of thousands of people in total), and then cherry-picked the best results? If so, that would be unethical manipulation of the data (and very costly too :tongue2:). [Edit: And besides, there are easier ways to manipulate the data.]
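Both numbers in that coin analogy are easy to verify by direct counting; here's a quick check (my own illustration in Python, not part of the thread's code). Note that ten heads out of ten is far less likely than an even five/five split:

```python
from math import comb

total = 2 ** 10             # all possible 10-toss sequences
print(1 / total)            # probability of all heads: 1 in 1024
print(comb(10, 5) / total)  # probability of exactly 5 heads: 252/1024, about 0.246
```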

And forgive me for my confusion, but I'm not certain where you are getting the 53%? In my earlier reply, I was talking about the specific set of experiments described in the study as "Experiment 8: Retroactive Facilitation of Recall I" and "Experiment 9: Retroactive Facilitation of Recall II." These are the experiments where participants are asked to memorize a list of words, and try to recall the words. Then later, a computer generated random subset of half the total words are given to the subjects to perform "practice exercises" on, such as typing each word. The study seems to show that the words recalled are correlated to the random subset of "practice" words that was generated after the fact. Those are the only experiments I was previously discussing on this thread. I haven't even really looked at any of the other experiments in the study.

To demonstrate the statistical relevance further, I've modified my C# program a bit to add some more information; I've attached it below. It now shows how many of the simulated experiments produce a mean DR% that is greater than or equal to the mean DR% reported in the study. My results show a 1 in 56 chance and a 1 in 300 chance of achieving a mean DR% greater than or equal to the study's, for the first and second experiments respectively (the paper calls them experiment 8 and experiment 9). The program simulated 10000 experiments in both cases: the first with 100 participants per experiment, the second with 50, as per the paper.

Here are the possible choices of interpretations, as I see them:

(I) The author of the paper might really be on to something. This study may be worth further investigation and attempted reproduction.

(II) The data obtained in the experiments were a statistical fluke. However, for the record, if the experiment was repeated many times, the statistics show that the chances of achieving a mean DR%, at or above what is given in the paper, merely by chance and equal odds, are roughly 1 out of 56 for the first experiment (consisting of 100 participants, mean DR% of 2.27%) and roughly 1 out of 333 for the second experiment (consisting of 50 participants, mean DR% of 4.21%).

(III) The experiments were somehow biased in ways not evident from the paper, or the data were manipulated or corrupted somehow.​
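Those Monte Carlo odds can also be cross-checked with a normal approximation, using the simulated standard deviations of the mean DR% quoted earlier in the thread (1.097% and 1.54%). This is my own back-of-envelope sketch in Python, not from the paper:

```python
from math import erf, sqrt

def upper_tail(mean_dr, sd_of_mean):
    # One-sided normal tail probability of seeing a mean DR% this large
    # or larger under the null hypothesis (true mean of 0).
    z = mean_dr / sd_of_mean
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

print(upper_tail(2.27, 1.097))  # about 0.019, roughly 1 in 52
print(upper_tail(4.21, 1.54))   # about 0.003, roughly 1 in 320
```

The approximation lands in the same ballpark as the simulation counts, which is reassuring.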

In my own personal, biased opinion [edit: being the skeptic that I am], I suspect that either (II) or (III) is what really happened. But all I am saying in this post is that the statistics quoted in the paper are actually relevant. Granted, a larger sample size would have been better, but even with the sample size given in the paper, the results are statistically significant. If we're going to poke holes in the study, we're not going to get very far by poking holes in its statistics.

Below is the revised C# code. It was written as console program in Microsoft Visual C# 2008, if you'd like to try it out. You can modify the parameters near the top and recompile to test out different experimental parameters and number of simulated experiments.
(Again, pardon my inefficient coding. I wasn't putting a lot of effort into this).
Code:
//Written by Collins Mark.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace Precognition_tester
{
class Program
{
static void Main(string[] args)
{
int NumLoops = 10000;  // <== number of experiments
int SampleSize = 50;  // <== number of participants in each experiment.

// This represents the paper's mean DR% threshold. Used for
// comparison of simulated mean DR% values. Should be 2.27
// for SampleSize of 100, and 4.21% for SampleSize of 50,
// to compare directly with paper's results.
double DRcomparisonThreshold = 4.21;

double memoryMean = 18.4; // <== average number of words recalled.
double memoryStDev = 5;   // <== standard deviation of number of words
//     recalled (I had to guess at this one)

int ItemsPerCat = 12;
int i;
Random uniRand = new Random();

// Load the category lists.
List<string> foodList = new List<string>();
foodList.Add("HotDogs");
foodList.Add("Hamburgers");
foodList.Add("Waffles");
foodList.Add("IceCream");
foodList.Add("Coffee");
foodList.Add("Pizza");
foodList.Add("Guinness");
foodList.Add("SausageEggAndCheeseBiscuit");
foodList.Add("Toast");
foodList.Add("Salad");
foodList.Add("Taco");
foodList.Add("Steak");

List<string> animalList = new List<string>();
animalList.Add("Cat");
animalList.Add("Dog");
animalList.Add("Snake");
animalList.Add("Whale");
animalList.Add("Bee");
animalList.Add("Spider");
animalList.Add("Elephant");
animalList.Add("Mongoose");
animalList.Add("Wambat");
animalList.Add("Bonobo");
animalList.Add("Hamster");
animalList.Add("Human");

List<string> occupationsList = new List<string>();
occupationsList.Add("Engineer");
occupationsList.Add("Plumber");
occupationsList.Add("TalkShowHost");
occupationsList.Add("Doctor");
occupationsList.Add("Janitor");
occupationsList.Add("Prostitute");
occupationsList.Add("Cook");
occupationsList.Add("Theif");
occupationsList.Add("Pilot");
occupationsList.Add("Maid");
occupationsList.Add("Nanny");
occupationsList.Add("Bartender");

List<string> clothesList = new List<string>();
clothesList.Add("Shirt");
clothesList.Add("Shoes");
clothesList.Add("Jacket");
clothesList.Add("Undershorts");
clothesList.Add("Socks");
clothesList.Add("Jeans");
clothesList.Add("Wristwatch");
clothesList.Add("Cap");
clothesList.Add("Sunglasses");
clothesList.Add("Overalls");
clothesList.Add("LegWarmers");
clothesList.Add("Bra");

// Add elements to superset without clustering
List<string> superset = new List<string>();
for (i = 0; i < ItemsPerCat; i++)
{
superset.Add(foodList[i]);
superset.Add(animalList[i]);
superset.Add(occupationsList[i]);
superset.Add(clothesList[i]);
}

mainLoop(
NumLoops,
SampleSize,
DRcomparisonThreshold,
ItemsPerCat,
memoryMean,
memoryStDev,
superset,
foodList,
animalList,
occupationsList,
clothesList,
uniRand);
}

// This is the big, main loop.
static void mainLoop(
int NumLoops,
int SampleSize,
double DRcomparisonThreshold,
int ItemsPerCat,
double memoryMean,
double memoryStDev,
List<string> superset,
List<string> foodList,
List<string> animalList,
List<string> occupationsList,
List<string> clothesList,
Random uniRand)
{
// Report something to the screen,
Console.WriteLine("Simulating {0} experiments of {1} participants each", NumLoops, SampleSize);
Console.WriteLine("...Calculating...");

// Create list of meanDR of separate experiments.
List<double> meanDRlist = new List<double>();

// Initialize DR comparison counter.
int NumDRaboveThresh = 0; // Number of DR% above comparison thresh.

// Loop through main big loop
for (int mainCntr = 0; mainCntr < NumLoops; mainCntr++)
{
// create Array of participant's DR's for a given experiment.
List<double> DRarray = new List<double>();

//Loop through each participant in one experiment.
for (int participant = 0; participant < SampleSize; participant++)
{
// Reset parameters.
int P = 0; // number of practice words recalled.
int C = 0; // number of control words recalled.
double DR = 0; // weighted differential recall (DR) score.

// Create recalled set.
List<string> recalledSet = new List<string>();
createRecalledSet(
recalledSet,
superset,
memoryMean,
memoryStDev,
uniRand);

// Create random practice set.
List<string> practiceSet = new List<string>();
createPracticeSet(
practiceSet,
foodList,
animalList,
occupationsList,
clothesList,
ItemsPerCat,
uniRand);

// Compare recalled count to practice set.
foreach (string strTemp in recalledSet)
{
if (practiceSet.Contains(strTemp))
P++;
else
C++;
}

// Compute weighted differential recall (DR) score
DR = 100.0 * (P - C) * (P + C) / 576.0;

// Record DR in list.
DRarray.Add(DR);
// Report output.
//Console.WriteLine("DR%:  {0}", DR);
}
// record mean DR.
double meanDR = DRarray.Average();

// Update comparison counter
if (meanDR >= DRcomparisonThreshold) NumDRaboveThresh++;

// Report Average DR.
//Console.WriteLine("Experiment {0}, Sample size: {1},  mean DR:  {2}", mainCntr, SampleSize, meanDR);

}
// Finished looping.

// Calculate mean of meanDR
double finalMean = meanDRlist.Average();

// Calculate standard deviation of meanDR
double finalStDev = 0;
foreach (double dTemp in meanDRlist)
{
finalStDev += (dTemp - finalMean) * (dTemp - finalMean);
}
finalStDev = finalStDev / NumLoops;
finalStDev = Math.Sqrt(finalStDev);

// Report final results.

Console.WriteLine(" ");
Console.WriteLine("Participants per experiment: {0}", SampleSize);
Console.WriteLine("Number of separate experiments: {0}", NumLoops);
Console.WriteLine("mean of the mean DR% from all experiments: {0}",
finalMean);
Console.WriteLine("Standard deviation of the mean DR%: {0}", finalStDev);
Console.WriteLine("");
Console.WriteLine("Comparison threshold (from study): {0}", DRcomparisonThreshold);
Console.WriteLine("Total number of meanDR above comparison threshold: {0}", NumDRaboveThresh);
Console.WriteLine("% of meanDR above comparison threshold: {0}%", 100.0*((double)NumDRaboveThresh)/((double)NumLoops));

}

static double Gaussrand(double unirand1, double unirand2)
{
return (Math.Sqrt(-2 * Math.Log(unirand1)) * Math.Cos(2 * Math.PI * unirand2));
}

static void createRecalledSet(List<string> recalledSet, List<string> superSet, double mean, double stdev, Random unirand)
{
// Determine how many words were recalled. (random)
double unirand1 = unirand.NextDouble();
double unirand2 = unirand.NextDouble();
while (unirand1 == 0.0) unirand1 = unirand.NextDouble();
while (unirand2 == 0.0) unirand2 = unirand.NextDouble();

double gaussrand = Gaussrand(unirand1, unirand2);
gaussrand *= stdev;
gaussrand += mean;
int recalledCount = (int)gaussrand;
if (recalledCount > superSet.Count) recalledCount = superSet.Count;

// Create temporary superset and copy elements over.
List<string> tempSuperSet = new List<string>();
foreach (string strTemp in superSet)
{
tempSuperSet.Add(strTemp);
}

// Randomize temporary superset.
shuffleList(tempSuperSet, unirand);

// Copy over first recalledCount items to recalledSet.
for (int i = 0; i < recalledCount; i++)
{
recalledSet.Add(tempSuperSet[i]);
}
}

static void createPracticeSet(
List<string> practiceList,
List<string> foodList,
List<string> animalList,
List<string> occupationsList,
List<string> clothesList,
int itemsPerCat,
Random uniRand)
{
List<string> tempFoodList = new List<string>();
List<string> tempAnimalList = new List<string>();
List<string> tempOccupationsList = new List<string>();
List<string> tempClothesList = new List<string>();

// load temporary lists.
foreach (string strTemp in foodList) tempFoodList.Add(strTemp);
foreach (string strTemp in animalList) tempAnimalList.Add(strTemp);
foreach (string strTemp in occupationsList) tempOccupationsList.Add(strTemp);
foreach (string strTemp in clothesList) tempClothesList.Add(strTemp);

// Shuffle temporary lists
shuffleList(tempFoodList, uniRand);
shuffleList(tempAnimalList, uniRand);
shuffleList(tempOccupationsList, uniRand);
shuffleList(tempClothesList, uniRand);

// Load practice list
for (int i = 0; i < itemsPerCat / 2; i++)
{
practiceList.Add(tempFoodList[i]);
practiceList.Add(tempAnimalList[i]);
practiceList.Add(tempOccupationsList[i]);
practiceList.Add(tempClothesList[i]);
}

// Shuffle practice list
shuffleList(practiceList, uniRand);
}

// method to shuffle lists.
static void shuffleList(List<string> list, Random unirand)
{
List<string> shuffledList = new List<string>();
while (list.Count() > 0)
{
int indexTemp = unirand.Next(list.Count());
shuffledList.Add(list[indexTemp]);
list.RemoveAt(indexTemp);
}
foreach (string strTemp in shuffledList) list.Add(strTemp);
}
}
}

Last edited:
Yes, if you were to flip a fair coin ten times in a single experiment, the likelihood of the coin coming up all heads on a given experiment is 1/2^10 or about 1 chance in 1024.

Which is exactly the same odds of equal heads and tails coming up.

The test itself, as per the article had 50/50 odds of the test subject guessing correctly. So I don't see 53/47 as being statistically amazing.

EDIT: I'm talking in regards to prediction so far as the coin toss odds.

The 53% must be from another experiment. The first one in the article I believe.

Last edited:
Perhaps I should elaborate.

With a 50/50 chance of either outcome, no matter what you predict, the odds of it occurring are the same. Any pattern you choose, so far as a coin toss goes, is equally likely to occur. So you really need to shift the odds to >70/30 to show strong predictability.

I'd prefer a test with smaller odds, say 1 in 6, of you guessing the result. That way you have significant odds against you simply guessing on each turn. By using 50/50 you are swinging the odds in favour of a guess.

Even a roll of the dice, giving the 1 in 6 odds, gives an even chance of any pattern occurring. However, it does mean that there is a 5 in 6 chance you are wrong on each go, making a string of correct predictions far more spectacular and significantly less likely.
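The arithmetic behind that point can be sanity-checked in a few lines (a sketch; the 20-roll figure is the recommendation from the top of the thread, not from the paper):

```python
from fractions import Fraction

# Probability of correctly predicting every one of n trials with
# k equally likely outcomes per trial: 1 / k^n.
def exact_prediction_odds(outcomes_per_trial, n_trials):
    return Fraction(1, outcomes_per_trial ** n_trials)

coin_10 = exact_prediction_odds(2, 10)   # ten coin flips: 1/1024
dice_20 = exact_prediction_odds(6, 20)   # twenty dice rolls: 1/6^20

print(coin_10)          # 1/1024
print(float(dice_20))   # astronomically small
```

Twenty correct dice rolls is roughly a one-in-3.6-quadrillion event, which is why a string of correct die predictions is so much more spectacular than a string of correct coin calls.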

collinsmark
Homework Helper
Gold Member
Yes, if you were to flip a fair coin ten times in a single experiment, the likelihood of the coin coming up all heads on a given experiment is 1/2^10, or about 1 chance in 1024.
Which is exactly the same odds of equal heads and tails coming up.

It's not the same. Let's take a 2-toss coin experiment to start. There are four possibilities.

H H
H T *
T H *
T T

Only one possibility out of 4 gives you all heads. That's one chance in 4. But there are two possibilities that give you an equal number of heads and tails: H T and T H. So the probability of tossing an equal number of heads and tails is 50%, or one chance in two attempts.

Moving on to an experiment with 4 tosses,

H H H H
H H H T
H H T H
H H T T *
H T H H
H T H T *
H T T H *
H T T T
T H H H
T H H T *
T H T H *
T H T T
T T H H *
T T H T
T T T H
T T T T

There are 16 possible outcomes and only 1 with all heads. So there is one chance in 16 of getting all heads. But there are 6 ways of getting an equal number of heads and tails. So the probability of equal heads and tails is 6/16 = 37.5% or about one chance in 2.67 attempts.
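The enumeration above can be brute-force checked in Python (a quick sketch, not part of any experiment):

```python
from itertools import product

# Enumerate all 2^4 = 16 outcomes of a 4-toss coin experiment and count
# the all-heads outcomes vs. those with an equal number of heads and tails.
outcomes = list(product("HT", repeat=4))
all_heads = sum(1 for o in outcomes if o.count("H") == 4)
balanced = sum(1 for o in outcomes if o.count("H") == 2)

print(len(outcomes), all_heads, balanced)  # 16 1 6
```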

It turns out that one can calculate the number of ways to produce a given outcome of a coin-toss experiment using

$$\left( \begin{array}{c}n \\ x \end{array} \right) = \frac{n!}{x!(n-x)!}$$

where n is the number of tosses, and x is the number of heads (or tails).

So for a 10-toss experiment, the chance of getting all heads is 1 in 1024, but the chance of getting an equal number of heads and tails is 24.6094%, or about 1 in 4.
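The formula above is easy to evaluate directly with Python's standard library (a sketch of the same calculation, not anything from the paper):

```python
from math import comb

# comb(n, x) counts the orderings with exactly x heads in n tosses of a
# fair coin; dividing by 2^n gives the probability of that many heads.
def prob_exact_heads(n, x):
    return comb(n, x) / 2 ** n

print(prob_exact_heads(10, 10))  # all heads: 1/1024
print(prob_exact_heads(10, 5))   # equal heads and tails: ~0.246
```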

By always having a 50/50 chance of any outcome. No matter what you predict the odds of it occurring are the same. Any pattern you choose so far as a coin toss goes is equally likely to occur. So you really need to shift the odds to >70/30 to show strong predictability.
Yes, I agree with that. For a particular pattern the odds are 1 in 1024 (10-toss coin experiment) for any specific pattern.

But if you don't care which coins come up heads as long as there is an even number of heads and tails, things are very different.

The experiments presented in the paper don't really care which order the words are recalled in, or which specific words happen to be in the "practice" or "control" set. The experiments are not looking for overly specific patterns; they are looking for sums of choices that are statistically unlikely when taken as a whole.
I'd prefer a test with smaller odds, say 1 in 6, of you guessing the result. That way you have significant odds against you simply guessing on each turn. By using 50/50 you are swinging the odds in favour of a guess.

Even a roll of the dice, giving the 1 in 6 odds, gives an even chance of any pattern occurring. However, it does mean that there is a 5 in 6 chance you are wrong on each go, making a string of correct predictions far more spectacular and significantly less likely.
Again, for a single roll of the die you are correct. For a single roll of the die, the probability distribution is uniform.

But that is not the case for rolling the die twice and taking the sum. Or, the same thing, guessing on the sum of two dice rolled together.

If you were to guess on the sum being 2 (snake eyes), you have 1 chance in 36.

On the other hand, if you were to guess that the sum is 7, your odds are far better. There are 6 combinations that give you a sum of 7. That makes your odds 6/36 = 16.6667%, or 1 chance in 6.
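The full distribution of two-dice sums can be tabulated in a couple of lines (a sketch to illustrate the non-uniformity):

```python
from itertools import product
from collections import Counter

# All 36 equally likely ordered pairs of two fair dice, bucketed by sum.
sums = Counter(a + b for a, b in product(range(1, 7), repeat=2))

print(sums[2], "way to roll 2")   # 1 way  -> probability 1/36
print(sums[7], "ways to roll 7")  # 6 ways -> probability 6/36
```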

[Edit: fixed a math/typo error.]

[Another edit: Sorry if this is a bit off topic, but this subject is fascinating. It's a curious aspect of nature that things tend to reach a state of equilibrium. At the heart of it, this is because there are a far greater number of possible states that are roughly equally distributed and far fewer states at the extremes. At sub-microscopic scales there's really no such thing as friction, and all collisions are essentially elastic and reversible. But when considering groups of atoms and particles taken together, there are far more states with a roughly equal distribution and far fewer at the extremes, all else being the same (such as the total energy being the same in all possible states). It's this property, the one we are talking about here, that explains friction, inelastic collisions, non-conservative forces, and the second law of thermodynamics when scaled up to macroscopic scales. And perhaps most importantly, the reason that getting 5 heads in a 10-toss coin experiment is far more likely than getting 10 heads is essentially the same reason my coffee cools down on its own instead of spontaneously heating up.]

Last edited:
Yes, I was referring to predicting a specific pattern.
The effects he recorded were small but statistically significant. In another test, for instance, volunteers were told that an erotic image was going to appear on a computer screen in one of two positions, and asked to guess in advance which position that would be. The image's eventual position was selected at random, but volunteers guessed correctly 53.1 per cent of the time.

That may sound unimpressive – truly random guesses would have been right 50 per cent of the time, after all. But well-established phenomena such as the ability of low-dose aspirin to prevent heart attacks are based on similarly small effects, notes Melissa Burkley of Oklahoma State University in Stillwater, who has also blogged about Bem's work at Psychology Today.

This is the test I'm referring to.

As per another thread, probability isn't my strong suit. A very interesting post from you there and I thank you. Cleared up some other questions I had as well.

FlexGunship
Gold Member
collinsmark said:
(III) The experiments were somehow biased in ways not evident from the paper, [STRIKE]or the data were manipulated or corrupted somehow[/STRIKE].
No need to postulate malice where a simple mistake will suffice.

It's got to be this one (well reasoned opinion). Frankly, I think it's because the tests are fundamentally non-causal (i.e. don't take place during forward propagation on the positive t-axis). You can never remove the systematic bias from the test: the data point is always taken before the test is performed.

I don't mean that in a trivial "oh, that's neat" way. Seriously consider it. The data being taken in a "precognitive memorization test" is taken prior to the test being performed.

1) Memorize words
2) Recall words test
3) Record results
4) Perform typing test

So we have a fundamental problem. This is a situation in which one of the following two scenarios MUST be true:

1) Either the list of words to be typed during the typing test is generated PRIOR to the recall test, or
2) the list of words to be typed during the typing test is generated AFTER the recall test.

In the case of (1), it would be impossible to separate precognition from remote viewing. In the case of (2), there is a tiny chance that the event is actually causal (in that the generation process could be influenced by the results of the recalled word test).

(For the purposes of this problem description I am assuming that causal events are more likely than non-causal events.)

collinsmark
Homework Helper
Gold Member
The effects he recorded were small but statistically significant. In another test, for instance, volunteers were told that an erotic image was going to appear on a computer screen in one of two positions, and asked to guess in advance which position that would be. The image's eventual position was selected at random, but volunteers guessed correctly 53.1 per cent of the time.

That may sound unimpressive – truly random guesses would have been right 50 per cent of the time, after all. But well-established phenomena such as the ability of low-dose aspirin to prevent heart attacks are based on similarly small effects, notes Melissa Burkley of Oklahoma State University in Stillwater, who has also blogged about Bem's work at Psychology Today.
This is the test I'm referring to.
Okay, I hadn't looked at that experiment yet, but I'll look at it now.

The study paper says in the experiment, "Experiment 1: Precognitive Detection of Erotic Stimuli," that there were 100 participants. 40 of the participants were each shown 12 erotic images (among other images), and the other 60 participants were each shown 18 erotic images (among others). That makes the total number of erotic images shown altogether (40)(12) + (60)(18) = 1560. The paper goes on to say,

"Across all 100 sessions, participants correctly identified the future position of the erotic
pictures significantly more frequently than the 50% hit rate expected by chance: 53.1%"​
However, after reading that, it's not clear to me whether the 53.1% is the total hit rate averaged across all total erotic pictures from all participants, or whether that is the average erotic-image hit rate of each participant. I don't think it matters much, but I'm going to interpret it the former way, meaning a hit rate of 53.1% of the total 1560 erotic images shown.

So this is sort of like a 1560-toss coin experiment. 53.1% of 1560 is ~828. So I'm guessing that the average number of "correct" guesses is 828 out of 1560 (making the percentage more like 53.0769%).

We could use the binomial distribution

$$P(n|N) = \left( \begin{array}{c}N \\ n \end{array} \right) p^n (1-p)^{(N-n)} = \frac{N!}{n!(N-n)!} p^n (1-p)^{(N-n)}$$

Where N = 1560, n = 828, and p = 0.5. But that would give us the probability of getting exactly 828 heads out of 1560 coin tosses.

But we're really interested in finding the probability of getting 828 heads or greater, out of 1560 coin tosses. So we have to take that into consideration too, and our equation becomes,

$$P = \sum_{k = n}^N \left( \begin{array}{c}N \\ k \end{array} \right) p^k (1-p)^{(N-k)} = \sum_{k = n}^N \frac{N!}{k!(N-k)!} p^k (1-p)^{(N-k)}$$

Rather than break my calculator and sanity, I just plopped the following into WolframAlpha:
"sum(k=828 to 1560, binomial(1560,k)*0.5^k*(1-0.5)^(1560-k))"​
Thank goodness for WolframAlpha (http://www.wolframalpha.com).

The result is that the probability is 0.00806697 (roughly 0.8%).

That means the probability of getting 53.1% heads or better in a 1560-toss coin experiment, merely by chance with a fair coin, is 1 in 124. Similarly, the chance of the participants randomly choosing the "correct" side of the screen 53.1% of the time or better, on average, in the first experiment (with all 100 subjects choosing a side 12 or 18 times each), merely by chance, is 1 in 124. I'd call that statistically significant.
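As a cross-check, the same tail sum handed to WolframAlpha above can be computed exactly with Python's standard library (a sketch of that one calculation):

```python
from fractions import Fraction
from math import comb

# Exact tail probability P(X >= 828) for X ~ Binomial(1560, 1/2):
# count the favorable orderings and divide by the 2^1560 total outcomes.
N, n = 1560, 828
favorable = sum(comb(N, k) for k in range(n, N + 1))
p = float(Fraction(favorable, 2 ** N))

print(p)  # ~0.008067, i.e. about 1 chance in 124
```

Using exact integer arithmetic via `Fraction` avoids the overflow and underflow a naive floating-point sum of `comb(1560, k) * 0.5**1560` would hit.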

As per another thread, probability isn't my strong suit. A very interesting post from you there and I thank you. Cleared up some other questions I had as well.
I'm not very good at probability and statistics either. I used to know this stuff a long time ago, but I promptly forgot most of it. I had to re-teach myself much of it for this thread!

Last edited by a moderator:
FlexGunship
Gold Member
That means the probability of getting 53.1% heads or better in a 1560-toss coin experiment, merely by chance with a fair coin, is 1 in 124.

I could be wrong, but aren't we assuming something by using only the number of erotic images as tests? It implies that there was always an erotic image to be found, and that's not the impression I get from the test.

In fact, and I could be wrong, I understood it to mean that the options were always "left" or "right", but that not every left/right set contained a possible correct answer.

I think I'll have to read again.

A story on Daryl Bem's paper in The New York Times:

One of psychology’s most respected journals has agreed to publish a paper presenting what its author describes as strong evidence for extrasensory perception, the ability to sense future events.

The decision may delight believers in so-called paranormal events, but it is already mortifying scientists. Advance copies of the paper, to be published this year in The Journal of Personality and Social Psychology, have circulated widely among psychological researchers in recent weeks and have generated a mixture of amusement and scorn.

Some scientists say the report deserves to be published, in the name of open inquiry; others insist that its acceptance only accentuates fundamental flaws in the evaluation and peer review of research in the social sciences.

“It’s craziness, pure craziness. I can’t believe a major journal is allowing this work in,” Ray Hyman, an emeritus professor of psychology at the University of Oregon and longtime critic of ESP research, said. “I think it’s just an embarrassment for the entire field.”
http://www.nytimes.com/2011/01/06/science/06esp.html?_r=4&hp=&pagewanted=all

Another quote:
In this case, the null hypothesis would be that ESP does not exist. Refusing to give that hypothesis weight makes no sense, these experts say; if ESP exists, why aren’t people getting rich by reliably predicting the movement of the stock market or the outcome of football games?
I wonder why people suddenly get such sloppy logic when the subject concerns ESP.

A story on Daryl Bem's paper in The New York Times:

http://www.nytimes.com/2011/01/06/science/06esp.html?_r=4&hp=&pagewanted=all

Another quote:
I wonder why people suddenly get such sloppy logic when the subject concerns ESP.

Yes, it's always good to move away from the paper itself, and instead read a reporter's personal take on it... why????

Forget the article and focus on the actual paper, which is a different matter. Beyond that, you need to learn what the scientific method is so you can understand when you posit that null hypothesis, and why. Nobody here should have to argue with you, just to realize that you need further education on the subject.

For instance, would it be logical to assume the existence (i.e. truth of hypothesis) of something, then go about proving your assumption? That's called... NOT SCIENCE... in fact it's enough to end your career regardless of the research subject. To pass off the results of a test designed to exploit a known neurological process as precognition is just... stupid. There's something to be examined here, but IF it's repeatable, it doesn't sound ESPy to me at all. This is ESP in the way that forgetting where your keys are, then suddenly having the idea that they're under the couch, is ESP! You must be psychic, all because of your mindset while waiting for your search pattern to improve based on a dim memory.

Yes, it's always good to move away from the paper itself, and instead read a reporter's personal take on it... why????
Perhaps you didn't read the article, but even the quote that I used states that this was the opinion of "experts". So it isn't the reporter's "personal take". I'm surprised that those experts use such sloppy logic. Perhaps the reporter didn't summarise the experts' views well.

Perhaps you didn't read the article, but even the quote that I used states that this was the opinion of "experts". So it isn't the reporter's "personal take". I'm surprised that those experts use such sloppy logic. Perhaps the reporter didn't summarise the experts' views well.

Oh, in that case I'll have Flex do the same referring to ME as an "expert", and I'll call him a journalist. I can see that you really press the standards here when it comes to credulity.

Here is a PDF of a response paper:

http://dl.dropbox.com/u/1018886/Bem6.pdf

It looks like there are some serious flaws with the ESP paper. The one I have the biggest problem with is coming up with a hypothesis from a set of data, and then using that same set of data to test the hypothesis. It's a version of the Texas Sharpshooter Fallacy.

Here's what the paper I linked has to say, in part, on this matter:

The Bem experiments were at least partly exploratory. For instance, Bem’s Experiment tested not just erotic pictures, but also neutral pictures, negative pictures, positive pictures, and pictures that were romantic but non-erotic. Only the erotic pictures showed any evidence for precognition. But now suppose that the data would have turned out differently and instead of the erotic pictures, the positive pictures would have been the only ones to result in performance higher than chance. Or suppose the negative pictures would have resulted in performance lower than chance. It is possible that a new and different story would then have been constructed around these other results (Bem, 2003; Kerr, 1998). This means that Bem’s Experiment 1 was to some extent a fishing expedition, an expedition that should have been explicitly reported and should have resulted in a correction of the reported p-value.
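The inflation the response paper is describing can be quantified with a toy calculation (illustrative numbers only; the paper's five picture categories are not truly independent tests):

```python
# If five independent outcome categories are each tested at alpha = 0.05,
# the chance that at least one comes out "significant" by luck alone is
# 1 - (1 - alpha)^5 -- the inflation a corrected p-value must account for.
alpha, tests = 0.05, 5
p_any_false_positive = 1 - (1 - alpha) ** tests

print(p_any_false_positive)  # ~0.226
```

Roughly a 23% chance of a spurious "hit" somewhere is a long way from the nominal 5%, which is the Texas Sharpshooter point in numbers.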

I'm currently reading a book by Dr. Ben Goldacre called "Bad Science" where he goes over this exact sort of thing.

Here is a PDF of a response paper:

http://dl.dropbox.com/u/1018886/Bem6.pdf

It looks like there are some serious flaws with the ESP paper. The one I have the biggest problem with is coming up with a hypothesis from a set of data, and then using that same set of data to test the hypothesis. It's a version of the Texas Sharpshooter Fallacy.

Here's what the paper I linked has to say, in part, on this matter:

I'm currently reading a book by Dr. Ben Goldacre called "Bad Science" where he goes over this exact sort of thing.

I'd call it, "Good Fraud"... better 'atmospherics'.

Ygggdrasil
Gold Member
Perhaps this falls into the category of "journalism" that seems so despised in this discussion, but Jonah Lehrer wrote a nice article for The New Yorker that touches on issues relevant to the debate (similar to the points already brought up in the thread: that subtle biases in study design, analysis and interpretation can introduce significant biases and lead to erroneous results). In particular, he talks about some work done by Jonathan Schooler:
In 2004, Schooler embarked on an ironic imitation of Rhine’s research: he tried to replicate this failure to replicate. In homage to Rhine’s interests, he decided to test for a parapsychological phenomenon known as precognition. The experiment itself was straightforward: he flashed a set of images to a subject and asked him or her to identify each one. Most of the time, the response was negative—the images were displayed too quickly to register. Then Schooler randomly selected half of the images to be shown again. What he wanted to know was whether the images that got a second showing were more likely to have been identified the first time around. Could subsequent exposure have somehow influenced the initial results? Could the effect become the cause?

The craziness of the hypothesis was the point: Schooler knows that precognition lacks a scientific explanation. But he wasn’t testing extrasensory powers; he was testing the decline effect. “At first, the data looked amazing, just as we’d expected,” Schooler says. “I couldn’t believe the amount of precognition we were finding. But then, as we kept on running subjects, the effect size”—a standard statistical measure—“kept on getting smaller and smaller.” The scientists eventually tested more than two thousand undergraduates. “In the end, our results looked just like Rhine’s,” Schooler said. “We found this strong paranormal effect, but it disappeared on us.”

The most likely explanation for the decline is an obvious one: regression to the mean. As the experiment is repeated, that is, an early statistical fluke gets cancelled out. The extrasensory powers of Schooler’s subjects didn’t decline—they were simply an illusion that vanished over time. And yet Schooler has noticed that many of the data sets that end up declining seem statistically solid—that is, they contain enough data that any regression to the mean shouldn’t be dramatic. “These are the results that pass all the tests,” he says. “The odds of them being random are typically quite remote, like one in a million. This means that the decline effect should almost never happen. But it happens all the time! Hell, it’s happened to me multiple times.”
http://www.newyorker.com/reporting/2010/12/13/101213fa_fact_lehrer

In essence, Schooler replicated the results of the Bem paper but, after performing many more tests, showed that the results were nothing but a statistical anomaly. I'm not aware of whether Schooler published these results.
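The decline-as-regression-to-the-mean story is easy to reproduce in simulation (a toy sketch, not Schooler's actual protocol): run many small "ESP" experiments on a pure chance-level guesser, cherry-pick the most impressive early result, then collect more data from the same chance-level process.

```python
import random

random.seed(42)  # fixed seed so the illustration is repeatable

def hit_rate(n):
    # Fraction of "correct guesses" from a guesser with no ability at all.
    return sum(random.random() < 0.5 for _ in range(n)) / n

# 100 small pilot experiments of 20 trials each; keep the best one.
early_results = [hit_rate(20) for _ in range(100)]
best_early = max(early_results)   # looks like precognition...
follow_up = hit_rate(10000)       # ...but more data sit near 50%

print(best_early, follow_up)
```

The selected early result sits well above 50% purely because it was selected, while the large follow-up lands near chance, so the "effect" appears to decline even though nothing real was ever there.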

This, especially in light of other such examples detailed in Lehrer's piece, is why I'm hesitant to trust findings based primarily on statistical data without a plausible, empirically-tested mechanism explaining the results.