Precognition paper to be published in mainstream journal

In summary: HUGE for the field of parapsychology. It may finally gain the credibility it has long deserved. However, if the result is found to be false, it will also discredit the entire field.
  • #36
jarednjames said:
No misinterpretation about it, that is what the article said.53% means you are only 3% over the expected 50/50 odds of guesswork. Without a much larger test group that 3% doesn't mean anything. It could simply be a statistical anomaly.

Any of you seen the Derren Brown episode where he flips a coin ten times in a row and it comes out a head each time?

The test group is too small and this 3% doesn't show anything. If I sat in a room and flipped a coin 100 times, calling heads each time, heads and tails are equally likely on every toss, so you'd expect a roughly even spread of heads vs tails. But there is a chance you get more heads than tails, which would show me as being "correct" more than 50% of the time. There's nothing precognitive about that.
Also, as per the Derren Brown experiment, I flip a coin ten times and could call heads ten times in a row and each coin toss come out heads. Again, nothing precognitive there. Despite what it looks like.
Yes, if you were to flip a fair coin ten times in a single experiment, the likelihood of the coin coming up all heads is 1/2^10, or about 1 chance in 1024. If that happened on the first experimental attempt, it would be a statistical fluke. Not at all impossible but very unlikely. And if an experimenter did not know whether the coin was fair, he might take that as evidence against the coin being fair, and as meriting further trials. But I'm not sure how the analogy applies to this set of experiments. Are you suspecting that the author of the study repeated the experiment perhaps hundreds of times, each with 50 or 100 people per experiment (many thousands or tens of thousands of people total), and then cherry-picked the best results? If so, that would be unethical manipulation of the data (and very costly too :-p). [Edit: And besides, there are easier ways to manipulate the data.]

And forgive me for my confusion, but I'm not certain where you are getting the 53%? In my earlier reply, I was talking about the specific set of experiments described in the study as "Experiment 8: Retroactive Facilitation of Recall I" and "Experiment 9: Retroactive Facilitation of Recall II." These are the experiments where participants are asked to memorize a list of words, and try to recall the words. Then later, a computer generated random subset of half the total words are given to the subjects to perform "practice exercises" on, such as typing each word. The study seems to show that the words recalled are correlated to the random subset of "practice" words that was generated after the fact. Those are the only experiments I was previously discussing on this thread. I haven't even really looked at any of the other experiments in the study.

To demonstrate the statistical relevance further, I've modified my C# code a little to add some more information; it's attached below. It now shows how many of the simulated experiments produce a mean DR% that is greater than or equal to the mean DR% reported in the study. My results show roughly a 1 in 56 chance and a 1 in 300 chance of achieving a mean DR% greater than or equal to the one reported in the study, for the first and second experiment respectively (the paper calls them Experiment 8 and Experiment 9). The program simulated 10000 experiments in both cases -- the first with 100 participants per experiment, the second with 50, as per the paper.

Here are the possible choices of interpretations, as I see them:

(I) The author of the paper might really be on to something. This study may be worth further investigation and attempted reproduction.

(II) The data obtained in the experiments were a statistical fluke. However, for the record, if the experiment were repeated many times, the statistics show that the chances of achieving a mean DR% at or above what is given in the paper, merely by chance and equal odds, are roughly 1 out of 56 for the first experiment (consisting of 100 participants, mean DR% of 2.27%) and roughly 1 out of 333 for the second experiment (consisting of 50 participants, mean DR% of 4.21%).

(III) The experiments were somehow biased in ways not evident from the paper, or the data were manipulated or corrupted somehow.​

In my own personal, biased opinion [edit: being the skeptic that I am], I suspect that either (II) or (III) is what really happened. But all I am saying in this post is that the statistics quoted in the paper are actually relevant. Granted, a larger sample size would have been better, but even with the sample size given in the paper, the results are statistically significant. If we're going to poke holes in the study, we're not going to get very far by poking holes in its statistics.

Below is the revised C# code. It was written as a console program in Microsoft Visual C# 2008, if you'd like to try it out. You can modify the parameters near the top and recompile to test different experimental parameters and numbers of simulated experiments.
(Again, pardon my inefficient coding. I wasn't putting a lot of effort into this).
Code:
//Written by Collins Mark.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace Precognition_tester
{
    class Program
    {
        static void Main(string[] args)
        {
            int NumLoops = 10000;  // <== number of experiments
            int SampleSize = 50;  // <== number of participants in each experiment.

            // This represents the paper's mean DR% threshold. Used for
            // comparison of simulated mean DR% values. Should be 2.27
            // for SampleSize of 100, and 4.21% for SampleSize of 50,
            // to compare directly with paper's results.
            double DRcomparisonThreshold = 4.21;

            double memoryMean = 18.4; // <== average number of words recalled.
            double memoryStDev = 5;   // <== standard deviation of number of words
                                      //     recalled (I had to guess at this one)

            int ItemsPerCat = 12;
            int i;
            Random uniRand = new Random();

            // Load the category lists.
            List<string> foodList = new List<string>();
            foodList.Add("HotDogs");
            foodList.Add("Hamburgers");
            foodList.Add("Waffles");
            foodList.Add("IceCream");
            foodList.Add("Coffee");
            foodList.Add("Pizza");
            foodList.Add("Guinness");
            foodList.Add("SausageEggAndCheeseBiscuit");
            foodList.Add("Toast");
            foodList.Add("Salad");
            foodList.Add("Taco");
            foodList.Add("Steak");

            List<string> animalList = new List<string>();
            animalList.Add("Cat");
            animalList.Add("Dog");
            animalList.Add("Snake");
            animalList.Add("Whale");
            animalList.Add("Bee");
            animalList.Add("Spider");
            animalList.Add("Elephant");
            animalList.Add("Mongoose");
            animalList.Add("Wambat");
            animalList.Add("Bonobo");
            animalList.Add("Hamster");
            animalList.Add("Human");

            List<string> occupationsList = new List<string>();
            occupationsList.Add("Engineer");
            occupationsList.Add("Plumber");
            occupationsList.Add("TalkShowHost");
            occupationsList.Add("Doctor");
            occupationsList.Add("Janitor");
            occupationsList.Add("Prostitute");
            occupationsList.Add("Cook");
            occupationsList.Add("Theif");
            occupationsList.Add("Pilot");
            occupationsList.Add("Maid");
            occupationsList.Add("Nanny");
            occupationsList.Add("Bartender");

            List<string> clothesList = new List<string>();
            clothesList.Add("Shirt");
            clothesList.Add("Shoes");
            clothesList.Add("Jacket");
            clothesList.Add("Undershorts");
            clothesList.Add("Socks");
            clothesList.Add("Jeans");
            clothesList.Add("Wristwatch");
            clothesList.Add("Cap");
            clothesList.Add("Sunglasses");
            clothesList.Add("Overalls");
            clothesList.Add("LegWarmers");
            clothesList.Add("Bra");

            // Add elements to superset without clustering
            List<string> superset = new List<string>();
            for (i = 0; i < ItemsPerCat; i++)
            {
                superset.Add(foodList[i]);
                superset.Add(animalList[i]);
                superset.Add(occupationsList[i]);
                superset.Add(clothesList[i]);
            }

            mainLoop(
                NumLoops,
                SampleSize, 
                DRcomparisonThreshold,
                ItemsPerCat,
                memoryMean,
                memoryStDev,
                superset,
                foodList,
                animalList,
                occupationsList,
                clothesList,
                uniRand);
        }

        // This is the big, main loop.
        static void mainLoop(
            int NumLoops,
            int SampleSize,
            double DRcomparisonThreshold,
            int ItemsPerCat,
            double memoryMean,
            double memoryStDev,
            List<string> superset,
            List<string> foodList,
            List<string> animalList,
            List<string> occupationsList,
            List<string> clothesList,
            Random uniRand)
        {
            // Report something to the screen,
            Console.WriteLine("Simulating {0} experiments of {1} participants each", NumLoops, SampleSize);
            Console.WriteLine("...Calculating...");

            // Create list of meanDR of separate experiments.
            List<double> meanDRlist = new List<double>();

            // Initialize DR comparison counter.
            int NumDRaboveThresh = 0; // Number of DR% above comparison thresh.

            // Loop through main big loop
            for (int mainCntr = 0; mainCntr < NumLoops; mainCntr++)
            {
                // create Array of participant's DR's for a given experiment.
                List<double> DRarray = new List<double>();

                //Loop through each participant in one experiment.
                for (int participant = 0; participant < SampleSize; participant++)
                {
                    // Reset parameters.
                    int P = 0; // number of practice words recalled.
                    int C = 0; // number of control words recalled.
                    double DR = 0; // weighted differential recall (DR) score.

                    // Create recalled set.
                    List<string> recalledSet = new List<string>();
                    createRecalledSet(
                        recalledSet,
                        superset,
                        memoryMean,
                        memoryStDev,
                        uniRand);

                    // Create random practice set.
                    List<string> practiceSet = new List<string>();
                    createPracticeSet(
                        practiceSet,
                        foodList,
                        animalList,
                        occupationsList,
                        clothesList,
                        ItemsPerCat,
                        uniRand);

                    // Compare recalled count to practice set.
                    foreach (string strTemp in recalledSet)
                    {
                        if (practiceSet.Contains(strTemp))
                            P++;
                        else
                            C++;
                    }

                    // Compute weighted differential recall (DR) score
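                    // 576 = 24^2 is the largest possible value of (P - C)*(P + C)
                    // (all 24 practice words recalled and no control words), so this
                    // presumably normalizes DR% to the range -100% to +100%.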
                    DR = 100.0 * (P - C) * (P + C) / 576.0;

                    // Record DR in list.
                    DRarray.Add(DR);

                    // Report output.
                    //Console.WriteLine("DR%:  {0}", DR);
                }
                // record mean DR.
                double meanDR = DRarray.Average();
                meanDRlist.Add(meanDR);

                // Update comparison counter
                if (meanDR >= DRcomparisonThreshold) NumDRaboveThresh++;

                // Report Average DR.
                //Console.WriteLine("Experiment {0}, Sample size: {1},  mean DR:  {2}", mainCntr, SampleSize, meanDR);

            }
            // Finished looping.

            // Calculate mean of meanDR
            double finalMean = meanDRlist.Average();

            // Calculate standard deviation of meanDR
            double finalStDev = 0;
            foreach (double dTemp in meanDRlist)
            {
                finalStDev += (dTemp - finalMean) * (dTemp - finalMean);
            }
            finalStDev = finalStDev / NumLoops;
            finalStDev = Math.Sqrt(finalStDev);

            // Report final results.

            Console.WriteLine(" ");
            Console.WriteLine("Participants per experiment: {0}", SampleSize);
            Console.WriteLine("Number of separate experiments: {0}", NumLoops);
            Console.WriteLine("mean of the mean DR% from all experiments: {0}",
                finalMean);
            Console.WriteLine("Standard deviation of the mean DR%: {0}", finalStDev);
            Console.WriteLine("");
            Console.WriteLine("Comparison theshold (from study): {0}", DRcomparisonThreshold);
            Console.WriteLine("Total number of meanDR above comparison threshold: {0}", NumDRaboveThresh);
            Console.WriteLine("% of meanDR above comparison threshold: {0}%", 100.0*((double)NumDRaboveThresh)/((double)NumLoops));
            Console.ReadLine();

        }

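        // Box-Muller transform: converts two independent uniform (0,1] random
        // numbers into one sample from a standard normal distribution.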
        static double Gaussrand(double unirand1, double unirand2)
        {
            return (Math.Sqrt(-2 * Math.Log(unirand1)) * Math.Cos(2 * Math.PI * unirand2));
        }

        static void createRecalledSet(List<string> recalledSet, List<string> superSet, double mean, double stdev, Random unirand)
        {
            // Determine how many words were recalled. (random)
            double unirand1 = unirand.NextDouble();
            double unirand2 = unirand.NextDouble();
            while (unirand1 == 0.0) unirand1 = unirand.NextDouble();
            while (unirand2 == 0.0) unirand2 = unirand.NextDouble();

            double gaussrand = Gaussrand(unirand1, unirand2);
            gaussrand *= stdev;
            gaussrand += mean;
            int recalledCount = (int)gaussrand;
            if (recalledCount > superSet.Count) recalledCount = superSet.Count;

            // Create temporary superset and copy elements over.
            List<string> tempSuperSet = new List<string>();
            foreach (string strTemp in superSet)
            {
                tempSuperSet.Add(strTemp);
            }

            // Randomize temporary superset.
            shuffleList(tempSuperSet, unirand);

            // Copy over first recalledCount items to recalledSet.
            for (int i = 0; i < recalledCount; i++)
            {
                recalledSet.Add(tempSuperSet[i]);
            }
        }

        static void createPracticeSet(
            List<string> practiceList,
            List<string> foodList,
            List<string> animalList,
            List<string> occupationsList,
            List<string> clothesList,
            int itemsPerCat,
            Random uniRand)
        {
            List<string> tempFoodList = new List<string>();
            List<string> tempAnimalList = new List<string>();
            List<string> tempOccupationsList = new List<string>();
            List<string> tempClothesList = new List<string>();

            // load temporary lists.
            foreach (string strTemp in foodList)
                tempFoodList.Add(strTemp);
            foreach (string strTemp in animalList)
                tempAnimalList.Add(strTemp);
            foreach (string strTemp in occupationsList)
                tempOccupationsList.Add(strTemp);
            foreach (string strTemp in clothesList)
                tempClothesList.Add(strTemp);

            // Shuffle temporary lists
            shuffleList(tempFoodList, uniRand);
            shuffleList(tempAnimalList, uniRand);
            shuffleList(tempOccupationsList, uniRand);
            shuffleList(tempClothesList, uniRand);

            // Load practice list
            for (int i = 0; i < itemsPerCat / 2; i++)
            {
                practiceList.Add(tempFoodList[i]);
                practiceList.Add(tempAnimalList[i]);
                practiceList.Add(tempOccupationsList[i]);
                practiceList.Add(tempClothesList[i]);
            }

            // Shuffle practice list
            shuffleList(practiceList, uniRand);
        }

        // Shuffles a list in place by drawing elements uniformly at random without replacement.
        static void shuffleList(List<string> list, Random unirand)
        {
            List<string> shuffledList = new List<string>();
            while (list.Count() > 0)
            {
                int indexTemp = unirand.Next(list.Count());
                shuffledList.Add(list[indexTemp]);
                list.RemoveAt(indexTemp);
            }
            foreach (string strTemp in shuffledList) list.Add(strTemp);
        }
    }
}
 
Last edited:
  • #37
collinsmark said:
Yes, if you were to flip a fair coin ten times in a single experiment, the likelihood of the coin coming up all heads is 1/2^10, or about 1 chance in 1024.

Which is exactly the same odds of equal heads and tails coming up.

The test itself, as per the article, had 50/50 odds of the test subject guessing correctly. So I don't see 53/47 as being statistically amazing.

EDIT: I'm talking in regard to prediction, as far as the coin toss odds go.

The 53% must be from another experiment. The first one in the article, I believe.
 
Last edited:
  • #38
Perhaps I should elaborate.

By always having a 50/50 chance of any outcome, no matter what you predict, the odds of it occurring are the same. Any pattern you choose, so far as a coin toss goes, is equally likely to occur. So you really need to shift the odds to >70/30 to show strong predictability.

I'd prefer a test with smaller odds, say 1 in 6, of you guessing the result. That way you have significant odds against you simply guessing on each turn. By using 50/50 you are swinging the odds in favour of a guess.

Even a roll of the dice, giving the 1 in 6 odds, gives an even chance of any pattern occurring. However, it does mean that there is a 5 in 6 chance you are wrong on each go, making a string of correct predictions far more spectacular and significantly less likely.
 
  • #39
jarednjames said:
collinsmark said:
Yes, if you were to flip a fair coin ten times in a single experiment, the likelihood of the coin coming up all heads is 1/2^10, or about 1 chance in 1024.
Which is exactly the same odds of equal heads and tails coming up.
Egads! Don't say that! :eek:

It's not the same. Let's take a 2 coin toss experiment to start. There are four possibilities.

H H
H T *
T H *
T T

Only one possibility out of 4 gives you all heads. That's one chance in 4. But there are two possibilities that give you an equal number of heads and tails, H T and T H. So the probability of tossing an equal number of heads vs. tails is 50%, or one chance in two attempts.

Moving on to an experiment with 4 tosses,

H H H H
H H H T
H H T H
H H T T *
H T H H
H T H T *
H T T H *
H T T T
T H H H
T H H T *
T H T H *
T H T T
T T H H *
T T H T
T T T H
T T T T

There are 16 possible outcomes and only 1 with all heads. So there is one chance in 16 of getting all heads. But there are 6 ways of getting an equal number of heads and tails. So the probability of equal heads and tails is 6/16 = 37.5% or about one chance in 2.67 attempts.

It turns out that one can calculate the number of ways to produce a given outcome of a coin-toss experiment using

[tex]
\left( \begin{array}{c} n \\ x \end{array} \right) = \frac{n!}{x!(n-x)!}
[/tex]

where n is the number of tosses, and x is the number of heads (or tails).

So for a 10-toss experiment, the chances of getting all heads is 1 in 1024, but the chances of getting equal number of heads and tails is 24.6094% or about 1 in 4.
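
If you'd rather verify those numbers by brute force than trust the formula, here's a minimal C# sketch (separate from the simulation program posted earlier) that enumerates all 2^10 outcomes of a 10-toss experiment:
Code:
using System;

class CoinCount
{
    static void Main()
    {
        int n = 10;                  // number of tosses
        int total = 1 << n;          // 2^n equally likely outcomes
        int allHeads = 0, equalSplit = 0;

        // Each integer from 0 to 2^n - 1 encodes one outcome (bit = 1 means heads).
        for (int outcome = 0; outcome < total; outcome++)
        {
            int heads = 0;
            for (int bit = 0; bit < n; bit++)
                if ((outcome & (1 << bit)) != 0) heads++;

            if (heads == n) allHeads++;
            if (heads == n / 2) equalSplit++;
        }

        Console.WriteLine("All heads: {0} of {1}  (1 in {2})", allHeads, total, total / allHeads);
        Console.WriteLine("Exactly {0} heads: {1} of {2}  ({3:F4}%)",
            n / 2, equalSplit, total, 100.0 * equalSplit / total);
    }
}
It prints 1 of 1024 for all heads and 252 of 1024 (24.6094%) for exactly 5 heads, matching the formula above.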

By always having a 50/50 chance of any outcome, no matter what you predict, the odds of it occurring are the same. Any pattern you choose, so far as a coin toss goes, is equally likely to occur. So you really need to shift the odds to >70/30 to show strong predictability.
Yes, I agree with that. The odds are 1 in 1024 (in a 10-toss coin experiment) for any specific pattern. :approve:

But if you don't care which coins come up heads as long as there is an even number of heads and tails, things are very different.

The experiments presented in the paper don't really care which order the words are recalled in, or which specific words happen to be in the "practice" or "control" set. The experiments are not looking for overly specific patterns; they are looking for sums of choices that are statistically unlikely when taken as a whole.
I'd prefer a test with smaller odds, say 1 in 6, of you guessing the result. That way you have significant odds against you simply guessing on each turn. By using 50/50 you are swinging the odds in favour of a guess.

Even a roll of the dice, giving the 1 in 6 odds, gives an even chance of any pattern occurring. However, it does mean that there is a 5 in 6 chance you are wrong on each go, making a string of correct predictions far more spectacular and significantly less likely.
Again, for a single roll of the die you are correct. :approve: For a single roll of the die, the probability distribution is uniform.

But that is not the case for rolling the die twice and taking the sum. Or, the same thing, guessing on the sum of two dice rolled together.

If you were to guess on the sum being 2 (snake eyes), you have 1 chance in 36.

On the other hand, if you were to guess that the sum is 7, your odds are incredibly better. There are 6 combinations that give you a score of 7. That makes your odds 6/36 = 16.6667% or 1 chance in 6.
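
If you want to double-check that counting by brute force, a throwaway C# sketch (again, separate from the simulation program) does it in a few lines:
Code:
using System;

class DiceSums
{
    static void Main()
    {
        int[] countBySum = new int[13]; // indices 2..12 used

        // Enumerate all 36 equally likely (die1, die2) combinations.
        for (int die1 = 1; die1 <= 6; die1++)
            for (int die2 = 1; die2 <= 6; die2++)
                countBySum[die1 + die2]++;

        Console.WriteLine("Sum = 2:  {0} of 36 ways", countBySum[2]);   // 1 of 36
        Console.WriteLine("Sum = 7:  {0} of 36 ways", countBySum[7]);   // 6 of 36
    }
}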

[Edit: fixed a math/typo error.]

[Another edit: Sorry if this is a bit off topic, but this subject is fascinating. It's a curious aspect of nature that things tend to reach a state of equilibrium. At the heart of nature, this is because there are a far greater number of possible states that are roughly equally distributed and far fewer states at the extremes. At sub-microscopic scales, there's really no such thing as friction and all collisions are essentially elastic and reversible. But when considering groups of atoms and particles taken together, there are far more states with a roughly equal distribution and far fewer at the extremes, all else being the same (such as the total energy being the same in all possible states). It's this property, the one we are talking about here, that explains friction, inelastic collisions, non-conservative forces, and the second law of thermodynamics when scaled up to macroscopic scales. And perhaps most importantly, the reason that getting 5 heads in a 10-toss coin experiment is far more likely than getting 10 heads is essentially the same reason why my coffee cools down on its own instead of heating up spontaneously.]
 
Last edited:
  • #40
Yes, I was referring to predicting a specific pattern.
The effects he recorded were small but statistically significant. In another test, for instance, volunteers were told that an erotic image was going to appear on a computer screen in one of two positions, and asked to guess in advance which position that would be. The image's eventual position was selected at random, but volunteers guessed correctly 53.1 per cent of the time.

That may sound unimpressive – truly random guesses would have been right 50 per cent of the time, after all. But well-established phenomena such as the ability of low-dose aspirin to prevent heart attacks are based on similarly small effects, notes Melissa Burkley of Oklahoma State University in Stillwater, who has also blogged about Bem's work at Psychology Today.

This is the test I'm referring to.

As per another thread, probability isn't my strong suit. A very interesting post from you there and I thank you. Cleared up some other questions I had as well.
 
  • #41
collinsmark said:
(III) The experiments were somehow biased in ways not evident from the paper, [STRIKE]or the data were manipulated or corrupted somehow[/STRIKE].
No need to postulate malice where a simple mistake will suffice.

It's got to be this one (well-reasoned opinion). Frankly, I think it's because the tests are fundamentally non-causal (i.e. they don't take place during forward propagation on the positive t-axis). You can never remove the systematic bias from the test: the data point is always taken before the test is performed.

I don't mean that in a trivial "oh, that's neat" way. Seriously consider it. The data in a "precognitive memorization test" are taken prior to the test being performed.

1) Memorize words
2) Recall words test
3) Record results
4) Perform typing test

So we have a fundamental problem. This is a situation in which one of the following two scenarios MUST be true:

1) Either the list of words to be typed during the typing test is generated PRIOR to the recall test, or
2) the list of words to be typed during the typing test is generated AFTER the recall test.

In the case of (1), it would be impossible to separate precognition from remote viewing. In the case of (2), there is a tiny chance that the event is actually causal (in that the generation process could be influenced by the results of the recalled word test).

(For the purposes of this problem description I am assuming that causal events are more likely than non-causal events.)
 
  • #42
jarednjames said:
The effects he recorded were small but statistically significant. In another test, for instance, volunteers were told that an erotic image was going to appear on a computer screen in one of two positions, and asked to guess in advance which position that would be. The image's eventual position was selected at random, but volunteers guessed correctly 53.1 per cent of the time.

That may sound unimpressive – truly random guesses would have been right 50 per cent of the time, after all. But well-established phenomena such as the ability of low-dose aspirin to prevent heart attacks are based on similarly small effects, notes Melissa Burkley of Oklahoma State University in Stillwater, who has also blogged about Bem's work at Psychology Today.
This is the test I'm referring to.
Okay, I hadn't looked at that experiment yet, but I'll look at it now.

The study paper says that in the experiment "Experiment 1: Precognitive Detection of Erotic Stimuli," there were 100 participants. 40 of the participants were each shown 12 erotic images (among other images), and the other 60 participants were each shown 18 erotic images (among others). That makes the total number of erotic images shown altogether (40)(12) + (60)(18) = 1560. The paper goes on to say,

"Across all 100 sessions, participants correctly identified the future position of the erotic
pictures significantly more frequently than the 50% hit rate expected by chance: 53.1%"​
However, after reading that, it's not clear to me whether the 53.1% is the total hit rate across all erotic pictures from all participants, or the average erotic-image hit rate of each participant. I don't think it matters much, but I'm going to interpret it the former way, meaning a hit rate of 53.1% of the total 1560 erotic images shown.

So this is sort of like a 1560-toss coin experiment. 53.1% of 1560 is ~828. So I'm guessing that the average number of "correct" guesses is 828 out of 1560 (making the percentage more like 53.0769%).

We could use the binomial distribution

[tex]
P(n|N) =
\left(
\begin{array}{c}N \\ n \end{array}
\right)
p^n (1-p)^{(N-n)}
= \frac{N!}{n!(N-n)!} p^n (1-p)^{(N-n)}
[/tex]

Where N = 1560, n = 828, and p = 0.5. But that would give us the probability of getting exactly 828 heads out of 1560 coin tosses.

But we're really interested in finding the probability of getting 828 heads or greater, out of 1560 coin tosses. So we have to take that into consideration too, and our equation becomes,

[tex]
P =
\sum_{k = n}^N
\left(
\begin{array}{c}N \\ k \end{array}
\right)
p^k (1-p)^{(N-k)}
= \sum_{k = n}^N \frac{N!}{k!(N-k)!} p^k (1-p)^{(N-k)}
[/tex]

Rather than break my calculator and sanity, I just plopped the following into WolframAlpha:
"sum(k=828 to 1560, binomial(1560,k)*0.5^k*(1-0.5)^(1560-k))"​
Thank goodness for WolframAlpha (http://www.wolframalpha.com).

The result: the probability is 0.00806697 (roughly 0.8%).

That means the probability of getting 53.1% heads or better in a 1560-toss coin experiment, merely by chance with a fair coin, is about 1 in 124. Similarly, the chance of the participants randomly choosing the "correct" side of the screen 53.1% of the time or better, on average, in the first experiment (with all 100 subjects choosing a side 12 or 18 times each), merely by chance, is 1 out of 124. I'd call that statistically significant.
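
If you'd rather not lean on WolframAlpha, the same tail sum can be computed with a few lines of C# using log-factorials to avoid overflow (a quick sanity-check sketch, separate from the simulation program posted earlier):
Code:
using System;

class BinomialTail
{
    static void Main()
    {
        int N = 1560;   // total erotic-image trials
        int n = 828;    // "hits" corresponding to ~53.1%

        // Precompute ln(k!) as cumulative sums of logs.
        double[] lnFact = new double[N + 1];
        for (int k = 1; k <= N; k++)
            lnFact[k] = lnFact[k - 1] + Math.Log(k);

        // P(X >= n) for X ~ Binomial(N, 0.5), summed term by term in log space.
        double p = 0.0;
        for (int k = n; k <= N; k++)
        {
            double lnTerm = lnFact[N] - lnFact[k] - lnFact[N - k] + N * Math.Log(0.5);
            p += Math.Exp(lnTerm);
        }

        Console.WriteLine("P(at least {0} of {1} by chance) = {2}  (about 1 in {3:F0})",
            n, N, p, 1.0 / p);
    }
}
It gives the same probability of about 0.008, i.e. roughly 1 in 124.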

As per another thread, probability isn't my strong suit. A very interesting post from you there and I thank you. Cleared up some other questions I had as well.
I'm not very good at probability and statistics either. I used to know this stuff a long time ago, but I promptly forgot most of it. I had to re-teach myself much of it for this thread! :biggrin:
 
Last edited by a moderator:
  • #43
collinsmark said:
That means the probability of getting 53.1% heads or better in a 1560-toss coin experiment, merely by chance with a fair coin, is about 1 in 124.

I could be wrong, but aren't we assuming something by using only the number of erotic images as tests? It implies that there was always an erotic image to be found, and that's not the impression I get from the test.

In fact, and I could be wrong, I understood it to mean that the options were always "left" or "right", but that not every left/right set contained a possible correct answer.

I think I'll have to read it again.
 
  • #44
A story on Daryl Bem's paper in The New York Times:

One of psychology’s most respected journals has agreed to publish a paper presenting what its author describes as strong evidence for extrasensory perception, the ability to sense future events.

The decision may delight believers in so-called paranormal events, but it is already mortifying scientists. Advance copies of the paper, to be published this year in The Journal of Personality and Social Psychology, have circulated widely among psychological researchers in recent weeks and have generated a mixture of amusement and scorn.

Some scientists say the report deserves to be published, in the name of open inquiry; others insist that its acceptance only accentuates fundamental flaws in the evaluation and peer review of research in the social sciences.

“It’s craziness, pure craziness. I can’t believe a major journal is allowing this work in,” Ray Hyman, an emeritus professor of psychology at the University of Oregon and longtime critic of ESP research, said. “I think it’s just an embarrassment for the entire field.”
http://www.nytimes.com/2011/01/06/science/06esp.html?_r=4&hp=&pagewanted=all

Another quote:
In this case, the null hypothesis would be that ESP does not exist. Refusing to give that hypothesis weight makes no sense, these experts say; if ESP exists, why aren’t people getting rich by reliably predicting the movement of the stock market or the outcome of football games?
I wonder why people suddenly get such sloppy logic when the subject concerns ESP.
 
  • #45
pftest said:
A story on Daryl Bem's paper in The New York Times:


http://www.nytimes.com/2011/01/06/science/06esp.html?_r=4&hp=&pagewanted=all

Another quote:
I wonder why people suddenly get such sloppy logic when the subject concerns ESP.

Yes, it's always good to move away from the paper itself, and instead read a reporter's personal take on it... why?

Forget the article and focus on the actual paper, which is a different matter. Beyond that, you need to learn what the scientific method is so you can understand when you posit that null hypothesis, and why. Nobody here should have to argue with you, just to realize that you need further education on the subject.

For instance, would it be logical to assume the existence (i.e. truth of hypothesis) of something, then go about to prove your assumption? That's called... NOT SCIENCE... in fact it's enough to end your career regardless of the research subject. To pass off the results of a test designed to exploit a known neurological process is just... stupid. There's something to be examined here, but IF it's repeatable, then it doesn't sound ESPy to me at all. This is ESP in the way that forgetting where your keys are, then suddenly having an idea in your mind that they're under the couch, is ESP! You must be psychic, and all because of your mindset while waiting for your search pattern to improve based on dim memory.
 
  • #46
nismaratwork said:
Yes, it's always good to move away from the paper itself, and instead read a reporter's personal take on it... why?
Perhaps you didn't read the article, but even the quote that I used states that this was the opinion of "experts". So it isn't the reporter's "personal take". I am surprised that those experts use such sloppy logic. Perhaps the reporter didn't summarise the experts' views well.
 
  • #47
pftest said:
Perhaps you didn't read the article, but even the quote that I used states that this was the opinion of "experts". So it isn't the reporter's "personal take". I am surprised that those experts use such sloppy logic. Perhaps the reporter didn't summarise the experts' views well.

Oh, in that case I'll have Flex do the same referring to ME as an "expert", and I'll call him a journalist. I can see that you really press the standards here when it comes to credulity.
 
  • #48
Here is a PDF of a response paper:

http://dl.dropbox.com/u/1018886/Bem6.pdf

It looks like there are some serious flaws with the ESP paper. The one I have the biggest problem with is coming up with a hypothesis from a set of data, and then using that same set of data to test the hypothesis. It's a version of the Texas Sharpshooter Fallacy.

Here's what the paper I linked has to say, in part, on this matter:

The Bem experiments were at least partly exploratory. For instance, Bem’s Experiment tested not just erotic pictures, but also neutral pictures, negative pictures, positive pictures, and pictures that were romantic but non-erotic. Only the erotic pictures showed any evidence for precognition. But now suppose that the data would have turned out differently and instead of the erotic pictures, the positive pictures would have been the only ones to result in performance higher than chance. Or suppose the negative pictures would have resulted in performance lower than chance. It is possible that a new and different story would then have been constructed around these other results (Bem, 2003; Kerr, 1998). This means that Bem’s Experiment 1 was to some extent a fishing expedition, an expedition that should have been explicitly reported and should have resulted in a correction of the reported p-value.
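
Just to put a rough number on how much a fishing expedition inflates the odds (my own back-of-the-envelope illustration, assuming the five picture categories mentioned above were tested independently at the usual α = 0.05): the chance of at least one category coming out "significant" purely by luck is 1 − 0.95^5 ≈ 23%, not 5%, which is why the response paper says the reported p-value should have been corrected.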

I'm currently reading a book by Dr. Ben Goldacre called "Bad Science" where he goes over this exact sort of thing.
 
  • #49
Jack21222 said:
Here is a PDF of a response paper:

http://dl.dropbox.com/u/1018886/Bem6.pdf

It looks like there are some serious flaws with the ESP paper. The one I have the biggest problem with is coming up with a hypothesis from a set of data, and then using that same set of data to test the hypothesis. It's a version of the Texas Sharpshooter Fallacy.

Here's what the paper I linked has to say, in part, on this matter:



I'm currently reading a book by Dr. Ben Goldacre called "Bad Science" where he goes over this exact sort of thing.

I'd call it, "Good Fraud"... better 'atmospherics'. :wink:
 
  • #50
Perhaps this falls into the category of "journalism" that seems so despised in this discussion, but Jonah Lehrer wrote a nice article for The New Yorker that touches on issues relevant to the debate (similar to the points already brought up in the thread: that subtle biases in study design, analysis and interpretation can introduce significant biases and lead to erroneous results). In particular, he talks about some work done by Jonathan Schooler:
In 2004, Schooler embarked on an ironic imitation of Rhine’s research: he tried to replicate this failure to replicate. In homage to Rhine’s interests, he decided to test for a parapsychological phenomenon known as precognition. The experiment itself was straightforward: he flashed a set of images to a subject and asked him or her to identify each one. Most of the time, the response was negative—the images were displayed too quickly to register. Then Schooler randomly selected half of the images to be shown again. What he wanted to know was whether the images that got a second showing were more likely to have been identified the first time around. Could subsequent exposure have somehow influenced the initial results? Could the effect become the cause?

The craziness of the hypothesis was the point: Schooler knows that precognition lacks a scientific explanation. But he wasn’t testing extrasensory powers; he was testing the decline effect. “At first, the data looked amazing, just as we’d expected,” Schooler says. “I couldn’t believe the amount of precognition we were finding. But then, as we kept on running subjects, the effect size”—a standard statistical measure—“kept on getting smaller and smaller.” The scientists eventually tested more than two thousand undergraduates. “In the end, our results looked just like Rhine’s,” Schooler said. “We found this strong paranormal effect, but it disappeared on us.”

The most likely explanation for the decline is an obvious one: regression to the mean. As the experiment is repeated, that is, an early statistical fluke gets canceled out. The extrasensory powers of Schooler’s subjects didn’t decline—they were simply an illusion that vanished over time. And yet Schooler has noticed that many of the data sets that end up declining seem statistically solid—that is, they contain enough data that any regression to the mean shouldn’t be dramatic. “These are the results that pass all the tests,” he says. “The odds of them being random are typically quite remote, like one in a million. This means that the decline effect should almost never happen. But it happens all the time! Hell, it’s happened to me multiple times.”
http://www.newyorker.com/reporting/2010/12/13/101213fa_fact_lehrer

In essence, Schooler replicated the results of the Bem paper but, after performing many more tests, showed that the results were nothing but a statistical anomaly. I'm not aware whether Schooler published these results.

This, especially in light of other such examples detailed in Lehrer's piece, is why I'm hesitant to trust findings based primarily on statistical data without a plausible, empirically-tested mechanism explaining the results.
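
As a toy illustration of that regression to the mean (my own quick sketch, not Schooler's procedure), one can simulate a long run of fair coin flips and watch an early "effect" shrink as the sample grows:
Code:
using System;

class DeclineEffect
{
    static void Main()
    {
        var rng = new Random(12345); // arbitrary fixed seed so the run is repeatable
        int heads = 0;

        // Print the running hit rate at increasing sample sizes.
        int[] checkpoints = { 20, 100, 500, 2000, 10000, 100000 };
        int next = 0;

        for (int flip = 1; flip <= 100000; flip++)
        {
            if (rng.Next(2) == 0) heads++;

            if (next < checkpoints.Length && flip == checkpoints[next])
            {
                Console.WriteLine("After {0,6} flips: {1:F1}% heads",
                    flip, 100.0 * heads / flip);
                next++;
            }
        }
        // Early checkpoints can easily sit several points away from 50%;
        // by 100000 flips the running rate is pinned very close to 50%.
    }
}
Any apparent early deviation from 50% tends to melt away as more flips accumulate, which is regression to the mean in miniature.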
 
  • #51
Ygggdrasil said:
Perhaps this falls into the category of "journalism" that seems so despised in this discussion, but Jonah Lehrer wrote a nice article for The New Yorker that touches on issues relevant to the debate (similar to the points already brought up in the thread: that subtle biases in study design, analysis and interpretation can introduce significant biases and lead to erroneous results). In particular, he talks about some work done by Jonathan Schooler:
http://www.newyorker.com/reporting/2010/12/13/101213fa_fact_lehrer

In essence, Schooler replicated the results of the Bem paper but, after performing many more tests, showed that the results were nothing but a statistical anomaly. I'm not aware whether Schooler published these results.

This, especially in light of other such examples detailed in Lehrer's piece, is why I'm hesitant to trust findings based primarily on statistical data without a plausible, empirically-tested mechanism explaining the results.

Nah, when you post journalism, it's OK... you're the world-tree after all :wink:. Plus, your article actually offers information rather than obscuring it when the original paper is available. Thank you.
 
  • #52
nismaratwork said:
Oh, in that case I'll have Flex do the same referring to ME as an "expert", and I'll call him a journalist. I can see that you really press the standards here when it comes to credulity.
The article I posted is about Bem's paper, as well as some of the replication efforts. It also has a "debate" section, or rather a criticism section, in which 9 different scientists give their opinion on it. The NYT does not invent its experts, sources or the many scientists it mentions, if that's what you are suggesting. Google them if you don't believe they exist. I was the one who posted Bem's original paper, btw.

Perhaps you didn't read it because it now requires a login (it didn't when I posted it yesterday), but registration is free.
 
  • #53
pftest said:
The article I posted is about Bem's paper, as well as some of the replication efforts. It also has a "debate" section, or rather a criticism section, in which 9 different scientists give their opinion on it. The NYT does not invent its experts, sources or the many scientists it mentions, if that's what you are suggesting. Google them if you don't believe they exist. I was the one who posted Bem's original paper, btw.

Perhaps you didn't read it because it now requires a login (it didn't when I posted it yesterday), but registration is free.

Oh lord... listen pftest... the NYTimes isn't a peer reviewed journal, so what you're talking about is the fallacy of an appeal to authority. I am also NOT suggesting anything about the NYTimes... I really know very little about them and don't use them for my news; I prefer more direct sources. I did read THIS, but the OPINIONS of 9 people are just that... and not scientific support. AGAIN, I don't believe you're familiar with standards like this, so you're running into trouble... again.
 
  • #54
Ygggdrasil said:
Perhaps this falls into the category of "journalism" that seems so despised in this discussion, but Jonah Lehrer wrote a nice article for The New Yorker that touches on issues relevant to the debate (similar to the points already brought up in the thread: that subtle biases in study design, analysis and interpretation can introduce significant biases and lead to erroneous results). In particular, he talks about some work done by Jonathan Schooler:
http://www.newyorker.com/reporting/2010/12/13/101213fa_fact_lehrer

In essence, Schooler replicated the results of the Bem paper but, after performing many more tests, showed that the results were nothing but a statistical anomaly. I'm not aware whether Schooler published these results.

This, especially in light of other such examples detailed in Lehrer's piece, is why I'm hesitant to trust findings based primarily on statistical data without a plausible, empirically-tested mechanism explaining the results.

Very interesting, thanks! Although kind of stating the opposite of Bem, I would say that Schooler's findings are almost as mind-boggling as those of Bem... Perhaps worth a topic fork?

PS: as a personal anecdote, as a kid I once came across a "one-armed bandit" gambling machine with a group of guys around it. They had thrown a lot of false coins(!) into the machine and one of them was about to throw in the last coin when he noticed me. After I confirmed to him that I had never gambled before, he asked me to throw it in, and I got the jackpot for them - most of it consisting of their own false coins. I left the scene with mixed feelings, as they had robbed me of my chance at beginner's luck...
 
Last edited:
  • #55
It should be noted that so far, all objections are only opinions and anecdotes. The rebuttal paper can only be considered anecdotal evidence - it cannot be used as evidence that the original paper was flawed - unless/until it is published in a mainstream journal. It is fine to discuss the objections, but they cannot be declared valid at this time.

Likewise, one published paper proves nothing. We have experimental evidence for the claim that is subject to peer review and verification.
 
Last edited:
  • #56
Personally, I still stand by my original thoughts, which were that 3% isn't that significant.

OK, it's above average (53% correct in an area with 50/50 odds). But given the way the test was performed it didn't prove anything as far as I'm concerned.

If you really want to do something like this, take 1000 people, sit them down and toss a coin for them (via some coin toss machine) and get them to predict the outcome.

No need for anything excessive given the subject.

After that trial, if you have 53% it means that 30,000 of the guesses were correct when they shouldn't have been. Now that is significant.

Regardless, the biggest problem I see with tests like this is that I could sit there calling heads each time, and the odds say I'll break even, so any additional correct calls would count towards precognition. If this happens with a number of subjects, you could end up with a skewed result.
Although you would expect equal numbers of each, it is quite possible to get a larger number of heads than tails during the test, and so the above system would skew things.

Perhaps you could do the test as outlined above and use the continuous heads/tails method as a set of benchmarks.
 
  • #57
Ivan Seeking said:
It should be noted that so far, all objections are only opinions and anecdotes. The rebuttal paper can only be considered anecdotal evidence - it cannot be used as evidence that the original paper was flawed - unless/until it is published in a mainstream journal. It is fine to discuss the objections, but they cannot be declared valid at this time.

Likewise, one published paper proves nothing. We have experimental evidence for the claim that is subject to peer review and verification.

I had overlooked that there is a rebuttal paper - thanks, I'll read it now! But such a rebuttal as that by Wagenmakers et al cannot be considered "anecdotal"; that's something very different. And the publication or not of a paper in a "mainstream journal" cannot be taken as evidence of a paper's correctness, just as an email that passed your spam filter isn't necessarily true, nor are all emails that have not yet been sent, or that fall into your spambox, spam. What matters in physics are presented facts and their verification. Discussions on this forum may be limited to peer-reviewed material for exactly the same anti-spam purpose, but a forum discussion should not be confused with the scientific method.

Harald

Edit: I now see that the essence of Wagenmakers' paper has been accepted for publication: it's "a revised version of a previous draft that was accepted pending revision for Journal of Personality and Social Psychology."
 
Last edited:
  • #58
Jack21222 said:
Here is a PDF of a response paper:

http://dl.dropbox.com/u/1018886/Bem6.pdf

[..]

Thanks a lot for that preview! I'll read it with interest, as it may be useful in general. :smile:
 
Last edited:
  • #59
These are supposedly 3 failed replications of Bem's test results (don't know if they are the same ones as mentioned in the NYT article):
https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=13quorf_DWEXBBvlDPngbUNFKm5-BjgXgehJJ7ndnxc_wx2BsXn84iPhLeVfX&hl=en / http://circee.org/Retro-priming-et-re-test.html / 3

There must be more replication efforts out there.

nismaratwork said:
Oh lord... listen pftest... the NYtimes isn't a peer reviewed journal, so what you're talking about is the fallacy of an appeal to authority. I am also NOT suggesting anything about the NYTimes... I really know very little about them and don't use it for my news; I prefer more direct sources. I did read THIS, but the OPINIONS of 9 people are just that... and not scientific support. AGAIN, I don't believe you're familiar with standards like this, so you're running into trouble... again.
Calm down chap, I just posted an article with an abundance of relevant information. I didn't claim the NYT is a peer reviewed scientific journal...
 
Last edited by a moderator:
  • #60
pftest said:
These are supposedly 3 failed replications of Bem's test results (don't know if they are the same ones as mentioned in the NYT article):
https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=13quorf_DWEXBBvlDPngbUNFKm5-BjgXgehJJ7ndnxc_wx2BsXn84iPhLeVfX&hl=en / http://circee.org/Retro-priming-et-re-test.html / 3

There must be more replication efforts out there.


Calm down chap, I just posted an article with an abundance of relevant information. I didn't claim the NYT is a peer reviewed scientific journal...

Sorry, I've been jumping between threads and work too much. I don't agree with what you clearly believe, but nonetheless I was rude. I apologize.
 
Last edited by a moderator:
  • #61
pftest said:
These are supposedly 3 failed replications of Bem's test results (don't know if they are the same ones as mentioned in the NYT article):
https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=13quorf_DWEXBBvlDPngbUNFKm5-BjgXgehJJ7ndnxc_wx2BsXn84iPhLeVfX&hl=en / http://circee.org/Retro-priming-et-re-test.html / 3

There must be more replication efforts out there. [..].

Well, in view of Wagenmakers et al.'s response paper and their reinterpretation, those are actually successful replications! :-p
 
Last edited by a moderator:
  • #62
Ivan Seeking said:
It should be noted that so far, all objections are only opinions and anecdotes. The rebuttal paper can only be considered anecdotal evidence - it cannot be used as evidence that the original paper was flawed - unless/until it is published in a mainstream journal. It is fine to discuss the objections, but they cannot be declared valid at this time.

Likewise, one published paper proves nothing. We have experimental evidence for the claim that is subject to peer review and verification.

I don't think you know what an "anecdote" means. Pointing out methodological flaws isn't an anecdote. You may argue that it isn't scientifically accepted evidence yet, but it's very convincing if you ask me, especially the part where they formed and tested the hypothesis with the same set of data.

That is a horrible abuse of data points.
 
  • #63
Jack21222 said:
I don't think you know what an "anecdote" means. Pointing out methodological flaws isn't an anecdote. You may argue that it isn't scientifically accepted evidence yet, but it's very convincing if you ask me, especially the part where they formed and tested the hypothesis with the same set of data.

That is a horrible abuse of data points.

I agree with the spirit of what you're saying... do the rules allow for something published so openly, but not peer reviewed to be considered more than anecdotal? It may be an issue of the rules of the site vs. the standard terminology... I hope.
 
  • #64
nismaratwork said:
I agree with the spirit of what you're saying... do the rules allow for something published so openly, but not peer reviewed to be considered more than anecdotal? It may be an issue of the rules of the site vs. the standard terminology... I hope.

An anecdote is a story. What I linked is not a story. It's a criticism based on methodology.
 
  • #65
nismaratwork said:
For instance, would it be logical to assume the existence (i.e. truth of hypothesis) of something, then go about to prove your assumption? That's called... NOT SCIENCE...

I agree that it is not science.

Yet, it is exactly what disbelievers in ESP/the paranormal do. They assume that it does not exist, then go about proving that assumption, finding errors in the procedures, statistical analysis, etc., of the ESP experiments.

So, it seems they are being as unscientific as the ones they criticise.
 
  • #66
Jack21222 said:
An anecdote is a story. What I linked is not a story. It's a criticism based on methodology.

Note that it is just as much peer reviewed as the paper that it criticizes.

The main issue, I think, is that the original paper seems to have been a fishing expedition without properly accounting for that fact. Anyway, I'm now becoming familiar with Bayesian statistics thanks to this. :smile:

Harald
 
  • #67
coelho said:
I agree that it is not science.

Yet, it is exactly what disbelievers in ESP/the paranormal do. They assume that it does not exist, then go about proving that assumption, finding errors in the procedures, statistical analysis, etc., of the ESP experiments.

So, it seems they are being as unscientific as the ones they criticise.

Finding errors in other people's work is the ENTIRE BASIS OF SCIENCE. That's how we have so much confidence in what survives the scientific process, because it HAS been thoroughly attacked from every angle, and it came out the other end alive.

To use your example, if ESP was real, even after the disbelievers go about to disprove it, attempting to find errors in the procedure, statistical analysis, etc, the evidence would still hold up. If it doesn't hold up, that means it isn't accepted by science yet, come back when you have evidence that can survive the scientific process.

To say that those things you mentioned are "unscientific" is just about the most absurd thing you can possibly say. It's like saying giving live birth and having warm blood is "un-mammalian."
 
  • #68
coelho said:
Yet, it is exactly what disbelievers in ESP/the paranormal do. They assume that it does not exist, then go about proving that assumption, finding errors in the procedures, statistical analysis, etc., of the ESP experiments.

So, it seems they are being as unscientific as the ones they criticise.

Firstly, if you claim ESP exists then it is up to you to prove it.

You give evidence of its existence, people then 'tear it apart'. That's science.

Every flaw, every error, every single thing you can find wrong with the evidence / procedure, whatever is there, is a mark against it. But, if after all of that the evidence still holds, then ESP would still be accepted.

The default assumption is that science has nothing to say on a subject without evidence. Until verifiable evidence comes to light, there is no reason to entertain the notion of it existing. Simple.

The fact is, the evidence for ESP / the paranormal doesn't hold up to even the simplest examination. And let's not get started on the test methods.

There is nothing unscientific about finding flaws in data and test methods (heck, you're encouraged to). There is nothing unscientific in requiring valid evidence for claims.
 
  • #69
Coelho: Jack and Jared have replied to your fundamental lack of understanding of science, better than I could.
 
  • #70
Jack21222 said:
I don't think you know what an "anecdote" means. Pointing out methodological flaws isn't an anecdote. You may argue that it isn't scientifically accepted evidence yet, but it's very convincing if you ask me, especially the part where they formed and tested the hypothesis with the same set of data.

Until we see a published rebuttal, all arguments are anecdotal or unsupported. Unpublished papers count at most as anecdotal evidence, which never trumps a published paper.

We don't use one standard for claims we like, and another for claims we don't like. See the S&D forum guidelines.
 
Last edited:
