Was Bem's "Feeling the Future" paper exploratory?

Chris

Daryl Bem's paper, "Feeling the Future: Experimental Evidence for Anomalous Retroactive Influences on Cognition and Affect", has been criticised on a number of grounds.
http://caps.ucsf.edu/wordpress/wp-content/uploads/2011/02/bem2011.pdf

Among other things, it's been suggested that instead of fixing beforehand the hypotheses to be tested and the experimental procedure to be used to test them, he modified both in reaction to his findings as he went along, so that the statistical significance of the results can't be taken at face value. In other words, the work was, at least in part, exploratory.

One prominent critic who advanced this view was James Alcock. His criticism, entitled "Back from the Future: Parapsychology and the Bem Affair", can be found here:
http://www.csicop.org/specialarticles/show/back_from_the_future
Bem responded to it here:
http://www.csicop.org/specialarticles/show/response_to_alcocks_back_from_the_future_comments_on_bem
And Alcock responded to Bem's response here:
http://www.csicop.org/specialarticles/show/response_to_bems_comments

What do people think? Did Bem fix his hypotheses in advance of doing the experiments, or did he modify them as he went along? Did he change any of the experimental designs part of the way through? If so, do the changes cast doubt on the validity of the results?
 
It may not matter. Jay mentioned this paper earlier, in which Gelman and Loken point out that an analysis being contingent upon the data is by itself sufficient to cast doubt on the validity of the results. By looking at whether the design, implementation and analysis are contingent upon the data, you no longer need to put a researcher on the spot (who wants to admit their results may be invalid?) to figure out whether their work was exploratory.

http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf

Linda
 
It may not matter. Jay mentioned this paper earlier, in which Gelman and Loken point out that an analysis being contingent upon the data is by itself sufficient to cast doubt on the validity of the results.

Yes, I looked at that when he posted it before. It didn't seem to me that their criticisms of Bem's paper made much sense, though. Do you think they're justified?
 
Daryl Bem's paper, "Feeling the Future: Experimental Evidence for Anomalous Retroactive Influences on Cognition and Affect", has been criticised on a number of grounds.
http://caps.ucsf.edu/wordpress/wp-content/uploads/2011/02/bem2011.pdf

Among other things, it's been suggested that instead of fixing beforehand the hypotheses to be tested and the experimental procedure to be used to test them, he modified both in reaction to his findings as he went along, so that the statistical significance of the results can't be taken at face value. In other words, the work was, at least in part, exploratory.

One prominent critic who advanced this view was James Alcock. His criticism, entitled "Back from the Future: Parapsychology and the Bem Affair", can be found here:
http://www.csicop.org/specialarticles/show/back_from_the_future
Bem responded to it here:
http://www.csicop.org/specialarticles/show/response_to_alcocks_back_from_the_future_comments_on_bem
And Alcock responded to Bem's response here:
http://www.csicop.org/specialarticles/show/response_to_bems_comments

What do people think? Did Bem fix his hypotheses in advance of doing the experiments, or did he modify them as he went along? Did he change any of the experimental designs part of the way through? If so, do the changes cast doubt on the validity of the results?
Interested in your take, Chris.... You seem to be as well qualified as anyone to comment. (I'm in danger of making a snarky comment like '"exploratory" appears too polite a word' ;))
 
Interested in your take, Chris.... You seem to be as well qualified as anyone to comment. (I'm in danger of making a snarky comment like '"exploratory" appears too polite a word' ;))

Well, essentially, it seems to me that for each experiment there was a predefined hypothesis (assuming Bem is being truthful, of course), and whatever "exploration" or modifications to the protocol there may have been during the course of the experiments, they weren't of a kind that would produce false positive results, on the assumption that the null hypothesis was true. The only exception is that I'm not clear whether the number of sessions was fixed in advance for every experiment.

But I would be interested in hearing other views.
 
Yes, I looked at that when he posted it before. It didn't seem to me that their criticisms of Bem's paper made much sense, though. Do you think they're justified?
The criticisms are justified, in that Gelman and Loken identified multiple hypotheses which would have been reported, contingent upon the data (they gave a number of examples). Under those circumstances, the inflated false-positive rate doesn't depend upon whether Bem felt that he was only testing one hypothesis. This matters because the contrary view gives researchers the false impression that if they are sincere in their efforts, the concern over researcher degrees of freedom and the like doesn't apply to them.

Linda
 
The criticisms are justified, in that Gelman and Loken identified multiple hypotheses which would have been reported, contingent upon the data (they gave a number of examples). Under those circumstances, the inflated false-positive rate doesn't depend upon whether Bem felt that he was only testing one hypothesis. This matters because the contrary view gives researchers the false impression that if they are sincere in their efforts, the concern over researcher degrees of freedom and the like doesn't apply to them.

This is one of the criticisms that doesn't make sense to me.

If someone has a hypothesis, designs an experiment to test it, and tests it, how is that test invalidated by the fact that the experiment brings to light other interesting data that may not have been anticipated? Obviously any investigation of those other data is exploratory, to the extent that it wasn't planned in advance of the experiment. But that doesn't affect the test of the original hypothesis.
 
The Wagenmakers et al paper claims to show that some of the experiments were exploratory, I believe.

Yes (a copy of that paper can be found here - http://www.ruudwetzels.com/articles/Wagenmakersetal_subm.pdf).

To some extent this raises the same point. Some of Bem's commentary, quoted in that paper, clearly is exploratory. But Bem says that each of the nine experiments was designed to test a predetermined hypothesis. If that's true, then the tests of those hypotheses weren't exploratory.

The other criticism they make is that in one experiment, two alternative transformations were applied to the response times, but no results were given for the untransformed (raw) data. The implication is that the untransformed data may not have been significant. But what Bem says is that it's standard in such experiments to apply a transform, and that what he did was to apply two alternatives to demonstrate that the results were relatively insensitive to the choice of the transform. Again, it seems to me that that criticism holds water only if Bem is lying about his reason for applying the transforms.
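Bem's stated rationale can be illustrated with a toy robustness check (invented numbers, not Bem's data): apply two standard transforms to the same skewed response times and see whether the test statistic points the same way under both. Note that the inverse transform reverses the sign, since quicker responses give larger reciprocals.

```python
# Toy robustness check with invented response times (not Bem's data):
# apply two standard transforms and see whether the test statistic
# points the same way under both.
import math

def welch_t(xs, ys):
    # Welch's t statistic for two independent samples.
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    vx = sum((x - mx) ** 2 for x in xs) / (nx - 1)
    vy = sum((y - my) ** 2 for y in ys) / (ny - 1)
    return (mx - my) / math.sqrt(vx / nx + vy / ny)

# Hypothetical response times in ms; the "fast" condition tends to be quicker.
fast = [280, 310, 330, 350, 390, 420, 460, 510]
slow = [300, 340, 370, 410, 450, 500, 560, 640]

t_log = welch_t([math.log(x) for x in fast], [math.log(x) for x in slow])
t_inv = welch_t([1 / x for x in fast], [1 / x for x in slow])

# Both transforms agree that the "fast" condition is faster; the inverse
# transform flips the sign because quicker responses give larger reciprocals.
print(t_log, t_inv)
```

If the conclusion survived one transform but not the other, that would be a warning sign; reporting both is a way of showing the result doesn't hinge on the choice.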
 
The only exception is that I'm not clear whether the number of sessions was fixed in advance for every experiment.

Maybe I'm missing something, but at the moment I can't see a statement that the number of sessions (= number of participants) was fixed in advance. Bem says in a footnote to his description of Experiment 1, "I set 100 as the minimum number of participants/sessions for each of the experiments reported in this article because [of the typical effect sizes reported in the literature]". But as reported, one experiment had only 50 sessions, two had 150 and one had 200.

In Experiment 2, the protocol was different for the final 50 participants. Alcock commented, "Again, given the inherent unreasonableness of changing the procedure in an ongoing experiment, one cannot help but wonder if two separate experiments were run and then combined after neither produced significant results on its own." Bem's original paper says "The results from the last 50 sessions did not differ significantly from those obtained on the first 100 sessions, so all 150 sessions were combined for analysis." Which isn't necessarily inconsistent with Alcock's suggestion - or with the possibility that an additional 50 sessions were added after the 100 originally planned.

Another concern might be that the experiment with only 50 participants (Experiment 9) showed an extremely strong effect, whereas the one with 200 participants (Experiment 7) was the only one in which the results weren't statistically significant. Obviously the results couldn't be taken at face value if there were any element of continuing the experiments until significant results were obtained.
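To see why that last possibility would matter, here is a quick simulation (my own sketch, not a reconstruction of Bem's procedure): under a true null, peeking at the data after every batch of 50 sessions and stopping at the first p < .05 inflates the false-positive rate well above the nominal 5%.

```python
# Sketch of optional stopping under a true null (hypothetical procedure,
# not a reconstruction of Bem's): peek after every batch of 50 sessions,
# up to 200, and report success at the first p < .05.
import math
import random

def p_two_sided(hits, n):
    # Normal approximation to a two-sided binomial test against p = 0.5.
    z = abs(hits - n / 2) / math.sqrt(n / 4)
    return math.erfc(z / math.sqrt(2))

def false_positive_rates(n_sims=10_000, batch=50, max_n=200, seed=2):
    rng = random.Random(seed)
    fixed_fp = peeking_fp = 0
    for _ in range(n_sims):
        hits = n = 0
        peeked_significant = False
        while n < max_n:
            hits += sum(rng.random() < 0.5 for _ in range(batch))
            n += batch
            if not peeked_significant and p_two_sided(hits, n) < 0.05:
                peeking_fp += 1  # would have stopped here and reported
                peeked_significant = True
        if p_two_sided(hits, n) < 0.05:
            fixed_fp += 1  # single test at the preplanned n = 200
    return fixed_fp / n_sims, peeking_fp / n_sims

fixed, peeking = false_positive_rates()
print(fixed, peeking)  # peeking rate comes out well above the fixed-n rate
```

The fixed-n test stays near its nominal level; the stop-when-significant rule roughly doubles the false-positive rate even with only four looks at the data.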
 
This is one of the criticisms that doesn't make sense to me.

If someone has a hypothesis, designs an experiment to test it, and tests it, how is that test invalidated by the fact that the experiment brings to light other interesting data that may not have been anticipated? Obviously any investigation of those other data is exploratory, to the extent that it wasn't planned in advance of the experiment. But that doesn't affect the test of the original hypothesis.
The concern would be that it affects "what is the original hypothesis?"

If the original hypothesis is somewhat open (which it must be when the experiment supports multiple hypotheses), then it will feel like your original hypothesis was "psi will be demonstrated on all trials" when the data supports that hypothesis, and it will feel like your original hypothesis was "psi will be demonstrated on erotic trials and not on neutral trials" when your data supports that hypothesis. If your original hypothesis is concrete, then there will be no option to test other hypotheses in your experiment and these issues will be moot. That is, there will be no option to measure psi on neutral trials, if your hypothesis was that erotic trials will show the effect.

So the authors are drawing attention to the fact that, regardless of whether the experimenter feels as though they only have one hypothesis in mind, the experiment could be used to test other hypotheses. The distinction shouldn't be "does the author say they only intended to test one hypothesis?" The distinction should be "does the experiment only test one hypothesis?" And again, the advantage to recognizing this is that it makes the intentions of the researcher moot and obviates concerns about questioning someone's integrity (especially in a research field which is already under suspicion).
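The point can be made concrete with a small simulation (my own sketch, using hypothetical numbers rather than Bem's design): even with no effect at all, an experiment that admits several plausible hypotheses (all trials, erotic trials only, neutral trials only) yields a "significant" result for at least one of them much more often than the nominal 5%.

```python
# Sketch of the "forking paths" concern with hypothetical numbers (not
# Bem's design): under a true null, count how often at least one of
# several plausible hypotheses reaches p < .05, versus one fixed test.
import math
import random

def p_two_sided(hits, n):
    # Normal approximation to a two-sided binomial test against p = 0.5.
    z = abs(hits - n / 2) / math.sqrt(n / 4)
    return math.erfc(z / math.sqrt(2))

def forking_rates(n_sims=10_000, n_trials=36, seed=1):
    rng = random.Random(seed)
    one_test = any_test = 0
    for _ in range(n_sims):
        # Null world: 50% hit rate on both erotic and neutral trials.
        erotic = sum(rng.random() < 0.5 for _ in range(n_trials))
        neutral = sum(rng.random() < 0.5 for _ in range(n_trials))
        p_erotic = p_two_sided(erotic, n_trials)
        p_neutral = p_two_sided(neutral, n_trials)
        p_all = p_two_sided(erotic + neutral, 2 * n_trials)
        if p_erotic < 0.05:  # one hypothesis fixed in advance
            one_test += 1
        if min(p_erotic, p_neutral, p_all) < 0.05:  # whichever "works"
            any_test += 1
    return one_test / n_sims, any_test / n_sims

one, any_of_three = forking_rates()
print(one, any_of_three)  # the "any of three" rate is noticeably higher
```

The inflation happens without the simulated experimenter ever consciously testing more than one hypothesis per run, which is exactly why sincerity is beside the point.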

Linda
 
The Wagenmakers et al paper claims to show that some of the experiments were exploratory, I believe.

~~ Paul
And then there's the problem that Bem explicitly stated that the data used (in part) in Experiments 5 and 6 was exploratory when he published some of that data 8 years prior to Feeling the Future.

Linda
 
And then there's the problem that Bem explicitly stated that the data used (in part) in Experiments 5 and 6 was exploratory when he published some of that data 8 years prior to Feeling the Future.

It would be interesting to read what he wrote. Do you have a reference, please?
 
A subsequent meta-analysis by Bem on related experiments:

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2423692

And a skeptical response. Note that the comments include a detailed post by a psi researcher named Johann, to which the author responds:

http://osc.centerforopenscience.org/2014/06/25/a-skeptics-review/

Thanks, but the idea of this thread was to focus on a particular criticism of Bem's original paper. I'm sure there is plenty to discuss about the meta-analyses. Maybe they should have their own thread, though.
 
If the original hypothesis is somewhat open (which it must be when the experiment supports multiple hypotheses), then it will feel like your original hypothesis was "psi will be demonstrated on all trials" when the data supports that hypothesis, and it will feel like your original hypothesis was "psi will be demonstrated on erotic trials and not on neutral trials" when your data supports that hypothesis. If your original hypothesis is concrete, then there will be no option to test other hypotheses in your experiment and these issues will be moot. That is, there will be no option to measure psi on neutral trials, if your hypothesis was that erotic trials will show the effect.

Yes, if the original hypothesis were open, it would be a problem. But Bem (in the passage quoted by Gelman and Loken) claimed the original hypothesis was quite specific: "That experiment was designed to test the hypothesis that participants could identify the future left/right position of an erotic image on the computer screen significantly more frequently than chance." If that were true, I don't see that there would be a problem.

On the other hand, that's not quite how he put it in the original paper, where he wrote "the main psi hypothesis was that participants would be able to identify the position of the hidden erotic picture significantly more often than chance ..." But the design of the experiment obviously allowed for additional comparisons.
 
Yes, if the original hypothesis were open, it would be a problem. But Bem (in the passage quoted by Gelman and Loken) claimed the original hypothesis was quite specific: "That experiment was designed to test the hypothesis that participants could identify the future left/right position of an erotic image on the computer screen significantly more frequently than chance." If that were true, I don't see that there would be a problem.

We can ignore what Bem says. The question is, "does the experiment test more than one hypothesis?" Examples were given of multiple hypotheses which were tested by the experiment(s).

Linda
 
We can ignore what Bem says. The question is, "does the experiment test more than one hypothesis?" Examples were given of multiple hypotheses which were tested by the experiment(s).

But the experiment itself can't test hypotheses, can it? That's up to the experimenter. So we very much can't ignore what the experimenter says about what hypothesis the experiment was intended to test.
 