Discussion on Burden of Proof

Neil

I am new to the forums and wanted to make a few observations on a couple of topics I have heard many times on the show and in the book. Please note that these are my opinions, for what they are worth. Since I am new here and not yet familiar with everything, I hope this is not out of line.

First, I would like to address a common claim made by skeptics that I have heard many times on the show. I will use a quote from Matt Dillahunty posted in this forum in response to Alex:

Matt Dillahunty said:
But pointing out that a claim hasn't met it's burden of proof and cannot rationally be considered "true" is NOT the same as claiming that the claim is false.

I take issue with the comment that something like psi has not met its burden of proof. Aside from the use of the term "proof," the problem with this statement is that it is not quantified in any way, which always leaves the skeptic making it free to dismiss evidence subjectively, since no threshold was ever specified for what would constitute "proof." In science this kind of requirement is quantified: a result must reach a p-value below 0.05 or 0.01 (or whatever level is chosen), or, in particle physics, 3 sigma constitutes evidence for an effect and 5 sigma constitutes a discovery.

Why do skeptics almost never quantify what would constitute "proof"? Well, in a way, Ray Hyman did during the Ganzfeld debates with Charles Honorton, as laid out in their 'joint communique' (Hyman and Honorton, 1986. Joint Communique: The Psi Ganzfeld Controversy.), but this was more about methodology than about quantifying effect sizes, and in the same paper it was written:

"We agree that there is an overall significant effect in this data base that cannot reasonably be explained by selective reporting or multiple analysis. We continue to differ over the degree to which the effect constitutes evidence for psi, but we agree that the final verdict awaits the outcome of future experiments conducted by a broader range of investigators and according to more stringent standards."

So when the results of the autoganzfeld experiments came out, which addressed the methodological issues Hyman had identified, they were significant, with a p-value of 0.00005 and a Cohen's h (effect size) of 0.20, which is considered a "small effect" by convention. Jessica Utts points out that "the effect size observed in the ganzfeld data base is triple the much publicized effect of aspirin on heart attacks," a finding that was considered very significant (Utts, 1991. Replication and Meta-Analysis in Parapsychology. Statistical Science.). Did this lead Hyman to concede that this constitutes evidence for psi? Hyman states:

"Honorton's experiments have produced intriguing results. If independent laboratories can produce similar results with the same relationships and with the same attention to rigorous methodology, then parapsychology may indeed have finally captured its elusive quarry." (Comment in Statistical Science, 1991, pg 392).

Even though the 10 autoganzfeld experiments met the criteria set, the effect was essentially the same as that demonstrated in the previous experiments, and there was no file-drawer effect, Hyman still would not say that this constitutes evidence for psi. Why not? The autoganzfeld experiments replicated the past database of Ganzfeld studies, demonstrating that it was not methodological errors that produced the effect. The previous database had an overall significance of 6.60 sigma (remember, 5 sigma is a discovery in physics) and a p-value of 3.37 x 10^-11 (Rosenthal, 1986. Meta-Analytic Procedures and the Nature of Replication: The Ganzfeld Debate. Journal of Parapsychology.). In any other area of science, this would demonstrate an effect, but Hyman would not admit it. Why can't skeptics quantify what constitutes evidence? Without quantifying what constitutes evidence, there is continual moving of the goal posts, so that there is always an out.
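For anyone who wants to see how these numbers relate to one another, here is a short Python sketch. The 34% hit rate against a 25% chance baseline is just an illustrative round figure for a ganzfeld-style result; the published analyses use the actual trial counts, so the outputs here will not exactly reproduce the quoted values:

import numpy as np
from scipy import stats

p_chance, p_hit = 0.25, 0.34   # chance baseline and an illustrative hit rate

# Cohen's h for two proportions (arcsine-transformed difference)
h = 2 * np.arcsin(np.sqrt(p_hit)) - 2 * np.arcsin(np.sqrt(p_chance))
print(f"Cohen's h = {h:.2f}")  # comes out around 0.2, a 'small' effect by convention

# Converting sigma (z-scores) to one-tailed p-values, and back
for z in (3.0, 5.0, 6.6):
    print(f"{z} sigma -> one-tailed p = {stats.norm.sf(z):.2e}")
print(f"p = 0.00005 -> z = {stats.norm.isf(0.00005):.2f} sigma")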

This is related to what Alex refers to with Carl Sagan's claim that "Extraordinary claims require extraordinary evidence." Alex has rightly pointed out the falsity of this statement, and the reason is that nothing here is quantified. What defines an extraordinary claim? And more importantly, what, quantitatively, would constitute extraordinary evidence? This is never specified, and it always leaves an out for skeptics to keep saying "it hasn't met the burden of proof." That's not science.

This is related to the abuse of Bayesian reasoning, where a prior probability is used in the analysis of data. The major problem is that calculating a prior probability is very subjective, and on top of that, not all relevant factors are considered. If someone uses a prior that says they are 99.99999999999999999% sure that psi doesn't exist, then realistically no amount of evidence will convince them that there is an effect.
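To make that concrete, here is a minimal sketch of Bayesian updating, where the posterior odds are the prior odds multiplied by the factor by which the evidence shifts them (the Bayes factor, which comes up later in this thread). The specific numbers are made up purely for illustration:

# An extreme prior swamps even very strong evidence. Illustrative numbers only.
prior_prob_psi = 1e-20          # "I am 99.99999...% sure psi doesn't exist"
bayes_factor = 1e6              # evidence favouring psi by a factor of a million

prior_odds = prior_prob_psi / (1 - prior_prob_psi)
posterior_odds = prior_odds * bayes_factor
posterior_prob = posterior_odds / (1 + posterior_odds)
print(f"Posterior probability of psi: {posterior_prob:.1e}")   # still only about 1e-14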

The supposed justification for this prior probability is essentially that psi does not fit into our neuroscientific understanding of brain function or into our current understanding of physics. This is an incomplete and highly biased way of arriving at a prior probability.

In calculating a prior, it should be considered that our understanding of consciousness is not only incomplete, but we do not even have a proposed mechanism for consciousness (in neuroscience). We are almost clueless when it comes to what consciousness is, and in philosophy of mind there is reason to think that the neuroscience method may never find a mechanism since we are dealing with strong emergence. Neuroscience may say consciousness is explained at some point, but only based on their narrow and incomplete definition of consciousness (since neuroscience, by definition, considers consciousness to be brain processes). Feyerabend points out that it is unreasonable to require a new theory to match the old theory. This is obvious when we are looking at new domains of exploration, which is, in this case, the domain of consciousness. What right do we have to require that a new theory would fit into the current neuroscientific model of consciousness?

Further, we also know that our physics is incomplete. If consciousness is fundamental to the universe, then we are exploring a new domain of the universe, and we have no real justification for saying that some phenomenon of a new domain is impossible or extremely unlikely based on our mathematical models of other domains. How likely was quantum theory within the Newtonian paradigm? By the standards used to say that psi is extremely unlikely or impossible, quantum theory would have had an extraordinarily low prior probability as well.

Then we should also consider pessimistic meta-induction, which is essentially the observation that most past theories have turned out to be false. What is meant by "falsity" depends on the field of research, but in physics, for example, it is not that the mathematical models are false, since they are demonstrably extremely good at modeling the particular domains they describe. The falsity lies in what happens when the theory is extended to new domains, which highlights the problem of scientific induction described by Karl Popper. The falsity of Newtonian theory is not that it is false in calculating the paths of projectiles or rockets, but that it cannot be extrapolated to the very small scales described by quantum theory. It is false in the sense that quantum theory is a more fundamental theory, even though Newtonian theory still works extremely well within its particular domain. The other falsity is in the metaphysical interpretation of the theory. You really cannot separate metaphysics from physics, and the metaphysical assumptions of theories are proven wrong over and over again. For example, General Relativity demonstrates the falsity of Newtonian theory's metaphysical assumption of absolute, independent space and time, and quantum theory further falsifies the Newtonian picture of what matter is. History has demonstrated over and over again that our metaphysical interpretations are seemingly always proven wrong, and that exploration of new domains uncovers more fundamental aspects of the universe, showing previous theories to be approximations useful only for particular domains, not absolute laws of nature.

So the point is that when exploring new domains such as consciousness, we have to take these factors seriously. We should also consider the past data that suggest the presence of psi. Even if one does not consider the past evidence to demonstrate psi, it at least has to be acknowledged that we are attempting to replicate an effect seen in previous experiments. Skeptics claim that there is no evidence, or at least no good evidence, but when statisticians come in to assess these debates (such as Jessica Utts), they conclude that there is some sort of effect that needs explanation. Based on the patterns found and replicated, and the predictions made and confirmed, this effect, such as in the Ganzfeld database, seems highly unlikely to be simply a misunderstanding of randomness or of the methods of analysis. In fact, if we did not understand randomness or how to analyze it, that would pose very serious problems for many areas of science.

So considering our incomplete theories in physics, the fact that neuroscience does not even have a theory of consciousness, the problem of scientific induction (taking a model and making it universal), the pessimistic meta-induction over the falsity of past theories, and the previous data, what is the prior probability of something like psi? These factors really can't be quantified, and the prior probability remains subjective, but the point is that the "extraordinary claims require extraordinary evidence" standard is unscientific, since it is not quantified in any way and ultimately rests on highly subjective prior probabilities that reflect confirmation bias more than anything.

So in the end, I think Alex should press skeptics to quantify what constitutes this "burden of proof" or "extraordinary evidence." At least that would make some sort of progress and hold them accountable so there is not continual moving of the goal posts.
 
Level of proof is mostly about level of evidence and has little to nothing to do with p-values. One way to address this is to use Bayes' factors, which measure the strength of the evidence for or against competing hypotheses. Then there is no need to take subjective prior probabilities into account.
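To make the idea concrete, a toy sketch of a Bayes factor for a ganzfeld-style hit-count experiment might look like the following. The data (34 hits in 100 trials) are hypothetical, H1 uses a flat prior on the hit rate, and this is a simplified binomial analogue, not the t-test Bayes factor discussed later in this thread:

# Toy Bayes factor: H0 says the hit rate is exactly 0.25 (chance);
# H1 says the hit rate is unknown, with a flat Beta(1,1) prior. Hypothetical data.
import numpy as np
from scipy.stats import binom
from scipy.special import betaln, gammaln

n, k = 100, 34                              # made-up numbers

log_m0 = binom.logpmf(k, n, 0.25)           # marginal likelihood under H0
log_choose = gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)
log_m1 = log_choose + betaln(k + 1, n - k + 1) - betaln(1, 1)   # beta-binomial marginal under H1

bf10 = np.exp(log_m1 - log_m0)
print(f"BF10 = {bf10:.2f}")                 # >1 favours H1, <1 favours H0

With these made-up numbers and a completely flat prior, the Bayes factor comes out close to 1, which already hints at the issue discussed below: how much the answer depends on what H1 is taken to mean.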

Realistically, I think following the lead of evidence-based medicine and performing studies under conditions which are at a low risk of bias will get parapsychologists the acceptance they are looking for. For what is meant by "low risk of bias," see this:

http://hiv.cochrane.org/sites/hiv.cochrane.org/files/uploads/Ch08_Bias.pdf

Linda
 

Hi Linda, thank you for this information. I will need to take some time to read it.

Regarding the use of Bayes factors, I think this still leaves room for error with respect to the analysis of parapsychology data. In particular, Wagenmakers et al. (2011, Why Psychologists Must Change the Way They Analyze Their Data: The Case of Psi) used a seemingly arbitrary effect size in their Bayes factor analysis, since it appears they assumed that there was no prior knowledge of the likely effect size that the experiment was designed to detect. This paper was a critique of Bem's paper on precognition, in particular a critique of his use of statistical analysis.

Their arbitrarily high effect size did not match the effect sizes seen in prior, similar experiments, which resulted in an inaccurate conclusion. While the Bayes factor analysis may itself be objective, it does leave room for skeptics to claim that there is no prior evidence for these psi effects and to choose their own effect size, which introduces an essentially subjective element and, with it, bias.
 

Wagenmakers used a standard procedure which makes no assumptions about the effect size. I don't think it would hurt for parapsychologists to use standard procedures, rather than waste their time arguing for their use of techniques which make assumptions in their favour. Other scientists are used to making conservative assumptions, anyway (if you are trying to get a handle on where goal posts are typically set).

Linda
 

According to Utts et al., they did make an assumption about the effect size used in the Bayes factor analysis:

"Because the Bayes factor is independent of the prior odds, many mistakenly believe that it constitutes an objective assessment of the experimental results, uncontaminated by subjective beliefs. But this is not true because the Bayes factor depends on the specification of H1.

Accordingly, our second objection to Wagenmakers et al.’s analysis is that their choice of H1 is unrealistic. Specifically, they assume that we have no prior knowledge of the likely effect sizes that the experiments were designed to detect."

...

In general, we know that effect sizes in psychology typically fall in the range of 0.2 to 0.3. For example, Bornstein’s (1989) meta-analysis of 208 mere exposure experiments—the basis of Bem’s retroactive habituation experiments—yielded an effect size (r) of 0.26. We even have some knowledge about previous psi experiments. The meta-analysis of 56 telepathy studies, cited above, revealed a Cohen’s h effect size of approximately 0.18 (Utts et al., 2010), and a meta-analysis of 38 “presentiment” studies—from which Bem’s experiments 1 and 2 derived—yielded a mean effect size of 0.26 (Mossbridge, Tressoldi, and Utts, 2011).

Surely no reasonable observer would expect effect sizes in laboratory psi experiments to be greater than 0.8—what Cohen (1988) calls a large effect. (Cohen notes that even a medium effect of 0.5 “is large enough to be visible to the naked eye” [p. 26].) Yet the “default prior” that Wagenmakers et al. (2011) use (known as the standard Cauchy distribution) has probability 0.57 that the absolute value of the effect size exceeds 0.8. It even places probability of 0.12 on effect sizes with absolute values exceeding 5.0, and probability of 0.06 on effect sizes with absolute values exceeding 10! If the effect sizes were really that large, there would be no debate about the reality of psi. Thus, the prior distribution they have placed on the possible effect sizes under H1 is wildly unrealistic."

Bem, Utts, and Johnson, 2011. Must Psychologists Change the Way They Analyze Their Data? A Response to Wagenmakers, Wetzels, Borsboom, & Van der Maas.
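As an aside, the specific tail probabilities quoted above are easy to check, since they follow directly from the standard Cauchy distribution named in the quote. A short sketch:

# Tail probabilities of a standard Cauchy prior on effect size.
from scipy.stats import cauchy

for cutoff in (0.8, 5.0, 10.0):
    p = 2 * cauchy.sf(cutoff)               # P(|effect size| > cutoff), by symmetry
    print(f"P(|effect size| > {cutoff}) = {p:.3f}")
# Prints roughly 0.570, 0.126 and 0.064, in line with the figures quoted above.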
 
According to Utts et al., they did make an assumption about the effect size used in the Bayes factor analysis:

"Because the Bayes factor is independent of the prior odds, many mistakenly believe that it constitutes an objective assessment of the experimental results, uncontaminated by subjective beliefs. But this is not true because the Bayes factor depends on the specification of H1.

But you can specify H1 in a way which makes no assumptions, which is typical procedure in these cases, and which is what Wagenmakers did.

Accordingly, our second objection to Wagenmakers et al.’s analysis is that their choice of H1 is unrealistic. Specifically, they assume that we have no prior knowledge of the likely effect sizes that the experiments were designed to detect."

Right, they regard it as unrealistic to make no assumptions about H1. If you make no assumptions, then your Bayes Factor reflects the extent to which H1 is supported. If you make assumptions, then BF begins to reflect your prejudices rather than the support for H1 per se. As I said earlier, I don't think it would hurt for parapsychologists to make the same kinds of conservative assumptions as scientists are used to making in other fields, if they want to achieve the same kinds of goals.

Linda
 

I don't see how their selection of the effect size has any relation to either typical effect sizes in psychology research or effect sizes seen in prior presentiment experiments.

If the effect size used reflects these typical effect sizes (even from psychology if one refuses to accept parapsychology research), then the Bayesian analysis shows evidence for psi.

If there is such a dramatic difference between a Bayesian and a frequentist analysis using the same data, something seems rather wrong. In this case it indeed appears that both biased priors and unrealistic effect sizes for the Bayes factor are at play.
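Extending the earlier toy binomial sketch, this is roughly what the dispute looks like in code: the same hypothetical data evaluated against an H1 that spreads the hit rate over the whole range, versus an H1 concentrated around the modest (roughly 30%) hit rates described earlier in the thread. The data and the Beta priors are illustrative choices of mine, not the Cauchy prior Wagenmakers et al. actually used:

# Same made-up data as before: 34 hits in 100 trials, 25% expected by chance.
import numpy as np
from scipy.stats import binom
from scipy.special import betaln, gammaln

n, k = 100, 34

def bf10(a, b):
    # Bayes factor for H1: hit rate ~ Beta(a, b) versus H0: hit rate = 0.25
    log_choose = gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)
    log_m1 = log_choose + betaln(k + a, n - k + b) - betaln(a, b)
    log_m0 = binom.logpmf(k, n, 0.25)
    return np.exp(log_m1 - log_m0)

print(f"Diffuse prior, Beta(1, 1):        BF10 = {bf10(1, 1):.2f}")
print(f"Concentrated prior, Beta(30, 70): BF10 = {bf10(30, 70):.2f}")

With these numbers the diffuse prior gives a Bayes factor close to 1, while the prior concentrated on small, previously observed effect sizes gives a Bayes factor of several in favour of H1. That is the shape of the disagreement, although the real analyses use different statistical models and the actual experimental data.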
 
I don't see how their selection of the effect size has any relation to either typical effect sizes in psychology research or effect sizes seen in prior presentiment experiments.

If the effect size used reflects these typical effect sizes (even from psychology if one refuses to accept parapsychology research), then the Bayesian analysis shows evidence for psi.

That illustrates the problem - if you make assumptions about the effect size based on the research you are trying to evaluate in terms of its strength, you get a higher Bayes Factor than if you don't make any assumptions and let the strength of the evidence speak for itself. So your Bayes Factor becomes a measure of your assumptions, rather than a measure of the strength of the evidence. I realize that parapsychologists prefer to have the Bayes Factor act as a measure of their assumptions. But if we go back to your original question, what non-proponent scientists are looking for in terms of goal-posts is a measure of the strength of the evidence.

If there is such a dramatic difference between a Bayesian and a frequentist analysis using the same data, something seems rather wrong. In this case it indeed appears that both biased priors and unrealistic effect sizes for the Bayes factor are at play.

The complaint from Utts et al. is essentially about Wagenmakers failing to use a biased prior/effect size.

Linda
 

Hi Linda,

Thanks for taking the time to respond to all this. You have a lot more knowledge regarding this matter than I do, so I apologize for not understanding this.

When selecting the effect size to use, I don't understand where Wagenmakers et al. came up with the large effect size. Where did this number come from?
 
They did not use a large effect size. The complaint is that they did not exclude the possibility of a large effect size.

Linda
 
In any other area of science, this would demonstrate an effect, but Hyman would not admit it. Why can't skeptics quantify what constitutes evidence? Without quantifying what constitutes evidence, there is continual moving of the goal posts, so that there is always an out.

You are presenting a biased view of the Ganzfeld experiments and ignoring the recent skeptical literature. Hyman didn't "admit" the effect because there was no effect; the results were due to sensory leakage problems, experimental bias etc. Unfortunately most psi believers don't read Hyman's more recent writings. They cherry pick and quote mine his old stuff only. Here is recent Ray Hyman on the autoganzfeld:

The most suspicious pattern was the fact that the hit rate for a given target increased with the frequency of occurrence of that target in the experiment. The hit rate for the targets that occurred only once was right at the chance expectation of 25%. For targets that appeared twice the hit rate crept up to 28%. For those that occurred three times it was 38%, and for those targets that occurred six or more times, the hit rate was 52%. Each time a videotape is played its quality can degrade. It is plausible then, that when a frequently used clip is the target for a given session, it may be physically distinguishable from the other three decoy clips that are presented to the subject for judging.

Ray Hyman. Evaluating Parapsychological Claims in Robert J. Sternberg, Henry L. Roediger, Diane F. Halpern. (2007). Critical Thinking in Psychology. Cambridge University Press. pp. 216-231.

According to David Marks:

Wiseman and his colleagues identified various different ways in which knowledge of the target could have been leaked to the experimenter. These included cues from the videocassette recorder and sounds from the sender who, of course, knew the target's identity.

Marks, David; Kammann, Richard. (2000). The Psychology of the Psychic. Prometheus Books. pp. 97-106.

Marks also mentions in his book that none of the rooms in the autoganzfeld were soundproof and the experimenter sat only fourteen feet from the experimenter's room. Fraud and sensory leakage was not ruled out. But even if the results were successful, statistical deviation from chance is not evidence for anything 'paranormal'. Flaws in the experimental design are a common thing in parapsychology experiments, and so the assumption that it must be magic is fallacious.

Most importantly science is based on repeatability. Only parapsychologists in parapsychology rooms claimed successful results in Ganzfeld, not neutral scientists or a significant population of the scientific community.

As Paul Kurtz wrote back in 1981:

If parapsychologists can convince the skeptics, then they will have satisfied an essential criterion of a genuine science: the ability to replicate hypotheses in any and all laboratories and under standard experimental conditions. Until they can do that, their claims will continue to be held suspect by a large body of scientists.

Paul Kurtz. Is Parapsychology a Science?. In Kendrick Frazier. (1981). Paranormal Borderlands of Science. Prometheus Books. pp. 5-23.

We are now in 2015, and the situation has not changed. No parapsychological experiment has been independently replicated and parapsychology has not made any testable predictions, none - nothing.

But before you accuse me of being a materialist or 'pseudoskeptic' (the usual ad hominem used here): like Martin Gardner, a great debunker of pseudoscience, and like many other men and women on the planet, I believe in God and an afterlife, but obviously these things cannot be demonstrated by the scientific method. There is not a shred of empirical scientific evidence for deities, an afterlife, paranormal powers, or magical beliefs; they are something to believe in, beyond and outside science. We must turn to philosophy or religion for these things. Regards.
 
Marks also mentions in his book that none of the rooms in the autoganzfeld were soundproof and the experimenter sat only fourteen feet from the experimenter's room.

Nope, Marks says that the experimenter sat only fourteen feet from the sender's room. Your quote came from Wikipedia which, like most encyclopaedias, is at its weakest when dealing with controversial subjects. If you're going to argue that others are not up to speed with skeptical/proponent literature, it would be nice if you demonstrated that you are.
 

Andrews,

Thank you for your detailed response. Because of the detail I will need some time to research and respond. I appreciate the criticism.
 
Andrews,

Regarding my main point, has Hyman ever quantified what would constitute sufficient evidence for a psi effect?
 
Nope, Marks says that the experimenter sat only fourteen feet from the sender's room. Your quote came from Wikipedia which, like most encyclopaedias, is at its weakest when dealing with controversial subjects. If you're going to argue that others are not up to speed with skeptical/proponent literature, it would be nice if you demonstrated that you are.

I was about to say the same, since he seems to be arguing that the "receiver" sat 14 feet away from the experimenter. Also, I do find it mildly funny that we are talking about "cherry picking" Hyman quotes, since he was/is a world class cherry picker himself.
 
You are presenting a biased view of the Ganzfeld experiments and ignoring the recent skeptical literature. Hyman didn't "admit" the effect because there was no effect…

Hyman did indeed admit to an effect in the joint communique written with Charles Honorton:

We [Honorton and Hyman] agree that there is an overall significant effect in this data base that cannot reasonably be explained by selective reporting or multiple analysis. We continue to differ over the degree to which the effect constitutes evidence for psi, but we agree that the final verdict awaits the outcome of future experiments conducted by a broader range of investigators and according to more stringent standards.

(Hyman and Honorton, 1986. Joint Communique: The Psi Ganzfeld Controversy. Journal of Parapsychology) Note: emphasis added.

Regarding the autoganzfeld experiments, Hyman wrote the following:

I commend Honorton and his colleagues (1990) for creating a protocol that eliminates most of the flaws that plagued the original ganzfeld experiments. The 11 autoganzfeld studies consistently yield positive effects that, taken together, are highly significant. (pg 19)

The hit rates are positive and consistent across the studies and experimenters. (pg 20)

(Hyman, 1994. Artifact or Anomaly? Comments on Bem and Honorton. Psychological Bulletin, vol. 115, No. 1, 19-24) Note: emphasis added


Andrews said:
…the results were due to sensory leakage problems, experimental bias etc. Unfortunately most psi believers don't read Hyman's more recent writings. They cherry pick and quote mine his old stuff only.

It is odd that you first state that Hyman didn't admit to the effect because "there was no effect," yet in the same sentence you state that the results were due to sensory leakage. If there was no effect, then there is nothing for sensory leakage to explain.

Further, the quote above from the joint communique by Honorton and Hyman did not state that the results were due to sensory leakage. They agreed that replication depends on future experiments "according to more stringent standards." Hyman has a hypothesis that the results could be due to some sort of methodological error, but this is not the same as concluding that the effect was due to sensory leakage. A hypothesis such as this requires experimentation, which is what was proposed in the joint communique written by both Honorton and Hyman.

Further, Hyman’s analysis of the pre-autoganzfeld database has been refuted by the following:

1. Psychometrician David Saunders concluded that “the entire analysis is meaningless,” referring to Hyman’s analysis of the database. (Saunders, 1985. On Hyman’s factor analysis. Journal of Parapsychology)

2. Harris and Rosenthal concluded that “Our analysis of the effects of the flaws on study outcome lends no support to the hypothesis that ganzfeld research results are a significant function of the set of flaw variables.” (Harris and Rosenthal, 1988. Postscript to Interpersonal Expectancy Effects and Human Performance Research)

3. Statistics professor Jessica Utts stated the following: “I do not think there is any evidence that the experimental results were due to the identified flaws.” (Utts, 1991. Replication and Meta-Analysis in Parapsychology, Rejoinder. Statistical Science)

In fact, Hyman himself has published statements that are rather contradictory to the claim that “the results were due to sensory leakage problems, experimental bias etc”:

Are these findings due to an artifact, or do they point to some new, hitherto unrecognized property of psi? We cannot say. The existence of this pattern in the database, however, strongly supports the need to replicate the findings before we can be confident that the parapsychologists have finally found a way to capture and tame their elusive quarry.

(Hyman, 1994. Artifact or Anomaly? Comments on Bem and Honorton. Psychological Bulletin, vol. 115, No. 1, 19-24) Note: emphasis added


Andrews said:
Here is recent Ray Hyman on the autoganzfeld:

The most suspicious pattern was the fact that the hit rate for a given target increased with the frequency of occurrence of that target in the experiment. The hit rate for the targets that occurred only once was right at the chance expectation of 25%. For targets that appeared twice the hit rate crept up to 28%. For those that occurred three times it was 38%, and for those targets that occurred six or more times, the hit rate was 52%. Each time a videotape is played its quality can degrade. It is plausible then, that when a frequently used clip is the target for a given session, it may be physically distinguishable from the other three decoy clips that are presented to the subject for judging. Ray Hyman. Evaluating Parapsychological Claims in Robert J. Sternberg, Henry L. Roediger, Diane F. Halpern. (2007). Critical Thinking in Psychology. Cambridge University Press. pp. 216-231.

Daryl Bem has responded to this claim as follows:

If this finding is reliable and not just a fluke of post hoc exploration, then it is difficult to interpret because target repetition is confounded with the chronological sequence of sessions: Higher repetitions of a target necessarily occur later in the sequence than lower repetitions. In turn, the chronological sequence of sessions is confounded with several other variables, including more experienced experimenters, more “talented” receivers (e.g. Julliard students and receivers being retested because of earlier successes), and methodological refinements introduced in the course of the program in an effort to enhance psi performance (e.g. experimenter “prompting”).

Again, however, Hyman’s major concern is that this pattern might reflect an interaction between inadequate target randomization and possible response biases on the part of those receivers or experimenters who encounter the same judging set more than once. This seems highly unlikely. In the entire database, only 8 subjects saw the same judging set twice, and none of them performed better on the repetition than on the initial session.

At the end of his discussion, Hyman wonders whether this relationship between target repetition and hit rates is “due to an artifact or [does it] point to some new, hitherto unrecognized property of psi?”

(Bem, 1994. Response to Hyman. Psychological Bulletin vol. 115, No. 1, 25-27)

I will also repeat a published statement from Hyman regarding this matter:

Are these findings due to an artifact, or do they point to some new, hitherto unrecognized property of psi? We cannot say. The existence of this pattern in the database, however, strongly supports the need to replicate the findings before we can be confident that the parapsychologists have finally found a way to capture and tame their elusive quarry.

(Hyman, 1994. Artifact or Anomaly? Comments on Bem and Honorton. Psychological Bulletin, vol. 115, No. 1, 19-24) Note: emphasis added

The point is that there are observations and hypotheses made. A hypothesis requires testing and replication, and an observation of a pattern such as this certainly does not constitute evidence that this is the cause of the effects seen in the ganzfeld and autoganzfeld experiments. A possible explanation needs testing, and simply suggesting a possible explanation does not constitute an actual explanation for the effect seen in the database.

Andrews said:
According to David Marks:

Wiseman and his colleagues identified various different ways in which knowledge of the target could have been leaked to the experimenter. These included cues from the videocassette recorder and sounds from the sender who, of course, knew the target's identity.

Marks, David; Kammann, Richard. (2000). The Psychology of the Psychic. Prometheus Books. pp. 97-106.

Again, a hypothesis about a possible explanation does not constitute an actual explanation of the effects seen in the ganzfeld database. These are hypotheses that require testing. What experiments have been done by Wiseman to test these hypotheses? When analyses are done on the database, studies are scored for methodological quality, and one can then test whether the results are a function of the identified flaws. I will reiterate comments from the following regarding this matter:

1. Harris and Rosenthal concluded that “Our analysis of the effects of the flaws on study outcome lends no support to the hypothesis that ganzfeld research results are a significant function of the set of flaw variables.” (Harris and Rosenthal, 1988. Postscript to Interpersonal Expectancy Effects and Human Performance Research)

2. Statistics professor Jessica Utts stated the following: “I do not think there is any evidence that the experimental results were due to the identified flaws.” (Utts, 1991. Replication and Meta-Analysis in Parapsychology, Rejoinder. Statistical Science)



Andrews said:
Marks also mentions in his book that none of the rooms in the autoganzfeld were soundproof and the experimenter sat only fourteen feet from the experimenter's room. Fraud and sensory leakage was not ruled out.

I will quote Jessica Utts in a comment relevant to this:

…Professor Hyman, a skeptic as well as an accomplished magician, participated in the specification of design criteria, and mentalists Bem and Kross observed experimental sessions. Bem is also a well-respected experimental psychologist.

(Utts, 1991. Replication and Meta-Analysis in Parapsychology. Statistical Science.)

Andrews said:
But even if the results were successful, statistical deviation from chance is not evidence for anything 'paranormal'. Flaws in the experimental design are a common thing in parapsychology experiments, and so the assumption that it must be magic is fallacious.

This is another hypothesis that requires testing. The hypothesis is that flaws are not only common, but more importantly that they are responsible for the effects seen in the database. There is a way to test this, and I hate to use this for a third time, but since this flaw accusation keeps coming up, this should be repeated as well:

1. Harris and Rosenthal concluded that “Our analysis of the effects of the flaws on study outcome lends no support to the hypothesis that ganzfeld research results are a significant function of the set of flaw variables.” (Harris and Rosenthal, 1988. Postscript to Interpersonal Expectancy Effects and Human Performance Research)

2. Statistics professor Jessica Utts stated the following: “I do not think there is any evidence that the experimental results were due to the identified flaws.” (Utts, 1991. Replication and Meta-Analysis in Parapsychology, Rejoinder. Statistical Science)



Andrews said:
Most importantly science is based on repeatability. Only parapsychologists in parapsychology rooms claimed successful results in Ganzfeld, not neutral scientists or a significant population of the scientific community.

Even within the ganzfeld database, there has been very good replication. Utts (1991) has stated that:

Consonant with the definition of replication based on consistent effect sizes, it is informative to compare the autoganzfeld experiments with the direct hit studies in the previous data base. The overall success rates are extremely similar.

As well as the following in the same paper:

…the arguments used to conclude that parapsychology has failed to demonstrate a replicable effect hinge on these misconceptions of replication and failure to examine power.

Regarding the comment on a significant population of the scientific community, I must say that this is an odd requirement, since parapsychology is a specific scientific discipline which has its own journals, just like any other branch of science, and is subject to peer review and criticism. If you read parapsychology journals, they publish a great deal of criticism, including criticism by Ray Hyman, which has done a great deal to improve methodology. What other specialized branch of science requires "a significant population of the scientific community" to determine whether its experiments are valid or not? This is not how science is practiced in any specialized discipline.

Further, this is, within the current sociological state of science, an impossible requirement because of the taboo placed on parapsychology research. Not only is it impossible, but it is simply not how science works in any other specialized discipline.


Andrews said:
We are now in 2015, and the situation has not changed. No parapsychological experiment has been independently replicated and parapsychology has not made any testable predictions, none - nothing.

Given a proper understanding of replication within the nature of the scientific discipline in question, Utts (1991) has shown that there is a great deal of replication in parapsychology, and in the same paper, states that:

…the arguments used to conclude that parapsychology has failed to demonstrate a replicable effect hinge on these misconceptions of replication and failure to examine power.

Further, Baptista and Derakhshani state the following:

This suggests that ganzfeld studies elicit the same level of consistency that is expected given the characteristics of those studies, and that they are replicable insofar as we can make predictions about their probability of success and have them verified. The evidence [is] that psi effects, at least in the ganzfeld, lawfully follow the predictions of conventional statistical models to a degree that is conducive to scientific exploration.



All of these power values—from the average power in social psychology [20%], the mean power for small effects in psychology [17%], and the median power for neuroscience studies [18%]—fail to meet the average power for the ganzfeld study conservatively calculated at 30%, for all 105 studies in Storm et al (2010). Considering just the recently gathered 30 ganzfeld studies from 1997-2008 (Storm et al., 2010), the average power is actually higher, at approximately 43%.



In the face of these reproducibility estimates [on cancer, women’s health, and CVD of 25% and 11%], we argue that for any area of parapsychology to achieve a replication rate of 25% to 30% to 37%—the proportion of significant results in the post-PRL, the whole ganzfeld, and the most recent studies, respectively (Storm et al., 2010), which we have shown to be comparable to other sciences—is in fact quite remarkable, given that the total human and financial resources devoted to psi research from 1882 to 1993 has been estimated to comprise less than two months’ research in conventional psychology (Schouten, 1993, p. 316).

(Baptista and Derakhshani, 2014. Beyond the Coin Toss: Examining Wiseman’s Criticisms of Parapsychology. Journal of Parapsychology.)
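For what it's worth, power figures like the ones quoted are straightforward to compute for a single ganzfeld-style study. Here is a sketch assuming 40 sessions, a true hit rate of 33%, and a one-tailed exact binomial test at alpha = 0.05; these assumptions are mine, chosen for illustration, and are not necessarily the ones Storm et al. or Baptista and Derakhshani used:

# Rough power calculation for one hypothetical ganzfeld-style study.
from scipy.stats import binom

n, p0, p1, alpha = 40, 0.25, 0.33, 0.05     # sessions, chance rate, assumed true rate, alpha

# Smallest hit count that reaches significance under a one-tailed exact binomial test
k_crit = next(k for k in range(n + 1) if binom.sf(k - 1, n, p0) <= alpha)

power = binom.sf(k_crit - 1, n, p1)         # P(at least k_crit hits | true rate p1)
print(f"Critical hit count: {k_crit} of {n}")
print(f"Power = {power:.2f}")

Under these particular assumptions the power of a single 40-session study comes out at only around 20-25%, which illustrates the general point: individual studies of this size are underpowered, so even a genuine effect would be expected to produce a fairly low proportion of 'significant' results.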
 
They did not use a large effect size. The complaint is that they did not exclude the possibility of a large effect size.

Linda

Linda,

Thank you. I now see what you mean with this. Utts et al. states that:

Yet the "default prior that Wagenmakers et al. (2011) use has a probability 0.57 that the absolute effect size exceeds 0.8. It even places a probability of 0.12 on effect sizes with absolute values exceeding 5.0, and probability of 0.06 on effect sizes with absolute values exceeding 10! If the effect sizes were really that large, there would be no debate about the reality of psi. Thus, the prior distribution they have placed on the possible effect sizes under H1 is wildly unrealistic.

Bem, Utts, and Johnson, 2011. Must Psychologists Change the Way They Analyze Their Data? A Response to Wagenmakers, Wetzels, Borsboom, & Van der Maas.

It seems to me that these effect sizes are very unreasonable, especially considering that their assessment using a much more reasonable effect size distribution ends up not being at dramatic odds with the frequentist methods used. I think their point that effect sizes in experimental psychology are typically 0.2-0.3 also supports this. I guess I am also considering that Utts and Johnson are both statistics professors. Not that it makes them correct, but when it comes to mathematics, I do put a bit more weight on authority due to the nature of the field.
 
I appreciate the criticisms received, but I want to come back to my main point, which is that skeptics do not quantify what would constitute sufficient evidence for psi.

Jessica Utts brings this up in a rejoinder to her article:

During the six years that I have been working with parapsychologists, they have repeatedly expressed their frustration with the unwillingness of the skeptics to specify what would constitute acceptable evidence, or even to delineate criteria for an acceptable experiment. The Hyman and Honorton Joint Communique was seen as the first major step in that direction...

Honorton and his colleagues then conducted several hundred trials using these specific criteria and found essentially the same effect sizes as in earlier work for both the overall effect and effects with moderator variables taken into account. I would expect Professor Hyman to be very interested in the results of these experiments he helped to create...

Instead, Hyman seems to be proposing yet another set of requirements to be satisfied before parapsychology should be taken seriously.

(Utts, 1991. Replication and Meta-Analysis in Parapsychology, Rejoinder. Statistical Science)

If skeptics are supposed to be following scientific methods, why can they not quantify what would constitute sufficient evidence for psi? I think Alex should press this issue, because leaving it unanswered allows the goal posts to be moved again and again. That isn't scientific, and it should be pointed out.
 
Personally, I would try to steer this discussion back towards the original question. Take it from me: the ganzfeld discussion is a maze from which it takes a long time to emerge, especially since the discussion usually revolves around “he said/she said”. Using that format, you can almost always find someone to disagree with a certain point of view throughout the ganzfeld debate.

Having said that, I would like to correct two things on this thread.

First is the misconception that Hyman was somehow responsible for the methods used by Honorton’s PRL (from Utts, as quoted above, “I would expect Professor Hyman to be very interested in the results of these experiments he helped to create...”). The Joint Communique was published in 1986 and the PRL work began in 1982 with a methodology that remained the same until it closed. I don't see how the Joint Communique could've been an influence.

Also, it should be noted that the Joint Communique asks for replications from a “broader range of investigators” (Journal of Parapsychology, Vol. 50, December 1986, p 351) so the PRL trials by themselves don't meet that requirement.
 
Linda,

Thank you. I now see what you mean with this. Utts et al. states that:

It seems to me that these effect sizes are very unreasonable, especially considering that their assessment using a much more reasonable effect size distribution ends up not being at dramatic odds with the frequentist methods used. I think their point that effect sizes in experimental psychology are typically 0.2-0.3 also supports this. I guess I am also considering that Utts and Johnson are both statistics professors. Not that it makes them correct, but when it comes to mathematics, I do put a bit more weight on authority due to the nature of the field.

To get back to your original question...Utts et al. can get into arguments trying to justify favourable assumptions, or they can promote the idea of performing research whose results hold up regardless of whether favourable or unfavourable assumptions are made. But with respect to "meeting goalposts", it is the latter which will be of interest to non-proponent scientists.

Linda
 