Parapsychology: Science or Pseudoscience?

I didn't claim to. I said that her analysis disagrees with Wagenmakers, which you said was "obvious." If she understands Bayesian analysis and disagrees with Wagenmakers, then how can Wagenmakers' response be "obvious"?
You've made numerous incoherent statements about prior probabilities, the Jeffreys-Lindley paradox, and you implied that Utts' Bayes analysis somehow vindicated Bem of p-hacking. If you don't understand Bayesian inference then you ought not to act like you do.
 
You've made numerous incoherent statements about prior probabilities, the Jeffreys-Lindley paradox, and you implied that Utts' Bayes analysis somehow vindicated Bem of p-hacking. If you don't understand Bayesian inference then you ought not to act like you do.
You said that the problems were "obvious" and referenced Wagenmakers. Utts disagreed with Wagenmakers. You said earlier that Utts understands Bayesian analysis. She also made the statement about the Lindley-Jeffreys paradox and nullifying effects. It's in her paper:

http://dbem.ws/ResponsetoWagenmakers.pdf

My point was that it isn't "obvious" based on this.
 
You still haven't addressed my point. All this is irrelevant to the fact that Wiseman chose, prior to the experiment, an invalid criterion for falsification.

Of course it matters. There could be a squirrel rustling in leaves outside that the dog hears and goes to check out and Wiseman with his relatively deaf human ears doesn't hear and then now rejects the entire trial even though after this the dog was in fact at the door much much more often when the owner was coming home.

The part I put in bold clearly shows you have not looked at the data.
The authors, in conjunction with PS, seem to have made attempts to take this into account. After the first experiment they decided, in discussion with Pam Smart, to ignore the first time. Then, when Smart told them that according to her parents the dog's behaviour was really obvious when it was waiting for her, as compared to when it was at the window for some other reason, they came up with the two-minute criterion.

It seems to me that what they are doing is trying to isolate the behaviour reported by Smart's parents, because this behaviour is what led them to consider the animal telepathic. The parents' assertion, as I understand it, is that there was specific behaviour that predicted she was coming. If the dog exhibited that behaviour at times when she was not coming home, then we would say that the parents were not identifying what they thought they were.

Note also, IIRC (it's been a while since I've read the paper in detail), for many of the times when the dog was at the window when Pam was coming home, there were other things going on in front of the window at the time (making those likely false positives). I posted a chart at some point on the old forum, where I mapped it out; not sure if it's still there or was deleted.

From what I can tell, Wiseman's protocol seems to be geared at identifying the noted behaviour (which again, remember, was deemed by the parents to be obvious and different from the dog's behaviour the rest of the time) and avoiding false positives.
What you're saying is that the dog went to the window a similar number of times in both Sheldrake's and Wiseman's experiments (again, it's been a while since I read the papers in detail, so I don't recall the exact stats). You then conclude that Wiseman is fraudulent to suggest he hasn't replicated Sheldrake's and should have declared it psi.

However, I think we can say that, whether there is telepathy or not, replicating the pattern is just part of the analysis. The rest of the analysis is whether or not the observed pattern should be considered to indicate telepathy. That's what Wiseman is getting at. Note as well that in their follow-up paper they pointed out that at the time they did their experiment, Sheldrake hadn't yet published his other papers. It's unfortunate, because Sheldrake did much of his analysis afterwards. They responded to it somewhat in their reply, but it would have been much better if they had been able to deal with it while they were constructing their protocols.

In the conclusion of the paper Wiseman essentially says this. Note: the actual conclusions do not state that Sheldrake's experiment was debunked but rather that he didn't take sufficient steps to rule out false positives. I agree it doesn't debunk Sheldrake's, but it suggests that more experiments should be done, using refined protocols, given that it is impossible to review the tapes.

It's fair to disagree with his protocols (note, however, that they seem to have been developed in conjunction with Pam Smart). But to declare it fraudulent is going too far. Note the back-and-forth exchange where they directly address the issue of how they represented their experiments and how Sheldrake did. This issue is one of a disagreement over methodology. Framing it in terms of fraud serves only to distract from the issues. (I think I've said before that it's possible they might have gone too far in what they said in interviews and speeches; I haven't seen them, and as I've said, I give all researchers more leeway in what they say in interviews compared to what they write in their papers. It's pretty common to see researchers overstating things in this context, and if we're going to call it fraud it's going to apply to a heck of a lot of them!)

What you seem to be saying is, the parents might have been mistaken that certain "obvious" behaviour by the dog indicated telepathy, but that the dog might be telepathic anyway.

These experiments were early attempts and included a lot of exploration with small sample sizes. If Sheldrake had continued his work this exchange would have been just one at the start of a chain, hopefully involving other researchers as well. Unfortunately Sheldrake doesn't seem to have pursued this line, so we don't know how the experiments would evolve and develop towards confirmatory, larger scale ones.

I hope he, or someone else, does pick up the chain, continuing to develop the protocol, identify strengths and weaknesses, and proceed to higher powered confirmatory ones if warranted.

That said, I think it is important to recognize the context of these papers and the fact that the research line is still primarily in its infancy.

I do still plan on getting back to you on the quantum post, haven't had time to go through it yet.
 
If Sheldrake had any real faith in the data the inquisitive scientist in him would have replicated the studies at some point in the last 15 years. He seems frustratingly disinterested in really getting to the bottom of anything he proposes.
 
If Sheldrake had any real faith in the data the inquisitive scientist in him would have replicated the studies at some point in the last 20 years. He seems frustratingly disinterested in really getting to the bottom of anything he proposes.
He has done well over a hundred experiments with the dog.
 
You said that the problems were "obvious" and referenced Wagenmakers.
I said the p-hacking was obvious just by reading Bem's papers. When asked to explain, I said that I could not add more to what others, including Wagenmakers, have already published (in other words, just read them).

Utts disagreed with Wagenmakers.
You are conflating "Utts" the statistician who understands Bayesian analysis, with "Utts" the paper co-authored with Bem, which denied Bem's p-hacking. Utts the statistician disagreed with Wagenmakers' Bayes analysis, which, again, had nothing to do with p-hacking; in fact, it assumed there was no p-hacking. Utts the paper denied the p-hacking, but the defense consisted of claims that only Bem himself could make, unless of course Utts were psychic, which does not have a low prior probability.

You said earlier that Utts understands Bayesian analysis. She also made the statement about the Lindley-Jeffreys paradox and nullifying effects. It's in her paper:

http://dbem.ws/ResponsetoWagenmakers.pdf
I disagree with Utts's statement of the Jeffreys-Lindley paradox. The paradox is not that a bad Bayes analysis might contradict a good frequentist analysis, as Utts states. That's not a paradox at all. The paradox is that a good Bayes analysis can contradict a good frequentist analysis. That's a paradox (or at least something closer to one). Utts' point is simply that, in her opinion, Wagenmakers' prior distribution over the alternative was too broad and gave too little weight to small effect sizes. This has nothing to do with the Jeffreys-Lindley paradox per se.

But you are confusing the prior distribution over the alternative hypothesis with the prior probability of the alternative hypothesis. We were talking about my opinion that the prior probability that psi is true is very low. You mistakenly claimed that a low prior such as this "nullifies small to medium effect sizes." It does not. Your claim doesn't even make sense.
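The distinction drawn above can be made concrete with a small sketch. It assumes the standard relation that posterior odds equal the Bayes factor times the prior odds; the function name and the numbers are purely illustrative and are not taken from Utts, Wagenmakers, or Bem. The prior distribution over the alternative affects the Bayes factor itself, while the prior probability of the alternative enters only through the prior odds.

```python
def posterior_probability(bayes_factor: float, prior_probability: float) -> float:
    """Posterior probability of the alternative hypothesis.

    bayes_factor      -- evidence for the alternative over the null; its value
                         depends on the prior *distribution* over effect sizes
                         under the alternative.
    prior_probability -- prior *probability* that the alternative is true
                         (must be strictly between 0 and 1).
    """
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = bayes_factor * prior_odds  # Bayes' rule in odds form
    return posterior_odds / (1.0 + posterior_odds)


# Illustrative values only: the same Bayes factor under two different priors.
for prior in (0.5, 1e-6):
    print(prior, posterior_probability(10.0, prior))
```

A low prior probability scales the posterior odds down by the same factor whatever the effect size, so on this sketch it has no special bearing on small or medium effects.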

My point was that it isn't "obvious" based on this.
You have a knack for ambiguous use of pronouns. "It" apparently means Bem's p-hacking. "This" apparently refers to something in Utts' paper, or the paper itself. Utts' paper did two things: First, it presented a Bayesian meta-analysis that is completely orthogonal to the issue of p-hacking. Second, it presented Bem's defense against the p-hacking allegations. Unfortunately, Bem's rebuttal failed to address the specific allegations made by Wagenmakers, and hence is unconvincing. Furthermore, in their rejoinder, Wagenmakers et al. presented quantitative evidence of a negative relation between effect size and number of participants among Bem's experiments, a relationship that can only be explained by exploiting researcher degrees of freedom (including optional stopping).
 
He has done well over a hundred experiments with the dog.
Can you speculate on why when Jaytee was left on his own (without Pam's parents or sister - 50 of those experiments) he seemed to perform much worse?

Perhaps you could also explain why these trials get a more cursory treatment in the results section?
 
Chris

Francis's test doesn't reject the psi hypothesis; it indicates that the experimental results aren't valid evidence for the hypothesis. That is, it's the evidence that Francis's results say we should reject, not the hypothesis.
Not "the psi hypothesis", but "a psi hypothesis". My point is that Francis is testing a hypothesis about what the statistics should look like in the presence of a psi effect, and rejecting it (on a frequentist criterion). Obviously his calculations will depend on a number of assumptions. I think it would be instructive for one of your experienced data analysts to write them out formally, and to consider carefully how far it's really safe to assume they are fulfilled in psi experiments. Would they necessarily survive the presence of an "experimenter effect", for example?

But also the other point remains that if there really is something fundamentally wrong with Bem's experiments, Francis's analysis doesn't tell us what it is. For example, my worry in looking at the paper was whether the number of trials in each study had really been fixed in advance. If not, in principle that might explain a larger number of significant results than would otherwise be expected, even if there was a single predetermined hypothesis for each study.
 
What other criteria? I don't know, maybe what Sheldrake used,…
Sheldrake looked at whether or not JayTee spent more time at the window during the period Pam was coming home. That isn't really a criterion: it provides no indication of whether or not Pam is coming home when JayTee is at the window unless one specifies what is meant by "more". Wiseman specified (based on what Pam's parents claimed) that "more" could mean "spends at least 2 minutes by the window when he hasn't done so before" or "visits the window when he has not done so before". It's reasonable to suggest that "more" could mean something else (and that's what I was asking from you). But the point is to find some sort of behaviour which indicates that Pam is on her way home. Merely finding a behaviour which is associated with the passage of time doesn't tell you that anything anomalous is going on. As both Sheldrake and Wiseman pointed out, a dog which misses its owner and looks for that owner more and more, the longer the owner is gone, will also spend more time at the window during the period the owner finally makes their way home, in the absence of anomalous cognition. It's only anomalous if something about the dog's behaviour is different and noticeable during the owner's return.

Sheldrake looked for whether JayTee looked for Pam more and more, the longer she was gone. Wiseman looked for whether or not the dog's behaviour was different (in a way which was noticeable) when Pam was on her way home. Only Wiseman's approach tells you whether or not something anomalous may be going on.

...which I already said, and which Wiseman replicated, which showed a significant effect.
Well, yes, if you look at the data in a way which doesn't tell you whether or not the dog's behaviour is mundane or anomalous (Sheldrake's method), Wiseman and Sheldrake showed the same effect. But the important thing which Wiseman did, which Sheldrake did not, was look at the data in a way which tells you whether or not the dog's behaviour is anomalous. Nobody is interested in the dog's behaviour if the explanation for it is mundane. That's why non-proponents pay attention to Wiseman's experiments over Sheldrake's.

Wiseman replicated this and then lied to promote himself. That's fraud.
Nothing about what Wiseman did is a lie. He performed an experiment where he tried to find something about the dog's behaviour which was anomalous and was unable to do so. Sheldrake did not look for anomalous behaviour in the dog. He only looked for behaviour which was associated with the owner's return. So, yes, both Sheldrake and Wiseman found behaviour which was associated with the owner's return. But only Wiseman went further and looked for behaviour which would show that this was anomalous. In no way is it fraudulent for Wiseman to focus on whether or not the dog's behaviour is anomalous, when he talks about his results. After all, that was the reason that attention was given to the dog in the first place.

I think the problem in all this is that Sheldrake promotes his experiment as though it tested for anomalous behaviour.

Linda
 
Not "the psi hypothesis", but "a psi hypothesis". My point is that Francis is testing a hypothesis about what the statistics should look like in the presence of a psi effect, and rejecting it (on a frequentist criterion). Obviously his calculations will depend on a number of assumptions. I think it would be instructive for one of your experienced data analysts to write them out formally, and to consider carefully how far it's really safe to assume they are fulfilled in psi experiments. Would they necessarily survive the presence of an "experimenter effect", for example?
Sorry, but I don't understand that at all. Francis calculated the probability that at least 9 out of 10 tests would reject the null hypothesis (of no effect) given the statistical power of each test to reject the null if the effect sizes reported were the true population effect sizes. This is not a test about "what the statistics should look like in the presence of a psi effect." It's a test about what the statistics should look like if the experiments were conducted, analyzed, and reported in the manner required for Bem's statistics to be valid. Too many successful tests is thus evidence that the tests were not so conducted, making the results of the experiments uninterpretable, and therefore not scientific evidence of psi.
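The kind of calculation described above can be sketched as follows. This is a minimal illustration that assumes the tests are independent and that each test's power is known; the function name and the power values are hypothetical and are not Francis's actual figures or code.

```python
from typing import Sequence


def prob_at_least(powers: Sequence[float], k: int) -> float:
    """Probability that at least k of the independent tests reject the null,
    where powers[i] is the probability that test i rejects."""
    # dist[j] = probability of exactly j rejections among the tests seen so far
    dist = [1.0]
    for p in powers:
        new = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            new[j] += q * (1.0 - p)  # this test fails to reject
            new[j + 1] += q * p      # this test rejects
        dist = new
    return sum(dist[k:])


# Hypothetical powers (NOT the values reported for Bem's experiments).
hypothetical_powers = [0.6] * 10
print(prob_at_least(hypothetical_powers, 9))
```

With ten tests that each have 60% power, nine or more rejections would be expected less than 5% of the time, which is the sense in which "too many successes" counts against the reported results rather than for them.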

But also the other point remains that if there really is something fundamentally wrong with Bem's experiments, Francis's analysis doesn't tell us what it is. For example, my worry in looking at the paper was whether the number of trials in each study had really been fixed in advance. If not, in principle that might explain a larger number of significant results than would otherwise be expected, even if there was a single predetermined hypothesis for each study.
That's correct. The validity of Bem's statistics depends on the sample sizes being set in advance. You're describing optional stopping, whereby the investigator monitors the results as they come in and stops the experiment when he gets the result he's looking for, usually statistical significance. However, given enough time, optional stopping will always reject the null, even when the null is true. Thus the false positive rate for full-blown optional stopping is 100%, not 5%, as the investigator claims, and so the reported p-values are meaningless. Francis's test is sensitive to optional stopping, because investigators usually stop the experiment as soon as the p-value drops below .05, which means the observed power will usually be around 50%, a fairly low value. So, even if the research (alternative) hypothesis is set in advance, the null will always be rejected whether it is true or false if optional stopping is employed, which means the reported experimental results will have no evidential value.
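A small simulation illustrates the inflation described above. It is a sketch under stated assumptions, not a model of Bem's procedure: the null is true, observations are normal with known standard deviation 1, a two-sided z-test at the 5% level is used, and the sample size is capped at a hypothetical 200 subjects. With a cap the false positive rate stays below 100%; the 100% figure applies only when sampling can continue indefinitely.

```python
import math
import random


def rejects_null(n_max: int, peek: bool, rng: random.Random,
                 n_min: int = 10, z_crit: float = 1.96) -> bool:
    """Run one experiment in which the null is true (mean 0, known sd 1).

    If peek is True, the z-test is checked after every subject from n_min on
    and the experiment stops at the first significant result; otherwise the
    test is done once, at n_max.
    """
    total = 0.0
    for n in range(1, n_max + 1):
        total += rng.gauss(0.0, 1.0)
        look_now = (peek and n >= n_min) or n == n_max
        # z = sample mean / standard error = total / sqrt(n) when sd is 1
        if look_now and abs(total / math.sqrt(n)) > z_crit:
            return True
    return False


def false_positive_rate(n_max: int, peek: bool,
                        n_sims: int = 2000, seed: int = 1) -> float:
    """Fraction of simulated null experiments that end up 'significant'."""
    rng = random.Random(seed)
    hits = sum(rejects_null(n_max, peek, rng) for _ in range(n_sims))
    return hits / n_sims


fixed = false_positive_rate(200, peek=False)
peeking = false_positive_rate(200, peek=True)
print(f"fixed N: {fixed:.3f}   peeking after every subject: {peeking:.3f}")
```

The fixed-N rate should sit near the nominal 5%, while the peeking rate comes out several times higher, and raising the cap pushes it higher still.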
 
That's correct. The validity of Bem's statistics depends on the sample sizes being set in advance. You're describing optional stopping, whereby the investigator monitors the results as they come in and stops the experiment when he gets the result he's looking for, usually statistical significance. However, given enough time, optional stopping will always reject the null, even when the null is true. Thus the false positive rate for full-blown optional stopping is 100%, not 5%, as the investigator claims, and so the reported p-values are meaningless. Francis's test is sensitive to optional stopping, because investigators usually stop the experiment as soon as the p-value drops below .05, which means the observed power will usually be around 50%, a fairly low value. So, even if the hypothesis is set in advance, it will always be rejected whether it is true or false if optional stopping is employed, which means the reported experimental results will have no evidential value.
And of course the easy fix for this going forward is pre-registration, with details including the intended sample size/power, etc.

Note as well that engaging in optional stopping does not necessarily entail deliberate intent; it can be unconscious.
 
Note as well that engaging in optional stopping does not necessarily entail deliberate intent; it can be unconscious.
I don't see how an investigator could be unconscious of monitoring the results and stopping when significance is obtained. On the other hand, a researcher could do this innocently in the sense of not realizing that it invalidates the statistics.
 
Chris

Sorry, but I don't understand that at all. Francis calculated the probability that at least 9 out of 10 tests would reject the null hypothesis (of no effect) given the statistical power of each test to reject the null if the effect sizes reported were the true population effect sizes.
But the calculation of statistical power from effect size depends on assumptions - for example that the results of different subjects are statistically independent - which may not be valid in the presence of psi. That's why I suggested thinking about an experimenter effect as an example.
 
I don't see how an investigator could be unconscious of monitoring the results and stopping when significance is obtained. On the other hand, a researcher could do this innocently in the sense of not realizing that it invalidates the statistics.
Well, it's not always going to be stopping at the exact moment that significance is achieved, right? Nor do the calculations necessarily have to have been done exactly. A researcher could be generally monitoring the results, not necessarily performing calculations after each one. But when deciding whether to continue, he or she may be unconsciously motivated by the fact that there has been a good run, and can talk him- or herself into stopping.

I think researcher degrees of freedom can appear in many guises, with many of them not being deliberate attempts to manipulate the results, but rather letting unconscious biases guide their decisions.

Again, the way to control for this is to set the conditions in advance.
 
But the calculation of statistical power from effect size depends on assumptions - for example that the results of different subjects are statistically independent - which may not be valid in the presence of psi. That's why I suggested thinking about an experimenter effect as an example.
Yes, the specific psi hypothesis must be specified in advance. Anything else, like mysterious intersubject interactions, has to be assumed not to occur. Otherwise, you can't statistically analyze the results, and if you can't do that, but your results depend on statistical analysis, then for sure you're not doing science anymore.
 
Well, it's not always going to be stopping at the exact moment that significance is achieved, right? Nor do the calculations necessarily have to have been done exactly. A researcher could be generally monitoring the results, not necessarily performing calculations after each one. But when deciding whether to continue, he or she may be unconsciously motivated by the fact that there has been a good run, and can talk him- or herself into stopping.
Personally, I think that's reaching a bit.
 
Chris

Yes, the specific psi hypothesis must be specified in advance. Anything else, like mysterious intersubject interactions, has to be assumed not to occur. Otherwise, you can't statistically analyze the results, and if you can't do that, but your results depend on statistical analysis, then for sure you're not doing science anymore.
Yes, that's my point. People who are taking a null hypothesis that includes psi - like Francis - have a problem, because aspects of psi may invalidate their statistical methods.

But there's no problem if the null hypothesis is a "no psi" hypothesis, as is commonly the case in psychical research, because by definition that will exclude such weird effects, and the statistics will be well-behaved.
 