I didn't claim to. I said that her analysis disagrees with Wagenmakers, which you said was "obvious." If she understands Bayesian analysis and disagrees with Wagenmakers, then how can Wagenmakers' response be "obvious"?

- Thread starter Dillinger

You've made numerous incoherent statements about prior probabilities and the Jeffreys-Lindley paradox, and you implied that Utts' Bayesian analysis somehow vindicated Bem of p-hacking. If you don't understand Bayesian inference, then you ought not to act like you do.

http://dbem.ws/ResponsetoWagenmakers.pdf

My point was that it isn't "obvious" based on this.

You still haven't addressed my point. All this is irrelevant to the fact that Wiseman chose, prior to the experiment, an invalid criterion for falsification.

Of course it matters. There could be a squirrel rustling in leaves outside that the dog hears and goes to check out, which Wiseman, with his relatively deaf human ears, doesn't hear; he then rejects the entire trial even though, after that point, the dog was in fact at the door much, much more often when the owner was coming home.

The part I put in bold clearly shows you have not looked at the data.


It seems to me that what they are doing is trying to isolate the behaviour reported by Smart's parents, because that behaviour is what led them to consider the animal telepathic. The parents' assertion, as I understand it, is that there was specific behaviour that predicted she was coming. If the dog exhibited that behaviour at times when she was not coming home, then we would say that the parents were not identifying what they thought they were.

Note also, IIRC (it's been a while since I've read the paper in detail), that for many of the times when the dog was at the window when Pam was coming home, there were other things going on in front of the window at the time (making those likely false positives). I posted a chart at some point on the old forum where I mapped it out; not sure if it's still there or was deleted.

From what I can tell, Wiseman's protocol seems to be geared at identifying the noted behaviour (which again, remember, was deemed by the parents to be obvious and different from the dog's behaviour the rest of the time) and avoiding false positives.

What you're saying is that the dog went to the window a similar number of times in both Sheldrake's and Wiseman's experiments (again, it's been a while since I read the papers in detail, so I don't recall the exact stats). You then conclude that Wiseman was fraudulent in suggesting he hadn't replicated Sheldrake's results, and that he should have declared it psi.

However, I think we can say that, whether there is telepathy or not, replicating the pattern is just part of the analysis. The rest is whether or not the observed pattern should be considered to indicate telepathy. That's what Wiseman is getting at. Note as well that in their follow-up paper they pointed out that, at the time they did their experiment, Sheldrake hadn't yet published his other papers. It's unfortunate, because Sheldrake did much of his analysis afterwards. They responded to it somewhat in their reply, but it would have been much better if they had been able to deal with it while constructing their protocols.

In the conclusion of the paper Wiseman essentially says this. Note: the actual conclusions do not state that Sheldrake's experiment was debunked but rather that he didn't take sufficient steps to rule out false positives. I agree it doesn't debunk Sheldrake's, but it suggests that more experiments should be done, using refined protocols, given that it is impossible to review the tapes.

It's fair to disagree with his protocols (note, however, that they seem to have been developed in conjunction with Pam Smart). But to declare them fraudulent goes too far. Note the back-and-forth exchange where they directly address how they represented their experiments and how Sheldrake represented his. This is a disagreement over methodology. Framing it in terms of fraud serves only to distract from the issues. (I think I've said before that it's possible they went too far in what they said in interviews and speeches; I haven't seen them, and as I've said, I give all researchers more leeway in what they say in interviews compared to what they write in their papers. It's pretty common to see researchers overstating things in that context, and if we're going to call it fraud, it's going to apply to a heck of a lot of them!)

What you seem to be saying is, the parents might have been mistaken that certain "obvious" behaviour by the dog indicated telepathy, but that the dog might be telepathic anyway.

These experiments were early attempts and included a lot of exploration with small sample sizes. If Sheldrake had continued his work this exchange would have been just one at the start of a chain, hopefully involving other researchers as well. Unfortunately Sheldrake doesn't seem to have pursued this line, so we don't know how the experiments would evolve and develop towards confirmatory, larger scale ones.

I hope he, or someone else, does pick up the chain, continuing to develop the protocol, identify strengths and weaknesses, and proceed to higher powered confirmatory ones if warranted.

That said, I think it is important to recognize the context of these papers and the fact that the research line is still primarily in its infancy.

I do still plan on getting back to you on the quantum post, haven't had time to go through it yet.

You said that the problems were "obvious" and referenced Wagenmakers.

Utts disagreed with Wagenmakers.

You said earlier that Utts understands Bayesian analysis. She also made the statement about the Jeffreys-Lindley paradox and nullifying effects. It's in her paper:

http://dbem.ws/ResponsetoWagenmakers.pdf
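The paradox under discussion can be made concrete with a small computation: hold the p-value fixed at just-significant (z = 1.96) and let the sample size grow, and the Bayes factor swings toward the null. This is only an illustrative sketch, not the model in Utts' paper; it assumes a two-sided z-test on unit-variance data with an N(0, 1) prior on the mean under the alternative.

```python
from math import exp, pi, sqrt

def normal_pdf(x, var):
    """Density of N(0, var) at x."""
    return exp(-x * x / (2 * var)) / sqrt(2 * pi * var)

def bf01(z, n, tau2=1.0):
    """Bayes factor for H0 (mu = 0) over H1 (mu ~ N(0, tau2)),
    given a z-score z computed from n unit-variance observations."""
    xbar = z / sqrt(n)                        # sample mean implied by z
    f0 = normal_pdf(xbar, 1.0 / n)            # marginal likelihood under H0
    f1 = normal_pdf(xbar, tau2 + 1.0 / n)     # marginal likelihood under H1
    return f0 / f1

# The same "p = .05" result (z = 1.96) at ever larger sample sizes:
for n in (50, 1000, 100000):
    print(n, round(bf01(1.96, n), 2))
```

At n = 50 the evidence is roughly even, but by n = 100,000 the same just-significant z-score favours the null by a factor of about 46: a "significant" p-value and support for the null at once, which is the paradox.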


But you are confusing the prior distribution over the alternative hypothesis with the prior probability of the hypothesis itself.

My point was that it isn't "obvious" based on this.


He has done well over a hundred experiments with the dog.

Perhaps you could also explain why these trials get a more cursory treatment in the results section?


C

Francis's test doesn't reject the psi hypothesis; it indicates that the experimental results aren't valid evidence for the hypothesis. That is, it's the evidence that Francis's results say we should reject, not the hypothesis.

But also the other point remains that if there really is something fundamentally wrong with Bem's experiments, Francis's analysis doesn't tell us what it is. For example, my worry in looking at the paper was whether the number of trials in each study had really been fixed in advance. If not, in principle that might explain a larger number of significant results than would otherwise be expected, even if there was a single predetermined hypothesis for each study.

What other criteria? I don't know, maybe what Sheldrake used,…

Sheldrake looked at whether JayTee watched for Pam more and more the longer she was gone. Wiseman looked at whether or not the dog's behaviour was noticeably different when Pam was on her way home. Only Wiseman's approach tells you whether or not something anomalous may be going on.

...which I already said, and which Wiseman replicated, which showed a significant effect.

Wiseman replicated this and then lied to promote himself. That's fraud.

I think the problem in all this is that Sheldrake promotes his experiment as though it tested for anomalous behaviour.

Linda

Not "the psi hypothesis", but "a psi hypothesis". My point is that Francis is testing a hypothesis about what the statistics should look like in the presence of a psi effect, and rejecting it (on a frequentist criterion). Obviously his calculations will depend on a number of assumptions. I think it would be instructive for one of your experienced data analysts to write them out formally, and to consider carefully how far it's really safe to assume they are fulfilled in psi experiments. Would they necessarily survive the presence of an "experimenter effect", for example?



That's correct. The validity of Bem's statistics depends on the sample sizes being set in advance. You're describing optional stopping, whereby the investigator monitors the results as they come in and stops the experiment upon getting the result he's looking for, usually statistical significance. Given enough time, optional stopping will always reject the null, even when the null is true. Thus the false positive rate for full-blown optional stopping is 100%, not the 5% the investigator claims, and so the reported p-values are meaningless. Francis's test is sensitive to optional stopping because investigators usually stop the experiment as soon as the p-value drops below .05, which means the observed power will usually be around 50%, a fairly low value. So, if optional stopping is employed, the null will always be rejected whether the hypothesis is true or false, even if it was set in advance, which means the reported experimental results have no evidential value.
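The inflation is easy to demonstrate by simulation. A minimal sketch, with arbitrary choices of sample cap and peeking schedule (this is not Bem's design): generate data under the null, run a one-sample z-test after every new observation, and stop at the first "significant" result.

```python
import random
from math import sqrt

def optional_stopping_trial(max_n=500, z_crit=1.96, min_n=10):
    """Generate null data one point at a time; stop and declare
    'significance' the first time |z| exceeds z_crit."""
    total = 0.0
    for n in range(1, max_n + 1):
        total += random.gauss(0.0, 1.0)   # true mean is 0, so H0 holds
        if n >= min_n and abs(total / sqrt(n)) > z_crit:
            return True                    # a false positive
    return False

random.seed(0)
trials = 2000
rate = sum(optional_stopping_trial() for _ in range(trials)) / trials
print(rate)   # several times the nominal 5%
```

With no cap at all, the boundary is eventually crossed with probability 1 (by the law of the iterated logarithm), which is why the false positive rate for full-blown optional stopping tends to 100%.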

Note as well that the engagement of optional stopping does not necessarily entail deliberate intent, it can be unconscious.


C

Sorry, but I don't understand that at all. Francis calculated the probability that at least 9 out of 10 tests would reject the null hypothesis (of no effect) given the statistical power of each test to reject the null if the effect sizes reported were the true population effect sizes.
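That tail computation is straightforward to reproduce in outline. A sketch with made-up numbers (the per-test powers below are illustrative placeholders, not the values Francis derived from Bem's reported effect sizes):

```python
def p_at_least(powers, k):
    """Exact P(at least k of the tests reject), treating the tests as
    independent Bernoulli trials with the given rejection probabilities."""
    dist = [1.0]                      # dist[j] = P(j rejections so far)
    for p in powers:
        nxt = [0.0] * (len(dist) + 1)
        for j, pr in enumerate(dist):
            nxt[j] += pr * (1 - p)    # this test fails to reject
            nxt[j + 1] += pr * p      # this test rejects
        dist = nxt
    return sum(dist[k:])

# Ten tests, each with a hypothetical power of 0.55; at least 9 rejections:
print(p_at_least([0.55] * 10, 9))
```

With equal powers this reduces to a binomial tail, C(10,9)·0.55⁹·0.45 + 0.55¹⁰ ≈ .023. Francis's argument is that so small a probability makes the reported set of results itself improbable, given the power the studies had.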

I don't see how an investigator could be unconscious of monitoring the results and stopping when significance is obtained. On the other hand, a researcher could do this innocently in the sense of not realizing that it invalidates the statistics.

I think researcher degrees of freedom can appear in many guises, with many of them not being deliberate attempts to manipulate the results, but rather letting unconscious biases guide their decisions.

Again, the way to control for this is to set the conditions in advance.

But the calculation of statistical power from effect size depends on assumptions - for example that the results of different subjects are statistically independent - which may not be valid in the presence of psi. That's why I suggested thinking about an experimenter effect as an example.

Well, it's not always going to be stopping at the exact moment that significance is achieved, right? Nor do the calculations necessarily have to have been done exactly. A researcher could be generally monitoring the results without performing calculations after each one, but the decision whether to continue may be unconsciously motivated by the fact that there has been a good run, and the researcher can talk him- or herself into stopping.

C

Yes, the specific psi hypothesis must be specified in advance. Anything else, like mysterious intersubject interactions, has to be assumed not to occur. Otherwise you can't statistically analyze the results, and if you can't do that, but your results depend on statistical analysis, then for sure you're not doing science anymore.

But there's no problem if the null hypothesis is a "no psi" hypothesis - as is commonly the case in psychical research - because by definition that will exclude such weird effects, and the statistics will be well-behaved.