...is essentially the same as Chris's suggestion that bias is eliminated when instead of averaging over actual number of calm/emotional trials, we average over expected number of calm/emotional trials - this is what your term t/2 represents.
I say that you got there from different starting points because, as best I can tell, Chris got there by starting from thinking about how biased averages could be corrected, and, as best I can tell, you started from thinking about how unbiased sums could be capitalised upon.
No, I think what was in my mind was more a useful way of rescaling the unbiased sums. Strictly I don't think the sum divided by the expected number of trials should be called an average.
Anyhow, the good news is that Linda now seems to be happy with Jay's statement about the sums being unbiased. So we don't need to have any more discussions about sums of no elements. :D:D:D:D:D
The bad news is that she now seems to be suggesting that the bias is actually due to the averages being undefined for sequences of all calm or all emotional trials. :( (Though what she says isn't very clear and anyway it may have changed by tomorrow. ;))
That's not what causes the bias. It's caused by the fact that the response tends to be lower overall for sequences with more emotional trials.
Any sensitivity to assumptions about all-calm or all-emotional sequences will be absolutely tiny in real experiments, because such sequences would be incredibly rare.
You've written this a few times now and I've never understood what you meant, and have always suspected that it stems from a misunderstanding of your own, but have never engaged you on it until now, so here goes.
Which response do you mean? Response over calm trials or over emotional trials? Doesn't, under these assumptions, the difference between these (calm/emotional responses) cancel out, regardless of whether there are fewer or more emotional trials? And isn't the true cause of the bias as identified by Dalkvist et al: that there is a fundamental difference in the averages (of calm/emotional trials) resulting from the two similar scenarios where a string of N calm trials ends in a calm trial versus when that string of N calm trials ends instead in an emotional trial? And yes, it's more complicated than this, but isn't this the essence of it?
Seq   Avg(C)  Avg(E)
--------------------
CCCC  3/2     n/a
CCCE  1       3
CCEC  1/3     2
CCEE  1/2     1
CECC  1/3     1
CECE  0       1
CEEC  0       1/2
CEEE  0       1/3
ECCC  1       0
ECCE  1/2     1
ECEC  0       1/2
ECEE  0       1/3
EECC  1/2     0
EECE  0       1/3
EEEC  0       0
EEEE  n/a     0
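For anyone who wants to check the table, here is a minimal Python sketch. It assumes the carryover model described in this thread with an increment of 1 and no true effect: each response equals the number of consecutive calm trials immediately before it, and an emotional trial resets the count. The function names are mine, purely for illustration.

```python
from fractions import Fraction
from itertools import product


def responses(seq):
    """Carryover-only response for each trial: the number of consecutive
    calm trials ('C') immediately before it; an 'E' resets the count."""
    out, run = [], 0
    for trial in seq:
        out.append(run)
        run = run + 1 if trial == "C" else 0
    return out


def avg_by_type(seq, kind):
    """Average response over trials of the given kind, or None if there are none."""
    vals = [r for t, r in zip(seq, responses(seq)) if t == kind]
    return Fraction(sum(vals), len(vals)) if vals else None


# Enumerate all 16 sequences of length 4 in the same order as the table.
table = {}
for trials in product("CE", repeat=4):
    seq = "".join(trials)
    table[seq] = (avg_by_type(seq, "C"), avg_by_type(seq, "E"))

for seq, (avg_c, avg_e) in table.items():
    print(seq,
          avg_c if avg_c is not None else "n/a",
          avg_e if avg_e is not None else "n/a")
```

Exact fractions are used so the printed values can be compared with the table directly.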
#Cs  Avgs(C)                 Avgs(E)
-------------------------------------------------
4    3/2                     n/a
3    1, 1/3, 1/3, 1          3, 2, 1, 0
2    1/2, 0, 0, 1/2, 0, 1/2  1, 1, 1/2, 1, 1/2, 0
1    0, 0, 0, 0              1/3, 1/3, 1/3, 0
0    n/a                     0
#Cs  Avg(Avgs(C))  Avg(Avgs(E))  #sequences  Sum(Avgs(C))  Sum(Avgs(E))
4    3/2           n/a           1           3/2           n/a
3    2/3           3/2           4           8/3           6
2    1/4           2/3           6           3/2           4
1    0             1/4           4           0             1
0    n/a           0             1           n/a           0
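The grouped summary can be reproduced with a short sketch along the following lines. It assumes the same carryover-only model (increment of 1 after a calm trial, reset after an emotional one, no true effect); `type_averages` and `summarise` are illustrative names, not anything from the paper.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def type_averages(seq):
    """Per-sequence Avg(C) and Avg(E) under the assumed carryover model."""
    run = 0
    sums = {"C": 0, "E": 0}
    counts = {"C": 0, "E": 0}
    for trial in seq:
        sums[trial] += run
        counts[trial] += 1
        run = run + 1 if trial == "C" else 0
    return {k: Fraction(sums[k], counts[k]) if counts[k] else None for k in "CE"}


def summarise(n):
    """Group all length-n sequences by their number of calm trials."""
    groups = defaultdict(list)
    for trials in product("CE", repeat=n):
        groups[trials.count("C")].append(type_averages(trials))
    rows = {}
    for n_c, avgs in groups.items():
        row = {"n_seq": len(avgs)}
        for kind in "CE":
            vals = [a[kind] for a in avgs if a[kind] is not None]
            row["sum_" + kind] = sum(vals) if vals else None
            row["avg_" + kind] = sum(vals) / len(vals) if vals else None
        rows[n_c] = row
    return rows


rows = summarise(4)
for n_c in sorted(rows, reverse=True):
    print(n_c, rows[n_c])
```

Within each group the average of Avg(E) comes out higher than the average of Avg(C), which is the pattern the table shows.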
The bad news is that she now seems to be suggesting that the bias is actually due to the averages being undefined for sequences of all calm or all emotional trials.
I think it might be interesting to see an analysis of D*.
What I was drawing attention to was that, despite the unbiased definition, the carryover will show up in almost every realization of the experiment, and that the carryover isn't produced by the bias that comes from sequences with undefined averages.
Seq  Avg(C)  Avg(E)  Diff
-------------------------
CC   1/2     n/a     n/a
CE   0       1       1
EC   0       0       0
EE   n/a     0       n/a
Avg  1/6     1/3
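To put the "unbiased sums" point next to this table, here is a rough sketch. It assumes D* means the sum of responses over emotional trials minus the sum over calm trials, each divided by the expected number of trials of that type, t/2; that reading is my assumption based on the earlier posts. The carryover model is the same as before (increment of 1, no true effect).

```python
from fractions import Fraction
from itertools import product


def type_sums(seq):
    """Sum of carryover responses over C trials and over E trials."""
    run, sums = 0, {"C": 0, "E": 0}
    for trial in seq:
        sums[trial] += run
        run = run + 1 if trial == "C" else 0
    return sums


def d_star(seq):
    """Assumed form of D*: (sum over E - sum over C) / (t/2)."""
    s = type_sums(seq)
    return Fraction(s["E"] - s["C"]) / Fraction(len(seq), 2)


def d_star_values(t):
    """D* for every sequence of length t."""
    return {"".join(p): d_star(p) for p in product("CE", repeat=t)}


for t in (2, 4):
    values = d_star_values(t)
    mean = sum(values.values()) / len(values)
    nonzero = sum(1 for v in values.values() if v != 0)
    print(f"t={t}: mean D* = {mean}, non-zero in {nonzero} of {len(values)} sequences")
```

The mean over the whole sample space is zero even though individual sequences give non-zero values, which is the distinction between an unbiased estimator and a carryover-free realization.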
Well, that's essentially what I think too. Let µ be the true average per-person difference between the response for an activating and a calm trial. Let S be a finite sample space of the type of experiment and carryover we've been talking about. Then, using the notation of my most recent post, as an estimator of µ, D̅ = A̅ – C̅ is undefined on S because of the two sequences, Xa and Xc, in S that are comprised exclusively of activating or calm stimuli, respectively. Therefore, we have to do one of two things. The first thing we can do is decide a priori that if any Xa's or Xc's appear in our data set, we'll throw them away. That amounts to changing the sample space to S' = S \ {Xa, Xc}. Then we can compute D̅ on our sample from S'; however E[D̅(S')] ≠ µ.
The other thing we can do is to agree a priori to use two constants, c1 and c2, to represent the otherwise uncomputable C̅a and A̅c of any Xa's and Xc's, respectively, that appear in our data set. Then we can compute D̅ on the original sample space S, but then E[D̅(S)] ≠ µ unless we make miraculous choices for c1 and c2.
To summarize, any practical thing we can do to create a well-defined estimator D̅, results in E[D̅] ≠ µ, and the only reason we have to do anything at all is because of the two sequences comprising either all activating or all calm stimuli. So the cause of the bias is the existence of these two sequences in the sample space that force us to create a biased estimator.
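The two options above can be tried out numerically. This is only a sketch under stated assumptions: the carryover-only model used elsewhere in the thread (increment of 1, reset after an activating trial), a true effect µ of zero so that any non-zero mean is bias, and all sequences equally likely. `mean_d_bar` and its `substitute` argument are hypothetical names.

```python
from fractions import Fraction
from itertools import product


def type_averages(seq):
    """Return (C-bar, A-bar) for one sequence; None where undefined."""
    run = 0
    sums = {"C": 0, "E": 0}
    counts = {"C": 0, "E": 0}
    for trial in seq:
        sums[trial] += run
        counts[trial] += 1
        run = run + 1 if trial == "C" else 0
    avg = {k: Fraction(sums[k], counts[k]) if counts[k] else None for k in "CE"}
    return avg["C"], avg["E"]


def mean_d_bar(t, substitute=None):
    """Mean of D-bar = A-bar - C-bar over equally likely length-t sequences.

    substitute=None     : drop the all-E and all-C sequences (sample space S').
    substitute=(c1, c2) : keep them, using c1 for the missing C-bar of the
                          all-E sequence and c2 for the missing A-bar of the
                          all-C sequence (sample space S).
    """
    diffs = []
    for seq in product("CE", repeat=t):
        c_bar, a_bar = type_averages(seq)
        if c_bar is None or a_bar is None:
            if substitute is None:
                continue
            c1, c2 = substitute
            c_bar = Fraction(c1) if c_bar is None else c_bar
            a_bar = Fraction(c2) if a_bar is None else a_bar
        diffs.append(a_bar - c_bar)
    return sum(diffs) / len(diffs)


print("drop undefined sequences:", mean_d_bar(4))
print("substitute c1 = c2 = 0  :", mean_d_bar(4, (0, 0)))
```

Under these assumptions both options give a non-zero mean for ordinary choices of c1 and c2, and only a very particular pair of constants brings it back to zero.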
The correlations I've described above - which relate to all the possible sequences - will clearly lead to bias, so whether the treatment of these two exceptional sequences introduces a bit of extra bias seems a rather academic point.
#Cs  Avg(Avgs(C))  Avg(Avgs(E))  #sequences  Sum(Avgs(C))  Sum(Avgs(E))
17   8             n/a           1           8             n/a
16   5             8             17          85            136
15   3.5           5             136         476           680
14   2.6           3.5           680         1768          2380
13   2             2.6           2380        4760          6188
12   1.571429      2             6188        9724          12376
11   1.25          1.571429      12376       15470         19448
10   1             1.25          19448       19448         24310
9    0.8           1             24310       19448         24310
8    0.636364      0.8           24310       15470         19448
7    0.5           0.636364      19448       9724          12376
6    0.384615      0.5           12376       4760          6188
5    0.285714      0.384615      6188        1768          2380
4    0.2           0.285714      2380        476           680
3    0.125         0.2           680         85            136
2    0.058824      0.125         136         8             17
1    0             0.058824      17          0             1
0    n/a           0             1           n/a           0
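The 17-trial table can be regenerated by brute force over all 2^17 sequences. The sketch below assumes the same carryover-only model (increment of 1, reset after an emotional trial). The closed forms it checks, (k-1)/(m+2) for calm trials and k/(m+1) for emotional trials with k calm and m emotional trials, are my own observation about the pattern in the table rather than something stated in the thread.

```python
from fractions import Fraction


def group_table(n):
    """For each number of calm trials k, return
    (average of Avg(C), average of Avg(E), number of sequences)."""
    sum_c = [0] * (n + 1)  # per k: C-trial responses summed over all sequences
    sum_e = [0] * (n + 1)  # per k: E-trial responses summed over all sequences
    count = [0] * (n + 1)
    for code in range(2 ** n):
        run = k = s_c = s_e = 0
        for i in range(n):
            if (code >> i) & 1:  # bit set means a calm trial
                s_c += run
                run += 1
                k += 1
            else:                # emotional trial resets the carryover
                s_e += run
                run = 0
        sum_c[k] += s_c
        sum_e[k] += s_e
        count[k] += 1
    table = {}
    for k in range(n + 1):
        m = n - k
        # Every sequence in a group has the same k, so the average of the
        # per-sequence averages is the group total divided by k * count.
        avg_c = Fraction(sum_c[k], k * count[k]) if k else None
        avg_e = Fraction(sum_e[k], m * count[k]) if m else None
        table[k] = (avg_c, avg_e, count[k])
    return table


table17 = group_table(17)
for k in sorted(table17, reverse=True):
    avg_c, avg_e, n_seq = table17[k]
    print(k,
          float(avg_c) if avg_c is not None else "n/a",
          float(avg_e) if avg_e is not None else "n/a",
          n_seq)
```

The loop takes a second or two in plain Python; it works on integer totals and divides only at the end, so the fractions are exact.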
Any sensitivity to assumptions about all-calm or all-emotional sequences will be absolutely tiny in real experiments, because such sequences would be incredibly rare.
There are so many errors in Linda's most recent post that I thought that, because this is a confusing enough subject to understand without misinformation, it was important to go through and point them all out.
What about D* would you like to see analyzed?
I think what you're getting at is along the lines of something I've stated twice now, but which no one has yet reacted to, so I'm not sure if its importance has registered for anybody.
So, once again:
First, let S be the sample space of all possible sequences in a trial that employs s-length sequences. Let Q be the set of sequences that are realized in a particular experiment. Since carryover differs from sequence to sequence, then even though D* is an unbiased estimator of µ, if Q is randomly generated then it is unlikely that D*|Q ("D* given Q") will be biased. This is why Q should not be randomly generated; it should be fixed, and unless a subset of Q known to be unbiased under the assumed carryover model can be found, then Q should equal S.
Second, all the discussion so far has tacitly assumed that the carryover effect is identical from subject to subject; however, that assumption is implausible. Let's assume the carryover effect takes this form: the response is incremented by c after a calm trial and reset to baseline after an activating trial. Then it is almost certain that the value of c will differ from subject to subject. But carryover also differs from sequence to sequence. So if a subject whose value of c is high, say, is randomized to a sequence with a lot of consecutive Cs and hence a large carryover effect, this will bias D*. That is to say, if P is the set of participants in the experiment, then D*|P may be biased. The way to overcome this is to randomize many subjects to each sequence in the experiment.
The above two considerations taken together imply that, to be reasonably confident that D*|S,P ("D* given S and P") is approximately unbiased, the sequence length in the experiment should be short, so that the size of S is small, so that many subjects may be randomized to each of the fixed set of sequences S. But my suspicion is that parapsychologists are doing the exact opposite: generating long, random sequences, with the number of participants much less than the number of sequences in S.
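A small enumeration can illustrate the point about Q. The sketch assumes D* is the difference of sums divided by t/2 (my reading of the earlier posts), the carryover-only model with an increment of 1, and, as a simplification, that Q consists of distinct sequences drawn without replacement from S. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations, product


def d_star(seq):
    """Assumed form of D*: (sum over E - sum over C) / (t/2), carryover only."""
    run, sums = 0, {"C": 0, "E": 0}
    for trial in seq:
        sums[trial] += run
        run = run + 1 if trial == "C" else 0
    return Fraction(sums["E"] - sums["C"]) / Fraction(len(seq), 2)


t, q_size = 4, 4
space = ["".join(p) for p in product("CE", repeat=t)]

# Mean D* for every possible realized set Q of q_size distinct sequences.
means = [sum(d_star(s) for s in q) / q_size
         for q in combinations(space, q_size)]

share_zero = Fraction(sum(1 for m in means if m == 0), len(means))
overall = sum(means) / len(means)
full_space = sum(d_star(s) for s in space) / len(space)

print("number of possible Q:", len(means))
print("share of Q with mean D* exactly 0:", share_zero)
print("mean over all Q:", overall)
print("mean over the full space S:", full_space)
```

Averaged over every possible Q the carryover cancels, as it does when Q equals S, but many individual choices of Q leave a non-zero carryover term in D*.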
Did you mean for the bolded word to be "unbiased"?
Some are suggesting single trials with a large number of participants.
That is really what they should do. I didn't bring it up, because I figured I'd just get the usual "parapsychology doesn't have the money" response.
The correlations I've described above - which relate to all the possible sequences - will clearly lead to bias, so whether the treatment of these two exceptional sequences introduces a bit of extra bias seems a rather academic point.