Bootstrap alternative to t-test dependent measures does not produce same p values

Hello there and thanks for your interest in my question!
I would like to use a bootstrap alternative to a t-test for dependent/repeated measures. I wrote the following code, where the basic idea is to randomly assign the condition for each data pair:
a(1,:) = [3.4 3.5 3.4 3.6 3.2 3.5 3.5 2.8 3.1 3.5 3.4 3.6 3.1 3.1 3.5 3.8 3.5 3.4 3.3 3.5 3.5];
a(2,:) = [3.4 3.7 4.1 3.6 3.6 3.4 3.2 3.4 3.4 3.9 3.3 3.7 4.1 3.8 3.4 3.7 3.4 3.1 3.1 3.9 4];
a = a';
for i = 1:10000
    realDiff = abs(mean(a(:,1)-a(:,2)));
    shuffIdx1 = repmat([0 1],1,ceil(length(a)/2));
    shuffIdx2 = randperm(length(shuffIdx1));
    Idx = shuffIdx1(shuffIdx2(1:(length(a))))';
    for x = 1:length(Idx)
        if Idx(x) == 0
            b(x,1) = a(x,1);
            b(x,2) = a(x,2);
        else
            b(x,1) = a(x,2);
            b(x,2) = a(x,1);
        end
    end
    shuffDiff(i) = abs(mean(b(:,1)-b(:,2)));
end
p = (mean(shuffDiff > realDiff));
fprintf(num2str(p));
With the data set above, I get a p-value of around 0.011, but running a t-test, I get p = .020.
Any idea if I am doing something wrong?
Thanks

Answers (1)

Jeff Miller 2020-5-5
  1. The p values are not that far off, given that the t-test is based on the assumption of normally distributed difference scores and your actual a(1,:)-a(2,:) differences do not appear to be approximately normal. The p values might well differ because of that.
  2. Your code does not really do bootstrapping, as that term is normally used. In bootstrapping the data are resampled with replacement so that the same pair could potentially occur several times in the resampled data set (and some pairs could be left out). Your code appears to include each pair exactly once in all samples.
  3. What you are doing is closer to what is usually called a "randomization test", where each score is randomly assigned to one of the two conditions. However, your code is not exactly doing that either. In a standard randomization test, each pair would be swapped or not randomly and independently of all the others, so you might get a few more swaps in one sample and a few fewer in another. If I am reading it correctly, your code swaps almost exactly half of the pairs (10 or 11 of the 21) in every sample.
  4. Note that you can take a few statements out of the for loop, which would speed things up a bit. For example, these two only need to be computed once, not on each pass through the loop:
realDiff = abs(mean(a(:,1)-a(:,2)));
shuffIdx1 = repmat([0 1],1,ceil(length(a)/2));
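
To make points 2 to 4 concrete, here is a minimal sketch of both approaches, not a definitive implementation. The names x, y, d, nRep, tol, pRand and pBoot are my own choices for illustration. The "+1" in the p-value formulas is one common Monte Carlo convention, and centering the differences before bootstrapping is one assumed way of imposing the null hypothesis; other variants exist.

```matlab
% Paired data from the question: one vector per condition.
x = [3.4 3.5 3.4 3.6 3.2 3.5 3.5 2.8 3.1 3.5 3.4 3.6 3.1 3.1 3.5 3.8 3.5 3.4 3.3 3.5 3.5];
y = [3.4 3.7 4.1 3.6 3.6 3.4 3.2 3.4 3.4 3.9 3.3 3.7 4.1 3.8 3.4 3.7 3.4 3.1 3.1 3.9 4];

d = (x - y)';               % n-by-1 column of paired differences
n = numel(d);
nRep = 10000;               % number of resamples (arbitrary choice)
realDiff = abs(mean(d));    % observed statistic, computed once outside any loop
tol = 1e-12;                % guards against floating-point ties in the comparisons

% Randomization test: swapping the conditions within a pair flips the sign
% of that pair's difference, so draw an independent +1/-1 for every pair.
signs = 2 * (rand(n, nRep) < 0.5) - 1;      % n-by-nRep matrix of +1/-1
shuffDiff = abs(d' * signs) / n;            % 1-by-nRep absolute mean differences
pRand = (sum(shuffDiff >= realDiff - tol) + 1) / (nRep + 1);

% Bootstrap (assumed variant): center the differences so the null hypothesis
% of zero mean holds, then resample pairs with replacement.
dc = d - mean(d);
idx = randi(n, n, nRep);                    % each column indexes one resample
bootMeans = mean(dc(idx), 1);               % 1-by-nRep resampled means
pBoot = (sum(abs(bootMeans) >= realDiff - tol) + 1) / (nRep + 1);

fprintf('randomization p = %.4f, bootstrap p = %.4f\n', pRand, pBoot);
```

Because the swaps are drawn independently, the number of swapped pairs varies from one resample to the next. The results depend on the random draws, so expect the p values to change slightly between runs.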
