fitlme different to lmer in R
23 次查看(过去 30 天)
显示 更早的评论
I am trying to illustrate Simpsons Paradox using fitlme:
dat = [
8.7050 21.0000 1.0000
7.3362 22.0000 1.0000
6.2369 23.0000 1.0000
4.8375 24.0000 1.0000
4.9990 25.0000 1.0000
13.5644 31.0000 2.0000
12.5219 32.0000 2.0000
12.3329 33.0000 2.0000
11.4778 34.0000 2.0000
10.0626 35.0000 2.0000
18.9573 41.0000 3.0000
17.9516 42.0000 3.0000
15.1706 43.0000 3.0000
16.1398 44.0000 3.0000
15.1729 45.0000 3.0000
23.7340 51.0000 4.0000
23.6630 52.0000 4.0000
22.4108 53.0000 4.0000
21.0697 54.0000 4.0000
20.3200 55.0000 4.0000
28.0275 61.0000 5.0000
28.0699 62.0000 5.0000
27.3365 63.0000 5.0000
25.8270 64.0000 5.0000
24.5141 65.0000 5.0000];
tab = array2table(dat,'VariableNames',{'y','Age','Participant'})
figure,plot(tab.Age, tab.y,'o')
%tab.Participant = nominal(tab.Participant) % doesn't have any effect
m = fitlme(tab,'y ~ Age + (1|Participant)','FitMethod','REML')
but I get a positive coefficient for the fixed effect of Age, when I was expecting a negative one:
Linear mixed-effects model fit by REML
Model information:
Number of observations 25
Fixed effects coefficients 2
Random effects coefficients 5
Covariance parameters 2
Formula:
y ~ 1 + Age + (1 | Participant)
Model fit statistics:
AIC BIC LogLikelihood Deviance
120.67 125.21 -56.334 112.67
Fixed effects coefficients (95% CIs):
Name Estimate SE tStat DF pValue Lower Upper
'(Intercept)' -4.4658 1.3833 -3.2283 23 0.0037183 -7.3274 -1.6042
'Age' 0.49496 0.030545 16.204 23 4.4887e-14 0.43177 0.55815
Random effects covariance parameters (95% CIs):
Group: Participant (5 Levels)
Name1 Name2 Type Estimate Lower Upper
'(Intercept)' '(Intercept)' 'std' 2.4099e-16 NaN NaN
Group: Error
Name Estimate Lower Upper
'Res Std' 2.1706 1.6259 2.898
If I run what I think is equivalent model using "lmer" in R on exactly the same data, I get a negative coefficient for Age, as expected:
summary(lmer('y ~ Age + (1|Participant)', data = tab)) # "tab" is dataframe version of tab above
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ Age + (1 | Participant)
Data: tab
REML criterion at convergence: 88.2
Scaled residuals:
Min 1Q Median 3Q Max
-1.8806 -0.5016 0.1545 0.7367 1.3270
Random effects:
Groups Name Variance Std.Dev.
Participant (Intercept) 673.2727 25.9475
Residual 0.4153 0.6445
Number of obs: 25, groups: Participant, 5
Fixed effects:
Estimate Std. Error t value
(Intercept) 65.94094 12.24102 5.387
Age -1.13831 0.09058 -12.567
Correlation of Fixed Effects:
(Intr)
Age -0.318
What am I doing wrong with fitlme?
2 个评论
the cyclist
2024-6-25
I don't have an answer for you; instead, I'm going to make it even more puzzling.
If I include only the first three participants, I get the sign you expect.
dat = [
8.7050 21.0000 1.0000
7.3362 22.0000 1.0000
6.2369 23.0000 1.0000
4.8375 24.0000 1.0000
4.9990 25.0000 1.0000
13.5644 31.0000 2.0000
12.5219 32.0000 2.0000
12.3329 33.0000 2.0000
11.4778 34.0000 2.0000
10.0626 35.0000 2.0000
18.9573 41.0000 3.0000
17.9516 42.0000 3.0000
15.1706 43.0000 3.0000
16.1398 44.0000 3.0000
15.1729 45.0000 3.0000
% 23.7340 51.0000 4.0000
% 23.6630 52.0000 4.0000
% 22.4108 53.0000 4.0000
% 21.0697 54.0000 4.0000
% 20.3200 55.0000 4.0000
% 28.0275 61.0000 5.0000
% 28.0699 62.0000 5.0000
% 27.3365 63.0000 5.0000
% 25.8270 64.0000 5.0000
% 24.5141 65.0000 5.0000
];
tab = array2table(dat,'VariableNames',{'y','Age','Participant'});
figure
plot(tab.Age, tab.y,'o')
% tab.Participant = nominal(tab.Participant) % doesn't have any effect
m = fitlme(tab,'y ~ Age + (1|Participant)','FitMethod','REML')
the cyclist
2024-6-25
Also, if I increase the size of your dataset (still including participants 4 & 5):
dat = repmat(dat,2,1);
and run the model, I also get the sign you expect.
I wonder if MATLAB is somehow getting stuck in a local minimum solution, on the original dataset.
回答(1 个)
the cyclist
2024-6-25
编辑:the cyclist
2024-6-25
Disclaimer: I don't fully understand the specifics on why you are seeing what you are.
Thinking about my comment about being stuck in a local minimum, I remembered that one can randomize the start. Here, I do that, run your code 500 times, and see what age coefficient I get. Clearly, things are a bit dicy.
The model with the negative age coefficient, is definitely better, with a much lower AIC.
rng default
dat = [
8.7050 21.0000 1.0000
7.3362 22.0000 1.0000
6.2369 23.0000 1.0000
4.8375 24.0000 1.0000
4.9990 25.0000 1.0000
13.5644 31.0000 2.0000
12.5219 32.0000 2.0000
12.3329 33.0000 2.0000
11.4778 34.0000 2.0000
10.0626 35.0000 2.0000
18.9573 41.0000 3.0000
17.9516 42.0000 3.0000
15.1706 43.0000 3.0000
16.1398 44.0000 3.0000
15.1729 45.0000 3.0000
23.7340 51.0000 4.0000
23.6630 52.0000 4.0000
22.4108 53.0000 4.0000
21.0697 54.0000 4.0000
20.3200 55.0000 4.0000
28.0275 61.0000 5.0000
28.0699 62.0000 5.0000
27.3365 63.0000 5.0000
25.8270 64.0000 5.0000
24.5141 65.0000 5.0000
];
tab = array2table(dat,'VariableNames',{'y','Age','Participant'});
figure
plot(tab.Age, tab.y,'o')
NS = 200;
[aic,ageCoefficient] = deal(zeros(NS,1));
for ns = 1:NS
% tab.Participant = nominal(tab.Participant) % doesn't have any effect
m = fitlme(tab,'y ~ Age + (1|Participant)','FitMethod','REML','StartMethod','random');
ageCoefficient(ns) = m.Coefficients.Estimate(2);
aic(ns) = m.ModelCriterion.AIC;
end
figure
histogram(ageCoefficient)
xlabel('Age coefficient')
ylabel('Frequency')
figure
histogram(aic)
xlabel('AIC')
ylabel('Frequency')
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Analysis of Variance and Covariance 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!