fitlme different to lmer in R

Question

Rik Henson 2024-6-25

https://ww2.mathworks.cn/matlabcentral/answers/2131881-fitlme-different-to-lmer-in-r

Commented: Rik Henson 2024-6-26

I am trying to illustrate Simpson's paradox using fitlme:

dat = [
    8.7050   21.0000    1.0000
    7.3362   22.0000    1.0000
    6.2369   23.0000    1.0000
    4.8375   24.0000    1.0000
    4.9990   25.0000    1.0000
   13.5644   31.0000    2.0000
   12.5219   32.0000    2.0000
   12.3329   33.0000    2.0000
   11.4778   34.0000    2.0000
   10.0626   35.0000    2.0000
   18.9573   41.0000    3.0000
   17.9516   42.0000    3.0000
   15.1706   43.0000    3.0000
   16.1398   44.0000    3.0000
   15.1729   45.0000    3.0000
   23.7340   51.0000    4.0000
   23.6630   52.0000    4.0000
   22.4108   53.0000    4.0000
   21.0697   54.0000    4.0000
   20.3200   55.0000    4.0000
   28.0275   61.0000    5.0000
   28.0699   62.0000    5.0000
   27.3365   63.0000    5.0000
   25.8270   64.0000    5.0000
   24.5141   65.0000    5.0000];
tab = array2table(dat,'VariableNames',{'y','Age','Participant'})
figure,plot(tab.Age, tab.y,'o')
%tab.Participant = nominal(tab.Participant) % doesn't have any effect
m = fitlme(tab,'y ~ Age + (1|Participant)','FitMethod','REML')

but I get a positive coefficient for the fixed effect of Age, when I was expecting a negative one:

Linear mixed-effects model fit by REML
Model information:
    Number of observations              25
    Fixed effects coefficients           2
    Random effects coefficients          5
    Covariance parameters                2
Formula:
    y ~ 1 + Age + (1 | Participant)
Model fit statistics:
    AIC       BIC       LogLikelihood    Deviance
    120.67    125.21    -56.334          112.67  
Fixed effects coefficients (95% CIs):
    Name                 Estimate    SE          tStat      DF    pValue        Lower      Upper  
    '(Intercept)'        -4.4658       1.3833    -3.2283    23     0.0037183    -7.3274    -1.6042
    'Age'                0.49496     0.030545     16.204    23    4.4887e-14    0.43177    0.55815
Random effects covariance parameters (95% CIs):
Group: Participant (5 Levels)
    Name1                Name2                Type         Estimate      Lower    Upper
    '(Intercept)'        '(Intercept)'        'std'        2.4099e-16    NaN      NaN  
Group: Error
    Name             Estimate    Lower     Upper
    'Res Std'        2.1706      1.6259    2.898

If I run what I think is the equivalent model using "lmer" in R on exactly the same data, I get a negative coefficient for Age, as expected:

summary(lmer('y ~ Age + (1|Participant)', data = tab)) # "tab" is dataframe version of tab above
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ Age + (1 | Participant)
   Data: tab
REML criterion at convergence: 88.2
Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.8806 -0.5016  0.1545  0.7367  1.3270 
Random effects:
 Groups      Name        Variance Std.Dev.
 Participant (Intercept) 673.2727 25.9475 
 Residual                  0.4153  0.6445 
Number of obs: 25, groups:  Participant, 5
Fixed effects:
            Estimate Std. Error t value
(Intercept) 65.94094   12.24102   5.387
Age         -1.13831    0.09058 -12.567
Correlation of Fixed Effects:
    (Intr)
Age -0.318

What am I doing wrong with fitlme?

2 Comments

the cyclist 2024-6-25

I don't have an answer for you; instead, I'm going to make it even more puzzling.

If I include only the first three participants, I get the sign you expect.

dat = [
8.7050 21.0000 1.0000
7.3362 22.0000 1.0000
6.2369 23.0000 1.0000
4.8375 24.0000 1.0000
4.9990 25.0000 1.0000
13.5644 31.0000 2.0000
12.5219 32.0000 2.0000
12.3329 33.0000 2.0000
11.4778 34.0000 2.0000
10.0626 35.0000 2.0000
18.9573 41.0000 3.0000
17.9516 42.0000 3.0000
15.1706 43.0000 3.0000
16.1398 44.0000 3.0000
15.1729 45.0000 3.0000
% 23.7340 51.0000 4.0000
% 23.6630 52.0000 4.0000
% 22.4108 53.0000 4.0000
% 21.0697 54.0000 4.0000
% 20.3200 55.0000 4.0000
% 28.0275 61.0000 5.0000
% 28.0699 62.0000 5.0000
% 27.3365 63.0000 5.0000
% 25.8270 64.0000 5.0000
% 24.5141 65.0000 5.0000
];
tab = array2table(dat,'VariableNames',{'y','Age','Participant'});
figure
plot(tab.Age, tab.y,'o')
% tab.Participant = nominal(tab.Participant) % doesn't have any effect
m = fitlme(tab,'y ~ Age + (1|Participant)','FitMethod','REML')

m =
Linear mixed-effects model fit by REML
Model information:
    Number of observations              15
    Fixed effects coefficients           2
    Random effects coefficients          3
    Covariance parameters                2
Formula:
    y ~ 1 + Age + (1 | Participant)
Model fit statistics:
    AIC       BIC       LogLikelihood    Deviance
    54.254    56.514    -23.127          46.254
Fixed effects coefficients (95% CIs):
    Name                 Estimate    SE         tStat      DF    pValue        Lower      Upper
    {'(Intercept)'}      41.176      8.9474     4.602      13    0.00049589    21.846     60.506
    {'Age'        }      -0.89328    0.11223    -7.9596    13    2.3634e-06    -1.1357    -0.65083
Random effects covariance parameters (95% CIs):
Group: Participant (3 Levels)
    Name1                Name2                Type       Estimate    Lower     Upper
    {'(Intercept)'}      {'(Intercept)'}      {'std'}    14.105      5.2253    38.073
Group: Error
    Name           Estimate    Lower      Upper
    {'Res Std'}    0.61861     0.40713    0.93996

the cyclist 2024-6-25

Also, if I increase the size of your dataset (still including participants 4 & 5):

dat = repmat(dat,2,1);

and run the model, I also get the sign you expect.

I wonder if MATLAB is somehow getting stuck in a local minimum solution, on the original dataset.
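
One way to probe this conjecture, offered as a minimal sketch rather than a diagnosis: the original fit reports a random-intercept standard deviation of essentially zero (2.4099e-16), and a mixed model with no random-effect variance reduces to pooled ordinary least squares. The code below compares the pooled fitlm slope with the within-participant slope obtained by entering Participant as a categorical fixed effect. The names pooled, within, tabc, pooledSlope and withinSlope are illustrative and do not appear in the thread; the data are the 25 rows posted in it.

```matlab
% Rebuild the 25-row dataset from this thread (5 participants x 5 ages).
y = [ 8.7050  7.3362  6.2369  4.8375  4.9990 ...
     13.5644 12.5219 12.3329 11.4778 10.0626 ...
     18.9573 17.9516 15.1706 16.1398 15.1729 ...
     23.7340 23.6630 22.4108 21.0697 20.3200 ...
     28.0275 28.0699 27.3365 25.8270 24.5141]';
Age = reshape((20:10:60) + (1:5)', [], 1);   % 21..25, 31..35, ..., 61..65
Participant = repelem((1:5)', 5);            % 1 1 1 1 1 2 2 2 2 2 ...
tab = table(y, Age, Participant);

% Pooled OLS: ignores the participant grouping entirely.
pooled = fitlm(tab, 'y ~ Age');

% Within-participant slope: a separate fixed intercept per participant.
tabc = tab;
tabc.Participant = categorical(tabc.Participant);
within = fitlm(tabc, 'y ~ Age + Participant');

pooledSlope = pooled.Coefficients{'Age','Estimate'};
withinSlope = within.Coefficients{'Age','Estimate'};
fprintf('Pooled slope:             %.5f\n', pooledSlope);
fprintf('Within-participant slope: %.5f\n', withinSlope);
```

If the pooled slope matches the positive Age estimate that fitlme reported (0.49496), that is consistent with the optimizer having stopped at the boundary solution where the random-intercept variance is zero, which is what the local-minimum explanation would predict.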

Answer 1

the cyclist 2024-6-25

https://ww2.mathworks.cn/matlabcentral/answers/2131881-fitlme-different-to-lmer-in-r#answer_1476906

Edited: the cyclist 2024-6-25

Disclaimer: I don't fully understand the specifics on why you are seeing what you are.

Thinking about my comment about being stuck in a local minimum, I remembered that one can randomize the start. Here, I do that, run your code 200 times, and see what Age coefficient I get. Clearly, things are a bit dicey.

The model with the negative Age coefficient is definitely better, with a much lower AIC.

rng default
dat = [
8.7050 21.0000 1.0000
7.3362 22.0000 1.0000
6.2369 23.0000 1.0000
4.8375 24.0000 1.0000
4.9990 25.0000 1.0000
13.5644 31.0000 2.0000
12.5219 32.0000 2.0000
12.3329 33.0000 2.0000
11.4778 34.0000 2.0000
10.0626 35.0000 2.0000
18.9573 41.0000 3.0000
17.9516 42.0000 3.0000
15.1706 43.0000 3.0000
16.1398 44.0000 3.0000
15.1729 45.0000 3.0000
23.7340 51.0000 4.0000
23.6630 52.0000 4.0000
22.4108 53.0000 4.0000
21.0697 54.0000 4.0000
20.3200 55.0000 4.0000
28.0275 61.0000 5.0000
28.0699 62.0000 5.0000
27.3365 63.0000 5.0000
25.8270 64.0000 5.0000
24.5141 65.0000 5.0000
];
tab = array2table(dat,'VariableNames',{'y','Age','Participant'});
figure
plot(tab.Age, tab.y,'o')
NS = 200;
[aic,ageCoefficient] = deal(zeros(NS,1));
for ns = 1:NS
    % tab.Participant = nominal(tab.Participant) % doesn't have any effect
    m = fitlme(tab,'y ~ Age + (1|Participant)','FitMethod','REML','StartMethod','random');
    ageCoefficient(ns) = m.Coefficients.Estimate(2);
    aic(ns) = m.ModelCriterion.AIC;
end
figure
histogram(ageCoefficient)
xlabel('Age coefficient')
ylabel('Frequency')
figure
histogram(aic)
xlabel('AIC')
ylabel('Frequency')
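
Because every fit in the loop uses the same formula and the same number of parameters, a lower AIC corresponds to a higher restricted log-likelihood. The sketch below uses that to keep a single model out of the random restarts instead of only plotting histograms. It is an illustration under assumptions, not part of the original answer: the variable best and the restart count of 50 are hypothetical choices, and it relies on the LogLikelihood property of the fitted LinearMixedModel object.

```matlab
% Rebuild the 25-row dataset from this thread (5 participants x 5 ages).
y = [ 8.7050  7.3362  6.2369  4.8375  4.9990 ...
     13.5644 12.5219 12.3329 11.4778 10.0626 ...
     18.9573 17.9516 15.1706 16.1398 15.1729 ...
     23.7340 23.6630 22.4108 21.0697 20.3200 ...
     28.0275 28.0699 27.3365 25.8270 24.5141]';
Age = reshape((20:10:60) + (1:5)', [], 1);   % 21..25, 31..35, ..., 61..65
Participant = repelem((1:5)', 5);            % 1 1 1 1 1 2 2 2 2 2 ...
tab = table(y, Age, Participant);

rng default          % reproducible random starting values
NS = 50;             % number of random restarts (arbitrary choice)
best = [];           % best model found so far

for ns = 1:NS
    m = fitlme(tab, 'y ~ Age + (1|Participant)', ...
        'FitMethod', 'REML', 'StartMethod', 'random');
    % Keep the fit with the highest restricted log-likelihood.
    if isempty(best) || m.LogLikelihood > best.LogLikelihood
        best = m;
    end
end

fprintf('Best restricted log-likelihood: %.3f\n', best.LogLikelihood);
fprintf('Age coefficient of best fit:    %.5f\n', best.Coefficients.Estimate(2));
```

Selecting by log-likelihood only guards against the restarts that end in a worse optimum; it does not guarantee that any of them reached the global one.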

3 Comments

Rik Henson 2024-6-26

Thank you so much for your time @the cyclist. I will explore the convergence problems you discovered (and perhaps see whether R is fitting differently).

Rik Henson 2024-6-26

I've played with the Optimizer options (and set FitMethod to REML, to match R's default), but still get the same problem of a local minimum. If I double the number of trials per participant (from 5 to 10), then I consistently get the correct negative coefficient for Age. So I guess there is still something different between MATLAB's optimiser and R's, and the latter is just more robust to small data sets? Thanks again for confirming that I wasn't just being stupid (or perhaps I still am...;-).
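
For readers who want to repeat this kind of experiment, here is a minimal sketch of passing optimizer settings to fitlme. It assumes the 'Optimizer', 'OptimizerOptions' and 'CheckHessian' name-value arguments and the defaults returned by statset('LinearMixedModel'). The tolerance value is an arbitrary illustration, not a recommendation, and the exact settings tried in the comment above are not stated, so this is not a reconstruction of them.

```matlab
% Rebuild the 25-row dataset from this thread (5 participants x 5 ages).
y = [ 8.7050  7.3362  6.2369  4.8375  4.9990 ...
     13.5644 12.5219 12.3329 11.4778 10.0626 ...
     18.9573 17.9516 15.1706 16.1398 15.1729 ...
     23.7340 23.6630 22.4108 21.0697 20.3200 ...
     28.0275 28.0699 27.3365 25.8270 24.5141]';
Age = reshape((20:10:60) + (1:5)', [], 1);   % 21..25, 31..35, ..., 61..65
Participant = repelem((1:5)', 5);            % 1 1 1 1 1 2 2 2 2 2 ...
tab = table(y, Age, Participant);

% Start from the default option structure and tighten one tolerance
% (illustrative value only).
opts = statset('LinearMixedModel');
opts.TolFun = 1e-10;

m = fitlme(tab, 'y ~ Age + (1|Participant)', ...
    'FitMethod', 'REML', ...
    'Optimizer', 'quasinewton', ...
    'OptimizerOptions', opts, ...
    'CheckHessian', true);    % may warn if the solution looks degenerate

% Inspect the estimated variance components of the fit.
[psi, mse] = covarianceParameters(m);
fprintf('Random-intercept SD: %.4g\n', sqrt(psi{1}));
fprintf('Residual SD:         %.4g\n', sqrt(mse));
```

A random-intercept SD near zero in this output would indicate the same boundary solution as in the original question, in line with the report above that changing optimizer options alone did not help.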
