REPEATED MEASURES ANOVA MATLAB

Question

Giulia Soresini 2019-11-3

1
链接

此问题的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/488962-repeated-measures-anova-matlab

评论： Adam Danz 2021-4-29

采纳的回答： Adam Danz

Q2.xls

在 MATLAB Online 中打开

Hi everyone!

I am having troubles understanding how to perform the correct anova test on my data. I have 16 subjects who are being treated with 2 different treatments. For each subject and each treatment measures are taken at 6 different time points (attached is the excel file). Is repeated measures anova the best way to go? If so, what is the most apporpriate command to use?

Here is the code I have so far. The results are kind of strange so I am not sure if it is wrong or I am just interpreting it the wrong way.

Thanks,

Giulia

 Q2 = xlsread('Q2.xls');
 Treatment = categorical(Q2(:,2)); 
 ID = Q2(:,1); 
 t = table(Treatment, Q2(:,3),Q2(:,4),Q2(:,5),Q2(:,6),Q2(:,7), Q2(:,8),'VariableNames', {'Treatment', 't1', 't2', 't3', 't4', 't5', 't6'});
 Time = [1 2 3 4 5 6]';
 rm = fitrm( t, 't1-t6~Treatment', 'WithinDesign', Time); 
 ranovatbl = ranova(rm); 
 disp(ranovatbl);
 multcompare(rm, 'Time', 'By', 'Treatment')

1 个评论
显示 -1更早的评论隐藏 -1更早的评论

Scott MacKenzie 2021-4-27

编辑：Scott MacKenzie 2021-4-27

Below is an Anova I just did on your data set (not using MATLAB). What you have is a 2 x 6 within-subjects design. You have two independent variables: Treatment (2 levels) and Time Point (6 levels). What was the dependent variable? In your question, you only say "measures are taken". Measures of what? Here, I'm assuming you measured something like reaction time or task completion time with milliseconds as the units.

Here is a summary of the key results: The effect of treatment on task completion time was not statistically significant, F(1,10) = 1.131, p > .05. The effect of time point on task completion time was statistically significant, F(5,50) = 5.127, p < .001. The Treatment x Time Point interaction effect was also statistically significant, F(5,50) = 6.478, p = .0001. See below for an Anova table for your data and experiment design.

I haven't figure out how to generate a table like this in MATLAB, so I'm using a different tool.

BTW, the attached file has data for 11 participants, not 16 as stated in your question.

请先登录，再进行评论。

请先登录，再回答此问题。

Answer 1

Adam Danz 2019-11-3

4
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/488962-repeated-measures-anova-matlab#answer_399591

编辑：Adam Danz 2021-4-16

在 MATLAB Online 中打开

Q2unlocked.xlsx

Is repeated measures anova the best way to go?

Your design is certainly applicable to a repeated measures design but the answer to that question depends on what you're testing.

It looks like you want to perform a repeated measures ANOVA with a covariate. In addition to patient number, the treatment type is the covariate, and the 6 time points is the within-subject repeated measure.

There are 4 null hypotheses to look at:

Patient number has no effect on the population mean
Treatment type has no effect on the population mean
There is no interaction between patient number and treatment type (usually the most important question)
There is no 3-way interaction between patient number, treatment type, and treatment time.

Setting up a RM Anova with a convariate in Matlab

Your data are attached in a file named Q2unlocked.xlsx. The following lines set up a RM Anova with a covariate (written in r2019b).

% Read in the table
Q2 = readtable('Q2unlocked.xlsx');
% Replace the 6 measurment headers 
Q2.Properties.VariableNames(3:8) = {'t1', 't2', 't3', 't4', 't5', 't6'}; 
% The within-subjects design
withinDsgn = table((1:6)','VariableNames',{'Time'});  
% Run the RM Anova (note the interaction between PATIENT and TREAT!)
rm = fitrm(Q2, 't1-t6~PATIENT*TREAT', 'WithinDesign', withinDsgn); 

Also see a similar example provided by Matlab.

Testing for assumptions

Before we look at the results, you must make sure your data are appropriate for a RM Anova by confirming the following assumptions.

Independent observations: You collected the data randomly and without bias
Normality: The measurments have an approximately normal distribution
Sphericity: we'll use the Mauchly test.

% Are the data approximately normal? This should ideally be done for each factor

histogram(reshape(Q2{:,3:end},[],1))

A small rightward tail but not too bad.

% Does the data pass the sphericity test?
rm.mauchly
ans = 1×4 table
       W       ChiStat    DF    pValue 
    _______    _______    __    _______

    0.48842    11.537     14    0.64344

Yes: a pValue < 0.05 would fail

https://www.mathworks.com/help/stats/mauchlys-test-of-sphericity.html

RM Anova results

ranova(rm) or rm.ranova produce the results table showing the p-value and then 3 adjusted p-values depending on whether the the response variables have different variances (symmetry assumption).

% Produce ranova table
rm.ranova
ans = 5×8 table
                            SumSq       DF      MeanSq        F        pValue     pValueGG    pValueHF    pValueLB
                          __________    __    __________    ______    ________    ________    ________    ________

    (Intercept):Time      3.9654e+06     5    7.9308e+05    2.2798    0.053221    0.071668    0.053221    0.14843 
    PATIENT:Time          4.0894e+06     5    8.1788e+05    2.3511    0.047012    0.064753    0.047012    0.14259 
    TREAT:Time            3.2657e+06     5    6.5314e+05    1.8775     0.10606     0.12652     0.10606    0.18747 
    PATIENT:TREAT:Time    3.9524e+06     5    7.9048e+05    2.2723    0.053916    0.072433    0.053916    0.14905 
    Error(Time)           3.1309e+07    90    3.4788e+05                                                          

Row 1 representing all differences across the within-subjects factors (treatment times). It has a p value of 0.053 which is just beyond the socially accepted threshold of 0.05. A boxplot will show us if this value seems reasonable.

figure()

boxplot(Q2{:,3:end})

xlabel('Treatment times')

ylabel('measurement value')

title('Data pooled between all Patients and treatment factors')

It doesn't look like there's much of an effect of treatment time when factors are combined

Row 2 shows the interaction between Patient and treatment times and is p=0.047

bpdata = [];

for i = 1:max(Q2.PATIENT) %assuming patient numbers are 1:max

bpdata = [bpdata, Q2{Q2.PATIENT==i,3:8},nan(size(unique(Q2.TREAT)))];

end

figure()

boxplot(bpdata)

arrayfun(@xline,7:7:size(bpdata,2))

xlabel('6 treatment times across 11 patients')

ylabel('measurement value')

title('Data pooled between treatment factors')

set(gca,'XTick', [])

Clearly the difference between the 6 treatment times differs across subjects (Treatment type combined)

Row 3 shows the interaction between treatment type and treatment times and is p=0.106.

bpdata = [];

for i = 1:max(Q2.TREAT) %assuming TREAT numbers are 1:max

bpdata = [bpdata, Q2{Q2.TREAT==i,3:8},nan(size(unique(Q2.PATIENT)))];

end

figure()

boxplot(bpdata)

xline(7)

xlabel('6 treatment times across 2 TREAT factors')

ylabel('measurement value')

title('Data pooled between patients')

set(gca,'XTick', [])

Other than the 2nd treatment factor there is no difference between the 2 Treatment types.

Row 4 shows the 3-way interaction between patient, treatment type, and treatment times.

14 个评论
显示 12更早的评论隐藏 12更早的评论

Scott MacKenzie 2021-4-27

编辑：Scott MacKenzie 2021-4-28

Adam, sorry, but concerning your comment on the degrees of freedom, this is simply wrong. Remember, we are talking about the main effect of Treatment in Giulia's data set. Treatment is the primary independent variable. Determining whether or not it has a signficant effect on the dependent variable is the main reason for doing the analysis of variance in the first place. But, your Anova table provides no insight on this issue.

It is a simple fact that the first degree of freedom in the F-statistic for a main effect is n-1 where n is the number of conditions or levels in the independent variable. Just look in the Anova table I provide above or in any statistics textbook. Treatment is an independent variable with two levels, so its df is 1. But, your Anova table has no line entry with df = 1. The only reasonable conclusion is that there is either no main effect for Treatment given in your table, or, if there is, it is wrong.

This is probably the wrong place for dialog like this. Sorry. I think I'll going to put together a separate question on this point, as I, like Giulia, am interested in using MATLAB for an analysis of variance on data obtained in a factorial experiment.

Adam Danz 2021-4-28

编辑：Adam Danz 2021-4-28

在 MATLAB Online 中打开

Hi Scott I'm not a statitics expert and have met very few people who are. Not many people here answer stats questions so your commentary is welcomed.

The repeated measures ANOVA table that Matlab generates (not me) is a produced by ranova and is similar to the examples from the documentation here and here. You can review the Wilkinson Notation here. The model includes a patient term, a treatment group term, and an interaction term between patient and treatment group. The table it produces includes a term representing all differences across the within-subjects factors (time) and terms for all interactions between the within-subject terms and all between-subject model terms, as explained in the ranova documentation.

The df for the Time term should be 5 since there are 6 measurements. The df for treatment*time interaction should also be 5 since there are 2 treatment levels and 5*1=5. But I don't know why the patient*time df is 5 in the table. I'd have to go through the code to figure it out since the documentation on statistical functions in Matlab tends to be scant.

These data could also be reshaped to perform an n-way analysis of variance using anovan which contains the df you were expecting.

% Reshape data 
data = readmatrix('Q2unlocked.xlsx');
M = repmat(data(:,1:2),6,1); 
M(:,end+1) = repelem((1:6)',size(data,1),1);
rmeas = data(:,3:end);
M(:,end+1) = rmeas(:); 
[p, tbl] = anovan(M(:,end),{M(:,1), M(:,2), M(:,3)},'model','interaction',...
    'varnames',{'Participants','Treatment','Time'})

Although, I'm sure the model used to generate the table in your comment differed. Several people in the past have asked why anova tables differ between software and those questions rarely get solved. I don't advise anyone to use statistical significance testing to begin with so I only dive in when needed.

Lastly, you bring up a good point that this model may not be the one that answers the OP's question. Perhaps specifying the within-groups model when calling ranova is the better choice. Instead of rm.ranova which is the same as ranova(rm),

ranova(rm,'WithinModel','Time')

which results in

10×8 table
                            SumSq       DF      MeanSq         F         pValue     pValueGG    pValueHF    pValueLB
                          __________    __    __________    ________    ________    ________    ________    ________
    (Intercept)           2.9816e+07     1    2.9816e+07       6.914    0.017011    0.017011    0.017011    0.017011
    PATIENT               2.4972e+06     1    2.4972e+06     0.57908     0.45653     0.45653     0.45653     0.45653
    TREAT                 1.6441e+05     1    1.6441e+05    0.038124     0.84738     0.84738     0.84738     0.84738
    PATIENT:TREAT         9.5387e+05     1    9.5387e+05     0.22119     0.64378     0.64378     0.64378     0.64378
    Error                 7.7623e+07    18    4.3124e+06                                                            
    (Intercept):Time      2.1085e+07     1    2.1085e+07      5.7044    0.028088    0.028088    0.028088    0.028088
    PATIENT:Time          1.8057e+06     1    1.8057e+06     0.48853     0.49352     0.49352     0.49352     0.49352
    TREAT:Time            5.4909e+05     1    5.4909e+05     0.14856     0.70443     0.70443     0.70443     0.70443
    PATIENT:TREAT:Time    8.4605e+05     1    8.4605e+05      0.2289      0.6381      0.6381      0.6381      0.6381
    Error(Time)           6.6531e+07    18    3.6962e+06   

Scott MacKenzie 2021-4-29

编辑：Scott MacKenzie 2021-4-29

在 MATLAB Online 中打开

q2_data.xlsx

Problem solved!

T = readtable('q2_data.xlsx');
T.Properties.VariableNames = {'T1_TP1', 'T1_TP2', 'T1_TP3', 'T1_TP4', 'T1_TP5', 'T1_TP6', ... 
    'T2_TP1', 'T2_TP2', 'T2_TP3', 'T2_TP4', 'T2_TP5', 'T2_TP6' };
withinDesign = table([1 1 1 1 1 1 2 2 2 2 2 2]',[1:6 1:6]','VariableNames',{'Treatment','TimePeriod'});
withinDesign.Treatment = categorical(withinDesign.Treatment);
withinDesign.TimePeriod = categorical(withinDesign.TimePeriod);
rm = fitrm(T, 'T1_TP1-T2_TP6~1', 'WithinDesign', withinDesign);
ranova(rm, 'WithinModel', 'Treatment*TimePeriod') 

This script will generate an Anova table with the exact same SS, MS, df, F-statistics, and p-values as in the table above and in the table in my first comment on this question. For this, I rearranged the data originally posted by Giulia into an 11x12 matrix. See the attached spreadsheet for the rearranged data and some explanatory comments.

Full disclosure. The script above is not mine. It's a slightly modified verion of a script provided by Jeff Miller in response to a question I posted yesterday.

Adam Danz 2021-4-29

Thanks for the update, Scott. I saw Jeff's solution earlier today and also noticed the difference in withinDesign. I look forward to examining it closer and perhaps updating my answer when I have a chance to do so, shortly.

请先登录，再进行评论。