How to remove outliers in a matrix, according to two different column entries?

Hello, I'm fairly new to MATLAB coding and would appreciate your assistance.
I have a 3250x3 numeric matrix, as shown below, and I want to identify and remove the latencies that fall outside ±0.5 from the mean for each subject. Next, I want to average the latencies in column 3 according to the trial code (column 2) for each subject (column 1) and output the result as a matrix. Finally, I want to run a repeated measures ANOVA (2x2) according to the trial code.
I primarily need assistance with the first two steps.
subject trialcode latency
8 4 340
8 4 328
8 3 218
8 4 338
8 3 213
8 4 328
8 3 254
8 4 323
8 4 340
8 3 273
9 3 580
9 4 363
9 4 371
9 3 374
9 3 383
9 3 302
9 4 406
9 3 390
9 3 380
9 3 366
9 4 468
I want to remove outliers for each subject across each trial code.
I tried the following code, which did not work:
[K, ~, G] = unique(Experiment1engS1(:, 1:2), 'rows')
mean= rmoutliers(K(:,3),'center','mean','ThresholdFactor', 2.5)
I also tried a for loop:
Subject=[999];
% trialcode (1=mask_cong, 2=mask_incong, 3=nomask_cong, 4=nomask_incong)
trialcode = [999];
% Latency
latency = [999];
% compute the means
for i = 1:160:3250
% compute the means
SUB_temp = mean(Experiment1engS1(i:i+159,1));
trialcode_temp = mean(Experiment1engS1(i:i+159,2));
latency_temp = rmoutlier(Experiment1engS1(i:i+159,3));
% write into the matrices
Subject=[Subject; SUB_temp];
trialcode = [trialcode; trialcode_temp];
latency = [latency; latency_temp];
end
This does not work because some subjects have fewer than 160 trials: the data were preprocessed to remove error trials.
I also tried combining splitapply, unique and rmoutlier, with no luck:
K= splitapply(@rmoutlier,Experiment1engS1(:,3),unique(Experiment1engS1(:, 1:2), 'rows'))
Kindly suggest what can be done. Thank you.

Answers (1)

Vidhi Agarwal 2024-11-28
The error you're encountering suggests that the groupsummary function is expecting a table or a dataset array but is receiving a standard numeric matrix instead. To resolve this, try converting the matrix into a table before calling groupsummary.
To do this, follow the steps below:
  • Convert the filtered data matrix into a table.
  • Apply "groupsummary" to the table.
The revised code is given below:
% Sample data
data = [
8 4 340; 8 4 328; 8 3 218; 8 4 338; 8 3 213;
8 4 328; 8 3 254; 8 4 323; 8 4 340; 8 3 273;
9 3 580; 9 4 363; 9 4 371; 9 3 374; 9 3 383;
9 3 302; 9 4 406; 9 3 390; 9 3 380; 9 3 366;
9 4 468
];
% Extract columns
subjects = data(:, 1);
trialcodes = data(:, 2);
latencies = data(:, 3);
% Define a function that keeps only latencies within 0.5 standard deviations of the group mean
removeOutliers = @(latencies) latencies(abs(latencies - mean(latencies)) <= 0.5 * std(latencies));
% Group by subject and trialcode
[G, ~] = findgroups(subjects, trialcodes);
% Apply the function to each group
cleanedData = splitapply(@(latencies) {removeOutliers(latencies)}, latencies, G);
% Reconstruct the data matrix without outliers
filteredData = [];
for i = 1:length(cleanedData)
    if ~isempty(cleanedData{i})
        group = unique(data(G == i, 1:2), 'rows');
        filteredData = [filteredData; repmat(group, size(cleanedData{i}, 1), 1), cleanedData{i}];
    end
end
% Convert filtered data to a table
filteredTable = array2table(filteredData, 'VariableNames', {'Subject', 'TrialCode', 'Latency'});
% Calculate the mean latency for each subject and trialcode
averageLatencies = groupsummary(filteredTable, {'Subject', 'TrialCode'}, 'mean', 'Latency');
% Display the result
disp(averageLatencies);
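The same per-group filtering can also be written without the reconstruction loop, by building a logical mask over the original rows. The following is only a sketch under stated assumptions: the threshold k = 0.5 standard deviations mirrors the code above (the question's "±0.5 from the mean" could be read differently), and the variable names mu, sd and keep are illustrative.

```matlab
% Sample data: [subject, trialcode, latency]
data = [
    8 4 340; 8 4 328; 8 3 218; 8 4 338; 8 3 213;
    8 4 328; 8 3 254; 8 4 323; 8 4 340; 8 3 273;
    9 3 580; 9 4 363; 9 4 371; 9 3 374; 9 3 383;
    9 3 302; 9 4 406; 9 3 390; 9 3 380; 9 3 366;
    9 4 468
];

% Assumed threshold, in standard deviations from the group mean
k = 0.5;

% One group number per row, for each (subject, trialcode) pair
G = findgroups(data(:, 1), data(:, 2));

% Per-group mean and standard deviation (one value per group)
mu = splitapply(@mean, data(:, 3), G);
sd = splitapply(@std,  data(:, 3), G);

% Index with G to expand the group statistics back to one value per row
keep = abs(data(:, 3) - mu(G)) <= k * sd(G);

% Rows that survive the filter, in their original order
filteredData = data(keep, :);
disp(filteredData);
```

Because the mask is applied to the original matrix, the surviving rows keep their original order and no group-size bookkeeping is needed. A threshold as tight as 0.5 standard deviations can discard most or all of a small group, so it may be worth checking the group counts afterwards; by comparison, rmoutliers with the 'mean' method uses three standard deviations unless 'ThresholdFactor' is set.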
To learn more about "groupsummary", refer to the following documentation:
  • https://www.mathworks.com/help/matlab/ref/double.groupsummary.html
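If the averages are needed as a plain numeric matrix rather than a table, with one row per subject and one column per trial code, accumarray is one option. This is a minimal sketch, not the only way to do it: the small cleaned matrix and the names subjIDs, codeIDs and meanMatrix are made up for illustration, and any subject/trial-code combination with no remaining trials is assumed to be reported as NaN.

```matlab
% Hypothetical cleaned data: [subject, trialcode, latency]
cleaned = [
    8 3 200; 8 3 220; 8 4 330;
    9 3 380; 9 4 400; 9 4 420
];

% Map subject IDs and trial codes to consecutive row/column indices
[subjIDs, ~, si] = unique(cleaned(:, 1));
[codeIDs, ~, ci] = unique(cleaned(:, 2));

% Average the latencies that share a (subject, trialcode) cell;
% cells with no data are filled with NaN
meanMatrix = accumarray([si ci], cleaned(:, 3), ...
    [numel(subjIDs) numel(codeIDs)], @mean, NaN);

% Rows follow subjIDs, columns follow codeIDs
disp(meanMatrix);
```

A wide layout like this, with one row per subject, is also the shape that repeated-measures functions generally expect as input.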
Hope this helps!
