I understand that your dataset has variables that are functionally identical but have different names. For the group analysis, you want to group these variables and treat them as a single variable, and you want to do the same for a different set of variables at the same time.
If your task is to merge the two over 50 variables with each other and the two 21-50 variables with each other, and not to merge all four of them, then you have to use two different “regexp” calls: one merges the two over 50 variables and the other merges the two 21-50 variables.
I am also providing the updated code for reference:
% Count how often each protocol name occurs, most frequent first
protocols = groupcounts(B, "Protocol");
protocols = sortrows(protocols, "GroupCount", "descend");
% Group 1: names that refer to the over 50 kg protocol
idx_over_50 = ~cellfun(@isempty, regexp(protocols.Protocol(:), '(chest.*abd.*pel.*over.*50|cap.*w.*over.*50)'));
B.idx_over_50 = ismember(B.Protocol, protocols.Protocol(idx_over_50));
B.Protocol(B.idx_over_50) = {'CAP w/ contrast over 50 kg'};
% Group 2: names that refer to the 21 to 50 kg protocol
idx_21_to_50 = ~cellfun(@isempty, regexp(protocols.Protocol(:), '(chest.*abd.*pel.*21.*50|cap.*w.*21.*50)'));
B.idx_21_to_50 = ismember(B.Protocol, protocols.Protocol(idx_21_to_50));
B.Protocol(B.idx_21_to_50) = {'CAP w/ contrast 21 to 50 kg'};
% Remove the temporary idx columns (deleting table variables requires parentheses, not braces)
B(:, ~cellfun(@isempty, strfind(B.Properties.VariableNames, 'idx'))) = [];
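To see the two-pattern idea in isolation, here is a minimal sketch on a small made-up table. The protocol names in it are invented for illustration and are not taken from your data, and the 'ignorecase' option is an added assumption in case your names mix upper and lower case. It applies the same two patterns directly to B.Protocol and keeps the results as logical masks.

```matlab
% Hypothetical toy data: these protocol names are invented for illustration.
Protocol = { ...
    'chest abd pelvis over 50 kg'; ...
    'cap w contrast over 50'; ...
    'chest abd pelvis 21 to 50 kg'; ...
    'cap w contrast 21-50'; ...
    'head w/o contrast'};
B = table(Protocol);

% One pattern per group, so the two weight classes are never merged together.
pat_over_50  = '(chest.*abd.*pel.*over.*50|cap.*w.*over.*50)';
pat_21_to_50 = '(chest.*abd.*pel.*21.*50|cap.*w.*21.*50)';

% Logical row masks, computed before any relabeling.
% 'ignorecase' is an assumption; drop it if matching should be case-sensitive.
is_over_50  = ~cellfun(@isempty, regexp(B.Protocol, pat_over_50,  'once', 'ignorecase'));
is_21_to_50 = ~cellfun(@isempty, regexp(B.Protocol, pat_21_to_50, 'once', 'ignorecase'));

% Relabel each group with its own common name.
B.Protocol(is_over_50)  = {'CAP w/ contrast over 50 kg'};
B.Protocol(is_21_to_50) = {'CAP w/ contrast 21 to 50 kg'};

% Count the merged groups; the unmatched protocol stays as its own group.
counts = groupcounts(B, "Protocol");
disp(counts)
```

Because the masks are ordinary workspace variables rather than columns of B, this variant does not need a cleanup step for temporary columns. Whether that suits your workflow depends on whether you want to keep the masks around for later checks.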
You can also refer to the MATLAB documentation for "regexp" for more information on its usage and syntax. The link is provided below: