Remove duplicate variables depending on a second variable

Dear experts, I have a list of variables from which I need to remove duplicates. However, when a variable is duplicated, I want to keep the entry that has the value 1 in the second column. When several duplicates have a 1, only one of them should be kept, chosen at random. See the example below: here I want to keep the BG1028 entry whose third column is 1.3. For BG1030, I want to keep the entry with 3.0 or 1.3 in the third column. I hope that is clear. I am puzzled about how to do this. This is the code I have come up with so far.
ppn(:,1) = {'BG1026';'BG1027';'BG1028';'BG1028';'BG1028';'BG1029';'BG1030';'BG1030';'BG1030';'BG1030'};
ppn(:,2) = {'0';'0';'1';'0';'0';'1';'1';'0';'1';'0'};
ppn(:,3) = {'1.2';'2.2';'1.3';'0.2';'8.9';'3.4';'3.0';'0.3';'1.3';'0.3'};
% find duplicates
ppn2 = ppn(:,1);
idx = find(strcmp(ppn2(1:end-1),ppn2(2:end)))+1;
%remove duplicates
ppn((idx),:) = [];
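For what it's worth, the snippet above keeps the first row of every run of identical names and never looks at column 2. A minimal sketch on made-up data (the `demo` array and its values are hypothetical, not taken from the question) in which the flagged row is not the first one:

```matlab
% Hypothetical data: the row flagged '1' is the second 'A' row, not the first
demo = {'A','0','0.2'; 'A','1','1.3'; 'B','1','3.4'};
names = demo(:,1);
% Same adjacent-comparison logic as in the question
idx = find(strcmp(names(1:end-1), names(2:end))) + 1;
kept = demo;
kept(idx,:) = [];
disp(kept)   % the surviving 'A' row is the one flagged '0'
```

On the data in the question the first row of each duplicated name happens to be flagged '1', which is probably why the attempt looks right there.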

Accepted Answer

Kirby Fears 2015-9-21
Hi Marty,
Try the code below.
% Defining ppn (all at once); columns are name, flag, value
ppn = [ {'BG1026';'BG1027';'BG1028';'BG1028';'BG1028';'BG1029';...
         'BG1030';'BG1030';'BG1030';'BG1030'},... % column 1: names
        {'0';'0';'1';'0';'0';'1';'1';'0';'1';'0'},... % column 2: flags
        {'1.2';'2.2';'1.3';'0.2';'8.9';'3.4';'3.0';'0.3';'1.3';'0.3'}]; % column 3: values
% Storing ppn column 2 as numerical values
% (str2double accepts a cell array of strings directly)
bPpn = str2double(ppn(:,2));
% Deleting all duplicates with 0 in bPpn
% (rows are compared with their neighbours, so duplicates must be adjacent)
idx = strcmp(ppn(1:end-1,1),ppn(2:end,1));
delidx = ([idx;false] | [false;idx]) & ~bPpn;
ppn(delidx,:)=[];
clear bPpn idx delidx;
% Get names of remaining duplicates
chooseNames = ppn([strcmp(ppn(1:end-1,1),ppn(2:end,1));false],1);
% Loop over chooseNames and keep one at random
% (the loop body is skipped automatically when chooseNames is empty)
for j = 1:numel(chooseNames)
    dupidx = find(strcmp(chooseNames{j},ppn(:,1)));
    dupidx(randi(numel(dupidx))) = []; % spare one randomly chosen row
    ppn(dupidx,:) = [];                % delete the other rows with this name
end
Hope this helps.
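Note that the adjacent-row comparison `strcmp(ppn(1:end-1,1),ppn(2:end,1))` only detects duplicates that sit next to each other. The example list is already sorted by name; if yours might not be, sorting it first is probably enough. A small sketch with a hypothetical unsorted list (the data and the names `unsortedPpn`, `sortedPpn`, `order`, `adjacentBefore` and `adjacentAfter` are illustrative only):

```matlab
% Hypothetical unsorted list: the two 'BG1028' rows are not adjacent
unsortedPpn = {'BG1028','1','1.3'; 'BG1026','0','1.2'; 'BG1028','0','0.2'};
% Adjacent comparison on the unsorted list misses the duplicate
adjacentBefore = strcmp(unsortedPpn(1:end-1,1), unsortedPpn(2:end,1));
% sort works on a cell array of strings and returns the row order
[~, order] = sort(unsortedPpn(:,1));
sortedPpn = unsortedPpn(order, :);
% After sorting, the duplicate pair is next to each other and gets detected
adjacentAfter = strcmp(sortedPpn(1:end-1,1), sortedPpn(2:end,1));
disp(adjacentBefore')
disp(adjacentAfter')
```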
2 Comments
Marty Dutch 2015-9-22
Hi Kirby,
Thanks for your response. And this works perfectly! Although I forgot to mention something... The script you've written deletes duplicates when they have a zero. In cases when there are multiple duplicates with a zero then it needs to keep randomly only one variable.
I really appreciate your time helping me! I'll have a look at your script and maybe I can adapt it on my own.
Marty Dutch 2015-9-22
Wait, it works now. I just deleted this part of your code:
% Deleting all duplicates with 0 in bPpn
idx = strcmp(ppn(1:end-1,1),ppn(2:end,1));
delidx = ([idx;false] | [false;idx]) & ~bPpn;
ppn(delidx,:)=[];
clear bPpn idx delidx;
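With that block removed, the loop keeps one random row per duplicated name regardless of column 2. If the goal is to satisfy both requirements (prefer rows flagged 1, and fall back to a random row when all duplicates are 0), one possible sketch is below. The data and the names `data`, `flag`, `group`, `candidates` and `keep` are made up for illustration, and this is a different approach from the accepted answer, not a patch to it.

```matlab
% Hypothetical data: 'A' has one flagged row, 'B' is duplicated with zeros only
data = {'A','1','1.3'; 'A','0','0.2'; 'B','0','0.3'; 'B','0','8.9'; 'C','0','2.2'};
flag = strcmp(data(:,2), '1');           % true where column 2 is '1'
[names, ~, group] = unique(data(:,1));   % group(k) is the name index of row k
group = group(:);                        % force a column vector
keep = zeros(numel(names), 1);
for g = 1:numel(names)
    candidates = find(group == g & flag);    % prefer the flagged rows
    if isempty(candidates)
        candidates = find(group == g);       % no flagged row: use all rows of this name
    end
    keep(g) = candidates(randi(numel(candidates)));  % pick one at random
end
result = data(sort(keep), :);            % keep the original row order
disp(result)
```

Here 'A' always keeps its 1.3 row, 'B' keeps either of its two rows, and 'C' is untouched.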


More Answers (1)

the cyclist 2015-9-21
This is not the world's most efficient code, but is a very straightforward implementation of what you want (or at least my understanding of it). It displays the indices you want to keep.
It's not documented at all, but I tried to use some intuitive variable names, so maybe you can figure it out.
ppn(:,1) = {'BG1026';'BG1027';'BG1028';'BG1028';'BG1028';'BG1029';'BG1030';'BG1030';'BG1030';'BG1030'};
ppn(:,2) = {'0';'0';'1';'0';'0';'1';'1';'0';'1';'0'};
ppn(:,3) = {'1.2';'2.2';'1.3';'0.2';'8.9';'3.4';'3.0';'0.3';'1.3';'0.3'};
% Group the rows by name; indexFromUniqueBackToAll(k) is the group of row k
[unique_ppn,~,indexFromUniqueBackToAll] = unique(ppn(:,1));
number_unique_ppn = numel(unique_ppn);
indices_to_keep = [];
for np = 1:number_unique_ppn
    index_to_this_ppn = find(indexFromUniqueBackToAll==np);
    if numel(index_to_this_ppn) == 1
        % Name occurs only once: keep it whatever its flag is
        indices_to_keep = [indices_to_keep; index_to_this_ppn];
    else
        % Duplicated name: drop the rows flagged '0', unless every row is '0'
        % (without this guard randi(0) would error for an all-zero group)
        remove_zero_index = ismember(ppn(index_to_this_ppn,2),'0');
        if ~all(remove_zero_index)
            index_to_this_ppn(remove_zero_index) = [];
        end
        % Keep one of the remaining candidates at random
        random_one_to_keep = index_to_this_ppn(randi(numel(index_to_this_ppn)));
        indices_to_keep = [indices_to_keep; random_one_to_keep];
    end
end
indices_to_keep
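Since the code only displays the indices, you still need to apply them to `ppn`. A minimal sketch, assuming `indices_to_keep` came out as `[1;2;3;6;9]` (one possible outcome on the example data; the exact row for BG1030 is random, and the names `ppn_reduced` and `values` are mine):

```matlab
ppn = [ {'BG1026';'BG1027';'BG1028';'BG1028';'BG1028';'BG1029';'BG1030';'BG1030';'BG1030';'BG1030'}, ...
        {'0';'0';'1';'0';'0';'1';'1';'0';'1';'0'}, ...
        {'1.2';'2.2';'1.3';'0.2';'8.9';'3.4';'3.0';'0.3';'1.3';'0.3'} ];
% Hypothetical result of the loop above (rows 7 and 9 are equally likely for BG1030)
indices_to_keep = [1; 2; 3; 6; 9];
% Index the rows to get the de-duplicated list
ppn_reduced = ppn(indices_to_keep, :);
% Optionally convert the third column from strings to numbers
values = str2double(ppn_reduced(:,3));
disp(ppn_reduced)
```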
