Remove duplicate variables depending on a variable in a second column

Dear experts, I have a list of variables from which I need to remove duplicates based on the value in column 2. Variables with a '1' in column 2 are of better quality than variables with a '0'.
1) When a variable is duplicated, I want to keep the entry that has a 1 in the second column. If several duplicates have a 1, only one of them should be kept, chosen at random. See the example below: here I want to keep the BG1028 entry where the value in the third column is 1.3. For BG1030, I want to keep the entry with 3.0 or 0.3 in the third column.
2) When all duplicates of a variable have a 0 in the second column, only one of them should be kept, chosen at random. See the example below: I need to keep one entry of BG1027 (random choice).
I hope this is clear. I am puzzling over how to do this. This is the code I have come up with so far, with help from Kirby Fear.
ppn = [ {'BG1026';'BG1027';'BG1027';'BG1028';'BG1028';'BG1028';'BG1029';'BG1029';...
'BG1030';'BG1030';'BG1030';'BG1030'},... % start col 2
{'0';'0';'0';'1';'0';'0';'1';'0';'0';'1';'0';'1'},... % start col 3
{'1.2';'2.2';'5.2';'4.2';'0.2';'8.9';'3.4';'3.0';'0.3';'1.3';'0.3';'1.7'} ];
% Store column 2 of ppn as numeric values (str2double accepts a cell array of strings)
bPpn = str2double(ppn(:,2));
% Get names of duplicates
chooseNames = ppn([strcmp(ppn(1:end-1,1),ppn(2:end,1));false],1);
% Loop over chooseNames and keep one at random.
if numel(chooseNames) > 0
    for j = 1:numel(chooseNames)
        dupidx = find(strcmp(chooseNames{j}, ppn(:,1)));  % all rows with this name
        dupidx(randi(numel(dupidx))) = [];                % spare one row at random
        ppn(dupidx,:) = [];                               % delete the other rows
    end
end

Accepted Answer

WAT 2015-9-24
Give something like this a try:
ppn = [ {'BG1026';'BG1027';'BG1027';'BG1028';'BG1028';'BG1028';'BG1029';'BG1029';...
'BG1030';'BG1030';'BG1030';'BG1030'},... % start col 2
{'0';'0';'0';'1';'0';'0';'1';'0';'0';'1';'0';'1'},... % start col 3
{'1.2';'2.2';'5.2';'4.2';'0.2';'8.9';'3.4';'3.0';'0.3';'1.3';'0.3';'1.7'} ];
[uniqNames, ia, ic] = unique(ppn(:,1)); % assumes rows are grouped by name and ia holds the first row of each group
ia = [ia; 1+length(ic)]; % sentinel: one past the last row
ppn_out = {}; % initialize output
for i = 1:length(uniqNames)
    sub = ppn(ia(i):ia(i+1)-1,:); % find only entries with uniqNames(i)
    flags = str2double(sub(:,2)); % column 2 as numbers
    sub = sub(flags == max(flags),:); % find only those entries with the maximal value in col 2
    ppn_out = [ppn_out; sub(randi(size(sub,1)),:)]; % select one entry at random, put it in ppn_out
end
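If the rows are not guaranteed to be grouped by name, a variant that looks each name up with strcmp may be more robust. This is only a sketch of the same idea, not a tested replacement: the variable names flag, names, rows, best and keep are made up for illustration, and it assumes column 2 only ever contains '0' or '1'.

```matlab
% Sketch: keep one row per name, preferring rows flagged '1' in column 2.
ppn = [ {'BG1026';'BG1027';'BG1027';'BG1028';'BG1028';'BG1028';'BG1029';'BG1029';...
         'BG1030';'BG1030';'BG1030';'BG1030'},... % start col 2
        {'0';'0';'0';'1';'0';'0';'1';'0';'0';'1';'0';'1'},... % start col 3
        {'1.2';'2.2';'5.2';'4.2';'0.2';'8.9';'3.4';'3.0';'0.3';'1.3';'0.3';'1.7'} ];

flag  = str2double(ppn(:,2));     % quality flag as numbers
names = unique(ppn(:,1));         % only the sorted list of names is used
keep  = zeros(numel(names),1);    % row index kept for each name

for k = 1:numel(names)
    rows = find(strcmp(names{k}, ppn(:,1)));      % all rows with this name
    best = rows(flag(rows) == max(flag(rows)));   % rows with the best flag
    keep(k) = best(randi(numel(best)));           % pick one of them at random
end

ppn_out = ppn(keep,:);            % one row per name
```

Because only the first output of unique() is used, the result does not depend on which occurrence index unique() returns.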
3 Comments
WAT 2015-9-24
Edited: WAT 2015-9-24
That's odd, it's also skipping BG1026 for you. It seems to be behaving fine for me; I wonder if there's something goofy in the unique() command? (I have tried R2013a and R2015a, and it works fine on both.)
Try getting rid of all the semicolons; the code is short enough that it should be easy to follow what it is doing.
Marty Dutch 2015-9-25
Yes, you're right. I tried running it in R2013b and in R2015a, and in both it seems to work fine now... Thanks for the response. It did not work in the R2012b version.
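A possible explanation for the R2012b result, offered as an assumption rather than something confirmed in this thread: in R2012b and earlier, unique() returned the index of the last occurrence of each value by default, while from R2013a onward it returns the first. The loop in the accepted answer relies on ia marking the first row of each group. Asking for 'first' explicitly should give the same indices on both sides of that change. A minimal sketch on a shortened name list:

```matlab
% Shortened name list, for illustration only
names = {'BG1026';'BG1027';'BG1027';'BG1028';'BG1028';'BG1028'};

% Request the first occurrence explicitly instead of relying on the default
[uniqNames, ia] = unique(names, 'first');

% Append a sentinel so that ia(i):ia(i+1)-1 spans group i
ia = [ia(:); numel(names)+1];
```

With these indices, rows ia(i) to ia(i+1)-1 cover every entry of uniqNames{i}, provided the rows are grouped by name.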
