Index to elements not listed in numeric index?

Some functions return lists of indices, such as unique and ismember. Let's say I want to index to every element that isn't listed:
A = [1 1 2 2 3 3];
[uA, idxuA] = unique(A); % uA = [1 2 3], idxuA = [1 3 5]
idxDuplicates = true(length(A),1);
idxDuplicates(idxuA) = false;
duplicatesInA = A(idxDuplicates);
But that doesn't seem very efficient, and it would be nice to do something like:
duplicatesInA = A(~idxuA);
I really have two questions for the MATLAB/coding experts:
(1) Is there an efficient and direct way to use '~' with a list of indices?
(2) Is it worth it to optimize this or should I just deal with the extra few lines of code?
2 Comments
Rik 2018-11-25
I don't really consider myself to be an expert, but I'll still add my thoughts on this:
  1. Not that I know of. If it were a logical vector this would indeed be the way to do it, but since linear indices are returned, this might be the only way.
  2. Longer code can actually be more optimal, and more readable. That being said, as long as you are aware where the bottlenecks of your code are, you are miles ahead of many users. Unless your function is doing this millions of times in a loop, I don't think it is worth the extra effort to optimize this particular issue.
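To illustrate point 1, here is a minimal sketch of why '~' works on a logical mask but not on a list of numeric indices. It reuses A and idxuA from the question; the names notIdx, emptyResult, and isFirst are made up for this example.

```matlab
A = [1 1 2 2 3 3];
[~, idxuA] = unique(A);       % numeric indices, one per unique value

% '~' on numeric indices: every nonzero index becomes false,
% so the result is an all-false logical the same size as idxuA
notIdx = ~idxuA;
emptyResult = A(notIdx);      % selects nothing

% '~' on a logical mask the same size as A behaves as intended
isFirst = false(size(A));
isFirst(idxuA) = true;        % mark the positions returned by unique
duplicatesInA = A(~isFirst);  % keep every position that was not marked
```

So the mask has to be built first; there is no way to negate the index list itself.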


Accepted Answer

Andrew Landau 2018-11-25
Edited: Andrew Landau 2018-11-25
Thanks, everyone. I was looking for the function Matt J suggested, setdiff. However, I did a little profiling to check speeds: making a true array and setting the indexed elements to false is faster than setdiff by an order of magnitude. So, right you are, Rik: the longer code is faster in this case.
Here's the code I used if you want to test it:
% Set up some random data for testing
% ** the result was robust to changing N and K
N = 10000;
K = 500;
data = randn(N,1);
idx = randperm(N,K);
% if anyone has a better way to preallocate cell arrays please tell me!
P = 1000;
timing = cell(1,2);
timing = cellfun(@(c) zeros(P,1), timing, 'uni', 0);
for p = 1:P
    % Fastest by order of magnitude
    tic
    i1 = true(1,N); % define boolean array
    i1(idx) = false; % set all elements from index to false
    d11 = data(i1); % keep everything that wasn't in the index
    timing{1}(p) = toc;

    % Ten times slower
    tic
    i2 = setdiff(1:N,idx); % Get index of everything from 1:N not in idx
    d12 = data(i2); % setdiff(1:N,idx) as argument to data() had comparable timing
    timing{2}(p) = toc;
end
avgtime = cellfun(@mean, timing, 'uni', 1);
fprintf('Boolean array: %.2fµs -- Setdiff: %.2fµs -- Ratio: %.2f\n', avgtime(1)*1000000, avgtime(2)*1000000, avgtime(2)/avgtime(1));
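On the preallocation question in the code comments, two possible alternatives are sketched below. This is only a suggestion, not part of the original benchmark; the variable names timingCell, timingMat, and avgtimeMat are made up for the example.

```matlab
P = 1000;

% Option 1: build the cell array in one step with repmat;
% each of the two cells holds its own P-by-1 vector of zeros
timingCell = repmat({zeros(P,1)}, 1, 2);

% Option 2: skip the cell array and use a numeric matrix,
% one column per method, filled as timingMat(p,1) = toc; etc.
timingMat = zeros(P, 2);

% With option 2 the per-method averages need no cellfun
avgtimeMat = mean(timingMat, 1);
```

Since both methods are timed the same number of times, the matrix form is probably the simpler choice here.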

More Answers (2)

Matt J 2018-11-25
Edited: Matt J 2018-11-25
Your way is probably the most efficient, but an alternative with shorter syntax is:
duplicatesInA = A( setdiff(1:numel(A), idxuA) );
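As a quick sanity check, the sketch below runs both approaches on the example from the question and compares the results. The names keep, viaMask, viaSetdiff, and sameResult are made up for this illustration.

```matlab
A = [1 1 2 2 3 3];
[~, idxuA] = unique(A);

% Logical-mask approach from the question
keep = true(size(A));
keep(idxuA) = false;
viaMask = A(keep);

% setdiff approach: positions in 1:numel(A) that are not in idxuA
viaSetdiff = A(setdiff(1:numel(A), idxuA));

% Both select the same elements in the same order
sameResult = isequal(viaMask, viaSetdiff);
```

By default setdiff returns its output sorted, so the remaining positions come back in ascending order, matching the order the logical mask produces.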

Matt J 2018-11-25
Edited: Matt J 2018-11-25
"Is it worth it to optimize this or should I just deal with the extra few lines of code?"
There's never a reason to deal with extra lines of code if it's an operation that you do often. That's what functions in M-files are for.
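function Ac = complement(A,idx)
% Return the elements of A whose positions are not listed in idx
Ic = true(numel(A),1); % start by keeping every element
Ic(idx) = false;       % drop the listed positions
Ac = A(Ic);            % logical indexing keeps the rest
end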

Release

R2017a
