Efficient Way To Split Dataset Into Subsets

E 2017-11-18
Commented: E 2017-11-26
Hello,
I need to split a large dataset (a DxN numeric array) into multiple subsets. The code below works, where groupIDs is an Nx1 vector of integer IDs giving the group to which each data point belongs:
groups = unique(groupIDs);
for i = 1:numel(groups)
    tempData = data(:,groupIDs==groups(i));
    % do work on tempData
end
However, 90% of the run time of the above code is spent just creating tempData! That amounts to over a minute every time I want to do this. Is there a more efficient way to split data by groupIDs? I tried splitapply() but it doesn't seem to be any faster.
Are there any MATLAB gurus out there who know a trick? Thanks!
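For reference, a splitapply() attempt of the kind mentioned above might look roughly like the sketch below. This is an assumption about the approach, not the poster's actual code, and the small data and groupIDs values are made up for illustration. With a column-vector grouping variable, splitapply() takes observations along rows, so the sketch transposes the DxN array on the way in and each block on the way out.

```matlab
% Small made-up example (sizes and values are assumptions, not from the post).
D = 3; N = 8;
data = reshape(1:D*N, D, N);          % DxN numeric array
groupIDs = [2; 1; 3; 1; 2; 3; 3; 1];  % Nx1 integer group IDs

% findgroups maps the IDs to group numbers 1..numGroups (sorted by ID).
[G, groups] = findgroups(groupIDs);

% Wrap each block in a cell because the per-group result is not a scalar.
parts = splitapply(@(x) {x.'}, data.', G);

for i = 1:numel(parts)
    tempData = parts{i};              % D x (number of columns in group i)
    % do work on tempData
end
```

The transposes add extra copies of the data, which may be part of why this route is no faster.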
5 comments
Jos (10584) 2017-11-24
12 GB? That is quite a lot. If this doesn't fit in memory, swapping to disk is the likely bottleneck ...
E 2017-11-26
Thanks for the replies. I do have plenty of RAM left to spare, so it doesn't look like the hard drive is involved. Confirmed (re Greg) that using the output of unique is no better. For example, numeric indexing offers no improvement, and the indexing itself is not really the problem - it's probably the data copying:
disp('a. original (without "doing work")');
tic;
for i = 1:numel(groups)
    tempData = data(:,groupIDs==groups(i));
end
toc
disp('b. numeric indexing');
idxs = cell(numel(groups),1);   % cell(n) alone would allocate an n-by-n cell array
for i = 1:numel(groups)
    idxs{i} = find(groupIDs==groups(i));
end
tic;
for i = 1:numel(groups)
    tempData = data(:,idxs{i});
end
toc
disp('c. logical operation alone');
tic;
for i = 1:numel(groups)
    tempData = (groupIDs==groups(i));
end
toc
a. original (without "doing work")
Elapsed time is 4.590886 seconds.
b. numeric indexing
Elapsed time is 4.526391 seconds.
c. logical operation alone
Elapsed time is 0.066057 seconds.
There's gotta be another way - if I use a for loop with 3 million iterations it only takes 2 seconds longer.
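Since the timings above point at the column gathering rather than the logical comparison, one thing that might be worth trying is to sort the columns by group once, so that each group becomes a contiguous block of columns. The sketch below uses small made-up data; dataSorted, firstCol and lastCol are illustrative names, not from the thread. It still copies each block, so any gain would come from replacing scattered column reads with contiguous ones, which probably matters most when D is small.

```matlab
% Small made-up example (sizes and values are assumptions, not from the thread).
D = 4; N = 12;
data = reshape(1:D*N, D, N);                  % DxN numeric array
groupIDs = [3; 1; 2; 1; 3; 3; 2; 1; 2; 3; 1; 2];  % Nx1 integer group IDs

% One-time cost: sort the IDs and reorder the columns to match.
% sort is stable, so columns keep their original order within a group.
[sortedIDs, order] = sort(groupIDs);
dataSorted = data(:, order);

% Each group now occupies one contiguous run of columns.
lastCol  = [find(diff(sortedIDs) ~= 0); N];   % last column of each run
firstCol = [1; lastCol(1:end-1) + 1];         % first column of each run
groups   = sortedIDs(firstCol);               % same values as unique(groupIDs)

for i = 1:numel(groups)
    tempData = dataSorted(:, firstCol(i):lastCol(i));
    % do work on tempData
end
```

The one-time reordering needs a second array the size of data, so this trades memory for access pattern. Timing it against case (a) on the real dataset would show whether it pays off.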

Answers (0)
