Efficiently populating an array without for loops
Hi Everyone,
I have a list of data with 10,000,000 rows and 3 columns. The columns correspond to the shape, size, and color of an object, which is indexed with a number. There are 100 shapes, 100 sizes, and 50 colors.
I want to create a matrix (100x100x50) that essentially stores the count of each object type, kind of like a histogram for unique objects.
My code below is too slow to run because of the nested for-loops. Does anyone know of a way to complete the same operation using direct matrix operations? It seems these comparisons should be relatively fast, but they are extremely slow in MATLAB the way I am doing it.
ObjectTypes = zeros(100,100,50);
for Shape=1:100
    for Size=1:100
        for Color=1:50
            ObjectTypes(Shape,Size,Color) = size(MyData(MyData(:,1) == Shape & MyData(:,2) == Size & MyData(:,3) == Color),1);
        end
    end
end
Accepted Answer
Geoff
2012-5-27
Hah... So an alternative in O(N) time...
% Assumes ObjectTypes was preallocated with zeros(100,100,50), as in the question
for n = 1:size(MyData,1)
    row = MyData(n, [1,2,3]);
    ObjectTypes(row(1),row(2),row(3)) = ObjectTypes(row(1),row(2),row(3)) + 1;
end
More Answers (2)
Geoff
2012-5-27
Yeah that's searching through your data an awful lot every time you do the == comparisons. The way I do this kind of thing when populating a matrix from database results is to have the data sorted by two variables, and then use diff and find to get the data ranges.
So start with this:
MyData = sortrows(MyData);
Grab out the begin and end index for each group of values in column one.
% Partition by shape
begin1 = [1; 1+find(diff(MyData(:,1)))];
end1 = [begin1(2:end)-1; size(MyData,1)];
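To see what these two lines produce, here is a minimal sketch on a made-up, already-sorted column. The values and the names col, groupBegin, groupEnd, and ranges are only for illustration and are not from the original data.

```matlab
% Toy first column, already sorted (made-up values for illustration)
col = [1; 1; 1; 2; 2; 4];

% diff is nonzero where the value changes, so each new group starts
% one position after a nonzero difference
groupBegin = [1; 1 + find(diff(col))];
groupEnd   = [groupBegin(2:end) - 1; numel(col)];

% Each row is [first index, last index] of one run of equal values
ranges = [groupBegin, groupEnd];
disp(ranges)
```

Here the three runs (values 1, 2, and 4) occupy rows 1 to 3, 4 to 5, and 6 to 6 of col.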
Now you can combine these into a loop variable, so each time through the loop will give you a 2x1 vector containing the start and end range. You do the same thing again with column 2. Finally I use accumarray to count up all the colours for a given size and shape:
% Process the Shape partitions
for r1 = [begin1, end1]'
    Shape = MyData(r1(1), 1);   % Single Shape
    % Partition by Size
    idx1 = r1(1):r1(2);
    col2 = MyData(idx1, 2);
    begin2 = [1; 1+find(diff(col2))];
    end2 = [begin2(2:end)-1; numel(col2)];
    % Process the Size partitions
    for r2 = [begin2, end2]'
        Size = col2(r2(1));   % Single Size
        % r2 is relative to idx1, so shift by r1(1)-1 to index into MyData
        idx2 = (r1(1)-1+r2(1)):(r1(1)-1+r2(2));
        % Count up all the Color occurrences for Shape and Size
        Color = MyData(idx2, 3);
        colorCount = accumarray(Color, ones(numel(Color),1));
        ObjectTypes(Shape, Size, 1:max(Color)) = colorCount;
    end
end
I would hope this is faster than your current loop, although there are probably clever ways to use accumarray without all the looping guff I've done. Apologies if there are errors in this code. I just hacked it straight into my web browser =)
1 Comment
Walter Roberson
2012-5-27
Are the numbers for the shape, size, color consecutive integers each starting from 1? If they are then the code can be reduced to
ObjectTypes = accumarray(MyData, 1);
If not then you can create the consecutive integers by using the three-output version of unique().
[ushape, junk, shapeidx] = unique(MyData(:,1));
[usize, junk, sizeidx] = unique(MyData(:,2));
[ucolor, junk, coloridx] = unique(MyData(:,3));
ObjectTypes = accumarray( [shapeidx(:), sizeidx(:), coloridx(:)], 1);
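As a quick sanity check, the accumarray approach can be compared against a plain counting loop on a small random data set. This is only a sketch: the sizes and the names nShape, nSize, nColor, countsAccum, and countsLoop are made up for the example. Passing the output size as the third argument is an addition here, on the assumption that you want the full array dimensions even when the highest index never occurs in the data.

```matlab
% Small made-up data set: columns are shape, size, and color indices
nShape = 4; nSize = 3; nColor = 2;
rng(0);   % fixed seed so the run is repeatable
MyData = [randi(nShape, 1000, 1), randi(nSize, 1000, 1), randi(nColor, 1000, 1)];

% Vectorized count: each row of MyData is a subscript into the 3-D output
countsAccum = accumarray(MyData, 1, [nShape, nSize, nColor]);

% Reference count with an explicit loop over the rows
countsLoop = zeros(nShape, nSize, nColor);
for n = 1:size(MyData, 1)
    s = MyData(n, 1); z = MyData(n, 2); c = MyData(n, 3);
    countsLoop(s, z, c) = countsLoop(s, z, c) + 1;
end

% True when both methods give identical counts
sameResult = isequal(countsAccum, countsLoop);
disp(sameResult)
```

Without the size argument, accumarray sizes the output from the largest subscript in each column, so the result could come out smaller than 100x100x50 if, say, color 50 never appears.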