Speed up a procedure that uses ACCUMARRAY and UNIQUE functions

7 次查看（过去 30 天）

显示更早的评论

Igor Dakic 2018-5-27

1
链接

此问题的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/402736-speed-up-a-procedure-that-uses-accumarray-and-unique-functions

评论： Jan 2021-2-20

在 MATLAB Online 中打开

Given is a matrix A, e.g:

A=[1 2; 2 3; 2 5; 3 1; 4 2; 5 1; 5 3; 5 3; 5 5; 6 1; 7 1; 7 1; 7 2]

I would like to get a new matrix B, where the first column is a vector containing the unique elements of the first column of matrix A, and the second column is a vector obtained by adding the values of the second column of matrix A having the same value in the first column of matrix A for a given unique element.

One way to do this is using the following code:

[B1,~,b1_indx] = unique(A(:,1));
B2= accumarray(b1_indx,A(:,2));
B= [B1, B2];

However, this operation is relatively slow when the size of matrix A is rather large. Is there a way to do the same procedure in a faster way, e.g., without using UNIQUE function?

Thanks!

11 个评论
显示 9更早的评论隐藏 9更早的评论

Igor Dakic 2018-5-28

@Jan: Please find attached an example of matrix A. The largest element in A should not be higher than 2250, so the range would be [0,2250].

Jan 2018-5-28

编辑：Jan 2018-5-28

There is no attached matrix. -- Ah, I've found one in the comment below. Please post relevant data in the question, because it is not easy to search them in comments to answers.

请先登录，再进行评论。

请先登录，再回答此问题。

回答（3 个）

Walter Roberson 2018-5-27

0
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/402736-speed-up-a-procedure-that-uses-accumarray-and-unique-functions#answer_322104

You could try https://www.mathworks.com/help/matlab/ref/splitapply.html together with findgroups()

4 个评论
显示 2更早的评论隐藏 2更早的评论

Guillaume 2018-5-27

Not only does splitapply calls accumarray but it only uses it to build a cell array of indices to group. The actual applying of the function is done with a traditional loop. splitapply is guaranteed to be much slower than accumarray.

Igor Dakic 2018-5-27

So the proposed approach is the optimal one?

请先登录，再进行评论。

John D'Errico 2018-5-27

0
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/402736-speed-up-a-procedure-that-uses-accumarray-and-unique-functions#answer_322112

编辑：John D'Errico 2018-5-27

在 MATLAB Online 中打开

But how much more than 255? You keep being almost evasive in answering questions.

For example, this would work, IF the elements of A are ALWAYS purely positive integers, of a limited size:

A=[1 2; 2 3; 2 5; 3 1; 4 2; 5 1; 5 3; 5 3; 5 5; 6 1; 7 1; 7 1; 7 2]
B2 = accumarray(A(:,1),A(:,2));
B1 = find(B2);
B = [B1,B2(B1)];

The result is the same. But, it requires that ALL elements of the first column of A are positive integers, that none of them are HUGE, that none of the elements of B(:,2) are negative or zero (because then the find may get screwed up. And the find MAY be necessary.)

A = randi([1,100],[1e6,2]);
tic
B2 = accumarray(A(:,1),A(:,2));
B1 = find(B2);
B = [B1,B2(B1)];
toc
Elapsed time is 0.048715 seconds.
tic
[B1,~,b1_indx] = unique(A(:,1));
B2= accumarray(b1_indx,A(:,2));
B= [B1, B2];
toc
Elapsed time is 0.226216 seconds.

So significantly faster. HOWEVER, if the elements of the first column of A were huge, so on the order of 1e10 or so? Then what I have shown you would be completely worthless. Likewise, if some of the elements of B(:,2) are zero or negative? Again, the wrong answer may be the result.

The point is, algorithms are very often dependent on the properties of the data being fed to it. I can actually think of another algorithm that would work efficiently under different assumptions about the characteristics of A. But if we are given no serious clue as to what to expect, then it is impossible to be helpful. We are just making random guesses then.

1 个评论
显示 -1更早的评论隐藏 -1更早的评论

Igor Dakic 2018-5-28

matrix_A.mat

@John: Thanks for your help. I am attaching an example of matrix A. The second column is the probability vector, so there might be some values that are equal to zero. Regarding the size of A, the largest element in A should not be higher than 2250. Hope this helps.

请先登录，再进行评论。

Jan 2018-5-28

0
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/402736-speed-up-a-procedure-that-uses-accumarray-and-unique-functions#answer_322245

在 MATLAB Online 中打开

I will C-mex this in the evening. But here the idea of an efficient solution:

Data = load('matrix_A.mat');
A    = Data.A;
n    = max(A(:, 1));
R    = zeros(n, 1);
for k = 1:size(A, 1)
  a1    = A(k);
  R(a1) = R(a1) + A(k, 2);
end
index = find(R);
B = [index(:), R(index)];

This is 200 times slower than the accumarray approach for the test data provided by John. But in C this might be faster.

4 个评论
显示 2更早的评论隐藏 2更早的评论

wei zhang 2021-2-20

编辑：wei zhang 2021-2-20

@Jan My inputs are a large COO format sparse matrix. I have a same problem with accumarray based on 2d index. The details are in the link . I wrote a c++ mex, which is similar to your idea to speed up, but failed. I expected you could give me some suggestion.

Besides, what is a lookup table? similar to hash map?

Do you think the cuda/ thrust would be a better attempt?

Jan 2021-2-20

在 MATLAB Online 中打开

What failed in your MEX file? Is it a programming problem or is the speed too low?

A lookup table us very efficient, if there is a "small" set of data, which can be used as an index. Example:

% Check is value is member of a set:
Set  = uint8(randperm(255, 100));
Data = randi([1, 255], 1, 1e7, 'uint8');
tic;
match1 = ismember(Data, Set);
toc
tic;
LUT = false(1, 255);
LUT(Set) = true;
match2   = LUT(Data);
toc
% Elapsed time is 0.315920 seconds.    ISMEMBER
% Elapsed time is 0.061450 seconds.    Lookup table

This is can be applied to modify the values also, e.g. if the elements of an RGB image.

RGB = randi([0,255], 4000, 3000, 3);  % Large data set
LUT    = log((0:255).^3);   % Expensive, but small data set
Result = LUT(RGB + 1);     

请先登录，再进行评论。

请先登录，再回答此问题。

类别

MATLAB Language Fundamentals Matrices and Arrays Matrix Indexing

在 Help Center 和 File Exchange 中查找有关 Matrix Indexing 的更多信息

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by

Speed up a procedure that uses ACCUMARRAY and UNIQUE functions

11 个评论
显示 9更早的评论隐藏 9更早的评论

回答（3 个）

4 个评论
显示 2更早的评论隐藏 2更早的评论

1 个评论
显示 -1更早的评论隐藏 -1更早的评论

4 个评论
显示 2更早的评论隐藏 2更早的评论

另请参阅

类别

标签

Community Treasure Hunt

Speed up a procedure that uses ACCUMARRAY and UNIQUE functions

11 个评论 显示 9更早的评论隐藏 9更早的评论

回答（3 个）

4 个评论 显示 2更早的评论隐藏 2更早的评论

1 个评论 显示 -1更早的评论隐藏 -1更早的评论

4 个评论 显示 2更早的评论隐藏 2更早的评论

另请参阅

类别

标签

Community Treasure Hunt

11 个评论
显示 9更早的评论隐藏 9更早的评论

4 个评论
显示 2更早的评论隐藏 2更早的评论

1 个评论
显示 -1更早的评论隐藏 -1更早的评论

4 个评论
显示 2更早的评论隐藏 2更早的评论