How to calculate cosine similarity on a codistributed array?

Question

Frank 2012-7-2

Direct link to this question:

https://ww2.mathworks.cn/matlabcentral/answers/42512-how-to-calculate-cosine-similarity-on-a-codistributed-array

I need to calculate the cosine similarity between the rows of an array. This works in serial execution with pdist, but pdist does not work with codistributed arrays on MDCS. In the parallel setup, 4 compute nodes are used and the (large) array is distributed row-wise over the 4 nodes. I wrote a naive function to calculate the cosine similarity, but it takes ages; even with a small array it takes (too) long.

This is the test I use currently: I generate a random array

r = floor(rand(100, codistributor('1d', 1)))
q = cosineSimilarityNaive(r)

The code of the function:

function [res] = cosineSimilarityNaive(data)
% get the dimensions
[n_row n_col] = size(data);
% calculate the norm for each row
%
norm_r = sqrt(sum(abs(data).^2,2));
%
for i = 1:n_row
    % 
    for j = i:n_row
        %
        res(i,j) = dot(data(i,:), data(j,:)) / (norm_r(i) * norm_r(j));
        res(j,i) = res(i,j);
    end
end

Currently I have no idea how to make it run faster. Codistributed arrays on different nodes are necessary because the array is so large that it does not fit on 1 compute node. I did some testing with svd on a distributed array over 4 nodes, and this works fine. I think I am missing something in my code, but currently I have no clue. Any tips?


Answer 1

Jill Reese 2012-7-2

Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/42512-how-to-calculate-cosine-similarity-on-a-codistributed-array#answer_52333

It would be much more efficient to lump all of the multiplications together. Also, when you use for loops with codistributed arrays you need to use the drange command to make sure that the workers only operate on the data that they own. I think rewriting your code a bit will speed things up:

spmd
   % Create the data.  Don't use floor because that will return all zeros.
   r = rand(100,codistributor1d(1));
end
% Find the norm of each row
norm_r = sqrt(sum(abs(r).^2,2));
% get the dimensions
[n_row, n_col] = size(r);
% Scale each row by its norm first.  
% Use drange so that each worker operates only on the data it owns.
spmd
   for i=drange(1:n_row)
      r(i,:) = r(i,:)/norm_r(i);
   end
end
% Transpose the data so we can use matrix multiplication to 
% perform the dot products all at once.  A transpose is cheap and 
% incurs no communication.  Of course this is only useful if you have 
% enough memory to store another copy of the local part on each worker.
tr = transpose(r);
res = r*tr;
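
Before running this on the cluster, it may help to confirm on a small in-memory matrix that "scale each row, then multiply by the transpose" gives the same numbers as the double loop from the question. The following is only a serial sketch and does not touch codistributed arrays; the variable names unit_rows, res_fast, res_loop and max_diff are made up for this illustration, and the 6-by-4 size is arbitrary.

```matlab
% Serial sanity check on a small matrix (no Parallel Computing Toolbox needed).
data = rand(6, 4);            % 6 rows to compare, 4 columns each

% Vectorized form: scale every row to unit length, then one matrix product.
norm_r = sqrt(sum(abs(data).^2, 2));
unit_rows = bsxfun(@rdivide, data, norm_r);
res_fast = unit_rows * unit_rows.';

% Loop form, as in the question, but with the result preallocated.
n_row = size(data, 1);
res_loop = zeros(n_row);
for i = 1:n_row
    for j = i:n_row
        res_loop(i, j) = dot(data(i, :), data(j, :)) / (norm_r(i) * norm_r(j));
        res_loop(j, i) = res_loop(i, j);
    end
end

% The two results should agree up to floating-point rounding.
max_diff = max(abs(res_fast(:) - res_loop(:)));
fprintf('largest difference: %g\n', max_diff);
```

bsxfun is used here so the sketch also runs on older releases; from R2016b on, data ./ norm_r does the same row scaling through implicit expansion. If the Statistics Toolbox is available, pdist(data, 'cosine') returns one minus this similarity for each pair of rows, which gives another way to cross-check a small case.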
