How to permute the rows and columns in a sparse GPU matrix?

113 次查看(过去 30 天)
How can I permute the elements of a sparse matrix in the GPU? I mean, for a CPU array the command
M=speye(2)
M([2,1],:)
permutes the rows 1 and 2. However for a GPU array
Mg=gpuArray(M)
Mg([2,1],:)
returns the error
Sparse gpuArray matrices only support referencing whole rows or whole columns.
Of course, I can use a permutation matrix:
Mg=gpuArray(M)
P=gpuArray(sparse([0,1;1,0]));
P*Mg
however, the first method is faster in the CPU than the one using a permutation matrix. I would want something analogous in the GPU.
EDIT
Comparison of both methods in CPU with the matrix structured I'm dealing with:
N=10;
lgth = 1/2*(N+1)*(N+2); %size of the matrices
blocks0 = cell(1, N+1);
for k0 = 1:N+1
A = randn(N+2-k0) + 1i * randn(N+2-k0);
H0 = (A + A') / 2;
blocks0{k0} = sparse(H0);
end
% Construct one sparse block diagonal matrix
M0= blkdiag(blocks0{:});
blocks1 = cell(1, N+1);
for k1 = 1:N+1
B = randn(N+2-k1) + 1i * randn(N+2-k1);
H1 = (B + B') / 2;
blocks1{k1} = sparse(H1);
end
% Construct another sparse block diagonal matrix
M1= blkdiag(blocks1{:});
Id = sparse(eye(lgth));
% First Permutation vector
v = [1, cumsum(N+2-(1:N)) + 1];
p = v;
for jj=1:N
p = [p (v(jj)+1):(v(jj+1)-1)];
end
P = Id(p,:);
% Second Permutation vector
p2 = [N+1 1:N];
for jj=2:N
a = v(jj):(v(jj+1)-1);
p2 = [p2 a(end) a(1:end-1)];
end
p2 = [p2 lgth];
P2 = Id(p2,:);
M = sparse(lgth, lgth);
M(1,1) = 1;
%Elapsed time after 200000 repetitions, first method
tic
for m0=1:200000
M0.*M(p2,:) + M1.*M(p,p);
end
toc
%Elapsed time after 200000 repetitions, second method
tic
for m1=1:200000
M0.*(P2*M) + M1.*(P*M*P');
end
toc
Elapsed time is 1.630648 seconds.
Elapsed time is 8.218439 seconds.
  6 个评论
Mike Croucher
Mike Croucher 2024-12-18,13:12
Would you mind showing all your timings please complete with the sparse matrix so we can reproduce it? You say the first method is faster in the CPU but is it faster than the permutation matrix on the GPU?
ANGEL
ANGEL 2024-12-18,14:38
@Mike Croucher Well, my code is a bit long involving some loops, but I have edited my question with a code for CPU comparison of the elapsed time for both methods, with the kind of block matrices and operations that I have (you can increase the dimmension of the matrices by increasing the value of N). The code shows that the first method is almost an order of magnitude faster and than the one using permutation matrices. I can't give you the same comparison on the GPU because I don't know how to implement the first method.

请先登录,再进行评论。

采纳的回答

Matt J
Matt J 2024-12-18,23:55
编辑:Matt J 2024-12-19,2:42
The permAndRebuild() function below is a reimplementaiton of the row permutation operation M(p,:) that also works on the GPU. On my CPU, I find that it is only slightly slower than doing M(p,:) explicitly, so maybe it is a close enough analog for you to accept as a benchmark on the GPU as well.
Regardless, I find that multiplication with a permutation matrix reigns supreme, both on the CPU and on the GPU (NVIDIA GeForce RTX 4080 SUPER):
N=1e4;
M=sprand(N,N,0.11);
p=randperm(N); %permutation
ip=1:N; ip(p)=ip; %inverse permutation
P=speye(N); P=P(p,:); %permuation - matrix form
isequal(M(p,:), P*M, permAndRebuild(M,ip))
ans =
logical
1
%CPU performance
timeit(@() M(p,:))
ans =
0.4020
timeit(@() P*M)
ans =
0.1508
timeit(@() permAndRebuild(M,ip))
ans =
0.4899
%GPU performance
M=gpuArray(M);
p=gpuArray(p); ip=gpuArray(ip); P=gpuArray(P);
gputimeit(@() P*M)
ans =
0.0212
gputimeit(@() permAndRebuild(M,ip))
ans =
0.0914
function Mperm=permAndRebuild(M,ip)
[I,J,S]=find(M);
Mperm=sparse(ip(I),J,S);
end
  1 个评论
ANGEL
ANGEL 2024-12-22,11:20
Thank you. It is still a bit slower than the Matlab command, but it seems true that for very large sizes the permutation matrix can be faster.

请先登录,再进行评论。

更多回答(0 个)

类别

Help CenterFile Exchange 中查找有关 Sparse Matrices 的更多信息

产品


版本

R2024b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by