How can I perform large sparse matrix multiplication efficiently when one sparse matrix is block diagonal?
14 次查看(过去 30 天)
显示 更早的评论
Background:
I have a 3d matrix V of size [J,I,K], where J and I are large, say J=1e5 and I=1e4. I would like to get two matrices:
V_ik = sum(V,1);
V_jk = sum(V,2);
The 3d matrix V is actually very sparse (around 90% sparsity), but the full [J,I,K] matrix takes up too much memory usage.
To deal with this problem, I code V into a 2d sparse matrix V_sp = [V_1;V_2;...;V_I], where each block matrix V_i is a matrix of size [J,K]. The problem arises when I calculate V_ik and V_jk. From the view of matrix multiplication, I write the following code:
V_ik = kron(speye(I),ones(1,J)) * V_sp;
V_jk = repmat(speye(J),1,I) * V_sp;
However, the sparse matrix kron(speye(I),ones(1,J)) is of size [I,J*I], and repmat(speye(J),1,I) is of size [J,J*I]. Either of them takes up 22.4G storage for J=1e5 and I=1e4.
My Question:
Is there an efficient way to perform this kind of matrix multiplication without too much memory usage? I tried using a for-loop to sum along the desired axis, the code with for-loop worked at the cost of around 5x more running time.
If there's no such efficient solution, can I code the 3d sparse matrix V in some other smart format such that I can reach a balance between the memory usage and running time? (I found ndSparse class that could store n-dimensional sparse arrays, but the class will store it internally as an ordinary 2D sparse array, and the sum function applied to it is not that fast.)
Thanks in advance for your help!
7 个评论
Bruno Luong
2021-10-28
The blockdiag idea seems to work well
I=1e4;
J=1e4;
K=10;
n=I*J*K;
density=0.1;
m=round(n*density); % number of non-zero elements
% Storage [i,j,k,V] as sparse 3D array, (i,j,k) are subindex, V are corresponding values
i=randi(I,[m,1],'uint32');
j=randi(J,[m,1],'uint32');
k=randi(K,[m,1],'uint32');
V=rand([m,1]);
% Sparse storage
V_sp=sparse(j+(i-1)*J,k,V,I*J,K);
V_blkdiag=sparse(i+(k-1)*I,j+(k-1)*J,V,I*K,J*K);
% Your method using V_sp
tic
V_ik = kron(speye(I),ones(1,J)) * V_sp;
V_jk = repmat(speye(J),1,I) * V_sp;
toc
% James's method using V_blkdiag
tic
V_ik = reshape(sum(V_blkdiag,2),[I K]);
V_jk = reshape(sum(V_blkdiag,1),[J K]);
toc
% Method using [i,j,k,V]
tic
V_ik=accumarray([i k],V,[I K]);
V_jk=accumarray([j k],V,[J K]);
toc
采纳的回答
Bruno Luong
2021-10-27
编辑:Bruno Luong
2021-10-28
You might rethink of the storage of your data, such as this (tic/toc result obtained from run on TMW online server):
I=1e4;
J=1e4;
K=10;
n=I*J*K;
density=0.1;
m=round(n*density); % number of non-zero elements
% Storage [i,j,k,V] as sparse 3D array, (i,j,k) are subindex, V are corresponding values
i=randi(I,[m,1],'uint32');
j=randi(J,[m,1],'uint32');
k=randi(K,[m,1],'uint32');
V=rand([m,1]);
% Sparse storage
V_sp=sparse(j+(i-1)*J,k,V,I*J,K);
% Your method using V_sp
tic
V_ik = kron(speye(I),ones(1,J)) * V_sp;
V_jk = repmat(speye(J),1,I) * V_sp;
toc
% Method using [i,j,k,V]
tic
V_ik=accumarray([i k],V,[I K]);
V_jk=accumarray([j k],V,[J K]);
toc
更多回答(3 个)
Matt J
2021-10-27
编辑:Matt J
2021-10-27
With ndSparse, did you try the summl() method? It's more memory intensive, but it can also be faster. Here's what I see in a simple test comparing ndSparse.sum() and ndSparse.summl().
>> A=ndSparse.sprand([1e4,1e4,10],0.1);
>> tic; sum(A,1);toc
Elapsed time is 10.697812 seconds.
>> tic; summl(A,1);toc
Elapsed time is 0.927524 seconds.
Bjorn Gustavsson
2021-10-27
Shouldn't straight summation be preferable for the calculations of sums rather than doing it by way of matrix multiplication. Perhaps something like this might work:
v_ik = zeros(I,K);
v_jk = zeros(J,K);
for i1 = 1:I,
v_ik(i1,:) = sum(v_sp((1+J*(i1-1)):(J*i1),:));
end
for j1 = 1:J,
v_jk(j1,:) = sum(v_sp(j1:J:end,:));
end
HTH
Matt J
2021-10-28
编辑:Matt J
2021-10-28
The typical value of K is 50.
Since the third dimension is so small, it's probably worthwhile just storing the data as a length-K cell array of JxI matrices. You can then just do the summations with,
cellfun(@(v) sum(v,1), V,'uni',0);
cellfun(@(v) sum(v,2), V,'uni',0);
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Matrix Indexing 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!