After exploring ndSparse, I'm still not certain that approach would offer any improvement over option 3 with B=accumarray(I, V.*beta(K).*X(J), N, N), as is suffers from the same problem of repeated multiplication before summation.
Without resorting to FEX/MEX, can the overhead of repeated calls to sparse() be avoided with operations on I, J, V directly?
4 次查看(过去 30 天)
显示 更早的评论
I have a time-marching problem bottlenecked by matrix multiplication , where the sparse matrix is the sum of many time-independent sparse matrices with time-dependent scalar coefficients:
I'm aware of FEX solutions that would enable a 3D to be defined once prior to time-marching, though I would prefer to rule out native MATLAB first. My current thinking is to supplement the I, J, V vectors with an index vector K, and do one of the following:
- Generate a cell array of sparse matrices along index K, with multiplied with bsxfun, similar to this. (Probably the slowest option)
- Sparse matrix construction with B=sparse(I, J, V.*beta(K), N, N)*X
- Avoid sparse matrix entirely with B=accumarray(I, V.*beta(K).*X(J), N, N)
I haven't tested option 1, but option 3 with accumarray runs about 50x slower than option 2 (with or without issparse option). This makes sense, as the multiplication with X in accumarray is done before accumulation, hence on a much larger array.
However, option 2 still appears to be wasting performance. For example, sparse(I, J, V, N, N)*X runs about 50x slower than basic A*X. My initial interpretation of this is that much of the time in sparse is spent on unavoidable summation of common terms in I,J,V. I cant profile sparse internally, but tested this by accumulating V prior to sparse(I, J, V, N, N)*X, which comes out to ~7x slower than A*X.
My conclusion is that there is still some significant & (potentially) avoidable overhead from calling sparse many times unrelated to summation of duplicate indices, and the larger piece of summation that is unavoidable due to the summation of time-dependent terms co-located in .
To boil this down, I have two questions I cannot answer:
- Considering the pattern of summation of duplicate indices is always the same (static I/J), can the summation overhead in sparse be improved?
- What is the source of the remaining overhead in sparse and can it be avoided?
20 个评论
Bruno Luong
2022-8-7
@James Tursa yes you get it right. However
- In the range of density/size and N_k (m in your notation) your method is still 50% slower than the accumarray solution we come up with,
- The runtime scale with N_k and is not bounded by the assumption of limited "bandwidth" (density of A0) stated by @Joel Lynch. It is probably an important requirement for his application.
采纳的回答
Bruno Luong
2022-8-7
编辑:Bruno Luong
2022-8-7
EDIT: Here is the final code. Method used in B2 is the fatest;
M = 10000;
N = 10000;
N_k = 100;
A_k = cell(1,N_k);
for k=1:N_k
A_k{k} = sprand(M,N,0.0005);
end
N_k = length(A_k);
[M,N] = size(A_k{1});
beta = rand(N_k,1);
X = rand(N,1);
I = cell(1,N_k);
J = cell(1,N_k);
V = cell(1,N_k);
K = cell(1,N_k);
for k=1:N_k
[I{k},J{k},V{k}] = find(A_k{k});
K{k} = k + zeros(size(I{k}));
end
I = cat(1,I{:});
J = cat(1,J{:});
K = cat(1,K{:});
V = cat(1,V{:});
[IJ,~,ic] = unique([I,J],'rows');
I2 = IJ(:,1);
J2 = IJ(:,2);
Y = sparse(ic, K, V, size(IJ,1), N_k);
tic
B1=sparse(I, J, V.*beta(K), M, N)*X ;
toc
tic
B3 = sparse(I2,J2,Y*beta,M,N)*X;
toc
tic
B2 = accumarray(I2,(Y*beta).*X(J2),[M,1]);
toc
5 个评论
Bruno Luong
2022-8-7
编辑:Bruno Luong
2022-8-7
It actualy entirely your method and credit. I simply put the code so as readers can test and understand better the problem and solutions.
更多回答(0 个)
另请参阅
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!