Vec-trick implementation (multiple times)
Dear all,
This question is related to Tensorproduct. Since it was not answered as intended, I want to revisit it.
Introduction:
Suppose you have a matrix-vector multiplication where a matrix C of size (np x mq) is constructed as the Kronecker product of matrices A of size (n x m) and B of size (p x q). The vector is denoted v, of size (mq x 1), or equivalently its matricized form X of size (m x q).
In two dimensions this operation can be performed with O(npq + qnm) operations instead of O(mqnp) operations; see Wikipedia.
Expensive variant (in terms of flops): y = C*v with C = kron(B,A).
Cheap variant (in terms of flops): y = vec(A*X*B.'), using the identity kron(B,A)*vec(X) = vec(A*X*B.').
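The identity behind the cheap variant can be checked with a minimal MATLAB sketch (sizes and variable names are illustrative):

```matlab
% Vec trick: kron(B,A)*vec(X) equals vec(A*X*B.')
n = 3; m = 4; p = 5; q = 2;   % illustrative sizes
A = rand(n,m);
B = rand(p,q);
X = rand(m,q);                % matricized form of the vector
v = X(:);                     % v = vec(X), size (m*q x 1)
y1 = kron(B,A)*v;             % expensive: O(np*mq) flops
Y  = A*X*B.';                 % cheap: O(qnm + npq) flops
y2 = Y(:);
max(abs(y1 - y2))             % agrees up to round-off
```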
Main question:
I want to perform many of these operations at once, e.g. 2500000 of them. Example: n=m=p=q=7 with A of size 7x7, B of size 7x7, and v of size 49x2500000.
In Tensorproduct I have implemented a MEX-C version of the cheap variant, which is considerably slower than a MATLAB version of the expensive variant provided by Bruno Luong.
Is it possible to implement the cheap version in Matlab without looping?
5 Comments
"Is it possible to implement the cheap version in Matlab without looping"
The method
B = matrix_xi.';
A = matrix_eta;
X = reshape(vector, size(A,2), size(B,1), []);
CX = pagemtimes(pagemtimes(A, X), B); % Reshape CX if needed
given in this thread is the cheap version in MATLAB without looping! Granted, it is no faster than the expensive method, but it's what you asked for, no?
ConvexHull
2021-8-23
Edited: ConvexHull
2021-8-23
You're completely right. I overlooked the "pagemtimes" version.
I am still surprised that there is no cheap version able to outperform the expensive vectorized MATLAB version (actually a BLAS version).
I mean, the cheap variant is of order O(2*7*7*7)=O(686), whereas the expensive variant is of order O(7*7*7*7)=O(2401), in terms of flops.
Thank you!
By the way, I tried to optimize the C code in Tensorproduct, which gave me a 20% speed-up, but it is still not useful.
Because fewer flops doesn't necessarily mean faster. Memory access, cache, and thread management matter as well, and which method is fastest probably depends on n=m=p=q.
ConvexHull
2021-8-23
Edited: ConvexHull
2021-8-23
Yeah, that's definitely the case here.
The main problem is that if you want to perform the vec trick multiple times in a vectorized fashion, you have to reorder the data structure: after applying A*X you cannot directly perform a matrix-matrix multiplication with B.
Stupid memory access O.o!
Accepted Answer
ConvexHull
2021-8-24
Edited: ConvexHull
2021-8-25
Here is a pure intrinsic MATLAB version without loops; however, it needs two transpose operations and is quite slow.
n=7;m=7;p=7;q=7;
A = rand(n,m);
B = rand(p,q);
v = rand(m*p,500000,5);
n = 5; % number of timing repetitions (reuses n; the matrix dimension n is not needed below)
C = kron(B,A);
tic
for i=1:n
v1 = reshape(C*reshape(v,49,[]),size(v));
end
toc % Elapsed time is 0.456353 seconds
tic
for i=1:n
v2 = reshape(reshape(B*reshape((A*reshape(v,7,[])).',7,[]),7*2500000,[]).',7,[]);
end
toc % Elapsed time is 3.879752 seconds
max(abs(v1(:)-v2(:)))
% 1.4211e-14
22 Comments
According to my test, your code is slightly slower than the pagemtimes method provided in the other thread.
for i=1:n
X = reshape(v, size(A,2), size(B,1), []);
v3 = pagemtimes(pagemtimes(A, X), 'none', B, 'transpose');
end
That makes sense. I think pagemtimes will not do much differently. However, it was only recently introduced (R2020b).
ConvexHull
2021-8-24
Edited: ConvexHull
2021-8-24
@Bruno: Do you know a faster way of transposing a matrix in MATLAB? :)
Actually MATLAB is smarter than you thought: when you do
C = AA * BB.';
MATLAB does not call transpose; it calls BLAS to do the matrix multiplication without explicit transposition (no memory copying or moving).
So if you wonder whether your code is inefficient because of .', the answer is NO.
I don't know what you mean.
- The ().' is by far the most expensive operation, no matter what is done in the background.
- Reshape is free.
- The small 7x7 matrix-matrix multiplications are cheaper than the big 49x49 one.
- By the way, ()' and ().' are nearly equally expensive.
n=7;m=7;p=7;q=7;
A = rand(n,m);
B = rand(p,q);
v = rand(m*p,500000,5);
n = 5;
tic
for i=1:n
vv = reshape(v,7,[]); %#ok<*NASGU>
end
toc % Elapsed time is 0.000186 seconds
tic
for i=1:n
vvv = A*vv;
end
toc % Elapsed time is 0.350487 seconds
tic
for i=1:n
vvvv = (vvv).';
end
toc % Elapsed time is 1.682334 seconds
tic
for i=1:n
vvvvv = reshape(vvvv,7,[]);
end
toc % Elapsed time is 0.000181 seconds
tic
for i=1:n
vvvvvvv = B*vvvvv;
end
toc % Elapsed time is 0.347840 seconds
tic
for i=1:n
vvvvvvvv = reshape(vvvvvvv,7*2500000,[]);
end
toc % Elapsed time is 0.000174 seconds
tic
for i=1:n
vvvvvvvvv = (vvvvvvvv).';
end
toc % Elapsed time is 1.470868 seconds
tic
for i=1:n
vvvvvvvvvv = reshape(vvvvvvvvv,7,[]);
end
toc % Elapsed time is 0.000148 seconds
No.
CC = BB.'
alone takes time. However, with
CC = AA*BB.';
no transposition happens in the background.
As shown here (timings obtained by running the code on a TMW server, R2021a):
AA = rand(49);
BB = rand(49,500000*5);
BBt = BB.';
tic
CC = AA*BB;
toc
Elapsed time is 0.631366 seconds.
tic
CC = AA*BBt.'; % MATLAB does not perform transposition then multiplication
% it does both in a single BLAS call
toc
Elapsed time is 0.631350 seconds.
tic
BBtt = BBt.';
CC = AA*BBtt;
toc
Elapsed time is 1.105448 seconds.
I understand your remarks. However, my operations are in the following order:
reshape(B*reshape(A.')).'
I think the reshape between them is the problem. This makes it expensive.
Oh I see, I misread your code: the transposition comes before the reshape.
In that case, yes, the transposition must be carried out explicitly by MATLAB. Sorry, but I don't know how one can accelerate it.
Surprisingly, even two small 7x7 matrix-matrix multiplications are slower than one 49x49 matrix-matrix multiplication.
n=7;m=7;p=7;q=7;
A = rand(n,m);
B = rand(p,q);
v = rand(m*p,500000,5);
n = 5;
C = kron(B,A);
tic
for i=1:n
v1 = reshape(C*reshape(v,49,[]),size(v));
end
toc % Elapsed time is 0.456353 seconds
tic
for i=1:n
v2 = reshape(A*reshape(v,7,[]),size(v));
v3 = reshape(B*reshape(v,7,[]),size(v));
end
toc % Elapsed time is 0.683820 seconds
ConvexHull
2021-8-25
Edited: ConvexHull
2021-8-25
According to https://github.com/MichielStock/Kronecker.jl, the vec trick becomes useful for sizes larger than n=m=p=q=100.
According to my test, the "cheap" method is faster from size 27 onwards.
stab = 5:5:100;
t1 = zeros(size(stab));
t2 = zeros(size(stab));
t3 = zeros(size(stab));
for i = 1:length(stab)
fprintf('%d/%d\n', i, length(stab));
s = stab(i);
n=s;
m=s;
p=s;
q=s;
A = rand(n,m);
B = rand(p,q);
v = rand(m*p,100000);
tic
C = kron(B,A);
v1 = reshape(C*reshape(v,s*s,[]),size(v));
t1(i) = toc;
tic
X = reshape(v, size(A,2), size(B,1), []);
v3 = pagemtimes(pagemtimes(A, X), 'none', B, 'transpose');
t3(i) = toc;
end
close all
semilogy(stab, [t1; t3]');
legend('Expensive method', 'Cheap method using pagemtimes');
xlabel('s');
ylabel('time [sec]');
grid on;

ConvexHull
2021-8-25
Edited: ConvexHull
2021-8-25
Unfortunately, my problem is restricted to sizes of at most n=m=p=q=16, rather smaller.
With the intrinsic method it is nearly the same.
stab = 5:5:100;
t1 = zeros(size(stab));
t2 = zeros(size(stab));
t3 = zeros(size(stab));
for i = 1:length(stab)
fprintf('%d/%d\n', i, length(stab));
s = stab(i);
n=s;
m=s;
p=s;
q=s;
A = rand(n,m);
B = rand(p,q);
v = rand(m*p,1000);
tic
C = kron(B,A);
v1 = reshape(C*reshape(v,s*s,[]),size(v));
t1(i) = toc;
tic
v2 = reshape(reshape(B*reshape((A*reshape(v,s,[])).',s,[]),s*1000,[]).',s,[]);
t3(i) = toc;
end
close all
semilogy(stab, [t1; t3]');
legend('Expensive method', 'Cheap method using intrinsic operations');
xlabel('s');
ylabel('time [sec]');
grid on;

Bruno Luong
2021-8-25
Edited: Bruno Luong
2021-8-25
Your curves do not clearly cross and are not coherent with your finding for size = 7.
ConvexHull
2021-8-25
Edited: ConvexHull
2021-8-25
Could you run a test comparing pagemtimes and the intrinsic one? Currently, I have no pagemtimes available.
That would be great!
ConvexHull
2021-8-25
Edited: ConvexHull
2021-8-25
Yes, perhaps the vector size (=1000) was too small. Note that today I am using a different computer.
Bruno Luong
2021-8-25
Edited: Bruno Luong
2021-8-25
There is an mtimesx on the File Exchange that you can use with older MATLAB versions; it does a similar task to pagemtimes.
AFAIK pagemtimes is slightly faster.
Results with 3 methods

stab = 5:5:100;
t1 = zeros(size(stab));
t2 = zeros(size(stab));
t3 = zeros(size(stab));
for i = 1:length(stab)
fprintf('%d/%d\n', i, length(stab));
s = stab(i);
n=s;
m=s;
p=s;
q=s;
A = rand(n,m);
B = rand(p,q);
v = rand(m*p,100000);
tic
C = kron(B,A);
v1 = reshape(C*reshape(v,s*s,[]),size(v));
t1(i) = toc;
tic
v2 = reshape(reshape(B*reshape((A*reshape(v,s,[])).',s,[]),s*1000,[]).',s,[]);
t2(i) = toc;
tic
X = reshape(v, size(A,2), size(B,1), []);
v3 = pagemtimes(pagemtimes(A, X), 'none', B, 'transpose');
t3(i) = toc;
end
close all
semilogy(stab, [t1; t2; t3]');
legend('Expensive method', 'Cheap method using transposition', 'Cheap method using pagemtimes');
xlabel('s');
ylabel('time [sec]');
grid on;
Thanks for the effort. There is a small typo in the line "v2 = ... s*1000 ...". But I don't think it would influence the results much.
Added a benchmark with mtimesx.

Conclusion:
- For versions before R2020b, use the expensive method for s < 44 and mtimesx otherwise;
- For R2020b or later, use the expensive method for s < 27 and pagemtimes otherwise.
stab = 5:5:100;
t1 = zeros(size(stab));
t2 = zeros(size(stab));
t3 = zeros(size(stab));
t4 = zeros(size(stab));
for i = 1:length(stab)
fprintf('%d/%d\n', i, length(stab));
s = stab(i);
n=s;
m=s;
p=s;
q=s;
A = rand(n,m);
B = rand(p,q);
v = rand(m*p,100000);
tic
C = kron(B,A);
v1 = reshape(C*reshape(v,s*s,[]),size(v));
t1(i) = toc;
tic
v2 = reshape(reshape(B*reshape((A*reshape(v,s,[])).',s,[]),[],s).',s,[]);
t2(i) = toc;
tic
X = reshape(v, size(A,2), size(B,1), []);
v3 = pagemtimes(pagemtimes(A, X), 'none', B, 'transpose');
t3(i) = toc;
tic
X = reshape(v, size(A,2), size(B,1), []);
v4 = mtimesx(mtimesx(A, X), 'N', B, 'T');
t4(i) = toc;
end
close all
semilogy(stab, [t1; t2; t3; t4]');
legend('Expensive method', ...
'Cheap method using transposition', ...
'Cheap method using pagemtimes', ...
'Cheap method using mtimesx');
xlabel('s');
ylabel('time [sec]');
grid on;
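Based on this conclusion, a small dispatch helper could look like the sketch below (kronmv is a made-up name; the thresholds come from the benchmark above and will vary by machine; the mtimesx branch for older releases is omitted for brevity):

```matlab
function y = kronmv(A, B, v)
% Compute kron(B,A)*v for many column vectors at once, choosing the
% strategy suggested by the benchmarks (assumes square A and B of
% equal size s; the threshold s = 27 is machine-dependent).
s = size(A, 1);
if exist('pagemtimes') > 0 && s >= 27
    % cheap variant via pagemtimes (R2020b or later)
    X = reshape(v, size(A,2), size(B,1), []);
    y = reshape(pagemtimes(pagemtimes(A, X), 'none', B, 'transpose'), size(v));
else
    % expensive variant: one big BLAS multiplication
    y = reshape(kron(B, A) * reshape(v, s*s, []), size(v));
end
end
```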
Hi there! May I ask whether you are aware of implementations of functions similar to "pagemtimes" that are able to work with at least one sparse input? Alternatively, do you see any easy workaround? More precisely, I need something like
pagemtimes(A, V)
where A is an nxnxn sparse real tensor and V is a real dense nxn matrix...
@Stefano Cipolla: regarding "sparse real tensor", I'm not aware of such a native MATLAB class.
But you can put the pages of A as diagonal blocks of an n^2 x n^2 sparse matrix:
SA = [A(:,:,1) 0        ... 0
      0        A(:,:,2) ... 0
      ...
      0        0        ... A(:,:,n)]
Do the same expansion for V (stacking the same block n times), then perform the multiplication.
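A minimal sketch of this block-diagonal workaround (variable names are illustrative; the pages of A are assumed to be mostly zero):

```matlab
% "Sparse pagemtimes" via a block-diagonal sparse matrix:
% put the pages of A on the diagonal, stack V, and do one sparse product.
n = 4;
A = rand(n,n,n); A(A < 0.7) = 0;       % mostly-zero pages (illustrative)
V = rand(n,n);
pages = squeeze(num2cell(A, [1 2]));   % n cells, each one n x n page
SA = sparse(blkdiag(pages{:}));        % (n^2 x n^2) block-diagonal matrix
R  = SA * repmat(V, n, 1);             % (n^2 x n) stacked page products
P  = permute(reshape(R, n, n, n), [1 3 2]);  % P(:,:,k) equals A(:,:,k)*V
% agrees with pagemtimes(A, V) up to round-off
```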
More Answers (0)