Is it possible to make this tiny loop faster?

It seems that it might be possible to make this loop faster. Does anyone have any thoughts? I have to call a loop like this millions of times in a larger workflow, and it is getting to be the slowest part of the code. I appreciate any insights!
a = 2; % constant
b = 3; % constant
n = 10; % number of elements of outputs c and s
% n could be up to 100
% preallocate and set the initial conditions
c = zeros(n,1);
s = c;
c(1) = 3; % set the initial condition for c
s(1) = 1; % set the initial condition for s
% run the loop
for i = 2:n
c(i) = a*c(i-1) -b*s(i-1);
s(i) = b*c(i-1) + a*s(i-1);
end

2 个评论

I ran that loop a million times with n=100 and it took only 0.33 seconds. How long does it take for you? How fast do you need it to be.
Also, what physical process is this supposed to simulate? I'm wondering because some of the values get up to 10^55 which is extremely large for any real world situation.
Good questions... this is part of a recursive fast multipole method that I'm working on speeding up. This loop is nested inside 3 others, so it can get called a bunch of times and if n is large it can take some time. This loop is pretty fast already, like you mentioned, but since it could get called so many times a small improvment could pay off. The a and b values I picked at first were arbitrary, but really they are going to be between -1 and 1. Also typical initial values of c and s are 1 and 0.5 respectively. One last thing I was thinking: if this loop could be turned into some kind of matrix operation then I can eliminate it altogether to get a speed up. I wasn't sure that was possible, so I thought I would put it out there to see if someone else had an idea. I appreciate the comments!

请先登录,再进行评论。

 采纳的回答

This saves considerable time for large n, but for small n, all bets are off.
a = .2; % constant
b = .3; % constant
n = 1e6; % number of elements of outputs c and s
x=[3;1];
timeit( @() loopMethod(a,b,x,n) )
ans = 0.1145
timeit( @() vectorizedMethod(a, b, x,n))
ans = 0.0209
function results = loopMethod(a,b,x,n)
% x=[c1;s1]
%
%results=[c1,c2,c3,....cn;
% s1,s2,s3,....sn]
A = [a, -b; b, a];
results=nan(2,n);
results(:,1)=x;
tmp=x;
for k = 2:n
tmp=A*tmp;
results(:,k)=tmp;
end
end
function results = vectorizedMethod(a, b, x,n)
% x=[c1;s1]
%
%results=[c1,c2,c3,....cn;
% s1,s2,s3,....sn]
z = a + 1i * b; % complex form of A
w = x(1) + 1i * x(2); % complex form of vector x
zAng=angle(z);
zMag=abs(z);
wAng=angle(w);
wMag=abs(w);
k = 1:n-1;
yMag=wMag*zMag.^k;
yAng=wAng+zAng.*k;
results = [x(1) yMag.*cos(yAng); x(2) yMag.*sin(yAng)]; % convert to 2xN real matrix
end

更多回答(1 个)

Akira Agata
Akira Agata 2025-5-16
移动:Voss 2025-5-16
If you need to calculate this efficiently, I believe the following relationship can be utilized.

7 个评论

Did I do it correctly? It doesn't seem to give the same answers. And it's 69 times slower.
a = 2; % constant
b = 3; % constant
n = 10; % number of elements of outputs c and s
% n could be up to 100
% preallocate and set the initial conditions
c = zeros(n,1);
s = c;
% Preallocate to speed up
c = zeros(1, n);
s = zeros(1, n);
c(1) = 3; % set the initial condition for c
s(1) = 1; % set the initial condition for s
% run the loop
tic
for k = 1 : 100000
for i = 2:n
c(i) = a*c(i-1) -b*s(i-1);
s(i) = b*c(i-1) + a*s(i-1);
end
end
t1 = toc
t1 = 0.0211
c
c = 1×10
3 3 -27 -147 -237 963 6933 15213 -29277 -314877
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>
s
s = 1×10
1 11 31 -19 -479 -1669 -449 19901 85441 83051
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>
% Method 2
% Preallocate to speed up
ct = zeros(1, n);
st = zeros(1, n);
ct(1) = 3; % set the initial condition for c
st(1) = 1; % set the initial condition for s
tic
for k = 1 : 100000
for i = 2:n
T = [a, -b; b, a] .^ (i - 1);
m = T * [ct(1); st(1)];
ct(i) = m(1);
st(i) = m(2);
end
end
t2 = toc
t2 = 1.4584
ratio = t2/t1
ratio = 69.1135
ct
ct = 1×10
3 3 21 -3 129 -147 921 -1803 7329 -18147
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>
st
st = 1×10
1 11 31 89 259 761 2251 6689 19939 59561
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>
Looks like it works to me.
syms a b
syms c0 s0
c(1) = c0;
s(1) = s0;
N = 15;
for i = 2 : N
c(i) = a*c(i-1) -b*s(i-1);
s(i) = b*c(i-1) + a*s(i-1);
end
sc = expand(c(end))
sc = 
ss = expand(s(end))
ss = 
t2 = [a -b; b a]^(N-1) * [c(1); s(1)];
st2 = expand(t2(1));
isequal(sc, st2)
ans = logical
1
ss2 = expand(t2(2));
isequal(ss, ss2)
ans = logical
1
Thanks everyone for the ideas! It would seem that the original for loop is the fastest by a good bit (see below). Any other ideas? I'm good with the speed if it is actually the fastest. Note that I changed the initial conditions and the a & b values from the original posting to be more realistic.
Nloops = 1e6;
c_init = 1;
s_init = 0.5;
a = 0.2; % constant
b = 0.3; % constant
n = 10; % number of elements of outputs c and s
% n could be up to 100
% Preallocate to speed up
c = zeros(1, n);
s = zeros(1, n);
c(1) = c_init; % set the initial condition for c
s(1) = s_init; % set the initial condition for s
% run the loop
tic
for k = 1 : Nloops
for i = 2:n
c(i) = a*c(i-1) -b*s(i-1);
s(i) = b*c(i-1) + a*s(i-1);
end
end
t1 = toc
t1 = 0.0661
% Method 2
% Preallocate to speed up
ct = zeros(1, n);
st = zeros(1, n);
ct(1) = c_init; % set the initial condition for c
st(1) = s_init; % set the initial condition for s
M = [a, -b; b, a];
tic
for k = 1 : Nloops
for i = 2:n
m = M * [ct(i-1); st(i-1)];
ct(i) = m(1);
st(i) = m(2);
end
end
t2 = toc
t2 = 1.0441
ratio = t2/t1
ratio = 15.7967
isequal(c,ct)
ans = logical
1
isequal(s,st)
ans = logical
1
% Method 3
% Preallocate to speed up
ct3 = zeros(1, n);
st3 = zeros(1, n);
ct3(1) = c_init; % set the initial condition for c
st3(1) = s_init; % set the initial condition for s
tic
for k = 1 : Nloops
for i = 2:n
m = M^(i-1) * [ct3(1); st3(1)];
ct3(i) = m(1);
st3(i) = m(2);
end
end
t3 = toc
t3 = 3.9479
% isequal(c,ct3) % this turns up false, but they are essentially identical
% isequal(s,st3) % this turns up false, but they are essentially identical
sum(c-ct3)
ans = 3.9302e-19
sum(s-st3)
ans = 2.0498e-19
ratio3 = t3/t1
ratio3 = 59.7268
Hmmm, looping does appear to be faster, at least for moderate n (up to at least 50000).
c_init = 1;
s_init = 0.5;
a = 0.2; % constant
b = 0.3; % constant
n = 500; % number of elements of outputs c and s
c = zeros(1,n);
s = zeros(1,n);
c(1) = c_init;
s(1) = s_init;
tic
for i = 2 : n
c(i) = a*c(i-1) -b*s(i-1);
s(i) = b*c(i-1) + a*s(i-1);
end
toc
Elapsed time is 0.002018 seconds.
tic
t = [a -b; b a]^(n-1) * [c(1); s(1)];
toc
Elapsed time is 0.003356 seconds.
c(end) - t(1)
ans = 5.4056e-236
s(end) - t(2)
ans = 6.3885e-236
c_init = 1;
s_init = 0.5;
a = 0.2; % constant
b = 0.3; % constant
n = 500; % number of elements of outputs c and s
M = [a,-b;b,a]^(n-1); % constant
c = zeros(1,n);
s = zeros(1,n);
c(1) = c_init;
s(1) = s_init;
tic
for i = 2 : n
c(i) = a*c(i-1) -b*s(i-1);
s(i) = b*c(i-1) + a*s(i-1);
end
toc
Elapsed time is 0.001904 seconds.
tic
t = M * [c(1); s(1)];
toc
Elapsed time is 0.001400 seconds.

I think M = [a,-b;b,a]^(n-1) is taking a notable amount of time.

Maybe the analytical solution of the recurrence can save time - I didn't check it.
I assumed that the complete vectors c(1:n) and s(1:n) are required. If only c(n) and s(n) are needed, things will become much faster, of course.
c_init = 3;
s_init = 1;
a = 2; % constant
b = 3; % constant
n = 10; % number of elements of outputs c and s
F = cumprod([1;(a + 1i*b)*ones(n-1,1)]);
% N = (0:(n-1)).';
% F = (a + 1i*b).^N;
reF = real(F);
imF = imag(F);
c = c_init*reF - s_init*imF
s = s_init*reF + c_init*imF
I tested it and: oh dear, that's really slow !

请先登录,再进行评论。

类别

帮助中心File Exchange 中查找有关 Brain Computer Interfaces 的更多信息

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by