how to avoid for loop to increase speed

Question

1 vote

I would like to calculate the mean of vector A over consecutive blocks whose lengths are given in vector B. So the first point of the resulting vector C is the mean of the first 9 (=B(1)) data points of A, the second point of C is the mean of the following 10 (=B(2)) data points of A, and so on. The following code works, but it is slow when processing large vectors:

A=rand(91,1); % vector with  91 random samples
B=[9,10,10,8,11,10,10,9,10,6]; % vector with the number of samples 
C= zeros(length(B),1); % preallocate C
first=1;
for i = 1: length(C)
    last = first+B(i)-1;
    interval=(first:last);
    C(i) = mean(A(interval));
    first =last+1;
end

Is there a way to use B in an index of A, instead of using this for loop?

2 comments

Matt J 2023-1-4

Shouldn't sum(B) equal length(A)?

A=rand(91,1); % vector with  91 random samples
B=[9,10,10,8,11,10,10,9,10,6]; % vector with the number of samples 
sum(B)
ans = 93
length(A)
ans = 91

Bertil Veenstra 2023-1-4

Yes, you are right. They should be the same length
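
A minimal sketch of the consistency check raised in these comments, assuming A is regenerated with 93 samples so that it matches sum(B). The error message text is only an illustration, not something from the original post:

```matlab
A = rand(93,1);                       % 93 samples, so that numel(A) == sum(B)
B = [9,10,10,8,11,10,10,9,10,6];      % block lengths; sum(B) is 93

% Stop early with a readable message if the block lengths do not cover A exactly
assert(sum(B) == numel(A), ...
    'sum(B) = %d does not match numel(A) = %d', sum(B), numel(A));
```

Placing such a check before any of the block-wise computations makes a length mismatch visible immediately instead of surfacing later as an indexing error.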

Answer 1

Matt J 2023-1-4

Edited: Matt J 2023-1-5

0 votes

A=rand(91,1); % vector with  91 random samples
B=[9,10,10,8,11,10,10,9,10,6]; % vector with the number of samples 
G=repelem(1:numel(B), B);
n=min(numel(A),numel(G));
G=G(:); A=A(:);
C=accumarray(G(1:n),A(1:n))./accumarray(G(1:n),1); %NOTE: faster than accumarray(G,A,[],@mean)
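
To see what the grouping vector does, here is a small worked sketch of the same idea. The values in A and B are made up for illustration, and the names blockSum and blockCount are not from the answer:

```matlab
A = [1 2 3 10 20 100 200 300 400]';   % 9 samples (illustrative values)
B = [3 2 4];                          % block lengths, sum(B) == numel(A)

G = repelem(1:numel(B), B)';          % group label per sample: [1 1 1 2 2 3 3 3 3]'
blockSum   = accumarray(G, A);        % sum of A within each group: [6; 30; 1000]
blockCount = accumarray(G, 1);        % samples per group, equal to B(:): [3; 2; 4]
C = blockSum ./ blockCount            % block means: [2; 15; 250]
```

Each sample is tagged with the index of the block it belongs to, so the mean reduces to one sum per group divided by one count per group.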

4 comments

Matt J 2023-1-5

Edited: Matt J 2023-1-5

"Better to avoid two ACCUMARRAY() calls and specify the function instead:"

That would be quite a bit slower, unfortunately. I very deliberately avoided it for that reason:

A=rand(900000,1); % vector with 900000 random samples
B=ones(1,90000)*10; % vector with the number of samples 
assert(sum(B)==numel(A))
G=repelem(1:numel(B), B);
n=min(numel(A),numel(G));
G=G(:); A=A(:);
tic;
C=accumarray(G(1:n),A(1:n))./accumarray(G(1:n),1);
toc
Elapsed time is 0.022540 seconds.
tic;
C=accumarray(G(1:n),A(1:n),[],@mean); % simpler, but slower (compare the timings)
toc
Elapsed time is 0.333534 seconds.

Stephen23 2023-1-5

"That would be quite a bit slower, unfortunately. I very deliberately avoided it for that reason:"

Aah, that is a shame. Perhaps a code comment would help to make that choice clear.

Answer 2

Mathieu NOE 2023-1-4

1 vote

Hello,

I had to change A to 93 samples so that it matches sum(B) = 93.

See my suggestion below. On this small data set it gives a speed-up of roughly a factor of 4:

Elapsed time is 0.004598 seconds. (your code)

Elapsed time is 0.001204 seconds. (my code)

delta = 1.0e-14 *

         0
   -0.0222
   -0.0111
   -0.0333
   -0.0555
    0.1110
    0.0777
   -0.0222
   -0.0444
   -0.0222

I wonder whether the gain will be even larger for larger vectors?

A=rand(93,1); % vector with  93 random samples
B=[9,10,10,8,11,10,10,9,10,6]; % vector with the number of samples 
C= zeros(length(B),1); % preallocate C
first=1;
tic 
 for i = 1: length(C)
     last = first+B(i)-1;
     interval=(first:last);
     C(i) = mean(A(interval));
     first =last+1;
 end
 
 toc
 
% alternative code 
tic 
As = cumsum(A(:));
Bs = cumsum(B(:));
As = As(Bs);
Cs = [As(1); diff(As)]./B(:);
toc
delta = C - Cs
plot(C,'-*b')
hold on
plot(Cs,'dr')
hold off
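
The cumulative-sum trick can be traced by hand on a small example. This is only a sketch with made-up values; the names E and blockSum are introduced here for illustration:

```matlab
A = [1 2 3 10 20 100 200 300 400]';   % 9 samples (illustrative values)
B = [3 2 4];                          % block lengths, sum(B) == numel(A)

As = cumsum(A);               % running total: [1 3 6 16 36 136 336 636 1036]'
Bs = cumsum(B(:));            % index of the last sample of each block: [3; 5; 9]
E  = As(Bs);                  % running total at each block end: [6; 36; 1036]
blockSum = [E(1); diff(E)];   % difference of consecutive totals: [6; 30; 1000]
Cs = blockSum ./ B(:)         % block means: [2; 15; 250]
```

Because each block sum is obtained as the difference of two running totals, rounding errors accumulated in the totals carry over into the result, which is consistent with the tiny nonzero values in delta.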

7 comments

Mathieu NOE 2023-1-5

Edited: Mathieu NOE 2023-1-5

Just for fun I compared the speed of my solution vs Matt's code

Tested here on much larger vectors (1100063 samples), my solution is still by far the fastest, but it is numerically less accurate than Matt's code (we are talking about a relative error below 10^-10).

Original code : Elapsed time is 0.449751 seconds.

My suggestion : Elapsed time is 0.011151 seconds.

delta_max = 3.4435e-11

Matt suggestion : Elapsed time is 0.058991 seconds.

delta_max = 0

% original code
B = 8 + randi(5,100000,1);
samples = sum(B);
A=rand(samples,1); % vector with  1100063 random samples
C= zeros(length(B),1); % preallocate C
first=1;
tic 
 for i = 1: length(C)
     last = first+B(i)-1;
     interval=(first:last);
     C(i) = mean(A(interval));
     first =last+1;
 end
  toc
 
% alternative code # 1 (me)
tic 
As = cumsum(A(:));
Bs = cumsum(B(:));
As = As(Bs);
Cs = [As(1); diff(As)]./B(:);
toc
delta_max = max(abs(C - Cs))
% alternative code # 2 (Matt J)
tic 
G=repelem(1:numel(B), B);
n=min(numel(A),numel(G));
G=G(:); A=A(:);
Cs=accumarray(G(1:n),A(1:n))./accumarray(G(1:n),1);
toc
delta_max = max(abs(C - Cs))

Matt J 2023-1-6

Edited: Matt J 2023-1-6

@Stephen23 already showed you earlier that you can use accumarray to apply any function to the blocks.

A = randi(5,1,9300); % vector with random integers ranging from 1 to 5
B=ones(1,930)*10; % vector with the number of samples 
tic
C= zeros(length(B),1); % preallocate C
first=1;
 for i = 1: length(C)
     last = first+B(i)-1;
     interval=(first:last);
     C(i) = mode(A(interval));
     first =last+1;
 end
 toc
Elapsed time is 0.032246 seconds.
 
 tic
  G=repelem((1:numel(B)),B);
  C=accumarray(G(:),A(:),[],@mode);
 toc
Elapsed time is 0.019723 seconds.

It is to be expected that the speed-up is more modest, unfortunately. accumarray isn't as well optimized for arbitrary functions.
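
As a small illustration of passing a function handle to accumarray, the sketch below applies @mode and @max to three blocks. The data values and the names Cmode and Cmax are made up for this example:

```matlab
A = [1 1 2 5 5 3 4 4 4]';             % 9 samples (illustrative values)
B = [3 2 4];                          % block lengths, sum(B) == numel(A)

G = repelem(1:numel(B), B)';          % group label per sample
Cmode = accumarray(G, A, [], @mode)   % most frequent value per block: [1; 5; 4]
Cmax  = accumarray(G, A, [], @max)    % largest value per block: [2; 5; 4]
```

The function is called once per group, which is why the gain over an explicit loop tends to be smaller than for the built-in summation.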

Bertil Veenstra 2023-1-6

Thank you, Matt, for this code. It doesn't seem to run faster than the for loop, but it did teach me more about writing and understanding MATLAB code.

Answer 3

Matt J 2023-1-6

Edited: Matt J 2023-1-6

0 votes

"And instead of the mean, I am interested in the mode."

This method should offer speed-up for a generic function, provided that it can ignore NaNs and provided the blocks don't vary too greatly in length:

B = randi([5,10],1,9300);
A = randi(5,1,sum(B));
discrepancy = max( abs(loopMethod(A,B)-altMethod(A,B)),[],'all')
discrepancy = 0
timeit(@() loopMethod(A,B))
ans = 0.1231
timeit(@() altMethod(A,B))
ans = 0.0027
function C=loopMethod(A,B)
    C= zeros(1,length(B)); % preallocate C
    first=1;
    
     for i = 1: length(C)
         last = first+B(i)-1;
         interval=(first:last);
         C(i) = mode(A(interval));
         first =last+1;
     end
end
 
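function C=altMethod(A,B)
     % B must be a row vector; the comparison below relies on implicit expansion
     bmax=max(B);            % length of the longest block
     
     I=(1:bmax)'<=B;         % logical mask: column k has B(k) leading true entries
     
     T=nan(size(I));         % NaN padding for blocks shorter than bmax
     T(I)=A(:);              % fill column by column, one block per column
     
     C=mode(T,1);            % column-wise mode; the NaN padding is ignored
     
end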
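
To make the padding idea concrete, here is a sketch of the same steps on a tiny input. The values are made up, and implicit expansion in the comparison assumes MATLAB R2016b or later:

```matlab
A = [1 1 2 5 5 3 4 4 4];     % 9 samples (illustrative values)
B = [3 2 4];                 % block lengths as a row vector

I = (1:max(B))' <= B;        % 4x3 logical mask, column k has B(k) leading trues
T = nan(size(I));            % NaN-padded table, one column per block
T(I) = A(:);                 % samples are placed column by column
% T is now:
%      1     5     3
%      1     5     4
%      2   NaN     4
%    NaN   NaN     4
C = mode(T, 1)               % column-wise mode, NaN entries ignored: [1 5 4]
```

The table has max(B) rows for every block, so memory use grows with the longest block, which is why the method suits blocks of similar length.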

1 comment

Bertil Veenstra 2023-1-6

Hi Matt, this works nicely. On larger files I get a 3-4 fold increase in speed. Smart solution. Thanks for your time.
