Can the efficiency of this code be improved, either computationally or just in terms of lines of code?
Dumb question for a smart person who has a moment to kill.
Let's say I have data that will come in from n groups, and I know a priori those groups will be numbered 1 through n in some variable, A. I will have a second variable, B, that contains the data. Then, I want to get (for example) the mean of the data in each group. It is easy to pull off with a loop, but is there better code I could be using for this procedure? For a small example dataset, I might have
A = [2; 3; 1; 2; 2; 3; 1; 2; 2; 3];
B = [4.10047; 7.44549; 3.62159; 6.56964; 2.87221; 4.51231; 4.01697; 5.60534; 5.5440; 7.07802];
tic
%%% Can this be done better or in one line of code? %%%
C = NaN(max(A), 1);
for ii = 1:numel(C)
C(ii) = mean(B(A == ii));
end
%%% Can this be done better or in one line of code? %%%
toc
disp(C)
bar(C)
Is there a better way to do this?
Accepted Answer
Jan
2022-12-5
Edited: Jan
2022-12-5
A0 = [2; 3; 1; 2; 2; 3; 1; 2; 2; 3];
B0 = [4.10047; 7.44549; 3.62159; 6.56964; 2.87221; 4.51231; 4.01697; 5.60534; 5.5440; 7.07802];
A = repmat(A0, 1e6, 1); % Let Matlab work with more than tiny data
B = repmat(B0, 1e6, 1);
tic
C = NaN(max(A), 1);
for ii = 1:numel(C)
m = A == ii;          % Logical mask of the elements in group ii
C(ii) = mean(B(m));   % Mean of this group, reusing the mask
end
toc
Shorter but slower:
tic
D = accumarray(A, B, [], @mean);
toc
isequal(C, D)
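A variant that is not part of the answer above, offered only as a sketch: the default reduction of accumarray is a sum, so dividing the per-group sums by the per-group counts gives the means without passing a function handle. Whether this is faster than @mean on a given release is something to measure rather than assume.

```matlab
% Small example data from the question
A = [2; 3; 1; 2; 2; 3; 1; 2; 2; 3];
B = [4.10047; 7.44549; 3.62159; 6.56964; 2.87221; 4.51231; 4.01697; 5.60534; 5.5440; 7.07802];

groupSum   = accumarray(A, B);   % Default reduction is sum
groupCount = accumarray(A, 1);   % Number of elements per group
F = groupSum ./ groupCount;      % 0/0 gives NaN for group numbers absent from A

% Compare with the function-handle version using a difference, not isequal
D = accumarray(A, B, [], @mean);
disp(max(abs(F - D)))
```

Group numbers that never occur in A yield NaN here, whereas accumarray with @mean fills them with 0 by default.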
Another approach:
tic
S = zeros(max(A), 1);
N = zeros(size(S));
for k = 1:numel(A)
m = A(k);
S(m) = S(m) + B(k);
N(m) = N(m) + 1;
end
E = S ./ N;
toc
isequal(C, E) % Not equal!!!
% But the differences are caused by rounding only:
(C - E) ./ C
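The difference arises because floating-point summation depends on the order of the additions: the two loops add the values in a different order, so the rounding errors differ. Comparing the results with the group means computed from the small data A0 and B0 shows that all methods have comparable accuracy.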
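Since the results differ only by rounding, a tolerance-based comparison is more informative than isequal. The following is a minimal sketch; the tolerance of 1e-9 and the smaller repetition factor are arbitrary choices for illustration, not values from the answer.

```matlab
A = repmat([2; 3; 1; 2; 2; 3; 1; 2; 2; 3], 1e4, 1);
B = repmat([4.10047; 7.44549; 3.62159; 6.56964; 2.87221; 4.51231; 4.01697; 5.60534; 5.5440; 7.07802], 1e4, 1);

% Group means via logical indexing
C = NaN(max(A), 1);
for ii = 1:numel(C)
    C(ii) = mean(B(A == ii));
end

% Group means via running sums and counts
S = zeros(max(A), 1);
N = zeros(size(S));
for k = 1:numel(A)
    m = A(k);
    S(m) = S(m) + B(k);
    N(m) = N(m) + 1;
end
E = S ./ N;

relErr = max(abs(C - E) ./ abs(C));   % Largest relative difference
tol = 1e-9;                           % Illustrative tolerance (assumption)
fprintf('max relative difference: %g, within tolerance: %d\n', relErr, relErr < tol);
```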
Locally under R2018b I get these timings:
Elapsed time is 0.205890 seconds. % Original
Elapsed time is 0.512173 seconds. % ACCUMARRAY
Elapsed time is 0.061097 seconds. % Loop over inputs
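Single tic/toc measurements can be noisy. As a sketch of a steadier measurement, timeit runs a function handle several times and returns a typical run time. The arrayfun expression is an assumed one-line equivalent of the original loop, and the repetition factor is reduced here to keep the run short; the numbers will differ from the timings listed above.

```matlab
A0 = [2; 3; 1; 2; 2; 3; 1; 2; 2; 3];
B0 = [4.10047; 7.44549; 3.62159; 6.56964; 2.87221; 4.51231; 4.01697; 5.60534; 5.5440; 7.07802];
A = repmat(A0, 1e5, 1);
B = repmat(B0, 1e5, 1);

% timeit expects a function handle that takes no inputs
tAccum    = timeit(@() accumarray(A, B, [], @mean));
tArrayfun = timeit(@() arrayfun(@(ii) mean(B(A == ii)), (1:max(A)).'));

fprintf('accumarray: %.4f s\narrayfun:   %.4f s\n', tAccum, tArrayfun);
```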
2 Comments
Torsten
2022-12-5
I took your repmat modification and added Steven Lord's answer, below, and the original loop looks like the clear winner.
Or "arrayfun" (see above).
More Answers (1)
Steven Lord
2022-12-5
A = [2; 4; 1; 2; 2; 4; 1; 2; 2; 4];
B = [4.10047; 7.44549; 3.62159; 6.56964; 2.87221; 4.51231; 4.01697; 5.60534; 5.5440; 7.07802];
[C, groupnumbers] = groupsummary(B, A, @mean)
The groupnumbers output can help if some elements in 1:n don't appear in A (as is the case using the modified A I used in this example where all the 3's are replaced by 4's.)
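To illustrate how groupnumbers can be used, the sketch below places the group means into a vector indexed by 1:n, leaving NaN for group numbers that do not occur. The variable name Cfull is made up for this example, and the array-input form of groupsummary requires a release that supports it.

```matlab
A = [2; 4; 1; 2; 2; 4; 1; 2; 2; 4];
B = [4.10047; 7.44549; 3.62159; 6.56964; 2.87221; 4.51231; 4.01697; 5.60534; 5.5440; 7.07802];

[C, groupnumbers] = groupsummary(B, A, @mean);

Cfull = NaN(max(A), 1);     % One slot per possible group number 1:n
Cfull(groupnumbers) = C;    % Fill only the groups that actually occur
disp(Cfull)
```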