Tips to eliminate FOR loops and vectorize code

Greetings - I am looking for philosophical help on how best to vectorize the code shown to eliminate the numerous FOR loops. As it stands, this simulation takes about 8 hours to run. The sample selection process and the curve fit are the big slowdowns. Any hints to try or paradigm shifts in the approach are welcome. I have tried to comment the code to help you understand just what I am trying to do. I had hoped to create the multidimensional array of samples without loops, but the need to choose different sample sizes from DBH ranges pushed me back into using FOR loops. Thanks!
%%Script to run ASMA analysis
tic
clear all
%get data
load fakeforest_v2_array.mat %array of dbh x biomass length 100,000
%simulation parameters
nums=10:20:1000; %number of sample trees per pull
nsim=1000; %number of simulations per set nums; Ideally this should be 10,000
%set up dbh bins
edges=[0 15 20 30 40 50 60 70 max(fakeforest(:,1))+1];
bins=discretize(fakeforest(:,1),edges);
fakeforest=[bins fakeforest];
groups=unique(bins);
% distribution array, left weighted to right plus others to determine the
% number of samples from each dbh category to use. These are precentages to
% pull from each dbh category
distros= [100 0 0 0 0 0 0 0;
33 67 0 0 0 0 0 0;
17 33 50 0 0 0 0 0;
10 20 30 40 0 0 0 0;
7 13 20 27 33 0 0 0;
5 10 14 19 24 29 0 0;
4 7 11 14 18 21 25 0;
3 6 8 11 14 17 19 22;
67 33 0 0 0 0 0 0;
50 33 17 0 0 0 0 0;
40 30 20 10 0 0 0 0;
33 27 20 13 7 0 0 0;
29 24 19 14 10 5 0 0;
25 21 18 14 11 7 4 0;
22 19 17 14 11 8 6 3;
0 0 0 0 0 0 0 100;
0 0 0 0 0 0 33 67;
0 0 0 0 0 17 33 50;
0 0 0 0 10 20 30 40;
0 0 0 7 13 20 27 33;
0 0 5 10 14 19 24 29;
0 4 7 11 14 18 21 25;
3 6 8 11 14 17 19 22];
% set up output arrays
UncertaintyM=zeros(size(distros,1),length(nums));
UncertaintyU=zeros(size(distros,1),length(nums));
UncertaintyL=zeros(size(distros,1),length(nums));
% start to sample the 'forest'
for ns=1:length(nums) %loop for each sample size above
n=nums(ns);
for i=1:length(distros) %start with first sample distribution type
forestsample=nan(n,2,nsim); %initialize multidimensional array to hold sample
nsamplegroup=distros(i,:);
nsamplegroup=round(nsamplegroup./100.*n);
for sim=1:nsim
for g=1:length(groups)
grptemp=fakeforest(fakeforest(:,1)==g,2:3);
%datatmp=datasample(grptemp,nsamplegroup(g),'Replace',false);
datatmp=datasample(grptemp,nsamplegroup(g)); %large n fails as not enough trees available
%the following ugliness is to deal with the odd cases of
%distributions
if isempty(datatmp) && g==1
cnt=1;
elseif nsamplegroup(g)>0 && g==1
forestsample(1:size(datatmp,1),:,sim)=datatmp;
cnt=length(datatmp)+1;
elseif size(datatmp,1)==1
forestsample(cnt,:,sim)=datatmp;
cnt=cnt+size(datatmp,1);
else
forestsample(cnt:cnt+size(datatmp,1)-1,:,sim)=datatmp;
cnt=cnt+size(datatmp,1);
end
end
cnt=0;
end
%fit the curves and calculate the difference
% fit is 'exp(a+b.*log(x))' The user function below just loops through the
% sets of data and fits the equation
[R G]=ASMACurvesfit(nsim,forestsample(:,1,1:nsim),forestsample(:,2,1:nsim));
%coeffs=coeffExtract(R,nsim,2); %In case I ever need the coefficients
uncert=zeros(nsim,1);
for r=1:nsim
uncert(r)=abs((sum(R{r}(fakeforest(:,2)))-totalbiomass)/totalbiomass);
end
uncertM=prctile(uncert,50);
uncertU=prctile(uncert,97.5);
uncertL=prctile(uncert,2.5);
UncertaintyM(i,ns)=uncertM;
UncertaintyU(i,ns)=uncertU;
UncertaintyL(i,ns)=uncertL;
end
end
toc

3 个评论

Take a fair amount of digging to wrap head around the actual code but the "philosophical answer" :) is to start from the innermost loop and see what can vectorize/refactor there then work your way out.
Is the outermost loop one simply to see the results sensitivity to the sample size I'm guessing? If so, I'd probably start by factoring out that level and making that a loop outside the computational function having the function just return the results for a given size.
That starts to reduce complexity of the function and may then make further factorization easier to see. More smaller pieces are usually far easier to deal with than is one larger code section.
Thanks for your thoughts. You are quite right in that outermost loop can be separated from the others for ease of understanding. I guess I just went FOR loop crazy!
I would also suggest using profiler on your code to identify places where the slow down really occurs and then come up with a way to vectorize that particular section of the code.

请先登录,再进行评论。

回答(0 个)

类别

帮助中心File Exchange 中查找有关 Loops and Conditional Statements 的更多信息

提问:

2017-6-17

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by