Tips to eliminate FOR loops and vectorize code

Question

0 个投票

Greetings - I am looking for philosophical help on how best to vectorize the code shown to eliminate the numerous FOR loops. As it stands, this simulation takes about 8 hours to run. The sample selection process and the curve fit are the big slowdowns. Any hints to try or paradigm shifts in the approach are welcome. I have tried to comment the code to help you understand just what I am trying to do. I had hoped to create the multidimensional array of samples without loops, but the need to choose different sample sizes from DBH ranges pushed me back into using FOR loops. Thanks!

%%Script to run ASMA analysis
tic
clear all
%get data
load fakeforest_v2_array.mat %array of dbh x biomass length 100,000
%simulation parameters
nums=10:20:1000; %number of sample trees per pull
nsim=1000; %number of simulations per set nums; Ideally this should be 10,000
%set up dbh bins
edges=[0 15 20 30 40 50 60 70 max(fakeforest(:,1))+1];
bins=discretize(fakeforest(:,1),edges);
fakeforest=[bins fakeforest];
groups=unique(bins);
% distribution array, left weighted to right plus others to determine the
% number of samples from each dbh category to use. These are precentages to
% pull from each dbh category
distros=   [100  0  0  0  0  0  0  0;
            33  67  0  0  0  0  0  0;
            17  33  50  0  0  0  0  0;
            10  20  30  40  0  0  0  0;
            7  13  20  27  33  0  0  0;
            5  10  14  19  24  29  0  0;
            4  7  11  14  18  21  25  0;
            3  6  8  11  14  17  19  22;
            67  33  0  0  0  0  0  0;
            50  33  17  0  0  0  0  0;
            40  30  20  10  0  0  0  0;
            33  27  20  13  7  0  0  0;
            29  24  19  14  10  5  0  0;
            25  21  18  14  11  7  4  0;
            22  19  17  14  11  8  6  3;
            0  0  0  0  0  0  0  100;
            0  0  0  0  0  0  33  67;
            0  0  0  0  0  17  33  50;
            0  0  0  0  10  20  30  40;
            0  0  0  7  13  20  27  33;
            0  0  5  10  14  19  24  29;
            0  4  7  11  14  18  21  25;
            3  6  8  11  14  17  19  22];
   % set up output arrays
   UncertaintyM=zeros(size(distros,1),length(nums));
   UncertaintyU=zeros(size(distros,1),length(nums));
   UncertaintyL=zeros(size(distros,1),length(nums));
   % start to sample the 'forest'
   for ns=1:length(nums) %loop for each sample size above
       n=nums(ns);
   for i=1:length(distros) %start with first sample distribution type
      forestsample=nan(n,2,nsim); %initialize multidimensional array to hold sample
      nsamplegroup=distros(i,:);
      nsamplegroup=round(nsamplegroup./100.*n);
   for sim=1:nsim
      for g=1:length(groups)
          grptemp=fakeforest(fakeforest(:,1)==g,2:3);
          %datatmp=datasample(grptemp,nsamplegroup(g),'Replace',false);
          datatmp=datasample(grptemp,nsamplegroup(g)); %large n fails as not enough trees available
          %the following ugliness is to deal with the odd cases of
          %distributions
          if isempty(datatmp) && g==1
              cnt=1;
          elseif nsamplegroup(g)>0 && g==1
              forestsample(1:size(datatmp,1),:,sim)=datatmp;
              cnt=length(datatmp)+1;
          elseif size(datatmp,1)==1
              forestsample(cnt,:,sim)=datatmp;
              cnt=cnt+size(datatmp,1);
          else
              forestsample(cnt:cnt+size(datatmp,1)-1,:,sim)=datatmp;
              cnt=cnt+size(datatmp,1);
          end
      end
      cnt=0;
   end
      %fit the curves and calculate the difference
      % fit is 'exp(a+b.*log(x))' The user function below just loops through the
      % sets of data and fits the equation
      [R G]=ASMACurvesfit(nsim,forestsample(:,1,1:nsim),forestsample(:,2,1:nsim));
      %coeffs=coeffExtract(R,nsim,2); %In case I ever need the coefficients
      uncert=zeros(nsim,1);
      for r=1:nsim
          uncert(r)=abs((sum(R{r}(fakeforest(:,2)))-totalbiomass)/totalbiomass);
      end
      uncertM=prctile(uncert,50);
      uncertU=prctile(uncert,97.5);
      uncertL=prctile(uncert,2.5);
      UncertaintyM(i,ns)=uncertM;
      UncertaintyU(i,ns)=uncertU;
      UncertaintyL(i,ns)=uncertL;
   end
   end
   toc

3 个评论
显示 1更早的评论隐藏 1更早的评论

CraigW 2017-6-19

Thanks for your thoughts. You are quite right in that outermost loop can be separated from the others for ease of understanding. I guess I just went FOR loop crazy!

Sagar Doshi 2017-6-20

I would also suggest using profiler on your code to identify places where the slow down really occurs and then come up with a way to vectorize that particular section of the code.

请先登录，再进行评论。

请先登录，再回答此问题。

Follow Question