Can anyone tell me why my function is taking so long to run?

Question

Alex 2011-11-24

0
链接

此问题的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/22200-can-anyone-tell-me-why-my-function-is-taking-so-long-to-run

I wrote a K-means clustering algorithm and it works fine but now I have to run it over 100 iterations and it is taking way too long. Can anyone tell me why this is? Here is my code

%This code computes the accuracy of the k-means algorithm
clear all;
dataPts=(genData())';
numClust=3;
[m,n]=size(dataPts);
%Run the k-means algorithm 100 times
numIter=35;
for iter=1:numIter
%choose initial centers randomly.  First generate random indeces from which
%to choose the centers.
clustPos=randi([1,n],[1,numClust]);
%Can now identify initial clusters
for i=1:numClust
    clustCenter(1:m,i)=dataPts(1:m,clustPos(i));
end
oldClustCenter=clustCenter;
while 1
    r=zeros([n,numClust]);
%Now assign each data point to a cluster
    for j=1:n
          colNorm=(sqrt(sum((clustCenter-repmat(dataPts(:,j),1,numClust)).^2,1))).^2;
          [distToCenter(j),closestClust(j)]=min(colNorm);
          r(j,closestClust(j))=1;
      end
%Now recompute the cluster centers using the indicator variables
    %clustCenter(:,1)=
      for j=1:numClust
          sum1=0;
          sum2=0;
          for i=1:n  
              numerator=r(i,j)*dataPts(:,i)+sum1;
              sum1=numerator;
              denominator=r(i,j)+sum2;
              sum2=denominator;
          end
              clustCenter(:,j)=numerator/denominator;
      end
      if clustCenter==oldClustCenter
          break
      end
      oldClustCenter=clustCenter;
  end
%must now determine which data points are in the correct cluster are which
%are in error.
clust11Count=0;
clust12Count=0;
clust13Count=0;
for k=1:50
    if r(k,:)==[1,0,0]
        clust11Count=clust11Count+1;
    elseif r(k,:)==[0,1,0]
        clust12Count=clust12Count+1;
    else
        clust13Count=clust13Count+1;
    end
end
[clust1Error,bestClust1]=max([clust11Count,clust12Count,clust13Count]);
error1=50-clust1Error;
clust21Count=0;
clust22Count=0;
clust23Count=0;
for k=51:100
    if r(k,:)==[1,0,0]
        clust21Count=clust21Count+1;
    elseif r(k,:)==[0,1,0]
        clust22Count=clust22Count+1;
    else
        clust23Count=clust23Count+1;
    end
end
[clust2Error,bestClust2]=max([clust21Count,clust22Count,clust23Count]);
error2=50-clust2Error;
clust31Count=0;
clust32Count=0;
clust33Count=0;
for k=101:150
    if r(k,:)==[1,0,0]
        clust31Count=clust31Count+1;
    elseif r(k,:)==[0,1,0]
        clust32Count=clust32Count+1;
    else
        clust33Count=clust33Count+1;
    end
end
[clust3Error,bestClust3]=max([clust31Count,clust32Count,clust33Count]);
%It sometimes happens that one k-means generated cluster covers two true clusters.
%In this case every points in k-means cluster 3 will be incorrect.
if bestClust2==bestClust3
    error3=50;
else
    error3=50-clust3Error;
end
avgError(iter)=((error1+error2+error3)/150);
end
avgavgError=(sum(avgError)/numIter)*100;
disp(['The average error is ' num2str(avgavgError) ' percent']);

and here is the function that generates the data

function [ data ] = genData
% generate data
data = randn(150,2) ;
data(1:50,:)    = 3 + sqrt(.2) * data(1:50,:) ; 
data(51:100,:)  = 6 + sqrt(.5) * data(51:100,:) ;
data(101:150,:) = 7 + sqrt(.6) * data(101:150,:) ;
end

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

请先登录，再进行评论。

请先登录，再回答此问题。

Answer 1

Walter Roberson 2011-11-24

0
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/22200-can-anyone-tell-me-why-my-function-is-taking-so-long-to-run#answer_29212

Several of your arrays are not pre-allocated. That can slow things down a lot.

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

请先登录，再进行评论。

Answer 2

Hin Kwan Wong 2011-11-24

0
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/22200-can-anyone-tell-me-why-my-function-is-taking-so-long-to-run#answer_29232

Please read in the MATLAB help about the use of profiler to isolate lines of code which takes most of time to run. There are also many good tips and techniques on improving coding performance in the help file. There are many for loops which has potential for vectorization and the tips covers that with examples.

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

请先登录，再进行评论。

Answer 3

Jan 2011-11-24

0
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/22200-can-anyone-tell-me-why-my-function-is-taking-so-long-to-run#answer_29235

在 MATLAB Online 中打开

1. The already mention pre-allocation is most important.

2. Unnecessary calculations:

colNorm=(sqrt(sum((clustCenter - repmat(dataPts(:,j), 1, numClust)) .^ 2, 1))).^2;

At first SQRT(X).^2 == X. At second it is cheaper to square the vector dataPts(:,j) before REPMATting it.

colNorm = sum(clustCenter - repmat(dataPts(:,j) .^ 2, 1, numClust), 1);

3. Slow loop

for i=1:numClust
  clustCenter(1:m,i)=dataPts(1:m,clustPos(i));
end

Faster without loop:

clusterCenter = dataPts(:, clustPos);

4. The loops over k to calculate clustXYCount can be accelerated also:

q = r(1:50, :);
clust11Count = sum(q(:, 1)      & q(:, 2)==0 & q(:, 3)==0);
clust12Count = sum(q(:, 1) == 0 & q(:, 2)    & q(:, 3)==0);
clust13Count = 50 - clust11Count - clust12Count;

5. clear all wastes time without any benefit. It deletes all formerly loaded functions from the memory. The reloading is very expensive. See: Answers: good programming practize.