How can I speed up my code

smarthu 2022-6-6
Edited: Jan 2022-6-6
I have just started learning MATLAB.
Following the documentation, I use the code shown below (Matlab Full) to calculate the Mandelbrot set, z(n+1)=z(n)^2+C (30000 iterations). The code runs on the GPU.
I compare it with a program I wrote in VB.net, a language I also don't know well, and that code is not optimized. It runs on the CPU because I don't know how to use the GPU from VB.
The (Matlab Part) code is a modification of the (Matlab Full) code in which I try to calculate only part of the pixels instead of the full image. It also runs on the GPU, but GPU utilization is only 84%~90%, depending on the region being calculated.
For each version of the code, I calculate three regions (each with the same number of pixels):
(1) Divergent: a small region around Creal=-2.7, Cimag=0.5, where all C values diverge. The VB program takes only 0.14 s, because it loops over the pixels one at a time and, as soon as a pixel diverges, stops iterating and moves on to the next pixel. The MATLAB code, by contrast, takes much longer because it has to run through all the iterations.
(2) Full: Creal from -2.8 to 1.5, with Cimag centered at 0. This region shows the whole Mandelbrot set and contains both divergent and convergent areas.
(3) Convergent: a small region around Creal=0, Cimag=0, where all C values converge. Both the VB and MATLAB programs have to run every iteration for every pixel, but the (Matlab Full) code takes much less time.
So my questions are:
(1) How can I speed up my (Matlab Full) code? At present it updates every pixel (the whole z matrix) even after some pixels have diverged, and it runs through all the iterations even when every pixel has diverged. I tried updating only some of the pixels, as shown in the (Matlab Part) code, but that takes much longer, as shown in the figure.
(2) The (Matlab Part) code takes much longer. It uses 84% to 90% of the GPU instead of 100% as in the (Matlab Full) case, and it keeps one CPU thread at 100%. Why is this?
(3) In every region, the (Matlab Full) code has to run through all the iterations, so I expected it to take the same amount of time everywhere. Instead, it takes much less time in the convergent region. Why is there a difference?
(4) This is not a question. Comparing the (Matlab Full) and VB results in the convergent region shows that the MATLAB code runs much faster than my (not optimized) VB code. I expect the MATLAB code would be much faster in the other regions too if it did not have to run through all the iterations.
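On question (3), one observable difference between the regions can be sketched as follows. In the (Matlab Full) loop a diverged pixel keeps being squared, so its value soon overflows to Inf or NaN, while a pixel at C=0 stays finite. Whether non-finite values account for the timing gap on a particular GPU is an assumption that would need to be measured; the iteration count of 50 is arbitrary and the sketch runs on the CPU.

```matlab
% Iterate one divergent and one convergent C value the same way the
% (Matlab Full) loop does, i.e. without stopping at divergence.
cDiv  = complex(-2.7, 0.5);   % taken from the divergent region
cConv = complex(0, 0);        % taken from the convergent region
zDiv  = cDiv;
zConv = cConv;
for k = 1:50                  % 50 is an arbitrary illustrative count
    zDiv  = zDiv.*zDiv   + cDiv;
    zConv = zConv.*zConv + cConv;
end
divIsFinite  = isfinite(zDiv);    % the diverged value has overflowed
convIsFinite = isfinite(zConv);   % the value at C=0 never leaves 0
fprintf('divergent finite: %d, convergent finite: %d\n', divIsFinite, convIsFinite);
```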
Calculation Time Figure:
(Matlab Full) Code:
function ButtonGPUPushed(app, event)
    % Drawing size
    XN=1500;
    YN=830;
    % User set drawing region
    X1=str2double(app.X1EditField.Value);
    X2=str2double(app.X2EditField.Value);
    Y1=str2double(app.Y1EditField.Value);
    Y2=Y1+YN/XN*(X2-X1);
    % Axes and grid
    XX=gpuArray.linspace(X1,X2,XN);
    YY=gpuArray.linspace(Y1,Y2,YN);
    [ttt,vvv]=meshgrid(XX,YY);
    CU=complex(ttt,vvv);
    count=zeros(size(CU),'gpuArray');
    % Custom color map
    CStepSize=201;
    Cin=[0 0.15 0.25;0 0.6 1;1 0.6 0;0 1 0;0.6 0 1;1 0 0.6;0 0 0];
    Cblock=Cin(1,:);
    for k=1:(size(Cin,1)-1)
        ooo=[(linspace(Cin(k,1),Cin(k+1,1),CStepSize))' (linspace(Cin(k,2),Cin(k+1,2),CStepSize))' (linspace(Cin(k,3),Cin(k+1,3),CStepSize))'];
        ooo=ooo(2:end,:);
        Cblock=[Cblock; ooo];
    end
    % Iteration
    zz=CU;
    for k=1:(size(Cblock,1)-1)
        zz=zz.*zz+CU;
        inside=(abs(zz)<=2);
        count=count+inside;
    end
    % Display
    app.UIAxes.XLim=[X1 X2];
    app.UIAxes.YLim=[Y1 Y2];
    colormap(app.UIAxes, Cblock)
    ii=image(app.UIAxes,[X1 X2],[Y1 Y2],count);
    set(app.UIAxes,"YDir","normal")
    set(ii,"HitTest","off")
end
(Matlab Part) Code, the 'For' iteration part is changed to:
for k=1:(size(Cblock,1)-1)
    inside=(abs(zz)<=2);
    count=count+inside;
    zz(inside)=zz(inside).*zz(inside)+CU(inside);
end
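A different way to update only part of the pixels is to shrink the working arrays as pixels diverge, instead of indexing into the full matrix with a mask on every pass. The sketch below is an illustration, not the original code: it runs on a small CPU grid, the names idx, c, z and maxIter are made up for the example, and whether this beats the (Matlab Full) loop on a GPU would have to be measured. It counts the same way as the (Matlab Full) loop (update first, then test) and can stop as soon as no pixel is left.

```matlab
% Small CPU grid for illustration; the original builds XX and YY with
% gpuArray.linspace from the app's edit fields.
XX = linspace(-2.8, 1.5, 60);
YY = linspace(-1.2, 1.2, 40);
CU = XX + 1i*YY.';            % implicit expansion, needs R2016b or later
maxIter = 200;                % illustrative iteration limit

count = zeros(size(CU));
idx = (1:numel(CU)).';        % linear indices of pixels still iterating
c = CU(:);                    % C values of those pixels
z = c;                        % same starting point as zz=CU
for k = 1:maxIter
    z = z.*z + c;
    alive = abs(z) <= 2;
    idx = idx(alive);         % drop diverged pixels from the working set
    z = z(alive);
    c = c(alive);
    if isempty(idx)
        break                 % every pixel has diverged, stop early
    end
    count(idx) = count(idx) + 1;
end
```

In a fully divergent region the loop ends after the first pass, which is the behavior the VB program gets from its per-pixel early exit.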
2 Comments
dpb 2022-6-6
MATLAB works best when the computation can be vectorized. Branching, convergence tests, and operations that apply only to particular iterations or particular parts of an array break that model, and it can be quite difficult to improve on the straightforward linear structure. That is simply a consequence of the fundamental design of the language.
It's been too long since I've played with the Mandelbrot set to remember the details of the per-iteration convergence test, so I have no specific recommendations for a MATLAB implementation, only the general comment that it may not turn out to be easy (or even possible) to gain much.
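One way to keep a per-pixel early exit without giving up array-style code is to write the iteration as a scalar function and apply it with arrayfun. The sketch below assumes a helper named escapeCount and a limit maxIter, both invented for the illustration, and uses CPU inputs so it runs without a GPU. With Parallel Computing Toolbox, arrayfun also accepts gpuArray inputs and then evaluates the function on the GPU; that variant is not shown or timed here.

```matlab
% CPU inputs so the sketch runs anywhere. Building XX and YY with
% gpuArray.linspace instead is the assumed route to GPU execution.
XX = linspace(-2.8, 1.5, 60);
YY = linspace(-1.2, 1.2, 40);
[xGrid, yGrid] = meshgrid(XX, YY);
maxIter = 200;                           % illustrative iteration limit
limitGrid = maxIter*ones(size(xGrid));   % CPU arrayfun needs equal-size inputs

count = arrayfun(@escapeCount, xGrid, yGrid, limitGrid);

function n = escapeCount(x0, y0, maxIter)
% Number of updates after which |z| is still <= 2, stopping at the first escape.
    c = complex(x0, y0);
    z = c;                    % same starting point as zz=CU
    n = 0;
    z = z*z + c;              % first update
    while (n < maxIter) && (abs(z) <= 2)
        n = n + 1;
        z = z*z + c;
    end
end
```

Each pixel stops on its own, so a divergent region costs only a few updates per pixel, while count matches what the (Matlab Full) loop accumulates.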
Jan 2022-6-6
Edited: Jan 2022-6-6
By the way, a more compact way to create Cblock:
CStepSize = 201;
Cin = [0 0.15 0.25;0 0.6 1;1 0.6 0;0 1 0;0.6 0 1;1 0 0.6;0 0 0];
nCin = size(Cin, 1);
nCblock = (nCin - 1) * (CStepSize - 1) + 1;
Cblock = interp1(1:nCin, Cin, linspace(1, nCin, nCblock));
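As a quick check that the interp1 version produces the same color map as the loop in the question, the two can be built side by side and compared. The names CblockLoop, CblockInterp, seg and maxDiff exist only in this sketch.

```matlab
CStepSize = 201;
Cin = [0 0.15 0.25;0 0.6 1;1 0.6 0;0 1 0;0.6 0 1;1 0 0.6;0 0 0];

% Loop version from the question.
CblockLoop = Cin(1,:);
for k = 1:(size(Cin,1)-1)
    seg = [linspace(Cin(k,1),Cin(k+1,1),CStepSize).' ...
           linspace(Cin(k,2),Cin(k+1,2),CStepSize).' ...
           linspace(Cin(k,3),Cin(k+1,3),CStepSize).'];
    CblockLoop = [CblockLoop; seg(2:end,:)]; %#ok<AGROW>
end

% interp1 version from this comment.
nCin = size(Cin, 1);
nCblock = (nCin - 1) * (CStepSize - 1) + 1;
CblockInterp = interp1(1:nCin, Cin, linspace(1, nCin, nCblock));

% Largest element-wise difference between the two color maps.
maxDiff = max(abs(CblockLoop(:) - CblockInterp(:)));
fprintf('rows: %d, largest difference: %g\n', size(CblockInterp,1), maxDiff);
```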
meshgrid() can usually be replaced by implicit expansion:
XX = gpuArray.linspace(X1,X2,XN);
YY = gpuArray.linspace(Y1,Y2,YN);
CU = XX + 1i * YY.';
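A small CPU comparison of the two constructions, assuming R2016b or later for implicit expansion. The grid limits and sizes are arbitrary, and CUmesh, CUimpl, sameSize and maxDiff are names used only in this sketch.

```matlab
XX = linspace(-2.8, 1.5, 6);      % 1-by-6 row of real parts
YY = linspace(-1, 1, 4);          % 1-by-4 row of imaginary parts

% meshgrid construction from the question.
[ttt, vvv] = meshgrid(XX, YY);
CUmesh = complex(ttt, vvv);

% Implicit expansion: a row plus a column expands to a 4-by-6 matrix.
CUimpl = XX + 1i * YY.';

sameSize = isequal(size(CUmesh), size(CUimpl));
maxDiff  = max(abs(CUmesh(:) - CUimpl(:)));
fprintf('same size: %d, largest difference: %g\n', sameSize, maxDiff);
```

The implicit-expansion form avoids creating the two intermediate matrices ttt and vvv.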


Answers (0)

Release

R2022a
