Question about major difference in computation speed with gpuArrays
I've been trying to optimize my code for a project recently and noticed an interesting phenomenon. I have searched online for what might cause it, but have not found a good answer so far.
The following code runs extremely fast:
x = linspace(-20,20,25);
z = linspace(0,100,29);
Columns =5;
singleframeofdata = gpuArray(rand(2816,128,'single'));
fgpu = gpuArray(rand(2816,1,'single'));
tofgpu = rand(length(z),length(x),128,'single');
SingleFrameOfDatarep = repmat(singleframeofdata,1,length(z)*length(x));
y = -2i*pi*-1*fgpu*reshape(tofgpu,1,size(tofgpu,1)*size(tofgpu,2)*size(tofgpu,3),1);
tic
holder = SingleFrameOfDatarep.*y;
toc
clear holder
tic
SingleFrameOfDatarep = SingleFrameOfDatarep.*y;
toc
The first timing (holder) comes back at around 0.09 s, while the second (SingleFrameOfDatarep) comes back at around 0.00009 s. Now I know that because the second calculation uses in-place operations, it will be faster.
However, if I change x = linspace(-20,20,25) to x = linspace(-20,20,26), a drastic slowdown occurs. The holder timing is again around 0.09 s, while the SingleFrameOfDatarep timing rises to around 0.07 s. The original code ran ~770X faster than the modified code.
My only explanation is that when an array gets too large, MATLAB creates a new variable, as it does for holder, and that the slowdown comes from this allocation time. I am not fully sure about this, nor do I know how to test or check for it.
Could anyone point me in the correct direction to read on this or give a possible explanation/solution for this?
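As a rough starting point for checking the allocation hypothesis, the sizes of the arrays involved can be estimated and compared with the memory the device reports. This is only a sketch: it assumes a complex single value occupies 8 bytes, and the variable names (nRows, nChan, nz, nx) are introduced here for illustration.

```matlab
% Rough memory estimate for the arrays in the example above.
% Assumption: each complex single element occupies 8 bytes on the GPU.
nRows = 2816;  nChan = 128;  nz = 29;
for nx = [25 26]
    nCols = nz*nx*nChan;             % columns after repmat / reshape
    bytesComplex = nRows*nCols*8;    % one complex single array of that size
    fprintf('nx = %d: %d columns, %.2f GB per complex array\n', ...
            nx, nCols, bytesComplex/1e9);
end

% Compare with what the currently selected device reports.
g = gpuDevice;
fprintf('Available: %.2f GB of %.2f GB\n', ...
        g.AvailableMemory/1e9, g.TotalMemory/1e9);
```

For these sizes each complex array comes to roughly 2.09 GB with 25 points in x and roughly 2.17 GB with 26. Whether that difference matters depends on the device and on what else is resident in its memory.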
1 Comment
Joss Knight
2017-6-22
I can't check your code right now but I can say two things. Firstly MATLAB does have a memory pool and when GPU memory overflows the pool there are raw allocations; those allocations force synchronization and that's slow. Secondly, your timing with tic and toc is flawed because the GPU operates asynchronously. This means when toc is reporting the time the previous command is still running. What happens when you insert wait(gpuDevice) before each tic and before each toc? You may find the timings change completely.
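The synchronisation suggested here might look like the following minimal sketch. The array sizes are deliberately smaller than in the question so that it fits on a modest GPU, and the names A, B, C, dev and tSync are illustrative.

```matlab
% Sketch: tic/toc with explicit synchronisation around a GPU operation.
% Sizes are illustrative and smaller than those in the question.
A = rand(2816, 1024, 'single', 'gpuArray');
B = rand(2816, 1024, 'single', 'gpuArray');
dev = gpuDevice;

wait(dev);      % finish any pending GPU work before starting the clock
tic
C = A .* B;
wait(dev);      % block until the multiplication has actually completed
tSync = toc;

fprintf('Synchronised time: %.6f s\n', tSync);
```

Without the second wait, toc can return while the multiplication is still running, so the reported time may reflect only the cost of launching the operation.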
Finally, you should use gpuArray.rand not gpuArray(rand(...)). The former creates random data directly on the GPU; the latter does it slowly on the CPU then copies the data over to the device.
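For illustration, the two ways of creating the data compare as below. The sketch uses the 'gpuArray' option of rand, which generates the values on the device. The gpuArray.rand form named in the comment is shown only as a comment, since its availability may depend on the release.

```matlab
% Host-then-copy, as in the original code: values are generated on the
% CPU and then transferred to the device.
aHost = gpuArray(rand(2816, 128, 'single'));

% Generated directly on the device, no host-to-device copy.
aDev = rand(2816, 128, 'single', 'gpuArray');

% Form named in the comment above (release-dependent):
% aDev = gpuArray.rand(2816, 128, 'single');
```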
Accepted Answer
Matt J
2017-6-22
Edited: Matt J
2017-6-22
The times that you see are probably false. You shouldn't be using tic() and toc() to time GPU operations. You should be using gputimeit(), as below. I see no significant speed difference between any of the cases that you tested, when implemented this way.
x = linspace(-20,20,25);
z = linspace(0,100,29);
singleframeofdata = gpuArray(rand(2816,128,'single'));
fgpu = gpuArray(rand(2816,1,'single'));
tofgpu = rand(length(z),length(x),128,'single');
SingleFrameOfDatarep = repmat(singleframeofdata,1,length(z)*length(x));
y = -2i*pi*-1*fgpu*reshape(tofgpu,1,size(tofgpu,1)*size(tofgpu,2)*size(tofgpu,3),1);
gputimeit(@() fun(SingleFrameOfDatarep,y) )
gputimeit(@() hfun(SingleFrameOfDatarep,y) )
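% Overwrites its first input, so the multiplication can be done in place.
function SingleFrameOfDatarep=fun(SingleFrameOfDatarep,y)
    SingleFrameOfDatarep=SingleFrameOfDatarep.*y;
end
% Writes the result to a new output variable.
function holder=hfun(SingleFrameOfDatarep,y)
    holder=SingleFrameOfDatarep.*y;
end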
Incidentally, your code should get a bit faster (and will certainly conserve memory) if you use bsxfun instead of repmat.
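One way the bsxfun suggestion could be applied is sketched below, using small stand-in sizes rather than the arrays from the question. The names data, y, nRep, ref and out are illustrative, and y here is random data standing in for the phase term. The idea is to view y as a 3-D array so that the unreplicated data can be expanded along the third dimension.

```matlab
% Sketch: replacing repmat with bsxfun (small illustrative sizes).
nRows = 256;  nChan = 8;  nRep = 6;
data = rand(nRows, nChan, 'single', 'gpuArray');       % unreplicated data
y    = rand(nRows, nChan*nRep, 'single', 'gpuArray');  % stand-in for y

% repmat version: materialises nRep copies of data before multiplying.
ref = repmat(data, 1, nRep) .* y;

% bsxfun version: view y as nRows x nChan x nRep and let bsxfun expand
% data along the third dimension without storing the copies.
out = bsxfun(@times, data, reshape(y, nRows, nChan, nRep));
out = reshape(out, nRows, nChan*nRep);

% Both versions should agree element for element.
maxDiff = gather(max(abs(out(:) - ref(:))));
fprintf('Largest difference: %g\n', maxDiff);
```

The reshape relies on repmat tiling the columns in blocks of nChan, so column c + nChan*(k-1) of the replicated array corresponds to element (c, k) of the 3-D view.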