Why is imresize() slower with gpuArray?

I want to make blurred copies of an image by resizing it (downsampling and then upsampling), because imgaussfilt() and imfilter() are not giving the required results. The strategy is to resize each image and store it in a gpuArray, then move the whole batch to the CPU and write the images to disk.
GTX 1080 vs. Core i3-7100: 160 sec vs. 20 sec.
folderList = dir('e:/a'); % in this directory >23000 folders, each with one image
rect = [125 0 569 570]; % size to crop image and make it square
for i = 1:23581
    gpuDevice(1); % reset gpu
    % read image from folder, take it to gpu, convert to 'double' for use in imresize()
    I = im2double(gpuArray(imread([folderList(i).folder '\' folderList(i).name '\' [folderList(i).name '.png']])));
    Igr = rgb2gray(I); % make image 2-d
    Icrp = imcrop(Igr, rect); % crop image using predefined size
    rszVal = 1; % to be used in for loop for resizing image
    imgArray = ones(570,570,900, 'gpuArray'); % create matrix to save 2-d resized images as its dimensions
    tic
    for j = 1:900 % to create 900 resized images
        rszVal = rszVal - .001;
        Irsz0 = imresize(Icrp, rszVal); % resize, sample down; 'OutputSize' not working in gpu mode
        Irsz = imresize(Irsz0, [570 570]); % resize, sample up
        imgArray(:,:,j) = imcomplement(Irsz); % add resized image to matrix as a dimension
    end
    toc % *takes 6 sec on CPU -no gpuArray- and 140sec on GPU*
    imgGather = gather(imgArray); % move image matrix to CPU
    % create folder to save resized images
    imgfolder = sprintf('e:/imgarray/%s', folderList(i).name);
    if mkdir(imgfolder) == 0; mkdir(imgfolder); end
    for k = 1:900
        % create unique name for each image
        imgname = sprintf('/%s_%d.png', folderList(i).name, k);
        imgpath = sprintf('%s%s', imgfolder, imgname);
        % write to disk each dimension of the matrix as an image
        imgwrite = imgGather(:,:,k);
        imwrite(imgwrite, imgpath, 'png');
    end
    toc % *for outer loop takes 19 sec on CPU and 160 sec on GPU*
end
Question: why does it take more time with gpuArray? Any tips to increase performance, or better options?
I also tried using imfilter() instead of imresize() with parfor; it takes 83 sec for the outer for loop.
2 comments
Adam 2018-9-3
Not everything is faster on the GPU. If the calculation is quite fast and you are having to transfer lots of small bits of data then the communication overhead of passing data back and forth can easily outweigh the faster calculation once the data is on the GPU (if indeed even the calculation itself is faster on GPU).
adam R 2018-9-3
Yes, you are right, but at least the calculations should be faster on the GPU. You see, in the above case resizing an image and appending it to the matrix takes longer on the GPU than on the CPU. Communication overhead comes into play when we gather() imgArray to the CPU, but the difference is visible before that.
Maybe there is something wrong with the code.


Answers (2)

Matt J 2018-9-3
I guess you should check if your GPU is occupied by any other processes. Also, you should use gputimeit() instead of tic...toc to time a gpuArray routine. On the Titan X, I find that your code executes in about 2.7 sec.
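To make the timing advice concrete, here is a minimal sketch of comparing timeit() on the CPU with gputimeit() on the GPU for one down/up-sampling pair. The random 570-by-570 test image, the 0.5 scale factor, and the helper name resizePair are assumptions for illustration only; the GPU branch runs only if Parallel Computing Toolbox and a supported device are present.

```matlab
% Stand-in for the cropped grayscale image from the question (assumed data).
Icpu = rand(570, 570);

% One downsample/upsample pair, as in the inner loop of the question.
% 'resizePair' is a hypothetical helper name used only in this sketch.
resizePair = @(img) imresize(imresize(img, 0.5), [570 570]);

% timeit runs the function several times and returns a robust estimate.
tCpu = timeit(@() resizePair(Icpu));
fprintf('CPU: %.4f s per resize pair\n', tCpu);

% Only touch the GPU if the toolbox function exists and a device is found.
hasGpu = exist('gpuDeviceCount', 'file') > 0 && gpuDeviceCount > 0;
if hasGpu
    Igpu = gpuArray(Icpu);
    % gputimeit waits for the GPU work to finish before stopping the clock,
    % which plain tic/toc does not guarantee.
    tGpu = gputimeit(@() resizePair(Igpu));
    fprintf('GPU: %.4f s per resize pair\n', tGpu);
end
```

If the GPU number for a single 570-by-570 image comes out slower than the CPU number, that is consistent with the point made elsewhere in this thread that one small image does not keep the GPU busy.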
1 comment
adam R 2018-9-4
So probably some hardware issue.
No display is attached to this GPU, and there is no NVIDIA SLI. I'll try this code with some other GPU.



Joss Knight 2018-9-5
Edited: Joss Knight 2018-9-5
It's hard to tell because your code is very confusing, but it looks to me like you are resizing 900 images one image at a time. Yet as far as I can tell, your cropped images are all the same size. So you could just stack them all along the 3rd dimension and resize them all in batch in a single call. Resizing a single image of this size on the GPU isn't going to be faster than the CPU because the GPU isn't fully utilized, but doing 900 images at once should be.
The other thing is that you appear to be down-sampling the cropped images and then upsampling back to 570-by-570, which is the same as running some sort of cubic filter kernel on them. So probably you should be using imfilter instead.
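As a rough illustration of the filtering alternative, the sketch below blurs a whole stack of images with one imfilter call. The Gaussian kernel from fspecial, its size and sigma, and the random test stack are all assumptions; it is not the cubic kernel that imresize applies, so the output will not match the down/up-sampled images exactly.

```matlab
% Hypothetical stack of 4 same-sized grayscale images (assumed data).
stack = rand(570, 570, 4, 'single');

% Assumed blur kernel: 9-by-9 Gaussian with sigma 1.5. Larger sigma gives
% a stronger blur, loosely analogous to a smaller resize factor.
h = fspecial('gaussian', 9, 1.5);

% A 2-D kernel applied to a 3-D array filters each plane independently.
% 'replicate' pads with edge values so the borders are not darkened.
blurred = imfilter(stack, h, 'replicate');
```

The same call accepts a gpuArray input, so the stack can be moved to the GPU first if one is available.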
Finally, do this in single precision not double, you'll get better performance. (Use im2single.)
Oh, and using parfor with a single GPU is utterly pointless. You only have one GPU, so you can't improve things by having multiple MATLAB workers all trying to use the same one. Unless maybe you use MPS on Linux.
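The batching and single-precision suggestions above could look roughly like this. It is a sketch under assumptions: the stack is random stand-in data, the batch size of 8 and the 0.5 scale are arbitrary, and the code falls back to the CPU when no GPU is found.

```matlab
% Hypothetical batch of 8 cropped grayscale images in single precision.
nImages = 8;
stack = rand(570, 570, nImages, 'single');

% Move the whole stack to the GPU in one transfer when a device exists.
useGpu = exist('gpuDeviceCount', 'file') > 0 && gpuDeviceCount > 0;
if useGpu
    stack = gpuArray(stack);
end

% imresize only resizes the first two dimensions, so one call handles
% every image in the stack at the same scale factor.
scale = 0.5;                                  % assumed scale factor
small = imresize(stack, scale);               % downsample all images
blurred = imresize(small, [570 570]);         % upsample back to 570-by-570

% Bring the result back to host memory only once, after all the work.
if useGpu
    blurred = gather(blurred);
end
```

With a varying scale factor, the outer loop would run over the scale values while each iteration still processes the full stack in a single call.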
7 comments
Matt J 2018-9-6
Edited: Matt J 2018-9-6
OK. Except unfortunately I don't think you could do all slices in a single batch here. The resizing factor, rszVal, varies from slice to slice.
Although, I guess you could do the 2nd resize
Irsz = imresize(Irsz0, [570 570]);
separately after the loop.
Joss Knight 2018-9-6
I saw that, but I noticed that the user seemed to be processing 23581 different files, so each of those could be cropped and resized in batch, with a loop over the downsample scale factor. The user should consider using an imageDatastore, which can hide the cost of file I/O by doing it in the background (might have to convert png to jpg first though, can't remember whether png is supported). imageDatastore can read files in batches using ReadSize, for processing in batches.
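A minimal sketch of the imageDatastore idea follows. To keep it runnable anywhere it first writes five small throwaway PNG files into a temporary folder; that dataset, the 64-by-64 image size, the ReadSize of 2, and the two scale factors are all assumptions. In the question's setup the root folder would be 'e:/a' and the output size 570-by-570. Restricting the datastore to '.png' via 'FileExtensions' is a choice made in this sketch.

```matlab
% Build a tiny throwaway dataset: one PNG per subfolder (assumed layout).
root = tempname;
for n = 1:5
    sub = fullfile(root, sprintf('img%02d', n));
    mkdir(sub);
    imwrite(rand(64, 64, 3), fullfile(sub, sprintf('img%02d.png', n)));
end

% Collect every PNG below the root folder.
imds = imageDatastore(root, 'IncludeSubfolders', true, 'FileExtensions', '.png');
imds.ReadSize = 2;                      % images returned per read() call

nSeen = 0;
while hasdata(imds)
    batch = read(imds);                 % cell array of images when ReadSize > 1
    if ~iscell(batch)
        batch = {batch};                % defensive: normalize a lone image
    end
    % Convert each image to single-precision grayscale, then stack along dim 3.
    gray = cellfun(@(im) rgb2gray(im2single(im)), batch, 'UniformOutput', false);
    stack = cat(3, gray{:});
    for scale = [0.9 0.5]               % loop over downsample factors
        blurred = imresize(imresize(stack, scale), [64 64]);
        % ... write or accumulate 'blurred' here ...
    end
    nSeen = nSeen + size(stack, 3);
end
```

Stacking with cat(3, ...) assumes every image in a batch has the same size after cropping, which matches the fixed crop rectangle used in the question.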

