Fetching outputs from different GPUs results in an error?

I have 2 GPUs in my computer and I want to use both of them to perform a function. Hence I feed part of the array to one GPU and the remaining part to the second GPU.
Agpu1=gpuArray(A(:,:,1:n/2)); %chunk #1 : send to GPU with device index 1
Agpu2=gpuArray(A(:,:,n/2+1:n)); %chunk #2 : send to GPU with device index 2
F(1)=parfeval(@Function,2,Agpu1,1);
F(2)=parfeval(@Function,2,Agpu2,2);
[o1,o2] = fetchOutputs(F,'UniformOutput',false); % Blocks until complete
When I fetch the outputs using the last statement, I get the error "Error using parallel.Future/fetchOutputs: One or more futures resulted in an error".
1) Does this mean fetchOutputs is trying to fetch the output while the other GPU is still performing the operation? How do I solve this?
In the above link, when I try printing the gpuDevice used, it always shows GPU 2 is being used and GPU 1 is idle. How can I confirm that both GPUs are being used?
Thank you!
3 Comments
Srinidhi Ganeshan, 2019-1-29
Below is the code:
for i=1:500
    A(:,:,i)=rand(500,500);
end
n=size(A,3);  % number of pages
Agpu1=gpuArray(A(:,:,1:n/2)); %chunk #1 : send to GPU with device index 1
Agpu2=gpuArray(A(:,:,n/2+1:n)); %chunk #2 : send to GPU with device index 2
F(1)=parfeval(@fcn,2,Agpu1,1);
F(2)=parfeval(@fcn,2,Agpu2,2);
[o1,o2] = fetchOutputs(F,'UniformOutput',false); % Blocks until complete

function [q,r]=fcn(A,Id)
if nargin>1, gpuDevice(Id); end
for i=size(A,3):-1:1
    [q(:,:,i),r(:,:,i)]=qr(A(:,:,i),0);
end
end
1) a) Error:
ans =
ParallelException with properties:
identifier: 'parallel:gpu:array:InvalidData'
message: 'The data no longer exists on the device.'
cause: {}
remotecause: {[1x1 MException]}
stack: [1x1 struct]
2) I am using 16 workers. In this case, how will parfeval use the GPUs?
3) In my program, I selected different GPUs using the gpuDevice Id. When I do that and execute my program, I get an error at line 5, i.e. at fetchOutputs. The error message is mentioned above.
4) Thanks for mentioning that; the function is not actually named 'Function' in my program.
5) How do I do "Another would be to open a pool with a single worker, and use the client for the other half of the computation. This would help with data transfer since you don't need to transfer half of the array to another process."? Is there any small example you could provide? (I have sketched my understanding of this at the end of this comment.)
6) So in order to solve (3), I tried using wait, one of the methods of parallel.FevalFuture, this way:
Agpu1=gpuArray(A(:,:,1:n/2)); %chunk #1 : send to GPU with device index 1
Agpu2=gpuArray(A(:,:,n/2+1:n)); %chunk #2 : send to GPU with device index 2
F(1)=parfeval(@fcn,2,Agpu1,1);
F(2)=parfeval(@fcn,2,Agpu2,2);
wait(F,'finished');
[o1,o2] = fetchOutputs(F,'UniformOutput',false); % Blocks until complete
I still get the same error.
I also tried using fetchNext so that each completed job is collected as soon as it is done:
Q1=cell(1,2);
R1=cell(1,2);
for idx=1:2
    [completedIdx,Q,R] = fetchNext(F);
    disp(completedIdx);
    Q1{completedIdx}=Q;
    R1{completedIdx}=R;
end
toc
Q=cat(3,gather(Q1{1}),gather(Q1{2}));
R=cat(3,gather(R1{1}),gather(R1{2}));
Even though I do this, I get the same error stating that one or more futures resulted in an error.
What should I do to solve this?
To sum it up, I am planning to do a small part of my QR on the CPU and split the rest between the GPU devices, so that the CPU, GPU 1, GPU 2, etc. are kept busy at the same time.
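For (5), here is a minimal sketch of what I understand that suggestion to mean (assuming a 2-GPU machine and no pool already open; qrPages is just an illustrative helper name, not code from this thread):
p = parpool('local', 1);                               % one background worker
half = floor(size(A,3)/2);
F = parfeval(p, @qrPages, 2, A(:,:,half+1:end), 2);    % worker computes its half on GPU 2
[Q1, R1] = qrPages(A(:,:,1:half), 1);                  % client computes its half on GPU 1 meanwhile
[Q2, R2] = fetchOutputs(F);                            % collect the worker's half
Q = cat(3, Q1, Q2);
R = cat(3, R1, R2);

function [q,r] = qrPages(Acpu, Id)
gpuDevice(Id);                         % select the device before creating any gpuArray
Agpu = gpuArray(Acpu);
for k = size(Agpu,3):-1:1
    [q(:,:,k), r(:,:,k)] = qr(Agpu(:,:,k), 0);
end
q = gather(q);
r = gather(r);
end
Is that the idea - the client's half runs while the worker computes its half, and only CPU arrays cross the process boundary?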
Joss Knight, 2019-1-30 (edited 2019-1-30)
You can try to use the same GPUs on more than one parallel worker, but it's pointless - the work will happen in serial. If you have two GPUs, open a pool with two workers. If you want to do some work on the GPU and some on the CPU, take a look at the answer to this question.
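For example, here is a minimal sketch that opens one worker per GPU and has each worker report which device it sees - this is also a way to confirm that both GPUs really are being used:
parpool('local', gpuDeviceCount);       % one worker per GPU
spmd
    d = gpuDevice(labindex);            % pin worker k to GPU k
    fprintf('worker %d is using GPU %d (%s)\n', labindex, d.Index, d.Name);
end
Each worker should print a different device index; if they don't, the work is not actually being spread across the GPUs.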
The error is a pretty simple one. Every time you select the device using gpuDevice, you are resetting it, clearing all gpuArray variables in memory, including the ones you passed in. As I said, there is no point in moving the data to the GPU on the client MATLAB and then sending it to your worker in a parfeval call. All that happens is that the data gets transferred back to the system memory, then transmitted to the other process, then deserialised and put back on whatever device is currently selected. Create your data on your worker or send it as a CPU array and then transfer it to the GPU at the other end. You could also try using a parallel.pool.Constant to define data on your workers that persists from call to call.
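For example, the minimal change to what you posted is to send plain CPU chunks and only create the gpuArray after selecting the device (a sketch reusing your fcn/A/n names; gcp gets the current pool):
p = gcp();                                        % the current parallel pool
F(1) = parfeval(p, @fcn, 2, A(:,:,1:n/2),   1);   % CPU chunk + device index
F(2) = parfeval(p, @fcn, 2, A(:,:,n/2+1:n), 2);
[o1, o2] = fetchOutputs(F, 'UniformOutput', false);

function [q,r] = fcn(Acpu, Id)
if nargin > 1, gpuDevice(Id); end      % reset happens before any data is on the GPU
Agpu = gpuArray(Acpu);                 % transfer only after the device is selected
for i = size(Agpu,3):-1:1
    [q(:,:,i), r(:,:,i)] = qr(Agpu(:,:,i), 0);
end
q = gather(q);                         % hand plain CPU arrays back to the client
r = gather(r);
end
The only real difference from your version is the order: gpuDevice runs before any gpuArray exists on that worker, so the reset can no longer invalidate the data.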
If I were trying to do pagewise QR like you are on two GPUs, I'd probably use spmd, and I'd probably limit the GPU work to just the call to qr - I don't think there's any advantage to all that indexing and storage on the GPU:
parpool('local', gpuDeviceCount);   % one worker per GPU
spmd
    nPages = size(A,3);
    blocksize = ceil(nPages/numlabs);
    strt = (labindex-1)*blocksize + 1;        % this worker's first page
    fnsh = min(nPages, strt+blocksize-1);     % and its last page
    for j = fnsh:-1:strt
        Agpu = gpuArray(A(:,:,j));
        [qgpu,rgpu] = qr(Agpu, 0);
        i = j-strt+1;
        q(:,:,i) = gather(qgpu);
        r(:,:,i) = gather(rgpu);
    end
end
% q and r are now Composites so need to be indexed to recreate result
Q = cat(3, q{:});
R = cat(3, r{:});
By the way, I hope you're not actually doing this:
for i=1:500
    A(:,:,i)=rand(500,500);
end
since it's just the same as A = rand(500,500,500), but way slower.


Answers (0)
