gpu memory code optimization

1 次查看(过去 30 天)
Octavian
Octavian 2014-12-13
评论: Octavian 2014-12-15
Dear Wizes,
I would appreciate if you could break this: My code includes gpuArray operations inside a for loop; the relevant portion is here:
  1. % allocate gpu memory:
  2. A=GPUArray.eye(x,'single'); B=GPUArray.zeros(y,x,'single'); C=GPUArray.zeros(x,y,'single'); % x>>>y
  3. for n=1:t %for loop begins
  4. ... % not relevant, B and C are 'filled' by specific matrix multiplications
  5. D=B*A; % size(D)= (y,x)
  6. E=C*D; % size(E)= (x,x)
  7. A=A-E;
  8. clear E D
  9. ...
  10. end
I must mention that all of A,B,C,D,E are different with each iteration in the for loop as they are reused.
The problem is that x is large, and A and E are huge (2 to 7Gb, depending on x), killing my gpu. I made it run, albeit slowly, by breaking E (performing operations row-wise in A for steps 6-7 above:
for i=1:size (A,1)
E=C(i,:)*D;
A(i,:)=A(i,:)-E;
clear E D
1. This works, but is very slow, I was wondering if there is a way to calculate the same for blocks of n rows at once, not one row at a time (with n scaled based on what the gpu can take, where x=kn+p, where p<n); or using mtimesx-like bsxfun routines for matrix multiplication.
2. It would be great if A could be broken in blocks of rows or columns, or in one at a time (row-wise or column-wise), however this is above my job description, given that A is the right multiplier in step 5. This would allow me to expand the size of x I can use.
Thank you, as always Octavio
  6 个评论
Matt J
Matt J 2014-12-15
Are none of these matrices sparse? I know that the GPU doesn't support sparse matrices, but if they are sparse, maybe the CPU is better?
Octavian
Octavian 2014-12-15
With the first iterations, A stays sparse, but then it fills pretty quickly, thanks, Octavio

请先登录,再进行评论。

回答(0 个)

类别

Help CenterFile Exchange 中查找有关 Get Started with Optimization Toolbox 的更多信息

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by