gpu memory code optimization
Dear Wizes,
I would appreciate it if you could help me crack this. My code includes gpuArray operations inside a for loop; the relevant portion is here:
1. % allocate gpu memory:
2. A=GPUArray.eye(x,'single'); B=GPUArray.zeros(y,x,'single'); C=GPUArray.zeros(x,y,'single'); % x>>>y
3. for n=1:t %for loop begins
4. ... % not relevant, B and C are 'filled' by specific matrix multiplications
5. D=B*A; % size(D)= (y,x)
6. E=C*D; % size(E)= (x,x)
7. A=A-E;
8. clear E D
9. ...
10. end
I should mention that A, B, C, D and E all change on every iteration of the for loop, since they are reused.
The problem is that x is large, so A and E are huge (2 to 7 GB, depending on x), which kills my GPU. I made it run, albeit slowly, by breaking up E (performing the operations of steps 6-7 above row-wise in A):
for i=1:size(A,1)
E=C(i,:)*D; % size(E)= (1,x)
A(i,:)=A(i,:)-E;
end
clear E D % D is needed by every row, so it is cleared only after the loop
1. This works, but it is very slow. I was wondering whether there is a way to do the same calculation for blocks of n rows at once rather than one row at a time (with n scaled to what the GPU can take, so that x=kn+p with p<n), or by using mtimesx-like bsxfun routines for the matrix multiplication.
2. It would be great if A itself could be broken into blocks of rows or columns, or processed one row or column at a time; however, this is beyond me, given that A is the right multiplier in step 5. This would let me increase the size of x I can use.
Thank you, as always, Octavio
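Both ideas in the question can be sketched in a few lines. This is only an illustration under stated assumptions, not a tested GPU implementation: it uses small CPU arrays so that it runs without a GPU, random stand-ins for B and C, and it assumes B and C are already filled and do not change while A is being updated. The names blk, A0, Aref, errRows and errCols are hypothetical and do not come from the original code. The column-block variant relies on the identity that block J of columns of A - C*(B*A) equals A(:,J) - C*(B*A(:,J)), so each column block can be updated independently.

```matlab
% Small CPU arrays so the sketch runs without a GPU; sizes are illustrative.
x = 10; y = 3;          % x >> y in the real problem
blk = 4;                % hypothetical block size; x = k*blk + p with p < blk
A0 = eye(x, 'single');
B  = rand(y, x, 'single');   % stand-in for the real B
C  = rand(x, y, 'single');   % stand-in for the real C

Aref = A0 - C*(B*A0);   % reference: forms the full x-by-x product at once

% (1) Row blocks: D is formed once, E only blk rows at a time.
A = A0;
D = B*A;                                   % y-by-x, small because y << x
for r0 = 1:blk:x
    rows = r0:min(r0 + blk - 1, x);        % last block may hold only p rows
    A(rows, :) = A(rows, :) - C(rows, :)*D; % temporary is at most blk-by-x
end
errRows = max(abs(A(:) - Aref(:)));

% (2) Column blocks: each block of columns of A is updated independently,
% so A can stay in host memory and be streamed to the GPU block by block.
A = A0;
for c0 = 1:blk:x
    cols = c0:min(c0 + blk - 1, x);
    Ablk = A(:, cols);                     % on a GPU: Ablk = gpuArray(A(:, cols));
    Ablk = Ablk - C*(B*Ablk);              % temporaries: y-by-blk and x-by-blk
    A(:, cols) = Ablk;                     % on a GPU: A(:, cols) = gather(Ablk);
end
errCols = max(abs(A(:) - Aref(:)));
```

In the row-block variant the largest temporary is blk-by-x instead of x-by-x, but A must still fit on the device. In the column-block variant only B, C and one x-by-blk slice of A need to be on the device at a time, at the cost of host-to-device transfers on every block.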
6 Comments
Matt J
2014-12-15
Are none of these matrices sparse? I know that the GPU doesn't support sparse matrices, but if they are sparse, maybe the CPU is better?
Answers (0)