Inexplicable GPU memory usage

I'd like to perform FFTs on multiple arrays that are significantly smaller than the total amount of memory on my GPU (e.g. 1GB arrays on an 8GB RTX 2070).
When I load just one array onto the GPU and perform an FFT, the reduction in free memory is disproportionately high relative to the size of the original array. I would expect this during the operation, but I would also expect the memory to be freed afterwards. That doesn't seem to happen: the memory required to perform the FFT is never released once the operation completes. Consider the following example, where the free memory is tracked as 1) an array is placed on the GPU, 2) an in-place FFT is computed in the column dimension, 3) an in-place iFFT is computed in the column dimension.
gpu = gpuDevice;
m(1) = gpu.FreeMemory; % Free memory on GPU
% 3D data array on GPU
A = gpuArray(randn(2048, 64, 2000, 'single'));
% Check free memory:
m(2) = gpu.FreeMemory
% Perform in-place fft:
A = fft(A);
% re-check free memory:
m(3) = gpu.FreeMemory
% Perform in-place ifft:
A = ifft(A);
% final check of free memory:
m(4) = gpu.FreeMemory
After execution I get the following values for m:
m =
1.0e+09 *
6.9215 5.8729 1.6472 0.5986
The first value seems fine, more or less (I never get more than ~7GB free on my 8GB card, but I'm not going to complain), and the second value is fine as well. But the third value (~1.6GB of memory free) is very concerning: it indicates that after the operation more than 5GB of memory are "in use." I would expect 2GB, because the size of A hasn't changed and the values are now complex, but not 5GB. Even if 5GB were required during the FFT operation, why don't I get 3GB back afterwards?
Then, to continue, after performing an inverse Fourier transform (which I threw in just out of curiosity), it looks like a further 1GB is taken from the pool of free memory, never to return.
Is there a way to free the memory that apparently isn't being actively used for variable storage on the GPU without resetting the GPU or clearing the variable (neither of which is a good option)? Or, even better, is there a way of avoiding the problem in the first place without taking a major performance hit? MATLAB version is R2019a.
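The arithmetic behind these expectations can be checked with a short sketch. It uses only the array dimensions and the four readings quoted above; the variable names bytesReal, bytesComplex, consumed and ratio are introduced here purely for illustration.

```matlab
% Free-memory readings reported in the question, in bytes
m = [6.9215 5.8729 1.6472 0.5986] * 1e9;

% Size of A as real single precision: 4 bytes per element
bytesReal    = 2048 * 64 * 2000 * 4;   % 1,048,576,000 bytes
bytesComplex = 2 * bytesReal;          % real and imaginary parts

% Memory consumed by each step, as a multiple of the real array size
consumed = -diff(m);
ratio    = consumed / bytesReal;

fprintf('Real array: %.4f GB, complex array: %.4f GB\n', ...
    bytesReal / 1e9, bytesComplex / 1e9);
fprintf('Step ratios (upload, fft, ifft): %.2f  %.2f  %.2f\n', ratio);
```

The upload and the ifft each consume about one array's worth of memory, while the fft step consumes roughly four, of which only one is explained by A becoming complex.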

Accepted Answer

Edric Ellis 2020-2-16
Edited: Edric Ellis 2020-2-19
MATLAB caches GPU memory, FFT plans, and similar resources to make subsequent operations more efficient. As a result, the FreeMemory property counts this cached memory as in use. The AvailableMemory property takes the caching into account and tells you how much memory is actually available to use (i.e. it knows that MATLAB will flush the caches automatically when necessary). See https://uk.mathworks.com/help/parallel-computing/parallel.gpu.gpudevice.html for more details.
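As a minimal sketch of the difference, the two properties can be printed side by side after an FFT. This assumes a release in which the GPU device object exposes both FreeMemory and AvailableMemory (the thread uses R2019a); the array is deliberately smaller than the one in the question so the check runs quickly.

```matlab
gpu = gpuDevice;

% Smaller test array than in the question, chosen only for a quick check
A = gpuArray(randn(2048, 64, 200, 'single'));
A = fft(A);
wait(gpu);   % make sure the FFT has finished before reading the properties

freeBytes  = gpu.FreeMemory;       % cached memory is counted as in use
availBytes = gpu.AvailableMemory;  % accounts for memory MATLAB can reclaim

fprintf('FreeMemory:      %.3f GB\n', freeBytes / 1e9);
fprintf('AvailableMemory: %.3f GB\n', availBytes / 1e9);
```

If caching explains the numbers in the question, AvailableMemory should stay close to the total minus the size of A, even when FreeMemory drops much further.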
  2 Comments
Tim 2020-2-17
Thanks for bringing the AvailableMemory property to my attention, I don't know how I missed that. Does retaining the plan information in cache accelerate any subsequent FFT operations on arrays of the same size, dimension and class type as A, or is it essentially junk memory that's just going to get replaced as other operations are performed?
Joss Knight 2020-2-17
Yes, the caching improves performance on subsequent operations.
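One rough way to observe this is to time two consecutive FFTs on arrays of the same size and class. This is only a sketch: the variable names tFirst and tSecond are illustrative, and the first call may also include one-time library initialization, so the difference is not purely due to plan caching.

```matlab
gpu = gpuDevice;
A = gpuArray(randn(2048, 64, 200, 'single'));
wait(gpu);   % finish the upload before timing starts

% First call: nothing has been cached yet for this size and class
tic; B = fft(A); wait(gpu); tFirst = toc;

% Second call on an array of the same size and class
tic; B = fft(A); wait(gpu); tSecond = toc;

fprintf('first: %.4f s, second: %.4f s\n', tFirst, tSecond);
```

The wait(gpu) calls matter because GPU operations can return before the work has finished; without them, toc would measure only the launch.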


More Answers (1)

Image Analyst 2020-2-15
I'm not sure whether A gets converted to double after fft(), but A will certainly be complex and take up twice as much memory as a real matrix. If it is also double, it takes up 4 times as much space. Try calling delete(A) or clear('A') to see how that affects memory:
whos A
memory
gpu.FreeMemory
clear('A');
memory
gpu.FreeMemory
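The class question can also be checked directly on the GPU, without gathering the data or clearing the variable. A minimal sketch, assuming a release such as R2019a where classUnderlying is available for gpuArray (the array here is smaller than the one in the question):

```matlab
A = gpuArray(randn(2048, 64, 200, 'single'));
A = fft(A);

cls   = classUnderlying(A);  % underlying data type of the gpuArray
cplx  = ~isreal(A);          % true if the data is stored as complex
bytes = numel(A) * 4 * (1 + cplx);  % element count times 4 bytes, doubled if complex
                                    % (the 4 assumes single precision)

fprintf('class: %s, complex: %d, approx. %.3f GB\n', cls, cplx, bytes / 1e9);
```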
  1 Comment
Tim 2020-2-15
Thank you for the comment, Image Analyst. Yes, I did this. When I gather the variable back to the workspace it is still in single (complex), so I presume it stayed in single on the GPU, so it should take up 2x the original. When I clear A it frees up nearly the whole of the memory (less something like 250MB, not sure where that went), but it also clears the variable - something I don't want to do, and don't think I should have to do.
Plus, even if it did convert to double (which I don't think it did), it should only take up 4x the space, not 5x as it does, since the operation is in-place.
I'm wondering if cufftDestroy (section 3.8 here) isn't performing successfully and the memory allocated to the fft plan isn't being wiped after completion. I don't know how I would check this, though.

