Correctly timing kernel functions created with GPU Coder on Jetson

Hi,
I'm getting started testing a Jetson Nano. I have been able to deploy code, run it, save variables to a file, and gather those back on the host computer (a Windows PC), but I'm fairly certain I'm not correctly timing the execution. The basic structure of the main function is below; I've omitted the code of the kernels as it didn't seem necessary. I suspect this is not producing correct timings for the different kernels, possibly because the C code does not wait for the kernel call to finish before executing the 'toc' line?
Three questions:
1) If I were writing directly in CUDA C, I could put cudaDeviceSynchronize(); in. Would this solve the issue? If so, is there a MATLAB command I can use to get GPU Coder to place that line of code where I tell it?
2) Is there a way to have GPU Coder generate the code first, so that I can go in and edit it, and then have it compile my edited code? I've been following the example here: https://www.mathworks.com/help/supportpkg/nvidia/examples/getting-started-with-the-gpu-coder-support-package-for-nvidia-gpus.html, and I don't see a way to edit the code that GPU Coder creates before it gets compiled on the Jetson. I'm sure the option is there, but I don't know how to use it.
3) Is there a better method for timing kernels that the community recommends? Although I've done a bit of CUDA coding, I'm very far from an expert, and am aware that I might be going about this totally wrong.
times = zeros(5,4);            % one row per iteration, one column per kernel
outputArray1 = zeros(200,200);
outputArray2 = zeros(200,200);
outputArray3 = zeros(200,200);
outputArray4 = zeros(200,200);
for i = 1:5
    tic
    outputArray1 = SimpleFunction1;
    times(i,1) = toc;
    tic
    outputArray2 = SimpleFunction2;
    times(i,2) = toc;
    tic
    outputArray3 = SimpleFunction3;
    times(i,3) = toc;
    tic
    outputArray4 = SimpleFunction4;
    times(i,4) = toc;
end
fId = fopen('times.bin','w');
fwrite(fId,times,'single');    % timings are stored as single precision
% ...more file output for the other arrays

Answers (1)

Aaron Meldrum 2020-2-14
Answering part of my own question here: getting the device synchronize command in is straightforward.
The cudaDeviceSynchronize call can be added after the kernel function call (and before the 'toc' line) with the following line of code.
coder.ceval('-gpudevicefcn', 'cudaDeviceSynchronize');
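
For illustration, here is a minimal sketch of how the timing loop from the question might look with that line in place. It reuses the names from the question (times, outputArray1, SimpleFunction1) and the coder.ceval line exactly as given above. It has not been run on a Jetson, and it assumes GPU Coder emits the synchronize call between the kernel launch and the timer read, so treat it as a starting point rather than a verified recipe.

```matlab
times = zeros(5,4);
outputArray1 = zeros(200,200);
for i = 1:5
    tic
    outputArray1 = SimpleFunction1;
    % Block the host until all queued GPU work has finished,
    % so that toc measures the kernel and not just its launch.
    % (Line taken from the answer above; coder.ceval is for code
    % generation only, not for running the function in MATLAB.)
    coder.ceval('-gpudevicefcn', 'cudaDeviceSynchronize');
    times(i,1) = toc;
    % ...same pattern for SimpleFunction2 through SimpleFunction4
end
```

The first iteration may well come out slower than the rest because of one-time CUDA startup costs, so it can be worth comparing times(1,:) against the later rows before averaging.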

Release

R2019b
