Use Dynamically Allocated C++ Arrays in Generated Function Interfaces
In most cases, when you generate code for a MATLAB® function that accepts or returns an array, there is an array at the interface of the generated CUDA® function. For an array size that is unknown at compile time, or whose bound exceeds a predefined threshold, the memory for the generated array is dynamically allocated.
By default, the dynamically allocated array is implemented by using the C style emxArray data structure in the generated code. Alternatively, dynamically allocated arrays can be implemented as a class template called coder::gpu_array in the generated code. coder::gpu_array offers several advantages over emxArray style data structures:
The generated code is exception safe.
Generated code is easier to read.
Better C++ integration, because it is easier to initialize the input data and work with the output data.
Because coder::gpu_array is defined in a header file that ships with MATLAB, you can write the interface code before generating the code. To use dynamically allocated arrays in custom CUDA code that you integrate with the generated CUDA C++ functions, learn to use the coder::gpu_array template.
Change Interface Generation
By default, the generated CUDA code uses the C style emxArray data structure to implement dynamically allocated arrays. Instead, you can choose to generate CUDA code that uses the coder::gpu_array class template to implement dynamically allocated arrays. To generate code that uses the coder::gpu_array template, do one of the following:
In a code configuration object (coder.MexCodeConfig, coder.CodeConfig, or coder.EmbeddedCodeConfig), set the DynamicMemoryAllocationInterface parameter to 'C++'.
In the GPU Coder™ app, on the Memory tab, set Dynamic memory allocation interface to Use C++ coder::array.
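For example, from the MATLAB command line (these property settings mirror the configuration used in the example later on this page):

cfg = coder.gpuConfig('lib');                  % GPU Coder configuration object ('lib', 'dll', 'exe', or 'mex')
cfg.DynamicMemoryAllocationInterface = 'C++';  % implement dynamic arrays with coder::gpu_array / coder::array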
Using the coder::gpu_array Class Template
When you generate CUDA code for your MATLAB functions, the code generator produces the header files coder_gpu_array.h and coder_array.h in the build folder. The coder_gpu_array.h header file contains the definition of the class template gpu_array in the namespace coder and the definitions for the function templates arrayCopyCpuToGpu and arrayCopyGpuToCpu. The coder::gpu_array template implements the dynamically allocated arrays in the generated code. The declaration for this template is:
template <typename T, int32_T N> class gpu_array

This array contains elements of type T and has N dimensions. For example, to declare a two-dimensional dynamic array myArray that contains elements of type int32_T in your custom CUDA code, use:

coder::gpu_array<int32_T, 2> myArray
The function templates arrayCopyCpuToGpu and arrayCopyGpuToCpu implement data transfers between CPU and GPU memory. On the CPU, the dynamically allocated arrays are implemented by using the coder::array class template. For more information on the APIs you use to create and interact with dynamic arrays in your custom code, see Use Dynamically Allocated C++ Arrays in Generated Function Interfaces.
To use dynamically allocated arrays in custom CUDA code that you want to integrate with the generated code (for example, a custom main function), include the coder_gpu_array.h and coder_array.h header files in your custom .cu files.
Generate C++ Code That Accepts and Returns a Variable-Size Numeric Array
This example shows how to customize the generated example main function to use the coder::gpu_array and coder::array class templates in your project.
Your goal is to generate a CUDA executable for xTest1 that can accept and return an array of int32_T elements. You want the first dimension of the array to be singleton and the second dimension to be unbounded.
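This shape is expressed with coder.typeof in the code generation step that follows; for reference, that call reads as shown below (the variable name inputType is illustrative):

% 1-by-:inf array of int32: the first dimension is fixed at 1,
% the second dimension is variable-size with no upper bound
inputType = coder.typeof(int32(0), [1 inf]);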
Define a MATLAB function xTest1 that accepts an array X, adds the scalar A to each of its elements, and returns the resulting array Y.

function Y = xTest1(X, A)
Y = X;
for i = 1:numel(X)
    Y(i) = X(i) + A;
end
Generate initial source code for xTest1 and move xTest1.h from the code generation folder to your current folder. Use the following commands:

cfg = coder.gpuConfig('lib');
cfg.DynamicMemoryAllocationInterface = 'C++';
cfg.GenerateReport = true;
inputs = {coder.typeof(int32(0), [1 inf]), int32(0)};
codegen -config cfg -args inputs xTest1.m
The function prototype for xTest1 in the generated code is shown here:

extern void xTest1(const coder::array<int, 2U> &X, int A, coder::array<int, 2U> &Y);
Interface with the generated code by providing input and output arrays that are compatible with this function prototype.
Define a CUDA main function in the file xTest1_main.cu in your current working folder. This main function includes the coder_array.h header file, which contains the coder::array class template definition, and the generated header xTest1.h. The main function performs these actions:
Declare myArray and myResult as two-dimensional coder::array dynamic arrays of int32_T elements.
Dynamically set the sizes of the two dimensions of myArray to 1 and 100 by using the set_size method.
Access the size vector of myResult by using myResult.size.
#include <iostream>
#include <coder_array.h>
#include <xTest1.h>

int main(int argc, char *argv[])
{
    static_cast<void>(argc);
    static_cast<void>(argv);

    // Instantiate the input variable by using the coder::array template
    coder::array<int32_T, 2> myArray;

    // Allocate initial memory for the array
    myArray.set_size(1, 100);

    // Access the array with standard C++ indexing
    for (int i = 0; i < myArray.size(1); i++) {
        myArray[i] = i;
    }

    // Instantiate the result variable by using the coder::array template
    coder::array<int32_T, 2> myResult;

    // Pass the input and result arrays to the generated function
    xTest1(myArray, 1000, myResult);

    // Print the result, ten values per line
    for (int i = 0; i < myResult.size(1); i++) {
        if (i > 0) std::cout << " ";
        std::cout << myResult[i];
        if (((i + 1) % 10) == 0) std::cout << std::endl;
    }
    std::cout << std::endl;

    return 0;
}
Generate code by running this script:
cfg = coder.gpuConfig('exe');
cfg.DynamicMemoryAllocationInterface = 'C++';
cfg.GenerateReport = true;
cfg.CustomSource = 'xTest1_main.cu';
cfg.CustomInclude = '.';
codegen -config cfg -args inputs xTest1_main.cu xTest1.m
The code generator produces an executable file xTest1 in your current working folder. Run the executable using the following commands:

if ispc
    !xTest1.exe
else
    !./xTest1
end
1000 1001 1002 1003 1004 1005 1006 1007 1008 1009
1010 1011 1012 1013 1014 1015 1016 1017 1018 1019
1020 1021 1022 1023 1024 1025 1026 1027 1028 1029
1030 1031 1032 1033 1034 1035 1036 1037 1038 1039
1040 1041 1042 1043 1044 1045 1046 1047 1048 1049
1050 1051 1052 1053 1054 1055 1056 1057 1058 1059
1060 1061 1062 1063 1064 1065 1066 1067 1068 1069
1070 1071 1072 1073 1074 1075 1076 1077 1078 1079
1080 1081 1082 1083 1084 1085 1086 1087 1088 1089
1090 1091 1092 1093 1094 1095 1096 1097 1098 1099
Limitations
For generating CUDA code that uses coder::gpu_array, the GPU memory allocation mode must be set to discrete. To change the memory allocation mode in the GPU Coder app, use the Malloc Mode drop-down list under More Settings -> GPU Coder. When using the command-line interface, use the MallocMode build configuration property, whose valid values are 'discrete' and 'unified' (see the example after these limitations).
GPU Coder does not support coder::gpu_array in Simulink®.
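For example, the memory allocation mode limitation can be addressed from the command line as sketched below. This assumes a recent release in which the GPU-specific properties of a coder.gpuConfig object, including MallocMode, are grouped under its GpuConfig property; verify the property path for your release.

cfg = coder.gpuConfig('exe');
cfg.DynamicMemoryAllocationInterface = 'C++';   % use coder::gpu_array / coder::array
cfg.GpuConfig.MallocMode = 'discrete';          % coder::gpu_array requires discrete GPU memory allocation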