gpucoder.reduce
Optimize GPU implementation for reduction operations
Syntax
S = gpucoder.reduce(A,{@FUN1,@FUN2,...})
S = gpucoder.reduce(___,Name=Value)
Description
S = gpucoder.reduce(A,{@FUN1,@FUN2,...}) aggregates the values in the input array A to a single value using every function handle provided in the cell array. The size of the output S is 1-by-N, where N is the number of function handles.
The code generator uses shuffle intrinsics to perform reduction operations on the GPU. The function aggregates multiple function handles inside a single kernel on the GPU.
S = gpucoder.reduce(___,Name=Value) aggregates the values in the input array using the options specified by one or more name-value arguments.
Examples
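As a minimal sketch of the first syntax, the file below reduces an array with two user-defined functions in one call. The names myReduce, mySum, and myMax are illustrative and are not part of the product. Running it assumes that GPU Coder is installed.

```matlab
% myReduce.m
% Entry-point function: reduce A with two binary functions in one call.
function s = myReduce(A)
    % s is 1-by-2: s(1) is the sum of A, s(2) is the maximum of A.
    s = gpucoder.reduce(A, {@mySum, @myMax});
end

% Each reduction function takes two inputs and returns one output.
% Both are commutative and associative, as gpucoder.reduce requires.
function c = mySum(a, b)
    c = a + b;
end

function c = myMax(a, b)
    c = max(a, b);
end
```

To try it in MATLAB, call s = myReduce(rand(1,1024,'single')). To generate a CUDA MEX function, assuming a supported GPU and compiler are configured, one option is cfg = coder.gpuConfig('mex') followed by codegen -config cfg -args {rand(1,1024,'single')} myReduce.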
Input Arguments
Output Arguments
Limitations
gpucoder.reduce does not support reducing complex arrays.
The user-defined function must accept two inputs and return one output. The data types of the inputs and output must match the data type of the preprocessed input array.
The user-defined function must be commutative and associative. Otherwise, the behavior of the function is undefined.
For code generation, gpucoder.reduce accepts a limited number of user-defined function handles based on the size of the output data type. For example, you can input up to 46 function handles that output the half data type or up to 11 function handles that output the double data type. If you input too many function handles, code generation fails with an error.
For inputs that are of the integer data type, the generated code may contain intermediate computations that reach saturation. In this case, the results from the generated code may not match the simulation results from MATLAB®.
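The commutativity and associativity requirement and the integer saturation caveat both come down to the order in which elements are combined. The plain MATLAB script below is only a sketch: it does not call gpucoder.reduce and does not reproduce the order the generated code uses, which this page does not specify. It compares two groupings of the same three values by hand. The variable names are illustrative.

```matlab
% Compare two groupings of the same reduction, evaluated by hand.

% Case 1: subtraction is neither commutative nor associative,
% so the grouping changes the result even for doubles.
sub = @(a, b) a - b;
d = [10 4 3];
leftSub  = sub(sub(d(1), d(2)), d(3));   % (10 - 4) - 3 = 3
rightSub = sub(d(1), sub(d(2), d(3)));   % 10 - (4 - 3) = 9

% Case 2: addition is associative in exact arithmetic, but MATLAB
% integer arithmetic saturates, so an intermediate result can clip.
add = @(a, b) a + b;
v = int8([100 100 -100]);
leftAdd  = add(add(v(1), v(2)), v(3));   % 100 + 100 saturates to 127, then 127 - 100 = 27
rightAdd = add(v(1), add(v(2), v(3)));   % 100 + (100 - 100) = 100

fprintf('subtraction: %d vs %d\n', leftSub, rightSub);
fprintf('int8 sum:    %d vs %d\n', leftAdd, rightAdd);
```

When integer inputs can approach the limits of their type, one way to reduce the chance of a mismatch is to convert the data to a wider type before reducing it.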
Version History
Introduced in R2019b
See Also
Apps
Functions
codegen | coder.gpu.kernel | coder.gpu.kernelfun | gpucoder.stencilKernel | coder.gpu.constantMemory | gpucoder.sort