Optimize Loops in Generated Code
Transform loops in the generated code to suit your execution speed and memory requirements. Loop control objects instruct the code generator to optimize loops in the generated code.
You can append these transforms to a loop control object to optimize the generated loops:
interchange
: Interchanging nested loops can improve cache performance when accessing array elements. Usually, accessing an array element involves loading an entire block of data from memory into cache. Interchanging loops can help improve execution speed because the subsequent array elements, which are already in cache, are readily available to the processor.
parallelize
: Parallelized loop execution might improve execution speed by utilizing available threads. Each available thread is assigned to sequentially access a data structure and operate on its indices one by one. Use this optimization when your loop sequentially accesses array elements and the operations on each element are independent of the other array elements.
reverse
: Reverses the loop iteration order. Use this transform when you know the upper bound of the loop iterator.
tile
: Tiling loop nests can reduce memory access latency. Tiling partitions the iteration space of a loop into smaller blocks, which helps data remain in cache until it is reused. This involves partitioning a large array from memory into smaller blocks that fit into your cache. Use this transform when you have limited cache availability.
unrollAndJam
: Unrolling and jamming loops can improve cache locality. Unroll and jam transforms are usually applied to perfectly nested loops, which are loops where all the data elements are accessed within the inner loop. This transform unrolls the body of the inner loop according to the loop index of the outer loop.
vectorize
: Generate code for loops that use SIMD instructions to apply multiple operations simultaneously.
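To make the tile and unrollAndJam descriptions above more concrete, here is a minimal hand-written C sketch of what such transforms typically look like. It is not output of the code generator. The function names, the row-major layout, and the unroll factor of 2 are assumptions chosen for illustration, and the unroll-and-jam variant follows the common textbook form (unroll the outer loop, then fuse the copies of the inner loop).

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: transpose an n-by-n row-major matrix, b = a'. */
void transpose_plain(const double *a, double *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            b[j * n + i] = a[i * n + j];
        }
    }
}

/* Tiled: the same iterations, visited one tile-by-tile block at a time,
 * so the parts of a and b touched by a block are more likely to stay in
 * cache until they are reused. The tile size is a hypothetical tuning knob. */
void transpose_tiled(const double *a, double *b, size_t n, size_t tile)
{
    assert(tile > 0);
    for (size_t ii = 0; ii < n; ii += tile) {
        size_t i_end = (ii + tile < n) ? ii + tile : n; /* clip edge tile */
        for (size_t jj = 0; jj < n; jj += tile) {
            size_t j_end = (jj + tile < n) ? jj + tile : n;
            for (size_t i = ii; i < i_end; i++) {
                for (size_t j = jj; j < j_end; j++) {
                    b[j * n + i] = a[i * n + j];
                }
            }
        }
    }
}

/* Baseline: matrix-vector product y = a * x for an n-by-n row-major a. */
void matvec_plain(const double *a, const double *x, double *y, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        double s = 0.0;
        for (size_t j = 0; j < n; j++) {
            s += a[i * n + j] * x[j];
        }
        y[i] = s;
    }
}

/* Unroll and jam by 2: two outer iterations share one inner loop,
 * so each x[j] that is loaded gets used for two rows. */
void matvec_unroll_jam(const double *a, const double *x, double *y, size_t n)
{
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        double s0 = 0.0;
        double s1 = 0.0;
        for (size_t j = 0; j < n; j++) {
            s0 += a[i * n + j] * x[j];
            s1 += a[(i + 1) * n + j] * x[j];
        }
        y[i] = s0;
        y[i + 1] = s1;
    }
    for (; i < n; i++) { /* remainder row when n is odd */
        double s = 0.0;
        for (size_t j = 0; j < n; j++) {
            s += a[i * n + j] * x[j];
        }
        y[i] = s;
    }
}
```

In both pairs the transformed function performs the same arithmetic as its baseline; only the order in which the iterations are visited changes, which is what affects cache behavior.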
Instruct the code generator to optimize loops in the generated code by:
Using objects of coder.loop.Control
Calling its member functions
Optimize Loops By Using Objects of coder.loop.Control
Create objects of coder.loop.Control in your MATLAB® code and append the required transformations to the object. For example, to apply the vectorize transform by using a coder.loop.Control object, follow this pattern:
function out = applyVectorize
out = zeros(1,100);
loopObj = coder.loop.Control;
loopObj = loopObj.vectorize('loopId');
loopObj.apply;
for loopId = 1:100
    out = out + loopId;
end
To generate code for this function for an Intel® target processor, use these commands:
cfg = coder.config('lib');
cfg.InstructionSetExtensions = "SSE2";
codegen -config cfg applyVectorize -launchreport
The generated code uses the SSE2 SIMD instruction set.
void applyVectorize(double out[100])
{
  int i;
  int loopId;
  memset(&out[0], 0, 100U * sizeof(double));
  for (loopId = 0; loopId < 100; loopId++) {
    for (i = 0; i <= 98; i += 2) {
      __m128d r;
      r = _mm_loadu_pd(&out[i]);
      _mm_storeu_pd(&out[i], _mm_add_pd(r, _mm_set1_pd((double)loopId + 1.0)));
    }
  }
}
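To read the generated listing more easily, it can help to compare it with a portable scalar version. The sketch below is hand-written, not generated: the function names and the LEN and ITERS macros are assumptions for illustration. The second function emulates the two double-precision lanes of a 128-bit SSE2 register with a two-element array, mirroring the stride-2 inner loop.

```c
#include <assert.h>
#include <stddef.h>

#define LEN 100   /* number of elements in out (hypothetical macro name) */
#define ITERS 100 /* number of iterations of the loopId loop */

/* Scalar reference: adds loopId + 1 to every element, one element at a time. */
void apply_scalar(double out[LEN])
{
    assert(out != NULL);
    for (int i = 0; i < LEN; i++) {
        out[i] = 0.0;
    }
    for (int loopId = 0; loopId < ITERS; loopId++) {
        for (int i = 0; i < LEN; i++) {
            out[i] += (double)loopId + 1.0;
        }
    }
}

/* Two-lane emulation: each inner step handles out[i] and out[i + 1]
 * together, the way one 128-bit register holds two doubles. */
void apply_two_lanes(double out[LEN])
{
    assert(out != NULL);
    for (int i = 0; i < LEN; i++) {
        out[i] = 0.0;
    }
    for (int loopId = 0; loopId < ITERS; loopId++) {
        /* Same value in both lanes, comparable to _mm_set1_pd. */
        double addend[2] = { (double)loopId + 1.0, (double)loopId + 1.0 };
        for (int i = 0; i <= LEN - 2; i += 2) {
            double r[2] = { out[i], out[i + 1] }; /* comparable to _mm_loadu_pd */
            r[0] += addend[0];                    /* comparable to _mm_add_pd */
            r[1] += addend[1];
            out[i] = r[0];                        /* comparable to _mm_storeu_pd */
            out[i + 1] = r[1];
        }
    }
}
```

Because LEN is even, the stride-2 loop covers every element and no scalar remainder loop is needed, which matches the `i <= 98` bound in the generated code.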
You can append multiple transforms to the same loop control object. Call the apply method before defining the loops in your code. For example, you can parallelize the outer loop and also add a vectorize transform when a variable inputVal is greater than some threshold value.
...
loopObj = coder.loop.Control;
loopObj = loopObj.parallelize('i');
if inputVal > threshold
    loopObj = loopObj.vectorize('inputVal');
end
...
loopObj.apply;
for i = 1:10
    for inputVal = 1:10
        ...
    end
end
...
Optimize Loops By Calling Member Functions Independently
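You can apply loop transformations by calling the loop optimization functions immediately before the loop itself. You can apply these functions to the loops in your code:
coder.loop.interchange
coder.loop.parallelize
coder.loop.reverse
coder.loop.tile
coder.loop.unrollAndJam
coder.loop.vectorize
Follow the pattern shown here: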
function out = applyInterchange
out = rand(10,7);
coder.loop.interchange('loopA','loopB');
for loopA = 1:10
    for loopB = 1:7
        out(loopA,loopB) = out(loopA,loopB) + loopA;
    end
end
You can also assign the object returned by the coder.loop.interchange function call to a variable. However, you must call the apply method for the returned object before defining the loop.
function out = applyInterchange
out = rand(10,7);
loopObj = coder.loop.interchange('loopA','loopB');
...
loopObj.apply;
for loopA = 1:10
    for loopB = 1:7
        out(loopA,loopB) = out(loopA,loopB) + loopA;
    end
end
Generate code for these functions by running this command:
codegen -config:lib applyInterchange -launchreport
The generated code is shown here:
void applyInterchange(double out[70])
{
  int loopA;
  int loopB;
  if (!isInitialized_applyInterchange) {
    applyInterchange_initialize();
  }
  b_rand(out);
  for (loopB = 0; loopB < 7; loopB++) {
    for (loopA = 0; loopA < 10; loopA++) {
      int out_tmp;
      out_tmp = loopA + 10 * loopB;
      out[out_tmp] += (double)loopA + 1.0;
    }
  }
}
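The index expression `loopA + 10 * loopB` in the generated code shows that the 10-by-7 array is stored column by column, so making loopA the inner loop walks through adjacent memory locations. The following hand-written C sketch compares the two loop orders; the function names and the ROWS and COLS macros are assumptions for illustration and do not appear in the generated code.

```c
#include <assert.h>

#define ROWS 10 /* range of loopA */
#define COLS 7  /* range of loopB */

/* Column-major linear index, the same mapping as out_tmp above. */
int linear_index(int row, int col)
{
    assert(row >= 0 && row < ROWS && col >= 0 && col < COLS);
    return row + ROWS * col;
}

/* Order as written in the MATLAB source: loopA outer, loopB inner.
 * Consecutive inner iterations are ROWS elements apart in memory. */
void add_row_number_original(double out[ROWS * COLS])
{
    for (int loopA = 0; loopA < ROWS; loopA++) {
        for (int loopB = 0; loopB < COLS; loopB++) {
            out[linear_index(loopA, loopB)] += (double)loopA + 1.0;
        }
    }
}

/* Interchanged order: loopB outer, loopA inner.
 * Consecutive inner iterations touch adjacent elements. */
void add_row_number_interchanged(double out[ROWS * COLS])
{
    for (int loopB = 0; loopB < COLS; loopB++) {
        for (int loopA = 0; loopA < ROWS; loopA++) {
            out[linear_index(loopA, loopB)] += (double)loopA + 1.0;
        }
    }
}
```

Each element is updated exactly once and independently of the others, so both orders produce the same result; the interchange changes only the memory access pattern.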
See Also
coder.loop.Control | coder.loop.interchange | coder.loop.parallelize | coder.loop.reverse | coder.loop.tile | coder.loop.unrollAndJam | coder.unroll | coder.loop.vectorize