Accelerate Simulation Using GPUs
A GPU-based System object™ looks and behaves much like the non-GPU-based System objects in the Communications Toolbox™ product. The important difference is that the algorithm is executed on a graphics processing unit (GPU) rather than on a CPU. Using the GPU can accelerate your simulation.
GPUs excel at processing large quantities of data and performing computations with high compute intensity. Processing large quantities of data is one way to maximize the throughput of your GPU in a simulation. The amount of data that the GPU processes at any one time depends on the size of the data passed to the input of a GPU-based System object. Therefore, one way to maximize this data size is by processing multiple frames of data.
You can use a single GPU-based System object to process multiple data frames simultaneously, in parallel. This implementation differs from the way standard System objects are implemented. For GPU-based System objects, the number of frames that the object processes in a single call is either implied by one of the object properties or stated explicitly by using the NumFrames property of the object.
Passing MATLAB® arrays to a GPU-based System object requires transferring the input data from the CPU to the GPU. The GPU-based System object then performs the calculations and transfers the output data back to the CPU. This process introduces latency. When you pass data in the form of a gpuArray (Parallel Computing Toolbox) object to a GPU-based System object, the object does not incur the latency from data transfer. Therefore, a GPU-based System object runs faster when you supply a gpuArray object as the input.
In general, you should try to minimize the amount of data transfer between the CPU and the GPU in your simulation. For more information, see Establish Arrays on a GPU (Parallel Computing Toolbox).
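As a minimal sketch of keeping data on the GPU between calls, the following code chains a GPU-based PSK modulator and demodulator, with one transfer to the GPU at the start and one transfer back at the end. The variable names and the 400-bit frame size are illustrative assumptions. The code runs only when Communications Toolbox, Parallel Computing Toolbox, and a supported GPU are available, which the canUseGPU check guards.

```matlab
% Sketch only: variable names and frame size are illustrative assumptions.
if canUseGPU
    M = 16;                                   % modulation order
    gpuMod   = comm.gpu.PSKModulator(M,pi/16,BitInput=true);
    gpuDemod = comm.gpu.PSKDemodulator(M,pi/16,BitOutput=true);

    bits    = randi([0 1],400,1);             % data created on the CPU
    bitsGPU = gpuArray(bits);                 % single CPU-to-GPU transfer

    symGPU = gpuMod(bitsGPU);                 % gpuArray input stays on the GPU
    rxGPU  = gpuDemod(symGPU);                % no intermediate CPU transfer

    rxBits = gather(rxGPU);                   % single GPU-to-CPU transfer
    disp(isequal(rxBits,bits))                % compare on the CPU
else
    disp('No supported GPU detected; skipping the GPU sketch.')
end
```

Passing bits directly instead of bitsGPU would also work, but each object call would then move its input to the GPU and its output back to the CPU.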
GPU-Based System Object Construction
System objects for the Communications Toolbox product are located in the comm namespace and are constructed as:
H = comm.<object name>
For example, you construct a Viterbi decoder System object as:
H = comm.ViterbiDecoder
Where a corresponding GPU-based implementation of a System object exists, it is located in the comm.gpu namespace and constructed as:
H = comm.gpu.<object name>
For example, you construct a GPU-based Viterbi decoder System object as:
H = comm.gpu.ViterbiDecoder
For a list of available GPU-based implementations, see GPU Arrays Support List for System Objects.
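Because the CPU-based and GPU-based objects share a name and differ only in namespace, one possible pattern is to choose the implementation at run time. The following sketch assumes that the canUseGPU function (MATLAB) is an acceptable availability check for your setup; the variable name decoder is illustrative.

```matlab
% Sketch only: select the GPU-based decoder when a supported GPU is present,
% and fall back to the CPU-based decoder otherwise.
if canUseGPU
    decoder = comm.gpu.ViterbiDecoder;
else
    decoder = comm.ViterbiDecoder;
end
disp(class(decoder))
```

The two classes do not necessarily expose identical properties. For example, NumFrames is described here only for the GPU-based Viterbi decoder, so code that sets it must stay inside the GPU branch.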
Process Multiple Data Frames Using GPU-Based System Objects
You can use a single GPU-based System object™ to process multiple data frames simultaneously. Some GPU-based System objects, such as the LDPC decoder, can infer the number of frames from the object properties. Other GPU-based System objects, such as the Viterbi decoder, include a NumFrames property to define the number of frames present in the input data.
First, simultaneously process two data frames using a GPU-based LDPC decoder System object™. The ParityCheckMatrix property determines the frame size. The frame size and the input data vector length determine the number of frames processed by the LDPC decoder object.
numframes = 2;
ldpcEnc = comm.LDPCEncoder;
ldpcGPUDec = comm.gpu.LDPCDecoder;
ldpcDec = comm.LDPCDecoder;
msg = randi([0 1],32400,2);
for ii=1:numframes
    encout(:,ii) = ldpcEnc(msg(:,ii));
end
% Single ended to bipolar (for LLRs)
encout = 1-2*encout;
% Decode on the CPU
for ii=1:numframes
    cout(:,ii) = ldpcDec(encout(:,ii));
end
% Multiframe decode on the GPU
gout = ldpcGPUDec(encout(:));
% Check equality
isequal(gout,cout(:))
ans = logical
1
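To make the frame-count inference concrete, the following sketch works through the arithmetic for the example above. It assumes that the default ParityCheckMatrix of the LDPC objects is the DVB-S.2 rate-1/2 matrix returned by dvbs2ldpc(1/2); check the property on your own object if that assumption does not hold. The variable names are illustrative.

```matlab
% Sketch only: assumes the default parity-check matrix is dvbs2ldpc(1/2).
H = dvbs2ldpc(1/2);                  % sparse parity-check matrix
[numParity,n] = size(H);             % n is the codeword (frame) length
k = n - numParity;                   % information bits per frame

llr = zeros(2*n,1);                  % placeholder for two stacked codewords
numFramesImplied = numel(llr)/n;     % input length divided by frame size

fprintf('n = %d, k = %d, frames = %d\n',n,k,numFramesImplied)
```

With this matrix, each 32400-bit message encodes to a 64800-element codeword, so the 129600-element vector encout(:) corresponds to two frames.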
Next, process multiple data frames using the NumFrames property of the GPU-based Viterbi decoder System object. For a Viterbi decoder, the frame size of your system cannot be inferred from an object property. Instead, you must define the number of frames present in the input data by using the NumFrames property of the Viterbi decoder object.
numframes = 10;
convEncoder = comm.ConvolutionalEncoder( ...
    TerminationMethod="Terminated");
vitDecoder = comm.ViterbiDecoder( ...
    TerminationMethod="Terminated");
Create a GPU-based Viterbi decoder System object using the NumFrames property.
vitGPUDecoder = comm.gpu.ViterbiDecoder( ...
    TerminationMethod="Terminated", ...
    NumFrames=numframes);
msg = randi([0 1],200,numframes);
for ii=1:numframes
    convEncOut(:,ii) = 1-2*convEncoder(msg(:,ii));
end
% Decode on the CPU
for ii=1:numframes
    cVitOut(:,ii) = vitDecoder(convEncOut(:,ii));
end
% Decode on the GPU
gVitOut = vitGPUDecoder(convEncOut(:));
% Check equality
isequal(gVitOut,cVitOut(:))
ans = logical
1
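Both examples above pass frames to the GPU-based decoder as one stacked column vector, built with the (:) operator, and compare against the stacked CPU output. The following sketch, which uses only core MATLAB functions and illustrative variable names, shows how that stacking orders the frames and how reshape can recover one column per frame from a stacked result.

```matlab
% Sketch only: frame stacking and unstacking with core MATLAB functions.
numframes = 10;
frameLength = 200;
frames = randi([0 1],frameLength,numframes);   % one frame per column

stacked = frames(:);                           % frame 1, then frame 2, ...
secondFrame = stacked(frameLength+1:2*frameLength);

restored = reshape(stacked,[],numframes);      % back to one frame per column

disp(isequal(secondFrame,frames(:,2)))
disp(isequal(restored,frames))
```

Because MATLAB stores arrays in column-major order, the (:) operator concatenates the columns in order, so the length of the stacked vector is the frame length multiplied by the number of frames.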
Pass Data to GPU-Based System Objects Using gpuArray Input
In this example, you transmit rate-1/2 convolutionally encoded, 16-PSK-modulated data through an AWGN channel, demodulate and decode the received data, and assess the error rate of the received data. For this implementation, you use the GPU-based Viterbi decoder System object™ to process multiple signal frames in a single call and then use gpuArray (Parallel Computing Toolbox) objects to pass data into and out of the GPU-based System objects.
Create GPU-based System objects for PSK modulation and demodulation, convolutional encoding, Viterbi decoding, and AWGN. Create a System object for error rate calculation.
M = 16; % Modulation order
numframes = 100;
gpuconvenc = comm.gpu.ConvolutionalEncoder;
gpupskmod = comm.gpu.PSKModulator(M,pi/16,BitInput=true);
gpupskdemod = comm.gpu.PSKDemodulator(M,pi/16,BitOutput=true);
gpuawgn = comm.gpu.AWGNChannel( ...
    NoiseMethod='Signal to noise ratio (SNR)',SNR=30);
gpuvitdec = comm.gpu.ViterbiDecoder( ...
    InputFormat='Hard', ...
    TerminationMethod='Truncated', ...
    NumFrames=numframes);
errorrate = comm.ErrorRate(ComputationDelay=0,ReceiveDelay=0);
Due to the computational complexity of the Viterbi decoding algorithm, loading multiple frames of signal data on the GPU and processing them in one call can reduce overall simulation time. To enable this implementation, the GPU-based Viterbi decoder System object contains a NumFrames property. Instead of using an external for-loop to process individual frames of data, you use the NumFrames property to configure the GPU-based Viterbi decoder System object to process multiple data frames. Generate numframes frames of binary data. To efficiently manage the data frames for processing by the GPU-based System objects, represent the transmission data frames as a gpuArray object.
numsymbols = 50;
rate = 1/2;
dataA = gpuArray.randi([0 1],rate*numsymbols*log2(M),numframes);
The error rate object does not support gpuArray objects or multichannel data, so you must retrieve the array from the GPU by using the gather (Parallel Computing Toolbox) function to compute the error rate on each frame of data in a for-loop. Perform the GPU-based encoding, modulation, AWGN, and demodulation inside a for-loop.
for ii = 1:numframes
    encodedData = gpuconvenc(dataA(:,ii));
    modsig = gpupskmod(encodedData);
    noisysig = gpuawgn(modsig);
    demodsig(:,ii) = gpupskdemod(noisysig);
end
The GPU-based Viterbi decoder performs multiframe processing without a for-loop.
rxbits = gpuvitdec(demodsig(:));
errorStats = errorrate(gather(dataA(:)),gather(rxbits));
fprintf('BER = %f\nNumber of errors = %d\nTotal bits = %d', ...
    errorStats(1), errorStats(2), errorStats(3))
BER = 0.009800
Number of errors = 98
Total bits = 10000
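The listing above computes the error statistics in one call on the stacked, gathered vectors. As an alternative sketch, the following code accumulates the statistics one frame at a time, relying on the fact that a comm.ErrorRate object keeps a running total across calls. To stay self-contained it uses ordinary MATLAB arrays in place of gathered GPU results, and it injects one bit error per frame purely for illustration; the variable names are assumptions.

```matlab
% Sketch only: per-frame error counting on CPU data that stands in for
% arrays already retrieved from the GPU with gather.
numframes = 5;
tx = randi([0 1],100,numframes);     % transmitted frames, one per column
rx = tx;
rx(1,:) = 1 - rx(1,:);               % flip the first bit of every frame

errorrate = comm.ErrorRate;
for ii = 1:numframes
    % Each call takes one column vector pair and updates the running totals
    errorStats = errorrate(tx(:,ii),rx(:,ii));
end

fprintf('BER = %f\nNumber of errors = %d\nTotal bits = %d\n', ...
    errorStats(1),errorStats(2),errorStats(3))
```

With one flipped bit in each of the five 100-bit frames, the accumulated totals are 5 errors in 500 bits.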
MATLAB System Block Support for GPU-Based System Objects
If you are using MATLAB System (Simulink) blocks in your implementation, you can include these GPU-based System objects in them.
comm.gpu.AWGNChannel
comm.gpu.BlockDeinterleaver
comm.gpu.BlockInterleaver
comm.gpu.ConvolutionalDeinterleaver
comm.gpu.ConvolutionalEncoder
comm.gpu.ConvolutionalInterleaver
comm.gpu.PSKDemodulator
comm.gpu.PSKModulator
comm.gpu.TurboDecoder
comm.gpu.ViterbiDecoder
The GPU-based System objects must be simulated using Interpreted execution. You must select this option explicitly on the block mask; the default value is Code generation.
Related Topics
- Code Generation and Acceleration Support
- GPU Computing Requirements (Parallel Computing Toolbox)