Multicore Execution of Interpolated FIR Filter Using Dataflow Domain
This example shows how to speed up execution of an Interpolated FIR Filter using dataflow as the execution domain of the root model in Simulink®. Specifying the execution domain as dataflow at the root model enables multicore analysis, simulation, and code generation for the entire model.
Many real-time audio and digital signal processing applications require filtering of a signal streaming at a high sampling rate. The required computational power grows with both the input sample rate and the filter order. One way to optimize the filtering process is to break it into multiple stages, but the input is still processed at the same rate. This example demonstrates how to use multicore processing with an Interpolated FIR Filter to improve the simulation performance of the model and to generate multicore code.
Interpolated FIR Filter
An Interpolated FIR Filter provides an efficient alternative to a high-order FIR Filter by using an FIR Decimator and an FIR Interpolator to change the rate at which the input is filtered.
In this example, the input is a Gaussian-distributed pseudorandom signal. The input is first passed through an FIR Decimator to lower the sampling rate. Then, the decimated signal is filtered by a set of FIR Filters. Before the output is emitted, an FIR Interpolator converts the sampling rate of the filtered signal back to its original value. A MATLAB System block is used in the model to verify the output data in code generation. A Stop Simulation block is used in the model to specify the finite number of steps that the simulation and the generated code run. In this model, the number of steps is set to 1000, and each step processes one frame of size [7000 1].
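The decimate, filter, interpolate chain described above can be sketched in MATLAB code. This is a minimal illustration, assuming DSP System Toolbox and Signal Processing Toolbox are available. The rate-change factor, filter orders, and cutoff frequencies are placeholder values chosen for the sketch. They are not the specifications used by the example model.

```matlab
% Illustrative values only (assumptions, not taken from the example model).
M = 4;                       % rate-change factor; must divide the frame length
frameLength = 7000;          % frame size used in this example

% Anti-aliasing decimator, low-rate filter, and anti-imaging interpolator.
decimator    = dsp.FIRDecimator(M, fir1(47, 1/M));
lowRateFilt  = dsp.FIRFilter('Numerator', fir1(32, 0.5));
interpolator = dsp.FIRInterpolator(M, M*fir1(47, 1/M));

x = randn(frameLength, 1);   % one frame of Gaussian pseudorandom input

xLow = decimator(x);         % frameLength/M samples at the lower rate
yLow = lowRateFilt(xLow);    % filtering runs on M times fewer samples
y    = interpolator(yLow);   % back to the original rate and frame size
```

Because the middle stage runs at the lower rate, it processes M times fewer samples per frame than a single filter running at the input rate would.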
Specify Dataflow Execution Domain
In Simulink, to specify dataflow as the execution domain for the root model:
In the Simulink Toolstrip, on the Modeling tab, in the Design gallery, select Property Inspector.
In the Property Inspector, select Set domain specification.
Set the Domain parameter to Dataflow.
Multicore Simulation of Dataflow Domain
The dataflow domain automatically partitions your model into multiple threads for better performance. Once you set the Domain parameter to Dataflow, you can use the Multicore tab in the toolstrip to analyze your model and improve its performance. To learn more about the Multicore tab, see Perform Multicore Analysis for Dataflow.
For this example, the Multicore tab mode is set to Simulation Profiling for simulation performance analysis.
Optimize the model settings before analysis to get the best simulation performance. To accept the proposed model settings, on the Multicore tab, click Optimize. Alternatively, you can use the drop-down menu below the Optimize button to change the settings individually. In this example, the model settings have already been optimized for simulation performance.
On the Multicore tab, click the Run Analysis button to start the analysis of the dataflow domain for simulation performance. Once the analysis finishes, the Analysis Report and Suggestions window shows how many threads the dataflow system uses during simulation.
After analyzing the model, the Analysis Report and Suggestions window shows one thread because the data dependency between the blocks in the model prevents the blocks from executing concurrently. By pipelining the data-dependent blocks, the dataflow system can increase concurrency for higher data throughput. The Analysis Report and Suggestions window shows the recommended number of pipeline delays as Suggested for Increasing Concurrency. The suggested latency value is computed to give the best performance.
The following diagram shows the Analysis Report and Suggestions window where the suggested latency is 4 for the dataflow system.
Click the Accept button to use the recommended latency for the dataflow system. You can also enter this value directly in the Property Inspector using the Latency parameter. Simulink shows the Latency parameter value using tags at the output ports of the dataflow system.
The Analysis Report and Suggestions window now shows the number of threads as 5, meaning that the blocks inside the dataflow system simulate in parallel using 5 threads. To highlight the blocks based on their thread allocation, select Highlight threads. The Thread Highlighting Legend displays the colors of the allocated threads. Select Show pipeline delays to show, using tags, where pipelining delays were inserted within the dataflow system. Note that the number of threads that can be used in the dataflow domain depends on the machine configuration as well as the filter specifications defined in ifir_init.m.
Dataflow Simulation Performance
Simulate the model and measure its execution time using the sim command. The speedup is obtained by dividing the execution time of the original model by the execution time of the model using multiple threads.
Note that the simulation time improvement may vary on different hardware. On a Windows desktop computer with an Intel® Xeon® CPU W-2133 v3 @ 3.6 GHz processor (6 cores, 12 threads), this model using the dataflow domain executes 3.3 times faster than the original model.
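The speedup calculation can be sketched with tic and toc. This is a minimal illustration, not the contents of the script shipped with the example. The two function handles are stand-in workloads so that the sketch runs without any model. In practice, each handle would wrap a sim call on your own model, and the model names shown in the comments are hypothetical.

```matlab
% Stand-in workloads (assumptions). With real models, each handle would
% wrap a sim call instead, for example:
%   runBaseline = @() sim('myBaselineModel');   % hypothetical model name
%   runDataflow = @() sim('myDataflowModel');   % hypothetical model name
runBaseline = @() pause(0.2);
runDataflow = @() pause(0.05);

% Time each run once using wall-clock time.
t0 = tic; runBaseline(); tBaseline = toc(t0);
t0 = tic; runDataflow(); tDataflow = toc(t0);

% Speedup is baseline time over dataflow time; values above 1 mean the
% dataflow run is faster.
speedup = tBaseline / tDataflow;
fprintf('Speedup: %.1fx\n', speedup);
```

A single timed run can be noisy, so repeating each measurement a few times and comparing the averages gives a steadier estimate.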
To programmatically set parameters and calculate the actual speedup using the dataflow domain on your own hardware, run CalculateSpeedup.mlx.
Multicore Code Generation of Dataflow Domain
Code generation requires a Simulink Coder™ or an Embedded Coder® license. To enable multicore code generation for the model, select the Allow tasks to execute concurrently on target parameter in the Solver pane under Solver details. When you select this parameter:
Each rate in the model executes as an independent concurrent task on the target processor
The dataflow system generates additional concurrent tasks by automatically partitioning the blocks
In the generated code, you can observe the functions generated for each concurrent task created by the dataflow domain, each realized as an OpenMP section. The model generates two thread functions, MulticoreInterpolate_ThreadFcn0 and MulticoreInterpolate_ThreadFcn1.
The generated code for the main program does not use timers or time steps. Instead, the step function is called in a while loop. The code executes a finite number of steps based on the Number of steps parameter specified in the Stop Simulation block. To run the generated code indefinitely, remove the Stop Simulation block from the model. MAT-File logging is not supported in code generation of a root model with the dataflow execution domain. The MATLAB System block, which runs ToBinFile.m, is provided to verify the output data in code generation. To read the output data, open the output file with fopen and then run the following command in the MATLAB Command Window, where fid is the file identifier returned by fopen: fread(fid,'double');
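The fread command needs a valid file identifier, so the file must be opened first and closed afterward. The following sketch shows the complete open, read, and close sequence. The file name is hypothetical, and the sketch first writes a few sample values so that it runs on its own. With the generated code, you would skip the write step and point fileName at the file that the MATLAB System block produced.

```matlab
% Hypothetical file name; replace it with the file written by the
% generated code.
fileName = fullfile(tempdir, 'example_output.bin');

% Write a few doubles so that this sketch is self-contained (assumption:
% the output file stores raw double-precision values).
fid = fopen(fileName, 'w');
assert(fid ~= -1, 'Could not open %s for writing.', fileName);
fwrite(fid, [1.5; -2.25; 3], 'double');
fclose(fid);

% Open the file, read all values as doubles into a column vector, and
% close the file.
fid = fopen(fileName, 'r');
assert(fid ~= -1, 'Could not open %s for reading.', fileName);
data = fread(fid, 'double');
fclose(fid);
```

Because fread returns the values as a single column vector, reshape the result if you need to recover individual output frames.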