Workflow for Generating a Multithreaded MEX File Using dspunfold
1. Run the entry-point MATLAB® function with the inputs that you want to test. Make sure that the function has no runtime errors. Call codegen on the function and make sure that it generates a MEX file successfully.
2. Generate the multithreaded MEX file using dspunfold. Specify a state length using the -s option. The state length must be at least the state length of the algorithm in the MATLAB function. By default, -s is set to 0, indicating that the algorithm is stateless.
3. Run the generated analyzer function. Use the Pass flag to verify that the output results of the multithreaded MEX file and the single-threaded MEX file match. Also, check whether the speedup and latency displayed by the analyzer function are satisfactory.
4. If the output results do not match, increase the state length and generate the multithreaded MEX file again. Alternatively, use automatic state length detection (specified using -s auto) to determine the minimum state length that matches the outputs.
5. If the output results match but the speedup and latency are not satisfactory, increase the repetition factor using -r or increase the number of threads using -t. In addition, you can adjust the state length. Adjust the dspunfold options and generate new multithreaded MEX files until you are satisfied with the results.
For best practices for generating the multithreaded MEX file using dspunfold, see the 'Tips' section of the dspunfold function reference page.
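Steps 2 through 4 of the workflow can be sketched as a loop that tries increasing state lengths until the analyzer reports a match. This is only an illustration under stated assumptions: the entry-point name myAlgorithm, its single 1000-sample frame input, and the candidate state lengths are all hypothetical. The sketch also relies on MATLAB command/function duality, passing each dspunfold option as a character vector exactly as the command form would, and on the analyzer returning a structure with Latency, Speedup, and Pass fields.

```matlab
% Hypothetical entry point and input specification; replace with your own.
fname     = 'myAlgorithm';       % assumption: myAlgorithm.m is on the path
argsSpec  = '{randn(1000,1)}';   % example inputs, written as for the -args option
frameLen  = 1000;                % frame size implied by argsSpec
stateLens = [0 250 500 1000];    % hypothetical candidate state lengths, in samples

passed = false;
for s = stateLens
    % Function form of: dspunfold myAlgorithm -args {randn(1000,1)} -s <s> -f true
    dspunfold(fname, '-args', argsSpec, '-s', num2str(s), '-f', 'true');

    % The analyzer is generated as <fname>_analyzer. Give it several frames.
    result = feval([fname '_analyzer'], randn(4*frameLen,1));

    if result.Pass
        fprintf('State length %d passes: speedup %.2fx, latency %d frames\n', ...
            s, result.Speedup, result.Latency);
        passed = true;
        break
    end
end

if ~passed
    disp('No candidate state length passed; try larger values or -s auto.');
end
```

A manual sweep like this is mainly useful when you already have a rough idea of the state length. When you do not, the -s auto option performs the search for you.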
Workflow Example
Run the Entry Point MATLAB Function
Create the entry-point MATLAB function.
function [y,mse] = AdaptiveFilter(x,noise)

persistent rlsf1 ffilt noise_var
if isempty(rlsf1)
    rlsf1 = dsp.RLSFilter(32, 'ForgettingFactor', 0.98);
    ffilt = dsp.FIRFilter('Numerator',fir1(32, .25)); % Unknown System
    noise_var = 1e-4;
end

d = ffilt(x) + noise_var * noise; % desired signal
[y,e] = rlsf1(x, d);
mse = 10*log10(sum(e.^2));
end
The function models an RLS filter that filters the input signal x, using d as the desired signal. The function returns the filtered output in y and the energy of the filter error e, in decibels, in mse.
Run AdaptiveFilter with the inputs that you want to test. Verify that the function runs without errors.
AdaptiveFilter(randn(1000,1), randn(1000,1));
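Because the filter objects are persistent, each call to AdaptiveFilter depends on the frames processed before it. That dependency is the state that dspunfold must account for. The following sketch, which assumes 20 frames of 1000 samples (both values chosen only for illustration), calls the function frame by frame and collects the error measure so that you can watch the adaptive filter converge.

```matlab
clear AdaptiveFilter             % reset the persistent filter objects

numFrames = 20;                  % illustrative number of frames
frameLen  = 1000;                % same frame size as in the example above
mse = zeros(numFrames,1);

for k = 1:numFrames
    % Each call continues from the filter state left by the previous call.
    [~, mse(k)] = AdaptiveFilter(randn(frameLen,1), randn(frameLen,1));
end

disp(mse.')                      % error energy per frame, in dB
```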
Call codegen on AdaptiveFilter and generate a MEX file.
codegen AdaptiveFilter -args {randn(1000,1), randn(1000,1)}
Generate a Multithreaded MEX File Using dspunfold
Set the state length to 32 samples and leave the repetition factor at its default value of 1. Provide a state length that is greater than or equal to the state length of the algorithm in the MATLAB function. When at least one entry of frameinputs (the -f option) is set to true, the state length is specified in samples.
dspunfold AdaptiveFilter -args {randn(1000,1), randn(1000,1)} -s 32 -f true
Analyzing input MATLAB function AdaptiveFilter
Creating single-threaded MEX file AdaptiveFilter_st.mexw64
Creating multi-threaded MEX file AdaptiveFilter_mt.mexw64
Creating analyzer file AdaptiveFilter_analyzer
Run the Generated Analyzer Function
The analyzer considers the actual values of the input. To increase the analyzer effectiveness, provide at least two different frames along the first dimension of the inputs.
AdaptiveFilter_analyzer(randn(1000*4,1),randn(1000*4,1))
Analyzing multi-threaded MEX file AdaptiveFilter_mt.mexw64 ...
Latency = 8 frames
Speedup = 3.5x
Warning: The output results of the multi-threaded MEX file AdaptiveFilter_mt.mexw64 do not match the output results of the single-threaded MEX file AdaptiveFilter_st.mexw64. Check that you provided the correct state length value to the dspunfold function when you generated the multi-threaded MEX file AdaptiveFilter_mt.mexw64. For best practices and possible solutions to this problem, see the 'Tips' section in the dspunfold function reference page.
> In coder.internal.warning (line 8)
  In AdaptiveFilter_analyzer

ans =

    Latency: 8
    Speedup: 3.4686
       Pass: 0
Increase the State Length
The analyzer did not pass the verification. The warning message indicates that an incorrect state length value was provided to the dspunfold function. Increase the state length to 1000 samples and repeat the process from the previous section.
dspunfold AdaptiveFilter -args {randn(1000,1),randn(1000,1)} -s 1000 -f true
Analyzing input MATLAB function AdaptiveFilter
Creating single-threaded MEX file AdaptiveFilter_st.mexw64
Creating multi-threaded MEX file AdaptiveFilter_mt.mexw64
Creating analyzer file AdaptiveFilter_analyzer
Run the generated analyzer.
AdaptiveFilter_analyzer(randn(1000*4,1),randn(1000*4,1))
Analyzing multi-threaded MEX file AdaptiveFilter_mt.mexw64 ...
Latency = 8 frames
Speedup = 1.8x

ans =

    Latency: 8
    Speedup: 1.7778
       Pass: 1
The analyzer passed the verification. Because the analyzer considers the actual input values, run it again with different input data and make sure that it still passes.
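One way to exercise the analyzer with different numerics is to call it several times with inputs of varying amplitude and confirm that every run passes. The sketch below assumes five trials and a hypothetical set of amplitude scales. It relies only on the Pass field that the analyzer output shows above.

```matlab
numTrials = 5;                       % illustrative number of trials
frameLen  = 1000;                    % frame size used when generating the MEX files
numFrames = 4;                       % at least two frames along the first dimension
allPass   = true;

for k = 1:numTrials
    scale = 10^(k-3);                % hypothetical amplitudes: 0.01, 0.1, 1, 10, 100
    x     = scale * randn(frameLen*numFrames,1);
    noise = randn(frameLen*numFrames,1);

    result  = AdaptiveFilter_analyzer(x, noise);
    allPass = allPass && result.Pass;
end

if allPass
    disp('The analyzer passed for all trials.');
else
    disp('At least one trial failed; consider increasing the state length.');
end
```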
Improve Speedup and Adjust Latency
If you want to increase speedup and your system can afford a larger latency, increase the repetition factor to 2.
dspunfold AdaptiveFilter -args {randn(1000,1),randn(1000,1)} -s 1000 -r 2 -f true
Analyzing input MATLAB function AdaptiveFilter
Creating single-threaded MEX file AdaptiveFilter_st.mexw64
Creating multi-threaded MEX file AdaptiveFilter_mt.mexw64
Creating analyzer file AdaptiveFilter_analyzer
Run the analyzer.
AdaptiveFilter_analyzer(randn(1000*4,1), randn(1000*4,1))
Analyzing multi-threaded MEX file AdaptiveFilter_mt.mexw64 ...
Latency = 16 frames
Speedup = 2.4x

ans =

    Latency: 16
    Speedup: 2.3674
       Pass: 1
Repeat the process until you achieve satisfactory speedup and latency.
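The trade-off between speedup and latency can be tabulated by regenerating the MEX file for several repetition factors and recording what the analyzer reports. This is a sketch under assumptions: the repetition factors 1 to 3 are chosen only for illustration, and the dspunfold options are passed as character vectors in function form, which mirrors the command form used above.

```matlab
reps    = [1 2 3];                   % illustrative repetition factors
results = struct('Repetition', {}, 'Latency', {}, 'Speedup', {}, 'Pass', {});

for r = reps
    % Function form of:
    % dspunfold AdaptiveFilter -args {randn(1000,1),randn(1000,1)} -s 1000 -r <r> -f true
    dspunfold('AdaptiveFilter', '-args', '{randn(1000,1),randn(1000,1)}', ...
        '-s', '1000', '-r', num2str(r), '-f', 'true');

    a = AdaptiveFilter_analyzer(randn(1000*4,1), randn(1000*4,1));

    results(end+1) = struct('Repetition', r, 'Latency', a.Latency, ...
        'Speedup', a.Speedup, 'Pass', a.Pass); %#ok<SAGROW>
end

disp(struct2table(results))          % one row per repetition factor
```

Measured speedup depends on the machine and its current load, so treat the values in a table like this as indicative and rerun the sweep if the results look inconsistent.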
Use Automatic State Length Detection
Choose a state length that is greater than or equal to the state length of your algorithm. If it is not easy to determine the state length for your algorithm analytically, use the automatic state length detection tool. Invoke automatic state length detection by setting -s to auto. The tool detects the minimum state length with which the analyzer passes the verification.
dspunfold AdaptiveFilter -args {randn(1000,1),randn(1000,1)} -s auto -f true
Analyzing input MATLAB function AdaptiveFilter
Creating single-threaded MEX file AdaptiveFilter_st.mexw64
Searching for minimal state length (this might take a while)
Checking stateless ... Insufficient
Checking 1000 ... Sufficient
Checking 500 ... Insufficient
Checking 750 ... Insufficient
Checking 875 ... Sufficient
Checking 812 ... Insufficient
Checking 843 ... Sufficient
Checking 827 ... Insufficient
Checking 835 ... Insufficient
Checking 839 ... Sufficient
Checking 837 ... Sufficient
Checking 836 ... Sufficient
Minimal state length is 836
Creating multi-threaded MEX file AdaptiveFilter_mt.mexw64
Creating analyzer file AdaptiveFilter_analyzer
The minimal state length is 836 samples.
Run the generated analyzer.
AdaptiveFilter_analyzer(randn(1000*4,1), randn(1000*4,1))
Analyzing multi-threaded MEX file AdaptiveFilter_mt.mexw64 ...
Latency = 8 frames
Speedup = 1.9x

ans =

    Latency: 8
    Speedup: 1.9137
       Pass: 1
The analyzer passed the verification.