Latency Considerations with Native Floating Point

HDL Coder™ native floating-point technology can generate HDL code from your floating-point design. Native floating-point operators have a latency. When you generate HDL code, the code generator determines this latency and adds matching delays to balance parallel paths.

View Latency of a Floating-Point Operator

Open the hdlcoder_nfp_delay_allocation Simulink® model. The model uses single data types and computes the square root. The model has a parallel path to illustrate how the code generator balances delays.

load_system('hdlcoder_nfp_delay_allocation')
open_system('hdlcoder_nfp_delay_allocation/DUT')

To generate HDL code:

Right-click the subsystem. To add the HDL Coder app options to the context menu, point to Select Apps and click HDL Coder. Then, in the HDL Coder app section, select Generate HDL for Subsystem.
To see the generated model after HDL code generation, at the command line, enter gm_hdlcoder_nfp_delay_allocation.
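The same steps can be run from the command line. This is a minimal sketch, assuming HDL Coder is installed and the example model is on the MATLAB path; it uses the makehdl function to generate code for the DUT subsystem and then opens the generated model by name, as described in the step above.

```matlab
% Load the example model without opening the editor window.
load_system('hdlcoder_nfp_delay_allocation')

% Generate HDL code for the DUT subsystem.
makehdl('hdlcoder_nfp_delay_allocation/DUT')

% Open the generated model to inspect the operator latencies.
gm_hdlcoder_nfp_delay_allocation
```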

The NFP Sqrt block is the floating-point operator corresponding to the Sqrt block in your model, and has a latency of 28. The code generator determines this latency and adds a matching delay of length 28 in the parallel path. To see the latency of the square root operation, double-click the NFP Sqrt block. The Delay length of the Sqrt_pd1 block corresponds to the operator latency.

You can customize the latency of your design. Use custom latency settings to design for trade-offs between latency and throughput. You can then optimize your design implementation on the target FPGA device for area and speed. Customize the latency by using:

Latency Strategy setting: Specify whether to map your entire Simulink model or individual blocks in your model to maximum, minimum, or zero latency of the floating-point operator.
Custom Latency: You can specify a custom latency for certain blocks that you use in your Simulink model. The custom latency setting can take values from zero to the maximum latency of the floating-point operator.
Oversampling factor: Increasing the Oversampling factor operates the design at a faster clock rate and absorbs the clock-rate pipelines with the latency of the floating-point operator.
Delay blocks in the model: If your Simulink model has a latency, HDL Coder can absorb some or all of the latency with the native floating-point implementation.

Latency Strategy Setting for Model

You can specify the latency strategy setting for an entire model or for individual blocks in your model.

To specify this setting for a model:

In the hdlcoder_nfp_delay_allocation model, right-click the DUT Subsystem. In the HDL Coder app section, select HDL Block Properties.
On the HDL Code Generation > Floating Point pane, select Use Floating Point.
For Latency Strategy, select MAX, MIN, or ZERO.

To specify this setting from the command line:

1. Create a hdlcoder.FloatingPointTargetConfig object for native floating point by using the hdlcoder.createFloatingPointTargetConfig function.

nfpconfig = hdlcoder.createFloatingPointTargetConfig("NativeFloatingPoint");
hdlset_param('hdlcoder_nfp_delay_allocation', 'FloatingPointTargetConfiguration', nfpconfig);

2. Specify the latency strategy by using the LatencyStrategy property in the LibrarySettings of the nfpconfig object.

nfpconfig.LibrarySettings.LatencyStrategy = 'MAX'

nfpconfig = 

  FloatingPointTargetConfig with properties:

                  Library: 'NATIVEFLOATINGPOINT'
          LibrarySettings: [1×1 fpconfig.NFPLatencyDrivenMode]
                 IPConfig: [1×1 hdlcoder.FloatingPointTargetConfig.IPConfig]
            VendorLibrary: []
    VendorLibrarySettings: []
           VendorIPConfig: []

To see the latency information, generate HDL code and then open the generated model. To open the generated model, enter the command gm_hdlcoder_nfp_delay_allocation.

Custom Latency Strategy for Blocks

For blocks in your Simulink model, you can selectively customize the latency strategy. By default, the blocks inherit the latency strategy setting you specify for the model. For certain blocks, you can specify a custom latency value that is between zero and the maximum latency of the floating-point operator.

By specifying a custom latency, you can customize your design for trade-offs between:

Clock frequency and power consumption: A higher latency value increases the maximum clock frequency (Fmax) that you can achieve, which increases the dynamic power consumption.
Oversampling factor and sampling frequency: A combination of higher latency value and higher oversampling factor increases the Fmax that you can achieve but reduces the sampling frequency.

To learn more about this setting and how to specify the latency strategy for a block, see LatencyStrategy.

For example, if you have an Add block in the parallel path in your model, you can specify a custom latency value of 2 for the Add block by entering these commands.

load_system('hdlcoder_nfp_delay_allocation_custom')
open_system('hdlcoder_nfp_delay_allocation_custom')
hdlset_param('hdlcoder_nfp_delay_allocation_custom/DUT/Add','LatencyStrategy','Custom')
hdlset_param('hdlcoder_nfp_delay_allocation_custom/DUT/Add','NFPCustomLatency',2)

To see the latency information, generate HDL code and then open the generated model. To open the generated model, enter the command gm_hdlcoder_nfp_delay_allocation_custom. In the generated model, you see that the NFP Add block has a latency of 2.
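To confirm the block-level settings before you generate code, you can read them back. This is a minimal sketch, assuming the hdlset_param commands above have already run on the loaded model; hdlget_param returns the current value of an HDL block property.

```matlab
% Path to the Add block that was customized above.
blk = 'hdlcoder_nfp_delay_allocation_custom/DUT/Add';

% Read back the latency strategy and the custom latency value.
strategy = hdlget_param(blk, 'LatencyStrategy')
customLatency = hdlget_param(blk, 'NFPCustomLatency')
```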

Custom Latency Settings for Native Floating-Point IPs

For a model that has a large number of floating-point operators, you can customize the latency for native floating-point IPs by setting the global custom latency for NFP operators. The customization applies to all operators in the model.

For example, if you have a model that has multiple Add and Product blocks, by default, the blocks inherit the latency strategy setting specified for the model. By using these commands, you can set the latency of all the NFP Add blocks to 4 and of all the NFP Mul blocks to 3.

load_system('hdlcoder_nfp_delay_allocation_global_custom')
open_system('hdlcoder_nfp_delay_allocation_global_custom/Sample_DUT');
hdlset_param('hdlcoder_nfp_delay_allocation_global_custom', 'FloatingPointTargetConfiguration', ...
hdlcoder.createFloatingPointTargetConfig('NativeFloatingPoint', 'IPConfig', ...
{{ 'ADDSUB',  'SINGLE', 'CustomLatency', 4} ...
, { 'ADDSUB',  'DOUBLE', 'CustomLatency', 4} ...
, { 'MUL',  'SINGLE', 'CustomLatency', 3} ...
, { 'MUL',  'DOUBLE', 'CustomLatency', 3}}))

To see the latency information, generate HDL code and then open the generated model. To open the generated model, enter the command gm_hdlcoder_nfp_delay_allocation_global_custom. In the generated model, you see that all the NFP Add blocks have a latency of 4 and all the NFP Mul blocks have a latency of 3.

For the list of keywords that identify native floating-point IPs in this command, refer to the table in Latency Values of Floating-Point Operators.

Oversampling Factor

When you design the blocks in your Simulink model at the data rate, specify an Oversampling factor greater than one. The Oversampling factor inserts pipeline registers at a faster clock rate, which improves clock frequency and reduces area usage. To learn more about clock-rate pipelining, see Clock-Rate Pipelining.

To see the effect of Oversampling factor on the model, in the hdlcoder_nfp_delay_allocation model:

Add a Delay block with a Delay length of 1 at the output of the Sqrt block.
Right-click the DUT and select HDL Code > HDL Coder Properties.
On the HDL Code Generation > Global Settings pane, enter a value of 40 for Oversampling factor.

After HDL code generation, the generated model shows the NFP Sqrt block operating at a clock rate that is 40 times faster than the Sqrt block in your model. The NFP Sqrt block absorbed the Delay block in your Simulink model. The Delay block now operates at the clock rate. This implementation saves area by absorbing the additional latency, and improves timing by operating at the faster clock rate.
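The Oversampling factor can also be set from the command line. This is a minimal sketch, assuming the model is loaded, the Delay block has already been added by hand, and the model-level HDL parameter is named Oversampling; the value 40 matches the clock-rate ratio described above.

```matlab
% Set the oversampling factor for the whole model.
hdlset_param('hdlcoder_nfp_delay_allocation', 'Oversampling', 40)

% Read the value back to confirm the setting.
hdlget_param('hdlcoder_nfp_delay_allocation', 'Oversampling')

% Regenerate HDL code for the DUT subsystem with the new setting.
makehdl('hdlcoder_nfp_delay_allocation/DUT')
```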

Delay Absorption in the Model

If your Simulink model has a Delay block with sufficient Delay length adjacent to an operator, or separated from the operator only by a component that does not produce a non-zero output from a zero input, HDL Coder absorbs the delays as part of the operator latency. A NOT Logical Operator block is an example of a component that does produce a non-zero output from a zero input.

If the Delay length is equal to the latency of the floating-point operator, HDL Coder absorbs the delays and does not introduce any additional latency.

In the hdlcoder_nfp_delay_allocation model:

Double-click the Delay block at the output of the Sqrt block and change the Delay length to 28.
Generate HDL code for the DUT Subsystem.
After HDL code generation, at the command line, enter gm_hdlcoder_nfp_delay_allocation to open the generated model.

In the generated model, you see that the NFP Sqrt block absorbs the Delay block adjacent to the Sqrt block in your original model. This delay absorption occurs because the operator latency is equal to the Delay length. The code generator therefore avoids the additional latency in your model.

If the Delay length is less than the operator latency, HDL Coder absorbs the available delays and balances parallel paths by adding matching delays.

In the hdlcoder_nfp_delay_allocation model:

Double-click the Delay block at the output of the Sqrt block and change the Delay length to 21.
Generate HDL code for the DUT Subsystem.
After HDL code generation, at the command line, enter gm_hdlcoder_nfp_delay_allocation to open the generated model.

You see that the NFP Sqrt block absorbed a Delay of length 21 and added a matching delay of length 7 in the parallel path because the square root operation requires 28 delays.

If the Delay length is greater than the operator latency, the code generator absorbs a number of delays equal to the latency, and the excess delays appear outside the operator.

In the hdlcoder_nfp_delay_allocation model:

Double-click the Delay block at the output of the Sqrt block and change the Delay length to 34.
Generate HDL code for the DUT Subsystem.
After HDL code generation, at the command line, enter gm_hdlcoder_nfp_delay_allocation to open the generated model.

The NFP Sqrt block absorbed 28 delays because the square root operation has a latency of 28. The excess latency of 6 is outside the operator.
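The three cases above follow the same arithmetic. The sketch below only restates that arithmetic in plain MATLAB for the Delay lengths used in this example; it is not the code generator's algorithm, and the variable names are illustrative.

```matlab
% Operator latency of the NFP Sqrt block in this example.
latency = 28;

% Delay lengths tried in the three cases above.
delayLength = [28 21 34];

% Delays absorbed into the operator: at most the operator latency.
absorbed = min(delayLength, latency);

% Matching delay added in the parallel path when the Delay block is too short.
matching = max(latency - delayLength, 0);

% Delays left outside the operator when the Delay block is too long.
excess = max(delayLength - latency, 0);

% Show one row per case.
disp(table(delayLength', absorbed', matching', excess', ...
    'VariableNames', {'DelayLength', 'Absorbed', 'Matching', 'Excess'}))
```

With no Delay block at all (a Delay length of 0), the same expressions give 0 absorbed delays and a matching delay of 28, which is the default case shown at the start of this example.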
