HDL Filter Block Properties
AdderTreePipeline
This property applies to frame-based filters. It specifies how
many pipeline registers the architecture includes between levels of
the adder tree. These pipeline stages increase filter throughput while
adding latency. The default value is 0. To improve
the speed of this architecture, the recommended setting is 2.
Pipeline stages introduce delays along the path in the model that contains the affected filter. When you enable this pipeline option, the coder automatically adds balancing delays on parallel data paths.
For more information on the frame-based filter architecture, see Frame-Based Architecture.
AddPipelineRegisters
This property applies to scalar input filters. When you enable
this property, the default linear adder of the filter is implemented
as a pipelined tree adder instead. This architecture increases filter
throughput while adding latency. The default value is off.
The following limitations apply to AddPipelineRegisters:
If you use AddPipelineRegisters, the code generator forces full precision in the HDL and the generated filter model. This option implements a pipelined adder tree structure in the HDL code for which only full precision is supported. If you generate a validation model, you must use full precision in the original model to avoid validation mismatches.
Pipeline stages introduce delays along the path in the model that contains the affected filter. When you enable this pipeline option, the coder automatically adds balancing delays on parallel data paths.
Note
When you use this property with the CIC Interpolation (DSP System Toolbox) block, delays in parallel paths are not automatically balanced. Manually add delays where required by your design.
For filter architecture diagrams that indicate where the pipeline stages are added, see HDL Filter Architectures.
ChannelSharing
You can use the ChannelSharing implementation
parameter with a multichannel filter to share a single filter
implementation among channels for a more area-efficient design. This
parameter is either 'on' or 'off'.
The default is 'off', in which case a separate filter
is implemented for each channel.
See Multichannel FIR Filter for FPGA (DSP System Toolbox).
CoeffMultipliers
The CoeffMultipliers implementation parameter
lets you specify use of canonical signed digit (CSD) or factored CSD
optimizations for processing coefficient multiplier operations in
code generated for certain filter blocks. Specify the CoeffMultipliers
parameter using one of the following options:
'csd': Use CSD techniques to replace multiplier operations with shift-and-add operations. CSD techniques minimize the number of addition operations required for constant multiplication by representing binary numbers with a minimum count of nonzero digits. This representation decreases the area used by the filter while maintaining or increasing clock speed.
'factored-csd': Use factored CSD techniques, which replace multiplier operations with shift-and-add operations on prime factors of the coefficients. This option lets you achieve a greater filter area reduction than CSD, at the cost of decreasing clock speed.
'multipliers' (default): Retain multiplier operations.
HDL Coder™ supports CoeffMultipliers for
fully parallel filter implementations. It is not supported for fully serial
and partly serial architectures.
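
To illustrate what "a minimum count of nonzero digits" means, the following is a minimal sketch that converts an integer coefficient to canonical signed digit form. It is not HDL Coder's internal algorithm; the function names `to_csd` and `nonzero_count` and the sample coefficient are assumptions made for this illustration. Each nonzero digit corresponds to one shifted copy of the input that is added or subtracted.

```python
def to_csd(n):
    """Return the canonical signed digit form of a nonnegative integer.

    Digits are listed least significant first, and each is -1, 0, or 1.
    """
    if n < 0:
        raise ValueError("this sketch handles nonnegative integers only")
    digits = []
    while n != 0:
        if n % 2 == 1:
            # Pick +1 or -1 so the remainder becomes divisible by 4.
            # The next digit is then 0, so no two nonzero digits are adjacent.
            d = 2 - (n % 4)
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits


def nonzero_count(digits):
    """Count the nonzero digits, that is, the shifted terms to add or subtract."""
    return sum(1 for d in digits if d != 0)


coeff = 23  # hypothetical integer coefficient, binary 10111
csd = to_csd(coeff)
print(csd)                       # digits, least significant first
print(nonzero_count(csd))        # nonzero digits in CSD form
print(bin(coeff).count("1"))     # nonzero digits in plain binary
```

For this coefficient, plain binary has four nonzero digits, while the CSD form 32 - 8 - 1 has three, so one fewer adder or subtractor is needed.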
DALUTPartition
The size of the LUT grows exponentially with the order of the
filter. For a filter with N coefficients, the LUT
must have 2^N values. For higher order filters,
LUT size must be reduced to reasonable levels. To reduce the size,
you can subdivide the LUT into a number of LUTs, called LUT
partitions. Each LUT partition operates on a different
set of taps. The results obtained from the partitions are summed.
For example, for a 160-tap filter, the LUT size is (2^160)*W bits,
where W is the word size of the LUT data. Dividing
this into 16 LUT partitions, each taking 10 inputs (taps), reduces the total
LUT size to 16*(2^10)*W bits.
Although LUT partitioning reduces LUT size, more adders are required to sum the LUT data.
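
The size arithmetic above can be checked with a short sketch. The helper names and the word size of 16 bits are assumptions chosen for illustration; only the formula (sum of 2^taps over the partitions, times W) comes from the text.

```python
def lut_entries(partition):
    """Total number of LUT entries for a list of partition sizes (taps per LUT)."""
    return sum(2 ** taps for taps in partition)


def lut_bits(partition, word_size):
    """Total LUT storage in bits, given the word size W of the LUT data."""
    return lut_entries(partition) * word_size


W = 16  # hypothetical LUT word size in bits

unpartitioned = lut_bits([160], W)      # a single LUT over all 160 taps
partitioned = lut_bits([10] * 16, W)    # 16 partitions of 10 taps each

print(partitioned)                      # 16 * (2**10) * 16 = 262144 bits
print(unpartitioned // partitioned)     # reduction factor
```

The sketch counts only LUT storage. It does not model the extra adders needed to sum the partition outputs.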
You can use DALUTPartition to enable DA
code generation and specify the number and size of LUT partitions.
Specify LUT partitions as a vector of integers [p1 p2...pN] where:
N is the number of partitions.
Each vector element specifies the size of a partition. The maximum size for an individual partition is 12.
The sum of all vector elements equals the filter length FL.
FL is calculated differently depending on the filter type. You can find how FL is calculated for different filter types in the next section.
See Distributed Arithmetic for HDL Filters.
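
As a quick sanity check before code generation, the partition rules above can be expressed as a small validator. This is a sketch, not part of HDL Coder; the function name `check_da_lut_partition` is hypothetical, and the requirement that each element be a positive integer is an assumption implied by, but not stated in, the rules.

```python
MAX_PARTITION_SIZE = 12  # maximum size of an individual partition, per the rules above


def check_da_lut_partition(partition, filter_length):
    """Return a list of rule violations; an empty list means the vector is valid."""
    problems = []
    if not partition:
        return ["partition vector is empty"]
    for index, size in enumerate(partition):
        # Assumption: partition sizes are positive integers.
        if not isinstance(size, int) or size < 1:
            problems.append(f"element {index} is not a positive integer: {size!r}")
        elif size > MAX_PARTITION_SIZE:
            problems.append(f"element {index} exceeds {MAX_PARTITION_SIZE}: {size}")
    if not problems and sum(partition) != filter_length:
        problems.append(
            f"elements sum to {sum(partition)}, expected filter length {filter_length}"
        )
    return problems


print(check_da_lut_partition([4, 4], 8))     # valid: no problems reported
print(check_da_lut_partition([13, 3], 16))   # one element is larger than 12
print(check_da_lut_partition([4, 3], 8))     # elements do not sum to FL
```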
Specifying DALUTPartition for Single-Rate Filters
To determine the LUT partition for one of the supported single-rate
filter types, calculate FL as shown in the following
table. Then, specify the partition as a vector whose elements sum
to FL.
Filter Type | Filter Length (FL) Calculation |
---|---|
Direct-form FIR | FL = length(find(Hd.numerator ~= 0)) |
Direct-form asymmetrical FIR, direct-form symmetrical FIR | FL = ceil(length(find(Hd.numerator ~= 0))/2) |
You can also specify generation of DA code for your filter design without LUT partitioning. To do so, specify a vector of one element, whose value is equal to the filter length.
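
The FL formulas in the table translate directly to other languages. The following sketch mirrors them in Python for a plain list of coefficients; the function name `filter_length`, its `halved` flag, and the sample coefficients are assumptions for illustration.

```python
import math


def filter_length(numerator, halved=False):
    """Compute FL from the numerator coefficients.

    halved=False corresponds to the direct-form FIR row of the table.
    halved=True corresponds to the row for the asymmetrical and symmetrical
    structures, where FL is half the nonzero count, rounded up.
    """
    nonzero = sum(1 for c in numerator if c != 0)
    return math.ceil(nonzero / 2) if halved else nonzero


direct_form = [0.5, 0.0, 0.25, 0.125]   # hypothetical coefficients, one of them zero
symmetric = [1, 2, 3, 2, 1]             # hypothetical symmetric coefficients

print(filter_length(direct_form))             # FL for the direct-form FIR
print(filter_length(symmetric, halved=True))  # FL for the symmetric structure

# DA without LUT partitioning: a one-element vector equal to FL.
print([filter_length(direct_form)])
```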
Specifying DALUTPartition for Multirate Filters
For supported multirate filters (FIR Decimation and FIR Interpolation), you can specify the LUT partition as:
A vector defining a partition for LUTs for all polyphase subfilters.
A matrix of LUT partitions, where each row vector specifies a LUT partition for a corresponding polyphase subfilter. In this case, the FL is uniform for all subfilters. This approach provides fine control for partitioning each subfilter.
The following table shows the FL calculations
for each type of LUT partition.
LUT Partition | Filter Length (FL) Calculation |
---|---|
Vector: Determine FL as shown in the Filter Length (FL) Calculation column to the right. Specify the LUT partition as a vector of integers whose elements sum to FL. | FL = size(polyphase(Hm), 2) |
Matrix: Determine the subfilter length FLi based on the polyphase decomposition of the filter, as shown in the Filter Length (FL) Calculation column to the right. Specify the LUT partition for each subfilter as a row vector whose elements sum to FLi. | p = polyphase(Hm); FLi = length(find(p(i,:))); where row i of p represents the ith subfilter. |
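
The two FL calculations can be sketched in Python on a plain coefficient list. The row layout used here (row i holds every factor-th coefficient starting at index i, zero-padded to equal length) is an assumption about how the polyphase matrix is arranged, so check it against the actual output of polyphase(Hm) for your filter. All function names are hypothetical.

```python
def polyphase_rows(coeffs, factor):
    """Split coefficients into `factor` subfilters, one per row, zero-padded."""
    n_cols = -(-len(coeffs) // factor)  # ceiling division
    padded = list(coeffs) + [0] * (n_cols * factor - len(coeffs))
    return [padded[i::factor] for i in range(factor)]


def vector_fl(rows):
    """FL for a vector partition: the number of columns of the polyphase matrix."""
    return len(rows[0])


def subfilter_fls(rows):
    """FLi for a matrix partition: the nonzero coefficient count of each row."""
    return [sum(1 for c in row if c != 0) for row in rows]


coeffs = [1, 2, 3, 4, 5, 6, 7]  # hypothetical 7-tap filter
rows = polyphase_rows(coeffs, 3)  # decimation or interpolation factor of 3

print(rows)
print(vector_fl(rows))      # one partition vector shared by all subfilters sums to this
print(subfilter_fls(rows))  # row i of a partition matrix sums to entry i
```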
DARadix
The inherently bit-serial nature of DA can limit throughput.
To improve throughput, the basic DA algorithm can be modified to compute
more than one bit sum at a time. The number of simultaneously computed
bit sums is expressed as a power of two called the DA radix.
For example, a DA radix of 2 (2^1) indicates that
one bit sum is computed at a time. A DA radix of 4 (2^2)
indicates that two bit sums are computed at a time, and so on.
To compute more than one bit sum at a time, the LUT is replicated. For example, to perform DA on 2 bits at a time (radix 4), the odd bits are fed to one LUT and the even bits are simultaneously fed to an identical LUT. The LUT results corresponding to odd bits are left-shifted before they are added to the LUT results corresponding to even bits. This result is then fed into a scaling accumulator that shifts its feedback value by 2 places.
Processing more than one bit at a time introduces a degree of parallelism into the operation, improving speed at the expense of area.
You can use DARadix to specify the number
of bits processed simultaneously in DA. The number of bits is expressed
as N, which must be:
A nonzero positive integer that is a power of two
Such that mod(W, log2(N)) = 0, where W is the input word size of the filter
The default value for N is 2, specifying
processing of one bit at a time, or fully serial DA, which is slow
but low in area. The maximum value for N is 2^W,
where W is the input word size of the filter. This
maximum specifies fully parallel DA, which is fast but high in area.
Values of N between these extrema specify partly
serial DA.
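
The constraints on N can be checked with a few lines of code. The sketch below is an illustration only; the function names are hypothetical. It rejects N = 1 explicitly, because log2(1) is 0 and the divisibility condition cannot be evaluated for it.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ... (positive integers with a single set bit)."""
    return n >= 1 and (n & (n - 1)) == 0


def valid_da_radix(radix, word_size):
    """Check that the radix is a power of two and that log2(radix) divides W."""
    if not isinstance(radix, int) or radix < 2 or not is_power_of_two(radix):
        return False
    bits_per_step = radix.bit_length() - 1  # log2(radix) for a power of two
    return word_size % bits_per_step == 0


def valid_radices(word_size):
    """List every radix from 2 (fully serial) to 2**W (fully parallel)."""
    return [2 ** k for k in range(1, word_size + 1) if word_size % k == 0]


W = 12  # hypothetical input word size in bits
print(valid_radices(W))
print(valid_da_radix(4, 16))   # log2(4) = 2 divides 16
print(valid_da_radix(8, 16))   # log2(8) = 3 does not divide 16
```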
Note
When setting a DARadix value for symmetrical
and asymmetrical filters, see Considerations for Symmetrical and Asymmetrical Filters.
FoldingFactor
FoldingFactor specifies the total number
of clock cycles taken for the computation of filter output in an IIR
SOS filter with serial architecture. It is complementary to the NumMultipliers property. You must select one property
or the other; you cannot use both. If you do not specify either FoldingFactor
or NumMultipliers,
HDL code for the filter is generated with fully parallel architecture.
MultiplierInputPipeline
You can use this parameter to generate a specified number of pipeline stages at multiplier inputs for FIR filter structures. The default value is 0.
The following limitation applies to MultiplierInputPipeline:
Pipeline stages introduce delays along the path in the model that contains the affected filter. When you enable this pipeline option, the coder automatically adds balancing delays on parallel data paths.
For diagrams of where these pipeline stages occur in the filter architecture, see HDL Filter Architectures.
MultiplierOutputPipeline
You can use this parameter to generate a specified number of pipeline stages at multiplier outputs for FIR filter structures. The default value is 0.
The following limitation applies to MultiplierOutputPipeline:
Pipeline stages introduce delays along the path in the model that contains the affected filter. When you enable this pipeline option, the coder automatically adds balancing delays on parallel data paths.
For diagrams of where these pipeline stages occur in the filter architecture, see HDL Filter Architectures.
NumMultipliers
NumMultipliers specifies the total number
of multipliers used for the filter implementation in an IIR SOS filter
with serial architecture. It is complementary to the FoldingFactor property. You must select
one property or the other; you cannot use both. If you do not specify
either FoldingFactor or NumMultipliers,
HDL code for the filter is generated with fully parallel architecture.
ReuseAccum
You can use this parameter to enable or disable accumulator reuse in a serial HDL architecture. The default is a fully parallel architecture.
To Generate This Architecture... | Set ReuseAccum to... |
---|---|
Fully parallel | Omit this property |
Fully serial | Not specified, or 'off' |
Partly serial | 'off' |
Cascade-serial with explicitly specified partitioning | 'on' |
Cascade-serial with automatically optimized partitioning | 'on' |
For more information on parallel and serial filter architectures, see HDL Filter Architectures.
SerialPartition
Use this parameter to specify partitions for a serial filter architecture. The default is a fully parallel architecture.
To Generate This Architecture... | Set SerialPartition to... |
---|---|
Fully parallel | Omit this property |
Fully serial | N , where N is the length of the filter |
Partly serial | |
Cascade-serial with explicitly specified partitioning | [p1 p2 p3...pN] : A vector of N integers, where N is the number of serial partitions. Each element of the vector specifies the length of the corresponding partition. The sum of the vector elements must be equal to the length of the filter. The values of the vector elements must be in descending order, except the last two elements, which can be equal. For example, for a filter length of 8, partitions [5 3] or [4 2 2] are valid, but the partitions [2 2 2 2] and [3 2 3] raise an error at code generation time. |
Cascade-serial with automatically optimized partitioning | Omit this property. |
For more information on parallel and serial filter architectures, see HDL Filter Architectures.
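
The ordering rule for explicitly specified cascade-serial partitions is easy to get wrong, so the following sketch checks a candidate vector against the rules in the table. It is a hypothetical helper, not an HDL Coder function, and it reads "descending order" as strictly descending, which matches the [2 2 2 2] example being rejected.

```python
def valid_cascade_partition(partition, filter_length):
    """Check a cascade-serial partition vector against the rules in the table."""
    if not partition:
        return False
    if any(not isinstance(p, int) or p < 1 for p in partition):
        return False
    if sum(partition) != filter_length:
        return False
    last_pair = len(partition) - 2
    for i in range(len(partition) - 1):
        current, following = partition[i], partition[i + 1]
        if i == last_pair:
            # The last two elements can be equal.
            if current < following:
                return False
        elif current <= following:
            # Every other neighboring pair must be strictly descending.
            return False
    return True


for candidate in ([5, 3], [4, 2, 2], [2, 2, 2, 2], [3, 2, 3]):
    print(candidate, valid_cascade_partition(candidate, 8))
```

The four candidates are the ones named in the table: the first two pass, and the last two fail.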
This property is also used for Min/Max blocks with cascade-serial architectures. For how to configure Min/Max cascades, see SerialPartition.