Complex Burst Matrix Solve Using Q-less QR Decomposition

Compute the value of X in the equation A'AX = B for complex-valued matrices using Q-less QR decomposition

Libraries:
Fixed-Point Designer HDL Support / Matrices and Linear Algebra / Linear System Solvers

Description

The Complex Burst Matrix Solve Using Q-less QR Decomposition block solves the system of linear equations, A'AX = B, using Q-less QR decomposition, where A and B are complex-valued matrices.

When Regularization parameter is nonzero, the Complex Burst Matrix Solve Using Q-less QR Decomposition block solves the matrix equation

$\begin{bmatrix} \lambda I_n \\ A \end{bmatrix}' \cdot \begin{bmatrix} \lambda I_n \\ A \end{bmatrix} X = (\lambda^2 I_n + A'A) X = B$

where λ is the regularization parameter, A is an m-by-n matrix, and I_n = eye(n).
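
The equality between the stacked form and λ²I_n + A'A can be checked numerically. The following is a minimal sketch in plain Python rather than MATLAB, and it is not part of the block. The helper names `ctranspose` and `matmul` and the sample values of A and λ are illustrative assumptions.

```python
# Check that [lam*I; A]' * [lam*I; A] equals lam^2*I + A'*A for a small complex A.
# Helper names and sample values are illustrative assumptions.

def ctranspose(M):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[M[i][j].conjugate() for i in range(len(M))] for j in range(len(M[0]))]

def matmul(P, Q):
    """Matrix product of two list-of-rows matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

lam = 0.5                      # regularization parameter (real, nonnegative)
A = [[1 + 2j, 0 - 1j],
     [3 + 0j, 2 + 2j],
     [0 + 1j, 1 - 1j]]         # m = 3, n = 2
n = len(A[0])

# Stack lam*I_n on top of A to form the (n + m)-by-n augmented matrix.
stacked = [[complex(lam if i == j else 0.0) for j in range(n)] for i in range(n)] + A

lhs = matmul(ctranspose(stacked), stacked)
gram = matmul(ctranspose(A), A)
rhs = [[gram[i][j] + (lam ** 2 if i == j else 0.0) for j in range(n)] for i in range(n)]

# Largest entry-wise difference between the two forms; expected to be at rounding level.
max_diff = max(abs(lhs[i][j] - rhs[i][j]) for i in range(n) for j in range(n))
print(max_diff)
```

Stacking λI_n above A only adds λ² to the diagonal of A'A, which is why a nonzero regularization parameter keeps the system solvable when A is rank deficient.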

Examples

Implement Hardware-Efficient Complex Burst Matrix Solve Using Q-less QR Decomposition

How to use the Complex Burst Matrix Solve Using Q-less QR Decomposition block.

Implement Hardware-Efficient Complex Burst Matrix Solve Using Q-less QR Decomposition with Tikhonov Regularization

Use the Complex Burst Matrix Solve Using Q-less QR Decomposition block to solve the regularized least-squares matrix equation.

Algorithms to Determine Fixed-Point Types for Complex Q-less QR Matrix Solve A'AX=B

Derivation of algorithms for determining fixed-point types for complex Q-less QR matrix solve.

Determine Fixed-Point Types for Complex Q-less QR Matrix Solve A'AX=B

Use fixed.complexQlessQRMatrixSolveFixedpointTypes to determine fixed-point types for computation of the complex least-squares matrix equation.

Determine Fixed-Point Types for Complex Q-less QR Matrix Solve with Tikhonov Regularization

Use the fixed.complexQlessQRMatrixSolveFixedpointTypes function to analytically determine fixed-point types for the solution of the complex least-squares matrix equation.

Ports

Input

A(i,:) — Rows of matrix A
vector

Rows of matrix A, specified as a vector. A is an m-by-n matrix where m ≥ 2 and m ≥ n. If B is single or double, A must be the same data type as B. If A is a fixed-point data type, A must be signed, use binary-point scaling, and have the same word length as B. Slope-bias representation is not supported for fixed-point data types.

Data Types: single | double | fixed point
Complex Number Support: Yes

B(i,:) — Rows of matrix B
vector

Rows of matrix B, specified as a vector. B is an n-by-p matrix where n ≥ 2. If A is single or double, B must be the same data type as A. If B is a fixed-point data type, B must be signed, use binary-point scaling, and have the same word length as A. Slope-bias representation is not supported for fixed-point data types.

Data Types: single | double | fixed point

validIn — Whether inputs are valid
`Boolean` scalar

Whether inputs are valid, specified as a Boolean scalar. This control signal indicates when the data from the A(i,:) and B(i,:) input ports are valid. When this value is 1 (true) and the ready value is 1 (true), the block captures the values at the A(i,:) and B(i,:) input ports. When this value is 0 (false), the block ignores the input samples.

After sending a true validIn signal, there may be some delay before ready is set to false. To ensure all data is processed, you must wait until ready is set to false before sending another true validIn signal.

Data Types: Boolean

restart — Whether to clear internal states
`Boolean` scalar

Whether to clear internal states, specified as a Boolean scalar. When this value is 1 (true), the block stops the current calculation and clears all internal states. When this value is 0 (false) and the validIn value is 1 (true), the block begins a new subframe.

Data Types: Boolean

Output

X(i,:) — Rows of matrix X
scalar | vector

Rows of the matrix X, returned as a scalar or vector.

Data Types: single | double | fixed point

validOut — Whether output data is valid
`Boolean` scalar

Whether the output data is valid, returned as a Boolean scalar. This control signal indicates when the data at the output port X(i,:) is valid. When this value is 1 (true), the block has successfully computed a row of X. When this value is 0 (false), the output data is not valid.

Tips

Use fixed.getQlessQRMatrixSolveModel(A,B) to generate a template model containing a Complex Burst Matrix Solve Using Q-less QR Decomposition block for complex-valued input matrices A and B.

Algorithms

Choosing the Implementation Method

Systolic implementations prioritize speed of computations over space constraints, while burst implementations prioritize space constraints at the expense of speed of the operations. The following table illustrates the tradeoffs between the implementations available for matrix decompositions and solving systems of linear equations.

Implementation	Throughput	Latency	Area
Systolic	High	O(nlog2(m))	O(mn²)
Partial-Systolic	Medium	O(mn)	O(n²)
Burst	Low	O(mn)	O(n)

In the table, m is the number of rows in matrix A and n is the number of columns in matrix A. Regardless of architecture, a larger word length results in lower throughput, larger latency, and larger area.

For additional considerations in selecting a block for your application, see Choose a Block for HDL-Optimized Fixed-Point Matrix Operations.

Synchronous vs Asynchronous Implementation

The Matrix Solve Using QR Decomposition blocks operate synchronously. These blocks first decompose the input A and B matrices into R and C matrices using a QR decomposition block. Then, a back substitute block computes RX = C. The input A and B matrices propagate through the system in parallel, in a synchronized way.

Example signal path for synchronous matrix solve blocks.

The Matrix Solve Using Q-less QR Decomposition blocks operate asynchronously. First, Q-less QR decomposition is performed on the input A matrix and the resulting R matrix is put into a buffer. Then, a forward backward substitution block uses the input B matrix and the buffered R matrix to compute R'RX = B. Because the R and B matrices are stored separately in buffers, the upstream Q-less QR decomposition block and the downstream Forward Backward Substitute block can run independently. The Forward Backward Substitute block starts processing when the first R and B matrices are available. Then it runs continuously using the latest buffered R and B matrices, regardless of the status of the Q-less QR Decomposition block. For example, if the upstream block stops providing A and B matrices, the Forward Backward Substitute block continues to generate the same output using the last pair of R and B matrices.

Example signal path for asynchronous matrix solve blocks.
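
The two-stage flow described above (obtain R, then solve R'RX = B by forward and backward substitution) can be sketched in plain Python. This is an illustration under stated assumptions and not the block's implementation. In particular, R is obtained here by Cholesky factorization of A'A, which gives an upper-triangular factor satisfying R'R = A'A. That is a stand-in for the Q-less QR step, chosen only to keep the example short. All function names and sample values are hypothetical.

```python
import math

def ctranspose(M):
    """Conjugate transpose of a list-of-rows matrix."""
    return [[M[i][j].conjugate() for i in range(len(M))] for j in range(len(M[0]))]

def matmul(P, Q):
    """Matrix product of two list-of-rows matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def upper_factor(G):
    """Upper-triangular R with R'R = G, for Hermitian positive-definite G.

    Stand-in for Q-less QR: computed by Cholesky factorization of G = A'A.
    """
    n = len(G)
    R = [[0j] * n for _ in range(n)]
    for i in range(n):
        s = G[i][i].real - sum(abs(R[k][i]) ** 2 for k in range(i))
        R[i][i] = complex(math.sqrt(s))
        for j in range(i + 1, n):
            t = G[i][j] - sum(R[k][i].conjugate() * R[k][j] for k in range(i))
            R[i][j] = t / R[i][i]
    return R

def forward_backward_substitute(R, B):
    """Solve R'RX = B: forward substitution on R'Y = B, then back substitution on RX = Y."""
    n, p = len(R), len(B[0])
    Y = [[0j] * p for _ in range(n)]
    for i in range(n):                          # R' is lower triangular
        for c in range(p):
            acc = B[i][c] - sum(R[k][i].conjugate() * Y[k][c] for k in range(i))
            Y[i][c] = acc / R[i][i].conjugate()
    X = [[0j] * p for _ in range(n)]
    for i in reversed(range(n)):                # R is upper triangular
        for c in range(p):
            acc = Y[i][c] - sum(R[i][k] * X[k][c] for k in range(i + 1, n))
            X[i][c] = acc / R[i][i]
    return X

A = [[1 + 2j, 0 - 1j],
     [3 + 0j, 2 + 2j],
     [0 + 1j, 1 - 1j]]          # m = 3, n = 2
B = [[1 + 1j],
     [2 - 1j]]                  # n = 2, p = 1

gram = matmul(ctranspose(A), A)
R = upper_factor(gram)
X = forward_backward_substitute(R, B)

# Residual of A'AX = B; expected to be at rounding level.
check = matmul(gram, X)
residual = max(abs(check[i][0] - B[i][0]) for i in range(len(B)))
print(X, residual)
```

Because the substitution stage needs only R and B, it can keep producing X from the last buffered pair, which matches the decoupling between the two stages described above.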

The Burst (Asynchronous) Matrix Solve Using Q-less QR Decomposition blocks are available in both synchronous and asynchronous operation variants, as denoted by the block name.

AMBA AXI Handshake Process

This block uses the AMBA AXI handshake protocol [1]. The valid/ready handshake process is used to transfer data and control information. This two-way control mechanism allows both the manager and subordinate to control the rate at which information moves between manager and subordinate. A valid signal indicates when data is available. The ready signal indicates that the block can accept the data. Transfer of data occurs only when both the valid and ready signals are high.
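
The rule that a transfer happens only when valid and ready are both high can be illustrated with a toy cycle-by-cycle model. This is a sketch, not the block's timing: the function `simulate_handshake` and its `busy_cycles` parameter are hypothetical, and the model assumes the sink drops ready for a fixed number of cycles after each accepted item.

```python
def simulate_handshake(rows, valid_pattern, busy_cycles):
    """Return (cycle, row) pairs for rows accepted by a toy valid/ready sink.

    rows          -- data items the source wants to send, in order
    valid_pattern -- one bool per cycle: whether the source asserts valid
    busy_cycles   -- cycles the sink holds ready low after each transfer (assumed)
    """
    accepted = []
    next_row = 0
    busy = 0                       # remaining cycles with ready low
    for cycle, valid in enumerate(valid_pattern):
        ready = busy == 0
        if valid and ready and next_row < len(rows):
            # Both signals are high in this cycle, so the transfer occurs.
            accepted.append((cycle, rows[next_row]))
            next_row += 1
            busy = busy_cycles     # sink becomes busy after a transfer
        elif busy > 0:
            busy -= 1
    return accepted

# Source always valid: the sink's ready signal sets the pace.
always_valid = simulate_handshake(["r1", "r2", "r3"], [True] * 8, busy_cycles=2)

# Source sometimes idle: both sides influence when rows move.
gapped = simulate_handshake(["r1", "r2", "r3"],
                            [False, True, True, False, True, True, True],
                            busy_cycles=1)
print(always_valid)
print(gapped)
```

In the first call the source is faster than the sink, so rows are accepted only as often as ready allows. In the second call, cycles where valid is low pass without a transfer even though the sink is ready.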

Block Timing

The Burst Matrix Solve Using Q-less QR Decomposition blocks accept and process A and B matrices row by row synchronously. After accepting m rows, the block outputs the X matrix row by row continuously. The matrix is output from the first row to the last row.

For example, assume that the input A and B matrices are 3-by-3. Additionally assume that validIn asserts before ready, meaning that the upstream data source is faster than the QR decomposition.

Timing diagram for the Burst Q-less QR Decomposition blocks.

In the figure,

A1r1 is the first row of the first A matrix, X1r3 is the third row of the first X matrix, and so on.
validIn to ready — From a successful row input to the block being ready to accept the next row within one matrix.
Last row validIn to validOut — From the last row input to the block starting to output the solution.
Last row validIn to new matrix ready — From the last row input to the block being ready to accept the next matrix input.

The following table provides details of the timing for the Complex Burst Matrix Solve Using Q-less QR Decomposition block. Latency depends on the size of matrix A and the data types of the A and B matrices. In the table:

m is the number of rows in matrix A.
n is the number of columns in matrix A.
wl is the word length of the input data in matrix A.

Input Data Type	`validIn` to `ready` (cycles)	Last row `validIn` to `validOut` (cycles)	Last row `validIn` to new matrix ready (cycles)
Fixed point `fi`	(2*wl + 11)*n + 2	7n² + 33n + 6 + 4n*wl + 2n*nextpow2(wl)	7n² + 33n + 6 + 4n*wl + 2n*nextpow2(wl) + n
Scaled double `fi`	(2*wl + 11)*n + 2	7n² + (4*wl + 31)*n + 6	7n² + (4*wl + 31)*n + 6 + n
`double`	117*n + 2	7n² + 135n + 6	7n² + 135n + 6 + n
`single`	59*n + 2	7n² + 77n + 6	7n² + 77n + 6 + n
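
The table entries can be evaluated for a given matrix size with a short helper. This is a sketch in plain Python, and the function names are hypothetical. It assumes that the fixed-point and scaled-double `validIn` to `ready` entry is (2*wl + 11)*n + 2, a reading that agrees with the `double` row (wl = 53 gives 117) and the `single` row (wl = 24 gives 59). It also assumes nextpow2 has its usual MATLAB meaning: the smallest p with 2^p ≥ wl.

```python
def nextpow2(x):
    """Smallest integer p with 2**p >= x, for a positive integer x."""
    return (x - 1).bit_length()

def latency_cycles(n, dtype, wl=None):
    """Cycle counts from the timing table.

    Returns (validIn to ready,
             last-row validIn to validOut,
             last-row validIn to new-matrix ready).
    wl is required for "fixed" and "scaled double" and ignored otherwise.
    """
    if dtype == "fixed":
        to_ready = (2 * wl + 11) * n + 2
        to_valid_out = 7 * n**2 + 33 * n + 6 + 4 * n * wl + 2 * n * nextpow2(wl)
    elif dtype == "scaled double":
        to_ready = (2 * wl + 11) * n + 2
        to_valid_out = 7 * n**2 + (4 * wl + 31) * n + 6
    elif dtype == "double":
        to_ready = 117 * n + 2
        to_valid_out = 7 * n**2 + 135 * n + 6
    elif dtype == "single":
        to_ready = 59 * n + 2
        to_valid_out = 7 * n**2 + 77 * n + 6
    else:
        raise ValueError(f"unknown dtype: {dtype!r}")
    # The block needs n more cycles to emit the n rows of X before a new matrix.
    return to_ready, to_valid_out, to_valid_out + n

print(latency_cycles(16, "fixed", wl=16))
print(latency_cycles(4, "double"))
```

Note that none of the expressions depend on m: the per-row cost is paid once for each of the m input rows, while the remaining terms depend only on n and wl.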

Hardware Resource Utilization

This block supports HDL code generation using the Simulink® HDL Workflow Advisor. For an example, see HDL Code Generation and FPGA Synthesis from Simulink Model (HDL Coder) and Implement Digital Downconverter for FPGA (DSP HDL Toolbox).

This example data was generated by synthesizing the block on a Xilinx® Zynq® UltraScale+™ RFSoC ZCU111 evaluation board. The synthesis tool was Vivado® v.2020.2 (win64).

The following parameters were used for synthesis.

- Block parameters: m = 16, n = 16, p = 1
- Matrix A dimension: 16-by-16
- Matrix B dimension: 16-by-1
- Input data type: sfix16_En14
- Target frequency: 250 MHz

The following tables show the post place-and-route resource utilization results and timing summary, respectively.

Resource	Usage	Available	Utilization (%)
CLB LUTs	30915	425280	7.27
CLB Registers	34833	850560	4.10
DSPs	12	4272	0.28
Block RAM Tile	0	1080	0.00
URAM	0	80	0.00

	Value
Requirement	4 ns
Data Path Delay	3.686 ns
Slack	0.296 ns
Clock Frequency	269.98 MHz
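
The figures in the two tables are related by simple arithmetic, sketched below in plain Python. The sketch assumes the usual convention that the achievable clock period is the requirement minus the positive slack, and that utilization is usage divided by available resources. The variable names are illustrative.

```python
requirement_ns = 4.0      # clock period requirement for the 250 MHz target
slack_ns = 0.296          # reported post-route slack

# Achievable period is the requirement minus the (positive) slack.
achieved_mhz = 1000.0 / (requirement_ns - slack_ns)

# (used, available) pairs copied from the resource table.
usage = {
    "CLB LUTs": (30915, 425280),
    "CLB Registers": (34833, 850560),
    "DSPs": (12, 4272),
}
utilization = {name: round(100.0 * used / avail, 2)
               for name, (used, avail) in usage.items()}

print(round(achieved_mhz, 2), utilization)
```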

References

[1] "AMBA AXI and ACE Protocol Specification Version E." https://developer.arm.com/documentation/ihi0022/e/

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using Simulink® Coder™.

Slope-bias representation is not supported for fixed-point data types.

HDL Code Generation
Generate VHDL, Verilog and SystemVerilog code for FPGA and ASIC designs using HDL Coder™.

HDL Coder™ provides additional configuration options that affect HDL implementation and synthesized logic.

HDL Architecture

This block has one default HDL architecture.

HDL Block Properties

General
ConstrainedOutputPipeline	Number of registers to place at the outputs by moving existing delays within your design. Distributed pipelining does not redistribute these registers. The default is `0`. For more details, see ConstrainedOutputPipeline (HDL Coder).
InputPipeline	Number of input pipeline stages to insert in the generated code. Distributed pipelining and constrained output pipelining can move these registers. The default is `0`. For more details, see InputPipeline (HDL Coder).
OutputPipeline	Number of output pipeline stages to insert in the generated code. Distributed pipelining and constrained output pipelining can move these registers. The default is `0`. For more details, see OutputPipeline (HDL Coder).

Restrictions

Supports fixed-point data types only.

Version History

Introduced in R2020a

R2022a: Support for Tikhonov regularization parameter

The Complex Burst Matrix Solve Using Q-less QR Decomposition block now supports the Tikhonov Regularization parameter.

R2021a: Reduced HDL resource utilization

This block now has an improved algorithm to reduce resource utilization on hardware-constrained target platforms.

ready — Whether block is ready
`Boolean` scalar

Parameters

Number of rows in matrix A — Number of rows in matrix A
`4` (default) | positive integer-valued scalar

Number of columns in matrix A and rows in matrix B — Number of columns in matrix A and rows in matrix B
`4` (default) | positive integer-valued scalar

Number of columns in matrix B — Number of columns in matrix B
`1` (default) | positive integer-valued scalar

Regularization parameter — Regularization parameter
`0` (default) | real nonnegative scalar

Output datatype — Data type of output matrix X
`fixdt(1,18,14)` (default) | `double` | `single` | `fixdt(1,16,0)` | `<data type expression>`
