Complex Partial-Systolic QR Decomposition
Libraries:
Fixed-Point Designer HDL Support / Matrices and Linear Algebra / Matrix Factorizations
Description
The Complex Partial-Systolic QR Decomposition block uses QR decomposition to compute R and C = Q'B, where QR = A, and A and B are complex-valued matrices. The least-squares solution to Ax = B is x = R\C. R is an upper triangular matrix and Q is a unitary matrix (the complex counterpart of an orthogonal matrix). To compute C = Q', set B to be the identity matrix.
When Regularization parameter is nonzero, the Complex Partial-Systolic QR Decomposition block transforms $$\left[\begin{array}{c}\lambda I_{n}\\ A\end{array}\right]$$ in-place to $$R=Q'\left[\begin{array}{c}\lambda I_{n}\\ A\end{array}\right]$$ and $$\left[\begin{array}{c}0_{n,p}\\ B\end{array}\right]$$ in-place to $$C=Q'\left[\begin{array}{c}0_{n,p}\\ B\end{array}\right]$$, where λ is the regularization parameter, QR is the economy-size QR decomposition of $$\left[\begin{array}{c}\lambda I_{n}\\ A\end{array}\right]$$, A is an m-by-n matrix, p is the number of columns in B, I_{n} = eye(n), and 0_{n,p} = zeros(n,p).
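The relationships above can be checked numerically in software. The following Python sketch is only an illustration under stated assumptions, not the block's hardware algorithm: the helper names `conj_transpose_times`, `qr_givens`, and `back_substitute` are hypothetical, the factorization uses complex Givens rotations as one possible method, and A is assumed to have at least as many rows as columns when the regularization parameter is zero.

```python
import math


def conj_transpose_times(X, Y):
    """Return X' * Y, where X' is the conjugate transpose of X (lists of rows)."""
    return [[sum(X[k][i].conjugate() * Y[k][j] for k in range(len(X)))
             for j in range(len(Y[0]))]
            for i in range(len(X[0]))]


def qr_givens(A, B, lam=0.0):
    """Return (R, C) for the stacked matrices [lam*I_n; A] and [0_{n,p}; B].

    Hypothetical helper: assumes len(A) >= len(A[0]) whenever lam == 0.
    """
    n, p = len(A[0]), len(B[0])
    W = [[complex(v) for v in row] for row in A]
    Z = [[complex(v) for v in row] for row in B]
    if lam != 0:
        # Prepend lam*eye(n) to A and zeros(n,p) to B.
        W = [[complex(lam if i == j else 0.0) for j in range(n)]
             for i in range(n)] + W
        Z = [[0j] * p for _ in range(n)] + Z
    for j in range(n):
        for i in range(j + 1, len(W)):
            a, b = W[j][j], W[i][j]
            r = math.hypot(abs(a), abs(b))
            if r == 0.0:
                continue
            c, s = a / r, b / r
            # Unitary rotation [[conj(c), conj(s)], [-s, c]] maps (a, b) to (r, 0).
            for M in (W, Z):
                for k in range(len(M[0])):
                    x, y = M[j][k], M[i][k]
                    M[j][k] = c.conjugate() * x + s.conjugate() * y
                    M[i][k] = c * y - s * x
    return W[:n], Z[:n]


def back_substitute(R, C):
    """Solve R * X = C for upper triangular R, that is, X = R\\C."""
    n, p = len(R), len(C[0])
    X = [[0j] * p for _ in range(n)]
    for col in range(p):
        for i in reversed(range(n)):
            acc = C[i][col] - sum(R[i][k] * X[k][col] for k in range(i + 1, n))
            X[i][col] = acc / R[i][i]
    return X


# Sample 3-by-2 complex A and 3-by-1 B (arbitrary illustrative values).
A = [[1 + 1j, 2], [0, 1j], [3, 1 - 1j]]
B = [[1], [2j], [3]]
R, C = qr_givens(A, B)                   # no regularization
x = back_substitute(R, C)                # least-squares solution x = R\C
R_reg, C_reg = qr_givens(A, B, lam=0.5)  # regularized variant
```

Because Q is unitary, R'R equals A'A without regularization and A'A + λ²I with it, which gives a convenient way to check the result without forming Q explicitly.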
Ports
Input
A(i,:) — Rows of matrix A
vector
Rows of matrix A, specified as a vector. A is an m-by-n matrix where m ≥ 2 and n ≥ 2. If B is single or double, A must be the same data type as B. If A is a fixed-point data type, A must be signed, use binary-point scaling, and have the same word length as B. Slope-bias representation is not supported for fixed-point data types.
Data Types: single | double | fixed point
Complex Number Support: Yes
B(i,:) — Rows of matrix B
vector
Rows of matrix B, specified as a vector. B is an m-by-p matrix where m ≥ 2. If A is single or double, B must be the same data type as A. If B is a fixed-point data type, B must be signed, use binary-point scaling, and have the same word length as A. Slope-bias representation is not supported for fixed-point data types.
Data Types: single | double | fixed point
Complex Number Support: Yes
validIn — Whether inputs are valid
Boolean scalar
Whether inputs are valid, specified as a Boolean scalar. This control signal indicates when the data from the A(i,:) and B(i,:) input ports are valid. When this value is 1 (true) and the value at ready is 1 (true), the block captures the values on the A(i,:) and B(i,:) input ports. When this value is 0 (false), the block ignores the input samples.
After sending a true validIn signal, there may be some delay before ready is set to false. To ensure all data is processed, you must wait until ready is set to false before sending another true validIn signal.
Data Types: Boolean
restart — Whether to clear internal states
Boolean scalar
Whether to clear internal states, specified as a Boolean scalar. When this value is 1 (true), the block stops the current calculation and clears all internal states. When this value is 0 (false), and the validIn value is 1 (true), the block begins a new subframe.
Data Types: Boolean
Output
R — Matrix R
matrix
Economy-size QR decomposition matrix R, returned as a matrix. R is an upper triangular matrix. The size of matrix R is n-by-n. R has the same data type as A.
Data Types: single | double | fixed point
C — Matrix C=Q'B
matrix
Economy-size QR decomposition matrix C=Q'B, returned as a matrix or vector. C has the same number of rows as R. C has the same data type as B.
Data Types: single | double | fixed point
validOut — Whether output data is valid
Boolean scalar
Whether the output data is valid, returned as a Boolean scalar. This control signal indicates when the data at output ports R and C is valid. When this value is 1 (true), the block has successfully computed the R and C matrices. When this value is 0 (false), the output data is not valid.
Data Types: Boolean
ready — Whether block is ready
Boolean scalar
Whether the block is ready, returned as a Boolean scalar. This control signal indicates when the block is ready for new input data. When this value is 1 (true), and the validIn value is 1 (true), the block accepts input data in the next time step. When this value is 0 (false), the block ignores input data in the next time step.
After sending a true validIn signal, there may be some delay before ready is set to false. To ensure all data is processed, you must wait until ready is set to false before sending another true validIn signal.
Data Types: Boolean
Parameters
Number of rows in matrices A and B — Number of rows in input matrices A and B
4 (default) | positive integer-valued scalar
The number of rows in input matrices A and B, specified as a positive integer-valued scalar.
Programmatic Use
Block Parameter: m
Type: character vector
Values: positive integer-valued scalar
Default: 4
Number of columns in matrix A — Number of columns in input matrix A
4 (default) | positive integer-valued scalar
The number of columns in input matrix A, specified as a positive integer-valued scalar.
Programmatic Use
Block Parameter: n
Type: character vector
Values: positive integer-valued scalar
Default: 4
Number of columns in matrix B — Number of columns in input matrix B
1 (default) | positive integer-valued scalar
The number of columns in input matrix B, specified as a positive integer-valued scalar.
Programmatic Use
Block Parameter: p
Type: character vector
Values: positive integer-valued scalar
Default: 1
Regularization parameter — Regularization parameter
0 (default) | nonnegative scalar
Regularization parameter, specified as a nonnegative scalar. Small, positive values of the regularization parameter can improve the conditioning of the problem and reduce the variance of the estimates. Although the resulting estimates are biased, their reduced variance often results in a smaller mean squared error when compared to least-squares estimates.
Programmatic Use
Block Parameter: regularizationParameter
Type: character vector
Values: nonnegative scalar
Default: 0
Algorithms
Choosing the Implementation Method
Systolic implementations prioritize speed of computations over space constraints, while burst implementations prioritize space constraints at the expense of speed of the operations. The following table illustrates the tradeoffs between the implementations available for matrix decompositions and solving systems of linear equations.
| Implementation | Throughput | Latency | Area |
| --- | --- | --- | --- |
| Systolic | C | O(n) | O(mn^2) |
| Partial-Systolic | C | O(m) | O(n^2) |
| Partial-Systolic with Forgetting Factor | C | O(n) | O(n^2) |
| Burst | O(n) | O(mn) | O(n) |
Where C is a constant proportional to the word length of the data, m is the number of rows in matrix A, and n is the number of columns in matrix A.
For additional considerations in selecting a block for your application, see Choose a Block for HDL-Optimized Fixed-Point Matrix Operations.
AMBA AXI Handshake Process
This block uses the AMBA AXI handshake protocol [1]. The valid/ready handshake process is used to transfer data and control information. This two-way control mechanism allows both the manager and subordinate to control the rate at which information moves between manager and subordinate. A valid signal indicates when data is available. The ready signal indicates that the block can accept the data. Transfer of data occurs only when both the valid and ready signals are high.
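The transfer rule can be illustrated with a small cycle-level model. The following Python sketch is a generic illustration of a valid/ready handshake, not a model of this block's internal logic: the function name `simulate_handshake` and the sample signal patterns are hypothetical, and the model assumes the source presents the next pending item whenever valid is high.

```python
def simulate_handshake(data, valid_pattern, ready_pattern):
    """Return a list of (cycle, item) transfers.

    An item moves from the source to the sink only in cycles where both
    valid and ready are high. Otherwise the item waits for a later cycle.
    """
    transfers = []
    index = 0  # next item waiting at the source
    for cycle, (valid, ready) in enumerate(zip(valid_pattern, ready_pattern)):
        if index >= len(data):
            break  # nothing left to send
        if valid and ready:
            transfers.append((cycle, data[index]))
            index += 1
    return transfers


# Three rows to send. The source stalls in cycle 2 and the sink stalls
# in cycles 0 and 4 (arbitrary illustrative patterns).
rows = ["row1", "row2", "row3"]
valid = [1, 1, 0, 1, 1, 1]
ready = [0, 1, 1, 1, 0, 1]
print(simulate_handshake(rows, valid, ready))
# [(1, 'row1'), (3, 'row2'), (5, 'row3')]
```

Either side can slow the exchange: a low valid delays the source's data, and a low ready delays acceptance by the sink.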
Block Timing
The Partial-Systolic QR Decomposition blocks accept and process A and B matrices row by row. After accepting m rows, the block outputs the R and C matrices as vectors. The partial-systolic implementation uses a pipelined structure, so the block can accept new matrix inputs before outputting the result of the current matrix.
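The row-by-row processing described above can be mimicked in software. The following Python sketch is an illustration only and does not represent the block's pipelined hardware architecture or its timing: the class name `RowStreamingQR` and its `push_row` method are hypothetical, and the update uses complex Givens rotations, one common way to fold a new row into an existing triangular factor.

```python
import math


class RowStreamingQR:
    """Accumulate R and C = Q'B one row of A and B at a time (hypothetical)."""

    def __init__(self, n, p, lam=0.0):
        self.n, self.p = n, p
        # Start from lam*eye(n) and zeros(n,p); lam = 0 means no regularization.
        self.R = [[complex(lam if i == j else 0.0) for j in range(n)]
                  for i in range(n)]
        self.C = [[0j] * p for _ in range(n)]

    def push_row(self, a_row, b_row):
        """Rotate one row of A (and the matching row of B) into R and C."""
        a = [complex(v) for v in a_row]
        b = [complex(v) for v in b_row]
        for j in range(self.n):
            d, e = self.R[j][j], a[j]
            r = math.hypot(abs(d), abs(e))
            if r == 0.0:
                continue
            c, s = d / r, e / r
            # The rotation zeros a[j] and leaves a real diagonal entry r.
            for k in range(j, self.n):
                x, y = self.R[j][k], a[k]
                self.R[j][k] = c.conjugate() * x + s.conjugate() * y
                a[k] = c * y - s * x
            for k in range(self.p):
                x, y = self.C[j][k], b[k]
                self.C[j][k] = c.conjugate() * x + s.conjugate() * y
                b[k] = c * y - s * x


# Feed a sample 3-by-2 A and 3-by-1 B row by row (arbitrary values).
A = [[1 + 1j, 2], [0, 1j], [3, 1 - 1j]]
B = [[1], [2j], [3]]
qr = RowStreamingQR(n=2, p=1)
for a_row, b_row in zip(A, B):
    qr.push_row(a_row, b_row)
R, C = qr.R, qr.C  # available once all m rows have been pushed
```

In this sketch, R and C are complete only after the last of the m rows has been pushed, which mirrors the statement that the outputs follow the m-th accepted row.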
For example, assume that the input A and B matrices are 3-by-3. Additionally, assume that validIn asserts before ready, meaning that the upstream data source is faster than the QR decomposition.
In the figure, A1r1 is the first row of the first A matrix, R1 is the first R matrix, and so on.
validIn to ready: From a successful row input to the block being ready to accept the next row.
Last row validIn to validOut: From the last row input to the block starting to output the solution.
The following table provides details of the timing for the Partial-Systolic QR Decomposition blocks.
| Block | validIn to ready (cycles) | Last Row validIn to validOut (cycles) |
| --- | --- | --- |
| Real Partial-Systolic QR Decomposition | wl + 7 | (wl + 6)*n + 6 |
| Complex Partial-Systolic QR Decomposition | wl + 9 | (wl + 7.5)*2*n + 6 |
In the table, m represents the number of rows in matrix A, and n is the number of columns in matrix A. wl represents the word length of the input data.
If the data type of A is double, then wl is 53.
If the data type of A is single, then wl is 24.
If the data types of A and B are fixed point, then wl is given by
max(A.WordLength + ~issigned(A), B.WordLength + ~issigned(B))
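The timing formulas and the word-length rule above can be combined into a quick estimate. The following Python sketch only restates the table arithmetic: the function names `effective_word_length` and `partial_systolic_qr_latency` are hypothetical, and the sample inputs (16-bit signed data, n = 16) are chosen for illustration.

```python
def effective_word_length(a_word_length, a_signed, b_word_length, b_signed):
    """Mirror max(A.WordLength + ~issigned(A), B.WordLength + ~issigned(B))."""
    return max(a_word_length + (not a_signed), b_word_length + (not b_signed))


def partial_systolic_qr_latency(wl, n, is_complex=True):
    """Return (validIn-to-ready, last-row-validIn-to-validOut) in cycles."""
    if is_complex:
        # (wl + 7.5)*2*n + 6 written as (2*wl + 15)*n + 6 to stay in integers.
        return wl + 9, (2 * wl + 15) * n + 6
    return wl + 7, (wl + 6) * n + 6


# Fixed-point example: signed 16-bit A and B, n = 16 columns.
wl = effective_word_length(16, True, 16, True)
print(partial_systolic_qr_latency(wl, n=16))                    # (25, 758)
print(partial_systolic_qr_latency(wl, n=16, is_complex=False))  # (23, 358)

# Floating-point inputs use wl = 53 for double and wl = 24 for single.
print(partial_systolic_qr_latency(53, n=4))                     # (62, 490)
```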
Hardware Resource Utilization
This block supports HDL code generation using the Simulink® HDL Workflow Advisor. For an example, see HDL Code Generation and FPGA Synthesis from Simulink Model (HDL Coder) and Implement Digital Downconverter for FPGA (DSP HDL Toolbox).
In R2023a: The table below shows a summary of the resource utilization results.
This example data was generated by synthesizing the block on a Xilinx® Zynq®-7 ZC706 evaluation board (-2 speed grade).
The following parameters were used for synthesis.
Block parameters:
m = 10
n = 10
p = 1
Matrix A dimension: 10-by-10
Matrix B dimension: 10-by-1
Input data type: sfix18_En12
| Resource | Usage |
| --- | --- |
| LUT | 108464 |
| LUTRAM | 5000 |
| Flip Flop | 68404 |
In R2022b: The tables below show the post place-and-route resource utilization results and timing summary, respectively.
This example data was generated by synthesizing the block on a Xilinx Zynq UltraScale™+ RFSoC ZCU111 evaluation board. The synthesis tool was Vivado® v.2020.2 (win64).
The following parameters were used for synthesis.
Block parameters:
m = 16
n = 16
p = 1
Matrix A dimension: 16-by-16
Matrix B dimension: 16-by-1
Input data type: sfix16_En14
Target frequency: 300 MHz
| Resource | Usage | Available | Utilization (%) |
| --- | --- | --- | --- |
| CLB LUTs | 319908 | 425280 | 75.22 |
| CLB Registers | 250839 | 850560 | 29.49 |
| DSPs | 0 | 4272 | 0.00 |
| Block RAM Tile | 0 | 1080 | 0.00 |
| URAM | 0 | 80 | 0.00 |

| | Value |
| --- | --- |
| Requirement | 3.3333 ns |
| Data Path Delay | 3.299 ns |
| Slack | 0.016 ns |
| Clock Frequency | 301.45 MHz |
References
[1] "AMBA AXI and ACE Protocol Specification Version E." https://developer.arm.com/documentation/ihi0022/e/AMBA-AXI3-and-AXI4-Protocol-Specification/Single-Interface-Requirements/Basic-read-and-write-transactions/Handshake-process
Extended Capabilities
C/C++ Code Generation
Generate C and C++ code using Simulink® Coder™.
Slope-bias representation is not supported for fixed-point data types.
HDL Code Generation
Generate VHDL, Verilog and SystemVerilog code for FPGA and ASIC designs using HDL Coder™.
HDL Coder™ provides additional configuration options that affect HDL implementation and synthesized logic.
This block has one default HDL architecture.
| General | |
| --- | --- |
| ConstrainedOutputPipeline | Number of registers to place at the outputs by moving existing delays within your design. Distributed pipelining does not redistribute these registers. The default is |
| InputPipeline | Number of input pipeline stages to insert in the generated code. Distributed pipelining and constrained output pipelining can move these registers. The default is |
| OutputPipeline | Number of output pipeline stages to insert in the generated code. Distributed pipelining and constrained output pipelining can move these registers. The default is |
Supports fixed-point data types only.
Version History
Introduced in R2020b
R2023a: Smart unrolling for improved resource utilization
When you update the diagram, the loop that composes the partial-systolic pipeline is unrolled. This updated internal architecture removes dead operations in simulation and generated code, resulting in a significant decrease in the number of hardware resources required. This block simulates with clock and bit-true fidelity with respect to library versions of these blocks in previous releases.
| Resource | Usage in R2022b | Usage in R2023a |
| --- | --- | --- |
| LUT | 177813 | 108464 |
| LUTRAM | 6620 | 5000 |
| Flip Flop | 113857 | 68404 |
This example data was generated by synthesizing the block on a Xilinx Zynq-7 ZC706 evaluation board (-2 speed grade).
The following parameters were used for synthesis.
Block parameters:
m = 10
n = 10
p = 1
Matrix A dimension: 10-by-10
Matrix B dimension: 10-by-1
Input data type: sfix18_En12
R2021a: Reduced HDL resource utilization
This block now has an improved algorithm to reduce resource utilization on hardware-constrained target platforms.