## Data Types and Scaling in Digital Hardware

### Fixed-Point Data Types

In digital hardware, numbers are stored in binary words. A binary word is a fixed-length sequence of bits (1's and 0's). How hardware components or software functions interpret this sequence of 1's and 0's is defined by the data type. Binary numbers are represented as either fixed-point or floating-point data types.

A fixed-point data type is characterized by the word length in bits, the position of the binary point, and whether it is signed or unsigned. The position of the binary point is the means by which fixed-point values are scaled and interpreted.

For example, a binary representation of a generalized fixed-point number (either signed or unsigned) is shown below:

$$b_{wl-1}\,b_{wl-2}\,\cdots\,b_{5}\,b_{4}\,.\,b_{3}\,b_{2}\,b_{1}\,b_{0}$$

where

- $b_i$ is the $i$th binary digit.
- $wl$ is the word length in bits.
- $b_{wl-1}$ is the location of the most significant, or highest, bit (MSB).
- $b_0$ is the location of the least significant, or lowest, bit (LSB).

The binary point is shown four places to the left of the LSB. In this example, the number is said to have four fractional bits, or a fraction length of four.

Fixed-point data types can be either signed or unsigned. Whether a fixed-point value is signed or unsigned is usually not encoded explicitly within the binary word; that is, there is no sign bit. Instead, the sign information is implicitly defined within the computer architecture.
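Because signedness lives outside the bits, the same word can mean two different numbers. The following Python sketch (the helper name `interpret` is hypothetical) reads one 8-bit pattern both as unsigned and as signed, assuming two's complement for the signed case.

```python
def interpret(bits: str, signed: bool) -> int:
    """Interpret a string of '0'/'1' characters as an integer.

    The pattern carries no signedness flag; the caller decides whether to
    read it as unsigned or as two's complement signed (an assumption here).
    """
    value = int(bits, 2)
    word_length = len(bits)
    if signed and bits[0] == "1":
        # Two's complement: the MSB carries a weight of -2**(word_length - 1)
        value -= 1 << word_length
    return value


pattern = "11111011"
print(interpret(pattern, signed=False))  # 251
print(interpret(pattern, signed=True))   # -5
```

Nothing in `pattern` changes between the two calls; only the interpretation supplied by the caller does.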

Signed binary fixed-point numbers are typically represented in computer hardware in one of three ways:

- Sign/magnitude – One bit of a binary word is always the dedicated sign bit, while the remaining bits of the word encode the magnitude of the number. Negation using sign/magnitude representation consists of flipping the sign bit from 0 (positive) to 1 (negative), or from 1 to 0.
- One's complement – Negating a binary number in one's complement requires a bitwise complement. That is, all 0's are flipped to 1's and all 1's are flipped to 0's. In one's complement notation there are two ways to represent zero. A binary word of all 0's represents "positive" zero, while a binary word of all 1's represents "negative" zero.
- Two's complement – Negation using signed two's complement representation consists of a bit inversion (translation into one's complement) followed by the binary addition of a one. For example, the two's complement of 000101 is 111011.
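The three negation rules can be compared side by side on a 6-bit word. This is a minimal Python sketch; the word length of 6 and the function names are illustrative assumptions, chosen to reproduce the 000101 example.

```python
WORD_LENGTH = 6
MASK = (1 << WORD_LENGTH) - 1  # 0b111111, keeps results inside the word


def negate_sign_magnitude(word: int) -> int:
    # Flip only the dedicated sign bit (the MSB)
    return word ^ (1 << (WORD_LENGTH - 1))


def negate_ones_complement(word: int) -> int:
    # Flip every bit of the word
    return ~word & MASK


def negate_twos_complement(word: int) -> int:
    # Flip every bit, add one, and discard any carry out of the word
    return (~word + 1) & MASK


def fmt(word: int) -> str:
    return format(word, f"0{WORD_LENGTH}b")


five = 0b000101
print(fmt(negate_sign_magnitude(five)))   # 100101
print(fmt(negate_ones_complement(five)))  # 111010
print(fmt(negate_twos_complement(five)))  # 111011
print(fmt(negate_ones_complement(0)))     # 111111 ("negative" zero)
```

Negating zero shows why two's complement is convenient: `negate_twos_complement(0)` wraps back to `000000`, so there is only one zero.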

Two's complement is the most common representation of signed fixed-point numbers and is the only representation used by Fixed-Point Designer™ documentation.

### Binary Point Interpretation

The binary point is the means by which fixed-point numbers are scaled. It is usually the
software that determines the binary point. When performing basic math functions such as
addition or subtraction, the hardware uses the same logic circuits regardless of the value
of the scale factor. In essence, the logic circuits have no knowledge of a scale factor.
They are performing signed or unsigned fixed-point binary algebra as if the binary point were
to the right of $b_0$.
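To see why the adder needs no knowledge of the scale factor, the sketch below adds two stored integers once and then interprets the operands and the sum at several fraction lengths. It assumes both operands share the same fraction length and that the 4-bit sum does not overflow; `real_value` is a hypothetical helper.

```python
from fractions import Fraction


def real_value(stored: int, fraction_length: int) -> Fraction:
    # Scale the stored integer by 2**-fraction_length, using exact arithmetic
    return stored * Fraction(2) ** (-fraction_length)


a, b = 0b0101, 0b0011   # two 4-bit unsigned stored integers
total = a + b           # the only operation the hardware performs: 0b1000

for fl in (0, 2, 4):
    # The same integer sum is correct under every shared binary point position
    assert real_value(total, fl) == real_value(a, fl) + real_value(b, fl)
    print(fl, float(real_value(total, fl)))
# 0 8.0
# 2 2.0
# 4 0.5
```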

Fixed-Point Designer supports general binary point scaling
$V = Q \times 2^{E}$, where $V$ is the real-world value, $Q$
is the stored integer value, and the fixed exponent $E$ is equal to the
negative of the fraction length. In other words, $\mathit{RealWorldValue} = \mathit{StoredInteger} \times 2^{-\mathit{FractionLength}}$.
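The scaling formula and its inverse can be written as a pair of small functions. This Python sketch uses exact `Fraction` arithmetic; the function names are hypothetical, and the inverse assumes the value is exactly representable (no rounding).

```python
from fractions import Fraction


def real_world_value(stored_integer: int, fraction_length: int) -> Fraction:
    """V = Q * 2**E, with E = -FractionLength."""
    exponent = -fraction_length
    return stored_integer * Fraction(2) ** exponent


def stored_integer_for(value: Fraction, fraction_length: int) -> int:
    """Q = V * 2**FractionLength; raises if V needs rounding (assumption)."""
    q = value * Fraction(2) ** fraction_length
    if q.denominator != 1:
        raise ValueError("value is not representable at this fraction length")
    return q.numerator


print(real_world_value(13, 4))                  # 13/16
print(real_world_value(3, -2))                  # 12
print(stored_integer_for(Fraction(13, 16), 4))  # 13
```

A negative fraction length, as in the second call, simply gives a positive exponent.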

The fraction length defines the scaling of the stored integer value. The word length
limits the values that the stored integer can take, but it does not limit the values that
the fraction length can take. The software does not restrict the value of the exponent
$E$ based on the word length of the stored integer $Q$.
Because $E$ is equal to $-\mathit{FractionLength}$, the binary point does not have to lie
within or adjacent to the stored bits; the fraction length can be negative or greater than
the word length.

For example, a word consisting of three unsigned bits is usually represented in scientific notation in one of the following ways:

$$\begin{array}{l}bbb.=bbb.\times {2}^{0}\\ bb.b=bbb.\times {2}^{-1}\\ b.bb=bbb.\times {2}^{-2}\\ .bbb=bbb.\times {2}^{-3}\end{array}$$

If the exponent were greater than 0 or less than -3, then the representation would involve additional zeros:

$$\begin{array}{c}bbb00000.=bbb.\times {2}^{5}\\ bbb00.=bbb.\times {2}^{2}\\ .00bbb=bbb.\times {2}^{-5}\\ .00000bbb=bbb.\times {2}^{-8}\end{array}$$

These extra zeros never change to ones, so they do not show up in the hardware. Unlike floating-point exponents, a fixed-point exponent never shows up in the hardware, so fixed-point exponents are not limited by a finite number of bits.
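The patterns above can be generated mechanically. The sketch below (hypothetical helper `place_binary_point`) writes a stored bit string with its binary point and pads the implicit zeros that never exist in hardware; passing the placeholder string `'bbb'` reproduces both tables.

```python
def place_binary_point(bits: str, fraction_length: int) -> str:
    """Write stored bits with their binary point, padding implicit zeros.

    fraction_length may be negative or longer than len(bits); the padding
    zeros are implied by the scaling and are never stored.
    """
    wl = len(bits)
    if fraction_length <= 0:
        # Binary point at or right of the LSB: append implicit zeros
        return bits + "0" * (-fraction_length) + "."
    if fraction_length >= wl:
        # Binary point at or left of the MSB: prepend implicit zeros
        return "." + "0" * (fraction_length - wl) + bits
    # Binary point falls inside the stored word
    return bits[: wl - fraction_length] + "." + bits[wl - fraction_length :]


for fl in (-5, -2, 0, 1, 2, 3, 5, 8):
    print(f"{fl:>3}: {place_binary_point('bbb', fl)}")
```

Each fraction length here is the negative of the exponent shown in the corresponding equation.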

Consider a signed value with a word length of 8, a fraction length of 10, and a stored
integer value of 5 (binary value `00000101`). The real-world value is
calculated using the formula $\mathit{RealWorldValue} = \mathit{StoredInteger} \times 2^{-\mathit{FractionLength}}$.
In this case, $\mathit{RealWorldValue} = 5 \times 2^{-10} = 0.0048828125$. Because the fraction length
is 2 bits longer than the word length, the binary value of the stored integer is `x.xx00000101`,
where `x` is a placeholder for implicit zeros. `0.0000000101` (binary) is equivalent to
`0.0048828125` (decimal). For an example using a `fi` object, see Fraction Length Greater Than
Word Length.
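The worked example can be checked with exact arithmetic. This Python sketch assumes only the numbers given above; because the stored MSB is 0, reading the bits as unsigned gives the same integer as the signed interpretation.

```python
from fractions import Fraction

word_length, fraction_length = 8, 10
stored_bits = "00000101"              # stored integer 5 (MSB is 0, so sign is +)
stored_integer = int(stored_bits, 2)

# RealWorldValue = StoredInteger * 2**(-FractionLength)
value = Fraction(stored_integer, 2 ** fraction_length)
print(value, float(value))            # 5/1024 0.0048828125

# Two implicit zeros sit between the binary point and the stored MSB
implicit = "0" * (fraction_length - word_length)
binary = "0." + implicit + stored_bits
print(binary)                         # 0.0000000101
```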

### Floating-Point Data Types

Floating-point data types are characterized by a sign bit, a fraction (or mantissa)
field, and an exponent field. Fixed-Point Designer adheres to the IEEE® Standard 754-1985 for Binary Floating-Point Arithmetic (referred to simply as
the IEEE Standard 754 throughout this guide) and supports half-, single- and
double-precision data types.
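To make the three fields concrete, the sketch below splits an IEEE 754 single-precision value into its sign, biased exponent, and fraction using Python's standard `struct` module, then rebuilds the value. The helper name is hypothetical, and the reconstruction formula applies only to normal (not subnormal, infinite, or NaN) values.

```python
import struct


def single_fields(x: float):
    """Split a single-precision value into (sign, biased exponent, fraction)."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    sign = raw >> 31                # 1 bit
    exponent = (raw >> 23) & 0xFF   # 8 bits, biased by 127
    fraction = raw & 0x7FFFFF       # 23 bits; normal numbers have an implicit leading 1
    return sign, exponent, fraction


s, e, f = single_fields(-6.5)
print(s, e, f)  # 1 129 5242880

# Normal numbers only: (-1)**s * (1 + f / 2**23) * 2**(e - 127)
print((-1) ** s * (1 + f / 2**23) * 2 ** (e - 127))  # -6.5
```

Here -6.5 is $-1.101_2 \times 2^{2}$, so the biased exponent is $127 + 2 = 129$ and the fraction field holds the bits `101` followed by 20 zeros.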

When choosing a data type, you must consider these factors:

- The numerical range of the result
- The precision required of the result
- The associated quantization error (i.e., the rounding mode)
- The method for dealing with exceptional arithmetic conditions

These choices depend on your specific application, the computer architecture used, and the cost of development, among others.
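For a binary-point-scaled fixed-point type, range and precision follow directly from the word length, fraction length, and signedness. The Python sketch below computes them, assuming two's complement for signed types, and quantizes a value with round-to-nearest (ties to even, via Python's `round`). It is one possible rounding choice for illustration only; other rounding modes give different errors, and this sketch does not saturate or otherwise handle overflow.

```python
from fractions import Fraction


def fixed_point_range(word_length: int, fraction_length: int, signed: bool):
    """Return (min, max, precision) for binary-point-only scaling."""
    lsb = Fraction(2) ** (-fraction_length)  # precision: the value of one LSB
    if signed:
        q_min = -(1 << (word_length - 1))
        q_max = (1 << (word_length - 1)) - 1
    else:
        q_min, q_max = 0, (1 << word_length) - 1
    return q_min * lsb, q_max * lsb, lsb


def quantize_nearest(x: Fraction, fraction_length: int) -> Fraction:
    """Round x to the nearest multiple of the LSB (ties to even; no overflow check)."""
    lsb = Fraction(2) ** (-fraction_length)
    return round(x / lsb) * lsb


lo, hi, eps = fixed_point_range(8, 4, signed=True)
print(lo, hi, eps)  # -8 127/16 1/16

x = Fraction(1, 3)
q = quantize_nearest(x, 4)
print(q, abs(x - q))  # 5/16 1/48
```

With round-to-nearest, the quantization error never exceeds half an LSB, here $1/32$.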

With Fixed-Point Designer, you can explore the relationship between data types, range, precision, and
quantization error in the modeling of dynamic digital systems. With Simulink®
Coder™, you can generate production code based on that model. With HDL Coder™, you can generate portable, synthesizable VHDL and Verilog code from
Simulink models and Stateflow® charts.