Accumulator Data Type not used in ERT generated code
Here's an example model (Made/tested using R2017b):
As you can see I've made sure all data types used anywhere in this model are 'uint16' data type. I've also configured the 'sum' block to use this as the accumulator data type by matching the first input.
Here is the _step() function that gets generated when building this model using embedded coder (ERT target):
/* Model step function */
void testCounter_step(void)
{
  /* UnitDelay: '<Root>/Unit Delay' */
  testCounter_B.UnitDelay = testCounter_DW.UnitDelay_DSTATE;

  /* Outport: '<Root>/count' incorporates:
   *  Constant: '<Root>/Constant1'
   *  Sum: '<Root>/Sum'
   */
  testCounter_Y.count = (uint16_T)(((uint32_T)testCounter_B.UnitDelay) +
    ((uint32_T)((uint16_T)1U)));

  /* Update for UnitDelay: '<Root>/Unit Delay' incorporates:
   *  Outport: '<Root>/count'
   */
  testCounter_DW.UnitDelay_DSTATE = testCounter_Y.count;
}
The actual sum operation is converting both inputs to uint32, then performing the addition, then converting back to uint16. Why is it doing this? I thought that specifying the accumulator data type to use the same as first input would avoid these unnecessary conversions?
For reference, here are the hardware settings (should support 16-bit native math using 'short'). Let me know if there are any other model settings that need to be looked at.
Accepted Answer
Andy Bartlett
2021-3-17
Edited: Andy Bartlett
2021-3-17
The C language was designed to closely match what computers do. It is common for a computer's CPU to be based around a particular base integer type, such as 16 bits or 32 bits. The core CPU registers will all be this size. Most machine instructions like addition, subtraction, etc. work on this size integer. Some might be available for bigger types, but that is not important to this discussion. The size of this base integer register type will be the size a C compiler declares for int. Other than load and store, it is uncommon for scalar machine instructions like add to work on types smaller than the base integer type.
As an example, let's consider your CPU-compiler pair with int of 32 bits, short of 16 bits, and char of 8 bits. The CPU will have machine-level instructions to add 32-bit values, but not 16-bit or 8-bit values. Instead, a load pulls the 8-bit or 16-bit value from memory and puts it into a 32-bit register, filling the extra most significant (MS) bits by zero extension or sign extension depending on whether it was an unsigned or signed load instruction. The machine-level instruction for 32-bit addition is then executed. If an immediate downcast back to 8 or 16 bits is desired, then the least significant bits would be stored into memory, or a bitwise operation would "whack" the MS bits of the register to make it waddle and quack like an 8-bit or 16-bit value.
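To make the load-and-widen behavior concrete, here is a minimal C sketch. The function names are hypothetical and chosen only for illustration, and it uses the standard <stdint.h> types rather than the uint16_T/uint32_T typedefs from the generated code. It shows that an unsigned narrow value is widened with zeros in the upper bits, a signed narrow value is widened with copies of the sign bit, and a mask can trim a 32-bit register value back to 16 bits.

```c
#include <stdint.h>

/* Widen an unsigned 16-bit value to 32 bits.
 * The upper 16 bits are filled with zeros (zero extension). */
uint32_t widen_unsigned(uint16_t v)
{
    return (uint32_t)v;
}

/* Widen a signed 16-bit value to 32 bits.
 * The upper 16 bits are copies of the sign bit (sign extension). */
int32_t widen_signed(int16_t v)
{
    return (int32_t)v;
}

/* Trim a 32-bit register value back to 16 bits of content by
 * clearing the upper 16 bits with a bitwise AND. */
uint32_t mask_to_16_bits(uint32_t reg)
{
    return reg & 0xFFFFu;
}
```

For example, widen_unsigned(0xFFFF) gives 0x0000FFFF, widen_signed(-1) gives -1 (all 32 bits set), and mask_to_16_bits(0x00010000) gives 0, which is the wrap-around a uint16 counter shows when it goes past 65535.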
Because of this CPU reality, the C language has what are called the "usual unary and binary conversions" for type promotion (the C standard calls them the integer promotions and the usual arithmetic conversions). For example, the rules state that the operands of binary operations on short or char get promoted to int before the actual operation is performed. This C language behavior matches what was just described at the machine level, except that the subsequent downcast does not happen automatically. If the extra instructions to downcast are desired, the C code author must explicitly include the cast or assign the expression to a smaller type.
The generated C code you showed is just being verbose about the upcasts to integer prior to the addition.
testCounter_Y.count = (uint16_T)(((uint32_T)testCounter_B.UnitDelay) + ((uint32_T)((uint16_T)1U)));
Suppose the text of the code were instead this:
testCounter_Y.count = testCounter_B.UnitDelay + ((uint16_T)1U);
The generated machine code would still be identical (*). Following C language rules, the inputs to the addition would be upcast to int (32 bits). The addition would be a 32-bit operation. The assignment would still have to do a downcast from the 32-bit integer register into the 16-bit variable. Textually, it may look more efficient, but it is really the same.
In summary, there is no efficiency penalty. The text of the C code is just being explicit about what the C language rules dictate and what the CPU will naturally do at the machine instruction level.
(*) There is one small difference between the two pieces of text. C rules say the second text would promote to signed 32-bit int instead of unsigned. But you can still expect the efficiency to be the same, assuming a reasonably smart compiler.
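One way to convince yourself that the verbose and terse forms behave the same is to compare them over every possible uint16 input. The sketch below is illustrative only: the function names are hypothetical, it uses the standard <stdint.h> types in place of the generated uint16_T/uint32_T typedefs, and it assumes int is wider than 16 bits, as in the hardware settings discussed here. It checks results, not the machine code a particular compiler emits.

```c
#include <stdint.h>

/* Verbose form, mirroring the generated code: both operands are
 * explicitly widened to 32 bits, added, then narrowed to 16 bits. */
uint16_t add_explicit(uint16_t u)
{
    return (uint16_t)(((uint32_t)u) + ((uint32_t)((uint16_t)1U)));
}

/* Terse form: the C integer promotions widen both operands to int
 * before the addition; the cast narrows the result back to 16 bits. */
uint16_t add_implicit(uint16_t u)
{
    return (uint16_t)(u + (uint16_t)1U);
}

/* Count the uint16 inputs for which the two forms disagree. */
uint32_t count_mismatches(void)
{
    uint32_t mismatches = 0;
    /* A 32-bit loop counter lets the loop end after 65535. */
    for (uint32_t u = 0; u <= UINT16_MAX; ++u) {
        if (add_explicit((uint16_t)u) != add_implicit((uint16_t)u)) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

count_mismatches() returns 0, and both forms wrap 65535 + 1 to 0. To confirm that the instructions are also the same, you would compare the compiler's assembly listing for your own target.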
4 Comments
Andy Bartlett
2021-3-17
Hi Paul,
The accumulator type becomes beneficial when the output has less precision or less range than a full precision implementation step would have.
The sum block does internal steps in the range and precision of the accumulator type and then the final answer is cast to the output type.
For example, suppose the sum block has three inputs and is configured as
y = u1 - u2 + u3
with each input being uint32 type
output type is also uint32
and overflows are configured to saturate
The internal operations involve these steps
accum = cast_to_accum_type( u1 )
temp = cast_to_accum_type( u2 )
accum = sat( accum - temp )
temp = cast_to_accum_type( u3 )
accum = sat( accum + temp )
y = cast_to_out_type( accum )
Consider this input
u1 = 0
u2 = 105
u3 = 110
If the accumulator type is uint32 then strange answers can occur.
accum = u1 = 0
temp = 105
accum = sat( accum - temp ) = sat( 0 - 105 ) = 0
temp = 110
accum = sat( accum + temp ) = sat( 0 + 110 ) = 110
y = accum = 110
Now consider doing the operations in an int64 accumulator type
accum = u1 = 0
temp = 105
accum = sat( accum - temp ) = sat( 0 - 105 ) = -105
temp = 110
accum = sat( accum + temp ) = sat( -105 + 110 ) = 5
y = sat( accum ) = sat( 5 ) = 5
With the big accumulator, the ideal answer of 5 is produced.
With the small accumulator identical to the output type, the answer is off by 105.
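The two walkthroughs above can be reproduced with a small C sketch. This is a hand-written illustration of the steps, not the code Embedded Coder generates; the function names are hypothetical and the types come from the standard <stdint.h> header.

```c
#include <stdint.h>

/* Saturating uint32 subtraction: clamp at 0 instead of wrapping. */
static uint32_t sat_sub_u32(uint32_t a, uint32_t b)
{
    return (a >= b) ? (a - b) : 0u;
}

/* Saturating uint32 addition: clamp at UINT32_MAX instead of wrapping. */
static uint32_t sat_add_u32(uint32_t a, uint32_t b)
{
    uint32_t s = a + b;               /* wraps modulo 2^32 on overflow */
    return (s < a) ? UINT32_MAX : s;  /* a wrapped sum is smaller than a */
}

/* y = u1 - u2 + u3 with a uint32 accumulator: every intermediate
 * step saturates to the uint32 range. */
uint32_t sum_with_u32_accum(uint32_t u1, uint32_t u2, uint32_t u3)
{
    uint32_t accum = u1;
    accum = sat_sub_u32(accum, u2);
    accum = sat_add_u32(accum, u3);
    return accum;
}

/* y = u1 - u2 + u3 with an int64 accumulator: the intermediate steps
 * cannot overflow for uint32 inputs, so only the final cast to the
 * output type saturates. */
uint32_t sum_with_i64_accum(uint32_t u1, uint32_t u2, uint32_t u3)
{
    int64_t accum = (int64_t)u1;
    accum = accum - (int64_t)u2;
    accum = accum + (int64_t)u3;
    if (accum < 0) {
        return 0u;
    }
    if (accum > (int64_t)UINT32_MAX) {
        return UINT32_MAX;
    }
    return (uint32_t)accum;
}
```

With u1 = 0, u2 = 105, u3 = 110, sum_with_u32_accum returns 110 while sum_with_i64_accum returns 5, matching the hand calculations.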
Regards,
Andy