Data type conversion changes the value of the signal

Question

Luca Ferro 2023-2-2

https://ww2.mathworks.cn/matlabcentral/answers/1905101-data-type-conversion-changes-the-value-of-the-signal

Commented: Luca Ferro 2023-2-6

Accepted Answer: Andy Bartlett

Any clue why this happens?

For reference this is how the signal is defined:

I guess the offset has something to do with it.

Answer 1

Andy Bartlett 2023-2-3

https://ww2.mathworks.cn/matlabcentral/answers/1905101-data-type-conversion-changes-the-value-of-the-signal#answer_1163420

Edited: Andy Bartlett 2023-2-3

Why the cast to single has a large quantization error

The cast from fixed-point to single will use single precision for constants and single precision for all operations with one exception. The exception is the cast of the input stored integer value to single. That cast has integer input and single precision output.

Generating C Code using Simulink Coder or Embedded Coder will show the details.

Y1 = (real32_T)U1 * 0.0009765625F - 2.056192E+6F;

There CAN be precision losses at each and every step.

The representation of the Slope 0.0009765625F happens to be lossless in this case.

The representation of the Bias -2.056192E+6F also happens to be lossless in this case.

(real32_T)U1 reduces the 32-bit input to the 24-bit mantissa of a single, losing up to 8 bits of precision.

(real32_T)U1 * 0.0009765625F is a 24-bit mantissa times a 24-bit mantissa, so the full-precision product can require up to 48 bits. That will be quantized down to 24 bits.

The subtraction can also lose some precision but that will be small in relative terms.
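The steps listed above can be reproduced outside Simulink. What follows is a minimal Python sketch, not generated code. It assumes that rounding a double to single through the standard `struct` module matches the target's round-to-nearest single-precision arithmetic, and the helper name `f32` is made up for illustration. The stored integer 2105540710 is the one for the quantized value 0.099609375.

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

SLOPE = 0.0009765625   # 2**-10, exactly representable in single
BIAS = -2056192.0      # exactly representable in single
u1 = 2105540710        # stored integer of the fixed-point input

# Each line mirrors one step of the generated C expression.
step1 = f32(u1)              # (real32_T)U1: 32-bit integer to 24-bit mantissa
step2 = f32(step1 * SLOPE)   # single times single, rounded back to single
y1 = f32(step2 + BIAS)       # adding the negative bias is the subtraction

print(step1 - u1)   # 26.0 (error in stored-integer units)
print(step2)        # 2056192.125
print(y1)           # 0.125
```

In this sketch only the first step loses anything: the multiplication by a power of two and the subtraction both happen to be exact for this input.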

Quantitative Analysis of Errors

dt1 = fixdt(0,32,0.0009765625,-2056192);
uIdeal1 = 0.1;
u = fi(uIdeal1, dt1);
uStoredInteger = u.storedInteger
uStoredInteger = uint32
    2105540710
siInFloat = single(uStoredInteger)
siInFloat = single
    2.1055e+09
errorInStoredIntegerCast = double(siInFloat) - double(uStoredInteger)
errorInStoredIntegerCast = 26
realWorldImpactOfErrorInStoredIntegerFloat = errorInStoredIntegerCast * dt1.Slope
realWorldImpactOfErrorInStoredIntegerFloat = 0.0254

The error converting the 32-bit integer to a 24-bit mantissa floating-point is the dominant error source in this case.

0.0996 + 0.0254 = 0.125, which is the value shown after the conversion to single.
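The size of this error follows from the spacing of singles near the stored integer. The sketch below is an illustration under the assumption of IEEE 754 single precision with round-to-nearest; the variable names are chosen here and do not come from the original analysis.

```python
SLOPE = 0.0009765625   # 2**-10, from the data type in the question
q = 2105540710         # stored integer from the analysis above

# A single has a 24-bit mantissa, so for 2**e <= q < 2**(e + 1)
# neighbouring singles are 2**(e - 23) apart (valid here because e >= 23).
e = q.bit_length() - 1
spacing = 2 ** (e - 23)

lower = (q // spacing) * spacing   # largest single-representable value <= q
upper = lower + spacing            # next single-representable value
# Round to nearest; an exact tie (not the case here) would need ties-to-even.
nearest = lower if q - lower < upper - q else upper

error_in_counts = nearest - q                 # 26
error_real_world = error_in_counts * SLOPE    # 0.025390625
worst_case_real_world = spacing / 2 * SLOPE   # 0.0625

print(spacing, error_in_counts, error_real_world, worst_case_real_world)
# 128 26 0.025390625 0.0625
```

With a spacing of 128 stored-integer counts, any stored integer in this range can be off by up to 64 counts after the cast, or 0.0625 in real-world units for this slope.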

To get higher precision, cast to double first

To get higher precision in converting slope-bias fixed-point to single, an approach is to first cast to double and then downcast to single. This approach is unnecessary if the fixed-point type has binary-point scaling (bias is zero and slope is an exact power of two). This approach is most impactful if the fixed-point type uses more than 24-bits and the slope is not an exact power of two.
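The difference between the two paths can be sketched in Python, where floats are doubles. This is an illustration rather than the Simulink implementation: `f32` is a hypothetical helper that rounds a double to the nearest single using the standard `struct` module.

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

SLOPE = 2.0 ** -10   # 0.0009765625
BIAS = -2056192.0
q = 2105540710       # stored integer for the quantized value 0.099609375

# Single-only path: every intermediate result is rounded to single.
single_only = f32(f32(f32(q) * SLOPE) + BIAS)

# Double-first path: do the arithmetic in double, then downcast once.
via_double = f32(q * SLOPE + BIAS)

print(single_only)   # 0.125
print(via_double)    # 0.099609375
```

For this input the double-first path recovers the fixed-point value exactly, because the 32-bit stored integer fits in the 53-bit mantissa of a double.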

The downside of this approach is that casts to double could be very costly on an embedded processor like an ARM Cortex M4F that has single precision floating-point hardware, but not double precision floating-point hardware. The double math would need to be emulated in software, which would be much slower. This is a key reason a large body of users requested that casts and operations that mix fixed-point and single precision floating-point should only use single and not use doubles. This group of users preferred to model an explicit cast up to double when greater precision was needed.

1 Comment

Luca Ferro 2023-2-6

Thank you for the explanation. Unfortunately, due to the nature of the application, casting to double is not possible. Still, now that the issue is clearer, I can figure out how to deal with it in the most effective way.

Answer 2

Andy Bartlett 2023-2-3

https://ww2.mathworks.cn/matlabcentral/answers/1905101-data-type-conversion-changes-the-value-of-the-signal#answer_1163325

Edited: Andy Bartlett 2023-2-3

explainQuantizationOfConstant.m

Brief answer

The original value 0.1 lies between two representable values (0.099609375 and 0.1005859375) of the target type, fixdt(0,32,0.0009765625,-2056192). The original value is rounded to the nearer of the two representable values of the output type.
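The two neighbouring values can be checked with exact rational arithmetic. The following Python sketch is only an illustration of the round-to-nearest step described here; it is not the attached MATLAB function, and it ignores saturation at the ends of the uint32 range.

```python
import math
from fractions import Fraction

SLOPE = Fraction(1, 1024)   # 0.0009765625
BIAS = Fraction(-2056192)
ideal = Fraction(0.1)       # exact value of the double nearest to 0.1

# Real-world value = SLOPE * storedInteger + BIAS, so invert that relation.
scaled = (ideal - BIAS) / SLOPE   # ideal value in stored-integer units
q_below = math.floor(scaled)
q_above = math.ceil(scaled)
q_nearest = round(scaled)         # round to nearest

below = float(q_below * SLOPE + BIAS)
above = float(q_above * SLOPE + BIAS)
quantized = float(q_nearest * SLOPE + BIAS)

print(q_nearest)                 # 2105540710
print(below, above, quantized)   # 0.099609375 0.1005859375 0.099609375
```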

Tool to explain "any" case

The attached function provides a more detailed explanation of what happens when quantizing "any" scalar value to "any" numeric type. It should work fine for just about "any" case of interest including fixed-point, integer, and floating-point.

I put "any" in quotes because the analysis and plotting use calculations in double precision floating-point. If the input value or data type used was extreme relative to doubles then the analysis will fall apart. For example, the maximum finite value of double is approximately 1e308. The data type fixdt(0,8,-3000) has a maximum representable value equal to 255*2^3000 or approximately 1e905 which is extreme compared to the finite range of double.
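The magnitude in that example is easy to confirm with arbitrary-precision integers. This short Python sketch is only a check of the arithmetic; the variable names are chosen here for illustration.

```python
import sys

# Largest value of fixdt(0,8,-3000): stored integer 255 with slope 2**3000.
max_value = 255 * 2 ** 3000     # exact, Python integers do not overflow
digits = len(str(max_value))    # 906 digits, so the magnitude is 1e905

try:
    float(max_value)
    fits_in_double = True
except OverflowError:
    # Raised because the value exceeds the largest finite double.
    fits_in_double = False

print(digits)                # 906
print(fits_in_double)        # False
print(sys.float_info.max)    # largest finite double, about 1.8e308
```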

Using tool on your example

dt1 = fixdt(0,32,0.0009765625,-2056192);

uIdeal1 = 0.1;

explainQuantizationOfConstant(uIdeal1,dt1)

Data type: fixdt(0,32,0.0009765625,-2056192)
Original ideal value 0.1
Quantized value 0.099609375
Original ideal value is between two representable values in fixdt(0,32,0.0009765625,-2056192)
Representable value below 0.099609375
Representable value above 0.1005859375
The ideal input was rounded to nearest of the two values.

2 Comments

Les Beckham 2023-2-3

I believe that @Luca Ferro is actually asking why the value changes to 0.125 when converted to single (scroll the image to the right to see the second Display block), not why it shows 0.09961 in the first Display block.

At least, that is what I am wondering.

Andy Bartlett 2023-2-3

Hi Les,

Ah. Thanks for pointing out the second possible and more likely intended question.

Andy
