precision of calculation with Matlab

Question

DRIDI Fethi 2020-2-10

Direct link to this question:

https://ww2.mathworks.cn/matlabcentral/answers/504701-precision-of-calculation-with-matlab

Commented: DRIDI Fethi 2020-2-11

Hello, I have a precision problem in MATLAB. When I multiply two positive 32-bit numbers, a calculator gives the exact result, but MATLAB returns zeros in the last two hex digits. For example:

Result with calculator: 5A0C83DE64F998F9

Result with MATLAB: 5A0C83DE64F99800

2 Comments

James Tursa 2020-2-10

You need to provide us with a complete example. Give us the exact inputs and then describe the output issues you are seeing.

DRIDI Fethi 2020-2-11

clc;
clear all;
E1=31;
E2=16;
E3=(2^32)-E1-E2;
x1 = 1510769636;
x2 = 757236075;
x3 = 1466008126;
out = uint64((E3 * x1) + (E1 * x2) + (E2 * x3));
y = uint32(mod(out,2^32));

Hello,

Normally the result should equal 5A0C83DE64F998F9. I need an exact result because I only use the 32 least significant bits (modular arithmetic).

Thank you for your reply.


Answer 1

John D'Errico 2020-2-11


Edited: John D'Errico 2020-2-11

This must fail as you are doing it, because out generally exceeds 2^53. That value is flintmax, the point beyond which a double precision number can no longer represent every integer exactly. Remember that E1, E2, E3, x1, x2, x3 are all doubles. So when we compute

(E3 * x1) + (E1 * x2) + (E2 * x3)

the result is also a double. Pushing it into a uint64 afterwards is not sufficient. The low-order bits have already been tossed into the bit bucket.
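A minimal sketch of how to see that loss directly, using only the values from the question. The variable names p, exceedsFlintmax, and spacing are introduced here for illustration and are not part of the original code.

```matlab
% How much precision a double has left at the magnitude of the product.
E3 = 2^32 - 31 - 16;     % double, as in the question
x1 = 1510769636;         % double, as in the question
p  = E3 * x1;            % double product, roughly 6.49e18

exceedsFlintmax = p > flintmax;   % true: p is far above 2^53
spacing = eps(p);                 % gap between neighbouring doubles near p

fprintf('exceeds flintmax: %d, spacing near p: %d\n', exceedsFlintmax, spacing);
% prints: exceeds flintmax: 1, spacing near p: 1024
```

A spacing of 1024 = 2^10 means the lowest 10 bits of the product cannot be stored in a double, which is consistent with the zeroed trailing hex digits reported in the question.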

But suppose that we create x1, x2, x3 as uint64 in advance? Now products like E1*x2 are computed directly as uint64 numbers, not temporarily as doubles, and the sum in out is already a uint64.

E1=31;
E2=16;
E3=(2^32)-E1-E2;
x1 = uint64(1510769636);
x2 = uint64(757236075);
x3 = uint64(1466008126);
out = (E3 * x1) + (E1 * x2) + (E2 * x3);
y = mod(out,2^32)
y =
  uint64
   1694079225
   
uint32(y)
ans =
  uint32
   1694079225

y is the correct integer value we would expect. There is no need to convert y into a uint32 though, since the mod operation has already reduced it to a value that fits in 32 bits. As long as out never overflows uint64, we have no problems.

intmax('uint64')
ans =
  uint64
   18446744073709551615
   
out =
  uint64
   6488706154334099705

As you can see, it does not. But can there ever be a problem? No. The weights sum to E1+E2+E3 == 2^32, so out is 2^32 times a convex combination of the x's. As long as x1, x2, x3 each fit in 32 bits, out stays below 2^64 and cannot overflow uint64.
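Written out as a bound, under the assumptions that the weights are nonnegative and that each x_i is at most 2^32 - 1:

```latex
\[
\mathrm{out} \;=\; E_3 x_1 + E_1 x_2 + E_2 x_3
\;\le\; (E_1 + E_2 + E_3)\,\max_i x_i
\;=\; 2^{32}\,\max_i x_i
\;\le\; 2^{32}\,(2^{32} - 1)
\;<\; 2^{64}.
\]
```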

Essentially, there is no problem, as long as you are careful in how you do the computation. Do NOT let MATLAB use doubles here though, as that will fail.
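To compare against the calculator's hexadecimal result from the question, one option is to split the uint64 value into two 32-bit halves and convert each half separately, so that dec2hex only ever sees values below 2^32. This is a sketch, and the names hi, lo, and hexStr are introduced here for illustration.

```matlab
E1 = 31;
E2 = 16;
E3 = (2^32) - E1 - E2;
x1 = uint64(1510769636);
x2 = uint64(757236075);
x3 = uint64(1466008126);
out = (E3 * x1) + (E1 * x2) + (E2 * x3);   % uint64, as in the code above

hi = bitshift(out, -32);               % upper 32 bits
lo = bitand(out, uint64(2^32 - 1));    % lower 32 bits, same value as mod(out, 2^32)

hexStr = [dec2hex(hi, 8), dec2hex(lo, 8)]
% hexStr = '5A0C83DE64F998F9'
```

The lower half lo is the 32-bit result the question asks for, and masking with bitand is an alternative to mod when only the least significant bits are needed.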

1 Comment

DRIDI Fethi 2020-2-11

Absolutely, the problem is solved.

Thank you so much.


Answer 2

James Tursa 2020-2-10


Calculators may not use the same floating point representations or arithmetic routines that MATLAB uses, so differences in the trailing bits should not be unexpected in these cases.