huge differences in single vs double precision math

9 次查看(过去 30 天)
I am calculating a sum of squares in 32-bit FP precision (for comparison with a GPU algorithm, which isn't relevant here).
Here is the code:
Y=single((0:499).^2);
sum(Y)
ans =
41541684
sum(double(Y))
ans =
41541750
The (correct) double answer is off by 66! The largest value, 499^2 = 249001, is nowhere near any FP limits.
This is R2013A on OS X 10.9.

回答(1 个)

John D'Errico
John D'Errico 2014-8-7
What you don't understand is that single precision has a 23 bit mantissa. While there are 32 total bits stored in a single, don't forget that one of those bits is a sign bit, which leaves 8 bits to store an exponent in a biased form. So you cannot store an INTEGER larger than 2^24-1 in a single, if you wish to do so without error.
The sum you formed was larger than that limit, so you should expect an error.
log2(41541750)
ans =
25.308
It is time for you to start reading about floating point arithmetic.
Computers are not all powerful, except for those in the movies/tv.

类别

Help CenterFile Exchange 中查找有关 Mathematics 的更多信息

产品

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by