MATLAB precision (?) error that escalates to infinity
Assume a given matrix AA, a vector cc, and an initial vector mu_init (all loaded from the attached .mat file). Then, I repeatedly apply (e.g. 2000 times) the following update:

mu ← cc + AA*mu

For every update, I tried 2 different ways:
- I directly compute cc + AA*mu and assign it to mu, or
- For every i = 1,...,m I compute: mu(i) = cc(i) + AA(i,:)*mu
Theoretically the 2 techniques should be totally equivalent. However, while the updates go on a small error is stacked and grows to infinity (as shown in the figure produced by the attached code). I also attach a .mat file used to initialize the arrays.
So, which of the 2 techniques is correct to be used with MATLAB and which one is not? Is the reason for the error the floating-point precision of MATLAB, or something else?
Note: I am running R2021a.
%%%%% Initialization
load 'my_mats.mat';            % provides mu_init, cc, AA
max_iters = 2000;
mu = mu_init;                  % updated element by element in a loop
MU = mu_init;                  % updated with a vectorized matrix product
[mu2, MU2] = deal(mu, MU);
[mus, MUS] = deal(mu, MU);     % histories of the iterates

%%%%% Run the multiplications
iter = 0;
while iter <= max_iters
    % Technique 1: one row-times-vector dot product at a time
    for idx = 1:length(mu_init)
        mu2(idx) = cc(idx) + AA(idx,:)*mu;
    end
    % Technique 2: vectorized matrix-vector product
    MU2 = cc + AA*MU;
    mus = [mus mu2];
    MUS = [MUS MU2];
    [mu, MU] = deal(mu2, MU2);
    iter = iter + 1;
end

%%%%%%% PROBLEM: The difference is not zero. Why???
difference = mus - MUS;
diffs = vecnorm(difference, 1).';   % 1-norm of the difference at each iteration
figure;
semilogy(0:max_iters+1, diffs); grid on;
3 Comments
"However, while the updates go on a small error is stacked and grows to infinity"
Your algorithm is numerically unstable: apparently it amplifies tiny numeric noise until it becomes significant.
"which of the 2 techniques is correct to be used with MATLAB and which one is not?"
Both of them are correct. I looked at the difference between the loop and the vectorized results; they differ around the 15th significant digit or so. Unwanted perhaps, but not particularly meaningful in the world of binary floating-point numbers.
You could spend some time investigating which floating-point value is closer to the symbolic/algebraic solution, if that entertains you. However, your input data are very unlikely to have been measured to 15 significant figures of accuracy. Expecting the two results to match means expecting the results of a computation to have higher precision than the input data.
The solution is to focus on improving the robustness of the algorithm, or on finding alternative approaches.
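One quick way to check whether the iteration itself amplifies noise (a hypothetical probe, assuming AA from my_mats.mat is in the workspace): the update mu ← cc + AA*mu magnifies any perturbation by roughly the spectral radius of AA at every step.

```matlab
% If rho = max(|eig(AA)|) > 1, a rounding perturbation in mu grows
% roughly like rho^k after k applications of mu <- cc + AA*mu.
rho = max(abs(eig(AA)));
fprintf('Spectral radius of AA: %g\n', rho);
% rho > 1 means the iteration is unstable, and any last-bit disagreement
% between the loop and the vectorized update is eventually amplified.
```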
I don't think this is focusing on the wrong things. I find it natural that for a person inexperienced in numerical maths it is of course surprising that, starting with exactly the same input data and using two equivalent techniques, the results already differ after the first iteration.
By the way: In this special case, it's also surprising for me.
"By the way: In this special case, it's also surprising for me."
My guess is that the vectorized code uses some BLAS (or similar) routine, while the loop ... well, something else, perhaps something written by TMW. There are various differences that those routines could have, such as the use/number of extended-precision bits, the algorithm used, the order of operations, etc.
"I find it natural that for a person inexperienced in numerical maths..."
Should be aware of how errors accumulate, even before they touch a computer: accumulated error applies to data measurements and to any calculations on them, not just those done on computers with floating-point numbers.
Accepted Answer
More Answers (1)
Walter Roberson
2022-6-10
Matrix multiplication calculates each location as the sum of a number of products of pairs. Sometimes this is expressed using the dot product operator, dot(A(P, :), B(:, Q)) = A(P, 1)*B(1,Q) + A(P, 2)*B(2,Q) and so on.
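That order dependence is visible even with three terms; a minimal illustration in MATLAB:

```matlab
% IEEE 754 double-precision addition is not associative:
a = (0.1 + 0.2) + 0.3;   % 0.6000000000000001
b = 0.1 + (0.2 + 0.3);   % 0.6
fprintf('%.17g\n%.17g\n', a, b);
isequal(a, b)            % logical 0: the two orders differ in the last bit
```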
Now if you are working to indefinite precision, the associative and commutative properties would hold, and the order in which you did the additions would not matter. When working with indefinite precision, it would be perfectly fine to divide the products up into four groups, use a core to sum within each group (working in parallel), and then at the end total the four partial sums. If you were working with indefinite precision, you could add 2^1024 + 2^-1024 to get something different from 2^1024, and then you could subtract off 2^1024 to get exactly 2^-1024 as the result. Furthermore, if you were working with indefinite precision, you could add those 2^1024 millions of times before subtracting out an equal number of them.
It follows that if you use indefinite precision, that your calculation system must be able to use pretty much all of your memory to store a single number, since A+B-A-B+A+B-A-B repeatedly trillions of times with indefinite precision mathematically would have to robustly give an exact zero if you were to rewrite as (A+A+A+... A) + (B+B+B+... B) - (A+A+A+... A) - (B+B+B+... B). Make A large absolute magnitude and B small absolute magnitude and you can obviously drive the number of bits needed to represent the intermediate numbers as large as you want, and every bit would have to be preserved just in case what is being subtracted does not exactly balance until trillions more calculations are done.
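In actual double precision 2^1024 overflows to Inf, but the same absorption effect appears at representable magnitudes; a small sketch:

```matlab
% eps(2^60) = 2^8 = 256, so an added 1 is absorbed completely:
x = 2^60;
y = (x + 1) - x;   % exact arithmetic would give 1 ...
disp(y)            % ... but this displays 0: the + 1 was rounded away
```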
And so far we are just dealing with addition, subtraction, and pairwise multiplication of finite precision numbers. Even without having gotten into transcendental calculations such as trig or exponentials we have shown that if you want mathematical properties of association and commutation to hold, then you need to use a calculation system that can use all of your memory to represent a single number. (Well, really you need to be able to use more memory than that!)
The reality, of course, is that hardware representation of numbers has a fixed maximum number of bits for each number. Because of that, the associative and commutative laws cannot hold. Whenever you fix the maximum number of bits to represent something, then for any given number A, there exists a non-zero number B such that A+B is indistinguishable from A. The same logic holds for decimal representation, and holds whether you use 32 bits or 64 bits or 256 bits or 65536 bits per number: if your numeric representation for each number cannot grow to as much memory as you have, then the associative and commutative laws must fail. This is not a MATLAB bug: it is Information Theory. Any fixed representation can only represent a finite number of states, but the associative and commutative laws require an unbounded number of states.
And so, getting back to the sum of products for the matrix multiplication, it follows that the order of the additions must matter for any representation that has a maximum size for numeric values: depending on how you add the terms, you may get a different result. This is not a MATLAB bug; it is a fundamental limitation of representations of limited length.
When you take the two different approaches, you are going through different paths in determining what order to do the additions. The full matrix multiplication is probably being sent to a high-performance routine that probably splits the calculation between different cores, creating a partial sum for each core and then adding the partial sums. The other calculation might be considered too small to be worth splitting across multiple cores, as there is a communication cost to splitting the load ("A human can outrun a horse over short distances.")
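The effect reproduces without the attached data; a hypothetical probe (the exact mismatch, if any, depends on the BLAS build, CPU, and thread count):

```matlab
rng(0);                        % reproducible random data
A = randn(500);  x = randn(500, 1);
v = A*x;                       % vectorized: dispatched to an optimized BLAS
w = zeros(500, 1);
for i = 1:500
    w(i) = A(i,:)*x;           % one row-times-vector dot product at a time
end
max(abs(v - w))                % typically of order 1e-13; may be exactly 0
```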
Which one is "right"? Neither of them. Switch to the Symbolic Toolbox if you need the "right" answer.
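A sketch of that route, assuming the Symbolic Math Toolbox is installed: sym(x,'f') converts a double to the exact rational number it represents, and exact rational arithmetic is associative, so both orderings agree.

```matlab
a = sym(0.1, 'f');  b = sym(0.2, 'f');  c = sym(0.3, 'f');
% With exact rationals, the grouping of the additions no longer matters:
isAlways((a + b) + c == a + (b + c))   % logical 1 (true)
```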