Backslash (mldivide) slower than inverse and multiplication

24 views (last 30 days)
The common wisdom is that A\y is more accurate than inv(A)*y and I believe that to be true, but when is it also faster? Say the matrix A is well-conditioned so I don't really care about the ability of \ to find a least-squares solution.
Am I doing something misleading here? The \ takes much longer.
A = randn(20);
A = A*A';
s = randn(20,1);
timeit(@() inv(A)*s)
ans = 5.4755e-06
timeit(@() A \ s)
ans = 9.2574e-06
From the documentation for mldivide, it sounds like it should be using the Cholesky solver since A is Hermitian. Why is that not faster than an inv and a matrix multiplication?
ishermitian(A)
ans = logical
1

Accepted Answer

Christine Tobler 2025-7-31
Hi AB,
This is an interesting question. The conventional wisdom is based on the computational complexity, meaning it prioritizes large matrices. For small matrices, things become more complicated (and, as John mentioned, much more painful to measure and understand due to noise).
BLOT: For small matrices, estimating the condition number in mldivide takes up a considerable share of the computation time. If you're confident about your conditioning, use pagemldivide, which skips the condition-number estimate unless a second output is requested, and is therefore faster than both inv and mldivide.
We have recently taken an interest in matrices of smaller sizes, as part of the introduction of the page* functions. These functions take a 3D array representing a stack of matrices and apply the same operation to each matrix in the stack (each "page" A(:, :, i)), which can be much faster than a for-loop because it removes per-call function overhead.
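As a quick sketch of the idea (the stack sizes here are just for illustration):
A = randn(20, 20, 1000);    % stack of 1000 matrices, pages A(:,:,i)
b = randn(20, 1, 1000);     % one right-hand side per page
x = pagemldivide(A, b);     % x(:,:,i) equals A(:,:,i) \ b(:,:,i), no loop needed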
Remember that MATLAB's backslash (or mldivide) will warn if the left matrix is ill-conditioned:
magic(10) \ randn(10, 1);
Warning: Matrix is close to singular or badly scaled. Results may be inaccurate. RCOND = 3.081152e-18.
For us to give this warning, we need to estimate the condition number of that matrix, which is done through an iterative algorithm that solves several linear systems. For large matrices this isn't too expensive compared to the factorization itself (O(n^2) to solve a linear system versus O(n^3) for the factorization), but for small matrices it can become dominant.
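The RCOND value printed in the warning is this kind of estimate; rcond exposes it directly, and should give a value of the same order as the warning above:
A = magic(10);
rcond(A)    % iterative reciprocal condition number estimate (1-norm)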
In pagemldivide, which is focused on many small matrices, we decided not to warn when one of the matrices is ill-conditioned. Instead, a second output gives the estimated condition number of each matrix. This means we don't need to estimate the condition numbers when the second output isn't requested.
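For example, with the matrix from your question (requesting the second output described above):
A = randn(20); A = A*A';
s = randn(20, 1);
x = pagemldivide(A, s);          % no condition estimate, no warning
[x, rc] = pagemldivide(A, s);    % second output requests the estimate
rc                               % reciprocal condition number estimate, one per page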
Let's compare inv, mldivide, and pagemldivide timings for your example:
function tim = timeMldivide(n)
% Time inv, mldivide, and pagemldivide on one n-by-n SPD system.
A = randn(n);
A = A*A';                 % symmetric positive definite test matrix
s = randn(n, 1);
rep = 1e7 / n^2;          % more repetitions for smaller matrices

tic;
for ii = 1:rep
    y = inv(A)*s;
end
tInv = toc;

tic;
for ii = 1:rep
    y = A \ s;
end
tMldivide = toc;

tic;
for ii = 1:rep
    y = pagemldivide(A, s);
end
tPagemldivide = toc;

if nargout == 0
    tInv
    tMldivide
    tPagemldivide
else
    tim = [tInv tMldivide tPagemldivide] ./ rep;   % average time per solve
end
end
timeMldivide(20)
tInv = 0.1325
tMldivide = 0.2022
tPagemldivide = 0.0921
We can see that inv is faster than mldivide, and pagemldivide is faster than both. So, if you know your problem to be well-conditioned, pagemldivide is a better choice than inv.
You may wonder: doesn't inv also warn for ill-conditioned inputs, meaning it also needs the condition number? This is true, but the 1-norm condition number is defined as cond(A) = norm(A, 1) * norm(inv(A), 1). Since inv(A) has already been computed, it's cheaper to just compute its 1-norm than to run the iterative estimator that mldivide uses.
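In code, the cheap check available to inv looks roughly like this:
A = randn(10);
Ainv = inv(A);                    % the O(n^3) work is already done
1 / (norm(A,1) * norm(Ainv,1))    % reciprocal 1-norm condition number, only O(n^2) extra
rcond(A)                          % the mldivide-style iterative estimate of the same quantity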
To finish, let's look at timings across a range of matrix sizes:
nlist = [2 3 5 8 12 16 20 32 64 128 256 512];
tim = zeros(numel(nlist), 3);    % preallocate: one row per size
for k = 1:numel(nlist)
    tim(k, :) = timeMldivide(nlist(k));
end
loglog(nlist, tim);
legend('inv', 'mldivide', 'pagemldivide', Location='northwest')
xlabel('Matrix size')
ylabel('Time (s)')
We can see that pagemldivide (which skips the condition-number check) is fastest in all cases, although its advantage over mldivide shrinks as the matrix size grows. inv becomes much slower at large sizes, but is faster than mldivide at sizes 20 and 32 (the exact crossover points depend on the machine and can change from release to release).
That mldivide is faster than inv for very small matrices I would put down to mldivide's special-case treatment of very small matrices: we always use LU there, because the cost of structure detection was becoming noticeable at those sizes (see the doc for mldivide). In inv, the structure detection is still needed, because inv of a Hermitian matrix returns an output that is also Hermitian, and that requires a separate algorithm.
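You can check that structure-preserving behavior directly:
A = randn(5); A = A*A';    % real symmetric (Hermitian) test matrix
ishermitian(inv(A))        % true: inv detects and preserves the structure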
4 Comments
Stephen23 2025-7-31
@Christine Tobler: perhaps a summary of this information would not go amiss in the Tips section of MR/MLDIVIDE. Some users might really appreciate this little speed boost.
Christine Tobler 2025-8-1
Thank you!
I've passed this on to our documentation team to consider adding to the mldivide page.


More Answers (1)

John D'Errico 2025-7-30 (edited 2025-7-30)
First of all, you should NEVER be computing A*A' or A'*A. That is where most of the loss in accuracy arises. Whatever the condition number of A is, that operation squares it. Effectively, you might as well be doing your work in single precision, because the lower 8 bits of your result are now meaningless garbage.
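A quick way to see the squaring (the 2-norm condition number of A*A' is cond(A)^2, since the singular values of A get squared):
A = randn(20);
cond(A)       % condition number of A
cond(A*A')    % approximately cond(A)^2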
Next, a 20x20 matrix is far too small for anything intelligent to be made of a timing estimate like that. It is far more likely to get bogged down in simple things like deciding if the matrix is indeed symmetric, etc.
Next, learn to use tools like timeit.
A = randn(2000); A = A*A';
s = randn(2000,1);
timeit(@() A\s)
ans = 0.0466
timeit(@() inv(A)*s)
ans = 0.1206
Do you see that the 2kx2k example is now considerably faster for backslash?
As well, it is almost certain that you could have done what you needed to do without ever squaring that condition number in the first place.
2 Comments
AB 2025-7-30
Hi John, thanks for the timeit advice, I've updated the question to use timeit.
I am not sure what you mean by "a 20x20 matrix is far too small for anything intelligent to be made of a timing estimate like that". The matrix I am interested in is indeed a 20x20 matrix. Is your answer that for small matrices, inv(A)*s is expected to be faster than \ due to overhead to characterize the matrix?
John D'Errico 2025-7-30
For a 20x20 matrix, that matrix is so small that it is very difficult to know where the time penalty may come from. No, my answer is NOT that overhead is expected to be the cause of the problem, because I have no expectation at all. But a matrix that small can easily see the actual computation time be swamped by other things. If I had to make a guess, it would be that.
And again, you should probably not be computing things that way in the first place.
Yes, you CAN solve a linear least squares problem using the normal equations, but that squares the condition number. And yes, this is what so many people seem to learn. They learn it from someone else, who learned it long ago from another, etc. It is likely you are doing something like what I've described, because you talked about least squares, and because you are forming something like A*A' in the first place, and then inverting it.
So I would suggest that you consider other ways to solve the problem you want to solve. It might involve a QR factorization. For example, you can get a Cholesky factorization directly from a QR, without needing to go through the nasty issues of forming A'*A or A*A'. What you really want to do might involve backslash. But I don't know, because all I see is the test case you ran.
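As a rough sketch of the alternative (the tall matrix C and data b here are hypothetical stand-ins for whatever your actual problem is):
C = randn(100, 20);           % hypothetical tall design matrix
b = randn(100, 1);            % hypothetical observations
xNormal = (C'*C) \ (C'*b);    % normal equations: squares cond(C)
xQR = C \ b;                  % backslash uses a QR-based least-squares solve here
[~, R] = qr(C, 0);            % R'*R equals C'*C: a Cholesky factor without forming C'*C
norm(xNormal - xQR)           % tiny here, but grows as cond(C) grows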

