Computing the outer product (BLAS 2 operation) raised to the power k using the least number of flops
Explain how (xy^T)^k can be computed using the least number of flops, where x, y in R^n are vectors. Then write the corresponding MATLAB algorithm
Here is my algorithm:
function M = outer(x, y, k)
n = length(x);
n = length(y);
A = zeros(n, n);
for j = 1 : n
    for i = 1 : n
        A(i, j) = x(i) * y(j);
        M(i, j) = A(i, j)^k;
    end
end
So, for example, if I have x = [1 2], y = [2 4], and k = 2, then xy^T = [2 4; 4 8] and my code returns [4 16; 16 64], while (xy^T)^2 should equal [20 40; 40 80]. What is wrong with my algorithm? (Also, I looped over the columns first and computed M(i,j) in the same loop rather than in a separate pass, to use the least number of flops as the question asks. Is that correct?)
3 Comments
Bjorn Gustavsson
2020-9-19
If you want the [20 40; 40 80] result, then you should take the matrix power, not the elementwise power. The matrix power is (for exponents of 2 and 3):
M2 = M*M; % k = 2
M3 = M*M*M; % k = 3
You will have to modify your algorithm and read up on efficient calculation of matrix powers...
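To make the distinction concrete, here is a small sketch (in Python rather than MATLAB) of the difference described above, using the asker's own example. The helper `matmul` is just an illustrative textbook implementation:

```python
def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

M = [[2, 4], [4, 8]]            # x*y' for x = [1 2], y = [2 4]

elementwise = [[v ** 2 for v in row] for row in M]   # what the asker's loop computes
matrix_power = matmul(M, M)                          # what (x*y')^2 actually means

print(elementwise)   # [[4, 16], [16, 64]]
print(matrix_power)  # [[20, 40], [40, 80]]
```

The asker's loop applies `^k` to each entry separately, which is MATLAB's `.^` on the matrix, not the `^` matrix power the exercise asks for.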
Answers (2)
Walter Roberson
2020-9-22
LAPACK and BLAS are designed for computational efficiency, which is not the same thing as minimizing the number of flops: the computation with the fewest flops is not necessarily the fastest. The textbook matrix multiplication can incur a lot of cache misses on the right-hand matrix; it is much more efficient in processor time to do blocked calculations, which are still "pretty" flop-efficient but may require somewhat more flops -- say, 2*n + m additional additions compared to the theoretical minimum -- and yet a blocked multiplication might be 100 times faster because it has far fewer cache misses.
A matrix raised to an integer power can have the exponent decomposed into binary. For example, if k = 9, then instead of eight matrix multiplications, M*M*M*M*M*M*M*M*M, you can take M2 = M*M; M4 = M2*M2; M8 = M4*M4; M9 = M8*M, which is only 4 matrix multiplies. Or M2 = M*M; M3 = M2*M; M9 = M3*M3, which is only 3 matrix multiplies.
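The square-and-multiply idea above can be sketched as follows (a Python illustration, not the answerer's code; `matmul` and `mat_pow` are hypothetical helper names). It reduces the k - 1 naive multiplies to O(log k):

```python
def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(M, k):
    """Compute M^k (k >= 1) by repeated squaring over the binary digits of k."""
    result = None          # stands in for the identity until the first set bit
    base = M
    while k:
        if k & 1:
            result = base if result is None else matmul(result, base)
        base = matmul(base, base)
        k >>= 1
    return result

print(mat_pow([[2, 4], [4, 8]], 2))  # [[20, 40], [40, 80]]
```

For k = 9 this performs 4 multiplies (M^2, M^4, M^8, then M^8*M), matching the first decomposition above; finding the true minimum (3 multiplies here) is the harder addition-chain problem.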
However, for large enough k even this is not necessarily the most efficient. Instead you can take [U,S,V] = svd(M) and compute U * S^k * V' (this requires U = V, which holds here because M is symmetric positive semidefinite); S is guaranteed to be diagonal, so S^k is just S.^k.
But to know whether that is going to be more efficient than the binary decomposition, you have to know the cost of the SVD.
I have a suspicion that the problem might have been structured in a way that makes the SVD easy. Experimentally, in the tests I did, S was always nonzero only in its (1,1) entry... though the values of the matrices are not at all obvious to me at the moment, so I do not know whether there is an "inexpensive" SVD that could be done.
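The observation that S is nonzero only in its (1,1) entry reflects that x*y' has rank 1, which suggests the closed form (x y^T)^k = (y^T x)^(k-1) * (x y^T): one dot product, one scalar power, and one scaled outer product, about O(n^2 + log k) flops with no matrix-matrix multiply at all. A sketch under that assumption (Python illustration; `rank1_power` is a hypothetical name):

```python
def rank1_power(x, y, k):
    """Return (x y^T)^k for vectors x, y via (y^T x)^(k-1) * (x y^T)."""
    s = sum(xi * yi for xi, yi in zip(x, y)) ** (k - 1)   # (y^T x)^(k-1)
    return [[s * xi * yj for yj in y] for xi in x]        # scaled outer product

print(rank1_power([1, 2], [2, 4], 2))  # [[20, 40], [40, 80]]
```

This reproduces the [20 40; 40 80] result from the question and is presumably the "least number of flops" answer the exercise is after.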
1 Comment
Bjorn Gustavsson
2020-9-22
Well, since y is simply 2*x, I think we can cook up the eigenvectors in 2-D space rather easily (x/norm(x) and one perpendicular to that), and the nonzero eigenvalue I get is the product norm(x)*norm(y). But since the problem is clearly designed for someone to work out, surely it is a homework task?
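A quick numeric check of this comment (a Python sketch): the single nonzero eigenvalue of x*y' is y^T x, and because y = 2*x here it indeed equals norm(x)*norm(y):

```python
import math

x = [1, 2]
y = [2, 4]                                  # y = 2*x, so x*y' is rank 1
dot = sum(a * b for a, b in zip(x, y))      # y^T x, the nonzero eigenvalue
norms = math.hypot(*x) * math.hypot(*y)     # norm(x)*norm(y)

print(dot, norms)  # 10 10.0
```

In general the nonzero eigenvalue of x*y' is y^T x; the equality with norm(x)*norm(y) is special to y being a positive multiple of x, as in this example.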