Processing timer for Sparse matrix is too long?

1 次查看(过去 30 天)
Dear All,
I am a new user with Matlab, and I meet a problem with Sparse matrix.
Time for processing with Sparse matrix is very long.
Here is my code:
lab=randi([1 7],1,100000);
M = sparse(100000,100000);
for i = 1:100000
for j = (i+1):100000
if lab(i,1) == lab(j,1)
M(j,i) = 1;
else
M(j,i) = 0;
end
end
end
Do I have any method for reduce timing process? Thanks and Best regard!
  1 个评论
Tim Berk
Tim Berk 2017-9-19
Dear Nghia,
Looping through large arrays is very slow, especially in what is called 'nested loops' as you are using. Your nested loop requires 100000*100000/2 calculations (the 1/2 because of the (i+1) as starting value for j).
Lets assume that evaluating
if lab(1,i) == lab(1,j)
M(i,j) = 1;
else M(j,i) = 0;
end
Takes 10 micro seconds (which I think is quite a low estimate), than your script would take 1E5*1E5*0.5*1E-5 = 50000 seconds (14 hours).
Hope this clarifies why it is slow.
Often this can be solved by vectorization as explained here: https://uk.mathworks.com/help/matlab/matlab_prog/vectorization.html
Hope this helps.
Cheers,
Tim

请先登录,再进行评论。

采纳的回答

Cedric
Cedric 2017-9-19
编辑:Cedric 2017-9-19
You won't be able to do it, vectorizing or not.
If random integers are really in 1:7, the probability, when you pick two elements, that one is equal to the other is 1/7. On average, with your lab vector of n=1e6 elements and assuming that you are interested in the lower triangular part of M, this means n^2/2/7 non-zero elements, and hence
>> (16 * (1e6)^2/2/7 + 8 * 1e6 + 8) / 1e12
ans =
1.1429
more than 1.14 TB of RAM to store (just a single version of it).
Are random integers really in 1:7 or is the pool of integers larger, e.g. 1:n ? In the second case, you will gain a lot by vectorizing the approach. Often in these cases that involve sparse matrices, the best approach consists in building indices of non-zero elements ( I and J ), and performing a unique call to SPARSE at the end:
M = sparse( I, J, 1, n, n ) ; % Values are all 1 in your case.

更多回答(1 个)

Walter Roberson
Walter Roberson 2017-9-19
编辑:Walter Roberson 2017-9-19
Doing M(j,i) = 0 is a waste of time on a sparse array unless M(j,i) might already have a non-zero value. sparse() initializes to 0.
Your line,
M = sparse(100000,100000);
is equivalent to
M = sparse([], [], [], 100000, 100000, 0)
which generates a sparse array that is expected to have 0 occupied cells in it. Every time a cell is written, the sparse array needs to be recreated to allocate room for the new entry. It is a lot more efficient to use
M = spalloc(100000, 100000, N)
where N is an estimate of the number of locations that will eventually be occupied with non-zero values. In your situation you can estimate N as 100000^2/7 which is roughly 1428571429. One element in seven occupancy is not very sparse.
Your inner loop could be vectorized, which could improve performance a lot.

类别

Help CenterFile Exchange 中查找有关 Logical 的更多信息

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by