Problem of ``sprandn'': the number of nonzero elements is inconsistent with density

1 次查看(过去 30 天)
R = sprandn(m,n,density) is a random, m-by-n, sparse matrix with approximately density*m*n normally distributed nonzero entries (0 <= density <= 1).
However, if density is large, the number of nonzero entries it not consistent with density.
nnz(sprandn(100,10,0.3))
ans =
259
nnz(sprandn(100,10,0.7))
ans =
492
For density=0.7, the number of nonzero elements should be approximately 700, but the result is only 492.
This is an example, I have tried for many times but results are similar.
I apologize in advance if it is a trivial problem.

采纳的回答

Bruno Luong
Bruno Luong 2020-7-31
编辑:Bruno Luong 2020-7-31
The approximation is only good when density << 1. I think SPRAND/SPRANDN simply create random draw of (density*m*n) pairs of (i,j) WITH replacement (perhaps for the reason of efficiency and memory).
When density specified is close to 1, there are a lot of pairs that collide in the same place and the effective density is less than what user specifies.
>> nnz(sprand(100,100,1))/100^2
ans =
0.6400
I believe the "defective ratio" is about
>> 1-1/exp(1)
ans =
0.6321
for density ~ 1 and large m*n
  2 个评论
Bruno Luong
Bruno Luong 2020-7-31
You can check the fact that draw with replacement is used from the code of SPRAND
nnzwanted = round(m * n * min(density,1));
i = fix( rand(nnzwanted, 1) * m ) + 1;
j = fix( rand(nnzwanted, 1) * n ) + 1;
ij = unique([i j],'rows');
if ~isempty(ij)
i = ij(:,1);
j = ij(:,2);
end

请先登录,再进行评论。

更多回答(1 个)

John D'Errico
John D'Errico 2020-7-31
编辑:John D'Errico 2020-7-31
Bruno has already explained why sprand/sprandn fail to produce the requested density of arrays. So this answer is here only to add some information. Honestly, I would think they might have done at least a little better job in those codes for matrices that are only semi-sparse. My guess is the author did not consider that anyone would want to generate sparse matrices that are even remotely close to full.
I would add that you really don't have any rational need for high density sparse matrices. That is, a sparse matrix with a density even as large as 50% will be slower in terms of computation. It will take more memory to store.
Af = randn(5000);
Af(rand(size(A)) > 0.5) = 0;
As = sparse(Af);
whos Af As
Name Size Bytes Class Attributes
Af 5000x5000 200000000 double
As 5000x5000 200048280 double sparse
nnz(As)/numel(As)
ans =
0.50002068
This, by the way, has one virtue when computing a semi-sparse matrix, in the sense that it does produce a matrix with a very good approximation of the requested density. A problem of course, is I created a full matrix, and then discarded half of the elements. Not only that, but the way I chose the elements to discard forced me to create a temporary second full matrix of that size. So the method I used above is arguably a poor one. But a more careful sampling scheme would logically have been possible.
Regardless, the sparse version (As) takes more memory to store then the corresponding full matrix (Af). This happens because a sparse matrix needs to store not only the non-zero elements, but also the locations of those elements. And that too requires memory.
Are these quasi-sparse matrices faster to use? Often not the case.
timeit(@() lu(Af))
ans =
0.457781138132
timeit(@() lu(As))
ans =
48.439253306132
timeit(@() Af*Af)
ans =
0.864906439132
timeit(@() As*As)
ans =
32.712310654132
Sparse matrices are splendid tools, when your matrix truly is sparse. But when it is only 50% sparse or even anything in that vicinity, that sparsity becomes a cost, not a gain.

类别

Help CenterFile Exchange 中查找有关 Sparse Matrices 的更多信息

标签

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by