Problem of ``sprandn'': the number of nonzero elements is inconsistent with density

Question

Zenan Li 2020-7-31

0
链接

此问题的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/573241-problem-of-sprandn-the-number-of-nonzero-elements-is-inconsistent-with-density

编辑： Bruno Luong 2020-7-31

R = sprandn(m,n,density) is a random, m-by-n, sparse matrix with approximately density*m*n normally distributed nonzero entries (0 <= density <= 1).

However, if density is large, the number of nonzero entries it not consistent with density.

nnz(sprandn(100,10,0.3))
ans = 
        259
nnz(sprandn(100,10,0.7))
ans = 
        492

For density=0.7, the number of nonzero elements should be approximately 700, but the result is only 492.

This is an example, I have tried for many times but results are similar.

I apologize in advance if it is a trivial problem.

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

请先登录，再进行评论。

请先登录，再回答此问题。

Answer 1

Bruno Luong 2020-7-31

2
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/573241-problem-of-sprandn-the-number-of-nonzero-elements-is-inconsistent-with-density#answer_473302

编辑：Bruno Luong 2020-7-31

The approximation is only good when density << 1. I think SPRAND/SPRANDN simply create random draw of (density*m*n) pairs of (i,j) WITH replacement (perhaps for the reason of efficiency and memory).

When density specified is close to 1, there are a lot of pairs that collide in the same place and the effective density is less than what user specifies.

>> nnz(sprand(100,100,1))/100^2
ans =
    0.6400

I believe the "defective ratio" is about

>> 1-1/exp(1)
ans =
    0.6321

for density ~ 1 and large m*n

2 个评论
显示无隐藏无

Zenan Li 2020-7-31

Thank you, I agree it.

Bruno Luong 2020-7-31

You can check the fact that draw with replacement is used from the code of SPRAND

   nnzwanted = round(m * n * min(density,1));
   i = fix( rand(nnzwanted, 1) * m ) + 1;
   j = fix( rand(nnzwanted, 1) * n ) + 1;
   ij = unique([i j],'rows');
   if ~isempty(ij)
      i = ij(:,1);
      j = ij(:,2);
   end

请先登录，再进行评论。

Answer 2

John D'Errico 2020-7-31

1
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/573241-problem-of-sprandn-the-number-of-nonzero-elements-is-inconsistent-with-density#answer_473365

编辑：John D'Errico 2020-7-31

Bruno has already explained why sprand/sprandn fail to produce the requested density of arrays. So this answer is here only to add some information. Honestly, I would think they might have done at least a little better job in those codes for matrices that are only semi-sparse. My guess is the author did not consider that anyone would want to generate sparse matrices that are even remotely close to full.

I would add that you really don't have any rational need for high density sparse matrices. That is, a sparse matrix with a density even as large as 50% will be slower in terms of computation. It will take more memory to store.

Af = randn(5000);
Af(rand(size(A)) > 0.5) = 0;
As = sparse(Af);
whos Af As
  Name         Size                  Bytes  Class     Attributes
  Af        5000x5000            200000000  double         
  As        5000x5000            200048280  double    sparse       
nnz(As)/numel(As)
ans =
                0.50002068

This, by the way, has one virtue when computing a semi-sparse matrix, in the sense that it does produce a matrix with a very good approximation of the requested density. A problem of course, is I created a full matrix, and then discarded half of the elements. Not only that, but the way I chose the elements to discard forced me to create a temporary second full matrix of that size. So the method I used above is arguably a poor one. But a more careful sampling scheme would logically have been possible.

Regardless, the sparse version (As) takes more memory to store then the corresponding full matrix (Af). This happens because a sparse matrix needs to store not only the non-zero elements, but also the locations of those elements. And that too requires memory.

Are these quasi-sparse matrices faster to use? Often not the case.

timeit(@() lu(Af))
ans =
            0.457781138132
            
timeit(@() lu(As))
ans =
           48.439253306132
           
timeit(@() Af*Af)
ans =
            0.864906439132
timeit(@() As*As)
ans =
           32.712310654132

Sparse matrices are splendid tools, when your matrix truly is sparse. But when it is only 50% sparse or even anything in that vicinity, that sparsity becomes a cost, not a gain.