Sparse matrix memory understanding
39 次查看(过去 30 天)
显示 更早的评论
I'm currently working with large matrices (significantly over 20000x20000) with most zero elements. Usually, around 5% of the elements are non-zero, so it would make sense to use sparse programming. Before sparsifying the computations, I wanted to test some basic concepts. I generated a random 20000x20000 matrix and checked its memory. Afterwards, I made most of its elements zero so that less than 5% non-zero elements remain. After checking its memory, I found out that the sparse matrix takes up even more memory. A screenshot of the code can be found below. Can someone shed light on this?
0 个评论
采纳的回答
Steven Lord
2024-10-30,19:43
Please don't post pictures of code. MATLAB Answers can't run pictures of code.
rng default
Yb = unifrnd(-1, 1, [20000 20000]);
fprintf('%g%% of Yb is non-zero.', 100*nnz(Yb)/numel(Yb))
Yb2 = Yb;
Yb2(Yb2<0.9) = 0;
fprintf('%g%% of Yb2 is non-zero.', 100*nnz(Yb2)/numel(Yb2))
Ybs = sparse(Yb2);
fprintf('%g%% of Ybs is non-zero.', 100*nnz(Ybs)/numel(Ybs))
Let's check and make sure that what you're saying is correct, that Ybs takes up more memory than Yb or Yb2.
whos Yb Yb2 Ybs
In your separate whos calls the bytes column was not aligned. You can see that Ybs does in fact take up much less memory than either Yb or Yb2.
0 个评论
更多回答(2 个)
Walter Roberson
2024-10-30,19:58
Sparse memory storage starts with a vector of pointers, with one element per column. The pointer content is 0 (NULL pointer) if the column has no non-zero entries.
For the columns that have non-zero entries, the above pointer points to a block of memory that is a paired list. The initial element of the pair is a row number. The second element of the pair is the value at the indicated row.
You are populating 20000 x 20000 randomly with 5% fill. The probability that at least one row in any given column will be selected is as close to 1 as to make no difference (about 1 minus 3e-446). So the marginal column vector pointer is going to be completely full, occupying 20000 * sizeof(pointer) = 20000*16 = 320000 bytes.
About 1 in 20 columns will be full, so approximately 1000 slots will need to be allocated per column. Each slot needs a 48 bit offset (row numbers can be up to 2^48-1) and an 8 byte value, 14 bytes total. That is provided that the offsets are stored as 48 bit instead of as 64 bit. So you should expect roughly 1000*14*20000 = 280000000 or 1000*16*20000 = 32000000 for the column values. Total 280320000 bytes or 32320000 bytes.
Comparing to the actual memory used, it appears that MATLAB uses the full 8 bytes for the offsets, and in practice was able to save a bit on the marginal vectors.
0 个评论
John D'Errico
2024-10-30,20:36
编辑:John D'Errico
2024-10-30,20:46
I think you missed a zero.
By the way, 5% sparse is NOT at all what I would consider sparse, or for there to often be a gain. Linear algebra wants a matrix to be REALLY sparse, say 5 or 10 elements per row. Then you will see it fly!
And if all you are doing is using that matrix to store data, well, there may be better ways. Inserting new elements into a sparse matrix will be slow.
Now, as to your claim of size.
Afull = unifrnd(-1,1,[20000,20000]);
Afull(Afull < 0.9) = 0;
Asparse = sparse(Afull);
whos Afull Asparse
nnz(Asparse)/numel(Asparse)
So Afull takes up 10 times as much space, but is only 5% full in either case.
The sparse matrix storage has an inefficiency, due to the need to store pointers to the non-zero elements. But it is still 10 times less memory. NOT the same amount of memory.
0 个评论
另请参阅
产品
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!