sparse function cannot get integer arrays for the indices
9 次查看(过去 30 天)
显示 更早的评论
Hello, we are working with some pretty large sparse matrices and we found this strange behaviour of the sparse function:
>> sparse(uint32([1 2]),uint32([1 2]),[3 4],2,2)
Undefined function 'sparse' for input arguments of type 'uint32'.
>> sparse(int32([1 2]),int32([1 2]),[3 4],2,2)
Undefined function 'sparse' for input arguments of type 'int32'.
>> sparse(double([1 2]),double([1 2]),[3 4],2,2)
ans =
(1,1) 3
(2,2) 4
Is it not possible to index with integers? Any reason for that?
Thanks, Sergio
0 个评论
回答(2 个)
James Tursa
2016-11-11
The sparse matrix storage has always been the following internally as far as I know (I think back to R2006a is all I have checked ... I have never had access to earlier versions of MATLAB):
Row indexes: An integer type (e.g., size_t)
Column indexes: An integer type (e.g., size_t)*
Data: Either double or byte (for logical)
* Note that the "column indexes" are actually cumulative column number totals, not indexes per se
The fact that the "sparse" function in MATLAB only accepts double indexes is maybe because these indexes (and the associated data) need to get sorted and converted internally in order to build the sparse matrix, and TMW didn't want to write these routines at the low level for all possible combinations of numeric input types. This is all a guess on my part. Regardless, these indexes will get converted to an integer type internally ... they are not stored as double and to my knowledge never have been.
Of course, MATLAB could simply cast the input indexes of any numeric type to double internally (with the user none the wiser) for the sorting etc, but that would necessitate potentially large temporary copies of the input indexes (pre-sorted versions). So rather than doing this and having a hidden additional memory hit in the background, TMW forces the user to do this conversion manually. Again, this is simply conjecture on my part.
The minimum data storage requirement formula for a double m x n sparse matrix with nnz non-zero elements, including the index data, is as follows on a 32-bit system:
bytes = max(nnz,1) * (4 + 8) + (n+1)*4
Which breaks down as follows:
nnz * 4 = Storing the row index of the non-zero elements
nnz * 8 = Storing the non-zero double element values themselves
(n+1)*4 = Storing the cumulative number of non-zero elements thru column
For 64-bit systems using 8-byte integers for the indexing you can replace the 4's above with 8's. This is just the minimum requirements. A sparse matrix can have excess memory allocated beyond the minimum if desired. And for logical sparse matrices, replace the nnz * 8 with just nnz of course (only 1 byte per value).
The only thing that has changed over the years that I am aware of, is that shared data copies of sparse matrices used to only share the data memory, not the index memory. But several years ago (I don't recall the version) that was changed so that the index memory is now also shared.
Guillaume
2016-11-11
Since double can represent exactly all uint32 integers, why is it a problem anyway? You will only start having problems if your indices are greater than flintmax (= 9007199254740992) which is well outside the range of uint32 (but not out of the range of uint64).
4 个评论
Matt J
2016-11-11
编辑:Matt J
2016-11-11
Since sparse arrays accept indices above this value ... they would have to be stored either as double or as uint64 which use the same amount of space.
Yes, but it would be nice if the user could have some freedom to use a smaller type if the matrix size allowed.
I believe internally, matlab use an integer type (size_t ?) to store the integers. That's what the blurb about sparse matrices on this page seems to imply.
That would be a fairly recent thing. I tested this some releases ago with a comparison like the following
>> Af=rand(5e6,3); As=sparse(Af);
>> Whos Af As
Name Size Kilobytes Class Attributes
Af 5000000x3 117188 double
As 5000000x3 234376 double sparse
The memory consumption of the sparse version does seem to have dropped a bit from a factor of 3 difference, so perhaps you're right.
另请参阅
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!