Generate a 100000x100000 matrix that takes less time and memory
13 次查看(过去 30 天)
显示 更早的评论
I have written a random matrix generator code that generates an adjacency matrix of any size. I am targetting larger size like 100kx100k but the problem that I face is the time to generate that (which is related to the RAM memory). It needs ~ 60 GB to do this.
I presume that there has to be a smarter way to do this, like by using a small int instead of a double word or something similar to the datatype. Any help would be appreciated. Thanks
function [a,ed] = Random_graph_genar_function(nodes, connectivity)
for it=1:3
ni = nodes;
ac= connectivity;
mi=(ni*(ni-1))/2;
no=round(mi*ac);
a=zeros(ni,ni);
in=randperm(mi,no); p=1;
for i=1:ni
for j=i+1:ni
if (any(in(:)==p))
a(i,j)=1;
a(j,i)=1;
end
p=p+1;
end
end
p=0;
for i=1:ni
for j=i+1:ni
if (a(i,j)==1)
p=p+1;
ed(1,p)=i;
ed(2,p)=j;
end
end
end
s=sum(a);
mx=max(s)
for i=1:ni
bc(i)=mx-s(i);
end
tbc=sum(bc);
end
end
4 个评论
Walter Roberson
2020-10-20
The percentage of the matrix that is non-zero is random
Is it going to be 0.000378% in one run, but 82.19% in another run? You have no idea what the percentage will be, other than more than 0% and less than 100% ?
采纳的回答
Matt J
2020-10-20
编辑:Matt J
2020-10-20
Seems to me the whole code can be replaced by,
function [a,ed] = Random_graph_genar_function(nodes, connectivity)
a=logical(sprandsym(nodes,connectivity));
a=a-a.*speye(nodes);
G=graph(a);
ed=table2array(G.Edges).';
end
although instead of having the function return a and ed, I suspect that everything you are trying to do is more easily accomplished with the graph object G instead.
8 个评论
Walter Roberson
2020-10-20
编辑:Walter Roberson
2020-10-20
If you have a fixed number of iterations to work with, then you will need to proceed by either
- growing the graph step by step so that at no point are there disconnected points; or
- imposing a maximum distance away from other existing points upon new points, and "reserving" a number of iterations to do nothing but "fix-up" the disconnected subtrees by connecting them to other sub-trees; or
- after the initial iterations, find all disconnected subtrees and move them to be attached to the main tree. If you have a constrained geometry, it might take some searching to find attachment points that satisfy the constraints
I suspect that the first option, growing step by step, is the easiest.
更多回答(2 个)
Ameer Hamza
2020-10-19
If most of the elements equal to zero, then use sparse array: https://www.mathworks.com/help/matlab/ref/sparse.html. You can also try to create uint8 array which will only use 1/8 memory
a=zeros(ni,ni,'uint8');
7 个评论
Walter Roberson
2020-10-19
a=zeros(ni,ni,'uint8');
You are already using only one byte per entry.
If you were to create logic that packed 8 adjacent entries into one byte, you could potentially get 8:1 compression... and would still need 116.4 gigabytes of memory.
Your only hope would be if you could use a sparse() array. See https://www.mathworks.com/matlabcentral/answers/100287-how-much-memory-a-sparse-matrix-created-using-sprand-with-given-number-of-rows-columns-and-density for a guideline to the amount of memory a sparse array uses. I suspect the 16 is 8 bytes for an offset, plus 8 bytes for storage -- so using a sparse logical array possibly only takes 8+1 = 9 bytes per entry (this could be tested.)
Which is to say that if your occupancy is more than about 1/20 then sparse would be less efficient. If you had a target of (say) 16 gigabytes then you would need to be about only 1/1000 occupied.
0 个评论
另请参阅
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!