Preserving node names in a digraph
5 次查看(过去 30 天)
显示 更早的评论
I am constructing a large digraph with between 10k-100k nodes, in which I want to add, delete, and merge nodes. The nodes represent objects with other externally-stored data, which are indexed numerically, so the nodeIDs must be preserved to reference properly to the related data.
Is there a way of preserving node ids in a graph, other than giving the nodes string names?
In the following code
from_node=[1 1 2 3 4 4 5 6 7 3];
to_node= [3 2 5 7 6 5 7 7 2 4];
weights=rand(size(from_node));
g=digraph(to_node, from_node, weights);
h=rmnode(g,2);
when you remove node 2, it will reorder the nodes and call some other node 2 unless you specify node names, which must be strings, as such:
from_node=[1 1 2 3 4 4 5 6 7 3];
to_node= [3 2 5 7 6 5 7 7 2 4];
weights=rand(size(from_node));
names = cellstr(string(1:7));
g=digraph(to_node, from_node, weights,names);
h=rmnode(g,findnode(g,num2str(2)));
This is fine for small graphs, but for very large graphs that must be modified, this is extremely memory-inefficient, since you are forced to store a giant table of strings, which is redundant to your node id names.
Moreover, in this case you will need to do a findnode search each time that involves converting the number to a string, which could also be costly if done many many times.
Therefore, I am wondering if there is a more efficient way of preserving node ids upon insertion/deletion than using the names?
Thanks!!
0 个评论
采纳的回答
Walter Roberson
2018-3-1
Convert the node numbers to base 2^16. char() the result. Use those as the strings. For node names that are no larger than 100k then this takes two characters (4 bytes) each (plus any overhead from cell arrays.)
更多回答(1 个)
Christine Tobler
2018-3-1
编辑:Christine Tobler
2018-3-1
Unfortunately, there is no direct way of doing this. The graph and digraph classes are designed to be fast when working on an existing graph, but this came at the cost of being relatively slow when adding and removing nodes one at a time.
To avoid having to convert the numbers to strings, you could construct and maintain two vectors which convert from the external indices to graph indices. For example like this:
maxExtInd = 1e6;
s = [1234 6543 765];
t = [6543 765 1234];
% graph2ext(indexIntoGraph) returns externalIndex
graph2ext = unique([s(:); t(:)]);
% ext2graph(externalIndex) returns indexIntoGraph
% (or zero if externalIndex is not in the graph)
ext2graph = sparse(maxExtInd, 1);
ext2graph(graph2ext) = 1:numel(graph2ext);
% Construct the graph:
g = graph(full(ext2graph(s)), full(ext2graph(t)));
graph2ext(g.Edges.EndNodes)
plot(g, 'NodeLabel', graph2ext);
conversionTable = [find(ext2graph(:)), nonzeros(ext2graph)]
% Add a node:
newNode = 456;
assert(ext2graph(newNode) == 0); % Check the node ID is not already in the graph
g = addnode(g, 1);
graph2ext(end+1) = newNode;
ext2graph(newNode) = numnodes(g);
figure;
plot(g, 'NodeLabel', graph2ext);
conversionTable = [find(ext2graph(:)), nonzeros(ext2graph)]
% Remove a node:
nodeToRemove = 1234;
graphNodeToRemove = ext2graph(nodeToRemove);
g = rmnode(g, graphNodeToRemove);
graph2ext(graphNodeToRemove) = [];
ext2graph(nodeToRemove) = 0;
ext2graph(ext2graph > graphNodeToRemove) = ext2graph(ext2graph > graphNodeToRemove) - 1;
figure;
plot(g, 'NodeLabel', graph2ext);
conversionTable = [find(ext2graph(:)), nonzeros(ext2graph)]
2 个评论
Christine Tobler
2018-3-5
Hi Michael,
With the table char array, you should factor in not only the cost for each 4-byte char array, but also the additional mxArray header (which for each element of a cell array, specifies its datatype and additional information). Also, the sparse array indexing will do a binary search, which the graph object's findnode is not (currently) doing on its node names.
You're right about expandTable being the main overhead - if there are no node and edges properties (that is, if you store node names and edge weights separately during the loop), this overhead should decrease drastically.
The table object is not used to represent the structure of the graph object internally, we are using it only for the node and edge properties. There, it has the advantage of allowing the storage of properties of arbitrary datatypes in a simple manner. For cases where the graphs are modified many times, there is unfortunately a large overhead associated with the nodes and edges tables.
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Graph and Network Algorithms 的更多信息
产品
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!