Preserving node names in a digraph

Question

Michael 2018-2-28

https://ww2.mathworks.cn/matlabcentral/answers/385512-preserving-node-names-in-a-digraph

Commented: Christine Tobler 2018-3-5

I am constructing a large digraph with between 10k and 100k nodes, in which I want to add, delete, and merge nodes. The nodes represent objects whose data is stored externally and indexed numerically, so the node IDs must be preserved in order to reference the related data correctly.

Is there a way of preserving node ids in a graph, other than giving the nodes string names?

In the following code

from_node = [1 1 2 3 4 4 5 6 7 3];
to_node   = [3 2 5 7 6 5 7 7 2 4];
weights = rand(size(from_node));
g = digraph(from_node, to_node, weights);  % digraph(s, t, weights): sources first
h = rmnode(g, 2);                          % remaining nodes are renumbered 1..6

when you remove node 2, rmnode renumbers the remaining nodes, so a different node becomes node 2, unless you specify node names, which must be strings, as follows:

from_node = [1 1 2 3 4 4 5 6 7 3];
to_node   = [3 2 5 7 6 5 7 7 2 4];
weights = rand(size(from_node));
names = cellstr(string(1:7));              % {'1', '2', ..., '7'}
g = digraph(from_node, to_node, weights, names);
h = rmnode(g, findnode(g, num2str(2)));    % look up the node by name, then remove it

This is fine for small graphs, but for very large graphs that must be modified it is extremely memory-inefficient, since you are forced to store a giant table of strings that is redundant with your numeric node IDs.

Moreover, each lookup needs a findnode search that involves converting the number to a string, which could also be costly if done many times.

Therefore, I am wondering if there is a more efficient way of preserving node ids upon insertion/deletion than using the names?

Thanks!!

Answer 1

Walter Roberson 2018-3-1

https://ww2.mathworks.cn/matlabcentral/answers/385512-preserving-node-names-in-a-digraph#answer_307786

Convert the node numbers to base 2^16, char() the result, and use those as the strings. For node numbers no larger than 100k, this takes two characters (4 bytes) each (plus any overhead from cell arrays).

2 Comments

Michael 2018-3-1

How would you suggest doing this efficiently? dec2base only goes up to base 36, and I don't want to add a heavy computational load if I have to convert thousands of indices at once.

Thanks!

Walter Roberson 2018-3-1

Labels = char(reshape(typecast(uint32(Indices),'uint16').',2,[]).');
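The one-liner above can be checked with a round trip. The following is a minimal sketch, not part of the original answer: the sample values in Indices and the names words, Decoded, and NodeNames are illustrative assumptions. It also assumes the labels are later stored one per cell, and uses num2cell for that because cellstr strips trailing whitespace, which could alter a label whose second character happens to be a whitespace code.

```matlab
% Hypothetical sample IDs; any values below 2^32 fit in a uint32.
Indices = [1 2 65535 65536 100000];

% Encode: reinterpret each uint32 as two uint16 code units, one row per ID.
Labels = char(reshape(typecast(uint32(Indices), 'uint16').', 2, []).');

% Decode: reverse the steps to recover the numeric IDs.
words = reshape(uint16(Labels).', 1, []);
Decoded = double(typecast(words, 'uint32'));

% One 1x2 char vector per node, with every character preserved.
NodeNames = num2cell(Labels, 2);
```

Because the same typecast is used in both directions, the round trip should not depend on the byte order of the machine.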

Answer 2

Christine Tobler 2018-3-1

https://ww2.mathworks.cn/matlabcentral/answers/385512-preserving-node-names-in-a-digraph#answer_307936

Edited: Christine Tobler 2018-3-1

Unfortunately, there is no direct way of doing this. The graph and digraph classes are designed to be fast when working on an existing graph, but this came at the cost of being relatively slow when adding and removing nodes one at a time.

To avoid having to convert the numbers to strings, you could construct and maintain two vectors that convert between the external indices and the graph indices. For example:

maxExtInd = 1e6;
s = [1234 6543 765];
t = [6543 765 1234];
% graph2ext(indexIntoGraph) returns externalIndex
graph2ext = unique([s(:); t(:)]);
% ext2graph(externalIndex) returns indexIntoGraph 
%   (or zero if externalIndex is not in the graph)
ext2graph = sparse(maxExtInd, 1);
ext2graph(graph2ext) = 1:numel(graph2ext);
% Construct the graph:
g = graph(full(ext2graph(s)), full(ext2graph(t)));
graph2ext(g.Edges.EndNodes)
plot(g, 'NodeLabel', graph2ext);
conversionTable = [find(ext2graph(:)), nonzeros(ext2graph)]
% Add a node:
newNode = 456;
assert(ext2graph(newNode) == 0); % Check the node ID is not already in the graph
g = addnode(g, 1);
graph2ext(end+1) = newNode;
ext2graph(newNode) = numnodes(g);
figure;
plot(g, 'NodeLabel', graph2ext);
conversionTable = [find(ext2graph(:)), nonzeros(ext2graph)]
% Remove a node:
nodeToRemove = 1234;
graphNodeToRemove = ext2graph(nodeToRemove);
g = rmnode(g, graphNodeToRemove);
graph2ext(graphNodeToRemove) = [];
ext2graph(nodeToRemove) = 0;
ext2graph(ext2graph > graphNodeToRemove) = ext2graph(ext2graph > graphNodeToRemove) - 1;
figure;
plot(g, 'NodeLabel', graph2ext);
conversionTable = [find(ext2graph(:)), nonzeros(ext2graph)]
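With the two lookup vectors in place, a query phrased in external IDs can be translated on the way in and translated back on the way out. The sketch below is an illustration rather than part of the answer: it reuses the sample IDs from the example, switches to digraph to match the original question, and the name succExt is an assumption.

```matlab
maxExtInd = 1e6;
s = [1234 6543 765];
t = [6543 765 1234];

graph2ext = unique([s(:); t(:)]);           % graph index -> external ID
ext2graph = sparse(maxExtInd, 1);           % external ID -> graph index (0 if absent)
ext2graph(graph2ext) = 1:numel(graph2ext);

g = digraph(full(ext2graph(s)), full(ext2graph(t)));

% Successors of external node 1234, reported as external IDs.
succExt = graph2ext(successors(g, full(ext2graph(1234))));
```

The same pattern (ext2graph on the way in, graph2ext on the way out) should apply to other queries that take or return node indices.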

2 Comments

Michael 2018-3-2

Thank you! I think this is a very nice solution. I'm wondering about your thoughts on the trade-off between speed and memory in this particular situation.

In this case, we have to maintain the graph plus a 1xnum_nodes vector of 8-byte doubles and a 1xnum_nodes sparse vector filled with 8-byte doubles, versus a 1xnum_nodes table column of 4-byte 1x2 char arrays. For 1000 entries, factoring in the overhead of the table object, I think keeping the table char array is a ~6.5x memory savings. However, with the vectors we don't have to mess with conversions.

Do you think the lookup in the sparse/double array will be faster than calling findnode with the proper node name?

I was able to improve on the speed of the previous suggestion for conversion, assuming fewer than 2^32 entries, using

char([floor(num./65536) rem(num,65536)])

to convert and

sum(double(cell2mat(nodenames)).*[65536 1],2)

to reverse, but there is certainly overhead using the findnode() functions within the digraph object and conversion to cell arrays of chars needed to use the digraph object.
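The two expressions above can be exercised together as a round trip. This is a small sketch under stated assumptions: num is taken to be a column vector of hypothetical IDs, nodenames is built with num2cell so each cell holds one 1x2 char vector, and the name recovered is illustrative.

```matlab
% Hypothetical IDs as a column vector, all below 2^32.
num = [1; 65535; 65536; 100000];

% Convert: high 16 bits first, then low 16 bits, one row per ID.
labels = char([floor(num./65536) rem(num,65536)]);

% One 1x2 char vector per cell, as node names would be stored.
nodenames = num2cell(labels, 2);

% Reverse: weight the two code units by 65536 and 1, then sum each row.
recovered = sum(double(cell2mat(nodenames)).*[65536 1], 2);
```

Note that this variant stores the high word first, which is the opposite word order from the typecast-based conversion on a little-endian machine, so labels from the two methods should not be mixed.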

You mentioned that it is optimized to be fast for operations but slow for manipulation and I see this to be true. When testing out my code, the biggest overhead is in adding an edge which calls expandTable(), which is very costly.

What is it about the table object that makes it optimal to design the graph object using it rather than just defining the nodes as a sparse and the edges as either a binary sparse or double sparse in the case of a weighted digraph? I'm very interested in what data structures are best for what jobs.

Thanks so much!

Christine Tobler 2018-3-5

Hi Michael,

With the table char array, you should factor in not only the cost for each 4-byte char array, but also the additional mxArray header (which for each element of a cell array, specifies its datatype and additional information). Also, the sparse array indexing will do a binary search, which the graph object's findnode is not (currently) doing on its node names.
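One way to check this trade-off empirically is to build the candidate structures for a sample size and compare what whos reports. The following is only a sketch: n, ids, and labels are illustrative names, the sample IDs are assumed to be 1..n, and the byte counts depend on the MATLAB release and platform, so no expected numbers are given here.

```matlab
n = 1000;
ids = (1:n).';                              % hypothetical external IDs

graph2ext = ids;                            % dense double vector
ext2graph = sparse(ids, 1, ids, 1e6, 1);    % sparse lookup vector
labels = num2cell(char(reshape(typecast(uint32(ids), 'uint16').', 2, []).'), 2);

% Report the memory each variable occupies.
info = whos('graph2ext', 'ext2graph', 'labels');
for k = 1:numel(info)
    fprintf('%-10s %d bytes\n', info(k).name, info(k).bytes);
end
```

The size reported for the cell array should include the per-cell header cost mentioned above, which is why it can be much larger than 4 bytes per entry.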

You're right about expandTable being the main overhead. If there are no node and edge properties (that is, if you store node names and edge weights separately during the loop), this overhead should decrease drastically.

The table object is not used to represent the structure of the graph object internally; we use it only for the node and edge properties. There, it has the advantage of allowing the storage of properties of arbitrary datatypes in a simple manner. For cases where the graphs are modified many times, there is unfortunately a large overhead associated with the nodes and edges tables.
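The suggestion to store edge weights separately during the loop could look roughly like the sketch below. It is an illustration under assumptions, not the implementation used in the thread: src, dst, and w are a hypothetical edge stream, and the weights are attached in a single step after the loop.

```matlab
% Hypothetical edge stream, kept outside the graph object.
src = [1 1 2 3];
dst = [2 3 4 4];
w   = [0.5 1.2 0.7 2.0];

g = digraph();                          % no node or edge properties yet
for k = 1:numel(src)
    g = addedge(g, src(k), dst(k));     % structure only, no Weight column
end

% Attach the weights once, in the graph's own edge order.
idx = findedge(g, src, dst);
weights = zeros(numedges(g), 1);
weights(idx) = w;
g.Edges.Weight = weights;
```

If the full edge list is known in advance, building the graph with a single digraph(src, dst, w) call is likely to be cheaper still than any loop.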
