NetworkX is mostly for graph analysis, PyGraphviz mostly for drawing, and they're designed to work together. However, there's at least one respect in which NetworkX's graph drawing (via MatPlotLib) is superior to PyGraphviz's graph drawing (via Graphviz), namely that NetworkX has a spring layout algorithm (accessible via the spring_layout
function) specifically for directed graphs while PyGraphviz has several spring layout algorithms (accessible via the neato
program, and others) that lay out directed graphs as if they were undirected graphs. The only Graphviz / PyGraphviz layout program that really handles direction in a graph is dot
, but dot
creates hierarchical layouts, not force-directed layouts.
Here is an example that shows the difference between NetworkX and PyGraphviz for spring layouts of directed graphs:
import networkx as nx
import pygraphviz as pgv
import matplotlib.pyplot as ppt
edgelist = [(1,2),(1,9),(3,2),(3,9),(4,5),(4,6),(4,9),(5,9),(7,8),(7,9)]
nxd = nx.DiGraph()
nxu = nx.Graph()
gvd = pgv.AGraph(directed=True)
gvu = pgv.AGraph()
nxd.add_edges_from(edgelist)
nxu.add_edges_from(edgelist)
gvd.add_edges_from(edgelist)
gvu.add_edges_from(edgelist)
pos1 = nx.spring_layout(nxd)
nx.draw_networkx(nxd,pos1)
ppt.savefig('1_networkx_directed.png')
ppt.clf()
pos2 = nx.spring_layout(nxu)
nx.draw_networkx(nxu,pos2)
ppt.savefig('2_networkx_undirected.png')
ppt.clf()
gvd.layout(prog='neato')
gvd.draw('3_pygraphviz_directed.png')
gvu.layout(prog='neato')
gvu.draw('4_pygraphviz_undirected.png')
1_networkx_directed.png:(http://farm9.staticflickr.com/8516/8521343506_0c5d62e013.jpg)
2_networkx_undirected.png:(http://farm9.staticflickr.com/8246/8521343490_06ba1ec8e7.jpg)
3_pygraphviz_directed.png:(http://farm9.staticflickr.com/8365/8520231171_ef7784d983.jpg)
4_pygraphviz_undirected.png:(http://farm9.staticflickr.com/8093/8520231231_80c7eab443.jpg)
The third and fourth figures drawn are basically identical but for the arrowheads (the whole figure has been rotated, but apart from that, there's no difference). However, the first and second figures are differently laid out - and not just because NetworkX's layout algorithm introduces an element of randomness.
Repeatedly running the code above shows that this is not a chance occurrence. NetworkX's spring_layout
function was apparently written on the assumption that if there is an arc from one node to another, the second node should be closer to the centre of the graph than the first (i.e. that if the graph described in edgelist
is directed, node 2 should be closer to node 9 than nodes 1 and 3 are, node 6 should be closer to node 9 than node 4 is, and node 8 should be closer to node 9 than node 7 is; this doesn't always work perfectly as we see from nodes 4 and 5 in the first figure above, but that's a small issue compared to getting both 2 and 9 near the centre and the 'error' from my point of view is very slight). In other words, NetworkX's spring_layout
is both hierarchical and force-directed.
That is a nice feature, because it makes core/periphery structures more obvious in directed graphs (where, depending on the assumptions you're working with, nodes without incoming arcs can be considered to be part of the periphery even if they have large numbers of outgoing arcs). @skyebend has explained below why most layout algorithms treat directed graphs as if they were undirected, but the graphs above show (a) that NetworkX treats them differently, and (b) that it does so in a principled way that is helpful for analysis.
Can this be replicated using PyGraphviz / Graphviz?
Unfortunately the documentation and the commented source code for NetworkX's spring_layout
(actually fruchterman_reingold_layout
) function provide no clue as to why NetworkX produces the result that it does.
This is the result of using PyGraphviz to draw the network using the NetworkX spring_layout
function (see my own answer to this question below).
5_pygraphviz_plus_networkx.png:
(http://farm9.staticflickr.com/8378/8520231183_e7dfe21ab4.jpg)
Okay, I think I figured it out so I'm going to answer my own question. I don't think it can be done in PyGraphviz per se. However, one can instruct PyGraphviz to take the node positions from NetworkX but peg them (using !
) so that the neato
program is prevented from actually doing anything except rubber-stamping the node positions calculated by spring_layout
. Add the following lines of code to the above:
for k,v in pos1.iteritems():
gvd.get_node(k).attr['pos']='{},{}!'.format(v[0]*10,v[1]*10)
gvd.layout(prog='neato')
gvd.draw('5_pygraphviz_plus_networkx.png')
The result is not perfect -- I had to multiply the co-ordinates by 10 in order to stop the nodes from being drawn on top of each other, which is (obviously) a kludge -- but it's an improvement, i.e. the nodes with 0 indegree are on the outside (benefit of laying out with NetworkX) and there are proper arrowheads that don't get swallowed up by the nodes themselves (benefit of drawing with PyGraphviz).
I am aware that this isn't strictly what I asked for, though (i.e. a solution using PyGraphviz / Graphviz itself).
If somebody can provide a better solution I'll be happy!
EDIT: Nobody's provided a better solution to the problem as articulated above, so I'm going to accept my own answer to signal that it actually works. However, I'm also voting up skyebend's answer because - although it doesn't solve the problem - it's a very useful contribution to understanding the underlying issues.