How to count all reachable nodes in a directed graph?

user8142520 picture user8142520 · Jan 22, 2018 · Viewed 8.3k times · Source

There is a directed graph (which might contain cycles), and each node has a value on it, how could we get the sum of reachable value for each node. For example, in the following graph:

directed graph

the reachable sum for node 1 is: 2 + 3 + 4 + 5 + 6 + 7 = 27

the reachable sum for node 2 is: 4 + 5 + 6 + 7 = 22

.....

My solution: To get the sum for all nodes, I think the time complexity is O(n + m), the n is the number of nodes, and m stands for the number of edges. DFS should be used,for each node we should use a method recursively to find its sub node, and save the sum of sub node when finishing the calculation for it, so that in the future we don't need to calculate it again. A set is needed to be created for each node to avoid endless calculation caused by loop.

Does it work? I don't think it is elegant enough, especially many sets have to be created. Is there any better solution? Thanks.

Answer

amit picture amit · Jan 22, 2018

This can be done by first finding Strongly Connected Components (SCC), which can be done in O(|V|+|E|). Then, build a new graph, G', for the SCCs (each SCC is a node in the graph), where each node has value which is the sum of the nodes in that SCC.

Formally,

G' = (V',E')
Where V' = {U1, U2, ..., Uk | U_i is a SCC of the graph G}
E' = {(U_i,U_j) | there is node u_i in U_i and u_j in U_j such that (u_i,u_j) is in E }

Then, this graph (G') is a DAG, and the question becomes simpler, and seems to be a variant of question linked in comments.

EDIT previous answer (striked out) is a mistake from this point, editing with a new answer. Sorry about that.

Now, a DFS can be used from each node to find the sum of values:

DFS(v):
  if v.visited:
    return 0
  if v is leaf:
    return v.value
  v.visited = true
  return sum([DFS(u) for u in v.children])
  • This is O(V^2 + VE) worst vase, but since the graph has less nodes, V and E are now significantly lower.
  • Some local optimizations can be made, for example, if a node has a single child, you can reuse the pre-calculated value and not apply DFS on the child again, since there is no fear of counting twice in this case.

A DP solution for this problem (DAG) can be:

D[i] = value(i) + sum {D[j] | (i,j) is an edge in G' }

This can be calculated in linear time (after topological sort of the DAG).

Pseudo code:

  1. Find SCCs
  2. Build G'
  3. Topological sort G'
  4. Find D[i] for each node in G'
  5. apply value for all node u_i in U_i, for each U_i.

Total time is O(|V|+|E|).