I am writing code in Java that uses an unordered, rooted tree where each node may have any number of child nodes. Given a tree T and a subtree S, I want to find all the subtrees in T that match S (that is, all the subtrees in T that are isomorphic to S).
A subtree of T is isomorphic to S if the nodes of S can be mapped to nodes of T in such a way that the edges of S map to edges in T.
A previous question asked how to find whether a tree contains a given subtree; however, I want to find ALL subtrees in T that match S. In addition, I want to be able to map from each node in each match in T to the corresponding node in S.
That is, when a match is found, it should be returned not simply as a pointer to the node in T where a tree is rooted that matches S, but the match should be returned as something like a list of pairs of pointers to nodes [(T1,S1),(T2,S2),...(Tn,Sn)] such that T1 is a pointer to a node in T that maps to node S1 in the subtree and so on.
Alternatively, a list of pairs of values could simply be returned, since each node in tree T and subtree S has a unique integer identifier associated with it.
For example:
Given tree T as follows:
    a
   / \
  b   c
 / \
d   e
and subtree S as:
  x
 / \
y   z
the following list of matches should be returned:
[(a,x),(b,y),(c,z)] [(b,x),(d,y),(e,z)]
A unique match is determined by the set of nodes in T, not the mapping between the nodes in T and S.
So the following match:
[(a,x),(b,z),(c,y)]
is considered to be a duplicate of
[(a,x),(b,y),(c,z)]
because they have the same set of nodes from T (a,b,c), so only one of the matches should be returned.
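The uniqueness rule above can be sketched in Java by keying each match on the set of T node identifiers it covers. This is only an illustration under assumptions: the class name `MatchDedup` and the method `dedupe` are hypothetical, and a match is represented here as a `Map` from a T node id to the S node id it maps to, which carries the same information as the list of pairs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MatchDedup {
    /**
     * Keeps the first match seen for each distinct set of T node ids.
     * Each match maps a T node id to the S node id it corresponds to
     * (a hypothetical representation chosen for this sketch).
     */
    public static List<Map<Integer, Integer>> dedupe(List<Map<Integer, Integer>> matches) {
        Map<Set<Integer>, Map<Integer, Integer>> byTreeNodes = new LinkedHashMap<>();
        for (Map<Integer, Integer> match : matches) {
            // Two matches are duplicates when they cover the same T nodes,
            // regardless of which S node each T node is paired with.
            Set<Integer> key = new HashSet<>(match.keySet());
            byTreeNodes.putIfAbsent(key, match);
        }
        return new ArrayList<>(byTreeNodes.values());
    }
}
```

With a=1, b=2, c=3 and x=10, y=11, z=12, the matches {1=10, 2=11, 3=12} and {1=10, 2=12, 3=11} share the key {1, 2, 3}, so only the first is kept.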
As another example, given tree T:
  a
 /|\
b c d
and subtree S:
  x
 / \
y   z
the following list of matches should be returned:
[(a,x),(b,y),(c,z)] [(a,x),(b,y),(d,z)] [(a,x),(c,y),(d,z)]
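In this second example the children of x are leaves, so the distinct matches rooted at a correspond to the 2-element subsets of a's children. A minimal Java sketch of that enumeration follows; the names `ChildSubsets` and `choose` are hypothetical, and the sketch only covers this leaf-children case, not the general recursive matching.

```java
import java.util.ArrayList;
import java.util.List;

public class ChildSubsets {
    /** Returns every k-element subset of items, keeping the original order inside each subset. */
    public static <T> List<List<T>> choose(List<T> items, int k) {
        List<List<T>> out = new ArrayList<>();
        build(items, k, 0, new ArrayList<>(), out);
        return out;
    }

    private static <T> void build(List<T> items, int k, int start,
                                  List<T> current, List<List<T>> out) {
        if (current.size() == k) {
            out.add(new ArrayList<>(current)); // copy, because current is reused
            return;
        }
        for (int i = start; i < items.size(); i++) {
            current.add(items.get(i));
            build(items, k, i + 1, current, out);
            current.remove(current.size() - 1); // backtrack
        }
    }
}
```

Calling `choose` on the children [b, c, d] with k = 2 returns [[b, c], [b, d], [c, d]], which are the three child sets in the matches listed above.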
Can anyone give any example code of how to do this?
Edit (in relation to Chris Kannon's comment):
I'm thinking you want someone to code the answer for you? How far have you gotten? What code have you written? – Chris Kannon 1 hour ago
I have the following code which, when run, builds up a list (matchesList) of pointers to nodes in the tree where subtrees are rooted that match the given subtree. However, there may be multiple matching subtrees rooted at the same node, and currently each node will be added to matchesList at most once, regardless of how many matches are rooted there.
In addition, I cannot work out how to build up the mapping described above between nodes in the subtree and nodes in the match found in the original tree.
package Example;

import java.util.LinkedList;
import java.util.Vector;

public class PartialTreeMatch {
    public static void main(String[] args) {
        NodeX testTree = createTestTree();
        NodeX searchTree = createSearchTree();
        System.out.println(testTree);
        System.out.println(searchTree);
        partialMatch(testTree, searchTree);
    }

    static LinkedList<NodeX> matchesList = new LinkedList<NodeX>();

    private static boolean partialMatch(NodeX tree, NodeX searchTree) {
        findSubTreeInTree(tree, searchTree);
        System.out.println(matchesList.size());
        for (NodeX n : matchesList) {
            if (n != null) {
                System.out.println("Found: " + n);
            }
        }
        return false;
    }

    private static NodeX findSubTreeInTree(NodeX tree, NodeX node) {
        if (tree.value == node.value) {
            if (matchChildren(tree, node)) {
                matchesList.add(tree);
            }
        }
        NodeX result = null;
        for (NodeX child : tree.children) {
            result = findSubTreeInTree(child, node);
            if (result != null) {
                if (matchChildren(tree, result)) {
                    matchesList.add(result);
                }
            }
        }
        return result;
    }

    private static boolean matchChildren(NodeX tree, NodeX searchTree) {
        if (tree.value != searchTree.value) {
            return false;
        }
        if (tree.children.size() < searchTree.children.size()) {
            return false;
        }
        boolean result = true;
        int treeChildrenIndex = 0;
        for (int searchChildrenIndex = 0;
                searchChildrenIndex < searchTree.children.size();
                searchChildrenIndex++) {
            // Skip non-matching children in the tree.
            while (treeChildrenIndex < tree.children.size()
                    && !(result = matchChildren(
                            tree.children.get(treeChildrenIndex),
                            searchTree.children.get(searchChildrenIndex)))) {
                treeChildrenIndex++;
            }
            if (!result) {
                return result;
            }
        }
        return result;
    }

    private static NodeX createTestTree() {
        NodeX subTree2 = new NodeX('A');
        subTree2.children.add(new NodeX('A'));
        subTree2.children.add(new NodeX('A'));
        NodeX subTree = new NodeX('A');
        subTree.children.add(new NodeX('A'));
        subTree.children.add(new NodeX('A'));
        subTree.children.add(subTree2);
        return subTree;
    }

    private static NodeX createSearchTree() {
        NodeX root = new NodeX('A');
        root.children.add(new NodeX('A'));
        root.children.add(new NodeX('A'));
        return root;
    }
}

class NodeX {
    char value;
    Vector<NodeX> children;

    public NodeX(char val) {
        value = val;
        children = new Vector<NodeX>();
    }

    public String toString() {
        StringBuilder sb = new StringBuilder();
        sb.append('(');
        sb.append(value);
        for (NodeX child : children) {
            sb.append(' ');
            sb.append(child.toString());
        }
        sb.append(')');
        return sb.toString();
    }
}
The code above tries to find all the subgraphs in:
    A
   /|\
  A A A
     / \
    A   A
which match:
  A
 / \
A   A
The code successfully detects that there is a match rooted at the top node of the first tree and another rooted at its 3rd child. However, there are actually 3 matches rooted at the top node, not just one. In addition, the code does not build up a mapping between nodes in the tree and nodes in the subtree, and I cannot work out how to do this.
Can anyone offer any advice on how to do this?
I think your recursive method needs to return a list of partial matches, instead of just a boolean. That would go a long way to solving both your problems (the need to return the list of matches, as well as finding multiple matches).
Java-like pseudocode for the recursive function might look something like this:
findMatches(treeNode, searchNode) {
    if searchNode has no children {
        // search successful
        pairs = [] // empty list
        return [pairs] // list of lists
    }
    else {
        matches = [] // empty list
        searchChild = first child node of searchNode
        searchNode2 = searchNode with searchChild removed
        // NOTE: searchNode2 is created by doing a shallow copy of just the node
        // (not its children) and then removing searchChild from the child list.
        for each treeChild in treeNode.children {
            if treeChild.value == searchChild.value {
                treeNode2 = treeNode with treeChild removed // also a shallow copy
                childMatches = findMatches(treeChild, searchChild)
                nodeMatches = findMatches(treeNode2, searchNode2)
                // cross-product
                for each nodeMatchPairs in nodeMatches {
                    for each childMatchPairs in childMatches {
                        fullMatchPairs = [(treeChild, searchChild)]
                            + childMatchPairs + nodeMatchPairs // concatenate lists
                        add fullMatchPairs to matches
                    }
                }
            }
        }
        return matches
    }
}
Notice that this function does not test treeNode.value == searchNode.value, or add this to the list. The caller needs to do that. This function needs to be run at every node of the tree.
As currently designed, it probably uses too much memory, but that could be optimized.
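One way the pseudocode above might be turned into runnable Java is sketched below. It is a sketch under stated assumptions, not a definitive implementation: the names `SubtreeMatcher`, `Node`, `findAll`, `collect` and `matchChildren` are hypothetical, each node is assumed to carry the unique integer id mentioned in the question, and a match is stored as a `Map` from T node id to S node id rather than a list of pairs. Instead of shallow-copying nodes, it copies the child lists. Because the recursion produces every assignment of search children to tree children, the sketch also collapses matches that cover the same set of T nodes, as the question requires.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SubtreeMatcher {
    /** Hypothetical node type: a unique id, a label, and unordered children. */
    public static class Node {
        final int id;
        final char value;
        final List<Node> children = new ArrayList<>();

        public Node(int id, char value) { this.id = id; this.value = value; }

        public Node add(Node child) { children.add(child); return this; }
    }

    /** Finds matches rooted at every node of tree, one per distinct set of T node ids. */
    public static List<Map<Integer, Integer>> findAll(Node tree, Node search) {
        Map<Set<Integer>, Map<Integer, Integer>> unique = new LinkedHashMap<>();
        collect(tree, search, unique);
        return new ArrayList<>(unique.values());
    }

    // The caller's job from the answer: test the root values, add the root pair,
    // and repeat the search at every node of the tree.
    private static void collect(Node t, Node search,
                                Map<Set<Integer>, Map<Integer, Integer>> unique) {
        if (t.value == search.value) {
            for (Map<Integer, Integer> m : matchChildren(t.children, search.children)) {
                m.put(t.id, search.id);
                unique.putIfAbsent(new HashSet<>(m.keySet()), m);
            }
        }
        for (Node child : t.children) {
            collect(child, search, unique);
        }
    }

    // Assigns each search child to a distinct tree child with the same value and
    // returns every resulting mapping (T id -> S id) for the nodes below the roots.
    private static List<Map<Integer, Integer>> matchChildren(List<Node> treeKids,
                                                             List<Node> searchKids) {
        List<Map<Integer, Integer>> matches = new ArrayList<>();
        if (searchKids.isEmpty()) {
            matches.add(new HashMap<>()); // nothing left to place: one empty mapping
            return matches;
        }
        Node searchChild = searchKids.get(0);
        List<Node> restSearch = searchKids.subList(1, searchKids.size());
        for (int i = 0; i < treeKids.size(); i++) {
            Node treeChild = treeKids.get(i);
            if (treeChild.value != searchChild.value) continue;
            List<Map<Integer, Integer>> childMatches =
                    matchChildren(treeChild.children, searchChild.children);
            if (childMatches.isEmpty()) continue;
            List<Node> restTree = new ArrayList<>(treeKids);
            restTree.remove(i); // this tree child is now taken
            // Cross-product of the sibling assignments and this child's own matches.
            for (Map<Integer, Integer> rest : matchChildren(restTree, restSearch)) {
                for (Map<Integer, Integer> below : childMatches) {
                    Map<Integer, Integer> full = new HashMap<>(rest);
                    full.putAll(below);
                    full.put(treeChild.id, searchChild.id);
                    matches.add(full);
                }
            }
        }
        return matches;
    }
}
```

For the test tree in the question (a root with three children, the third of which has two children, all labelled 'A') and a search tree with two children, `findAll` returns four matches: three rooted at the top node and one rooted at its third child. The cross-product still builds every permutation before the duplicates are dropped, so memory use remains a concern for nodes with many children.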