Bottom-up Subtree Isomorphism for Unordered Labeled Trees

A subtree P of a given tree T is a bottom-up subtree of T if, for each internal vertex u of P, all the children of u in T are also vertices of P. Our problem is deciding if a pattern tree P of m vertices is a bottom-up subtree of a text tree T of n vertices, m <= n, and, in the affirmative case, finding all the occurrences of P in T. If the vertices are labeled, the labels in the corresponding positions in P and T must match. We consider unordered rooted trees with labeled vertices, which are reconducted to simpler ordered trees through an ordering algorithm. Processing time depends on the label format. If the labels are single characters of a constant or of an n-integer alphabet \Sigma (respectively, |\Sigma| = constant, or \Sigma is composed of integers in the range [1 \div n]), the problem is solved in O(m +\log n) time and \Theta(m) additional space, after a preprocessing of T is done in \Theta(n) time and \Theta(n) additional space. The number of occurrences of P in T does not appear in the search time because all such occurrences can be directly reconstructed from a constant output information. For more complex labelsthe running times slightly increase. In particular if the labels are sequences of characters (e.g. as in XML files) the running time becomes a function of the total length of all the labels in T and P. Although directed to the "simple problem" of exact subtree matching, our work is the first one to solve it in linear time for unordered trees. Moreover our approach is the one of dictionary search. Regarding $T$ as a static text on which several queries are made, $P$ as the contents of one such a query, and assuming $m=o(n)$, the response time for each $P$ is sublinear in the size of the overall structure.