OHKURA Nobuhito, HIRATA Kouichi, KUBOYAMA Tetsuji, HARAO Masateru
IEICE technical report. Theoretical foundations of Computing, 105(273) 25-29, Sep 8, 2005
In this paper, we investigate the q-gram distance for ordered unlabeled trees (trees, for short). First, we formulate a q-gram as simply a tree with q nodes isomorphic to a line graph, and the q-gram distance between two trees as similar as one between two strings. Then, by using the depth sequence based on postorder, we design the algorithm EnumGram to enumerate all q-grams in a tree T with n nodes which runs in O(n^2) time and in O(q) space. Furthermore, we improve it to the algorithm LinearEnumGram which runs in O(qn) time and in O(qd) space, where d is the depth of T. Hence, we can evaluate the q-gram distance D_q(T_1, T_2) between T_1 and T_2 in O(qmax{n_1, n_2}) time and in O(qmax{d_1, d_2}) space, where n_i and d_i are the number of nodes in T_i and the depth of T_i, respectively.