Curriculum Vitaes

Tetsuji Kuboyama

  (久保山 哲二)

Profile Information

Affiliation
Professor, Computer Centre / Archival Science, Graduate School of Humanities, Gakushuin University
Tokyo Denki University
Degree
Ph.D.(University of Tokyo)

Researcher number
80302660
ORCID ID
 https://orcid.org/0000-0003-1590-0231
J-GLOBAL ID
200901047478411760
researchmap Member ID
5000102916

External link

Education

 1

Papers

 127

Misc.

 131
  • KUBOYAMA Tetsuji, SHIN Kilho
    IPSJ SIG Notes, 2005(91) 9-16, Sep 16, 2005  
    With the rapid growth of semistructured data such as XML and HTML documents on the Internet, efficient methods for comparing, matching and integrating semistructured data are required. Although there have been diversity of these methods recently, no comprehensive framework has been available. We formulated a new framework of approximate tree matching in prior work. In this framework, the semantics of approximate tree matching is defined as the conditions of node-to-node correspondences between two trees. In this paper, we present two efficient algorithms for identifying a class of approximate tree matching from the node-to-node correspondences. These algorithms provide methods for merging two trees based on the node-to-node correspondences in the class identifying process if the two trees can be merged into one trees consistently.
  • OHKURA Nobuhito, HIRATA Kouichi, KUBOYAMA Tetsuji, HARAO Masateru
    IEICE technical report. Theoretical foundations of Computing, 105(273) 25-29, Sep 8, 2005  
    In this paper, we investigate the q-gram distance for ordered unlabeled trees (trees, for short). First, we formulate a q-gram as simply a tree with q nodes isomorphic to a line graph, and the q-gram distance between two trees as similar as one between two strings. Then, by using the depth sequence based on postorder, we design the algorithm EnumGram to enumerate all q-grams in a tree T with n nodes which runs in O(n^2) time and in O(q) space. Furthermore, we improve it to the algorithm LinearEnumGram which runs in O(qn) time and in O(qd) space, where d is the depth of T. Hence, we can evaluate the q-gram distance D_q(T_1, T_2) between T_1 and T_2 in O(qmax{n_1, n_2}) time and in O(qmax{d_1, d_2}) space, where n_i and d_i are the number of nodes in T_i and the depth of T_i, respectively.
  • FUKUSHIMA Hitomi, HAYASHI Hiroshi, HARA Kenzo, KUBOYAMA Tetsuji, SUZUKI Tsuneo, HIRABARU Kiyomitsu, SEZAKI Kaoru, SHIBASAKI Ryosuke
    Monthly journal of the Institute of Industrial Science, University of Tokyo, 57(5) 79-84, Sep, 2005  
  • KUBOYAMA Tetsuji, SHIN Kilho, YASUDA Hiroshi
    IPSJ SIG Notes, 2005(42) 47-54, May 19, 2005  
    With the rapid growth of semistructured data such as XML and HTML documents on the Internet, we need efficient methods for comparing, matching and integrating semistructured data. Although there have been diversity of these methods recently, no comprehensive framework has been available. In this paper, we formulate and provide a new framework for merging semistructured data. In this framework, we firstly find a set of node-to-node correspondences between two trees by approximate matching based on tree edit distance. Then, we merge two trees by overlaying these corresponding nodes, and locating the other nodes so that each ancestor relation in two trees is preserved. This mathematical framework gives a unifying view of matching and merging semistructured data, and is beneficial to efficient implementations.
  • KUBOYAMA Tetsuji, SHIN Kilho, YASUDA Hiroshi
    IPSJ SIG Notes, 2005(42) 47-54, May 19, 2005  
    With the rapid growth of semistructured data such as XML and HTML documents on the Internet, we need efficient methods for comparing, matching and integrating semistructured data. Although there have been diversity of these methods recently, no comprehensive framework has been available. In this paper, we formulate and provide a new framework for merging semistructured data. In this framework, we firstly find a set of node-to-node correspondences between two trees by approximate matching based on tree edit distance. Then, we merge two trees by overlaying these corresponding nodes, and locating the other nodes so that each ancestor relation in two trees is preserved. This mathematical framework gives a unifying view of matching and merging semistructured data, and is beneficial to efficient implementations.
  • 久保山 哲二, 申 吉浩, 安田 浩
    情報処理学会研究報告情報学基礎(FI), 2005(42) 47-54, May 19, 2005  
    インターネット上の XML や HTML 文書等の半構造データの増大にともない、膨大な半構造データを効率よく比較・照合するための手法や、複数の半構造データを統合するための手法が求められている。これまでに半構造データのための様々な照合・結合手法が提案されているものの、一般的なフレームワークが存在しないため同様の手法が独立に繰り返し提案されることも少なくない。本稿では、2 つの半構造データを結合するための一般的かつ理論的なフレームワークを提供する。本手法では、まず、木の編集距離の概念を用いて2つの木の近似照合を行い、2つの木の間で類似しているノードの対応をとる。次に、対応のとれたノード同士を重ね合わせ、その他の全てのノードが、もとの木における階層構造を保つように新しい木を生成する。このような数理的なフレームワークによって、様々な照合・統合手法に統一的な観点を提供し、効率的な実装のための基礎を与える。With the rapid growth of semistructured data such as XML and HTML documents on the Internet, we need efficient methods for comparing, matching and integrating semistructured data. Although there have been diversity of these methods recently, no comprehensive framework has been available. In this paper, we formulate and provide a new framework for merging semistructured data. In this framework, we firstly find a set of node-to-node correspondences between two trees by approximate matching based on tree edit distance. Then, we merge two trees by overlaying these corresponding nodes, and locating the other nodes so that each ancestor relation in two trees is preserved. This mathematical framework gives a unifying view of matching and merging semistructured data, and is beneficial to efficient implementations.
  • 久保山 哲二, 申 吉浩, 安田 浩
    情報処理学会研究報告データベースシステム(DBS), 2005(42) 47-54, May 19, 2005  
    インターネット上の XML や HTML 文書等の半構造データの増大にともない、膨大な半構造データを効率よく比較・照合するための手法や、複数の半構造データを統合するための手法が求められている。これまでに半構造データのための様々な照合・結合手法が提案されているものの、一般的なフレームワークが存在しないため同様の手法が独立に繰り返し提案されることも少なくない。本稿では、2 つの半構造データを結合するための一般的かつ理論的なフレームワークを提供する。本手法では、まず、木の編集距離の概念を用いて2つの木の近似照合を行い、2つの木の間で類似しているノードの対応をとる。次に、対応のとれたノード同士を重ね合わせ、その他の全てのノードが、もとの木における階層構造を保つように新しい木を生成する。このような数理的なフレームワークによって、様々な照合・統合手法に統一的な観点を提供し、効率的な実装のための基礎を与える。With the rapid growth of semistructured data such as XML and HTML documents on the Internet, we need efficient methods for comparing, matching and integrating semistructured data. Although there have been diversity of these methods recently, no comprehensive framework has been available. In this paper, we formulate and provide a new framework for merging semistructured data. In this framework, we firstly find a set of node-to-node correspondences between two trees by approximate matching based on tree edit distance. Then, we merge two trees by overlaying these corresponding nodes, and locating the other nodes so that each ancestor relation in two trees is preserved. This mathematical framework gives a unifying view of matching and merging semistructured data, and is beneficial to efficient implementations.
  • Kuboyama Tetsuji, Shin Kilho, Miyahara Tetsuhiro
    RIMS Kokyuroku, 1426 20-25, Apr, 2005  
  • 久保山 哲二, 申吉浩, 宮原 哲浩
    情報処理学会研究報告数理モデル化と問題解決(MPS), 2005(20) 21-24, Mar 10, 2005  Lead author
    木の近似照合は広い適用領域をもち、半構造化文書やRNA2次構造の類似性判定、XMLのスキーマ発見・統合、プログラムの差分検出をはじめとする様々な分野で、独立に多様なアルゴリズムが提案されている。木の近似照合アルゴリズムの多くは、編集距離による操作的な記述により特徴づけられてきたが、独立して提案されてきたこれらの様々なアルゴリズムの関連性については、ほとんど研究されていない。本論文では、編集距離に基づく既存の様々な木の近似照合を統一的に記述するための数学的モデルを提案する。文字列においては、アラインメントと編集距離が、その計算において等価であるが、これを木に拡張した場合、両者が等価ではなくなることが知られている。 本提案モデルを用いて、木のアラインメントと等価な編集距離のクラスを同定する。すなわち、従来、別々のアルゴリズムであると考えられていた木のアラインメントと less-constrainded 編集距離が等価であることを示す。The notion of tree edit distance provides a unifying framework for measuring distance and finding approximate common patterns between two trees. In prior work, the edit distance measures have been not well-formalized. So the essentially equivalent distance measures have been independently proposed as the measures different from each other. In this paper, we present a theoretical framework for tree edit distance. By using our framework, we establish the relationship between alignment of trees, and a tree edit distance measure called less-constrained edit distance.
  • KUBOYAMA Tetsuji, SHIN Kilho, MIYAHARA Tetsuhiro
    情報処理学会研究報告. MPS, 数理モデル化と問題解決研究報告, 53 21-24, Mar 9, 2005  
  • KUBOYAMA Tetsuji, SHIN Kilho, MIYAHARA Tetsuhiro
    IPSJ SIG Notes, 2005(20) 21-24, Mar 9, 2005  
    The notion of tree edit distance provides a unifying framework for measuring distance and finding approximate common patterns between two trees. In prior work, the edit distance measures have been not well-formalized. So the essentially equivalent distance measures have been independently proposed as the measures different from each other. In this paper, we present a theoretical framework for tree edit distance. By using our framework, we establish the relationship between alignment of trees, and a tree edit distance measure called less-constrained edit distance.
  • KUBOYAMA Tetsuji, SHIN Kilho, MIYAHARA Tetsuhiro
    Technical report of IEICE. PRMU, 104(669) 97-102, Feb 17, 2005  
    A shock tree is a very effective characterization for representing 2D shapes. Torsello and Hancock have proposed an unsupervised learning method of tree structures for shock tree matching. The core procedure of this method is based on merging two trees using a tree edit distance measure. This method, however, does not guarantee that the structure obtained by the merging is also a tree. Then the method adopts an ad-hoc way, which discards the result unless it is a tree, and tries with the other pair of trees. This paper shows the critical condition to obtain a tree structure by merging two trees. The condition is obtained by showing that alignment of trees is identical to a variant of edit distance, called less-constrained edit distance. In addition, we study a condition for yielding a unique supertree of two trees.
  • KUBOYAMA Tetsuji, SHIN Kilho, MIYAHARA Tetsuhiro
    IEICE technical report. Natural language understanding and models of communication, 104(667) 97-102, Feb 17, 2005  
    A shock tree is a very effective characterization for representing 2D shapes. Torsello and Hancock have proposed an unsupervised learning method of tree structures for shock tree matching. The core procedure of this method is based on merging two trees using a tree edit distance measure. This method, however, does not guarantee that the structure obtained by the merging is also a tree. Then the method adopts an ad-hoc way, which discards the result unless it is a tree, and tries with the other pair of trees. This paper shows the critical condition to obtain a tree structure by merging two trees. The condition is obtained by showing that alignment of trees is identical to a variant of edit distance, called less-constrained edit distance. In addition, we study a condition for yielding a unique supertree of two trees.
  • Ohkura Nobuhito, Hirata Kouichi, Kuboyama Tetsuji, Harao Masateru
    Record of JCEEE in Kyushu, 2005 470-470, 2005  
  • 久保山 哲二, 申 吉浩, 宮原 哲浩
    人工知能学会全国大会論文集, 5 181-181, 2005  
    複数の木に共通する木構造パタンを発見するための新しい手法について提案する。本手法では、木の編集距離に基づき、木の近似照合を行い、複数の木のノード間の対応を求めることにより、パタン発見を行う。
  • 久保山 哲二, 申 吉浩, 宮原 哲浩
    人工知能学会全国大会論文集, 19 1-4, 2005  
  • Kuboyama Tetsuji, Shin Kilho, Miyahara Tetsuhiro
    人工知能基本問題研究会, 57 41-46, Nov 4, 2004  
  • KUBOYAMA Tetsuji, MIYAHARA Tetsuhiro
    IEICE technical report. Dependable computing, 104(347) 19-24, Oct 12, 2004  
    Recent research efforts on extracting information from Web pages have mainly focused on semi-automatic and automatic approaches to generating Web wrappers. This paper aim at establishing a structure-based approach to finding a common structured pattern from semistructured data such as HTML documents and XML documents through approximate tree matching by a tree edit distance measure for generating Web wrappers. The common structured pattern is generated by finding a similarity among parsed trees of Web pages, and merging these trees by alignment of trees. Each node of the pattern tree is weighted according to its frequency of occurrence in the tree. We present a method for generating Web wrappers from manually edited Web pages including a number of grammatical mistakes in HTML, redundant or missing fragments.
  • KUBOYAMA Tetsuji, MIYAHARA Tetsuhiro
    IEICE technical report. Data engineering, 104(345) 19-24, Oct 12, 2004  
    Recent research efforts on extracting information from Web pages have mainly focused on semi-automatic and automatic approaches to generating Web wrappers. This paper aim at establishing a structure-based approach to finding a common structured pattern from semistructured data such as HTML documents and XML documents through approximate tree matching by a tree edit distance measure for generating Web wrappers. The common structured pattern is generated by finding a similarity among parsed trees of Web pages, and merging these trees by alignment of trees. Each node of the pattern tree is weighted according to its frequency of occurrence in the tree. We present a method for generating Web wrappers from manually edited Web pages including a number of grammatical mistakes in HTML, redundant or missing fragments.
  • KUBOYAMA Tetsuji, YASUHARA Akira, MIYAHRA Tetsuhiro
    IEICE technical report. Theoretical foundations of Computing, 104(317) 25-32, Sep 10, 2004  
    A number of edit-based distance measures have been proposed to find a structural similarity between trees. Alignment of trees is one of these measures, which gives a common supertree of two trees under minor containment. It is well-known that finding an alignment is equivalent to computing an edit distance for sequences. This equivalence, however, does not hold for trees. In this paper, we establish the relationship between tree edit distance and alignment of trees by showing that the mapping condition for alignment of trees is identical to the condition for a variant of edit distance, called less-constrained edit distance. Moreover, we show a mapping condition for yielding a unique supertree of two trees.
  • 久保山 哲二, 安原 晃, 宮原 哲浩
    人工知能基本問題研究会, 56 67-72, Jul 25, 2004  
  • 久保山 哲二, 宮原 哲浩
    人工知能学会全国大会論文集, 4 259-259, 2004  
    木構造を埋め込める変数を含む順序木で記述した木構造パタンと半構造データの編集距離を計算することにより変数部分の情報抽出を行う手法について提案する。
  • 久保山 哲二, 宮原 哲浩
    人工知能学会全国大会論文集, 18 1-4, 2004  
  • Miyahara Tetsuhiro, Shoudai Takayoshi, Uchida Tomoyuki, Kuboyama Tetsuji, Takahashi Kenichi, Ueda Hiroaki
    IEICE technical report. Artificial intelligence and knowledge-based processing, 99(447) 59-64, Nov 19, 1999  
    We present a method for discovering knowledge from graph structured data, especially tree structured data. A term tree is a data structure which represents a pattern consisting of variables and tree-like structures. A term tree is more powerful than the standard representation of a first order term. In order to realize an efficient knowledge discovery system from tree structured data, we have given a polynomial time algorithm for deciding whether a term tree and a tree are matched or not. We have implemented the matching algorithm. Experimental results show that the matching algorithm is efficient and useful in knowledge discovery from tree structured data Also we discuss an application of the algorithm to a knowledge discovery system.
  • MIYAHARA Tetsuhiro, SHOUDAI Takayoshi, UCHIDA Tomoyuki, KUBOYAMA Tetsuji, TAKAHASHI Kenichi, UEDA Hiroaki
    Proceedings of the Annual Conference of JSAI, 13 389-392, Jun 15, 1999  
  • MATSUMOTO Takashi, KOSHIMURA Miyuki, KUBOYAMA Tetsuji
    Research reports on information science and electrical engineering of Kyushu University, 2(1) 75-80, Mar, 1997  
  • KUBOYAMA Tetsuji, MATSUMOTO Takashi, HASEGAWA Ryuzo
    Research reports on information science and electrical engineering of Kyushu University, 2(1) 81-86, Mar, 1997  
  • 久保山 哲二
    人工知能学会誌 = Journal of Japanese Society for Artificial Intelligence, 9(3) 464-464, May 1, 1994  

Teaching Experience

 15

Research Projects

 33