研究者業績

久保山 哲二

クボヤマ テツジ  (Tetsuji Kuboyama)

基本情報

所属
学習院大学 計算機センター / 人文科学研究科アーカイブズ学専攻 教授
東京電機大学 総合研究所・知能創発研究所 客員教授
学位
博士(工学)(東京大学)

研究者番号
80302660
ORCID ID
 https://orcid.org/0000-0003-1590-0231
J-GLOBAL ID
200901047478411760
researchmap会員ID
5000102916

外部リンク

学歴

 1

論文

 126
  • Kilho Shin, Tetsuji Kuboyama
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 25(5) 1040-1054 2010年9月  査読有り
    Haussler's convolution kernel provides an effective framework for engineering positive semidefinite kernels, and has a wide range of applications. On the other hand, the mapping kernel that we introduce in this paper is its natural generalization, and will enlarge the range of application significantly. Our main theorem with respect to positive semidefiniteness of the mapping kernel (1) implies Haussler's theorem as a corollary, (2) exhibits an easy-to-check necessary and sufficient condition for mapping kernels to be positive semidefinite, and (3) formalizes the mapping kernel so that significant flexibility is provided in engineering new kernels. As an evidence of the effectiveness of our results, we present a framework to engineer tree kernels. The tree is a data structure widely used in many applications, and tree kernels provide an effective method to analyze tree-type data. Thus, not only is the frame work important as an example but also as a practical research tool. The description of the framework accompanies a survey of the tree kernels in the literature, where we see that 18 out of the 19 surveyed tree kernels of different types are instances of the mapping kernel, and examples of novel interesting tree kernels.
  • Taku Aratsu, Kouichi Hirata, Tetsuji Kuboyama
    FUNDAMENTA INFORMATICAE 101(3) 157-171 2010年  査読有り
    This article proposes an approximation of the tree edit distance through the string edit distance for binary tree codes, instead of for Euler strings introduced by Akutsu (2006). Here, a binary tree code is a string obtained by traversing a binary tree representation with two kinds of dummy nodes of a tree in preorder. Then, we show that sigma/2 <= tau <= (h + 1)sigma + h, where tau is the tree edit distance between trees, and sigma is the string edit distance between their binary tree codes and h is the minimum height of the trees.
  • Kengo Yoshida, Tetsuhiro Miyahara, Tetsuji Kuboyama
    2010 2ND INTERNATIONAL CONFERENCE ON COMPUTER AND AUTOMATION ENGINEERING (ICCAE 2010), VOL 5 749-753 2010年  査読有り
    We propose a new genetic programming (GP) approach to extracting multiple tree structured patterns from tree structured data using soft clustering. We use a set of multiple tree structured patterns, called tag tree patterns, as a combined pattern. A structured variable in a tag tree pattern can be substituted by an arbitrary tree. A set of multiple tag tree patterns matches a tree, if at least one of the set of patterns matches the tree. Using soft clustering is appropriate because one tree structured data is allowed to match multiple tag tree patterns. By soft clustering of positive data and by running GP subprocesses on each cluster with negative data, we make a combined pattern which consists of best individuals in GP subprocesses. Experiments on some glycan data show that our method has a support of about 0.8, while the previous method for evolving single patterns has a support of about 0.5.
  • Kilho Shin, Tetsuji Kuboyama
    THEORETICAL COMPUTER SCIENCE 410(19) 1847-1862 2009年4月  査読有り
    Polynomials have proven to be useful tools to tailor generic kernels to specific applications. Nevertheless, we had only restricted knowledge for selecting fertile polynomials which consistently produce positive semidefinite kernels. For example, the well-known polynomial kernel can only take advantage of a very narrow range of polynomials, that is, the univariate polynomials with positive coefficients. This restriction not only hinders intensive exploitation of the flexibility of the kernel method, but also causes misuse of indefinite kernels. Our main theorem significantly relaxes the restriction by asserting that a polynomial consistently produces positive semidefinite kernels, if it has a positive semidefinite coefficient matrix. This sufficient condition is quite natural, and hence, it can be a good characterization of the fertile polynomials. In fact, we prove that the converse of the assertion of the theorem also holds true in the case of degree 1. We also prove the effectiveness of our main theorem by showing three corollaries relating to certain applications known in the literature: the first and second corollaries, respectively, give generalizations of the polynomial kernel and the principal-angle (determinant) kernel. The third corollary shows extended and corrected sufficient conditions for the codon-improved kernel and the weighted-degree kernel with shifts to be positive semidefinite. (C) 2009 Elsevier B.V. All rights reserved.
  • Toshihiko Horiike, Youhei Takahashi, Tetsuji Kuboyama, Hiroshi Sakamoto
    KNOWLEDGE-BASED AND INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT II, PROCEEDINGS 5712 472-+ 2009年  査読有り
    In this paper we propose an algorithm, which is an improvement, of identification of wet) communities by [1], to extract research communities from bibliography data. Web graph is huge graph structure consisting nodes and edges, which represent web pages and hyperlinks. An web community is considered to be a set of web pages holding a common topic, in other words, it is a dense subgraph of web graph. Such subgraphs obtained by the max-flow algorithm [1] are called max-flow communities. We then improve this algorithm by introducing the strategy for selection of community nodes. The effectiveness of our improvement is shown by experiments on finding research communities from CiteSeer bibliography data.
  • Taku Aratsu, Kouichi Hirata, Tetsuji Kuboyama
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 5433 99-110 2009年  査読有り
    In this paper, we introduce a sibling distance ds for rooted labeled trees as an L 1-distance between their sibling histograms, which consist of the frequencies of every pair of the label of a node and the sequence of labels of its children. Then, we show that δ s gives a constant factor lower bound on the tree edit distance δ such that δ s(T 1, T 2) ≤ 4δ(T1, T2). Next, we design the algorithm to compute the sibling histogram in O(n) time for ordered trees and in O(gn) time for unordered trees, where n and g are the number of nodes and the degree of a tree. Finally, we give experimental results by applying the sibling distance to glycan data. © Springer-Verlag Berlin Heidelberg 2009.
  • Taku Aratsu, Kouichi Hirata, Tetsuji Kuboyama
    Proc. 6th Workshop on Learning with Logics and Logics for Learning 11-18 2009年  
  • Taku Aratsu, Kouichi Hirata, Tetsuji Kuboyama
    Proc. 35th International Conference on Current Trends in Theory and Practice of Computer Science,Lecture Notes in Computer Science 5404 93-104 2009年  
  • Kazuya Sata, Kouichi Hirata, Kimihito Ito, Tetsuji Kuboyama
    Knowledge-Based and Intelligent Information and Engineering Systems, 13th International Conference, KES 2009, Santiago, Chile, September 28-30, 2009, Proceedings, Part II 5712 490-497 2009年  査読有り
  • Masatoshi Nagamine, Tetsuhiro Miyahara, Tetsuji Kuboyama, Hiroaki Ueda, Kenichi Takahashi
    AI 2008: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS 5360 500-+ 2008年  査読有り
    We propose a new genetic programming approach to extraction of multiple tree structured patterns from tree-structured data using clustering. As a combined pattern we use a set of tree structured patterns, called tag tree patterns. A structured variable in a tag tree pattern can be substituted by an arbitrary tree. A set of tag tree patterns matches a tree, if at least one of the set of patterns matches the tree. By clustering positive data and running GP subprocesses on each cluster with negative data, we make a combined pattern which consists of best individuals in GP subprocesses. The experiments on some glycan data show that our proposed method has a higher support of about 0.8 while the previous method for evolving single patterns has a lower support of about 0.5.
  • Kilho Shin, Tetsuji Kuboyama
    AI 2008: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS 5360 236-246 2008年  査読有り
    The MAST (maximum agreement subtrees) problem has been extensively studied, and the size of the maximum agreement subtrees between two trees represents their similarity. This similarity measure, however, only takes advantage of a very small portion of the agreement subtrees, that is, the maximum agreement subtrees, and agreement subtrees of smaller size are neglected at all. On the other hand, it is reasonable to consider that the distributions of the sizes of the agreement subtrees may carry useful information with respect to similarity. Based on the notion of the size-of-index-structure-distribution kernel introduced by Shin and Kuboyama, the present paper introduces positive semidefinite tree-kernels, which evaluate distributional features of the sizes of agreement subtrees, and shows efficient dynamic programming algorithms to calculate the kernels. In fact, the algorithms are of O(vertical bar x vertical bar . vertical bar y vertical bar)-time for labeled and ordered trees x and y. In addition, the algorithms are designed so that the agreement subtrees have roots and leaves with labels from predetermined sub-domains of an alphabet. This design will be very useful for important applications such as the XML documents.
  • Taku Aratsu, Kouichi Hirata, Tetsuji Kuboyama
    Proc. ALSIP'08, Working Notes of PAKDD Workshops 5433 101-112 2008年  
  • Kuboyama, T., Hirata, K., Aoki-Kinoshita, K.F.
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 5012 LNAI 184-+ 2008年  査読有り
  • Kilho Shin, Tetsuji Kuboyama
    ALGORITHMIC LEARNING THEORY, PROCEEDINGS 4754 313-+ 2007年  査読有り
    Although polynomials have proven to be useful tools to tailor generic kernels to context of specific applications, little was known about generic rules for tuning parameters (i. e. coefficients) to engineer new positive semidefinite kernels. This not only may hinder intensive exploitation of the flexibility of the kernel method, but also may cause misuse of indefinite kernels. Our main theorem presents a sufficient condition on polynomials such that applying the polynomials to known positive semidefinite kernels results in positive semidefinite kernels. The condition is very simple and therefore has a wide range of applications. In addition, in the case of degree 1, it is a necessary condition as well. We also prove the effectiveness of our theorem by showing three corollaries to it: the first one is a generalization of the polynomial kernels, while the second one presents a way to extend the principal-angle kernels, the trace kernels, and the determinant kernels. The third corollary shows corrected sufficient conditions for the codon-improved kernels and the weighted-degree kernels with shifts to be positive semidefinite.
  • Masatoshi Nagamine, Tetsuhiro Miyahara, Tetsuji Kuboyama, Hiroaki Ueda, Kenichi Takahashi
    AI 2007: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS 4830 150-+ 2007年  査読有り
    We propose a genetic programming approach to extraction of glycan motifs by using tag tree patterns, which are tree structured patterns with structured variables. A structured variable in a tag tree pattern can be substituted by an arbitrary tree. Our experiments show that we have obtained tag tree patterns as the best individuals including similar substructures of glycan motifs obtained by the previous works.
  • 人工知能学会論文誌 = Transactions of the Japanese Society for Artificial Intelligence : AI 22(2) 140-147 2007年  査読有り
    Learning from tree-structured data has received increasing interest with the rapid growth of tree-encodable data in the World Wide Web, in biology, and in other areas. Our kernel function measures the similarity between two trees by counting the number of shared sub-patterns called tree <I>q</I>-grams, and runs, in effect, in linear time with respect to the number of tree nodes. We apply our kernel function with a support vector machine (SVM) to classify biological data, the glycans of several blood components. The experimental results show that our kernel function performs as well as one exclusively tailored to glycan properties.
  • Nobuhito Ohkura, Kouichi Hirata, Tetsuji Kuboyama, Masateru Harao, Shin-Ichi Nakano
    Proc. 4th Workshop on Learning with Logics and Logics for Learning (LLLL2006) 69-77 2006年  
  • Tetsuji Kuboyama, Kouichi Hirata, Nobuhito Ohkura, Masateru Harao
    Proc. 4th Workshop on Learning with Logics and Logics for Learning (LLLL2006) 77-83 2006年  
  • Kuboyama Tetsuji, Shin Kilho, Miyahara Tetsuhiro
    情報処理学会論文誌. 数理モデル化と応用 46(17) 31-45 2005年12月15日  
    The notion of the tree edit distance provides a unifying framework for measuring distance and finding approximate common patterns between two trees. A diversity of tree edit distance measures have been proposed to deal with tree related problems, such as minor containment, maximum common subtree isomorphism, maximum common embedded subtree, and alignment of trees. These classes of problems are characterized by the conditions of the tree mappings, which specify how to associate the nodes in one tree with the nodes in the other. In this paper, we study the declarative semantics of edit distance measures based on the tree mapping. In prior work, the edit distance measures have been not well-formalized. So the relationship among various algorithms based on the tree edit distance has hardly been studied. Our framework enables us to study the relationship. By using our framework, we reveal the declarative semantics of the alignment of trees, which has remained unknown in prior work.
  • Tetsuji Kuboyama, Tetsuhiro Miyahara, Sachio Hirokawa, Eisuke Itou
    WMSCI 2005 - The 9th World Multi-Conference on Systemics, Cybernetics and Informatics, Proceedings 1 42-47 2005年12月1日  査読有り
    Information extraction from semistructured data such as HTML documents gains importance with the unflagging growth of Web data storage. This paper proposes a structure-based method for extracting Web contents and their metadata from a set of HTML documents generated from a common template, as shown in syllabus and staff data in universities. These HTML documents include a number of grammatical mistakes in HTML, redundant or missing fragments introduced by manual editing. This method first finds a canonical HTML document compliant with the common template. Next, the correspondences of the data between the canonical document and the other documents are identified by an approximate matching algorithm, and aligned according to the correspondences of the data. Experiments have been conducted to extract attribute names for metadata construction and, to align data records from syllabus Web pages in universities.
  • Kuboyama Tetsuji, Shin Kilho, Miyahara Tetsuhiro
    IPSJ Digital Courier 1 654-668 2005年  
    The notion of the tree edit distance provides a unifying framework for measuring distance and finding approximate common patterns between two trees. A diversity of tree edit distance measures have been proposed to deal with tree related problems, such as minor containment, maximum common subtree isomorphism, maximum common embedded subtree, and alignment of trees. These classes of problems are characterized by the conditions of the tree mappings, which specify how to associate the nodes in one tree with the nodes in the other. In this paper, we study the declarative semantics of edit distance measures based on the tree mapping. In prior work, the edit distance measures have been not well-formalized. So the relationship among various algorithms based on the tree edit distance has hardly been studied. Our framework enables us to study the relationship. By using our framework, we reveal the declarative semantics of the alignment of trees, which has remained unknown in prior work.
  • Tetsuji Kuboyama, Kilho Shin, Tetsuhiro Miyahara, Hiroshi Yasuda
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 3701 323-337 2005年  査読有り
    The problem of comparing two tree structures emerges across a wide range of applications in computational biology, pattern recognition, and many others. A number of tree edit methods have been proposed to find a structural similarity between trees. The alignment of trees is one of these methods, introduced as a natural extension of the alignment of strings, which gives a common supertree pattern of two trees, whereas tree edit gives a common subtree pattern. It is well known that alignment and edit are two equivalent notions for strings from the computational point of view. This equivalence, however, does not hold for trees. The lack of a theoretical formulation of these notions has lead to confusion. In this paper, we give a theoretical analysis of alignment and edit methods, and show an important relationship, which is the equivalence between the the alignment of trees and a variant of tree edit, called less-constrained edit. © Springer-Verlag Berlin Heidelberg 2005.
  • Nobuhito Ohkura, Kouichi Hirata, Tetsuji Kuboyama, Masateru Harao
    Proc. 8th International Conference on Discovery Science (DS2005), Lecture Notes in Artificial Intelligence 3735 189-202 2005年  
  • T Miyahara, T Uchida, T Shoudai, T Kuboyama, K Takahashi, H Ueda
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E84D(1) 48-56 2001年1月  査読有り
    We present a new method for discovering knowledge from structured data which are represented Ly graphs in the framework of Inductive Logic Programming. A graph, or network, is widely used for representing relations between various data and expressing a small and easily understandable hypothesis. The analyzing system directly manipulating graphs is useful for knowledge discovery. Our method uses Formal Graph System (FGS) as a knowledge representation language for graph structured data. FGS is a kind of logic programming system which directly deals with graphs just like first order terms. And our method employs a refutably inductive inference algorithm as a learning algorithm. A refutably inductive inference algorithm is a special type of inductive inference algorithm with refutability of hypothesis spaces. and is suitable for knowledge discovery. We give a sufficiently large hypothesis space, the set of weakly reducing FGS programs. And we show that this hypothesis space is refutably inferable from complete data. We have designed and implemented a prototype of a knowledge discovery system KD-FGS, which is based on our method and acquires knowledge directly from graph structured data. Finally we discuss the applicability of our method for graph structured data with experimental results on some graph theoretical notions.
  • T. Miyahara, T. Uchida, T. Kuboyama, T. Yamamoto, K. Takahashi, H. Ueda
    Proc. Third Pacific-Asia Conference on Methodologies for Knowledge Discovery and Data Mining (PAKDD-99) 1574 438-442 1999年4月26日  査読有り
  • T Miyahara, T Shoudai, T Uchida, T Kuboyama, K Takahashi, H Ueda
    INDUCTIVE LOGIC PROGRAMMING 1634 222-233 1999年  査読有り
    We present a method for discovering new knowledge from structural data which are represented by graphs in the framework of inductive logic programming. A graph, or network, is widely used for representing relations between various data and expressing a small and easily understandable hypothesis. Formal Graph System (FGS) is a kind of logic programming system which directly deals with graphs just like first order terms. By employing refutably inductive inference algorithms and graph algorithmic techniques, we are developing a knowledge discovery system KD-FGS, which acquires knowledge directly from graph data by using FGS as a knowledge representation language. In this paper we develop a logical foundation of our knowledge discovery system. A term tree is a pattern which consists of variables and tree-like structures. We give a polynomial-time algorithm for finding a unifier of a term tree and a tree in order to make consistency checks efficiently. Moreover we give experimental results on some graph theoretical notions with the system. The experiments show that the system is useful for finding new knowledge.

MISC

 131

教育業績(担当経験のある科目)

 15

共同研究・競争的資金等の研究課題

 33