研究者業績

久保山 哲二

クボヤマ テツジ  (Tetsuji Kuboyama)

基本情報

所属
学習院大学 計算機センター / 人文科学研究科アーカイブズ学専攻 教授
東京電機大学 総合研究所・知能創発研究所 客員教授
学位
博士(工学)(東京大学)

研究者番号
80302660
ORCID ID
 https://orcid.org/0000-0003-1590-0231
J-GLOBAL ID
200901047478411760
researchmap会員ID
5000102916

外部リンク

学歴

 1

論文

 122
  • Yoshinobu Kawahara, Tetsuji Kuboyama, Hiroshi Sakamoto
    NEW FRONTIERS IN ARTIFICIAL INTELLIGENCE, JSAI-ISAI 2014 9067 293-295 2015年  査読有り
  • Yohei Nasu, Naoki Kishikawa, Kei Tashima, Shin Kodama, Yasunobu Imamura, Takeshi Shinohara, Kouichi Hirata, Tetsuji Kuboyama
    ICPRAM 2015 - Proceedings of the International Conference on Pattern Recognition Applications and Methods, Volume 1, Lisbon, Portugal, 10-12 January, 2015. 354-359 2015年  査読有り
  • Issei Hamada, Takaharu Shimada, Daiki Nakata, Kouichi Hirata, Tetsuji Kuboyama
    ICPRAM 2015 - Proceedings of the International Conference on Pattern Recognition Applications and Methods, Volume 1, Lisbon, Portugal, 10-12 January, 2015. 342-347 2015年  査読有り
  • Takako Hashimoto, Dave Shepard, Tetsuji Kuboyama, Kilho Shin
    2015 IEEE International Conference on Data Mining Workshop (ICDMW) 7-12 2015年  査読有り
    Social media offers a wealth of insight into how significant events-such as the Great East Japan Earthquake, the Arab Spring, and the Boston Bombing-affect individuals. The scale of available data, however, can be intimidating: during the Great East Japan Earthquake, over 8 million tweets were sent each day from Japan alone. Conventional word vector-based event-detection techniques for social media that use Latent Semantic Analysis, Latent Dirichlet Allocation, or graph community detection often cannot scale to such a large volume of data due to their space and time complexity. To alleviate this problem, we propose an efficient method for event detection by leveraging a fast feature selection algorithm called CWC. While we begin with word count vectors of authors and words for each time slot (in our case, every hour), we extract discriminative words from each slot using CWC, which vastly reduces the number of features to track. We then convert these word vectors into a time series of vector distances from the initial point. The distance between each time slot and the initial point remains high while an event is happening, yet declines sharply when the event ends, offering an accurate portrait of the span of an event. This method makes it possible to detect events from vast datasets. To demonstrate our method's effectiveness, we extract events from a dataset of over two hundred million tweets sent in the 21 days following the Great East Japan Earthquake. With CWC, we can identify events from this dataset with great speed and accuracy.
  • Tomoya Yamazaki, Akihiro Yamamoto, Tetsuji Kuboyama
    DISCOVERY SCIENCE, DS 2015 9356 316-323 2015年  査読有り
    We propose novel principal component analysis (PCA) for rooted labeled trees to discover dominant substructures from a collection of trees. The principal components of trees are defined in analogy to the ordinal principal component analysis on numerical vectors. Our methods substantially extend earlier work, in which the input data are restricted to binary trees or rooted unlabeled trees with unique vertex indexing, and the principal components are also restricted to the form of paths. In contrast, our extension allows the input data to accept general rooted labeled trees, and the principal components to have more expressive forms of subtrees instead of paths. For this extension, we can employ the technique of flexible tree matching; various mappings used in tree edit distance algorithms. We design an efficient algorithm using top-down mappings based on our framework, and show the applicability of our algorithm by applying it to extract dominant patterns from a set of glycan structures.
  • Kilho Shin, Tetsuji Kuboyama, Takako Hashimoto, Dave Shepard
    PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIG DATA 61-67 2015年  査読有り
    Feature selection is a useful tool for identifying which features, or attributes, of a dataset cause or explain phenomena, and improving the efficiency and accuracy of learning algorithms for discovering such phenomena. Consequently, feature selection has been studied intensively in machine learning research. However, advanced feature selection algorithms that can avoid redundant selection of features and can detect interacting features require heavy computation in general and hence are seldom used for big data analysis. To eliminate this limitation, we tried to improve the run-time performance of two of the most advanced feature selection algorithms known in the literature. We have developed two accurate and extremely fast algorithms, namely Super CWC and Super LCC. In experiments with multiple real datasets which are actually studied in big data research, we have demonstrated that our algorithms improve the performance of their original algorithms remarkably. For example, for two datasets, one with 15,568 instances and 15,741 features and another with 200,569 instances and 99,672 features, Super-CWC performed feature selection in 1.4 seconds and in 405 seconds, respectively. This is a remarkable improvement, because it is estimated that the original algorithms would need several hours to a few ten days to perform feature selection on the same datasets.
  • Takako Hashimoto, Tetsuji Kuboyama, Basabi Chakraborty
    2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) 1145-1150 2015年  査読有り
    Social media offers a wealth of insight into how significant topics-such as the Great East Japan Earthquake, the Arab Spring, and the Boston Bombing-affect individuals. The scale of available data, however, can be intimidating: during the Great East Japan Earthquake, over 8 million tweets were sent each day from Japan alone. Conventional word vector-based social media analysis method using Latent Semantic Analysis, Latent Dirichlet Allocation, or graph community detection often cannot scale to such a large volume of data due to their space and time complexity. To overcome the scalability problem, in this paper, high performance Singular Vector Decomposition (SVD) library redsvd has been used to identify topics over time from the huge data set of over two hundred million tweets sent in the 21 days following the Great East Japan Earthquake. While we begin with word count vectors of authors and words for each time slot (in our case, every hour), authors' clusters from each slot are extracted by SVD and k-means. And then, the original fast feature selection algorithm named CWC has been used to extract discriminative words from each cluster. As a result, authors' clusters recognized as topics as well as issues of conventional social media analysis method for big data can be visualized overcoming the scalability problem.
  • Yuto Ouchiyama, Tetsuhiro Miyahara, Yusuke Suzuki, Tomoyuki Uchida, Tetsuji Kuboyama, Fumiya Tokuhara
    2015 IEEE 8TH INTERNATIONAL WORKSHOP ON COMPUTATIONAL INTELLIGENCE AND APPLICATIONS (IWCIA) PROCEEDINGS 95-101 2015年  査読有り
    Machine learning and data mining from graph structured data have been studied intensively. Many chemical compounds can be expressed by outerplanar graphs. We use block preserving outerplanar graph patterns having structured variables for expressing structural features of outerplanar graphs. We propose a learning method for acquiring characteristic block preserving outerplanar graph patterns from positive and negative outerplanar graph data, by using Genetic Programming and tree representation of block preserving outerplanar graph patterns. We report experimental results on applying our method to synthetic outerplanar graph data.
  • Ryohei Saito, Tetsuji Kuboyama, Hiroshi Yasuda
    International Journal of Computational Science and Engineering 11(3) 249-258 2015年  査読有り
    This paper proposes a novel framework for modelling user behaviour from low-level computer usage logs aiming to find working patterns and behaviours of employees at work. The logs we analyse are recorded in individual computers for employees in a company, and include active window transitions on display. Our framework consists of three levels of abstraction: 1) modelling user behaviour patterns by hidden Markov models 2) clustering user behaviour models by kernel principal component analysis with a graph kernel 3) extracting common patterns from clusters. The experimental results show that our method reveals implicit user behaviour at a high level of abstraction, and allows us to understand individual user behaviour among groups, and over time.
  • Haruno Kataoka, Yohei Ogawa, Isao Echizen, Tetsuji Kuboyama, Hiroshi Yoshiura
    2014 NINTH INTERNATIONAL CONFERENCE ON AVAILABILITY, RELIABILITY AND SECURITY (ARES) 461-467 2015年  査読有り
    Personal data to be used for sophisticated data-centric services is typically anonymized to ensure the privacy of the underlying individuals. However, the degree of protection against de-anonymization is uncertain because deanonymization has not been explicitly modelled for scientific analysis and because the information that attackers might use is not well defined given that they can use a wide variety of external information sources such as publically accessible information on the web. We have developed a system that deanonymizes anonymous posts on social networks with a considerably high precision rate. We analysed this system and its behaviour and, on the basis of our findings, we clarified a de-anonymization model and used it to clarify the effects of external information on anonymity. We also identified the limitation of anonymization under conditions of external information being available and clarified the role of transparency can play in controlling the use of personal data.
  • Shohei Nakai, Tetsuhiro Miyahara, Yusuke Suzuki, Tetsuji Kuboyama, Tomoyuki Uchida
    2014 IEEE 7th International Workshop on Computational Intelligence and Applications, IWCIA 2014 - Proceedings 113-118 2014年12月16日  査読有り
    Knowledge acquisition from tree structured data is an important task in machine learning and data mining. We propose a learning method for acquiring characteristic sets of tree patterns with VLDC's from positive and negative tree structured data by using Genetic Programming and tree edit distance. We report experimental results on applying our method to glycan data.
  • Yoshiyuki Yamamoto, Kouichi Hirata, Tetsuji Kuboyama
    INTERNATIONAL JOURNAL OF FOUNDATIONS OF COMPUTER SCIENCE 25(3) 307-329 2014年4月  査読有り
    In this paper, we investigate the problem of computing structural sensitive variations of an unordered tree edit (listance. First, we focus on the variations tractable by the algorithms including the submodule of a network algorithm, either the minimum cost maximum flow algorithm or the maximum weighted bipartite matching algorithm. Then, we show that both network algorithms are replaceable, and hence the time complexity of computing these variations can be reduced to (nrnd) time, where it is the number of nodes in a tree, m is the number of nodes in another tree and d is the in degree of given two trees. Next, we show that the problem of computing the bottom-up distance is MAX SNP-hard. Note that the well linear tune algorithm for the bottom -up distance designed by Valiente (2001) computes just a bottom-up iode (insertion-deletion) distance allowing no substitutions.
  • Takako Hashimoto, Basabi Chakraborty, Tetsuji Kuboyama, Yukari Shirota
    Frontiers in Artificial Intelligence and Applications 260 200-212 2014年  査読有り
    This paper proposes a time series topic extraction method to investigate the transitions of people's needs after the East Japan Great Earthquake using latent semantic analysis. Our target data is a blog about afflicted people's needs provided by a non-profit organization in Tohoku, Japan. The method crawls blog messages, extracts terms, and forms document-term matrix over time. Then, the method adopts the latent semantic analysis and extract hidden topics (people's needs) over time. In our previous work, we already proposed the graph-based topic extraction method using the modularity measure. Our previous method could visualize topic structure transition, but could not extract clear topics. In this paper, to show the effectiveness of our proposed method, we provide the experimental results, and compare them with our previous method's results. © 2014 The authors and IOS Press.
  • Kilho Shin, Tetsuji Kuboyama
    NEW FRONTIERS IN ARTIFICIAL INTELLIGENCE (JSAI-ISAI 2013) 8417 337-351 2014年  査読有り
    Tree kernels are an effective method to capture the structural information of tree data of various applications and many algorithms have been proposed. Nevertheless, we do not have sufficient knowledge about how to select good kernels. To answer this question, we focus on 32 tree kernel algorithms defined within a certain framework to engineer positive definite kernels, and investigate them under two different parameter settings. The result is amazing. Three of the 64 tree kernels outperform the others, and their superiority proves statistically significant through t-tests. These kernels include the benchmark tree kernels proposed in the literature, while many of them are introduced and tested for the first time in this paper.
  • Issei Hamada, Takaharu Shimada, Daiki Nakata, Kouichi Hirata, Tetsuji Kuboyama
    NEW FRONTIERS IN ARTIFICIAL INTELLIGENCE (JSAI-ISAI 2013) 8417 321-336 2014年  査読有り
    In this paper, we introduce an agreement subtree mapping kernel counting all of the agreement subtree mappings and design the algorithm to compute it for phylogenetic trees, which are unordered leaflabeled full binary trees, in quadratic time. Then, by applying the agreement subtree mapping kernel to trimmed phylogenetic trees obtained from all the positions in nucleotide sequences for A (H1N1) influenza viruses, we classify pandemic viruses from non-pandemic viruses and viruses in one region from viruses in the other regions. On the other hand, for leaf-labeled trees, we show that the problem of counting all of the agreement subtree mappings is #P-complete.
  • Kouichi Hirata, Tetsuji Kuboyama, Takuya Yoshino
    New Frontiers in Artificial Intelligence - JSAI-isAI 2014 Workshops, LENLS, JURISIN, and GABA, Kanagawa, Japan, October 27-28, 2014, Revised Selected Papers 317-330 2014年  査読有り
  • Takako Hashimoto, Basabi Chakraborty, Supavadee Aramvith, Tetsuji Kuboyama, Yukari Shirota
    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) 1-6 2014年  査読有り
    After the East Japan Great Earthquake happened on Mar. 11, 2011, many affected people who lost houses, jobs and families fell into difficulties. Governmental agencies and NPOs supported them by offering relief supplies, foods, evacuation centers and temporary houses. When various supports were offered to affected people, if Governmental agencies and NPOs could detect their needs appropriately, it was effective for supporting them. This paper proposes the method to extract affected people's needs from Social Media after the Earthquake and analyze their needs changes over time. We target the blog that expressed thoughts, requirements and complaints of affected people, and adopt the Latent Dirichlet. Allocation (LDA) that is one of popular techniques for topic extraction. We then compare the analysis result with affected people's actual situation and real events and evaluate the effectiveness of our method. In addition, we evaluate the effectiveness as the method that can help decision making for providing appropriate supports to affected people.
  • Masaya Nakahara, Shirou Maruyama, Tetsuji Kuboyama, Hiroshi Sakamoto
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E96D(3) 457-464 2013年3月  査読有り
    A scalable pattern discovery by compression is proposed. A string is representable by a context-free grammar deriving the string deterministically. In this framework of grammar-based compression, the aim of the algorithm is to output as small a grammar as possible. Beyond that, the optimization problem is approximately solvable. In such approximation algorithms, the compressor based on edit-sensitive parsing (ESP) is especially suitable for detecting maximal common substrings as well as long frequent substrings. Based on ESP, we design a linear time algorithm to find all frequent patterns in a string approximately and prove several lower bounds to guarantee the length of extracted patterns. We also examine the performance of our algorithm by experiments in biological sequences and other compressible real world texts. Compared to other practical algorithms, our algorithm is faster and more scalable with large and repetitive strings.
  • Takako Hashimoto, Tetsuji Kuboyama, Yukari Shirota
    INFORMATION MODELLING AND KNOWLEDGE BASES XXIV 251 110-+ 2013年  査読有り
    Once a disaster occurs, people discuss various topics in social media such as electronic bulletin boards, SNSs and video services, and their decision-making tends to be affected by discussions in social media. Under the circumstance, a mechanism to detect topics in social media has become important. This paper targets the East Japan Great Earthquake, and proposes a time series topic detection method based on modularity measure which shows the quality of a division of a network into modules or communities. Our proposed method clarifies emerging topics from social media messages by computing the modularity and analyzing them over time, and visualizes topic structures. An experimental result by actual social media data about the East Japan Great Earthquake is also shown.
  • Shoichi Higuchi, Tetsuji Kuboyama, Takako Hashimoto, Kouichi Hirata
    2013 IEEE International Conference on Pervasive Computing and Communications Workshops, PerCom Workshops 2013 187-192 2013年  査読有り
    In this paper, we design a new method to explore the social context as a community mapping from a buzz marketing site. In this method, after extracting significant topical terms from messages in buzz marketing sites, first we construct a snapshot co-occurrence network at each time stamp. Next, we organize topic hierarchical structures from each co-occurrence network by using the modularity. Then, we explore a community mapping as an LCA-preserving mapping between topic hierarchical structures and a topic mapping as a correspondence in a community mapping. Hence, we can extract a topic transition as topic mappings for the same topic. Finally, we give experimental results related to the East Japan Great Earthquake in the buzz marketing site. © 2013 IEEE.
  • Hiroshi Sakamoto, Tetsuji Kuboyama
    Smart Innovation, Systems and Technologies 24 115-145 2013年  査読有り
    We explain recent studies on pattern extraction from large-scale graphs. Patterns are represented explicitly and implicitly. Explicit patterns are concrete subgraphs defined in graph theory, e.g., clique and tree. For an explicit model of patterns, we introduce notable fast algorithms for finding all frequent patterns. We also confirm that these problems are closely related to traditional problems in data mining. On the other hand, implicit patterns are defined by statistical factors, e.g., modularity, betweenness, and flow determining optimal hidden subgraphs. For both models, we give an introductory survey focusing on notable pattern extraction algorithms.
  • Takako Hashimoto, Tetsuji Kuboyama, Basabi Chakraborty
    2013 IEEE INTERNATIONAL CONFERENCE OF IEEE REGION 10 (TENCON) 2013年  査読有り
    This paper proposes a time series topic detection method to investigate changes in afflicted people's needs after the East Japan Great Earthquake using latent semantic analysis and singular vectors' pattern similarities. Our target data is a blog about afflicted people's needs provided by a non-profit organization in Tohoku, Japan. The method crawls blog messages, extracts terms, and forms document-term matrix over time. Then, it adopts the latent semantic analysis to extract people's needs as hidden topics from each snapshot matrix. We form time series hidden topic-term matrix as 3rd order tensor, so that changes in topics (people's needs) are detected by investigating time-series similarities between hidden topics. In this paper, to show the effectiveness of our proposed method, we also provide the experimental results.
  • Shohei Nakai, Tetsuhiro Miyahara, Tetsuji Kuboyama, Tomoyuki Uchida, Yusuke Suzuki
    2013 SECOND IIAI INTERNATIONAL CONFERENCE ON ADVANCED APPLIED INFORMATICS (IIAI-AAI 2013) 147-151 2013年  査読有り
    Knowledge discovery from structured data is an important task in machine learning and data mining. We propose a learning method for acquiring characteristic tree patterns with VLDC's from positive and negative tree structured data by using Genetic Programming and tree edit distance. We report experimental results on applying our method to glycan data.
  • Takaharu Shimada, Issei Hamada, Kouichi Hirata, Tetsuji Kuboyama, Kouki Yonezawa, Kimihito Ito
    2013 SECOND IIAI INTERNATIONAL CONFERENCE ON ADVANCED APPLIED INFORMATICS (IIAI-AAI 2013) 129-134 2013年  査読有り
    A trim distance is a measure for comparing positions in nucleotide sequences based on phylogenetic trees. In this paper, first we formulate another trim distance based on the MAST distance, in contrast to the previous trim distance based on the LCA-preserving distance. Next, we apply a group average method in agglomerative hierarchical clustering to the positions in nucleotide sequences by the trim distances to nucleotide sequences of influenza A (HINI) viruses at 2008 as non-pandemic viruses and at 2009 as pandemic viruses, with introducing the following two methods of clustering. The clustering of separated positions is one method of clustering whose positions in nucleotide sequences at 2008 are separated from those at 2009. The clustering of mixed positions is another method of clustering whose positions in nucleotide sequences at 2008 are mixed with those at 2009.
  • Li, Y., Kuboyama, T., Sakamoto, H.
    Frontiers in Artificial Intelligence and Applications 255 287-295 2013年  査読有り
  • Yoshiyuki Yamamoto, Kouichi Hirata, Tetsuji Kuboyama
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7258 211-223 2012年  査読有り
    The problem of computing the standard edit distance between unordered trees is known to be intractable. To circumvent this hardness result, several tractable variations have been proposed. The algorithms of these variations include the submodule of a network algorithm, either the minimum cost maximum flow algorithm or the maximum weighted bipartite matching algorithm. In this paper, we point out that these network algorithms are replaceable, and give the experimental results of computing these variations with both network algorithms. © 2012 Springer-Verlag.
  • Yukari Shirota, Takako Hashimoto, Tetsuji Kuboyama
    INFORMATION MODELLING AND KNOWLEDGE BASES XXIII 237 271-286 2012年  査読有り
    This paper presents a concept model for solving bond mathematics problems; it is the manner in which our knowledge concerning bond mathematics is organized into a cognitive architecture. We build a concept model for bond mathematics to achieve good organization and to integrate the knowledge for bond mathematics. The ultimate goal is to enable many students to understand the solution process without difficulty. Our concept model comprises entity-relationship diagrams, and using our concept models, students can integrate financial theories and mathematical formulas. This paper illustrates concept models for bond mathematics by showing concrete examples of word mathematics problems. It also describes our principles in developing the concept model and the descriptive power of our model.
  • Takako Hashimoto, Tetsuji Kuboyama, Yukari Shirota
    INFORMATION MODELLING AND KNOWLEDGE BASES XXIII 237 259-270 2012年  査読有り
    This paper proposes a method to discover consumer behavior from buzz marketing sites. For example, in 2009, the super-flu virus spawned significant effects on various product marketing domains around the globe. Using text mining technology, we found a relationship between the flu pandemic and the reluctance of consumers to buy digital single-lens reflex camera. We could easily expect more air purifiers to be sold due to flu pandemic. However, the reluctance to buy digital single-lens reflex cameras because of the flu is not something we would have expected. This paper applies text mining techniques to analyze expected and unexpected consumer behavior caused by a current topic like the flu. The unforeseen relationship between a current topic and products is modeled and visualized using a directed graph that shows implicit knowledge. Consumer behavior is further analyzed based on the time series variation of directed graph structures.
  • Yushi Nakamura, Toshihiko Horiike, Tetsuji Kuboyama, Hiroshi Sakamoto
    International Journal of Knowledge-Based and Intelligent Engineering Systems 16(1) 25-34 2012年  査読有り
    We develop a research community extraction algorithm from large bibliographic data, which was preliminarily reported in Horiike et al. [10] and Nakamura et al. [18]. A research community in bibliographic data is considered to be a set of the linked texts holding a common topic, in other words, it is a dense subgraph embedded in the directed graph. Our method is based on the maximum flow algorithm for finding web communities by Flake et al. [5]. We propose improvements of the algorithm to select community nodes and initial seeds taking account of the restriction that any directed graph is acyclic. We examine the improved algorithm for the list of keywords frequently appearing in the bibliographic data. In addition we propose a simple method to extract characteristic keywords for deciding initial seed nodes. This method is also evaluated by experiments. © 2012 -IOS Press and the authors. All rights reserved.
  • Kilho Shin, Tetsuji Kuboyama
    6TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS, AND THE 13TH INTERNATIONAL SYMPOSIUM ON ADVANCED INTELLIGENT SYSTEMS 1690-1695 2012年  査読有り
    Positive definiteness and computational efficiency are two important factors in designing kernels. In this paper, we focus on kernels for trees, and show an important and useful technique to design positive definite tree kernels that can be computed efficiently based on the dynamic programming methodology. We call the technique dynamic labeling. In combination with the partitionable string kernel technique, this technique yields a useful framework to design tree kernels that has a much wider range of application compared with the framework known in the literature.
  • Tetsuhiro Miyahara, Tetsuji Kuboyama
    6TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS, AND THE 13TH INTERNATIONAL SYMPOSIUM ON ADVANCED INTELLIGENT SYSTEMS 1684-1689 2012年  査読有り
    We apply a genetic programming approach to extraction of glycan motifs by using tag tree patterns and various fitness functions. Tag tree patterns obtained from some glycan data show characteristic tree structures. We consider the effects of using various fitness functions on obtained glycan motifs.
  • Daisuke Kimura, Tetsuji Kuboyama, Tetsuo Shibuya, Hisashi Kashima
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT I 6634 62-74 2011年  査読有り
    Kernel method is one of the promising approaches to learning with tree-structured data, and various efficient tree kernels have been proposed to capture informative structures in trees. In this paper, we propose a new tree kernel function based on "subpath sets" to capture vertical structures in rooted unordered trees, since such tree-structures are often used to code hierarchical information in data. We also propose a simple and efficient algorithm for computing the kernel by extending the multikey quicksort algorithm used for sorting strings. The time complexity of the algorithm is O((|T-1| + |T-2|) log(|T-1| + |T-2|)) time on average, and the space complexity is O(|T-1| + | T-2|), where |T-1| and |T-2| are the numbers of nodes in two trees T-1 and T-2. We apply the proposed kernel to two supervised classification tasks, XML classification in web mining and glycan classification in bioinformatics. The experimental results show that the predictive performance of the proposed kernel is competitive with that of the existing efficient tree kernel for unordered trees proposed by Vishwanathan et al. [1], and is also empirically faster than the existing kernel.
  • Tetsuji Kuboyama, Takako Hashimoto, Yukari Shirota
    KNOWLEDGE-BASED AND INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT II 6882 73-83 2011年  査読有り
    This paper proposes a text mining method for detecting drastic changes of consumer behavior over time from buzz marketing sites, and applies it to finding the effects of the flu pandemic on consumer behavior in various marketing domains. It is expected that more air purifiers are sold due to the pandemic, and it is, actually, observed. By using our method, we reveal an unexpected relationship between the flu pandemic and the reluctance of consumers to buy digital single-lens reflex camera. Our method models and visualizes the relationship between a current topic and products using a graph representation of knowledge generated from the text documents in a buzz marketing site. The change of consumer behavior is detected by quantifying the difference of the graph structures over time.
  • Tomotaka Okuno, Masatsugu Ichino, Tetsuji Kuboyama, Hiroshi Yoshiura
    Proceedings - 7th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIHMSP 2011 53-56 2011年  査読有り
    Various types of personal information are accessible through the Web, and they can be accessed in number of ways. Linking of the personal information obtained with other personal information can lead to a serious violation of privacy. To address this problem, we developed a method that matches the information individuals place in a profile to the information they post on social media sites. It uses an algorithm we developed for quantifying the correlation between a word and a document even if the word itself does not appear in document. © 2011 IEEE.
  • Kilho Shin, Marco Cuturi, Tetsuji Kuboyama
    Proceedings of the 28th International Conference on Machine Learning, ICML 2011, Bellevue, Washington, USA, June 28 - July 2, 2011 961-968 2011年  査読有り
  • Ryohei Saito, Tetsuji Kuboyama, Yuta Yamakawa, Hiroshi Yasuda
    DATABASES IN NETWORKED INFORMATION SYSTEMS 7108 162-+ 2011年  査読有り
    This paper proposes a novel method for analyzing PC usage logs aiming to find working patterns and behaviors of employees at work. The logs we analyze are recorded at individual PCs for employees in a company, and include active window transitions. Our method consists of two levels of abstraction: (1) task summarization by HMM: (2) user behavior comparison by kernel principle component analysis based on a graph kernel. The experimental results show that our method reveals implicit user behavior at a high level of abstraction, and allows us to understand individual user behavior among groups, and over time.
  • Takako Hashimoto, Tetsuji Kuboyama, Yukari Shirota
    DATABASES IN NETWORKED INFORMATION SYSTEMS 7108 147-+ 2011年  査読有り
    This paper proposes a method to detect unexpected correlation from between a current topic and products word of mouth in buzz marketing sites, which will be part of a new approach to marketing analysis. For example, in 2009, the super-flu virus spawned significant effects on various product marketing domains around the globe. In buzz marketing sites, there had been a lot of word of mouth about the "flu." We could easily expect an "air purifier" to be correlated to the "flu" and air purifiers' shipments had grown according to the epidemic of flu. On the other hand, the relatedness between the "flu" and a "camera" could not be easily expected. However, in Japan, consumers' unforeseen behavior like the reluctance to buy digital cameras because of cancellations of a trip, a PE festival or other events caused by the epidemic of flu had appeared, and a strong correlation between the "flu" and "camera" had been found. Detecting these unforeseen consumers' behavior is significant for today's marketing analysis. In order to detect such unexpected relations, this paper applies the dynamic time warping techniques. Our proposed method computes time series correlations between a current topic and unspecified products from word of mouth of buzz marketing sites, and finds product candidates which have unexpected correlation with a current topic. To evaluate the effectiveness of the method, the experimental results for the current topic ("flu") and products ("air purifier", "camera", "car", etc.) are shown as well. By detecting unexpected relatedness from buzz marketing sites, unforeseen consumer behaviors can be further analyzed.
  • Masaya Nakahara, Shirou Maruyama, Tetsuji Kuboyama, Hiroshi Sakamoto
    DISCOVERY SCIENCE 6926 236-+ 2011年  査読有り
    A scalable pattern discovery by compression is proposed. A string is representable by a context-free grammar (CFG) deriving the string deterministically. In this framework of grammar-based compression, the aim of the algorithm is to output as small a CFG as possible. Beyond that, the optimization problem is approximately solvable. In such approximation algorithms, the compressor by Sakamoto et al. (2009) is especially suitable for detecting maximal common substrings as well as long frequent substrings. This is made possible thanks to the characteristics of edit-sensitive parsing (ESP) by Cormode and Muthukrishnan (2007), which was introduced to approximate a variant of edit distance. Based on ESP, we design a linear time algorithm to find all frequent patterns in a string approximately and prove a lower bound for the length of extracted frequent patterns. We also examine the performance of our algorithm by experiments in DNA sequences and other compressible real world texts. Compared to the practical algorithm developed by Uno (2008), our algorithm is faster with large and repetitive strings.
  • Kouichi Hirata, Yoshiyuki Yamamoto, Tetsuji Kuboyama
    COMBINATORIAL PATTERN MATCHING, 22ND ANNUAL SYMPOSIUM, CPM 2011 6661 402-415 2011年  査読有り
    Zhang and Jiang (1994) have shown that the problem of finding an edit distance between unordered trees is MAX SNP-hard. In this paper, we show that this problem is MAX SNP-hard, even if (1) the height of trees is 2, (2) the degree of trees is 2, (3) the height of trees is 3 under a unit cost, and (4) the degree of trees is 2 under a unit cost.
  • Takako Hashimoto, Tetsuji Kuboyama, Yukari Shirota
    2011 IEEE REGION 10 CONFERENCE TENCON 2011 133-137 2011年  査読有り
    When a disaster occurred, people try to acquire information through social media such as electronic bulletin boards, SNSs, and video services. Their decision-making tend to be affected by social media information. Sometimes, social media information is useful, so people can get valuable knowledge. On the other hand, social media information is occasionally unreliable. Harmful rumors spread and cause people to panic. In today's information oriented society, a mechanism to detect rumor information in social media has become very important. This paper proposes a framework to detect the rumor information in social media. Our proposed framework clarifies topics in social media, visualizes topic structures in time series variation. Then it extracts rumor candidates and seeks related information from other media such as TV program, newspapers and so on in order to confirm the reliability of rumor candidates. By our framework, potential rumors will be shown.
  • 木村 大翼, 久保山 哲二, 渋谷 哲朗, 鹿島 久嗣
    人工知能学会論文誌 26(3) 473-482 2011年  
    Kernel method is one of the promising approaches to learning with tree-structured data, and various efficient tree kernels have been proposed to capture informative structures in trees. In this paper, we propose a new tree kernel function based on ``subpath sets'' to capture vertical structures in tree-structured data, since tree-structures are often used to code hierarchical information in data. We also propose a simple and efficient algorithm for computing the kernel by extending the Multikey quicksort algorithm used for sorting strings. The time complexity of the algorithm is O((|T_1|+|T...
  • Tetsuji Kuboyama, Kouichi Hirata
    Proc. 7th Workshop on Learning with Logics and Logics for Learning 34-41 2011年  
  • Yoshiyuki Yamamoto, Kouichi Hirata, Tetsuji Kuboyama
    Proc. 7th Workshop on Learning with Logics and Logics for Learning 26-33 2011年  
  • Kazuya Sata, Kouichi Hirata, Kimihito Ito, Tetsuji Kuboyama
    Annual International Conference on BioInformatics and Computational Biology (BICB 2011) 2011年  
  • Tetsuji Kuboyama, Kimihito Ito, Kouichi Hirata, Hiroshi Sakamoto
    Annual International Conference on BioInformatics and Computational Biology (BICB 2011) 2011年  
  • Kilho Shin, Tetsuji Kuboyama
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 25(5) 1040-1054 2010年9月  査読有り
    Haussler's convolution kernel provides an effective framework for engineering positive semidefinite kernels, and has a wide range of applications. On the other hand, the mapping kernel that we introduce in this paper is its natural generalization, and will enlarge the range of application significantly. Our main theorem with respect to positive semidefiniteness of the mapping kernel (1) implies Haussler's theorem as a corollary, (2) exhibits an easy-to-check necessary and sufficient condition for mapping kernels to be positive semidefinite, and (3) formalizes the mapping kernel so that significant flexibility is provided in engineering new kernels. As an evidence of the effectiveness of our results, we present a framework to engineer tree kernels. The tree is a data structure widely used in many applications, and tree kernels provide an effective method to analyze tree-type data. Thus, not only is the frame work important as an example but also as a practical research tool. The description of the framework accompanies a survey of the tree kernels in the literature, where we see that 18 out of the 19 surveyed tree kernels of different types are instances of the mapping kernel, and examples of novel interesting tree kernels.
  • Taku Aratsu, Kouichi Hirata, Tetsuji Kuboyama
    FUNDAMENTA INFORMATICAE 101(3) 157-171 2010年  査読有り
    This article proposes an approximation of the tree edit distance through the string edit distance for binary tree codes, instead of for Euler strings introduced by Akutsu (2006). Here, a binary tree code is a string obtained by traversing a binary tree representation with two kinds of dummy nodes of a tree in preorder. Then, we show that sigma/2 <= tau <= (h + 1)sigma + h, where tau is the tree edit distance between trees, and sigma is the string edit distance between their binary tree codes and h is the minimum height of the trees.
  • Kengo Yoshida, Tetsuhiro Miyahara, Tetsuji Kuboyama
    2010 2ND INTERNATIONAL CONFERENCE ON COMPUTER AND AUTOMATION ENGINEERING (ICCAE 2010), VOL 5 749-753 2010年  査読有り
    We propose a new genetic programming (GP) approach to extracting multiple tree structured patterns from tree structured data using soft clustering. We use a set of multiple tree structured patterns, called tag tree patterns, as a combined pattern. A structured variable in a tag tree pattern can be substituted by an arbitrary tree. A set of multiple tag tree patterns matches a tree, if at least one of the set of patterns matches the tree. Using soft clustering is appropriate because one tree structured data is allowed to match multiple tag tree patterns. By soft clustering of positive data and by running GP subprocesses on each cluster with negative data, we make a combined pattern which consists of best individuals in GP subprocesses. Experiments on some glycan data show that our method has a support of about 0.8, while the previous method for evolving single patterns has a support of about 0.5.
  • Kilho Shin, Tetsuji Kuboyama
    THEORETICAL COMPUTER SCIENCE 410(19) 1847-1862 2009年4月  査読有り
    Polynomials have proven to be useful tools to tailor generic kernels to specific applications. Nevertheless, we had only restricted knowledge for selecting fertile polynomials which consistently produce positive semidefinite kernels. For example, the well-known polynomial kernel can only take advantage of a very narrow range of polynomials, that is, the univariate polynomials with positive coefficients. This restriction not only hinders intensive exploitation of the flexibility of the kernel method, but also causes misuse of indefinite kernels. Our main theorem significantly relaxes the restriction by asserting that a polynomial consistently produces positive semidefinite kernels, if it has a positive semidefinite coefficient matrix. This sufficient condition is quite natural, and hence, it can be a good characterization of the fertile polynomials. In fact, we prove that the converse of the assertion of the theorem also holds true in the case of degree 1. We also prove the effectiveness of our main theorem by showing three corollaries relating to certain applications known in the literature: the first and second corollaries, respectively, give generalizations of the polynomial kernel and the principal-angle (determinant) kernel. The third corollary shows extended and corrected sufficient conditions for the codon-improved kernel and the weighted-degree kernel with shifts to be positive semidefinite. (C) 2009 Elsevier B.V. All rights reserved.

MISC

 131

教育業績(担当経験のある科目)

 15

共同研究・競争的資金等の研究課題

 33