Curriculum Vitae

Tetsuji Kuboyama

  (久保山 哲二)

Profile Information

Affiliation
Professor, Computer Centre / Archival Science, Graduate School of Humanities, Gakushuin University
Tokyo Denki University
Degree
Ph.D. (University of Tokyo)

Researcher number
80302660
ORCID ID
 https://orcid.org/0000-0003-1590-0231
J-GLOBAL ID
200901047478411760
researchmap Member ID
5000102916

Education

 1

Papers

 122
  • Naoya Higuchi, Yasunobu Imamura, Vladimir Mic, Takeshi Shinohara, Kouichi Hirata, Tetsuji Kuboyama
    ICPRAM, 499-510, 2024  
  • 草場彰, 寒川義裕, 久保山哲二, 新田州吾, 白石賢二, 押山淳
    日本結晶成長学会誌, 50(1) 50-1-05, Apr 28, 2023  Peer-reviewed
  • TOKUNAGA Hiroko, KUBOYAMA Tetsuji, KIMURA Atsushi, MUKAWA Naoki
    J106-A(3) 104-113, Mar 1, 2023  Peer-reviewed
    Based on a behavior analysis of co-eating among older adults, this study clarifies the effects of psychological health on mealtime conversation. We conducted a dietary experiment in which the same menu was provided in individual and platter formats to four groups of six older adults. From the recorded video, participants' speech behaviors were annotated for the first 20 minutes from the start of eating. The analysis covered the amount of speech and the topics raised by the participants. The results revealed that during the meal a speaker could easily hold the floor for a sustained period, and that interaction among participants became more active on the topic of cooking. This study found that conversation during co-eating serves as a place to deepen mutual understanding among participants and gives people meeting for the first time an opportunity to talk with each other.
  • 2022 207-212, Dec 2, 2022  Peer-reviewed
  • 2022 289-294, Dec 2, 2022  Peer-reviewed
  • A. Kusaba, S. Nitta, K. Shiraishi, T. Kuboyama, Y. Kangawa
    Applied Physics Letters, 121(16), Oct 17, 2022  Peer-reviewed
    To develop a quantitative reaction simulator, data assimilation was performed using high-resolution time-of-flight mass spectrometry (TOF-MS) data applied to a GaN metalorganic vapor phase epitaxy system. Incorporating ab initio knowledge into the optimization enables it to reproduce not only the concentration of CH4 (an impurity precursor) as an objective variable but also known reaction pathways. The simulation results show significant production of GaH3, a precursor of GaN, which has been difficult to detect in TOF-MS experiments. Our proposed approach is expected to be applicable to other applied physics fields that require quantitative prediction that goes beyond ab initio reaction rates.
  • A. Kusaba, Y. Kangawa, T. Kuboyama, A. Oshiyama
    Applied Physics Letters, 120(2), Jan 10, 2022  Peer-reviewed
    GaN(0001) surfaces with Ga- and H-adsorbates are fundamental stages for epitaxial growth of semiconductor thin films. We explore stable surface structures on a nanometer scale using density-functional calculations combined with Bayesian optimization and reach a single structure with satisfactorily low mixing enthalpy among hundreds of thousands of possible candidate structures. We find that the obtained structure is free from any postulated high symmetry previously introduced by human intuition, satisfies an electron counting rule locally, and shows a complex adsorbate arrangement, reflecting characteristics of nitride semiconductors. The proposed scheme toward a high-resolution surface phase diagram contributes to a more precise design of GaN epitaxial growth conditions, especially the ratio of Ga and H partial pressures.
  • Akira Kusaba, Tetsuji Kuboyama, Kilho Shin, Makoto Sasaki, Shigeru Inagaki
    Japanese Journal of Applied Physics, 61(SA), Jan, 2022  Peer-reviewed
    A new combined use of dynamic mode decomposition algorithms is proposed, which is suitable for the analysis of spatiotemporal data from experiments with few observation points, unlike computational fluid dynamics with many observation points. The method was applied to our data from a plasma turbulence experiment. As a result, we succeeded in constructing a quite accurate model for our training data, and predictive performance improved as well. In addition, modal patterns from the longer-term analysis help to understand the underlying mechanism more clearly, which is demonstrated in the case of plasma streamer structure. This method is expected to be a powerful tool for the data-driven construction of a reduced-order model and a predictor in plasma turbulence research and also in nonlinear dynamics research in other applied physics fields.
  • Naoya Higuchi, Yasunobu Imamura, Vladimir Mic, Takeshi Shinohara, Kouichi Hirata, Tetsuji Kuboyama
    ICPRAM, 401-410, 2022  Peer-reviewed
  • Yusuke Kyokawa, Tetsuji Kuboyama, Mizuho Kamo, Eisaku Maeda
    じんもんこん2021論文集, 2021(2021) 260-267, Dec 4, 2021  Peer-reviewed
  • Akira Kusaba, Tetsuji Kuboyama, Yoshihiro Kangawa
    結晶成長国内会議予稿集(CD-ROM), 50th, 2021  Invited
  • Maciej Huk, Kilho Shin, Tetsuji Kuboyama, Takako Hashimoto
    Intelligent Information and Database Systems - 13th Asian Conference(ACIIDS), 12672 LNAI 717-730, 2021  Peer-reviewed
    Care is needed when comparing the results of machine learning (ML) experiments performed with different Pseudo Random Number Generators (PRNGs). This is because the selection of a PRNG can be regarded as a source of measurement error, e.g. in repeated N-fold Cross Validation (CV). It can also be important to verify that the observed properties of a model or algorithm are not due to the use of a particular PRNG. In this paper we conduct experiments to observe the possible level of differences in the obtained values of various measures of classification quality of simple Contextual Neural Networks and Multilayer Perceptron (MLP) models for various PRNGs. We show that the results for some pairs of PRNGs can be significantly different even for a large number of repeats of 5-fold CV. The observations suggest that when ML models and algorithms evaluated with 5-fold CV under different PRNGs are compared, the confidence interval should be doubled or a confidence level higher than 95% should be used. Additionally, it is shown that even under such conditions the classification properties of Contextual Neural Networks are statistically better than those of non-contextual MLP models.
  • Takako Hashimoto, David Lawrence Shepard, Tetsuji Kuboyama, Kilho Shin, Ryota Kobayashi, Takeaki Uno
    The Journal of Supercomputing, 77(5) 4375-4388, Oct 1, 2020  Peer-reviewed
    During a disaster, social media can be both a source of help and of danger: social media has the potential to diffuse rumors, and officials involved in disaster mitigation must react quickly to the spread of rumor on social media. In this paper, we investigate how topic diversity (i.e., homogeneity of opinions in a topic) depends on the truthfulness of a topic (whether it is a rumor or a non-rumor) and how the topic diversity changes in time after a disaster. To do so, we develop a method for quantifying the topic diversity of the tweet data based on text content. The proposed method is based on clustering a tweet graph using Data polishing that automatically determines the number of subtopics. We perform a case study of tweets posted after the Great East Japan Earthquake on March 11, 2011. We find that rumor topics exhibit more homogeneity of opinions in a topic during diffusion than non-rumor topics. Furthermore, we evaluate the performance of our method and demonstrate its improvement on the runtime for data processing over existing methods.
  • 13(1) 13-22, Mar 25, 2020  Peer-reviewed
    We discuss k-nearest-neighbor search using sketches, a kind of locality-sensitive hashing (LSH). Search using sketches proceeds in two stages. The first stage selects K solution candidates whose sketches are close to the sketch of the query, where K ≥ k. The second stage selects the k nearest neighbors by computing actual distances for the K candidates. Conventionally, sketches of 32 bits or more have been used to ensure high search accuracy. In this study, we reduce the sketch width to 16 bits, which enables faster data management by bucketing, and propose a method in which the cost of the first stage is almost negligible. To maintain accuracy, search using 16-bit sketches requires a larger number of candidates K than search using 32-bit sketches. By sorting the data objects by their sketch values, memory locality in the second stage improves, which reduces the slowdown caused by the larger K. The proposed method achieves a speedup of about 10 times while maintaining search accuracy.
  • Takako Hashimoto, Akira Kusaba, Dave Shepard, Tetsuji Kuboyama, Kilho Shin, Takeaki Uno
    Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2020), 585-592, Feb 22, 2020  Peer-reviewed
  • Kilho Shin, Kenta Okumoto, David Shepard, Tetsuji Kuboyama, Takako Hashimoto, Hiroaki Ohshima
    Proceedings of the 12th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, 203-213, Feb, 2020  Peer-reviewed
  • Akira Kusaba, Tetsuji Kuboyama, Shigeru Inagaki
    Plasma and Fusion Research, 15 1301001:1-1301001:4, Jan 6, 2020  Peer-reviewed
  • Naoya Higuchi, Yasunobu Imamura, Takeshi Shinohara, Kouichi Hirata, Tetsuji Kuboyama
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 11996 LNCS 71-92, 2020  Peer-reviewed
    Annealing by Increasing Resampling (AIR, for short) is a stochastic hill-climbing optimization algorithm that evaluates the objective function for resamplings with increasing size. At the beginning stages, AIR makes state transitions like a random walk, because it uses small resamplings for which evaluation has large error at high probability. At the ending stages, AIR behaves like a local search because it uses large resamplings very close to the entire sample. Thus AIR works similarly as the conventional Simulated Annealing (SA, for short). As a rationale for AIR approximating SA, we show that both AIR and SA can be regarded as a hill-climbing algorithm according to objective function evaluation with stochastic fluctuations. The fluctuation in AIR is explained by the probit, while in SA by the logit. We show experimentally that the logit can be replaced with the probit in MCMC, which is a basis of SA. We also show experimental comparison of SA and AIR for two optimization problems, sparse pivot selection for dimension reduction, and annealing-based clustering. Strictly speaking, AIR must use resampling independently performed at each transition trial. However, it has been demonstrated by experiments that reuse of resampling within a certain number of times can speed up optimization without losing the quality of optimization. In particular, the larger the samples used for evaluation, the more remarkable the superiority of AIR is in terms of speed with respect to SA.
  • Akira Kusaba, Takako Hashimoto, Kilho Shin, David Lawrence Shepard, Tetsuji Kuboyama
    2020 IEEE Region 10 Conference(TENCON), 2020-November 1192-1197, 2020  Peer-reviewed
    This paper presents FITS, or Feature-value / Instance Transposition Selection, a method for unsupervised clustering. FITS is a tractable, explicable clustering method, which leverages the unsupervised feature value selection algorithm known as UFVS in the literature. FITS combines repeated rounds of UFVS with alternating steps of matrix transposition to produce a set of homogeneous clusters that describe data well. By repeatedly swapping the role of feature and instance and applying the same selection process to them, FITS leverages UFVS's speed and can perform clustering in our experiments in tens of milliseconds for datasets of thousands of features and thousands of instances. We performed feature selection-based clustering on two real-world data sets. One is aimed at topic extraction from Twitter data, and the other is aimed at gaining awareness of energy conservation from time-series power consumption data. This study also proposes a novel method based on iterative feature extraction and transposition. The effectiveness of this method is shown in an application of Twitter data analysis. On the other hand, a more straightforward use of feature selection is adopted in the application of time series power consumption data analysis.
  • Naoya Higuchi, Yasunobu Imamura, Vladimir Mic, Takeshi Shinohara, Kouichi Hirata, Tetsuji Kuboyama
    Similarity Search and Applications - 13th International Conference(SISAP), 33-46, 2020  Peer-reviewed
  • Akira Kusaba, Kilho Shin, Dave Shepard, Tetsuji Kuboyama
    20th International Conference on Data Mining Workshops, 811-819, 2020  Peer-reviewed
  • Takako Hashimoto, Kilho Shin, David Lawrence Shepard, Tetsuji Kuboyama
    11th International Conference on Awareness Science and Technology(iCAST), 1-6, 2020  
    This paper presents an analysis of an Indonesian gender equality survey: in 2019, we conducted a survey of attitudes about gender roles in Indonesia and obtained data from 122 individuals. The obtained data were analyzed using our original clustering method (UFVS, Unsupervised Feature Value Selection) to form clusters. The extracted features characterized the clusters and helped to analyze the attitudes of Indonesians towards gender equality. This method allowed the respondents to be grouped by features, and each group's characteristics could be easily identified. It facilitated the understanding of the survey data.
  • Kilho Shin, Kenta Okumoto, David Lawrence Shepard, Akira Kusaba, Takako Hashimoto, Jorge Amari, Keisuke Murota, Junnosuke Takai, Tetsuji Kuboyama, Hiroaki Ohshima
    Agents and Artificial Intelligence, 12613 LNAI 421-444, 2020  Peer-reviewed
    The problem of feature selection has been an area of considerable research in machine learning. Feature selection is known to be particularly difficult in unsupervised learning because different subgroups of features can yield useful insights into the same dataset. In other words, many theoretically-right answers may exist for the same problem. Furthermore, designing algorithms for unsupervised feature selection is technically harder than designing algorithms for supervised feature selection because unsupervised feature selection algorithms cannot be guided by class labels. As a result, previous work attempts to discover intrinsic structures of data with heavy computation such as matrix decomposition, and requires significant time to find even a single solution. This paper proposes a novel algorithm, named Explainability-based Unsupervised Feature Value Selection (EUFVS), which enables a paradigm shift in feature selection, and solves all of these problems. EUFVS requires only a few tens of milliseconds for datasets with thousands of features and instances, allowing the generation of a large number of possible solutions and the selection of the solution with the best fit. Another important advantage of EUFVS is that it selects feature values instead of features, which can better explain phenomena in data than features. This paper explains its theoretical advantage, and also shows its applications in real experiments. In our experiments with labeled datasets, EUFVS found feature value sets that explain labels, and also detected useful relationships between feature value sets not detectable from given class labels.
  • David Lawrence Shepard, Takako Hashimoto, Kilho Shin, Takeaki Uno, Tetsuji Kuboyama
    Digital Humanities 2020, 2020  Peer-reviewed
  • Takuya Kida, Tetsuji Kuboyama, Takeaki Uno, Akihiro Yamamoto
    Mach. Learn., 109(6) 1145-1146, 2020  
  • Akira Kusaba, Tetsuji Kuboyama, Takako Hashimoto
    Proceedings of the 10th International Conference on Awareness Science and Technology (iCAST 2019), 1-6, Oct 23, 2019  Peer-reviewed
  • Yasunobu Imamura, Naoya Higuchi, Takeshi Shinohara, Kouichi Hirata, Tetsuji Kuboyama
    ICPRAM 2019 - Proceedings of the 8th International Conference on Pattern Recognition Applications and Methods, 173-180, 2019  Peer-reviewed
    Annealing by Increasing Resampling (AIR) is a stochastic hill-climbing optimization by resampling with increasing size for evaluating an objective function. In this paper, we introduce a unified view of the conventional Simulated Annealing (SA) and AIR. In this view, we generalize both SA and AIR to a stochastic hill-climbing for objective functions with stochastic fluctuations, i.e., logit and probit, respectively. Since the logit function is approximated by the probit function, we show that AIR is regarded as an approximation of SA. The experimental results on sparse pivot selection and annealing-based clustering also support that AIR is an approximation of SA. Moreover, when an objective function requires a large number of samples, AIR is much faster than SA without sacrificing the quality of the results.
  • Naoya Higuchi, Yasunobu Imamura, Tetsuji Kuboyama, Kouichi Hirata, Takeshi Shinohara
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 11919 LNAI 240-252, 2019  Peer-reviewed
    A sketch is a lossy compression of high-dimensional data into compact bit strings, such as a locality sensitive hash. In general, k nearest neighbor search using sketches consists of the following two stages. The first stage narrows down the top K candidates, for some K ≥ k, using a priority measure of sketch as a filter. The second stage selects the k nearest objects from K candidates. In this paper, we discuss the search algorithms using fast filtering by sketch enumeration without using matching. Surprisingly, the search performance is rather improved by the proposed method when sketches narrower than the conventional ones, such as 16 bits, are used. Furthermore, we compare the search efficiency by sketches of various widths for several databases, which have different numbers of objects and dimensionalities. Then, we can observe that wider sketches are appropriate for larger databases, while narrower sketches are appropriate for higher dimension.
  • Mikio Mizukami, Kouichi Hirata, Tetsuji Kuboyama
    Proceedings of the 8th International Conference on Pattern Recognition Applications and Methods(ICPRAM), 699-706, 2019  
  • Takako Hashimoto, Takeaki Uno, Tetsuji Kuboyama, Kilho Shin, Dave Shepard
    IEEE International Conference on Big Data and Smart Computing, BigComp 2019, Kyoto, Japan, February 27 - March 2, 2019, 1-8, 2019  Peer-reviewed
  • Takako Hashimoto, Hiroshi Okamoto, Tetsuji Kuboyama, Kilho Shin
    Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017, 2018- 2740-2745, Jan 12, 2018  Peer-reviewed
    This paper presents time-series topic life cycle extraction from millions of Tweets using our original community detection technique for bipartite networks. We suppose that the authors' roles, that is, who belongs to which topics, are important for extracting quality topics from social media data. We previously proposed a topic extraction method that models the relationships between authors and words as bipartite networks and explores the authors' roles by forming clusters as topics. As the next step, this paper applies our method to time-series topic life cycle detection. We extract topics in different time slots and analyze the time series of topic transitions using a coherence measure that expresses the semantic accuracy of topics. The paper demonstrates that our method can detect the topic life cycle, such as growth and conflicts, over time from millions of Tweets.
  • Fumiya Tokuhara, Tetsuhiro Miyahara, Tetsuji Kuboyama, Yusuke Suzuki, Tomoyuki Uchida
    IJCIStudies, 7(3/4) 270-288, 2018  Peer-reviewed
  • Naoya Higuchi, Yasunobu Imamura, Tetsuji Kuboyama, Kouichi Hirata, Takeshi Shinohara
    Proceedings of the 7th International Conference on Pattern Recognition Applications and Methods, ICPRAM 2018, Funchal, Madeira - Portugal, January 16-18, 2018., 356-363, 2018  Peer-reviewed
  • Fumiya Tokuhara, Tetsuhiro Miyahara, Tetsuji Kuboyama, Yusuke Suzuki, Tomoyuki Uchida
    Vietnam J. Computer Science, 5(3-4) 229-239, 2018  Peer-reviewed
  • Fumiya Tokuhara, Tetsuhiro Miyahara, Tetsuji Kuboyama, Yusuke Suzuki, Tomoyuki Uchida
    2017 IEEE 10th International Workshop on Computational Intelligence and Applications, IWCIA 2017 - Proceedings, 2017- 191-197, Dec 13, 2017  Peer-reviewed
    Knowledge acquisition from graph structured data is an important task in machine learning and data mining. Block preserving outerplanar graph patterns are graph structured patterns having structured variables and are suited to represent characteristic graph structures of graph data modeled as outerplanar graphs. We propose a learning method for acquiring characteristic multiple block preserving outerplanar graph patterns by evolutionary computation using graph pattern sets as individuals, from positive and negative outerplanar graph data, in order to represent characteristic graph structures more precisely.
  • Kilho Shin, Tetsuji Kuboyama, Takako Hashimoto, Dave Shepard
    Information (Switzerland), 8(4) 159, Dec 6, 2017  Peer-reviewed
    Feature selection is a useful tool for identifying which features, or attributes, of a dataset cause or explain the phenomena that the dataset describes, and improving the efficiency and accuracy of learning algorithms for discovering such phenomena. Consequently, feature selection has been studied intensively in machine learning research. However, while feature selection algorithms that exhibit excellent accuracy have been developed, they are seldom used for analysis of high-dimensional data because high-dimensional data usually include too many instances and features, which make traditional feature selection algorithms inefficient. To eliminate this limitation, we tried to improve the run-time performance of two of the most accurate feature selection algorithms known in the literature. The result is two accurate and fast algorithms, namely SCWC and SLCC. Multiple experiments with real social media datasets have demonstrated that our algorithms improve the performance of their original algorithms remarkably. For example, we have two datasets, one with 15,568 instances and 15,741 features, and another with 200,569 instances and 99,672 features. SCWC performed feature selection on these datasets in 1.4 seconds and in 405 seconds, respectively. In addition, SLCC has turned out to be as fast as SCWC on average. This is a remarkable improvement because it is estimated that the original algorithms would need several hours to dozens of days to process the same datasets. In addition, we introduce a fast implementation of our algorithms: SCWC does not require any adjusting parameter, while SLCC requires a threshold parameter, which we can use to control the number of features that the algorithm selects.
  • Yuuki Yamagata, Tetsuhiro Miyahara, Yusuke Suzuki, Tomoyuki Uchida, Fumiya Tokuhara, Tetsuji Kuboyama
    Proceedings - 2017 6th IIAI International Congress on Advanced Applied Informatics, IIAI-AAI 2017, 459-464, Nov 15, 2017  Peer-reviewed
    Knowledge acquisition from graph structured data is an important task in machine learning and data mining. TTSP (Two-Terminal Series Parallel) graphs are used as data models for electric networks and scheduling. We propose a learning method for acquiring characteristic multiple graph structured patterns by evolutionary computation using sets of TTSP graph patterns as individuals, from positive and negative TTSP graph data, in order to represent sets of TTSP graphs more precisely.
  • Yasunobu Imamura, Naoya Higuchi, Tetsuji Kuboyama, Kouichi Hirata, Takeshi Shinohara
    Lernen, Wissen, Daten, Analysen (LWDA) Conference Proceedings, Rostock, Germany, September 11-13, 2017., 15, 2017  Peer-reviewed
  • Kilho Shin, Tetsuji Kuboyama, Tetsuhiro Miyahara, Kenji Tanaka
    Frontiers in Artificial Intelligence and Applications, 299 35-45, 2017  Peer-reviewed
    Multiple alignments of strings have been extensively studied as an effective tool to study string-type data such as DNA. In this paper, we generalize the notion of multiple alignments of strings and introduce M-alignments. M-alignments can be defined for arbitrary data objects that consist of a finite number of components. Such objects can be strings, ordered and unordered trees, rooted and unrooted trees, directed and undirected graphs, partially ordered sets and so on. On the other hand, when we introduce costs of M-alignments, the problem of finding optimal M-alignments that minimize their costs proves to be NP-hard. To solve this computational problem, we show that the center star algorithm, a well-known approximation algorithm for optimal multiple alignments of strings, can be generalized to M-alignments. When we applied the generalized center star algorithm to a real dataset of glycans, we were successful in identifying effective structural patterns of glycans that characterize the disease of leukemia.
  • Takako Hashimoto, Tetsuji Kuboyama, Hiroshi Okamoto, Kilho Shin
    Information Modelling and Knowledge Bases XXIX, 27th International Conference on Information Modelling and Knowledge Bases (EJC 2017), Krabi, Thailand, June 5-9, 2017., 395-408, 2017  Peer-reviewed
  • Takako Hashimoto, Tetsuji Kuboyama, Hiroshi Okamoto, Kilho Shin
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 10558 239-247, 2017  Peer-reviewed
    This paper proposes quality topic extraction on Twitter based on authors' roles in bipartite networks. We suppose that an author's role, meaning who was in which group, affects the quality of extracted topics. Our proposed method expresses relations between authors and words as bipartite networks, explores authors' roles by forming clusters using our original community detection technique, and finds quality topics considering the semantic accuracy of words and the authors' roles.
  • David Lawrence Shepard, Takako Hashimoto, Hiroshi Okamoto, Tetsuji Kuboyama, Kilho Shin
    Digital Humanities 2017, DH 2017, Conference Abstracts, McGill University & Université de Montréal, Montréal, Canada, August 8-11, 2017, 2017  Peer-reviewed
  • Fumiya Tokuhara, Tetsuhiro Miyahara, Tetsuji Kuboyama, Yusuke Suzuki, Tomoyuki Uchida
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2017, PT I, 10191 748-757, 2017  Peer-reviewed
    We propose a context-aware fitness function based on feature selection for evolutionary learning of characteristic graph patterns. The proposed fitness function estimates the fitness of a set of correlated individuals rather than the sum of fitness of the individuals, and specifies the fitness of an individual as its contribution degree in the context of the set. We apply the proposed fitness function to our evolutionary learning, based on Genetic Programming, for obtaining characteristic graph patterns from positive and negative graph data. We report some experimental results on our evolutionary learning of characteristic graph patterns, using the context-aware fitness function and a previous fitness function ignoring context.
  • Fumiya Tokuhara, Tetsuhiro Miyahara, Yusuke Suzuki, Tomoyuki Uchida, Tetsuji Kuboyama
    PROCEEDINGS 2016 5TH IIAI INTERNATIONAL CONGRESS ON ADVANCED APPLIED INFORMATICS IIAI-AAI 2016, 203-210, 2016  Peer-reviewed
    Many chemical compounds can be expressed by a class of graphs called outerplanar graphs. By taking advantage of this tractable class of graphs, we use block preserving outerplanar graph patterns having structured variables for expressing structural features of outerplanar graphs. We propose a method for acquiring characteristic block preserving outerplanar graph patterns from positive and negative outerplanar graph data by Genetic Programming using vertex and edge label information of positive examples. We report experimental results on real chemical compound data.
  • Yasunobu Imamura, Takeshi Shinohara, Kouichi Hirata, Tetsuji Kuboyama
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 9939 259-267, 2016  Peer-reviewed
    Hilbert sort arranges given points of a high-dimensional space with integer coordinates along a Hilbert curve. A naïve method first draws a Hilbert curve of a sufficient resolution to separate all the points, associates integers called Hilbert indices representing the orders along the Hilbert curve to points, and then sorts the pairs of points and indices. Such a method requires an exponentially large cost with respect to both the dimensionality n of the space and the order m of the Hilbert curve, even just to obtain the Hilbert indices. A known improved method computes the Hilbert index for each point in O(mn) time. In this paper, we propose an algorithm which directly sorts N points along a Hilbert curve in O(mnN) time without using Hilbert indices. This algorithm has the following three advantages: (1) it requires no extra space for Hilbert indices, (2) it handles multiple points simultaneously, and (3) it simulates the Hilbert curve in heterogeneous resolution, that is, in lower order for sparse space and higher order for dense space. It, therefore, runs much faster on random data, in O(N log N) time. Furthermore, it can be expected to run very fast on practical data, such as high-dimensional features of multimedia data.
  • Takako Hashimoto, Dave Shepard, Tetsuji Kuboyama, Kilho Shin
    2016 IEEE 16TH INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW), 724-731, 2016  Peer-reviewed
    Social media offers a wealth of insight into how significant topics such as the Great East Japan Earthquake, the Arab Spring, and the Boston Bombing affect individuals. The scale of available data, however, can be intimidating: during the Great East Japan Earthquake, over 8 million tweets were sent each day from Japan alone. Conventional word vector-based topic-detection techniques for social media that use Latent Semantic Analysis, Latent Dirichlet Allocation, or graph community detection often cannot scale to such a large volume of data due to their space and time complexity. To alleviate this problem, we have already proposed an efficient method for topic extraction by leveraging our original fast feature selection algorithm, CWC, which vastly reduces the number of features to track. While we begin with word count vectors of authors and words for each time slot (in our case, every 30 minutes), we make clusters from each time slot by a matrix decomposition technique to identify clusters and adapt CWC to extract discriminative words from each cluster. This method makes it possible to detect topics from high dimensional datasets. In this paper, to demonstrate our method's effectiveness, we extract topics from a dataset of over two hundred million tweets sent following the Great East Japan Earthquake and compare them with the result extracted by LDA, the current most popular topic extraction method. With CWC, we can identify topics from this dataset with great speed and accuracy.
  • Eina Hashimoto, Masatsugu Ichino, Tetsuji Kuboyama, Isao Echizen, Hiroshi Yoshiura
    SOCIAL MEDIA: THE GOOD, THE BAD, AND THE UGLY, 9844 455-470, 2016  Peer-reviewed
    A method for de-anonymizing social network accounts is presented to clarify the privacy risks of such accounts as well as to deter their misuse, such as the posting of copyrighted, offensive, or bullying content. In contrast to previous de-anonymization methods, which link accounts to other accounts, the presented method links accounts to resumes, which directly represent identities. The difficulty in using machine learning for de-anonymization, i.e., preparing positive examples of training data, is overcome by decomposing the learning problem into subproblems for which training data can be harvested from the Internet. Evaluation using 3 learning algorithms, 2 kinds of sentence features, 238 learned classifiers, 2 methods for fusing scores from the classifiers, and 30 volunteers' accounts and resumes demonstrated that the proposed method is effective. Because the training data are harvested from the Internet, the more information that is available on the Internet, the greater the effectiveness of the presented method.
  • David Lawrence Shepard, Takako Hashimoto, Tetsuji Kuboyama, Kilho Shin
    Digital Humanities 2016, DH 2016, Conference Abstracts, Jagiellonian University & Pedagogical University, Krakow, Poland, July 11-16, 2016, 361-364, 2016  Peer-reviewed
  • Fumiya Tokuhara, Tetsuhiro Miyahara, Yusuke Suzuki, Tomoyuki Uchida, Tetsuji Kuboyama
    2016 IEEE 9TH INTERNATIONAL WORKSHOP ON COMPUTATIONAL INTELLIGENCE AND APPLICATIONS (IWCIA), 93-99, 2016  Peer-reviewed
    We consider evolutionary learning, based on Genetic Programming, for acquiring characteristic graph structures from positive and negative outerplanar graph data. We use block preserving outerplanar graph patterns as representations of graph structures. Block tree patterns are tree representations of block preserving outerplanar graph patterns, and have the structure of unrooted trees, some of whose vertices have ordered adjacent vertices. In this paper, we propose canonical representations of block tree patterns, which have the structure of rooted and ordered trees, for use in acquiring characteristic block preserving outerplanar graph patterns. We then give an algorithm for calculating canonical representations of block tree patterns. Preliminary experimental results show that the algorithm is effective in reducing the run time of our evolutionary learning method.
  • Quming Jin, Masaya Nakashima, Takeshi Shinohara, Kouichi Hirata, Tetsuji Kuboyama
    NEW FRONTIERS IN ARTIFICIAL INTELLIGENCE, JSAI-ISAI 2014, 9067 310-316, 2015  Peer-reviewed
    A Simple-Map (S-Map, for short), which is one of the dimension reduction techniques applicable to any metric space, uses the distances between central points and objects as the coordinate values. S-Map with multiple central points is a projection to a multidimensional L-infinity space. In previous research on S-Map, the candidates for central points are randomly selected from the data objects in the database, and the summation of projective distances between sampled pairs of points is used as the scoring function to be maximized. This method of selecting central points can be improved by using local search. The coordinate values of central points obtained after local search tend to lie at the maximum or the minimum ends of the space. Focusing on this tendency, in this paper we propose a binary quantization for selecting central points, which divides coordinate values into maximum values and minimum values based on whether the coordinate value of an object in the database is greater than a threshold.

Misc.

 131
  • 樋口 直哉, 今村 安伸, 篠原 武, 平田 耕一, 久保山 哲二
    JSAI Technical Report, SIG-FPAI (人工知能学会研究会資料 人工知能基本問題研究会), 123 24-29, Jan 5, 2023
  • 徳永弘子, 久保山哲二, 木村敦, 武川直樹
    IEICE Technical Report (Web) (電子情報通信学会技術研究報告), 121(363(HCS2021 43-60)) 43-48, 2022
  • KAWASAKI Yuma, MIYAHARA Tetsuhiro, KUBOYAMA Tetsuji, SUZUKI Yusuke, UCHIDA Tomoyuki
    Proceedings of the Annual Conference of JSAI, JSAI2021 4G3GS2l01-4G3GS2l01, 2021  
    Knowledge acquisition from graph structured data is an important task in machine learning and data mining. TTSP (Two-Terminal Series Parallel) graphs are used as data models for electric networks and scheduling. We propose an evolutionary learning method for obtaining characteristic multiple TTSP graph patterns with wildcards, from positive and negative TTSP graph data by clustering TTSP graphs.
  • YOKOYAMA Shunsuke, MIYAHARA Tetsuhiro, SUZUKI Yusuke, UCHIDA Tomoyuki, KUBOYAMA Tetsuji
    Proceedings of the Annual Conference of JSAI, JSAI2021 4G3GS2l02-4G3GS2l02, 2021  
    Machine learning and data mining from tree structured data are studied intensively. We propose an evolutionary learning method for acquiring characteristic tag tree patterns with vertex labels and wildcards from positive and negative tree data, by using label information of positive examples. We report preliminary experimental results on our evolutionary learning method.
  • TOKUHARA Fumiya, OKINAGA Shiho, MIYAHARA Tetsuhiro, SUZUKI Yusuke, KUBOYAMA Tetsuji, UCHIDA Tomoyuki
    Proceedings of the Annual Conference of JSAI, JSAI2020 1O3GS802-1O3GS802, 2020  
    Machine learning from graph structured data is studied intensively. Many chemical compounds can be expressed by outerplanar graphs. The purpose of this paper is to propose a learning method for obtaining characteristic graph patterns from positive and negative outerplanar graph data. We propose a two-stage evolutionary learning method for acquiring multiple characteristic block preserving outerplanar graph patterns with wildcards from positive and negative outerplanar graph data, by using label information of positive examples. We report preliminary experimental results on our evolutionary learning method.

Teaching Experience

 15

Research Projects

 33