Researcher Achievements

久保山 哲二

クボヤマ テツジ  (Tetsuji Kuboyama)

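Basic information

Affiliation
Professor, Computer Centre / Archival Science Program, Graduate School of Humanities, Gakushuin University
Visiting Professor, Tokyo Denki University (総合研究所・知能創発研究所)
Degree
Doctor of Engineering (The University of Tokyo)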

Researcher number
80302660
ORCID ID
 https://orcid.org/0000-0003-1590-0231
J-GLOBAL ID
200901047478411760
researchmap member ID
5000102916

Education

 1

Papers

 126
  • Tetsuji Kuboyama, Akira Kusaba
    Journal of Crystal Growth January 2025
  • Naoya Higuchi, Yasunobu Imamura, Vladimir Mic, Takeshi Shinohara, Kouichi Hirata, Tetsuji Kuboyama
    ICPRAM 499-510 2024
  • 草場彰, 寒川義裕, 久保山哲二, 新田州吾, 白石賢二, 押山淳
    日本結晶成長学会誌 50(1) 50-1-05 April 28, 2023  Peer-reviewed
  • 徳永 弘子, 久保山 哲二, 木村 敦, 武川 直樹
    電子情報通信学会論文誌A 基礎・境界 J106-A(3) 104-113 March 1, 2023  Peer-reviewed
    Eating together (co-eating) is an occasion for communication and has been shown to benefit people's psychological health. This study serves meals in two formats and identifies grounds for this psychological effect from the characteristics of participants' speech behavior during co-eating. Specifically, the same menu was served to four groups of six people each, both as individual servings and as shared platters. The co-eating scenes were video-recorded, and the utterances in the first 20 minutes of each meal were transcribed. The analysis confirmed that speakers during a meal can readily secure stretches of time in which to continue speaking, and that participants enliven the conversation through food-related topics. These results suggest that co-eating conversation functions as a setting for deepening mutual understanding and for giving people meeting for the first time an opportunity to talk, and that such interpersonal connections contribute to psychological health.
  • 宇野 毅明, 武富 有香, 小林 亮太, 橋本 隆子, 久保山 哲二, 申 吉浩
    じんもんこん2022論文集 2022 207-212 December 2, 2022  Peer-reviewed
    We analyze the diversity of comments posted on individual news articles and on the news articles within a category, and report the results. We propose this as a new perspective for interpreting how people react to articles and categories. Starting from the hypothesis that comments on different articles are more diverse than comments on a single article, we compare the diversity of comments across multiple articles in the same category in order to evaluate the diversity of that category.
  • 杉山 佳奈美, 久保山 哲二, 三輪 洋文, 宇野 毅明
    じんもんこん2022論文集 2022 289-294 December 2, 2022  Peer-reviewed
    Document clustering was applied to the text of election manifestos. Two clustering methods were used and compared: microclustering, which extracts dense substructures from the network formed by inter-document similarity, and LDA, a representative topic model. Microclustering yielded many high-resolution clusters whose topics were easy to interpret, in particular clusters with higher similarity with respect to political party. Furthermore, regression analysis was applied to the document clusters extracted by microclustering to analyze the tendencies of candidates oriented toward the personal vote. The results support the claims of previous studies regarding changes before and after the electoral reform and the characteristics of each political party, without the manual topic interpretation step required in previous LDA-based work.
  • A. Kusaba, S. Nitta, K. Shiraishi, T. Kuboyama, Y. Kangawa
    Applied Physics Letters 121(16) October 17, 2022  Peer-reviewed
    To develop a quantitative reaction simulator, data assimilation was performed using high-resolution time-of-flight mass spectrometry (TOF-MS) data applied to a GaN metalorganic vapor phase epitaxy system. Incorporating ab initio knowledge into the optimization enables it to reproduce not only the concentration of CH4 (an impurity precursor) as an objective variable but also known reaction pathways. The simulation results show significant production of GaH3, a precursor of GaN, which has been difficult to detect in TOF-MS experiments. Our proposed approach is expected to be applicable to other applied physics fields that require quantitative prediction that goes beyond ab initio reaction rates.
  • A. Kusaba, Y. Kangawa, T. Kuboyama, A. Oshiyama
    Applied Physics Letters 120(2) January 10, 2022  Peer-reviewed
    GaN(0001) surfaces with Ga- and H-adsorbates are fundamental stages for epitaxial growth of semiconductor thin films. We explore stable surface structures on a nanometer scale by density-functional calculations combined with Bayesian optimization and reach a single structure with satisfactorily low mixing enthalpy among hundreds of thousands of possible candidate structures. We find that the obtained structure is free from any postulated high symmetry previously introduced by human intuition, satisfies an electron counting rule locally, and shows a complex adsorbate arrangement, reflecting characteristics of nitride semiconductors. The proposed scheme toward a high-resolution surface phase diagram contributes to a more precise design of GaN epitaxial growth conditions, especially the ratio of Ga and H partial pressures.
  • Akira Kusaba, Tetsuji Kuboyama, Kilho Shin, Makoto Sasaki, Shigeru Inagaki
    Japanese Journal of Applied Physics 61(SA) January 2022  Peer-reviewed
    A new combined use of dynamic mode decomposition algorithms is proposed, which is suitable for the analysis of spatiotemporal data from experiments with few observation points, unlike computational fluid dynamics with many observation points. The method was applied to our data from a plasma turbulence experiment. As a result, we succeeded in constructing a quite accurate model for our training data and it made progress in predictive performance as well. In addition, modal patterns from the longer-term analysis help to understand the underlying mechanism more clearly, which is demonstrated in the case of plasma streamer structure. This method is expected to be a powerful tool for the data-driven construction of a reduced-order model and a predictor in plasma turbulence research and also any nonlinear dynamics researches of other applied physics fields.
  • Naoya Higuchi, Yasunobu Imamura, Vladimir Mic, Takeshi Shinohara, Kouichi Hirata, Tetsuji Kuboyama
    ICPRAM 401-410 2022  Peer-reviewed
  • 鏡川 悠介, 久保山 哲二, 加茂 瑞穂, 前田 英作
    じんもんこん2021論文集 2021(2021) 260-267 December 4, 2021  Peer-reviewed
    We examine an automated annotation method to support the digital archiving of traditional patterns, specifically Ise-Katagami, traditional Japanese stencils. Using the motifs depicted in the stencils (plum blossoms, cherry blossoms, water chestnuts, etc.) as cues, we tried to automatically classify the motif types (about 300) in about 18,000 digitized stencil images. The designs are not only highly abstract but also vary greatly between stencils even within the same motif class. Moreover, the stencils are almost binary black-and-white images and lack the fine texture found in natural images. It is therefore difficult to achieve sufficient performance with conventional methods based on convolutional neural networks (CNNs) pre-trained on natural images. We therefore propose an improved ensemble method that uses both automatically generated fractal images and natural images for the pre-trained models.
  • 草場彰, 久保山哲二, 寒川義裕
    結晶成長国内会議予稿集(CD-ROM) 50th 2021  Invited
  • Maciej Huk, Kilho Shin, Tetsuji Kuboyama, Takako Hashimoto
    Intelligent Information and Database Systems - 13th Asian Conference(ACIIDS) 12672 LNAI 717-730 2021  Peer-reviewed
    Much care should be given to the cases when there is a need to compare results of machine learning (ML) experiments performed with the usage of different Pseudo Random Number Generators (PRNGs). This is because the selection of PRNG can be regarded as a source of measurement error, e.g. in repeated N-fold Cross Validation (CV). It can be also important to verify if the observed properties of a model or algorithm are not due to the effects of the use of a particular PRNG. In this paper we conduct experiments so that we can observe the possible level of differences in obtained values of various measures of classification quality of simple Contextual Neural Networks and Multilayer Perceptron (MLP) models for various PRNGs. It is presented that the results for some pairs of PRNGs can be significantly different even for large number of repeats of 5-fold CV. Observations suggest that when different ML models and algorithms are compared with the usage of 5-fold CV when different PRNGs were used, the confidence interval should be doubled or confidence level higher than 95% should be used. Additionally, it is shown that even under such conditions classification properties of Contextual Neural Networks are found statistically better than of not-contextual MLP models.
  • Takako Hashimoto, David Lawrence Shepard, Tetsuji Kuboyama, Kilho Shin, Ryota Kobayashi, Takeaki Uno
    The Journal of Supercomputing 77(5) 4375-4388 October 1, 2020  Peer-reviewed
    During a disaster, social media can be both a source of help and of danger: social media has the potential to diffuse rumors, and officials involved in disaster mitigation must react quickly to the spread of rumors on social media. In this paper, we investigate how topic diversity (i.e., homogeneity of opinions in a topic) depends on the truthfulness of a topic (whether it is a rumor or a non-rumor) and how the topic diversity changes in time after a disaster. To do so, we develop a method for quantifying the topic diversity of the tweet data based on text content. The proposed method is based on clustering a tweet graph using Data polishing, which automatically determines the number of subtopics. We perform a case study of tweets posted after the Great East Japan Earthquake on March 11, 2011. We find that rumor topics exhibit more homogeneity of opinions in a topic during diffusion than non-rumor topics. Furthermore, we evaluate the performance of our method and demonstrate its improvement in runtime for data processing over existing methods.
  • 樋口 直哉, 今村 安伸, 久保山 哲二, 平田 耕一, 篠原 武
    情報処理学会論文誌数理モデル化と応用(TOM)6 13(1) 13-22 March 25, 2020  Peer-reviewed
  • Takako Hashimoto, Akira Kusaba, Dave Shepard, Tetsuji Kuboyama, Kilho Shin, Takeaki Uno
    Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2020) 585-592 February 22, 2020  Peer-reviewed
  • Kilho Shin, Kenta Okumoto, David Shepard, Tetsuji Kuboyama, Takako Hashimoto, Hiroaki Ohshima
    203-213 February 2020  Peer-reviewed
  • Akira Kusaba, Tetsuji Kuboyama, Shigeru Inagaki
    Plasma and Fusion Research 15 1301001:1-1301001:4 January 6, 2020  Peer-reviewed
  • Naoya Higuchi, Yasunobu Imamura, Takeshi Shinohara, Kouichi Hirata, Tetsuji Kuboyama
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 11996 LNCS 71-92 2020  Peer-reviewed
    Annealing by Increasing Resampling (AIR, for short) is a stochastic hill-climbing optimization algorithm that evaluates the objective function for resamplings with increasing size. At the beginning stages, AIR makes state transitions like a random walk, because it uses small resamplings for which evaluation has large error at high probability. At the ending stages, AIR behaves like a local search because it uses large resamplings very close to the entire sample. Thus AIR works similarly as the conventional Simulated Annealing (SA, for short). As a rationale for AIR approximating SA, we show that both AIR and SA can be regarded as a hill-climbing algorithm according to objective function evaluation with stochastic fluctuations. The fluctuation in AIR is explained by the probit, while in SA by the logit. We show experimentally that the logit can be replaced with the probit in MCMC, which is a basis of SA. We also show experimental comparison of SA and AIR for two optimization problems, sparse pivot selection for dimension reduction, and annealing-based clustering. Strictly speaking, AIR must use resampling independently performed at each transition trial. However, it has been demonstrated by experiments that reuse of resampling within a certain number of times can speed up optimization without losing the quality of optimization. In particular, the larger the samples used for evaluation, the more remarkable the superiority of AIR is in terms of speed with respect to SA.
  • Akira Kusaba, Takako Hashimoto, Kilho Shin, David Lawrence Shepard, Tetsuji Kuboyama
    2020 IEEE Region 10 Conference(TENCON) 2020-November 1192-1197 2020  Peer-reviewed
    This paper presents FITS, or Feature-value / Instance Transposition Selection, a method for unsupervised clustering. FITS is a tractable, explicable clustering method, which leverages the unsupervised feature value selection algorithm known as UFVS in the literature. FITS combines repeated rounds of UFVS with alternating steps of matrix transposition to produce a set of homogeneous clusters that describe data well. By repeatedly swapping the role of feature and instance and applying the same selection process to them, FITS leverages UFVS's speed and can perform clustering in our experiments in tens of milliseconds for datasets of thousands of features and thousands of instances. We performed feature selection-based clustering on two real-world data sets. One is aimed at topic extraction from Twitter data, and the other is aimed at gaining awareness of energy conservation from time-series power consumption data. This study also proposes a novel method based on iterative feature extraction and transposition. The effectiveness of this method is shown in an application of Twitter data analysis. On the other hand, a more straightforward use of feature selection is adopted in the application of time series power consumption data analysis.
  • Naoya Higuchi, Yasunobu Imamura, Vladimir Mic, Takeshi Shinohara, Kouichi Hirata, Tetsuji Kuboyama
    Similarity Search and Applications - 13th International Conference(SISAP) 33-46 2020  Peer-reviewed
  • Akira Kusaba, Kilho Shin, Dave Shepard, Tetsuji Kuboyama
    20th International Conference on Data Mining Workshops 811-819 2020  Peer-reviewed
  • Takako Hashimoto, Kilho Shin, David Lawrence Shepard, Tetsuji Kuboyama
    11th International Conference on Awareness Science and Technology(iCAST) 1-6 2020
    This paper presents an analysis of an Indonesian gender equality survey: in 2019, we conducted a survey of attitudes about gender roles in Indonesia and obtained data from 122 individuals. The obtained data were analyzed using our original clustering method (UFVS, Unsupervised Feature Value Selection) to form clusters. The extracted features characterized the clusters and helped to analyze the attitudes of Indonesians towards gender equality. This method allowed the respondents to be grouped by features and each group characteristics could be easily identified. It facilitated the understanding of the survey data.
  • Kilho Shin, Kenta Okumoto, David Lawrence Shepard, Akira Kusaba, Takako Hashimoto, Jorge Amari, Keisuke Murota, Junnosuke Takai, Tetsuji Kuboyama, Hiroaki Ohshima
    Agents and Artificial Intelligence 12613 LNAI 421-444 2020  Peer-reviewed
    The problem of feature selection has been an area of considerable research in machine learning. Feature selection is known to be particularly difficult in unsupervised learning because different subgroups of features can yield useful insights into the same dataset. In other words, many theoretically-right answers may exist for the same problem. Furthermore, designing algorithms for unsupervised feature selection is technically harder than designing algorithms for supervised feature selection because unsupervised feature selection algorithms cannot be guided by class labels. As a result, previous work attempts to discover intrinsic structures of data with heavy computation such as matrix decomposition, and require significant time to find even a single solution. This paper proposes a novel algorithm, named Explainability-based Unsupervised Feature Value Selection (EUFVS), which enables a paradigm shift in feature selection, and solves all of these problems. EUFVS requires only a few tens of milliseconds for datasets with thousands of features and instances, allowing the generation of a large number of possible solutions and select the solution with the best fit. Another important advantage of EUFVS is that it selects feature values instead of features, which can better explain phenomena in data than features. EUFVS enables a paradigm shift in feature selection. This paper explains its theoretical advantage, and also shows its applications in real experiments. In our experiments with labeled datasets, EUFVS found feature value sets that explain labels, and also detected useful relationships between feature value sets not detectable from given class labels.
  • David Lawrence Shepard, Takako Hashimoto, Kilho Shin, Takeaki Uno, Tetsuji Kuboyama
    Digital Humanities 2020 2020  Peer-reviewed
  • Takuya Kida, Tetsuji Kuboyama, Takeaki Uno, Akihiro Yamamoto
    Mach. Learn. 109(6) 1145-1146 2020
  • Fumiya Tokuhara, Shiho Okinaga, Tetsuhiro Miyahara, Yusuke Suzuki, Tetsuji Kuboyama, Tomoyuki Uchida
    2019 IEEE 11th International Workshop on Computational Intelligence and Applications, IWCIA 2019 - Proceedings 95-100 November 1, 2019
    Machine learning and data mining from graph structured data have gained much attention. Many chemical compounds can be expressed by outerplanar graphs. We propose a method for acquiring characteristic block preserving outerplanar graph patterns with wildcards for vertex and edge labels, from positive and negative outerplanar graph data, by Genetic Programming using label connecting information of positive examples. We report experimental results on real chemical compound data and synthetic data.
  • Akira Kusaba, Tetsuji Kuboyama, Takako Hashimoto
    Proceedings of the 10th International Conference on Awareness Science and Technology (iCAST 2019) 1-6 October 23, 2019  Peer-reviewed
  • Yasunobu Imamura, Naoya Higuchi, Takeshi Shinohara, Kouichi Hirata, Tetsuji Kuboyama
    ICPRAM 2019 - Proceedings of the 8th International Conference on Pattern Recognition Applications and Methods 173-180 2019  Peer-reviewed
    Annealing by Increasing Resampling (AIR) is a stochastic hill-climbing optimization by resampling with increasing size for evaluating an objective function. In this paper, we introduce a unified view of the conventional Simulated Annealing (SA) and AIR. In this view, we generalize both SA and AIR to a stochastic hill-climbing for objective functions with stochastic fluctuations, i.e., logit and probit, respectively. Since the logit function is approximated by the probit function, we show that AIR is regarded as an approximation of SA. The experimental results on sparse pivot selection and annealing-based clustering also support that AIR is an approximation of SA. Moreover, when an objective function requires a large number of samples, AIR is much faster than SA without sacrificing the quality of the results.
  • Naoya Higuchi, Yasunobu Imamura, Tetsuji Kuboyama, Kouichi Hirata, Takeshi Shinohara
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 11919 LNAI 240-252 2019  Peer-reviewed
    A sketch is a lossy compression of high-dimensional data into compact bit strings such as locality sensitive hash. In general, k nearest neighbor search using sketch consists of the following two stages. The first stage narrows down the top K candidates, for some K ≥ k, using a priority measure of sketch as a filter. The second stage selects the k nearest objects from K candidates. In this paper, we discuss the search algorithms using fast filtering by sketch enumeration without using matching. Surprisingly, the search performance is rather improved by the proposed method when sketches narrower than the conventional ones, such as 16-bit sketches, are used. Furthermore, we compare the search efficiency by sketches of various widths for several databases, which have different numbers of objects and dimensionalities. We observe that wider sketches are appropriate for larger databases, while narrower sketches are appropriate for higher dimensions.
  • Mikio Mizukami, Kouichi Hirata, Tetsuji Kuboyama
    Proceedings of the 8th International Conference on Pattern Recognition Applications and Methods(ICPRAM) 699-706 2019
  • Takako Hashimoto, Takeaki Uno, Tetsuji Kuboyama, Kilho Shin, Dave Shepard
    IEEE International Conference on Big Data and Smart Computing, BigComp 2019, Kyoto, Japan, February 27 - March 2, 2019 1-8 2019  Peer-reviewed
  • Takako Hashimoto, Hiroshi Okamoto, Tetsuji Kuboyama, Kilho Shin
    Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017 2018- 2740-2745 January 12, 2018  Peer-reviewed
    This paper presents time-series topic life cycle extraction from millions of Tweets using our original community detection technique for bipartite networks. We suppose that the author's role, that is, who belongs to which topics, is important for extracting quality topics from social media data. We previously proposed a topic extraction method that treats the relationship between authors and words as bipartite networks and explores the author's role by forming clusters as topics. As the next step, this paper applies our method to time-series topic life cycle detection. We extract topics in different time slots and analyze the time series of topic transitions using the coherence measure, which expresses the semantic accuracy of topics. The paper demonstrates that our method can detect the topic life cycle, such as growth and conflicts, over time from millions of Tweets.
  • Fumiya Tokuhara, Tetsuhiro Miyahara, Tetsuji Kuboyama, Yusuke Suzuki, Tomoyuki Uchida
    IJCIStudies 7(3/4) 270-288 2018  Peer-reviewed
  • Naoya Higuchi, Yasunobu Imamura, Tetsuji Kuboyama, Kouichi Hirata, Takeshi Shinohara
    Proceedings of the 7th International Conference on Pattern Recognition Applications and Methods, ICPRAM 2018, Funchal, Madeira - Portugal, January 16-18, 2018. 356-363 2018  Peer-reviewed
  • Fumiya Tokuhara, Array, Tetsuji Kuboyama, Yusuke Suzuki, Tomoyuki Uchida
    Vietnam J. Computer Science 5(3-4) 229-239 2018  Peer-reviewed
  • Fumiya Tokuhara, Tetsuhiro Miyahara, Tetsuji Kuboyama, Yusuke Suzuki, Tomoyuki Uchida
    2017 IEEE 10th International Workshop on Computational Intelligence and Applications, IWCIA 2017 - Proceedings 2017- 191-197 December 13, 2017  Peer-reviewed
    Knowledge acquisition from graph structured data is an important task in machine learning and data mining. Block preserving outerplanar graph patterns are graph structured patterns having structured variables and are suited to represent characteristic graph structures of graph data modeled as outerplanar graphs. We propose a learning method for acquiring characteristic multiple block preserving outerplanar graph patterns by evolutionary computation using graph pattern sets as individuals, from positive and negative outerplanar graph data, in order to represent characteristic graph structures more precisely.
  • Kilho Shin, Tetsuji Kuboyama, Takako Hashimoto, Dave Shepard
    Information (Switzerland) 8(4) 159 December 6, 2017  Peer-reviewed
    Feature selection is a useful tool for identifying which features, or attributes, of a dataset cause or explain the phenomena that the dataset describes, and improving the efficiency and accuracy of learning algorithms for discovering such phenomena. Consequently, feature selection has been studied intensively in machine learning research. However, while feature selection algorithms that exhibit excellent accuracy have been developed, they are seldom used for analysis of high-dimensional data because high-dimensional data usually include too many instances and features, which make traditional feature selection algorithms inefficient. To eliminate this limitation, we tried to improve the run-time performance of two of the most accurate feature selection algorithms known in the literature. The result is two accurate and fast algorithms, namely SCWC and SLCC. Multiple experiments with real social media datasets have demonstrated that our algorithms improve the performance of their original algorithms remarkably. For example, we have two datasets, one with 15,568 instances and 15,741 features, and another with 200,569 instances and 99,672 features. SCWC performed feature selection on these datasets in 1.4 seconds and in 405 seconds, respectively. In addition, SLCC has turned out to be as fast as SCWC on average. This is a remarkable improvement because it is estimated that the original algorithms would need several hours to dozens of days to process the same datasets. In addition, we introduce a fast implementation of our algorithms: SCWC does not require any adjusting parameter, while SLCC requires a threshold parameter, which we can use to control the number of features that the algorithm selects.
  • Yuuki Yamagata, Tetsuhiro Miyahara, Yusuke Suzuki, Tomoyuki Uchida, Fumiya Tokuhara, Tetsuji Kuboyama
    Proceedings - 2017 6th IIAI International Congress on Advanced Applied Informatics, IIAI-AAI 2017 459-464 November 15, 2017  Peer-reviewed
    Knowledge acquisition from graph structured data is an important task in machine learning and data mining. TTSP (Two-Terminal Series Parallel) graphs are used as data models for electric networks and scheduling. We propose a learning method for acquiring characteristic multiple graph structured patterns by evolutionary computation using sets of TTSP graph patterns as individuals, from positive and negative TTSP graph data, in order to represent sets of TTSP graphs more precisely.
  • 福元 健太郎, 久保山 哲二
    学習院大学計算機センター年報 38 77-78 2017
    Research report
  • Yasunobu Imamura, Naoya Higuchi, Tetsuji Kuboyama, Kouichi Hirata, Takeshi Shinohara
    Lernen, Wissen, Daten, Analysen (LWDA) Conference Proceedings, Rostock, Germany, September 11-13, 2017. 15 2017  Peer-reviewed
  • Kilho Shin, Tetsuji Kuboyama, Tetsuhiro Miyahara, Kenji Tanaka
    Frontiers in Artificial Intelligence and Applications 299 35-45 2017  Peer-reviewed
    Multiple alignments of strings have been extensively studied as an effective tool to study string-type data such as DNA. In this paper, we generalize the notion of multiple alignments of strings and introduce M-alignments. M-alignments can be defined for arbitrary data objects that consist of a finite number of components. Such objects can be strings, ordered and unordered trees, rooted and unrooted trees, directed and undirected graphs, partially ordered sets and so on. On the other hand, when we introduce costs of M-alignments, the problem of finding optimal M-alignments that minimize their costs proves to be NP-hard. To solve this computational problem, we show that the center star algorithm, which is a well-known approximation algorithm for optimal multiple alignments of strings, can be generalized to M-alignments. When we applied the generalized center star algorithm to a real dataset of glycans, we were successful in identifying effective structural patterns of glycans that characterize leukemia.
  • Takako Hashimoto, Tetsuji Kuboyama, Hiroshi Okamoto, Kilho Shin
    Information Modelling and Knowledge Bases XXIX, 27th International Conference on Information Modelling and Knowledge Bases (EJC 2017), Krabi, Thailand, June 5-9, 2017. 395-408 2017  Peer-reviewed
  • Takako Hashimoto, Tetsuji Kuboyama, Hiroshi Okamoto, Kilho Shin
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 10558 239-247 2017  Peer-reviewed
    This paper proposes a quality topic extraction on Twitter based on author’s role on bipartite networks. We suppose that author’s role which means who were in what group, affects the quality of extracted topics. Our proposed method expresses relations between authors and words as bipartite networks, explores author’s role by forming clusters using our original community detection technique, and finds quality topics considering the semantic accuracy of words and author’s role.
  • David Lawrence Shepard, Takako Hashimoto, Hiroshi Okamoto, Tetsuji Kuboyama, Kilho Shin
    Digital Humanities 2017, DH 2017, Conference Abstracts, McGill University & Université de Montréal, Montréal, Canada, August 8-11, 2017 2017  Peer-reviewed
  • Fumiya Tokuhara, Tetsuhiro Miyahara, Tetsuji Kuboyama, Yusuke Suzuki, Tomoyuki Uchida
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2017, PT I 10191 748-757 2017  Peer-reviewed
    We propose a context-aware fitness function based on feature selection for evolutionary learning of characteristic graph patterns. The proposed fitness function estimates the fitness of a set of correlated individuals rather than the sum of fitness of the individuals, and specifies the fitness of an individual as its contribution degree in the context of the set. We apply the proposed fitness function to our evolutionary learning, based on Genetic Programming, for obtaining characteristic graph patterns from positive and negative graph data. We report some experimental results on our evolutionary learning of characteristic graph patterns, using the context-aware fitness function and a previous fitness function ignoring context.
  • Fumiya Tokuhara, Tetsuhiro Miyahara, Yusuke Suzuki, Tomoyuki Uchida, Tetsuji Kuboyama
    PROCEEDINGS 2016 5TH IIAI INTERNATIONAL CONGRESS ON ADVANCED APPLIED INFORMATICS IIAI-AAI 2016 203-210 2016  Peer-reviewed
    Many chemical compounds can be expressed by a class of graphs called outerplanar graphs. By taking advantage of this tractable class of graphs, we use block preserving outerplanar graph patterns having structured variables for expressing structural features of outerplanar graphs. We propose a method for acquiring characteristic block preserving outerplanar graph patterns from positive and negative outerplanar graph data by Genetic Programming using vertex and edge label information of positive examples. We report experimental results on real chemical compound data.
  • Yasunobu Imamura, Takeshi Shinohara, Kouichi Hirata, Tetsuji Kuboyama
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 9939 259-267 2016  Peer-reviewed
    Hilbert sort arranges given points of a high-dimensional space with integer coordinates along a Hilbert curve. A naïve method first draws a Hilbert curve of a sufficient resolution to separate all the points, associates integers called Hilbert indices representing the orders along the Hilbert curve to points, and then, sorts the pairs of points and indices. Such a method requires an exponentially large cost with respect to both the dimensionality n of the space and the order m of the Hilbert curve even if obtaining Hilbert indices. A known improved method computes the Hilbert index for each point in O (mn) time. In this paper, we propose an algorithm which directly sorts N points along a Hilbert curve in O (mnN) time without using Hilbert indices. This algorithm has the following three advantages (1) it requires no extra space for Hilbert indices, (2) it handles simultaneously multiple points, and (3) it simulates the Hilbert curve in heterogeneous resolution, that is, in lower order for sparse space and higher order for dense space. It, therefore, runs much faster on random data in O (NlogN) time. Furthermore, it can be expected to run very fast on practical data, such as high-dimensional features of multimedia data.
  • Takako Hashimoto, Dave Shepard, Tetsuji Kuboyama, Kilho Shin
    2016 IEEE 16TH INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW) 724-731 2016  Peer-reviewed
    Social media offers a wealth of insight into how significant topics such as the Great East Japan Earthquake, the Arab Spring, and the Boston Bombing affect individuals. The scale of available data, however, can be intimidating: during the Great East Japan Earthquake, over 8 million tweets were sent each day from Japan alone. Conventional word vector-based topic-detection techniques for social media that use Latent Semantic Analysis, Latent Dirichlet Allocation, or graph community detection often cannot scale to such a large volume of data due to their space and time complexity. To alleviate this problem, we have already proposed an efficient method for topic extraction by leveraging our original fast feature selection algorithm, CWC, which vastly reduces the number of features to track. While we begin with word count vectors of authors and words for each time slot (in our case, every 30 minutes), we make clusters from each time slot by a matrix decomposition technique to identify clusters and adapt CWC to extract discriminative words from each cluster. This method makes it possible to detect topics from high dimensional datasets. In this paper, to demonstrate our method's effectiveness, we extract topics from a dataset of over two hundred million tweets sent following the Great East Japan Earthquake and compare them with the result extracted by LDA, the current most popular topic extraction method. With CWC, we can identify topics from this dataset with great speed and accuracy.
  • Eina Hashimoto, Masatsugu Ichino, Tetsuji Kuboyama, Isao Echizen, Hiroshi Yoshiura
    SOCIAL MEDIA: THE GOOD, THE BAD, AND THE UGLY 9844 455-470 2016  Peer-reviewed
    A method for de-anonymizing social network accounts is presented to clarify the privacy risks of such accounts as well as to deter their misuse such as by posting copyrighted, offensive, or bullying contents. In contrast to previous de-anonymization methods, which link accounts to other accounts, the presented method links accounts to resumes, which directly represent identities. The difficulty in using machine learning for de-anonymization, i.e. preparing positive examples of training data, is overcome by decomposing the learning problem into subproblems for which training data can be harvested from the Internet. Evaluation using 3 learning algorithms, 2 kinds of sentence features, 238 learned classifiers, 2 methods for fusing scores from the classifiers, and 30 volunteers' accounts and resumes demonstrated that the proposed method is effective. Because the training data are harvested from the Internet, the more information that is available on the Internet, the greater the effectiveness of the presented method.

MISC

 131

Teaching experience (courses taught)

 15

Joint research and competitively funded research projects

 33