研究者データベース

中村 篤祥(ナカムラ アツヨシ)
情報科学研究院 情報理工学部門 知識ソフトウェア科学分野
教授

基本情報

所属

  • 情報科学研究院 情報理工学部門 知識ソフトウェア科学分野

職名

  • 教授

学位

  • 博士(理学)(東京工業大学)

ホームページURL

科研費研究者番号

  • 50344487

J-Global ID

研究キーワード

  • アルゴリズム   知識発見とデータマイニング   汎化性能   木構造   オンライン学習   能動学習   文字列アルゴリズム   計算論的学習理論   機械学習   データマイニング   

研究分野

  • 情報通信 / 知能情報学
  • 情報通信 / 知能情報学
  • 情報通信 / 知能ロボティクス
  • 情報通信 / 知覚情報処理

職歴

  • 2021年04月 - 現在 北海道大学 大学院情報科学研究科 教授
  • 2002年07月 - 2021年03月 北海道大学 大学院情報科学研究院 准教授
  • 1988年04月 - 2002年06月 日本電気株式会社

学歴

  • 2000年02月 -   東京工業大学   大学院理工学研究科   博士(理学)
  • 1986年04月 - 1988年03月   東京工業大学   大学院理工学研究科   情報科学専攻
  • 1982年04月 - 1986年03月   東京工業大学   理学部   情報科学科

所属学協会

  • 人工知能学会   電子情報通信学会   情報処理学会   

研究活動情報

論文

  • Yuya Sugie, Yuki Yoshida, Normann Mertig, Takashi Takemoto, Hiroshi Teramoto, Atsuyoshi Nakamura, Ichigaku Takigawa, Shin-ichi Minato, Masanao Yamaoka, Tamiki Komatsuzaki
    Soft Computing 25 3 1731 - 1749 2021年02月 [査読有り]
  • Koji Tabata, Atsuyoshi Nakumura, Tamiki Komatsuzaki
    PAKDD 2021 Workshop on Machine Learning for MEasurement INformatics (MLMEIN 2021) 57 - 69 2021年 [査読有り]
  • Mitsuki Maekawa, Atsuyoshi Nakamura, Mineichi Kudo
    ACML 2020 241 - 256 2020年11月 [査読有り]
  • Atsuyoshi Nakamura, Ichigaku Takigawa, Hiroshi Mamitsuka
    Proceedings of the AAAI Conference on Artificial Intelligence 34 04 5240 - 5247 2020年04月03日 [査読有り][通常論文]
     
    We propose new frequent substring pattern mining which can enumerate all substrings with statistically significant frequencies of their locally optimal occurrences from a given single sequence. Our target application is genome sequences, around a half being said to be covered by interspersed and consecutive (tandem) repeats, and detecting these repeats is an important task in molecular life sciences. We evaluate the statistical significance of frequent substrings by using a string generation model with a memoryless stationary information source. We combine this idea with an existing algorithm, ESFLOO-0G.C (Nakamura et al. 2016), to enumerate all statistically significant substrings with locally optimal occurrences. We further develop a parallelized version of our algorithm. Experimental results using synthetic datasets showed the proposed algorithm achieved far higher F-measure in extracting substrings (with various lengths and frequencies) embedded in a randomly generated string with noise, than conventional algorithms. The large-scale experiment using the whole human genome sequence with 3,095,677,412 bases (letters) showed that our parallel algorithm covers 75% of the whole positions analyzed, around 4% and 24% higher than the recent report and the current cutting-edge knowledge, implying a biologically unique finding.
  • Atsuyoshi Nakamura, Kento Sakurada
    Machine Learning and Knowledge Discovery in Databases - European Conference, ECMLPKDD2019 578 - 589 2020年 [査読有り]
  • Taiga Ikeda, Kento Sakurada, Atsuyoshi Nakamura, Masato Motomura, Shinya Takamaeda-Yamazaki
    345 - 357 2020年 [査読有り][通常論文]
  • Tabata, Koji, Nakamura, Atsuyoshi, Honda, Junya, Komatsuzaki, Tamiki
    MACHINE LEARNING 109 2 2019年10月 [査読有り][通常論文]
     
    We study a bad arm existence checking problem in a stochastic K-armed bandit setting, in which a player's task is to judge whether a positive arm exists or all the arms are negative among given K arms by drawing as small number of arms as possible. Here, an arm is positive if its expected loss suffered by drawing the arm is at least a given threshold theta(U), and it is negative if that is less than another given threshold theta(L) (<= theta(U)). This problem is a formalization of diagnosis of disease or machine failure. An interesting structure of this problem is the asymmetry of positive and negative arms' roles; finding one positive arm is enough to judge positive existence while all the arms must be discriminated as negative to judge whole negativity. In the case with Delta = theta(U) - theta(L) > 0, we propose elimination algorithms with arm selection policy (policy to determine the next arm to draw) and decision condition (condition to conclude positive arm's existence or the drawn arm's negativity) utilizing this asymmetric problem structure and prove its effectiveness theoretically and empirically.
  • Atsuyoshi Nakamura, David P. Helmbold, Manfred K. Warmuth
    Inf. Comput. 269 2019年 [査読有り][通常論文]
  • Hideaki Kano, Junya Honda, Kentaro Sakamaki, Kentaro Matsuura, Atsuyoshi Nakamura, Masashi Sugiyama
    Mach. Learn. 108 5 721 - 745 2019年 [査読有り][通常論文]
  • Aurélien Pélissier, Atsuyoshi Nakamura, Koji Tabata
    Proceedings of the 2019 SIAM International Conference on Data Mining, SDM 2019, Calgary, Alberta, Canada, May 2-4, 2019. 450 - 458 SIAM 2019年 [査読有り][通常論文]
  • Ryo Watanabe, Junpei Komiyama, Atsuyoshi Nakamura, Mineichi Kudo
    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E101A 3 662 - 667 2018年03月01日 [査読有り][通常論文]
     
    We propose a policy UCB-SC for budgeted multi-armed bandits. The policy is a variant of recently proposed KL-UCB-SC. Unlike KL-UCB-SC, which is computationally prohibitive, UCB-SC runs very fast while keeping KL-UCB-SC's asymptotical optimality when reward and cost distributions are Bernoulli with means around 0.5, which are verified both theoretically and empirically.
  • Takashi Takemoto, Normann Mertig, Masato Hayashi, Saki Susa-Tanaka, Hiroshi Teramoto, Atsuyoshi Nakamura, Ichigaku Takigawa, Shin-ichi Minato, Tamiki Komatsuzaki, Masanao Yamaoka
    2018 International Conference on ReConFigurable Computing and FPGAs, ReConFig 2018, Cancun, Mexico, December 3-5, 2018 1 - 8 IEEE 2018年 [査読有り][通常論文]
  • Yuya Sugie, Array,Array,Array, Hiroshi Teramoto, Array,Array, Shin-ichi Minato, Masanao Yamaoka,Array
    Theory and Practice of Natural Computing - 7th International Conference, TPNC 2018, Dublin, Ireland, December 12-14, 2018, Proceedings 111 - 123 Springer 2018年 [査読有り][通常論文]
  • Ryo Watanabe, Junpei Komiyama, Atsuyoshi Nakamura, Mineichi Kudo
    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E100A 11 2470 - 2486 2017年11月01日 [査読有り][通常論文]
     
    We study the budgeted multi-armed bandit problem with stochastic action costs. In this problem, a player not only receives a reward but also pays a cost for an action of his/her choice. The goal of the player is to maximize the cumulative reward he/she receives before the total cost exceeds the budget. In the classical multi-armed bandit problem, a policy called KL-UCB is known to perform well. We propose KL-UCB-SC, an extension of this policy for the budgeted bandit problem. We prove that KL-UCB-SC is asymptotically optimal for the case of Bernoulli costs and rewards. To the best of our knowledge, this is the first result that shows asymptotic optimality in the study of the budgeted bandit problem. In fact, our regret upper bound is at least four times better than that of BTS, the best known upper bound for the budgeted bandit problem. Moreover, an empirical simulation we conducted shows that the performance of a tuned variant of KL-UCB-SC is comparable to that of state-of-the-art policies such as PD-BwK and BTS.
  • Koji Tabata, Atsuyoshi Nakamura, Mineichi Kudo
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E100D 5 994 - 1002 2017年05月 [査読有り][通常論文]
     
    We propose a heuristic approximation algorithm for the 1-median problem. The 1-median problem is the problem of finding a vertex with the highest closeness centrality. Starting from a randomly selected vertex, our algorithm repeats to find a vertex with higher closeness centrality by approximately calculating closeness centrality of each vertex using simpler spanning subgraphs, which are called k-neighbor dense shortest path graphs with shortcuts. According to our experimental results using real networks with more than 10,000 vertices, our algorithm is more than 100 times faster than the exhaustive search and more than 20 times faster than the state-of-the-art approximation algorithm using annotated information to the vertices while the solutions output by our algorithm have higher approximation ratio.
  • Atsuyoshi Nakamura, Ichigaku Takigawa, Hisashi Tosaka, Mineichi Kudo, Hiroshi Mamitsuka
    DISCRETE APPLIED MATHEMATICS 200 123 - 152 2016年02月 [査読有り][通常論文]
     
    We consider a frequent approximate pattern mining problem, in which interspersed repetitive regions are extracted from a given string. That is, we enumerate substrings that frequently match substrings of a given string locally and optimally. For this problem, we propose a new algorithm, in which candidate patterns are generated without duplication using the suffix tree of a given string. We further define a k-gap-constrained setting, in which the number of gaps in the alignment between a pattern and an occurrence is limited to at most k. Under this setting, we present memory-efficient algorithms, particularly a candidate-based version, which runs fast enough even over human chromosome sequences with, more than 10 million nucleotides. We note that our problem and algorithms for strings can be directly extended to ordered labeled trees. In our experiments we used both randomly synthesized strings, in which corrupted similar substrings are embedded, and real data of human chromosome. The synthetic data experiments show that our proposed approach extracted embedded patterns correctly and time-efficiently. In real data experiments, we examined the centers of 100 clusters computed after grouping the patterns obtained by our k-gap-constrained versions (k = 0, 1 and 2) and the results revealed that the regions of their occurrences coincided with around a half of the regions automatically annotated as Alu sequences by a manually curated repeat sequence database. (C) 2015 Elsevier B.V. All rights reserved.
  • Shunsuke Suzuki, Mineichi Kudo, Atsuyoshi Nakamura
    2016 IEEE INTERNATIONAL CONFERENCE ON IDENTITY, SECURITY AND BEHAVIOR ANALYSIS (ISBA) 1 - 6 2016年 [査読有り][通常論文]
     
    It is well known that taking wrong sitting posture all day long is harmful for health. However, quantifying the degree of collapse of posture is not so easy. Typically, checking the video of sitting state or examining the spinal curves in radiograph is made so far, but it needs a clinical specialist for diagnosis. In this paper, for attaining this goal more easily, we give a device usable in daily life. We measure a time series of the pressure distributions obtained from a mat with 16x16 pressure sensors put on a chair. More concretely we analyze the sitting person's move direction and the amount, and make a daily diagnosis report aiming at helping his/her own notice. Through the experiments using two subjects who spent around three hours at the chair, a possibility of automatic and daily-basis diagnosis of sitting posture is demonstrated.
  • Atsuyoshi Nakamura, David P. Helmbold, Manfred K. Warmuth
    LANGUAGE AND AUTOMATA THEORY AND APPLICATIONS, LATA 2016 9618 412 - 423 2016年 [査読有り][通常論文]
     
    We study the loss version of adversarial multi-armed bandit problems with one lossless arm. We show an adversary's strategy that forces any player to suffer K - 1 - O(1/T) loss where K is the number of arms and T is the number of rounds.
  • Ryo Watanabe, Atsuyoshi Nakamura, Mineichi Kudo
    OPERATIONS RESEARCH LETTERS 43 6 558 - 563 2015年11月 [査読有り][通常論文]
     
    We improved an upper bound on the expected regret of a UCB-type policy LLR for a bandit problem that repeats the following rounds: a player selects a maximal matching on a complete bipartite graph K-M,K-N and receives a reward for each component edge of the selected matching. Rewards are assumed to be generated independently of its previous rewards according to an unknown fixed distribution. Our upper bound is smaller than the best known result (Chen et al., 2013) by a factor of Theta (M-2/3). (C) 2015 Elsevier B.V. All rights reserved.
  • Koji Tabata, Atsuyoshi Nakamura, Mineichi Kudo
    DISCOVERY SCIENCE, DS 2015 9356 275 - 283 2015年 [査読有り][通常論文]
     
    We developed an algorithm to exactly solve an influence maximization problem (MaxInf) for a two-terminal series parallel graph (TTSPG) in the independent cascade model. The class of TTSPGs can be considered as a class wider than that of trees, only for which an efficient exact solver of this problem has been developed so far. Our algorithm calculates candidate node sets in the divide-and-conquer manner keeping the number of them as small as possible by efficiently eliminating unnecessary ones in merge of subproblems' solutions. Furthermore, we propose a way of converting an arbitrary network to a TTSPG with edges important for propagation to apply our method to real networks. According to our empirical results, our method is significantly faster than the greedy approximation algorithm for MaxInf of a TTSPG. We also demonstrate improvement of solutions by converting to TTSPGs instead of trees using real networks made from DBLP datasets.
  • Ayako Mikami, Mineichi Kudo, Atsuyoshi Nakamura
    MULTIPLE CLASSIFIER SYSTEMS (MCS 2015) 9132 27 - 37 2015年 [査読有り][通常論文]
     
    Ensemble learning is a strong tool to strengthen weak classifiers. A large amount of diversity among those weak classifiers is a key to accelerate the effectiveness. Therefore, many diversity measures on a given training sample set have been proposed so far. However, they are almost all based on the oracle output that is one if the class predicted by the classifier is correct, zero otherwise. We point out such an oracle output scheme is not appropriate for the problems of more than two classes, and extend one of the most popular diversity measures, disagreement measure, to multi-class cases. On the other hand, the concept of margin has been recognized as an analytic tool to measure the generalization performance of a given classifier. Therefore, we analyze when some criteria for maximizing margins of an ensemble classifier over training samples are maximized under the assumption that the average accuracy of the base classifiers is constant. We also reveal the relationship between those criteria and the extended disagreement measure. As a result, it turns out that diversity is necessary not only over samples but also over predicted classes, if we want to extract the highest potential of ensemble.
  • Hiroyuki Hanada, Mineichi Kudo, Atsuyoshi Nakamura
    THEORETICAL COMPUTER SCIENCE 530 23 - 41 2014年04月 [査読有り][通常論文]
     
    In this paper we consider the problem of similar substring searching in the q-gram distance. The q-gram distance d(q)(x, y) is a similarity measure between two strings x and y defined by the number of different q-grams between them. The distance can be used instead of the edit distance due to its lower computation cost, O(|x| + |Y|) vs. O(|x||Y|). and its good approximation for the edit distance. However, if this distance is applied to the problem of finding all similar strings, in a long text t, to a given pattern p, the total computation cost is sometimes not acceptable. Ukkonen already proposed two fast algorithms: one with an array and the other with a tree. When "similar" means k or less in dq, their time complexities are O(|t|k + |P|) and O(|t| log k + |p|). respectively. In this paper, we propose two algorithms of average-case complexity O(|t| + |p|). although their worst-case complexities are still O(|t|k + |P|) and O(|t| log k + |p|). respectively. The linearity of the average-case complexity is analyzed under the assumption of random sampling of t and the condition that q is larger than a threshold. The algorithms exploit the fact that similar substrings in t are often found at very close positions if the beginning positions of the substrings are close. In the second proposed algorithm, we adopted a doubly-linked list supported by an array and a search tree to search for a list element in O(log k) time. Experimental results support their theoretical average-case complexities. (C) 2014 Elsevier B.V. All rights reserved.
  • Koji Ouchi, Atsuyoshi Nakamura, Mineichi Kudo
    PATTERN RECOGNITION 47 3 1459 - 1468 2014年03月 [査読有り][通常論文]
     
    We develop efficient construction methods of a rectangle greedy cover (RGC), and evaluate its usefulness in applications. An RGC is a greedy cover of the set of given positive instances by exclusive axis-parallel hyperrectangles, namely, axis-parallel hyperrectangles that exclude all the given negative instances. An RGC is expected to be a compact classification rule with high readability because the number of its component rectangles is expected to be small and it can be seen as a disjunctive normal form, which is one of the most readable representations for us. We propose two approaches of RGC construction: enumeration approach and direct approach. In enumeration approach, the maximal exclusive positive subsets (MEPSs) are enumerated first and then an ordinary greedy set covering is done using the enumerated MEPSs. We make clear the relation between enumeration of the maximal frequent itemsets and enumeration of the MEPSs, and convert an efficient enumeration algorithm LCMmax [1] of maximal frequent itemsets to an enumeration algorithm LCMmax.R-naive of MEPSs. We also develop a more efficient version of LCMmax.R-naive, or LCMmax.R, by incorporating effective dynamic reordering of instances using excluded frequency and bit-parallel exclusiveness check. In direct approach, each component MEPS of an RGC is searched not from enumerated MEPSs but directly from the dataset that consists of the remaining uncovered positive instances and the whole negative instances. We developed an algorithm called MRF that efficiently finds an maximum-sized MEPS for given positive and negative instances. MRF is made from LCMmax.R by modifying it so as to find a maximum-sized MEPS only. An RGC is constructed by MRF repetition, that is, by repeatedly executing MRF using the remaining uncovered positive instances. According to our experimental evaluation using UCI-repository datasets, LCMmax.R was about 5-11 times faster than LCMmax.Rnaive, which indicates effectiveness of the introduced two improvements. MRF repetition, however, was significantly faster than LCMmax.R, and it was fast enough for practical usage. The experimental results using UCI-repository datasets also showed that accuracy of a nearest rectangle classifier using an RGC is close to that using the hyperrectangles output by the randomized subclass method (RSM) [2] though the number of component rectangles of an RGC is significantly smaller than the number of the hyperrectangles output by RSM. The performance of RGC was also shown to be comparable to that of the six popular classifiers including logistic regression and support vector machine. The disjunctive normal form representation of the classification rules obtained by RGC was demonstrated to be simpler and more readable for us than that obtained by RSM and C4.5. (C) 2013 Elsevier Ltd. All rights reserved.
  • Atsuyoshi Nakamura
    Proceedings of the Sixth Asian Conference on Machine Learning, ACML 2014, Nha Trang City, Vietnam, November 26-28, 2014. JMLR.org 2014年 [査読有り][通常論文]
  • Atsuyoshi Nakamura, Tomoya Saito, Ichigaku Takigawa, Mineichi Kudo, Hiroshi Mamitsuka
    Discrete Applied Mathematics 161 10-11 1556 - 1575 2013年07月 [査読有り][通常論文]
     
    A string with many repetitions can be represented compactly by replacing h-fold contiguous repetitions of a string r with (r)h. We present a compact representation, which we call a repetition representation (of a string) or RRS, by which a set of disjoint or nested tandem arrays can be compacted. In this paper, we study the problem of finding a minimum RRS or MRRS, where the size of an RRS is defined by the sum of the length of component letters and the description length of the component repetitions (ṡ)h which is defined by ER (h) using a repetition weight function wR. We develop two dynamic programming-based algorithms to solve this problem: CMR, which works for any type of wR, and CMR-C, which is faster but can be applied to a constant wR only. CMR-C is an O(n2logn)-time O(nlogn)-space algorithm, which is more efficient in both time and space than CMR by a ((logn)/n)-factor, where n is the length of the given string. The problem of finding an MRRS for a string can be extended to that of finding a minimum repetition representation (of a tree) or MRRT for a given labeled ordered tree. For this problem, we present two algorithms, CMRT and CMRT-C, by using CMR and CMR-C, respectively, as a subroutine. As well as the theoretical analysis, we confirmed the efficiency of the proposed algorithms by experiments, which consist of the following three parts: First we demonstrated that CMR-C and CMRT-C are fast enough for large-scale data by using synthetic strings and trees, respectively. The size of an MRRS for a given string can be a measure of how compactly the string can be represented, meaning how well the string is structurally organized. This is also true of trees. To check such ability of MRRS-size, second we measured the size of an MRRS for chromosomes of nine different species. We found that all the chromosomes of the same species have a similar compression rate when realized by an MRRS. Run length encoding (RLE) was also shown to have species-specific compression rate, but species were separated more clearly by MRRS than by RLE. Third we examined the size of an MRRT for web pages of world-leading companies by using the tag trees, showing a consistency between the compression rate by an MRRT and visual web page structures. © © 2013 Elsevier B.V. All rights reserved.
  • Koji Tabata, Atsuyoshi Nakamura, Mineichi Kudo
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7569 169 - 183 2012年 [査読有り][通常論文]
     
    We present a fast approximation algorithm for the 1-median problem. Our algorithm can be applied to metric undirected graphs with node weight. Given a node v, our algorithm repeatedly finds a better node by making use of a shortest path tree of the previous node. We empirically show that our algorithm runs much faster and has better approximation ratio than a sophisticated existing method called DTZ. We demonstrate the effectiveness of our algorithm through experiments. © 2012 Springer-Verlag Berlin Heidelberg.
  • Koji Tabata, Atsuyoshi Nakamura, Mineichi Kudo
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7569 169 - 183 2012年 [査読有り][通常論文]
     
    We present a fast approximation algorithm for the 1-median problem. Our algorithm can be applied to metric undirected graphs with node weight. Given a node v, our algorithm repeatedly finds a better node by making use of a shortest path tree of the previous node. We empirically show that our algorithm runs much faster and has better approximation ratio than a sophisticated existing method called DTZ. We demonstrate the effectiveness of our algorithm through experiments. © 2012 Springer-Verlag Berlin Heidelberg.
  • Atsuyoshi Nakamura, Hisashi Tosaka, Mineichi Kudo
    Proceedings of the 2012 IIAI International Conference on Advanced Applied Informatics, IIAIAAI 2012 54 - 59 2012年 [査読有り][通常論文]
     
    We propose a novel frequent approximate pattern mining in which occurrences themselves are valuable regions to extract. Given a string , our mining task is to enumerate its substrings that locally optimally match many substrings of . We show an algorithm for this problem whose time and space complexities are (3) for a string with length . According to our experiments using synthetic data, our algorithm can correctly extract embedded frequent patterns as both patterns and their occurrences even if the embedded patterns are corrupted by random edit operations. © 2012 IEEE.
  • Hiroyuki Hanada, Mineichi Kudo, Atsuyoshi Nakamura
    2012 6TH INTERNATIONAL CONFERENCE ON NEW TRENDS IN INFORMATION SCIENCE, SERVICE SCIENCE AND DATA MINING (ISSDM2012) 544 - 549 2012年 [査読有り][通常論文]
     
    The q-gram distance d(q)(x, y) between two strings x and y is a string similarity measure correlated with a famous string distance: the edit distance. In addition, it can be computed much faster, in linear (O(vertical bar x vertical bar + vertical bar y vertical bar) time, than the edit distance in quadratic (O(vertical bar x parallel to y vertical bar) time, where I. I denotes the stmg length. However, it does not mean that we can find all substrmgs of a text t similar to a pattern p in linear time. In this paper we will propose a searching algorithm achieving quasi-linear (O(vertical bar t vertical bar log vertical bar p vertical bar + vertical bar p vertical bar) time by the q-gram distance.
  • Tetsuji Takahashi, Mineichi Kudo, Atsuyoshi Nakamura
    PATTERN RECOGNITION LETTERS 32 16 2224 - 2230 2011年12月 [査読有り][通常論文]
     
    We propose an algorithm to approximate each class region by a small number of approximated convex hulls and to use these for classification. The classifier is one of non-kernel maximum margin classifiers. It keeps the maximum margin in the original feature space, unlike support vector machines with a kernel. The construction of an exact convex hull requires an exponential time in dimension, so we find an approximate convex hull (a polyhedron) instead, which is constructed in linear time in dimension. We also propose a model selection procedure to control the number of faces of convex hulls for avoiding over-fitting, in which a fast procedure is adopted to calculate an upper-bound of the leave-one-out error. In comparison with support vector machines, the proposed approach is shown to be comparable in performance but more natural in the extension to multi-class problems. (C) 2011 Elsevier B.V. All rights reserved.
  • Takashi Saso, Kojiro Kobayashi, Atsuyoshi Nakamura
    INFORMATION PROCESSING LETTERS 111 12 595 - 599 2011年06月 [査読有り][通常論文]
     
    Block sorting in the Burrows-Wheeler transformation is to sort all of the n circular shifts of a string of length n lexicographically. We introduce a notion called the width of a sequence of n strings of length n and show that the values of widths are very different between the two types of sequences of strings; (1) a sequence of n randomly generated strings of length n, and (2) the sequence of n circular shifts of a randomly generated string of length n. (C) 2011 Elsevier B.V. All rights reserved.
  • Atsuyoshi Nakamura, Mineichi Kudo
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT II 6635 234 - 245 2011年 [査読有り][通常論文]
     
    We study an alignment called a packing alignment that is an alignment for sequences of various length events like musical notes. One event in a packing alignment can have a number of consecutive opposing events unless the total length of them exceeds the length of that one event. Instead of using a score function that depends on event length, which was studied by Mongeau and Sankoff [5], packing alignment deals with event lengths explicitly using a simple score function. This makes the problem clearer as an optimization problem. Packing alignment can be calculated efficiently using dynamic programming. As an application of packing alignment, we conducted experiments on frequent approximate pattern extraction from MIDI files of famous musical variations. The patterns and occurrences extracted from the variations using packing alignment have more appropriate boundaries than those using conventional string alignments from the viewpoints of the repetition structure of the variations.
  • Koji Ouchi, Atsuyoshi Nakamura, Mineichi Kudo
    Proceedings - 2011 IEEE International Conference on Granular Computing, GrC 2011 533 - 538 2011年 [査読有り][通常論文]
     
    We discuss efficient construction and usefulness of greedy covers of positive instances by axis-parallel rectangles that exclude negative instances. A rectangle greedy cover is expected to be a simple classification rule with high readability because the number of its component rectangles is expected to be small and it can be seen as a disjunctive normal form, which is one of the most readable representations. We develop efficient construction methods of rectangle greedy covers by making use of algorithms for efficient maximal frequent itemsets. We empirically demonstrate that, for high dimensional datasets, the method of finding a component rectangle one by one without enumerating candidate covering component sets is faster than the method with independent greedy covering process after the enumeration. In our classification experiments using 10 datasets in UCI repository, rectangle greedy covers (RGC) were shown to have classification performance comparable to the randomized subclass method (RSM) developed in [1], which is a conventional classification method using rectangles, though RGC used significantly smaller number of rectangles. The performance of RGC was also shown to be comparable to that of popular classifiers such as logistic regression and SVM. The DNF-representation of the classification rules obtained by RGC was demonstrated to be simpler than that obtained by RSM and C4.5 through our experiment. © 2011 IEEE.
  • Hiroyuki Hanada, Atsuyoshi Nakamura, Mineichi Kudo
    Proceedings - 2011 IEEE International Conference on Granular Computing, GrC 2011 231 - 236 2011年 [査読有り][通常論文]
     
    The edit distance is a basic string similarity measure for many applications such as string searching, text mining, signal processing, bioinformatics and so on. However, its high computational cost often prevents it from being used for a large set of strings like similar string searching. A promising solution for the problem is to approximate the edit distance with low computational cost. However, although there are many methods for approximating the edit distance, most of them are analyzed only theoretically. In fact, most of the methods can evaluate the edit distance only in terms of order notations, and do not conduct any experiment. This is a large obstacle for applying them to real applications. In this study we will first list up existing edit distance approxi-mation methods. Then we compare them by: (i) approximation performances shown by the original authors, (ii) approximation performances re-analyzed by us (concrete values instead of the order notations) and (iii) experimental performances by our implementations. © 2011 IEEE.
  • Masashi Sugiyama, Tomoyuki Higuchi, Tsuyoshi Ide, Akihiro Inokuchi, Toshihiro Kamishima, Hiroyuki Minami, Shinichi Nakajima, Atsuyoshi Nakamura, Koichi Shinoda, Koji Tsuda, Takashi Washio
    IEICE Transactions on Information and Systems E93-D 10 2671  2010年10月 [査読有り][通常論文]
  • Atsuyoshi Nakamura, Tomoya Saito, Ichigaku Takigawa, Hiroshi Mamitsuka, Mineichi Kudo
    STRING PROCESSING AND INFORMATION RETRIEVAL 6393 185 - + 2010年 [査読有り][通常論文]
     
    A string with many repetitions can be written compactly by replacing h-fold contiguous repetitions of substring r with (r)h. We refer to such a compact representation as a repetition representation string or RRS, by which a set of disjoint or nested tandem arrays can be compacted. In this paper, we study the problem of finding a minimum RRS or MRRS, where the size of an RRS is defined to be the sum of its component letter sizes and the sizes needed to describe the repetitions (.)h which are defined as w(R)(h) using a repetition weight function w(R). We develop two dynamic programming algorithms to solve the problem. One is CMR that works for any repetition weight function, and the other is CMR-C that is faster but can be applied only when the repetition weight function is constant. CMR-C is an O(w(n + z))-time algorithm using O(n + z) space for a given string with length n, where w and z are the number of distinct primitive tandem repeats and the number of their occurrences, respectively. Since w = O(n) and z = O(n log n) in the worst case, CMR-C is an O(n(2) log n)-time O(n log n)-space algorithm, which is faster than OMR by ((log n)/n)-factor.
  • Taishi Uchiya, Atsuyoshi Nakamura, Mineichi Kudo
    ALGORITHMIC LEARNING THEORY, ALT 2010 6331 375 - 389 2010年 [査読有り][通常論文]
     
    Adversarial bandit problems studied by Auer et al. [4] are multi-armed bandit problems in which no stochastic assumption is made on the nature of the process generating the rewards for actions. In this paper, we extend their theories to the case where k(>= 1) distinct actions are selected at each time step. As algorithms to solve our problem, we analyze an extension of Exp3 [4] and an application of a bandit online linear optimization algorithm [1] in addition to two existing algorithms (Exp3, ComBand [5]) in terms of time and space efficiency and the regret for the best fixed action set. The extension of Exp3, called Exp3. M, performs best with respect to all the measures: it runs in O(K (log k + 1) time and O(K) space, and suffers at most O(root kTK log (K/k)) regret, where K is the number of possible actions and T is the number of iterations. The upper bound of the regret we proved for Exp3. M is an extension of that proved by Auer et al. for Exp3.
  • Ichigaku Takigawa, Mineichi Kudo, Atsuyoshi Nakamura
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE 22 1 101 - 108 2009年02月 [査読有り][通常論文]
     
    We propose a general framework for nonparametric classification of multi-dimensional numerical patterns. Given training points for each class, it builds a set cover with convex sets each of which contains some training points of the class but no points of the other classes. Each convex set has thus an associated class label, and classification of a query point is made to the class of the convex set such that the projection of the query point onto its boundary is minimal. In this sense, the convex sets of a class are regarded as "prototypes" for that class. We then apply this framework to two special types of convex sets, minimum enclosing balls and convex hulls, giving algorithms for constructing a set cover with them and for computing the projection length onto their boundaries. For convex hulls, we also give a method for implicitly evaluating whether a point is contained in a convex hull, which can avoid computational difficulty for explicit construction of convex hulls in high-dimensional space. (C) 2008 Elsevier Ltd. All rights reserved.
  • Satoshi Shirai, Mineichi Kudo, Atsuyoshi Nakamura
    MULTIPLE CLASSIFIER SYSTEMS, PROCEEDINGS 5519 22 - 31 2009年 [査読有り][通常論文]
     
    We compared boosting with bagging in different strengths of learning algorithms for improving the performance of the set of classifiers to be fused. Our experimental results showed that boosting worked much with weak algorithms and bagging, especially feature-based bagging, worked much with strong algorithms. On the basis of these observations we developed a mixed fusion method in which randomly chosen features are used with a standard boosting method. As a result, it was confirmed that the proposed fusion method worked well regardless of learning algorithms.
  • Tetsuji Takahashi, Mineichi Kudo, Atsuyoshi Nakamura
    PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, PROCEEDINGS 5856 441 - 448 2009年 [査読有り][通常論文]
     
    We consider an algorithm to approximate each class region by a small number of convex hulls and to apply them to classification. The convex hull of a finite set of points is computationally hard to be constructed in high dimensionality. Therefore, instead of the exact convex hull, we find an approximate convex hull (a polyhedron) in a time complexity that is linear in dimension. On the other hand, the set of such convex hulls is often too much complicated for classification. Thus we control the complexity by adjusting the number of faces of convex hulls. For reducing the computational time, we use an upper bound of the leave-one-out estimated error to evaluate the classifiers.
  • Satoshi Shirai, Mineichi Kudo, Atsuyoshi Nakamura
    STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION 5342 801 - 810 2008年 [査読有り][通常論文]
     
    In recent years, many approaches for achieving high performance by combining some classifiers have been proposed. We exploit, many random replicates of samples in the bagging, and randomly chosen feature subsets in the random subspace method. In this paper, we introduce a method for selecting both samples and features at, the same, time and demonstrate the effectiveness of the method. This method includes a parametric bagging and a parametric random subspace method as special cases. lit some experiments, this method and the parametric random subspace method showed the best performance.
  • Yohji Shidara, Mineichi Kudo, Atsuyoshi Nakamura
    19TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1-6 571 - 574 2008年 [査読有り][通常論文]
     
    Associative classifiers that utilize association rules have been widely studied. It has been shown that associative classifiers often outperform traditional classifiers. Associative classifiers usually find only rules with high support values, because reducing the minimum support to be satisfied increases computational cost. However, rules with low support but high confidence may contribute to classification. We have proposed an approach to build a classifier composed of almost all consistent (100% confident) rules. The proposed classifier was extended by introducing item reduction and bagging in order to relax the constraint of consistency, which resulted in slightly increased performance for 26 datasets from the UCI machine learning repository.
  • Mineichi Kudo, Atsuyoshi Nakamura, Ichigaku Takigawa
    19TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1-6 2260 - 2263 2008年 [査読有り][通常論文]
     
    A set of convex bodies including samples of a single class only is used for classification. The convex body is defined by some facets (hyper-planes) that separate the class front the other classes. This paper describes ail algorithm to find a set of such convex bodies efficiently and examine the performance of a classifier using them. The relationship to the support vector machines is also discussed.
  • Atsuyoshi Nakamura, Mineichi Kudo
    ICDM 2008: EIGHTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS 482 - 491 2008年 [査読有り][通常論文]
     
    We study the problem of enumerating concepts in a Sperner family concept class using subconcept queries [7], which is a general problem including maximal frequent itemset mining as its instance. Though even the theoretically best known algorithm needs quasi-polynomial time to solve this problem in the worst case, there exist practically fast algorithms for this problem. This is because many instances of this problem in real world have low complexity in some measures. In this paper, we characterize the complexity of Sperner family concept class by the VC dimension of its intersection closure and its characteristic dimension, and analyze the worst case time complexity on the enumeration problem of its concepts in terms of the VC dimension. We also showed that the VC dimension of real data used in data mining is actually small by calculating the VC dimension of some real datasets using a new algorithm closely related to the introduced two measures, which does not only solve the problem but also let us know the VC dimension of the intersection closure of the target concept class.
  • Yohji Shidara, Mineichi Kudo, Atsuyoshi Nakamura
    Trans. MLDM 1 1 17 - 30 2008年 [査読有り][通常論文]
  • Yohji Shidara, Atsuyoshi Nakamura, Mineichi Kudo
    MACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION, PROCEEDINGS 4571 490 - + 2007年 [査読有り][通常論文]
     
    We propose a novel approach which extracts consistent (100% confident) rules and builds a classifier with them. Recently, associative classifiers which utilize association rules have been widely studied. Indeed, the associative classifiers often outperform the traditional classifiers. In this case, it is important to collect high quality (association) rules. Many algorithms find only high support rules, because decreasing the minimum support to be satisfied is computationally demanding. However, it may be effective to collect low support but high confidence rules. Therefore, we propose an algorithm that produces a wide variety of 100% confident rules including low support rules. To achieve this goal, we adopt a specific-to-general rule searching strategy, in contrast to the previous many approaches. Our experimental results show that the proposed method achieves higher accuracies in several datasets taken from UCI machine learning repository.
  • Hisashi Tosaka, Atsuyoshi Nakamura, Mineichi Kudo
    DISCOVERY SCIENCE, PROCEEDINGS 4755 286 - + 2007年 [査読有り][通常論文]
     
    We study a novel problem of mining subtrees with frequent occurrence of similar subtrees, and propose an algorithm for this problem. In our problem setting, frequency of a subtree is counted not only for equivalent subtrees but also for similar subtrees. According to our experiment using tag trees of web pages, this problem can be solved fast enough for practical use. An encouraging result was obtained in a preliminary experiment for data record extraction from web pages using our mining method.
  • M Kudo, A Nakamura
    FEDERATION OVER THE WEB 3847 79 - 96 2006年 [査読有り][通常論文]
     
    We introduce methods for two specific-purpose Web searches. One is a search for Web communities related to given keywords, and the other is a search for texts having a certain relation to given keywords. Our methods are based on both structure and contents of WWW. Our method of Web community search uses global structure of WWW to discover communities, and uses content information to label found communities, where global structure means Web graph composed of Web pages and hyperlinks between them. On the other hand, our method of related text search uses local structure of WWW to extract candidate texts, and uses content information to filter out wrongly extracted ones, where local structure means DOM-tree structure of each page. We report the latest results on these Web search methods.
  • Atsuyoshi Nakamura
    ALGORITHMIC LEARNING THEORY, PROCEEDINGS 4264 378 - 392 2006年 [査読有り][通常論文]
     
    In this paper, we study learning-related complexity of linear ranking functions from n-dimensional Euclidean space to {1, 2,..., k}. We show that their graph dimension, a kind of measure for PAC learning complexity in the multiclass classification setting, is Theta(n + k). This graph dimension is significantly smaller than the graph dimension Omega(nk) of the class of {1, 2,..., k}-valued decision-list functions naturally defined using k - 1 linear discrimination functions. We also show a risk bound of learning linear ranking functions in the ordinal regression setting by a technique similar to that used in the proof of an upper bound of their graph dimension.
  • A Nakamura, M Schmitt, N Schmitt, HU Simon
    JOURNAL OF MACHINE LEARNING RESEARCH 6 1383 - 1403 2005年09月 [査読有り][通常論文]
     
    Bayesian networks have become one of the major models used for statistical inference. We study the question whether the decisions computed by a Bayesian network can be represented within a low-dimensional inner product space. We focus on two-label classification tasks over the Boolean domain. As main results we establish upper and lower bounds on the dimension of the inner product space for Bayesian networks with an explicitly given (full or reduced) parameter collection. In particular, these bounds are tight up to a factor of 2. For some nontrivial cases of Bayesian networks we even determine the exact values of this dimension. We further consider logistic autoregressive Bayesian networks and show that every sufficiently expressive inner product space must have dimension at least Omega(n(2)), where n is the number of network nodes. We also derive the bound 2(Omega(n)) for an artificial variant of this network, thereby demonstrating the limits of our approach and raising an interesting open question. As a major technical contribution, this work reveals combinatorial and algebraic structures within Bayesian networks such that known methods for the derivation of lower bounds on the dimension of inner product spaces can be brought into play.
  • A Nakamura
    INFORMATION AND COMPUTATION 201 2 178 - 198 2005年09月 [査読有り][通常論文]
     
    In this paper, we propose a new algorithm that exactly learns ordered binary decision diagrams (OBDDs) with a given variable ordering via equivalence and membership queries. Our algorithm uses at most it equivalence queries and at most 2n([log(2) m] + 3n) membership queries. where it is the number of nodes in the target-reduced OBDD and m is the number of variables. The upper bound on the number of membership queries is smaller by a factor of O(m) compared with that for the previous best known algorithm proposed by R. Gavalda and D. Guijarro [Learning Ordered Binary Decision Diagrams, Proceedings of the 6th International Workshop on Algorithmic Learning Theory, 1995, pp. 228 238]. (c) 2005 Elsevier Inc. All rights reserved.
  • Y Shidara, M Kudo, A Nakamura
    Foundations of Data Mining and Knowledge Discovery 6 161 - 170 2005年 [査読有り][通常論文]
     
    We propose a novel method for mining generalized rules with high support and confidence. Using our method, we can obtain generalized rules in which the abstraction of attribute values is implicitly carried out without the requirement of additional information such as information on conceptual hierarchies. Our experimental results showed that the obtained rules not only have high support and confidence but also have expressions that are conceptually meaningful.
  • Hidehiko Ino, Mineichi Kudo, Atsuyoshi Nakamura
    Proceedings of the 14th international conference on World Wide Web, WWW 2005, Chiba, Japan, May 10-14, 2005 661 - 669 ACM 2005年 [査読有り][通常論文]
  • A Nakamura, M Kudo
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS 3518 850 - 860 2005年 [査読有り][通常論文]
     
    In this paper, we propose an efficient algorithm enumerating all frequent subtrees containing all special nodes that are guaranteed to be included in all trees belonging to a given data. Our algorithm is a modification of TreeMiner algorithm [10] so as to efficiently generate only candidate subtrees satisfying our constraints. We report mining results obtained by applying our algorithm to the problem of finding frequent structures containing the name and reputation of given restaurants in Web pages collected by a search engine.
  • Takigawa, I, M Kudo, A Nakamura
    MACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION, PROCEEDINDS 3587 90 - 99 2005年 [査読有り][通常論文]
     
    We propose a new nonparametric classification framework for numerical patterns, which can also be exploitable for exploratory data analysis. The key idea is approximating each class region by a family of convex geometric sets which can cover samples of the target class without containing any samples of other classes. According to this framework, we consider a combinatorial classifier based on a family of spheres, each of which is the minimum covering sphere for a subset of positive samples and does not contain any negative samples. We also present a polynomial-time exact algorithm and an incremental randomized algorithm to compute it. In addition, we discuss the soft-classification version and evaluate these algorithms by some numerical experiments.
  • H Hasegawa, M Kudo, A Nakamura
    KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 4, PROCEEDINGS 3684 668 - 674 2005年 [査読有り][通常論文]
     
    We consider the problem of extracting texts related to a given keyword from Web pages collected by a search engine. Recently, we proposed a method using both structural and content information[1, 2]. In our previous paper, we reported good extraction performance of our method only for Ramen-shop dataset written in Japanese. In this paper, we examine it for datasets of other kind of restaurants, and also for a dataset written in English. We discuss some modification for performance improvement.
  • Hidehiko Ino, Mineichi Kudo, Atsuyoshi Nakamura
    Proceedings - International Workshop on Biomedical Data Engineering, BMDE2005 2005 1257  2005年 [査読有り][通常論文]
     
    Recently, researches on extraction of densely connected subgraphs, which are called communities, from the graph representing link structure in WWW, are very popular. However, few methods guarantee that extracted subgraphs satisfy community conditions which are strictly defined. In this paper, we consider the problem of extracting subgraphs that strictly satisfy the community conditions proposed in [3]. It is known that finding all such communities is computationally hard. As methods that possibly find many communities efficiently, we experimentally compared two methods, a method with a Gomory-Hu tree construction and a method with calculating edge-betweenness. We also proposed evaluation criterion for ranking found communities. © 2005 IEEE.
  • Atsuyoshi Nakamura, Naoki Abe
    Electronic Commerce Research 5 1 75 - 98 2005年01月 [査読有り][通常論文]
     
    We propose and evaluate a number of improvements to the linear programming formulation of web advertisement scheduling, which we have proposed elsewhere together with our colleagues [Langheinrich et al., 9]. In particular, we address a couple of important technical challenges having to do with the estimation of click-through rates and optimization of display probabilities (the exploration-exploitation trade-off and the issue of data sparseness and sealability). as well as practical aspects that are essential for successful deployment of this approach (the issues of multi-impressions and inventory management). We propose solutions to each of these issues, and assess their effectiveness by running large-scale simulation experiments. © 2005 Springer Science + Business Media. Inc.
  • Takigawa, I, M Kudo, A Nakamura, J Toyama
    INDEPENDENT COMPONENT ANALYSIS AND BLIND SIGNAL SEPARATION 3195 193 - 200 2004年 [査読有り][通常論文]
     
    This paper studied the minimum l(1)-norm signal recovery in underdetermined source separation, which is a problem of separating n sources blindly from m linear mixtures for n. > m. Based on our previous result of submatrix representation and decision regions, we describe the property of the minimum l(1)-norm sequence from the viewpoint of source separation, and discuss how to construct it geometrically from the observed sequence and the mixing matrix, and the unstability for a perturbation of mixing matrix.
  • A Nakamura, M Schmitt, N Schmitt, HU Simon
    LEARNING THEORY, PROCEEDINGS 3120 518 - 533 2004年 [査読有り][通常論文]
     
    In connection with two-label classification tasks over the Boolean domain, we consider the possibility to combine the key advantages of Bayesian networks and of kernel-based learning systems. This leads us to the basic question whether the class of decision functions induced by a given Bayesian network can be represented within a low-dimensional inner product space. For Bayesian networks with an explicitly given (full or reduced) parameter collection, we show that the "natural" inner product space has the smallest possible dimension up to factor 2 (even up to an additive term 1 in many cases). For a slight modification of the so-called logistic autoregressive Bayesian network with n nodes, we show that every sufficiently expressive inner product space has dimension at least 2(n/4). The main technical contribution of our work consists in uncovering combinatorial and algebraic structures within Bayesian networks such that known techniques for proving lower bounds on the dimension of inner product spaces can be brought into play.
  • M Kudo, T Hosokawa, J Toyama, H Tenmoto, A Nakamura
    8TH WORLD MULTI-CONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL V, PROCEEDINGS 65 - 68 2004年 [査読有り][通常論文]
     
    In the past decades, person identification has been gathering a great deal of attention in a wide rage of applications. Among those techniques, many aim at achieving a high identification rate using several biometrics evidences such as fingerprint, iris, and finger vein. However these evidences require user's cooperation and are suitable for a "hard" identification in which any kind of impersonation is not allowed. In this study, we consider a "soft" identification, in which the system does require user-cooperation, at the expense of some degree of accuracy. On the contrary, the cost for gathering evidences becomes cheap and some weak evidences are used together The assumed applications are electronic devices on daily life. Through a soft identification, a device can be personalized to a user and the system provides some user-specific functions to the user In this study, a Bayesian network approach is investigated its potential for this goal. In addition, environmental evidence such as day and time are also incorporated to the system.
  • A Nakamura, M Kudo, A Tanaka
    KNOWLEDGE DISCOVERY IN DATABASES: PKDD 2003, PROCEEDINGS 2838 339 - 349 2003年 [査読有り][通常論文]
     
    We propose a new collaborative filtering method that uses restoration operators. The problem of restoration by operators was originally studied in the field of digital image restoration [9]. We also consider the problem of selecting items that users should be asked to rate in order to achieve a small expected squared error, and we propose a greedy method as a solution of this problem. According to our experimental results, prediction performance of restoration operators is good when the number of observed ratings is small, and our greedy method outperforms random query item selection.
  • A Nakamura, M Kudo, A Tanaka, K Tanabe
    DISCOVERY SCIENCE, PROCEEDINGS 2843 393 - 401 2003年 [査読有り][通常論文]
     
    We propose a modified version of our collaborative filtering method using restoration operators, which was proposed in [6]. Our previous method was designed so as to minimize expected squared error of predictions for user's ratings, and we experimentally showed that, for users who have evaluated only small number of items, mean squared error of our method is smaller than that of correlation-base methods. After further experiments, however, we found that, for users who have evaluated many items, the best correlation-based method has smaller mean squared error than our method. In our modified version, we incorporated an idea of projecting on a low-dimensional subspace with our method using restoration operators. We experimentally showed that our modification overcame the shortcoming stated above.
  • A Nakamura, N Abe
    JOURNAL OF COMPUTER AND SYSTEM SCIENCES 65 2 224 - 256 2002年09月 [査読有り][通常論文]
     
    We consider the online learning problem for binary relations defined over two finite sets, each clustered into a relatively small number k, I of types (such a relation is termed a (k, l)-binary relation), extending the models of S. Goldman, R. Rivest, and R. Schapire (1993, SIAM J. Comput. 22, 1006-1034). We investigate the learning complexity of (k, l)-binary relations with respect to both the self-directed and adversary-directed learning models. We also generalize this problem to the learning problem for (k(l),...,k(d))-d-ary relations. In the self-directed model, we exhibit an efficient learning algorithm which makes at most kl + (n - k)log k + (m - l)log l mistakes, where n and in are the number of rows and columns, roughly twice the lower bound we show for this problem, '[log k] [log 1] + (1)/(2)(n - k) [log k] + (1)/(2)(m - 1) [log l]. In the adversary-directed model, we exhibit an efficient algorithm for the (2, 2)-binary relations, which makes at most n + m + 2 mistakes, only two more than the lower bound we show for this problem, n + m. As for (k(l),..., k(d))-d-ary relations, we obtain lower bounds and upper bounds on the number of mistakes in the self-directed model, teacher-directed model, and adversary-directed model. Finally we show that, although the sample consistency problem for (2, 2)-binary relations is solvable in polynomial time, the same problem for (2,2,2)-ternary relations is already NP-complete. (C) 2002 Elsevier Science (USA).
  • Atsuyoshi Nakamura
    Proceedings of the 11th International Conference on World Wide Web, WWW '02 536 - 541 2002年 [査読有り][通常論文]
     
    We addressed two issues concerning the practical aspects of optimally scheduling web advertising proposed by Langheinrich et al. [5], which scheduling maximizes the total number of click-throughs for all banner advertisements. One is the problem of multi-impressions in which two or more banner ads are impressed at the same time. The other is inventory management, which is important in order to prevent over-selling and maximize revenue. We propose efficient methods which deal with these two issues.
  • A Nakamura
    THEORETICAL COMPUTER SCIENCE 241 1-2 83 - 114 2000年06月 [査読有り][通常論文]
     
    In this paper, we study exact learnability of bounded-width ordered binary decision diagrams (OBDDs) when no ordering of the variables is given and learning is conducted by way of equivalence queries and membership queries. We present a learning algorithm for width-2 OBDDs, an algorithm which uses O(n(3)) equivalence queries alone, where n is the number of variables. We also present a learning algorithm for width-2 OBDDs that uses O(n) proper equivalence queries and O(n(2)) membership queries. Further, we show a negative result: that there are no polynomial-time algorithms capable of learning width-3 OBDDs from proper equivalence queries alone. (C) 2000 Elsevier Science B.V. All rights reserved.
  • Atsuyoshi Nakamura, Naoki Abe, Hiroshi Matoba, Katsuhiro Ochiai
    Proceedings of the 8th ACM International Conference on Multimedia 2000, Los Angeles, CA, USA, October 30 - November 3, 2000. 57 - 66 ACM 2000年 [査読有り][通常論文]
  • M Langheinrich, A Nakamura, N Abe, T Kamba, Y Koseki
    COMPUTER NETWORKS-THE INTERNATIONAL JOURNAL OF COMPUTER AND TELECOMMUNICATIONS NETWORKING 31 11-16 1259 - 1272 1999年05月 [査読有り][通常論文]
     
    Most online advertisement systems in place today use the concept of consumer targeting: each user is identified and, according to his or her system setup, browsing habits and available off-line information, categorized in order to customize the advertisements for highest user responsiveness. This constant monitoring of a user's online habits, together with the trend to centralize this data and link it with other databases, continuously nurtures fears about the growing lack of privacy in a networked society. In this paper, we propose a novel technique of adapting online advertisement to a user's short term interests in a non-intrusive way. As a proof-of-concept we implemented a dynamic advertisement selection system able to deliver customized advertisements to users of an online search service or Web directory. No user-specific data elements are collected or stored at any time. Initial experiments indicate that the system is able to improve the average click-through rate substantially compared to random selection methods. (C) 1999 Published by Elsevier Science B.V. All rights reserved.
  • N Abe, A Nakamura
    MACHINE LEARNING, PROCEEDINGS 12 - 21 1999年 [査読有り][通常論文]
     
    We have developed a method which learns to schedule internet banner advertisements so as to maximize the average click-through rate, while adhering to the requirements imposed by contracts with the advertisers such as a minimum guaranteed number of impressions. We focus on the problem of adaptively scheduling advertisement display probabilities as a function of a single attribute such as a search keyword. Our learning algorithm is based on an efficient solution of a special class of linear programming problems called the 'transportation problem,' and also embodies a number of measures to address the exploration-exploitation trade-off and an efficient attribute clustering method to help reduce the dimensionality. Our experimental results verify the advantage of our linear programming based approach, as well as the effect of various additional measures we incorporate into our method.
  • Atsuyoshi Nakamura
    Proceedings of the Twelfth Annual Conference on Computational Learning Theory, COLT 1999, Santa Cruz, CA, USA, July 7-9, 1999 215 - 225 ACM 1999年 [査読有り][通常論文]
  • Atsuyoshi Nakamura, Naoki Abe
    Proceedings of the Fifteenth International Conference on Machine Learning (ICML 1998), Madison, Wisconsin, USA, July 24-27, 1998 395 - 403 Morgan Kaufmann 1998年 [査読有り][通常論文]
  • N Abe, H Mamitsuka, A Nakamura
    DISCOVERY SCIENCE 1532 387 - 388 1998年 [査読有り][通常論文]
  • A Nakamura, J Takeuchi, N Abe
    ANNALS OF MATHEMATICS AND ARTIFICIAL INTELLIGENCE 23 1-2 53 - 82 1998年 [査読有り][通常論文]
     
    We consider a variant of the 'population learning model' proposed by Kearns and Seung [8], in which the learner is required to be 'distribution-free' as well as computationally efficient. A population learner receives as input hypotheses from a large population of agents and produces as output its final hypothesis. Each agent is assumed to independently obtain labeled sample for the target concept and output a hypothesis. A polynomial time population learner is said to PAC-learn a concept class, if its hypothesis is probably approximately correct whenever the population size exceeds a certain bound which is polynomial, even if the sample size for each agent is fixed at some constant. We exhibit some general population learning strategies, and some simple concept classes that can be learned by them. These strategies include the 'supremum hypothesis finder', the 'minimum superset finder' (a special case of the 'supremum hypothesis finder'), and various voting schemes. When coupled with appropriate agent algorithms, these strategies can learn a variety of simple concept classes, such as the 'high-low game', conjunctions, axis-parallel rectangles and others. We give upper bounds on the required population size for each of these cases, and show that these systems can be used to obtain a speed up from the ordinary PAC-learning model [11], with appropriate choices of sample and population sizes. With the population learner restricted to be a voting scheme, what we have is effectively a model of 'population prediction', in which the learner is to predict the value of the target concept at an arbitrarily drawn point, as a threshold function of the predictions made by its agents on the same point. We show that the population learning model is strictly more powerful than the population prediction model. Finally, we consider a variant of this model with classification noise, and exhibit a population learner for the class of conjunctions in this model.
  • A Nakamura
    ALGORITHMIC LEARNING THEORY 1316 307 - 322 1997年 [査読有り][通常論文]
     
    In this paper, we propose a new algorithm which exactly learns ordered binary decision diagrams (OBDDs) with a given variable ordering via equivalence queries and membership queries. Our algorithm uses at most n equivalence queries and at most 2n([log(2) m] + 3n) membership queries, where n is the number of nodes in the target reduced OBDD and m is the number of variables. We have reduced the number of membership queries by a factor of m compared with the best known algorithm for this problem due to Gavalda and Guijarro.
  • Atsuyoshi Nakamura
    Algorithmic Learning Theory, 7th International Workshop, ALT '96, Sydney, Australia, October 23-25, 1996, Proceedings 37 - 50 Springer 1996年 [査読有り][通常論文]
  • Naoki Abe, Hang Li, Atsuyoshi Nakamura
    CoRR abs/cmp-lg/9507010 1995年 [査読有り][通常論文]
  • Naoki Abe, Hang Li, Atsuyoshi Nakamura
    Machine Learning, Proceedings of the Twelfth International Conference on Machine Learning, Tahoe City, California, USA, July 9-12, 1995 3 - 11 Morgan Kaufmann 1995年 [査読有り][通常論文]
  • Atsuyoshi Nakamura, Naoki Abe
    Proceedings of the Eigth Annual Conference on Computational Learning Theory, COLT 1995, Santa Cruz, California, USA, July 5-8, 1995 214 - 221 ACM 1995年 [査読有り][通常論文]
  • Atsuyoshi Nakamura, Shinji Miura
    Algorithmic Learning Theory, 6th International Conference, ALT '95, Fukuoka, Japan, October 18-20, 1995, Proceedings 138 - 150 Springer 1995年 [査読有り][通常論文]
  • A NAKAMURA, N ABE
    THEORETICAL COMPUTER SCIENCE 137 1 159 - 176 1995年01月 [査読有り][通常論文]
     
    We investigate the problem of exactly identifying a real-valued function of {0, 1}(n) by a weighted sum of a number of monotone terms by querying for the values of the target function at assignments of the learner's choice. When all coefficients are nonnegative, we exhibit an efficient learning algorithm requiring at most (n-[log s]+1)s queries, where n is the number of variables and s is the number of terms in the target formula. We prove a lower bound of Omega(ns/log s) on the number of queries necessary for learning this class, so no algorithm can reduce the number of queries dramatically. The algorithm runs in time O (ns(2)) in the worst case. The same algorithm can be used to learn the 'inductive-read-k' subclass, a proper superclass of the 'read-k' subclass, with a number of queries not exceeding 1/2((n-[log k])(n-[log k]+1)+2)k, which improves upon the bound achievable by a naive learning algorithm by a factor of two. In addition, the above method can be extended to handle the nonmonotone case in some restricted sense: A similar algorithm can learn the trnate linear combinations of terms with a comparable number of queries. In the general case, namely, when the coefficients vary over the reals (or any arbitrary field), we show that the number of queries required for exact learning of the k-term subclass is upper bounded by q(n,[log k]+1) and is lower bounded by q(n,[log k]), where q(n,l)=Sigma(i=0)(l)((n)(i)). These bounds are shown by generalizing Roth and Benedik's technique for analyzing the learning problem for k-sparse multivariate polynomials over GF(2) (Roth and Benedek, 1991) to those over an arbitrary field.
  • Atsuyoshi Nakamura, Naoki Abe, Jun-ichi Takeuchi
    Algorithmic Learning Theory, 4th International Workshop on Analogical and Inductive Inference, AII '94, 5th International Workshop on Algorithmic Learning Theory, ALT '94, Reinhardsbrunn Castle, Germany, October 10-15, 1994, Proceedings 500 - 515 Springer 1994年 [査読有り][通常論文]
  • Atsuyoshi Nakamura, Naoki Abe
    Algorithmic Learning Theory, 4th International Workshop, ALT '93, Tokyo, Japan, November 8-10, 1993, Proceedings 300 - 313 Springer 1993年 [査読有り][通常論文]

書籍

その他活動・業績

  • 中村 篤祥, 櫻田 健斗 人工知能学会全国大会論文集 2020 2I1GS201 -2I1GS201 2020年 

    実数属性に対する閾値を用いた分岐条件を各ノードにもつ決定森は、同じ属性かつ類似した閾値の条件をまとめて条件共有化を行うことにより単純化できる。我々は、決定森に属する各決定木上の訓練データに対する決定パスを、できるだけ変えずに分岐条件の共有化を行うことにより、精度の劣化を抑えながら単純化を行うアルゴリズムMin_DBNを開発した(PKDD ECML2019で発表)。本稿ではこのMin_DBNを、バギング、ブースティング、ランダムサブスペース法に基づく様々なアンサンブル学習法で構築した分類・回帰を行う決定森に適用した場合の効果について、実験により有効性を検証する。実験の結果、extremely randomized trees, ランダムフォレストなど、ランダム性の高いものほど効果は大きいが、データによってはAdaBoostで作成した決定森に対しても効果があることがわかった。また、バギング系のアンサンブル学習法に対し、学習時に用いられた木毎の訓練データのみをその木に適用する改良法を提案し、有効性が実験により確認された。

  • A Fast Approximate Algorithm for k-Median Problem on a Graph
    Keisuke Todo, Atsuyoshi Nakamura and Mineichi Kudo Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG) 2019年08月 [査読有り][通常論文]
  • 櫻田 健斗, 中村 篤祥 人工知能基本問題研究会 109 62 -67 2019年03月13日 [査読無し][通常論文]
  • 中村篤祥, ペリシエ オレリアン, 田畑公次, 小松崎民樹 日本細胞生物学会大会(Web) 71st ROMBUNNO.1SEp‐07 (WEB ONLY) 2019年 [査読無し][通常論文]
  • 田畑公次, 中村篤祥, 小松崎民樹 電子情報通信学会技術研究報告 118 (284(IBISML2018 44-104)(Web)) 353‐360 (WEB ONLY) 2018年10月29日 [査読無し][通常論文]
  • 田畑 公次, 中村 篤祥, 本多 淳也, 小松崎 民樹 人工知能基本問題研究会 106 94 -99 2018年03月16日 [査読無し][通常論文]
  • Aurelien Pelissier, Atsuyoshi Nakamura, Koji Tabata CoRR abs/1811.07531 2018年 [査読無し][通常論文]
  • 安井拓未, 中村篤祥, 中村篤祥, 田中章, 田中章, 工藤峰一, 工藤峰一 情報処理学会研究報告(Web) 2017 (MUS-116) Vol.2017‐MUS‐116,No.15,1‐4 (WEB ONLY) 2017年08月17日 [査読無し][通常論文]
  • 中村 篤祥 電子情報通信学会技術研究報告 = IEICE technical report : 信学技報 117 (110) 49 -54 2017年06月23日 [査読無し][通常論文]
  • 中村 篤祥 人工知能基本問題研究会 103 45 -50 2017年03月13日 [査読無し][通常論文]
  • 渡辺 僚, 中村 篤祥, 工藤 峰一 電子情報通信学会技術研究報告 = IEICE technical report : 信学技報 115 (323) 167 -174 2015年11月26日 [査読無し][通常論文]
  • Helmbold David P, Nakamura Atsuyoshi, Warmuth Manfred K 人工知能基本問題研究会 98 16 -21 2015年08月07日 [査読無し][通常論文]
  • 中村 篤祥 電子情報通信学会技術研究報告 = IEICE technical report : 信学技報 115 (112) 81 -86 2015年06月23日 [査読無し][通常論文]
  • 渡辺 僚, 中村 篤祥, 工藤 峰一 人工知能基本問題研究会 96 29 -34 2015年01月13日 [査読無し][通常論文]
  • 中村 篤祥 人工知能基本問題研究会 93 45 -48 2014年03月07日 [査読無し][通常論文]
  • 渡辺 僚, 中村 篤祥, 工藤 峰一 電子情報通信学会技術研究報告 = IEICE technical report : 信学技報 113 (198) 9 -16 2013年09月03日 [査読無し][通常論文]
  • 花田 博幸, 中村 篤祥, 工藤 峰一 人工知能基本問題研究会 89 25 -30 2013年02月28日 [査読無し][通常論文]
  • 渡邊 僚, 中村 篤祥, 工藤 峰一 人工知能基本問題研究会 86 (0) 29 -34 2012年08月09日 [査読無し][通常論文]
  • 田畑 公次, 中村 篤祥, 工藤 峰一 人工知能基本問題研究会 86 (0) 77 -82 2012年08月09日 [査読無し][通常論文]
  • 中村 篤祥, 瀧川 一学, 戸坂 央 人工知能基本問題研究会 85 23 -28 2012年02月02日 [査読無し][通常論文]
  • 田畑 公次, 中村 篤祥, 工藤 峰一 電子情報通信学会技術研究報告. COMP, コンピュテーション 111 (256) 7 -14 2011年10月14日 [査読無し][通常論文]
     
    Closeness Centralityはグラフにおけるノードの中心性を表す尺度の一つである.あるノードのCloseness Centralityは,そのノードから他のすべてのノードまでの距離の和の逆数として計算される.本報告では,無向グラフに対しCloseness Centralityが大きいノードを高速に発見するアルゴリズムを提案する.提案法は,与えられたグラフのノードvに対し,vを根とする最短経路木Tを構築し,TにおいてCloseness Centerを新たなvとして同じ手続きを繰り返すアルゴリズムである.頂点の次数分布がべき乗則に従う場合には,高い確率でCloseness Centralityがほぼ最大のノードを見つけることができることが実験的に示された.
  • 中村 篤祥 人工知能基本問題研究会 82 1 -6 2011年08月04日 [査読無し][通常論文]
  • 大内 康治, 中村 篤祥, 工藤 峰一 電子情報通信学会技術研究報告. IBISML, 情報論的学習理論と機械学習 110 (265) 99 -104 2010年10月28日 [査読無し][通常論文]
     
    固定された自然数dに対し,軸に平行な有限個のd次元超矩形の和集合で表現される概念クラスが多項式時間PAC学習可能であることはBlumerらにより示されている.証明に用いられたアルゴリズムは,概念に含まれる事例(正例)全体を被覆する超矩形の集合を,貪欲手続きで求めるというものである.本稿では,Blumerらのアルゴリズムの効率的な実装について検討する.また,このアルゴリズムを実際のnクラス識別問題(n≧2)に通用し,同じく超矩形の和集合を用いる確率的サブクラス法との比較を行う.UCI機械学習レポジトリのデータセットを用いた実験では,ほぼ同等の識別性能となりながらも,貪欲な手法によって生成される超矩形の数が確率的サブクラス法のものよりも少なくなり,多くのデータで7割以上の削減がみられた,
  • 打矢 泰志, 中村 篤祥, 工藤 峰一 電子情報通信学会技術研究報告. COMP, コンピュテーション 109 (195) 13 -20 2009年09月07日 [査読無し][通常論文]
     
    Auerらにより研究されたadversarial bandit問題は,プレーヤーが選択したアクションに対する報酬生成過程において確率的な仮定をおかないmulti-armed bandit問題である.本稿ではadversarial bandit問題を,各時刻においてk(≧1)回のアクションを選択できるように拡張し,アクションの重複選択を許す場合と許さない場合の2つの設定で分析を行う.両方の設定において,Auerらが提案したアルゴリズムExp3を一般化し,最適固定アクション集合に対する損失上界の一般化を得る.
  • 打矢 泰志, 中村 篤祥, 工藤 峰一 電子情報通信学会技術研究報告. PRMU, パターン認識・メディア理解 108 (363) 213 -218 2008年12月11日 [査読無し][通常論文]
     
    multi-armed bandit問題とは、異なるK個のスロットマシンから1台のマシンを選択するという試行を繰り返し行う状況において、総合利得を最大化するようにマシンを選択する問題である。ほとんどの従来手法では各スロットマシンから得られる報酬は確率的に定まるという仮定のもとに分析が行われてきた。一方、Auerらは報酬に確率的な仮定をおかない一般的な場合を考え、損失の上界が理論的に保証されたアルゴリズムを示した。本報告では、この問題を一度に複数のスロットマシンを選択できるように拡張し、分散投資効果について理論的に分析する。
  • 高橋 哲自, 工藤 峰一, 中村 篤洋 電子情報通信学会技術研究報告. PRMU, パターン認識・メディア理解 108 (363) 37 -41 2008年12月11日 [査読無し][通常論文]
     
    クラス領域の近似において凸包は有効であるものの,高次元で凸包を構成するのは計算量的に困難である.本稿では,次元に線形な計算量で済む近似凸包の構成法を提案する.加えて,近似凸包を含む多面体領域を用いた識別子の族から,与えられた問題に有効な識別子を選択する基準を検討する.
  • 中村 篤祥, 工藤 峰一 電子情報通信学会技術研究報告. COMP, コンピュテーション 108 (237) 9 -16 2008年10月03日 [査読無し][通常論文]
     
    音のような長さ付きシンボルの列に対するアライメントとしてパッキングアライメントを提案する。更に音に重なりのある一般的な楽曲のモデルとして、終了位置(時間)と長さが付いたシンボルの列を考え、パッキングアライメントをそのような列に適用できるように拡張する。パッキングアライメントを用いることにより、曲のなんとなく似ている部分を自動的に抽出することが可能であると考えられる。バッハの人気曲を使った実験によれば、明らかには似ていると言えない、なんとなく似ている部分が実際に抽出された。
  • 中村 篤祥, 工藤 峰一 数理解析研究所講究録 1599 (0) 162 -169 2008年05月 [査読無し][通常論文]
  • 中村 篤祥, 工藤 峰一 電子情報通信学会技術研究報告. COMP, コンピュテーション 107 (219) 35 -42 2007年09月13日 [査読無し][通常論文]
     
    複雑な領域を、複数の単純な領域の和集合で表現する方法は、わかりやすく扱いやすいため様々な分野で用いられる。工藤らは、多次元実数空間において、与えられた正例の部分集合で、それをすべて含み与えられた負例を1つも含まない超矩形が存在するような極大集合(部分クラス)をすべて求める、部分クラス問題を考えた[2],[4]。しかし、この問題を解く既存アルゴリズムの時間計算量が大き過ぎるため、ランダマイズド法によりできるだけ多くの重要な部分クラスを求める方法がとられてきた。本稿では、部分クラス族のVC次元に計算時間が依存し、VC次元が低い部分クラス族ほど高速に列挙できる、部分クラス列挙アルゴリズムを提案する。UCI Machine Learning Repositoryのデータセットを使った実験により、提案アルゴリズムが小規模な実データへ適用可能であることが検証された。
  • 戸坂 央, 中村 篤祥, 工藤 峰一 電子情報通信学会技術研究報告. PRMU, パターン認識・メディア理解 107 (115) 7 -12 2007年06月21日 [査読無し][通常論文]
     
    ラベル付き木は計算機ネットワークやWebマイニング,バイオインフォマティクス,XML文書マイニング等様々な分野で扱われている.本稿ではこれらのデータからの基礎的なマイニング手法として,類似する部分木が頻出する部分木を発見する問題を扱う.問題の定式化を行った後に,この問題を効率良く解くアルゴリズムを提案する.実際のWebページを用いた実験により提案アルゴリズムが実用的な速度で動作することを確認した.また,Webページからのデータレコード抽出における予備実験では有望な結果が得られた.
  • 戸坂 央, 中村 篤祥, 工藤 峰一 電子情報通信学会技術研究報告. DE, データ工学 107 (114) 7 -12 2007年06月21日 [査読無し][通常論文]
     
    ラベル付き木は計算機ネットワークやWebマイニング,バイオインフォマティクス,XML文書マイニング等様々な分野で扱われている.本稿ではこれらのデータからの基礎的なマイニング手法として,類似する部分木が頻出する部分木を発見する問題を扱う.問題の定式化を行った後に,この問題を効率良く解くアルゴリズムを提案する.実際のWebページを用いた実験により提案アルゴリズムが実用的な速度で動作することを確認した.また,Webページからのデータレコード抽出における予備実験では有望な結果が得られた.
  • 中村 篤祥 人工知能基本問題研究会 63 (0) 33 -38 2006年09月08日 [査読無し][通常論文]
  • 斉藤 智哉, 中村 篤祥, 工藤 峰一 情報科学技術レターズ 5 (0) 1 -4 2006年08月21日 [査読無し][通常論文]
  • 斉藤 智哉, 中村 篤祥, 工藤 峰一 数理解析研究所講究録 1489 (0) 216 -222 2006年05月 [査読無し][通常論文]
  • 中村 篤祥 電子情報通信学会技術研究報告. COMP, コンピュテーション 105 (273) 49 -53 2005年09月08日 [査読無し][通常論文]
     
    本論文では、n次元ユークリッド空間から{1, 2, …, k}への線形ランキング関数の学習容易性について考える。関数の学習し易さを表す指標と考えられるグラフ次元が、線形ランキング関数の場合Θ(n+k)であることを示す。示されたグラフ次元は、k-1個の線形識別関数を使って決定リストの形で自然に定義されるk値関数のグラフ次元Θ(nk)と比べて、漸近的な評価では明らかに小さいといえる。
  • 中村 篤祥 数理解析研究所講究録 1426 (0) 51 -56 2005年04月 [査読無し][通常論文]
  • 中村 篤祥, 繁住 健哉, 山本 真基 数理解析研究所講究録 1426 (0) 71 -77 2005年04月 [査読無し][通常論文]
  • 河田 岳大, 工藤 峰一, 外山 淳, 中村 篤祥 電子情報通信学会論文誌. D-II, 情報・システム, II-パターン処理 88 (3) 629 -635 2005年03月01日 [査読無し][通常論文]
     
    OCRなどを通して得られる日本語文の認識結果において, N-gram確率を利用した高速な誤認識文字検出法を提案する.日本語のように単語が分かち書きされず大規模な語彙を対象とした場合, 誤り個所の指摘に文字N-gramは有効な方法である.本論文ではまず, 通常のN-gram確率の拡張として両方向N-gram確率を提案し, その有効性を情報量の点から考察する.次に, 両方向N-gram確率と文脈確率を用いて1文字の誤字を検出する方法を提案する.シミュレーション実験では, 適合率80%において従来法よりも10%以上高い約75%の再現率を達成できた.また, 誤り範囲の指摘という点では, 適合率80%で再現率90%が達成された.
  • 中村 篤祥, 工藤 峰一 情報処理学会研究報告. AL, アルゴリズム研究会報告 2004 (101) 67 -74 2004年10月14日 [査読無し][通常論文]
     
    本論文では、与えられたラベル付き順序木の集合から、あらかじめ全ての木が含んでいることがわかっている特殊節点を全て含む部分木で、頻出するものを効率的に数えあげる方法を提案する。提案アルゴリズムはZakiのTreeMinerアルゴリズム[9]を、制約を育たすものだけ候補として効率的に生成するように改造したものである。また、同じラベルが多く存在する場合に大量のメモリを使用するという問題に対処する方法についても提案する。検索エンジンで集められたWebページに含まれる、レストランの名前と評判情報を含む頻出部分構造見つける問題に、提案アルゴリズムを適用することにより得られた結果につても報告する。
  • 中村 篤祥, 工藤 峰一 電子情報通信学会技術研究報告. COMP, コンピュテーション 104 (339) 7 -14 2004年10月08日 [査読無し][通常論文]
     
    本論文では、与えられたラベル付き順序本の集合から、あらかじめ全ての本が含んでいることがわかっている特殊節点を全て含む部分木で、頻出するものを効率的に数えあげる方法を提案する。提案アルゴリズムはZakiのTreeMinerアルゴリズム[9]を、制約を満たすものだけ候補として効率的に生成するように改造したものである。また、同じラベルが多く存在する場合に大量のメモリを使用するという問題に対処する方法についても提案する。検索エンジンで集められたWebページに含まれる、レストランの名前と評判情報を含む頻出部分構造見つける問題に、提案アルゴリズムを適用することにより得られた結果につても報告する。
  • 河田 岳大, 中村 篤祥, 外山 淳, 工藤 峰一 電子情報通信学会技術研究報告. PRMU, パターン認識・メディア理解 103 (295) 1 -5 2003年09月01日 [査読無し][通常論文]
     
    文字認識システムでは,認識精度を向上させるために言語情報を用いた誤り検出・訂正の方法が提案されている.我々はN-gram確率を用いた手法に着目し,両方向N-gram確率を用いて1文字の誤字を検出する方法を提案する.本方式では文脈確率と両方向N-gram確率のそれぞれの意味から自然に導出される方法を用いる.また,推定パターンに一致しないものについて原因と検出の可能性を検討する.
  • 設樂 洋爾, 中村 篤祥, 工藤 峰一 電子情報通信学会技術研究報告. PRMU, パターン認識・メディア理解 103 (295) 7 -11 2003年09月01日 [査読無し][通常論文]
     
    データベースに潜む傾向をルールの形で抽出する手法は,可読性の高い出力が可能であり,デークマイニングの手法として有効である.精度の高い識別子を構成し,そこから規則を得る手法は多く提案されている.しかし,必ずしも精度の高い識別子から興味深いルールが得られるとは限らない.そこで,データに基づく属性値の抽象化とスクリーニングを組み合わせ,可読性の高いルールを抽出する手法を提案する.また,識別精度とルールの興味深さについて考察を行う.
  • 中村 篤祥 応用数理 12 (4) 401 -410 2002年12月25日 [査読無し][通常論文]
     
    We explain automatic recommendation techniques by which computer systems can automatically select goods or documents preferred by each user based on his/her buying or accessing history. Especially, we focus on the collaborative filtering method using weighted majority prediction algorithm developed by the authors. From both theoretical and experimental points of view, we explain how excellent the method is compared to the method using correlation coefficient, which is the most popular collaborative filtering method. We also address issues that we should consider in order to make our method ...
  • 中村 篤祥 応用数理 12 (4) 401 -410 2002年12月25日 [査読無し][通常論文]
  • マーク ラングハインリッヒ, 神場 知成, 中村 篤祥 情報処理 40 (8) 807 -812 1999年08月15日 [査読無し][通常論文]
  • 中村 篤祥 情報処理 38 (7) 1997年07月15日 [査読無し][通常論文]
  • 安倍 直樹, 中村 篤祥 情報処理 38 (7) 558 -561 1997年07月15日 [査読無し][通常論文]
  • 安倍 直樹, 中村 篤祥 情報処理 38 (7) 575 -582 1997年07月15日 [査読無し][通常論文]
  • 中村 篤祥, 馬見塚 拓, 鳥羽 弘康, 安倍 直樹 全国大会講演論文集 第52回平成8年前期 (1) 55 -56 1996年03月06日 [査読無し][通常論文]
     
    ニュース記事などに対する個人の嗜好を、記事中の出現単語から予測する方法において、各単語にlつのブール変数を割り当て、その実数係数多項式で嗜好関数を表現する方法を提案し、その係数の学習アルゴリズムについて考察及び実験を行う。特に、計算論的学習理論において盛んに研究されているオンライン学習における重みの逐次更新法を、実数係数の学習に適用する。具体的には、Kivinen & Warmuthが提案・解析した加法的更新法GDと乗法的更新法EG^±に加え、新しく「誤差比例修正法」と呼ぶ重み更新アルゴリズムを提案し、その乗法的更新法であるDPMUについて、ある被験者の実データを用いて実験的に予測性能を評価・比較する。実験結果によればDPMUは、既存の方式と同等以上の予測性能を有する。
  • Naoki Abe, Hang Li, Atsuyoshi Nakamura Proc. of The 12th Int. Conf. on Machine Learning, 1995 1995年07月24日 [査読無し][通常論文]
     
    We consider the problem of learning a certain type of lexical semantic
    knowledge that can be expressed as a binary relation between words, such as the
    so-called sub-categorization of verbs (a verb-noun relation) and the compound
    noun phrase relation (a noun-noun relation). Specifically, we view this problem
    as an on-line learning problem in the sense of Littlestone's learning model in
    which the learner's goal is to minimize the total number of prediction
    mistakes. In the computational learning theory literature, Goldman, Rivest and
    Schapire and subsequently Goldman and Warmuth have consider...
  • 中村 篤祥 電子情報通信学会技術研究報告. COMP, コンピュテーション 94 (181) 93 -102 1994年07月25日 [査読無し][通常論文]
     
    d次元{0,1}-値行列の各点の値を1点ずつ予測するオンライン学習において、次に予測する点を自分が決めるself-directedモデルと敵が選ぶadversary-directedモデルで累積誤り数の評価を行う。self-directedモデルではd項関係を学習する、効率的で累積誤り数の少ないアルゴリズムを示す。行列の各方向に関し、要素数がn_j、値により分かれる種類の数が高々k_jであるような行列に対し、このアルゴリズムは高々Π^d_j=1>k_j+Σ^d_j=1>(n_j-k_j)log k_j回しか間違えない。また、adversary-directedモデルでは行も列も高々2種類しかない2項関係学習問題に対し、効率的で累積誤り数の少ないアルゴリズムを示す。このアルゴリズムの累積誤り数の上界として、行の数がn、列の数がmのときの下界であるm+nより2だけ多い数が示される。
  • 中村 篤祥 電子情報通信学会技術研究報告. COMP, コンピュテーション 93 (249) 53 -58 1993年09月24日 [査読無し][通常論文]
     
    平面上の任意の向きの長方形のPAC学習可能性について考えている。まず、与えられたm個の例に無矛盾な長方形を出力するΟ(mlogm)時間アルゴリズムを示した。任意の長方形のVC次元は有限であるからBlumer et al.の著名な定理よりこのアルゴリズムは必要事例数がΟ(1, εlog1/(εδ))の長方形のクラスのdistribution-free PAC学習アルゴリズムである。但し、εは正確度、δは信頼度のパラメータである。また、正例をすべて含む最小面積長方形を出力するアルゴリズムについても考察した。このアルゴリズムはΟ(mlogm)時間の一様分布PAC学習アルゴリズムであることが知られているが、必要事例数も無矛盾長方形出力アルゴリズムと同じオーダまで下げれることを示した。
  • 中村 篤祥 全国大会講演論文集 第46回平成5年前期 (3) 35 -36 1993年03月01日 [査読無し][通常論文]
     
    n次元実数領域[0,1)^n上の部分領域を、[0,1)^n上の一様分布に従って得られる点及びその点が部分領域の内外どちらかという情報から部分領域を学習する問題を扱う。但し、部分領域は次の条件を満たすものに制限する。([0,1)^n上の点を(x_1,x_2,・・・,x_n)とする。)1)原子論理式が(x_i&isins;R)という形のDNF(積和標準形)の論理式で表現される。2)Rは区間[0,1)を高々k(固定)個の領域に分割するような領域∪^l_<i=1>[a_i,b_i)とする。3)各x_iは論理式の中で高々1度しか現れない。[2]では、分布を一様分布に固定したPAC(Probably Approximately Correct)学習モデルにおいてμDNF(各変数が高々1度しか現れないDNF)プール式で表現されるプール関数が学習可能であることが示されている。本稿は上記2)の制約の下で彼らの結果の定義域を{0,1}^nから[0,1}^nに拡張したものである。

特許

受賞

  • 2013年 電子情報通信学会 情報・システムソサイエティ査読功労賞
     
    受賞者: 中村篤祥
  • 2013年 電子情報通信学会 情報・システムソサイエティ活動功労賞
     
    受賞者: 中村篤祥

共同研究・競争的資金等の研究課題

  • 『稀』:気づきを与える認識・発見技術
    日本学術振興会:科学研究費助成事業 基盤研究(B)
    研究期間 : 2019年04月 -2024年03月 
    代表者 : 工藤 峰一, 今井 英幸, 中村 篤祥
  • バンディット問題の方策の実用化のための理論の深化
    日本学術振興会:科学研究費助成事業 基盤研究(B)
    研究期間 : 2019年04月 -2023年03月 
    代表者 : 中村 篤祥, 田畑 公次, 工藤 峰一
  • 細胞集団とシンギュラリティ細胞のデータ駆動型数理解析技術の開発
    日本学術振興会:科学研究費助成事業 新学術領域研究(研究領域提案型)
    研究期間 : 2018年06月 -2023年03月 
    代表者 : 小松崎 民樹, 中村 篤祥, 小野 峻佑
  • 日本学術振興会:科学研究費助成事業 基盤研究(B)
    研究期間 : 2015年04月 -2018年03月 
    代表者 : 工藤 峰一, 今井 英幸, 中村 篤祥
     
    本研究では、一つの対象に適切な複数のキーワードや名前をつける「マルチラベル識別」問題において、識別性能を上げることと、性能を下げずに処理速度を上げることを目的として各種の検討を行った。その成果は主に三つである。第一に、ラベル間の相関利用の重要性を示し、その利用法を各種手法により示した。第二に、実用的な処理時間を確保するために、特徴空間あるいはラベル空間におけるサンプルの近さに注目した「問題分割」が有効であることを論じ、その効果を実験的に示した。最後に、滅多に出現しないラベルや付け忘れたラベルがラベル全体に対して多くの割合を占めるため、これらに関しては特別な取扱いが必要であることを指摘した。
  • 文部科学省:科学研究費補助金(基盤研究(B))
    研究期間 : 2013年 -2015年 
    代表者 : 中村 篤祥
  • 文部科学省:科学研究費補助金(基盤研究(C))
    研究期間 : 2009年 -2011年 
    代表者 : 中村 篤祥, 工藤 峰一, 外山 淳
     
    データ依存の問題の複雑さ指標として,Sperner族のデータ依存のVC次元を提案し,超矩形サブクラス問題において,データ依存のVC次元が低いデータに対し,高速に動作するアルゴリズムを開発し,実データを用いて有効性を検証した.また,文字列の連続繰返し構造に対する複雑さの指標として,繰返し表現文字列の最小サイズを提案し,実際に最小繰返し表現文字列を高速に求めるアルゴリズムを開発し,DNAの繰返し構造を分析した.
  • 文部科学省:科学研究費補助金(基盤研究(C))
    研究期間 : 2006年 -2007年 
    代表者 : 外山 淳, 工藤 峰一, 中村 篤祥
     
    1)特徴量の分類:人間が判断に用いる知識は実数で表されるような数値的なものではなく,言葉で言い表される程度のある種大雑把なものであると解釈し,「粒度」の概念を導入した.また,離散的に表されるデータに関してもグルーピング問題として解釈し直し評価基準を与えると共に,準最適なグルーピングを発見するためのアルゴリズムの開発を行った.2)プロトタイプの発見:人間が判断に用いる特徴はそれまでに得た知識全てではなく,それらの中から抽象化された数少ないデータであるとの観点に立ち,多量のデータの中から典型的な特徴を醸し出すプロトタイプを発見する方法を開発した.プロトタイプを点として捉えるのではなく,体積を持つものとして捉えることを提案するとともに,時間の経過と共に増える知識に見立ててデータが増加する際のプロトタイプ更新のアルゴリズムも提案した.また,類似性の観点から木構造をなすデータの中から,類似の構造を見出しグルーピングすると共に,頻度を数えることによって典型的な木構造を見出す研究も行った.3)状況に応じた特徴量の抽出:一つに識別子によらない識別子独立型の特徴選択手法を提案した.また,最適な特徴集合はカテゴリやカテゴリ対象に依存するとの立場に立ち,ランダムに選び出した特徴により構成した特徴空間の中から特徴選択を行う方法により識別率の向上を目指す方法についても提案した.
  • 文部科学省:科学研究費補助金(基盤研究(B))
    研究期間 : 2002年 -2005年 
    代表者 : 工藤 峰一, 林 裕樹, 天元 宏, 外山 淳, 今井 英幸, 村井 哲也, 田中 章, 中村 篤祥, 天元 宏, 林 裕樹
     
    本課題の目的とした「大規模パターン認識系の実現」に対して、交付申請時に挙げた三種類課題に関する成果は以下の通りである。1.データ数に関する規模の問題:高い識別性能(汎化性能)を達成する問題から、性能を大きく損なわない範囲で計算量を削減する問題へと、問題の設定を移し、データ数に対して少ない計算量で学習する方式を考察した。特に、1)代表的な識別手法の一つである、k最近隣法をより高速に行う手法を開発し、2)データ数に対して線形の時間で構成できるプロトタイプの構成法を検討した(今後、継続検討).2.特徴数に関する規模の問題:識別性能を維持したまま識別に本質的な特徴のみを残す「特徴選択」手法に着目して各種の提案を行った。これまで提案されているて多くの手法では規模の増大に計算量的に十分対応できない。そこで,本研究では,識別に明らかに不必要な特徴を先に除去することを「識別子に依存しない特徴選択」と位置付け,その方法論を検討した.また,提案手法により検査すべき特徴の個数を予め減らし,その後従来の代表的な方法を適用することで,全体として高速な「識別子を特定した」特徴選択が実現できることを示した.基本的な方法論として二通り、加えて、選ぶべき特徴数の選択に関する研究を行った。3.クラスの数の多い場合の検討(クラス数に関する規模の問題):漢字認識のようにカテゴリ数が非常に多い場合,識別の困難さが増す...

教育活動情報

主要な担当授業

  • 情報認識学特論
    開講年度 : 2020年
    課程区分 : 修士課程
    開講学部 : 情報科学研究科
    キーワード : 機械学習、統計的方法,教師あり学習、パターン認識、教師なし学習、回帰、線形モデル
  • 情報認識学特論
    開講年度 : 2020年
    課程区分 : 修士課程
    開講学部 : 情報科学院
    キーワード : 機械学習、統計的方法,教師あり学習、パターン認識、教師なし学習、回帰、線形モデル
  • 情報認識学特論
    開講年度 : 2020年
    課程区分 : 博士後期課程
    開講学部 : 情報科学研究科
    キーワード : 機械学習、統計的方法,教師あり学習、パターン認識、教師なし学習、回帰、線形モデル
  • 情報認識学特論
    開講年度 : 2020年
    課程区分 : 博士後期課程
    開講学部 : 情報科学院
    キーワード : 機械学習、統計的方法,教師あり学習、パターン認識、教師なし学習、回帰、線形モデル
  • コンピュータサイエンス演習Ⅰ
    開講年度 : 2020年
    課程区分 : 学士課程
    開講学部 : 工学部
    キーワード : 集合、論理、関係、写像、オートマトン、形式言語、数の表現(2 進数,補数,浮動小数点),2進数の演算アルゴリズム,計算モデルと計算量,ベクトル空間,行列計算(ガウスの消去法,LU 分解),固有値計算,誤差つき計算
  • 情報理工学演習Ⅱ
    開講年度 : 2020年
    課程区分 : 学士課程
    開講学部 : 工学部
    キーワード : 集合、論理、関係、写像、オートマトン、形式言語、数の表現(2 進数,補数,浮動小数点),2進数の演算アルゴリズム,ベクトル空間,行列計算(ガウスの消去法,LU 分解),固有値計算,誤差つき計算
  • データマイニングと機械学習
    開講年度 : 2020年
    課程区分 : 学士課程
    開講学部 : 工学部
    キーワード : パターン認識,行列分解,トピックモデル,オンライン学習,アンサンブル学習,頻出パターンマイニング
  • パターン認識
    開講年度 : 2020年
    課程区分 : 学士課程
    開講学部 : 工学部
    キーワード : パターン認識,行列分解,トピックモデル,オンライン学習,アンサンブル学習,頻出パターンマイニング
  • 統計学
    開講年度 : 2020年
    課程区分 : 学士課程
    開講学部 : 全学教育
    キーワード : 平均,分散,標準偏差,確率変数,確率分布,母集団,標本,標本分布,点推定,信頼区間,仮説検定
  • 情報理論
    開講年度 : 2020年
    課程区分 : 学士課程
    開講学部 : 工学部
    キーワード : 情報量、エントロピー、情報源、通信路、符号化、データ圧縮
  • コンピュータサイエンス実験Ⅱ
    開講年度 : 2020年
    課程区分 : 学士課程
    開講学部 : 工学部
    キーワード : データベース、Web、機械学習、並列プログラミング
  • 情報理工学実験Ⅱ
    開講年度 : 2020年
    課程区分 : 学士課程
    開講学部 : 工学部
    キーワード : データベース、Web、機械学習、並列プログラミング


Copyright © MEDIA FUSION Co.,Ltd. All rights reserved.