Last Updated :2024/07/18

- Faculty of Information Science and Technology Computer Science and Information Technology Knowledge Software Science

#### Name (Japanese)

Nakamura#### Name (Kana)

Atsuyoshi#### Name

201301071928917520

- Informatics / Intelligent informatics
- Informatics / Intelligent robotics
- Informatics / Perceptual information processing

- 2021/04 - Today Hokkaido University Graduate School of Information Science and Technology Professor
- 2002/07 - 2021/03 Hokkaido University Graduate School of Information Science and Technology Associate Professor
- 1988/04 - 2002/06 NEC Corporation

- 2000/02 - Tokyo Institute of Technology Graduate School of Science and Engineering
- 1986/04 - 1988/03 Tokyo Institute of Technology
- 1982/04 - 1986/03 Tokyo Institute of Technology School of Science Department of Information Science

- Koji Tabata, Hiroyuki Kawagoe, J Nicholas Taylor, Kentaro Mochizuki, Toshiki Kubo, Jean-Emmanuel Clement, Yasuaki Kumamoto, Yoshinori Harada, Atsuyoshi Nakamura, Katsumasa Fujita, Tamiki KomatsuzakiProceedings of the National Academy of Sciences of the United States of America 121 (12) e2304866121 2024/03/19Accelerating the measurement for discrimination of samples, such as classification of cell phenotype, is crucial when faced with significant time and cost constraints. Spontaneous Raman microscopy offers label-free, rich chemical information but suffers from long acquisition time due to extremely small scattering cross-sections. One possible approach to accelerate the measurement is by measuring necessary parts with a suitable number of illumination points. However, how to design these points during measurement remains a challenge. To address this, we developed an imaging technique based on a reinforcement learning in machine learning (ML). This ML approach adaptively feeds back "optimal" illumination pattern during the measurement to detect the existence of specific characteristics of interest, allowing faster measurements while guaranteeing discrimination accuracy. Using a set of Raman images of human follicular thyroid and follicular thyroid carcinoma cells, we showed that our technique requires 3,333 to 31,683 times smaller number of illuminations for discriminating the phenotypes than raster scanning. To quantitatively evaluate the number of illuminations depending on the requisite discrimination accuracy, we prepared a set of polymer bead mixture samples to model anomalous and normal tissues. We then applied a home-built programmable-illumination microscope equipped with our algorithm, and confirmed that the system can discriminate the sample conditions with 104 to 4,350 times smaller number of illuminations compared to standard point illumination Raman microscopy. The proposed algorithm can be applied to other types of microscopy that can control measurement condition on the fly, offering an approach for the acceleration of accurate measurements in various applications including medical diagnosis.
- Tatsuya Hayashi, Naoki Ito, Koji Tabata, Atsuyoshi Nakamura, Katsumasa Fujita, Yoshinori Harada, Tamiki KomatsuzakiPattern Recognit. 149 110224 - 110224 2024
- Xuyang Cao, Yang Cao 0011, Primal Pappachan, Atsuyoshi Nakamura, Masatoshi YoshikawaDBSec 184 - 200 2023
- Koji Tabata, Junpei Komiyama, Atsuyoshi Nakamura, Tamiki KomatsuzakiAISTATS 10994 - 11022 2023
- Atsuyoshi Nakamura, Tatsuya HayashiArray 100240 - 100240 2590-0056 2022/07
- Tatsuya Hayashi, Atsuyoshi NakamuraScientific reports 12 (1) 6078 - 6078 2022/04/12 [Refereed]

Various things propagate through the medium of individuals. Some individuals follow the others and take the states similar to their states a small number of time steps later. In this paper, we study the problem of estimating the state propagation order of individuals from the real-valued state sequences of all the individuals.We propose a method of constructing a state propagation graph from individuals' time series of observed states. The propagation order estimated by our proposed method is demonstrated to be significantly more accurate than that by a baseline method (optimal constant delay model) for our synthetic datasets, and also to be consistent with visually recognizable propagation orders for the dataset of Japanese stock price time series and biological cell firing state sequences. - Xuyang Cao, Yang Cao 0011, Masatoshi Yoshikawa, Atsuyoshi NakamuraBig Data 6605 - 6607 2022
- Tatsuya Hayashi, Naoki Ito, Koji Tabata, Atsuyoshi Nakamura, Katsumasa Fujita, Yoshinori Harada, Tamiki KomatsuzakiCoRR abs/2212.13157 2022
- Kosuke Chinone, Atsuyoshi NakamuraAI 40 - 52 2022 [Refereed]
- Yuya Sugie, Yuki Yoshida, Normann Mertig, Takashi Takemoto, Hiroshi Teramoto, Atsuyoshi Nakamura, Ichigaku Takigawa, Shin-ichi Minato, Masanao Yamaoka, Tamiki KomatsuzakiSoft Computing 25 (3) 1731 - 1749 1432-7643 2021/02 [Refereed]
- Hiroshi Uchigaito, Mamoru Okamoto, Gleb Astashkin, Yuki Furubayashi, Norihiro Obata, Thantip Krasienapibal, Hiroshi Teramoto, Yuta Mizuno, Masato Kobayashi, Atsuyoshi Nakamura, Tamiki Komatsuzaki, Takashi Takemoto2021 IEEE Electrical Power and Energy Conference, EPEC 2021 365 - 372 2021 [Refereed]

For achieving a low-carbon society and providing stable energy supply to local areas, we propose a distributed energy system in which multiple small grids are installed in a distributed manner in an area. Multiple electric vehicles located in the area are used as mobile batteries for power leveling between grids and used as regional transportation. To construct this system, it is necessary to provide practical solutions in accordance with the needs in various areas by adjusting the balance between optimization of power management and transportation, which are in a trade-off relationship, while dealing with uncertainties such as climate change and time-varying transportation demand in a short time. In this paper, we propose a multi-objective spatiotemporal optimization algorithm for solving the above problem. We compared the performance of the proposed algorithm with a vehicle-routing-problem solver of Google OR-Tools, demonstrating that the transportation rate increased by 13% and sum of power excess and shortage in grids decreased by 87%. The proposed algorithm can also adjust the balance between optimizations of transportation and electricity-management, suggesting that it can be applied in various areas. - Koji Tabata, Atsuyoshi Nakamura, Tamiki KomatsuzakiPAKDD 2021 Workshop on Machine Learning for MEasurement INformatics (MLMEIN 2021) 57 - 69 0302-9743 2021 [Refereed]
- Mitsuki Maekawa, Atsuyoshi Nakamura, Mineichi KudoACML 2020 241 - 256 2020/11 [Refereed]
- Atsuyoshi Nakamura, Ichigaku Takigawa, Hiroshi MamitsukaProceedings of the AAAI Conference on Artificial Intelligence 34 (04) 5240 - 5247 2159-5399 2020/04/03 [Refereed][Not invited]

We propose new frequent substring pattern mining which can enumerate all substrings with statistically significant frequencies of their locally optimal occurrences from a given single sequence. Our target application is genome sequences, around a half being said to be covered by interspersed and consecutive (tandem) repeats, and detecting these repeats is an important task in molecular life sciences. We evaluate the statistical significance of frequent substrings by using a string generation model with a memoryless stationary information source. We combine this idea with an existing algorithm, ESFLOO-0G.C (Nakamura et al. 2016), to enumerate all statistically significant substrings with locally optimal occurrences. We further develop a parallelized version of our algorithm. Experimental results using synthetic datasets showed the proposed algorithm achieved far higher F-measure in extracting substrings (with various lengths and frequencies) embedded in a randomly generated string with noise, than conventional algorithms. The large-scale experiment using the whole human genome sequence with 3,095,677,412 bases (letters) showed that our parallel algorithm covers 75% of the whole positions analyzed, around 4% and 24% higher than the recent report and the current cutting-edge knowledge, implying a biologically unique finding. - Atsuyoshi Nakamura, Kento SakuradaMachine Learning and Knowledge Discovery in Databases - European Conference, ECMLPKDD2019 578 - 589 0302-9743 2020 [Refereed]
- Taiga Ikeda, Kento Sakurada, Atsuyoshi Nakamura, Masato Motomura, Shinya Takamaeda-YamazakiApplied Reconfigurable Computing. Architectures, Tools, and Applications - 16th International Symposium(ARC) 345 - 357 2020 [Refereed][Not invited]
- Tabata, Koji, Nakamura, Atsuyoshi, Honda, Junya, Komatsuzaki, TamikiMACHINE LEARNING 109 (2) 327 - 372 0885-6125 2019/10 [Refereed][Not invited]

We study a bad arm existence checking problem in a stochastic K-armed bandit setting, in which a player's task is to judge whether a positive arm exists or all the arms are negative among given K arms by drawing as small number of arms as possible. Here, an arm is positive if its expected loss suffered by drawing the arm is at least a given threshold theta(U), and it is negative if that is less than another given threshold theta(L) (<= theta(U)). This problem is a formalization of diagnosis of disease or machine failure. An interesting structure of this problem is the asymmetry of positive and negative arms' roles; finding one positive arm is enough to judge positive existence while all the arms must be discriminated as negative to judge whole negativity. In the case with Delta = theta(U) - theta(L) > 0, we propose elimination algorithms with arm selection policy (policy to determine the next arm to draw) and decision condition (condition to conclude positive arm's existence or the drawn arm's negativity) utilizing this asymmetric problem structure and prove its effectiveness theoretically and empirically. - Koji Tabata, Atsuyoshi Nakamura, Junya Honda, Tamiki KomatsuzakiCoRR abs/1901.11200 2019
- Atsuyoshi Nakamura, David P. Helmbold, Manfred K. WarmuthInf. Comput. 269 2019 [Refereed][Not invited]
- Hideaki Kano, Junya Honda, Kentaro Sakamaki, Kentaro Matsuura, Atsuyoshi Nakamura, Masashi SugiyamaMach. Learn. 108 (5) 721 - 745 2019 [Refereed][Not invited]
- Aurélien Pélissier, Atsuyoshi Nakamura, Koji TabataProceedings of the 2019 SIAM International Conference on Data Mining, SDM 2019, Calgary, Alberta, Canada, May 2-4, 2019. SIAM 450 - 458 2019 [Refereed][Not invited]
- Ryo Watanabe, Junpei Komiyama, Atsuyoshi Nakamura, Mineichi KudoIEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E101A (3) 662 - 667 1745-1337 2018/03/01 [Refereed][Not invited]

We propose a policy UCB-SC for budgeted multi-armed bandits. The policy is a variant of recently proposed KL-UCB-SC. Unlike KL-UCB-SC, which is computationally prohibitive, UCB-SC runs very fast while keeping KL-UCB-SC's asymptotical optimality when reward and cost distributions are Bernoulli with means around 0.5, which are verified both theoretically and empirically. - Takashi Takemoto, Normann Mertig, Masato Hayashi, Saki Susa-Tanaka, Hiroshi Teramoto, Atsuyoshi Nakamura, Ichigaku Takigawa, Shin-ichi Minato, Tamiki Komatsuzaki, Masanao Yamaoka2018 International Conference on ReConFigurable Computing and FPGAs, ReConFig 2018, Cancun, Mexico, December 3-5, 2018 IEEE 1 - 8 2018 [Refereed][Not invited]
- Yuya Sugie, Array,Array,Array, Hiroshi Teramoto, Array,Array, Shin-ichi Minato, Masanao Yamaoka,ArrayTheory and Practice of Natural Computing - 7th International Conference, TPNC 2018, Dublin, Ireland, December 12-14, 2018, Proceedings Springer 111 - 123 2018 [Refereed][Not invited]
- Ryo Watanabe, Junpei Komiyama, Atsuyoshi Nakamura, Mineichi KudoIEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E100A (11) 2470 - 2486 1745-1337 2017/11/01 [Refereed][Not invited]

We study the budgeted multi-armed bandit problem with stochastic action costs. In this problem, a player not only receives a reward but also pays a cost for an action of his/her choice. The goal of the player is to maximize the cumulative reward he/she receives before the total cost exceeds the budget. In the classical multi-armed bandit problem, a policy called KL-UCB is known to perform well. We propose KL-UCB-SC, an extension of this policy for the budgeted bandit problem. We prove that KL-UCB-SC is asymptotically optimal for the case of Bernoulli costs and rewards. To the best of our knowledge, this is the first result that shows asymptotic optimality in the study of the budgeted bandit problem. In fact, our regret upper bound is at least four times better than that of BTS, the best known upper bound for the budgeted bandit problem. Moreover, an empirical simulation we conducted shows that the performance of a tuned variant of KL-UCB-SC is comparable to that of state-of-the-art policies such as PD-BwK and BTS. - Koji Tabata, Atsuyoshi Nakamura, Mineichi KudoIEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E100D (5) 994 - 1002 1745-1361 2017/05 [Refereed][Not invited]

We propose a heuristic approximation algorithm for the 1-median problem. The 1-median problem is the problem of finding a vertex with the highest closeness centrality. Starting from a randomly selected vertex, our algorithm repeats to find a vertex with higher closeness centrality by approximately calculating closeness centrality of each vertex using simpler spanning subgraphs, which are called k-neighbor dense shortest path graphs with shortcuts. According to our experimental results using real networks with more than 10,000 vertices, our algorithm is more than 100 times faster than the exhaustive search and more than 20 times faster than the state-of-the-art approximation algorithm using annotated information to the vertices while the solutions output by our algorithm have higher approximation ratio. - Hiroyuki Hanada, Mineichi Kudo, Atsuyoshi NakamuraCoRR abs/1701.06134 2017
- Atsuyoshi Nakamura, Ichigaku Takigawa, Hisashi Tosaka, Mineichi Kudo, Hiroshi MamitsukaDISCRETE APPLIED MATHEMATICS 200 123 - 152 0166-218X 2016/02 [Refereed][Not invited]

We consider a frequent approximate pattern mining problem, in which interspersed repetitive regions are extracted from a given string. That is, we enumerate substrings that frequently match substrings of a given string locally and optimally. For this problem, we propose a new algorithm, in which candidate patterns are generated without duplication using the suffix tree of a given string. We further define a k-gap-constrained setting, in which the number of gaps in the alignment between a pattern and an occurrence is limited to at most k. Under this setting, we present memory-efficient algorithms, particularly a candidate-based version, which runs fast enough even over human chromosome sequences with, more than 10 million nucleotides. We note that our problem and algorithms for strings can be directly extended to ordered labeled trees. In our experiments we used both randomly synthesized strings, in which corrupted similar substrings are embedded, and real data of human chromosome. The synthetic data experiments show that our proposed approach extracted embedded patterns correctly and time-efficiently. In real data experiments, we examined the centers of 100 clusters computed after grouping the patterns obtained by our k-gap-constrained versions (k = 0, 1 and 2) and the results revealed that the regions of their occurrences coincided with around a half of the regions automatically annotated as Alu sequences by a manually curated repeat sequence database. (C) 2015 Elsevier B.V. All rights reserved. - Shunsuke Suzuki, Mineichi Kudo, Atsuyoshi Nakamura2016 IEEE INTERNATIONAL CONFERENCE ON IDENTITY, SECURITY AND BEHAVIOR ANALYSIS (ISBA) 1 - 6 2016 [Refereed][Not invited]

It is well known that taking wrong sitting posture all day long is harmful for health. However, quantifying the degree of collapse of posture is not so easy. Typically, checking the video of sitting state or examining the spinal curves in radiograph is made so far, but it needs a clinical specialist for diagnosis. In this paper, for attaining this goal more easily, we give a device usable in daily life. We measure a time series of the pressure distributions obtained from a mat with 16x16 pressure sensors put on a chair. More concretely we analyze the sitting person's move direction and the amount, and make a daily diagnosis report aiming at helping his/her own notice. Through the experiments using two subjects who spent around three hours at the chair, a possibility of automatic and daily-basis diagnosis of sitting posture is demonstrated. - Atsuyoshi Nakamura, David P. Helmbold, Manfred K. WarmuthLANGUAGE AND AUTOMATA THEORY AND APPLICATIONS, LATA 2016 9618 412 - 423 0302-9743 2016 [Refereed][Not invited]

We study the loss version of adversarial multi-armed bandit problems with one lossless arm. We show an adversary's strategy that forces any player to suffer K - 1 - O(1/T) loss where K is the number of arms and T is the number of rounds. - Ryo Watanabe, Atsuyoshi Nakamura, Mineichi KudoOPERATIONS RESEARCH LETTERS 43 (6) 558 - 563 0167-6377 2015/11 [Refereed][Not invited]

We improved an upper bound on the expected regret of a UCB-type policy LLR for a bandit problem that repeats the following rounds: a player selects a maximal matching on a complete bipartite graph K-M,K-N and receives a reward for each component edge of the selected matching. Rewards are assumed to be generated independently of its previous rewards according to an unknown fixed distribution. Our upper bound is smaller than the best known result (Chen et al., 2013) by a factor of Theta (M-2/3). (C) 2015 Elsevier B.V. All rights reserved. - Koji Tabata, Atsuyoshi Nakamura, Mineichi KudoDISCOVERY SCIENCE, DS 2015 9356 275 - 283 0302-9743 2015 [Refereed][Not invited]

We developed an algorithm to exactly solve an influence maximization problem (MaxInf) for a two-terminal series parallel graph (TTSPG) in the independent cascade model. The class of TTSPGs can be considered as a class wider than that of trees, only for which an efficient exact solver of this problem has been developed so far. Our algorithm calculates candidate node sets in the divide-and-conquer manner keeping the number of them as small as possible by efficiently eliminating unnecessary ones in merge of subproblems' solutions. Furthermore, we propose a way of converting an arbitrary network to a TTSPG with edges important for propagation to apply our method to real networks. According to our empirical results, our method is significantly faster than the greedy approximation algorithm for MaxInf of a TTSPG. We also demonstrate improvement of solutions by converting to TTSPGs instead of trees using real networks made from DBLP datasets. - Ayako Mikami, Mineichi Kudo, Atsuyoshi NakamuraMULTIPLE CLASSIFIER SYSTEMS (MCS 2015) 9132 27 - 37 0302-9743 2015 [Refereed][Not invited]

Ensemble learning is a strong tool to strengthen weak classifiers. A large amount of diversity among those weak classifiers is a key to accelerate the effectiveness. Therefore, many diversity measures on a given training sample set have been proposed so far. However, they are almost all based on the oracle output that is one if the class predicted by the classifier is correct, zero otherwise. We point out such an oracle output scheme is not appropriate for the problems of more than two classes, and extend one of the most popular diversity measures, disagreement measure, to multi-class cases. On the other hand, the concept of margin has been recognized as an analytic tool to measure the generalization performance of a given classifier. Therefore, we analyze when some criteria for maximizing margins of an ensemble classifier over training samples are maximized under the assumption that the average accuracy of the base classifiers is constant. We also reveal the relationship between those criteria and the extended disagreement measure. As a result, it turns out that diversity is necessary not only over samples but also over predicted classes, if we want to extract the highest potential of ensemble. - Hiroyuki Hanada, Mineichi Kudo, Atsuyoshi NakamuraTHEORETICAL COMPUTER SCIENCE 530 23 - 41 0304-3975 2014/04 [Refereed][Not invited]

In this paper we consider the problem of similar substring searching in the q-gram distance. The q-gram distance d(q)(x, y) is a similarity measure between two strings x and y defined by the number of different q-grams between them. The distance can be used instead of the edit distance due to its lower computation cost, O(|x| + |Y|) vs. O(|x||Y|). and its good approximation for the edit distance. However, if this distance is applied to the problem of finding all similar strings, in a long text t, to a given pattern p, the total computation cost is sometimes not acceptable. Ukkonen already proposed two fast algorithms: one with an array and the other with a tree. When "similar" means k or less in dq, their time complexities are O(|t|k + |P|) and O(|t| log k + |p|). respectively. In this paper, we propose two algorithms of average-case complexity O(|t| + |p|). although their worst-case complexities are still O(|t|k + |P|) and O(|t| log k + |p|). respectively. The linearity of the average-case complexity is analyzed under the assumption of random sampling of t and the condition that q is larger than a threshold. The algorithms exploit the fact that similar substrings in t are often found at very close positions if the beginning positions of the substrings are close. In the second proposed algorithm, we adopted a doubly-linked list supported by an array and a search tree to search for a list element in O(log k) time. Experimental results support their theoretical average-case complexities. (C) 2014 Elsevier B.V. All rights reserved. - Koji Ouchi, Atsuyoshi Nakamura, Mineichi KudoPATTERN RECOGNITION 47 (3) 1459 - 1468 0031-3203 2014/03 [Refereed][Not invited]

We develop efficient construction methods of a rectangle greedy cover (RGC), and evaluate its usefulness in applications. An RGC is a greedy cover of the set of given positive instances by exclusive axis-parallel hyperrectangles, namely, axis-parallel hyperrectangles that exclude all the given negative instances. An RGC is expected to be a compact classification rule with high readability because the number of its component rectangles is expected to be small and it can be seen as a disjunctive normal form, which is one of the most readable representations for us. We propose two approaches of RGC construction: enumeration approach and direct approach. In enumeration approach, the maximal exclusive positive subsets (MEPSs) are enumerated first and then an ordinary greedy set covering is done using the enumerated MEPSs. We make clear the relation between enumeration of the maximal frequent itemsets and enumeration of the MEPSs, and convert an efficient enumeration algorithm LCMmax [1] of maximal frequent itemsets to an enumeration algorithm LCMmax.R-naive of MEPSs. We also develop a more efficient version of LCMmax.R-naive, or LCMmax.R, by incorporating effective dynamic reordering of instances using excluded frequency and bit-parallel exclusiveness check. In direct approach, each component MEPS of an RGC is searched not from enumerated MEPSs but directly from the dataset that consists of the remaining uncovered positive instances and the whole negative instances. We developed an algorithm called MRF that efficiently finds an maximum-sized MEPS for given positive and negative instances. MRF is made from LCMmax.R by modifying it so as to find a maximum-sized MEPS only. An RGC is constructed by MRF repetition, that is, by repeatedly executing MRF using the remaining uncovered positive instances. According to our experimental evaluation using UCI-repository datasets, LCMmax.R was about 5-11 times faster than LCMmax.Rnaive, which indicates effectiveness of the introduced two improvements. MRF repetition, however, was significantly faster than LCMmax.R, and it was fast enough for practical usage. The experimental results using UCI-repository datasets also showed that accuracy of a nearest rectangle classifier using an RGC is close to that using the hyperrectangles output by the randomized subclass method (RSM) [2] though the number of component rectangles of an RGC is significantly smaller than the number of the hyperrectangles output by RSM. The performance of RGC was also shown to be comparable to that of the six popular classifiers including logistic regression and support vector machine. The disjunctive normal form representation of the classification rules obtained by RGC was demonstrated to be simpler and more readable for us than that obtained by RSM and C4.5. (C) 2013 Elsevier Ltd. All rights reserved. - Atsuyoshi NakamuraProceedings of the Sixth Asian Conference on Machine Learning, ACML 2014, Nha Trang City, Vietnam, November 26-28, 2014. JMLR.org 2014 [Refereed][Not invited]
- Atsuyoshi Nakamura, Tomoya Saito, Ichigaku Takigawa, Mineichi Kudo, Hiroshi MamitsukaDiscrete Applied Mathematics 161 (10-11) 1556 - 1575 0166-218X 2013/07 [Refereed][Not invited]

A string with many repetitions can be represented compactly by replacing h-fold contiguous repetitions of a string r with (r)h. We present a compact representation, which we call a repetition representation (of a string) or RRS, by which a set of disjoint or nested tandem arrays can be compacted. In this paper, we study the problem of finding a minimum RRS or MRRS, where the size of an RRS is defined by the sum of the length of component letters and the description length of the component repetitions (ṡ)h which is defined by ER (h) using a repetition weight function wR. We develop two dynamic programming-based algorithms to solve this problem: CMR, which works for any type of wR, and CMR-C, which is faster but can be applied to a constant wR only. CMR-C is an O(n2logn)-time O(nlogn)-space algorithm, which is more efficient in both time and space than CMR by a ((logn)/n)-factor, where n is the length of the given string. The problem of finding an MRRS for a string can be extended to that of finding a minimum repetition representation (of a tree) or MRRT for a given labeled ordered tree. For this problem, we present two algorithms, CMRT and CMRT-C, by using CMR and CMR-C, respectively, as a subroutine. As well as the theoretical analysis, we confirmed the efficiency of the proposed algorithms by experiments, which consist of the following three parts: First we demonstrated that CMR-C and CMRT-C are fast enough for large-scale data by using synthetic strings and trees, respectively. The size of an MRRS for a given string can be a measure of how compactly the string can be represented, meaning how well the string is structurally organized. This is also true of trees. To check such ability of MRRS-size, second we measured the size of an MRRS for chromosomes of nine different species. We found that all the chromosomes of the same species have a similar compression rate when realized by an MRRS. Run length encoding (RLE) was also shown to have species-specific compression rate, but species were separated more clearly by MRRS than by RLE. Third we examined the size of an MRRT for web pages of world-leading companies by using the tag trees, showing a consistency between the compression rate by an MRRT and visual web page structures. © © 2013 Elsevier B.V. All rights reserved. - Koji Tabata, Atsuyoshi Nakamura, Mineichi KudoLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7569 169 - 183 0302-9743 2012 [Refereed][Not invited]

We present a fast approximation algorithm for the 1-median problem. Our algorithm can be applied to metric undirected graphs with node weight. Given a node v, our algorithm repeatedly finds a better node by making use of a shortest path tree of the previous node. We empirically show that our algorithm runs much faster and has better approximation ratio than a sophisticated existing method called DTZ. We demonstrate the effectiveness of our algorithm through experiments. © 2012 Springer-Verlag Berlin Heidelberg. - Atsuyoshi Nakamura, Hisashi Tosaka, Mineichi KudoProceedings of the 2012 IIAI International Conference on Advanced Applied Informatics, IIAIAAI 2012 54 - 59 2012 [Refereed][Not invited]

We propose a novel frequent approximate pattern mining in which occurrences themselves are valuable regions to extract. Given a string , our mining task is to enumerate its substrings that locally optimally match many substrings of . We show an algorithm for this problem whose time and space complexities are (3) for a string with length . According to our experiments using synthetic data, our algorithm can correctly extract embedded frequent patterns as both patterns and their occurrences even if the embedded patterns are corrupted by random edit operations. © 2012 IEEE. - Hiroyuki Hanada, Mineichi Kudo, Atsuyoshi Nakamura2012 6TH INTERNATIONAL CONFERENCE ON NEW TRENDS IN INFORMATION SCIENCE, SERVICE SCIENCE AND DATA MINING (ISSDM2012) 544 - 549 2012 [Refereed][Not invited]

The q-gram distance d(q)(x, y) between two strings x and y is a string similarity measure correlated with a famous string distance: the edit distance. In addition, it can be computed much faster, in linear (O(vertical bar x vertical bar + vertical bar y vertical bar) time, than the edit distance in quadratic (O(vertical bar x parallel to y vertical bar) time, where I. I denotes the stmg length. However, it does not mean that we can find all substrmgs of a text t similar to a pattern p in linear time. In this paper we will propose a searching algorithm achieving quasi-linear (O(vertical bar t vertical bar log vertical bar p vertical bar + vertical bar p vertical bar) time by the q-gram distance. - Tetsuji Takahashi, Mineichi Kudo, Atsuyoshi NakamuraPATTERN RECOGNITION LETTERS 32 (16) 2224 - 2230 0167-8655 2011/12 [Refereed][Not invited]

We propose an algorithm to approximate each class region by a small number of approximated convex hulls and to use these for classification. The classifier is one of non-kernel maximum margin classifiers. It keeps the maximum margin in the original feature space, unlike support vector machines with a kernel. The construction of an exact convex hull requires an exponential time in dimension, so we find an approximate convex hull (a polyhedron) instead, which is constructed in linear time in dimension. We also propose a model selection procedure to control the number of faces of convex hulls for avoiding over-fitting, in which a fast procedure is adopted to calculate an upper-bound of the leave-one-out error. In comparison with support vector machines, the proposed approach is shown to be comparable in performance but more natural in the extension to multi-class problems. (C) 2011 Elsevier B.V. All rights reserved. - Takashi Saso, Kojiro Kobayashi, Atsuyoshi NakamuraINFORMATION PROCESSING LETTERS 111 (12) 595 - 599 0020-0190 2011/06 [Refereed][Not invited]

Block sorting in the Burrows-Wheeler transformation is to sort all of the n circular shifts of a string of length n lexicographically. We introduce a notion called the width of a sequence of n strings of length n and show that the values of widths are very different between the two types of sequences of strings; (1) a sequence of n randomly generated strings of length n, and (2) the sequence of n circular shifts of a randomly generated string of length n. (C) 2011 Elsevier B.V. All rights reserved. - Atsuyoshi Nakamura, Mineichi KudoADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT II 6635 234 - 245 0302-9743 2011 [Refereed][Not invited]

We study an alignment called a packing alignment that is an alignment for sequences of various length events like musical notes. One event in a packing alignment can have a number of consecutive opposing events unless the total length of them exceeds the length of that one event. Instead of using a score function that depends on event length, which was studied by Mongeau and Sankoff [5], packing alignment deals with event lengths explicitly using a simple score function. This makes the problem clearer as an optimization problem. Packing alignment can be calculated efficiently using dynamic programming. As an application of packing alignment, we conducted experiments on frequent approximate pattern extraction from MIDI files of famous musical variations. The patterns and occurrences extracted from the variations using packing alignment have more appropriate boundaries than those using conventional string alignments from the viewpoints of the repetition structure of the variations. - Koji Ouchi, Atsuyoshi Nakamura, Mineichi KudoProceedings - 2011 IEEE International Conference on Granular Computing, GrC 2011 533 - 538 2011 [Refereed][Not invited]

We discuss efficient construction and usefulness of greedy covers of positive instances by axis-parallel rectangles that exclude negative instances. A rectangle greedy cover is expected to be a simple classification rule with high readability because the number of its component rectangles is expected to be small and it can be seen as a disjunctive normal form, which is one of the most readable representations. We develop efficient construction methods of rectangle greedy covers by making use of algorithms for efficient maximal frequent itemsets. We empirically demonstrate that, for high dimensional datasets, the method of finding a component rectangle one by one without enumerating candidate covering component sets is faster than the method with independent greedy covering process after the enumeration. In our classification experiments using 10 datasets in UCI repository, rectangle greedy covers (RGC) were shown to have classification performance comparable to the randomized subclass method (RSM) developed in [1], which is a conventional classification method using rectangles, though RGC used significantly smaller number of rectangles. The performance of RGC was also shown to be comparable to that of popular classifiers such as logistic regression and SVM. The DNF-representation of the classification rules obtained by RGC was demonstrated to be simpler than that obtained by RSM and C4.5 through our experiment. © 2011 IEEE. - Hiroyuki Hanada, Atsuyoshi Nakamura, Mineichi KudoProceedings - 2011 IEEE International Conference on Granular Computing, GrC 2011 231 - 236 2011 [Refereed][Not invited]

The edit distance is a basic string similarity measure for many applications such as string searching, text mining, signal processing, bioinformatics and so on. However, its high computational cost often prevents it from being used for a large set of strings like similar string searching. A promising solution for the problem is to approximate the edit distance with low computational cost. However, although there are many methods for approximating the edit distance, most of them are analyzed only theoretically. In fact, most of the methods can evaluate the edit distance only in terms of order notations, and do not conduct any experiment. This is a large obstacle for applying them to real applications. In this study we will first list up existing edit distance approxi-mation methods. Then we compare them by: (i) approximation performances shown by the original authors, (ii) approximation performances re-analyzed by us (concrete values instead of the order notations) and (iii) experimental performances by our implementations. © 2011 IEEE. - Masashi Sugiyama, Tomoyuki Higuchi, Tsuyoshi Ide, Akihiro Inokuchi, Toshihiro Kamishima, Hiroyuki Minami, Shinichi Nakajima, Atsuyoshi Nakamura, Koichi Shinoda, Koji Tsuda, Takashi WashioIEICE Transactions on Information and Systems E93-D (10) 2671 0916-8532 2010/10 [Refereed][Not invited]
- Atsuyoshi Nakamura, Tomoya Saito, Ichigaku Takigawa, Hiroshi Mamitsuka, Mineichi KudoSTRING PROCESSING AND INFORMATION RETRIEVAL 6393 185 - + 0302-9743 2010 [Refereed][Not invited]

A string with many repetitions can be written compactly by replacing h-fold contiguous repetitions of substring r with (r)h. We refer to such a compact representation as a repetition representation string or RRS, by which a set of disjoint or nested tandem arrays can be compacted. In this paper, we study the problem of finding a minimum RRS or MRRS, where the size of an RRS is defined to be the sum of its component letter sizes and the sizes needed to describe the repetitions (.)h which are defined as w(R)(h) using a repetition weight function w(R). We develop two dynamic programming algorithms to solve the problem. One is CMR that works for any repetition weight function, and the other is CMR-C that is faster but can be applied only when the repetition weight function is constant. CMR-C is an O(w(n + z))-time algorithm using O(n + z) space for a given string with length n, where w and z are the number of distinct primitive tandem repeats and the number of their occurrences, respectively. Since w = O(n) and z = O(n log n) in the worst case, CMR-C is an O(n(2) log n)-time O(n log n)-space algorithm, which is faster than OMR by ((log n)/n)-factor. - Taishi Uchiya, Atsuyoshi Nakamura, Mineichi KudoALGORITHMIC LEARNING THEORY, ALT 2010 6331 375 - 389 0302-9743 2010 [Refereed][Not invited]

Adversarial bandit problems studied by Auer et al. [4] are multi-armed bandit problems in which no stochastic assumption is made on the nature of the process generating the rewards for actions. In this paper, we extend their theories to the case where k(>= 1) distinct actions are selected at each time step. As algorithms to solve our problem, we analyze an extension of Exp3 [4] and an application of a bandit online linear optimization algorithm [1] in addition to two existing algorithms (Exp3, ComBand [5]) in terms of time and space efficiency and the regret for the best fixed action set. The extension of Exp3, called Exp3. M, performs best with respect to all the measures: it runs in O(K (log k + 1) time and O(K) space, and suffers at most O(root kTK log (K/k)) regret, where K is the number of possible actions and T is the number of iterations. The upper bound of the regret we proved for Exp3. M is an extension of that proved by Auer et al. for Exp3. - Ichigaku Takigawa, Mineichi Kudo, Atsuyoshi NakamuraENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE 22 (1) 101 - 108 0952-1976 2009/02 [Refereed][Not invited]

We propose a general framework for nonparametric classification of multi-dimensional numerical patterns. Given training points for each class, it builds a set cover with convex sets each of which contains some training points of the class but no points of the other classes. Each convex set has thus an associated class label, and classification of a query point is made to the class of the convex set such that the projection of the query point onto its boundary is minimal. In this sense, the convex sets of a class are regarded as "prototypes" for that class. We then apply this framework to two special types of convex sets, minimum enclosing balls and convex hulls, giving algorithms for constructing a set cover with them and for computing the projection length onto their boundaries. For convex hulls, we also give a method for implicitly evaluating whether a point is contained in a convex hull, which can avoid computational difficulty for explicit construction of convex hulls in high-dimensional space. (C) 2008 Elsevier Ltd. All rights reserved. - Satoshi Shirai, Mineichi Kudo, Atsuyoshi NakamuraMULTIPLE CLASSIFIER SYSTEMS, PROCEEDINGS 5519 22 - 31 0302-9743 2009 [Refereed][Not invited]

We compared boosting with bagging in different strengths of learning algorithms for improving the performance of the set of classifiers to be fused. Our experimental results showed that boosting worked much with weak algorithms and bagging, especially feature-based bagging, worked much with strong algorithms. On the basis of these observations we developed a mixed fusion method in which randomly chosen features are used with a standard boosting method. As a result, it was confirmed that the proposed fusion method worked well regardless of learning algorithms. - Tetsuji Takahashi, Mineichi Kudo, Atsuyoshi NakamuraPROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, PROCEEDINGS 5856 441 - 448 0302-9743 2009 [Refereed][Not invited]

We consider an algorithm to approximate each class region by a small number of convex hulls and to apply them to classification. The convex hull of a finite set of points is computationally hard to be constructed in high dimensionality. Therefore, instead of the exact convex hull, we find an approximate convex hull (a polyhedron) in a time complexity that is linear in dimension. On the other hand, the set of such convex hulls is often too much complicated for classification. Thus we control the complexity by adjusting the number of faces of convex hulls. For reducing the computational time, we use an upper bound of the leave-one-out estimated error to evaluate the classifiers. - Satoshi Shirai, Mineichi Kudo, Atsuyoshi NakamuraSTRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION 5342 801 - 810 0302-9743 2008 [Refereed][Not invited]

In recent years, many approaches for achieving high performance by combining some classifiers have been proposed. We exploit, many random replicates of samples in the bagging, and randomly chosen feature subsets in the random subspace method. In this paper, we introduce a method for selecting both samples and features at, the same, time and demonstrate the effectiveness of the method. This method includes a parametric bagging and a parametric random subspace method as special cases. lit some experiments, this method and the parametric random subspace method showed the best performance. - Yohji Shidara, Mineichi Kudo, Atsuyoshi Nakamura19TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1-6 571 - 574 1051-4651 2008 [Refereed][Not invited]

Associative classifiers that utilize association rules have been widely studied. It has been shown that associative classifiers often outperform traditional classifiers. Associative classifiers usually find only rules with high support values, because reducing the minimum support to be satisfied increases computational cost. However, rules with low support but high confidence may contribute to classification. We have proposed an approach to build a classifier composed of almost all consistent (100% confident) rules. The proposed classifier was extended by introducing item reduction and bagging in order to relax the constraint of consistency, which resulted in slightly increased performance for 26 datasets from the UCI machine learning repository. - Mineichi Kudo, Atsuyoshi Nakamura, Ichigaku Takigawa19TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1-6 2260 - 2263 1051-4651 2008 [Refereed][Not invited]

A set of convex bodies including samples of a single class only is used for classification. The convex body is defined by some facets (hyper-planes) that separate the class front the other classes. This paper describes ail algorithm to find a set of such convex bodies efficiently and examine the performance of a classifier using them. The relationship to the support vector machines is also discussed. - Atsuyoshi Nakamura, Mineichi KudoICDM 2008: EIGHTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS 482 - 491 1550-4786 2008 [Refereed][Not invited]

We study the problem of enumerating concepts in a Sperner family concept class using subconcept queries [7], which is a general problem including maximal frequent itemset mining as its instance. Though even the theoretically best known algorithm needs quasi-polynomial time to solve this problem in the worst case, there exist practically fast algorithms for this problem. This is because many instances of this problem in real world have low complexity in some measures. In this paper, we characterize the complexity of Sperner family concept class by the VC dimension of its intersection closure and its characteristic dimension, and analyze the worst case time complexity on the enumeration problem of its concepts in terms of the VC dimension. We also showed that the VC dimension of real data used in data mining is actually small by calculating the VC dimension of some real datasets using a new algorithm closely related to the introduced two measures, which does not only solve the problem but also let us know the VC dimension of the intersection closure of the target concept class. - Yohji Shidara, Mineichi Kudo, Atsuyoshi NakamuraTrans. MLDM 1 (1) 17 - 30 2008 [Refereed][Not invited]
- Yohji Shidara, Atsuyoshi Nakamura, Mineichi KudoMACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION, PROCEEDINGS 4571 490 - + 0302-9743 2007 [Refereed][Not invited]

We propose a novel approach which extracts consistent (100% confident) rules and builds a classifier with them. Recently, associative classifiers which utilize association rules have been widely studied. Indeed, the associative classifiers often outperform the traditional classifiers. In this case, it is important to collect high quality (association) rules. Many algorithms find only high support rules, because decreasing the minimum support to be satisfied is computationally demanding. However, it may be effective to collect low support but high confidence rules. Therefore, we propose an algorithm that produces a wide variety of 100% confident rules including low support rules. To achieve this goal, we adopt a specific-to-general rule searching strategy, in contrast to the previous many approaches. Our experimental results show that the proposed method achieves higher accuracies in several datasets taken from UCI machine learning repository. - Hisashi Tosaka, Atsuyoshi Nakamura, Mineichi KudoDISCOVERY SCIENCE, PROCEEDINGS 4755 286 - + 0302-9743 2007 [Refereed][Not invited]

We study a novel problem of mining subtrees with frequent occurrence of similar subtrees, and propose an algorithm for this problem. In our problem setting, frequency of a subtree is counted not only for equivalent subtrees but also for similar subtrees. According to our experiment using tag trees of web pages, this problem can be solved fast enough for practical use. An encouraging result was obtained in a preliminary experiment for data record extraction from web pages using our mining method. - M Kudo, A NakamuraFEDERATION OVER THE WEB 3847 79 - 96 0302-9743 2006 [Refereed][Not invited]

We introduce methods for two specific-purpose Web searches. One is a search for Web communities related to given keywords, and the other is a search for texts having a certain relation to given keywords. Our methods are based on both structure and contents of WWW. Our method of Web community search uses global structure of WWW to discover communities, and uses content information to label found communities, where global structure means Web graph composed of Web pages and hyperlinks between them. On the other hand, our method of related text search uses local structure of WWW to extract candidate texts, and uses content information to filter out wrongly extracted ones, where local structure means DOM-tree structure of each page. We report the latest results on these Web search methods. - Atsuyoshi NakamuraALGORITHMIC LEARNING THEORY, PROCEEDINGS 4264 378 - 392 0302-9743 2006 [Refereed][Not invited]

In this paper, we study learning-related complexity of linear ranking functions from n-dimensional Euclidean space to {1, 2,..., k}. We show that their graph dimension, a kind of measure for PAC learning complexity in the multiclass classification setting, is Theta(n + k). This graph dimension is significantly smaller than the graph dimension Omega(nk) of the class of {1, 2,..., k}-valued decision-list functions naturally defined using k - 1 linear discrimination functions. We also show a risk bound of learning linear ranking functions in the ordinal regression setting by a technique similar to that used in the proof of an upper bound of their graph dimension. - A Nakamura, M Schmitt, N Schmitt, HU SimonJOURNAL OF MACHINE LEARNING RESEARCH 6 1383 - 1403 1532-4435 2005/09 [Refereed][Not invited]

Bayesian networks have become one of the major models used for statistical inference. We study the question whether the decisions computed by a Bayesian network can be represented within a low-dimensional inner product space. We focus on two-label classification tasks over the Boolean domain. As main results we establish upper and lower bounds on the dimension of the inner product space for Bayesian networks with an explicitly given (full or reduced) parameter collection. In particular, these bounds are tight up to a factor of 2. For some nontrivial cases of Bayesian networks we even determine the exact values of this dimension. We further consider logistic autoregressive Bayesian networks and show that every sufficiently expressive inner product space must have dimension at least Omega(n(2)), where n is the number of network nodes. We also derive the bound 2(Omega(n)) for an artificial variant of this network, thereby demonstrating the limits of our approach and raising an interesting open question. As a major technical contribution, this work reveals combinatorial and algebraic structures within Bayesian networks such that known methods for the derivation of lower bounds on the dimension of inner product spaces can be brought into play. - A NakamuraINFORMATION AND COMPUTATION 201 (2) 178 - 198 0890-5401 2005/09 [Refereed][Not invited]

In this paper, we propose a new algorithm that exactly learns ordered binary decision diagrams (OBDDs) with a given variable ordering via equivalence and membership queries. Our algorithm uses at most it equivalence queries and at most 2n([log(2) m] + 3n) membership queries. where it is the number of nodes in the target-reduced OBDD and m is the number of variables. The upper bound on the number of membership queries is smaller by a factor of O(m) compared with that for the previous best known algorithm proposed by R. Gavalda and D. Guijarro [Learning Ordered Binary Decision Diagrams, Proceedings of the 6th International Workshop on Algorithmic Learning Theory, 1995, pp. 228 238]. (c) 2005 Elsevier Inc. All rights reserved. - Y Shidara, M Kudo, A NakamuraFoundations of Data Mining and Knowledge Discovery 6 161 - 170 1860-949X 2005 [Refereed][Not invited]

We propose a novel method for mining generalized rules with high support and confidence. Using our method, we can obtain generalized rules in which the abstraction of attribute values is implicitly carried out without the requirement of additional information such as information on conceptual hierarchies. Our experimental results showed that the obtained rules not only have high support and confidence but also have expressions that are conceptually meaningful. - Hidehiko Ino, Mineichi Kudo, Atsuyoshi NakamuraProceedings of the 14th international conference on World Wide Web, WWW 2005, Chiba, Japan, May 10-14, 2005 ACM 661 - 669 2005 [Refereed][Not invited]
- A Nakamura, M KudoADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS 3518 850 - 860 0302-9743 2005 [Refereed][Not invited]

In this paper, we propose an efficient algorithm enumerating all frequent subtrees containing all special nodes that are guaranteed to be included in all trees belonging to a given data. Our algorithm is a modification of TreeMiner algorithm [10] so as to efficiently generate only candidate subtrees satisfying our constraints. We report mining results obtained by applying our algorithm to the problem of finding frequent structures containing the name and reputation of given restaurants in Web pages collected by a search engine. - Takigawa, I, M Kudo, A NakamuraMACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION, PROCEEDINDS 3587 90 - 99 0302-9743 2005 [Refereed][Not invited]

We propose a new nonparametric classification framework for numerical patterns, which can also be exploitable for exploratory data analysis. The key idea is approximating each class region by a family of convex geometric sets which can cover samples of the target class without containing any samples of other classes. According to this framework, we consider a combinatorial classifier based on a family of spheres, each of which is the minimum covering sphere for a subset of positive samples and does not contain any negative samples. We also present a polynomial-time exact algorithm and an incremental randomized algorithm to compute it. In addition, we discuss the soft-classification version and evaluate these algorithms by some numerical experiments. - H Hasegawa, M Kudo, A NakamuraKNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 4, PROCEEDINGS 3684 668 - 674 0302-9743 2005 [Refereed][Not invited]

We consider the problem of extracting texts related to a given keyword from Web pages collected by a search engine. Recently, we proposed a method using both structural and content information[1, 2]. In our previous paper, we reported good extraction performance of our method only for Ramen-shop dataset written in Japanese. In this paper, we examine it for datasets of other kind of restaurants, and also for a dataset written in English. We discuss some modification for performance improvement. - Hidehiko Ino, Mineichi Kudo, Atsuyoshi NakamuraProceedings - International Workshop on Biomedical Data Engineering, BMDE2005 2005 1257 2005 [Refereed][Not invited]

Recently, researches on extraction of densely connected subgraphs, which are called communities, from the graph representing link structure in WWW, are very popular. However, few methods guarantee that extracted subgraphs satisfy community conditions which are strictly defined. In this paper, we consider the problem of extracting subgraphs that strictly satisfy the community conditions proposed in [3]. It is known that finding all such communities is computationally hard. As methods that possibly find many communities efficiently, we experimentally compared two methods, a method with a Gomory-Hu tree construction and a method with calculating edge-betweenness. We also proposed evaluation criterion for ranking found communities. © 2005 IEEE. - Atsuyoshi Nakamura, Naoki AbeElectronic Commerce Research 5 (1) 75 - 98 1389-5753 2005/01 [Refereed][Not invited]

We propose and evaluate a number of improvements to the linear programming formulation of web advertisement scheduling, which we have proposed elsewhere together with our colleagues [Langheinrich et al., 9]. In particular, we address a couple of important technical challenges having to do with the estimation of click-through rates and optimization of display probabilities (the exploration-exploitation trade-off and the issue of data sparseness and sealability). as well as practical aspects that are essential for successful deployment of this approach (the issues of multi-impressions and inventory management). We propose solutions to each of these issues, and assess their effectiveness by running large-scale simulation experiments. © 2005 Springer Science + Business Media. Inc. - Takigawa, I, M Kudo, A Nakamura, J ToyamaINDEPENDENT COMPONENT ANALYSIS AND BLIND SIGNAL SEPARATION 3195 193 - 200 0302-9743 2004 [Refereed][Not invited]

This paper studied the minimum l(1)-norm signal recovery in underdetermined source separation, which is a problem of separating n sources blindly from m linear mixtures for n. > m. Based on our previous result of submatrix representation and decision regions, we describe the property of the minimum l(1)-norm sequence from the viewpoint of source separation, and discuss how to construct it geometrically from the observed sequence and the mixing matrix, and the unstability for a perturbation of mixing matrix. - A Nakamura, M Schmitt, N Schmitt, HU SimonLEARNING THEORY, PROCEEDINGS 3120 518 - 533 0302-9743 2004 [Refereed][Not invited]

In connection with two-label classification tasks over the Boolean domain, we consider the possibility to combine the key advantages of Bayesian networks and of kernel-based learning systems. This leads us to the basic question whether the class of decision functions induced by a given Bayesian network can be represented within a low-dimensional inner product space. For Bayesian networks with an explicitly given (full or reduced) parameter collection, we show that the "natural" inner product space has the smallest possible dimension up to factor 2 (even up to an additive term 1 in many cases). For a slight modification of the so-called logistic autoregressive Bayesian network with n nodes, we show that every sufficiently expressive inner product space has dimension at least 2(n/4). The main technical contribution of our work consists in uncovering combinatorial and algebraic structures within Bayesian networks such that known techniques for proving lower bounds on the dimension of inner product spaces can be brought into play. - M Kudo, T Hosokawa, J Toyama, H Tenmoto, A Nakamura8TH WORLD MULTI-CONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL V, PROCEEDINGS 65 - 68 2004 [Refereed][Not invited]

In the past decades, person identification has been gathering a great deal of attention in a wide rage of applications. Among those techniques, many aim at achieving a high identification rate using several biometrics evidences such as fingerprint, iris, and finger vein. However these evidences require user's cooperation and are suitable for a "hard" identification in which any kind of impersonation is not allowed. In this study, we consider a "soft" identification, in which the system does require user-cooperation, at the expense of some degree of accuracy. On the contrary, the cost for gathering evidences becomes cheap and some weak evidences are used together The assumed applications are electronic devices on daily life. Through a soft identification, a device can be personalized to a user and the system provides some user-specific functions to the user In this study, a Bayesian network approach is investigated its potential for this goal. In addition, environmental evidence such as day and time are also incorporated to the system. - A Nakamura, M Kudo, A TanakaKNOWLEDGE DISCOVERY IN DATABASES: PKDD 2003, PROCEEDINGS 2838 339 - 349 0302-9743 2003 [Refereed][Not invited]

We propose a new collaborative filtering method that uses restoration operators. The problem of restoration by operators was originally studied in the field of digital image restoration [9]. We also consider the problem of selecting items that users should be asked to rate in order to achieve a small expected squared error, and we propose a greedy method as a solution of this problem. According to our experimental results, prediction performance of restoration operators is good when the number of observed ratings is small, and our greedy method outperforms random query item selection. - A Nakamura, M Kudo, A Tanaka, K TanabeDISCOVERY SCIENCE, PROCEEDINGS 2843 393 - 401 0302-9743 2003 [Refereed][Not invited]

We propose a modified version of our collaborative filtering method using restoration operators, which was proposed in [6]. Our previous method was designed so as to minimize expected squared error of predictions for user's ratings, and we experimentally showed that, for users who have evaluated only small number of items, mean squared error of our method is smaller than that of correlation-base methods. After further experiments, however, we found that, for users who have evaluated many items, the best correlation-based method has smaller mean squared error than our method. In our modified version, we incorporated an idea of projecting on a low-dimensional subspace with our method using restoration operators. We experimentally showed that our modification overcame the shortcoming stated above. - A Nakamura, N AbeJOURNAL OF COMPUTER AND SYSTEM SCIENCES 65 (2) 224 - 256 0022-0000 2002/09 [Refereed][Not invited]

We consider the online learning problem for binary relations defined over two finite sets, each clustered into a relatively small number k, I of types (such a relation is termed a (k, l)-binary relation), extending the models of S. Goldman, R. Rivest, and R. Schapire (1993, SIAM J. Comput. 22, 1006-1034). We investigate the learning complexity of (k, l)-binary relations with respect to both the self-directed and adversary-directed learning models. We also generalize this problem to the learning problem for (k(l),...,k(d))-d-ary relations. In the self-directed model, we exhibit an efficient learning algorithm which makes at most kl + (n - k)log k + (m - l)log l mistakes, where n and in are the number of rows and columns, roughly twice the lower bound we show for this problem, '[log k] [log 1] + (1)/(2)(n - k) [log k] + (1)/(2)(m - 1) [log l]. In the adversary-directed model, we exhibit an efficient algorithm for the (2, 2)-binary relations, which makes at most n + m + 2 mistakes, only two more than the lower bound we show for this problem, n + m. As for (k(l),..., k(d))-d-ary relations, we obtain lower bounds and upper bounds on the number of mistakes in the self-directed model, teacher-directed model, and adversary-directed model. Finally we show that, although the sample consistency problem for (2, 2)-binary relations is solvable in polynomial time, the same problem for (2,2,2)-ternary relations is already NP-complete. (C) 2002 Elsevier Science (USA). - Atsuyoshi NakamuraProceedings of the 11th International Conference on World Wide Web, WWW '02 536 - 541 2002 [Refereed][Not invited]

We addressed two issues concerning the practical aspects of optimally scheduling web advertising proposed by Langheinrich et al. [5], which scheduling maximizes the total number of click-throughs for all banner advertisements. One is the problem of multi-impressions in which two or more banner ads are impressed at the same time. The other is inventory management, which is important in order to prevent over-selling and maximize revenue. We propose efficient methods which deal with these two issues. - A NakamuraTHEORETICAL COMPUTER SCIENCE 241 (1-2) 83 - 114 0304-3975 2000/06 [Refereed][Not invited]

In this paper, we study exact learnability of bounded-width ordered binary decision diagrams (OBDDs) when no ordering of the variables is given and learning is conducted by way of equivalence queries and membership queries. We present a learning algorithm for width-2 OBDDs, an algorithm which uses O(n(3)) equivalence queries alone, where n is the number of variables. We also present a learning algorithm for width-2 OBDDs that uses O(n) proper equivalence queries and O(n(2)) membership queries. Further, we show a negative result: that there are no polynomial-time algorithms capable of learning width-3 OBDDs from proper equivalence queries alone. (C) 2000 Elsevier Science B.V. All rights reserved. - Atsuyoshi Nakamura, Naoki Abe, Hiroshi Matoba, Katsuhiro OchiaiProceedings of the 8th ACM International Conference on Multimedia 2000, Los Angeles, CA, USA, October 30 - November 3, 2000. ACM 57 - 66 2000 [Refereed][Not invited]
- M Langheinrich, A Nakamura, N Abe, T Kamba, Y KosekiCOMPUTER NETWORKS-THE INTERNATIONAL JOURNAL OF COMPUTER AND TELECOMMUNICATIONS NETWORKING 31 (11-16) 1259 - 1272 1389-1286 1999/05 [Refereed][Not invited]

Most online advertisement systems in place today use the concept of consumer targeting: each user is identified and, according to his or her system setup, browsing habits and available off-line information, categorized in order to customize the advertisements for highest user responsiveness. This constant monitoring of a user's online habits, together with the trend to centralize this data and link it with other databases, continuously nurtures fears about the growing lack of privacy in a networked society. In this paper, we propose a novel technique of adapting online advertisement to a user's short term interests in a non-intrusive way. As a proof-of-concept we implemented a dynamic advertisement selection system able to deliver customized advertisements to users of an online search service or Web directory. No user-specific data elements are collected or stored at any time. Initial experiments indicate that the system is able to improve the average click-through rate substantially compared to random selection methods. (C) 1999 Published by Elsevier Science B.V. All rights reserved. - N Abe, A NakamuraMACHINE LEARNING, PROCEEDINGS 12 - 21 1999 [Refereed][Not invited]

We have developed a method which learns to schedule internet banner advertisements so as to maximize the average click-through rate, while adhering to the requirements imposed by contracts with the advertisers such as a minimum guaranteed number of impressions. We focus on the problem of adaptively scheduling advertisement display probabilities as a function of a single attribute such as a search keyword. Our learning algorithm is based on an efficient solution of a special class of linear programming problems called the 'transportation problem,' and also embodies a number of measures to address the exploration-exploitation trade-off and an efficient attribute clustering method to help reduce the dimensionality. Our experimental results verify the advantage of our linear programming based approach, as well as the effect of various additional measures we incorporate into our method. - Atsuyoshi NakamuraProceedings of the Twelfth Annual Conference on Computational Learning Theory, COLT 1999, Santa Cruz, CA, USA, July 7-9, 1999 ACM 215 - 225 1999 [Refereed][Not invited]
- Atsuyoshi Nakamura, Naoki AbeProceedings of the Fifteenth International Conference on Machine Learning (ICML 1998), Madison, Wisconsin, USA, July 24-27, 1998 Morgan Kaufmann 395 - 403 1998 [Refereed][Not invited]
- N Abe, H Mamitsuka, A NakamuraDISCOVERY SCIENCE 1532 387 - 388 0302-9743 1998 [Refereed][Not invited]
- A Nakamura, J Takeuchi, N AbeANNALS OF MATHEMATICS AND ARTIFICIAL INTELLIGENCE 23 (1-2) 53 - 82 1012-2443 1998 [Refereed][Not invited]

We consider a variant of the 'population learning model' proposed by Kearns and Seung [8], in which the learner is required to be 'distribution-free' as well as computationally efficient. A population learner receives as input hypotheses from a large population of agents and produces as output its final hypothesis. Each agent is assumed to independently obtain labeled sample for the target concept and output a hypothesis. A polynomial time population learner is said to PAC-learn a concept class, if its hypothesis is probably approximately correct whenever the population size exceeds a certain bound which is polynomial, even if the sample size for each agent is fixed at some constant. We exhibit some general population learning strategies, and some simple concept classes that can be learned by them. These strategies include the 'supremum hypothesis finder', the 'minimum superset finder' (a special case of the 'supremum hypothesis finder'), and various voting schemes. When coupled with appropriate agent algorithms, these strategies can learn a variety of simple concept classes, such as the 'high-low game', conjunctions, axis-parallel rectangles and others. We give upper bounds on the required population size for each of these cases, and show that these systems can be used to obtain a speed up from the ordinary PAC-learning model [11], with appropriate choices of sample and population sizes. With the population learner restricted to be a voting scheme, what we have is effectively a model of 'population prediction', in which the learner is to predict the value of the target concept at an arbitrarily drawn point, as a threshold function of the predictions made by its agents on the same point. We show that the population learning model is strictly more powerful than the population prediction model. Finally, we consider a variant of this model with classification noise, and exhibit a population learner for the class of conjunctions in this model. - A NakamuraALGORITHMIC LEARNING THEORY 1316 307 - 322 0302-9743 1997 [Refereed][Not invited]

In this paper, we propose a new algorithm which exactly learns ordered binary decision diagrams (OBDDs) with a given variable ordering via equivalence queries and membership queries. Our algorithm uses at most n equivalence queries and at most 2n([log(2) m] + 3n) membership queries, where n is the number of nodes in the target reduced OBDD and m is the number of variables. We have reduced the number of membership queries by a factor of m compared with the best known algorithm for this problem due to Gavalda and Guijarro. - Atsuyoshi NakamuraAlgorithmic Learning Theory, 7th International Workshop, ALT '96, Sydney, Australia, October 23-25, 1996, Proceedings Springer 37 - 50 1996 [Refereed][Not invited]
- Naoki Abe, Hang Li, Atsuyoshi NakamuraCoRR abs/cmp-lg/9507010 1995 [Refereed][Not invited]
- Naoki Abe, Hang Li, Atsuyoshi NakamuraMachine Learning, Proceedings of the Twelfth International Conference on Machine Learning, Tahoe City, California, USA, July 9-12, 1995 Morgan Kaufmann 3 - 11 1995 [Refereed][Not invited]
- Atsuyoshi Nakamura, Naoki AbeProceedings of the Eigth Annual Conference on Computational Learning Theory, COLT 1995, Santa Cruz, California, USA, July 5-8, 1995 ACM 214 - 221 1995 [Refereed][Not invited]
- Atsuyoshi Nakamura, Shinji MiuraAlgorithmic Learning Theory, 6th International Conference, ALT '95, Fukuoka, Japan, October 18-20, 1995, Proceedings Springer 138 - 150 1995 [Refereed][Not invited]
- A NAKAMURA, N ABETHEORETICAL COMPUTER SCIENCE 137 (1) 159 - 176 0304-3975 1995/01 [Refereed][Not invited]

We investigate the problem of exactly identifying a real-valued function of {0, 1}(n) by a weighted sum of a number of monotone terms by querying for the values of the target function at assignments of the learner's choice. When all coefficients are nonnegative, we exhibit an efficient learning algorithm requiring at most (n-[log s]+1)s queries, where n is the number of variables and s is the number of terms in the target formula. We prove a lower bound of Omega(ns/log s) on the number of queries necessary for learning this class, so no algorithm can reduce the number of queries dramatically. The algorithm runs in time O (ns(2)) in the worst case. The same algorithm can be used to learn the 'inductive-read-k' subclass, a proper superclass of the 'read-k' subclass, with a number of queries not exceeding 1/2((n-[log k])(n-[log k]+1)+2)k, which improves upon the bound achievable by a naive learning algorithm by a factor of two. In addition, the above method can be extended to handle the nonmonotone case in some restricted sense: A similar algorithm can learn the trnate linear combinations of terms with a comparable number of queries. In the general case, namely, when the coefficients vary over the reals (or any arbitrary field), we show that the number of queries required for exact learning of the k-term subclass is upper bounded by q(n,[log k]+1) and is lower bounded by q(n,[log k]), where q(n,l)=Sigma(i=0)(l)((n)(i)). These bounds are shown by generalizing Roth and Benedik's technique for analyzing the learning problem for k-sparse multivariate polynomials over GF(2) (Roth and Benedek, 1991) to those over an arbitrary field. - Atsuyoshi Nakamura, Naoki Abe, Jun-ichi TakeuchiAlgorithmic Learning Theory, 4th International Workshop on Analogical and Inductive Inference, AII '94, 5th International Workshop on Algorithmic Learning Theory, ALT '94, Reinhardsbrunn Castle, Germany, October 10-15, 1994, Proceedings Springer 500 - 515 1994 [Refereed][Not invited]
- Atsuyoshi Nakamura, Naoki AbeAlgorithmic Learning Theory, 4th International Workshop, ALT '93, Tokyo, Japan, November 8-10, 1993, Proceedings Springer 300 - 313 1993 [Refereed][Not invited]

- 田畑 公次, 中村 篤祥, 高見 亮佑, Joshua Arenson, 和田 弥生, Walker Peterson, 合田 圭介, 園下 将大, 小松崎 民樹 人工知能学会研究会資料 人工知能基本問題研究会 124- 25 -30 2023/03/06
- 中村 篤祥 人工知能学会研究会資料 人工知能基本問題研究会 123- 31 -36 2023/01/05
- 林 達也, 中村 篤祥 人工知能基本問題研究会 114- 1 -6 2020/11/20
- NAKAMURA Atsuyoshi, SAKURADA Kento Proceedings of the Annual Conference of JSAI 2020- 2I1GS201 -2I1GS201 2020
Min_DBN is an algorithm that minimizes the number of distinct branching conditions in a decision forest by sharing variable's thresholds assigned to different branching nodes under the condition that decision paths of given feature vectors must not be changed more than a specified percentage at each branching node. This simplification is effective for compact hardware implimentation of a decision forest with comparator sharing. In this paper, we experimentally investigate how effective the algorithm is for decision forests constructed by various ensemble learning methods (random forest, extremly random trees, AdaBoost, gradient boosting) in classification and regression. Futhermore, we improve the algorithm by using the feature vectors only used in a bagging-based learning for applying its path condition to each component tree of the ensumbles, and check its effectiveness experimentally.

- Keisuke Todo, Atsuyoshi Nakamura and Mineichi Kudo Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG) 2019/08 [Refereed][Not invited]
**A Fast Approximate Algorithm for k-Median Problem on a Graph** - 櫻田 健斗, 中村 篤祥 人工知能基本問題研究会 109- 62 -67 2019/03/13 [Not refereed][Not invited]
- 中村篤祥, ペリシエ オレリアン, 田畑公次, 小松崎民樹 日本細胞生物学会大会(Web) 71st- ROMBUNNO.1SEp‐07 (WEB ONLY) 2019 [Not refereed][Not invited]
- 田畑公次, 中村篤祥, 小松崎民樹 電子情報通信学会技術研究報告 118- (284(IBISML2018 44-104)(Web)) 353‐360 (WEB ONLY) 2018/10/29 [Not refereed][Not invited]
- 田畑 公次, 中村 篤祥, 本多 淳也, 小松崎 民樹 人工知能基本問題研究会 106- 94 -99 2018/03/16 [Not refereed][Not invited]
- Aurelien Pelissier, Atsuyoshi Nakamura, Koji Tabata CoRR abs/1811.07531- 2018 [Not refereed][Not invited]
- 安井拓未, 中村篤祥, 中村篤祥, 田中章, 田中章, 工藤峰一, 工藤峰一 情報処理学会研究報告(Web) 2017- (MUS-116) Vol.2017‐MUS‐116,No.15,1‐4 (WEB ONLY) 2017/08/17 [Not refereed][Not invited]
- 中村 篤祥 電子情報通信学会技術研究報告 = IEICE technical report : 信学技報 117- (110) 49 -54 2017/06/23 [Not refereed][Not invited]
- 中村 篤祥 人工知能基本問題研究会 103- 45 -50 2017/03/13 [Not refereed][Not invited]
- 渡辺 僚, 中村 篤祥, 工藤 峰一 電子情報通信学会技術研究報告 = IEICE technical report : 信学技報 115- (323) 167 -174 2015/11/26 [Not refereed][Not invited]
- Helmbold David P, Nakamura Atsuyoshi, Warmuth Manfred K 人工知能基本問題研究会 98- 16 -21 2015/08/07 [Not refereed][Not invited]
- 中村 篤祥 電子情報通信学会技術研究報告 = IEICE technical report : 信学技報 115- (112) 81 -86 2015/06/23 [Not refereed][Not invited]
- 渡辺 僚, 中村 篤祥, 工藤 峰一 人工知能基本問題研究会 96- 29 -34 2015/01/13 [Not refereed][Not invited]
- 中村 篤祥 人工知能基本問題研究会 93- 45 -48 2014/03/07 [Not refereed][Not invited]
- 田畑 公次, 中村 篤祥, 工藤 峰一 人工知能基本問題研究会 92- 35 -40 2014/01/30 [Not refereed][Not invited]
- 渡辺 僚, 中村 篤祥, 工藤 峰一 電子情報通信学会技術研究報告 = IEICE technical report : 信学技報 113- (198) 9 -16 2013/09/03 [Not refereed][Not invited]
- 花田 博幸, 中村 篤祥, 工藤 峰一 人工知能基本問題研究会 89- 25 -30 2013/02/28 [Not refereed][Not invited]
- 渡邊 僚, 中村 篤祥, 工藤 峰一 人工知能基本問題研究会 86- (0) 29 -34 2012/08/09 [Not refereed][Not invited]
- 田畑 公次, 中村 篤祥, 工藤 峰一 人工知能基本問題研究会 86- (0) 77 -82 2012/08/09 [Not refereed][Not invited]
- 中村 篤祥, 瀧川 一学, 戸坂 央 人工知能基本問題研究会 85- 23 -28 2012/02/02 [Not refereed][Not invited]
- TABATA Koji, NAKAMURA Atsuyoshi, KUDO Mineichi IEICE technical report. Theoretical foundations of Computing 111- (256) 7 -14 2011/10/14 [Not refereed][Not invited]

The Closeness Centrality is one of centrality measures of a node in a graph. It is calculated as the reciprocal of the sum of distances to all other nodes. In this paper, we propose a fast approximate algorithm that finds the node to maximize its closeness centrality in an undirected graph. Given a node v, the algorithm repeatedly find a node with higher closeness centrality by making use of a shortest path tree of the previous node. According to out experiment, our algorithm can find a node with almost maximum closeness centrality with high probability. - OUCHI Koji, NAKAMURA Atsuyoshi, KUDO Mineichi IEICE technical report 110- (265) 99 -104 2010/10/28 [Not refereed][Not invited]

Blumer et al. showed that the class of concepts represented by finite unions of hyper-rectangles in d-dimensional Euclidean space is polynomial-time PAC learnable for a fixed natural number d. In their proof, they constructed an algorithm which conducts a greedy covering of a given positive instances by hyper-rectangles that never cover any one of given negative instances. In this paper, we discuss the efficient implementation of their algorithm. According to our experimental results on n-class classification problems using UCI datasets, Blumer's covering algorithm, in most cases, outputs a... - UCHIYA Taishi, NAKAMURA Atsuyoshi, KUDO Mineichi IEICE technical report. Theoretical foundations of Computing 109- (195) 13 -20 2009/09/07 [Not refereed][Not invited]

Adversarial bandit problems studied by Auer et al. are the multi-armed bandit problems in which no stochastic assumption is made on the nature of the process generating the rewards for actions. In this paper, we extend their theories to those in the case when k(≧1) actions are selected at each time step. We analyze the problems in two settings of multiple plays, that is, in the settings of selections with and without repetition. In the both settings, we generalize the Exp3 family algorithms developed by them and give the generalized upper bounds on the regrets for the best fixed set of acti... - UCHIYA Taishi, NAKAMURA Atsuyoshi, KUDO Mineichi Technical report of IEICE. PRMU 108- (363) 213 -218 2008/12/11 [Not refereed][Not invited]

The multiarmed bandit problem is a problem in which a gambler chooses one arm of K nonidentical slot machines to play in a sequence of trials so as to maximize his reward. Past solutions for the bandit problem have almost always relied on assumptions about the statistics of the slot machines. On the other hand, Auer et al. made no statistical assumption whatsoever about the nature of the process generating the payoffs of the slot machine. They gave solutions to the bandit ploblem in which an adversary has complate control over the payoffs. In this paper, we extend this problem to the proble... - TAKAHASHI Tetsuji, KUDO Mineichi, NAKAMURA Atsuyoshi Technical report of IEICE. PRMU 108- (363) 37 -41 2008/12/11 [Not refereed][Not invited]

It is an efficient way to approximate a class region by convex hulls of samples. However, the convex hull is computationally hard to be constructed in high dimensions. In this paper, we propose a way to construct an approximate convex hull in a linear time of dimension. In addition, we discuss on selecting an optimal classifier from a family of polyhedron classifiers derived from approximate convex hulls. - NAKAMURA Atsuyoshi, KUDO Mineichi IEICE technical report. Theoretical foundations of Computing 108- (237) 9 -16 2008/10/03 [Not refereed][Not invited]

We propose packing alignment as an alignment for sequences of lengthened symbols like musical notes. Furthermore, we consider sequences of symbols with length and end position as a model of a general musical piece in which notes can overlap, and we extend our packing alignment to that for such sequences. Using packing alignment, it might be possible that parts that you feel similar somehow are extracted automatically. According to our experiment using MIDI files of Bach's popular musical pieces, actually, parts that are non-trivially similar but are felt similar somehow were extracted. - Nakamura Atsuyoshi, Kudo Mineichi RIMS Kokyuroku 1599- (0) 162 -169 2008/05 [Not refereed][Not invited]
- 戸坂 央, 中村 篤祥, 工藤 峰一 人工知能学会全国大会論文集 22- 1 -4 2008
- NAKAMURA Atsuyoshi, KUDO Mineichi IEICE technical report. Theoretical foundations of Computing 107- (219) 35 -42 2007/09/13 [Not refereed][Not invited]

Representing a complicated region by union of simple regions is often preferred in various study areas because each component simple region is easy to understand and easy to deal with. Kudo et al. [2], [4] considered the following problem called subclass problem: given positive and negative examples in multidimensional real space, enumerate all maximal sets of positive examples for which there exists a hyper-rectangle that contains the set and does not contain any negative example. However, a randomized method that can find as many important subclasses as possible has been used so far becau... - TOSAKA Hisashi, NAKAMURA Atsuyoshi, KUDO Mineichi Technical report of IEICE. PRMU 107- (115) 7 -12 2007/06/21 [Not refereed][Not invited]

We study a novel problem of mining subtrees with frequent occurrence of similar subtrees, and propose an efficient algorithm for this problem. According to our problem setting, frequency of a subtree is counted not only for equivalent subtrees but also for similar subtrees. Our experiment showed that our algorithm is enough fast for practical use. Preliminary experiment for data record extraction using our mining method also showed encouraging result. - TOSAKA Hisashi, NAKAMURA Atsuyoshi, KUDO Mineichi IEICE technical report. Data engineering 107- (114) 7 -12 2007/06/21 [Not refereed][Not invited]

We study a novel problem of mining subtrees with frequent occurrence of similar subtrees, and propose an efficient algorithm for this problem. According to our problem setting, frequency of a subtree is counted not only for equivalent subtrees but also for similar subtrees. Our experiment showed that our algorithm is enough fast for practical use. Preliminary experiment for data record extraction using our mining method also showed encouraging result. - 中村 篤祥 人工知能基本問題研究会 63- (0) 33 -38 2006/09/08 [Not refereed][Not invited]
- Saito Tomoya, Nakamura Atsuyoshi, Kudo Mineichi 情報科学技術レターズ 5- (0) 1 -4 2006/08/21 [Not refereed][Not invited]
- 斉藤 智哉, 中村 篤祥, 工藤 峰一 数理解析研究所講究録 1489- (0) 216 -222 2006/05 [Not refereed][Not invited]
- NAKAMURA Atsuyoshi IEICE technical report. Theoretical foundations of Computing 105- (273) 49 -53 2005/09/08 [Not refereed][Not invited]

In this paper, we study how easy to learn the class of linear ranking functions from n-dimensional Euclidean space to {1, 2, …, k}. We show its graph dimension, which is considered to indicate how easy to learn, is Θ(n+k). This graph dimension is significantly smaller than the graph dimension Θ(nk) of the class of k-valued decision-list functions naturally defined using k - 1 linear discrimination functions. - 中村 篤祥 数理解析研究所講究録 1426- (0) 51 -56 2005/04 [Not refereed][Not invited]
- Nakamura Atsuyoshi, Shigezumi Takeya, Yamamoto Masaki RIMS Kokyuroku 1426- (0) 71 -77 2005/04 [Not refereed][Not invited]
- KAWATA Takehiro, KUDO Mineichi, TOYAMA Jun, NAKAMURA Atsuyoshi The transactions of the Institute of Electronics, Information and Communication Engineers. D-II 88- (3) 629 -635 2005/03/01 [Not refereed][Not invited]

OCRなどを通して得られる日本語文の認識結果において, N-gram確率を利用した高速な誤認識文字検出法を提案する.日本語のように単語が分かち書きされず大規模な語彙を対象とした場合, 誤り個所の指摘に文字N-gramは有効な方法である.本論文ではまず, 通常のN-gram確率の拡張として両方向N-gram確率を提案し, その有効性を情報量の点から考察する.次に, 両方向N-gram確率と文脈確率を用いて1文字の誤字を検出する方法を提案する.シミュレーション実験では, 適合率80%において従来法よりも10%以上高い約75%の再現率を達成できた.また, 誤り範囲の指摘という点では, 適合率80%で再現率90%が達成された. - NAKAMURA Atsuyoshi, KUDO Mineichi IPSJ SIG Notes 2004- (101) 67 -74 2004/10/14 [Not refereed][Not invited]

In this paper, we propose an efficient algorithm enumerating all frequent subtrees containing all special nodes that are guaranteed to be included in all trees belonging to a given data. Our algorithm is a modification of TreeMiner algorithm [9] so as to efficiently generate only candidate subtrees satisfying our constraints. We also propose a space saving method for a set of trees with a lot of nodes having the same label. We report mining results obtained by applying our algorithm to the problem of finding frequent structures containing the name and reputation of given restaurants in Web ... - NAKAMURA Atsuyoshi, KUDO Mineichi IEICE technical report. Theoretical foundations of Computing 104- (339) 7 -14 2004/10/08 [Not refereed][Not invited]

In this paper, we propose an efficient algorithm enumerating all frequent subtrees containing all special nodes that are guaranteed to be included in all trees belonging to a given data. Our algorithm is a modification of TreeMiner algorithm [9] so as to efficiently generate only candidate subtrees satisfying our constraints. We also propose a space saving method for a set of trees with a lot of nodes having the same label. We report mining results obtained by applying our algorithm to the problem of finding frequent structures containing the name and reputation of given restaurants in Web ... - KAWATA Takehiro, NAKAMURA Atsuyoshi, TOYAMA Jun, KUDO Mineichi Technical report of IEICE. PRMU 103- (295) 1 -5 2003/09/01 [Not refereed][Not invited]

The method of detecting the wrong characters and correction is proposed to improve recognition precision in character recognition. We pay attention to a technique in which N-gram probabilities are used, and propose a method which detects misspelled characters by both direction N-gram probabilities. Context probability is also introduced to be incorporated with estimated probabilities transitional patterns in both direction N-gram probabilities and context probabilities at the misspelling point. Moreover, we discuss the causes why some misspelled characters are not detected in this method. I... - SHIDARA Yohji, NAKAMURA Atsuyoshi, KUDO Mineichi Technical report of IEICE. PRMU 103- (295) 7 -11 2003/09/01 [Not refereed][Not invited]

The methods which extract rules from database are useful for mining because of their human-readable output. Various methods have been proposed which construct some accurate classifiers and extract the rules from these classifiers. However, these accurate classifiers do not always provide the interesting rules. We propose the rule-mining method based on abstraction of attribute values and screening technique to extract rules with high readability. We also discuss the relationship between the discrimination accuracy and the degree of interest of the rules - Nakamura Atsuyoshi 応用数理 12- (4) 401 -410 2002/12/25 [Not refereed][Not invited]

We explain automatic recommendation techniques by which computer systems can automatically select goods or documents preferred by each user based on his/her buying or accessing history. Especially, we focus on the collaborative filtering method using weighted majority prediction algorithm developed by the authors. From both theoretical and experimental points of view, we explain how excellent the method is compared to the method using correlation coefficient, which is the most popular collaborative filtering method. We also address issues that we should consider in order to make our method ... - LANGHEINRICH Marc, KAMBA Tomonari, NAKAMURA Atsuyoshi IPSJ Magazine 40- (8) 807 -812 1999/08/15 [Not refereed][Not invited]
- ABE Naoki, NAKAMURA Atsuyoshi IPSJ Magazine 38- (7) 558 -561 1997/07/15 [Not refereed][Not invited]
- ABE Naoki, NAKAMURA Atsuyoshi IPSJ Magazine 38- (7) 575 -582 1997/07/15 [Not refereed][Not invited]
- Nakamura Atsuyoshi, Mamitsuka Hiroshi, Toba Hiroyasu, Abe Naoki 全国大会講演論文集 第52回平成8年前期- (1) 55 -56 1996/03/06 [Not refereed][Not invited]

ニュース記事などに対する個人の嗜好を、記事中の出現単語から予測する方法において、各単語にlつのブール変数を割り当て、その実数係数多項式で嗜好関数を表現する方法を提案し、その係数の学習アルゴリズムについて考察及び実験を行う。特に、計算論的学習理論において盛んに研究されているオンライン学習における重みの逐次更新法を、実数係数の学習に適用する。具体的には、Kivinen & Warmuthが提案・解析した加法的更新法GDと乗法的更新法EG^±に加え、新しく「誤差比例修正法」と呼ぶ重み更新アルゴリズムを提案し、その乗法的更新法であるDPMUについて、ある被験者の実データを用いて実験的に予測性能を評価・比較する。実験結果によればDPMUは、既存の方式と同等以上の予測性能を有する。 - Naoki Abe, Hang Li, Atsuyoshi Nakamura Proc. of The 12th Int. Conf. on Machine Learning, 1995 1995/07/24 [Not refereed][Not invited]

We consider the problem of learning a certain type of lexical semantic

knowledge that can be expressed as a binary relation between words, such as the

so-called sub-categorization of verbs (a verb-noun relation) and the compound

noun phrase relation (a noun-noun relation). Specifically, we view this problem

as an on-line learning problem in the sense of Littlestone's learning model in

which the learner's goal is to minimize the total number of prediction

mistakes. In the computational learning theory literature, Goldman, Rivest and

Schapire and subsequently Goldman and Warmuth have consider... - Nakamura Atsuyoshi IEICE technical report. Theoretical foundations of Computing 94- (181) 93 -102 1994/07/25 [Not refereed][Not invited]

We study the learning problem in which the learner predicts the value of each entry of d-dimensional{0,1}-valued matrix one by one. According to the agent who selects the next prediction entry,we consider this problem in two different models,self-directed and adversary-directed.In self-directed model,in which the learner selects entries,we show an efficient algorithm which makes at most Π^d_j=1>k_j+Σ^d_j=1>(n_j-k_j)log k_j mistakes when the target m atrix has n_j elements for its j-th direction and those elements are classified into at most k_j types by the values of matrix entries.In adver... - Nakamura Atsuyoshi IEICE technical report. Theoretical foundations of Computing 93- (249) 53 -58 1993/09/24 [Not refereed][Not invited]

We investigate PAC learnability of planar rectangles of arbitrary orientation.We show an algorithm which outputs a hypothesis rectangle consistent with given m examples in time Ο(ml ogm).Since the class of planar rectangles of arbitrary oprientation has finite VC dimension,by the famous theorem of Blumer et al.,this algorithm is a distribution-free PAC learning algorithm whose sample complexity is Ο(1, εlog1/(εδ))for the ac curacy parameter ε and the confidence parameter δ.We also show th at the algorithm which outputs the minimum area rectangle containing all positive examples,which is kno... - Nakamura Atsuyoshi 全国大会講演論文集 第46回平成5年前期- (3) 35 -36 1993/03/01 [Not refereed][Not invited]

n次元実数領域[0,1)^n上の部分領域を、[0,1)^n上の一様分布に従って得られる点及びその点が部分領域の内外どちらかという情報から部分領域を学習する問題を扱う。但し、部分領域は次の条件を満たすものに制限する。([0,1)^n上の点を(x_1,x_2,・・・,x_n)とする。)1)原子論理式が(x_i⋴R)という形のDNF(積和標準形)の論理式で表現される。2)Rは区間[0,1)を高々k(固定)個の領域に分割するような領域∪^l_<i=1>[a_i,b_i)とする。3)各x_iは論理式の中で高々1度しか現れない。[2]では、分布を一様分布に固定したPAC(Probably Approximately Correct)学習モデルにおいてμDNF(各変数が高々1度しか現れないDNF)プール式で表現されるプール関数が学習可能であることが示されている。本稿は上記2)の制約の下で彼らの結果の定義域を{0,1}^nから[0,1}^nに拡張したものである。

- THE JAPANESE SOCIETY FOR ARTIFICIAL INTELLIGENCE THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS INFORMATION PROCESSING SOCIETY OF JAPAN

- Japan Society for the Promotion of Science：Grants-in-Aid for Scientific ResearchDate (from‐to) : 2022/04 -2026/03Author : 畑埜 晃平, 瀧本 英二, 中村 篤祥, Hashima Sherief
- Japan Society for the Promotion of Science：Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)Date (from‐to) : 2019/04 -2024/03Author : 工藤 峰一, 今井 英幸, 中村 篤祥今期は、不頻出事象の予測を抽象化したロングテール問題について基礎的検討を重ねた。ロングテールの特徴づけとして、極度にインバランス（サンプル数がクラスによって大きく異なる）であること、マルチラベル分類が多いこと、から、マルチラベル分類とインバランス問題の解決に注力した。さらに、実世界の不頻出事象に関連して、希少疾患予測の検討および独居老人の異変行動シミュレーションを行った。詳細を以下に示す。

・インバランスが顕著なマルチクラス問題を扱うため、2種類のクラス決定木（葉が一つクラスである決定木）を検討した。クラス集合対としてバランスをとるよう木を構成することでサンプル数の少ないクラスの検出精度を高めることに成功した。さらに、サンプルの少ないクラスが「次元の呪い」を受けやすいことを考慮して、特徴選択を少なくするようにクラス決定木を構成することで少数クラスのみならず全体の性能を向上させられることを示した。解釈可能性も大きく改善した。 ・希少疾患の診断において、１）医者の”希少疾患見過ごし”を防ぐために可視化手法を利用することを提案した、さらに、２）特徴選択により希少疾患の診断精度を向上させられることを確認した。 ・不頻出事象の予測に関して、極値論とサポートベクターマシンを利用した未知クラスの発見手法を提案し、これまでの手法よりよい性能を達成した。 ・独居老人の異変検知手法を開発する前段階として、異変を含む仮想住人の行動シミュレータを開発した。高齢者を実際のスマートホームに住まわせ、転倒などの異変が起きるのを待つわけにはいかないため、シミュレータは必須である。また、長期間に渡るセンサデータの採取や多様な生活行動を扱う上でも有効である。残念ながらこれまでの方式で複数の異変を適切に扱えるものがなかったため、確率モデルを利用したシミュレータを新たに開発した。 **バンディット問題の方策の実用化のための理論の深化**Japan Society for the Promotion of Science：Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)Date (from‐to) : 2019/04 -2023/03Author : 中村 篤祥, 田畑 公次, 工藤 峰一- Japan Society for the Promotion of Science：Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research on Innovative Areas (Research in a proposed research area)Date (from‐to) : 2018/06 -2023/03Author : 小松崎 民樹, 中村 篤祥, 小野 峻佑ある細胞と別の細胞の主従関係を推定する場合，それら2つの細胞の軌跡データなどを用いて評価されているが，2つの細胞の振る舞いを決める因子が，その2つの細胞以外にも，第3の細胞が介在する状況なども考えられる。主従関係や因果関係における“原因”と“結果”を解析するためには，単純な対の組み合わせで表現できない多体の相互作用から成り立っている。そのため，多体のあいだの因果関係を推定することは要素間の組み合わせの数が膨大になる。2対の軌跡データを変数とする移動エントロピーを細分化した新しい相乗情報量に着眼し，改良Vicsekモデルに基づいてこの問題を考察し、移動エントロピーに含まれる相乗情報量の振る舞いを解析することで，要素間の多体の相互作用が推定できることを見いだした（Sci Adv 2022）。 昨年度開発した、細胞毎の発火状態時系列からの伝搬時差推定をギャップを挿入する文字列のアライメントで行い伝搬グラフを構築する手法を発展させ、DTW(Dynamic Time Warping)ベースの実数値列アライメントを用いて伝搬グラフを構築する方法を開発した。位置情報を用いないで伝搬関係を推定できるため、細胞間の信号の伝搬のみでなく、株価の他の株価への影響の伝搬関係など曖昧な伝搬関係の抽出にも使えることを株価データを用いて確認した。 シンギュラリティ成分検出の前処理としてイメージングデータからノイズや外乱を取り除くための正則化手法について研究を進めた。具体的には、イメージングデータに内在するスパース性を引き出す変換を構成し、スパース性を評価するノルムと組み合わせて正則化関数を設計した。加えて、この正則化を含む最適化問題としてノイズ・外乱除去問題を定式化し、これを解くアルゴリズムを開発した。最後に、実際のハイパースペクトルイメージングデータを用いた大規模な比較実験によって有効性を実証した。
- Japan Society for the Promotion of Science：Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)Date (from‐to) : 2015/04 -2018/03Author : Kudo MineichiWe have dealt with "multi-label classification" problems where an object is assigned multiple labels. This study aimed at raising the classification performance and speeding up without degradation of performance. Our achievement is three of the following. First, we have pointed out the importance of the correlation between labels and showed several ways using it. Second, to keep a realistic processing time, we showed that the problem division of samples on the basis of their features or labels in some experimental results. Last, we pointed out the necessity of a special treatment on labels that appear rarely or have been forgotten to assign.
- Japan Society for the Promotion of Science：Grants-in-Aid for Scientific ResearchDate (from‐to) : 2013 -2015Author : Nakamura Atsuyoshi, KUDO Mineichi, TAKIGAWA Ichigaku, MAMITSUKA Hiroshi, KIDA Takuya, OKUBO YoshiakiWe developed an algorithm for enumerating frequent approximate string patterns, and proposed a method of extracting occurrence regions of the enumerated patterns as a method of extracting interspersed repetitive elements in a huge sequence like a DNA sequence. Patterns of proposed methods have occurrences of clear boundaries, so there is little chance to count essentially the same region more than once. Furthermore, our enumeration algorithm runs very fast and with small memory. According to our empirical results using human chromosome 21, a half of the known Alu regions, which are famous interspersed repetitive elements, is extracted as occurrence regions of 100 representative patterns that were selected from enumerated frequent approximate patterns.
- Ministry of Education, Culture, Sports, Science and Technology：Grants-in-Aid for Scientific Research(基盤研究(C))Date (from‐to) : 2009 -2011Author : Atsuyoshi NAKAMURA, Mineichi KUDO, Jun TOYAMAAs a data-dependent complexity measure, we proposed data-dependent VC dimension of Sperner family, and in hyper-rectangle subclass problem, we developed a fast algorithm for datasets with small such VC dimension. We also empirically showed efficiency of the algorithm using real data. Furthermore, as a complexity measure of contiguous repetition structure of a string, we proposed the size of a minimum repetition representation string (MRRS) for a given string. We developed a fast algorithm for constructing an MRRS and analyzed the repetition structure of DNA sequences using the algorithm.
- Ministry of Education, Culture, Sports, Science and Technology：Grants-in-Aid for Scientific Research(基盤研究(C))Date (from‐to) : 2006 -2007Author : Jun TOYAMA, 工藤 峰一, 中村 篤祥1) Classification of Features: The knowledge that is necessary for judgments are not numerical score like a real number but rough categories. A new concept "granularity" based on the rough categories was introduced. We interpreted discrete data as a grouping problem. Therefore an algorithm for discovering semi-optical answer for a grouping problem was proposed.2) Discovering Prototype: The features that is necessary for judgments are not all knowledge and experiments but some abstract data. From this point of view, an algorithm that discover typical prototype in enormous data was proposed. ...
- Ministry of Education, Culture, Sports, Science and Technology：Grants-in-Aid for Scientific Research(基盤研究(B))Date (from‐to) : 2002 -2005Author : Mineichi KUDO, 林 裕樹, 天元 宏, 外山 淳, 今井 英幸, 村井 哲也, 田中 章, 中村 篤祥, 天元 宏, 林 裕樹In this study, we classified the scaling problems of pattern recognition tasks into three of the following. The results are shown below, respectively.1.Scaling problem as for data number : We have changed the problem setting from the problem to achieve as high (predictive) performance as possible to the problem to attain as least computational time as possible keeping no large damage in performance. Especially, 1)a fast k-nearest neighbor was invented, and 2)a one-pass algorithm for prototype acquisition was discussed, though they are still under progress.2.Scaling problem as for dimensiona...

- 特開2007-047974:情報抽出装置および情報抽出方法 2007/02/22中村 篤祥, 工藤 峰一 国立大学法人 北海道大学 200903056382365616
- 特開2007-018350:情報検索装置、情報検索方法および情報検索プログラム 2007/01/25工藤 峰一, 村井 哲也, 中村 篤祥 国立大学法人 北海道大学 200903068491410832
- 中村 篤祥, 工藤 峰一, 外山 淳, 野中 秀俊 国立大学法人 北海道大学 200903085372779846
- 工藤 峰一, 中村 篤祥, 田中 章 国立大学法人 北海道大学 200903036892831793
- 特許第3389948号:表示広告選択システム 2003/01/17中村 篤祥, 安倍 直樹 日本電気株式会社 201103085685129786
- 特許第3284972号:情報フィルタリング方法及び方式 2002/03/08中村 篤祥 日本電気株式会社 201103082867695257
- 特開2001-285765:放送番組蓄積方式 2001/10/12中村 篤祥, 安倍 直樹, 落合 勝博, 的場 ひろし 日本電気株式会社 200903096301806712
- 特開2000-163477:表示広告選択方法 2000/06/16中村 篤祥, 安倍 直樹 日本電気株式会社 200903078804040603
- 特開2000-148675:カスタマイズされた広告をWWW上で提供する装置及び方法 2000/05/30マーク, ラングハインリッヒ, 中村 篤祥 日本電気株式会社 200903038845760600
- 特開平11-328275:情報フィルタリング方法及び方式 1999/11/30中村 篤祥 日本電気株式会社 200903016672941902