Nakamura Atsuyoshi

Faculty of Information Science and Technology Computer Science and Information Technology Knowledge Software ScienceProfessor
Last Updated :2024/12/06

■Researcher basic information

Researchmap personal page

Researcher number

  • 50344487

Research Keyword

  • アルゴリズム
  • 汎化性能
  • Online Leaning
  • Active Learning
  • Computational Learning Theory
  • 機械学習
  • データマイニング

Research Field

  • Informatics, Intelligent informatics
  • Informatics, Intelligent robotics
  • Informatics, Perceptual information processing

■Career

Career

  • Apr. 2021 - Present
    Hokkaido University, Graduate School of Information Science and Technology, Professor
  • Jul. 2002 - Mar. 2021
    Hokkaido University, Graduate School of Information Science and Technology, Associate Professor
  • Apr. 1988 - Jun. 2002
    NEC Corporation

Educational Background

  • Feb. 2000, Tokyo Institute of Technology, Graduate School of Science and Engineering, 博士(理学)
  • Apr. 1986 - Mar. 1988, Tokyo Institute of Technology, 大学院理工学研究科, 情報科学専攻
  • Apr. 1982 - Mar. 1986, Tokyo Institute of Technology, School of Science, Department of Information Science

■Research activity information

Awards

  • 2013, 電子情報通信学会, 情報・システムソサイエティ査読功労賞               
    中村篤祥
  • 2013, 電子情報通信学会, 情報・システムソサイエティ活動功労賞               
    中村篤祥

Papers

  • On-the-fly Raman microscopy guaranteeing the accuracy of discrimination.
    Koji Tabata, Hiroyuki Kawagoe, J Nicholas Taylor, Kentaro Mochizuki, Toshiki Kubo, Jean-Emmanuel Clement, Yasuaki Kumamoto, Yoshinori Harada, Atsuyoshi Nakamura, Katsumasa Fujita, Tamiki Komatsuzaki
    Proceedings of the National Academy of Sciences of the United States of America, 121, 12, e2304866121, 19 Mar. 2024, [International Magazine]
    English, Scientific journal, Accelerating the measurement for discrimination of samples, such as classification of cell phenotype, is crucial when faced with significant time and cost constraints. Spontaneous Raman microscopy offers label-free, rich chemical information but suffers from long acquisition time due to extremely small scattering cross-sections. One possible approach to accelerate the measurement is by measuring necessary parts with a suitable number of illumination points. However, how to design these points during measurement remains a challenge. To address this, we developed an imaging technique based on a reinforcement learning in machine learning (ML). This ML approach adaptively feeds back "optimal" illumination pattern during the measurement to detect the existence of specific characteristics of interest, allowing faster measurements while guaranteeing discrimination accuracy. Using a set of Raman images of human follicular thyroid and follicular thyroid carcinoma cells, we showed that our technique requires 3,333 to 31,683 times smaller number of illuminations for discriminating the phenotypes than raster scanning. To quantitatively evaluate the number of illuminations depending on the requisite discrimination accuracy, we prepared a set of polymer bead mixture samples to model anomalous and normal tissues. We then applied a home-built programmable-illumination microscope equipped with our algorithm, and confirmed that the system can discriminate the sample conditions with 104 to 4,350 times smaller number of illuminations compared to standard point illumination Raman microscopy. The proposed algorithm can be applied to other types of microscopy that can control measurement condition on the fly, offering an approach for the acceleration of accurate measurements in various applications including medical diagnosis.
  • Query learning algorithm for ordered multi-terminal binary decision diagrams.
    Atsuyoshi Nakamura
    Discret. Appl. Math., 352, 69, 87, 2024
    Scientific journal
  • Gaussian process classification bandits.
    Tatsuya Hayashi, Naoki Ito, Koji Tabata, Atsuyoshi Nakamura, Katsumasa Fujita, Yoshinori Harada, Tamiki Komatsuzaki
    Pattern Recognit., 149, 110224, 110224, 2024
    Scientific journal
  • Differentially Private Streaming Data Release Under Temporal Correlations via Post-processing.
    Xuyang Cao, Yang Cao 0011, Primal Pappachan, Atsuyoshi Nakamura, Masatoshi Yoshikawa
    DBSec, 184, 200, 2023
    International conference proceedings
  • Posterior Tracking Algorithm for Classification Bandits.
    Koji Tabata, Junpei Komiyama, Atsuyoshi Nakamura, Tamiki Komatsuzaki
    AISTATS, 10994, 11022, PMLR, 2023
    International conference proceedings
  • Efficient alignment-based average delay time estimation in fluctuating delayed propagation
    Atsuyoshi Nakamura, Tatsuya Hayashi
    Array, 100240, 100240, Elsevier BV, Jul. 2022
    Scientific journal
  • Propagation graph estimation from individuals' time series of observed states.
    Tatsuya Hayashi, Atsuyoshi Nakamura
    Scientific reports, 12, 1, 6078, 6078, 12 Apr. 2022, [Peer-reviewed], [International Magazine]
    English, Scientific journal, Various things propagate through the medium of individuals. Some individuals follow the others and take the states similar to their states a small number of time steps later. In this paper, we study the problem of estimating the state propagation order of individuals from the real-valued state sequences of all the individuals.We propose a method of constructing a state propagation graph from individuals' time series of observed states. The propagation order estimated by our proposed method is demonstrated to be significantly more accurate than that by a baseline method (optimal constant delay model) for our synthetic datasets, and also to be consistent with visually recognizable propagation orders for the dataset of Japanese stock price time series and biological cell firing state sequences.
  • Boosting Utility of Differentially Private Streaming Data Release under Temporal Correlations.
    Xuyang Cao, Yang Cao 0011, Masatoshi Yoshikawa, Atsuyoshi Nakamura
    Big Data, 6605, 6607, 2022
    International conference proceedings
  • Gaussian Process Classification Bandits.
    Tatsuya Hayashi, Naoki Ito, Koji Tabata, Atsuyoshi Nakamura, Katsumasa Fujita, Yoshinori Harada, Tamiki Komatsuzaki
    CoRR, abs/2212.13157, 2022
    Scientific journal
  • An Explainable Recommendation Based on Acyclic Paths in an Edge-Colored Graph.
    Kosuke Chinone, Atsuyoshi Nakamura
    AI, 40, 52, 2022, [Peer-reviewed]
    International conference proceedings
  • Minor-embedding heuristics for large-scale annealing processors with sparse hardware graphs of up to 102,400 nodes
    Yuya Sugie, Yuki Yoshida, Normann Mertig, Takashi Takemoto, Hiroshi Teramoto, Atsuyoshi Nakamura, Ichigaku Takigawa, Shin-ichi Minato, Masanao Yamaoka, Tamiki Komatsuzaki
    Soft Computing, 25, 3, 1731, 1749, Springer Science and Business Media LLC, Feb. 2021, [Peer-reviewed]
    Scientific journal
  • Multi-objective Spatiotemporal Optimization of Transportation and Power Management by using Multiple Electric Vehicles in Nanogrid Networks
    Hiroshi Uchigaito, Mamoru Okamoto, Gleb Astashkin, Yuki Furubayashi, Norihiro Obata, Thantip Krasienapibal, Hiroshi Teramoto, Yuta Mizuno, Masato Kobayashi, Atsuyoshi Nakamura, Tamiki Komatsuzaki, Takashi Takemoto
    2021 IEEE Electrical Power and Energy Conference, EPEC 2021, 365, 372, 2021, [Peer-reviewed]
    International conference proceedings, For achieving a low-carbon society and providing stable energy supply to local areas, we propose a distributed energy system in which multiple small grids are installed in a distributed manner in an area. Multiple electric vehicles located in the area are used as mobile batteries for power leveling between grids and used as regional transportation. To construct this system, it is necessary to provide practical solutions in accordance with the needs in various areas by adjusting the balance between optimization of power management and transportation, which are in a trade-off relationship, while dealing with uncertainties such as climate change and time-varying transportation demand in a short time. In this paper, we propose a multi-objective spatiotemporal optimization algorithm for solving the above problem. We compared the performance of the proposed algorithm with a vehicle-routing-problem solver of Google OR-Tools, demonstrating that the transportation rate increased by 13% and sum of power excess and shortage in grids decreased by 87%. The proposed algorithm can also adjust the balance between optimizations of transportation and electricity-management, suggesting that it can be applied in various areas.
  • Classification Bandits: Classification Using Expected Rewards as Imperfect Discriminators
    Koji Tabata, Atsuyoshi Nakamura, Tamiki Komatsuzaki
    PAKDD 2021 Workshop on Machine Learning for MEasurement INformatics (MLMEIN 2021), 57, 69, Springer International Publishing, 2021, [Peer-reviewed]
    In book
  • Data-Dependent Conversion to a Compact Integer-Weighted Representation of a Weighted Voting Classifier.
    Mitsuki Maekawa, Atsuyoshi Nakamura, Mineichi Kudo
    ACML 2020, 241, 256, PMLR, Nov. 2020, [Peer-reviewed]
    International conference proceedings
  • Efficiently Enumerating Substrings with Statistically Significant Frequencies of Locally Optimal Occurrences in Gigantic String
    Atsuyoshi Nakamura, Ichigaku Takigawa, Hiroshi Mamitsuka
    Proceedings of the AAAI Conference on Artificial Intelligence, 34, 04, 5240, 5247, Association for the Advancement of Artificial Intelligence (AAAI), 03 Apr. 2020, [Peer-reviewed], [Lead author]
    Scientific journal, We propose new frequent substring pattern mining which can enumerate all substrings with statistically significant frequencies of their locally optimal occurrences from a given single sequence. Our target application is genome sequences, around a half being said to be covered by interspersed and consecutive (tandem) repeats, and detecting these repeats is an important task in molecular life sciences. We evaluate the statistical significance of frequent substrings by using a string generation model with a memoryless stationary information source. We combine this idea with an existing algorithm, ESFLOO-0G.C (Nakamura et al. 2016), to enumerate all statistically significant substrings with locally optimal occurrences. We further develop a parallelized version of our algorithm. Experimental results using synthetic datasets showed the proposed algorithm achieved far higher F-measure in extracting substrings (with various lengths and frequencies) embedded in a randomly generated string with noise, than conventional algorithms. The large-scale experiment using the whole human genome sequence with 3,095,677,412 bases (letters) showed that our parallel algorithm covers 75% of the whole positions analyzed, around 4% and 24% higher than the recent report and the current cutting-edge knowledge, implying a biologically unique finding.
  • An Algorithm for Reducing the Number of Distinct Branching Conditions in a Decision Forest
    Atsuyoshi Nakamura, Kento Sakurada
    Machine Learning and Knowledge Discovery in Databases - European Conference, ECMLPKDD2019, 578, 589, Springer International Publishing, 2020, [Peer-reviewed], [Lead author]
    In book
  • Hardware/Algorithm Co-optimization for Fully-Parallelized Compact Decision Tree Ensembles on FPGAs.
    Taiga Ikeda, Kento Sakurada, Atsuyoshi Nakamura, Masato Motomura, Shinya Takamaeda-Yamazaki
    Applied Reconfigurable Computing. Architectures, Tools, and Applications - 16th International Symposium(ARC), 345, 357, Springer, 2020, [Peer-reviewed]
    International conference proceedings
  • A bad arm existence checking problem: How to utilize asymmetric problem structure?
    Tabata, Koji, Nakamura, Atsuyoshi, Honda, Junya, Komatsuzaki, Tamiki
    MACHINE LEARNING, 109, 2, 327, 372, SPRINGER, Oct. 2019, [Peer-reviewed], [Corresponding author]
    English, Scientific journal, We study a bad arm existence checking problem in a stochastic K-armed bandit setting, in which a player's task is to judge whether a positive arm exists or all the arms are negative among given K arms by drawing as small number of arms as possible. Here, an arm is positive if its expected loss suffered by drawing the arm is at least a given threshold theta(U), and it is negative if that is less than another given threshold theta(L) (<= theta(U)). This problem is a formalization of diagnosis of disease or machine failure. An interesting structure of this problem is the asymmetry of positive and negative arms' roles; finding one positive arm is enough to judge positive existence while all the arms must be discriminated as negative to judge whole negativity. In the case with Delta = theta(U) - theta(L) > 0, we propose elimination algorithms with arm selection policy (policy to determine the next arm to draw) and decision condition (condition to conclude positive arm's existence or the drawn arm's negativity) utilizing this asymmetric problem structure and prove its effectiveness theoretically and empirically.
  • A Bad Arm Existence Checking Problem.
    Koji Tabata, Atsuyoshi Nakamura, Junya Honda, Tamiki Komatsuzaki
    CoRR, abs/1901.11200, 2019
    Scientific journal
  • Mistake bounds on the noise-free multi-armed bandit game.
    Atsuyoshi Nakamura, David P. Helmbold, Manfred K. Warmuth
    Inf. Comput., 269, 2019, [Peer-reviewed], [Lead author]
    Scientific journal
  • Good arm identification via bandit feedback.
    Hideaki Kano, Junya Honda, Kentaro Sakamaki, Kentaro Matsuura, Atsuyoshi Nakamura, Masashi Sugiyama
    Mach. Learn., 108, 5, 721, 745, 2019, [Peer-reviewed]
    Scientific journal
  • Feature selection as Monte-Carlo Search in Growing Single Rooted Directed Acyclic Graph by Best Leaf Identification.
    Aurélien Pélissier, Atsuyoshi Nakamura, Koji Tabata
    Proceedings of the 2019 SIAM International Conference on Data Mining, SDM 2019, Calgary, Alberta, Canada, May 2-4, 2019., 450, 458, SIAM, 2019, [Peer-reviewed]
    International conference proceedings
  • UCB-SC: A fast variant of KL-UCB-SC for budgeted multi-armed bandit problem
    Ryo Watanabe, Junpei Komiyama, Atsuyoshi Nakamura, Mineichi Kudo
    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E101A, 3, 662, 667, Institute of Electronics, Information and Communication, Engineers, IEICE, 01 Mar. 2018, [Peer-reviewed]
    English, Scientific journal, We propose a policy UCB-SC for budgeted multi-armed bandits. The policy is a variant of recently proposed KL-UCB-SC. Unlike KL-UCB-SC, which is computationally prohibitive, UCB-SC runs very fast while keeping KL-UCB-SC's asymptotical optimality when reward and cost distributions are Bernoulli with means around 0.5, which are verified both theoretically and empirically.
  • FPGA-Based QBoost with Large-Scale Annealing Processor and Accelerated Hyperparameter Search.
    Takashi Takemoto, Normann Mertig, Masato Hayashi, Saki Susa-Tanaka, Hiroshi Teramoto, Atsuyoshi Nakamura, Ichigaku Takigawa, Shin-ichi Minato, Tamiki Komatsuzaki, Masanao Yamaoka
    2018 International Conference on ReConFigurable Computing and FPGAs, ReConFig 2018, Cancun, Mexico, December 3-5, 2018, 1, 8, IEEE, 2018, [Peer-reviewed]
    International conference proceedings
  • Graph Minors from Simulated Annealing for Annealing Machines with Sparse Connectivity.
    Yuya Sugie, Array,Array,Array, Hiroshi Teramoto, Array,Array, Shin-ichi Minato, Masanao Yamaoka,Array
    Theory and Practice of Natural Computing - 7th International Conference, TPNC 2018, Dublin, Ireland, December 12-14, 2018, Proceedings, 111, 123, Springer, 2018, [Peer-reviewed]
    International conference proceedings
  • KL-UCB-based policy for budgeted multi-armed bandits with stochastic action costs
    Ryo Watanabe, Junpei Komiyama, Atsuyoshi Nakamura, Mineichi Kudo
    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E100A, 11, 2470, 2486, Institute of Electronics, Information and Communication, Engineers, IEICE, 01 Nov. 2017, [Peer-reviewed]
    English, International conference proceedings, We study the budgeted multi-armed bandit problem with stochastic action costs. In this problem, a player not only receives a reward but also pays a cost for an action of his/her choice. The goal of the player is to maximize the cumulative reward he/she receives before the total cost exceeds the budget. In the classical multi-armed bandit problem, a policy called KL-UCB is known to perform well. We propose KL-UCB-SC, an extension of this policy for the budgeted bandit problem. We prove that KL-UCB-SC is asymptotically optimal for the case of Bernoulli costs and rewards. To the best of our knowledge, this is the first result that shows asymptotic optimality in the study of the budgeted bandit problem. In fact, our regret upper bound is at least four times better than that of BTS, the best known upper bound for the budgeted bandit problem. Moreover, an empirical simulation we conducted shows that the performance of a tuned variant of KL-UCB-SC is comparable to that of state-of-the-art policies such as PD-BwK and BTS.
  • An Efficient Approximate Algorithm for the 1-Median Problem on a Graph
    Koji Tabata, Atsuyoshi Nakamura, Mineichi Kudo
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, E100D, 5, 994, 1002, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, May 2017, [Peer-reviewed]
    English, Scientific journal, We propose a heuristic approximation algorithm for the 1-median problem. The 1-median problem is the problem of finding a vertex with the highest closeness centrality. Starting from a randomly selected vertex, our algorithm repeats to find a vertex with higher closeness centrality by approximately calculating closeness centrality of each vertex using simpler spanning subgraphs, which are called k-neighbor dense shortest path graphs with shortcuts. According to our experimental results using real networks with more than 10,000 vertices, our algorithm is more than 100 times faster than the exhaustive search and more than 20 times faster than the state-of-the-art approximation algorithm using annotated information to the vertices while the solutions output by our algorithm have higher approximation ratio.
  • On Practical Accuracy of Edit Distance Approximation Algorithms.
    Hiroyuki Hanada, Mineichi Kudo, Atsuyoshi Nakamura
    CoRR, abs/1701.06134, 2017
    Scientific journal
  • Mining approximate patterns with frequent locally optimal occurrences
    Atsuyoshi Nakamura, Ichigaku Takigawa, Hisashi Tosaka, Mineichi Kudo, Hiroshi Mamitsuka
    DISCRETE APPLIED MATHEMATICS, 200, 123, 152, ELSEVIER SCIENCE BV, Feb. 2016, [Peer-reviewed], [Lead author]
    English, Scientific journal, We consider a frequent approximate pattern mining problem, in which interspersed repetitive regions are extracted from a given string. That is, we enumerate substrings that frequently match substrings of a given string locally and optimally. For this problem, we propose a new algorithm, in which candidate patterns are generated without duplication using the suffix tree of a given string. We further define a k-gap-constrained setting, in which the number of gaps in the alignment between a pattern and an occurrence is limited to at most k. Under this setting, we present memory-efficient algorithms, particularly a candidate-based version, which runs fast enough even over human chromosome sequences with, more than 10 million nucleotides. We note that our problem and algorithms for strings can be directly extended to ordered labeled trees. In our experiments we used both randomly synthesized strings, in which corrupted similar substrings are embedded, and real data of human chromosome. The synthetic data experiments show that our proposed approach extracted embedded patterns correctly and time-efficiently. In real data experiments, we examined the centers of 100 clusters computed after grouping the patterns obtained by our k-gap-constrained versions (k = 0, 1 and 2) and the results revealed that the regions of their occurrences coincided with around a half of the regions automatically annotated as Alu sequences by a manually curated repeat sequence database. (C) 2015 Elsevier B.V. All rights reserved.
  • Sitting Posture Diagnosis Using a Pressure Sensor Mat
    Shunsuke Suzuki, Mineichi Kudo, Atsuyoshi Nakamura
    2016 IEEE INTERNATIONAL CONFERENCE ON IDENTITY, SECURITY AND BEHAVIOR ANALYSIS (ISBA), 1, 6, IEEE, 2016, [Peer-reviewed]
    English, International conference proceedings, It is well known that taking wrong sitting posture all day long is harmful for health. However, quantifying the degree of collapse of posture is not so easy. Typically, checking the video of sitting state or examining the spinal curves in radiograph is made so far, but it needs a clinical specialist for diagnosis. In this paper, for attaining this goal more easily, we give a device usable in daily life. We measure a time series of the pressure distributions obtained from a mat with 16x16 pressure sensors put on a chair. More concretely we analyze the sitting person's move direction and the amount, and make a daily diagnosis report aiming at helping his/her own notice. Through the experiments using two subjects who spent around three hours at the chair, a possibility of automatic and daily-basis diagnosis of sitting posture is demonstrated.
  • Noise Free Multi-armed Bandit Game
    Atsuyoshi Nakamura, David P. Helmbold, Manfred K. Warmuth
    LANGUAGE AND AUTOMATA THEORY AND APPLICATIONS, LATA 2016, 9618, 412, 423, SPRINGER-VERLAG BERLIN, 2016, [Peer-reviewed], [Lead author]
    English, International conference proceedings, We study the loss version of adversarial multi-armed bandit problems with one lossless arm. We show an adversary's strategy that forces any player to suffer K - 1 - O(1/T) loss where K is the number of arms and T is the number of rounds.
  • An improved upper bound on the expected regret of UCB-type policies for a matching-selection bandit problem
    Ryo Watanabe, Atsuyoshi Nakamura, Mineichi Kudo
    OPERATIONS RESEARCH LETTERS, 43, 6, 558, 563, ELSEVIER SCIENCE BV, Nov. 2015, [Peer-reviewed], [Corresponding author]
    English, Scientific journal, We improved an upper bound on the expected regret of a UCB-type policy LLR for a bandit problem that repeats the following rounds: a player selects a maximal matching on a complete bipartite graph K-M,K-N and receives a reward for each component edge of the selected matching. Rewards are assumed to be generated independently of its previous rewards according to an unknown fixed distribution. Our upper bound is smaller than the best known result (Chen et al., 2013) by a factor of Theta (M-2/3). (C) 2015 Elsevier B.V. All rights reserved.
  • An Algorithm for Influence Maximization in a Two-Terminal Series Parallel Graph and its Application to a Real Network
    Koji Tabata, Atsuyoshi Nakamura, Mineichi Kudo
    DISCOVERY SCIENCE, DS 2015, 9356, 275, 283, SPRINGER-VERLAG BERLIN, 2015, [Peer-reviewed]
    English, International conference proceedings, We developed an algorithm to exactly solve an influence maximization problem (MaxInf) for a two-terminal series parallel graph (TTSPG) in the independent cascade model. The class of TTSPGs can be considered as a class wider than that of trees, only for which an efficient exact solver of this problem has been developed so far. Our algorithm calculates candidate node sets in the divide-and-conquer manner keeping the number of them as small as possible by efficiently eliminating unnecessary ones in merge of subproblems' solutions. Furthermore, we propose a way of converting an arbitrary network to a TTSPG with edges important for propagation to apply our method to real networks. According to our empirical results, our method is significantly faster than the greedy approximation algorithm for MaxInf of a TTSPG. We also demonstrate improvement of solutions by converting to TTSPGs instead of trees using real networks made from DBLP datasets.
  • Diversity Measures and Margin Criteria in Multi-class Majority Vote Ensemble
    Ayako Mikami, Mineichi Kudo, Atsuyoshi Nakamura
    MULTIPLE CLASSIFIER SYSTEMS (MCS 2015), 9132, 27, 37, SPRINGER-VERLAG BERLIN, 2015, [Peer-reviewed]
    English, International conference proceedings, Ensemble learning is a strong tool to strengthen weak classifiers. A large amount of diversity among those weak classifiers is a key to accelerate the effectiveness. Therefore, many diversity measures on a given training sample set have been proposed so far. However, they are almost all based on the oracle output that is one if the class predicted by the classifier is correct, zero otherwise. We point out such an oracle output scheme is not appropriate for the problems of more than two classes, and extend one of the most popular diversity measures, disagreement measure, to multi-class cases. On the other hand, the concept of margin has been recognized as an analytic tool to measure the generalization performance of a given classifier. Therefore, we analyze when some criteria for maximizing margins of an ensemble classifier over training samples are maximized under the assumption that the average accuracy of the base classifiers is constant. We also reveal the relationship between those criteria and the extended disagreement measure. As a result, it turns out that diversity is necessary not only over samples but also over predicted classes, if we want to extract the highest potential of ensemble.
  • Average-case linear-time similar substring searching by the q-gram distance
    Hiroyuki Hanada, Mineichi Kudo, Atsuyoshi Nakamura
    THEORETICAL COMPUTER SCIENCE, 530, 23, 41, ELSEVIER SCIENCE BV, Apr. 2014, [Peer-reviewed]
    English, Scientific journal, In this paper we consider the problem of similar substring searching in the q-gram distance. The q-gram distance d(q)(x, y) is a similarity measure between two strings x and y defined by the number of different q-grams between them. The distance can be used instead of the edit distance due to its lower computation cost, O(|x| + |Y|) vs. O(|x||Y|). and its good approximation for the edit distance. However, if this distance is applied to the problem of finding all similar strings, in a long text t, to a given pattern p, the total computation cost is sometimes not acceptable. Ukkonen already proposed two fast algorithms: one with an array and the other with a tree. When "similar" means k or less in dq, their time complexities are O(|t|k + |P|) and O(|t| log k + |p|). respectively. In this paper, we propose two algorithms of average-case complexity O(|t| + |p|). although their worst-case complexities are still O(|t|k + |P|) and O(|t| log k + |p|). respectively. The linearity of the average-case complexity is analyzed under the assumption of random sampling of t and the condition that q is larger than a threshold. The algorithms exploit the fact that similar substrings in t are often found at very close positions if the beginning positions of the substrings are close. In the second proposed algorithm, we adopted a doubly-linked list supported by an array and a search tree to search for a list element in O(log k) time. Experimental results support their theoretical average-case complexities. (C) 2014 Elsevier B.V. All rights reserved.
  • An efficient construction and application usefulness of rectangle greedy covers
    Koji Ouchi, Atsuyoshi Nakamura, Mineichi Kudo
    PATTERN RECOGNITION, 47, 3, 1459, 1468, ELSEVIER SCI LTD, Mar. 2014, [Peer-reviewed], [Corresponding author]
    English, Scientific journal, We develop efficient construction methods of a rectangle greedy cover (RGC), and evaluate its usefulness in applications. An RGC is a greedy cover of the set of given positive instances by exclusive axis-parallel hyperrectangles, namely, axis-parallel hyperrectangles that exclude all the given negative instances. An RGC is expected to be a compact classification rule with high readability because the number of its component rectangles is expected to be small and it can be seen as a disjunctive normal form, which is one of the most readable representations for us.
    We propose two approaches of RGC construction: enumeration approach and direct approach. In enumeration approach, the maximal exclusive positive subsets (MEPSs) are enumerated first and then an ordinary greedy set covering is done using the enumerated MEPSs. We make clear the relation between enumeration of the maximal frequent itemsets and enumeration of the MEPSs, and convert an efficient enumeration algorithm LCMmax [1] of maximal frequent itemsets to an enumeration algorithm LCMmax.R-naive of MEPSs. We also develop a more efficient version of LCMmax.R-naive, or LCMmax.R, by incorporating effective dynamic reordering of instances using excluded frequency and bit-parallel exclusiveness check. In direct approach, each component MEPS of an RGC is searched not from enumerated MEPSs but directly from the dataset that consists of the remaining uncovered positive instances and the whole negative instances. We developed an algorithm called MRF that efficiently finds an maximum-sized MEPS for given positive and negative instances. MRF is made from LCMmax.R by modifying it so as to find a maximum-sized MEPS only. An RGC is constructed by MRF repetition, that is, by repeatedly executing MRF using the remaining uncovered positive instances.
    According to our experimental evaluation using UCI-repository datasets, LCMmax.R was about 5-11 times faster than LCMmax.Rnaive, which indicates effectiveness of the introduced two improvements. MRF repetition, however, was significantly faster than LCMmax.R, and it was fast enough for practical usage. The experimental results using UCI-repository datasets also showed that accuracy of a nearest rectangle classifier using an RGC is close to that using the hyperrectangles output by the randomized subclass method (RSM) [2] though the number of component rectangles of an RGC is significantly smaller than the number of the hyperrectangles output by RSM. The performance of RGC was also shown to be comparable to that of the six popular classifiers including logistic regression and support vector machine. The disjunctive normal form representation of the classification rules obtained by RGC was demonstrated to be simpler and more readable for us than that obtained by RSM and C4.5. (C) 2013 Elsevier Ltd. All rights reserved.
  • A UCB-Like Strategy of Collaborative Filtering.
    Atsuyoshi Nakamura
    Proceedings of the Sixth Asian Conference on Machine Learning, ACML 2014, Nha Trang City, Vietnam, November 26-28, 2014., JMLR.org, 2014, [Peer-reviewed], [Lead author]
  • Fast algorithms for finding a minimum repetition representation of strings and trees
    Atsuyoshi Nakamura, Tomoya Saito, Ichigaku Takigawa, Mineichi Kudo, Hiroshi Mamitsuka
    Discrete Applied Mathematics, 161, 10-11, 1556, 1575, Jul. 2013, [Peer-reviewed], [Lead author]
    English, Scientific journal, A string with many repetitions can be represented compactly by replacing h-fold contiguous repetitions of a string r with (r)h. We present a compact representation, which we call a repetition representation (of a string) or RRS, by which a set of disjoint or nested tandem arrays can be compacted. In this paper, we study the problem of finding a minimum RRS or MRRS, where the size of an RRS is defined by the sum of the length of component letters and the description length of the component repetitions (ṡ)h which is defined by ER (h) using a repetition weight function wR. We develop two dynamic programming-based algorithms to solve this problem: CMR, which works for any type of wR, and CMR-C, which is faster but can be applied to a constant wR only. CMR-C is an O(n2logn)-time O(nlogn)-space algorithm, which is more efficient in both time and space than CMR by a ((logn)/n)-factor, where n is the length of the given string. The problem of finding an MRRS for a string can be extended to that of finding a minimum repetition representation (of a tree) or MRRT for a given labeled ordered tree. For this problem, we present two algorithms, CMRT and CMRT-C, by using CMR and CMR-C, respectively, as a subroutine. As well as the theoretical analysis, we confirmed the efficiency of the proposed algorithms by experiments, which consist of the following three parts: First we demonstrated that CMR-C and CMRT-C are fast enough for large-scale data by using synthetic strings and trees, respectively. The size of an MRRS for a given string can be a measure of how compactly the string can be represented, meaning how well the string is structurally organized. This is also true of trees. To check such ability of MRRS-size, second we measured the size of an MRRS for chromosomes of nine different species. We found that all the chromosomes of the same species have a similar compression rate when realized by an MRRS. Run length encoding (RLE) was also shown to have species-specific compression rate, but species were separated more clearly by MRRS than by RLE. Third we examined the size of an MRRT for web pages of world-leading companies by using the tag trees, showing a consistency between the compression rate by an MRRT and visual web page structures. © © 2013 Elsevier B.V. All rights reserved.
  • Fast approximation algorithm for the 1-median problem
    Koji Tabata, Atsuyoshi Nakamura, Mineichi Kudo
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 7569, 169, 183, Springer, 2012, [Peer-reviewed]
    English, International conference proceedings, We present a fast approximation algorithm for the 1-median problem. Our algorithm can be applied to metric undirected graphs with node weight. Given a node v, our algorithm repeatedly finds a better node by making use of a shortest path tree of the previous node. We empirically show that our algorithm runs much faster and has better approximation ratio than a sophisticated existing method called DTZ. We demonstrate the effectiveness of our algorithm through experiments. © 2012 Springer-Verlag Berlin Heidelberg.
  • Frequent approximate substring pattern mining using locally optimal occurrence counting
    Atsuyoshi Nakamura, Hisashi Tosaka, Mineichi Kudo
    Proceedings of the 2012 IIAI International Conference on Advanced Applied Informatics, IIAIAAI 2012, 54, 59, 2012, [Peer-reviewed], [Lead author]
    English, International conference proceedings, We propose a novel frequent approximate pattern mining in which occurrences themselves are valuable regions to extract. Given a string , our mining task is to enumerate its substrings that locally optimally match many substrings of . We show an algorithm for this problem whose time and space complexities are (3) for a string with length . According to our experiments using synthetic data, our algorithm can correctly extract embedded frequent patterns as both patterns and their occurrences even if the embedded patterns are corrupted by random edit operations. © 2012 IEEE.
  • Quasi-Linear-Time Substring Searching by q-gram Distance
    Hiroyuki Hanada, Mineichi Kudo, Atsuyoshi Nakamura
    2012 6TH INTERNATIONAL CONFERENCE ON NEW TRENDS IN INFORMATION SCIENCE, SERVICE SCIENCE AND DATA MINING (ISSDM2012), 544, 549, IEEE, 2012, [Peer-reviewed]
    English, International conference proceedings, The q-gram distance d(q)(x, y) between two strings x and y is a string similarity measure correlated with a famous string distance: the edit distance. In addition, it can be computed much faster, in linear (O(vertical bar x vertical bar + vertical bar y vertical bar) time, than the edit distance in quadratic (O(vertical bar x parallel to y vertical bar) time, where I. I denotes the stmg length. However, it does not mean that we can find all substrmgs of a text t similar to a pattern p in linear time. In this paper we will propose a searching algorithm achieving quasi-linear (O(vertical bar t vertical bar log vertical bar p vertical bar + vertical bar p vertical bar) time by the q-gram distance.
  • Construction of convex hull classifiers in high dimensions
    Tetsuji Takahashi, Mineichi Kudo, Atsuyoshi Nakamura
    PATTERN RECOGNITION LETTERS, 32, 16, 2224, 2230, ELSEVIER SCIENCE BV, Dec. 2011, [Peer-reviewed]
    English, Scientific journal, We propose an algorithm to approximate each class region by a small number of approximated convex hulls and to use these for classification. The classifier is one of non-kernel maximum margin classifiers. It keeps the maximum margin in the original feature space, unlike support vector machines with a kernel. The construction of an exact convex hull requires an exponential time in dimension, so we find an approximate convex hull (a polyhedron) instead, which is constructed in linear time in dimension. We also propose a model selection procedure to control the number of faces of convex hulls for avoiding over-fitting, in which a fast procedure is adopted to calculate an upper-bound of the leave-one-out error. In comparison with support vector machines, the proposed approach is shown to be comparable in performance but more natural in the extension to multi-class problems. (C) 2011 Elsevier B.V. All rights reserved.
  • On the possible patterns of inputs for block sorting in the Burrows-Wheeler transformation
    Takashi Saso, Kojiro Kobayashi, Atsuyoshi Nakamura
    INFORMATION PROCESSING LETTERS, 111, 12, 595, 599, ELSEVIER SCIENCE BV, Jun. 2011, [Peer-reviewed]
    English, Scientific journal, Block sorting in the Burrows-Wheeler transformation is to sort all of the n circular shifts of a string of length n lexicographically. We introduce a notion called the width of a sequence of n strings of length n and show that the values of widths are very different between the two types of sequences of strings; (1) a sequence of n randomly generated strings of length n, and (2) the sequence of n circular shifts of a randomly generated string of length n. (C) 2011 Elsevier B.V. All rights reserved.
  • Packing Alignment: Alignment for Sequences of Various Length Events
    Atsuyoshi Nakamura, Mineichi Kudo
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT II, 6635, 234, 245, SPRINGER-VERLAG BERLIN, 2011, [Peer-reviewed], [Lead author]
    English, International conference proceedings, We study an alignment called a packing alignment that is an alignment for sequences of various length events like musical notes. One event in a packing alignment can have a number of consecutive opposing events unless the total length of them exceeds the length of that one event. Instead of using a score function that depends on event length, which was studied by Mongeau and Sankoff [5], packing alignment deals with event lengths explicitly using a simple score function. This makes the problem clearer as an optimization problem. Packing alignment can be calculated efficiently using dynamic programming. As an application of packing alignment, we conducted experiments on frequent approximate pattern extraction from MIDI files of famous musical variations. The patterns and occurrences extracted from the variations using packing alignment have more appropriate boundaries than those using conventional string alignments from the viewpoints of the repetition structure of the variations.
  • Efficient construction and usefulness of hyper-rectangle greedy covers
    Koji Ouchi, Atsuyoshi Nakamura, Mineichi Kudo
    Proceedings - 2011 IEEE International Conference on Granular Computing, GrC 2011, 533, 538, IEEE, 2011, [Peer-reviewed]
    English, International conference proceedings, We discuss efficient construction and usefulness of greedy covers of positive instances by axis-parallel rectangles that exclude negative instances. A rectangle greedy cover is expected to be a simple classification rule with high readability because the number of its component rectangles is expected to be small and it can be seen as a disjunctive normal form, which is one of the most readable representations. We develop efficient construction methods of rectangle greedy covers by making use of algorithms for efficient maximal frequent itemsets. We empirically demonstrate that, for high dimensional datasets, the method of finding a component rectangle one by one without enumerating candidate covering component sets is faster than the method with independent greedy covering process after the enumeration. In our classification experiments using 10 datasets in UCI repository, rectangle greedy covers (RGC) were shown to have classification performance comparable to the randomized subclass method (RSM) developed in [1], which is a conventional classification method using rectangles, though RGC used significantly smaller number of rectangles. The performance of RGC was also shown to be comparable to that of popular classifiers such as logistic regression and SVM. The DNF-representation of the classification rules obtained by RGC was demonstrated to be simpler than that obtained by RSM and C4.5 through our experiment. © 2011 IEEE.
  • A practical comparison of edit distance approximation algorithms
    Hiroyuki Hanada, Atsuyoshi Nakamura, Mineichi Kudo
    Proceedings - 2011 IEEE International Conference on Granular Computing, GrC 2011, 231, 236, IEEE, 2011, [Peer-reviewed]
    English, International conference proceedings, The edit distance is a basic string similarity measure for many applications such as string searching, text mining, signal processing, bioinformatics and so on. However, its high computational cost often prevents it from being used for a large set of strings like similar string searching. A promising solution for the problem is to approximate the edit distance with low computational cost. However, although there are many methods for approximating the edit distance, most of them are analyzed only theoretically. In fact, most of the methods can evaluate the edit distance only in terms of order notations, and do not conduct any experiment. This is a large obstacle for applying them to real applications. In this study we will first list up existing edit distance approxi-mation methods. Then we compare them by: (i) approximation performances shown by the original authors, (ii) approximation performances re-analyzed by us (concrete values instead of the order notations) and (iii) experimental performances by our implementations. © 2011 IEEE.
  • Special section on data mining and statistical science
    Masashi Sugiyama, Tomoyuki Higuchi, Tsuyoshi Ide, Akihiro Inokuchi, Toshihiro Kamishima, Hiroyuki Minami, Shinichi Nakajima, Atsuyoshi Nakamura, Koichi Shinoda, Koji Tsuda, Takashi Washio
    IEICE Transactions on Information and Systems, E93-D, 10, 2671, Oct. 2010, [Peer-reviewed]
    Scientific journal
  • Algorithms for Finding a Minimum Repetition Representation of a String
    Atsuyoshi Nakamura, Tomoya Saito, Ichigaku Takigawa, Hiroshi Mamitsuka, Mineichi Kudo
    STRING PROCESSING AND INFORMATION RETRIEVAL, 6393, 185, +, SPRINGER-VERLAG BERLIN, 2010, [Peer-reviewed], [Lead author]
    English, International conference proceedings, A string with many repetitions can be written compactly by replacing h-fold contiguous repetitions of substring r with (r)h. We refer to such a compact representation as a repetition representation string or RRS, by which a set of disjoint or nested tandem arrays can be compacted. In this paper, we study the problem of finding a minimum RRS or MRRS, where the size of an RRS is defined to be the sum of its component letter sizes and the sizes needed to describe the repetitions (.)h which are defined as w(R)(h) using a repetition weight function w(R). We develop two dynamic programming algorithms to solve the problem. One is CMR that works for any repetition weight function, and the other is CMR-C that is faster but can be applied only when the repetition weight function is constant. CMR-C is an O(w(n + z))-time algorithm using O(n + z) space for a given string with length n, where w and z are the number of distinct primitive tandem repeats and the number of their occurrences, respectively. Since w = O(n) and z = O(n log n) in the worst case, CMR-C is an O(n(2) log n)-time O(n log n)-space algorithm, which is faster than OMR by ((log n)/n)-factor.
  • Algorithms for Adversarial Bandit Problems with Multiple Plays
    Taishi Uchiya, Atsuyoshi Nakamura, Mineichi Kudo
    ALGORITHMIC LEARNING THEORY, ALT 2010, 6331, 375, 389, SPRINGER-VERLAG BERLIN, 2010, [Peer-reviewed], [Corresponding author]
    English, International conference proceedings, Adversarial bandit problems studied by Auer et al. [4] are multi-armed bandit problems in which no stochastic assumption is made on the nature of the process generating the rewards for actions. In this paper, we extend their theories to the case where k(>= 1) distinct actions are selected at each time step. As algorithms to solve our problem, we analyze an extension of Exp3 [4] and an application of a bandit online linear optimization algorithm [1] in addition to two existing algorithms (Exp3, ComBand [5]) in terms of time and space efficiency and the regret for the best fixed action set. The extension of Exp3, called Exp3. M, performs best with respect to all the measures: it runs in O(K (log k + 1) time and O(K) space, and suffers at most O(root kTK log (K/k)) regret, where K is the number of possible actions and T is the number of iterations. The upper bound of the regret we proved for Exp3. M is an extension of that proved by Auer et al. for Exp3.
  • Convex sets as prototypes for classifying patterns
    Ichigaku Takigawa, Mineichi Kudo, Atsuyoshi Nakamura
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 22, 1, 101, 108, PERGAMON-ELSEVIER SCIENCE LTD, Feb. 2009, [Peer-reviewed]
    English, Scientific journal, We propose a general framework for nonparametric classification of multi-dimensional numerical patterns. Given training points for each class, it builds a set cover with convex sets each of which contains some training points of the class but no points of the other classes. Each convex set has thus an associated class label, and classification of a query point is made to the class of the convex set such that the projection of the query point onto its boundary is minimal. In this sense, the convex sets of a class are regarded as "prototypes" for that class. We then apply this framework to two special types of convex sets, minimum enclosing balls and convex hulls, giving algorithms for constructing a set cover with them and for computing the projection length onto their boundaries. For convex hulls, we also give a method for implicitly evaluating whether a point is contained in a convex hull, which can avoid computational difficulty for explicit construction of convex hulls in high-dimensional space. (C) 2008 Elsevier Ltd. All rights reserved.
  • Comparison of Bagging and Boosting Algorithms on Sample and Feature Weighting
    Satoshi Shirai, Mineichi Kudo, Atsuyoshi Nakamura
    MULTIPLE CLASSIFIER SYSTEMS, PROCEEDINGS, 5519, 22, 31, SPRINGER-VERLAG BERLIN, 2009, [Peer-reviewed]
    English, International conference proceedings, We compared boosting with bagging in different strengths of learning algorithms for improving the performance of the set of classifiers to be fused. Our experimental results showed that boosting worked much with weak algorithms and bagging, especially feature-based bagging, worked much with strong algorithms. On the basis of these observations we developed a mixed fusion method in which randomly chosen features are used with a standard boosting method. As a result, it was confirmed that the proposed fusion method worked well regardless of learning algorithms.
  • Classifier Selection in a Family of Polyhedron Classifiers
    Tetsuji Takahashi, Mineichi Kudo, Atsuyoshi Nakamura
    PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, PROCEEDINGS, 5856, 441, 448, SPRINGER-VERLAG BERLIN, 2009, [Peer-reviewed]
    English, International conference proceedings, We consider an algorithm to approximate each class region by a small number of convex hulls and to apply them to classification. The convex hull of a finite set of points is computationally hard to be constructed in high dimensionality. Therefore, instead of the exact convex hull, we find an approximate convex hull (a polyhedron) in a time complexity that is linear in dimension. On the other hand, the set of such convex hulls is often too much complicated for classification. Thus we control the complexity by adjusting the number of faces of convex hulls. For reducing the computational time, we use an upper bound of the leave-one-out estimated error to evaluate the classifiers.
  • Bagging, Random Subspace Method and Biding
    Satoshi Shirai, Mineichi Kudo, Atsuyoshi Nakamura
    STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, 5342, 801, 810, SPRINGER-VERLAG BERLIN, 2008, [Peer-reviewed]
    English, International conference proceedings, In recent years, many approaches for achieving high performance by combining some classifiers have been proposed. We exploit, many random replicates of samples in the bagging, and randomly chosen feature subsets in the random subspace method. In this paper, we introduce a method for selecting both samples and features at, the same, time and demonstrate the effectiveness of the method. This method includes a parametric bagging and a parametric random subspace method as special cases. lit some experiments, this method and the parametric random subspace method showed the best performance.
  • Classification by Bagged Consistent Itemset Rules
    Yohji Shidara, Mineichi Kudo, Atsuyoshi Nakamura
    19TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1-6, 571, 574, IEEE, 2008, [Peer-reviewed]
    English, International conference proceedings, Associative classifiers that utilize association rules have been widely studied. It has been shown that associative classifiers often outperform traditional classifiers. Associative classifiers usually find only rules with high support values, because reducing the minimum support to be satisfied increases computational cost. However, rules with low support but high confidence may contribute to classification. We have proposed an approach to build a classifier composed of almost all consistent (100% confident) rules. The proposed classifier was extended by introducing item reduction and bagging in order to relax the constraint of consistency, which resulted in slightly increased performance for 26 datasets from the UCI machine learning repository.
  • Classification by Reflective Convex Hulls
    Mineichi Kudo, Atsuyoshi Nakamura, Ichigaku Takigawa
    19TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1-6, 2260, 2263, IEEE, 2008, [Peer-reviewed]
    English, International conference proceedings, A set of convex bodies including samples of a single class only is used for classification. The convex body is defined by some facets (hyper-planes) that separate the class front the other classes. This paper describes ail algorithm to find a set of such convex bodies efficiently and examine the performance of a classifier using them. The relationship to the support vector machines is also discussed.
  • What Sperner Family Concept Class Is Easy to Be Enumerated?
    Atsuyoshi Nakamura, Mineichi Kudo
    ICDM 2008: EIGHTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 482, 491, IEEE COMPUTER SOC, 2008, [Peer-reviewed], [Lead author]
    English, International conference proceedings, We study the problem of enumerating concepts in a Sperner family concept class using subconcept queries [7], which is a general problem including maximal frequent itemset mining as its instance. Though even the theoretically best known algorithm needs quasi-polynomial time to solve this problem in the worst case, there exist practically fast algorithms for this problem. This is because many instances of this problem in real world have low complexity in some measures. In this paper, we characterize the complexity of Sperner family concept class by the VC dimension of its intersection closure and its characteristic dimension, and analyze the worst case time complexity on the enumeration problem of its concepts in terms of the VC dimension. We also showed that the VC dimension of real data used in data mining is actually small by calculating the VC dimension of some real datasets using a new algorithm closely related to the introduced two measures, which does not only solve the problem but also let us know the VC dimension of the intersection closure of the target concept class.
  • Classification Based on Consistent Itemset Rules.
    Yohji Shidara, Mineichi Kudo, Atsuyoshi Nakamura
    Trans. MLDM, 1, 1, 17, 30, 2008, [Peer-reviewed]
  • CCIC: Consistent common itemsets classifier
    Yohji Shidara, Atsuyoshi Nakamura, Mineichi Kudo
    MACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION, PROCEEDINGS, 4571, 490, +, SPRINGER-VERLAG BERLIN, 2007, [Peer-reviewed]
    English, International conference proceedings, We propose a novel approach which extracts consistent (100% confident) rules and builds a classifier with them. Recently, associative classifiers which utilize association rules have been widely studied. Indeed, the associative classifiers often outperform the traditional classifiers. In this case, it is important to collect high quality (association) rules. Many algorithms find only high support rules, because decreasing the minimum support to be satisfied is computationally demanding. However, it may be effective to collect low support but high confidence rules. Therefore, we propose an algorithm that produces a wide variety of 100% confident rules including low support rules. To achieve this goal, we adopt a specific-to-general rule searching strategy, in contrast to the previous many approaches. Our experimental results show that the proposed method achieves higher accuracies in several datasets taken from UCI machine learning repository.
  • Mining subtrees with frequent occurrence of similar subtrees
    Hisashi Tosaka, Atsuyoshi Nakamura, Mineichi Kudo
    DISCOVERY SCIENCE, PROCEEDINGS, 4755, 286, +, SPRINGER-VERLAG BERLIN, 2007, [Peer-reviewed]
    English, International conference proceedings, We study a novel problem of mining subtrees with frequent occurrence of similar subtrees, and propose an algorithm for this problem. In our problem setting, frequency of a subtree is counted not only for equivalent subtrees but also for similar subtrees. According to our experiment using tag trees of web pages, this problem can be solved fast enough for practical use. An encouraging result was obtained in a preliminary experiment for data record extraction from web pages using our mining method.
  • Specific-purpose web searches on the basis of structure and contents
    M Kudo, A Nakamura
    FEDERATION OVER THE WEB, 3847, 79, 96, SPRINGER-VERLAG BERLIN, 2006, [Peer-reviewed], [Corresponding author]
    English, Scientific journal, We introduce methods for two specific-purpose Web searches. One is a search for Web communities related to given keywords, and the other is a search for texts having a certain relation to given keywords. Our methods are based on both structure and contents of WWW. Our method of Web community search uses global structure of WWW to discover communities, and uses content information to label found communities, where global structure means Web graph composed of Web pages and hyperlinks between them. On the other hand, our method of related text search uses local structure of WWW to extract candidate texts, and uses content information to filter out wrongly extracted ones, where local structure means DOM-tree structure of each page. We report the latest results on these Web search methods.
  • Learning-related complexity of linear ranking functions
    Atsuyoshi Nakamura
    ALGORITHMIC LEARNING THEORY, PROCEEDINGS, 4264, 378, 392, SPRINGER-VERLAG BERLIN, 2006, [Peer-reviewed], [Lead author]
    English, Scientific journal, In this paper, we study learning-related complexity of linear ranking functions from n-dimensional Euclidean space to {1, 2,..., k}. We show that their graph dimension, a kind of measure for PAC learning complexity in the multiclass classification setting, is Theta(n + k). This graph dimension is significantly smaller than the graph dimension Omega(nk) of the class of {1, 2,..., k}-valued decision-list functions naturally defined using k - 1 linear discrimination functions. We also show a risk bound of learning linear ranking functions in the ordinal regression setting by a technique similar to that used in the proof of an upper bound of their graph dimension.
  • Inner product spaces for Bayesian networks
    A Nakamura, M Schmitt, N Schmitt, HU Simon
    JOURNAL OF MACHINE LEARNING RESEARCH, 6, 1383, 1403, MICROTOME PUBLISHING, Sep. 2005, [Peer-reviewed]
    English, Scientific journal, Bayesian networks have become one of the major models used for statistical inference. We study the question whether the decisions computed by a Bayesian network can be represented within a low-dimensional inner product space. We focus on two-label classification tasks over the Boolean domain. As main results we establish upper and lower bounds on the dimension of the inner product space for Bayesian networks with an explicitly given (full or reduced) parameter collection. In particular, these bounds are tight up to a factor of 2. For some nontrivial cases of Bayesian networks we even determine the exact values of this dimension. We further consider logistic autoregressive Bayesian networks and show that every sufficiently expressive inner product space must have dimension at least Omega(n(2)), where n is the number of network nodes. We also derive the bound 2(Omega(n)) for an artificial variant of this network, thereby demonstrating the limits of our approach and raising an interesting open question. As a major technical contribution, this work reveals combinatorial and algebraic structures within Bayesian networks such that known methods for the derivation of lower bounds on the dimension of inner product spaces can be brought into play.
  • An efficient query learning algorithm for ordered binary decision diagrams
    A Nakamura
    INFORMATION AND COMPUTATION, 201, 2, 178, 198, ACADEMIC PRESS INC ELSEVIER SCIENCE, Sep. 2005, [Peer-reviewed], [Lead author]
    English, Scientific journal, In this paper, we propose a new algorithm that exactly learns ordered binary decision diagrams (OBDDs) with a given variable ordering via equivalence and membership queries. Our algorithm uses at most it equivalence queries and at most 2n([log(2) m] + 3n) membership queries. where it is the number of nodes in the target-reduced OBDD and m is the number of variables. The upper bound on the number of membership queries is smaller by a factor of O(m) compared with that for the previous best known algorithm proposed by R. Gavalda and D. Guijarro [Learning Ordered Binary Decision Diagrams, Proceedings of the 6th International Workshop on Algorithmic Learning Theory, 1995, pp. 228 238]. (c) 2005 Elsevier Inc. All rights reserved.
  • Extraction of generalized rules with automated attribute abstraction
    Y Shidara, M Kudo, A Nakamura
    Foundations of Data Mining and Knowledge Discovery, 6, 161, 170, SPRINGER, 2005, [Peer-reviewed]
    English, International conference proceedings, We propose a novel method for mining generalized rules with high support and confidence. Using our method, we can obtain generalized rules in which the abstraction of attribute values is implicitly carried out without the requirement of additional information such as information on conceptual hierarchies. Our experimental results showed that the obtained rules not only have high support and confidence but also have expressions that are conceptually meaningful.
  • Partitioning of Web graphs by community topology.
    Hidehiko Ino, Mineichi Kudo, Atsuyoshi Nakamura
    Proceedings of the 14th international conference on World Wide Web, WWW 2005, Chiba, Japan, May 10-14, 2005, 661, 669, ACM, 2005, [Peer-reviewed], [Corresponding author]
  • Mining frequent trees with node-inclusion constraints
    A Nakamura, M Kudo
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 3518, 850, 860, SPRINGER-VERLAG BERLIN, 2005, [Peer-reviewed], [Lead author]
    English, Scientific journal, In this paper, we propose an efficient algorithm enumerating all frequent subtrees containing all special nodes that are guaranteed to be included in all trees belonging to a given data. Our algorithm is a modification of TreeMiner algorithm [10] so as to efficiently generate only candidate subtrees satisfying our constraints. We report mining results obtained by applying our algorithm to the problem of finding frequent structures containing the name and reputation of given restaurants in Web pages collected by a search engine.
  • The convex subclass method: Combinatorial classifier based on a family of convex sets
    Takigawa, I, M Kudo, A Nakamura
    MACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION, PROCEEDINDS, 3587, 90, 99, SPRINGER-VERLAG BERLIN, 2005, [Peer-reviewed]
    English, Scientific journal, We propose a new nonparametric classification framework for numerical patterns, which can also be exploitable for exploratory data analysis. The key idea is approximating each class region by a family of convex geometric sets which can cover samples of the target class without containing any samples of other classes. According to this framework, we consider a combinatorial classifier based on a family of spheres, each of which is the minimum covering sphere for a subset of positive samples and does not contain any negative samples. We also present a polynomial-time exact algorithm and an incremental randomized algorithm to compute it. In addition, we discuss the soft-classification version and evaluate these algorithms by some numerical experiments.
  • Empirical study on usefulness of algorithm SACwRApper for reputation extraction from the WWW
    H Hasegawa, M Kudo, A Nakamura
    KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 4, PROCEEDINGS, 3684, 668, 674, SPRINGER-VERLAG BERLIN, 2005, [Peer-reviewed]
    English, Scientific journal, We consider the problem of extracting texts related to a given keyword from Web pages collected by a search engine. Recently, we proposed a method using both structural and content information[1, 2]. In our previous paper, we reported good extraction performance of our method only for Ramen-shop dataset written in Japanese. In this paper, we examine it for datasets of other kind of restaurants, and also for a dataset written in English. We discuss some modification for performance improvement.
  • A comparative study of algorithms for finding Web communities
    Hidehiko Ino, Mineichi Kudo, Atsuyoshi Nakamura
    Proceedings - International Workshop on Biomedical Data Engineering, BMDE2005, 2005, 1257, 2005, [Peer-reviewed]
    English, International conference proceedings, Recently, researches on extraction of densely connected subgraphs, which are called communities, from the graph representing link structure in WWW, are very popular. However, few methods guarantee that extracted subgraphs satisfy community conditions which are strictly defined. In this paper, we consider the problem of extracting subgraphs that strictly satisfy the community conditions proposed in [3]. It is known that finding all such communities is computationally hard. As methods that possibly find many communities efficiently, we experimentally compared two methods, a method with a Gomory-Hu tree construction and a method with calculating edge-betweenness. We also proposed evaluation criterion for ranking found communities. © 2005 IEEE.
  • Improvements to the linear programming based scheduling of web advertisements
    Atsuyoshi Nakamura, Naoki Abe
    Electronic Commerce Research, 5, 1, 75, 98, Jan. 2005, [Peer-reviewed], [Lead author]
    English, International conference proceedings, We propose and evaluate a number of improvements to the linear programming formulation of web advertisement scheduling, which we have proposed elsewhere together with our colleagues [Langheinrich et al., 9]. In particular, we address a couple of important technical challenges having to do with the estimation of click-through rates and optimization of display probabilities (the exploration-exploitation trade-off and the issue of data sparseness and sealability). as well as practical aspects that are essential for successful deployment of this approach (the issues of multi-impressions and inventory management). We propose solutions to each of these issues, and assess their effectiveness by running large-scale simulation experiments. © 2005 Springer Science + Business Media. Inc.
  • On the minimum l(1)-norm signal recovery in underdetermined source separation
    Takigawa, I, M Kudo, A Nakamura, J Toyama
    INDEPENDENT COMPONENT ANALYSIS AND BLIND SIGNAL SEPARATION, 3195, 193, 200, SPRINGER-VERLAG BERLIN, 2004, [Peer-reviewed]
    English, Scientific journal, This paper studied the minimum l(1)-norm signal recovery in underdetermined source separation, which is a problem of separating n sources blindly from m linear mixtures for n. > m. Based on our previous result of submatrix representation and decision regions, we describe the property of the minimum l(1)-norm sequence from the viewpoint of source separation, and discuss how to construct it geometrically from the observed sequence and the mixing matrix, and the unstability for a perturbation of mixing matrix.
  • Bayesian networks and inner product spaces
    A Nakamura, M Schmitt, N Schmitt, HU Simon
    LEARNING THEORY, PROCEEDINGS, 3120, 518, 533, SPRINGER-VERLAG BERLIN, 2004, [Peer-reviewed]
    English, Scientific journal, In connection with two-label classification tasks over the Boolean domain, we consider the possibility to combine the key advantages of Bayesian networks and of kernel-based learning systems. This leads us to the basic question whether the class of decision functions induced by a given Bayesian network can be represented within a low-dimensional inner product space. For Bayesian networks with an explicitly given (full or reduced) parameter collection, we show that the "natural" inner product space has the smallest possible dimension up to factor 2 (even up to an additive term 1 in many cases). For a slight modification of the so-called logistic autoregressive Bayesian network with n nodes, we show that every sufficiently expressive inner product space has dimension at least 2(n/4). The main technical contribution of our work consists in uncovering combinatorial and algebraic structures within Bayesian networks such that known techniques for proving lower bounds on the dimension of inner product spaces can be brought into play.
  • Person identification with environment information
    M Kudo, T Hosokawa, J Toyama, H Tenmoto, A Nakamura
    8TH WORLD MULTI-CONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL V, PROCEEDINGS, 65, 68, INT INST INFORMATICS & SYSTEMICS, 2004, [Peer-reviewed]
    English, International conference proceedings, In the past decades, person identification has been gathering a great deal of attention in a wide rage of applications. Among those techniques, many aim at achieving a high identification rate using several biometrics evidences such as fingerprint, iris, and finger vein. However these evidences require user's cooperation and are suitable for a "hard" identification in which any kind of impersonation is not allowed. In this study, we consider a "soft" identification, in which the system does require user-cooperation, at the expense of some degree of accuracy. On the contrary, the cost for gathering evidences becomes cheap and some weak evidences are used together The assumed applications are electronic devices on daily life. Through a soft identification, a device can be personalized to a user and the system provides some user-specific functions to the user In this study, a Bayesian network approach is investigated its potential for this goal. In addition, environmental evidence such as day and time are also incorporated to the system.
  • Collaborative filtering using restoration operators
    A Nakamura, M Kudo, A Tanaka
    KNOWLEDGE DISCOVERY IN DATABASES: PKDD 2003, PROCEEDINGS, 2838, 339, 349, SPRINGER-VERLAG BERLIN, 2003, [Peer-reviewed], [Lead author]
    English, Scientific journal, We propose a new collaborative filtering method that uses restoration operators. The problem of restoration by operators was originally studied in the field of digital image restoration [9]. We also consider the problem of selecting items that users should be asked to rate in order to achieve a small expected squared error, and we propose a greedy method as a solution of this problem. According to our experimental results, prediction performance of restoration operators is good when the number of observed ratings is small, and our greedy method outperforms random query item selection.
  • Collaborative filtering using projective restoration operators
    A Nakamura, M Kudo, A Tanaka, K Tanabe
    DISCOVERY SCIENCE, PROCEEDINGS, 2843, 393, 401, SPRINGER-VERLAG BERLIN, 2003, [Peer-reviewed], [Lead author]
    English, Scientific journal, We propose a modified version of our collaborative filtering method using restoration operators, which was proposed in [6]. Our previous method was designed so as to minimize expected squared error of predictions for user's ratings, and we experimentally showed that, for users who have evaluated only small number of items, mean squared error of our method is smaller than that of correlation-base methods. After further experiments, however, we found that, for users who have evaluated many items, the best correlation-based method has smaller mean squared error than our method. In our modified version, we incorporated an idea of projecting on a low-dimensional subspace with our method using restoration operators. We experimentally showed that our modification overcame the shortcoming stated above.
  • Online learning of binary and n-ary relations over clustered domains
    A Nakamura, N Abe
    JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 65, 2, 224, 256, ACADEMIC PRESS INC ELSEVIER SCIENCE, Sep. 2002, [Peer-reviewed], [Lead author]
    English, Scientific journal, We consider the online learning problem for binary relations defined over two finite sets, each clustered into a relatively small number k, I of types (such a relation is termed a (k, l)-binary relation), extending the models of S. Goldman, R. Rivest, and R. Schapire (1993, SIAM J. Comput. 22, 1006-1034). We investigate the learning complexity of (k, l)-binary relations with respect to both the self-directed and adversary-directed learning models. We also generalize this problem to the learning problem for (k(l),...,k(d))-d-ary relations. In the self-directed model, we exhibit an efficient learning algorithm which makes at most kl + (n - k)log k + (m - l)log l mistakes, where n and in are the number of rows and columns, roughly twice the lower bound we show for this problem, '[log k] [log 1] + (1)/(2)(n - k) [log k] + (1)/(2)(m - 1) [log l]. In the adversary-directed model, we exhibit an efficient algorithm for the (2, 2)-binary relations, which makes at most n + m + 2 mistakes, only two more than the lower bound we show for this problem, n + m. As for (k(l),..., k(d))-d-ary relations, we obtain lower bounds and upper bounds on the number of mistakes in the self-directed model, teacher-directed model, and adversary-directed model. Finally we show that, although the sample consistency problem for (2, 2)-binary relations is solvable in polynomial time, the same problem for (2,2,2)-ternary relations is already NP-complete. (C) 2002 Elsevier Science (USA).
  • Improvements in practical aspects of optimally scheduling web advertising
    Atsuyoshi Nakamura
    Proceedings of the 11th International Conference on World Wide Web, WWW '02, 536, 541, ACM, 2002, [Peer-reviewed], [Lead author]
    English, International conference proceedings, We addressed two issues concerning the practical aspects of optimally scheduling web advertising proposed by Langheinrich et al. [5], which scheduling maximizes the total number of click-throughs for all banner advertisements. One is the problem of multi-impressions in which two or more banner ads are impressed at the same time. The other is inventory management, which is important in order to prevent over-selling and maximize revenue. We propose efficient methods which deal with these two issues.
  • Query learning of bounded-width OBDDs
    A Nakamura
    THEORETICAL COMPUTER SCIENCE, 241, 1-2, 83, 114, ELSEVIER SCIENCE BV, Jun. 2000, [Peer-reviewed], [Lead author]
    English, Scientific journal, In this paper, we study exact learnability of bounded-width ordered binary decision diagrams (OBDDs) when no ordering of the variables is given and learning is conducted by way of equivalence queries and membership queries. We present a learning algorithm for width-2 OBDDs, an algorithm which uses O(n(3)) equivalence queries alone, where n is the number of variables. We also present a learning algorithm for width-2 OBDDs that uses O(n) proper equivalence queries and O(n(2)) membership queries. Further, we show a negative result: that there are no polynomial-time algorithms capable of learning width-3 OBDDs from proper equivalence queries alone. (C) 2000 Elsevier Science B.V. All rights reserved.
  • Automatic recording agent for digital video server.
    Atsuyoshi Nakamura, Naoki Abe, Hiroshi Matoba, Katsuhiro Ochiai
    Proceedings of the 8th ACM International Conference on Multimedia 2000, Los Angeles, CA, USA, October 30 - November 3, 2000., 57, 66, ACM, 2000, [Peer-reviewed], [Lead author]
  • Unintrusive customization techniques for Web advertising
    M Langheinrich, A Nakamura, N Abe, T Kamba, Y Koseki
    COMPUTER NETWORKS-THE INTERNATIONAL JOURNAL OF COMPUTER AND TELECOMMUNICATIONS NETWORKING, 31, 11-16, 1259, 1272, ELSEVIER SCIENCE BV, May 1999, [Peer-reviewed]
    English, Scientific journal, Most online advertisement systems in place today use the concept of consumer targeting: each user is identified and, according to his or her system setup, browsing habits and available off-line information, categorized in order to customize the advertisements for highest user responsiveness. This constant monitoring of a user's online habits, together with the trend to centralize this data and link it with other databases, continuously nurtures fears about the growing lack of privacy in a networked society. In this paper, we propose a novel technique of adapting online advertisement to a user's short term interests in a non-intrusive way. As a proof-of-concept we implemented a dynamic advertisement selection system able to deliver customized advertisements to users of an online search service or Web directory. No user-specific data elements are collected or stored at any time. Initial experiments indicate that the system is able to improve the average click-through rate substantially compared to random selection methods. (C) 1999 Published by Elsevier Science B.V. All rights reserved.
  • Learning to optimally schedule internet banner advertisements
    N Abe, A Nakamura
    MACHINE LEARNING, PROCEEDINGS, 12, 21, MORGAN KAUFMANN PUB INC, 1999, [Peer-reviewed]
    English, International conference proceedings, We have developed a method which learns to schedule internet banner advertisements so as to maximize the average click-through rate, while adhering to the requirements imposed by contracts with the advertisers such as a minimum guaranteed number of impressions. We focus on the problem of adaptively scheduling advertisement display probabilities as a function of a single attribute such as a search keyword. Our learning algorithm is based on an efficient solution of a special class of linear programming problems called the 'transportation problem,' and also embodies a number of measures to address the exploration-exploitation trade-off and an efficient attribute clustering method to help reduce the dimensionality. Our experimental results verify the advantage of our linear programming based approach, as well as the effect of various additional measures we incorporate into our method.
  • Learning Specialist Decision Lists.
    Atsuyoshi Nakamura
    Proceedings of the Twelfth Annual Conference on Computational Learning Theory, COLT 1999, Santa Cruz, CA, USA, July 7-9, 1999, 215, 225, ACM, 1999, [Peer-reviewed], [Lead author]
  • Collaborative Filtering Using Weighted Majority Prediction Algorithms.
    Atsuyoshi Nakamura, Naoki Abe
    Proceedings of the Fifteenth International Conference on Machine Learning (ICML 1998), Madison, Wisconsin, USA, July 24-27, 1998, 395, 403, Morgan Kaufmann, 1998, [Peer-reviewed], [Lead author]
  • Empirical comparison of competing query learning methods
    N Abe, H Mamitsuka, A Nakamura
    DISCOVERY SCIENCE, 1532, 387, 388, SPRINGER-VERLAG BERLIN, 1998, [Peer-reviewed]
    English, Scientific journal
  • Efficient distribution-free population learning of simple concepts
    A Nakamura, J Takeuchi, N Abe
    ANNALS OF MATHEMATICS AND ARTIFICIAL INTELLIGENCE, 23, 1-2, 53, 82, BALTZER SCI PUBL BV, 1998, [Peer-reviewed], [Lead author]
    English, Scientific journal, We consider a variant of the 'population learning model' proposed by Kearns and Seung [8], in which the learner is required to be 'distribution-free' as well as computationally efficient. A population learner receives as input hypotheses from a large population of agents and produces as output its final hypothesis. Each agent is assumed to independently obtain labeled sample for the target concept and output a hypothesis. A polynomial time population learner is said to PAC-learn a concept class, if its hypothesis is probably approximately correct whenever the population size exceeds a certain bound which is polynomial, even if the sample size for each agent is fixed at some constant. We exhibit some general population learning strategies, and some simple concept classes that can be learned by them. These strategies include the 'supremum hypothesis finder', the 'minimum superset finder' (a special case of the 'supremum hypothesis finder'), and various voting schemes. When coupled with appropriate agent algorithms, these strategies can learn a variety of simple concept classes, such as the 'high-low game', conjunctions, axis-parallel rectangles and others. We give upper bounds on the required population size for each of these cases, and show that these systems can be used to obtain a speed up from the ordinary PAC-learning model [11], with appropriate choices of sample and population sizes. With the population learner restricted to be a voting scheme, what we have is effectively a model of 'population prediction', in which the learner is to predict the value of the target concept at an arbitrarily drawn point, as a threshold function of the predictions made by its agents on the same point. We show that the population learning model is strictly more powerful than the population prediction model. Finally, we consider a variant of this model with classification noise, and exhibit a population learner for the class of conjunctions in this model.
  • An efficient exact learning algorithm for ordered binary decision diagrams
    A Nakamura
    ALGORITHMIC LEARNING THEORY, 1316, 307, 322, SPRINGER-VERLAG BERLIN, 1997, [Peer-reviewed], [Lead author]
    English, Scientific journal, In this paper, we propose a new algorithm which exactly learns ordered binary decision diagrams (OBDDs) with a given variable ordering via equivalence queries and membership queries. Our algorithm uses at most n equivalence queries and at most 2n([log(2) m] + 3n) membership queries, where n is the number of nodes in the target reduced OBDD and m is the number of variables. We have reduced the number of membership queries by a factor of m compared with the best known algorithm for this problem due to Gavalda and Guijarro.
  • Query Learning of Bounded-Width OBDDs.
    Atsuyoshi Nakamura
    Algorithmic Learning Theory, 7th International Workshop, ALT '96, Sydney, Australia, October 23-25, 1996, Proceedings, 37, 50, Springer, 1996, [Peer-reviewed], [Lead author]
  • On-line Learning of Binary Lexical Relations Using Two-dimensional Weighted Majority Algorithms
    Naoki Abe, Hang Li, Atsuyoshi Nakamura
    CoRR, abs/cmp-lg/9507010, 1995, [Peer-reviewed]
  • On-line Learning of Binary Lexical Relations Using Two-dimensional Weighted Majority Algorithms.
    Naoki Abe, Hang Li, Atsuyoshi Nakamura
    Machine Learning, Proceedings of the Twelfth International Conference on Machine Learning, Tahoe City, California, USA, July 9-12, 1995, 3, 11, Morgan Kaufmann, 1995, [Peer-reviewed]
  • On-line Learning of Binary and -ary Relations over Multi-dimensional Clusters.
    Atsuyoshi Nakamura, Naoki Abe
    Proceedings of the Eigth Annual Conference on Computational Learning Theory, COLT 1995, Santa Cruz, California, USA, July 5-8, 1995, 214, 221, ACM, 1995, [Peer-reviewed], [Lead author]
  • Learning Sparse Linear Combinations of Basis Functions over a Finite Domain.
    Atsuyoshi Nakamura, Shinji Miura
    Algorithmic Learning Theory, 6th International Conference, ALT '95, Fukuoka, Japan, October 18-20, 1995, Proceedings, 138, 150, Springer, 1995, [Peer-reviewed], [Lead author]
  • EXACT LEARNING OF LINEAR-COMBINATIONS OF MONOTONE TERMS FROM FUNCTION VALUE QUERIES
    A NAKAMURA, N ABE
    THEORETICAL COMPUTER SCIENCE, 137, 1, 159, 176, ELSEVIER SCIENCE BV, Jan. 1995, [Peer-reviewed], [Lead author]
    English, Scientific journal, We investigate the problem of exactly identifying a real-valued function of {0, 1}(n) by a weighted sum of a number of monotone terms by querying for the values of the target function at assignments of the learner's choice. When all coefficients are nonnegative, we exhibit an efficient learning algorithm requiring at most (n-[log s]+1)s queries, where n is the number of variables and s is the number of terms in the target formula. We prove a lower bound of Omega(ns/log s) on the number of queries necessary for learning this class, so no algorithm can reduce the number of queries dramatically. The algorithm runs in time O (ns(2)) in the worst case. The same algorithm can be used to learn the 'inductive-read-k' subclass, a proper superclass of the 'read-k' subclass, with a number of queries not exceeding 1/2((n-[log k])(n-[log k]+1)+2)k, which improves upon the bound achievable by a naive learning algorithm by a factor of two. In addition, the above method can be extended to handle the nonmonotone case in some restricted sense: A similar algorithm can learn the trnate linear combinations of terms with a comparable number of queries. In the general case, namely, when the coefficients vary over the reals (or any arbitrary field), we show that the number of queries required for exact learning of the k-term subclass is upper bounded by q(n,[log k]+1) and is lower bounded by q(n,[log k]), where q(n,l)=Sigma(i=0)(l)((n)(i)). These bounds are shown by generalizing Roth and Benedik's technique for analyzing the learning problem for k-sparse multivariate polynomials over GF(2) (Roth and Benedek, 1991) to those over an arbitrary field.
  • Efficient Distribution-free Population Learning of Simple Concepts.
    Atsuyoshi Nakamura, Naoki Abe, Jun-ichi Takeuchi
    Algorithmic Learning Theory, 4th International Workshop on Analogical and Inductive Inference, AII '94, 5th International Workshop on Algorithmic Learning Theory, ALT '94, Reinhardsbrunn Castle, Germany, October 10-15, 1994, Proceedings, 500, 515, Springer, 1994, [Peer-reviewed], [Lead author]
  • Exact Learning of Linear Combinations of Monotone Terms from Function Value Queries.
    Atsuyoshi Nakamura, Naoki Abe
    Algorithmic Learning Theory, 4th International Workshop, ALT '93, Tokyo, Japan, November 8-10, 1993, Proceedings, 300, 313, Springer, 1993, [Peer-reviewed], [Lead author]
  • Empirical comparison of competing query learning methods               
    Naoki Abe, Hiroshi Mamitsuka, Atsuyoshi Nakamura
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 1532, 387, 388, Springer Verlag, 1988
    English, International conference proceedings

Other Activities and Achievements

  • 単調増加制約のあるレベルセット推定
    田畑 公次, 中村 篤祥, 高見 亮佑, Joshua Arenson, 和田 弥生, Walker Peterson, 合田 圭介, 園下 将大, 小松崎 民樹, 人工知能学会研究会資料 人工知能基本問題研究会, 124, 25, 30, 06 Mar. 2023
    一般社団法人 人工知能学会, Japanese
  • 与えられたデータに無矛盾なコンパクトな多出力二分決定グラフの質問学習
    中村 篤祥, 人工知能学会研究会資料 人工知能基本問題研究会, 123, 31, 36, 05 Jan. 2023
    一般社団法人 人工知能学会, Japanese
  • Estimation of Propagation Graph by Pairwise Alignment of Time Series Data
    林 達也, 中村 篤祥, 人工知能基本問題研究会, 114, 1, 6, 20 Nov. 2020
    人工知能学会, Japanese
  • Effect and Application of Branching Condition Sharing of Decision Forests
    NAKAMURA Atsuyoshi, SAKURADA Kento, Proceedings of the Annual Conference of JSAI, 2020, 2I1GS201, 2I1GS201, 2020

    Min_DBN is an algorithm that minimizes the number of distinct branching conditions in a decision forest by sharing variable's thresholds assigned to different branching nodes under the condition that decision paths of given feature vectors must not be changed more than a specified percentage at each branching node. This simplification is effective for compact hardware implimentation of a decision forest with comparator sharing. In this paper, we experimentally investigate how effective the algorithm is for decision forests constructed by various ensemble learning methods (random forest, extremly random trees, AdaBoost, gradient boosting) in classification and regression. Futhermore, we improve the algorithm by using the feature vectors only used in a bagging-based learning for applying its path condition to each component tree of the ensumbles, and check its effectiveness experimentally.

    , The Japanese Society for Artificial Intelligence, Japanese
  • A Fast Approximate Algorithm for k-Median Problem on a Graph               
    Keisuke Todo, Atsuyoshi Nakamura and Mineichi Kudo, Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG), Aug. 2019, [Peer-reviewed]
    Summary international conference
  • Reduction on the Number of Distinct Branch Nodes of a Random Forest Classifier
    櫻田 健斗, 中村 篤祥, 人工知能基本問題研究会, 109, 62, 67, 13 Mar. 2019
    人工知能学会, Japanese
  • バンディット問題の方策を用いたモンテカルロ木探索による最適属性集合探索
    中村篤祥, ペリシエ オレリアン, 田畑公次, 小松崎民樹, 日本細胞生物学会大会(Web), 71st, ROMBUNNO.1SEp‐07 (WEB ONLY), 2019
    Japanese
  • Algorithms for Checking Existence of a Bad Arm Utilizing Asymmetry
    田畑公次, 中村篤祥, 小松崎民樹, 電子情報通信学会技術研究報告, 118, 284(IBISML2018 44-104)(Web), 353‐360 (WEB ONLY), 29 Oct. 2018
    Japanese
  • Bad Arm Existence Checking Algorithms with Small Sample Complexity
    田畑 公次, 中村 篤祥, 本多 淳也, 小松崎 民樹, 人工知能基本問題研究会, 106, 94, 99, 16 Mar. 2018
    人工知能学会, Japanese
  • Feature selection as Monte-Carlo Search in Growing Single Rooted Directed Acyclic Graph by Best Leaf Identification.
    Aurelien Pelissier, Atsuyoshi Nakamura, Koji Tabata, CoRR, abs/1811.07531, 2018
  • 音響的に自然なつなぎ目の発見による楽曲ループ検出
    安井拓未, 中村篤祥, 中村篤祥, 田中章, 田中章, 工藤峰一, 工藤峰一, 情報処理学会研究報告(Web), 2017, MUS-116, Vol.2017‐MUS‐116,No.15,1‐4 (WEB ONLY), 17 Aug. 2017
    Japanese
  • Algorithms for Bad Arm Existence Checking Problem
    中村 篤祥, 電子情報通信学会技術研究報告 = IEICE technical report : 信学技報, 117, 110, 49, 54, 23 Jun. 2017
    電子情報通信学会, Japanese
  • Enumeration of Interspersed Repeats from the Human Genome
    中村 篤祥, 人工知能基本問題研究会, 103, 45, 50, 13 Mar. 2017
    人工知能学会, Japanese
  • Extended KL-UCB Policy for the Budgeted Multi-Armed Bandits
    渡辺 僚, 中村 篤祥, 工藤 峰一, 電子情報通信学会技術研究報告 = IEICE technical report : 信学技報, 115, 323, 167, 174, 26 Nov. 2015
    電子情報通信学会, Japanese
  • Noise Free Multi-Armed Bandit Game (特集 「人工知能・機械学習」および一般)
    Helmbold David P, Nakamura Atsuyoshi, Warmuth Manfred K, 人工知能基本問題研究会, 98, 16, 21, 07 Aug. 2015
    人工知能学会, English
  • A Bridge between Hedge and Exp3 Algorithms
    中村 篤祥, 電子情報通信学会技術研究報告 = IEICE technical report : 信学技報, 115, 112, 81, 86, 23 Jun. 2015
    電子情報通信学会, Japanese
  • An Extension of UCB to the Stochastic Multi-armed Bandits with Action-dependent Processing Time
    渡辺 僚, 中村 篤祥, 工藤 峰一, 人工知能基本問題研究会, 96, 29, 34, 13 Jan. 2015
    人工知能学会, Japanese
  • A Study on UCB-type Collaborative Filtering
    中村 篤祥, 人工知能基本問題研究会, 93, 45, 48, 07 Mar. 2014
    人工知能学会, Japanese
  • An Algorithm for Influence Maximization Problem on a Two-Terminal Series Pararell Graph
    田畑 公次, 中村 篤祥, 工藤 峰一, 人工知能基本問題研究会, 92, 35, 40, 30 Jan. 2014
    人工知能学会, Japanese
  • A New UCB-based Algorithm for the Matching-Selection Multi-armed Bandit Problem
    渡辺 僚, 中村 篤祥, 工藤 峰一, 電子情報通信学会技術研究報告 = IEICE technical report : 信学技報, 113, 198, 9, 16, 03 Sep. 2013
    電子情報通信学会, Japanese
  • Balanced Binary Search Tree with Constant-Time Insertions at the End
    花田 博幸, 中村 篤祥, 工藤 峰一, 人工知能基本問題研究会, 89, 25, 30, 28 Feb. 2013
    人工知能学会, Japanese
  • An Efficient Algorithm for Multi-Armed Bandit Problems with Matching Selection
    渡邊 僚, 中村 篤祥, 工藤 峰一, 人工知能基本問題研究会, 86, 0, 29, 34, 09 Aug. 2012
    人工知能学会, Japanese
  • Fast Approximation Algorithm for the 1-Median Problem
    田畑 公次, 中村 篤祥, 工藤 峰一, 人工知能基本問題研究会, 86, 0, 77, 82, 09 Aug. 2012
    人工知能学会, Japanese
  • Finding Approximate Frequent Patterns from DNA Sequences
    中村 篤祥, 瀧川 一学, 戸坂 央, 人工知能基本問題研究会, 85, 23, 28, 02 Feb. 2012
    人工知能学会, English
  • Fast Algorithm for Finding a Graph Node with High Closeness Centrality
    TABATA Koji, NAKAMURA Atsuyoshi, KUDO Mineichi, IEICE technical report. Theoretical foundations of Computing, 111, 256, 7, 14, 14 Oct. 2011
    The Closeness Centrality is one of centrality measures of a node in a graph. It is calculated as the reciprocal of the sum of distances to all other nodes. In this paper, we propose a fast approximate algorithm that finds the node to maximize its closeness centrality in an undirected graph. Given a node v, the algorithm repeatedly find a node with higher closeness centrality by making use of a shortest path tree of the previous node. According to out experiment, our algorithm can find a node with almost maximum closeness centrality with high probability., The Institute of Electronics, Information and Communication Engineers, Japanese
  • A note on Capped Hedge algorithm
    中村 篤祥, 人工知能基本問題研究会, 82, 1, 6, 04 Aug. 2011
    人工知能学会, Japanese
  • Efficient Implementation of Greedy Cover Learning by Hyper-Rectangles and Its Classification Performance Evaluation Using Real Data
    OUCHI Koji, NAKAMURA Atsuyoshi, KUDO Mineichi, IEICE technical report, 110, 265, 99, 104, 28 Oct. 2010
    Blumer et al. showed that the class of concepts represented by finite unions of hyper-rectangles in d-dimensional Euclidean space is polynomial-time PAC learnable for a fixed natural number d. In their proof, they constructed an algorithm which conducts a greedy covering of a given positive instances by hyper-rectangles that never cover any one of given negative instances. In this paper, we discuss the efficient implementation of their algorithm. According to our experimental results on n-class classification problems using UCI datasets, Blumer's covering algorithm, in most cases, outputs a..., The Institute of Electronics, Information and Communication Engineers, Japanese
  • Adversarial Bandit Problems with Multiple Plays
    UCHIYA Taishi, NAKAMURA Atsuyoshi, KUDO Mineichi, IEICE technical report. Theoretical foundations of Computing, 109, 195, 13, 20, 07 Sep. 2009
    Adversarial bandit problems studied by Auer et al. are the multi-armed bandit problems in which no stochastic assumption is made on the nature of the process generating the rewards for actions. In this paper, we extend their theories to those in the case when k(≧1) actions are selected at each time step. We analyze the problems in two settings of multiple plays, that is, in the settings of selections with and without repetition. In the both settings, we generalize the Exp3 family algorithms developed by them and give the generalized upper bounds on the regrets for the best fixed set of acti..., The Institute of Electronics, Information and Communication Engineers, English
  • On effect of balancing investment in nonstochastic multi-armed bandit problems
    UCHIYA Taishi, NAKAMURA Atsuyoshi, KUDO Mineichi, Technical report of IEICE. PRMU, 108, 363, 213, 218, 11 Dec. 2008
    The multiarmed bandit problem is a problem in which a gambler chooses one arm of K nonidentical slot machines to play in a sequence of trials so as to maximize his reward. Past solutions for the bandit problem have almost always relied on assumptions about the statistics of the slot machines. On the other hand, Auer et al. made no statistical assumption whatsoever about the nature of the process generating the payoffs of the slot machine. They gave solutions to the bandit ploblem in which an adversary has complate control over the payoffs. In this paper, we extend this problem to the proble..., The Institute of Electronics, Information and Communication Engineers, Japanese
  • Classifier selection in a family of polyhedron classifiers
    TAKAHASHI Tetsuji, KUDO Mineichi, NAKAMURA Atsuyoshi, Technical report of IEICE. PRMU, 108, 363, 37, 41, 11 Dec. 2008
    It is an efficient way to approximate a class region by convex hulls of samples. However, the convex hull is computationally hard to be constructed in high dimensions. In this paper, we propose a way to construct an approximate convex hull in a linear time of dimension. In addition, we discuss on selecting an optimal classifier from a family of polyhedron classifiers derived from approximate convex hulls., The Institute of Electronics, Information and Communication Engineers, Japanese
  • Packing Alignment and Its Application to Music Mining
    NAKAMURA Atsuyoshi, KUDO Mineichi, IEICE technical report. Theoretical foundations of Computing, 108, 237, 9, 16, 03 Oct. 2008
    We propose packing alignment as an alignment for sequences of lengthened symbols like musical notes. Furthermore, we consider sequences of symbols with length and end position as a model of a general musical piece in which notes can overlap, and we extend our packing alignment to that for such sequences. Using packing alignment, it might be possible that parts that you feel similar somehow are extracted automatically. According to our experiment using MIDI files of Bach's popular musical pieces, actually, parts that are non-trivially similar but are felt similar somehow were extracted., The Institute of Electronics, Information and Communication Engineers, English
  • Finding Subforest Patterns with Frequent Occurrence of Similar Subforests
    戸坂 央, 中村 篤祥, 工藤 峰一, 人工知能学会全国大会論文集, 22, 1, 4, 2008
    人工知能学会, Japanese
  • Complexity and Enumeration of Subclasses
    NAKAMURA Atsuyoshi, KUDO Mineichi, IEICE technical report. Theoretical foundations of Computing, 107, 219, 35, 42, 13 Sep. 2007
    Representing a complicated region by union of simple regions is often preferred in various study areas because each component simple region is easy to understand and easy to deal with. Kudo et al. [2], [4] considered the following problem called subclass problem: given positive and negative examples in multidimensional real space, enumerate all maximal sets of positive examples for which there exists a hyper-rectangle that contains the set and does not contain any negative example. However, a randomized method that can find as many important subclasses as possible has been used so far becau..., The Institute of Electronics, Information and Communication Engineers, Japanese
  • Finding of Frequent Similar Subtrees in Tree-Structured Data
    TOSAKA Hisashi, NAKAMURA Atsuyoshi, KUDO Mineichi, Technical report of IEICE. PRMU, 107, 115, 7, 12, 21 Jun. 2007
    We study a novel problem of mining subtrees with frequent occurrence of similar subtrees, and propose an efficient algorithm for this problem. According to our problem setting, frequency of a subtree is counted not only for equivalent subtrees but also for similar subtrees. Our experiment showed that our algorithm is enough fast for practical use. Preliminary experiment for data record extraction using our mining method also showed encouraging result., The Institute of Electronics, Information and Communication Engineers, Japanese
  • Finding of Frequent Similar Subtrees in Tree-Structured Data
    TOSAKA Hisashi, NAKAMURA Atsuyoshi, KUDO Mineichi, IEICE technical report. Data engineering, 107, 114, 7, 12, 21 Jun. 2007
    We study a novel problem of mining subtrees with frequent occurrence of similar subtrees, and propose an efficient algorithm for this problem. According to our problem setting, frequency of a subtree is counted not only for equivalent subtrees but also for similar subtrees. Our experiment showed that our algorithm is enough fast for practical use. Preliminary experiment for data record extraction using our mining method also showed encouraging result., The Institute of Electronics, Information and Communication Engineers, Japanese
  • 不特定サイトからのキーワード関連情報の抽出 (テーマ:特集「ウェブデータの知的処理」および一般)
    中村 篤祥, 人工知能基本問題研究会, 63, 0, 33, 38, 08 Sep. 2006
    人工知能学会, Japanese
  • LA_001 Algorithm for Minimizing Repetition Representation Trees
    Saito Tomoya, Nakamura Atsuyoshi, Kudo Mineichi, 情報科学技術レターズ, 5, 0, 1, 4, 21 Aug. 2006
    Forum on Information Technology, Japanese
  • 繰返し構造をもつラベル付順序木の簡潔な表現法(計算理論とアルゴリズムの新展開)
    斉藤 智哉, 中村 篤祥, 工藤 峰一, 数理解析研究所講究録, 1489, 0, 216, 222, May 2006
    京都大学, Japanese
  • How Easy to Learn Linear Ranking Functions
    NAKAMURA Atsuyoshi, IEICE technical report. Theoretical foundations of Computing, 105, 273, 49, 53, 08 Sep. 2005
    In this paper, we study how easy to learn the class of linear ranking functions from n-dimensional Euclidean space to {1, 2, …, k}. We show its graph dimension, which is considered to indicate how easy to learn, is Θ(n+k). This graph dimension is significantly smaller than the graph dimension Θ(nk) of the class of k-valued decision-list functions naturally defined using k - 1 linear discrimination functions., The Institute of Electronics, Information and Communication Engineers, English
  • ランキング関数のオンライン学習について (計算機科学基礎理論とその応用)
    中村 篤祥, 数理解析研究所講究録, 1426, 0, 51, 56, Apr. 2005
    京都大学, Japanese
  • On NK-Community Problem (Theoretical Computer Science and its Applications)
    Nakamura Atsuyoshi, Shigezumi Takeya, Yamamoto Masaki, RIMS Kokyuroku, 1426, 0, 71, 77, Apr. 2005
    Kyoto University, English
  • Detection of Wrong Characters by Probability Transitional Patterns of Two-Directional N-gram Probabilities
    KAWATA Takehiro, KUDO Mineichi, TOYAMA Jun, NAKAMURA Atsuyoshi, The transactions of the Institute of Electronics, Information and Communication Engineers. D-II, 88, 3, 629, 635, 01 Mar. 2005
    OCRなどを通して得られる日本語文の認識結果において, N-gram確率を利用した高速な誤認識文字検出法を提案する.日本語のように単語が分かち書きされず大規模な語彙を対象とした場合, 誤り個所の指摘に文字N-gramは有効な方法である.本論文ではまず, 通常のN-gram確率の拡張として両方向N-gram確率を提案し, その有効性を情報量の点から考察する.次に, 両方向N-gram確率と文脈確率を用いて1文字の誤字を検出する方法を提案する.シミュレーション実験では, 適合率80%において従来法よりも10%以上高い約75%の再現率を達成できた.また, 誤り範囲の指摘という点では, 適合率80%で再現率90%が達成された., The Institute of Electronics, Information and Communication Engineers, Japanese
  • Mining Frequent Trees with Node-Inclusion Constraints
    NAKAMURA Atsuyoshi, KUDO Mineichi, IPSJ SIG Notes, 2004, 101, 67, 74, 14 Oct. 2004
    In this paper, we propose an efficient algorithm enumerating all frequent subtrees containing all special nodes that are guaranteed to be included in all trees belonging to a given data. Our algorithm is a modification of TreeMiner algorithm [9] so as to efficiently generate only candidate subtrees satisfying our constraints. We also propose a space saving method for a set of trees with a lot of nodes having the same label. We report mining results obtained by applying our algorithm to the problem of finding frequent structures containing the name and reputation of given restaurants in Web ..., Information Processing Society of Japan (IPSJ), English
  • Mining Frequent Trees with Node-Inclusion Constraints
    NAKAMURA Atsuyoshi, KUDO Mineichi, IEICE technical report. Theoretical foundations of Computing, 104, 339, 7, 14, 08 Oct. 2004
    In this paper, we propose an efficient algorithm enumerating all frequent subtrees containing all special nodes that are guaranteed to be included in all trees belonging to a given data. Our algorithm is a modification of TreeMiner algorithm [9] so as to efficiently generate only candidate subtrees satisfying our constraints. We also propose a space saving method for a set of trees with a lot of nodes having the same label. We report mining results obtained by applying our algorithm to the problem of finding frequent structures containing the name and reputation of given restaurants in Web ..., The Institute of Electronics, Information and Communication Engineers, English
  • Detection of Wrong Character Using Probability Transitional Patterns of Both-Direction N-gram Probabilities
    KAWATA Takehiro, NAKAMURA Atsuyoshi, TOYAMA Jun, KUDO Mineichi, Technical report of IEICE. PRMU, 103, 295, 1, 5, 01 Sep. 2003
    The method of detecting the wrong characters and correction is proposed to improve recognition precision in character recognition. We pay attention to a technique in which N-gram probabilities are used, and propose a method which detects misspelled characters by both direction N-gram probabilities. Context probability is also introduced to be incorporated with estimated probabilities transitional patterns in both direction N-gram probabilities and context probabilities at the misspelling point. Moreover, we discuss the causes why some misspelled characters are not detected in this method. I..., The Institute of Electronics, Information and Communication Engineers, Japanese
  • Relationship between Rule Accuracy and a Degree of Interest
    SHIDARA Yohji, NAKAMURA Atsuyoshi, KUDO Mineichi, Technical report of IEICE. PRMU, 103, 295, 7, 11, 01 Sep. 2003
    The methods which extract rules from database are useful for mining because of their human-readable output. Various methods have been proposed which construct some accurate classifiers and extract the rules from these classifiers. However, these accurate classifiers do not always provide the interesting rules. We propose the rule-mining method based on abstraction of attribute values and screening technique to extract rules with high readability. We also discuss the relationship between the discrimination accuracy and the degree of interest of the rules, The Institute of Electronics, Information and Communication Engineers, Japanese
  • Automatic Recommendation(Data/Text Mining)
    Nakamura Atsuyoshi, 応用数理, 12, 4, 401, 410, 25 Dec. 2002
    We explain automatic recommendation techniques by which computer systems can automatically select goods or documents preferred by each user based on his/her buying or accessing history. Especially, we focus on the collaborative filtering method using weighted majority prediction algorithm developed by the authors. From both theoretical and experimental points of view, we explain how excellent the method is compared to the method using correlation coefficient, which is the most popular collaborative filtering method. We also address issues that we should consider in order to make our method ..., The Japan Society for Industrial and Applied Mathematics, Japanese
  • 自動リコメンデーション
    中村 篤祥, 応用数理, 12, 4, 401, 410, 25 Dec. 2002
    日本応用数理学会, Japanese
  • Targeting Techniques for Web Advertising
    LANGHEINRICH Marc, KAMBA Tomonari, NAKAMURA Atsuyoshi, IPSJ Magazine, 40, 8, 807, 812, 15 Aug. 1999
    Information Processing Society of Japan (IPSJ), Japanese
  • 0.特集「能動学習」の編集にあたって (<特集>能動学習)
    中村 篤祥, 情報処理, 38, 7, 557, 588, 15 Jul. 1997
    一般社団法人情報処理学会, Japanese
  • Introduction to Active Learning
    ABE Naoki, NAKAMURA Atsuyoshi, IPSJ Magazine, 38, 7, 558, 561, 15 Jul. 1997
    Information Processing Society of Japan (IPSJ), Japanese
  • Research on Active Learning in Computational Learning Theory
    ABE Naoki, NAKAMURA Atsuyoshi, IPSJ Magazine, 38, 7, 575, 582, 15 Jul. 1997
    Information Processing Society of Japan (IPSJ), Japanese
  • Learning personal preference functions using boolean-variable real-valued multivariate polynomials
    Nakamura Atsuyoshi, Mamitsuka Hiroshi, Toba Hiroyasu, Abe Naoki, 全国大会講演論文集, 第52回平成8年前期, 1, 55, 56, 06 Mar. 1996
    ニュース記事などに対する個人の嗜好を、記事中の出現単語から予測する方法において、各単語にlつのブール変数を割り当て、その実数係数多項式で嗜好関数を表現する方法を提案し、その係数の学習アルゴリズムについて考察及び実験を行う。特に、計算論的学習理論において盛んに研究されているオンライン学習における重みの逐次更新法を、実数係数の学習に適用する。具体的には、Kivinen & Warmuthが提案・解析した加法的更新法GDと乗法的更新法EG^±に加え、新しく「誤差比例修正法」と呼ぶ重み更新アルゴリズムを提案し、その乗法的更新法であるDPMUについて、ある被験者の実データを用いて実験的に予測性能を評価・比較する。実験結果によればDPMUは、既存の方式と同等以上の予測性能を有する。, Information Processing Society of Japan (IPSJ), Japanese
  • On-line Learning of Binary Lexical Relations Using Two-dimensional Weighted Majority Algorithms
    Naoki Abe, Hang Li, Atsuyoshi Nakamura, Proc. of The 12th Int. Conf. on Machine Learning, 1995, 24 Jul. 1995
    We consider the problem of learning a certain type of lexical semantic

    knowledge that can be expressed as a binary relation between words, such as the

    so-called sub-categorization of verbs (a verb-noun relation) and the compound

    noun phrase relation (a noun-noun relation). Specifically, we view this problem

    as an on-line learning problem in the sense of Littlestone's learning model in

    which the learner's goal is to minimize the total number of prediction

    mistakes. In the computational learning theory literature, Goldman, Rivest and

    Schapire and subsequently Goldman and Warmuth have consider..., Technical report
  • Learning d-nary Relations
    Nakamura Atsuyoshi, IEICE technical report. Theoretical foundations of Computing, 94, 181, 93, 102, 25 Jul. 1994
    We study the learning problem in which the learner predicts the value of each entry of d-dimensional{0,1}-valued matrix one by one. According to the agent who selects the next prediction entry,we consider this problem in two different models,self-directed and adversary-directed.In self-directed model,in which the learner selects entries,we show an efficient algorithm which makes at most Π^d_j=1>k_j+Σ^d_j=1>(n_j-k_j)log k_j mistakes when the target m atrix has n_j elements for its j-th direction and those elements are classified into at most k_j types by the values of matrix entries.In adver..., The Institute of Electronics, Information and Communication Engineers, Japanese
  • PAC learning of planar rectangles of arbitrary orientaion
    Nakamura Atsuyoshi, IEICE technical report. Theoretical foundations of Computing, 93, 249, 53, 58, 24 Sep. 1993
    We investigate PAC learnability of planar rectangles of arbitrary orientation.We show an algorithm which outputs a hypothesis rectangle consistent with given m examples in time Ο(ml ogm).Since the class of planar rectangles of arbitrary oprientation has finite VC dimension,by the famous theorem of Blumer et al.,this algorithm is a distribution-free PAC learning algorithm whose sample complexity is Ο(1, εlog1/(εδ))for the ac curacy parameter ε and the confidence parameter δ.We also show th at the algorithm which outputs the minimum area rectangle containing all positive examples,which is kno..., The Institute of Electronics, Information and Communication Engineers, Japanese
  • Learning Multidimensional Euclidean Resions Representable by DNF.
    Nakamura Atsuyoshi, 全国大会講演論文集, 第46回平成5年前期, 3, 35, 36, 01 Mar. 1993
    n次元実数領域[0,1)^n上の部分領域を、[0,1)^n上の一様分布に従って得られる点及びその点が部分領域の内外どちらかという情報から部分領域を学習する問題を扱う。但し、部分領域は次の条件を満たすものに制限する。([0,1)^n上の点を(x_1,x_2,・・・,x_n)とする。)1)原子論理式が(x_i&isins;R)という形のDNF(積和標準形)の論理式で表現される。2)Rは区間[0,1)を高々k(固定)個の領域に分割するような領域∪^l_<i=1>[a_i,b_i)とする。3)各x_iは論理式の中で高々1度しか現れない。[2]では、分布を一様分布に固定したPAC(Probably Approximately Correct)学習モデルにおいてμDNF(各変数が高々1度しか現れないDNF)プール式で表現されるプール関数が学習可能であることが示されている。本稿は上記2)の制約の下で彼らの結果の定義域を{0,1}^nから[0,1}^nに拡張したものである。, Information Processing Society of Japan (IPSJ), Japanese

Books and other publications

Courses

  • 計算基礎特論               
    大学院情報科学研究科コンピュータサイエンス専攻
  • 統計学               
    全学教育
  • 情報理論               
    情報エレクトロニクス学科
  • アルゴリズムとデータ構造               
    情報エレクトロニクス学科コンピュータサイエンスコース

Affiliated academic society

  • THE JAPANESE SOCIETY FOR ARTIFICIAL INTELLIGENCE               
  • THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS               
  • INFORMATION PROCESSING SOCIETY OF JAPAN               

Research Themes

  • 漸近的最適かつ実用的な純粋探索バンディット方策の開発
    科学研究費助成事業
    01 Apr. 2024 - 31 Mar. 2029
    中村 篤祥, 田畑 公次, 畑埜 晃平, 寺本 央
    日本学術振興会, 基盤研究(A), 北海道大学, 24H00685
  • Online Desision Making Methods for Various Problems and Criteria
    Grants-in-Aid for Scientific Research
    01 Apr. 2022 - 31 Mar. 2026
    畑埜 晃平, Hashima Sherief, 瀧本 英二, 中村 篤祥
    Japan Society for the Promotion of Science, Grant-in-Aid for Scientific Research (B), Kyushu University, 23K24905
  • MARE: Recognition and Discovery Techniques for Giving Awarenesss
    Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)
    01 Apr. 2019 - 31 Mar. 2024
    工藤 峰一, 今井 英幸, 中村 篤祥
    今期は、不頻出事象の予測を抽象化したロングテール問題について基礎的検討を重ねた。ロングテールの特徴づけとして、極度にインバランス(サンプル数がクラスによって大きく異なる)であること、マルチラベル分類が多いこと、から、マルチラベル分類とインバランス問題の解決に注力した。さらに、実世界の不頻出事象に関連して、希少疾患予測の検討および独居老人の異変行動シミュレーションを行った。詳細を以下に示す。


    ・インバランスが顕著なマルチクラス問題を扱うため、2種類のクラス決定木(葉が一つクラスである決定木)を検討した。クラス集合対としてバランスをとるよう木を構成することでサンプル数の少ないクラスの検出精度を高めることに成功した。さらに、サンプルの少ないクラスが「次元の呪い」を受けやすいことを考慮して、特徴選択を少なくするようにクラス決定木を構成することで少数クラスのみならず全体の性能を向上させられることを示した。解釈可能性も大きく改善した。
    ・希少疾患の診断において、1)医者の”希少疾患見過ごし”を防ぐために可視化手法を利用することを提案した、さらに、2)特徴選択により希少疾患の診断精度を向上させられることを確認した。
    ・不頻出事象の予測に関して、極値論とサポートベクターマシンを利用した未知クラスの発見手法を提案し、これまでの手法よりよい性能を達成した。
    ・独居老人の異変検知手法を開発する前段階として、異変を含む仮想住人の行動シミュレータを開発した。高齢者を実際のスマートホームに住まわせ、転倒などの異変が起きるのを待つわけにはいかないため、シミュレータは必須である。また、長期間に渡るセンサデータの採取や多様な生活行動を扱う上でも有効である。残念ながらこれまでの方式で複数の異変を適切に扱えるものがなかったため、確率モデルを利用したシミュレータを新たに開発した。
    Japan Society for the Promotion of Science, Grant-in-Aid for Scientific Research (B), Hokkaido University, 19H04128
  • バンディット問題の方策の実用化のための理論の深化
    Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)
    01 Apr. 2019 - 31 Mar. 2023
    中村 篤祥, 田畑 公次, 工藤 峰一
    昨年度から始めた分類バンディット問題のアルゴリズムの開発において、k-腕設定のトンプ ソンサンプリングをベースとしたアルゴリズム、および実験的な性能評価を国際学会のワークショップで発表した。更に指定した信頼度で正しく判定するためのサンプル数の理論的評価を始めた。連続腕設定に拡張した問題においては、ガウス過程を事前分布とする方式において、全腕の正負(報酬の期待値が閾値以上か未満か)を答える既存法より少ないサンプル数の上界が得られるアルゴリズムを開発した。また、正腕の割合の推定値から分類結果を予測し、その予測が正しい場合に分類の精度が上がる方の腕を引く方式を考案し、有効性を実験により確かめた。ラマン分光による細胞診断の実データにも適用し、グリット分割によるk腕設定のトンプソンサンプリングより、ガウス過程を仮定した連続腕設定の提案方式の方が少ないサンプル数(計測数)で正しく判定できることを確認した。
    小ノイズ敵対的バンディットの研究においては、以前の成果である「ノイズフリーバンディット 問題」を小さなノイズを許す問題に拡張し、それに有効なアルゴリズムを開発することを目指している。ノイズフ リー条件(1回も誤らない腕が存在するという条件)を「誤る回数が高々k回の腕が存在する」という条件に緩めた問題定式化で研究を進めている。昨年度に引き続き、その問題設定 の下、腕の数が3本以上でk>2の条件でも動作するもっと一般的なアルゴリズムの開発を進めた。
    精度効率保証大規模探索の研究においては、昨年度に引き続き属性選択アルゴリズム[Aurelien, Nakamura, Tabata 2019]のアルゴリズムの探索木の拡張法の検討を行った。
    Japan Society for the Promotion of Science, Grant-in-Aid for Scientific Research (B), Hokkaido University, 19H04161
  • Development of Data-driven Mathematical Analysis for Single Cells and Singularity Cells
    Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research on Innovative Areas (Research in a proposed research area)
    29 Jun. 2018 - 31 Mar. 2023
    小松崎 民樹, 中村 篤祥, 小野 峻佑
    ある細胞と別の細胞の主従関係を推定する場合,それら2つの細胞の軌跡データなどを用いて評価されているが,2つの細胞の振る舞いを決める因子が,その2つの細胞以外にも,第3の細胞が介在する状況なども考えられる。主従関係や因果関係における“原因”と“結果”を解析するためには,単純な対の組み合わせで表現できない多体の相互作用から成り立っている。そのため,多体のあいだの因果関係を推定することは要素間の組み合わせの数が膨大になる。2対の軌跡データを変数とする移動エントロピーを細分化した新しい相乗情報量に着眼し,改良Vicsekモデルに基づいてこの問題を考察し、移動エントロピーに含まれる相乗情報量の振る舞いを解析することで,要素間の多体の相互作用が推定できることを見いだした(Sci Adv 2022)。
    昨年度開発した、細胞毎の発火状態時系列からの伝搬時差推定をギャップを挿入する文字列のアライメントで行い伝搬グラフを構築する手法を発展させ、DTW(Dynamic Time Warping)ベースの実数値列アライメントを用いて伝搬グラフを構築する方法を開発した。位置情報を用いないで伝搬関係を推定できるため、細胞間の信号の伝搬のみでなく、株価の他の株価への影響の伝搬関係など曖昧な伝搬関係の抽出にも使えることを株価データを用いて確認した。
    シンギュラリティ成分検出の前処理としてイメージングデータからノイズや外乱を取り除くための正則化手法について研究を進めた。具体的には、イメージングデータに内在するスパース性を引き出す変換を構成し、スパース性を評価するノルムと組み合わせて正則化関数を設計した。加えて、この正則化を含む最適化問題としてノイズ・外乱除去問題を定式化し、これを解くアルゴリズムを開発した。最後に、実際のハイパースペクトルイメージングデータを用いた大規模な比較実験によって有効性を実証した。
    Japan Society for the Promotion of Science, Grant-in-Aid for Scientific Research on Innovative Areas (Research in a proposed research area), Hokkaido University, 18H05413
  • eXtFS:Feature Selection and Exploration in extremly large multi-label classification problems
    Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)
    01 Apr. 2015 - 31 Mar. 2018
    Kudo Mineichi
    We have dealt with "multi-label classification" problems where an object is assigned multiple labels. This study aimed at raising the classification performance and speeding up without degradation of performance. Our achievement is three of the following. First, we have pointed out the importance of the correlation between labels and showed several ways using it. Second, to keep a realistic processing time, we showed that the problem division of samples on the basis of their features or labels in some experimental results. Last, we pointed out the necessity of a special treatment on labels that appear rarely or have been forgotten to assign.
    Japan Society for the Promotion of Science, Grant-in-Aid for Scientific Research (B), Hokkaido University, 15H02719
  • Analysis of Repetition Structure in Huge Sequences
    Grants-in-Aid for Scientific Research
    2013 - 2015
    Nakamura Atsuyoshi, KUDO Mineichi, TAKIGAWA Ichigaku, MAMITSUKA Hiroshi, KIDA Takuya, OKUBO Yoshiaki
    We developed an algorithm for enumerating frequent approximate string patterns, and proposed a method of extracting occurrence regions of the enumerated patterns as a method of extracting interspersed repetitive elements in a huge sequence like a DNA sequence. Patterns of proposed methods have occurrences of clear boundaries, so there is little chance to count essentially the same region more than once. Furthermore, our enumeration algorithm runs very fast and with small memory. According to our empirical results using human chromosome 21, a half of the known Alu regions, which are famous interspersed repetitive elements, is extracted as occurrence regions of 100 representative patterns that were selected from enumerated frequent approximate patterns.
    Japan Society for the Promotion of Science, Grant-in-Aid for Scientific Research (B), Hokkaido University, Principal investigator, Competitive research funding, 25280079
  • A Study on Data-Dependent Complexities
    Grants-in-Aid for Scientific Research(基盤研究(C))
    2009 - 2011
    Atsuyoshi NAKAMURA, Mineichi KUDO, Jun TOYAMA
    As a data-dependent complexity measure, we proposed data-dependent VC dimension of Sperner family, and in hyper-rectangle subclass problem, we developed a fast algorithm for datasets with small such VC dimension. We also empirically showed efficiency of the algorithm using real data. Furthermore, as a complexity measure of contiguous repetition structure of a string, we proposed the size of a minimum repetition representation string (MRRS) for a given string. We developed a fast algorithm for constructing an MRRS and analyzed the repetition structure of DNA sequences using the algorithm.
    Ministry of Education, Culture, Sports, Science and Technology, 基盤研究(C), 北海道大学, Principal investigator, Competitive research funding, 21500128
  • Feature Extractions and Recognition Methods for Concept Recognition and its Applications.
    Grants-in-Aid for Scientific Research(基盤研究(C))
    2006 - 2007
    Jun TOYAMA, 工藤 峰一, 中村 篤祥
    1) Classification of Features: The knowledge that is necessary for judgments are not numerical score like a real number but rough categories. A new concept "granularity" based on the rough categories was introduced. We interpreted discrete data as a grouping problem. Therefore an algorithm for discovering semi-optical answer for a grouping problem was proposed.2) Discovering Prototype: The features that is necessary for judgments are not all knowledge and experiments but some abstract data. From this point of view, an algorithm that discover typical prototype in enormous data was proposed. ...
    Ministry of Education, Culture, Sports, Science and Technology, 基盤研究(C), 北海道大学, Coinvestigator not use grants, Competitive research funding, 18500124
  • Development of Classification Systems for Large-Scale Pattern Recognition Problems and Its Application
    Grants-in-Aid for Scientific Research(基盤研究(B))
    2002 - 2005
    Mineichi KUDO, 林 裕樹, 天元 宏, 外山 淳, 今井 英幸, 村井 哲也, 田中 章, 中村 篤祥, 天元 宏, 林 裕樹
    In this study, we classified the scaling problems of pattern recognition tasks into three of the following. The results are shown below, respectively.1.Scaling problem as for data number : We have changed the problem setting from the problem to achieve as high (predictive) performance as possible to the problem to attain as least computational time as possible keeping no large damage in performance. Especially, 1)a fast k-nearest neighbor was invented, and 2)a one-pass algorithm for prototype acquisition was discussed, though they are still under progress.2.Scaling problem as for dimensiona...
    Ministry of Education, Culture, Sports, Science and Technology, 基盤研究(B), 北海道大学, Coinvestigator not use grants, Competitive research funding, 14380151

Industrial Property Rights

Educational Organization