NGUYEN DAI HAI (グエン ダイ ハイ)
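Faculty of Information Science and Technology, Division of Computer Science and Information Technology, Knowledge Software Science | Associate Professor
Last Updated: 2025/06/07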
■Basic Researcher Information
Researchmap Personal Page
Homepage URL
Researcher Number
- 50968401
J-Global ID
Research Fields
■Career
Career
Education
■Research Activities
Papers
- Wasserstein Gradient Flow over Variational Parameter Space for Variational Inference
Dai Hai Nguyen, Tetsuya Sakurai, Hiroshi Mamitsuka
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, vol. 258, pp. 1756–1764, April 2025, [Peer-reviewed], [First author, Corresponding author]
English, Research paper (international conference proceedings), 42086988
- Moreau-Yoshida variational transport: a general framework for solving regularized distributional optimization problems
Dai Hai Nguyen, Tetsuya Sakurai
Machine Learning, Springer Science and Business Media LLC, July 10, 2024, [Peer-reviewed], [First author, Corresponding author]
Research paper (scientific journal)
- Differentiable optimization layers enhance GNN-based mitosis detection
Zhang, Haishan, Nguyen, Dai Hai, Tsuda, Koji
Scientific Reports, vol. 13, no. 1, Nature Portfolio, September 2023, [Peer-reviewed]
English, Research paper (scientific journal), Automatic mitosis detection from video is an essential step in analyzing the proliferative behaviour of cells. In existing studies, a conventional object detector such as U-Net is combined with a link prediction algorithm to find correspondences between parent and daughter cells. However, these approaches do not take into account the biological constraint that a cell in one frame can correspond to at most two cells in the next frame. Our model, called GNN-DOL, enables mitosis detection by complementing a graph neural network (GNN) with a differentiable optimization layer (DOL) that implements this constraint. In time-lapse microscopy sequences of cells cultured under four different conditions, we observed that the layer substantially improved detection performance compared with GNN-based link prediction. Our results illustrate the importance of explicitly incorporating biological knowledge into deep learning models.
- Mirror variational transport: a particle-based algorithm for distributional optimization on constrained domains
Dai Hai Nguyen, Tetsuya Sakurai
Machine Learning, Springer Nature, August 2023, [Peer-reviewed], [First author, Corresponding author]
English, Research paper (scientific journal), We consider the problem of minimizing an objective functional that admits a variational form and is defined over probability distributions on a constrained domain, a setting that poses challenges for both theoretical analysis and algorithm design. Inspired by the mirror descent algorithm for constrained optimization, we propose an iterative particle-based algorithm, named Mirrored Variational Transport (mirrorVT), which extends the Variational Transport framework [7] to handle constrained domains. In each iteration, mirrorVT maps the particles to an unconstrained dual domain induced by a mirror map and then approximately performs Wasserstein gradient descent on the manifold of distributions defined over the dual space by pushing the particles. At the end of each iteration, the particles are mapped back to the original constrained domain. Through simulated experiments, we demonstrate the effectiveness of mirrorVT for minimizing functionals over probability distributions on simplex- and Euclidean ball-constrained domains. We also analyze its theoretical properties and characterize its convergence to the global minimum of the objective functional.
- On a linear fused Gromov-Wasserstein distance for graph structured data
Dai Hai Nguyen, Koji Tsuda
Pattern Recognition, vol. 138, article 109351, Elsevier BV, June 2023, [Peer-reviewed], [First author, Corresponding author]
English, Research paper (scientific journal)
- Generating reaction trees with cascaded variational autoencoders
Dai Hai Nguyen, Koji Tsuda
The Journal of Chemical Physics, vol. 156, no. 4, January 28, 2022, [Peer-reviewed], [First author, Corresponding author]
English, Research paper (scientific journal)
- Learning subtree pattern importance for Weisfeiler-Lehman based graph kernels
Nguyen, Dai Hai, Nguyen, Canh Hao, Mamitsuka, Hiroshi
Machine Learning, vol. 110, no. 7, pp. 1585–1607, June 2021, [Peer-reviewed], [First author, Corresponding author]
English, Research paper (scientific journal), Graphs are a common representation of relational data, which are ubiquitous in many domains such as molecules and biological and social networks. A popular approach to learning with graph-structured data is to use graph kernels, which measure the similarity between graphs and are plugged into a kernel machine such as a support vector machine. Weisfeiler-Lehman (WL) based graph kernels, which employ the WL labeling scheme to extract subtree patterns and perform node embedding, have been shown to perform well while being efficiently computable. However, one of the main drawbacks of a general kernel is the decoupling of kernel construction from the learning process. For molecular graphs, common kernels such as the WL subtree kernel, which are based on substructures of the molecules, treat all available substructures as equally important, which might not be suitable in practice. In this paper, we propose a method to learn the weights of subtree patterns in the framework of WWL kernels, the state-of-the-art method for graph classification (Togninalli et al., in: Advances in Neural Information Processing Systems, pp. 6439–6449, 2019). To overcome the computational cost on large-scale data sets, we present an efficient learning algorithm and also derive a generalization gap bound to show its convergence. Finally, through experiments on synthetic and real-world data sets, we demonstrate the effectiveness of the proposed method for learning the weights of subtree patterns. Funding information: D. H. N. has been supported in part by the Otsuka Toshimi scholarship and the JSPS Research Fellowship for Young Scientists (DC2) with KAKENHI [grant number 19J14714]. C. H. N. has been supported in part by MEXT KAKENHI [grant number 18K11434]. H. M. has been supported in part by JST ACCEL [grant number JPMJAC1503], MEXT KAKENHI [grant numbers 16H02868, 19H04169], FiDiPro by Tekes (currently Business Finland) and the AIPSE program of the Academy of Finland.
- Machine Learning for Metabolic Identification
Nguyen, Dai Hai, Nguyen, Canh Hao, Mamitsuka, Hiroshi
2021
- ADAPTIVE: leArning DAta-dePendenT, concIse molecular VEctors for fast, accurate metabolite identification from tandem mass spectra
Nguyen, Dai Hai, Nguyen, Canh Hao, Mamitsuka, Hiroshi
Bioinformatics, vol. 35, no. 14, pp. 164–172, July 2019, [Peer-reviewed], [First author, Corresponding author]
English, Research paper (scientific journal), Motivation: Metabolite identification is an important task in metabolomics for enhancing knowledge of biological systems. A number of machine learning-based methods have been proposed for this task; they predict the chemical structure of a given spectrum through an intermediate (chemical structure) representation called molecular fingerprints. They usually have two steps: (i) predicting fingerprints from spectra; (ii) searching for chemical compounds (in a database) corresponding to the predicted fingerprints. Fingerprints are feature vectors that are usually very large in order to cover all possible substructures and chemical properties, and are therefore heavily redundant, in the sense of containing many molecular (sub)structures irrelevant to the task, which limits predictive performance and slows prediction. Results: We propose ADAPTIVE, which has two parts: learning two mappings (i) from structures to molecular vectors and (ii) from spectra to molecular vectors. The first part learns molecular vectors for metabolites from given data so that they are consistent with both the spectra and the chemical structures of metabolites. In more detail, molecular vectors are generated by a model parameterized by a message passing neural network, and the parameters are estimated by maximizing the correlation between molecular vectors and the corresponding spectra in terms of the Hilbert-Schmidt Independence Criterion. Molecular vectors generated by this model are compact and, importantly, adaptive (specific) to both the given data and the task of metabolite identification. The second part uses input output kernel regression (IOKR), the current cutting-edge method for metabolite identification. We empirically confirmed the effectiveness of ADAPTIVE on benchmark data, where ADAPTIVE outperformed the original IOKR in both predictive performance and computational efficiency. Availability and implementation: The code will be available at http://www.bic.kyoto-u.ac.jp/pathway/tools/ADAPTIVE after acceptance of this article.
- Stroke order normalization for improving recognition of online handwritten mathematical expressions
Le, Anh Duc, Nguyen, Hai Dai, Indurkhya, Bipin, Nakagawa, Masaki
International Journal on Document Analysis and Recognition (IJDAR), vol. 22, no. 1, pp. 29–39, 2019, [Peer-reviewed]
English, Research paper (scientific journal)
- SIMPLE: Sparse Interaction Model over Peaks of moLEcules for fast, interpretable metabolite identification from tandem mass spectra
Nguyen, Dai Hai, Nguyen, Canh Hao, Mamitsuka, Hiroshi
Bioinformatics, vol. 34, no. 13, pp. 323–332, July 2018, [Peer-reviewed], [First author, Corresponding author]
English, Research paper (scientific journal)
- Recent advances and prospects of computational methods for metabolite identification: a review with emphasis on machine learning approaches
Nguyen, Dai Hai, Nguyen, Canh Hao, Mamitsuka, Hiroshi
Briefings in Bioinformatics, vol. 20, no. 6, pp. 2028–2043, 2018, [Peer-reviewed], [First author, Corresponding author]
English, Research paper (scientific journal), Motivation: Metabolomics involves the study of a great number of metabolites, which are small molecules present in biological systems. They serve many important functions, such as energy transport, signaling, acting as building blocks of cells, and inhibition/catalysis. Understanding the biochemical characteristics of metabolites is an essential part of metabolomics for expanding knowledge of biological systems. It is also key to the development of many applications in areas such as biotechnology, biomedicine and pharmaceuticals. However, metabolite identification remains challenging, with a huge number of potentially interesting but unknown metabolites. The standard method for identifying metabolites is based on mass spectrometry (MS) preceded by a separation technique. Over many decades, many techniques have been proposed for MS-based metabolite identification; they can be divided into four groups: mass spectra databases, in silico fragmentation, fragmentation trees and machine learning. In this review, we thoroughly survey currently available tools for metabolite identification, focusing on in silico fragmentation and machine learning-based approaches. We also give an in-depth discussion of advanced machine learning methods that could further improve this task.
- Recognition of online handwritten math symbols using deep neural networks
Hai Dai Nguyen, Anh Duc Le, Masaki Nakagawa
IEICE Transactions on Information and Systems, vol. 99, no. 12, pp. 3110–3118, December 2016, [Peer-reviewed], [First author, Corresponding author]
English, Research paper (scientific journal)
- Modified xy cut for re-ordering strokes of online handwritten mathematical expressions
Anh Duc Le, Hai Dai Nguyen, Bipin Indurkhya, Masaki Nakagawa
12th IAPR Workshop on Document Analysis Systems (DAS), pp. 233–238, IEEE Computer Society, April 2016, [Peer-reviewed]
English, Research paper (international conference proceedings)
- Deep neural networks for recognizing online handwritten mathematical symbols
Hai Dai Nguyen, Anh Duc Le, Masaki Nakagawa
IAPR Asian Conference on Pattern Recognition (ACPR), pp. 121–125, IEEE, November 2015, [Peer-reviewed], [First author, Corresponding author]
English, Research paper (international conference proceedings)
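The idea of weighting subtree patterns, from "Learning subtree pattern importance for Weisfeiler-Lehman based graph kernels", can be illustrated with a minimal sketch. It is a simplification under stated assumptions: it uses the plain WL subtree kernel (a weighted dot product of subtree-pattern counts) rather than the WWL framework the paper builds on, the per-pattern weights are set by hand instead of learned, and the helpers wl_subtree_features and weighted_wl_kernel are hypothetical names.

```python
from collections import Counter


def wl_subtree_features(adj, labels, iterations=2):
    """Count WL subtree patterns in one graph (hypothetical helper).

    adj:    dict mapping each node to a list of its neighbours
    labels: dict mapping each node to an initial label string
    Each pattern is identified by its full expanded label string, so counts
    from different graphs are comparable without a shared compression table.
    Assumes initial labels contain no '(', ',' or ')' characters.
    """
    current = dict(labels)
    counts = Counter(current.values())  # iteration 0: the original labels
    for _ in range(iterations):
        # New label = own label followed by the sorted multiset of neighbour labels.
        current = {
            v: current[v] + "(" + ",".join(sorted(current[u] for u in adj[v])) + ")"
            for v in adj
        }
        counts.update(current.values())
    return counts


def weighted_wl_kernel(f1, f2, weights=None):
    """Weighted dot product of two pattern-count vectors.

    weights maps a pattern to its importance; unlisted patterns get 1.0,
    which recovers the ordinary (unweighted) WL subtree kernel.
    """
    weights = weights or {}
    return sum(weights.get(p, 1.0) * f1[p] * f2[p] for p in f1.keys() & f2.keys())


# Usage: a path graph C-C-O with one WL iteration.
adj = {0: [1], 1: [0, 2], 2: [1]}
labels = {0: "C", 1: "C", 2: "O"}
f = wl_subtree_features(adj, labels, iterations=1)
print(weighted_wl_kernel(f, f))              # 8.0
print(weighted_wl_kernel(f, f, {"C": 0.0}))  # 4.0
```

Setting a weight to 0.0 removes a pattern from the similarity entirely; learning such per-pattern weights from data, instead of fixing them to 1.0, is the kind of importance the paper addresses.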
Books and Other Publications
- A Particle-Based Algorithm for Distributional Optimization on Constrained Domains via Variational Transport and Mirror Descent
Dai Hai Nguyen, Tetsuya Sakurai
arXiv preprint arXiv:2208.00587, August 2022, [Co-author]
- A generative model for molecule generation based on chemical reaction trees
Dai Hai Nguyen, Koji Tsuda
arXiv preprint arXiv:2106.03394, June 2021
- Creative Complex Systems
Nguyen, Dai Hai, Nguyen, Canh Hao, Mamitsuka, Hiroshi; chapter: Machine Learning for Metabolic Identification
Springer, Singapore, 2021, English, Scholarly book, Metabolic identification is an essential part of metabolomics for understanding the biochemical characteristics of metabolites, which are small molecules that serve important functions in biological systems. However, this field remains challenging, with many unknown metabolites in existence. Mass spectrometry (MS) is a common technology for analyzing such small molecules. Over recent decades, many methods have been proposed for MS-based metabolite identification, and machine learning has been key to recent progress. This chapter surveys computational methods for metabolic identification with a focus on machine learning and discusses potential improvements for this task.
- Semi-supervised learning of hierarchical representations of molecules using neural message passing
Dai Hai Nguyen, Kenta Oono, Shinichi Maeda
arXiv preprint arXiv:1711.10168, November 2017
Presentations
- Mirror Variational Transport: A Particle-based Algorithm for Distributional Optimization on Constrained Domain
Nguyen, Dai Hai
ECML PKDD 2023: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
September 18–22, 2023, Turin, We consider the problem of minimizing an objective functional that admits a variational form and is defined over probability distributions on a constrained domain, a setting that poses challenges for both theoretical analysis and algorithm design. We propose Mirror Variational Transport (mirrorVT), which uses a set of samples, or particles, to represent the approximating distribution and deterministically updates the particles to optimize the functional. To deal with the constrained domain, in each iteration mirrorVT maps the particles to an unconstrained dual domain induced by a mirror map, and then approximately performs Wasserstein gradient descent on the manifold of distributions defined over the dual space, updating each particle along a specified direction. At the end of each iteration, the particles are mapped back to the original constrained domain. Through experiments on synthetic and real-world data sets, we demonstrate the effectiveness of mirrorVT for distributional optimization on constrained domains. We also analyze its theoretical properties and characterize its convergence to the global minimum of the objective functional. [Invited talk]
- Learning Subtree Pattern Importance for Weisfeiler-Lehman based Graph Kernels
Nguyen, Dai Hai, Nguyen, Canh Hao, Mamitsuka, Hiroshi
ECML PKDD 2021: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, March 26, 2021, English, Oral presentation (invited/special)
March 26 - April 2, 2021, Bilbao, Graphs are a common representation of relational data, which are ubiquitous in many domains such as molecules and biological and social networks. A popular approach to learning with graph-structured data is to use graph kernels, which measure the similarity between graphs and are plugged into a kernel machine such as a support vector machine. Weisfeiler-Lehman (WL) based graph kernels, which employ the WL labeling scheme to extract subtree patterns and perform node embedding, have been shown to perform well while being efficiently computable. However, one of the main drawbacks of a general kernel is the decoupling of kernel construction from the learning process. For molecular graphs, common kernels such as the WL subtree kernel, which are based on substructures of the molecules, treat all available substructures as equally important, which might not be suitable in practice. In this paper, we propose a method to learn the weights of subtree patterns in the framework of WWL kernels, the state-of-the-art method for graph classification [14]. To overcome the computational cost on large-scale data sets, we present an efficient learning algorithm and also derive a generalization gap bound to show its convergence. Finally, through experiments on synthetic and real-world data sets, we demonstrate the effectiveness of the proposed method for learning the weights of subtree patterns. [International conference]
- ADAPTIVE: leArning DAta-dePendenT, concIse molecular VEctors for fast, accurate metabolite identification from tandem mass spectra
Nguyen, Dai Hai, Nguyen, Canh Hao, Mamitsuka, Hiroshi
27th International Conference on Intelligent Systems for Molecular Biology (ISMB/ECCB 2019), July 21, 2019, English, Oral presentation (general)
July 21–25, 2019, Basel, Metabolite identification is an important task in metabolomics for enhancing knowledge of biological systems. A number of machine learning-based methods have been proposed for this task; they predict the chemical structure of a given spectrum through an intermediate (chemical structure) representation called molecular fingerprints. They usually have two steps: (i) predicting fingerprints from spectra; (ii) searching for chemical compounds (in a database) corresponding to the predicted fingerprints. Fingerprints are feature vectors that are usually very large in order to cover all possible substructures and chemical properties, and are therefore heavily redundant, in the sense of containing many molecular (sub)structures irrelevant to the task, which limits predictive performance and slows prediction.
We propose ADAPTIVE, which has two parts: learning two mappings (i) from structures to molecular vectors and (ii) from spectra to molecular vectors. The first part learns molecular vectors for metabolites from given data so that they are consistent with both the spectra and the chemical structures of metabolites. In more detail, molecular vectors are generated by a model parameterized by a message passing neural network, and the parameters are estimated by maximizing the correlation between molecular vectors and the corresponding spectra in terms of the Hilbert-Schmidt Independence Criterion. Molecular vectors generated by this model are compact and, importantly, adaptive (specific) to both the given data and the task of metabolite identification. The second part uses input output kernel regression (IOKR), the current cutting-edge method for metabolite identification. We empirically confirmed the effectiveness of ADAPTIVE on benchmark data, where ADAPTIVE outperformed the original IOKR in both predictive performance and computational efficiency. [International conference]
- SIMPLE: Sparse Interaction Model over Peaks of moLEcules for fast, interpretable metabolite identification from tandem mass spectra
Nguyen, Dai Hai, Nguyen, Canh Hao, Mamitsuka, Hiroshi
26th International Conference on Intelligent Systems for Molecular Biology (ISMB 2018), July 6, 2018, English, Oral presentation (general)
July 6–8, 2018, Chicago, Recent success in metabolite identification from tandem mass spectra has been driven by machine learning, which has two stages: mapping mass spectra to molecular fingerprint vectors and then retrieving candidate molecules from a database. In the first stage, i.e. fingerprint prediction, spectrum peaks are features, and considering their interactions should allow more accurate identification of unknown metabolites. Existing approaches to fingerprint prediction are based only on individual peaks in the spectra, without explicitly considering peak interactions. Also, the current cutting-edge method is based on kernels, which are computationally heavy and difficult to interpret.
We propose two learning models that incorporate peak interactions into fingerprint prediction. First, we extend the state-of-the-art kernel learning method by developing kernels for peak interactions and combining them with kernels for peaks through multiple kernel learning (MKL). Second, we formulate a sparse interaction model for metabolite peaks, which we call SIMPLE, which is computationally light and interpretable for fingerprint prediction. The formulation of SIMPLE is convex and guarantees a globally optimal solution, for which we develop an alternating direction method of multipliers (ADMM) algorithm. Experiments using the MassBank dataset show that both models achieved prediction accuracy comparable to the current top-performing kernel method. Furthermore, SIMPLE clearly revealed individual peaks and peak interactions that contribute to enhancing the performance of fingerprint prediction. [International conference]
- Semi-supervised learning of hierarchical representations of molecules using neural message passing
Kenta Oono, Dai Hai Nguyen, Shinichi Maeda
Machine Learning for Molecules and Materials at NIPS 2017, 2017, English, Poster presentation
- Deep neural networks for recognizing online handwritten mathematical symbols
Nguyen, Dai Hai
The 3rd IAPR Asian Conference on Pattern Recognition (ACPR2015), November 3, 2015, English, Poster presentation
November 3–6, 2015, Kuala Lumpur, [International conference]
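The ADAPTIVE entries describe estimating model parameters by maximizing the Hilbert-Schmidt Independence Criterion (HSIC) between molecular vectors and spectra. As an illustration only, the standard biased empirical HSIC estimator, trace(K H L H) / (n - 1)^2 with centering matrix H, can be sketched as follows. The Gaussian kernels, the bandwidths and the function names gaussian_gram and hsic are assumptions for this sketch, not the kernels or code used in ADAPTIVE.

```python
import numpy as np


def gaussian_gram(X, sigma=1.0):
    """Gaussian (RBF) Gram matrix for the rows of X (assumed kernel choice)."""
    sq = np.sum(X**2, axis=1)
    # Squared Euclidean distances; clip tiny negatives caused by rounding.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma**2))


def hsic(X, Y, sigma_x=1.0, sigma_y=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2.

    X and Y hold n paired samples as rows. Larger values indicate stronger
    statistical dependence between X and Y; the estimate is never negative.
    """
    n = X.shape[0]
    K = gaussian_gram(X, sigma_x)
    L = gaussian_gram(Y, sigma_y)
    H = np.eye(n) - np.full((n, n), 1.0 / n)  # centering matrix
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2


# Usage: a dependent pair versus an independent pair of samples.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
print(hsic(x, x**2), hsic(x, rng.normal(size=(200, 1))))
```

In an ADAPTIVE-style setup, X would be the molecular vectors produced by a parameterized model and Y a representation of the spectra, and HSIC would be maximized with respect to the model parameters; here both inputs are plain arrays.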
Joint Research and Competitive Funding Projects
- On Optimal Transport-based Statistical Measures for Graph Structured Data and Applications
Grant-in-Aid for Young Scientists
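April 2023 - March 2026
Nguyen Dai Hai
Japan Society for the Promotion of Science / Grants-in-Aid for Scientific Research, Grant-in-Aid for Early-Career Scientists, Competitive funding, 23K16939
- Development of machine learning methods for mass spectrometry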
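April 2019 - September 2020
Nguyen Dai Hai
Japan Society for the Promotion of Science / Grants-in-Aid for Scientific Research, Grant-in-Aid for JSPS Fellows (DC2), Competitive funding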