CBI/日本オミクス医療学会/JSBi Joint Symposium
Date & Venue: Friday, September 5, 13:50–14:40, Room 1
Dr. Koichiro Kato
(Department of Applied Chemistry, Faculty of Engineering, Kyushu University)
Title: Integration of the FMO Method and Machine Learning for Quantum Chemical Calculations of Biomacromolecules
The Fragment Molecular Orbital (FMO) method is one of the few quantum chemical methods capable of treating entire biomacromolecules such as proteins. The data generated by FMO calculations are likewise unique, currently constituting the only large-scale quantum chemical datasets available for protein systems. Developing machine learning models from such data, which are difficult to obtain with conventional software, is expected to have a significant impact on AI-driven drug discovery, an area of rapidly growing interest in recent years.
FMO datasets include detailed quantum mechanical information such as inter-fragment interaction energies, which are otherwise not feasible to compute for large biomolecules. By incorporating these features into machine learning models, we aim to go beyond conventional structural descriptors and explore electronic-level insights into biomolecular interactions.
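For reference, a minimal sketch of the standard two-body FMO (FMO2) energy expansion that underlies such datasets, written here in its simplest form without the environmental electrostatic correction terms used in production FMO codes:

E_{\mathrm{FMO2}} = \sum_{I} E_{I} + \sum_{I>J} \left( E_{IJ} - E_{I} - E_{J} \right), \qquad \Delta E_{IJ} = E_{IJ} - E_{I} - E_{J},

where E_{I} and E_{IJ} are the energies of fragment monomers and dimers evaluated in the electrostatic field of the remaining fragments, and \Delta E_{IJ} is the inter-fragment interaction energy (IFIE) between fragments I and J.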
In this talk, I present the current status of our group’s work on developing machine learning models based on FMO data, including atomic charge prediction models, interaction energy prediction models, and FMO-based machine learning force fields.
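As a purely illustrative sketch, not the group's actual models or data, the snippet below shows how one of the listed tasks (atomic charge prediction) could be framed: a regressor maps per-atom descriptors to FMO-derived atomic charges. The descriptors, array shapes, and synthetic targets are assumptions made only so the example runs end to end.

# Illustrative only: predicting FMO-derived atomic charges from per-atom descriptors.
# Synthetic arrays stand in for descriptors and charges extracted from real FMO outputs.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_atoms, n_features = 5000, 32                      # atoms pooled from many fragments (assumed)
X = rng.normal(size=(n_atoms, n_features))          # per-atom descriptors (placeholder)
w = rng.normal(size=n_features)
y = 0.05 * (X @ w) + rng.normal(scale=0.02, size=n_atoms)   # stand-in for atomic charges (in e)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = GradientBoostingRegressor(n_estimators=300, max_depth=3, learning_rate=0.05)
model.fit(X_train, y_train)
print(f"MAE on held-out atoms: {mean_absolute_error(y_test, model.predict(X_test)):.4f} e")

A graph neural network over the molecular structure would be a more realistic choice for this task; the gradient-boosted regressor is used here only to keep the sketch short and dependency-light.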
Dr. Tadahaya Mizuno
(Graduate School of Pharmaceutical Sciences, The University of Tokyo / The Institute of Statistical Mathematics)
Title: Pattern Recognition of Life Science Data Based on Latent Linguistic Structure
Language models such as ChatGPT have gained much attention in recent years, though their technical foundations in natural language processing extend much further back. Broadly, language models can be categorized into two groups: sequence-based models (e.g., Hidden Markov Models and Transformers) and Bag-of-Words models (e.g., Latent Dirichlet Allocation, LDA), the latter treating data as unordered collections of words. Despite these methodological differences, both types of models share a common goal: learning underlying syntactic and semantic structures from tokenized data. This presentation describes applications of such linguistic modeling to life science data, highlighting its potential for pattern recognition. In the first example, we apply LDA to tissue transcriptome data, extracting latent structural information in the form of cell-type proportions. The second example involves a neural machine translation model applied to chemical structures, known as a chemical language model. Through these examples, we illustrate that language models inherently have the ability to identify latent linguistic structures, which can be effectively leveraged for analyzing diverse life science datasets.
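As a minimal, self-contained sketch of the first example, the Bag-of-Words view can be reproduced with an off-the-shelf LDA implementation: samples play the role of documents, genes the role of words, and the learned per-sample topic weights are read as proportions of latent expression programs. The synthetic count matrix and the choice of scikit-learn are assumptions for illustration, not the presenter's actual pipeline.

# Illustrative only: LDA on a (samples x genes) count matrix with synthetic data.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
n_samples, n_genes, n_topics = 200, 1000, 5

# Simulate counts as mixtures of a few latent "expression programs".
programs = rng.dirichlet(np.full(n_genes, 0.1), size=n_topics)     # program-gene distributions
weights = rng.dirichlet(np.full(n_topics, 1.0), size=n_samples)    # sample-program weights
mix = weights @ programs
mix /= mix.sum(axis=1, keepdims=True)
counts = np.vstack([rng.multinomial(2000, mix[i]) for i in range(n_samples)])

lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
theta = lda.fit_transform(counts)        # per-sample topic distribution (rows sum to 1)
print("Estimated latent-program proportions, first sample:", np.round(theta[0], 3))

In a real transcriptome analysis the recovered topics would be inspected against known marker genes before interpreting the weights as cell-type proportions.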