Efficient Tree-Matching Methods for Accurate Carbohydrate Database Queries

Kiyoko F. Aoki (kiyoko@kuicr.kyoto-u.ac.jp)
Atsuko Yamaguchi (atsuko@kuicr.kyoto-u.ac.jp)
Yasushi Okuno (okuno@kuicr.kyoto-u.ac.jp)
Tatsuya Akutsu (takutsu@kuicr.kyoto-u.ac.jp)
Nobuhisa Ueda (ueda@kuicr.kyoto-u.ac.jp)
Minoru Kanehisa (kanehisa@kuicr.kyoto-u.ac.jp)
Hiroshi Mamitsuka (mami@kuicr.kyoto-u.ac.jp)

Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto 611-0011, Japan


One aspect of glycome informatics is the analysis of carbohydrate sugar chains, or glycans, whose basic structure is not a sequence, but a tree structure. Although there has been much work in the development of sequence databases and matching algorithms for sequences (for performing queries and analyzing similarity), the more complicated tree structure of glycans does not allow a direct implementation of such a database for glycans, and further, does not allow for the direct application of sequence alignment algorithms for performing searches or analyzing similarity. Therefore, we have utilized a polynomial-time dynamic programming algorithm for solving the maximum common subtree of two trees to implement an accurate and efficient tool for finding and aligning maximally matching glycan trees. The KEGG Glycan database for glycan structures released recently incorporates our tree-structure alignment algorithm with various parameters to adapt to the needs of a variety of users. Because we use similarity scores as opposed to a distance metric, our methods are more readily used to display trees of higher similarity. We present the two methods developed for this purpose and illustrate its validity.

[ Full-text PDF | Table of Contents ]

Japanese Society for Bioinformatics