A Gram Distribution Kernel Applied to Glycan Classification and Motif Extraction

Tetsuji Kuboyama[1] (kuboyama@ccr.u-tokyo.ac.jp)
Kouichi Hirata[2] (hirata@ai.kyutech.ac.jp)
Kiyoko F. Aoki-Kinoshita[3] (kkiyoko@t.soka.ac.jp)
Hisashi Kashima[4] (hkashima@jp.ibm.com)
Hiroshi Yasuda[5] (yasuda@mpeg.rcast.u-tokyo.ac.jp)

[1]Center for Collaborative Research, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo, 153-8505, Japan
[2]Department of Artificial Intelligence, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan
[3]Faculty of Engineering, Soka University, 1-236 Tangi-cho, Hachioji, Tokyo, 192-8577, Japan
[4]Tokyo Research Laboratory, IBM Research, 1623-14 Shimotsuruma, Yamato, Kanagawa, 242-8502, Japan
[5]Research Center for Advanced Science and Technology, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo, 153-8902, Japan


We propose a novel general-purpose tree kernel and apply it to glycan structure analysis. The kernel measures the similarity between two labeled trees by counting the common substrings of length q (tree q-grams) embedded in the trees, for all possible lengths q. Using the kernel with a support vector machine (SVM), we perform classification and extract specific features from glycan structure data. Our kernel outperforms the layered trimer kernel of Hizukuri et al. [9], which is tailored to glycan data, even though our kernel is not adjusted to glycan-specific properties. In addition, we extract specific features from various types of glycan data using the trained SVM. The results show that our kernel is more flexible and can find a wider variety of substructures in glycan data.
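
To make the counting idea concrete, the following is a minimal Python sketch and not the implementation from the paper. It assumes a simplified notion of a tree q-gram: the label sequence along a downward (ancestor-to-descendant) path of q nodes. The precise q-gram definition used in the paper may differ. The names `downward_grams` and `gram_kernel`, the nested-tuple tree encoding, and the toy trees are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def downward_grams(tree):
    """Count label sequences along every downward path, for all lengths q >= 1.

    A tree is encoded as (label, [children]). This simplified q-gram
    definition is an assumption, not necessarily the one used in the paper.
    """
    counts = Counter()

    def visit(node):
        label, children = node
        # Paths that start at this node: the node alone, plus the node
        # prepended to every path that starts at one of its children.
        paths = [(label,)]
        for child in children:
            for path in visit(child):
                paths.append((label,) + path)
        counts.update(paths)
        return paths

    visit(tree)
    return counts


def gram_kernel(t1, t2):
    """Inner product of the two q-gram count vectors over all shared q-grams."""
    c1, c2 = downward_grams(t1), downward_grams(t2)
    return sum(n * c2[gram] for gram, n in c1.items())


# Toy trees with monosaccharide-like labels (illustrative only).
small = ("Man", [("GlcNAc", [])])
large = ("Man", [("GlcNAc", []), ("Man", [("GlcNAc", [])])])

print(gram_kernel(small, small))  # 3
print(gram_kernel(small, large))  # 6
print(gram_kernel(large, large))  # 14
```

Because the value is an inner product of explicit count vectors, the function is symmetric and positive semidefinite, so under these assumptions a matrix of its pairwise values could be passed to an SVM that accepts a precomputed kernel.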


Japanese Society for Bioinformatics