Marginalized Kernels for RNA Sequence Data Analysis

Taishin Kin (taishin@cbrc.jp)
Koji Tsuda (tsuda@cbrc.jp)
Kiyoshi Asai (asai@cbrc.jp)

Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, 2-41-6 Aomi, Koto-ku, Tokyo 135-0064, Japan


Abstract

We present novel kernels that measure the similarity of two RNA sequences while taking their secondary structures into account. Two types of kernels are presented: one for RNA sequences with known secondary structures, and one for sequences whose secondary structures are unknown. The latter, which we call the marginalized count kernel (MCK), uses a stochastic context-free grammar (SCFG) to estimate the secondary structure. We report computational experiments with MCK on 74 sets of human tRNA sequence data: (i) kernel principal component analysis (PCA) for visualizing tRNA similarities, and (ii) supervised classification with support vector machines (SVMs). Both experiments show promising results for MCK.
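
As a rough illustration of the marginalization idea, the sketch below computes a first-order count kernel in which each position's hidden structure label is weighted by its posterior probability. It is a minimal sketch under stated assumptions, not the implementation used in the paper: the posteriors are taken as given rather than computed by an SCFG, the labels "paired" and "unpaired" are hypothetical, and the function names and the normalization by sequence length are choices made for this example.

```python
from collections import defaultdict


def expected_counts(seq, posteriors):
    """Expected, length-normalized counts of (base, hidden label) pairs.

    seq        : string of bases, e.g. "GACU"
    posteriors : one dict per position mapping a hidden label to its
                 posterior probability (assumed to be supplied by some
                 structure model; not computed here)
    """
    if len(seq) != len(posteriors):
        raise ValueError("seq and posteriors must have the same length")
    counts = defaultdict(float)
    for base, post in zip(seq, posteriors):
        for label, prob in post.items():
            counts[(base, label)] += prob
    n = len(seq)
    return {key: value / n for key, value in counts.items()}


def marginalized_count_kernel(seq_a, post_a, seq_b, post_b):
    """Inner product of the two expected-count vectors."""
    counts_a = expected_counts(seq_a, post_a)
    counts_b = expected_counts(seq_b, post_b)
    return sum(v * counts_b.get(k, 0.0) for k, v in counts_a.items())


# Toy inputs: hypothetical labels and made-up posteriors, for illustration only.
seq_a = "GA"
post_a = [{"paired": 1.0}, {"unpaired": 1.0}]
seq_b = "GG"
post_b = [{"paired": 0.5, "unpaired": 0.5}, {"paired": 1.0}]

k_aa = marginalized_count_kernel(seq_a, post_a, seq_a, post_a)
k_ab = marginalized_count_kernel(seq_a, post_a, seq_b, post_b)
```

When every posterior puts all its mass on a single label, this reduces to an ordinary count kernel over (base, label) pairs, which loosely corresponds to the case where the structure is known. A matrix of such kernel values could then be passed to a kernel PCA or SVM routine that accepts precomputed kernels.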



Japanese Society for Bioinformatics