New Kernel Methods for Phenotype Prediction from Genotype Data
Ritsuko Onuki [1](onuki@hgc.jp)
Tetsuo Shibuya [2](shibuya@hgc.jp)
Minoru Kanehisa [1][2](kanehisa@kuicr.kyoto-u.ac.jp)
[1] Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokosho, Uji,
Kyoto 611-0011, Japan
[2] Human Genome Center, Institute of Medical Science, University of
Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan
Abstract
Phenotype prediction from genotype data is one of the most important issues in computational
genetics. In this work, we propose a new kernel method for phenotype prediction from
genotype data, designed for use with support vector machines (SVMs). In our method, we first
infer multiple suboptimal haplotype candidates from each genotype using a hidden Markov model
(HMM), and then compute the kernel matrix from the predicted haplotype candidates and their emission
probabilities from the HMM. We validated the performance of our method through experiments on
several datasets: an artificial dataset constructed with the program GeneArtisan, a
real dataset of the NAT2 gene from the International HapMap Project, and a real dataset of
genotypes of diseased individuals. The experiments show that our method is superior to ordinary
naive kernel methods (i.e., those not based on haplotype prediction), especially under strong
linkage disequilibrium (LD).
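
The abstract does not state the kernel formula, so the following is only a minimal sketch of one way a kernel matrix could be built from several haplotype candidates per individual and their probabilities. It assumes a probability-weighted sum of a simple allele-match kernel over candidate haplotype pairs. The function names (hap_match, pair_kernel, individual_kernel, kernel_matrix), the match-based base kernel, and the toy probabilities are all illustrative assumptions, not the authors' definitions, and the HMM inference step is not shown.

```python
import numpy as np

def hap_match(h1, h2):
    """Fraction of sites at which two equal-length haplotypes carry the same allele."""
    return float(np.mean(np.asarray(h1) == np.asarray(h2)))

def pair_kernel(pair_a, pair_b):
    """Average match over all four haplotype pairings, so the order within a pair is irrelevant."""
    return sum(hap_match(ha, hb) for ha in pair_a for hb in pair_b) / 4.0

def individual_kernel(cands_x, cands_y):
    """Probability-weighted sum over all combinations of candidate haplotype pairs."""
    return sum(p * q * pair_kernel(cx, cy)
               for cx, p in cands_x
               for cy, q in cands_y)

def kernel_matrix(individuals):
    """Symmetric kernel matrix over a list of individuals."""
    n = len(individuals)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            K[i, j] = K[j, i] = individual_kernel(individuals[i], individuals[j])
    return K

# Toy input (made-up numbers standing in for HMM output): each individual is a
# list of (haplotype_pair, probability) candidates over three biallelic sites.
individuals = [
    # Heterozygous at sites 0 and 1, so two phasings are possible.
    [(((0, 0, 1), (1, 1, 1)), 0.8), (((0, 1, 1), (1, 0, 1)), 0.2)],
    # Homozygous everywhere, so a single candidate with probability 1.
    [(((0, 0, 1), (0, 0, 1)), 1.0)],
]

K = kernel_matrix(individuals)
print(K)
```

A matrix of this kind could be handed to any SVM implementation that accepts a precomputed kernel, for example scikit-learn's SVC with kernel="precomputed". The naive alternative mentioned in the abstract would compare genotypes directly, without the haplotype candidates.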
Japanese Society for Bioinformatics