A Markov Chain Model for Haplotype Assembly from SNP Fragments

Rui-Sheng Wang[1],[2] (wangrsh@amss.ac.cn)
Ling-Yun Wu[3] (wlyun@amt.ac.cn)
Xiang-Sun Zhang[3]* (zxs@amt.ac.cn)
Luonan Chen[1],[4],[5]* (chen@eic.osaka-sandai.ac.jp)

[1]Faculty of Engineering, Osaka Sangyo University, Osaka 574-8530, Japan
[2]School of Information, Renmin University of China, Beijing 100872, China
[3]AMSS, Chinese Academy of Sciences, Beijing 100080, China
[4]Institute of Systems Biology, Shanghai University, Shanghai 200444, China
[5]ERATO Aihara Complexity Modelling Project, JST, and Institute of Industrial Science, University of Tokyo, Tokyo 153-8505, Japan
*Corresponding author


Abstract

Single nucleotide polymorphism (SNP) is the most frequent form of human genetic variations and of importance for medical diagnosis and tracking disease genes. A haplotype is a sequence of SNPs from a single copy of a chromosome, and haplotype assembly from SNP fragments is based on DNA fragments with SNPs and the methodology of shotgun sequence assembly. In contrast to conventional combinatorial models which aim at different error types in SNP fragments, in this paper we propose a new statistical model — a Markov chain model for haplotype assembly based on information of SNP fragments. The main advantage of this model over combinatorial ones is that it requires no prior information on error types in data. In addition, unlike exact algorithms with the exponential-time computation complexity for most combinatorial models, the proposed model can be solved in polynomial time and thus is efficient for large-scale problems. Experiment results on several data sets illustrate the effectiveness of the new method.

[ Full-text PDF | Table of Contents ]


Japanese Society for Bioinformatics