Recognition of Polyadenylation Sites from Arabidopsis Genomic Sequences

Chuan Hock Koh (kohchuan@comp.nus.edu.sg)
Limsoon Wong (wongls@comp.nus.edu.sg)

School of Computing, National University of Singapore, COM1, Law Link, Singapore 117590


Abstract

A polyadenine tail is found at the 3' end of nearly every fully processed eukaryotic mRNA and has been suggested to influence virtually all aspects of mRNA metabolism. The ability to predict polyadenylation sites would allow us to define gene boundaries, predict the number of genes present in a particular gene locus, and perhaps better understand mRNA metabolism. To this end, we built a polyadenylation site prediction model for Arabidopsis. The model uses a machine learning method that consists of four sequential steps: feature generation, feature selection, feature integration, and a cascade classifier. We tested the model on public datasets and achieved more than 97.5% sensitivity and specificity. We also compared it directly with another Arabidopsis prediction model, PASS 1.0, and achieved better results.
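
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how sensitivity and specificity are conventionally computed from binary predictions. It is not the authors' evaluation code; the function name, the label encoding (1 for a polyadenylation site, 0 for a non-site) and the toy labels are assumptions made purely for illustration.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Assumed encoding (not from the paper): 1 = polyadenylation site,
    0 = non-site.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # sites correctly found
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # sites missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-sites correctly rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-sites wrongly called sites

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")

    sensitivity = tp / (tp + fn)  # fraction of true sites recovered
    specificity = tn / (tn + fp)  # fraction of non-sites rejected
    return sensitivity, specificity


# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

Under these definitions, a result of more than 97.5% on both metrics means that fewer than 2.5% of true sites are missed and fewer than 2.5% of non-sites are wrongly reported as sites.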


Japanese Society for Bioinformatics