Automatic Gene Recognition without Using Training Data

Kiyoshi Asai[1] (asai@etl.go.jp)
Yutaka Ueno[1] (ueno@etl.go.jp)
Katunobu Itou[1] (kito@etl.go.jp)
Tetsushi Yada[2] (yada@tokyo.jst.go.jp)

[1] Electrotechnical Laboratories
1-1-4 Umezono, Tsukuba 305, Japan
[2] Japan Science and Technology Corporation
5-3 Yonbancho, Chiyoda-ku, Tokyo 102, Japan


Abstract

In this paper, we propose a new approach to gene recognition that requires no training data for the recognizer. We start from a simple model that uses only knowledge of start and stop codons. We then repeat two steps: the recognizer parses the DNA sequence, and its parameters are retrained on the result of that parse. We applied this parse-and-train approach to the complete genome sequence of a cyanobacterium and achieved almost the same recognition rate as when the whole sequence is used as training data. These results open the possibility of using automatic gene annotation systems in the early stages of sequencing projects.
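The abstract does not say which recognizer model the authors use, so the following is only a minimal sketch of the parse-and-train loop, not the paper's method. It substitutes a deliberately simple model: the initial parse accepts every forward-strand open reading frame (ATG ... stop) of at least a hypothetical minimum length, which relies only on start/stop codon knowledge. Each training step then fits a smoothed codon-usage model to the currently predicted genes, and each parsing step keeps the ORFs whose codons score higher under that model than under a whole-sequence background model. All function names, the ATG-only start codon, forward-strand-only scanning, and the `min_codons=30` threshold are illustrative assumptions.

```python
import math
from collections import Counter

# Assumptions for this sketch: standard stop codons, ATG as the only start
# codon, forward strand only, and sequences made of A/C/G/T only.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"
ALL_CODONS = [a + b + c for a in "ACGT" for b in "ACGT" for c in "ACGT"]


def find_orfs(seq, min_codons=30):
    """Hypothetical helper: return (start, end) ORFs, end exclusive, stop included.

    For each stop codon, only the longest ORF (first in-frame ATG) is kept.
    """
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == START_CODON:
                j = i + 3
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > len(seq):
                    break  # no stop codon left in this frame
                if (j - i) // 3 >= min_codons:
                    orfs.append((i, j + 3))
                i = j + 3  # resume scanning after the stop codon
                continue
            i += 3
    return sorted(orfs)


def codons(seq, start, end):
    """Sense codons of an ORF (start codon included, stop codon excluded)."""
    return [seq[k:k + 3] for k in range(start, end - 3, 3)]


def codon_log_probs(codon_list):
    """Train step: add-one-smoothed log-probabilities over the 64 codons."""
    counts = Counter(codon_list)
    total = len(codon_list) + len(ALL_CODONS)
    return {c: math.log((counts[c] + 1) / total) for c in ALL_CODONS}


def parse_and_train(seq, min_codons=30, max_iter=20):
    """Alternate parsing and retraining until the predicted gene set is stable."""
    orfs = find_orfs(seq, min_codons)
    genes = list(orfs)  # initial parse: start/stop codon knowledge only
    # Background model: every overlapping triplet in the whole sequence.
    background = codon_log_probs([seq[k:k + 3] for k in range(len(seq) - 2)])
    for _ in range(max_iter):
        # Train: codon usage of the genes predicted by the previous parse.
        coding = codon_log_probs([c for s, e in genes for c in codons(seq, s, e)])
        # Parse: keep ORFs whose log-odds score (coding vs background) is positive.
        new_genes = [
            (s, e) for s, e in orfs
            if sum(coding[c] - background[c] for c in codons(seq, s, e)) > 0
        ]
        if new_genes == genes:
            break  # converged
        genes = new_genes
    return genes
```

For example, on a toy sequence `("ATG" + "GCC" * 40 + "TAA" + "T" * 20) * 3`, the three ORFs found from start/stop codons alone also survive retraining, so the loop converges after one iteration. A real recognizer would model both strands, alternative start codons, and non-coding regions explicitly; this sketch only illustrates the alternation between parsing and retraining.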



Japanese Society for Bioinformatics