An Algorithm for Highly Specific Recognition of Protein-coding Regions
M. S. Gelfand [1] (misha@imb.imb.ac.ru)
T. V. Astakhova [2]
M. A. Roytberg [2] (roytberg@impb.serpukhov.su)
[1] Institute of Protein Research,
Russian Academy of Sciences,
Pushchino, 142292, Russia
[2] Institute of Mathematical Problems of Biology,
Russian Academy of Sciences,
Pushchino, 142292, Russia
Abstract
Since absolutely reliable recognition of protein-coding
regions in eukaryotic genomic DNA sequences by computational
methods is unattainable, most existing algorithms try to
keep some balance between underprediction and
overprediction. However, in experimental practice it is
often sufficient to have just a few protein-coding
segments, provided they are predicted with high specificity,
that is, with (almost) no overprediction. Such predictions are then
used for construction of oligonucleotide probes and PCR
primers for analysis of cDNA libraries or total cellular
RNA.
Here we present a combinatorial algorithm solving this
problem. Unlike other prediction schemes, the algorithm
uses only the simplest statistical parameters (codon usage
and positional nucleotide frequencies in splicing sites) and
thus can be used for analysis of poorly studied genomes, for
which large learning sets are unavailable. The algorithm's
structure makes it easy to tune for various experimental
settings.
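
The two statistical parameters named above can be illustrated with a
minimal sketch. It is not the algorithm presented in this paper; it only
shows one plausible way to estimate codon usage and positional
nucleotide frequencies from a small learning set. The function names,
the uniform 1/64 background, and the pseudo-frequency for unseen codons
are all assumptions made for illustration.

```python
import math
from collections import Counter

BASES = "ACGT"

def codon_usage(coding_seqs):
    """Relative codon frequencies from in-frame coding sequences."""
    counts = Counter()
    for seq in coding_seqs:
        # Step through complete codons only; a trailing partial codon is ignored.
        for i in range(0, len(seq) - 2, 3):
            counts[seq[i:i + 3]] += 1
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}

def coding_score(segment, usage, pseudo=1e-3):
    """Sum of codon log-odds against a uniform 1/64 background (assumed)."""
    score = 0.0
    for i in range(0, len(segment) - 2, 3):
        # Unseen codons get a small pseudo-frequency instead of zero.
        score += math.log(usage.get(segment[i:i + 3], pseudo) * 64)
    return score

def positional_frequencies(sites):
    """Per-position base frequencies from aligned, equal-length sites."""
    length = len(sites[0])
    return [
        {b: sum(s[p] == b for s in sites) / len(sites) for b in BASES}
        for p in range(length)
    ]
```

With only a handful of known coding sequences and aligned splice sites,
such tables can already be filled in, which is the sense in which these
parameters remain usable when large learning sets are unavailable.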