Hisayuki Horai (email@example.com)
Kouichi Doi (firstname.lastname@example.org)
Hirofumi Doi (email@example.com)
Nara Institute of Science and Technology, 8916-5 Takayama-cho,
Ikoma, Nara 630-0192, Japan
Celestar Lexico-Sciences, Inc., MTG D17, 1-3 Nakase, Mihama-ku, Chiba 261-8501, Japan
Building a lexicon of relatively short oligopeptides has been one of our first steps toward viewing the proteome from the perspective of oligopeptides. As an application of the lexicon, we propose a new method for predicting protein function, specifically Gene Ontology terms (GO terms), based on the statistical characteristics of oligopeptides. In the lexicon, each known function of a protein is inherited by its oligopeptides, and the correspondence between oligopeptides and that function is calculated over all proteins. Our method then uses this correspondence to predict unknown protein functions automatically. We measured prediction performance with recall-precision graphs for several GO terms, using all 28,520 human proteins registered in RefSeq. The GO terms include `membrane', `nucleus', `ATP binding', `hydrolase activity', `GTP binding', `intracellular signaling cascade' and `ubiquitin cycle'. In most cases, the method achieves 70% recall at 80% precision. Prediction is particularly accurate for ATP binding and GTP binding, with 80% recall at 80% precision. Even in the worst case (ubiquitin cycle), it achieves 62.6% recall at 80% precision. These results suggest that the proposed method is effective for predicting GO terms.
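The idea of inheriting a protein's GO term to its oligopeptides and then scoring unannotated proteins can be illustrated with a minimal sketch. The abstract does not specify the statistic, so everything concrete here is an assumption made for illustration: the correspondence is taken to be the fraction of proteins containing an oligopeptide that carry the term, a protein's score is the mean correspondence of its known oligopeptides, and the function names, the oligopeptide length `k`, and the threshold are all hypothetical.

```python
from collections import defaultdict


def oligopeptides(sequence, k):
    """Return the set of distinct length-k oligopeptides in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def build_lexicon(proteins, k):
    """Build a toy lexicon for a single GO term.

    `proteins` is a list of (sequence, has_term) pairs. Each oligopeptide
    inherits the annotation of every protein that contains it. The stored
    value is an ASSUMED correspondence measure: the fraction of proteins
    containing the oligopeptide that are annotated with the term.
    """
    total = defaultdict(int)
    annotated = defaultdict(int)
    for sequence, has_term in proteins:
        for pep in oligopeptides(sequence, k):
            total[pep] += 1
            if has_term:
                annotated[pep] += 1
    return {pep: annotated[pep] / total[pep] for pep in total}


def score(sequence, lexicon, k):
    """Score a protein as the mean correspondence of its oligopeptides
    that appear in the lexicon (an assumed scoring rule); 0.0 if none do."""
    values = [lexicon[p] for p in oligopeptides(sequence, k) if p in lexicon]
    return sum(values) / len(values) if values else 0.0


def recall_precision(scores, labels, threshold):
    """Recall and precision when proteins scoring >= threshold are
    predicted to carry the term. Sweeping the threshold traces out a
    recall-precision graph."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and l for p, l in zip(predicted, labels))
    fp = sum(p and not l for p, l in zip(predicted, labels))
    fn = sum((not p) and l for p, l in zip(predicted, labels))
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision
```

A call such as `build_lexicon([("MKTAYIAK", True), ("MKTAGGGG", False)], k=3)` would assign a high correspondence to oligopeptides found only in the annotated sequence and a lower one to those shared with the unannotated sequence. Evaluating `recall_precision` at a range of thresholds gives the kind of recall-precision trade-off reported above, although the actual statistic used in the method may differ.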