Developing NLP Tools for Genome Informatics: An Information Extraction Perspective

Teruyoshi Hishiki[1] (hishiki@is.s.u-tokyo.ac.jp)
Nigel Collier[1] (nigel@is.s.u-tokyo.ac.jp)
Chikashi Nobata[1] (nova@is.s.u-tokyo.ac.jp)
Tomoko Okazaki-Ohta[2] (okap@ims.u-tokyo.ac.jp)
Norihiro Ogata[1] (ogata@is.s.u-tokyo.ac.jp)
Takeshi Sekimizu[1] (sekimizu@is.s.u-tokyo.ac.jp)
Roland Steiner[1] (steiner@is.s.u-tokyo.ac.jp)
Hyun S. Park[1][4] (hsp20@is.s.u-tokyo.ac.jp)
Jun'ichi Tsujii[1][3] (tsujii@is.s.u-tokyo.ac.jp)

[1] Department of Information Science, University of Tokyo
7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan
[2] Human Genome Center, Institute of Medical Science, University of Tokyo
4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan
[3] Department of Language Engineering, UMIST
PO Box 88, Manchester M60 1QD, United Kingdom
[4] Department of Computer Science, Sungshin Women's University
249-1 Dongsun-dong, Sungbuk-gu, Seoul, Korea


Abstract

Huge quantities of on-line medical text, such as Medline, are now available, and we would like to extract as much useful information from these resources as possible, ideally automatically. In particular, recent advances in Natural Language Processing (NLP) techniques raise new challenges and opportunities for tackling genome-related on-line text; combining NLP techniques with genome informatics extends beyond the traditional realms of either technology to a variety of emerging applications. In this paper, we describe some of our current efforts to develop NLP-based tools for information extraction from genome-related on-line documents.


Japanese Society for Bioinformatics