Making High-level Queries on Diverse Genome Data: A Structured Genome Document Database System Based on GXML and GQL

Aaron Stokes[1] (stokes@ics.es.osaka-u.ac.jp)
Hideo Matsuda[2] (matsuda@ics.es.osaka-u.ac.jp)
Akihiro Hashimoto[2] (hasimoto@ics.es.osaka-u.ac.jp)

[1] CREST, JST (Japan Science and Technology)
[2] Department of Informatics and Mathematical Science, Graduate School of Engineering Science, Osaka University
1-3 Machikaneyama, Toyonaka, Osaka 560-8531, Japan


Abstract

Complete DNA sequences (genomes) and associated data are being made available worldwide at an astonishing rate. Through computer analysis of such data, molecular biologists hope to gain an overall understanding of the genome, for example by predicting large-scale gene networks. However, this is difficult because diverse genome data are scattered across many highly heterogeneous databases, and because existing database systems lack the facilities to expose and analyze functional relationships among the data. To address these problems, we propose a new type of genome database system. Since a genome can be thought of intuitively as a kind of 'document', our system uses GXML, a structured document language based on XML, to represent genomes and associated data. The information-rich structures of the genome documents help cope with data diversity and heterogeneity. We also introduce a query language, GQL, that exposes important biological relationships among the genome data. We have obtained favorable results from several experiments, demonstrating the usefulness of our method in building a top-down view of genome functionality.


Japanese Society for Bioinformatics