Detection of Inter-Spread Repeat Sequence in Genomic DNA Sequence

Hiroo Murakami[1] (hiroo@ims.u-tokyo.ac.jp)
Nobuyoshi Sugaya[1] (sugaya@ims.u-tokyo.ac.jp)
Makihiko Sato[1],[2] (makihiko@ims.u-tokyo.ac.jp)
Akira Imaizumi[1],[3] (akima@ims.u-tokyo.ac.jp)
Sachiyo Aburatani[1] (sachiyo@ims.u-tokyo.ac.jp)
Katsuhisa Horimoto[1] (khorimot@ims.u-tokyo.ac.jp)

[1]Laboratory of Biostatistics, Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokane-dai, Minato-ku, Tokyo 108-8639, Japan
[2]Computer Science and Engineering Centre, Fujitsu Ltd., 1-9-3 Nakase, Mihama-ku, Chiba 261-8588, Japan
[3]Advanced Technology Department, Fermentation and Biotechnology Laboratories, AJINOMOTO CO., INC., 1-1 Suzuki-cho, Kawasaki-ku, Kawasaki-shi 210-8681, Japan


Abstract

Various types of periodic patterns in nucleotide sequences are known to be abundant in genomic DNA and to play important biological roles in processes such as gene expression, genome structural stabilization, and recombination. We present a new method, named “STEPSTONE”, to find a specific periodic pattern of repeat sequence, the inter-spread repeat, in which tandem repeats of conserved and non-conserved regions appear periodically. In our method, the periods of short repeat sequences found in a target sequence are first stored as hash data and then selected by applying an auto-correlation test from time series analysis. From the statistically selected sequences, the inter-spread repeats are obtained in two steps by standard alignment procedures. To test the performance of our method, we examined the inter-spread repeats in the Mycobacterium tuberculosis and Zamia paucijuga genomic sequences. Our method detected the repeats in both sequences exactly, showing that it is useful for the systematic identification of inter-spread repeats in DNA sequences.
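The abstract does not specify STEPSTONE's data structures or test statistic. The sketch below is hypothetical and covers only the first two steps. It assumes that a "period" is the distance between successive occurrences of the same k-mer, and that the auto-correlation test compares the sample autocorrelation of that k-mer's occurrence series at the period lag against the approximate white-noise 95% bound 1.96/√n. All function names and parameters (`k`, `min_count`, `z`) are illustrative assumptions. The two-step alignment stage is not shown.

```python
from collections import Counter, defaultdict
import math


def kmer_positions(seq, k):
    """Hash table (hypothetical step 1): each k-mer -> list of its start positions."""
    table = defaultdict(list)
    for i in range(len(seq) - k + 1):
        table[seq[i:i + k]].append(i)
    return table


def period_counts(positions):
    """Count the gaps (candidate periods) between successive occurrences."""
    return Counter(b - a for a, b in zip(positions, positions[1:]))


def autocorrelation(x, lag):
    """Sample autocorrelation of the series x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    if denom == 0 or lag >= n:
        return 0.0
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return num / denom


def candidate_periods(seq, k=4, min_count=3, z=1.96):
    """Hypothetical step 2: keep (k-mer, period) pairs whose autocorrelation
    at that lag exceeds the white-noise bound z / sqrt(n)."""
    n = len(seq)
    bound = z / math.sqrt(n)
    hits = {}
    for kmer, pos in kmer_positions(seq, k).items():
        if len(pos) < min_count:
            continue
        # 0/1 occurrence series for this k-mer along the sequence
        indicator = [0] * n
        for p in pos:
            indicator[p] = 1
        for period, count in period_counts(pos).items():
            if count < min_count - 1:
                continue
            r = autocorrelation(indicator, period)
            if r > bound:
                hits[(kmer, period)] = r
    return hits


if __name__ == "__main__":
    # Toy sequence: a conserved 8-bp GC-only unit followed by a 12-bp AT-only
    # spacer (rotated each time to vary it), repeated 10 times.
    spacer = "ATTATAATTTAA"
    seq = "".join("GGCCGCGG" + spacer[i:] + spacer[:i] for i in range(10))
    for (kmer, period), r in sorted(candidate_periods(seq).items()):
        if set(kmer) <= set("GC"):
            print(kmer, period, round(r, 3))
```

On this toy input, each of the five GC-only 4-mers of the conserved unit occurs exactly every 20 bp, so each is reported at period 20 with r = 0.9, well above the bound 1.96/√200 ≈ 0.139. The bound assumes a white-noise null, which real DNA generally violates, so a practical tool would likely need a more careful null model.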


Japanese Society for Bioinformatics