A New Method for Database Searching and Clustering

Antje Krause (a.krause@dkfz-heidelberg.de)
Martin Vingron (m.vingron@dkfz-heidelberg.de)

Deutsches Krebsforschungszentrum (DKFZ), Abt. Theoretische Bioinformatik
Im Neuenheimer Feld 280, D-69120 Heidelberg, Germany


An iterative database searching method is introduced and applied to the design of a database clustering procedure. The search method virtually never produces false-positive hits while still finding meaningfully large sets of sequences related to the query. A novel set-theoretic database clustering algorithm exploits this feature and avoids a traditional, distance-based clustering step. This makes it fast and applicable to data sets as large as the Swiss-Prot database. In practice we achieve unambiguous assignment of 80% of Swiss-Prot sequences to non-overlapping sequence clusters in an entirely automatic fashion.
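
The abstract does not spell out the procedure, so the following is only an illustrative sketch of how an iterative search could feed a set-based clustering step. It is not the authors' algorithm. The `direct_hits` table stands in for a real search tool, and the rule "a cluster is a group of sequences whose iterated hit set equals the group itself" is an assumption made for this example, as are all function names and the toy data.

```python
from collections import defaultdict


def iterative_search(query, direct_hits):
    """Repeat the search with every new hit until no new sequences appear.

    `direct_hits` is a hypothetical lookup table (sequence -> list of hits)
    that replaces a real database search in this sketch.
    """
    found = {query}
    frontier = [query]
    while frontier:
        current = frontier.pop()
        for hit in direct_hits.get(current, ()):
            if hit not in found:
                found.add(hit)
                frontier.append(hit)
    return frozenset(found)


def cluster_by_hit_sets(direct_hits):
    """Group sequences by their iterated hit set; no distances are computed.

    Assumed rule: sequences sharing an identical hit set form a cluster
    when that hit set contains exactly those sequences. Everything else
    is reported as ambiguous.
    """
    groups = defaultdict(set)
    for seq in direct_hits:
        groups[iterative_search(seq, direct_hits)].add(seq)

    clusters, ambiguous = [], set()
    for hit_set, members in groups.items():
        if hit_set == members:
            clusters.append(set(members))
        else:
            ambiguous |= members
    return clusters, ambiguous


# Toy data: F finds D, but neither D nor E finds F.
toy_hits = {
    "A": ["B"], "B": ["A", "C"], "C": ["A"],
    "D": ["E"], "E": ["D"],
    "F": ["D"],
}
clusters, ambiguous = cluster_by_hit_sets(toy_hits)
```

On this toy input the sequences A, B, C and D, E fall into two non-overlapping clusters, while F is left unassigned because its hit set reaches into a cluster that does not contain it. The sketch relies on the property stressed in the abstract: comparing hit sets directly is only sensible when the search produces virtually no false positives.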


Japanese Society for Bioinformatics