Optimally Separating Sequences

Gene Myers (Gene.Myers@celera.com)

Informatics Research, Celera Genomics, 45 W. Gude Dr., Rockville, MD 20850, U.S.A.


We consider the problem of separating two distinct classes of k similar sequences of length n over an alphabet of size s that have been optimally multi-aligned. An objective function based on minimizing the consensus score of the separated halves is introduced and we present an O(k3n) heuristic algorithm and two optimal branch-and-bound algorithms for the problem. The branch-and-bound algorithms involve progressively more powerful lower bound functions for pruning the O(2k) search tree. The simpler lower bound takes O(n) time to evaluate given O(sn) global data structures and the stronger bound takes O((k+s)n) time by virtue of an efficient solution to the problem of finding the second-maximum envelope of a set of piece-wise affine curves. In a series of empirical trials we establish the degree to which classes can be separated using our metric and the effective pruning efficiency of the two branch-and-bound algorithms.

[ Full-text PDF | Table of Contents ]

Japanese Society for Bioinformatics