## Optimally Separating Sequences

Gene Myers (Gene.Myers@celera.com)

Informatics Research, Celera Genomics,
45 W. Gude Dr., Rockville, MD 20850, U.S.A.

### Abstract

We consider the problem of separating two distinct classes of *k* similar
sequences of length *n* over an alphabet of size *s* that have been
optimally multi-aligned. An objective function based on minimizing
the consensus score of the separated halves is introduced and we present
an *O*(*k*^{3}*n*) heuristic algorithm and two optimal branch-and-bound algorithms
for the problem. The branch-and-bound algorithms involve progressively
more powerful lower bound functions for pruning the *O*(2^{k}) search tree.
The simpler lower bound takes *O*(*n*) time to evaluate given
*O*(*sn*) global
data structures and the stronger bound takes *O*((*k*+*s*)*n*) time by virtue
of an efficient solution to the problem of finding the second-maximum
envelope of a set of piece-wise affine curves. In a series of empirical
trials we establish the degree to which classes can be separated using
our metric and the effective pruning efficiency of the two branch-and-bound
algorithms.

*Japanese Society for Bioinformatics*