Mark P Styczynski (email@example.com)
Kyle L Jensen (firstname.lastname@example.org)
Isidore Rigoutsos (email@example.com)
Gregory N Stephanopoulos (firstname.lastname@example.org)
Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
Bioinformatics and Pattern Discovery Group, IBM Thomas J. Watson Research Center, PO Box 218, Yorktown Heights, NY 10598, USA
The (l,d)-motif challenge problem, as introduced by Pevzner and Sze, is a mathematical abstraction of the DNA functional site discovery task. Here we expand the (l,d)-motif problem to more accurately model this task and present a novel algorithm to solve this extended problem. This algorithm is guaranteed to find all (l,d)-motifs in a set of input sequences with unbounded support and length. We demonstrate the performance of the algorithm on publicly available datasets and show that the algorithm deterministically enumerates the optimal motifs.
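To make the problem statement concrete, the following is a minimal sketch of the basic (l,d)-motif definition, assuming the usual reading in which a motif is a string of length l that occurs in every input sequence with at most d mismatches. It is a naive exhaustive search over all candidate strings, not the algorithm presented here, and it is practical only for very small l. The function names (`hamming`, `occurs_within`, `brute_force_ld_motifs`) and the toy sequences are hypothetical and chosen for illustration.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif: str, seq: str, d: int) -> bool:
    """True if some window of seq is within d mismatches of motif."""
    l = len(motif)
    return any(
        hamming(motif, seq[i:i + l]) <= d
        for i in range(len(seq) - l + 1)
    )


def brute_force_ld_motifs(seqs, l: int, d: int, alphabet: str = "ACGT"):
    """Enumerate every length-l string found in all sequences with <= d mismatches.

    Assumption: full support, i.e. the motif must occur in every sequence.
    Cost grows as len(alphabet) ** l, so this is for illustration only.
    """
    motifs = []
    for chars in product(alphabet, repeat=l):
        candidate = "".join(chars)
        if all(occurs_within(candidate, s, d) for s in seqs):
            motifs.append(candidate)
    return motifs


# Toy input (hypothetical data, not from any dataset used in this work).
toy_seqs = ["ACGTAC", "TTACGA", "GACGTT"]
exact = brute_force_ld_motifs(toy_seqs, l=3, d=0)
relaxed = brute_force_ld_motifs(toy_seqs, l=3, d=1)
```

With d = 0 the search reduces to finding substrings shared exactly by all sequences; raising d enlarges the set of admissible motifs, which is why exhaustive enumeration quickly becomes infeasible for realistic values of l and d.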