A Space-Efficient Algorithm for the Constrained Pairwise Sequence Alignment Problem

Dan He Abdullah N. Arslan (aarslan@cs.uvm.edu)

Department of Computer Science, University of Vermont, Burlington, VT 05405, USA


Abstract

The constrained pairwise sequence alignment (CPSA) problem aims to align two given sequences by aligning their similar subsequences in the same region under the guidance of a given pattern (constraint). Let the lengths of the sequences be m, and n where n ≤ m, and let r ≤ n be the length of the given pattern. The optimum constrained pairwise alignment score can be computed using O(rn) space by a naive dynamic programming solution. If an optimal alignment path is desired then the space requirement of the naive dynamic programming algorithm is O(rnm). There is a divide-and-conquer algorithm that reduces the memory requirement of finding an optimal alignment for the CPSA problem to O(rn). In this paper, we present a space-efficient CPSA algorithm that returns an optimal alignment. Our analysis on real protein sequences suggests that our algorithm requires only O(n) space in practice. This algorithm is not only space efficient but also very fast. A generalization of the CPSA problem for multiple sequences is called the constrained multiple sequence alignment (CMSA) problem. Our CPSA algorithm also improves the space requirement of progressive CMSA algorithms that use solutions of CPSA problems.

[ Full-text PDF | Table of Contents ]


Japanese Society for Bioinformatics