Evaluation of Sequence Alignments of Distantly Related Sequence Pairs with Respect to Structural Similarity

Aysam Gürler (guerler@chemie.fu-berlin.de)
Ernst-Walter Knapp (knapp@chemie.fu-berlin.de)

Freie Universität Berlin, Institut für Chemie und Biochemie, Takustr. 6, 14195, Berlin-Dahlem, Germany


We evaluate how well common substitution matrices capture structural similarity. To this end, we perform an all-versus-all pairwise sequence alignment on the ASTRAL40 [7] dataset, which consists of 7290 entries with a pairwise sequence identity of at most 40%. We then compare the 100 highest-scoring sequence alignments to their corresponding structural alignments, which we obtain from our structure alignment database. The database contains about 18.6 million pairwise entries, which we calculated by applying the current version of GANGSTA [1], our non-sequential structural alignment tool, to about 26 million pairs. The results illustrate the difficulty of homology-based protein structure prediction in cases of low sequence similarity. In addition, we quantify the large fraction of structurally similar proteins in the ASTRAL40 dataset. The investigation thus offers a new perspective on the relation between sequence and structure, and its findings provide a large-scale quality measure for any sequence-based method that aims to detect structural similarities.
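
The all-versus-all scoring and top-100 selection described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the alignment algorithm or gap model, so global (Needleman-Wunsch) alignment with a linear gap penalty is assumed here, and toy_matrix is a hypothetical stand-in for a real substitution matrix such as BLOSUM or PAM. The names global_alignment_score and top_scoring_pairs are likewise illustrative. The last line checks that 7290 entries yield roughly the 26 million unordered pairs mentioned in the text.

```python
import heapq
from itertools import combinations
from math import comb


def global_alignment_score(a, b, substitution, gap=-4):
    """Needleman-Wunsch score with a linear gap penalty (assumed model)."""
    # prev holds the previous row of the dynamic programming table.
    prev = [j * gap for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * gap]
        for j, y in enumerate(b, start=1):
            curr.append(max(
                prev[j - 1] + substitution(x, y),  # align x with y
                prev[j] + gap,                     # gap in b
                curr[j - 1] + gap,                 # gap in a
            ))
        prev = curr
    return prev[-1]


def toy_matrix(x, y):
    """Hypothetical placeholder for a real substitution matrix."""
    return 2 if x == y else -1


def top_scoring_pairs(sequences, substitution, k=100):
    """Score every unordered pair and keep the k highest-scoring ones."""
    scored = (
        (global_alignment_score(sequences[p], sequences[q], substitution), p, q)
        for p, q in combinations(sorted(sequences), 2)
    )
    return heapq.nlargest(k, scored)


# Number of unordered pairs among 7290 entries: 26,568,405.
n_pairs = comb(7290, 2)
```

Calling top_scoring_pairs(sequences, toy_matrix) on a dictionary that maps entry identifiers to sequences returns (score, id, id) tuples in descending score order; each returned pair would then be looked up in the structural alignment database for comparison.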

