Mining Super-Secondary Structure Motifs from 3D Protein Structures: A Sequence Order Independent Approach

Zeyar Aung[1] (azeyar@i2r.a-star.edu.sg)
Jinyan Li[2] (jyli@ntu.edu.sg)

[1]Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613
[2]School of Computer Engineering, Nanyang Technological University, Nanyang Avenue, Singapore 639798


Abstract

Super-secondary structure elements (super-SSEs) are structurally conserved ensembles of secondary structure elements (SSEs) within a protein, and they are of great biological interest. In this work, we present a method to formally represent and mine sequence order independent super-SSE motifs that occur repeatedly in large data sets of protein structures. We represent each protein structure as a graph and mine the common cliques from a set of protein graphs in order to find the motifs. We mine two categories of super-SSE motifs: generic motifs, which occur frequently across the entire database of protein structures, and fold-preferential motifs, which are concentrated in particular protein fold types. From an experimental data set of 600 proteins belonging to 15 large SCOP Folds, we have discovered 21 generic motifs and 75 fold-preferential motifs that are both statistically significant and biologically relevant. A number of the discovered motifs (both generic and fold-preferential) resemble well-known super-SSE motifs from the literature, such as beta hairpins, Greek keys, and zinc fingers. Some of the discovered motifs have novel shapes that have not yet been documented. Our method is time-efficient: it discovers all the motifs across the 600 proteins in less than 14 minutes on a stand-alone PC. The discovered motifs are reported on our project webpage:
http://www1.i2r.a-star.edu.sg/~azeyar/SuperSSE/
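
As a rough illustration of the graph-and-clique idea described in the abstract, the following minimal Python sketch may help. It is not the authors' implementation. The abstract does not state how nodes and edges are labeled, how spatial contact is decided, or which clique mining algorithm is used, so everything here is an assumption: nodes are SSEs labeled only by type ("H" for helix, "E" for strand), an edge means two SSEs are in spatial contact, cliques are enumerated by brute force, and the function names and the min_support threshold are hypothetical. A clique is identified by the sorted multiset of its SSE types, so the position of the SSEs along the chain does not matter.

```python
from collections import Counter


def cliques(adj, min_size=2, max_size=4):
    """Yield every clique with min_size..max_size nodes as a frozenset."""
    def extend(clique, candidates):
        if len(clique) >= min_size:
            yield frozenset(clique)
        if len(clique) == max_size:
            return
        for i, v in enumerate(candidates):
            # Keep only later candidates adjacent to v, so every node added
            # is adjacent to all current members and each clique appears once.
            rest = [u for u in candidates[i + 1:] if u in adj[v]]
            yield from extend(clique + [v], rest)

    yield from extend([], sorted(adj))


def frequent_motifs(proteins, min_support):
    """Count, per clique signature, how many proteins contain it."""
    support = Counter()
    for labels, edges in proteins:
        adj = {node: set() for node in labels}
        for a, b in edges:
            adj[a].add(b)
            adj[b].add(a)
        # Signature = sorted SSE types of the clique members (assumption:
        # a real representation would also carry geometric edge labels).
        signatures = {tuple(sorted(labels[n] for n in c)) for c in cliques(adj)}
        support.update(signatures)  # each protein counts once per signature
    return {sig: n for sig, n in support.items() if n >= min_support}


# Toy data (invented for illustration): node ids follow chain order.
PROTEINS = [
    ({0: "E", 1: "E", 2: "H"}, [(0, 1), (1, 2), (0, 2)]),
    ({0: "H", 1: "E", 2: "E", 3: "H"}, [(0, 1), (0, 2), (1, 2)]),
    ({0: "E", 1: "E"}, [(0, 1)]),
]

if __name__ == "__main__":
    for signature, count in sorted(frequent_motifs(PROTEINS, 2).items()):
        print(signature, count)
```

In the toy data, the first two proteins contain the same strand-strand-helix triangle even though the helix comes last in one chain and first in the other, which is the sense in which such a motif is sequence order independent.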



Japanese Society for Bioinformatics