Mining Super-Secondary Structure Motifs from 3D Protein Structures: A Sequence Order Independent Approach
Zeyar Aung[1] (azeyar@i2r.a-star.edu.sg)
[1]Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613
Abstract
Super-secondary structure elements (super-SSEs) are structurally
conserved ensembles of secondary structure elements (SSEs) within a
protein. They are of great biological interest. In this work, we
present a method to formally represent and mine the sequence order
independent super-SSE motifs that occur repeatedly in large data
sets of protein structures. We represent a protein structure as a
graph, and mine the common cliques from a set of protein graphs in
order to find the motifs. We mine two categories of super-SSE
motifs: the generic motifs that occur frequently across the entire
database of protein structures, and the fold-preferential motifs
that are concentrated in particular protein fold types. From the
experimental data set of 600 proteins belonging to 15 large SCOP
Folds, we have discovered 21 generic motifs and 75
fold-preferential motifs that are both statistically significant and
biologically relevant. A number of the discovered motifs (both
generic and fold-preferential) resemble the well-known super-SSE
motifs in the literature such as beta hairpins, Greek keys, zinc
fingers, etc. Some of the discovered motifs have novel shapes that
have not yet been documented. Our method is time-efficient: it
can discover all the motifs across the 600 proteins in less than
14 minutes on a stand-alone PC. The discovered motifs are reported
on our project webpage.
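
To make the graph-and-clique idea concrete, here is a minimal sketch and not the implementation used in this work. It assumes that each protein is a graph whose nodes are SSEs labeled 'H' (helix) or 'E' (strand) and whose edges join SSEs that are spatially close. It further assumes that a motif can be summarized by the sorted labels of a clique, which ignores the order of the SSEs along the chain. The function names, the toy data, the size limit, and the support threshold are all hypothetical, and the brute-force enumeration is only practical for very small graphs.

```python
from collections import Counter
from itertools import combinations


def cliques(nodes, edges, max_size=3):
    """Yield every clique of size 2..max_size as a tuple of node ids (brute force)."""
    adjacent = {frozenset(e) for e in edges}
    for k in range(2, max_size + 1):
        for combo in combinations(sorted(nodes), k):
            # A clique requires every pair of its nodes to be connected.
            if all(frozenset(p) in adjacent for p in combinations(combo, 2)):
                yield combo


def motif_key(combo, labels):
    """Sequence order independent key (assumption): the sorted SSE labels."""
    return tuple(sorted(labels[n] for n in combo))


def mine_motifs(proteins, min_support=2, max_size=3):
    """Count in how many proteins each clique key occurs; keep the frequent ones."""
    support = Counter()
    for labels, edges in proteins:
        # Use a set so that each protein contributes at most once per motif.
        seen = {motif_key(c, labels) for c in cliques(labels, edges, max_size)}
        support.update(seen)
    return {m: s for m, s in support.items() if s >= min_support}


# Hypothetical toy data: (node labels, contact edges) for three proteins.
toy_proteins = [
    ({0: "E", 1: "E", 2: "H"}, [(0, 1), (1, 2), (0, 2)]),
    ({0: "H", 1: "E", 2: "E"}, [(0, 1), (1, 2), (0, 2)]),  # same SSEs, different chain order
    ({0: "H", 1: "H"}, [(0, 1)]),
]
motifs = mine_motifs(toy_proteins, min_support=2)
```

In this toy run the first two proteins share the same cliques even though the helix comes last in one chain and first in the other, which is the sense in which the key is sequence order independent. A realistic version would also need geometric edge labels (for example distances or angles between SSEs) and a statistical significance test, neither of which is sketched here.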
Japanese Society for Bioinformatics



