Protein Topology Classification Using Two-Stage Support Vector Machines
Jayavardhana Gubbi[1] (jrgl@ee.unimelb.edu.au)
[1] Department of Electrical and Electronics Engineering, The
University of Melbourne, Parkville, Victoria 3010, Australia
Abstract
The determination of the first 3-D model of a protein from its sequence alone
is a non-trivial problem. The first 3-D model is the key to the molecular
replacement method of solving the phase problem in X-ray crystallography. If the
sequence identity is more than 30%, homology modelling can be used to
determine the correct topology (as defined by CATH) or fold (as defined by
SCOP). If the sequence identity is less than 25%, however, the task is very
challenging. In this paper we address the topology classification of
proteins with sequence identity of less than 25%. The inputs to the system
are the amino acid sequence, the predicted secondary structure, and the
predicted real-valued relative solvent accessibility. A two-stage support
vector machine (SVM) approach is proposed that classifies the sequences into
three different structural classes (α, β, α+β) in the
first stage and into 39 topologies in the second stage. The method is evaluated
using a newly curated dataset from CATH with maximum pairwise sequence
identity less than 25%. Overall accuracies of 87.44% and 83.15% are
reported for class and topology prediction, respectively. In the
class prediction stage, a sensitivity of 0.77 and a specificity of 0.91 are
obtained. The data file, SVM implementation (SVMHEAVY), and result files can be
downloaded from
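The two-stage idea described above can be illustrated with a minimal sketch. It assumes that the second stage uses one classifier per structural class, trained only on that class's samples, which the abstract does not state explicitly. The classifier here is a nearest-centroid placeholder, not an SVM and not SVMHEAVY. The class names `NearestCentroid` and `TwoStageClassifier`, the toy feature vectors, and the topology labels (`a1`, `b2`, and so on) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from math import dist


class NearestCentroid:
    """Placeholder classifier (NOT an SVM): picks the label of the closest class mean."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for features, label in zip(X, y):
            groups[label].append(features)
        # Mean feature vector for each label.
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict_one(self, features):
        return min(self.centroids, key=lambda label: dist(features, self.centroids[label]))


class TwoStageClassifier:
    """Stage 1 predicts the structural class; stage 2 predicts a topology within it."""

    def __init__(self, make_model=NearestCentroid):
        self.make_model = make_model

    def fit(self, X, classes, topologies):
        self.stage1 = self.make_model().fit(X, classes)
        by_class = defaultdict(lambda: ([], []))
        for features, cls, topo in zip(X, classes, topologies):
            by_class[cls][0].append(features)
            by_class[cls][1].append(topo)
        # Assumption: one stage-2 model per class, trained only on that class's samples.
        self.stage2 = {cls: self.make_model().fit(Xc, tc) for cls, (Xc, tc) in by_class.items()}
        return self

    def predict_one(self, features):
        cls = self.stage1.predict_one(features)
        return cls, self.stage2[cls].predict_one(features)


# Toy 2-D features; real inputs would encode the sequence, predicted
# secondary structure, and predicted relative solvent accessibility.
X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [9.0, 0.0], [9.0, 1.0]]
classes = ["alpha", "alpha", "beta", "beta", "alpha+beta", "alpha+beta"]
topologies = ["a1", "a2", "b1", "b2", "ab1", "ab2"]

model = TwoStageClassifier().fit(X, classes, topologies)
prediction = model.predict_one([5.1, 5.9])
```

With this routing, a stage-1 error cannot be recovered in stage 2, so topology accuracy is bounded by class accuracy under this assumption. Any classifier exposing `fit` and `predict_one` could be passed as `make_model` in place of the placeholder.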
Japanese Society for Bioinformatics



