FragQA: Predicting Local Fragment Quality of a Sequence-Structure Alignment
Xin Gao[1] (x4gao@cs.uwaterloo.ca)
Dongbo Bu[1],[3] (dbu@cs.uwaterloo.ca)
Shuai Cheng Li[1] (scli@cs.uwaterloo.ca)
Jinbo Xu[2] (j3xu@tti-c.org)
Ming Li[1] (mli@cs.uwaterloo.ca)
[1]David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, Canada, N2L 3G1
[2]Toyota Technological Institute at Chicago, Chicago, IL, USA, 60637
[3]Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, 100080
Abstract
Motivation.
Although protein structure prediction has made great progress in
recent years, a protein model derived from automated prediction
methods is subject to various errors. As methods for structure
prediction develop, a continuing problem is how to evaluate the
quality of a protein model, and especially how to identify its
well-predicted regions, so that the structural biology
community can benefit from automated structure prediction. It is
also important to identify badly predicted regions in a model so
that refinement measures can be applied to them.
Results.
We present FragQA, a novel technique that accurately predicts the
local quality of a sequence-structure (i.e., sequence-template)
alignment generated by comparative modeling (i.e., homology modeling
and threading). Unlike previous local quality assessment methods,
FragQA directly predicts the cRMSD between a continuously aligned
fragment determined by an alignment and the corresponding fragment
in the native structure. FragQA uses support vector machine (SVM)
regression to make this prediction from information extracted from a
single given alignment. Experimental results demonstrate that FragQA
performs well at predicting local quality. More specifically, its
prediction accuracy is better than that of ProQres, a top-performing
method. Our results indicate that (1)
local quality can be predicted well; (2) local sequence evolutionary
information (i.e., sequence similarity) is the major factor in
predicting local quality; and (3) structure information such as
solvent accessibility and secondary structure helps improve
prediction performance.
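
To make the prediction target concrete, the following is a minimal
sketch of how the cRMSD between an aligned fragment and its native
counterpart might be computed. It is not the authors' implementation.
It assumes that cRMSD is measured on C-alpha coordinates after optimal
rigid superposition (the Kabsch method), and the function name crmsd
and the (n, 3) array layout are hypothetical choices made for
illustration.

```python
import numpy as np


def crmsd(pred, native):
    """Coordinate RMSD between two (n, 3) fragments after superposition.

    Assumption: both arrays hold C-alpha coordinates in the same residue
    order, one row per aligned residue.
    """
    P = np.asarray(pred, dtype=float)
    Q = np.asarray(native, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("fragments must be (n, 3) arrays of equal length")

    # Remove translation by centering each fragment on its centroid.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)

    # Kabsch: the SVD of the 3x3 covariance matrix gives the rotation
    # that minimizes the squared deviation between P @ R and Q.
    U, _, Vt = np.linalg.svd(P.T @ Q)

    # Flip the last axis if needed so R is a proper rotation, not a
    # reflection.
    d = np.sign(np.linalg.det(U @ Vt))
    R = U @ np.diag([1.0, 1.0, d]) @ Vt

    diff = P @ R - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

In a regression setting such as the one described above, a value of
this kind, computed against the native structure, would serve as the
training label for each fragment, while the features would come from
the alignment alone.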
Japanese Society for Bioinformatics