Sequence Data Analysis for Long Disordered Regions Prediction in the Calcineurin Family

Pedro Romero[1] (promero@eecs.wsu.edu)
Zoran Obradovic[1] (zoran@eecs.wsu.edu)
A. Keith Dunker[2] (dunker@mail.wsu.edu)

[1] School of Electrical Engineering and Computer Science
[2] Department of Biochemistry and Biophysics, Washington State University
Pullman, WA 99164, U.S.A.


Abstract

Our recently reported results strongly support the hypothesis that some amino acid sequences code for disordered regions rather than structured ones, and that such disordered regions are commonly involved in function. General and family-specific neural network predictors developed in those previous studies suggest that different classes of disordered regions exist. Here, family-specific data preprocessing for disorder prediction in the calcineurin (CaN) family is explored. The results show that prediction of order and disorder on CaN sequence data benefits significantly from family-specific preprocessing. Feature extraction through principal components analysis (PCA) outperforms feature selection techniques, although all methods discriminate well between CaN-specific disordered regions and CaN-specific ordered regions. For discriminating CaN-specific disordered regions from general ordered regions (unrelated to CaN), on the other hand, feature selection approaches proved more appropriate than PCA. The results further support the hypothesis that different kinds of disordered regions exist, as all family-specific disorder predictors developed in this study significantly outperformed a previously reported general multi-family disorder predictor.


Japanese Society for Bioinformatics