Juris Viksna (email@example.com)
David Gilbert (firstname.lastname@example.org)
Gilleain Torrance (email@example.com)
Institute of Mathematics and Computer Science, University of Latvia, Rainis boulevard 29, Riga LV-1459, Latvia
Bioinformatics Research Centre, Department of Computing Science, A416 Davidson Building, University of Glasgow, Glasgow G12 8QQ, UK
We describe a method for automated domain discovery for topological profile searches in protein structures. The method is used in the TOPStructure system for fast prediction of CATH classification for protein structures (given as PDB files). Domain discovery is important for profile searches in multi-domain proteins, for which the profile method alone tends to perform poorly. We also present an O(C(n)k + nk²) time algorithm for the domain discovery problem, compared with the O(C(n)k + (nk)²) time required by a trivial algorithm (where n is the length of the structure, k is the number of profiles and C(n) is the time needed to check for the presence of a given motif in a structure of length n). The method was developed for, and is currently used with, TOPS representations of protein structures and prediction of CATH classification, but it may be applied to other graph-based representations of protein or RNA structures and/or to other prediction problems. A protein structure prediction system incorporating the domain discovery method is available at http://bioinf.mii.lu.lv/tops/.
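The two bounds share the motif-checking term C(n)k, so the difference lies only in the second term. Using the definitions of n and k given above, the ratio of the two second terms is:

```latex
\frac{(nk)^2}{nk^2} = \frac{n^2 k^2}{n k^2} = n
```

That is, the worst-case cost of the part of the computation beyond motif checking is reduced by a factor of n, the length of the structure. This compares asymptotic upper bounds only. It does not by itself predict the speed-up observed on real PDB files.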