Yugo Shimizu (firstname.lastname@example.org)
Masahiro Hattori (email@example.com)
Susumu Goto (firstname.lastname@example.org)
Minoru Kanehisa (email@example.com)
Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto 611-0011, Japan
Predicting unknown enzymatic reactions is useful for understanding biological processes, such as the response to external substances like endocrine disrupters. Accurate prediction requires a similarity measure between reactions. We have developed the KEGG RPAIR database, a collection of chemical structure transformation patterns, called RDM patterns, for substrate-product pairs of enzymatic reactions. In this study, we compared RDM patterns with EC numbers, the well-known hierarchical classification scheme for enzymes. Additionally, we performed hierarchical clustering of RDM patterns based on whether each EC sub-subclass contains a particular RDM pattern. To represent the variation of RDM patterns within a cluster, we generalized the RDM patterns in each cluster using the hierarchy of KEGG Atomtypes, which are the components of RDM patterns. Using the generalized pattern, we can predict which cluster a given RDM pattern belongs to, even if its reaction has not been assigned an EC number. This cluster information will thus allow us to define the similarity between enzymatic reactions.
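The clustering step can be illustrated with a minimal sketch. It assumes each RDM pattern is represented by the set of EC sub-subclasses in which it occurs, and it groups patterns by agglomerative clustering. The Jaccard distance, the average-linkage rule, the cut-off threshold, the pattern names (`rdm_a` to `rdm_d`), and the profile data are all assumptions made for illustration; the abstract does not specify the distance measure or linkage actually used.

```python
from itertools import combinations

# Hypothetical presence/absence profiles (illustrative data only):
# each RDM pattern maps to the EC sub-subclasses in which it occurs.
profiles = {
    "rdm_a": {"1.1.1", "1.1.3"},
    "rdm_b": {"1.1.1", "1.1.3", "1.1.99"},
    "rdm_c": {"2.7.1"},
    "rdm_d": {"2.7.1", "2.7.4"},
}


def jaccard_distance(a, b):
    """Distance between two presence sets (assumed measure, not from the paper)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between the members of two clusters."""
    dists = [jaccard_distance(profiles[x], profiles[y]) for x in c1 for y in c2]
    return sum(dists) / len(dists)


def cluster(profiles, threshold):
    """Merge the closest pair of clusters until none is within the threshold."""
    clusters = [frozenset([name]) for name in sorted(profiles)]
    while len(clusters) > 1:
        d, i, j = min(
            (average_linkage(clusters[i], clusters[j], profiles), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        if d > threshold:
            break
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


result = cluster(profiles, threshold=0.6)
print(result)
```

With these toy profiles, patterns that share most of their EC sub-subclasses end up in the same cluster, while patterns with disjoint profiles stay apart. Recording the merge order instead of stopping at a threshold would yield the full hierarchy.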