Co-Evolution of Metabolism and Protein Sequences


Moritz Schütte [1](schuette@mpimp-golm.mpg.de)
Niels Klitgord [2](niels@bu.edu)
Daniel Segrè [2][3](dsegre@bu.edu)
Oliver Ebenhöh [1][4][5][6](ebenhoeh@abdn.ac.uk)

[1] Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany
[2] Boston University, Bioinformatics Program, 24 Cummington Street, Boston, MA 02215, USA
[3] Boston University, Departments of Biology and Biomedical Engineering, 24 Cummington Street, Boston, MA 02215, USA
[4] Potsdam University, Institute of Biochemistry and Biology, Karl-Liebknecht-Straße 24-25, 14476 Potsdam-Golm, Germany
[5] Institute for Complex Systems and Mathematical Biology, University of Aberdeen, Aberdeen, AB24 3UE, UK
[6] Institute of Medical Sciences, Foresterhill, University of Aberdeen, Aberdeen, AB25 2ZD, UK

Abstract

The set of chemicals that metabolic pathways can produce and use must have evolved in parallel with the enzymes that catalyze the reactions interconverting them. One implication of this shared history should be a correspondence between the innovation steps that gradually added new metabolic reactions to the biosphere-level biochemical toolkit and the gradual sequence changes that shaped the corresponding enzyme structures. However, global signatures of such long-term co-evolution have not been identified. Here we search for these signatures by computing correlations between inter-reaction distances on a metabolic network and sequence distances between the corresponding enzymes. We perform our calculations on the set of all known metabolic reactions available from the KEGG database. The distance between two reactions is computed as the length of the shortest path on a projection of the metabolic network in which nodes are reactions and an edge connects two reactions that share a common metabolite, after removal of cofactors. Estimating the distance between enzyme sequences in a meaningful way requires special care: for each Enzyme Commission (EC) number, we select from KEGG a consensus set of protein sequences using the Clusters of Orthologous Groups of proteins (COG) database. We define the evolutionary distance between protein sequences as an asymmetric transition probability between two enzymes, derived from the corresponding pairwise BLAST scores. Comparing the distances between sequences with the minimal distances on the metabolic reaction graph, we find a small but statistically significant correlation between the two measures. This suggests that the evolutionary walk in enzyme sequence space has locally mirrored, to some extent, the gradual expansion of metabolism.
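The reaction-reaction distance described in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical toy network in which each reaction is given as a set of metabolite names; the function names `reaction_graph` and `shortest_distances`, the reaction identifiers, and the cofactor list are all invented for illustration and are not taken from the paper or from KEGG. The sketch covers only the graph projection and shortest-path step, not the sequence-distance measure.

```python
from collections import deque
from itertools import combinations


def reaction_graph(reactions, cofactors):
    """Project a metabolic network onto its reactions.

    Two reactions are linked if they share at least one metabolite
    that is not in the cofactor set.
    """
    by_metabolite = {}
    for rxn, metabolites in reactions.items():
        for m in set(metabolites) - set(cofactors):
            by_metabolite.setdefault(m, set()).add(rxn)

    graph = {rxn: set() for rxn in reactions}
    for rxns in by_metabolite.values():
        for a, b in combinations(rxns, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def shortest_distances(graph, source):
    """Breadth-first search: edge count from source to each reachable reaction."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


# Hypothetical toy data: reaction id -> metabolites taking part in it.
reactions = {
    "R1": {"A", "B", "ATP"},
    "R2": {"B", "C"},
    "R3": {"C", "D", "ATP"},
    "R4": {"E", "F", "ATP"},
}
cofactors = {"ATP"}  # assumed cofactor list, for illustration only

graph = reaction_graph(reactions, cofactors)
distances = shortest_distances(graph, "R1")
print(distances)
```

In this toy network R4 shares only ATP with the other reactions, so once cofactors are removed it is disconnected from R1 and does not appear in the distance table. This shows why cofactor removal matters: without it, ubiquitous metabolites would place almost every pair of reactions at distance one.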


Japanese Society for Bioinformatics