Co-Evolution of Metabolism and Protein Sequences
Moritz Schütte [1](schuette@mpimp-golm.mpg.de)
Niels Klitgord [2](niels@bu.edu)
Daniel Segrè [2][3](dsegre@bu.edu)
Oliver Ebenhöh [1][4][5][6](ebenhoeh@abdn.ac.uk)
[1] Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1,
14476 Potsdam-Golm, Germany
[2] Boston University, Bioinformatics Program, 24 Cummington Street,
Boston, MA 02215, USA
[3] Boston University, Departments of Biology and Biomedical
Engineering, 24 Cummington Street, Boston, MA 02215, USA
[4] Potsdam University, Institute of Biochemistry and Biology, Karl-Liebknecht-Straße 24-25, 14476 Potsdam-Golm, Germany
[5] Institute for Complex Systems and Mathematical Biology, University
of Aberdeen, Aberdeen, AB24 3UE, UK
[6] Institute of Medical Sciences, Foresterhill, University of Aberdeen, Aberdeen, AB25 2ZD, UK
Abstract
The set of chemicals producible and usable by metabolic pathways must
have evolved in parallel with the enzymes that catalyze their
interconversion. One
implication of this common historical path should be a correspondence
between the innovation steps that gradually added new metabolic
reactions to the biosphere-level biochemical toolkit, and the gradual
sequence changes that must have slowly shaped the corresponding enzyme
structures. However, global signatures of a long-term co-evolution have
not been identified. Here we search for such signatures by computing
correlations between inter-reaction distances on a metabolic network,
and sequence distances of the corresponding enzyme proteins. We perform
our calculations using the set of all known metabolic reactions,
available from the KEGG database. Reaction-reaction distance on the
metabolic network is computed as the length of the shortest path on a
projection of the metabolic network, in which nodes are reactions and
edges indicate whether two reactions share a common metabolite, after
removal of cofactors. Estimating the distance between enzyme sequences
in a meaningful way requires some special care: for each Enzyme
Commission (EC) number, we select from KEGG a consensus set of protein
sequences using the Clusters of Orthologous Groups of proteins (COG)
database. We define the evolutionary distance between protein sequences
as an asymmetric transition probability between two enzymes, derived
from the corresponding pairwise BLAST scores. By comparing the
distances between sequences to the minimal distances on the metabolic
reaction graph, we find a small but statistically significant
correlation between the two measures. This suggests that the
evolutionary walk in enzyme sequence space has locally mirrored, to
some extent, the gradual expansion of metabolism.
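
The two distance measures described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The reaction names, metabolites, cofactor list and scores are invented toy data, and the function names are hypothetical. In particular, the abstract does not state how transition probabilities are derived from BLAST scores; row-normalising the pairwise scores is only one simple assumption that yields an asymmetric measure.

```python
from collections import deque
from itertools import combinations


def reaction_graph(reactions, cofactors):
    """Project reactions onto a graph: two reactions are adjacent if they
    share at least one metabolite once cofactors have been removed."""
    cleaned = {r: set(mets) - set(cofactors) for r, mets in reactions.items()}
    adjacency = {r: set() for r in cleaned}
    for a, b in combinations(cleaned, 2):
        if cleaned[a] & cleaned[b]:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def shortest_paths(adjacency, source):
    """Breadth-first search: shortest path length from source to every
    reachable reaction. Unreachable reactions are absent from the result."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def transition_probabilities(scores):
    """ASSUMPTION: row-normalise pairwise scores so that P[i][j] is the share
    of enzyme i's total score that goes to enzyme j. Even for symmetric
    scores, P[i][j] generally differs from P[j][i]."""
    probs = {}
    for i, row in scores.items():
        total = sum(s for j, s in row.items() if j != i)
        probs[i] = (
            {j: s / total for j, s in row.items() if j != i} if total > 0 else {}
        )
    return probs


# Toy data (hypothetical): four reactions, ATP treated as a cofactor.
reactions = {
    "R1": {"A", "B", "ATP"},
    "R2": {"B", "C", "ATP"},
    "R3": {"C", "D"},
    "R4": {"E", "F", "ATP"},
}
adjacency = reaction_graph(reactions, cofactors={"ATP"})
dist = shortest_paths(adjacency, "R1")

# Toy symmetric pairwise scores between three enzymes (hypothetical values).
scores = {
    "E1": {"E2": 30.0, "E3": 10.0},
    "E2": {"E1": 30.0, "E3": 90.0},
    "E3": {"E1": 10.0, "E2": 90.0},
}
probs = transition_probabilities(scores)

print(dist)
print(probs["E1"]["E2"], probs["E2"]["E1"])
```

In this toy network R4 shares only the cofactor ATP with the other reactions, so it becomes disconnected once cofactors are removed. This is why cofactor removal matters for the graph distances: highly connected cofactors would otherwise shorten almost every path.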
Japanese Society for Bioinformatics