Vincent Daubin (email@example.com)
Manolo Gouy (firstname.lastname@example.org)
Guy Perriere (email@example.com)
Laboratoire de Biométrie et Biologie Évolutive, UMR CNRS 5558, Université Claude Bernard - Lyon 1, 43 bd. du 11 Novembre 1918, 69622 Villeurbanne Cedex, France
It has been claimed that complete genome sequences would clarify phylogenetic relationships between organisms but, so far, no satisfactory approach has been proposed to use these data efficiently. For instance, although coding the presence or absence of genes in complete genomes gives interesting results, it does not take into account the phylogenetic information contained in sequences, and it ignores hidden paralogy by relying on a similarity-based definition of orthology. Likewise, concatenating sequences of different genes barely takes into account the specific evolutionary rate of each gene. Finally, building a consensus tree is strongly limited by the low number of genes shared among all organisms. Here, we use a new method based on supertree construction, which makes it possible to combine in one supertree the information and statistical support of hundreds of trees from orthologous gene families, to build the phylogeny of 33 prokaryotes and four eukaryotes with completely sequenced genomes. This approach gives a robust supertree, which demonstrates that a phylogeny of prokaryotic species is conceivable and challenges the hypothesis of a thermophilic origin of bacteria and present-day life. The results are compatible with the hypothesis of a core of genes for which lateral transfers are rare, but they cast doubt on the widely accepted "complexity hypothesis", which predicts that this core is mainly involved in informational processes.