Arjun Bhutkar (email@example.com)
Susan Russo (firstname.lastname@example.org)
Temple F. Smith (email@example.com)
William M. Gelbart (firstname.lastname@example.org)
Department of Molecular and Cellular Biology, Harvard University, Cambridge MA 02138, USA
BioMolecular Engineering Research Center, Boston University, 36 Cummington St., Boston MA 02215, USA
Genome-scale synteny analysis, the analysis of relative gene-order conservation between species, can provide key insights into evolutionary chromosomal dynamics, rearrangement rates between species, and speciation. With multiple genomes rapidly becoming available, there is a need for efficient methods to support comparative syntenic analysis. Current methods rely on homology assessment and multiple-alignment-based approaches to identify homologs of genetic markers between species and to infer syntenic relationships. One of the primary challenges facing multi-genome syntenic analysis is the uncertainty introduced by genome assemblies with unsequenced gaps and possible assembly errors. Currently, manual intervention is needed to tune and correct the results of homology assessment and synteny inference. This paper presents a novel automated approach that overcomes some of these limitations. It uses a graph-based algorithm to infer sub-graphs denoting synteny chains, with the objective of choosing the best locations for homologous elements, in the presence of paralogs, so as to maximize synteny. These synteny chains are then expanded by merging sub-graphs according to user-defined thresholds for micro-syntenic scrambling. The approach accounts for contig and scaffold gaps in the assembly when determining homologous genetic elements that may fall in unsequenced assembly gaps, lie at the edges of sequenced segments, or lie on small fragments. Furthermore, it provides an automated solution for breakpoint analysis and for the comparative study of chromosomal rearrangements between species. As an example application, the approach was used in a comparative study of the Drosophila melanogaster and Drosophila pseudoobscura genomes, where it has been useful in analyzing inter-species syntenic relationships.
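The abstract does not spell out the graph algorithm, so the following is only a minimal sketch of one way to choose among paralogous hits so as to maximize synteny. It is not the authors' method. It assumes each reference gene, in reference order, has a list of candidate (scaffold, position) hits in the target assembly. Each hit becomes a node in a directed acyclic graph, and a longest-path dynamic program picks at most one hit per gene. The function `best_synteny_chain`, its parameters `max_skip` (how many reference genes may be skipped, e.g. because a homolog falls in an assembly gap) and `max_dist` (a stand-in for a micro-syntenic scrambling threshold), and the toy data are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical hit: (scaffold name, gene-order rank on that scaffold) for a
# homolog of a reference gene in the target assembly.
Hit = Tuple[str, int]


def best_synteny_chain(
    ref_hits: List[List[Hit]],
    max_skip: int = 2,
    max_dist: int = 3,
) -> List[Tuple[int, Hit]]:
    """Choose at most one hit per reference gene to form the longest chain.

    ref_hits[i] lists candidate hits (e.g. paralogs) for the i-th reference
    gene. Two hits may be consecutive in a chain when their reference genes
    are 1..max_skip apart, they lie on the same scaffold, and their target
    ranks differ by 1..max_dist in either direction (local scrambling).
    """
    # DAG nodes in reference order; edges only point forward, so one
    # left-to-right pass computes the longest path ending at each node.
    nodes = [(i, h) for i, hits in enumerate(ref_hits) for h in hits]
    if not nodes:
        return []
    score: Dict[int, int] = {}
    prev: Dict[int, Optional[int]] = {}
    for k, (i, (scaf, pos)) in enumerate(nodes):
        score[k], prev[k] = 1, None
        for m in range(k):
            j, (scaf2, pos2) = nodes[m]
            if (0 < i - j <= max_skip and scaf == scaf2
                    and 0 < abs(pos - pos2) <= max_dist
                    and score[m] + 1 > score[k]):
                score[k], prev[k] = score[m] + 1, m
    # Backtrack from the best-scoring end node (first one on ties).
    cur: Optional[int] = max(score, key=score.get)
    chain = []
    while cur is not None:
        chain.append(nodes[cur])
        cur = prev[cur]
    return chain[::-1]


if __name__ == "__main__":
    # Toy data only: gene 3 has no hit (as if it fell in an assembly gap),
    # genes 0 and 2 have paralogous hits on another scaffold.
    ref_hits = [
        [("2L", 10), ("3R", 50)],
        [("2L", 11)],
        [("3R", 90), ("2L", 13)],
        [],
        [("2L", 12)],
    ]
    print(best_synteny_chain(ref_hits))
    # [(0, ('2L', 10)), (1, ('2L', 11)), (2, ('2L', 13)), (4, ('2L', 12))]
```

In this toy run the paralogous hits on "3R" are discarded because choosing the "2L" hits yields a longer chain. The missing gene 3 is bridged by `max_skip`, and the out-of-order hit for gene 4 is tolerated by `max_dist`. A fuller treatment would extract several chains and then merge them under a scrambling threshold, which this sketch leaves out.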