Cary S. Gunther (firstname.lastname@example.org)
Terry Gaasterland (email@example.com)
Laboratory of Computational Genomics, The Rockefeller University, 1230 York Avenue, New York, New York 10021, USA
A pair of distinct proteins in one organism may each most closely match a different part of a single protein in another organism. A comparison of all proteins encoded in the genome of Saccharomyces cerevisiae against all proteins from 24 prokaryotic genomes yields 1010 pairs of yeast proteins whose homologs are parts of one protein in a prokaryotic genome. Marcotte et al. showed that proteins related in this manner are more likely to interact than proteins chosen at random. In this paper, we investigate whether the genes coding for such proteins are also likely to be transcribed concurrently. We analyzed the expression of the genes corresponding to these 1010 fused pairs at the transcriptional level and found that the transcriptional profiles of fused gene pairs are significantly closer than those of randomly selected pairs. This finding is reproducible and holds under multiple distance metrics. Moreover, such pairs frequently share additional biologically relevant properties. Thus, while protein fusion patterns are not predictive of co-expression, they are an important element in explaining it. This justifies the use of curated protein fusion events to help characterize gene co-expression clusters.
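The comparison of fused gene pairs against randomly selected pairs can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: the gene names and expression values are invented toy data, the two metrics (Euclidean distance and one minus the Pearson correlation) are only examples of possible distance metrics, and the random-pair resampling is one plausible way to assess significance rather than the procedure actually used.

```python
import math
import random
from statistics import mean


def euclidean(a, b):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_distance(a, b):
    """1 - Pearson correlation; 0 means perfectly correlated profiles."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 1.0  # assumption: treat a flat profile as uncorrelated
    return 1.0 - cov / math.sqrt(va * vb)


def mean_pair_distance(profiles, pairs, metric):
    """Average distance over a list of (gene, gene) pairs."""
    return mean(metric(profiles[g1], profiles[g2]) for g1, g2 in pairs)


def random_pair_p_value(profiles, fused_pairs, metric, n_trials=1000, seed=0):
    """Estimate how often random pair sets are at least as close as the fused set."""
    rng = random.Random(seed)
    genes = sorted(profiles)
    observed = mean_pair_distance(profiles, fused_pairs, metric)
    hits = 0
    for _ in range(n_trials):
        # Draw as many random pairs of distinct genes as there are fused pairs.
        sample = [tuple(rng.sample(genes, 2)) for _ in fused_pairs]
        if mean_pair_distance(profiles, sample, metric) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_trials + 1)


# Hypothetical toy profiles (one value per experimental condition).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.1, 3.0, 4.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [2.0, 2.0, 5.0, 1.0],
    "geneE": [0.0, 5.0, 0.0, 5.0],
    "geneF": [3.0, 1.0, 4.0, 1.0],
}
fused_pairs = [("geneA", "geneB")]  # hypothetical fused pair

for metric in (euclidean, pearson_distance):
    observed, p_value = random_pair_p_value(profiles, fused_pairs, metric)
    print(metric.__name__, observed, p_value)
```

Running the same test under more than one metric mirrors the claim that the result does not depend on a single choice of distance; with real data, the profiles would come from measured expression experiments rather than the toy values used here.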