Alexis Vandenbon (email@example.com)
Kenta Nakai[1,2,3] (firstname.lastname@example.org)
[1] Department of Medical Genome Sciences, Graduate School of Frontier Sciences, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan
[2] Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan
[3] Institute for Bioinformatics Research and Development (BIRD), Japan Science and Technology Agency, 5-3 Yonbancho, Chiyoda-ku, Tokyo 102-0081, Japan
Transcription is regulated by sets of transcription factors that bind specific sites in the regulatory regions of genes. It is therefore believed that regulatory regions driving similar expression profiles share some common structural features. Here we introduce a computational approach for finding a small set of rules that describe the presence and positioning of motifs in a set of promoter sequences. This rule set is then used to find, within a genome-wide set of sequences, promoters that drive similar expression profiles. We applied our approach to muscle-expressed genes in Caenorhabditis elegans. The approach achieved high average performance; in the best case, almost 50% of true positive test genes scored higher than 90% of the true negative test genes. High-scoring non-training sequences were enriched for muscle-expressed genes, and predicted motifs matching the rules showed a significant tendency to occur in experimentally verified regulatory regions. Our model is more general than existing cis-regulatory module models, because the rules it selects capture a variety of information: not only proximal but also distal positioning of motif pairs, positioning relative to the translation start site, and the simple presence of motifs. We believe our model can help improve our understanding of transcription factor cooperation and transcription initiation.
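The three kinds of rule named in the abstract (presence of a motif, positioning relative to the translation start site, and the distance between a pair of motifs), together with the reported evaluation measure, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the promoter representation (motif name mapped to hit positions relative to the translation start site), the rule classes, the unweighted rule-count score, and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol

# Hypothetical input format: motif name -> hit positions in bp relative to
# the translation start site (negative values lie upstream).
Promoter = Dict[str, List[int]]


class Rule(Protocol):
    def holds(self, p: Promoter) -> bool: ...


@dataclass
class PresenceRule:
    """Satisfied if the motif occurs anywhere in the promoter."""
    motif: str

    def holds(self, p: Promoter) -> bool:
        return len(p.get(self.motif, [])) > 0


@dataclass
class StartSiteRule:
    """Satisfied if the motif occurs within [lo, hi] of the translation start site."""
    motif: str
    lo: int
    hi: int

    def holds(self, p: Promoter) -> bool:
        return any(self.lo <= x <= self.hi for x in p.get(self.motif, []))


@dataclass
class PairRule:
    """Satisfied if some hit of motif_a and some hit of motif_b lie
    min_dist..max_dist bp apart (a small range models a proximal pair,
    a large range a distal pair)."""
    motif_a: str
    motif_b: str
    min_dist: int
    max_dist: int

    def holds(self, p: Promoter) -> bool:
        return any(
            self.min_dist <= abs(a - b) <= self.max_dist
            for a in p.get(self.motif_a, [])
            for b in p.get(self.motif_b, [])
        )


def score(rules: List[Rule], p: Promoter) -> int:
    # Simplifying assumption: a promoter's score is the number of rules it satisfies.
    return sum(1 for r in rules if r.holds(p))


def frac_pos_beating_negatives(pos: List[float], neg: List[float], q: float = 0.9) -> float:
    """Fraction of positive test genes scoring strictly higher than
    at least a fraction q of the negative test genes."""
    hits = 0
    for s in pos:
        beaten = sum(1 for n in neg if s > n)
        if beaten >= q * len(neg):
            hits += 1
    return hits / len(pos)


rules = [
    PresenceRule("M1"),
    StartSiteRule("M2", -200, -1),
    PairRule("M1", "M3", 0, 30),
]
print(score(rules, {"M1": [-150], "M2": [-50], "M3": [-140]}))  # 3
print(score(rules, {"M1": [-900]}))  # 1
print(frac_pos_beating_negatives([3, 1], [0] * 8 + [1, 1]))  # 0.5
```

With ten negatives, a positive must beat at least nine of them to count; the positive scoring 1 beats only eight, so half of the positives pass. A real rule set would also be learned from training promoters rather than written by hand, which this sketch does not attempt.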