Chang-Jiun Wu (email@example.com)
Yutao Fu (firstname.lastname@example.org)
T. M. Murali (email@example.com)
Simon Kasif (firstname.lastname@example.org)
Boston University Bioinformatics Program, Boston, MA 02215, USA
Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
Department of Bioengineering, Boston, MA 02215, USA
Recent advances in high-throughput profiling of gene expression have catalyzed an explosive growth in functional genomics aimed at identifying genes that are differentially expressed in various tissues or cell types across a range of experimental conditions. These studies can lead to the identification of diagnostic genes, classification of genes into functional categories, association of genes with regulatory pathways, and clustering of genes into modules that are potentially co-regulated by a group of transcription factors. Traditional clustering methods such as hierarchical clustering or principal component analysis are difficult to deploy effectively for several of these tasks, since genes rarely exhibit similar expression patterns across a wide range of conditions. Bi-clustering of gene expression data is a promising methodology for identifying groups of genes that show a coherent expression profile across a subset of conditions. This methodology can be a first step towards the discovery of co-regulated and co-expressed genes or modules. Although bi-clustering (also called block clustering) was introduced in statistics in 1974, few robust and efficient solutions exist for extracting gene expression modules from microarray data. In this paper, we propose a simple but promising new approach to bi-clustering based on a Gibbs sampling paradigm. Our algorithm is implemented in the program GEMS (Gene Expression Module Sampler). GEMS has been tested on synthetic data, generated to evaluate the effect of noise on the performance of the algorithm, as well as on published leukemia datasets. In our preliminary studies comparing GEMS with other bi-clustering software, we show that GEMS is a reliable, flexible and computationally efficient approach for bi-clustering gene expression data.
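To make the idea of sampling a bi-cluster concrete, the following is a minimal sketch of a Gibbs sampler for a single planted bi-cluster. It is not the GEMS algorithm, whose model is not described in this abstract. It assumes a deliberately simple model in which entries inside the bi-cluster are Gaussian with a known mean shift mu, all other entries are Gaussian around zero, and each gene and condition has an independent membership prior. The names gibbs_bicluster and make_toy_matrix and all parameters are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_toy_matrix(n_genes, n_conds, genes_in, conds_in, mu, sigma, seed=1):
    # Synthetic data: Gaussian noise, shifted by mu inside the planted block.
    rng = random.Random(seed)
    return [[rng.gauss(mu if (i in genes_in and j in conds_in) else 0.0, sigma)
             for j in range(n_conds)] for i in range(n_genes)]


def gibbs_bicluster(x, mu, sigma=1.0, prior=0.5, sweeps=200, seed=0):
    """Toy Gibbs sampler over gene and condition membership indicators.

    Assumed model (illustrative only): x[i][j] ~ N(mu, sigma) if gene i and
    condition j are both in the bi-cluster, otherwise x[i][j] ~ N(0, sigma).
    Returns membership frequencies over the second half of the sweeps.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(x), len(x[0])
    rows = [rng.random() < 0.5 for _ in range(n_rows)]
    cols = [rng.random() < 0.5 for _ in range(n_cols)]
    prior_logit = math.log(prior / (1.0 - prior))
    row_hits, col_hits = [0] * n_rows, [0] * n_cols
    burn_in = sweeps // 2

    def llr(v):
        # log N(v; mu, sigma) - log N(v; 0, sigma)
        return mu * (v - mu / 2.0) / sigma ** 2

    for sweep in range(sweeps):
        # Resample each gene indicator given the current condition set.
        for i in range(n_rows):
            z = prior_logit + sum(llr(x[i][j]) for j in range(n_cols) if cols[j])
            rows[i] = rng.random() < sigmoid(z)
        # Resample each condition indicator given the current gene set.
        for j in range(n_cols):
            z = prior_logit + sum(llr(x[i][j]) for i in range(n_rows) if rows[i])
            cols[j] = rng.random() < sigmoid(z)
        if sweep >= burn_in:
            for i in range(n_rows):
                row_hits[i] += rows[i]
            for j in range(n_cols):
                col_hits[j] += cols[j]
    kept = sweeps - burn_in
    return [h / kept for h in row_hits], [h / kept for h in col_hits]


# Usage: plant a 10-gene by 5-condition block in a 20 x 8 matrix.
data = make_toy_matrix(20, 8, set(range(10)), set(range(5)), mu=3.0, sigma=0.5)
gene_freq, cond_freq = gibbs_bicluster(data, mu=3.0, sigma=0.5)
```

Genes and conditions with a membership frequency near one form the recovered module. Treating mu and sigma as known keeps every conditional distribution in closed form; a practical sampler would also have to estimate these quantities and handle multiple, possibly overlapping modules.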