On Combining Multiple Microarray Studies for Improved Functional Classification by Whole-Dataset Feature Selection

See-Kiong Ng[1] (skng@i2r.a-star.edu.sg)
Soon-Heng Tan[1] (soonheng@i2r.a-star.edu.sg)
V.S. Sundararajan[1],[2] (sundar@i2r.a-star.edu.sg)

[1]Knowledge Discovery Department, Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613
[2]School of Computing, National University of Singapore, Lower Kent Ridge Road, Singapore 119260


As microarray technologies become routine in genome laboratories for studying gene expression, multiple laboratories often conduct experiments on identical or similar sets of genes for different functional studies. Much of this data is available to researchers, either through collaborators or from online gene expression databases. Combining data from different microarray studies could therefore improve microarray data mining results.
  We show that the functional classification of genes from microarray data can be improved by combining gene expression data from multiple microarray studies, even when the experimental focus or conditions of the studies differ. However, blindly combining all available datasets does not always improve the results; it is important to be selective about which datasets to include. In our approach, we treat each dataset as a single feature and apply feature selection strategies to choose appropriate datasets for training. Using a simple hill-climbing method, we show that gene classification performance can be improved by whole-dataset feature selection.
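
The idea of treating each dataset as one feature and hill-climbing over subsets can be sketched roughly as follows. This is only an illustrative sketch, not the procedure used in the paper: the abstract does not state the starting point, the move set, or the scoring criterion, so the greedy forward search, the function names `select_datasets` and `toy_score`, and the dataset labels are all assumptions. In practice `score` would be something like the cross-validated accuracy of a classifier trained on the concatenated expression profiles of the chosen datasets.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

# Hypothetical scoring interface: maps a set of dataset names to a
# classification score (higher is better).
ScoreFn = Callable[[FrozenSet[str]], float]


def select_datasets(candidates: Iterable[str],
                    score: ScoreFn) -> Tuple[FrozenSet[str], float]:
    """Greedy forward hill-climbing over whole datasets (assumed variant)."""
    remaining = set(candidates)
    selected: FrozenSet[str] = frozenset()
    best = float("-inf")
    while remaining:
        # Score every one-dataset extension of the current selection.
        trials = [(score(selected | {d}), d) for d in sorted(remaining)]
        top_score, top_dataset = max(trials)
        if top_score <= best:
            break  # no neighbouring subset improves the score: local optimum
        selected = selected | {top_dataset}
        remaining.remove(top_dataset)
        best = top_score
    return selected, best


def toy_score(subset: FrozenSet[str]) -> float:
    """Made-up stand-in for a real evaluation; the numbers are not results."""
    contribution = {"study_A": 0.10, "study_B": 0.05, "study_C": -0.04}
    return 0.60 + sum(contribution[d] for d in sorted(subset))


if __name__ == "__main__":
    chosen, final_score = select_datasets(
        ["study_A", "study_B", "study_C"], toy_score)
    print(sorted(chosen), final_score)
```

With the made-up contributions above, the search adds `study_A` and `study_B` and then stops, because including `study_C` would lower the score. This mirrors the point that combining every available dataset is not always beneficial.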

Japanese Society for Bioinformatics