See-Kiong Ng (email@example.com)
Soon-Heng Tan (firstname.lastname@example.org)
V.S. Sundararajan (email@example.com)
Knowledge Discovery Department, Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613
School of Computing, National University of Singapore, Lower Kent Ridge Road, Singapore 119260
As microarray technologies become routinely applied in genome
laboratories for studying gene expression, it is not uncommon for
multiple laboratories to conduct experiments on identical or similar
sets of genes for various functional studies. Many of these datasets
are available to researchers for analysis, either through
collaborators or from online gene expression databases. It would
therefore be useful to combine data from different microarray studies
to improve microarray data mining results.
We show that the functional classification of genes from microarray data can be further improved by combining gene expression data from multiple microarray studies, even when the experimental focus or conditions differ between studies. However, blindly combining all available datasets does not always improve the analysis results; it is important to be selective about which datasets to include. In our approach, we treat each dataset as a single feature and then apply feature selection strategies to choose appropriate datasets for training. Using a simple hill-climbing method, we show that gene classification performance can be improved by whole-dataset feature selection.
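To make the idea of whole-dataset feature selection concrete, here is a minimal sketch of greedy forward hill-climbing in which each dataset is one selectable unit. The abstract does not specify the classifier, the evaluation protocol, or the exact hill-climbing variant. This sketch therefore makes several assumptions: genes are represented by concatenating their expression profiles across the chosen datasets, the score is leave-one-out accuracy of a 1-nearest-neighbour classifier, and the search adds one dataset at a time until no addition helps. All names (`combine`, `loo_1nn_accuracy`, `hill_climb_select`) and the toy data are hypothetical.

```python
import math
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

# Hypothetical representation: each dataset maps a gene id to its expression
# profile (one value per array/condition in that study).
Dataset = Dict[str, List[float]]


def combine(datasets: Sequence[Dataset], chosen: FrozenSet[int],
            genes: Sequence[str]) -> List[List[float]]:
    """Concatenate each gene's profiles across the chosen datasets."""
    return [[x for i in sorted(chosen) for x in datasets[i][g]] for g in genes]


def loo_1nn_accuracy(vectors: List[List[float]], labels: List[str]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    (an assumed stand-in for whatever classifier is actually used)."""
    correct = 0
    for i, v in enumerate(vectors):
        nearest = min((j for j in range(len(vectors)) if j != i),
                      key=lambda j: math.dist(v, vectors[j]))
        if labels[nearest] == labels[i]:
            correct += 1
    return correct / len(vectors)


def hill_climb_select(n_datasets: int,
                      score: Callable[[FrozenSet[int]], float]
                      ) -> Tuple[FrozenSet[int], float]:
    """Greedy forward hill-climbing where each whole dataset is one feature.

    Start from no datasets, repeatedly add the single dataset whose inclusion
    gives the highest score, and stop once no addition improves the score.
    """
    selected: FrozenSet[int] = frozenset()
    best = float("-inf")
    while True:
        candidates = [selected | {i} for i in range(n_datasets)
                      if i not in selected]
        if not candidates:
            break  # every dataset is already included
        cand_score, cand = max(((score(c), c) for c in candidates),
                               key=lambda t: t[0])
        if cand_score <= best:
            break  # local optimum: no remaining dataset helps
        selected, best = cand, cand_score
    return selected, best


# Toy usage: dataset 0 separates the two functional classes and dataset 1 is
# misleading, so the search keeps dataset 0 and rejects dataset 1.
genes = ["g1", "g2", "g3", "g4"]
labels = ["A", "A", "B", "B"]
datasets = [
    {"g1": [0.0], "g2": [0.1], "g3": [1.0], "g4": [1.1]},
    {"g1": [0.0], "g2": [5.0], "g3": [5.1], "g4": [0.1]},
]
chosen, acc = hill_climb_select(
    len(datasets),
    lambda s: loo_1nn_accuracy(combine(datasets, s, genes), labels),
)
print(sorted(chosen), acc)  # [0] 1.0
```

The toy run illustrates why blind combination can hurt. Adding the misleading dataset to the informative one cannot raise accuracy above the single-dataset score, so the search stops after selecting only dataset 0. Plain hill-climbing like this can stall at a local optimum. Other search strategies, such as backward elimination, could be substituted behind the same `score` callback.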