Efficient Determination of Cluster Boundaries for Analysis of Gene Expression Profile Data Using Hierarchical Clustering and Wavelet Transform

Harry Amri Moesa[1] (hammus00@yahoo.com)
Dukka Bahadur K.C.[2] (dukka@kuicr.kyoto-u.ac.jp)
Tatsuya Akutsu[2] (takutsu@kuicr.kyoto-u.ac.jp)

[1]NEC Soft Ltd, Platform System Division, Tokyo, Japan
[2]Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan


Existing methods for clustering gene expression profile data require either manual inspection and additional biological knowledge or a cut-off value that cannot be calculated directly from the given data set. The systematic and efficient determination of cluster boundaries in gene expression profile data therefore remains a challenging problem.
In this context, we have developed a procedure for the automatic and systematic determination of cluster boundaries in the hierarchical clustering of gene expression data. The procedure is based on the ratio of within-class variance to between-class variance, which can be calculated entirely from the given expression data. After a dendrogram has been constructed by agglomerative hierarchical clustering, this ratio is used to determine the cluster boundaries. Unlike existing approaches, ours requires no manual inspection or biological knowledge; the only quantity it needs is this ratio. Our results compare favorably with, and in some cases are better than, those of an existing method that uses neither prior information nor manual inspection. Moreover, gene expression profile data are often contaminated with various types of noise. To reduce this noise, we also applied the discrete wavelet transform, a technique used in image enhancement. We tested a number of mother wavelet functions for smoothing the noise in the gene expression data set and obtained some improvements in the quality of the results.
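
The pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the linkage, the distance measure, the mother wavelet, or the rule that turns the variance ratio into a cut. The sketch therefore assumes average linkage with Euclidean distance, a one-level Haar wavelet with hard thresholding, and a hypothetical "largest jump" rule that cuts the dendrogram just before the merge that increases the ratio the most. All function names and the toy profiles are invented for illustration.

```python
from itertools import combinations
from math import sqrt

def haar_smooth(profile, threshold):
    """One-level Haar DWT, zero small detail coefficients, inverse DWT.
    Assumes an even number of time points (Haar is an assumed wavelet choice)."""
    out = []
    for x, y in zip(profile[0::2], profile[1::2]):
        approx, detail = (x + y) / sqrt(2), (x - y) / sqrt(2)
        if abs(detail) < threshold:  # treat small fluctuations as noise
            detail = 0.0
        out += [(approx + detail) / sqrt(2), (approx - detail) / sqrt(2)]
    return out

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def average_linkage(profiles, a, b):
    """Mean Euclidean distance between members of clusters a and b."""
    total = sum(sqrt(sq_dist(profiles[i], profiles[j])) for i in a for j in b)
    return total / (len(a) * len(b))

def agglomerate(profiles):
    """Return the partition at every dendrogram level, from n clusters to 1."""
    clusters = [[i] for i in range(len(profiles))]
    levels = [list(clusters)]
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(profiles, *pair))
        clusters = [c for c in clusters if c is not a and c is not b] + [a + b]
        levels.append(list(clusters))
    return levels

def variance_ratio(profiles, partition):
    """Within-class variance divided by between-class variance."""
    grand = centroid(profiles)
    within = between = 0.0
    for members in partition:
        centre = centroid([profiles[i] for i in members])
        within += sum(sq_dist(profiles[i], centre) for i in members)
        between += len(members) * sq_dist(centre, grand)
    return within / between if between > 0 else float("inf")

def choose_partition(profiles):
    """Assumed rule: keep the level just before the largest jump in the ratio."""
    if len(profiles) < 4:
        raise ValueError("need at least 4 profiles")
    # Skip the trivial levels: n singletons (ratio 0) and one cluster (ratio inf).
    candidates = agglomerate(profiles)[1:-1]
    ratios = [variance_ratio(profiles, p) for p in candidates]
    jumps = [ratios[k + 1] - ratios[k] for k in range(len(ratios) - 1)]
    best = max(range(len(jumps)), key=jumps.__getitem__)
    return candidates[best]

# Toy data: three groups of two genes, four time points each (invented values).
profiles = [
    [0.0, 0.1, 0.0, 0.1], [0.1, 0.0, 0.1, 0.0],
    [5.0, 5.1, 5.0, 5.1], [5.1, 5.0, 5.1, 5.0],
    [0.0, 5.0, 10.0, 5.0], [0.1, 5.1, 10.0, 5.1],
]
smoothed = [haar_smooth(p, threshold=0.2) for p in profiles]
chosen = choose_partition(smoothed)
print(sorted(sorted(c) for c in chosen))
```

Every quantity in the sketch is computed from the expression profiles alone, which is the property the abstract emphasizes. Because the one-cluster level has zero between-class variance, this particular jump rule can only return between 3 and n-1 clusters; a real implementation would need a rule that also covers the two-cluster case.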

Japanese Society for Bioinformatics