Efficient Determination of Cluster Boundaries for Analysis of Gene Expression Profile Data Using Hierarchical Clustering and Wavelet Transform
Harry Amri Moesa[1] (hammus00@yahoo.com)
Dukka Bahadur K.C.[2] (dukka@kuicr.kyoto-u.ac.jp)
Tatsuya Akutsu[2] (takutsu@kuicr.kyoto-u.ac.jp)
[1]NEC Soft Ltd, Platform System Division, Tokyo, Japan
[2]Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
Abstract
Existing methods for clustering gene expression profile data either require manual inspection and additional biological knowledge, or rely on a cut-off value that cannot be calculated directly from the given data set. Thus, the systematic and efficient determination of cluster boundaries in gene expression profile data remains a challenging problem.
In this context, we have developed a procedure for the automatic and systematic determination of cluster boundaries in the hierarchical clustering of gene expression data. The procedure is based on the ratio of within-class variance to between-class variance, which can be calculated entirely from the given expression data. After a dendrogram is built by agglomerative hierarchical clustering, this ratio is used to determine the cluster boundaries. Unlike other existing approaches, our approach requires no manual inspection or biological knowledge; it needs only this ratio. Our results are comparable to, and in some cases better than, those of an existing method that does not use prior information or manual inspection.
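As a rough illustration of the quantity involved, the following sketch computes the ratio of within-class to between-class variance for every candidate cut of a dendrogram. It is written under several assumptions that the abstract does not state: average linkage with Euclidean distance, variances measured as sums of squared distances to centroids, and hypothetical function names such as `ratio_per_cut`. The rule that turns these ratios into a final boundary is not given here, so the sketch stops at computing them.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(rows):
    """Component-wise mean of a non-empty list of profiles."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def average_linkage(profiles, a, b):
    """Mean pairwise distance between clusters a and b (assumed linkage)."""
    total = sum(sq_dist(profiles[i], profiles[j]) ** 0.5 for i in a for j in b)
    return total / (len(a) * len(b))


def dendrogram_levels(profiles):
    """Agglomerative clustering; returns the partition after each merge,
    from n singleton clusters down to one cluster."""
    clusters = [[i] for i in range(len(profiles))]
    levels = [list(clusters)]
    while len(clusters) > 1:
        # Merge the pair of clusters with the smallest average linkage.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(profiles, clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        levels.append(list(clusters))
    return levels


def variance_ratio(profiles, partition):
    """Within-class scatter divided by between-class scatter."""
    overall = centroid(profiles)
    within = between = 0.0
    for members in partition:
        rows = [profiles[i] for i in members]
        c = centroid(rows)
        within += sum(sq_dist(r, c) for r in rows)
        between += len(rows) * sq_dist(c, overall)
    return within / between if between > 0 else float("inf")


def ratio_per_cut(profiles):
    """Map each cluster count k >= 2 to the ratio at that dendrogram cut.
    k = 1 is skipped because its between-class scatter is zero."""
    return {
        len(p): variance_ratio(profiles, p)
        for p in dendrogram_levels(profiles)
        if len(p) > 1
    }
```

Because the cuts of one dendrogram are nested, the ratio can only fall or stay level as the number of clusters grows, so any selection rule has to look at how quickly it falls rather than at its minimum.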
Moreover, gene expression profile data are often contaminated with various types of noise. To reduce this noise, we also applied the discrete wavelet transform, a technique used in image enhancement. We tested a number of mother wavelet functions for smoothing the noise in the gene expression data set and obtained some improvements in the quality of the results.
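To make the smoothing step concrete, here is a minimal sketch of wavelet denoising for a single expression profile. It assumes a one-level Haar transform with soft thresholding of the detail coefficients. The abstract does not say which mother wavelets, decomposition depth, or threshold were used, so the function names and the `threshold` parameter are illustrative only.

```python
import math

SCALE = 1 / math.sqrt(2)  # Haar normalisation factor


def haar_dwt(signal):
    """One level of the Haar DWT; the signal length must be even."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) * SCALE for a, b in pairs]  # local averages
    detail = [(a - b) * SCALE for a, b in pairs]  # local differences
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuild the signal from both coefficient sets."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * SCALE, (a - d) * SCALE])
    return out


def soft_threshold(coeffs, threshold):
    """Shrink each coefficient toward zero by `threshold`."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]


def denoise_profile(profile, threshold):
    """Smooth a profile by shrinking its Haar detail coefficients.
    `threshold` is an assumed, user-chosen parameter."""
    approx, detail = haar_dwt(profile)
    return haar_idwt(approx, soft_threshold(detail, threshold))
```

With a threshold of zero the profile is returned unchanged up to rounding, while a large threshold replaces each adjacent pair of time points by its mean.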
Japanese Society for Bioinformatics