Akihiro Nakaya (firstname.lastname@example.org)
Susumu Goto (email@example.com)
Minoru Kanehisa (firstname.lastname@example.org)
Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
This paper presents a new method to extract a set of correlated genes with respect to multiple biological features. Relationships among genes with respect to a specific feature are encoded as a graph structure whose nodes correspond to genes. For example, the genome is a graph representing positional correlations of genes on the chromosome, the pathway is a graph representing functional correlations of gene products, and the expression profile is a graph representing gene expression similarities. When a set of genes is localized in a single graph, such as a gene cluster on the chromosome, an enzyme cluster in the metabolic pathway, or a set of coexpressed genes in the microarray gene expression profile, this may suggest a functional link among those genes. The evidence for such a link is stronger when the clusters are correlated, namely, when a set of corresponding genes forms clusters in multiple graphs. The newly introduced heuristic algorithm extracts such correlated gene clusters as isomorphic subgraphs in multiple graphs by using inter-graph links that are defined based on biological relevance. Using the method, we found E. coli correlated gene clusters in which genes are related with respect to their positions in the genome and the metabolic pathway, as well as their 3D structural similarity. We also analyzed protein-protein interaction data from two-hybrid experiments and gene coexpression data from microarrays in S. cerevisiae, and assessed the feasibility of using our method to screen datasets that are likely to contain many false-positive relations.
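The idea of a correlated cluster can be illustrated with a minimal sketch. It is not the heuristic algorithm of this paper: it is a deliberately simplified stand-in that pairs nodes of two graphs through inter-graph links, treats two pairs as neighbors only when their members are adjacent in both graphs, and reports the connected groups of pairs. The function names (`undirected`, `correlated_clusters`), the `min_size` parameter, and the toy gene and enzyme labels are all assumptions made for illustration.

```python
from collections import defaultdict


def undirected(edges):
    """Build a symmetric adjacency map from a list of (node, node) edges."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def correlated_clusters(g1, g2, links, min_size=2):
    """Group inter-graph links (u, v) whose members are adjacent in BOTH graphs.

    g1, g2 : adjacency maps (node -> set of neighbors)
    links  : iterable of (node in g1, node in g2) correspondences
    Simplified illustration only; no gaps or scoring are considered.
    """
    pairs = sorted(set(links))
    pair_adj = defaultdict(set)
    for i, (u1, v1) in enumerate(pairs):
        for u2, v2 in pairs[i + 1:]:
            # Two correspondences are neighbors if both graphs agree.
            if u2 in g1.get(u1, ()) and v2 in g2.get(v1, ()):
                pair_adj[(u1, v1)].add((u2, v2))
                pair_adj[(u2, v2)].add((u1, v1))

    seen, clusters = set(), []
    for start in pairs:
        if start in seen:
            continue
        # Depth-first search over the graph of correspondences.
        seen.add(start)
        stack, component = [start], []
        while stack:
            p = stack.pop()
            component.append(p)
            for q in pair_adj.get(p, ()):
                if q not in seen:
                    seen.add(q)
                    stack.append(q)
        if len(component) >= min_size:
            clusters.append(sorted(component))
    return clusters


# Toy data (hypothetical): five genes in chromosomal order, two pathway segments.
genome = undirected([("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")])
pathway = undirected([("E1", "E2"), ("E2", "E3"), ("E4", "E5")])
gene_to_enzyme = [("a", "E1"), ("b", "E2"), ("c", "E3"), ("e", "E5")]

clusters = correlated_clusters(genome, pathway, gene_to_enzyme)
print(clusters)  # [[('a', 'E1'), ('b', 'E2'), ('c', 'E3')]]
```

In this toy case genes a, b, and c are contiguous on the chromosome and their products E1, E2, and E3 are consecutive in the pathway, so they form one correlated cluster, while gene e is dropped because its neighbor d has no corresponding enzyme. Requiring strict adjacency in both graphs is the simplest possible choice; a practical method would also have to tolerate gaps and noisy links.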