Peter J. Waddell (firstname.lastname@example.org)
Hirohisa Kishino (email@example.com)
Chugai Research Institute for Molecular Medicine,
153-2 Nagai Niihari Ibaraki 300-4101, Japan
Graduate School of Agriculture and Life Sciences, University of Tokyo,
1-1-1 Yayoi Bunkyo-ku, Tokyo 113-8657, Japan
At present, a sound methodology for inferring causal gene expression relationships on a genome-wide basis is lacking. We address this first by examining the behaviour of some of the latest and fastest algorithms for tree and cluster analysis, particularly hierarchical methods popular in phylogenetics. We combine these with two novel distances based on partial, rather than full, correlations. Theoretically, partial correlations should provide better evidence for regulatory genetic links than standard correlations. To compare the clusters obtained by many alternative methods, we use tree consensus methods. To compare the methods of analysis themselves, we use tree partition metrics followed by another level of clustering. These, and a tree fit metric, all suggest that the new distances give quite different trees from those usually obtained. In the second part we consider graphical modeling of the interactions of important genes of the cell cycle. Although the models sometimes seem to fit well, and the experimental error structure seems close to multivariate normal, there are considerable problems to overcome. Latent variables, in this case important genes missing from the analysis, are inferred to have a strong effect on the partial correlations. The data also show clear evidence of sampling distributions that are conditional on the status of important cancer-related genes, including TP53. Without full information on which genes are wild type, the appropriate models cannot be fitted. These findings point to the need to include and distinguish not only all relevant genes but also all splice variants in the design phase of a microarray analysis. Failure to do so will induce problems similar to those caused by both latent variables and conditional distributions.
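The contrast between full and partial correlations can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define the two novel distances, so the `1 - |r|` distance, the function names, and the simulated three-gene chain below are all assumptions made for illustration only. The partial correlations are obtained from the inverse of the sample covariance matrix, which conditions each pair of variables on all the others.

```python
import numpy as np


def partial_correlations(data):
    """Partial correlation of every pair of columns given all other columns.

    data: array of shape (n_samples, n_genes).
    """
    cov = np.cov(data, rowvar=False)
    # Pseudo-inverse is used as a precaution; with more genes than samples
    # the covariance matrix is singular and some regularisation is needed.
    precision = np.linalg.pinv(cov)
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def correlation_distance(corr):
    """Hypothetical distance (an assumption, not from the paper): 1 - |r|."""
    dist = 1.0 - np.abs(corr)
    np.fill_diagonal(dist, 0.0)
    return dist


# Simulated regulatory chain a -> b -> c (illustrative data, not real expression).
rng = np.random.default_rng(0)
n_samples = 2000
a = rng.normal(size=n_samples)
b = a + 0.5 * rng.normal(size=n_samples)
c = b + 0.5 * rng.normal(size=n_samples)
data = np.column_stack([a, b, c])

full = np.corrcoef(data, rowvar=False)
pcor = partial_correlations(data)

full_dist = correlation_distance(full)
partial_dist = correlation_distance(pcor)

print("full correlation a-c:   ", round(float(full[0, 2]), 3))
print("partial correlation a-c:", round(float(pcor[0, 2]), 3))
```

In this simulated chain, genes a and c are strongly correlated only because both depend on b. The full correlation between them is high, whereas their partial correlation given b is close to zero, so a distance built on partial correlations places them far apart. If b were left out of the data, the a-c partial correlation would no longer vanish, which is the latent variable problem the abstract describes.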