Peter J. Waddell (email@example.com)
Graduate School of Agriculture and Life Sciences, University of Tokyo,
1-1-1 Yayoi Bunkyo-ku, Tokyo 113-8657, Japan
Chugai Research Institute for Molecular Medicine, INC., 153-2 Nagai Niihari Ibaraki 300-4101, Japan
In this paper, we propose and use two novel procedures for the analysis of microarray gene expression data. The first is correspondence analysis, which visualizes the relationship between genes and tissues as two two-dimensional graphs. These are oriented so that distances between genes are preserved, distances between tissues are preserved, and genes that primarily distinguish certain types of tissue lie spatially close to those tissues. The second addresses the inference of genetic links, for which partial correlations, rather than correlations, are the key quantity. The partial correlation between genes i and j is the relationship that remains between i and j after the effect of the surrounding genes has been subtracted out of their pairwise correlation. This leads to the area of graphical modeling. A limitation of the graphical modeling approach is that the correlation matrix of expression profiles between genes is degenerate whenever the number of genes to be analyzed exceeds the number of distinct expression measurements. This causes considerable problems, because the calculation of partial correlations typically uses the inverse of the correlation matrix. To avoid this limitation, we propose two practical multiple regression procedures with variable selection to measure the net, screened relationship between pairs of genes. Possible biases arising from the analysis of a subset of genes from the genome are examined in the worked examples. Both approaches appear to be more natural ways of analyzing gene expression data than the currently popular approach of two-way clustering.
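As a small illustration of why partial correlations rather than correlations matter for inferring links, the sketch below uses the standard first-order formula for the partial correlation of i and j given a single third variable k. This is only the simplest special case, not the regression procedures proposed in this paper. The function name partial_corr and all correlation values are hypothetical and chosen for illustration.

```python
import math


def partial_corr(r_ij: float, r_ik: float, r_jk: float) -> float:
    """First-order partial correlation of i and j given one third variable k.

    Standard textbook formula; the general case with many surrounding genes
    is usually obtained from the inverse of the correlation matrix instead.
    """
    denom = math.sqrt((1.0 - r_ik ** 2) * (1.0 - r_jk ** 2))
    if denom == 0.0:
        # k is perfectly correlated with i or j, so nothing is left to compare.
        raise ValueError("partial correlation is undefined")
    return (r_ij - r_ik * r_jk) / denom


# Hypothetical case 1: genes i and j are correlated (0.64) only because both
# follow gene k (0.8 each). Removing the effect of k leaves essentially nothing.
shared = partial_corr(0.64, 0.8, 0.8)   # close to 0, up to rounding

# Hypothetical case 2: i and j are strongly correlated (0.80) but only
# moderately tied to k (0.5 each). Most of the relationship survives.
direct = partial_corr(0.80, 0.5, 0.5)   # about 0.73

print(f"shared driver: {shared:.3f}")
print(f"direct link:   {direct:.3f}")
```

In the first case the raw correlation would suggest a link between i and j that disappears once k is accounted for, which is the screening effect described above. With many genes the same idea requires inverting the full correlation matrix, and that inverse does not exist when genes outnumber expression measurements.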