Active Pathway Identification and Classification with Probabilistic Ensembles
Timothy Hancock (email@example.com)
Hiroshi Mamitsuka (firstname.lastname@example.org)
Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
A popular means of modeling metabolic networks is to identify frequently observed pathways. However, what constitutes an observation of a pathway, and how the importance of identified pathways should be evaluated, remain unclear. In this paper we investigate different methods for defining an observed pathway and evaluate their performance with pathway classification models. We use three definitions of an observed pathway: a path in gene over-expression, a path in probable gene over-expression, and a path of most accurate classification. The performance of each definition is evaluated with three classification models: a probabilistic pathway classifier (HME3M), logistic regression, and SVM. The results show that defining pathways using the probability of gene over-expression creates stable and accurate classifiers. Conversely, we also show that defining pathways by most accurate classification finds severely biased pathways that are unrepresentative of the underlying microarray data structure.
Japanese Society for Bioinformatics