Large Scale Statistical Prediction of Protein-Protein Interaction by Potentially Interacting Domain (PID) Pair

Wan Kyu Kim[1] (dimlight@oitek.com)
Jong Park[2] (jong@bio.cc)
Jung Keun Suh[3] (suhjung@lgls.co.kr)

[1]Object Interaction Technologies Inc., Room 201 Jueun Building, 29-4 Jamwon-dong, Seocho-gu, Seoul 137-904, Korea
[2]MRC-Dunn Human Nutrition Unit, Hills Road, Cambridge CB2 2XY, United Kingdom
[3]LG Life Science, Ltd., R&D Park, Biotech Research Institute, 104-1 Yusung-gu, Daejeon 305-503, Korea


Abstract

Protein-protein interactions play a critical role in biological processes. Identifying interacting proteins by computational methods can provide new leads for functional studies of uncharacterized proteins without extensive experiments. We developed a database of potentially interacting domain (PID) pairs extracted from a dataset of experimentally identified interacting protein pairs (DIP: Database of Interacting Proteins) using InterPro, an integrated database of protein families, domains and functional sites. In developing protein interaction databases and predictive methods, a sensitive statistical scoring system is critical to provide a reliability index for accurate functional analysis of interaction networks. We present a statistical scoring system, named the "PID matrix score", as a measure of the interaction probability (interactability) between domains. This system provides a valuable tool for functional prediction of unknown proteins. To evaluate the PID matrix, cross-validation was performed with subsets of the DIP data. The prediction system gives about 50% sensitivity and more than 98% specificity, which implies that information on interacting protein pairs can be enriched about 30-fold with the PID matrix. We demonstrate that a genome-wide interaction network can be mapped using the PID matrix.
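
The abstract does not state how the 30-fold figure follows from the sensitivity and specificity. One plausible reading, offered here as an assumption and not as the authors' stated calculation, is that enrichment is the ratio of the true positive rate (sensitivity) to the false positive rate (1 - specificity). Under that reading, exactly 98% specificity gives 25-fold, and a specificity near 98.3% gives roughly 29-fold, consistent with "more than 98%" and "about 30-fold". The function name `fold_enrichment` and the specificity values tried below are illustrative only.

```python
def fold_enrichment(sensitivity: float, specificity: float) -> float:
    """Ratio of true positive rate to false positive rate.

    Assumption: this is how "enrichment" is read here; the abstract
    does not define the calculation.
    """
    false_positive_rate = 1.0 - specificity
    if false_positive_rate <= 0.0:
        raise ValueError("specificity must be below 1 for a finite ratio")
    return sensitivity / false_positive_rate


if __name__ == "__main__":
    # 0.5 is the reported sensitivity; the specificities are hypothetical
    # values at or above the reported 98% lower bound.
    for spec in (0.980, 0.983, 0.985):
        ratio = fold_enrichment(0.5, spec)
        print(f"specificity={spec:.3f} -> {ratio:.1f}-fold")
```

Because the ratio is very sensitive to small changes in specificity at this range, the reported "about 30-fold" should be read as an order-of-magnitude statement.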



Japanese Society for Bioinformatics