Woo-Hyuk Jang (firstname.lastname@example.org)
Dong-Soo Han (email@example.com)
Hong-Soog Kim (firstname.lastname@example.org)
Sung-Doke Lee (email@example.com)
School of Engineering, Information and Communications University,
119, Munjiro, Yuseong-gu, Daejeon, 305-714, Korea
Electronics and Telecommunications Research Institute, 161, Gajeong-dong, Yuseong-gu, Daejeon, 305-350, Korea
The Domain Combination based Protein-Protein Interaction Prediction (DCPPIP) method has shown outstanding prediction accuracy for Yeast proteins. However, it is not yet clear whether the method remains valid and achieves comparable prediction accuracy for proteins of other species. In this paper, we report the results of validating the DCPPIP method on Fly and Human proteins. We also report the results of inter-species validation, in which the protein interaction and domain data of other species are used as the learning set. The validation uses 10,351 interacting protein pairs for Fly and 2,345 protein pairs for Human; 80% of the data are used as learning sets and 20% are reserved as test sets. High prediction accuracies are achieved in both cases (Fly: sensitivity≈77%, specificity≈92%; Human: sensitivity≈96%, specificity≈95%). Interactions of proteins in Human, Mouse, H. pylori, E. coli, and C. elegans are predicted and validated using the protein interaction and domain data of Yeast, Fly, and the combination of Yeast and Fly, respectively. Again, good prediction accuracy is achieved when the test protein pair shares domains with the proteins in the learning set. We introduce the notion of Domain Overlapping Rate (DOR) among species and examine the correlation between DOR and prediction accuracy. According to our test results, there is a fairly clear correlation between DOR and prediction accuracy.
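The two accuracy measures quoted above, and the idea of domain overlap between species, can be illustrated with a minimal Python sketch. Sensitivity and specificity follow their standard definitions. The abstract does not define DOR, so `domain_overlap_rate` below is only one plausible reading (the fraction of a test species' domains that also occur in the learning species) and is an assumption, not the definition used in this paper. All function names and domain labels are hypothetical.

```python
def sensitivity_specificity(actual, predicted):
    """Return (sensitivity, specificity) for paired boolean labels.

    actual[i] is True if pair i really interacts;
    predicted[i] is True if pair i is predicted to interact.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # true positives
    fn = sum(1 for a, p in pairs if a and not p)      # false negatives
    tn = sum(1 for a, p in pairs if not a and not p)  # true negatives
    fp = sum(1 for a, p in pairs if not a and p)      # false positives
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def domain_overlap_rate(test_domains, learning_domains):
    """ASSUMED definition, not taken from the paper: the share of
    distinct domains in the test species that also appear in the
    learning species."""
    test = set(test_domains)
    if not test:
        return 0.0
    return len(test & set(learning_domains)) / len(test)


# Toy labels, made up for illustration only (not data from the paper).
actual = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, False, False, False, True]
sens, spec = sensitivity_specificity(actual, predicted)

# Hypothetical domain labels D1..D5.
dor = domain_overlap_rate(["D1", "D2", "D3", "D4"], ["D1", "D2", "D5"])
```

Under this reading, a test species whose domains rarely occur in the learning species has a low DOR, and few of its protein pairs can be matched against learned domain combinations.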