Predicting Protein-Protein Relationships from Literature Using Latent Topics
Tatsuya Aso (firstname.lastname@example.org)
Koji Eguchi (email@example.com)
Department of Computer Science and Systems Engineering, Kobe University, 1-1
Rokkoudai, Nada-ku, Kobe, 657-8501, Japan
This paper investigates applying statistical topic models to extract
and predict relationships between biological entities, especially
proteins, from the literature.
A statistical topic model, Latent Dirichlet Allocation (LDA),
is promising; however, it
has not been investigated for such a task.
In this paper, we apply state-of-the-art Collapsed Variational
Bayesian inference and Gibbs sampling inference to estimate
the LDA model.
We also apply probabilistic Latent Semantic Analysis (pLSA)
as a baseline and compare the methods in terms of
log-likelihood, classification accuracy, and retrieval effectiveness.
We demonstrate through experiments that the Collapsed Variational
LDA gives better results than the others, especially in terms of
classification accuracy and retrieval effectiveness, in the task of
protein-protein relationship prediction.
Japanese Society for Bioinformatics