The Potential Use of SUISEKI as a Protein Interaction Discovery Tool
Christian Blaschke (blaschke@cnb.uam.es)
Alfonso Valencia (valencia@cnb.uam.es)
Protein Design Group, CNB/CSIC, Campus Universidad Autonoma,
28049 Madrid, Spain
Abstract
Relevant information about protein interactions is stored in textual
sources. These sources are commonly used not only as archives of what
is already known but also as a basis for generating new knowledge,
particularly for posing hypotheses about new interactions that can be
inferred from existing ones. This task is the most creative part of
scientific work in experimental systems. We present a large-scale
analysis of the prediction of new interactions based on the network
of already known interactions detected automatically in the
literature.
During the last few years it has become clear that part of the
information about protein interactions can be extracted with
automatic tools, even if these tools are still far from perfect and
key problems such as the detection of protein names are not completely
solved. We have developed an integrated automatic approach, called
SUISEKI (System for Information Extraction on Interactions), that
extracts protein interactions from collections of Medline abstracts.
Previous experiments have shown that the system extracts almost 70%
of the interactions present in relatively large text corpora, with an
accuracy of approximately 80% (for the best-defined interactions).
This performance, both in the extraction of protein names and in the
extraction of the interactions between them, makes the system usable
in real scenarios.
With an analysis of the interaction map of Saccharomyces cerevisiae,
we show that interactions published in 2000/2001 frequently involve
proteins or genes that were already very close in the interaction
network deduced from the literature published before those years, and
that they are often connected to the same proteins. That is,
discoveries are commonly made among highly connected entities. Some
biologically relevant examples illustrate how interactions described
in 2000 could have been proposed as reasonable working hypotheses
from the information previously available in the automatically
extracted network of interactions.
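The idea that newly reported interactions tend to link proteins that were already close, and that already shared partners, in the earlier network can be illustrated with a minimal sketch. This is not SUISEKI's code or the authors' procedure. It assumes the extracted interactions are available as undirected pairs of protein names, and the identifiers in the example ("A" to "E") are hypothetical placeholders, not real yeast proteins. The sketch measures network distance with breadth-first search and ranks unlinked pairs by their number of shared neighbors, which is one simple common-neighbor heuristic among many possible ones.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Build undirected adjacency sets from (protein_a, protein_b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path_length(graph, source, target):
    """Breadth-first search distance; returns None if target is unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def rank_candidates(graph):
    """Score each not-yet-linked pair by its number of shared neighbors.

    Pairs with no shared neighbor are skipped; results are sorted by
    descending score, then alphabetically for a stable order.
    """
    scores = []
    for a, b in combinations(sorted(graph), 2):
        if b in graph[a]:
            continue  # already a known interaction
        shared = len(graph[a] & graph[b])
        if shared:
            scores.append((shared, a, b))
    scores.sort(key=lambda t: (-t[0], t[1], t[2]))
    return scores


# Hypothetical "earlier literature" network with placeholder names.
old_edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
old_graph = build_graph(old_edges)

print(rank_candidates(old_graph))
print(shortest_path_length(old_graph, "A", "D"))
```

With this toy network the top-ranked candidates are A-D and B-C (two shared partners each), and A and D are two steps apart. In an analysis of the kind described above, one would build the graph only from interactions reported before a cutoff year and then check the distances and shared-neighbor scores of pairs first reported after it.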
Japanese Society for Bioinformatics