Christian Blaschke (firstname.lastname@example.org)
Alfonso Valencia (email@example.com)
Protein Design Group, CNB/CSIC, Campus Universidad Autonoma, 28049 Madrid, Spain
Relevant information about protein interactions is stored in textual
sources. These sources are commonly used not only as archives of what
is already known but also as a basis for generating new knowledge,
particularly for posing hypotheses about new interactions that can be
inferred from existing ones. This task is the most creative part of
scientific work in experimental systems. We present a large-scale
analysis of the prediction of new interactions based on the network
of interactions that are already known and that were detected
automatically in the literature.
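One simple way to turn an existing interaction network into candidate hypotheses is to rank unconnected pairs by how many interaction partners they share. The sketch below is a hypothetical illustration of that idea, assuming the network is held as a symmetric dictionary from protein name to the set of its known partners; it is not SUISEKI's prediction procedure, and the protein names are placeholders, not real data.

```python
from itertools import combinations


def rank_candidate_pairs(network):
    """Rank unconnected protein pairs by the number of shared partners.

    Assumption (illustrative only): `network` maps each protein name to
    the set of its known interaction partners and is symmetric.
    """
    scored = []
    for a, b in combinations(sorted(network), 2):
        if b in network[a]:
            continue  # already a known interaction, not a hypothesis
        shared = network[a] & network[b]
        if shared:
            scored.append((len(shared), a, b))
    # Most shared partners first; ties broken alphabetically.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return [(a, b, n) for n, a, b in scored]


# Toy network with placeholder names (not real interaction data).
toy = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B", "E"},
    "D": {"A", "E"},
    "E": {"C", "D"},
}
print(rank_candidate_pairs(toy))
# [('A', 'E', 2), ('C', 'D', 2), ('B', 'D', 1), ('B', 'E', 1)]
```

Counting shared neighbours is only one possible scoring choice; path length or weighted measures could be substituted without changing the overall structure.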
During the last few years it has become clear that part of the information about protein interactions can be extracted with automatic tools, even if these tools are still far from perfect and key problems, such as the detection of protein names, are not completely solved. We have developed an integrated automatic approach, called SUISEKI (System for Information Extraction on Interactions), that is able to extract protein interactions from collections of Medline abstracts.
Previous experiments with the system have shown that it can extract almost 70% of the interactions present in relatively large text corpora, with an accuracy of approximately 80% for the best-defined interactions, both at the level of extracting protein names and at the level of extracting the interactions between them. This performance makes the system usable in real scenarios.
By analysing the interaction map of Saccharomyces cerevisiae, we show that interactions published in 2000/2001 frequently involve proteins or genes that were already very close in the interaction network deduced from the literature published before those years, and that these proteins are often connected to the same partners. That is, discoveries are commonly made among highly connected entities. Several biologically relevant examples illustrate how interactions described in 2000 could have been proposed as reasonable working hypotheses from the information previously available in the automatically extracted interaction network.
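The notion of proteins being "very close" in an earlier network can be made concrete as the shortest-path distance between them. The following is a minimal sketch, assuming the pre-2000 network is a symmetric dictionary of partner sets and that newly reported interactions arrive as a list of name pairs; the function names and toy data are hypothetical and not taken from the original analysis.

```python
from collections import deque


def shortest_path_length(network, source, target):
    """Breadth-first search over an undirected interaction network.

    Returns the number of edges on a shortest path from source to
    target, or None if either protein is absent or unreachable.
    """
    if source not in network or target not in network:
        return None
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for partner in network[node]:
            if partner == target:
                return dist + 1
            if partner not in seen:
                seen.add(partner)
                queue.append((partner, dist + 1))
    return None


def distance_profile(earlier_network, new_pairs):
    """Distance in the earlier network for each newly reported pair."""
    return {pair: shortest_path_length(earlier_network, *pair) for pair in new_pairs}


# Toy "earlier" network and "new" pairs with placeholder names (not real data).
earlier = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B"},
    "D": set(),
}
new = [("A", "C"), ("A", "D")]
print(distance_profile(earlier, new))
# {('A', 'C'): 2, ('A', 'D'): None}
```

A distribution dominated by small distances (for example 2, meaning a shared partner) would correspond to the pattern described above, where new interactions appear among already closely connected entities.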