Automatic Ontology Construction from the Literature

Christian Blaschke
Alfonso Valencia

Protein Design Group, CNB/CSIC, Campus Universidad Autonoma, 28049 Madrid, Spain


Detailed classifications, controlled vocabularies and organised terminology are widely used in different areas of science and technology. Their relatively recent introduction in molecular biology has been crucial for progress in the analysis of genomics and massive proteomics experiments. Unfortunately, the construction of ontologies, including terminology, classification and entity relations, requires considerable effort, including the analysis of massive amounts of literature. We propose here a method that automatically generates classifications of gene-product functions using bibliographic information. The resulting classification structures mirror those constructed by human experts. The analysis of a large structure built for yeast gene products, and the detailed inspection of various examples, show encouraging properties. In particular, the comparison with the well-accepted GO ontology points to different situations in which the automatically derived classification can be useful for assisting human experts in the annotation of ontologies.

