Jianghong An (firstname.lastname@example.org)
Maxim Totrov (email@example.com)
Ruben Abagyan (firstname.lastname@example.org)
Department of Molecular Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
Molsoft, LLC, La Jolla, CA 92037, USA
We have developed a new computational algorithm for de novo identification of protein-ligand binding pockets and performed a large-scale validation of the algorithm on two datasets systematically collected from all crystallographic structures in the Protein Data Bank (PDB). The algorithm, called DrugSite, takes a three-dimensional protein structure as input and returns the location, volume and shape of putative small-molecule binding sites, using a physical potential and no knowledge of any potential ligand molecule. We validated the method on 17,126 binding sites from complexes and apo-structures in the PDB. Of 5,616 binding sites from protein-ligand complexes, 98.8% were identified by predicted pockets; in only 1.2% of the cases was no “pocket density” found at the ligand location. In proteins with known binding sites, 80.9% of the sites were predicted by the largest predicted pocket and 92.7% by the two largest. The average ratio of predicted contact area to the total surface area of the protein was 4.7% for the predicted pockets. Further, 98.6% of the 11,510 binding sites collected from apo-structures were predicted. The algorithm is accurate and fast enough to predict protein-ligand binding sites in uncharacterized protein structures, suggest new allosteric druggable pockets, evaluate the druggability of protein-protein interfaces and prioritize molecular targets by druggability. Furthermore, the known and predicted binding pockets for the proteome of a particular organism can be clustered into a “pocketome” that can be used for rapid evaluation of the possible binding partners of a given chemical compound.
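The ranked success rates quoted above (sites covered by the largest pocket, by the two largest, or by any pocket) are top-k coverage figures. The following is a minimal sketch of how such figures could be tabulated, not the authors' evaluation code. It assumes that each known binding site has already been matched to the size rank of the predicted pocket that covers it; the function name `top_k_coverage`, the rank encoding and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Optional, Sequence


def top_k_coverage(ranks: Sequence[Optional[int]], k: int) -> float:
    """Return the fraction of known sites covered by one of the k largest pockets.

    ranks[i] is the 1-based size rank (1 = largest) of the predicted pocket
    that covers known site i, or None if no predicted pocket covers it.
    This encoding is an assumption made for illustration.
    """
    if not ranks:
        raise ValueError("ranks must be non-empty")
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


def any_pocket_coverage(ranks: Sequence[Optional[int]]) -> float:
    """Return the fraction of known sites covered by any predicted pocket."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1 for r in ranks if r is not None) / len(ranks)


# Toy data for ten hypothetical binding sites (not results from the paper).
toy_ranks = [1, 1, 2, 1, 3, None, 1, 2, 1, 1]

print(f"largest pocket:     {top_k_coverage(toy_ranks, 1):.1%}")
print(f"two largest:        {top_k_coverage(toy_ranks, 2):.1%}")
print(f"any pocket:         {any_pocket_coverage(toy_ranks):.1%}")
```

Under this encoding, the fraction of sites with no covering pocket is simply one minus the any-pocket coverage, which is how a figure such as the 1.2% of missed sites relates to the 98.8% of identified sites.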