Genome-wide Mapping of Protein-Ligand

Determining the function of proteins from their sequences is a challenge of longstanding interest in biology. The function of a protein is determined by its interactions with other molecules in its environment, which in turn depend on the three-dimensional structure and dynamics of these molecules. For much of the past century, biology has focused on the identities and roles of individual protein molecules and on their regulation. The advent of genome projects has given us most of these protein identities. What these projects have not given us are the partners with which each protein interacts, and hence the proteins' functions. This need has prompted the development of innovative experimental technologies to determine the functions of the proteins encoded by the various genomes (functional proteomics). These technologies include mass spectrometric investigation of protein expression patterns [1], high-throughput yeast two-hybrid approaches to identifying protein association partners [2], large-scale affinity tag purification efforts, and protein microarray techniques to study protein-protein, protein-nucleic acid, protein-lipid, enzyme-substrate, and protein-drug interactions [3], among others.
As innovative as these experimental technologies are, the number of possible interacting partners is staggering.  Even identifying all the one-on-one partners within the proteome is beyond a purely experimental program; if one adds the possible small molecule ligands, including drugs and reagents, the problem becomes even more difficult.  The only option for making large-scale progress is to leverage available experimental information with computation.
Our goal is an integrated software system that will allow for a genome-wide mapping of the interactions of protein receptors with drug-like and macromolecular ligands.  The fundamental input to this software system will be the sequence of protein targets.  The fundamental output will be a list of ligands, from among a large list of possibilities, predicted to bind to the structure of these targets.  This software system will take a structure-based approach and its output will be describable at atomic resolution.  This goal will require:
1.    Creation of protein structure models, typically using comparative modeling.
2.    Refinement of these models.
3.    Prediction of binding sites on proteins.
4.    Docking ligands against these sites.
5.    Analyzing the predicted ligands and complexes for functional inference (e.g., the identity of the substrate, the pathway to which it belongs) and modulation (e.g., leads for drug discovery).
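The five steps above can be pictured as a chain of stages that each take the output of the previous one. The following is only a minimal sketch of that chaining, not the project's software: every name in it (PipelineState, make_stage, run_pipeline, and the stage labels) is hypothetical, and each stage is a placeholder that merely records that it ran.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PipelineState:
    """Hypothetical container passed from stage to stage."""
    sequence: str                                   # input: target protein sequence
    log: List[str] = field(default_factory=list)    # names of the stages run so far


Stage = Callable[[PipelineState], PipelineState]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage; a real stage would call modeling or docking code."""
    def stage(state: PipelineState) -> PipelineState:
        state.log.append(name)
        return state
    return stage


# Stage labels mirror the five numbered steps in the text (labels are assumptions).
STAGES: List[Stage] = [
    make_stage("build_model"),
    make_stage("refine_model"),
    make_stage("predict_binding_sites"),
    make_stage("dock_ligands"),
    make_stage("analyze_complexes"),
]


def run_pipeline(sequence: str, stages: List[Stage] = STAGES) -> PipelineState:
    """Run every stage in order, starting from a bare sequence."""
    state = PipelineState(sequence=sequence)
    for stage in stages:
        state = stage(state)
    return state


result = run_pipeline("MKTAYIAKQR")  # arbitrary example sequence
print(result.log)
```

Keeping each stage behind the same signature is one way an automated pipeline could let individual components (for example, a different docking program) be swapped without changing the rest.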
 
A key aim is to make this software system accessible to the general biological community. To achieve this, the software must operate as an integrated and largely automated pipeline. Admittedly, this is an ambitious goal. Structure prediction, energy-based refinement of the models, and structure-based screens (docking) have remained the purview of experts, and even in their hands have been prone to error. We nevertheless believe that the goal is achievable. Advances in the underlying technologies have overcome important barriers in the last five years. Comparative modeling has successfully predicted the structures of proteins on a genome-wide scale, and the results have been made available to the community [4]. Correspondingly, docking screens for ligands and inhibitors, though they retain algorithmic liabilities, have had important practical successes recently; docking software can reliably predict sensible ligands, a certain percentage of which can be expected to bind. Finally, much of the underlying technology necessary for this pipeline already exists in our laboratories; a major goal of this project will be to link this existing software. Notwithstanding its ambition, this project is thus feasible.
Whereas many investigators have contributed to each component area of this pipeline, and each area remains actively researched, this modeling and docking system will be unique because it can be applied on a genomic scale, enabling a host of new applications. In addition to making the pipeline available to investigators in the community, we will ourselves apply it to several important problems in biology and medicine that have not been accessible on any scale previously.  
For instance, we will dock large libraries of functionally annotated ligands (such as metabolites and drug analogs) against multiple proteins within a family of proteins. Based on the patterns of ligands predicted to bind, it should be possible to infer functional relationships among the proteins that would not be available from docking calculations, or even binding experiments, against any single protein in isolation (Section D.18).
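One simple way to compare such binding patterns, offered here purely as an illustration and not as the method the project will use, is to treat each protein's predicted ligands as a set and score the overlap between sets with the Jaccard index. The protein and ligand names below are made up, and the functions jaccard and pairwise_similarity are hypothetical helpers.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_similarity(hits: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Score every unordered pair of proteins by the overlap of their predicted ligands."""
    return {
        (p, q): jaccard(hits[p], hits[q])
        for p, q in combinations(sorted(hits), 2)
    }


# Made-up docking hits for three proteins in one family (illustrative data only).
predicted_hits = {
    "protein_A": {"lig1", "lig2", "lig3"},
    "protein_B": {"lig2", "lig3", "lig4"},
    "protein_C": {"lig9"},
}

similarity = pairwise_similarity(predicted_hits)
for pair, score in similarity.items():
    print(pair, score)
```

In this toy data, protein_A and protein_B share two of four distinct ligands, so they score 0.5, while protein_C shares none with either. A pattern like this is visible only when several proteins are docked against the same library.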
Similarly, we will use the integrated technologies to target structures determined from structural genomics, as well as those of related homologs, in an effort to predict the functions of those proteins for which no function is known (Section D.19).
Finally, we will use the pipeline to target entire classes of drug targets among pathogenic organisms, including all annotated cysteine proteases from several pathogenic parasites, and all proteins for which a sequence is available in the malarial genome (Core 3, Driving Biological Project 3).
We also anticipate that many new applications of this pipeline will come from biologists who, because of its simplification and automation, will be able to apply these modeling and docking technologies effectively to their own problems for the first time.
