Genome-wide Mapping of Protein-Ligand Interactions
Determining the function of proteins based on their
sequences is a challenge of longstanding interest in biology. The function of a
protein is determined by its interactions with other molecules in its
environment, which in turn depend on the three-dimensional structure and
dynamics of these molecules. For much of the past century, biology has focused
on the identities and roles of individual protein molecules, and their
regulation. The advent of genome projects has given us most of these protein
identities. What these projects have not
given us are the partners with which each individual protein interacts, and
hence the function of each protein. This need has occasioned the development of innovative
experimental technologies to determine the functions of the proteins encoded by
the various genomes (functional proteomics).
These technologies include mass spectrometric investigation of protein
expression patterns [1], high-throughput yeast two-hybrid
approaches to identifying protein association partners [2], large-scale affinity tag
purification efforts, and protein microarray techniques to study
protein-protein, protein-nucleic acid, protein-lipid, enzyme-substrate, and
protein-drug interactions [3], among others.
As innovative as these experimental technologies are, the
number of possible interacting partners is staggering. Even identifying all the one-on-one partners within
the proteome is beyond a purely experimental program; if one adds the
possible small molecule ligands, including drugs and reagents, the problem
becomes even more difficult. The only option
for making large-scale progress is to leverage available experimental
information with computation.
Our goal is an
integrated software system that will allow for a genome-wide mapping of the
interactions of protein receptors with drug-like and macromolecular
ligands. The fundamental input to this software
system will be the sequence of protein targets.
The fundamental output will be a list of ligands, from among a large
list of possibilities, predicted to bind to the structure of these targets. This software system will take a structure-based
approach and its output will be describable at atomic resolution. This goal will require:
1. Creation of protein structure models, typically using comparative modeling.
2. Refinement of these models.
3. Prediction of binding sites on proteins.
4. Docking ligands against these sites.
5. Analyzing the predicted ligands and complexes for functional inference (e.g., the
identity of the substrate, the pathway to which it belongs) and modulation (e.g.,
leads for drug discovery).
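As a rough illustration of how the five steps above depend on one another, the following Python sketch chains them in order. It is not the project's software: every name here (`TargetRecord`, `run_pipeline`, the stage functions) is hypothetical, and each stage body is a placeholder that performs no real modeling, refinement, site prediction, or docking. The prerequisite checks are likewise an assumption read off the order of the list.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class TargetRecord:
    """Hypothetical container for one protein target as it moves through the stages."""
    sequence: str                      # fundamental input: the protein sequence
    ligand_library: List[str]          # candidate ligands to consider
    model: Optional[str] = None
    refined: bool = False
    binding_sites: List[str] = field(default_factory=list)
    docked_ligands: List[str] = field(default_factory=list)
    annotations: List[str] = field(default_factory=list)
    completed: List[str] = field(default_factory=list)


def build_model(rec: TargetRecord) -> TargetRecord:
    # Step 1 placeholder: stands in for comparative modeling; no structure is built.
    rec.model = f"model({len(rec.sequence)} residues)"
    return rec


def refine_model(rec: TargetRecord) -> TargetRecord:
    # Step 2 placeholder: only records that refinement was requested.
    if rec.model is None:
        raise ValueError("refinement needs a structure model")
    rec.refined = True
    return rec


def predict_sites(rec: TargetRecord) -> TargetRecord:
    # Step 3 placeholder: the site label is a dummy value, not a prediction.
    if not rec.refined:
        raise ValueError("site prediction expects a refined model")
    rec.binding_sites = ["site_1"]
    return rec


def dock_ligands(rec: TargetRecord) -> TargetRecord:
    # Step 4 placeholder: a real stage would score and rank ligands;
    # here the library is passed through unchanged.
    if not rec.binding_sites:
        raise ValueError("docking needs at least one binding site")
    rec.docked_ligands = list(rec.ligand_library)
    return rec


def analyze(rec: TargetRecord) -> TargetRecord:
    # Step 5 placeholder: records how many candidates await functional analysis.
    rec.annotations = [f"{len(rec.docked_ligands)} candidate ligands to review"]
    return rec


Stage = Tuple[str, Callable[[TargetRecord], TargetRecord]]

STAGES: List[Stage] = [
    ("model", build_model),
    ("refine", refine_model),
    ("sites", predict_sites),
    ("dock", dock_ligands),
    ("analyze", analyze),
]


def run_pipeline(sequence: str, ligand_library: Sequence[str],
                 stages: Sequence[Stage] = STAGES) -> TargetRecord:
    """Run the stages in order, recording the name of each one that finishes."""
    rec = TargetRecord(sequence=sequence, ligand_library=list(ligand_library))
    for name, stage in stages:
        rec = stage(rec)
        rec.completed.append(name)
    return rec
```

Calling `run_pipeline("MKTAYIAKQR", ["ligand_a", "ligand_b"])` (a made-up sequence and made-up ligand names) runs all five placeholder stages; running a stage without its prerequisite, for example docking before any site has been predicted, raises a `ValueError`.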
A key
aim is to make this software system accessible to the general biological
community. To achieve this goal, the
software must operate as an integrated and largely automated pipeline. Admittedly, this is an ambitious goal. Structure prediction, energy-based refinement
of the models, and structure-based screens (docking) have remained the purview
of experts, and even in their hands have been prone to error. We nevertheless believe that this goal is possible. Advances in the underlying technologies have
overcome important barriers in the last five years. Thus, comparative modeling has successfully
predicted the structures of proteins on a genome-wide scale, and the results
have been made available to the community [4]. Correspondingly, docking screens for ligands
and inhibitors, though they retain algorithmic liabilities, have had important,
practical successes recently; docking software can reliably predict sensible
ligands, a certain percentage of which can be expected to bind. Finally, much of the underlying technology
necessary for this pipeline already exists in our laboratories; a major goal of
this project will be to link already existing software. Notwithstanding its
ambition, this project is thus feasible.
Whereas
many investigators have contributed to each component area of this pipeline,
and each area remains actively researched, this modeling and docking system
will be unique because it can be applied on a genomic scale, enabling a host of
new applications. In addition to making the pipeline available to investigators
in the community, we will ourselves apply it to several important
problems in biology and medicine that
have not been accessible on any scale previously.
For
instance, we will dock large libraries of functionally annotated ligands (such
as metabolites and drug analogs) against multiple proteins within a family of
proteins. Based on the patterns of
ligands predicted to bind, it should be possible to infer functional relationships
among the proteins that would not be available from docking calculations, or
even binding experiments, against any single protein in isolation (Section D.18).
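One simple way to compare the patterns of predicted ligands across a protein family is to measure the overlap between the ligand sets predicted for each pair of proteins. The sketch below uses Jaccard similarity for this purpose. The text does not say how the patterns will actually be compared, so the choice of measure is an assumption, and the function names, protein names, and ligand names are all invented for illustration.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Fraction of ligands shared by two sets: |A & B| / |A | B| (0.0 if both are empty)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def pairwise_similarity(predicted: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Jaccard similarity of predicted-ligand sets for every pair of proteins."""
    return {
        (p, q): jaccard(predicted[p], predicted[q])
        for p, q in combinations(sorted(predicted), 2)
    }


# Toy input: made-up docking hits for three hypothetical family members.
toy_hits = {
    "protein_A": {"lig1", "lig2", "lig3"},
    "protein_B": {"lig2", "lig3", "lig4"},
    "protein_C": {"lig9"},
}
similarity = pairwise_similarity(toy_hits)
```

In this toy input, `protein_A` and `protein_B` share two of four distinct ligands (similarity 0.5), while `protein_C` shares none with either. A pattern like this, if it appeared in real docking results, would only become visible when the family is screened together rather than one protein at a time.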
Similarly,
we will use the integrated technologies to target structures determined from
structural genomics, as well as those of related homologs, in an effort to predict
the functions of those proteins for which
no function is known (Section D.19).
Finally,
we will use the pipeline to target entire classes of drug targets among
pathogenic organisms, including all annotated cysteine proteases from several
pathogenic parasites, and all proteins for which a sequence is available in the
malarial genome (Core 3, Driving Biological Project 3).
We also anticipate that many of the new applications of this
pipeline will come from biologists who,
because of its simplification and automation, will be able to use these
modeling and docking technologies effectively against their own problems for
the first time.