A structural perspective on protein-protein interactions and complexes
Genome
sequencing has provided nearly complete lists of macromolecules present in an
organism [1,2]. However, the
component lists alone reveal comparatively little about the function of the
biological systems because the functional units in cells often correspond to
macromolecular complexes [3]. These
complexes vary widely in their activity and sizes [3-7]. They play crucial
roles in most cellular processes, and are often depicted as molecular machines [3]. This metaphor
accurately captures many of their characteristic features, such as modularity,
complexity, cyclic functions, and energy consumption [8]. For instance, the
nuclear pore complex, a 50-100 MDa protein assembly, regulates and controls the
traffic of macromolecules through the nuclear envelope [9]; the ribosome is
responsible for protein biosynthesis; the RNA polymerase catalyzes the
formation of RNA [10]; and the ATP synthase
catalyzes the formation of ATP [7].
Macromolecular assemblies are also involved in transcription control (eg, IFNβ
enhanceosome) [6,11] and regulation of
cellular transport (eg,
microtubules in complex with molecular motors myosin or kinesin) [12-14], and are crucial
components in neuronal signaling (eg,
the postsynaptic density complexes) [15]. A structural description
of the protein interactions is an important step toward a mechanistic
understanding of biochemical, cellular, and higher order biological processes [16-19].
A
comprehensive collection of the known structures of protein complexes is
provided by the Protein Quaternary Structure (PQS) database, which currently contains
~12,000 assemblies of presumed biological significance that are derived from a
variety of organisms (http://pqs.ebi.ac.uk/pqs-doc.shtml)
(April 2004) [20].
The PQS database attempts to provide
the best possible biological unit for all proteins, a complex task hampered by
crystal packing and other problems. Each assembly consists
of at least two protein chains. These assemblies can be organized into ~3,500
groups that contain chains with more than 30% sequence identity to at least one
other member of the group [19].
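The grouping criterion described above (membership requires more than 30% sequence identity to at least one other member) reads like single-linkage clustering. The following is a minimal sketch under that assumption; it is not the procedure actually used in [19], and the chain labels and identity values are hypothetical.

```python
def group_by_identity(chains, identity, threshold=30.0):
    """Single-linkage grouping: two chains end up in the same group if they are
    connected by a path of pairs sharing more than `threshold` percent identity."""
    parent = {c: c for c in chains}

    def find(c):
        # Follow parent links to the group representative, shortening the path.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for (a, b), pct in identity.items():
        if pct > threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for c in chains:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical chains and pairwise percent identities (illustrative values only).
chains = ["A", "B", "C", "D"]
identity = {("A", "B"): 45.0, ("B", "C"): 32.0, ("A", "C"): 18.0, ("C", "D"): 12.0}
groups = group_by_identity(chains, identity)
print(groups)
```

Note that chains A and C share only 18% identity in this toy input but still fall into one group, because each is linked to B above the cutoff.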
The
estimation of the total number of macromolecular complexes in a proteome is a
non-trivial task. This difficulty can be partly ascribed to the multitude of component
types (eg, proteins, nucleic acids,
nucleotides, metal ions), and the varying lifespan of the complexes (eg, transient complexes such as those
involved in signaling, and stable complexes such as the ribosome). The most
comprehensive information about protein-protein interactions is available for
the S. cerevisiae proteome,
consisting of ~6,200 proteins. This information has been provided by methods
such as the yeast
two-hybrid system and affinity purifications followed by mass spectrometry [21-29]. The lower bound on binary
protein interactions and functional links in yeast has been estimated to be in
the range of ~30,000 [30,31]; this number
corresponds to ~9 protein partners per protein, though not necessarily all at
the same time. The human proteome may have an order of magnitude more complexes
than the yeast cell; and the number of different complexes across all relevant
genomes may be several times larger still. Therefore, there may be thousands of
biologically relevant macromolecular complexes whose structures are yet to be
characterized [32].
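The step from ~30,000 binary interactions to roughly 9 partners per protein can be checked with a short calculation. The sketch below assumes that each interaction is counted once and links two different proteins, so it contributes one partner to each of them; the function name is hypothetical.

```python
def mean_partners(n_interactions, n_proteins):
    # Each binary interaction adds one partner to each of its two proteins.
    return 2 * n_interactions / n_proteins


avg = mean_partners(30_000, 6_200)
print(f"{avg:.1f} partners per protein on average")
```

With the yeast figures quoted above, the average falls between 9 and 10, in line with the estimate in the text.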
We
review here recent developments in the experimental and computational
techniques that have allowed structural biology to shift its focus from the
structures of individual proteins to the structures of large assemblies [19,33,34]. We also illustrate
these developments by listing their applications to structure determination of
specific assemblies of biological importance. In contrast to structure determination
of the individual proteins, structural characterization of macromolecular
assemblies usually poses a more difficult challenge. We stress that a
comprehensive structural description of large complexes generally requires the
use of several experimental methods, underpinned by a variety of theoretical approaches
to maximize efficiency, completeness, accuracy, and resolution [19,35].
X-ray crystallography and NMR spectroscopy
X-ray
crystallography has been the most prolific technique for the structural
analysis of proteins and protein complexes, and is still the ‘gold standard’ in
terms of accuracy and resolution (Figure 1a). Structures of several macromolecular
assemblies have recently been solved by X-ray crystallography: the RNA
polymerase [36], the ribosomal
subunits [37-41], the complete ribosome
and its functional complexes [42], the proteasome [43], the GroEL chaperonin [44], various complexes
involved in the cellular transport machinery [12,13], the
Arp2/3 complex [45], photosystem I
and the light-harvesting complex of photosystem II [46,47],
the SRP complex involved in nascent protein targeting [48], and
various viral capsid and virion structures [49-51]. However, the number
of structures of macromolecular assemblies solved by X-ray crystallography is
still quite small compared to that of the individual proteins, and it will likely be many years before we have a complete
repertoire of high-resolution structures for the hundreds of complexes in a
typical cell. This discrepancy is due mainly to the
difficulty of producing sufficient quantities of the sample and of crystallizing it.
NMR
spectroscopy allows determination of atomic structures of ever larger subunits
and even their complexes [52-54]. Although NMR
spectroscopy is generally not applicable to protein structures with more than
300 residues, it can be applied to molecules in solution. It is increasingly
used to determine the residues involved in protein-protein interactions (Figure
1b) [55-58]. For instance,
it was recently utilized to describe structural differences among interactions
between different LIM and SH3 domains [59].
Electron microscopy and electron tomography
There
are several variants of electron microscopy, including single-particle EM
(Figure 1c) [60], electron tomography
(Figure 1d) [61] and electron
crystallography of regular two-dimensional arrays of the sample [62].
For
particles with molecular weights larger than 200 to 500 kDa, single-particle
cryo-EM can determine the electron density of an assembly at resolutions as
high as 5 Å [63-70].
The full 3D structure of the particle is reconstructed from many 2D projections
of the specimen, each showing the object from a different angle. Imaging by
cryo-EM requires neither large quantities of the sample nor the sample in a
crystalline form. Therefore, single-particle
cryo-EM is a powerful tool to investigate the structure and dynamics of
macromolecular assemblies for which X-ray
structure determination is very difficult. Although it is generally impossible
to build atomic models solely from cryo-EM density maps, the maps give valuable
insights into the structure and mechanism of large complexes. They are
particularly useful when combined with atomic-resolution structures of the
subunits, as reviewed in the section on hybrid methods below.
One of the most exciting developments in
structural biology is the new generation of tomography methods that are based
on multiple tilted views of the same object [33,71]. While
electron tomography can be used to study the structures of isolated
macromolecular assemblies at a relatively low resolution of a few nanometers,
its true potential lies in visualizing the assemblies in an unperturbed
cellular context [72]. These
datasets provide fascinating 3D images of entities as large as a small cell at
approximately 5 nm resolution [73]. To widen the
scope of cellular tomography, it is necessary to improve the resolution of the
tomographic images as well as identification of the structures in these images [73-75]. Theoretical
considerations [76] and ongoing
improvements in the instrumentation make
a resolution as high as 2 nm a realistic goal
[77].
Low-resolution experimental methods
A number of experimental techniques can provide
structural information about protein interactions at low resolution (Figure
1e). This information may be used to infer the configuration of the proteins in
a complex. Methods for mapping of protein interactions may provide contact or
proximity restraints on pairs of proteins that are useful in the modelling of
higher order complexes. Such methods include new implementations of the
two-hybrid system [78-81], tagged affinity chromatography [82,83], and a combination of phage display with other
techniques [84] such as synthesis of peptides on cellulose membranes
(SPOT) [85]. Because of the low-resolution nature of these
biochemical characterizations, care is needed in their interpretation. For
example, gauging the biochemically derived interaction sets against known
3D structures of complexes identified potential sources of systematic errors in interaction discovery, such as
indirect interactions in two-hybrid systems, obstruction of interfaces by
molecular labels, and artificial promiscuity in the detected interactions
(Figure 2) [86].
Biochemical
and biophysical methods can also be used to derive low-resolution information
about the relative position and orientation of the domains in a larger complex.
These methods include site-directed mutagenesis that can identify residues
mediating the interaction [87], various forms
of footprinting such as hydrogen-deuterium exchange [88,89] and OH radical
footprinting [90] that can
identify surfaces buried upon complex formation, chemical cross-linking [91-93] that can
identify interacting residues, fluorescence resonance energy transfer (FRET) [94,95] that can determine the distance between the labelled
groups on the interacting proteins, and Fourier Transform Infrared Spectroscopy
(FTIR) that describes structural changes upon complex formation [96]. Small angle X-ray scattering (SAXS) is another
biophysical method that can provide low-resolution information about the shape
of a complex. Recently, SAXS has also been used to study the dynamics of
conformational changes in Bruton tyrosine kinase [97,98].
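To illustrate how a FRET measurement translates into a distance between labelled groups, the sketch below uses the standard Förster relation E = 1 / (1 + (r / R0)^6), which is not stated in the text above. The Förster radius R0 depends on the dye pair; the value used here is a hypothetical placeholder.

```python
def fret_distance(efficiency, r0):
    """Donor-acceptor distance from the transfer efficiency, using the Forster
    relation E = 1 / (1 + (r / R0)**6). The result has the same units as r0."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0 * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


r0 = 50.0  # hypothetical Forster radius in angstroms
d_half = fret_distance(0.5, r0)  # at 50% efficiency the distance equals R0
d_low = fret_distance(0.1, r0)   # lower efficiency implies a longer distance
print(d_half, d_low)
```

Because of the sixth-power dependence, the method is most sensitive for distances close to R0.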
Computational protein-protein docking
When
atomic structures of the individual proteins involved in an interaction are
known, either by experiment or by modeling, there are a number of computational
methods available to suggest the structure of the interaction [99]. Most of these docking methods aim to predict an
atomic model of a complex by maximizing the shape and chemical
complementarities between a given pair of interacting proteins [99-102]. Docking strategies usually
rely on a two-stage approach:
They first generate a set of possible orientations of the two docked
proteins and then score them in the hope that the native complex will be ranked
highly. The searches may be restrained by other considerations, such as the
known binding site location. The methods differ in protein representation,
scoring of different configurations, and searching for best solutions. Some
methods boldly model the actual diffusion/collision trajectories involved in
the docking process [103,104].
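The two-stage strategy described above can be sketched as a toy rigid-body search. This is an illustration only, not any of the published docking programs: proteins are reduced to lists of points, poses are sampled at random, and the score simply rewards contacts and penalizes clashes. All names, cutoffs, and coordinates are hypothetical.

```python
import math
import random


def random_rotation(rng):
    """Random 3x3 rotation matrix built from a normalized random quaternion."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    n = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / n for c in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def transform(points, rot, shift):
    """Rotate each point, then translate it."""
    return [
        tuple(sum(rot[i][j] * p[j] for j in range(3)) + shift[i] for i in range(3))
        for p in points
    ]


def score(receptor, ligand, contact=4.5, clash=2.5, clash_penalty=10.0):
    """Toy complementarity score: +1 per pair in contact, a penalty per clash."""
    total = 0.0
    for a in receptor:
        for b in ligand:
            d = math.dist(a, b)
            if d < clash:
                total -= clash_penalty
            elif d < contact:
                total += 1.0
    return total


def dock(receptor, ligand, n_poses=500, box=8.0, seed=0):
    rng = random.Random(seed)
    # Stage 1: generate candidate rigid-body placements of the ligand.
    poses = [
        (random_rotation(rng), [rng.uniform(-box, box) for _ in range(3)])
        for _ in range(n_poses)
    ]
    # Stage 2: score every candidate and rank them, best first.
    scored = [(score(receptor, transform(ligand, r, s)), r, s) for r, s in poses]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


# Hypothetical coordinates in angstroms, standing in for two small subunits.
receptor = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (0.0, 3.8, 0.0), (3.8, 3.8, 0.0)]
ligand = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (0.0, 3.8, 0.0)]
ranked = dock(receptor, ligand)
print("best toy score:", ranked[0][0])
```

A restraint such as a known binding site could be added in stage 1 by rejecting poses that leave the site uncovered, or in stage 2 as an extra scoring term.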
While
the docking methods are not sufficiently accurate to predict whether or not two
proteins actually interact with each other, they can sometimes correctly
identify the interacting surfaces between two structurally defined subunits [105]. Docking
methods are systematically assessed through blind trials in the Critical
Assessment of PRediction of Interactions (CAPRI), a community-wide experiment
that occurs every two years [101,106].
Predictions are made just before the structures are solved experimentally,
followed by assessing the models at the CAPRI
meetings. None of the
methods assessed in the last CAPRI experiment
correctly predicted more than 3 of the 7 target complexes [106].
Methods that are able to work with comparative protein
structure models [107] instead of experimentally determined subunit
structures would extend the applicability of docking to many more biological
problems, but would likely have poorer performance. Currently, docking is often
applied in concert with experimental techniques, including site-directed
mutagenesis [108], amide hydrogen/deuterium exchange [89], NMR spectroscopy [109,110], as well as solid-state binding and surface plasmon
resonance [111].
Inferring interactions from homology
Protein interactions can also be modeled by similarity
[112-114]. If there is a complex of known structure involving
homologs of a pair of interacting proteins, it is usually possible to build a
model by comparative modeling using the known complex structure as the
template. There are now ~2,000 distinct interactions of known structure (Aloy & Russell, unpublished
data) that can be used as templates, stored in the PQS database [20].
Building a model of an interacting pair of proteins
based on the known structure of interacting homologs raises some questions. The
first one is whether or not homology implies a similarity in interaction. It
was found that interactions between proteins of the same fold tend to be
similar when the sequence identity is above ~30% [115]. Below this
cutoff, there is a twilight zone where interactions may or may not be similar
geometrically.
Given a template, it is possible to model an
interaction using standard comparative modelling techniques [116]. However, frequently there are multiple templates for
the same interaction type. In addition, a single interaction template can be
used to model many interactions in a single organism. Therefore, it is important
to assess the likelihood of these potential interactions, particularly in the
absence of experimental validation [117]. For example, each of the dozens of fibroblast growth
factors (FGFs) interacts with one or more of seven receptors with different
affinities [118]. Two approaches have been developed recently that
attempt to predict specificity by modelling interactions. The first approach,
implemented by InterPReTS [112,119] and ModBase [114], uses empirical pair-potentials derived from
interfaces of known structure to score how well a pair of homologous proteins
fits a known complex structure. The second approach, MULTIPROSPECTOR, is
similar, although it attempts to study more distantly related protein sequences
by threading sequences onto a library of interacting templates, followed by
scoring how well the individual sequences fit their proposed folds as well as the
interface between them [120]. Both approaches have since been applied to study
large collections of sequences and interactions [113,114,121].
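The idea of scoring a modelled interface with empirical pair-potentials can be sketched as a log-odds calculation over residue contacts. This is a generic illustration under simple assumptions (independent pairing as the reference state, a pseudocount for sparse data); it is not the InterPReTS, ModBase, or MULTIPROSPECTOR implementation, and the contact lists are hypothetical.

```python
import math
from collections import Counter


def derive_potential(known_interfaces, pseudocount=1.0):
    """Log-odds score for each residue-type pair, from contacts observed in
    interfaces of known structure. The expected count assumes that residue
    types pair independently of each other."""
    pair_counts = Counter()
    residue_counts = Counter()
    for contacts in known_interfaces:
        for a, b in contacts:
            pair_counts[tuple(sorted((a, b)))] += 1
            residue_counts[a] += 1
            residue_counts[b] += 1
    n_pairs = sum(pair_counts.values())
    n_res = sum(residue_counts.values())
    types = sorted(residue_counts)
    potential = {}
    for i, a in enumerate(types):
        for b in types[i:]:
            fa = residue_counts[a] / n_res
            fb = residue_counts[b] / n_res
            expected = n_pairs * fa * fb * (1 if a == b else 2)
            observed = pair_counts[(a, b)]
            potential[(a, b)] = math.log(
                (observed + pseudocount) / (expected + pseudocount)
            )
    return potential


def score_interface(contacts, potential):
    """Sum the pair scores over the contacts a homolog would make on a template."""
    return sum(potential.get(tuple(sorted(pair)), 0.0) for pair in contacts)


# Hypothetical contact lists (one-letter residue codes) from two known interfaces.
known = [
    [("K", "E"), ("K", "D"), ("L", "L")],
    [("R", "E"), ("L", "V"), ("K", "E")],
]
potential = derive_potential(known)
favorable = [("K", "E"), ("L", "L")]
unfavorable = [("K", "K"), ("E", "E")]
print(score_interface(favorable, potential), score_interface(unfavorable, potential))
```

In this toy data set, contacts that resemble those seen in the known interfaces score higher than like-charge pairs that were never observed.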
For some large complexes, the specificity of
interactions within a family of homologous subunits is an important determinant
of assembling the complex. For instance, the chaperonin CCT consists of eight
homologous subunits that are all similar to the single subunit type comprising
the thermosome [122]. Thus, to build CCT using the thermosome requires the
conversion of a seven-subunit ring into an eight-subunit ring, and then a
choice of the correct arrangement out of the 5040 (8!/8) possibilities. It is possible to guide this process by
experiments, such as the detection of sub-complexes that reveal preferred
interacting pairs [123] or application of the two-hybrid system [124]. InterPReTS was also applied to select one of the 120
possible arrangements of six exosome subunits (Figure 4) [125] with mixed results.
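The counts quoted above (5040 for eight subunits, 120 for six) follow from treating ring arrangements that differ only by rotation as identical, while mirror-image rings remain distinct. The sketch below reproduces them; the function names and subunit labels are hypothetical.

```python
import math
from itertools import permutations


def ring_arrangements(n):
    """Distinct orderings of n different subunits around a ring, with
    rotations of the same ring counted once: n!/n = (n-1)!."""
    return math.factorial(n) // n


def enumerate_ring(subunits):
    """List each distinct ring once by fixing the first subunit in place
    and permuting the remaining ones."""
    first, rest = subunits[0], subunits[1:]
    return [(first,) + p for p in permutations(rest)]


print(ring_arrangements(8))   # eight-subunit ring such as CCT
print(ring_arrangements(6))   # six-subunit ring
rings = enumerate_ring(("S1", "S2", "S3", "S4", "S5", "S6"))
print(len(rings))
```

Enumerating the candidates in this way makes it straightforward to score each arrangement in turn, for example with an interface potential, and rank them.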
Low-resolution computational methods
Even when docking or modelling is not feasible, it may
still be possible to get some structural insights into a protein-protein
interaction using other computational approaches. Various methods combine
structures with sequence alignments and phylogenetic trees to identify sites on
the surface that are likely to be involved in function or specificity [126-133]. Other computational methods perform alanine scanning
to identify hot spots in structures that may correspond to binding sites for
both small ligands and proteins [134]. There are also
many computational methods for prediction of protein-protein interactions when
no structural information is available (P. Bork and E. Marcotte, this issue).
Hybrid methods
In
the absence of atomic-resolution assembly structures, approximate atomic models
of assemblies can be derived by combining low-resolution cryo-EM data of whole
protein assemblies with computational docking of
atomic-resolution structures of their subunits [135-143]. It has been estimated
that such fitting can position the subunits with an accuracy of up to one tenth of the
resolution of the original EM reconstruction.
Hybrid
approaches involving the fitting of subunits into the EM maps are illustrated
by pseudo-atomic models of the
actin-myosin complex [144], the yeast
ribosome [145,146] (Figure 3), the
bacteriophage T4-baseplate [147], pre-mRNA
splicing complex SF3b [148], the
rad51 system involved in homologous recombination and DNA repair [149], and
complex virus structures [150,151].
Unfortunately,
experimentally determined atomic-resolution structures of the isolated subunits
are frequently not available. In addition, even if they are available, the
induced fit may severely limit their utility in the reconstruction of the whole
assembly. In such cases, it might be possible to get useful models of the subunits
by comparative protein structure modeling [116,152-155]. The number of models
that can be constructed with useful accuracy is already two orders of magnitude
higher than the number of available experimentally determined structures. Models
with at least the correct fold can be constructed for domains in approximately
58% of the known protein sequences [114]. Comparative modeling
will be increasingly more applicable and accurate because of the structural
genomics initiative [156].
One of the main goals of structural genomics is to determine a sufficient
number of appropriately selected structures from each domain family, such that
all sequences are within modeling distance of at least one known protein
structure [157,158].
Structural genomics may in fact contribute to a
comprehensive and efficient structural description of complexes in an
additional way. While structural genomics currently focuses on
single proteins or their domains, it could be expanded to the sampling of
domain-domain interactions [115,159,160]. Such an
effort would provide a repertoire of templates for binary interactions, which
would facilitate building of higher-order complexes.
Although
X-ray crystallography and EM in combination with atomic structure docking have
been successfully employed to solve structures of protein assemblies, they are
not capable of efficiently characterizing the myriad of complexes that exist in
a cell. For example, most of the transient complexes cannot be addressed at all
with these approaches. Therefore, there is a great need for hybrid methods
where accuracy, high throughput, completeness, and resolution are improved by
integrating information from all available sources [19,125,161].
The dynamics of complexes
By trapping the complexes in different conformations
and configurations, hybrid methods can be used to study the functional role of
assembly dynamics. For instance, models of the two different
functional states of the E. coli 70S
ribosome demonstrated that the complex changes from a compact to a looser
conformation, and showed rearrangements of many of the ribosomal proteins [63]. Similarly, the T antigen double hexamers (a replicative helicase of simian
virus 40) were assembled at the origin of replication using 27.5 Å cryo-EM maps at different degrees of bending along the DNA axis [162]. Fitting the crystal structure of the Tag helicase
domain [163] into the 3D cryo-EM density map ascertained that the C-terminal domains
are rotated relative to each other in the complex. The results were combined
with the available biochemical data, to propose an integrated model for the
initiation of viral DNA replication. Comparison also revealed details that are key to
understanding filament function. Fitting of atomic models of actin and the
myosin cross-bridge into 14 Å cryo-EM maps
showed that the closing of the actin-binding cleft upon actin binding is
structurally coupled to the opening of the nucleotide-binding pocket [67].
The dynamics of assembly models can also be studied by theoretical
calculations [164-167].
A vibrational analysis
of elastic models was employed to capture the essential motions in clamp closure in
bacterial RNA polymerase, the ratcheting of the 30S and 50S subunits of the
ribosome, and the dynamic flexibility of chaperonin CCT [168]. In addition, a quantized elastic deformational model provided a basis to
simulate conformational fluctuations related to expansion and contraction of
the truncated E2 core from the pyruvate dehydrogenase complex [169].