Overproduction of proteins in E. coli.
General ref: Current Protocols in Molecular Biology, Ausubel et al., Eds.
Objectives
To understand the following issues with respect to production of foreign proteins in E. coli.
1. The need to provide an E. coli promoter and ribosomal binding site.
2. The need to keep expression turned off during growth and propagation of the clone.
3. Problems related to stability and purification.
4. Use of affinity purification systems.
5. Recombinant phage display.
Reasons for over-expressing proteins.
1. To purify large amounts for study or for sale.
2. To purify from a more convenient heterologous organism.
3. To purify away from other components of the originating organism.
4. As a prelude to in vitro mutagenesis.
Overview
The amount of care necessary to successfully express a foreign protein in E. coli depends on how much yield you need. If you're just trying to get enough to detect activity, then almost any fusion to a valid E. coli promoter will probably do. For many research purposes, expressing the protein as about one percent of the bacterial protein is probably more than enough. If the gene comes with its own promoter, this may be achievable by simply putting the gene on a multicopy vector. If the gene is without a promoter (a cDNA, for example), one can get this level of expression by fusing it to any of a number of strong E. coli promoters. At this level of expression, one is mainly concerned with avoiding problems caused by some noxious property of the gene or its product (e.g. instability, refusal to fold, toxicity to the host, mRNA degradation signals in the untranslated regions).
Other
purposes, for example supporting structural studies, require high yields. Yields in excess of 40% of total E. coli protein can be obtained. To reach these yields, one should expect to
optimize every step in the expression pathway.
Getting high level transcription is usually not too hard. One may have to supply optimal translational
start signals, to supply a transcriptional terminator, to remove some
nonoptimal codons, to remove or replace untranslated regions, and to be
prepared to recover large amounts of insoluble protein and refold it.
This image is from New England
Biolabs advertising information for one of their affinity expression
systems. The amount of the recombinant
protein produced on top of total cell protein can be seen in lane 3. (http://www.ebiotrade.com/buyf/neb/newprt/A33.asp)
Typically, after induction and an expression period, one spins down about 1 ml of cells, cooks them in SDS loading buffer, and then analyzes the sample by SDS-PAGE. This shows total protein, including insoluble inclusion bodies, membrane proteins, and soluble cytoplasmic protein. To distinguish whether the protein is in the soluble or insoluble fraction, one would open the cells by sonication, separate the soluble and insoluble fractions, cook the insoluble fraction in SDS, and load each on the SDS polyacrylamide gel. In order to carry on with the affinity purification as indicated above, the protein would have to be in the soluble fraction. If it's in inclusion bodies, there is a series of washes to purify the inclusion bodies, and then one would have to denature and renature the protein before carrying on. If the protein is in the membranes, it may be solubilized by gentle detergent treatment, e.g. in Triton X-100.
When
high level expression is coupled to in
vitro mutagenesis, one should expect additional problems with the
mutants. Mutant proteins are generally
less stable, and therefore more susceptible to degradation and
insolubility. Multiple mutations cause
progressively more trouble.
Affinity Systems.
Most high level expression experiments in E. coli are done by making a fusion with some protein that is easily purified by affinity chromatography. Usually the fusion partner comes as part of the expression vector and will be the N-terminal domain of the construct. This is so that the novel sequence added is not near the translation signals and is not at risk of forming secondary structure with them. Typically the vector contains a cloning site just downstream of a proteolytic cleavage site. Your insert would most easily be added by use of PCR amplification with primers designed to add a 5' extension serving both to provide the restriction site and to supply an appropriate translational fusion.
The above figure is from Promega's advertisements for
expression vectors (http://www.promega.com/vectors/bacterial_express_vectors.htm).
Note that in this typical case, one has to make the 5' primer to add the restriction site of choice and keep the fusion protein in phase. For example:
Met Leu Lys Leu Met Leu Pro Ser Gln Asp Ser
ATG CTT AAG CTT ATG TTG CCC TCT CAG GAC AGC
might be used together with the HindIII cleavage site in the Xa-3 vector (where Met Leu Pro is the beginning of the natural protein). However, the protein after factor Xa cleavage will have the N-terminal sequence Glu Lys Leu Met Leu Pro ... There are a few vectors designed to get your protein back out without extra residues on the N-terminus.
Since there is PCR involved,
expect to resequence the clone to rule out inadvertent PCR-induced mutations.
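The phase check on a primer like the one above is easy to do by machine before ordering the oligo. The following is a minimal sketch, assuming the standard genetic code and using the example sequence from the text; the function and variable names are illustrative only.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string in frame 1, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Example sequence from the text (primer extension plus start of the gene).
PRIMER_REGION = "ATGCTTAAGCTTATGTTGCCCTCTCAGGACAGC"
HINDIII = "AAGCTT"

site_start = PRIMER_REGION.find(HINDIII)
print("HindIII site starts at base", site_start + 1)
print("Site begins on a codon boundary:", site_start % 3 == 0)
print("Frame 1 translation:", translate(PRIMER_REGION))
```

A check like this only confirms the reading frame of the primer itself; whether that frame matches the vector still has to be read off the vector map.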
General methods of boosting expression.
1. Increase copy number of the gene.
2. Fuse to more powerful transcription and/or translation signals (e.g. lac, lambda PL, trp, tac, beta-lactamase).
Problems and potential solutions:
1. Codon preferences:
· Resynthesize gene or segments thereof with favored codons, particularly codon #2, or replace runs of adjacent unfavorable codons.
· Use host strain with extra tRNAs.
2. Degradation of protein:
· lon- host.
· Fusion to another protein may stabilize small proteins.
· Use protease inhibitors after opening cells.
3. Insolubility of the expressed protein:
a. Find in inclusion bodies and solubilize by denaturation and renaturation.
b. Solubilize under nondenaturing conditions.
c. Increase solubility by use of a fusion partner.
d. Look out for missing cofactors (like metal ions) in the growth medium.
e. Co-express with a chaperonin.
f. Try growth at reduced temperature.
g. Express at a reduced rate to give the protein a chance to fold.
h. Be happy with the soluble portion. (But if it's a small portion, beware that it might represent mistranslated or unfolded material.)
4. Expression of the protein is toxic to E. coli: Use a tightly controlled promoter to keep expression turned off until the clone has been grown up.
5. Instability of the plasmid. This problem is particularly bad when the plasmid is maintained with ampicillin (or another antibiotic resisted by a beta-lactamase).
a. Keep expression suppressed during growth.
b. Eliminate unnecessary passages.
c. Consider a vector with a better antibiotic selection.
d. Use a recA- host.
6. Problems related to fusion partners.
· Protease to cleave the fusion domain off may cleave inside your protein.
· Cleavage at the protease cleavage site may be inhibited by presence of your protein.
· Extra residues added to protein may change its properties.
· Your protein may interfere with binding of the partner to its affinity resin.
· Your segment of mRNA may form secondary structure with the translation signals.
Somatostatin is a peptide hormone. From the known amino acid sequence, a somatostatin gene was synthesized with E. coli codon preferences. It was expressed from the lactose promoter with and without fusion to beta-galactosidase; the fusion was found to stabilize the peptide. The fusion was made after a Met residue so that somatostatin could be recovered from the fusion protein by cyanogen bromide cleavage. The unfused construct produced no detectable somatostatin, and the fusion construct produced a disappointingly low yield of insoluble protein.
This
was the first published attempt to mass produce a eucaryotic protein in E. coli.
It mainly served to anticipate some of the problems that must be
overcome for successful mass expression.
The solubility problem remains something that requires a customized
solution for each protein, although stable globular proteins do better than
short peptides. This experiment did
establish the strategy of fusing foreign peptides to a carrier protein to
stabilize them. More specific means of
cleaving the fusion junction are now available.
The low yield was related to a failure to adequately down-regulate the expression of the insert while the clone was being grown and propagated. The lac regulation was overpowered by the copy number of the vector (pBR322). Even though pBR322 exists in only about 20 molecules per cell, this is enough to titrate out the available lac repressor. This causes partially constitutive expression of the insert, which in turn selects for deletions that take out the promoter or the insert.
It is a common error for people to get a poor yield and blame it on degradation, when what really happened is that the gene or promoter was already genetically damaged in the construct by the time they looked for expression. The classic method to investigate protein degradation is to pulse label with [35S]-methionine and observe that the protein really is produced and then degraded. (Note: Look to be sure the protein has an internal Met codon first; the initiator Met is often removed by posttranslational processing.) An alternative would be to do a western blot. For several of the affinity tag systems, one can obtain commercial antibody to the tag, which could be used for this purpose. However, with modern expression systems, the transgenic protein should be obvious on a simple Coomassie stained SDS gel. One should run both a sample of the cell lysate and a sample obtained by cooking the insoluble cell debris in SDS. It will often be true that the major portion of the expressed protein is in the insoluble fraction.
Another
symptom of genetic instability caused by expression leakage is that the yield
drops off precipitously as the clone is propagated. So the clone might produce a great yield in a
small pilot experiment, and then make almost nothing when scaled up to several
liters. One should consider keeping back
a small sample of the culture to allow examination of the plasmid DNA itself
after the fact. Genetic instability will
often show up as a heterogeneous set of deletions. However, you need to keep in mind that point
mutations in the promoter, or even mutations in the host background can also
destroy the expression of the insert.
Instability and ampicillin resistance.
The instability problem when growing expression clones is worse when trying to maintain the clone with ampicillin resistance than with other antibiotics. This is because ampicillinase (beta-lactamase) leaks out of the cells while they are growing in liquid culture and destroys the ampicillin in the culture fluid. After that, bacteria that lose the plasmid tend to overgrow the culture. A typical experience goes as follows:
1. The clones behave as expected on an ampicillin plate.
2. Small scale cultures produce the protein as expected.
3. An overnight preculture is prepared to start a large scale growth.
4. When the large culture is inoculated the next day, the optical density increases only slightly, and then decreases. To the practiced eye, there is an accumulation of stringy debris indicative of lysis.
5. The effect is not reproducible. Sometimes the large scale culture grows and sometimes it lyses. When it does grow, there can be a long lag phase, and the protein yield is typically less than anticipated from the small scale culture.
The explanation is that the ampicillin is cleared from the preculture and then ampicillin sensitive bacteria that have lost the plasmid overgrow to various degrees by morning. When the preculture is used to inoculate media with fresh ampicillin, these plasmid-free bacteria begin to grow, but they cannot synthesize cell wall in the presence of the ampicillin, so they lyse.
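The overgrowth of an unselected preculture by plasmid-free cells can be illustrated with a toy model. This is only a sketch: the loss rate per division and the growth advantage of plasmid-free cells are assumed values chosen for illustration, not measurements, and real cultures will differ.

```python
def plasmid_free_fraction(generations, loss_per_division=0.01,
                          relative_rate=1.3, start_free=0.0):
    """Return the plasmid-free fraction after each generation with no selection.

    loss_per_division: assumed share of daughters that lose the plasmid.
    relative_rate: assumed growth rate of plasmid-free cells relative to
    plasmid-bearing cells (1.0 would mean no advantage).
    """
    bearing, free = 1.0 - start_free, start_free
    history = []
    for _ in range(generations):
        # Plasmid-bearing cells double; a small share of daughters lose the plasmid.
        newly_free = bearing * 2 * loss_per_division
        bearing = bearing * 2 * (1 - loss_per_division)
        # Plasmid-free cells multiply faster by the assumed relative rate.
        free = free * 2 ** relative_rate + newly_free
        # Renormalize so the two populations are expressed as fractions.
        total = bearing + free
        bearing, free = bearing / total, free / total
        history.append(free)
    return history


if __name__ == "__main__":
    for gen, frac in enumerate(plasmid_free_fraction(30), start=1):
        if gen % 5 == 0:
            print(f"generation {gen:2d}: {frac:.1%} plasmid-free")
```

The point of the sketch is qualitative: once the antibiotic is gone, even a small loss rate combined with a modest growth advantage lets plasmid-free cells take over within an overnight culture, and the extent depends on how long the culture sits unselected.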
This problem shows similar symptoms to a T1 phage infestation. T1 is a bacteriophage of E. coli that survives dehydration and spreads as an airborne contaminant. It causes aggressive lysis, producing plaques on plates the size of a quarter. T1 infestation is rare, but when a culture gets accidentally infected, lyses, and is then opened, it can spread enough airborne contamination throughout the lab or even an entire building that no one can grow E. coli cultures for years afterwards. This forces everyone to derive T1 resistant versions of all of their strains. This is a tremendous setback when it happens, hence everyone is advised, upon observing a culture of E. coli lyse unexpectedly, to autoclave it without opening it. Clearly, it is inadvisable to have a background of cultures lysing unexpectedly due to this ampicillin selection problem, because it reduces vigilance against the T1 infestation problem.
When
working with an expression plasmid based on ampicillin selection, special
precautions are required to maintain the selection. The growth is generally done more
continuously to avoid precultures going to saturation. However the growth may still be done in
stages with inoculation into fresh
medium.
Some
biotech companies are promoting expression vectors based on different
antibiotics to counter this effect.
This is probably the first published successful mass expression of a eucaryotic protein in E. coli. Human growth hormone is a 191 residue peptide hormone. The first 24 codons were resynthesized with an Eco RI site upstream of the AUG, convenient for joining to the lac promoter and ribosome binding site. The other end was made as a Hae III site. The synthetic segment was first cloned and sequenced in an independent vector to verify the correct sequence. The cDNA was cloned as a Hae III fragment which omits the first 24 codons. The two parts of the gene were then ligated together and joined to an Eco RI site downstream of two lac promoters.
They used lac iQ (overproducer of lactose repressor) to get tighter control over expression, and a downstream transcriptional fusion to the tet resistance gene of pBR322 to guard against deletion. Upon induction, they got 20% of cellular protein as HGH.
This
strategy anticipates some of the common tricks still used today. The resynthesis of the N-terminal region as
part of a linker (or more recently as a PCR primer) is a standard method of
achieving a fusion. Oligo synthesis is
sufficiently advanced today to easily reach lengths of up to 100 bases. Lac iQ is still used to improve control over the lac
promoter. Sometimes the lac i gene is
placed on the cloning vector so that its gene copy number is increased together
with the number of lac promoters.
However, the lac promoter still leaks expression of the insert. Other promoter systems (lambda PL and T7) can give a
negligible basal expression level, and are preferred for inserts with toxic
properties. These improved promoter
systems have supplanted the use of transcriptional fusions to an antibiotic
gene as the preferred way to stabilize troublesome inserts. Additionally, one tries to avoid serial
propagation. With lac and tac promoters,
induction of expression is with IPTG.
IPTG should never be added until the production cultures are grown. Specifically, it should not be added to the
plates on which the clones are selected and/or stored.
This
experiment also typifies the multistep constructions that were common in the
last decade. In a multistep construction
(where you're putting a lot of different restriction fragments together), there
are lots of things that can go wrong. As
much as possible, you need to make one joint at a time, clone the intermediate,
verify it, and then cut it back out to use in the next step. Today, one would try to use modern techniques
and materials to reduce the number of steps.
For example, one would use an established expression vector that already
had the promoter, lots of convenient restriction sites, a host strain, and a
history of successful expression experiments.
This would avoid the steps involving creation of the vector. It would probably be preferable to use the
synthetic segment as a PCR primer, and therefore lift out the intact HGH gene
in one step. However, it is not a useful
simplification to throw four or more fragments into a ligation reaction and
expect them to all join together in the proper order.
The
object of simplifying a construction is to increase the reliability, not to
reduce your work load. Steps that are
for verification improve reliability and should be included as much as
possible. It's the steps that are mainly
opportunities for something else to go wrong that you're trying to eliminate.
When
joining the beta-globin AUG to the ribosomal binding site of the lac promoter,
they made a set of deletions with exo III and S1 to get a variety of
spacings. The clones were
translationally fused downstream to lac z so that the efficiency of the various
arrangements on the 5' end could be assayed by looking at beta galactosidase
activity. When an efficient construct
was found, the 3' end was replaced to make an unfused beta-globin gene.
The
figure above shows the relative activity recovered based on the exact sequence
that was deleted. This experiment served
to show that the spacing between the AUG and the ribosome binding site is
critical. In modern constructs, one uses
an exact copy of an efficiently translated E.
coli gene for this region.
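When planning a set of spacing variants like those described above, it helps to tabulate the distance between the ribosome binding site and the AUG for each candidate. The following is a minimal sketch; the Shine-Dalgarno core used for matching (AGGA) and the example sequence are assumptions for illustration and should be replaced with the actual signals of the vector in hand.

```python
def sd_to_start_spacing(seq, sd_core="AGGA", start="ATG"):
    """Count the nucleotides between the end of the SD core and the first start codon.

    Returns None if either element is not found. Sequences are given as DNA.
    """
    seq = seq.upper()
    sd_pos = seq.find(sd_core)
    if sd_pos < 0:
        return None
    sd_end = sd_pos + len(sd_core)
    start_pos = seq.find(start, sd_end)
    if start_pos < 0:
        return None
    return start_pos - sd_end


# Hypothetical leader region and a derivative with two spacer bases deleted.
parent = "TTCACACAGGAAACAGCTATGACC"
deleted = parent[:12] + parent[14:]

for name, s in (("parent", parent), ("2-base deletion", deleted)):
    print(name, "spacing:", sd_to_start_spacing(s))
```

This only measures spacing; it says nothing about how well a given spacing will translate, which in the experiment above had to be assayed.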
Proinsulin - K. Talmadge, et al. (1980) PNAS 77, 3988.
Preproinsulin
has a eucaryotic signal sequence at its N-terminus that normally directs it to
be secreted. Beta-lactamase has a
bacterial signal sequence at its N-terminal which directs its secretion into
the periplasmic space. Several fusions
with part of the bacterial and part of the eucaryotic signal sequences were
made. They all directed secretion of the
protein, and in each case the signal sequence was properly cleaved off to
create correct mature proinsulin. In
fact, even the plain proinsulin signal without any bacterial component
worked.
Beta lactamase signal | Cleavage
MSIQHFRVALIPFFAAFCLPVFA HPETLVK...
MSIQHFRVALIPFFAAFCLPVFA HPET AAGGGGGG QHLCGPHLVEALYLVCGE...
MSIQHFRVALIPFFAAFCLPVFA HP LQGGGGG WRMFLPLLALLVLWEPKPAQA FVKQHLCGPHLVEALYLVCGE...
MSIQHFRVALIP LQGGGGG WRMFLPLLALLVLWEPKPAQA FVKQHLCGPHLVEALYLVCGE...
MSIQ AAAG WRMFLPLLALLVLWEPKPAQA FVKQHLCGPHLVEALYLVCGE...
MALWRMFLPLLALLVLWEPKPAQA FVKQHLCGPHLVEALYLVCGE...
Preproinsulin signal ^ Cleavage site
From
Talmadge et al. (1980) PNAS 77, 3988-3992.
This
experiment established the feasibility of causing the foreign protein to be
secreted into the periplasmic space, along with removal of the signal
sequence. The idea was that by secreting
the foreign protein, it would be easy to purify, and protected from stability
and solubility problems. However, it
turns out that the periplasm has even more proteases in it than the cytoplasm,
so one generally gets a lower yield this way than by just leaving it in the
cytoplasm.
Strength of the ribosomal binding site.
Ref: Mott et al. (1985) PNAS 82, 88-92.
In order to over express the E.
coli rho protein, it was fused to the lambda PL promoter either with its
own ribosomal binding site or with the ribosomal binding site of the lambda cII
gene. The former construct gave rho as
3%-5% of the cellular protein after induction, whereas the latter gave
approximately 40%. So even with
bacterial genes, it can help to improve the translation signals.
The lambda PL promoter is still one of the best around due to its very low basal expression level, its high activity, and its ease of induction (heat). However, you have to use a host with the cI857 ts lambda repressor gene in it, and you have to be sure to grow the clones at 32 °C so as to avoid leakage of expression. Modern expression vectors usually come already carrying the translational signals from a heavily expressed gene like cII. Often there is an N-terminal fusion domain, so the site for fusing your coding sequence will not interfere with translational initiation. However, if you do try to place your coding sequence so that it will be the N-terminal domain, be sure not to disrupt anything between the rbs and the initiator AUG from the expression vector. Your sequence can also inadvertently create mRNA secondary structure that ties up the initiator codon or the ribosome binding sequence and causes poor translational efficiency. Such proposed constructs should be checked for secondary structure problems using prediction programs. Mfold, found on the internet, is good for that purpose.
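As a quick first screen before running a proper folding program, one can look for stretches of the insert that are perfectly complementary to the initiation region. The sketch below does only that: it ignores G-U pairs, loops, and thermodynamics, so it is no substitute for a structure prediction. The sequences and the function names are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def longest_complement(init_region, downstream, min_len=4):
    """Find the longest stretch of init_region that can pair perfectly with downstream.

    Returns (stretch, position_in_downstream) or None if nothing of at least
    min_len bases is found.
    """
    init_region = init_region.upper()
    downstream = downstream.upper()
    for k in range(len(init_region), min_len - 1, -1):
        for i in range(len(init_region) - k + 1):
            stretch = init_region[i:i + k]
            partner = reverse_complement(stretch)
            pos = downstream.find(partner)
            if pos >= 0:
                return stretch, pos
    return None


# Hypothetical initiation region (rbs through ATG) and start of an insert.
init_region = "AGGAGGTAAAACATATG"
insert_start = "GCTGGCCTCCTGGA"
print(longest_complement(init_region, insert_start))
```

A hit of six or more bases overlapping the ribosome binding sequence or the initiator codon would be a reason to look at the construct more carefully with a folding program.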
A completely synthetic approach.
Ref:
Jay et al. (1984) PNAS 81, 2290-2294.
The
gene for human gamma interferon was completely synthesized including a strong
bacteriophage T5 promoter and a strong ribosome binding site. The gene was ligated together from a series
of 66 overlapping oligonucleotides as illustrated in the stylized diagram
below. One can put many oligos together
in a single ligation, although it may be wise to assemble the gene as a series
of smaller restriction fragments that can be independently cloned, sequenced
and then ligated together to form the whole gene.
The synthetic gene was ligated into a plasmid vector such that the tet resistance gene was fused downstream, to maintain selection against loss of the interferon gene. Human interferon accumulated at more than 15% of cellular protein.
Today,
individual oligos of 100 bases can be made with little risk of incorporating
errors. This is because the chemistry
has been changed so that error products (failure to add a base at any step) are
capped and left in a condition so that they can all be removed from the correct
product in a one step purification at the end of the synthesis. So it is possible to construct entirely
synthetic genes of substantial length.
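The reason the removal of capped failure products matters so much can be seen from a simple yield calculation: if each coupling step succeeds with some efficiency, the fraction of chains that reach full length falls off geometrically with length. The sketch below assumes a uniform per-step coupling efficiency; the values used are illustrative, not a specification of any synthesizer.

```python
def full_length_fraction(length, coupling_efficiency):
    """Fraction of chains reaching full length after (length - 1) couplings.

    Assumes every coupling step has the same efficiency and that failed
    chains are capped and never extended again.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    return coupling_efficiency ** (length - 1)


if __name__ == "__main__":
    for eff in (0.98, 0.99, 0.995):
        for n in (20, 50, 100):
            frac = full_length_fraction(n, eff)
            print(f"efficiency {eff:.3f}, {n:3d}-mer: {frac:.1%} full length")
```

Under these assumptions a large share of the crude product of a 100-mer synthesis is truncated, which is why a purification that removes all capped failures in one step is what makes long oligos practical.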
Use of high copy number vectors.
Ref: Winter et al. (1982) Nature 299, 756-758.
M13 RF maintains a copy number of about 200 molecules per infected cell. Genes cloned into M13 with their own promoters can have high level expression, even if their own promoters are not particularly strong. M13 is a phage that packages a single stranded circle of DNA into the capsid. Within the cell, M13 replicates as a double stranded plasmid called RF (replicative form). Methods of mutagenesis prior to PCR-based methods were based on priming a single stranded template with a mutagenic primer. Hence M13 vectors fit easily into that strategy.
In order to avoid deletions, one
should make the phage propagate by infection rather than by division of
infected cells. Also, one should avoid
serial culturing.
Inclusion bodies.
Heavily
expressed proteins often aggregate and form inclusion bodies. Inclusion bodies pellet with the bacterial
debris after cell lysis. Since this
fraction is often discarded, it is easy to mistakenly believe that the expressed
protein has been degraded.
Inclusion
bodies can actually protect a protein from degradation. Also, after isolation by differential
centrifugation and washing, the over-expressed protein may be almost pure
within the inclusion body.
Proteins
within inclusion bodies are insoluble, often in a denatured state, and may have
inappropriate disulfide bonding. One
generally solubilizes the inclusion bodies in a denaturing agent, such as
guanidine hydrochloride, reduces, dilutes, and then tries to refold the protein
out of a low concentration of guanidine hydrochloride and in the presence of
reduced and oxidized glutathione. It is
possible to impose a purification in the denatured state, either by
chromatography in urea, or by affinity purification using a His-tag. Further purification of the refolded form
will be necessary along with physical characterization to assure that it is the
correct native form. For some kinds of
experiments, aggregate in the refolded material is particularly troublesome, so
a gel filtration is a common follow-up purification. The aggregated material can be recycled
through the denaturation and refolding steps.
Conditions for refolding vary from protein to protein and for some may be hard to find. Failure to include an essential metal ion cofactor would be one cause of trouble. It also follows that one has to have worked out suitable buffers for the purification step and for storage, so that the protein does not reaggregate after it has been properly refolded. Membrane proteins will require detergents, probably at all steps. For proteins with chronic solubility problems, a fused affinity domain with good solubility characteristics may help to keep the protein in solution.
Proteins
can be in inclusion bodies for reasons other than being unfolded. RNA binding proteins are often found in
inclusion bodies by virtue of being networked with cellular RNA. It may be possible to release such proteins
without denaturation by treating with RNAse.
Sometimes
proteins can be engineered to avoid certain folding problems. For example, if a cys is involved in
inappropriate disulfide bonding, and homologous proteins suggest an alternative
acceptable amino acid, making that replacement by in vitro mutagenesis might improve the ease of isolation of the
protein. An alternative hit-or-miss
strategy for finding a version that behaves better is to isolate a variety of
homologues from related organisms.
Protease- mutant host bacteria.
E. coli has numerous proteases that can attack and degrade recombinant proteins. In particular, synthesis of protease La, which is the product of the lon locus, is induced by the presence of abnormal proteins. lon is a heat shock gene, and is probably there to degrade damaged proteins after heat shock. Recombinant proteins are degraded 2-4 times more slowly in lon- cells. Alternatively, one can mutate the htpR locus, which encodes an alternative sigma factor that directs the induction of heat shock genes.
Ref:
Goff and Goldberg (1985) Cell 41, 587.
mRNA half life.
Most
E. coli messages have half lives of about 1-2 minutes. T4 gene 32 mRNA has a half life of about 30
minutes, this being part of the phage's strategy to achieve high level
expression. Sequences in the 5'
untranslated region of the message confer this excessive stability. Expression cassettes have been constructed
wherein the 5' end of T4 gene 32 is fused to the beginning of the recombinant
gene.
As
far as I can tell, this system has not appeared in a commercial vector
yet. However, be warned that the
opposite effect can happen by accident.
You may inadvertently introduce sequences in an untranslated region that
destabilize the mRNA. You should
generally avoid including untranslated regions from cDNAs in E. coli expression vectors.
Ref: Frey et al., (1988) Gene 62, 237-247.
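The effect of message half life on expression can be put in rough numbers. Assuming constant synthesis and first-order decay (a simplification), the steady-state amount of a message is proportional to its half life. The sketch below compares the half lives quoted above; the synthesis rate is an arbitrary placeholder.

```python
import math


def steady_state_mrna(synthesis_per_min, half_life_min):
    """Steady-state message level for constant synthesis and first-order decay."""
    decay_constant = math.log(2) / half_life_min  # per minute
    return synthesis_per_min / decay_constant


# Same (arbitrary) synthesis rate, half lives taken from the text.
typical = steady_state_mrna(1.0, 2.0)    # typical E. coli message, about 2 min
gene32 = steady_state_mrna(1.0, 30.0)    # T4 gene 32 message, about 30 min
print(f"fold increase in steady-state message: {gene32 / typical:.0f}")
```

The same reasoning runs in reverse: a destabilizing sequence that shortens the half life lowers the steady-state message level in proportion.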
Interaction of the transgenic protein with chaperonins
Ref: Overproduction of Anabaena 7120 ribulose-bisphosphate carboxylase/oxygenase in Escherichia coli. Larimer and Soper (1993) Gene 126: 85-92.
In photosynthetic organisms, Rubisco (D-ribulose-1,5-bisphosphate carboxylase/oxygenase) catalyzes the initial step in the reductive pentose phosphate pathway. Refolding of Rubisco in vitro requires chaperonins. High-level production of Rubisco activity from E. coli was aided by the simultaneous overproduction of the E. coli chaperonins (GroESL).
Curiously, some proteins may be stabilized in strains carrying a defective chaperonin (Reidhaar-Olson et al. (1990) Biochemistry 29: 7563-7571).
Problems with non preferred codons.
E. coli uses preferred codons among
the synonymous sets for its own highly expressed proteins. The implication is that the non preferred
codons are translated inefficiently.
Eucaryotic genes are full of these non preferred codons, yet they
usually can be highly expressed without trouble. However, sometimes it does help to put a preferred
codon at amino acid #2, or to fix stretches of adjacent non preferred
codons.
An alternative solution is to use expression hosts that contain additional tRNA genes added for the purpose of increasing the level of tRNA specific for non-preferred codons. Stratagene sells a series of expression strains under the trade name CodonPlus which coexpress different sets of tRNA genes targeted at rare codons.
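Screening a coding sequence for the two situations mentioned above (a non-preferred codon at position 2, and runs of adjacent non-preferred codons) is straightforward. The sketch below is only illustrative: the set of rare codons is an assumption based on codons commonly described as rare in E. coli, and should be replaced with a usage table appropriate to the host.

```python
# Assumed set of rare E. coli codons (illustrative; substitute a real usage table).
RARE_CODONS = {"AGA", "AGG", "ATA", "CTA", "CCC", "GGA"}


def split_codons(dna):
    """Split a coding sequence into codons, dropping any trailing partial codon."""
    dna = dna.upper()
    return [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]


def rare_codon_report(dna, rare=RARE_CODONS, min_run=2):
    """Report rare codon positions (1-based), runs of adjacent rare codons,
    and whether codon #2 is rare."""
    flags = [codon in rare for codon in split_codons(dna)]
    runs, i = [], 0
    while i < len(flags):
        if flags[i]:
            j = i
            while j < len(flags) and flags[j]:
                j += 1
            if j - i >= min_run:
                runs.append((i + 1, j))  # inclusive 1-based codon range
            i = j
        else:
            i += 1
    return {
        "second_codon_rare": len(flags) > 1 and flags[1],
        "rare_positions": [n + 1 for n, f in enumerate(flags) if f],
        "runs": runs,
    }


# Hypothetical coding sequence for demonstration.
example_cds = "ATGAGAGGTCTGAGGAGACCCAAA"
print(rare_codon_report(example_cds))
```

A report like this points to where resynthesis of a short segment, or a host with extra tRNA genes, might be worth trying.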
Other genetic code problems.
Some
genomes don't use the same genetic code as E.
coli. For example, most
mitochondrial genomes use a few altered codons.
Once the altered code is known, the gene will have to be altered by in
vitro mutagenesis at the variant codons to match the E. coli code. A few human
nuclear genes are edited at the RNA level.
RNA editing is found elsewhere, reaching an absurd level in Trypanosome
mitochondria. One would have to be sure
to be working from the final sequence after editing.
T7 RNA Polymerase/Promoter systems
This
system (marketed under the name pET by Novagen, but also publicly available) is
very popular today. It expresses the
foreign gene from a T7 promoter on the vector.
T7 is a bacteriophage that makes its own RNA polymerase that is specific
for its own promoters. The T7 polymerase
is provided either by an inducible T7 polymerase gene in the host, or by
infecting the culture with a phage carrying the polymerase gene after the cells
have been grown up.
There
is a multiple cloning site downstream of the T7 promoter. If the gene already has a suitable ribosomal
initiation site, it can simply be inserted in the correct orientation. Alternatively, one can add a strong ribosomal
binding site, engineer the codons to match E.
coli's preferences, add the restriction sites, and even add a
transcriptional terminator, all by using the linker, or PCR fusion procedures
described above. Versions of the vector
exist that have a strong ribosomal binding site and a cloning site right at the
AUG, so that one could fuse right at the AUG.
If the T7 polymerase gene is in the host background, it will be under the control of the lambda PL promoter, which is in turn under control of a cI857 ts lambda repressor gene also in the host background. This promoter has one of the lowest basal expression levels of any around, but there is still a little leakage of expression of the transgene. If the basal expression of the transgene proves toxic to the host, then one grows up the clone in a host with no T7 polymerase gene, and then introduces the polymerase by infection with a phage carrying the polymerase gene. This is the major advantage of the T7 systems: one can alter the method of control of expression without having to make new constructs.
Some people have reported leakage of
expression in this system even without the T7 polymerase. When going for 0 basal expression, one has to
worry about leakage of transcription from other promoters in the vector that
read through into the transgene.
Fusion systems:
There
are numerous commercial systems marketed in which you fuse your protein to some
other protein that provides a purification handle. Sometimes the fusion is designed to direct
secretion into the periplasmic space.
Then some means is provided to subsequently cleave the fusion
apart. One needs to pay attention as to
whether or not the protease will cleave internally to your protein. In most systems the cleavage will leave
extraneous amino acids attached to your protein. So you will have to evaluate if that will be
acceptable for your purposes. Hopefully
the cleavage will be accomplished on the folded fusion protein, directly
releasing your folded polypeptide.
Unfortunately, sometimes you have to denature the fusion protein to get
the protease to cleave the fusion site.
Maltose-Binding Proteins
Fusions
This system, based on vectors pMAL-c2 or -p2, can be obtained from New England
Biolabs. In this system, you make a translational fusion downstream of malE,
which encodes a secreted E. coli protein that binds maltose. When expressed in
pMAL-p2 the fusion protein is recovered from the periplasmic space.
Alternatively, the pMAL-c2 version is designed to leave the fusion protein in
the cytoplasm. In either case, the fusion protein can be purified by affinity
chromatography on an amylose column, and then cleaved with factor Xa protease,
which is specific for the fusion site, typically leaving your protein with a
few extra amino acids at the N-terminus. The vector has an XmnI site at the
fusion junction. The XmnI specificity is GAANN^NNTTC, which leaves a blunt
end, making it possible to fuse with no extra amino acids if you can arrange
for your insert to start with the first codon at a blunt end.
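The XmnI rule above can be checked by machine before cloning. The following is
a minimal sketch, not a vendor tool. The junction sequence is an assumption
written for illustration (codons for Ile-Glu-Gly-Arg followed by a short
polylinker) and is not copied from the vector map. The function
xmni_cut_positions is a hypothetical helper name.

```python
import re

# XmnI recognizes GAANN^NNTTC (N = any base) and cuts blunt between the
# 5th and 6th base of the 10-bp site. The lookahead lets overlapping
# sites be reported as well.
XMNI = re.compile(r"(?=GAA[ACGT]{4}TTC)")


def xmni_cut_positions(dna):
    """Return the 0-based index of the first base after each XmnI cut."""
    dna = dna.upper()
    return [m.start() + 5 for m in XMNI.finditer(dna)]


# Illustrative junction (assumed): ATC GAG GGA AGG encodes Ile-Glu-Gly-Arg,
# followed by a few polylinker bases.
junction = "ATCGAGGGAAGGATTTCAGAATTC"

cuts = xmni_cut_positions(junction)
for cut in cuts:
    upstream = junction[:cut]
    # A cut on a codon boundary means a blunt insert starting with its
    # first codon continues the reading frame with no extra residues.
    print(cut, upstream[-3:], "in frame" if cut % 3 == 0 else "out of frame")
```

The same function can be run on your insert to confirm that it carries no
internal XmnI site, which would otherwise cut the insert during cloning.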
As we saw before, secretion into the periplasmic space usually reduces the
yield (in this case about 4-fold). However, some proteins that form disulfide
bonds fold better if secreted. On the other hand, large proteins that are
normally cytoplasmic have trouble getting through the membrane. The major
attraction of making the fusion is to allow affinity purification based on the
maltose binding domain, before cleaving it off with factor Xa. Factor Xa
cleaves after the Arg of the sequence Ile-Glu-Gly-Arg at the fusion site.
Other fusion systems:
There
are a variety of other commercially available fusion systems that are designed
to assist purification of your protein, then let you cleave your protein away
from the bacterial domain.
Novagen, Inc. (now part of Merck), has a variety of T7 pET type vectors
designed to effect fusions with various proteins that can be used as
purification handles:
Tag         N/C terminal   Basis for detection and/or purification
T7-tag      N              monoclonal antibody
S-tag       N              RNase S-protein
His-tag     N or C         metal chelation chromatography
HSV-tag     C              monoclonal antibody
pelB/ompT   N              potential periplasmic localization
The
His-tag system is essentially just a string of 6 histidines in a row fused to N
or C-terminus. This is the only affinity
method that can be used in the denatured state.
Some renaturing schemes call for refolding the protein while bound to
the metal ion column. Qiagen markets an
exopeptidase that can remove an N-terminal His-tag without the requirement for
creating a cleavage site at the junction with the body of the protein. There are limitations to the amount of
reductant that can be used with the Ni ion affinity resin without reducing the
Ni. There are commercially available
antibody affinity systems for purifying His-tagged proteins, and antibodies to
the tag can be used on a western blot to assay for protein degradation.
Pharmacia
Biotech (now GE Healthcare) uses plasmids (pGEX vectors) designed for
inducible, high-level intracellular expression of genes or gene fragments as
fusions with glutathione S-transferase (GST).
The fusion proteins can be detected using colorimetric assay or
immunoassay and purified using Glutathione Sepharose 4B affinity
chromatography. The GST domain is a
dimer, so if the quaternary structure of the recombinant protein is relevant to
the experimental design, then it will have to be removed.
Eastman
Kodak has the Flag System (now marketed through Sigma Aldrich) that is based on
the Flag marker octapeptide that is fused to a protein by molecular cloning of
its DNA coding sequence adjacent to the protein coding sequence for expression
in an appropriate vector. Detection is by
specifically binding mouse monoclonal antibodies to the octapeptide, while
purification is by affinity chromatography.
An amino-terminal Flag peptide can be removed by the protease,
enterokinase. The Flag fusion proteins
can be expressed in E. coli, yeast, insect, or animal cells.
Invitrogen uses a system based on fusion to
thioredoxin with purification by binding to a phenylarsine oxide resin.
[Stratagene was incorporated into Agilent, and many of its products have
disappeared from the market. The most common 'solubility enhancement tag'
still used is maltose binding protein with an N-terminal His tag.] Stratagene
packs a variety of functions into its VariFlex tag system. The tag for
affinity purification is based on binding to streptavidin. Also incorporated
is an alpha complementing fragment of beta-galactosidase (Q-tag) which can be
used to quantitate the fusion protein by a beta-galactosidase assay. A final
innovation is the inclusion of "solubility enhancing tags", which are highly
negatively charged folding domains. The idea is that these may increase
solubility of the fusion protein by charge-charge repulsion.
In all cases where immunodetection
or immunoaffinity purification is used, one has to use a tag that has no
endogenous counterpart.
In the various combinations above, one can cleave with factor Xa, thrombin, or
enterokinase. Any of these might hit sites within your protein, in which case
you switch to a vector that uses one of the others. A variety of other
proteases have made their way into commercial expression vector systems. GE
Healthcare Life Sciences (was Amersham Pharmacia) has a product they call
PreScission protease that is based on the rhinovirus protease. The protease
itself is marketed as a noncleavable GST fusion. That way, you can get your
GST fusion bound to glutathione-conjugated Sepharose and then just mix the
protease in. Your protein is released, and the GST fusion partner as well as
the protease are retained on the resin.
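Whether a protease "might hit sites within your protein" can be given a first
check by scanning the protein sequence for the recognition motifs. The sketch
below is a rough screen under stated assumptions: the factor Xa motif follows
the Ile-Glu-Gly-Arg site given above (with Asp allowed in place of Glu), while
the thrombin and enterokinase motifs are the commonly quoted ones and are not
taken from these notes. Real proteases are less strict than these patterns, so
a clean scan does not guarantee that no internal cleavage will occur. The
protein sequence is invented for illustration.

```python
import re

# Commonly quoted recognition motifs in one-letter amino acid code.
# These are assumptions for illustration; check the vendor's data for
# the protease you actually use.
SITES = {
    "factor Xa": r"I[ED]GR",
    "thrombin": r"LVPRGS",
    "enterokinase": r"DDDDK",
}


def internal_sites(protein):
    """Map each protease to the 0-based start positions of its motif."""
    protein = protein.upper()
    return {
        name: [m.start() for m in re.finditer(pattern, protein)]
        for name, pattern in SITES.items()
    }


def usable_proteases(protein):
    """List proteases whose motif does not occur inside the protein."""
    return [name for name, hits in internal_sites(protein).items() if not hits]


# Hypothetical protein containing a factor Xa and an enterokinase motif.
protein = "MKTLIEGRAAQDDDDKLLV"
print(internal_sites(protein))
print(usable_proteases(protein))
```

For this made-up sequence only thrombin passes the screen, which is the
situation in which one would switch to a vector with a thrombin site.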
New England Biolabs markets a system named IMPACT where the cleavage is
effected by the activity of a self-splicing protein element called an intein
(Chong et al., 1998. Nucl. Acids Res. 26:5109). Depending on the variety, the
cleavage is induced by pH, temperature, or a thiol reagent. It is possible to
leave a reactive thioester on the C terminus of your protein for use in
subsequent coupling reactions.
Recombinant Phage Antibody System
Pharmacia
Biotech (GE Healthcare) also has the Recombinant Phage Antibody System (RPAS)
designed for the cloning and expression of recombinant antibody fragments in
bacteria. In this system, one makes two
insertions into a fusion protein, one derived from an Ig heavy chain variable
region, and one from an Ig light chain variable region. The fusion protein juxtaposes the chains to
allow formation of an antigen binding site.
The fusion protein is displayed on the surface of an fd (M13) phage,
allowing one to screen a library of plaques with a labeled antigen. Even better, one can purify phage that bind
the antigen by affinity and then reinfect the host.
In essence the system produces single polypeptide versions of antibodies
(scFv) quickly in bacterial cultures. The "cleavage" of the soluble antigen
binding domain from the phage protein domain is done by an interesting genetic
manipulation. There is an amber stop codon after the antigen binding domain
and before the phage protein domain. To get phage-bound antibody, one
expresses from an amber suppressor strain. To get soluble antibody, one
expresses from a non-suppressor strain.
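The amber codon switch can be illustrated with a toy translation. This is a
sketch under assumptions: the suppressor is taken to be a supE-type tRNA that
inserts glutamine at UAG (written TAG in DNA), suppression is treated as
complete although in a real suppressor strain it is only partial, and the
reading frame is a short invented sequence rather than a real scFv or coat
protein gene.

```python
# Standard genetic code, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(orf, amber_suppressor=False):
    """Translate an ORF, optionally reading through amber (TAG) codons."""
    protein = []
    for i in range(0, len(orf) - 2, 3):
        codon = orf[i:i + 3].upper()
        if codon == "TAG" and amber_suppressor:
            protein.append("Q")  # assumed supE-type suppressor inserts Gln
            continue
        residue = CODON_TABLE[codon]
        if residue == "*":
            break  # stop codon ends translation
        protein.append(residue)
    return "".join(protein)


# Invented frame: "antibody" part, amber codon, "phage protein" part, stop.
orf = "ATGGCTGAA" + "TAG" + "GGTTCTAAA" + "TAA"

print(translate(orf, amber_suppressor=False))  # soluble product
print(translate(orf, amber_suppressor=True))   # fusion to the phage protein
```

In the non-suppressor case translation ends at the amber codon and only the
first domain is made. In the suppressor case the amber codon is read as an
amino acid and the product continues into the phage protein domain.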
Phage Display
A popular variation on the above theme is to fuse a library of peptides to the
phage coat protein and to purify the particular sequences that bind to some
ligand. Typically the library is composed of random sequences, and the clones
that bind are sequenced and used to discern amino acid patterns required for
binding. Commercial libraries are available with random 7-mer peptides or
12-mer peptides. One could hope to screen a large enough library to contain
all possible 7-mers, but libraries of longer peptides will necessarily have
only a fraction of all possible sequences present. Screening is generally by
panning, in which the library of phage, each containing the DNA specifying its
displayed sequence, is reacted with a surface coated with ligand. Phage
retained on the surface are eluted, amplified, and panned again several times.
Typically one uses relatively nonstringent binding conditions (high
concentration of the ligand) at first because the concentration of phage that
will bind is so low. Stringency is then increased in later rounds of panning
by decreasing the concentration of ligand.
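The point about 7-mers versus longer peptides comes down to arithmetic on the
size of the sequence space. The sketch below makes two simplifying
assumptions: each clone is an independent, uniform random draw from all
possible peptides (real libraries built from degenerate codons are biased),
and the library holds 1e9 independent clones, which is an assumed figure for
illustration and not a specification of any commercial library.

```python
import math


def sequence_space(length, alphabet=20):
    """Number of distinct peptides of the given length."""
    return alphabet ** length


def expected_coverage(clones, length):
    """Expected fraction of all possible peptides present at least once,
    assuming independent, uniform random clones (Poisson approximation)."""
    return -math.expm1(-clones / sequence_space(length))


LIBRARY_SIZE = 1e9  # assumed number of independent clones

for length in (7, 12):
    space = sequence_space(length)
    coverage = expected_coverage(LIBRARY_SIZE, length)
    print(f"{length}-mer: {space:.3g} possible, coverage {coverage:.3g}")
```

Under these assumptions a 7-mer library of this size would contain a
substantial fraction of all possible sequences, while a 12-mer library of the
same size would contain well under one in a million of them.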
Phage
Display can also be used with protein domains subjected to saturation
mutagenesis. However, non secreted
proteins are often not expressed efficiently in this system because they fail
to successfully pass through the bacterial membrane in the assembly of the
virus.
Variations on this theme are:
1. The coat protein gene
could be on a phagemid. A phagemid is a
plasmid that additionally has an origin of replication from an M13-like
phage. When a helper phage is provided,
a single stranded version of the phagemid ends up packaged as if a viral
genome.
2. Rather than coating the
surface with the ligand, one could coat it with an antibody to the ligand. Binding could then be as a sandwich.
3. New England
Biolabs now sells a phage display kit based on M13.
4. Novagen has a T7 phage
display system.
The
following figures from NEB's
phage display manual (http://www.neb.com/nebecomm/ManualFiles/manualE8101.pdf)
show the method of construction, and a diagram of the panning step for using
their system:
Refs:
Clackson, T., et al. 1991. Making antibody fragments using phage display
libraries. Nature 352:624-628.
Scott, J.K., et al. 1990. Searching for peptide ligands with an epitope
library. Science 249:386-390.
Hogrefe, H.H., et al. 1993. Cloning in a bacteriophage lambda vector for the
display of binding proteins on filamentous phage. Gene 137:85-91.
Problems
1. You express a mutant protein in E. coli and find little in the cell lysate
with a Western blot. Expression of the wild type protein had never been a
problem. What do you suspect first, and how would you test your hypothesis?
2. Do you expect any special problems from expressing a human mitochondrial
gene in E. coli? If so, what would you do about it?
3. You have determined the protein sequence from a trypanosome mitochondrial
gene. What would be the most direct way to express this protein in E. coli?
4. You make an expression construct that makes a high yield in a small pilot
growth, but gives disappointing yields when large scale (100 liter) cultures
are grown up. What do you think is the problem, and how would you solve it?
------------------------------
last
edited 4/5/2011; Steve Hardies