FAMILIES OF BACTERIAL SIGMA FACTORS
Bacterial σ factors belong to two large, and apparently unrelated, protein
families: the σ70 and the σ54 families (Gross et al., 1998; 1992; Helmann,
1994; Helmann and Chamberlin, 1988). Within the σ70 family, there are several
phylogenetic groups that often, but not always, correlate with function.
Lonetto et al. (1992) originally distinguished between the primary (group 1)
σ factors, a group of closely related but non-essential paralogs (group 2),
and the more divergent alternative σ factors (group 3). To this
classification, we can now add the ECF σ factors (group 4) and the newly
emerging TxeR family (group 5).
The nomenclature of σ factors and their genes has generated considerable
confusion. In general, the genes for most σ factors in E. coli and other
Gram-negative bacteria are given the designation used for RNA polymerase
subunits, rpo. Examples include the primary σ, encoded by the rpoD gene, and
the heat shock σ factor, encoded by the rpoH gene. The σ factors themselves
are often identified by a superscript to reflect their molecular mass (in
kDa): RpoD is σ70, RpoH is σ32. Many σ factors that were identified
genetically are still named after the original gene: examples include σFecI
in E. coli and σAlgU in P. aeruginosa (also known now as σE). In B. subtilis,
and most other Gram-positive organisms, an alternative scheme has been adopted
in which each alternative σ factor is given a letter designation and the
corresponding genes are given sig designations. By convention, the primary σ
factor is σA and is encoded by the sigA gene, and the alternative σ factors
are identified by other letters. In some cases, other nomenclature is still in
place: for example, in some species group 2 σ factors (see below) carry hrd
(homolog of rpoD) designations. It remains to be seen how the nomenclature
will evolve now that at least one species (S. coelicolor) has more σ factors
than there are letters in the alphabet.
Group 1. The primary σ factors
The group 1 σ factors include E. coli σ70 and its orthologs (Lonetto et al.,
1992). These σ factors are essential proteins responsible for most
transcription in rapidly growing bacterial cells and are thus often referred
to as the "primary" σ factors. As a group, the primary σ factors are usually
between 40 and 70 kDa in size and have four characteristic conserved sequence
regions: regions 1 through 4 (reviewed in Gross et al., 1998; Helmann and
Chamberlin, 1988). In addition, in most species where promoter selectivity is
well understood, primary σ factors recognize promoters of similar sequence:
TTGaca near -35 and TAtaaT near -10 (where uppercase refers to more highly
conserved bases).
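The consensus notation above can be made concrete with a small scoring sketch. It is purely illustrative: the function names and the weights (2 for an uppercase, highly conserved position; 1 for a lowercase position) are assumptions introduced here, not measured values, and the sketch ignores spacer length and flanking sequence, which also affect promoter recognition in real cells.

```python
# Consensus hexamers as written in the text; uppercase = more highly conserved.
CONSENSUS_35 = "TTGaca"
CONSENSUS_10 = "TAtaaT"

# Illustrative weights only (an assumption, not experimentally derived).
STRONG_WEIGHT = 2  # match at an uppercase (highly conserved) position
WEAK_WEIGHT = 1    # match at a lowercase (less conserved) position


def hexamer_score(hexamer: str, consensus: str) -> int:
    """Sum the weights of the positions where hexamer matches the consensus."""
    if len(hexamer) != len(consensus):
        raise ValueError("hexamer and consensus must have the same length")
    score = 0
    for base, cons in zip(hexamer.upper(), consensus):
        if base == cons.upper():
            score += STRONG_WEIGHT if cons.isupper() else WEAK_WEIGHT
    return score


def max_score(consensus: str) -> int:
    """Highest score any hexamer can reach against this consensus."""
    return sum(STRONG_WEIGHT if c.isupper() else WEAK_WEIGHT for c in consensus)


def promoter_score(minus35: str, minus10: str) -> float:
    """Fraction (0 to 1) of the maximum weighted score for both elements."""
    total = hexamer_score(minus35, CONSENSUS_35) + hexamer_score(minus10, CONSENSUS_10)
    return total / (max_score(CONSENSUS_35) + max_score(CONSENSUS_10))


if __name__ == "__main__":
    # A perfect match to both consensus elements scores 1.0.
    print(promoter_score("TTGACA", "TATAAT"))
```

Weighting the uppercase positions more heavily is one simple way to express that mismatches at highly conserved bases are expected to matter more than mismatches elsewhere.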
Group 2. Non-essential proteins highly similar to primary σ factors
In some species there are σ factors that are closely related to the primary σ
but dispensable for growth. These group 2 σ factors include the E. coli σS
(RpoS) protein and three of the four Hrd (Homolog of RpoD) proteins in
Streptomyces coelicolor: only HrdB is essential and is, by this criterion, a
group 1 σ (Buttner and Lewis, 1992). Like the group 1 σ factors, the group 2
proteins contain all four of the conserved sequence regions characteristic of
primary σ factors (Lonetto et al., 1992). Moreover, the regions of the σ
factor that determine promoter selectivity are often nearly identical between
the group 1 and 2 σ factors. Thus, it is likely that the group 1 and 2
proteins have extensive overlap in promoter recognition.
The most extensively studied group 2 σ is the E. coli RpoS (σS) stationary
phase σ factor (Hengge-Aronis, 1999; 2000). Many promoters transcribed by the
σ70-containing holoenzyme are also recognized by σS and only a few truly
σS-specific promoters have been described. Indeed, it has been quite difficult
to discern those features of the DNA sequence that allow selective recognition
by σS. This was dramatically illustrated when SELEX methods were used to
determine the optimal binding sequence for the σS holoenzyme: the resulting
"consensus" was identical to that already documented for σ70 (T. Gaal and R.
Gourse, personal communication). This has led to a model in which consensus
promoters, which are extremely rare, can be recognized by both σ factors and
the key to selectivity is the differential tolerance of non-consensus bases.
For example, σS transcribes efficiently from promoters lacking a consensus -35
element (Wise et al., 1996) or having a C adjacent to the upstream T of the
-10 element, whereas these changes can greatly reduce recognition by σ70
holoenzyme (Becker and Hengge-Aronis, 2001; Lee and Gralla, 2001).
In the cyanobacteria and in S. coelicolor the situation is made more complex
by the presence of three or more group 2 σ factors. The functions of these σ
factors have remained elusive. They are clearly dispensable, and even multiply
mutant strains do not display obvious phenotypes (Buttner and Lewis, 1992). In
several cases, these group 2 σ factors have been found to be preferentially
expressed during nutrient stress conditions (Caslake et al., 1997; Muro-Pastor
et al., 2001). One interpretation of these data is that activation of one or
more group 2 σ factors can alter, perhaps in subtle ways, the precise set of
genes that are expressed while maintaining expression of most housekeeping
functions normally dependent on the primary σ.
Group 3. Secondary σ factors
In 1992, Lonetto et al. assigned the remaining known alternative σ factors to
group 3. These proteins could all be clearly recognized as σ factors based on
the presence of the conserved amino acid sequences of regions 2 and 4.
However, in many cases conserved region 1, and often region 3, were absent.
These group 3 proteins are significantly smaller in size than their group 1
and 2 paralogs (typically 25 to 35 kDa in molecular mass).
While the majority of the RNA polymerase (RNAP) core enzyme in rapidly growing
cells is associated with the primary σ factor (e.g. E. coli σ70 or B. subtilis
σA), the fraction associated with group 3 σ factors can be greatly increased
under conditions of stress or during developmental processes (Hecker and
Volker, 1998; Price, 2000). By reprogramming RNAP, these σ factors function as
global regulators allowing the coordinate activation of numerous unlinked
operons. As a class, the group 3 σ factors are regulated in diverse ways: some
at the level of synthesis, others by proteolysis, and others by the reversible
interaction with an anti-σ factor (Haldenwang, 1995; Helmann, 1999; Hughes and
Mathee, 1998; Kroos et al., 1999).
The group 3 σ factors can be divided into several clusters of evolutionarily
related proteins, often with conserved or related functions. Thus, there is a
heat shock cluster, a flagellar biosynthesis cluster, and a sporulation
cluster (Lonetto et al., 1992). In some cases, these clusters of σ factors are
associated with conserved promoter sequences. For example, the promoter
selectivity of the flagellar (σ28) sub-family is conserved between diverse
bacteria (Helmann, 1991) and the B. subtilis σ factor can partially complement
the corresponding E. coli mutant (Chen and Helmann, 1992). Within the
sporulation sub-family, different paralogs within B. subtilis display
overlapping promoter selectivity such that some (but not all) target promoters
can be recognized by more than one σ factor, allowing transcription initiation
from coincident start points (Helmann and Moran, 2002).
Group 4: The Extracytoplasmic Function (ECF) sub-family
In 1994, a convergence of several lines of research led to the initial
designation of the extracytoplasmic function (ECF) sub-family of σ factors
(Lonetto et al., 1994). Mark Buttner identified the gene for the alternative σ
factor σE in S. coelicolor and noted that it had only distant similarity to
known σ factors. At about the same time, Mike Lonetto in Carol Gross' lab
noted that S. coelicolor σE, the recently identified E. coli σE, and several
other known regulatory proteins formed a distinct sub-family within the σ70
family of regulators.
Prior to the seminal paper of Lonetto et al. (1994), many of the ECF σ factors
were known to function as positive regulators of gene expression, but were
assumed to act as classical transcription activators functioning in
conjunction with one or more forms of holoenzyme. This assumption was
challenged by Lonetto et al. (1994), who predicted that these diverse
regulators would all function as σ factors. This prediction has been confirmed
for all tested examples. Interestingly, the sequence similarity between one
family member, Pseudomonas aeruginosa AlgU (AlgT), and B. subtilis σH had been
noted prior to the Lonetto study (Martin et al., 1993), but there was no
experimental confirmation of the role of this protein as a σ factor.
In keeping with the classification scheme introduced by Lonetto et al. (1992),
I propose that the ECF σ factors be referred to as "group 4." Note that
previously Wosten (1998) assigned these σ factors as sub-group 3.2 of the
group 3 σ factors. However, ECF σ factors are significantly more divergent in
sequence, and in many organisms they equal or exceed the group 3 σ factors in
number. Therefore, it seems fitting that they define their own group within
the σ70 family. This view is further supported by the assignment of ECF σ
factors as a unique group within the Clusters of Orthologous Groups (COG)
database (Tatusov et al., 2000).
As a class, the ECF σ factors share several common features (Figure 1). First,
they often recognize promoter elements with an "AAC" motif in the -35 region.
Second, in many cases the ECF σ factor is cotranscribed with a transmembrane
anti-σ factor with an extracytoplasmic sensory domain and an intracellular
inhibitory domain. Third, they often control functions associated with some
aspect of the cell surface or transport.
The designation "extracytoplasmic function" (or ECF) evolved from an analysis
of the functions of the known examples of group 4 σ factors (Lonetto et al.,
1994). This phylogenetic cluster included regulators of a periplasmic stress
and heat shock response (E. coli σE), iron transport (FecI in E. coli), a
metal ion efflux system (CnrH in Alcaligenes), alginate secretion (AlgU/T in
P. aeruginosa), and synthesis of membrane-localized carotenoids in Myxococcus
xanthus (CarQ). The only unifying feature of these diverse physiological
processes is that they all involve cell envelope functions (transport,
secretion, extracytoplasmic stress). Hence, the name extracytoplasmic function
(or ECF) was suggested for this family of σ factors. Even this broad
generalization may be an oversimplification for this complex and rapidly
growing family of regulators: at least one of the recently characterized ECF σ
factors (S. coelicolor σR) controls a cytoplasmic stress response (see below).
In the last several years the complete genome sequences of dozens of bacteria
have been determined. A survey of currently available genome sequences reveals
a wide range in the numbers of ECF σ factors (Table 1): 2 in E. coli, 7 in B.
subtilis, 10 in Mycobacterium tuberculosis, and ~50 in Streptomyces
coelicolor!
Group 5: The TxeR sub-family
The discovery of the ECF sub-family of σ factors taught us that the
biochemical identification of one or two regulators as σ factors can provide
insight into the mechanism of action of a large family of related proteins. A
similar story appears to be unfolding with the recent description of TxeR as a
σ factor controlling toxin gene expression in Clostridium difficile. This
regulatory protein functions biochemically as a σ factor despite the fact that
the sequence of the protein bears little discernible resemblance to other
members of the σ70 family. Addition of purified TxeR protein is sufficient for
recognition of the tox promoter by either E. coli or B. subtilis core RNA
polymerase (Mani and Dupuy, 2001). Since several other positive regulators of
toxin genes, including C. tetani TetR, C. botulinum BotR, and C. perfringens
UviA, are related to TxeR, it seems reasonable to suggest that these proteins
are yet another distantly related group (herein designated group 5) of the σ70
family. Promoter mapping studies confirm that the UV-inducible promoters of
the bcn and uviAB genes are similar in sequence to the toxA and toxB
promoters. Moreover, the UviA protein can activate transcription of a PtoxA
fusion and, conversely, the TxeR protein can activate transcription of the
uviA and bcn promoters when they are both present in the heterologous host, B.
subtilis (N. Mani, personal communication). Thus, the UviA and TxeR σ factors
appear to have similar promoter recognition properties. Recent biochemical
studies have confirmed that UviA has the predicted σ factor activity (N. Mani,
personal communication).