Transcription is the principal control point for bacterial gene expression, and it enables a global cellular response to an intracellular or environmental trigger. Transcriptional regulation is orchestrated by transcription factors, which activate or repress transcription of target genes by modulating the activity of RNA polymerase. Dissecting the nature and precise choreography of these interactions is essential for developing a molecular understanding of transcriptional regulation. While the contribution of X-ray crystallography has been invaluable, the ‘resolution revolution’ of cryo-electron microscopy has transformed structural investigations, enabling the resolution of large, dynamic and often transient transcription complexes that in many cases had resisted crystallisation. In this review, we highlight the impact that cryo-electron microscopy has had on our understanding of transcriptional regulation in bacteria. We also provide readers working within the field with an overview of the recent innovations available for cryo-electron microscopy sample preparation and image reconstruction of transcription complexes.
Introduction
The expression of a gene to produce a protein, the information flow described by the Central Dogma, is a fundamental process of life. Transcription is the first step in gene expression and is coordinated by a complex of proteins that cooperate at the promoter region to transcribe the DNA gene sequence into mRNA. The lead actor of transcription, RNA polymerase (RNAP), comprises a core enzyme of subunits labelled α2ββ′ω (Figure 1A) [1]. Upon interaction with a sigma (σ)-factor, RNAP forms the active holoenzyme [2,3]. Sigma factors are large, multi-domain proteins that bind various sites across the core RNAP and promoter DNA [4], and are responsible for guiding RNAP to transcription start sites by locating the −35 element (consensus sequence TTGACA) and the −10 element (consensus sequence TATAAT) within the promoter [5] (Figure 1A). These two DNA elements constitute the major features of a bacterial promoter and are key determinants of transcriptional activity. Bacterial promoters are also decorated with other regulatory elements, such as the −10 extension (EXT) [6], the discriminator (DISC) [7,8] and the upstream element (UP element) [9]. Once the RNAP holoenzyme binds DNA, a series of conformational changes manipulates the DNA and unwinds a 13 base pair region (promoter melting) to initiate transcription.
Figure 1. Schematic model of bacterial transcription initiation and regulation by transcription factors.
In bacteria, the regulation of gene expression occurs primarily at transcription initiation, allowing bacteria to maintain homeostasis and adapt to changing environmental conditions, such as nutrient availability, by changing which genes are expressed and which are not (for excellent reviews on gene regulation focused on initiation, see [10,11]). Transcriptional regulation in bacteria is predominantly modulated by transcription factors, an important class of trans-acting factor that binds DNA and functions as either an activator or a repressor of gene expression (Figure 1B). Whereas transcriptional activators generally bind upstream of the RNAP-binding site to co-opt RNAP and enhance its activity, transcriptional repressors bind to the operator region of target genes to directly obstruct the binding and activity of RNAP [10,12].
Transcription factors largely share a common domain architecture, comprising an N-terminal DNA-binding domain and a C-terminal effector-binding domain, typically connected by a flexible linker (Figure 1B). The DNA-binding domain recognises a specific DNA sequence and most often contains the highly conserved helix-turn-helix motif, while the effector-binding domain functions as a signal sensor [13]. In general, the effector is a small molecule or pathway metabolite that binds the protein allosterically to trigger a conformational change that alters the affinity of the DNA-binding domain for the target sequence [12,13]. This combined function allows transcription factors to act as molecular switches, enabling bacteria to respond rapidly to sudden environmental challenges [14].
In addition to transcription factors that directly modulate RNAP activity, σ-factors also serve as an important class of trans-acting factor. By acting in complex with RNAP, σ-factors facilitate broader changes, such as the expression of genes required for bacterial cell viability [10,12] (reviewed in [10,15,16]). Overall, RNAP serves as the core element of gene regulation, combining information from an assortment of sensory systems to appropriately modulate gene expression. The net outcome is to determine which genes are transcribed, and to what extent, under any specific growth condition [17].
Over the past few decades, structural biology has been instrumental in defining these regulatory mechanisms and the biological function of RNAP during transcription [18]. Yet, despite a focused effort to delineate these processes, principally through X-ray crystallography, technical limitations have restricted our molecular understanding of transcriptional regulation. Notably, these include the intrinsic flexibility, conformational and compositional heterogeneity, and transient nature of the transcription complexes [19,20]. Crystal growth is usually easiest when the macromolecules are stably folded, can be concentrated, are homogeneous (i.e. purified away from all other contaminants and in a stable oligomeric state), and have limited flexibility. Fuelled by technological breakthroughs in data collection and image processing (reviewed in [21]), cryo-electron microscopy (cryo-EM) now offers a powerful alternative to overcome these challenges. This capacity has granted researchers the power to ‘see’ transient or heterogeneous complexes that are not amenable to crystallisation and to determine their structure at or near atomic-level resolution.
Herein, we review the contribution of cryo-EM to further our understanding of transcriptional regulation in bacteria, with a focus on studies that have provided key mechanistic insights into transcription initiation. We also highlight state-of-the-art sample preparation and 3D reconstruction strategies for structure determination with a particular focus on ‘tricks’ for protein–nucleic acid complexes.
Recent cryo-EM structures advance our understanding of bacterial transcriptional regulation
Visualising transcription complexes at the atomic level is essential for unravelling their mechanism of function. Over the past few years, cryo-EM has been indispensable for resolving large and heterogeneous complexes where previous crystallographic studies fell short, providing over 65 structures to date (summarised in Table 1). Here, we highlight pivotal transcription complexes active during transcription initiation, which contain transcriptional activators and repressors that were not fully understood until the advent of cryo-EM.
Transcription complex | Family | Organism | EMDB ID | PDB ID | Reference |
---|---|---|---|---|---|
EcmrR-RPo2 | MerR | Escherichia coli | EMD-22234 | 6XL5 | [22] |
EcmrR-RPo2 (EcmrR-spacer DNA complex) | | | EMD-22235 | 6XL6 | |
EcmrR-RPint3 with 3 nt RNA transcript | | | EMD-22236 | 6XL9 | |
EcmrR-RPint3 with 3 nt RNA transcript (EcmrR-spacer DNA complex) | | | EMD-22237 | 6XLA | |
EcmrR-RPint3 with 4 nt RNA transcript | | | EMD-22245 | 6XLJ | |
EcmrR-RPint3 with 4 nt RNA transcript (EcmrR-spacer DNA complex) | | | EMD-22246 | 6XLK | |
EcmrR-RPo2 (clearer σ70 density) | | | EMD-23291 | - | |
BmrR-RNA polymerase complex | MerR | Bacillus subtilis | EMD-30390 | 7CKQ | [23] |
CueR-RNA polymerase complex (without RNA transcript) | MerR | Escherichia coli | EMD-22184 | 6XH7 6XH8 | [24] |
CueR-RNA polymerase complex (with RNA transcript) | | | EMD-22185 | | |
CueR-RNA polymerase complex (clearer σ70 density) | | | EMD-22289 | | |
CueR-RNA polymerase complex | MerR | Escherichia coli | EMD-30268 | 6LDI | [25] |
CueR-RNA polymerase complex (with fully duplex DNA) | | | EMD-0874 | 7C17 | |
NanR-dimer1/DNA complex | GntR | Escherichia coli | EMD-21652 | 6WFQ | [26] |
NanR-dimer3/DNA complex | | | EMD-21661 | 6WG7 | |
BusR-tetramer1/pAB DNA complex | GntR | Streptococcus agalactiae | EMD-13119 | 7OZ3 | [27] |
BusR-tetramer1/pAB1 DNA complex | | | EMD-12051 | 7B5Y | |
TraR-Eσ70 (state I) | LuxR | Escherichia coli | EMD-0348 | 6N57 | [28] |
TraR-Eσ70 (state II) | | | EMD-0349 | 6N58 | |
TraR-Eσ70 (state III) | | | EMD-20231 | N/A | |
MmfR-dimer2/DNA complex | TetR | Streptomyces coelicolor | EMD-20781 | N/A | [29] |
Class-II CAP-TAC1 without RNA transcript (state I) | CAP | Escherichia coli | EMD-20287 | 6PB5 | [30] |
Class-II CAP-TAC1 without RNA transcript (state II) | | | EMD-20288 | 6PB6 | |
Class-II CAP-TAC1 with RNA transcript (state II) | | | EMD-20286 | 6PB4 | |
Class-I CAP-TAC1 | CAP | Escherichia coli | EMD-7059 | 6B6F | [31] |
Class-I CAP-TAC1 (focused map on αCTD-CAP region) | | | EMD-7060 | | |
Crl-EσS-RNA polymerase complex | Crl | Escherichia coli | EMD-200090 | 6OMF | [32] |
Spx-RNA polymerase complex | Spx | Bacillus subtilis | EMD-31485 | 7F75 | [33] |
WhiB7-RPo2 | WhiB | Mycobacterium tuberculosis | EMD-22886 | 7KIF | [34] |
WhiB7-RPc4 | | | EMD-22887 | 7KIM | |
Rgg2-short hydrophobic peptide complex | Rgg | Streptococcus thermophilus | EMD-22341 | 7JI0 | [35] |
GreB-RNA polymerase elongation complex (pre-RNA cleavage) | Gre | Escherichia coli | EMD-4892 | 6RIN | [36] |
GreB-RNA polymerase elongation complex (post-RNA cleavage) | | | EMD-4885 | 6RI7 | |
GreB-RNA polymerase reactivated complex (before RNA extension) | | | EMD-4882 | 6RH3 | |
CarD-RPo2 | CarD | Mycobacterium tuberculosis | EMD-9037 | 6EDT | [37] |
CarD-RNA polymerase intermediate (with 8-nt RNA transcript) | | | EMD-9039 | 6EE8 | |
CarD-RPo2 (with corallopyronin A) | | | EMD-9041 | 6EEC | |
CarD-RNA polymerase holoenzyme (with corallopyronin A) | | | EMD-9047 | 6M7J | |
CarD-RPo2 (with Sorangicin A) | CarD | Mycobacterium tuberculosis | EMD-21407 | 6VVY | [38] |
CarD-S456LRPo2 (with Sorangicin A) | | | EMD-21408 | 6VW0 | |
CarD-RPo2 (with Sorangicin A) (with 8-nt RNA transcript) | | | EMD-21406 | 6VVX | |
CarD-S456LRPo2 (with Sorangicin A) (with 8-nt RNA transcript) | | | EMD-21409 | 6VVZ | |
SspA-σ70-RPo2 | GST | Escherichia coli | EMD-30307 | 7C97 | [39] |
DksA-RPo2 (State I) with guanosine tetraphosphate (ppGpp) | DksA | Escherichia coli | EMD-21881 | 7KHI | [40] |
DksA-RPo2 (State II) with guanosine tetraphosphate (ppGpp) | | | EMD-21883 | 7KHE | |
NusG-opsEC | NusG | Escherichia coli | EMD-7351 | 6C6U | [41] |
RfaH-NusG-N-Term-opsEC | RfaH | | EMD-7350 | 6C6T | |
RfaH-full-length-opsEC | RfaH | | EMD-7349 | 6C6S | |
RNAP-HelD | HelD | Bacillus subtilis | EMD-21921 | 6WVK | [42] |
Msm HelD–RNAP complex State I | HelD | Mycobacterium smegmatis | EMD-10996 | 6YXU | [43] |
Msm HelD–RNAP complex State II | | | EMD-11004 | 6YYS | |
Msm HelD–RNAP complex State III | | | EMD-11026 | 6Z11 | |
Spt4/5-RNAP complex (with antibodies) | Spt4/5 | Pyrococcus furiosus | EMD-1840 | N/A | [44] |
Mfd-dependent transcription termination complex | Mfd | Thermus thermophilus | EMD-30117 | 6M6A | [45] |
Mfd-dependent transcription termination complex with ATPγS | Mfd | | EMD-30118 | 6M6B | |
Mfd-bound RNA polymerase elongation complex, L1 state (with ATP) | Mfd | Escherichia coli | EMD-21996 | 6X26 | [46] |
Mfd-bound RNA polymerase elongation complex, L2 state (with ADP) | | | EMD-22006 | 6X2F | |
Mfd-bound RNA polymerase elongation complex, I state | | | EMD-22012 | 6X2N | |
Mfd-bound RNA polymerase elongation complex, II state | | | EMD-22039 | 6X43 | |
Mfd-bound RNA polymerase elongation complex, III state | | | EMD-22043 | 6X4W | |
Mfd-bound RNA polymerase elongation complex, IV state | | | EMD-22044 | 6X4Y | |
Mfd-bound RNA polymerase elongation complex, V state | | | EMD-22045 | 6X50 | |
σ70-RPo2 | σ-factor | Klebsiella pneumoniae | EMD-0001 | 6GH5 | [47] |
σ70-RNA polymerase (intermediate partially loaded) complex | | | EMD-0002 | 6GH6 | |
σ70-RNA polymerase (initially transcribing) complex | | | EMD-4397 | 6GFW | |
1CAP-TAC, cAMP receptor protein-dependent transcription activation complex.
2RPo, RNA polymerase-promoter open complex.
3RPint, RNA polymerase-promoter initial transcribing complex.
4RPc, RNA polymerase-promoter closed complex.
N/A, Not available.
Transcriptional activators
Activators serve to increase transcription by binding at, or upstream of, a promoter region, where they can positively interact with and recruit RNAP to initiate transcription of target genes (Figure 1B, left panel). This process can be achieved by the activator distorting promoter DNA to facilitate RNAP binding, or by directly tethering RNAP to the promoter region. To illustrate how each regulatory mechanism enhances RNAP binding, we outline recent cryo-EM structures for each below.
The cryo-EM structure of the MerR family regulator EcmrR provides our first example of DNA distortion (Figure 2A). Promoters bound by this family contain a non-canonical spacer, with an additional 2 to 3 base pairs between the −35 and −10 elements, which prevents optimal promoter recognition by RNAP and transcription initiation [48,49]. In contrast, a canonical promoter contains a 17 base pair spacer between the −35 and −10 elements. MerR regulators bind and twist the non-optimal DNA spacer, such that the promoter elements become readily recognisable by the RNAP holoenzyme (Figure 2A, inset). While previous crystal structures of MerR regulators in the absence of RNAP reported this DNA distortion [50,51], multiple cryo-EM structures of EcmrR in complex with RNAP provided a deeper molecular understanding of this promoter remodelling [22]. Similarly, cryo-EM was used to dissect the DNA distortion mechanism of another MerR family regulator, BmrR [23].
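The spacer-length argument above can be made concrete with a short sketch. The Python below scans a sequence for near-consensus −35 (TTGACA) and −10 (TATAAT) hexamers and reports the spacer between them, labelling a 17 bp spacer as canonical and a 19 to 20 bp spacer as extended by 2 to 3 bp. The function names, the score threshold and the 15 to 20 bp search window are illustrative assumptions rather than anything taken from the cited studies, and real promoter prediction would normally use position weight matrices instead of simple mismatch counting.

```python
from typing import List, Tuple

MINUS35 = "TTGACA"  # consensus -35 element quoted in the text
MINUS10 = "TATAAT"  # consensus -10 element quoted in the text


def matches(site: str, consensus: str) -> int:
    """Count positions at which a candidate hexamer agrees with the consensus."""
    return sum(a == b for a, b in zip(site, consensus))


def find_promoters(seq: str, min_score: int = 5,
                   spacers: range = range(15, 21)) -> List[Tuple[int, int, int, int]]:
    """Return (start of -35 hexamer, spacer length, -35 score, -10 score) per hit.

    min_score and the spacer window are hypothetical defaults for illustration.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        score35 = matches(seq[i:i + 6], MINUS35)
        if score35 < min_score:
            continue
        for spacer in spacers:
            j = i + 6 + spacer  # first base of the candidate -10 hexamer
            if j + 6 > len(seq):
                break
            score10 = matches(seq[j:j + 6], MINUS10)
            if score10 >= min_score:
                hits.append((i, spacer, score35, score10))
    return hits


def classify_spacer(spacer: int) -> str:
    """Label a spacer relative to the canonical 17 bp spacing."""
    if spacer == 17:
        return "canonical"
    if spacer in (19, 20):
        return "extended by 2-3 bp"
    return "non-canonical"
```

For example, a synthetic sequence built as `"TTGACA" + "G" * 19 + "TATAAT"` yields a single hit with a 19 bp spacer, the kind of over-long spacing that MerR family regulators are described as compensating for by twisting the DNA.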
Figure 2. Recent bacterial transcription complexes solved by cryo-EM.
Our second example of DNA distortion is provided by the global transcription factor cyclic AMP (cAMP) receptor protein (CAP). Two major classes of CAP exist, each of which can activate and initiate transcription by bending DNA to optimise RNAP recruitment [52]. These classes are differentiated by their promoter site and mode of interaction with RNAP: class-I CAPs bind at a −61 site and interact predominantly with the α-subunit C-terminal domain (αCTD) of RNAP (Figure 1A), while class-II CAPs bind at a −41 site and interact with multiple RNAP subunits. As the CAP–RNAP interfaces are small and the full CAP–RNAP–DNA complex is dynamic, any CAP-induced conformational changes in the presence of RNAP are difficult to capture in the static snapshots provided by crystallography [52,53]. Recently, an intact transcription activation complex containing a class-I CAP along with RNAP was resolved by cryo-EM (Figure 2B). This structure revealed extensive remodelling of the promoter DNA (a ∼90° kink) induced by CAP binding (Figure 2B, inset), which wraps upstream DNA and co-opts RNAP via αCTD binding [31]. An analogous study reported the cryo-EM structure of the class-II CAP–RNAP complex [30]. Taken together, these structures shed light on how class-I and class-II CAP activation complexes assemble to activate transcription.
Alternatively, an unconventional mode of activation involving protein tethering can stabilise and activate RNAP via a DNA-independent or DNA-dependent process. DNA-independent modes involve activators that stabilise the association of the σ-factor with RNAP solely through protein–protein interactions. This unusual mode of activation was first hypothesised using crystallography, but many interactions were absent due to crystal packing [54,55]. However, numerous cryo-EM structures of the transcriptional activators Crl [32], RbpA/CarD [37,38], Spx [33] and SspA [39] now detail the precise interactions and conformational changes between the σ-factor and core RNAP that facilitate the formation of the RNAP holoenzyme. To highlight this protein–protein tethering mode, we present the structure of Crl bound to σS and a small domain of the β′ subunit to stabilise the holoenzyme (Figure 2C). Conversely, structural insight into DNA-dependent tethering was revealed through cryo-EM structures of WhiB7, which plays a role in antibiotic resistance in mycobacteria. In addition to forming protein–protein contacts with the σ-factor, WhiB7 was observed to interact with promoter DNA via an AT-hook motif [34] (Figure 2D). This was unexpected, as AT-hooks are rare in bacteria yet common in eukaryotes [56]. Thus, these structures not only expand our understanding of how WhiB7 regulates antibiotic resistance in mycobacteria, but also unearth a novel mode of transcriptional regulation in bacteria.
Transcriptional repressors
Repressors function to sterically occlude RNAP binding to DNA by occupying a site that overlaps the −35 and −10 promoter elements, which prevents σ-factor recognition and switches gene transcription off [11] (Figure 1B, right panel). We illustrate this mode of regulation below.
Recently, it was shown that Escherichia coli NanR, which regulates bacterial sialic acid metabolism [57–60], cooperatively binds a three-repeat sequence overlapping the −10 element [26]. Through cryo-EM, three NanR dimers were observed to assemble in close proximity across the promoter, where intramolecular protein–protein interactions stabilise the repressor complex (Figure 2E). This multimeric assembly is unique among reported GntR-type regulators [26]. The lower-order NanR-dimer1/DNA complex (70.5 kDa) was also resolved within the study at near-atomic resolution, demonstrating the power of cryo-EM in this so-called ‘resolution revolution’ (Figure 2E, inset). Cooperative binding was similarly observed for the TetR-type regulator MmfR, where two dimers bound DNA at an obtuse angle of 140° in the cryo-EM structure [29].
BusR is a transcriptional repressor that binds c-di-AMP, a molecule that is vital under normal cellular growth conditions and a target for antibiotic development [61]. A recent cryo-EM structure of BusR revealed how it binds bipartite DNA motifs as a tetramer (Figure 2F), and led the authors to propose a new regulator family, as the protein architecture is unlike that of any other transcriptional regulator described [27].
Finally, we would be remiss not to mention the marked impact that cryo-EM has had on our functional understanding of the transcription factor TraR, which acts as both an activator and a repressor. Using cryo-EM, Chen et al. [28] resolved a series of structures that, alongside in vitro experiments, helped elucidate the transition of active RNAP from RPc (closed complex) to RPo (open complex) in the presence of TraR. The deconvolution of heterogeneous intermediate conformations allowed researchers to propose a mechanism and to validate it using biochemical techniques [62]. The mapping of this conformational landscape by cryo-EM is a pivotal discovery within the field, as it has structurally illuminated the complicated and multifaceted mechanism of transcription initiation.
Contemporary cryo-EM strategies for resolving protein–DNA complexes
Having highlighted how cryo-EM has transformed our understanding of transcriptional regulation in bacteria, we now outline contemporary cryo-EM strategies for determining the structure of these dynamic macromolecular assemblies, particularly protein–DNA complexes. This overview includes recent and ongoing developments in sample preparation and structure determination (summarised in Figure 3). Further discussion of data acquisition, image processing and refinement is beyond the scope of this review (but see [63,64]).
Figure 3. Contemporary cryo-EM strategies for solving protein–DNA complexes.
Sample preparation
Sample preparation for cryo-EM is paramount. In an ideal case, upon vitrification, the sample would be free of contamination, randomly oriented and evenly distributed in a monolayer of thin ice [74,75]. In reality, two main elements must be optimised to achieve this outcome: sample preparation and grid preparation. Sample preparation involves the purification of the protein or macromolecular complex in an intact and stable manner [74,76]. Given that this process isolates the sample from its cellular environment, the buffer conditions (e.g. salt, pH) must be optimised to emulate native conditions. This can be achieved systematically and in high throughput using thermal stability assays [77], such as ProteoPlex [78], which is based on differential scanning fluorimetry. Common additives such as glycerol, which mimic a crowded environment to promote stability, are largely avoided within the cryo-EM community because they significantly decrease contrast during data collection [76,79,80]. However, recent evidence suggests that ≤20% glycerol can improve the stability of large complexes that are prone to disassembly without compromising data quality, and it should therefore not be fully discounted as an additive in cryo-EM [81]. Enhanced stability can also be afforded by the addition of small effector molecules (particularly for transcriptional activators), co-factors or inhibitors that may stabilise or biochemically arrest the macromolecule in a unique functional state [22,82].
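As an illustration of how a thermal stability screen can rank buffer conditions, the sketch below estimates an apparent melting temperature from a differential scanning fluorimetry curve as the temperature of steepest fluorescence increase, then sorts conditions by that value. This is a generic first-derivative estimate under a simple two-state assumption and is not the analysis implemented in ProteoPlex; the function names, buffer labels and synthetic curves are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# (temperatures in degrees C, fluorescence in arbitrary units)
Curve = Tuple[Sequence[float], Sequence[float]]


def apparent_tm(temps: Sequence[float], fluorescence: Sequence[float]) -> float:
    """Estimate the melting midpoint as the temperature of steepest signal rise."""
    if len(temps) != len(fluorescence) or len(temps) < 2:
        raise ValueError("need two equally long series with at least two points")
    best_slope = -math.inf
    best_temp = temps[0]
    for k in range(len(temps) - 1):
        slope = (fluorescence[k + 1] - fluorescence[k]) / (temps[k + 1] - temps[k])
        if slope > best_slope:
            best_slope = slope
            best_temp = 0.5 * (temps[k] + temps[k + 1])  # midpoint of the interval
    return best_temp


def rank_buffers(curves: Dict[str, Curve]) -> List[Tuple[str, float]]:
    """Sort conditions from most to least stabilising by apparent Tm."""
    scored = [(name, apparent_tm(t, f)) for name, (t, f) in curves.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def two_state_curve(tm: float, width: float = 2.0) -> Curve:
    """Synthetic sigmoidal melt curve from 25 to 95 degrees C, for demonstration only."""
    temps = [25.0 + float(k) for k in range(71)]
    signal = [1.0 / (1.0 + math.exp((tm - t) / width)) for t in temps]
    return temps, signal
```

Real melt curves are noisier and, for multi-subunit complexes, often show more than one transition, so smoothing or model fitting would usually precede any ranking of this kind.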
Knowledge of the specific DNA sequence that your protein–DNA system binds is critical, as are the binding affinity and kinetics of the system (KD, kon, koff) and the stoichiometry of the protein–DNA interaction, which can vary with the length or number of binding sites within the DNA scaffold and with protein concentration. Thus, these elements must be carefully evaluated both biochemically and biophysically to inform the optimal DNA scaffold and the concentrations you should use for grid preparation.
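To show how these parameters can guide the choice of concentrations, the following sketch assumes the simplest possible case: a single protein binding a single DNA site with 1:1 stoichiometry and no cooperativity. Under that assumption KD = koff/kon, the concentration of complex follows from the quadratic mass-balance solution, and the complex half-life is ln 2/koff. The function names and any numerical values used with them are illustrative; cooperative or multi-site systems would need a different model.

```python
import math


def kd_from_rates(k_on: float, k_off: float) -> float:
    """Equilibrium dissociation constant (M) from k_on (M^-1 s^-1) and k_off (s^-1)."""
    return k_off / k_on


def complex_concentration(p_total: float, d_total: float, kd: float) -> float:
    """Concentration of the 1:1 protein-DNA complex at equilibrium.

    Solves [PD]^2 - (Pt + Dt + KD)[PD] + Pt*Dt = 0 for the physical root.
    All three inputs must share the same concentration unit.
    """
    b = p_total + d_total + kd
    return (b - math.sqrt(b * b - 4.0 * p_total * d_total)) / 2.0


def fraction_dna_bound(p_total: float, d_total: float, kd: float) -> float:
    """Fraction of the DNA scaffold that carries protein at equilibrium."""
    return complex_concentration(p_total, d_total, kd) / d_total


def complex_half_life(k_off: float) -> float:
    """Half-life (s) of the complex once formed, assuming first-order dissociation."""
    return math.log(2.0) / k_off
```

For instance, `fraction_dna_bound(p, d, kd)` can be evaluated over a range of protein concentrations to find where the scaffold is close to saturated without leaving a large excess of free protein on the grid, and `complex_half_life` gives a rough sense of whether the complex is likely to survive dilution and blotting.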
To perform their function, protein–DNA complexes undergo dynamic conformational rearrangements across one or more subunits [18,83]. While this conformational heterogeneity can typically be tackled during image processing through independent 3D classification and masked refinement (discussed below), sources of compositional heterogeneity must be mitigated. These sources include variations in the stoichiometry of the interaction partners, partially assembled complexes or the presence of assembly intermediates [75,76,80]. At the biochemical level, these phenomena can be addressed through sample preparation procedures, most commonly size exclusion chromatography to remove aggregates or unbound components and isolate the target complex [22,23,26,29]. If complexes are inherently fragile, researchers have successfully employed a fractionation technique named GraFix to prepare homogeneous cryo-EM samples [39,84–86]. Here, a density gradient (e.g. glycerol) is combined with weak chemical fixation (e.g. glutaraldehyde) during centrifugation, which leads to the formation of monodisperse and chemically stabilised complexes [85,87]. The use of a weak fixation reagent is advantageous as it largely favours the formation of intramolecular crosslinks, which can prevent complex dissociation. To avoid reduced contrast, the density solution is removed using buffer-exchange spin columns (e.g. Zeba) [85]. If the sample is scarce or unstable following removal of the density solution, agarose fixation offers an alternative strategy [88].
Following grid preparation (reviewed in [74,89]), which is largely dependent on user expertise and experience, sample stability, heterogeneity and particle distribution can be assessed by iterative negative stain experiments or under cryogenic conditions [74,75,79]. When vitrified, protein–DNA complexes often preferentially adhere to the air–water interface or grid support [90]. This preferred particle orientation can lead to undersampling of some structural features, sample denaturation and anisotropic resolution in the density map [91,92]. Experimentally, this can be addressed by reducing the time interval between sample application to the grid and vitrification (spot-to-plunge time) [90,93,94], by using a support layer (e.g. graphene) to sequester the complex from the air–water interface [92,95,96] or an affinity support, such as streptavidin to immobilise biotinylated molecules [97], and by collecting data at a fixed tilt angle [98]. When acquiring data at a single tilt, gold foil grids can be used to minimise beam-induced movement during imaging [99,100]. Notably, bacterial RNAP transcription complexes suffer from severe preferential orientation [101]; however, this is routinely combated by the addition of the zwitterionic detergent CHAPSO during sample preparation [22,23,39,40]. Other detergents, such as β-octyl glucoside, have also been used to promote random particle distribution and discourage complex dissociation [27,102,103]. Recently, molecular goniometers have been constructed using DNA origami, which enable a DNA-binding protein to bind and be precisely oriented via a sequence-specific DNA stage [104]. As proof of concept, this nanoscale technology was utilised to resolve the 82 kDa DNA-binding protein BurrH [104]. Moving forward, this concept could be adapted to other small (<100 kDa) or asymmetric protein–DNA complexes.
Structure determination
3D reconstruction and model building represent the final hurdle towards determining a structure within the cryo-EM workflow. As previously discussed, protein–DNA complexes are driven by functionally relevant conformational and compositional changes as part of their dynamic modes of action [18,83]. While this intrinsic feature poses challenges for structure determination by cryo-EM, recent innovations make it possible to study these dynamic assemblies and gain unique insights into their molecular mechanism. In this section, we highlight these key innovations, which include algorithms to visualise molecular motions and identify interacting components, along with tools for flexible fitting in cryo-EM maps by molecular dynamics.
During data acquisition, millions of snapshots across a conformational landscape are captured for the molecule of interest. This structural heterogeneity has typically been approached using various ‘3D classification’ or ‘heterogeneous refinement’ tools, implemented in cryo-EM software packages such as RELION [105], cryoSPARC [106] or cisTEM [107]. These tools effectively divide the data into a small number of independent and discrete states, each of which is assumed to be structurally homogeneous. However, in scenarios where macromolecular complexes exhibit continuous conformational transitions of single domains or motions across multiple domains, these discrete classification algorithms are ineffective for heterogeneous reconstruction, as they often omit functionally relevant or transient states [83]. As a result, more focused approaches have evolved to deal with continuous flexibility in cryo-EM data. These include multi-body refinement, which uses a discrete number of independently moving rigid bodies to model the dynamics within a protein complex and improve density maps of flexible regions [65,108]. Implemented in RELION, this multi-body approach has been utilised to characterise the TraR-induced structural changes in E. coli RNAP that regulate transcription. An analogous strategy, masked 3D refinement, can also be employed to combat structural heterogeneity: by applying a mask that excludes contents outside a region of interest, local resolution can be improved [26]. To avoid introducing artefacts or overfitting, the generated masks are often low-pass filtered and given soft edges [83]. 3D Variability Analysis (3DVA), available in cryoSPARC, is an algorithm that fits a linear subspace model to visualise molecular motions of macromolecules at high resolution [66]. 3DVA has since been utilised to resolve the dynamic interaction between RNAP and the transcription factor HelD from Bacillus subtilis [42].
A similar tool, cryoDRGN (http://cryodrgn.csail.mit.edu), utilises deep neural networks to reconstruct density maps that model both discrete compositional heterogeneity and continuous conformational changes [67]. While these methods primarily combat motions across multiple domains, they can provide a potential trajectory to additionally refine single domain motions.
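The idea behind a linear subspace model can be illustrated with a toy example. The sketch below is not the 3DVA implementation in cryoSPARC and makes strong simplifying assumptions: each particle is treated as an already-reconstructed, noisy 1D ‘density’ vector (real cryo-EM data are 2D projection images, which this sketch ignores), and the subspace is fitted by ordinary principal component analysis with NumPy. The function names (simulate, linear_subspace) and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate(n_particles=500, n_voxels=64, amplitude=2.0, noise=0.2, seed=0):
    """Toy data: a fixed density plus one continuous motion and Gaussian noise."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-1.0, 1.0, n_voxels)
    static_domain = np.exp(-((x + 0.4) ** 2) / 0.02)
    mobile_domain = np.exp(-((x - 0.3) ** 2) / 0.02)
    base = static_domain + mobile_domain
    # Direction of density change, shaped like the first-order effect of
    # shifting the mobile domain; normalised to unit length.
    motion = (x - 0.3) * mobile_domain
    motion /= np.linalg.norm(motion)
    # One continuous latent coordinate per particle (no discrete classes).
    z = rng.uniform(-1.0, 1.0, n_particles)
    volumes = base + amplitude * z[:, None] * motion
    volumes = volumes + noise * rng.normal(size=volumes.shape)
    return volumes, z, motion


def linear_subspace(volumes, n_components):
    """Fit mean + linear subspace by PCA (SVD of the mean-centred data)."""
    mean = volumes.mean(axis=0)
    centred = volumes - mean
    # Rows of vt are orthonormal directions ordered by explained variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    components = vt[:n_components]
    # Per-particle coordinates along each component.
    coords = centred @ components.T
    return mean, components, coords


if __name__ == "__main__":
    volumes, z, motion = simulate()
    mean, components, coords = linear_subspace(volumes, n_components=2)
    overlap = abs(float(components[0] @ motion))
    print(f"overlap of first component with simulated motion: {overlap:.2f}")
```

In this toy setting, adding multiples of the first component to the mean yields a series of vectors that trace the simulated motion, which is analogous to the ‘movies’ of molecular motion that such analyses are used to produce. A discrete classification of the same data would instead split the continuum into a few arbitrary bins.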
Following 3D reconstruction, atomic models provide the basis to structurally and functionally interpret the cryo-EM density maps. Most commonly, this process involves the input of existing X-ray or NMR structures from the PDB [22,26–29]; however, in their absence, the neural network-based structure prediction programs AlphaFold [109] and RoseTTAFold [110] now offer an alternative. To guide this initial structure into the target density map, various flexible fitting tools, such as cryo_fit within Phenix [70], ISOLDE [71] or iMODFIT [72] in Chimera, and Namdinator [73], among others [29,111], can be used to accommodate conformational heterogeneity by utilising molecular dynamics simulations or normal mode analysis [112]. In scenarios where it is difficult to annotate protein and DNA in the density maps, the recent neural network-based tools Haruspex [68] and Emap2sec+ [69] can be employed to detect these structures in high-resolution maps (better than 4 Å) and lower resolution maps (5–10 Å), respectively. Aside from flexible fitting, atomic models can also be constructed de novo when the resolution is better than 3.5 Å [113], although the Rosetta refinement strategy can be implemented as a de novo model-building approach for cryo-EM maps at 3–5 Å [114]. For tools that permit further refinement and validation of these models, we direct readers to a more detailed review [115].
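As a rough illustration of what placing a model into a density map involves, the sketch below scores agreement between a set of atom positions and a map, then searches for the best rigid translation. It rests on several assumptions that are not taken from the tools named above: the score is a real-space Pearson correlation, the model density is simulated as a sum of isotropic Gaussians, and the search covers only whole-voxel translations. It is therefore a rigid-body toy, not flexible fitting, which would additionally let the model deform using molecular dynamics or normal mode analysis. All function names, the grid size and the Gaussian width are hypothetical.

```python
import itertools

import numpy as np


def simulate_density(coords, grid_size=24, sigma=1.5):
    """Sum of isotropic Gaussians centred on each atom, sampled on a cubic grid."""
    axis = np.arange(grid_size, dtype=float)
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    density = np.zeros((grid_size, grid_size, grid_size))
    for x, y, z in coords:
        dist2 = (gx - x) ** 2 + (gy - y) ** 2 + (gz - z) ** 2
        density += np.exp(-dist2 / (2.0 * sigma ** 2))
    return density


def map_model_correlation(target_map, coords, sigma=1.5):
    """Pearson correlation between the map and density simulated from the model."""
    model_map = simulate_density(coords, target_map.shape[0], sigma)
    return float(np.corrcoef(target_map.ravel(), model_map.ravel())[0, 1])


def best_translation(target_map, coords, max_shift=2):
    """Exhaustive search over whole-voxel translations of the rigid model."""
    best_shift, best_cc = (0, 0, 0), -1.0
    steps = range(-max_shift, max_shift + 1)
    for shift in itertools.product(steps, repeat=3):
        moved = coords + np.array(shift, dtype=float)
        cc = map_model_correlation(target_map, moved)
        if cc > best_cc:
            best_shift, best_cc = shift, cc
    return best_shift, best_cc


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    true_coords = np.array([[9.0, 10.0, 11.0], [12.0, 12.0, 12.0],
                            [14.0, 11.0, 13.0], [11.0, 14.0, 10.0]])
    # Synthetic "experimental" map: model density plus a little noise.
    target = simulate_density(true_coords) + 0.02 * rng.normal(size=(24, 24, 24))
    # Start from a deliberately misplaced copy of the model.
    start = true_coords + np.array([2.0, -1.0, 1.0])
    shift, cc = best_translation(target, start)
    print("best shift:", shift, "correlation:", round(cc, 3))
```

The exhaustive search is used only because the toy problem is tiny; it makes the scoring step easy to follow at the cost of scaling poorly with the size of the search range.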
Conclusion
The advent of cryo-EM, driven by advances in hardware and data processing, has revolutionised our understanding of transcription regulation in bacteria. This is evidenced by the rapidly expanding structural repertoire of bacterial transcription complexes over the past two years, with >65 resolved by cryo-EM to date. As illustrated within this review, these structures have shed light on unique conformational changes not seen in previous crystallographic studies, which include: promoter remodelling to stabilise intermediate complexes of transcription initiation [22,23,37]; insights into the conformational plasticity during the transition from transcription initiation to elongation [22,28]; the cooperative assembly of the transcriptional repressor, NanR [26]; the effector-induced reconfiguration of BusR to bind a bipartite DNA motif [27]; and the interaction of Crl and WhiB7 with RNAP to tether the σ-factors that they regulate through protein–protein or protein–DNA interactions, respectively [32,34]. However, despite the advancements reviewed here, gaps in our knowledge remain.
To yield a comprehensive model of transcription regulation for a given bacterial system, researchers must employ an integrative structural, biophysical and biochemical approach [116]. While cryo-EM is powerful at resolving large, conformationally dynamic assemblies, which are difficult to capture by crystallography owing to crystal packing, there are still limitations in size and resolution. In contrast, crystallography is better suited to yield atomic coordinates of macromolecules under 100–200 kDa and can attain higher resolutions (<2 Å) as molecules are constrained to a crystal lattice [117]. It is now routine to dock these smaller, more ordered structures into cryo-EM maps using flexible fitting to accommodate minor conformational differences and allow a more in-depth interpretation of the model [112]. Hence, crystallography remains a useful tool in structural biology to complement cryo-EM studies.
In keeping with the theme of an integrative approach, small-angle X-ray scattering (SAXS) can ‘observe’ the conformational landscape of transcription complexes in solution, which can then be compared with the cryo-EM model to evaluate biological relevance [118]. Likewise, the stoichiometry and molecular mass of protein–DNA complexes can be determined in solution using an emerging, label-free analytical ultracentrifugation method that features multi-wavelength detection to deconvolute the spectral signals of protein and DNA based on their unique optical properties [26,119,120]. Single-molecule mass photometry can also be employed to determine the molecular mass of complexes in solution [121]. Other tools for characterising protein–DNA interactions (reviewed in [122]) include cryo-electron tomography, NMR, mass spectrometry/cross-linking, Förster resonance energy transfer (FRET) and hydrogen–deuterium exchange.
Future structural endeavours using cryo-EM will no doubt build on the contributions highlighted here to deconvolute the complicated and often multifaceted molecular mechanisms of transcriptional regulation in bacteria. Alongside regular hardware and software improvements, we anticipate that machine learning methods, such as cryoDRGN [67], along with time-resolved cryo-EM [123,124], will enable researchers to explore more transient intermediate complexes and thus gain a deeper understanding of the molecular choreography that drives these regulatory mechanisms.
Perspectives
Transcription complexes are dynamic assemblies whose function is often intertwined with their many structural configurations. The precise choreography and nature of these motions remain incompletely understood. This knowledge is essential to understand the molecular mechanisms of transcription regulation in bacteria.
Fuelled by the ‘resolution revolution’, cryo-EM has emerged to provide researchers with a means of probing these larger and structurally heterogeneous macromolecules, which often resist crystallisation. To date, these studies have contributed >65 complex structures and provided unprecedented insights into bacterial transcription regulation.
The prospect of temporally linking the dynamic states of transcription complexes remains of immense interest. Excitingly, the evolution of machine learning and time-resolved cryo-EM applications represents a future avenue to explore these transient intermediates.
Competing Interests
The authors declare that there are no competing interests associated with the manuscript.
Funding
R.C.J.D. acknowledges the following for funding support, in part: 1) the New Zealand Royal Society Marsden Fund (contract UOC1506); 2) a Ministry of Business, Innovation and Employment Smart Ideas grant (contract UOCX1706); and 3) the Biomolecular Interactions Centre (University of Canterbury).
Open Access
Open access for this article was enabled by the participation of University of Melbourne in an all-inclusive Read & Publish pilot with Portland Press and the Biochemical Society under a transformative agreement with CAUL.
Author Contributions
D.M.W., R.C.J.D., and C.R.H. reviewed the literature, wrote the manuscript and created the figures.
Acknowledgements
We thank the editors from the Biochemical Society for the invitation to contribute.
Abbreviations
- 3DVA: 3D Variability Analysis
- CAP: cyclic AMP (cAMP) receptor protein
- cryoDRGN: Cryo-Deep Reconstructing Generative Networks
- Cryo-EM: Cryo-electron microscopy
- CTD/NTD: C-terminal/N-terminal domain
- DNA: Deoxyribonucleic acid
- EMDB: Electron Microscopy Data Bank
- kDa: Kilodalton
- PDB: Protein Data Bank
- RNAP: RNA polymerase
- RPc: RNA polymerase closed complex
- RPo: RNA polymerase open complex
- σ-factor: Sigma factor