Genes under the control of super-enhancers are expressed at extremely high levels and are frequently associated with nuclear speckles. Recent data suggest that the high concentration of unphosphorylated RNA polymerase II (Pol II) and Mediator recruited to super-enhancers creates phase-separated condensates. Transcription initiates within or at the surface of these phase-separated droplets, and phosphorylation of Pol II, which is associated with transcription initiation and elongation, dissociates Pol II from these domains, leading to its engagement with nuclear speckles, which are enriched with RNA processing factors. The transition of Pol II from transcription initiation domains to RNA processing domains effectively co-ordinates transcription and processing of highly expressed RNAs, which are then rapidly exported into the cytoplasm.
Introduction
The nucleus is highly structured: it contains membrane-less organelles (MLOs) and chromatin that is organized into active or inactive compartments separated by chromosomal boundaries [1–3]. There is increasing evidence that highly expressed, super-enhancer-controlled genes are transcribed in the context of transcription hubs, transcription factories, or phase-separated domains [4–6]. Furthermore, nuclear speckles, domains enriched with components of the transcription and RNA processing machineries, were shown to associate with highly transcribed genes [7]. In this review, we discuss the evidence for transcription factories and/or super-enhancer-mediated phase-separated transcription initiation domains, and describe a co-ordinated association with nuclear speckles that ensures rapid processing and export of highly expressed RNA. We first present the components and characteristics of the relevant nuclear MLOs and then discuss how distinct phase-separated domains co-operate in transcription, processing, and RNA export.
Phase separation and nuclear domains
Liquid–liquid phase separation (LLPS) is characterized by the formation of spherical domains that deform in flow and exhibit wetting, dripping, and fusion [8–10]. LLPS is mediated by proteins and RNA that engage in multiple weak interactions via charged or aromatic residues [11]. Macromolecular crowding contributes to colloidal phase separation and the generation of biomolecular condensates [12,13]. Proteins containing intrinsically disordered regions (IDRs) that lack extensive secondary structure (low complexity domains, LCDs) engage in a multitude of interactions that keep phase-separated domains dynamic and liquid-like [8–10]. Among the first examples of phase separation in biological systems was the description of condensate formation by RNA binding proteins [8,14]. Indeed, most nuclear MLOs contain RNAs, which have been shown to induce LLPS and to regulate the viscosity of these domains [10]. For example, the formation of the nucleolus, a domain specialized in the assembly and maturation of ribosomal subunits, is induced by ribosomal RNA (rRNA) [1,15,16]. RNA may engage in base-pairing interactions and in interactions with the IDRs of RNA binding proteins, thus generating mesh-like domains that concentrate, and slow the diffusion of, the components that specify the function of distinct MLOs [3,9]. Unstructured RNA appears to engage preferentially in LLPS, and RNA helicases are often involved in the formation of MLOs [11,17]. Droplets containing the RNA helicase Ddx4, like germ granules or P-granules, concentrate single-stranded RNA and DNA but exclude double-stranded DNA [18]. In Drosophila embryos, the formation of histone locus bodies (HLBs), which mediate the expression of histone gene loci, depends on the promoters of the H3 and H4 histone genes [19].
Thus, the seeding mechanisms for at least a subset of nuclear LLPS domains appear to involve specific DNA elements that promote transcription, the active process of transcription, and nascent RNA generated from these specific gene loci.
Post-translational modifications, including serine/threonine phosphorylation, arginine methylation, and lysine acetylation, regulate the formation and/or maintenance of LLPS domains [17,20]. For instance, acetylation of the IDR of Ddx3, a component of stress granules, impairs LLPS, possibly because of reduced RNA binding [21]. MLOs contain two classes of molecules: constitutive molecules, which are required for the formation and maintenance of the MLO, and client molecules, which move in and out of LLPS domains [17]. Post-translational modifications appear to regulate both classes of LLPS domain-associated proteins.
The formation of MLOs enables and stimulates reactions that would otherwise be inefficient [10,13]. Moreover, MLOs sequester critical molecules and protect them from unfavorable reactions [10,13]. The concentration of specific molecules, or combinations of molecules, within MLOs provides an optimal environment for complex and linked reactions. Previous studies have shown that PEG-based coacervates, which form colloid-rich viscous liquid phases, increase the association of RNA polymerase with DNA and enhance transcription rates up to 6-fold [22]. Thus, the coacervate matrix, or the MLO environment, could act as a scaffold that spatially organizes enzymatic cascades and thereby promotes the processivity and connectivity of enzymatic reactions.
Super-enhancer associated transcription initiation domains
Super-enhancers are powerful gene regulatory elements that mediate extremely high expression of target genes [23–26]. These complex activating DNA sequences are often composed of multiple DNase I hypersensitive sites (HSs) or extended accessible regions enriched with active chromatin marks [monomethylated histone H3 lysine 4 (H3K4me1) and acetylated H3 lysine 27 (H3K27ac)], the Mediator complex, and Pol II [4,26]. One of the best-characterized super-enhancers is the β-globin locus control region (LCR), which promotes the orchestrated expression of β-type globin genes during development and differentiation of erythroid cells [27–30]. The LCR was originally defined as a regulatory element capable of promoting transcription of linked genes in a copy-number-dependent and position-independent manner in transgenic assays [29,30]. This is a somewhat artificial definition, and the LCR appears to be functionally indistinguishable from super-enhancers. The β-globin LCR contains multiple HSs that operate together to mediate extremely high levels of globin gene expression. The LCR HSs are bound by a large number of ubiquitously expressed and tissue-restricted transcription factors, and were among the first enhancer elements shown to recruit Pol II and to transcribe non-coding RNA, now referred to as enhancer RNA (eRNA) [31–34]. It has been proposed that the LCR HSs constitute the primary sites for recruitment and assembly of Pol II transcription complexes, and that elongation-competent Pol II complexes are transferred to the strong basal promoter elements of the β-type globin genes during transient looping interactions [32,33,35–37].
Recent evidence suggests that the large number of Mediator and Pol II transcription complexes recruited to super-enhancer-associated HSs generates phase-separated domains [4,38]. The Mediator-associated protein Brd4 contains an IDR and is able to induce phase separation [4]. Likewise, the C-terminal domain (CTD) of Pol II has been shown to form phase-separated droplets [39]. The Pol II CTD consists of a heptapeptide (consensus Tyr1-Ser2-Pro3-Thr4-Ser5-Pro6-Ser7) that is repeated more than 50 times in mammals [40]. The three serine residues of the heptapeptide (Ser2, Ser5, and Ser7) are subject to phosphorylation during the transcription cycle [41]. Unphosphorylated Pol II is first recruited to transcription start sites, where it interacts with components of the basal transcription machinery [42,43]. Phosphorylation of CTD serine-5 by the CDK7 subunit of TFIIH disrupts these interactions and promotes transcription initiation [41,42]. Initial transcription is unstable and often aborted, particularly at enhancers [44,45]. Once a stable elongation complex forms, transcription is paused by the negative elongation factors DSIF (DRB sensitivity-inducing factor) and NELF (negative elongation factor) to allow capping of the 5′ end of the nascent transcript [46]. Next, CDK9, the kinase subunit of the transcription elongation factor pTEFb, phosphorylates serine-2 of the Pol II CTD as well as DSIF and NELF [46]. These phosphorylation events release NELF and convert DSIF into a positive transcription elongation factor.
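The CTD phosphorylation cycle described above can be summarized as a small state machine. The Python sketch below is a hypothetical illustration, not a model taken from the cited studies: it builds an idealized CTD from consensus heptads only (real mammalian CTDs contain many non-consensus repeats; the default of 52 repeats is the commonly cited mammalian count), and it collapses the cycle into three states driven by CDK7 and CDK9. The condensate preferences follow Guo et al. [49] in simplified form, and the names `build_ctd`, `CTDState`, `phosphorylate`, and `PREFERRED_CONDENSATE` are invented for this example.

```python
from enum import Enum

CONSENSUS_HEPTAD = "YSPTSPS"  # Tyr1-Ser2-Pro3-Thr4-Ser5-Pro6-Ser7
SERINE_POSITIONS = (2, 5, 7)  # 1-based positions of the phosphorylatable serines


def build_ctd(n_repeats: int = 52) -> str:
    """Return an idealized CTD built only from consensus heptads.

    Simplification: real mammalian CTDs contain many non-consensus repeats.
    """
    return CONSENSUS_HEPTAD * n_repeats


def serine_count(ctd: str) -> int:
    """Count serine residues (one-letter code 'S') in a CTD sequence."""
    return ctd.count("S")


class CTDState(Enum):
    UNPHOSPHORYLATED = "unphosphorylated"  # recruited to promoters/enhancers
    SER5P = "Ser5-phosphorylated"          # after CDK7 (TFIIH): initiation
    SER2P = "Ser2-phosphorylated"          # after CDK9 (pTEFb): pause release


# Hypothetical, simplified transitions: (current state, kinase) -> next state.
TRANSITIONS = {
    (CTDState.UNPHOSPHORYLATED, "CDK7"): CTDState.SER5P,
    (CTDState.SER5P, "CDK9"): CTDState.SER2P,
}

# Simplified condensate preferences after Guo et al. [49].
PREFERRED_CONDENSATE = {
    CTDState.UNPHOSPHORYLATED: "Mediator",
    CTDState.SER2P: "SRSF",
}


def phosphorylate(state: CTDState, kinase: str) -> CTDState:
    """Apply one kinase step; combinations not modeled leave the state unchanged."""
    return TRANSITIONS.get((state, kinase), state)


if __name__ == "__main__":
    ctd = build_ctd()
    print(len(ctd), serine_count(ctd))  # 364 156
    state = CTDState.UNPHOSPHORYLATED
    for kinase in ("CDK7", "CDK9"):
        state = phosphorylate(state, kinase)
        print(kinase, "->", state.value, PREFERRED_CONDENSATE.get(state))
```

In this toy model, a CDK7 step moves unphosphorylated Pol II to the Ser5-phosphorylated state, and a subsequent CDK9 step yields the Ser2-phosphorylated state that prefers SRSF condensates. Real CTDs carry mixed phosphorylation patterns (for example, Ser5P and Ser2P on the same molecule), which this sketch deliberately ignores.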
Several observations suggest that phosphorylation of Pol II dissociates it from the LLPS transcription initiation domains formed by super-enhancers [47]. Kwon et al. [48] demonstrated that proteins with LCDs, including the FET proteins FUS, EWS, and TAF15, form hydrogels that associate with the unphosphorylated, but not the phosphorylated, form of Pol II. The authors proposed that, at the promoter, unphosphorylated Pol II is bound to a polymer (hydrogel) and that phosphorylation of Pol II allows its escape from this promoter-associated hydrogel. More recently, Boehning et al. [39] analyzed the phase separation properties of the Pol II CTD and found that Pol II disengages from phase-separated droplets upon phosphorylation. Finally, Guo et al. [49] analyzed the association of Pol II with Mediator condensates and with condensates of the splicing factors SRSF1/SRSF2. They found that unphosphorylated Pol II preferentially associates with Mediator droplets, whereas serine-2-phosphorylated Pol II preferentially associates with SRSF condensates.
Together, these data suggest that Pol II is recruited to super-enhancers in the context of phase-separated domains and that phosphorylation of the Pol II CTD initiates transcription while simultaneously disengaging Pol II from these domains (Figure 1). This model is related to, but deviates from, the transcription factory model proposed by Cook and co-workers [50]. According to the transcription factory model, Pol II molecules cluster in the nucleus, and genes are recruited to and reeled through these clusters during transcription elongation. As discussed by Cook and Marenduzzo [51], this process would prevent the growing RNA from becoming entwined with the template, as it would be with a polymerase that tracks along and rotates around the DNA. The Cook laboratory provided evidence supporting a reeling mechanism by inducing transcription of a very long gene (221 kb) and monitoring promoter position and ongoing transcription by RNA-FISH and high-resolution microscopy [52]. In contrast, the Grosveld laboratory, using an mCherry-CDK9 (CTD serine-2 kinase) fusion protein, provided evidence that transcription elongation occurs away from transcription factories [53]. We revisit this issue below when we discuss the coordination of transcription with RNA processing.
Figure 1. Transcription factory versus super-enhancer-mediated phase-separated transcription initiation domains.
Nuclear speckles and RNA processing
Most nuclear MLOs are enriched with proteins that specify their functions, e.g. ribosome assembly (nucleolus), spliceosome assembly (Cajal bodies), and histone mRNA processing (histone locus bodies) [1]. In contrast with these MLOs, the function of nuclear speckles, also referred to as interchromatin granule clusters or SC35 domains, is not completely understood [54,55]. Inhibition of transcription increases the size of nuclear speckles, which led to the proposal that they represent storage sites for the mRNA processing and splicing machinery. Consistent with this model, nuclear speckles are devoid of DNA and active transcription [56]. However, there is increasing evidence that speckles co-ordinate the transcription, processing, and export of highly expressed mRNAs [7,55]. Early studies showed that clusters of hyperphosphorylated Pol II and BrU-labeled transcripts associate with nuclear speckles [57,58]. Moreover, during the early stages of human cytomegalovirus (HCMV) infection, the viral genomes associate with speckles and the immediate early (IE) viral RNA accumulates in these domains [59]. Maul and co-workers [59] introduced the concept of an ‘immediate transcript environment’, postulating that the HCMV IE protein IE86 nucleates a ‘cloud’, which may now be viewed as a product of transcription-associated LLPS. The IE86-induced ‘cloud’ forms around the HCMV input DNA, is positioned between ND10/PML nuclear bodies and nuclear speckles, and is enriched with transcription factors, thus creating favorable conditions for IE transcription. IE transcripts are among the few HCMV transcripts that are extensively spliced, suggesting that RNA processing, including splicing, occurs at least partially in nuclear speckles, which contain all components required for these processes. In addition, the association with nuclear speckles enhances the export of these IE transcripts (see below).
Recent evidence suggests that gene-rich chromosomal domains with high levels of transcription are associated with nuclear speckles [7,55]. Nuclear speckles contain the long non-coding RNA (lncRNA) Malat-1 (NEAT2), snRNAs, and poly(A) RNA, and are enriched with RNA processing proteins including Son and SC35 [60]. High-resolution RNA-FISH and immunofluorescence microscopy revealed that Malat-1 is located at the periphery of nuclear speckles, whereas the processing factors Son and SC35 are located within the core [60].
Three groups recently investigated the association of genomic loci with nuclear speckles using novel high-throughput technologies [7]. The Belmont laboratory used tyramide signal amplification (TSA) followed by high-throughput sequencing to identify genomic loci that are in close proximity to nuclear speckles [61]. In these studies, the authors used a horseradish peroxidase-conjugated antibody specific for Son, a protein highly enriched in nuclear speckles, to convert biotin tyramide into a diffusible, reactive form that covalently labels nearby proteins, RNA, and DNA. After streptavidin pull-down, the labeled DNA was sequenced. The data demonstrate that transcription ‘hot zones’, containing the most highly expressed genes, as well as housekeeping genes, genes exhibiting low transcriptional pausing, and genes under the control of super-enhancers, localize to the nuclear interior and come in close proximity to nuclear speckles. Previous studies established that the genome is organized into topologically associating domains (TADs), with A-TADs representing active chromatin regions and B-TADs containing repressed gene loci [55,62,63]. A-TADs are further divided into Type I and Type II compartments, with Type I A-TADs encompassing the most transcriptionally active chromatin regions [7,55,62,63]. Chen et al. [61] found that Type I A-TADs are particularly close to nuclear speckles. The observation that genes with low transcriptional pausing rates are in close proximity to nuclear speckles is consistent with the enrichment in these domains of proteins regulating the pause/release of Pol II, e.g. pTEFb and CDK12 [64–66].
Studies by the Guttman laboratory support the finding that highly transcribed genomic regions are associated with nuclear speckles [67]. The authors used a novel technology called SPRITE (split pool recognition of interactions by tag extension), a proximity ligation-independent approach for identifying chromosomal interactions. The technique is similar to chromosome conformation capture (3C) approaches, but instead of proximity ligation, short barcodes are ligated to the DNA or RNA fragments of cross-linked chromatin complexes that are split into the wells of 96-well plates. The cycle of splitting, barcode ligation, and pooling is repeated multiple times, and finally the tagged fragments are sequenced. Interacting (cross-linked) DNA or DNA/RNA fragments carry the same combination of barcodes. The study revealed that highly active gene loci frequently associate with spliceosomal RNA and Malat-1, suggesting proximity to nuclear speckles, which are enriched with these RNA species [60]. Interactions of highly active gene loci with nuclear speckles were verified by DNA-FISH combined with SC35 immunofluorescence microscopy. An additional interesting observation by Quinodoz et al. [67] was that the density of Pol II transcription events, rather than high transcriptional activity of individual genes, determined close association with nuclear speckles.
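The logic of split-pool barcoding can be illustrated with a short simulation. In this hypothetical Python sketch, each cross-linked complex stays intact and lands in one random well per round, so every molecule in it receives the same well tag; molecules sharing an identical barcode are then grouped as putative interactors. The complex and molecule names, the number of rounds, and the helper functions `split_pool_barcode` and `group_by_barcode` are assumptions for illustration, not the published SPRITE pipeline, and the sketch assumes each molecule belongs to exactly one complex.

```python
import random
from collections import defaultdict


def split_pool_barcode(complexes, n_rounds=5, n_wells=96, seed=0):
    """Simulate split-pool barcoding of cross-linked complexes (hypothetical).

    complexes: dict mapping a complex id to a list of molecule names;
    each molecule is assumed to belong to exactly one complex.
    Returns a dict mapping each molecule to its barcode, a tuple of the
    well indices it passed through in successive rounds.
    """
    rng = random.Random(seed)
    tags = {m: [] for mols in complexes.values() for m in mols}
    for _ in range(n_rounds):
        # Split: every intact complex lands in one randomly chosen well.
        for mols in complexes.values():
            well = rng.randrange(n_wells)
            # Ligate: all molecules in that complex receive the same tag.
            for m in mols:
                tags[m].append(well)
        # Pool: complexes are recombined before the next round (implicit here).
    return {m: tuple(t) for m, t in tags.items()}


def group_by_barcode(barcodes):
    """Group molecules that share an identical full barcode."""
    clusters = defaultdict(set)
    for molecule, barcode in barcodes.items():
        clusters[barcode].add(molecule)
    return list(clusters.values())


if __name__ == "__main__":
    # Hypothetical complexes: a speckle-associated hub and an isolated locus.
    complexes = {
        "speckle_hub": ["geneA_DNA", "geneB_DNA", "Malat1_RNA", "U2_snRNA"],
        "isolated_locus": ["geneC_DNA"],
    }
    barcodes = split_pool_barcode(complexes)
    for cluster in group_by_barcode(barcodes):
        print(sorted(cluster))
```

With 96 wells and 5 rounds there are 96^5 = 8,153,726,976 possible barcodes, so in this idealized model any given pair of unrelated complexes receives the same barcode with probability 1/96^5. Members of one complex, by contrast, always share a barcode, which is why shared barcodes are read as evidence of interaction.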
The third study, by Chen et al. [68], used a novel method to map associations between specific RNAs and genomic loci (MARGI, mapping of RNA-genome interactions). MARGI involves cross-linking of RNA and DNA, followed by chromatin fragmentation, proximity ligation, and high-throughput sequencing. The authors focused on nuclear speckle-associated RNAs (nsaRNAs), including snRNAs and Malat-1, and compared the results with those obtained from CDK9 ChIP-seq experiments. The data are consistent with the other studies, showing that nuclear speckles are mostly associated with highly transcribed gene loci in A-TAD compartments. In summary, although splicing occurs co-transcriptionally throughout the nucleoplasm, highly transcribed genomic regions appear to associate with nuclear speckles, which may ensure rapid and efficient RNA processing.
Nuclear speckles and RNA export
In addition to being enriched with RNA processing factors, nuclear speckles contain proteins that regulate RNA export. In yeast, Yra1, Sub2, and the THO complex couple RNA processing and nuclear export [69]. The corresponding complex in humans, TREX, is recruited to active genes and appears to travel along with elongating Pol II [70,71], consistent with its binding to the Ser2/Ser5-phosphorylated Pol II CTD. Recent data demonstrate that recruitment of TREX is regulated by transcription, 5′ capping, pre-mRNA splicing, and m6A RNA modification [72]. Interestingly, several reports demonstrate direct interactions between the Mediator complex and TREX-2 [73,74], linking enhancer-regulated transcription initiation with nuclear export. Intronless mRNAs have been shown to transit through nuclear speckles in a manner that is regulated by splicing enhancer elements but independent of transcription [75]. This transit through speckles has been associated with a quality control step that prepares mRNAs for nuclear export. Within the speckles, intronless mRNAs interact with TREX, which is required for their subsequent release from the speckles. The nuclear mRNA export receptor NXF1, which interacts with the nuclear pore, mediates the association of TREX with nuclear speckles [75]. NXF1 co-ordinates mRNA polyadenylation with nuclear export, and down-regulation of NXF1 leads to accumulation of Pol II at the 3′ ends of genes [76]. Thus, at least a subset of RNAs is processed and prepared for nuclear export within the nuclear speckle compartment.
Nuclear pore complex (NPC)
The nuclear pore complex is an integral part of the nuclear envelope: a large protein assembly that mediates selective transport between the nucleus and the cytoplasm [77]. The NPC consists mainly of nucleoporins (Nups), which are either stable components of the NPC or diffuse into the nucleoplasm and contact the NPC transiently. Many Nups contain phenylalanine-glycine (FG)-rich repeats, which are intrinsically disordered and capable of undergoing LLPS [78]. The FG domains are thought to be largely responsible for forming the selective permeability barrier of the NPC. This selectivity is at least in part due to interactions between the Nup FG domains and nuclear export receptors such as NXF1.
There is increasing evidence that highly expressed genes are located in close proximity to the NPC [77]. This seems at odds with the view that activation of gene loci causes their movement away from the nuclear periphery toward the nuclear interior. Indeed, most of the nuclear periphery associates with inactive chromatin through interactions with the nuclear lamina [1]. The NPC environment, however, is an exception and frequently associates with active chromatin. Pascual-Garcia et al. [79] demonstrated that Drosophila Nup98 associates with promoters and enhancers and induces looping between these elements. Furthermore, the looped enhancer/promoter configurations were found in close proximity to NPCs. Likewise, Ibarra et al. [80] demonstrated that in human cells Nup153 associates with super-enhancers at the nuclear periphery. Importantly, Nup153 deficiency caused a reduction in the expression of super-enhancer-controlled genes. Liu et al. [81] targeted a biotinylatable, catalytically inactive Cas9 to the human β-globin LCR in K562 cells to identify LCR-associated proteins. Surprisingly, both Nup98 and Nup153 were found to associate with LCR HSs. Nup98 and Nup153 are peripheral, diffusible NPC components known to associate with genes in the nuclear interior [82]. Pascual-Garcia and Capelson [83] suggest that, because of their FG domains, Nups could assist in forming phase-separated domains in the context of super-enhancers. It is not yet known how Nups are recruited to super-enhancers. Perhaps proximity to the nuclear pore and the presence of many proteins with intrinsically disordered domains at super-enhancers recruit Nups, which in turn could assist in phase separation and/or facilitate nuclear export by mediating proximity to nuclear speckles and NPCs.
Distinct phase-separated nuclear domains co-ordinate transcription, processing, and nuclear export of highly expressed RNA
Recent data on phase-separated domains involved in transcription and RNA processing suggest a highly co-ordinated effort to rapidly process and export abundant RNA generated from super-enhancer target genes. The concept of nuclear phase separation, and particularly of super-enhancer-mediated phase separation regulating transcription initiation, is novel and a subject of debate [84–86]. However, it is clear that super-enhancers generate domains enriched with co-activators, particularly Mediator, and Pol II. This may also apply to other components involved in the assembly of Pol II transcription complexes. It seems that after transcription initiation, Pol II disengages from the super-enhancer domain/transcription factory and associates with nuclear speckles [49,87]. At this stage, Pol II either tracks along the gene or associates stably with the periphery of nuclear speckles while the DNA is reeled through the immobilized transcription elongation complex. The second model is consistent with data from the Cook laboratory suggesting that genes are reeled through immobilized Pol II [52]. Many nuclear LLPS domains exclude double-stranded DNA, suggesting that the DNA is recruited to, and remains at, the periphery of the phase-separated transcription initiation and nuclear speckle domains. Accordingly, Pol II would not have to enter another condensate after its release from the super-enhancer/initiation domain, a point discussed by Portz and Shorter [88].
We propose that Pol II is first recruited to, and assembled into, active transcription complexes at super-enhancers and then transferred to high-affinity Pol II promoters during transient interactions of gene promoters with the phase-separated super-enhancer domain [37]. After transcription initiation at the promoter, and as a consequence of CTD phosphorylation, Pol II disengages from the super-enhancer and associates with the periphery of nuclear speckles. During transcription elongation, the RNA is processed and transits through nuclear speckles to gain export competency. Close proximity to NPCs and association with nuclear mRNA export receptors mediate rapid export into the cytoplasm (Figure 2).
Figure 2. Coordination of transcription, processing, and export of highly expressed RNA by super-enhancers, nuclear speckles, and nuclear pore complexes.
The proximity of highly expressed gene bodies to nuclear speckles also appears to stimulate transcription elongation: SC35, which is mainly enriched in speckles, has been shown to stimulate transcription elongation of specific genes [89]. Furthermore, a recent report by Kim et al. [90] demonstrates that nascent transcript levels of the heat-shock protein (HSP) A1B gene increase upon its interaction with nuclear speckles. This is consistent with data showing increased association of the hsp90a and hsp70 gene loci with nuclear speckles after heat shock induction [91]. Previous studies have shown that the active β-globin gene comes in close proximity to nuclear speckles [92]. Interestingly, the β-globin and α-globin loci were shown to frequently associate with the same speckles, suggesting coordinated processing of highly expressed globin RNAs. As discussed above, more recent studies also revealed frequent associations of super-enhancer-regulated gene loci with nuclear speckles [7]. The multiple HSs associated with super-enhancers recruit Pol II and initiate transcription, often bidirectionally. It is interesting to view this in the context of the study showing that the density of transcription initiation events, rather than the transcriptional activity of individual genes, mediates association with speckles [67]. Transcription at super-enhancers could thus facilitate associations with nuclear speckles through frequent transcription initiation events. Furthermore, enhancers and super-enhancers interact with nucleoporins, which may contribute to their close proximity to nuclear pores [77]. Thus, transcription, processing, and export of highly expressed RNA appear to be regulated by multiple LLPS nuclear domains that are in close proximity and functionally interconnected.
Conclusions and open questions
There is increasing evidence for the formation of functionally specified phase-separated biological condensates in the nucleus. Furthermore, some of these condensates are functionally interconnected, as discussed here for the phase-separated domains involved in transcription initiation, RNA processing, and RNA export. The connection between nuclear speckles and NPCs is speculative, but there is clearly a connection between RNA processing and preparation for nuclear export. It seems that, at least for highly expressed RNA, transcription, processing, and export are highly co-ordinated. This process may have evolved to prevent the accumulation of highly expressed RNA and/or to guarantee its rapid processing and export. Much work remains to be done to firmly establish the co-ordinated processes described here. For example, what is the spatial and functional relationship between nuclear speckles and NPCs? What regulates the association of highly expressed genes with nuclear speckles? What is the relationship between super-enhancer-mediated phase separation and transcription factories? Improved imaging technologies and new, sophisticated protocols for mapping spatial relationships between DNA, RNA, and proteins should soon help answer these questions.
Summary
Mediator and RNA polymerase II interact with super-enhancers and contribute to the formation of phase-separated domains.
Transcription of super-enhancer target genes initiates within the context of the phase-separated domain.
Phosphorylation of the RNA polymerase II C-terminal domain disengages the enzyme from the super-enhancer domain and leads to association with nuclear speckles.
Transcription elongation of super-enhancer target genes occurs at the periphery of nuclear speckles and the RNA is processed and prepared for nuclear export within the nuclear speckle domain.
Competing Interests
The authors declare that there are no competing interests associated with the manuscript.
Author Contribution
J.B. wrote the review. A.M.I. and A.G. contributed specific sections and edited the manuscript.
Acknowledgements
We apologize to colleagues whose work is not cited due to space limitation. We thank our colleagues in the Bungert and Ishov laboratories. Work in the Bungert laboratory is supported by the NIH (R56DK111439), by an ASH Bridge Grant Award, and a UF pilot project grant. Work in the Ishov laboratory is supported by the NIH (R01DE026707 and R21CA198820).
Abbreviations
- CTD: C-terminal domain
- DSIF: DRB sensitivity-inducing factor
- eRNA: enhancer RNA
- HCMV: human cytomegalovirus
- HLBs: histone locus bodies
- HSP: heat-shock protein
- HSs: hypersensitive sites
- IDRs: intrinsically disordered regions
- IE: immediate early
- LCDs: low complexity domains
- LCR: locus control region
- LLPS: liquid–liquid phase separation
- MARGI: mapping of RNA-genome interactions
- MLOs: membrane-less organelles
- NELF: negative elongation factor
- NPC: nuclear pore complex
- Pol II: RNA polymerase II
- SPRITE: split pool recognition of interactions by tag extension
- TADs: topologically associating domains
- TSA: tyramide signal amplification