Interaction scaffolds that selectively recognize disordered protein strongly shape protein interactomes. An important scaffold of this type that contributes to transcription is the TFIIS N-terminal domain (TND). The TND is a five-helical bundle that has no known enzymatic activity, but instead selectively reads intrinsically disordered sequences of other proteins. Here, we review the structural and functional properties of TNDs and their cognate disordered ligands known as TND-interacting motifs (TIMs). TNDs or TIMs are found in prominent members of the transcription machinery, including TFIIS, super elongation complex, SWI/SNF, Mediator, IWS1, SPT6, PP1-PNUTS phosphatase, elongin, H3K36me3 readers, the transcription factor MYC, and others. We also review how the TND interactome contributes to the regulation of transcription. Because the TND is the most significantly enriched fold among transcription elongation regulators, TND- and TIM-driven interactions have widespread roles in the regulation of many transcriptional processes.
Introduction
The emergence of diverse functional roles for intrinsically disordered regions (IDRs) has highlighted their enormous regulatory potential [1–5]. One important route by which IDRs exert distinct functions is their ability to mediate selective assembly with folded interaction platforms on their binding partners [2]. To date, a number of structurally conserved interaction scaffolds that selectively ‘read’ IDRs have been identified. These include TFIIS N-terminal domains (TNDs) [6, 7], WD40 repeat (WDR) domains [8, 9], Src Homology 2 (SH2) [10], Src Homology 3 (SH3) [11], PDZ [12, 13], WW [14, 15], and many other domains. These domains represent ‘landing pads’ that bind disordered short linear motifs (SLiMs) through structurally conserved mechanisms [16].
IDRs are particularly enriched among human transcription and chromatin regulators [2]. Compared with other protein scaffolds that selectively bind IDRs, TNDs are found with remarkable selectivity in proteins that regulate gene expression (Figure 1A) [6]. As a result of this selective enrichment, TNDs are uniquely poised to aid IDR-mediated assembly of the transcriptional machinery. Here, we summarize the structural features and interaction modes maintained by this protein domain family, as well as the roles they exert in transcription.
Figure 1. The TFIIS N-terminal domain (TND) is a structurally conserved fold enriched among transcription regulators.
TNDs are conserved binding scaffolds for disordered TND-interacting motifs
Conservation and structural properties
The TND is conserved from humans to yeast (Figure 1B–D), with notable expansion and diversification in multicellular organisms. While the human proteome harbors at least 15 TND-containing factors, only four TND-containing proteins are currently annotated in the budding or fission yeast proteomes (Figure 1D) [17, 18]. Structurally, the TND fold is a right-handed bundle of five helices, reminiscent of a pair of HEAT repeats (Figure 1B,C) [6, 19–21]. In several human as well as yeast proteins, the TND is immediately preceded by an N-terminal HEAT subdomain (Figure 1B,C) [6, 21, 22].
Such helical repeat domains frequently serve as scaffolds in large protein complexes, for example, ATP-dependent chromatin remodelers [23–27] or protein kinases implicated in DNA repair [28, 29]. Their high structural stability relies on tight packing of helix-turn-helix motifs into a supercoiled arrangement. Four- to five-helix bundles like the TND are the minimal viable helical repeat domains enabling variability of interaction surfaces while retaining stability [30, 31]. While overall sequence conservation of TNDs is low (27% identity and 47% similarity for 71 annotated PROSITE PS51319 domains), the fold-stabilizing core residues are more strongly conserved (41% identity and 67% similarity) [32]. In the human proteome, the strongest conservation is observed for buried hydrophobic residues in the domain core, while side chains from poorly conserved residues are solvent exposed [6] and therefore accessible to confer interaction specificity. Because there is currently no known catalytic activity associated with the TND fold, these domains are thought to act primarily as interaction platforms [6, 33].
The TND selectively reads short, disordered TND-interacting motifs
Detailed structural analysis of TND-mediated protein complexes revealed core elements of the binding interfaces that are structurally conserved across TND-containing factors and across species [6, 7, 21, 22, 34]. Each TND coordinates its interactome through two relatively shallow binding pockets (each ∼80 Å²) that accommodate bulky hydrophobic residues, as well as a surface patch of 3–5 positively charged residues positioned between the pockets (Figure 2A). This recurrent configuration gives rise to a characteristic charge patterning of the TND surface that accommodates a corresponding pattern of hydrophobic and negatively charged residues on TND-interacting motifs (TIMs). Correspondingly, TIMs engage TNDs through distinct motif features: an obligate α-helix and acidic linker sequence, and an optional FxGF motif (Figure 2B). While the α-helix anchors the TIM through conserved phenylalanine, valine, leucine, and isoleucine side chains in the first hydrophobic binding pocket created by α3, α4, and α5 of TNDs, the FxGF motif occupies the second, shallower binding pocket via its two phenylalanines (Figure 2C). The FxGF motif plays a unique role in the recognition of TNDs found in the H3K36me3 readers LEDGF and HRP2, where it significantly contributes to TIM binding. However, many TIMs lack the FxGF motif and only engage the first pocket through the short α-helix (Figure 2D). The selectivity of each TND for its TIMs lies in differences in the configurations of these hydrophobic pockets and charge patterns, particularly their compatibility with the hydrophobic residues and charge patterns of the TIM amino acid sequence.
Figure 2. TNDs are selective binding scaffolds for disordered TND-interacting motifs (TIMs).
In addition to hydrophobic contacts from the α-helix and FxGF motifs, the flexible acidic linker also contributes to overall affinity. This variable linker spans 8–16 amino acid residues and is enriched in glutamate and aspartate residues that recognize the basic patch on TNDs. The acidic linker is also often enriched in glycine residues that confer flexibility, as well as serine and threonine residues that are often also negatively charged due to post-translational phosphorylation (Figure 2D) [6, 7, 33]. In particular, the acidic linkers of IWS1, KMT2A (also known as MLL1), JPO2 and other TIMs are phosphorylated by casein kinase 2 (CK2) [7]. TIM phosphorylation enhances affinity towards TND-containing binding partners, thereby enabling switching between low- and high-affinity states of these interactions [6, 7]. However, many questions regarding the regulatory roles of PTMs in the TND interactome remain open. For example: Is CK2 unique, or are there other kinases that phosphorylate TIMs? What structural features confer selectivity of kinases for distinct TIMs? Is TIM phosphorylation constitutive or regulated? If it is regulated, what phosphatases remove these post-translational marks? How does TIM phosphorylation contribute to different stages of transcription? Phosphorylation of the acidic linker represents a primary avenue for regulation of these interactions; answering these questions would therefore expand our mechanistic understanding of how regulated TND:TIM interactions contribute to transcription.
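The sequence-level features described above (an acidic linker of 8–16 residues enriched in aspartate and glutamate, an optional FxGF motif, and serine/threonine residues that may be phosphorylated) can be expressed as a rough sequence scan. The following is a minimal sketch rather than a validated TIM predictor: the function names, the 0.4 acidic-fraction cutoff, and the toy sequence are assumptions made purely for illustration, and the obligate α-helix is not modelled at all.

```python
import re

ACIDIC = set("DE")
# Ser/Thr are only *candidate* phosphosites; the scan cannot tell
# whether they are actually phosphorylated.
PHOSPHO_CANDIDATES = set("ST")


def find_fxgf(seq):
    """Return 0-based start positions of F-x-G-F matches (x = any residue)."""
    # A lookahead is used so that overlapping matches are also reported.
    return [m.start() for m in re.finditer(r"(?=F.GF)", seq)]


def acidic_windows(seq, min_len=8, max_len=16, min_fraction=0.4):
    """Yield (start, length, fraction) for windows enriched in D/E.

    The 8-16 residue range follows the text; min_fraction is an
    arbitrary illustrative cutoff, not a published threshold.
    """
    for length in range(min_len, max_len + 1):
        for start in range(len(seq) - length + 1):
            window = seq[start:start + length]
            fraction = sum(res in ACIDIC for res in window) / length
            if fraction >= min_fraction:
                yield start, length, fraction


def summarize(seq):
    """Collect simple TIM-like features for one sequence."""
    return {
        "fxgf_positions": find_fxgf(seq),
        "has_acidic_window": any(True for _ in acidic_windows(seq)),
        "phospho_candidates": sum(res in PHOSPHO_CANDIDATES for res in seq),
    }


# Hypothetical toy sequence, NOT taken from any real protein.
toy = "ALKELLEDSDEEGSDEEFAGFSK"
print(summarize(toy))
```

A scan of this kind can only flag candidate regions. Whether a candidate actually binds a given TND depends on the helix and on the pocket and charge compatibility discussed above, which would need structural or biochemical evidence.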
The TND governs assembly of higher-order structures
Several proteins possess more than a single TIM. For example, SPT6, LEO1, JPO2 and CDC7-ASK each contain two such motifs, and IWS1 harbors three distinct TIMs in series [6, 7]. Therefore, these proteins have the capacity to regulate higher-order complex assemblies by engaging multiple TND-containing factors through their TIMs at the same time. Currently, the best characterized example of multiprotein complex assembly through these surfaces is human IWS1. IWS1 contains three TIMs, but also harbors its own TND and simultaneously engages these surfaces to bring together four other transcription regulators through TND:TIM interactions [6]. Regulation of higher-order structures of these factors can also be enhanced through multimerization of TND-containing factors. For example, dimerization of LEDGF is stabilized by TND domain swapping and additional electrostatic ‘stapling’ of the negatively charged α-helix formed in the IDR C-terminal to the TND [35]. Importantly, the TIM interaction sites on the TNDs remain structurally unperturbed by domain swapping [35]. Such an arrangement has the potential to aid assembly of higher-order structures.
Transcriptional roles of TND across proteins and species
The TFIIS TND links transcription regulators to RNAP2
The transcription elongation factor TFIIS increases the overall transcription rate of RNAP2 by rescuing backtracked polymerases [36, 37]. TFIIS is well conserved from human to yeast, and homologs are also found in archaea and in some viral genomes [20, 38]. Despite the eponymous naming of the TND from TFIIS (where it is also known as domain I, or LW domain), the mechanisms by which this domain contributes to TFIIS-dependent regulation of transcription remained unclear for many years, in part because the TND is not required for backtrack rescue in vitro [39]. Mutational analysis of the human TFIIS TND linked this domain with the nuclear localization of TFIIS [40]. Additionally, the TFIIS TND was implicated in early transcription events distinct from TFIIS's role in elongation, in particular, efficient formation of RNAP2 preinitiation complexes and promoter recruitment [41]. Even though cryo-EM revealed the structure of TFIIS bound to other transcription elongation complexes and RNAP2 [42–45], the TFIIS TND remained a dynamic component of these complexes, and contacts to other transcription regulators mediated by this domain remained hidden. Similar to other members of the TND family, the TFIIS TND acts as an interaction scaffold for disordered TIMs, including motifs in the transcriptional regulators IWS1, LEO1, PAF1 and others [6]. The TFIIS TND directly links these factors to RNAP2, thereby mediating their proximity to the transcriptional machinery, which is expected to influence their functional roles (Figure 3).
Figure 3. TND-mediated interactions govern many transcriptional and co-transcriptional processes.
The TND in IWS1 links transcription and mRNA processing machinery
IWS1 (Interacts with SPT6) is a transcription elongation regulator conserved from human to yeast, where the ortholog is known as Spn1. Both human IWS1 and yeast Spn1 harbor a TND, which recognizes a disordered TIM in the histone chaperone SPT6 (SUPT6H or Spt6) [6, 22]. While other TNDs recognize multiple TIMs with comparable affinity, the IWS1 TND binds the SPT6 TIM with >200-fold higher affinity than in its other measured interactions, making this the most stable complex supported by the IWS1 TND identified so far [6]. In mammalian cells, the IWS1:SPT6 complex was implicated in elongation-coupled placement of H3K36me3, a signature of active transcription written by the histone methyltransferase SETD2 [46]. In yeast, the ortholog Spn1 influences methylation of both H3K36 and H3K4 across the genome and acts as a histone chaperone at highly expressed genes [46, 47]. Interestingly, Spn1 is not required for the interaction of Spt6 with RNAP2, but rather plays a role in optimal Spt6 recruitment to chromatin [47]. In human cells, IWS1 localizes at actively transcribed genes, with peak occupancy close to the transcription start site [6] and, together with SPT6 and RNAP2, recruits mRNA processing factors including ALYREF/THOC4 and EXOSC10 to ensure proper mRNA maturation and export (Figure 3) [48].
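To give a sense of scale, a fold difference in affinity can be converted into a difference in binding free energy. The worked estimate below assumes that the >200-fold figure corresponds to a ratio of dissociation constants and that T = 298 K. Neither assumption is stated in the text, and the symbols K_d^other and K_d^SPT6 are introduced here only for illustration, so the result should be read as an order-of-magnitude guide.

```latex
\Delta\Delta G
  = RT \ln\!\left(\frac{K_{d}^{\mathrm{other}}}{K_{d}^{\mathrm{SPT6}}}\right)
  \approx \left(0.592\ \mathrm{kcal\,mol^{-1}}\right) \times \ln(200)
  \approx 0.592 \times 5.30\ \mathrm{kcal\,mol^{-1}}
  \approx 3.1\ \mathrm{kcal\,mol^{-1}}
```

Under these assumptions, a >200-fold preference corresponds to at least roughly 3 kcal/mol (about 13 kJ/mol) of additional binding free energy for the SPT6 TIM.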
A recent structure of the yeast RNAP2 elongation complex revealed that the Spn1 TND, preceded by a HEAT subdomain, is recruited to RNAP2 through association with the Spt5 NGN and KOW2 domains using an interface distinct from the one needed for association with TIMs, leaving the TIM binding site open for interaction with Spt6 [49]. Importantly, the IWS1-TND:SPT6-TIM interaction interface in the context of fully assembled elongation complexes and RNAP2 is structurally similar to the binary complex resolved by protein NMR or crystallography [6, 21, 22, 49], confirming that the binary TND:TIM interactions exist in the context of larger assembled complexes.
As the only structured domain of IWS1, the TND is localized in the middle of the IWS1 sequence and is surrounded by IDRs. The disordered region N-terminal to the IWS1 TND harbors a series of three unique TIMs that selectively interact with different TND-containing factors: while TNDs from TFIIS and ELOA compete for TIM1 of IWS1, TIM2 is recognized by the PP1-PNUTS phosphatase TND, and the TNDs of the H3K36me3 readers LEDGF and HRP2 compete for TIM3 [6]. Additionally, the IWS1 TND independently associates with SPT6 via the SPT6 TIM [6]. Therefore, IWS1 acts as a central factor that coordinates many elongation and RNA processing factors.
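The binding-site layout of IWS1 described above can be written down as a small data structure, which makes the competition between partners and the upper limit on simultaneous binding explicit. This is only a sketch: the dictionary layout and function names are illustrative assumptions, and the enumeration assumes one partner per site with no steric or allosteric coupling between sites, which the text does not establish.

```python
from itertools import product

# Binding sites on human IWS1 and the partners reported in the text.
# The layout of this dictionary is an illustrative assumption.
IWS1_SITES = {
    "TIM1": ["TFIIS", "ELOA"],   # TNDs that compete for TIM1
    "TIM2": ["PP1-PNUTS"],       # TND of the phosphatase complex
    "TIM3": ["LEDGF", "HRP2"],   # TNDs that compete for TIM3
    "TND": ["SPT6"],             # the IWS1 TND binds the SPT6 TIM
}


def competing_partners(sites):
    """Return only the sites for which more than one partner is listed."""
    return {site: partners for site, partners in sites.items() if len(partners) > 1}


def max_simultaneous_partners(sites):
    """Upper bound on partners bound at once, assuming one partner per site."""
    return sum(1 for partners in sites.values() if partners)


def possible_assemblies(sites):
    """Enumerate every combination with exactly one partner per site."""
    names = list(sites)
    return [dict(zip(names, combo)) for combo in product(*(sites[n] for n in names))]


print(competing_partners(IWS1_SITES))
print(max_simultaneous_partners(IWS1_SITES))
for assembly in possible_assemblies(IWS1_SITES):
    print(assembly)
```

With one partner per site, the sketch gives a maximum of four simultaneously bound partners, in line with the four transcription regulators that IWS1 is described as bringing together, and four alternative assemblies arising from the two competed sites.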
The TND in Mediator subunit MED26 enables molecular switching between initiation and elongation
The multiprotein Mediator complex is conserved in eukaryotes [50], where it serves as a scaffold for the assembly of a functional preinitiation complex and as a bridge communicating information from gene-specific regulatory proteins to the basal RNAP2 transcription machinery. The TND within this complex is found in the N-terminal portion of the metazoan-specific subunit MED26 [51]; hence, the use of a TND by Mediator is not conserved in yeast. In human cells, MED26 is recruited to the Mediator complex through an interaction between its C-terminal domain and the MED4/7 subunits, which leaves the TND accessible for interactions with accessory proteins [50]. A single binding site on the MED26 TND is employed in two distinct contexts: at different times, the TND either recruits the super elongation complex containing ELL/EAF family members through their disordered TIMs, or associates with TFIID and elongation complexes [6, 51, 52]. Mutation of the interaction site on the MED26 TND does not affect Mediator-dependent binding of TFIID to the promoter, and hence the MED26 TND is not exclusively responsible for Mediator's interaction with TFIID. However, this mutation does prevent Mediator from recruiting RNAP2 elongation factors [51]. Therefore, the MED26 TND was proposed to participate in molecular signaling activity that instructs RNAP2 to transition from initiation into productive elongation (Figure 3).
TNDs in H3K36me3 readers LEDGF (PSIP1) and HRP2 (HDGFL2) mediate regulation of chromatin structure
Both LEDGF and HRP2 are chromatin readers that each contain a TND and a Pro-Trp-Trp-Pro (PWWP) domain that recognizes H3K36me2/3 methylated histone tails [53, 54]. The TNDs of both proteins directly interact with TIMs in transcription regulators, including the KMT2A histone methyltransferase [55, 56], IWS1 [6, 33] and JPO2 [33]. LEDGF also directly interacts with TIMs in MED1 [7] and CDC7-ASK [7]. As described in further detail below, the HRP2 TND additionally binds a TIM in DPF3a [57]. All currently known interaction partners of LEDGF and HRP2 TNDs possess the FxGF portion in their TIMs, where it is essential for their interaction. This finding suggests the FxGF portion may be generally required for interaction with these TNDs.
Although LEDGF and HRP2 share some functional redundancy, the shared and unique roles of these two proteins remain to be fully understood. Both proteins influence RNAP2 transcription elongation by functioning as histone chaperones [58]. In differentiated myoblasts, these chromatin readers are required for efficient transcription elongation genome-wide [58], where they functionally substitute for loss of histone chaperone activity by the FACT complex at the +1 nucleosome. Moreover, transcription elongation defects similar to those caused by genetic depletion of HRP2 and LEDGF were observed upon mutation of the IWS1 TIM that selectively recognizes HRP2 and LEDGF TNDs [6]. Affected genes similarly displayed increased RNAP2 pausing near the +1 nucleosome, suggesting that the contribution of these H3K36me3 readers towards pause release near the +1 nucleosome is governed in part through their interaction with IWS1 (Figure 3).
The HRP2 TND also contributes to regulation of chromatin structure through its interactions with a TIM in DPF3a, a subunit of SWI/SNF chromatin remodeling complexes [57]. This activity of HRP2 is dependent on the H3K36me3 mark and is regulated by phosphorylation of DPF3a, which enhances interaction with the HRP2 TND. Importantly, HRP2:DPF3a activity is essential for myogenesis and muscle regeneration in vivo. Its ability to recruit SWI/SNF ATPase activity [57] suggests that LEDGF and HRP2 TNDs may have pleiotropic molecular functions.
The ELOA TND acts as a Mediator loading platform
Elongin is an RNAP2-associated complex that is conserved to nematodes. While the C-terminus of ELOA (Elongin A) enables interaction with other subunits of the Elongin complex, the N-terminal region of ELOA harbors a TND that is accessible for other factors and complexes [59, 60]. In vivo, ELOA regulates RNAP2 promoter proximal pausing [61] and acts as a substrate recognition subunit of a Cullin-RING E3 ubiquitin ligase that targets stalled RNAP2 and promotes RNAP2 polyubiquitination and proteasomal degradation [62].
Interestingly, the ubiquitination activity of ELOA is independent of its elongation regulatory activity in vivo [63]. Even though the TND of isolated ELOA is dispensable for transcriptional activation in vitro [59], it directly interacts with TIMs conserved in IWS1, PAF1, LEO1, MED13 and other transcription elongation regulators [6]. Importantly, the ELOA TND directly links this protein to purified Mediator and facilitates recruitment of the Mediator complex to promoters of stress response genes (Figure 3) [64], highlighting the role of this domain as a linker between different transcription regulatory machines.
The PP1-PNUTS phosphatase TND regulates protein stability and transcription rate
The PP1-PNUTS serine/threonine phosphatase is a negative regulator of RNAP2 elongation rate [65] that also plays a role in the transition between transcription stages and recycling of the transcriptional machinery [65–67], control of chromatin structure [68, 69], cell cycle progression [70–72] and many other cellular processes. The TND is located in the PPP1R10 subunit, also known as PNUTS, p99, FB19, or CAT53. PPP1R10 is a scaffold protein mediating the formation of the phosphatase [69]. PPP1R10 directly interacts with TIMs in transcription elongation regulators including IWS1, SPT6 and PAF1 [6], as well as the transcription factor MYC [34]. While the exact regulatory roles exerted by association of PPP1R10 with the transcription elongation factors IWS1, SPT6, or PAF1 remain unclear, the functional relationship between MYC and PPP1R10 has been evaluated due to the prominent role of MYC in cancer. MYC and PP1-PNUTS phosphatase interact across multiple cell types and co-occupy MYC target gene promoters [73]. Disruption of PP1 activity results in MYC hyperphosphorylation, which compromises its ability to bind to chromatin and leads to a reduction in MYC levels due to proteasomal degradation (Figure 3) [73]. Interestingly, PPP1R10 and MYC are co-amplified in breast cancer cells [73], suggesting that elevated PP1-PNUTS expression may confer a growth advantage by increasing MYC protein stability.
In addition to interactions mediated through the TND:TIM module, an N-terminal fragment of PPP1R10 containing the TND interacts directly with WDR82 and TOX4 [69]. While the PPP1R10 interaction with WDR82 prevents transcription–replication conflicts by promoting RNAP2 degradation [74], the association with TOX4 restricts pause release in early elongation and promotes late elongation [75], both via regulation of the phospho-state of the RNAP2 CTD. The exact mechanism by which WDR82 and TOX4 associate with PPP1R10 remains uncertain; however, the PPP1R10 TND may act as an interaction platform supporting these processes in a manner similar to its interaction with MYC.
The TND in disease and as a therapeutic target
TNDs and TIMs are present in proteins that represent the core of the transcriptional machinery, and the factors that harbor them are generally essential. Like other pan-essential proteins, TND- and TIM-containing factors are also infrequently associated with disease-related mutations, suggesting a degree of protection from mutation and underscoring their importance as regulators of basic cellular functions. However, there are many instances in which the endogenous activities of these proteins are hijacked in disease settings, as described below.
Viral mimicry
Due to their short length and simple interaction modes, short linear motifs like TIMs are often hijacked by viruses [76, 77]. Indeed, LEDGF and HRP2 have generated considerable interest because the TNDs of these H3K36me2/3 readers are hijacked by lentiviral integrases [78, 79]. These readers act as molecular tethers for viral pre-integration complexes, which biases viral integration into the bodies of actively transcribed genes in the host chromatin. Interestingly, HIV-1 integrase has higher affinity for the LEDGF TND compared with the HRP2 TND, and hence HIV-1 primarily uses LEDGF as an integration cofactor. However, HRP2 is also sufficient to guide site selection for viral integration in the absence of LEDGF [78]. Small-molecule antivirals known as LEDGINs that target the HIV-1 integrase and disrupt its interaction with the TND were successfully developed and currently serve as an important research tool with potential future clinical application [80].
Deregulation in cancer
As mentioned above, the PPP1R10 phosphatase subunit regulates MYC phosphorylation and stability by directly interacting with the TIM in MYC. Indeed, PP1-PNUTS expression is amplified in several cancer settings, including breast [73] and prostate cancers [81], where PPP1R10 protein levels are a predictor of poor prognosis. Therefore, the PPP1R10:MYC interaction represents an interesting potential target in these cancer settings. Additionally, the chromatin tethering roles of HRP2 and LEDGF are hijacked in acute leukemia, as their TNDs directly interact with oncogenic KMT2A fusions [55, 56, 82]. Interestingly, both of these TNDs have similar affinities for the TIMs in KMT2A fusions; however, while LEDGF is crucial for leukemic transformation [55, 83, 84], HRP2 is not required [56]. Separately, the interaction between LEDGF and JPO2 is a potential therapeutic target in medulloblastoma, due to its ability to promote AKT signaling [85]. HRP2 is also frequently overexpressed in human hepatocellular carcinoma tissues, where the HRP2:IWS1 complex promotes cell growth by enhancing expression of key oncogenes [86].
Small-molecule targeting of the TND
Targeting of specific TND:TIM complexes may be beneficial in diverse disease settings. However, successful inhibitors of protein–protein interactions frequently target deep grooves or pockets rather than shallow surfaces like the TIM binding sites on TNDs. Additionally, given the close resemblance of all TND:TIM protein complexes, designing small molecules to selectively target a single TND:TIM surface represents a challenge. Nevertheless, targeting the disease-related activities supported by TND:TIM modules may be achieved by degradation of full-length proteins using PROTACs [87–89] designed to recognize different parts of these proteins, or by design of small covalent molecules that selectively recognize disordered TIMs, similar to those that were recently developed for targeting MYC [90].
Concluding remarks
The conservation, diversification, and widespread utilization of TNDs underscore the functional importance of this ancient scaffold for assembly of the transcriptional machinery. As a result, addressing how the interactomes of individual TNDs are regulated to coordinate the transcription machinery represents a promising research direction. More generally, a key frontier for molecular biology is to decipher the interactions between disordered sequences and folded protein domains. The identification of TNDs as selective interaction platforms for disordered TIMs highlights one avenue by which disordered proteins can influence cellular activities with high specificity by engaging in selective, well-defined interactions. However, many more motifs have been predicted to engage a variety of folded domains [91]. For this reason, identifying the underlying logic and grammar of these many interactions, as well as their influence on subnuclear organization, remains an important goal.
Perspectives
TFIIS N-terminal domains (TNDs) are conserved and have diverse functional roles in many prominent regulators of transcription.
TNDs mediate specific interactions with intrinsically disordered motifs called TIMs found in other transcription regulators. Interactions between TNDs and TIMs guide the organization of the transcription machinery.
Identifying the functional grammar and spatial organization of TND:TIM interactions represents an important future direction. Additionally, TND:TIM contacts may enable structural characterization of higher-order assemblies mediated by these interaction modules.
Competing Interests
The authors declare that there are no competing interests associated with the manuscript.
Funding
This work was supported by grants from the NIH (R35GM137996 to H.C.H.), the Cancer Prevention and Research Institute of Texas (RR170036 to H.C.H.), GACR grant (22-03028S to V.V.) and Chemical Biology for Drugging Undruggable Targets (ChemBioDrug) grant CZ.02.1.01/0.0/0.0/16_019/0000729 (V.V.).
Author Contributions
K.C.: Conceptualization, Writing — Original Draft, Writing — Review and Editing, Visualization; V.V.: Writing — Review and Editing, Supervision, Funding acquisition; H.C.H.: Writing — Review and Editing, Visualization, Supervision, Funding acquisition.
Acknowledgements
We thank members of the Hodges lab (BCM) and the Laboratory of Structural Biology (IOCB) for helpful feedback during manuscript preparation.
Abbreviations
- CTD
C-terminal domain
- HEAT
Helical scaffold, acronym named for Huntingtin elongation factor 3 (EF3), protein phosphatase 2A (PP2A), and TOR1
- IDR
Intrinsically disordered region
- PDZ
Domain, acronym named for postsynaptic density protein 95 (PSD95), Drosophila disc large tumor suppressor (Dlg1), and zonula occludens-1 protein (ZO-1)
- PWWP
proline — tryptophan — tryptophan — proline motif containing domain
- RNAP2
RNA polymerase II
- SH2
Src Homology 2 domain
- SH3
Src Homology 3 domain
- SLiM
Short linear motif
- TIM
TND-interacting motif
- TND
TFIIS N-terminal domain
- WD40
tryptophan-aspartic acid dipeptide containing 40 amino acid long structural motif
- WDR
WD40 repeat
- WW
tryptophan-tryptophan containing structural motif