Transposable elements (TEs) constitute major fractions of plant genomes. Their potential to be mobile provides them with the capacity to cause major genome rearrangements. Those effects are potentially deleterious and have driven the evolution of epigenetic suppressive mechanisms controlling TE activity. However, beyond their deleterious effects, TE insertions can be neutral or even advantageous for the host, leading to long-term retention of TEs in the host genome. Indeed, TEs are increasingly recognized as major drivers of evolutionary novelties by regulating the expression of nearby genes. TEs frequently contain binding motifs for transcription factors, or capture such motifs during transposition, and spread them through the genome. Thus, TEs drive the evolution and diversification of gene regulatory networks by recruiting lineage-specific targets under the regulatory control of specific transcription factors. This process can explain the rapid and repeated evolution of developmental novelties, such as C4 photosynthesis and a wide spectrum of stress responses in plants. It also underpins the convergent evolution of embryo-nourishing tissues, the placenta in mammals and the endosperm in flowering plants. Furthermore, the gene regulatory network underlying flower development has also been largely reshaped by TE-mediated recruitment of regulatory elements, some of which have been preserved across long evolutionary timescales. In this review, we highlight the potential role of TEs as evolutionary toolkits in plants by showcasing examples of TE-mediated evolutionary novelties.
Introduction
Transposable elements (TEs) are parasitic self-replicating elements. They contribute to large fractions of host genomes as they continuously amplify and disperse themselves into different loci. TEs also have been reshaping genomes by causing major genome rearrangements [1–4].
There are two major classes of TEs that differ in their transposition mechanism [1,5]. Class I TEs, or retrotransposons, often have long terminal repeats (LTRs) at each end and transpose via an RNA intermediate that is reverse-transcribed and then integrates back into the genome. Non-LTR retrotransposons include long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs). Class II TEs, or DNA transposons, are often flanked by terminal inverted repeats (TIRs) and rely on DNA intermediates for transposition by employing a cut and paste mechanism. Helitrons are class II TEs that do not contain TIRs and transpose by a rolling circle replication mechanism by inserting a single-stranded circular DNA intermediate into a new genomic locus [6–8]. Both classes of TEs include autonomous and non-autonomous elements. While autonomous elements encode the complete set of enzymes required for transposition, non-autonomous elements rely on the enzymes of autonomous TEs to transpose [9]. Based on defined criteria, class I and II TEs can be subdivided into multiple superfamilies [10].
TEs wire transcriptional networks
TE remobilization can cause multiple effects; the most direct effects are mutations due to TE insertions into genes or regulatory elements and chromosome instability [2,9]. These potentially deleterious effects of TE remobilization have driven the evolution of multiple epigenetic mechanisms to recognize and silence TEs and TE-derived repetitive sequences [11–15]. Nevertheless, TE insertions can be neutral or even advantageous for the host, leading to long-term retention of TEs in the host genome. The latter case, referred to as TE exaptation or TE domestication, is a process in which TEs become part of the regulatory and/or coding region of host genes and contribute to diverse phenotypes and important functions [16–18].
Insertions of TEs can have a strong regulatory impact on the expression of adjacent genes. This can be mediated by multiple mechanisms, including the alteration of chromatin structure, disruption of cis-regulatory elements, or providing novel regulatory elements [9,19–23]. In this review, we will focus on the evolutionary contribution of TEs to generate new regulatory networks and thus developmental novelties by distributing regulatory elements in plants. For more details on TE-mediated gene regulation and other aspects of TE domestication we refer the reader to recent reviews covering these topics [19,24].
TEs have long been proposed as attractive vehicles accounting for the evolution of eukaryotic gene regulatory networks [25,26]. This idea has gained support from studies in animals and plants, linking the expansion of certain TE families with the dispersal of regulatory elements required for specific developmental programs [27–30]. One striking example in animals is the evolution of the mammalian placenta, the nourishing tissue supporting embryo growth, which has been linked to the activities of endogenous retroviruses, or LTR retrotransposons; many families of which have been domesticated to generate novel regulatory genes or to convey regulatory elements [31,32]. Out of many placenta-related TEs, two distinct groups of retroviral LTR retrotransposons, THE1B and RLTR13D5, have contributed to the lineage-specific dispersal of hundreds of placenta-specific regulatory elements in anthropoid primates and mice, respectively, which likely contributed to the rapid morphological diversification of the mammalian placenta (Figure 1) [33,34].
Convergent evolution of the mammalian placenta and the Arabidopsis endosperm.
In the Arabidopsis lineage, an ancestral helitron containing transcription factor binding motifs (TFBMs) for the PHE1 MADS-box transcription factor transposed into the promoter regions of other genes, transferring the TFBM and bringing those genes under the regulatory control of PHE1 [35]. In mammals, the mouse ancestral genome, after divergence from the rat lineage, acquired an LTR retrotransposon harboring three TFBMs in the LTR region recognized by Cdx2/Eomes/Elf5. This LTR family, RLTR13D5, became activated during mouse lineage evolution and subsequently distributed the three TFBMs into multiple locations, recruiting lineage-specific targets into the placental regulatory network [33]. In the anthropoid primate lineage, a different LTR family, THE1B, became amplified and plays a regulatory role for a suite of downstream genes. Noticeably, the LTR edge of one THE1B member joined with local genomic sequences and gave rise to a DLX3 binding motif, which is recognized by DLX3 and activates the key placental gene CRH in primates [34].
Interestingly, the convergent evolution of the endosperm, the nourishing tissue in flowering plants, was also promoted by TE transpositions (Figure 1). The endosperm-specific type I MADS-box transcription factor PHERES1 (PHE1) in Arabidopsis thaliana binds to two kinds of motifs, one resembling canonical MADS-box associated CArG motifs, while the other representing partially modified CArG motifs, which are both enriched in helitron elements [35]. Helitron sequences in the A. thaliana genome have a higher density of CArG motif sequences than other TE sequences and the rest of the genome, consistent with their proposed role as vehicles of such binding motifs. Many targeted genes with helitron-derived PHE1 binding motifs have high expression in the endosperm, reflecting endosperm-specific regulation by PHE1. Helitrons containing PHE1 binding motifs are present in the promoter regions of several PHE1-targeted orthologs in the Brassicaceae, indicating ancestral insertion events [35]. It is tempting to speculate that the distribution of type I MADS-box transcription factor binding motifs by helitron TEs contributed to endosperm evolution by allowing the recruitment of crucial developmental genes into a common transcriptional network, analogous to the TE-driven diversification of the mammalian placenta (Figure 1).
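The comparison of CArG motif density between helitrons and other sequences can be illustrated with a minimal sketch. It assumes the canonical CArG-box consensus CC[A/T]6GG and uses hypothetical toy sequences; the function name `motif_density` and the per-kilobase normalization are illustrative choices, not the procedure used in [35], which also considered partially modified CArG motifs.

```python
import re

# Canonical CArG-box consensus: CC, six A/T bases, GG.
# The pattern is its own reverse complement, so one strand suffices.
# The lookahead lets overlapping occurrences be counted separately.
CARG = re.compile(r"(?=CC[AT]{6}GG)")


def motif_density(seq: str, pattern: re.Pattern = CARG) -> float:
    """Return motif occurrences per kilobase of sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    hits = sum(1 for _ in pattern.finditer(seq))
    return 1000.0 * hits / len(seq)


# Hypothetical toy sequences, not real helitron or genomic data.
helitron_like = "ACGTCCATATATGGTTTCCAAAAAAGGACGT"
background = "ACGTACGTACGTACGTACGTACGTACGTACG"

print(motif_density(helitron_like), motif_density(background))
```

In a real analysis the two inputs would be the concatenated helitron annotations and a matched background set, and the density difference would need a statistical test rather than a direct comparison.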
Besides the helitron-mediated amplification of binding motifs targeted by the type I MADS-box transcription factor PHE1, the binding motifs recognized by the type II MADS-box transcription factor SEPALLATA3 (SEP3) were also transposed by TEs (Figure 2). This was initially recognized in A. lyrata [36] and, based on a new method to detect degenerated TE sequences, also found in A. thaliana [37]. This method revealed large numbers of remnant TEs in A. thaliana that are close to genes and enriched for transcription factor binding motifs, in agreement with the hypothesis that domesticated regions of TEs are less likely to decay [37]. Importantly, some of the TE-derived regulatory elements are conserved across rosids and overrepresented in the flowering gene regulatory network, suggesting a deeply conserved functional role for these sequences since the spread of flowering plants in the Cretaceous [37]. In contrast to A. thaliana, many TEs with SEP3-recognized motifs in A. lyrata have not yet decayed and could therefore be detected [36], suggesting recent, lineage-specific transposition events. Indeed, a repetitive element family similar to Copia-type LTR retrotransposons and containing SEP3-bound motifs is present with only nine copies in A. thaliana, contrasting with more than 169 copies in A. lyrata. It is nevertheless also possible that this repeat family was lost in A. thaliana [36]. Genes specifically controlled by TE-transposed SEP3 binding motifs in A. lyrata have potential functional roles in reproductive development [36]; whether they determine morphological differences between the two species, such as flower organ size, remains an attractive hypothesis to be tested.
TEs rewired the regulatory network modulating flower development through transposition during rosid evolution.
Type II MADS-box transcription factors (TFs), including SEP3, are a major group of TFs regulating flower development. Their binding motifs (TFBMs) have been amplified by several TE families during the evolution of rosids and Brassicaceae lineages. New genes are continuously recruited into the flower regulatory network by novel TFBMs located in identified TE sequences. Some of the ancestral TEs have decayed, while the TFBMs were retained when functionally relevant [37]. Some TEs recently amplified and generated novel targets in A. lyrata, but not in A. thaliana [36].
Transposition of MADS-box transcription factor binding motifs is not exceptional. A recent study proposed that TE transposition underpins the evolution of C4 photosynthesis among grasses [38]. C4 photosynthesis repeatedly evolved from C3 photosynthesis and is characterized by higher efficiency in light, water, and nitrogen usage [39–41]. This is achieved by a special leaf morphology, two types of chloroplasts, as well as a series of enzymes specifically expressed in different leaf tissues of C4 plants [40–42]. Based on a systematic genome-wide comparison of maize and rice, it was found that the evolution of C4 photosynthesis was associated with the recruitment of more than 1000 C4-associated motifs from pre-existing cis-regulatory elements originally belonging to non-photosynthetic genes into photosynthetic genes. Strikingly, more than 60% of the potentially recruited motifs are located in TEs, suggesting that TE transposition has facilitated the distribution of C4-associated motifs. Both retrotransposons and DNA transposons, in particular short miniature inverted-repeat TEs (MITEs), were involved in this process (Figure 3). Despite the large differences between C3 and C4 photosynthesis, C4 photosynthesis has independently and rapidly evolved in more than 60 lineages [40]. The dispersal of pre-existing motifs by TEs provides a compelling mechanism underpinning the repeated evolution of C4 photosynthesis. Importantly, C4 photosynthesis is an adaptation of the C3 pathway to dry environments that improves photosynthetic efficiency and minimizes water loss [40,41,43]. Therefore, the TE-driven evolution of C4 photosynthesis provides a striking example of how TE transposition benefits host adaptation.
TEs promote the transition from C3 to C4 photosynthesis by transposition of transcription factor binding motifs.
TEs distributed C4-related transcription factor binding motifs (TFBMs) from pre-existing cis-regulatory elements of non-photosynthetic genes, generating a novel regulatory network of C4 photosynthetic genes [38].
TEs promote adaptation to environmental stresses
The response to stress is shaped by natural selection, leading to changes in the transcriptional regulation of a variety of genes. Based on her fundamental work in maize, Barbara McClintock was the first one to propose that stress activates TEs, which generates mutations that promote the adaptation to stress [44]. It is now generally accepted that TEs indeed play a key role in translating environmental cues into changes at the genomic level (Figure 4). The ability of TEs to respond to stress is in many cases mediated by cis-regulatory elements present in the LTR regions of retrotransposons that trigger expression of TEs in response to a particular stimulus [45]. Several retrotransposons contain regulatory sequences with similarity to well-characterized motifs required for the activation of stress-responsive genes [46–51]. Importantly, the insertion of LTR TEs nearby genes can also confer stress-mediated activation to such genes [52,53], revealing that gene promoters co-opted sequences from LTRs for their own regulation.
TEs contribute to stress adaptation.
TEs with stress-responsive transcription factor binding motifs (TFBMs) can amplify themselves in response to environmental stresses, thereby distributing TE-born responsive elements to new locations and generating new targets controlled by the stress responsive transcription factors (TFs).
In addition to LTR retrotransposons, DNA transposons such as MITEs also have been demonstrated to confer stress-responsive transcription to proximal genes, such as the highly proliferative mPing MITE in rice, which controls a stress-inducible network of genes conferring tolerance to cold and salt [54]. mPing is likely to function as an enhancer element, since it is able to act regardless of whether it is inserted at the 5′ or the 3′ region of target genes. MITEs have rapidly and massively amplified throughout eukaryotic genomes, suggesting that similar stress-responsive networks have evolved in other organisms as well [54].
In maize, different TE families are associated with stress-responsive gene expression, including all major TE families, such as TIR DNA transposons, Gypsy and Copia LTR elements, and LINEs [55]. Most of the TEs associated with stress-responsive genes contain the consensus motif for the stress-responsive DREB/CBF transcription factors. Notably, allelic variation for TE insertions is strongly associated with variation in stress-responsive gene expression, linking TEs to adaptive stress responses [55].
TEs do not only confer stress-responsive gene activation, but can potentially amplify the stress response by starting to proliferate under stress. For example, the tomato Copia LTR retrotransposon family Rider accumulates transcripts and transposition intermediates in the form of extrachromosomal DNA in response to drought stress [56]. There are several cis-regulatory elements in the LTR regions of Rider linked to abscisic acid (ABA) signaling that may confer drought responsive expression to neighboring genes. It is tempting to speculate that drought-responsive transcriptional activation induces transposition and by that amplifies drought-responsive regulatory elements throughout the genome. Similarly, ONSEN, another Copia retrotransposon in A. thaliana, has acquired a heat-responsive element that is recognized by heat shock transcription factors (HSFs), resulting in heat-responsive transcription and production of extrachromosomal DNA that can successfully integrate in epigenetically compromised mutants [52,57]. Heat-responsiveness in the Brassicaceae COPIA families has recurrently evolved, suggesting an adaptive advantage [53].
That TEs contain binding motifs for stress-response transcription factors and regulate a wide spectrum of stress-responsive genes is a conserved phenomenon and also found in yeast and animals. In the fission yeast Schizosaccharomyces pombe, the LTR retrotransposon Tf1 frequently targets stress response genes and enhances the expression of downstream genes, likely through the recruitment of ATF/CREB stress response factors that are modulators of the heat shock factor (HSF) transcription complex [58,59]. In the fruitfly Drosophila melanogaster, binding motifs for several transcription factors related to a variety of stress responses have been amplified by TEs; among them is also HSF [60]. Also, in Caenorhabditis nematodes, the binding motifs for HSF were transposed by species-specific helitrons [61]. In human and likely other primate genomes, Alu repeats, which occupy large proportions of primate genomes, harbor HSF binding motifs and regulate the heat shock response through an antisense-mediated mechanism [62]. Considering the conserved and critical role of HSFs in the regulation of the heat-response, the co-option of TE-derived heat shock elements seems deeply conserved since the origin of eukaryotes. Gene regulatory networks underlying the response to other stresses can potentially have the same evolutionary history. The transcriptional network underlying innate immunity has also been rewired through TE domestication. MER41-like LTR retrotransposons have dispersed the binding motifs of several interferon-inducible transcription factors and thus reshaped the immunity regulatory network independently during their colonization of multiple mammalian genomes [63]. These examples demonstrate that the transcriptional networks responsive to environmental stresses, both abiotic and biotic, have been driven by TEs throughout eukaryote evolution.
TEs co-ordinate biosynthetic genes
Biosynthetic gene duplication and functional specialization is an important strategy of plant defense responses. However, how duplicated genes are wired into existing regulatory networks remained a long-standing open question. TEs can coordinate biosynthetic genes in the same metabolic pathway synchronized by a single master regulator. The TE-mediated recruitment of the duplicated gene CYP82C2 into the indole-3-carbonylnitrile (ICN) biosynthetic pathway provides a nice example addressing this question (Figure 5). The ICN pathway produces tryptophan-derived defence metabolites upon pathogen attack in A. thaliana. The LINE retrotransposon (EPCOT3) transferred a WRKY33-binding motif close to CYP82C2, the key enzyme for the last step in 4-hydroxy-ICN biosynthesis. WRKY33 directly activates the expression of other genes in this pathway, thus, the control of CYP82C2 by WRKY33 was the required key step for the production of the inducible defense metabolite 4-hydroxy-ICN [64]. We speculate that there are probably many other cases where the TE-mediated recruitment of a newly duplicated gene into a regulatory network enriched chemical diversity, producing lineage-specific metabolites adaptive for plant immunity.
A LINE retrotransposon allowed WRKY33 to regulate CYP82C2 in response to pathogen infection.
CYP82C2 and CYP82C4 are tandem duplicates with conserved biochemical function. After gene duplication, CYP82C2 in Arabidopsis thaliana acquired WRKY33-related motifs by insertion of a LINE retrotransposon. This generated a new pathogen defense pathway controlled by WRKY33 [64,106].
Regulatory motifs potentially facilitating DNA TE transpositions
The E2F family of transcription factors has a broad spectrum of functions in the regulation of cell cycle related processes, including DNA repair, cell proliferation, and differentiation in both animals and plants [65,66]. In line with the fundamental functions controlled by E2F transcription factors, E2Fs are highly conserved among organisms, as are the targeted binding motifs [67]. Despite the fundamental function of E2Fs and the high level of conservation of this pathway, E2F motifs were extensively amplified in the Brassicaceae [68]. In A. thaliana, more than 70% of E2F binding motifs are located in TEs, especially MITEs. Interestingly, out of all potential E2F recognition motifs fitting the E2F consensus sequence, only one is overrepresented in TEs and dominant in the genome, identifying it as the motif amplified through TE transposition. The capture of the E2F motifs by TEs likely occurred in an ancestral Brassicaceae genome, and the motifs were then amplified to different extents during Brassicaceae evolution. Nevertheless, while many of the transposed E2F motifs are bound by E2F, only few of them are close to promoters and therefore of potential regulatory relevance [68]. This conclusion is supported by a study identifying binding motifs for the E2F-interacting cell-fate master regulator, RETINOBLASTOMA RELATED 1 (RBR1) [69]. RBR1 binds to E2F motifs that have been amplified by TEs, but apparently does not regulate genes downstream of those E2F-TE motifs [69]. Therefore, it seems unlikely that binding of E2F/RBR1 to distally located transposed motifs has regulatory relevance. Rather than being functionally relevant, it is possible that facilitated transposition of E2F motifs is a consequence of E2F/RBR1 binding to those sites, which may promote DNA replication and/or repair and thus facilitate transposition of DNA TEs [69].
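A figure such as the proportion of E2F binding motifs located in TEs comes down to an overlap count between motif coordinates and TE annotations. The following is a minimal sketch of that count, assuming sorted, non-overlapping, half-open TE intervals on a single chromosome; the function name `fraction_in_tes` and all coordinates are hypothetical and do not reproduce the analysis in [68].

```python
from bisect import bisect_right


def fraction_in_tes(motif_positions, te_intervals):
    """Fraction of motif positions that fall inside any TE interval.

    te_intervals: half-open (start, end) pairs on one chromosome,
    assumed to be sorted by start and non-overlapping.
    """
    if not motif_positions:
        return 0.0
    starts = [start for start, _ in te_intervals]
    inside = 0
    for pos in motif_positions:
        # Index of the last interval whose start is <= pos.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < te_intervals[i][1]:
            inside += 1
    return inside / len(motif_positions)


# Hypothetical coordinates for illustration only.
tes = [(100, 200), (500, 650), (900, 1000)]
motifs = [120, 150, 300, 510, 640, 650, 950, 999, 1200, 180]

print(fraction_in_tes(motifs, tes))
```

A genome-wide version would repeat this per chromosome and would typically treat motifs as intervals rather than single positions.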
TEs facilitate transcription factor network evolution
The emergence of transcription factor binding motifs by de novo accumulation of mutations can occur on a small evolutionary timescale; time estimates for a given 5-mer to emerge in at least one of all human promoters are around 7500 years, for 8-mers around 350 000 years and for 10-mers around 700 000 to 4.8 million years, depending on the sequence context [70]. However, the actual binding of transcription factors may not simply rely on the presence of the motifs, but on additional specificity determinants outside the core motifs. This can include additional sequence features, other transcription factor binding motifs, or the general chromatin landscape determining access of transcription factors to their binding motifs [71–73]. Furthermore, once a transcription factor binding motif and the appropriate surrounding environment have evolved, in order for this transcription factor to control a network of genes, these binding motifs need to spread through the genome. This process has likely been facilitated by TEs. If TEs insert close to pre-existing transcription factor binding motifs, they can transpose these motifs to new locations, generating new transcriptional networks controlled by these transcription factors [6,38,74–76]. This process can explain the rapid and repeated evolution of transcriptional networks, as discussed in the previous sections.
The regulatory impact of TEs transposing binding motifs depends on their position of insertion. Many TEs have evolved mechanisms to favor integration sites that maximize their chance of propagation. This can be gene-poor heterochromatic regions for some TE families, or open, transcriptionally active regions for others [23,77]. Examples of TEs inserting favorably in transcriptionally active regions in plants are Mutator elements in maize [78] and mPing MITEs in rice [54], while helitrons preferentially insert in intergenic regions [79] and LTR retrotransposons show a higher preference to insert into older copies of the same family [80]. Nevertheless, it is important to note that selection may obscure initial insertion preferences, so the picture we have now may be biased [23].
It is predictable that this process occurs more frequently in lineages with higher transposition activities. In the case of SEP3, a much higher proportion of SEP3-bound sequence motifs was located in transposons in A. lyrata than in A. thaliana [36]. This coincides with the observation that TE activity is much higher in A. lyrata than in A. thaliana [81]. Thus, the advantages of effective TE domestication can potentially balance out the negative impact of TE activities.
The detection of TE-mediated amplification of transcription factor binding motifs
Current knowledge of TE-associated transcription factor binding motifs is mainly derived from two approaches. Some studies first experimentally identify transcription factor binding motifs (e.g. by chromatin immunoprecipitation followed by sequencing) and then test for the enrichment of these motifs in TEs, while other studies directly scan TEs for the sequences of known transcription factor binding motifs [35,36,82–84]. The first approach provides the actual picture of transcription factor binding and the realized regulatory network. The second approach requires subsequent validation to rule out false positives, as many putative motifs are likely not utilized [35,36,68]. Nevertheless, the binding of a transcription factor to motifs does not necessarily imply a regulatory role. Functional validation is required to prove the relevance of specific motifs. Genome-editing tools such as the CRISPR/Cas9 system make it possible to test the functional relevance of TEs and binding motifs on putative target genes, as demonstrated by [85,86].
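The enrichment test in the first approach can be sketched as a one-sided hypergeometric test. This is an illustrative assumption, not the statistic used in the cited studies: the genome is treated as a set of equally sized windows, and the question is whether windows carrying a bound motif overlap TEs more often than expected by chance. The function name and all counts are hypothetical.

```python
from math import comb


def te_enrichment_p(n_windows, n_te_windows, n_bound, n_bound_in_te):
    """One-sided hypergeometric P(X >= n_bound_in_te).

    n_windows:      total genomic windows considered
    n_te_windows:   windows overlapping an annotated TE
    n_bound:        windows containing a bound motif
    n_bound_in_te:  bound windows that also overlap a TE
    """
    upper = min(n_te_windows, n_bound)
    # Sum exact integer counts first, then divide once.
    favourable = sum(
        comb(n_te_windows, i) * comb(n_windows - n_te_windows, n_bound - i)
        for i in range(n_bound_in_te, upper + 1)
    )
    return favourable / comb(n_windows, n_bound)


# Hypothetical counts for illustration only.
p_value = te_enrichment_p(n_windows=10, n_te_windows=5,
                          n_bound=5, n_bound_in_te=5)
print(p_value)
```

A small p-value indicates only that bound motifs are concentrated in TEs. As noted above, it says nothing about whether the bound sites regulate nearby genes.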
The contribution of TEs to the genomic landscape of known transcription factor binding motifs has been well studied in human [82,83,87]. A comparative genomic analysis of orthologous transcription factors and their binding motifs between human and mouse genomes revealed substantial TE-mediated regulatory network divergence [88]. A pioneering study in plants comparing MITE-associated transcription factors in the two closely related species peach and Prunus mume came to a similar conclusion; in both species MITEs transposed the same transcription factor binding motifs, but the extent of this translocation and the generated networks differed [84].
Importantly however, many TE-transposed transcription factor motifs have probably escaped detection, because the TE sequences degenerate over time making their identification difficult [37]. Therefore, the majority of TE-derived binding motifs that can be easily detected are likely to be young, and many of them may be transient over evolutionary time. Most of them will be functionally neutral and disappear during species evolution, and only those conveying fitness advantage will remain. Thus, conserved TE-derived binding motifs are likely to reveal key elements of transcriptional networks.
A comprehensive knowledge of ancestral TEs is required to fully appreciate the extent to which TEs contributed to generating transcriptional networks and thus to developmental novelties. Two recent studies identified ancient TEs by using novel computational approaches that detect degenerated TE sequences. This raised the proportion of TE-derived sequences in the A. thaliana genome from the currently annotated ∼20% to as much as 50% [37,89]. Compared with previously identified, putatively young TEs, remnant sequences of ancient TEs are shorter, but enriched for transcription factor binding motifs as a consequence of functional conservation [37]. Some of the TE-derived binding motifs are deeply conserved across rosids, indicating shared fundamental regulatory pathways established by ancestral TE activities. Some remnant TEs contain multiple binding motifs associated with different transcription factors, illustrating the complexity of regulatory networks built by TEs [37]. These observations strongly support the role of TEs in rewiring gene regulatory networks with long-term evolutionary impact.
The evolutionary significance of TE-mediated dispersal of transcription factor binding motifs
As outlined above, TEs are, undeniably, major contributors to developmental innovation and adaptive responses to environmental conditions. What has been underappreciated thus far, is the potential role of TE-mediated transfer of transcription factor binding motifs in concert with gene duplication. After gene duplication, there are many potential trajectories for regulatory divergence. Regulatory neofunctionalization contributes to the retention of duplicated genes, where one duplicate can acquire novel regulatory elements and subsequently adopt new expression profiles [90–92]. Allopolyploidization, the merger of two genomes due to hybridization followed by genome doubling gives rise to large numbers of gene duplicates and is accompanied by relaxed TE repression and bursts of novel TE insertions [16,93]. Autopolyploidization leads to whole genome duplication without hybridization and transcriptome shock, nevertheless, it generates many gene duplicates with relaxed purifying selection allowing TE over-accumulation [94]. Thus, there seems to be a large potential for TEs in distributing transcription factor binding motifs and driving divergence and functional retention of homeologs in polyploids that remains to be investigated. Besides polyploidization, many small-scale duplication events, such as tandem duplications, produce targets for novel TE insertions that convey new regulatory functions. As discussed above, the WRKY33-binding site in the TE-derived EPCOT3 contributed to the regulatory neofunctionalization of the duplicated gene CYP82C2 in A. thaliana, rendering it non-redundant with other CYP82C paralogs in the tandem cluster [64]. Similarly, in the regulatory network of PHE1 in the A. thaliana endosperm, some of the helitron-recruited PHE1 binding motifs are in recent duplicates in the Brassicaceae lineage that diverged from their paralogs [35]. 
The contribution of TEs to the retention and divergence of duplicated genes could be an interesting topic of further study.
Many signaling components and transcription factors, such as HOX and TCPs, which control organ patterning in animals and plants, respectively, have been repeatedly recruited to regulate comparable morphological traits or similar developmental patterns. This type of evolutionary convergence has been termed ‘deep homology’, and the genes are referred to as ‘evolutionary toolkits’ [95–97]. An intriguing example is the transcription factor family CYCLOIDEA2 in flowering plants, which has been repeatedly recruited for the establishment of bilateral flower symmetry from the ancestral radial state [98,99]. Based on the discussed ability of TEs to transpose transcription factor binding motifs, we propose that TEs play the role of evolutionary toolkits by transferring binding motifs of major developmental regulators. This could explain the repeated and rapid lineage-specific evolution of convergent phenotypic innovations such as bilateral flower symmetry or C4 photosynthesis. Thus, TEs are powerful non-coding toolkits that construct complex developmental regulatory networks and have thereby contributed to species adaptation and radiation throughout the evolutionary history of all eukaryotes.
If TEs act as evolutionary toolkits, is it possible to take advantage of them as novel genetic tools for crop improvement? For example, the application of inducible mPing could potentially accelerate stress-tolerance breeding in rice [100]. Likewise, a synthetic stress-activated TE carrying a binding motif for a stress-responsive transcription factor could possibly generate novel candidate targets that are regulated in response to stress. By chemical suppression of the epigenetic silencing machinery, rice LTR retrotransposons such as Houba and Tos17 could be successfully mobilized, leading to novel insertions [101–103]. TEs are also known to have facilitated crop domestication naturally. A famous example is the retrotransposon Hopscotch, which inserted in the upstream region of the well-known maize domestication gene teosinte branched1 (tb1) and acts as an enhancer of gene expression. This promoted increased apical dominance in maize compared with its progenitor, generating a phenotype preferred for agricultural purposes [104]. We propose that by controlled unleashing of specific TEs, novel and improved traits could be generated in a much shorter time than conventional breeding could achieve. Thus, domestication by domesticated TEs offers large future potential.
Perspectives
Importance of the field: Previously considered genomic junk, TEs are increasingly recognized as major drivers of genome evolution. Their wide presence in prokaryotes and eukaryotes and their maintenance over long evolutionary timescales indicate that TEs have important regulatory roles, which have started to be unraveled in recent years.
Current thinking: TEs frequently contain regulatory elements that are recognized by transcription factors. Mobile TEs can transfer these regulatory sites, amplifying them through the genome and thus recruiting nearby genes into the regulatory network of the corresponding transcription factors. This process occurred repeatedly throughout the evolutionary history of diverse lineages, giving rise to a wide spectrum of phenotypic novelties, promoting adaptation and diversification.
Future directions: That TEs generate transcriptional networks and promote developmental novelties is supported by several key studies. However, to fully appreciate the extent to which TEs promoted evolutionary novelties, a systematic investigation of TE-transferred transcription factor binding motifs and their functional relevance remains an important future endeavor. Newly developed bioinformatics tools to detect TE remnants [37] and improved tools to map transcription factor binding motifs [105] provide the technical basis to approach these exciting questions.
Competing Interests
The authors declare that there are no competing interests associated with the manuscript.
Funding
This work was supported by a grant from the Swedish Research Council (to C.K.) and support from the Göran Gustafsson Foundation for Research in Natural Sciences and Medicine (to C.K.).