Abstract
Cytosine DNA methylation is prevalent throughout eukaryotes and prokaryotes. Although it is most commonly thought of as being localized to dinucleotide CpG sites, non-CG sites can also be modified. Such non-CG methylation is widespread in plants, occurring at trinucleotide CHG and CHH (H = A, T, or C) sequence contexts. The prevalence of non-CG methylation in plants is due to the plant-specific CHROMOMETHYLASE (CMT) and RNA-directed DNA Methylation (RdDM) pathways. These pathways have evolved through multiple rounds of gene duplication and gene loss, generating epigenomic variation both within and between species. They regulate both transposable elements and genes, ensure genome integrity, and ultimately influence development and environmental responses. In these capacities, non-CG methylation influences and shapes plant genomes.
Introduction
Methylation of cytosines is a widespread modification of DNA and is found across both eukaryotes [1,2] and prokaryotes [3]. Because it is a modification of DNA itself, it can persist through both mitosis and meiosis, facilitating its own replication. In eukaryotes, DNA methylation plays critical roles in diverse biological processes, but notably, it is associated with gene and transposon silencing [1,4]. DNA methylation in plants is distinct from that in mammals, having a fractionated pattern where methylation is predominantly found outside of genic regions [5]. Plants and mammals also differ in the underlying sequence contexts. Methylation of cytosines in the dinucleotide CpG (CG) context is found throughout plants, animals, and fungi. Maintenance across these diverse taxa is carried out by the highly conserved DNA METHYLTRANSFERASE 1 (DNMT1), also called METHYLTRANSFERASE 1 (MET1) in plants [6]. In 1981, it was shown that DNA methylation in plants can be found at symmetrical trinucleotide CHG sites (H = A, C, or T) [7], and later, in 1994, it was found to also occur at asymmetrical trinucleotide CHH sites [8]. Recently, non-CG methylation has been found in specific cell-types in humans and mice [9,10] (e.g. embryonic stem cells [10] and brain [11,12]) and may be important during differentiation [13]. The prevalence of non-CG methylation in plants is due to several plant-specific pathways and enzymes. These involve either the CHROMOMETHYLASE (CMT) family of DNA methyltransferases or the RNA-directed DNA Methylation (RdDM) pathway [14–18]. Here we review the mechanisms, evolution, and biological roles of non-CG methylation in plants.
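The three sequence contexts defined above can be made concrete with a short sketch. The Python below is only an illustration of the CG/CHG/CHH definitions (H = A, C, or T); the function names are hypothetical and are not taken from any published pipeline. It classifies each cytosine by its two downstream bases and reads the opposite strand through the reverse complement.

```python
from typing import Iterator, Optional, Tuple

# Complement table used to read the opposite strand.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def cytosine_context(seq: str, pos: int) -> Optional[str]:
    """Classify the cytosine at seq[pos] as 'CG', 'CHG', or 'CHH'.

    Returns None if seq[pos] is not C, if too few downstream bases are
    available, or if a downstream base is ambiguous (not A, C, G, or T).
    """
    if seq[pos] != "C":
        return None
    downstream = seq[pos + 1:pos + 3]
    if downstream[:1] == "G":
        return "CG"
    if len(downstream) < 2 or set(downstream) - set("ACGT"):
        return None
    # The first downstream base is now known to be H (A, C, or T).
    return "CHG" if downstream[1] == "G" else "CHH"


def contexts_both_strands(seq: str) -> Iterator[Tuple[int, str, str]]:
    """Yield (forward-strand position, strand, context) for each cytosine."""
    n = len(seq)
    rc = reverse_complement(seq)
    for i in range(n):
        ctx = cytosine_context(seq, i)
        if ctx:
            yield i, "+", ctx
    for j in range(n):
        ctx = cytosine_context(rc, j)
        if ctx:
            # Position j on the reverse complement maps to n - 1 - j.
            yield n - 1 - j, "-", ctx
```

Applied to the trinucleotide CAG, this sketch reports a CHG cytosine on each strand, which is what makes CHG a symmetrical context; applied to CAA, it reports a single CHH cytosine on the forward strand only, reflecting the asymmetry of CHH sites.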
CMTs
CMTs are cytosine methyltransferases characterized by the presence of a bromo-adjacent homology (BAH) and a CHRomatin Organization MOdifier (CHROMO) domain between the cytosine methyltransferase catalytic motifs I and IV [19,20]. Some homologous sequences in green algae, however, lack the CHROMO domain [21]. Phylogenetically, CMTs can be separated into CMT1, CMT2, CMT3, ZMETs (ZEA METHYLTRANSFERASE), and the homologous CMTs (hCMT) α/β (Figure 1A). This last group, hCMTα/β, is found in non-angiosperms. A duplication event at the base of the angiosperms, prior to the divergence of monocots and eudicots, gave rise to CMT2 and the clade that includes CMT1, CMT3, and ZMET. Amborella trichopoda, which is sister to all flowering plants, has a CMT2 gene and one CMT sister to CMT1/CMT3/ZMETs. ZMETs, named in reference to work from Zea mays [22], form a monophyletic group containing monocots, monocots/commelinids, and magnoliids [21]. In eudicots, a further duplication, perhaps the γ whole-genome duplication event, gave rise to CMT1 and CMT3 [21].
In Arabidopsis thaliana, CMT2 and CMT3 have different sequence preferences. CMT2 targets CHH sites [18]; however, recent work suggests greater specificity, with preferential methylation of CAA/CTA in A. thaliana and CAA/CAT in Solanum lycopersicum [23,24]. CMT3 targets CHG [15,22], but preference may be given to CAG/CTG, as CCG methylation is typically lower in heterochromatin. CMT1 has not been studied outside of A. thaliana and is generally considered dispensable in A. thaliana, as many ecotypes contain alleles of CMT1 that result in a truncated protein and do not display any apparent effects on DNA methylation [19,25]. The ZMETs function much like CMT3 in CHG methylation, and mutants in both Z. mays and Oryza sativa have a number of developmental defects [22,26]. Interestingly, in Z. mays, ZMET2 also appears to methylate CHH at some loci [23,24]. The moss Physcomitrella patens has a single CMT gene, which falls in the hCMTα/β clade and, based on Ppcmt mutants, functions in CHG methylation [27].
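Subcontext preferences such as CAG/CTG versus CCG can be examined by pooling read counts per trinucleotide. The sketch below is a minimal illustration, assuming a hypothetical input of (trinucleotide, methylated reads, total reads) tuples; the function name and input layout are assumptions made here, and the level computed is a simple read-weighted ratio rather than the specific analysis used in the cited studies.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def subcontext_levels(
    sites: Iterable[Tuple[str, int, int]]
) -> Dict[str, float]:
    """Pool reads per trinucleotide and return methylated / total reads.

    Each site is (trinucleotide, methylated_reads, total_reads).
    Trinucleotides with no read coverage are omitted from the result.
    """
    methylated = defaultdict(int)
    total = defaultdict(int)
    for trinucleotide, m_reads, t_reads in sites:
        methylated[trinucleotide] += m_reads
        total[trinucleotide] += t_reads
    return {
        tri: methylated[tri] / total[tri]
        for tri in total
        if total[tri] > 0
    }


# Arbitrary toy counts for illustration only, not real measurements.
toy_sites = [("CAG", 8, 10), ("CAG", 2, 10), ("CCG", 1, 10), ("CAA", 0, 0)]
levels = subcontext_levels(toy_sites)
```

Pooling reads before dividing weights each site by its coverage, so sparsely covered cytosines do not dominate the estimate for a subcontext.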
The BAH and CHROMO domains of CMT2 and CMT3 recognize H3K9me2-containing nucleosomes and target these regions for DNA methylation [14,28,29] (Figure 1B). In A. thaliana, H3K9me2 is established by the H3K9 methyltransferases SUPPRESSOR OF VARIEGATION 3-9 HOMOLOGs: KRYPTONITE (KYP or SUVH4), SUVH5, and SUVH6 [30,31]. These proteins bind methylated DNA (Figure 1B) via a SET and RING finger associated (SRA) domain and methylate H3K9 [32,33]. Thus, non-CG methylation and H3K9me2 form a self-reinforcing feedback loop to establish regions of constitutive heterochromatin [34]. Interestingly, KYP appears to have little affinity for CCG methylation, which may in part explain the observed lower levels of methylation at heterochromatic CCG sites [23,35].
CMT3 also appears to have a role in establishing a unique pattern of CG-only DNA methylation found in many expressed genes, referred to as gene-body methylation (gbM) [36]. Links to CMT3 were discovered when two Brassicaceae species, Eutrema salsugineum and Conringia planisiliqua, were found to have independently lost both CMT3 and gbM [36]. This association between presence/absence of CMT3 and gbM is observed throughout plants [21,37]. However, as CMT3 primarily methylates CHG sites and not CG sites, this relationship was unexpected. Further experimental evidence was recently provided by transgenic expression of A. thaliana CMT3 in E. salsugineum. This resulted in AtCMT3 expression-dependent CHG methylation, followed by CG and CHH methylation, over a subset of actively transcribed genes orthologous to A. thaliana gbM genes. CG methylation was preferentially retained and inherited relative to CHG and CHH methylation following loss of AtCMT3 expression [38]. It is not yet clear mechanistically how CMT3 triggers the accumulation of CG methylation in gene bodies, but it is clearly necessary for its initiation, even if not its maintenance. While there is no evident function for gbM, elevated CG-TG substitution rates were found to be associated with CMT3 and gbM in Brassicaceae species, indicating that it can still have evolutionary consequences [37].
Canonical RdDM
First discovered in viroid-infected tobacco [39], the RdDM pathway has evolved in plants as a method for targeted DNA methylation via 24-nucleotide (nt) small interfering RNAs (siRNAs) [40]. RdDM can methylate all sequence contexts (CG, CHG, and CHH); however, it is often associated with CHH methylation. RdDM couples the action of the plant-specific RNA polymerases IV and V (Pol IV and Pol V) with components of, or related to those of, RNAi, such as DICER-LIKE (DCL) and ARGONAUTE (AGO) proteins [40,41] (Figure 2A). Canonical RdDM can be broadly divided into two steps. The first involves the biogenesis of 24-nt siRNAs by Pol IV. Pol IV can be recruited to some target sites by SAWADEE HOMEODOMAIN HOMOLOGUE 1 (SHH1) [41,42] and CLASSY (CLSY) proteins [43–45], the latter appearing to be key locus-specific regulators of RdDM [45]. Pol IV then transcribes 30 to 40-nt single-stranded RNAs (ssRNAs) [46,47], which are subsequently copied by RNA-DEPENDENT RNA POLYMERASE 2 (RDR2) to generate double-stranded RNA (dsRNA) [48]. DICER-LIKE PROTEIN 3 (DCL3) then cleaves the dsRNAs to 24-nt siRNAs [47].
The second step of canonical RdDM is Pol V-mediated de novo methylation. Here, Pol V is recruited to target loci by the SU(VAR)3-9 homologs SUVH2 and SUVH9, which bind to methylated DNA [49]. Pol V then transcribes a class of long non-coding RNAs, thought to serve as scaffolds for RdDM targeting [50]. During this step, either AGO4 or AGO6 binds 24-nt siRNAs [51] and is recruited to Pol V by interaction with the largest subunit of Pol V and KOW DOMAIN-CONTAINING TRANSCRIPTION FACTOR 1 [52,53]. The 24-nt siRNAs can then pair with nascent scaffold RNAs produced by Pol V [54,55]. AGO4/AGO6 then recruits DOMAINS REARRANGED METHYLTRANSFERASE 2 (DRM2), a homolog of mammalian DNMT3, to catalyze DNA methylation. It is thought that RNA-DIRECTED DNA METHYLATION 1 (RDM1) may assist in the AGO4–DRM2 interaction; however, its role remains unclear [55].
Many of the proteins involved in RdDM arose through gene duplication and subsequent sub- and neofunctionalization [56,57]. Some key components have only been found in angiosperms [58,59]. For instance, Pol IV and V evolved from duplication of various subunits of Pol II [57,58], and there is evidence of further lineage-specific duplication and evolution of a putative Pol VI in grasses [60]. From an evolutionary perspective, RdDM is a powerful example of how novelty and complexity can arise through gene duplication.
Non-canonical RdDM
Multiple ‘non-canonical’ variations of the RdDM pathway exist in plants [40,61] (Figure 2B). All identified to date require Pol V and DRM2, differing primarily in the source and production of the small RNAs involved. These non-canonical pathways provide alternative sources and entry points for small RNAs into RdDM and potentially solve the mystery of how de novo silencing of transcriptionally active TEs is initiated [62,63]. Notably, many of the small RNAs in these pathways are derived from Pol II transcription instead of Pol IV, and their production is facilitated by overlaps and similarities in the components of RdDM and post-transcriptional gene silencing (PTGS) [61]. There is a competitive hierarchy among different DCL proteins involved in both RdDM (DCL3, producing 24-nt small RNAs) and PTGS (DCL1/DCL2/DCL4, producing 21–22-nt small RNAs) for Pol II transcripts [64]. For example, two non-canonical pathways, Pol II-DCL3 RdDM and microRNA (miRNA)-directed DNA methylation, make use of Pol II-transcribed inverted repeats or miRNA precursors, respectively. These are cleaved by DCL3, rather than DCL1/DCL2/DCL4, to produce 24-nt siRNAs, which then feed directly into the canonical downstream pathway [61,65,66].
Two other pathways derive siRNAs from PTGS through the action of RNA-DEPENDENT RNA POLYMERASE 6 (RDR6) [63,67,68]. In RDR6 RdDM, certain Pol II-transcribed mRNAs, such as those from the non-protein-coding TAS loci or transcriptionally activated TEs, are targeted and cleaved by PTGS. The ssRNA derived from these cleaved mRNAs is converted into dsRNA by RDR6 and then processed by DCL2/DCL4 into 21–22-nt secondary siRNAs. These can be directly loaded on to AGO6 for RdDM [63]. Alternatively, in RDR6-DCL3 RdDM, RDR6-derived dsRNA is cleaved by DCL3 into 24-nt siRNAs and fed into Pol V-mediated RdDM [68]. A dicer-independent pathway has also been observed, where a non-diced dsRNA from either Pol II-RDR6 or Pol IV-RDR2 is directly loaded on to AGO4. Exonucleases from the exosome core complex then trim the RNA to 21–24 nt, and the trimmed RNAs are then used in Pol V-mediated RdDM [69].
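The canonical and non-canonical routes described above differ mainly in transcript source, RNA-dependent RNA polymerase, dicer, and small RNA size, which lends itself to a tabular summary. The sketch below encodes only what is stated in this section; the class and function names are hypothetical, and a value of None means the text does not name a component for that route, not that none is involved. All routes share the Pol V and DRM2 requirement and so it is not repeated per entry.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class RdDMRoute:
    name: str
    transcript_source: str
    rdr: Optional[str]        # RNA-dependent RNA polymerase, if named
    dicer: Optional[str]      # DCL protein(s), if any
    sirna_nt: Tuple[int, int]  # inclusive small RNA size range in nt
    ago: Optional[str]        # AGO protein named in the text, if any


# Every route listed here requires Pol V and DRM2 downstream.
ROUTES = [
    RdDMRoute("Canonical RdDM", "Pol IV", "RDR2", "DCL3", (24, 24),
              "AGO4 or AGO6"),
    RdDMRoute("Pol II-DCL3 RdDM", "Pol II inverted repeats", None,
              "DCL3", (24, 24), None),
    RdDMRoute("miRNA-directed DNA methylation", "Pol II miRNA precursors",
              None, "DCL3", (24, 24), None),
    RdDMRoute("RDR6 RdDM", "Pol II mRNAs cleaved by PTGS", "RDR6",
              "DCL2/DCL4", (21, 22), "AGO6"),
    RdDMRoute("RDR6-DCL3 RdDM", "Pol II mRNAs cleaved by PTGS", "RDR6",
              "DCL3", (24, 24), None),
    RdDMRoute("Dicer-independent RdDM", "Pol II-RDR6 or Pol IV-RDR2 dsRNA",
              None, None, (21, 24), "AGO4"),
]


def routes_for_length(nt: int) -> List[str]:
    """Return names of routes whose small RNA size range includes nt."""
    return [r.name for r in ROUTES if r.sirna_nt[0] <= nt <= r.sirna_nt[1]]
```

Querying this table by length shows why small RNA size alone is an imperfect indicator of origin: a 24-nt small RNA is compatible with five of the six routes, and a 22-nt small RNA with two.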
Biological roles of non-CG methylation
Transposons
Transposable elements (TEs) encompass a large portion of most plant genomes [70,71] and can be mutagenic [72]. Limiting TE activity is critical for cells and the chromatin environment is the primary means of defense. Generally, TEs are silenced by the complementary action of DNA methylation (Figure 3A) and histone modification [73]. Temporary loss of DNA methylation and subsequent reactivation of TEs, as during stress-induced bursts of TEs, can be a source of novel genetic variation in plant evolution [74,75]. The specific DNA methylation pathways targeting TEs vary with context, depending on whether the TE is found near genes or in regions of constitutive heterochromatin. Nearly all TE methylation in A. thaliana results from either DECREASE IN DNA METHYLATION 1 (DDM1) or RdDM. DDM1 encodes a SWI2/SNF2-like protein which alters chromatin architecture and allows DNA methyltransferases, like the CMTs, access to heterochromatic regions [76–78]. DDM1 appears to primarily affect DNA methylation in constitutive heterochromatic regions where RdDM is less active [18]. The RdDM pathway commonly targets short TEs in euchromatic regions [79] and on the edges of heterochromatic long TEs. It is essential for RdDM to re-silence TEs with each round of replication, especially near genes. While the two pathways have their preferred targets, there is a degree of cooperation between pathways as evidenced by various DNA methylation mutants [18].
TEs and TE methylation can be a source of hidden variation that conditionally affects the proper expression of nearby genes [80]. Non-CG methylation is often critical to protecting genes from neighboring TEs. For example, KARMA is a TE insertion in the intron of the homeotic gene DEFICIENS in Elaeis guineensis. CHG hypomethylation near the splice site of KARMA, induced at high frequency during tissue culture, causes mutants displaying mantling with pronounced yield loss [81]. Originally identified in Z. mays, CHH islands are regions of high CHH methylation at euchromatin/heterochromatin borders and have been proposed to reinforce TE silencing by creating boundaries between the highly methylated (CG and CHG), silenced chromatin of the TE and the active chromatin of the adjacent gene [82,83] (Figure 3A). CHH islands may also have a more direct role in mitigating the adverse effects of neighboring TEs on gene expression. In A. thaliana, it was shown that SUVH1 and SUVH3 bind methylated cytosines in regions of high CHH methylation adjacent to genes and interact with two DNAJ domain-containing homologs, DNAJ1 and DNAJ2, to negate the effects of TEs on gene expression [84].
Reproduction and development
DNA methylation plays a vital role in plant reproductive processes [85,86]. Much confusion comes from imposing models of mammalian reproduction and development on plants, and key differences exist that should caution against this. During plant gametogenesis, adult somatic cells undergo meiosis to produce haploid microspores (male) or megaspores (female), which then differentiate into male (sperm cells) and female (egg cells) gametophytes [85–87]. DNA methylation during these steps differs greatly from that seen in animals. Plant germlines never fully erase DNA methylation, instead often reinforcing DNA methylation states, while animals show complete reprogramming [87,88]. This may be due to the fact that the germline of animals is defined during embryogenesis and before meiosis takes place [87].
In A. thaliana, the male meiocyte, which undergoes meiosis to give rise to the haploid microspores, retains CG and CHG methylation in transposons while losing most of its CHH methylation (Figure 3B). However, de novo CHH methylation is acquired at specific loci, such as the last intron of MULTIPOLAR SPINDLE 1 (MPS1). This is essential for correct splicing of MPS1 and proper meiosis of the meiocytes [89]. The microspores and sperm cells retain CG and CHG methylation while losing CHH methylation, the latter correlating with low DRM2 expression. In contrast, the vegetative nucleus (VN), a non-reproductive companion cell in pollen, gains CHH methylation (Figure 3B). At the same time, there is extensive demethylation of many sites and reactivation of TEs in the VN [86,87]. Changes in VN DNA methylation and TE activity are due in part to a coordination of shutting down DNA methylation maintenance and promoting demethylation. Specifically, DDM1 is not expressed in the VN [90], while the DNA glycosylases DEMETER (DME) [79] and REPRESSOR OF SILENCING 1 (ROS1) [91], which excise 5-methylcytosine from DNA, are active. This demethylation leads to reactivation of TEs, which produce small RNAs capable of initiating targeted silencing in the gametes. This has been proposed as a means of reinforcing TE silencing in the germline [87,90]. Additionally, many silenced genes are re-expressed and these may have a role in promoting pollen development [92]. As in the male germline, CG methylation is retained during female gametogenesis, but CG methylation levels do decrease in the egg cell specifically. CHH methylation is reduced during sporogenesis, whereas it is stable during gametogenesis and in the egg cell [85]. Fertilization appears to ultimately restore methylation levels in the embryo [85,87], with CHH methylation increasing greatly during embryogenesis before decreasing substantially during seed germination [93–96].
In contrast with the embryo, the endosperm is hypomethylated [86] due to demethylation by DME [79].
There is limited evidence of extensive DNA methylation changes in plants during later stages of development. Comparison of such distinct tissues as leaves and inflorescences has found largely similar patterns [92]. Again, this is in contrast with mammalian development, where distinct DNA methylation changes can be found [12,13]. However, this may in part be a limitation of the common use of whole organs (e.g. leaves, roots, etc.) in past research. A handful of studies have identified tissue and cell-type specific DNA methylation differences (Figure 3C) in A. thaliana roots [97], Sorghum bicolor vasculature [98], and Medicago truncatula root nodules [99]. Furthermore, DNA methylation has a role in A. thaliana flowering time, and experimental populations with disrupted DNA methylation, such as epigenetic recombinant inbred lines (epiRILs) [100,101], show altered developmental phenotypes.
DNA methylation has essential roles in mediating interactions between parental genomes, or in the case of polyploids, subgenomes. RdDM and demethylation pathways can be involved in imprinting and the biased expression of paternal or maternal alleles [102] with developmental impacts [102,103]. The importance of DNA methylation in these processes can vary with the species. For example, paternally expressed imprinted genes (PEGs) in Arabidopsis lyrata are associated with CHG hypermethylation and silencing of the maternal allele, but not in A. thaliana [104]. Changes in DNA methylation can occur as a result of hybridization between genetically distinct lines [105–108]. These can even be induced in hybrids of genetically similar lines with differing DNA methylation states. For instance, a cross between a hypomethylated A. thaliana plant and a normally methylated A. thaliana plant induced novel epialleles in the offspring and resulted in TE reactivation [109]. However, most DNA methylation differences between parents are stably associated with and segregate with the original parental genotype [106,107]. DNA methylation changes also occur in interspecific hybridization [110,111]. It is thought that changes from both intra- and interspecific hybrids are the result of some sort of ‘shock’ (variously referred to as genomic, epigenomic, or transcriptomic ‘shock’) resulting from novel interactions induced between genetically/transcriptionally distinct genomes. Changes in DNA methylation continue following whole-genome duplication or gene duplication [86,111–114], and non-CG methylation appears to play a role in mediating gene expression of paralogous genes following polyploidization through interactions with TEs [113]. Such genome–epigenome interactions shape genome evolution and phenotypes, yet we have little understanding of the mechanisms by which these events are sensed and triggered.
Environmental interactions and stress
Extensive changes to chromatin architecture and gene expression occur during plant stress responses [115]. DNA methylation, in particular, has been extensively studied for its role in stress [116,117]. A commonly observed trend is that the most dynamic DNA methylation changes are typically for non-CG methylation at TEs and repeats [118–121]. For example, Dowen et al. [118] found in A. thaliana that pathogen-induced differences in methylation, especially those associated with changes in gene expression, were enriched among neighboring TEs. Similarly, under abiotic stress, Secco et al. [121] showed that phosphate starvation-induced DNA methylation changes are localized at TEs near phosphate starvation-induced genes in O. sativa. Specifically, they observed increased levels of CHH methylation. However, the same stress in A. thaliana had a much smaller effect, suggesting that these effects are likely correlated with TE content [121]. Critically, they found that stress-induced differential DNA methylation followed gene expression changes and was concentrated on TEs, suggesting that changes in gene expression were causal in driving changes in DNA methylation, rather than the reverse. This fits with the putative role of CHH islands in creating a boundary between euchromatin and heterochromatin, ensuring that the chromatin changes due to high expression of stress-induced genes are not carried over to adjacent TEs. Alternatively, they may serve to buffer or fine-tune gene expression through mechanisms like SUVH1/SUVH3/DNAJ1/DNAJ2 [84].
DNA methylation mutants provide further evidence for the role of DNA methylation in stress responses. For example, defense responses are enhanced in non-CG methylation mutants. These mutants show constitutive priming of PATHOGENESIS-RELATED 1 (PR1), a molecular marker for systemic acquired resistance [122]. However, the PR1 promoter is not normally methylated, and transgenerational priming is not dependent on DNA methylation of PR1, suggesting potential trans-acting regulation [123]. Improved biotic and abiotic stress tolerance has also been observed in epiRILs and other novel epi-mutagenized populations [124,125]. These can be inherited for several generations; however, the DNA methylation changes were induced through genetic means, not stress. Unequivocal examples of transgenerational inheritance of stress-induced epigenetic changes are scarce [88,126]. In plants, environmentally induced DNA methylation variation can be inherited to a limited extent within the first generation, but shows little evidence of long-term stable inheritance [116,127,128]. A growing body of evidence suggests that an intervening generation of sexual reproduction without stress resets acquired epigenetic changes (Figure 3D), thus questioning the transgenerational stability and adaptive role of environmentally induced changes [128,129]. Stress-induced epigenetic changes can be conditionally transmitted to offspring through the female lineage, but on the paternal side, transmission of acquired DNA methylation changes is restricted by DME DNA glycosylase activity [128]. Furthermore, DDM1 and Morpheus’ Molecule 1, which encodes a transcriptional repressor, function together in restoring DNA methylation to prestress states, preventing its transgenerational inheritance [130].
Perspectives and conclusions
It is notable how widespread non-CG methylation is in plants, a point underscored by the evolution of multiple novel pathways for its establishment and maintenance. Across plants, DNA methylation varies significantly within [131,132] and between species [18,133–135], with non-CG methylation having the greatest variance. Some of this variation is explained by differences in target sequences (e.g. TEs); however, much is explained by variation in the underlying pathways [21,22,36,131]. The recent discovery of multiple non-canonical RdDM pathways [61] shows that these pathways have not been fully elucidated. We also still have much to learn regarding how the different DNA methylation pathways interact with each other [14,136] and with other epigenomic and gene-regulatory mechanisms to shape plant development and environmental responses. Two methodological advances will be necessary to further our understanding of the roles of DNA methylation in plants. The first will be the application of tissue-specific and single-cell methods [137]. The second will be improved methods of manipulating DNA methylation to directly test the functionality of specific methylation changes. Plant populations with disrupted epigenomes, such as epiRILs [100,101] and epi-mutagenized lines [138–140], are already making progress in this area. More powerful yet will be the development of methods for making locus-specific modifications to DNA methylation, as recently demonstrated in A. thaliana [141]. With these advances we can more fully understand how non-CG methylation has shaped and continues to shape plant genomes.
Summary
Plant epigenomes are characterized by extensive non-CG cytosine DNA methylation at CHG and CHH (H = A, T, or C) sites.
Non-CG methylation is established by the plant-specific CMT and RdDM pathways.
Non-CG DNA methylation has roles in transposon and gene silencing, genome integrity, reproduction, development, and environmental responses.
Extensive natural variation in non-CG DNA methylation and the underlying pathways exists both within and between plant species.
Acknowledgments
The authors would like to thank Bruce Martin for the artistic portrayal of the Z. mays plant and Dr. Adam Bewick for comments on the manuscript. We also thank the two anonymous reviewers for their valuable feedback.
Competing Interests
The authors declare that there are no competing interests associated with the manuscript.
Funding
This work is supported by Michigan State University through a Startup Package and by USDA National Institute of Food and Agriculture Hatch Funds [project number MICL02572 (to C.E.N.)]; and the MSU University Distinguished Fellowship [to E.J.R.].
Author Contribution
S.K.K.R., E.J.R., and C.E.N. each contributed to the planning, preparation, and editing of the manuscript.
Abbreviations
- AGO
ARGONAUTE
- BAH
Bromo-Adjacent Homology
- CHROMO
CHRomatin Organization MOdifier
- CLSY
CLASSY
- CMT
CHROMOMETHYLASE
- DCL
DICER-LIKE
- DDM1
DECREASE IN DNA METHYLATION 1
- DME
DEMETER
- DNMT
DNA METHYLTRANSFERASE
- DRM2
DOMAINS REARRANGED METHYLTRANSFERASE 2
- epiRIL
epigenetic Recombinant Inbred Line
- hCMT
homologous CMT
- KYP
KRYPTONITE
- MET1
METHYLTRANSFERASE 1
- miRNA
microRNA
- MPS1
MULTIPOLAR SPINDLE 1
- PEG
paternally expressed imprinted gene
- PMC
Pollen Mother Cell
- PR1
PATHOGENESIS-RELATED 1
- PTGS
Post-Transcriptional Gene Silencing
- RdDM
RNA-directed DNA Methylation
- RDM1
RNA-DIRECTED DNA METHYLATION 1
- RDR
RNA-DEPENDENT RNA POLYMERASE
- ROS1
REPRESSOR OF SILENCING 1
- SHH1
SAWADEE HOMEODOMAIN HOMOLOGUE 1
- SNF2
Sucrose Non-Fermentable 2
- SRA
SET and RING finger associated
- SUVH
SUPPRESSOR OF VARIEGATION HOMOLOG
- SWI2
SWItch 2
- TE
transposable element
- VN
vegetative nucleus
- ZMET
Zea methyltransferase