Abstract
Monoallelic gene expression occurs in diploid cells when only one of the two alleles of a gene is active. There are three main classes of genes that display monoallelic expression in mammalian genomes: (1) imprinted genes that are monoallelically expressed in a parent-of-origin dependent manner; (2) X-linked genes that undergo random X-chromosome inactivation in female cells; (3) random monoallelically expressed single and clustered genes located on autosomes. The heritability of monoallelic expression patterns during cell divisions implies that epigenetic mechanisms are involved in the cellular memory of these expression states. Among these, methylation of CpG sites on DNA is one of the best-described modifications to explain somatic inheritance. Here, we discuss the relevance of DNA methylation for the establishment and maintenance of monoallelic expression patterns among these three groups of genes, and how this is intrinsically linked to development and cellular states.
Introduction
In diploid organisms, somatic cells possess two alleles of each gene that are in most cases expressed at the same time and at similar levels. However, some genes can be expressed either strictly or preferentially from either one of the two alleles. This phenomenon, known as monoallelic expression, is stable and clonally inherited during cell divisions. Monoallelic expression is often associated with DNA sequence polymorphisms, within regulatory regions for example, which may render one of the two alleles of the gene less expressed or completely silent. However, monoallelic expression can also arise from differential epigenetic marks decorating at least one of the two alleles of a gene, without any changes in the underlying DNA sequence. Epigenetically based monoallelic expression is usually associated with a programmed necessity to regulate gene dosage during development, as biallelic expression can be associated with severe phenotypes, or as a means to enhance cellular diversity and specificity [1,2].
Epigenetically based monoallelic expression can be imprinted; in this case, a gene is expressed in a parent-of-origin dependent manner from either the paternal or the maternal allele in all cells [3]. Monoallelic expression can also be random, where a gene can be expressed either from the paternal or from the maternal allele in different cells. The most classic example of random monoallelic expression (RME) concerns X-linked genes that undergo X-chromosome inactivation (XCI) during early embryonic development in female cells [4]. RME also affects large families of autosomal genes located in clusters, such as antigen receptors (AgRs), olfactory receptors (ORs) or protocadherins (Pcdh), which are expressed in a highly cell type-specific manner [2]. Moreover, RME also occurs at the level of individual autosomal genes that belong to a wide variety of gene ontologies, which were identified through genome-wide studies in polymorphic clonal cell populations [5,6].
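As an illustration of how the expression classes described above might be told apart in practice, the following minimal sketch classifies a gene in a single clone from allele-specific read counts. The function names, the 0.85 allelic-ratio threshold and the minimum read count are hypothetical choices made for this example and are not taken from the cited studies.

```python
def allelic_ratio(maternal_reads: int, paternal_reads: int) -> float:
    """Fraction of allele-informative reads that come from the maternal allele."""
    total = maternal_reads + paternal_reads
    if total == 0:
        raise ValueError("no allele-informative reads")
    return maternal_reads / total


def classify_expression(maternal_reads: int, paternal_reads: int,
                        threshold: float = 0.85, min_reads: int = 10) -> str:
    """Classify one gene in one clone.

    threshold and min_reads are illustrative assumptions, not published cut-offs.
    """
    if maternal_reads + paternal_reads < min_reads:
        return "low coverage"
    ratio = allelic_ratio(maternal_reads, paternal_reads)
    if ratio >= threshold:
        return "monoallelic (maternal)"
    if ratio <= 1 - threshold:
        return "monoallelic (paternal)"
    return "biallelic"


def pattern_across_clones(calls: list[str]) -> str:
    """Summarise calls from independent clones of the same cell type."""
    mono = {c for c in calls if c.startswith("monoallelic")}
    if len(mono) == 2:
        return "random monoallelic"        # either parental allele can be the active one
    if len(mono) == 1 and all(c in mono for c in calls):
        return "parent-of-origin specific"  # always the same parental allele
    return "inconclusive"
```

Under these assumptions, a gene that is maternal-only in one clone and paternal-only in another would be summarised as random monoallelic, whereas a gene that is expressed from the same parental allele in every clone would be consistent with imprinting.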
The stable maintenance and heritability of monoallelic expression during cell divisions imply that epigenetic mechanisms are at play to maintain these expression states. These mechanisms include nuclear organisation, DNA replication timing, histone modifications and also DNA methylation [1,2], the best-described modification for somatic inheritance of transcriptional states [7]. DNA methylation consists of the addition of methyl groups to specific nucleotides on the DNA molecule. In mammals, the 5-methylcytosine (5mC – fifth carbon of the pyrimidine ring) is the major form of DNA modification, which occurs predominantly on both complementary strands of the palindromic CpG dinucleotide [8]. In higher eukaryote genomes, 5mC is mostly targeted to repetitive sequences, but is also present on gene bodies of active genes. Interestingly, CpG dinucleotides are highly methylated; however, 5mC is generally absent from CpG-dense regions, commonly known as CpG islands (CGI), frequently present at promoters [9]. There are a few exceptions to this rule particularly in developmental and disease contexts [10,11]. A notable example discussed in this review is the case of monoallelically expressed genes, where methylation of the promoter is often associated with the silent allele [12].
5mC is established by de novo DNA methyltransferases (DNMT3A, DNMT3B and the sperm-specific DNMT3C) and maintained through cell division by the maintenance DNA methyltransferase DNMT1 [8,13]. Their actions are assisted by specific accessory proteins such as the DNMT3-like (DNMT3L) and the ubiquitin-like containing PHD and RING finger domains 1 (UHRF1) [14–16]. Removal of 5mC can be achieved by passive loss through DNA replication in the absence of DNMT1 or through active pathways involving the conversion of 5mC into 5-hydroxymethylcytosine by the ten-eleven translocation (TET) enzymes (TET1, TET2 and TET3) or by less-understood DNA repair pathways involving the activation-induced deaminase (AID) and thymine-DNA glycosylase [17,18]. In somatic cells, DNA methylation patterns are remarkably stable and only subtle changes can be induced through environmental causes or ageing [19,20]. In contrast, DNA methylation is highly dynamic during development. For instance, during the mammalian life cycle, two major waves of global DNA demethylation of the genome occur: (1) during the early specification of the germline lineage, which is followed by sex-specific remethylation in the specialised gametes; (2) during the pre-implantation stage, which is then followed by stage- and tissue-specific re-establishment of DNA methylation after implantation [8,11,21]. Interestingly, monoallelic expression is established at different stages of the DNA methylation cycle. In this review, we summarise the current understanding of the role of DNA methylation in regulating expression of the three classes of monoallelically expressed genes in mammalian cells.
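The passive loss of 5mC through replication mentioned above can be illustrated with a deliberately simplified sketch. It assumes a complete absence of maintenance methylation, no de novo methylation and no active demethylation, so that newly synthesised strands are never methylated and the strand-level methylated fraction halves at each division. The function name is hypothetical.

```python
def passive_dilution(initial_fraction: float, divisions: int) -> list[float]:
    """Strand-level fraction of methylated CpGs after 0..divisions replications.

    Simplifying assumptions (for illustration only): no maintenance activity,
    no de novo methylation and no active demethylation, so each round of
    replication pairs every parental strand with an unmethylated new strand.
    """
    if not 0.0 <= initial_fraction <= 1.0:
        raise ValueError("initial_fraction must lie between 0 and 1")
    if divisions < 0:
        raise ValueError("divisions must be non-negative")
    return [initial_fraction * 0.5 ** n for n in range(divisions + 1)]
```

Starting from a fully methylated site, this toy model gives hemimethylated DNA (half of the strands methylated) after one division and a further halving at each subsequent division. Real kinetics would depend on residual enzyme activity and on the active pathways described above.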
Genomic imprinting: parental allele-specific gene expression determined by DNA methylation
Genomic imprinting is an epigenetic mechanism affecting over 100 genes that are expressed from only one of the two parental alleles [3]. Imprinted genes exert important roles in growth and development of the foetus and placenta during intra-uterine life, as well as in brain functions and metabolic pathways in adults [3,22]. Dysregulation of the dosage of imprinted genes is associated with several (neuro)developmental disorders, such as Angelman or Beckwith–Wiedemann syndromes [23].
Imprinted genes tend to be located near each other in genomic regions called imprinted clusters. These regions contain both maternally and paternally imprinted genes and sometimes non-imprinted genes [3]. Interestingly, imprinted expression across these clusters is coordinated and depends on shared regulatory DNA regions, of which the most important are imprinting control regions (ICRs). These are CpG-dense regions that have been defined for all the imprinted clusters described in the mammalian genome [3]. Deletion of ICRs affects imprinted expression and results in either biallelic or no expression of several genes within the same cluster [24,25], confirming their crucial role in cis-acting long-range imprinting control. The most distinctive feature of ICRs is their differential DNA methylation state between the maternally and paternally inherited alleles, which is established in the germline (Figure 1A,B). Besides ICRs, there are other differentially methylated regions (DMRs) between the parental alleles within imprinted clusters. Those, known as somatic DMRs, acquire differential DNA methylation after implantation and are hierarchically dependent on the methylation status of the ICRs [26,27].
Genomic imprinting regulation by parental allele-specific DNA methylation
Imprinting cycle
A role for DNA methylation as a central epigenetic mechanism in genomic imprinting was unveiled shortly after the first imprinted genes were discovered. Consistent with the parental allele-specific differences in DNA methylation encountered at imprinted loci [28–31], Dnmt1−/− embryos were shown to be unable to sustain imprinted expression [30]. Moreover, DNA methylation was also shown to be crucial for the establishment of genomic imprinting. Indeed, DNMT3A and its cofactor DNMT3L are necessary for the correct establishment of methylation marks or imprints at ICRs in the germline [15,32]. The majority of mouse ICRs (22 out of 25) acquire DNA methylation in the oocyte and map to CpG-dense promoters, while only 3 ICRs, located in intergenic regions, have sperm-derived imprints (http://www.mousebook.org/mousebook-catalogs/imprinting-resource). Interestingly, DNA methylation does not seem to be specifically imposed on maternal and paternal imprints. This rather occurs as part of the genome-wide DNA methylation programme that targets preferentially transposons and intergenic regions during spermatogenesis [33,34], and gene bodies and intragenic CGI during oogenesis [34–37]. One distinctive feature of ICRs compared with the rest of gametic-specific DMRs is their ability to resist the genome-wide DNA demethylation wave during pre-implantation development [38] (Figure 1A). Several factors including DNMT1, PGC7/STELLA/DPPA3, KAP1/TRIM28 and NAA10P have been implicated [39–43]. But the specificity of ICRs to escape the genome-wide demethylation seems to be associated with the binding of the ZFP57 and ZNF445/ZFP445 zinc-finger proteins to methylated ICRs [41,44]. This mechanism is well established for the ZFP57 protein, which binds the TGCCGC motif present at ICRs in a methylation-dependent manner and recruits epigenetic modifiers through its association with the KAP1 cofactor [45,46]. 
Both ZFP57 and ZNF445 seem to be required to protect imprinting memory, but their requirement varies for different imprinted clusters [41,44]. Interestingly, there are a few documented cases where gametic-specific DMRs are lost [34,47]. This occurs, for example, at the Gpr1/Zdbf2 locus, where differential parental allele-specific methylation is lost due to gain of methylation of the paternal allele around implantation [48]. This is, nonetheless, sufficient to establish life-long imprinted expression [48,49]. Conversely, unmethylated ICRs also resist global de novo DNA methylation around the time of implantation [38] (Figure 1A). The mechanisms for this protection remain poorly studied. In the case of the Igf2/H19 locus, binding of the zinc finger protein CTCF or the OCT4/SOX2 pluripotency factors to the unmethylated maternal allele has been implicated in this protection [50–52], but it is not known whether this applies to other imprinted loci. It is also during this period that methylation of most somatic DMRs is established [26,27].
Having survived the dynamic changes occurring during early development, parental allele-specific differences in DNA methylation at ICRs and at most somatic DMRs remain remarkably stable in somatic cells throughout life [53] (Figure 1A). Interestingly, many genes exhibit tissue-specific imprinted expression [54]. This relies on secondary transcriptional and chromatin states, which are nonetheless initially imposed by the methylation status of the ICR. In contrast to somatic cells, aberrations in methylation at ICRs were reported in cancer cells, leading to loss of monoallelic expression of many imprinted genes [55,56]. Interestingly, a recent study suggests that DNA methylation changes at ICRs in cancer cells are mostly caused by locus-specific copy number aberrations rather than epigenetic alterations [57], suggesting a less labile methylation pattern at ICRs than originally thought. Environmental factors are also known to influence epigenetic states [19], but genomic imprinting is believed to be neither more vulnerable to nor more protected from environmental perturbations during development [58].
To complete the imprinting cycle, parental allele-specific DNA methylation differences in the ICRs are erased early in the germline lineage to be reset during gametogenesis according to the sex of the individual (Figure 1A). This occurs during the second major wave of DNA demethylation from which imprints do not escape [59–62]. This starts from embryonic day 8 in the mouse and involves multiple passive mechanisms, such as the down-regulation of UHRF1, and active mechanisms, through the action of AID and TET1/TET2 proteins, which are part of the major epigenetic reprogramming events that lead to massive chromatin changes in primordial germ cells [63].
Regulation of imprinted clusters
ICRs are enigmatic cis-acting DNA regions that dictate imprinted expression across an imprinted cluster, which can contain up to ten genes and span up to 4 Mb in size. How this is mechanistically regulated is not completely understood and might vary from locus to locus. There are two main models to explain the coordinated regulation of imprinted expression within a cluster: the insulator and the lncRNA models [64]. The insulator model was proposed to explain imprinting regulation at the Igf2-H19 cluster (Figure 1B). The intergenic ICR, located between the two genes, acts as a binding site for CTCF only on the unmethylated maternal allele. CTCF, which is a major regulator of the 3D chromatin structure [65], induces chromatin loops believed to prevent interaction of Igf2 with downstream enhancers on the maternal allele [66,67]. Absence of CTCF binding to the methylated paternal ICR allows Igf2 to interact with its enhancers, which results in its paternal-specific expression (Figure 1B). This is illustrative of how parental allele-specific epigenetic differences at ICRs might reshape the 3D conformation of imprinted regions and affect gene expression. The lncRNA model results from the fact that many imprinted clusters contain genes encoding lncRNAs (e.g. Airn, Kcnq1ot1, Nespas), which are themselves subject to imprinted expression. In these clusters, ICRs are often located around the promoter region of a lncRNA gene, which is expressed from the unmethylated allele, while protein-coding genes are preferentially expressed from the opposite allele [64], as illustrated by the Kcnq1-Kcnq1ot1 imprinted cluster (Figure 1B). Deletion experiments for some lncRNAs result in biallelic expression of the protein-coding genes, suggesting that these lncRNAs function as cis-acting silencers [68–70]. Two major mechanisms, not mutually exclusive, have been proposed to explain how imprinted lncRNAs silence neighbouring genes. 
This could occur through transcriptional interference of sense–antisense pairs of protein-coding/lncRNA transcripts [71]; or formation of a silent compartment reminiscent of the one induced by the Xist lncRNA on the inactive X chromosome (Xi) [72], as proposed for the Kcnq1ot1 and Airn lncRNAs [73,74]. Recent data have strengthened this analogy by showing that both lncRNAs are able to spread chromatin marks imposed by the Polycomb repressive complexes 1 and 2 (PRC1 and PRC2) across the imprinted cluster, which might be mediated by the HNRNPK RNA-binding protein as for Xist lncRNA [75].
DNA methylation-independent imprinting
Parental allele-specific differences in DNA methylation are key to the regulation of genomic imprinting. Interestingly however, DNA methylation-independent imprinting has been recently described. This new form of genomic imprinting is controlled by the H3K27me3 histone mark, deposited by PRC2 in oocytes [76]. Asymmetric differences in H3K27me3 between gametes persist past fertilisation during preimplantation development and determine paternal-specific expression of a few genes. An interesting example is the silencing of the maternal copy of the Xist gene, which results in specific inactivation of the paternal X chromosome during pre-implantation development in female murine embryos [77]. However, in contrast with canonical imprinting, H3K27me3-dependent genomic imprinting is lost after implantation and only five genes were found to remain imprinted in extra-embryonic lineages [76]. Genomic imprinting remains therefore a classical example of an epigenetic mechanism dictating monoallelic expression of genes, which is predominantly dependent on germline-specified DNA methylation.
X-chromosome inactivation: differential methylation patterns of the active and inactive X chromosomes
XCI is an epigenetic mechanism that allows dosage compensation of X-linked genes between XX females and XY males in mammals. This process leads to chromosome-wide silencing of one of the two X chromosomes, chosen at random, during early embryonic development in female cells [78]. The way gene silencing is initiated and maintained has been the focus of intense studies and results from the interplay of multiple epigenetic mechanisms, including chromatin compaction, histone modifications and DNA methylation, which was proposed to play a role in XCI as early as 1975. Indeed, Riggs [79] argued for a role of DNA methylation in XCI based on a processive spreading mechanism along the chromosome and a model of inheritance of methylation patterns during DNA replication. The first experimental evidence demonstrating the link between DNA methylation and XCI came shortly after from studies revealing that promoter regions of selected X-linked genes are differentially methylated on the two X chromosomes in somatic cells [80–84]. Furthermore, treatment of mouse–human somatic cell hybrids or mouse transformed cells with 5-azacytidine, a drug that inhibits the activity of DNMTs, was shown to lead to sporadic reactivation of selected genes on the Xi [85–87]. These early studies were seminal in establishing a role for DNA methylation in the maintenance of gene silencing on the Xi. Moreover, these analyses already suggested that DNA methylation is likely a late event during XCI, as silencing precedes methylation [88].
Despite the clear importance of DNA methylation in XCI, genome-wide studies of DNA methylation with a focus on the X chromosome in female cells came later. A first study in human somatic cells using immunoprecipitation of methylated DNA combined with microarrays confirmed the overall hypermethylation of promoter-associated CGI on the Xi in female versus male cells [89]. In contrast, genes escaping XCI that are biallelically expressed remain unmethylated on both X chromosomes (Figure 2A). Surprisingly, this analysis also revealed that the level of CpG methylation along the X chromosome, particularly in gene-poor regions, is lower in female cells when compared with male cells [89]. These sex-specific differences reflect the overall reduced methylation of the Xi compared with the active X (Xa) and autosomes, and confirmed earlier cytogenetic analyses on metaphase chromosomes [90,91]. Another seminal study using microarrays described the first allele-specific analysis of DNA methylation on the Xi and Xa in human somatic cells, interrogating approximately 1000 informative loci along the chromosome [92]. This analysis also reported an overall excess of monoallelically methylated CpG on the Xa compared with the Xi within gene bodies (Figure 2A). Interestingly, some of these CpG sites are biallelically methylated in human embryonic stem (ES) cells prior to XCI [92]. In addition, bodies of genes escaping XCI were shown to be methylated on both X chromosomes [92,93], suggesting a correlation between gene body methylation and expression (Figure 2A). Subsequent genome-wide methylation analysis revealed that the bodies of active genes are indeed heavily methylated and represent the most conserved target of DNA methylation across eukaryote genomes [94,95].
DNA methylation during XCI
One study analysed the dynamics of Xi promoter-CGI methylation in a developmental context. This analysis revealed two modes of DNA methylation on the Xi. While most CGI acquire methylation slowly and late throughout development, a subset of CGI show fast methylation kinetics [96]. Interestingly, these CGI differ by their CpG composition, immediate genomic environment and expression levels prior to XCI [96].
Overall, these analyses revealed that differential methylation of the two X chromosomes in female cells is found at CGI associated with inactive gene promoters that are hypermethylated on the Xi and at both intergenic and intragenic sequences that are hypermethylated on the Xa (Figure 2A) [89,97]. While promoter-CGI methylation on the Xi is associated with monoallelic expression, intragenic methylation is associated with transcription and does not have any impact on allelic expression.
Mechanisms of methylation of the inactive X chromosome
Treatment of somatic cells with 5-azacytidine provided the first experimental evidence for the importance of DNA methylation in the maintenance of gene silencing on the Xi [85]. This opened the door to investigating which enzymes are necessary for XCI-induced DNA methylation and to dissecting the interplay between methylation and transcription in XCI. Analysis of mouse post-implantation embryos and differentiating ES cells mutant for Dnmt1 indicated that Xist is ectopically expressed from the Xa in a small proportion of cells in both males and females, which led to occasional aberrant silencing of X-linked genes [98]. This indicates that DNMT1 is essential for the stable maintenance of Xist monoallelic repression in differentiated cells, but is dispensable for the initiation of gene silencing on the Xi [98]. In another study, it was found that an X-linked LacZ transgene actually becomes reactivated in mutant embryos at later post-implantation stages after initial silencing, consistent with a role of DNA methylation in maintenance of silencing [99]. In contrast with the embryonic lineage, absence of DNMT1 does not majorly affect the Xi state in murine extraembryonic tissues, where an imprinted form of XCI with exclusive silencing of the paternal X chromosome takes place [99].
Analysis of embryos mutant for both de novo methyltransferases, Dnmt3a and Dnmt3b, reported hypomethylation of the Xist promoter, which was nevertheless associated with repression of the gene on the Xa [100]. Moreover, promoter-CGI on the Xi were shown to be extensively hypomethylated in double mutant embryos [100,101], although without derepression of the silent alleles. This indicates that the Xist-coated Xi undergoes XCI in mutant embryos and that the absence of both DNMT3A and DNMT3B impairs neither initiation nor propagation of XCI [100]. Analysis of single KO embryos revealed that methylation of CGI is actually dependent on the DNMT3B enzyme only, while DNMT3A and DNMT3L are dispensable [96]. This was the case for both fast and slow methylating CGI associated with promoters, but also intra- and intergenic CGI, indicating that DNMT3B establishes methylation of all CGI on the Xi (Figure 2B) [96,102]. This is in agreement with the observation showing reduced Xi CGI methylation in cells from patients with ICF (immunodeficiency centromeric-instability facial anomalies) syndrome, which is caused by mutations in the DNMT3B gene [103]. In agreement with previous observations, monoallelic expression appears to be stably maintained in Dnmt3b−/− female embryos, at least for the few genes tested [96,104]. However, chromosome-wide allele-specific expression analysis would be needed to determine whether some genes, such as those linked to fast methylating CGI [96], may be more sensitive, even sporadically, to the absence of methylation.
The structural maintenance of chromosomes hinge domain-containing 1 (SMCHD1) protein, a non-canonical SMC protein, was also shown to play a role in the maintenance of XCI and methylation of CGI on the Xi. Characterisation of female mutant post-implantation embryos indicated a widespread hypomethylation of most promoter-CGI on the Xi, associated with reactivation of a subset of genes, unlike in Dnmt3b−/− embryos [96,104,105]. Allele-specific analysis of hybrid Smchd1 null XX MEF cell lines confirmed these observations demonstrating extensive promoter-CGI hypomethylation in both gene-poor and gene-dense regions. This is accompanied by depletion of H3K27me3 and expression from the Xi for a large proportion of genes [97]. Direct comparison of Dnmt3b and Smchd1 KO embryos at similar stages indicated that CGI hypomethylation is not sufficient to explain the Smchd1-specific loss of silencing on the Xi [104], suggesting that this protein might act upstream of DNA methylation. Interestingly, recent work showed that SMCHD1 is involved in the establishment of the unique higher order chromatin architecture of the Xi [97,106,107]. This 3D chromatin organisation might be necessary to facilitate accessibility of de novo methyltransferases and subsequent promoter-CGI methylation to lock the inactive state (Figure 2B). Interestingly, fast methylating islands are methylated, at least in part, in an SMCHD1-independent manner [96], suggesting that other mechanisms are involved (Figure 2B).
In contrast with genomic imprinting that depends on defined parent-of-origin dependent DNA methylation patterns for monoallelic expression, methylation of promoter-CGI on the Xi occurs late during XCI and is not the driving force for initiating gene silencing. In any case, long-term silencing of X-linked genes definitely relies on DNA methylation, although this occurs in combination with other epigenetic marks. While it is clear that DNMT3B methylates all CGI on the Xi, it is currently unknown how this enzyme is recruited to the Xi during XCI. This could occur via recognition of specific features of Xi chromatin. It also remains unclear why X-linked gene reactivation following loss of promoter-CGI methylation only occurs in particular developmental or cellular contexts.
DNA methylation of random monoallelically expressed genes
Monoallelic expression also concerns autosomal genes that can be expressed randomly from either the paternal or the maternal allele, in a stable manner, and independently of DNA sequence polymorphisms. RME affects a wide variety of gene functions, from genes encoding cell surface-associated proteins to developmental transcription factors and may be involved in promoting not only cellular specificity, but also diversity in gene expression patterns [1]. An important class of RME genes comprises large clustered gene families, such as AgRs, ORs or Pcdh, that presumably play roles in generating specificity and identity at the cellular surface of highly specialised cells. RME of AgR loci, which undergo genetic recombination in developing B or T cells to generate one expressed functional allele, appears to be predetermined during early development. However, RME of Pcdh or OR genes occurs through a stochastic process. Pcdh genes show monoallelic and combinatorial expression of exons in individual Purkinje neurons in the cerebellar cortex, while OR genes are expressed in a monogenic and monoallelic manner in olfactory neurons [2]. RME also occurs at the level of single genes, for which monoallelic expression is more labile and less well understood. RME for these genes is established during development or differentiation into particular lineages, presumably through a stochastic and independent regulation of the two alleles. They are thus usually expressed in a cell-type or tissue-specific manner and their expression can vary among individuals [1]. These characteristics make their study rather challenging.
Analysis of clonal cell populations in vitro revealed that single RME genes can be expressed monoallelically from either allele, but also biallelically or not expressed at all in independent clones, in contrast with imprinted and X-inactivated genes (Figure 3A) [5,6]. Moreover, these expression patterns were found to be remarkably stable during cell passaging and differentiation of neural progenitor cell (NPC) clones [5], raising the question of what epigenetic mechanisms could account for this stability. Again, studies in clonal cell populations allowed testing whether promoter DNA methylation could be involved in monoallelic silencing. Bisulfite-based methods were used to measure the methylation levels of CGI associated with the promoters of a few RME genes in NPC clones [5,6]. This analysis revealed that approximately half of the genes analysed show good correlation between the methylation levels (absent, intermediate or full) and the expression status (biallelic, monoallelic or not expressed) in a given NPC clone (Figure 3A) [5,6]. However, for other RME genes, there was no clear correlation between promoter-CGI methylation levels and expression [5,6]. In another study, the identification of sequences with dual methylation patterns, i.e. showing both methylated and unmethylated states, was used to identify new genes showing monoallelic expression in the central nervous system, of which 12% were confirmed to show RME in clonal cell populations [108]. Additionally, a genome-wide analysis of DNA methylation in human clonal neural stem cell lines using Illumina Infinium methylation beadchip reported a significant correlation between monoallelic expression and intermediate levels of DNA methylation, whereas biallelic genes were hypomethylated [109]. 
In contrast, no evidence for DNA methylation imbalances associated with allele-specific expression was found in a separate genome-wide study in humans, except when linked to variations in DNA sequences (see last paragraph) [110]. It should be noted however that this analysis was not performed on clonal cell populations unlike other studies, thus differences in DNA methylation could potentially be masked.
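The correspondence described above between promoter-CGI methylation levels (absent, intermediate or full) and expression status (biallelic, monoallelic or not expressed) in a given clone can be sketched as a simple concordance check. This is only an illustration: the 0.2 and 0.8 cut-offs, the function names and the input format are assumptions made for the example and are not the criteria used in the cited studies.

```python
# Expression status expected for each methylation class if promoter-CGI
# methylation alone accounted for allelic silencing.
EXPECTED_STATUS = {
    "absent": "biallelic",
    "intermediate": "monoallelic",
    "full": "not expressed",
}


def methylation_class(fraction: float, low: float = 0.2, high: float = 0.8) -> str:
    """Bin a promoter-CGI methylation fraction (0 to 1); cut-offs are illustrative."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    if fraction < low:
        return "absent"
    if fraction > high:
        return "full"
    return "intermediate"


def is_concordant(fraction: float, expression_status: str) -> bool:
    """True when the observed expression status matches the methylation class."""
    return EXPECTED_STATUS[methylation_class(fraction)] == expression_status


def concordance_rate(clones: list[tuple[float, str]]) -> float:
    """Share of clones in which methylation and expression agree for one gene."""
    if not clones:
        raise ValueError("no clones supplied")
    return sum(is_concordant(f, s) for f, s in clones) / len(clones)
```

In such a scheme, a gene with a high concordance rate across clones would resemble the RME genes whose methylation tracks expression, whereas a low rate would resemble those showing no clear correlation. Because the measurement is averaged over both alleles, it is only interpretable in clonal populations, which is consistent with the caveat raised above for non-clonal analyses.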
Allele-specific DNA methylation imbalances associated with RME or with DNA sequence polymorphisms
DNA methylation patterns were also analysed for clustered gene families. Stochastically expressed Pcdh exons display mosaic methylation patterns, while constitutively expressed exons are hypomethylated [111]. Interestingly, these methylation patterns are established during early embryonic stages long before lineage specification and expression [104,111]. On the other hand, OR gene clusters are located in regions devoid of DNA methylation and repression appears to be mediated rather through nuclear localisation and histone modifications [112,113].
Mechanisms controlling methylation of random monoallelically expressed genes
The specific role of DNMTs in the methylation of RME genes has been investigated so far only for clustered gene families. It was shown that DNMT3B is required for de novo methylation at promoters of the three Pcdh gene clusters, which is established early during development, after the blastocyst stage when the enzyme is highly expressed in epiblast cells [111,114]. DNMT3A is on the other hand dispensable for this methylation [111]. Dnmt3b-deficient Purkinje cells express an increased number of Pcdh isoforms per cell, indicating that DNA methylation by DNMT3B regulates the expression frequency of each Pcdh isoform in neurons. Smchd1-deficient embryos also show hypomethylation of the promoter regions of the Pcdh-α and Pcdh-β clusters, correlated with increased expression of many different isoforms in mutant embryos and adult brain [104]. These observations mirror the phenotype observed in Dnmt3b-deficient embryos, reinforcing the idea that DNA methylation plays a role in the regulation of stochastic and monoallelic expression of Pcdh isoforms. Additionally, the chromatin state of OR genes was studied quite extensively in the mouse olfactory epithelium, in particular in mice deficient for Dnmt3a. While this enzyme is important for the regulation of global gene expression in olfactory sensory neurons, it does not appear to regulate choice and monoallelic expression of OR genes [113,115].
Few studies have yet explored the molecular mechanisms involved in methylation of single RME genes due to their random, transient and variable nature within a cell population or a tissue. Treatment of cells with 5-azacytidine allowed testing whether DNA methylation is necessary for the maintenance of RME patterns. The erasure of DNA methylation at 5′ CGI of RME genes did not induce biallelic expression in NPC clonal cell lines, although this was analysed only for a small number of loci [5,6]. Besides these examples, the dependence of monoallelic expression of single RME genes on DNA methylation is unknown. It is likely that the cellular memory at different RME loci relies on a variety and combination of epigenetic mechanisms, including DNA methylation, which might be relevant for some genes [5,6,108,109]. As RME appears somewhat less stable and more transient than the monoallelic expression of imprinted and X-linked genes, this could account for some of the differences observed. Interestingly, RME is often associated with random monoallelically accessible promoter regions in NPC, suggestive of regulation through transcription factor binding [116]. It has been hypothesised that RME could represent a way of fine-tuning the expression of genes in specific cell types during development or differentiation. In this case, repression of the silent allele through histone modifications or activation by transcription factors would allow easy reversal of the expression states when needed. In contrast, RME genes associated with DNA methylation may indicate a need for stronger stability at these loci.
Importantly, genome-wide studies of DNA methylation in humans and mice [117–120] revealed a strong correlation between allele-specific DNA methylation (ASM) and the presence of DNA sequence polymorphisms in the vicinity. This phenomenon is widespread in the human genome and is sometimes associated with allele-specific expression imbalances [121] (Figure 3B). These studies demonstrate how allele-specific genetic variations in cis can influence CpG methylation in both mouse and human and occasionally lead to allelic differences in expression. These findings, which connect genetic polymorphisms and phenotypic variability, may help to understand how some of these loci, which may be associated with dosage-sensitive genes, could contribute to human diseases.
Concluding remarks
Although the three classes of genes showing monoallelic expression in mammalian genomes share multiple properties, they also display specific features. DNA methylation is an epigenetic modification essential for the establishment and maintenance of imprinting, through differential marking of parentally inherited alleles at ICRs in the germline. X-inactivated genes are characterised by promoter hypermethylation, which is established after transcriptional silencing. Although methylation of X-linked gene promoters is considered to be important for the maintenance of silencing, it appears to be essential only in certain cellular or developmental contexts. Finally, whereas DNA methylation is sometimes associated with RME loci, it does not seem to be a common epigenetic signature of this class of genes. This is likely linked to the nature of RME genes, whose expression often arises during differentiation, is highly cell-type specific, and is often more variable and potentially transient, unlike that of genes regulated by imprinting and XCI.
In the future, methods combining RNA analysis, methylomes, chromatin signatures, accessibility and lineage tracing will help to understand the extent to which some RME loci share properties with imprinted and X-inactivated genes, in particular in vivo. Moreover, manipulation of the methylation machinery in vitro or in vivo using CRISPR-Cas9, together with the use of conditional KOs, will make it possible to determine the extent to which DNA methylation is instrumental for the maintenance of monoallelic expression states in specific developmental or tissue contexts. Furthermore, the ability to use human induced pluripotent stem cells and 3D organoids will allow further investigation of the role of DNA methylation and monoallelic expression in human cells, which could have important implications not only for development, but also for disease.
Summary
Three main classes of genes display monoallelic expression independently of DNA sequence polymorphisms in mammalian genomes: imprinted genes, X-inactivated genes and RME genes on autosomes.
DNA methylation is essential for the establishment and maintenance of imprinting.
DNA methylation of X-linked gene promoter regions occurs late during XCI and participates in the long-term silencing of genes on the Xi.
DNA methylation can be associated with RME loci, but it does not appear to be a general feature of this class of genes.
Competing Interests
The authors declare that there are no competing interests associated with the manuscript.
Funding
This work was supported by Fundação para a Ciência e Tecnologia (FCT)/Ministério da Ciência, Tecnologia e Ensino Superior (MCTES), Portugal, through the project grants PTDC/BEX-BCM/2612/2014 and PTDC/BIA-MOL/29320/2017 IC&DT. S.T.d.R. has a CEECUIND/01234/207 assistant research contract from FCT. A.-V.G. is supported by an INSERM investigator position. Publication costs were supported by UID/BIM/50005/2019, a project funded by FCT/MCTES through Fundos do Orçamento de Estado.
Author Contribution
S.T.d.R. and A.-V.G. both prepared and wrote the manuscript.
Abbreviations
- AgR: antigen receptor
- AID: activation-induced deaminase
- CGI: CpG island
- CpG: 5′-cytosine-phosphate-guanine-3′
- DMR: differentially methylated region
- DNMT: DNA methyltransferase
- DNMT3L: DNMT3-like
- ES: embryonic stem
- H3K27me3: trimethylation of lysine 27 on histone H3
- ICR: imprinting control region
- KO: knock-out
- NPC: neural progenitor cell
- OR: olfactory receptor
- Pcdh: protocadherin
- PRC: Polycomb repressive complex
- RME: random monoallelic expression
- SMCHD1: structural maintenance of chromosomes hinge domain-containing 1
- TET: ten-eleven translocation
- UHRF1: ubiquitin-like containing PHD and RING finger domains 1
- Xa: active X chromosome
- XCI: X-chromosome inactivation
- Xi: inactive X chromosome
- ZFP: zinc finger protein
- ZNF: zinc finger
- 5mC: 5-methylcytosine