Abstract
As one of the most abundant and well-studied epigenetic modifications, DNA methylation plays an essential role in normal development and cellular biology. Global alterations to the DNA methylation landscape contribute to alterations in the transcriptome and deregulation of cellular pathways. Indeed, improved methods to study DNA methylation patterning and dynamics at base pair resolution and across individual DNA molecules on a genome-wide scale have highlighted the scope of change to the DNA methylation landscape in disease states, particularly during tumorigenesis. More recently, DNA hydroxymethylation profiling techniques have been developed, which allow differentiation between 5mC and 5hmC profiles and provide further insights into DNA methylation dynamics and remodeling in tumorigenesis. In this review, we describe the distribution of DNA methylation and DNA hydroxymethylation in different genomic contexts, first in normal cells, and then how this is altered in cancer. Finally, we discuss DNA methylation profiling technologies and the most recent advances in single-cell methods, bisulfite-free approaches and ultra-long read sequencing techniques.
Introduction
DNA methylation in mammalian cells is characterised by the addition of a methyl group at the carbon-5 position of the cytosine base (5-methylcytosine; 5mC), primarily in the context of cytosine-guanine dinucleotides (CpG), through the action of the DNA methyltransferase enzymes (DNMTs) [1] (Figure 1A). Widespread interest in DNA methylation is attributed to the critical role it plays in cell biology [2]: regulating gene expression, retro-element silencing, centromere stability and chromosome segregation in mitosis, X-chromosome inactivation [3,4] and monoallelic silencing of imprinted genes [5].
Figure 1. Normal and cancer genomes exhibit distinct DNA methylation profiles
It is well known that DNA methylation patterns frequently become altered in cancer, including DNA hypomethylation events at retro-elements, centromeres and oncogenes in combination with focal DNA hypermethylation associated with repression of critical gene regulatory elements such as distal enhancers and promoters overlapping transcriptional start sites (Figure 1B). Moreover, the discovery that 5mC can be oxidised to 5-hydroxymethylcytosine (5hmC; DNA hydroxymethylation) by the ten-eleven translocation (TET) enzymes [6–8] has prompted widespread interest in the possible roles of 5hmC in remodeling the methylation landscape. TET proteins can further oxidise 5hmC to 5-formylcytosine (5fC) and 5-carboxycytosine (5caC) [9], which can be excised by thymine DNA glycosylase (TDG) in the base excision repair (BER) pathway and replaced with an unmodified cytosine (Figure 2A). This provides a mechanism that may contribute to DNA methylation pattern dynamics in the early embryo [10,11], normal cell biology [12,13] and in disease processes [14].
Figure 2. The presence of 5-hydroxymethylation is indicative of both active and passive DNA demethylation pathways
With the advent of genome-wide approaches to interrogate DNA methylation and advances to distinguish major and minor DNA methylation intermediates such as 5hmC from 5mC [15,16], the field is building comprehensive maps of DNA methylation landscapes. Genome-wide mapping studies have revealed that 5hmC and TET proteins are enriched at promoters, gene bodies and distal regulatory elements in mammalian genomes [17,18]. This suggests that the postulated functions of 5mC at these regulatory regions can be revised by taking 5hmC enrichment into account. As we integrate this information from different cell types and in the context of other epigenetic layers such as post-translational histone modifications and nucleosome positions, we are constantly improving our understanding of the role and scope of DNA methylation in different genomic contexts in normal and diseased cells, and importantly, as a function of tumorigenesis.
Normal genomic distribution of DNA methylation
In the human genome, there are approximately 28 million CpG sites, of which approximately 70% are methylated in normal somatic cells. Interestingly, these CpG sites are not evenly distributed; in fact, the bulk of the genome is depleted of CpG sites, with CpG dinucleotides occurring at only one-fifth of the expected frequency. By contrast, clusters of CpG sites occur at the expected frequency, and these regions are termed ‘CpG islands’ [19]. The majority of CpG islands are 500–1000 base pairs (bp) in length and commonly span gene promoters, particularly those of housekeeping genes [19,20]. Differing from the bulk of the genome, CpG sites located within CpG islands are typically unmethylated in normal somatic cells (Figure 1B). They exist in a transcriptionally permissive chromatin state that is also characterised by various combinations of post-translational histone modifications [21] and distinctive nucleosome organisation [22]. Unmethylated CpG sites within promoter CpG islands provide a binding platform for a complement of transcription factors to control gene activity. A prime example of this is the ubiquitous transcription factor Sp1 (Specificity protein 1), whose interactions with DNA are modulated by the presence or absence of DNA methylation at CpG islands [23]. Sp1 typically occupies unmethylated DNA to promote gene transcription, whereas binding to methylated CpG sites is inhibited and correlated with transcriptional silencing. While the majority of CpG islands are maintained in an unmethylated state, a number of repressed genes harbour methylated promoter CpG islands in somatic cells, which include genes on the inactive X chromosome in females (dosage compensation) and imprinted alleles [24]. Adjacent to CpG islands are regions known as CpG island ‘shores’, which extend up to approximately 2 kilobases (2 kb) from CpG islands and have comparatively low CpG density [25]. Like CpG island promoters, shores are typically unmethylated in normal cells and this pattern is associated with gene activity.
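The notion of CpG sites occurring at, or below, their ‘expected frequency’ can be made concrete with a short calculation. The following Python sketch computes the observed/expected CpG ratio of a sequence, where the expected count assumes that C and G bases are placed independently. The function names and the default thresholds (minimum length, GC fraction and observed/expected ratio) are illustrative assumptions of the kind commonly used in the literature; they are not values taken from this review.

```python
def cpg_stats(seq: str) -> dict:
    """Return length, GC fraction and CpG observed/expected ratio of a sequence."""
    s = seq.upper()
    n = len(s)
    c = s.count("C")
    g = s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count if C and G occurred independently: (C * G) / N
    expected = (c * g) / n if n else 0.0
    obs_exp = cpg / expected if expected else 0.0
    return {"length": n, "gc_fraction": gc_fraction, "cpg_obs_exp": obs_exp}


def looks_like_cpg_island(seq: str,
                          min_len: int = 200,       # assumed threshold
                          min_gc: float = 0.5,      # assumed threshold
                          min_obs_exp: float = 0.6  # assumed threshold
                          ) -> bool:
    """Apply illustrative CpG-island style criteria to a single sequence."""
    st = cpg_stats(seq)
    return (st["length"] >= min_len
            and st["gc_fraction"] > min_gc
            and st["cpg_obs_exp"] > min_obs_exp)
```

Under these assumptions, a bulk-genome sequence with a ratio near 0.2 would fail the test, whereas a CpG-dense promoter region with a ratio close to 1 would pass.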
Gene bodies tend to carry extensive 5mC that, in contrast with CpG island promoters and shores, correlates with active gene expression (Figure 1B). This intragenic DNA methylation has been shown to prevent spurious transcription initiation in mouse embryonic stem cells (mESCs) by protecting the gene body from spurious RNA polymerase II binding [26]; furthermore, DNA methylation at CpG islands located within gene bodies may repress alternative or tissue-specific promoters [27]. Distal regulatory regions, such as tissue-specific enhancers, are CpG-poor and belong to a class of lowly methylated regions (LMRs) exhibiting average DNA methylation levels ranging between 10 and 50% [28] (Figure 1B). Enhancers possess characteristic post-translational histone modifications [21,29,30] and nucleosome positions [31,32] that work in concert with 5mC to modulate expression of their cognate genes. DNA methylation levels of the enhancer have been shown to be associated with gene activity at validated promoter–enhancer pairs, with low levels of 5mC correlating with increased gene expression [33]. Super-enhancers are regions encompassing clusters of putative enhancers, and here, specific patterns of 5mC have been described, ranging from fully methylated to moderately unmethylated and completely unmethylated [34]. Several super-enhancers possess a unique hypermethylated pattern, punctuated by focal unmethylated patches, suggesting non-uniform activity across the super-enhancer. Overall, super-enhancer 5mC levels correlate with the expression of corresponding genes, with completely unmethylated enhancers associated with higher gene activity [34].
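As a minimal illustration of the 10–50% range that defines LMRs above, the following Python sketch averages per-CpG methylation levels across a region and reports whether the mean falls within that range. The function name and the labels for regions falling outside the range are hypothetical; only the 10% and 50% bounds come from the text.

```python
from statistics import mean


def classify_region(cpg_levels, lmr_low=0.10, lmr_high=0.50):
    """Classify a region by its mean methylation level (fractions in [0, 1]).

    The bounds follow the 10-50% LMR range quoted in the text; the labels
    for regions outside that range are illustrative only.
    """
    if not cpg_levels:
        raise ValueError("at least one CpG methylation level is required")
    avg = mean(cpg_levels)
    if avg < lmr_low:
        return "below LMR range"
    if avg <= lmr_high:
        return "LMR range"
    return "above LMR range"
```

For example, a region with per-CpG levels of 0.2 and 0.4 has a mean of 0.3 and would fall within the LMR range.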
‘Reshaping’ of 5mC patterns in cancer cells
It is well established that normal epigenetic processes are disrupted during the initiation and progression of tumorigenesis, including global changes to normal DNA methylation patterns [35]. Broadly, this is characterised by overall genome-wide hypomethylation accompanied by regional DNA hypermethylation of CpG island promoters [35–37] (Figure 1B). The hypermethylation of CpG islands is common and frequently associated with silencing of tumour suppressor genes, genes controlling cell growth and downstream signalling pathways. Indeed, numerous locus-specific and genome-wide DNA methylation profiling studies have revealed multiple promoter-associated CpG islands that consistently undergo aberrant DNA hypermethylation in tumour cells. Examples of these include Glutathione S-Transferase P (GSTP1) in ∼90% of prostate cancers [38], the cyclin-dependent kinase inhibitor p16INK4a in ∼20% of lung carcinomas [39] and BRCA1 in ∼12% of breast and ovarian carcinomas [40]. It has also been shown that not only are single loci hypermethylated in cancer, but multiple contiguous regions can become coordinately silenced and aberrantly hypermethylated [41]. Furthermore, the frequency of p16INK4a and GSTP1 hypermethylation has been shown to increase during disease progression [42,43], suggesting that DNA hypermethylation may be predictive of disease stage or progression. Indeed, The Cancer Genome Atlas consortium first reported the existence of CpG Island Methylator Phenotypes (CIMP) in glioblastoma [44] and colon cancers [45], enabling stratification of disease subtypes by the 5mC signature. Interestingly, the expression of DNMT enzymes is also frequently disrupted in cancer, which provides a feedback loop driving altered DNA methylation patterning across the genome, with the potential to cause mutations in the genomic sequence [46].
There is still concerted effort in understanding the similarities between the 5mC profile of embryonic stem cells and cancer cells, including the interesting predisposition of Polycomb marked developmental genes to become preferentially hypermethylated in cancer cells [47–50], and whether the similarities to developmental states represent functional aberrations in the cancer epigenome.
While CpG islands are susceptible to DNA methyltransferase activity, CpG-poor regions tend to undergo hypomethylation during tumorigenesis, resulting in the global decrease in DNA methylation characteristic of tumours [51–53]. This was first described by Feinberg and Vogelstein [54] in colon adenocarcinoma and small cell lung cancer, and later observed in prostate cancer and chronic lymphocytic leukaemia. The exception to this pattern of CpG-poor hypomethylation in cancer cells is CpG-poor enhancer elements; these regions are lowly methylated in normal cells and often gain methylation in cancer cells [33,55,56]. In addition, CpG island shores, which also have a lower CpG density and flank CpG islands up to 2 kb distant, can become DNA hypermethylated in cancer. This was first observed in human colon cancer [25] and has also been shown in cancers of the breast, lung and thyroid, and in Wilms’ tumour [25,51].
Global DNA hypomethylation in cancer is thought to contribute to genomic instability and increases in aneuploidy [57], both common features of cancer genomes. Indeed, reduced levels of DNA methyltransferase 1 (Dnmt1) can result in increased mutation rate, aneuploidies and tumour development, which provides some evidence for the role of DNA hypomethylation in increased chromosomal fragility [57,58]. Widespread genome instability is commonly accepted to accompany the global loss of DNA methylation observed in cancer cells; however, causality is still to be definitively shown. Broad regions of hypomethylation are associated with global changes in chromatin organisation and structural variation [59]. Loss of DNA methylation is also accompanied by aberrant expression of transposable elements, repeat elements and oncogenes [35,57] such as MYC, resulting in global deregulation of cellular pathways [60] and occurring concomitantly with alterations to chromatin organisation [32] and the three-dimensional genome [56].
There exists a strong link between chromatin structure and DNA methylation, in particular the requirement for a nucleosome to be present for anchoring of DNMT enzymes [61] prior to acquisition of 5mC [62]. Gene expression can be modified in the absence of any changes to DNA methylation patterns, and indeed, developmental gene expression is often already repressed in normal cells but these promoters are prone to abnormal DNA hypermethylation following transformation [50,63]. This is likely due to the fact that the promoters of such developmental genes exhibit a ‘closed’ conformation and well-positioned nucleosomes occlude the transcriptional start site [31]. Organisation of nucleosomes across DNA methylated regions is unexpectedly comparable with genes devoid of 5mC [32] supporting an overall highly organised physical chromatin state irrespective of cell type, normal or cancer origin. Despite remaining well organised on a broad scale, the substructure of genome organisation is disrupted in cancer cells, further characterised by loss of interactions between enhancer and promoter gene regulatory elements across the cancer genome occurring alongside hotspots of copy number variation [56]. Though a residual level of approximately 5% DNA methylation is required for cell survival [64], the mechanisms driving the initial change in DNA methylation patterning either as a cause or consequence of cancer progression remain to be clearly demonstrated.
Contribution of 5hmC to the normal DNA methylome
The purpose of 5hmC in the DNA demethylation process has prompted interest in its role in maintaining promoter CpG islands in the unmethylated state in normal cells. In mouse embryonic stem cells (ESCs), 5hmC is depleted from promoters of actively transcribed genes [65,66] but enriched at bivalent promoters of poised genes in a bimodal fashion surrounding the transcription start sites (Figure 2B) [17,67]. Similarly, in non-pluripotent cells, 5hmC demarcates the borders of promoter CpG islands [68–70] and large undermethylated regions called CpG canyons [71]. Levels of 5hmC accumulate at gene bodies during neuronal differentiation [72,73] and positively correlate with gene expression [13,73–76] (Figure 2B). The correlation between 5hmC and gene activity is more pronounced than that of 5mC [77]. Notably, 5hmC accumulation at gene bodies during differentiation was not accompanied by subsequent DNA demethylation, suggesting that 5hmC is a stable epigenetic mark at gene bodies in the brain [72]. Such a role would imply the existence of 5hmC binding proteins [75,78] and/or the ability of 5hmC to repel the recruitment and counteract the activity of 5mC binding proteins. This is supported by the observation that a subset of the methyl-CpG binding domain (MBD) family of proteins, while possessing high affinity to 5mC, were unable to bind 5hmC containing DNA [79]. There has also been interest in the role of intragenic 5mC [80] and 5hmC [76,81,82] in the regulation of alternative splicing, particularly at exon–intron boundaries in brain tissue. The presence of 5mC, together with 5hmC and specific histone modifications [83–85], at gene bodies may establish specific chromatin signatures and nucleosome positioning within the transcribed unit to regulate RNA transcription kinetics, gene activity and splicing. However, the role for 5hmC in shaping the regulatory function of these proteins is yet to be elucidated. 
Importantly, TET triple knockout in mouse and human ESCs results in increased DNA methylation at the borders of bivalent, but not active, promoters [6,86]. Similarly, in differentiated cells, TET1 depletion triggers DNA methylation spreading into promoter CpG islands from the methylated borders [68], highlighting the crucial role for TET proteins and 5hmC in the maintenance of DNA methylation boundaries. Depletion of DNMT3A isoform 1 (DNMT3A1), in turn, results in erosion of 5mC concomitant with 5hmC reduction at the shores of bivalent CpG islands [87] and CpG canyons [71].
Much like the accumulation of 5hmC at the borders of bivalent promoters, 5hmC is normally highly enriched at distal regulatory elements (Figure 2B), coinciding with enhancer-associated post-translational histone marks H3K4me1 and H3K27ac [88], DNase hypersensitive sites [89] and LMRs [90]. 5hmC is enriched at enhancers in a bimodal fashion surrounding the TF/p300 binding site [89] as well as at the boundaries of super-enhancers [91], coinciding with a local depletion of 5mC [76,89,91], which increases in response to TET depletion [6,92]. Moreover, in TDG-depleted cells, the majority of induced accumulation of 5caC occurs at poised and active enhancers [66], supporting the idea that TET-mediated 5mC oxidation followed by TDG-mediated excision is essential for the retention of reduced 5mC levels at distal regulatory elements. In fact, TET2 depletion-mediated hypermethylation predominantly occurs at distal regulatory elements, and to the highest extent at ‘weak’ enhancers with lower enrichment of H3K27ac and lower TF occupancy, but higher baseline levels of 5mC [92]. Therefore, weak enhancers might be subject to a more dynamic interplay between DNMT-mediated methylation and TET-mediated demethylation, potentially governing enhancer activity through the mediation of TF binding. Altogether, these observations suggest that the balance between DNA methylation and DNA demethylation is required for the maintenance of the methylation landscape at promoter borders. Consequently, an impairment of this balance may result in DNA methylation spreading into the interior of the promoter CpG island or, conversely, erosion of the DNA methylation boundary and formation of hypomethylated CpG island shores, with both scenarios shown to be features of carcinogenesis [51]. The cross-talk between 5mC, 5hmC and chromatin modifications at enhancer elements is complex and further research into the role of DNA methylation at distal regulatory elements is ongoing [93].
Rearrangement of the 5hmC landscape in cancer cells
Alongside the global loss of 5mC, a widespread reduction in 5hmC has been observed in all studied human carcinomas of different histological types [94,95]. Dot blot hybridisation and immunohistochemistry analyses have revealed that levels of 5hmC are significantly reduced in melanoma [94], breast, prostate, colon [95], liver, lung and pancreatic cancer [96] compared with the adjacent normal tissues. Quantitative estimates of 5hmC reduction in cancer have confirmed significant depletion of 5hmC, up to 5-fold in small cell lung carcinomas and 30-fold in astrocytomas and glioblastomas [97]. It is likely that reduced levels of 5hmC observed in cancer cells are due to impaired activity of TET enzymes or a reduction in expression of TET proteins [94,96]. A number of cancers also carry mutations in TET enzymes, with a high recurrence of TET2 mutations in myeloid malignancies [98,99]. Tumours carrying TET mutations display reduced 5hmC [100,101], and TET depletion reduces 5hmC levels in human leukaemia cell lines [102,103]. In the absence of TET aberrations, an indirect effect can be observed via altered enzymatic activity of Isocitrate Dehydrogenases 1 and 2 (IDH1/2) [104]. IDH1/2 catalyse the oxidative decarboxylation of isocitrate to 2-oxoglutarate, which serves as an essential cofactor of TET catalytic activity [105]. IDH1/2 mutations, which result in the reduction of 2-oxoglutarate and aberrant accumulation of (R)-2-hydroxyglutarate, a competitive inhibitor of TET catalytic activity [106–108], have been found in melanoma [94], glioma [44,109] and acute myeloid leukaemia [110]. Consistent with TET mutations, tumours carrying IDH mutations displayed increased CpG island hypermethylation [44,106]. The reintroduction of catalytically active TET2 or IDH2 results in restoration of the 5hmC landscape and suppression of melanoma invasion and growth [94]. Similar effects are seen following the reintroduction of catalytically active TET1 in normal and tumour breast cells [111].
These data highlight a vital role for TET and IDH proteins in the maintenance of the DNA methylation and DNA hydroxymethylation landscape in normal cells and their alteration in tumorigenesis. However, the function of TET or IDH enzymes is impaired only in a minority of cancers, whereas widespread 5hmC reduction has been shown to occur in all cancer tissues studied to date [95–97]. This suggests that a more generalised mechanism exists for 5hmC loss in cancer.
Critically, apart from IDH/2-oxoglutarate, the activity of TET, as well as that of many other chromatin-modifying enzymes, is dependent on several other cofactors such as Fe2+, oxygen and ascorbic acid. Oxygen availability is a crucial regulator of TET catalytic activity. Oxygen enables 5mC-to-5hmC oxidation by oxidising Fe2+ in the TET catalytic pocket and inducing oxidative decarboxylation of 2-oxoglutarate. Indeed, 5hmC loss has been linked to tumour oxygenation status, with hypoxic areas within patient-derived tumour xenografts exhibiting decreased 5hmC [112]. Ascorbic acid is another essential cofactor of TET enzymes, serving as an electron donor in the Fe3+ to Fe2+ reduction reaction. Supplementation of ascorbic acid was shown to enhance TET catalytic activity and transient 5hmC accumulation in mouse ESCs [113] and embryonic fibroblasts [114]. In melanoma and bladder cancer cells, ascorbic acid enabled restoration of 5hmC levels and this was sufficient to inhibit cancer cell growth and migration [115,116]. Overall, TET/IDH mutations, hypoxia and ascorbic acid depletion can attenuate the catalytic function of TET enzymes, potentially serving as an underlying cause of widespread 5hmC loss in cancer.
DNA methylation profiling technologies
Outstanding biological questions remain regarding the DNA methylation process itself, as well as how genes are silenced and targeted for silencing. In addition, the field is at an inflection point where it must adopt new techniques that circumvent current limitations such as cost, genomic coverage and sample input requirements, so that the study of 5mC and 5hmC becomes a standard approach in the clinic for diagnostics and monitoring of tumorigenesis.
Technologies to measure DNA methylation are long-established but constantly evolving. In particular, the potential to examine the genome-wide presence or absence of 5mC using next-generation sequencing techniques has also enabled assessment of DNA methylation at the level of individual CpG dinucleotides. By contrast, some of the earliest techniques such as high performance liquid chromatography (HPLC, [117]) measure global 5mC content. As noted by Vryer and Saffery [118], the distinction is important, as direct approaches remain the only current techniques that determine global methylation by definition. This is affirmed by the presence of persistent ‘gaps’ in genomic sequencing coverage, meaning that the majority (approximately 95%), but not all, of the CpG sites can be analysed using the whole-genome bisulfite sequencing technique [119]. While we are in an excellent position to perform comprehensive characterisation of DNA hyper- and hypo-methylation events in distinct tumour types, the field has simultaneously exploded into the single cell space and is now developing bisulfite-free DNA methylation assays to circumvent some of the existing limitations in measuring DNA methylation. The overarching goal in cancer biology is to apply these cutting-edge advances to detect aberrant 5mC and 5hmC changes to gain insights into common versus distinct regulatory pathways. Perhaps more importantly, these advances provide a position from which to determine the prerequisites underlying the aberrant 5mC and 5hmC changes, and new options for manipulating them for the benefit of patients.
Global technologies
Global measurement of DNA methylation defines the total amount of 5-methylcytosine relative to unmethylated cytosine content in the genome. This is directly achieved using HPLC [117], high-performance capillary electrophoresis (HPCE, [120]) or liquid chromatography in combination with tandem mass spectrometry (LC-MS/MS, [121]). Generally, direct measurement approaches are not commonly used because they require relatively large amounts of sample DNA and specialist equipment, and are unsuitable for high-throughput processing.
Indirect measurement of global DNA methylation levels is achieved using any other method. The overarching assumption of indirect measurements is that the average methylation level of the subsampled CpG dinucleotides is representative (a proxy) of the remainder of the genome. This category of assays includes subsampling of repeat genomic elements, enzymatic digestion with or without dependence on the presence of methyl groups at consensus sequences containing CpG sites (for example, by using HpaII or MspI enzymes), the luminometric methylation assay (LUMA, [122]) and enzyme-linked immunosorbent assays (ELISA). A proxy measurement is most accurate using the long interspersed nuclear element (LINE) assay when compared with HPLC [123] and most variable using standard ELISA-based approaches (for example, the 5mC DNA ELISA Kit from Zymo Research). However, the specific biological question, sample type, budget and available equipment are all important considerations when deciding on a method to assess DNA methylation. ELISA-based approaches that use specific antibodies to detect 5mC or 5hmC are therefore appropriate for high throughput screening or when the goal is to identify large changes (>1.5-fold) between samples.
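The distinction between direct and proxy measurements of global methylation can be sketched in a few lines of Python. The first function expresses a direct measurement as the percentage of 5mC among all measured cytosines, the second takes the mean of a subsample of CpG sites as a proxy, and the third flags differences larger than the 1.5-fold threshold mentioned above. All function names and input conventions are assumptions made for illustration; they do not correspond to the output format of any particular assay.

```python
def global_5mc_percent(mc_amount: float, c_amount: float) -> float:
    """Direct measure: 5mC as a percentage of total (5mC + unmethylated C)."""
    total = mc_amount + c_amount
    if total <= 0:
        raise ValueError("total cytosine amount must be positive")
    return 100.0 * mc_amount / total


def proxy_methylation_percent(subsampled_levels) -> float:
    """Proxy measure: mean methylation (%) of a subsample of CpG sites.

    Assumes the subsample is representative of the rest of the genome,
    which is the central assumption of indirect assays.
    """
    levels = list(subsampled_levels)
    if not levels:
        raise ValueError("at least one CpG level is required")
    return sum(levels) / len(levels)


def is_large_change(level_a: float, level_b: float, fold: float = 1.5) -> bool:
    """Return True if two positive levels differ by more than `fold` in either direction."""
    if level_a <= 0 or level_b <= 0:
        raise ValueError("levels must be positive to compute a fold change")
    ratio = level_b / level_a
    return ratio > fold or ratio < 1.0 / fold
```

For example, a sample containing 4 units of 5mC and 96 units of unmethylated cytosine has a global 5mC content of 4%, and a shift from 40% to 70% in a proxy measurement (1.75-fold) would be flagged as a large change.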
Genome-wide technologies
Next-generation sequencing platforms have revolutionised biology, notably for assessment and interpretation of DNA methylation events. Foremost is the ability to achieve base pair resolution and to assess methylation outside the dense CpG island regions usually found at gene promoters [19,20]. Whole-genome bisulfite sequencing (WGBS) was first demonstrated by Lister et al. [124] by adaptation of the current gold-standard from Frommer and Clark [125]. This approach is becoming more accessible as a consequence of improvements in workflow (for example, picogram amounts of DNA input) and reduced costs of genomic sequencing; yet, <2000 WGBS datasets have been deposited into the Gene Expression Omnibus (GEO) database to date. This reflects the relative expense of a WGBS dataset compared with other options, and the requirement for technical expertise and significant data analysis. For these reasons, WGBS is not yet routinely used in research or clinical laboratories. Nucleosome Occupancy and Methylation Sequencing (NOMe-Seq [32,126]) takes advantage of the WGBS protocol, but offers improvement by simultaneously measuring endogenous nucleosome occupancy and chromatin accessibility. This is achieved by treating nuclei with the bacterial M.CviPI enzyme, which gives an exogenous methylation profile of the accessible GpC dinucleotides (nucleosomes and accessibility) without affecting endogenous CpG methylation. Samples are otherwise prepared as for WGBS, and the cost is equivalent. Enrichment-based methods such as MeDIP [127] and MBDCap-Seq [128] tend to produce comparatively ‘noisy’ data and report on only ∼18% of CpG sites. Moreover, MeDIP favours regions of low CpG density such as intergenic regions, while MBDCap-Seq is biased towards CpG-rich elements. The cost to sequence an enriched sample is at least 15-fold less than WGBS or NOMe-Seq, but reduced bias can be achieved with more recent array-based assays for similar expense.
Indeed, the latest iteration of the Illumina Infinium Array (MethylationEPIC) remains one of the most popular selections for DNA methylation analysis, particularly for clinical samples. Data from the ∼850 000 CpG dinucleotides covered by the MethylationEPIC array reliably mirror WGBS measurements of identical sites [129] and cover ∼3% of all CpG dinucleotides, located within a spectrum of genomic regulatory elements including enhancers and promoters. While biological interpretation of the MethylationEPIC data relies on the ‘co-methylation assumption’ of adjacent CpG sites [130], there is an increasing number of user-friendly data analysis options that improve accessibility of array-based methods (e.g. ChAMP [131]). Targeted bisulfite sequencing offers similar benefits with increased coverage and the potential for customisation (for example, the CpGiant Enrichment System (Roche) can assess up to 5.5 million CpG sites) but has a greater sample input requirement. The BLUEPRINT consortium [132] evaluated performance, sensitivity, scalability and cost of 27 locus-specific and genome-wide DNA methylation assays for clinical applications. This report highlighted the relative strengths and weaknesses of common assays; of the current methods available, the consortium recommended locus-specific amplicon bisulfite sequencing and pyrosequencing methods for overall performance, noting that currently popular genome-wide alternatives (such as MethylationEPIC) have comparatively reduced accuracy and higher cost.
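The bisulfite-based sequencing assays discussed above share a common readout: after bisulfite conversion, an unmethylated cytosine is read as T, whereas a protected (modified) cytosine is still read as C. The following Python sketch is a toy illustration of per-CpG methylation calling under strong simplifying assumptions that are ours rather than those of any published pipeline: reads are already aligned, ungapped, on the forward strand only, and supplied as hypothetical (start, sequence) tuples.

```python
def cpg_positions(reference: str) -> list:
    """Return 0-based positions of the cytosine in each CpG of the reference."""
    ref = reference.upper()
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]


def call_methylation(reference: str, reads) -> dict:
    """Estimate the methylation level at each covered CpG cytosine.

    reads: iterable of (start, sequence) tuples; toy input format assuming
    forward-strand, ungapped alignments of bisulfite-converted reads.
    """
    # For each CpG cytosine keep [reads showing C, reads showing T]
    counts = {pos: [0, 0] for pos in cpg_positions(reference)}
    for start, seq in reads:
        for offset, base in enumerate(seq.upper()):
            pos = start + offset
            if pos in counts:
                if base == "C":    # protected from conversion: modified
                    counts[pos][0] += 1
                elif base == "T":  # converted: unmodified
                    counts[pos][1] += 1
    # Methylation level = C / (C + T), reported for covered sites only
    return {pos: c / (c + t) for pos, (c, t) in counts.items() if c + t > 0}
```

Note that this readout reports a modified cytosine without identifying which modification is present, so the resulting level should be read as 5mC plus 5hmC unless a modified protocol is used.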
Single-cell advances, bisulfite-free and ultra-long read sequencing approaches
Existing next-generation sequencing based assays usually use a population of cells to achieve the input requirement and typically do not distinguish 5mC from 5hmC unless modified protocols are employed (for example, TET-assisted bisulfite sequencing [89]). Thus, the field has been adapting and developing techniques that allow interrogation of cell-type specific methylation, in single cells and/or independently of a bisulfite conversion step.
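One way to see how a modified protocol separates the two marks is to combine a conventional bisulfite measurement with a 5hmC-specific one. The following Python sketch assumes that the bisulfite level at a CpG site reflects 5mC plus 5hmC, that a TET-assisted bisulfite measurement reflects 5hmC alone, and that both are available as fractions for the same site; 5mC is then estimated by subtraction. The function names, the site identifiers and the clipping of negative estimates to zero are illustrative choices.

```python
def estimate_5mc(bs_level: float, hmc_level: float) -> float:
    """Estimate 5mC as (5mC + 5hmC) minus 5hmC, clipped at zero.

    Negative differences can arise from sampling noise, so they are
    clipped to zero here; this is a simplification, not a statistical model.
    """
    return max(0.0, bs_level - hmc_level)


def split_modifications(bs_levels: dict, hmc_levels: dict) -> dict:
    """Return per-site 5mC and 5hmC estimates for sites present in both inputs."""
    shared = bs_levels.keys() & hmc_levels.keys()
    return {
        site: {
            "5hmC": hmc_levels[site],
            "5mC": estimate_5mc(bs_levels[site], hmc_levels[site]),
        }
        for site in shared
    }
```

For example, a site with a bisulfite level of 0.80 and a 5hmC level of 0.25 would be assigned an estimated 5mC level of 0.55.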
The development of single cell bisulfite sequencing [133] (scBS-Seq) was a breakthrough, being the first epigenomic methodology adapted for single cells. However, the initial datasets produced in oocytes and embryonic stem cells covered fewer than half of all possible CpG sites (up to ∼48%) in the mouse genome and data mapping efficiency was <25% due to low sample complexity [133]. Variations on scBS-Seq have enabled DNA methylation and transcription to be measured in parallel from single cells (scM&T-Seq, [134]), as well as single cell nucleosome occupancy and methylation sequencing (scNOMe-Seq, [135]). Wang et al. [136] have recently devised a new approach, methyltransferase treatment followed by single-molecule long-read sequencing (MeSMLR-Seq), based upon the original NOMe-seq protocol, which adopts single molecule long read sequencing (Oxford Nanopore Technologies) to enable read lengths in excess of 53 kb (up to 356 nucleosomes). MeSMLR-Seq offers one of the first epigenetic applications of long-read sequencing and great potential to uncover biological insight over short-read sequencing options; however, there is currently little cost or convenience benefit in adopting these approaches in their current form.
The vast majority of current DNA methylation assays rely on bisulfite conversion, including interrogation of 5hmC. This is problematic because the process of bisulfite conversion itself damages 84–99% of the genomic DNA [137,138]. While single cell and low input options exist for bisulfite-dependent methods including WGBS, there is a need to further develop bisulfite-independent options. ACE-Seq takes advantage of the human-specific APOBEC3A enzyme to resolve 5hmC distribution without bisulfite conversion [139], and chemical labelling methods increase sensitivity and specificity of the 5hmC readout at single molecule resolution [74,140]. Adaptations of chemical labelling approaches based on the 5fC-to-T transition have enabled mapping of the low abundance 5fC intermediate in bulk cells [141,142] and at a single cell level [143]. Using three variations of a pic-borane treatment coupled with exogenous TET enzyme treatment, Liu et al. [144] have resolved 5mC (TET-assisted pyridine borane sequencing β; TAPSβ), 5hmC (chemical-assisted borane sequencing; CAPS) and 5mC+5hmC (TET-assisted pyridine borane sequencing; TAPS). These methods preserve double-stranded DNA without excessive fragmentation, resulting in greater coverage of CpG sites without sequence bias [144]. These methods are still in their relative infancy and their uptake is not yet widespread; however, we predict that it will be the future adaptation of these bisulfite-free assays in a clinical setting, likely in combination with the genome-wide sequencing applications currently in use, that will be groundbreaking for patient treatments or refined diagnoses. This is partly because of the information retained when damaging bisulfite treatment is avoided and partly because of the reduced amounts of DNA sample required for input.
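Because the three borane-based readouts described above are stated to report 5mC, 5hmC and their sum, a simple internal consistency check follows: at any given CpG site, the TAPS level should approximately equal the TAPSβ level plus the CAPS level. The following Python sketch expresses that check; the function names and the tolerance value are assumptions made for illustration and are not part of the published methods.

```python
def taps_residual(taps: float, taps_beta: float, caps: float) -> float:
    """Return TAPS minus (TAPS-beta + CAPS) for one CpG site.

    taps      : combined 5mC + 5hmC level (fraction in [0, 1])
    taps_beta : 5mC-only level
    caps      : 5hmC-only level
    """
    return taps - (taps_beta + caps)


def is_consistent(taps: float, taps_beta: float, caps: float,
                  tolerance: float = 0.1) -> bool:
    """True if the three measurements agree within an assumed tolerance."""
    return abs(taps_residual(taps, taps_beta, caps)) <= tolerance
```

Sites that fail such a check might reflect differences in coverage or conversion efficiency between the three libraries and would warrant closer inspection before interpretation.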
The next advances will also require increased capability to map the three-dimensional genome at an extraordinarily fine resolution, alongside the methylome, which is already achieved at base-pair resolution. As with all massive datasets, the difficulty extends beyond the ability to produce the data in the first place, and the field will be faced with ongoing challenges regarding the interpretation of methylomes either standalone or in combination with genetic, transcriptomic and/or complementary epigenetic information (e.g. post-translational histone modification screening) for patient benefit.
Concluding remarks
We are at an inflection point in generating and interpreting cancer epigenomes, particularly DNA methylation. With comprehensive mapping of cancer-associated DNA methylation changes now possible, we are faced with the necessity to improve the cost:accessibility and genome coverage:input ratios of genome-wide approaches (Figure 3). This is particularly important as bisulfite-free methods with base pair resolution, single-molecule and single-cell capability become possible. There is reason for excitement around these approaches because they address three pitfalls of whole-genome bisulfite sequencing: (1) the input requirements are compatible with a wider range of sample types, (2) DNA integrity is preserved and (3) ‘CpG dropout’ or ‘gaps’ are avoided. The ultimate benefit is that greater or complete coverage of the 28 million CpG sites is achieved at lower cost. However, from a biological perspective, many open questions remain despite great advances in the past decade. The mechanisms underlying DNA methylation changes, which combine global hypomethylation with punctate DNA hypermethylation events, are still to be characterised. Why are CpG islands, shores and distal enhancer elements susceptible to DNA methyltransferase activity in cancer? Indeed, which of these are driving cancer initiation and progression? How dynamic is a change in DNA methylation at any given locus or CpG site, and does it fluctuate across disease trajectories? The ability to profile single cells will not offer an explanation to this end, and carefully designed animal and cell line experiments will be required to monitor DNA methylation and DNA hydroxymethylation flux. These platforms, in addition to being able to monitor DNA methylomes in clinical samples, are a critical base from which to extend our understanding of DNA methylation biology in normal and cancer cell contexts.
Figure 3. Comparison of genome coverage and input requirements of common assays to measure DNA methylation
Summary
DNA methylation changes are widespread in cancer cells, including both DNA hyper- and hypomethylation events and DNA hydroxymethylation alterations.
DNA methylation and DNA hydroxymethylation changes occur across the genome, including at sites distal to promoters, and there is a need to understand the broader impact of these changes on gene expression, genome structure and cell behaviour.
Many options are available to assess DNA methylation on global, locus-specific and genome-wide scales in cell populations as well as single cells.
The field is developing new methods to circumvent the disadvantages of current techniques.
A focus on data interpretation is also important, as the number of available datasets is continually increasing.
Author Contribution
K.S., C.S. and P.T. contributed equally to the writing of this manuscript.
Funding
This work was supported by the National Health and Medical Research Council (NHMRC) Project Grants [grant numbers APP 1128912 (to C.S.), APP 1161985 (to P.T.), APP 1109696 (to P.T.)]; the National Breast Cancer Foundation Investigator Initiated Research Scheme Grant [grant number IIRS-18-137 (to C.S.)]; the National Foundation for Medical Research and Innovation (NFMRI) grant [grant number NFMRI_Stirzaker (to C.S.)]; and Cancer Council Tasmania (Australia) Grants Scheme [grant number T24716 (to P.T.)].
Competing Interests
The authors declare that there are no competing interests associated with the manuscript.
Glossary
- 5-methylcytosine
5mC; methylated form of the fifth carbon position of the cytosine nucleotide in deoxyribonucleic acid (DNA)
- 5-hydroxymethylcytosine
5hmC; DNA pyrimidine nitrogen base derived as the first oxidative product in the process of demethylating cytosine nucleotides. Oxidation is facilitated by ten-eleven translocation (TET) enzymes. Precursor to 5-formylcytosine (5fC), then 5-carboxycytosine (5caC)
- Bisulfite sequencing
shotgun or genome-wide sequencing methods applied to bisulfite-treated DNA allowing measurement of cytosine methylation at single base pair resolution. Currently considered the ‘gold standard’, it is theoretically but not practically possible to resolve every cytosine using bisulfite sequencing, which also does not distinguish between 5mC and 5hmC
- Coverage
‘depth’; referring to the number of sequencing reads covering any given nucleotide in the genome. The average coverage is typically reported for sequencing datasets
- CpG island
defined as a genomic region greater than 500 bp with a G+C content equal to or greater than 55% and an observed CpG/expected CpG ratio of at least 0.65 [20]. This definition identified islands more likely to be associated with the 5′ regions of genes and excluded the majority of Alu-repetitive elements
- Cytosine
C; one of four nucleotide bases found in DNA and ribonucleic acid (RNA)
- Direct
referring to the absolute measurement of 5mC in a sample (e.g. by HPLC) and distinct from most genome-wide methods available that measure a subset of all CpG sites and extrapolate the total percent methylation (i.e. indirectly measuring global DNA methylation levels)
- DNA methylation
Usually refers to the presence of 5mC and the activity of DNA methyltransferase (DNMT) enzymes. However, the term ‘DNA methylation’ could encompass DNA hydroxymethylcytosine and other variants, as well as methylation in non-CpG contexts
- DNA hydroxymethylation
Referring to the presence of 5hmC
- ELISA
enzyme-linked immunosorbent assay (commercially available as, for example, the MethylFlash Methylated DNA 5mC Quantification Kit (Epigentek) or EpiSeeker (Abcam))
- Enhancer
A regulatory region of DNA of variable size (50–1000 bp) capable of binding transcriptional and chromatin regulatory proteins for the purpose of controlling gene expression. Often located distal to the promoter, enhancers usually interact with their cognate genes by DNA looping
- Genome-wide
‘across’, referring to measurement of 5mC at a representative subset (1.7–95% using current assays) of all cytosine residues across the genome
- Global
‘total’, referring to the total level of 5-methylcytosine relative to unmethylated cytosine content
- Hypomethylation
Loss/reduction in methylation compared with normal or what is expected (e.g. hypomethylation of the bulk of the genome in cancer)
- Hypermethylation
Gain/increase in methylation compared with normal or what is expected (e.g. hypermethylation of CpG islands in cancer)
- Indirect
referring to the assumption that 5mC levels at a subset or majority of CpG sites accurately reflect the global DNA methylation content
- Intergenic
located between or outside genes
- Intragenic
located within a gene or genes
- LUMA
Luminometric methylation assay; couples restriction enzyme digestion with pyrosequencing
- Nucleosome
an octamer of histone proteins around which DNA is wrapped; facilitates the compaction of the DNA strand inside the nucleus, influences DNA accessibility profiles and provides a platform for the binding of enzymes that post-translationally modify the histone proteins
- Post-translational histone modification
Covalent modification of histone proteins following translation and including the processes of acetylation, phosphorylation and methylation, for example
- Promoter
Located proximal (5′-) to the transcriptional start site, promoters are 500–1000 bp and can contain many defined regulatory elements. Promoters are the sites of binding for many transcription factors and chromatin modifying proteins required for transcription (e.g. RNA Polymerase II)
- ‘Shores’
Immediately flanking and up to ∼2 kb distal from CpG islands; ‘shores’ are regions of relatively low CpG density compared with CpG islands. The presence of DNA methylation at ‘shores’ is correlated with gene silencing
- Tumorigenesis
formation and evolution of a tumor
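
The CpG island criteria given in this glossary (length greater than 500 bp, G+C content of at least 55% and an observed/expected CpG ratio of at least 0.65) can be checked for a candidate sequence with a short calculation. The sketch below is a minimal illustration rather than the method of any particular annotation tool. The function names are hypothetical, and the observed/expected ratio is assumed to follow the commonly used form (number of CpG × sequence length) / (number of C × number of G).

```python
def cpg_island_stats(seq: str):
    """Return (length, G+C fraction, observed/expected CpG ratio) for `seq`."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # CpG dinucleotides on the given strand
    gc_fraction = (c + g) / n if n else 0.0
    # Assumed formula: expected CpG count = (number of C * number of G) / length
    expected = (c * g) / n if n else 0.0
    obs_exp = cpg / expected if expected else 0.0
    return n, gc_fraction, obs_exp


def meets_island_criteria(seq: str, min_len: int = 500,
                          min_gc: float = 0.55,
                          min_obs_exp: float = 0.65) -> bool:
    """Apply the three glossary thresholds to a single candidate region."""
    n, gc_fraction, obs_exp = cpg_island_stats(seq)
    return n > min_len and gc_fraction >= min_gc and obs_exp >= min_obs_exp


if __name__ == "__main__":
    candidate = "CG" * 300  # artificial 600 bp sequence, for illustration only
    print(cpg_island_stats(candidate), meets_island_criteria(candidate))
```

A genome-wide annotation would additionally need to scan windows along each chromosome and merge adjacent qualifying regions, which this sketch does not attempt.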