Transcription factor expression repertoire basis for epigenetic and transcriptional subtypes of colorectal cancers
Contributed by Stephen B. Baylin; received February 3, 2023; accepted June 15, 2023; reviewed by Timothy A. Chan and Joseph F. Costello
Significance
We provide an understanding of the development of epigenetic and transcriptomic colorectal cancer (CRC) subtypes that is rooted in altered transcription factor (TF) expression. Our data reveal cancer-specific methylation at enhancers and promoters associated with changes in expression of TFs in CRCs, which are early events important in shaping CRC development and link the epigenetic and transcriptional subtypes of CRCs. Profiling of patients based on expression of methylation-specific TFs identifies a subgroup with worse overall survival.
Abstract
Colorectal cancers (CRCs) form a heterogeneous group classified into epigenetic and transcriptional subtypes. The basis for the epigenetic subtypes, exemplified by varying degrees of promoter DNA hypermethylation, and their relation to the transcriptional subtypes are not well understood. We link cancer-specific transcription factor (TF) expression alterations to methylation alterations near TF-binding sites at promoter and enhancer regions in CRCs and their premalignant precursor lesions to provide mechanistic insights into the origins and evolution of the CRC molecular subtypes. A gradient of TF expression changes forms a basis for the subtypes of abnormal DNA methylation, termed CpG-island promoter DNA methylation phenotypes (CIMPs), in CRCs and other cancers. CIMP is tightly correlated with cancer-specific hypermethylation at enhancers, which we term CpG-enhancer methylation phenotype (CEMP). Coordinated promoter and enhancer methylation appears to be driven by downregulation of TFs with common binding sites at the hypermethylated enhancers and promoters. The altered expression of TFs related to hypermethylator subtypes occurs early during CRC development and is detectable in premalignant adenomas. TF-based profiling further identifies patients with worse overall survival. Importantly, altered expression of these TFs discriminates the transcriptome-based consensus molecular subtypes (CMS), thus providing a common basis for CIMP and CMS subtypes.
Colorectal cancers (CRCs) form a heterogeneous group of tumors traditionally classified into genetic and epigenetic subtypes and more recently into transcriptome-based consensus molecular subtypes (CMSs) (1, 2). Analyses of DNA methylation at specific gene promoter CpG-island loci have identified subsets of CRCs, predominantly in the proximal colon, with a very high frequency of hypermethylation of CpG residues at promoter CpG-islands, a phenotype termed the CpG-island hypermethylator phenotype or CIMP (3–5). Genome-wide methylation analyses of CpG-island (CGI) promoters have extended CIMP to a quantitative phenomenon that can separate CRCs and other cancers into subsets based on the degree of aberrant DNA methylation at the CGI promoters. Accordingly, based on the numbers of CGIs that are methylated genome-wide, CRCs have been classified into CIMP-High (CIMP-H), CIMP-Low (CIMP-L, intermediate methylation), and two clusters of very low DNA hypermethylation frequencies, cluster-3 and cluster-4 (CIMP-L3 and L4) (3, 4). The CIMP subtypes are etiologically and clinically relevant as they are early events during tumorigenesis, detectable in early precursor lesions (adenomatous and serrated) of CRCs (6–12) that evolve via multiple pathways of CRC development (11). CIMP-L CRCs tend to have KRAS mutations, are microsatellite stable (MSS), have tubular features, tend to be in the distal (left) colon, and evolve from the more common adenomatous polyps (13–16). The CIMP-H subtype generally evolves from benign, serrated polyps and is enriched in CRCs occurring in the proximal (right) colon with a high frequency of BRAF mutations and the hypermutator phenotype associated with microsatellite instability (MSI-high, MSI-H) (11).
CIMP classification has been shown to be clinically relevant, especially since CIMP-H CRCs tend to present as higher-grade lesions with shorter overall survival in response to standard chemotherapy regimens and a poor prognosis upon recurrence (17–20). The CIMP-H cases with MSI-H exhibit overall better survival, while the CIMP-H cases lacking MSI-H generally have the worst prognosis among all CRCs (21). The MSI-H stratification within the CIMP-H cases is especially important clinically because MSI-H cases have better survival outcomes in response to immune-checkpoint inhibitors (22–25).
Transcriptome-based consensus molecular subtype (CMS) classification, established in recent years, has further provided an integrated view of the transcriptional changes in the context of genetic and epigenetic CRC subtypes (2). Based on the transcriptional profiles, CRCs are classified into four CMS categories (CMS1, 2, 3, and 4) that associate with unique molecular features of these cancers (2, 26–28). In particular, the CMS1 subtype is characterized by pathways important in immune infiltration and immune evasion and is tightly linked to CIMP-High, MSI+ CRCs. The other CIMP subtypes are distributed varyingly among the CMS2, 3, and 4 categories, wherein CMS2 is characterized by epithelial and Wnt signatures, CMS3 exemplifies increased metabolic signatures, and CMS4 is characterized by increased stromal infiltration and upregulation of epithelial-to-mesenchymal transition genes.
Little is known about the molecular basis of the promoter CGI-based CIMP subtypes and its links to the CMS subtypes. To identify these links, we have examined the relationships between cancer-specific DNA methylation alterations, not only at gene promoters but also enhancer regions, and related these data to those for expression of transcription factors (TF) which bind to both of the above regions. We hypothesized that identifying enhancer regions, their DNA methylation status, and their link to TF-regulatory networks will help better understand the relationships between CIMP-associated promoter methylation, corresponding enhancer methylation patterns, and their links to transcriptional subtypes. While enhancers differ from promoters in harboring distinct chromatin composition (H3K4me1/H3K27ac), and a subset of enhancers harboring CGI are also subject to hypermethylation in cancers (29–31), the genome-wide cancer-specific gains of methylation changes at enhancers and their relation to the promoter hypermethylator phenotypes, i.e., the CIMP categories, have not been explored. Recent studies have developed methods to determine transcription factor activities based on altered DNA methylation changes at enhancer regions and thus infer the transcription factor regulatory networks influential in maintaining the epigenetic and transcriptomic landscape (32–35). By integrating new and publicly available epigenomic and transcriptomics datasets from CRCs and adenomas, we show that the cancer-specific hypermethylator phenotype in CRCs is a genome-wide phenomenon that occurs at both promoter and enhancer regions. This phenomenon is potentially driven by CRC subtype–specific deregulation of a set of TFs, which provides a mechanistic basis for the epigenetic and transcriptional subtypes of CRCs, and further allows identification of patients with worse survival within the CIMP-H group.
Results
Concordant DNA Hypermethylator Phenotypes at Promoter Regions during CRC Development.
Aberrant cancer-specific CGI promoter DNA methylation alterations are an early event during tumorigenesis, occurring as early as in the aberrant crypt foci precursor lesions of CRCs and polyps (6, 11, 36, 37). The relationships of methylation changes at promoter and enhancer elements during CRC development are not known. We established these relationships (Materials and Methods) by combining gene expression with genome-wide DNA methylation at the comprehensive set of promoter and enhancer candidate cis-regulatory elements (cCREs). The cCREs represent regulatory DNA elements which are defined by integrating DNA accessibility (DNase I-hypersensitive sites) and chromatin modification data in the ENCODE project and represent a robust set of well-annotated regulatory elements across a wide array of tissues and cell types (38). The cCREs are classified into promoter-like signatures (PLS), proximal enhancer-like signatures (pELS) that are within 2 kb of a transcription start site (TSS), and distal enhancer-like signatures (dELS) that are more than 2 kb away from the nearest TSS (38) (Fig. 1A). In our analyses, we further parsed these regions by the presence or absence of CGI.
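The distance rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the ENCODE pipeline: the actual cCRE calls also depend on chromatin accessibility and histone-mark signals, which are collapsed here into a hypothetical `is_promoter_like` flag. The `Element` class, its field names, and the label format are assumptions made for this example only.

```python
from dataclasses import dataclass

# "within 2 kb of a TSS" versus "more than 2 kb away", as stated in the text
PROXIMAL_LIMIT_BP = 2000


@dataclass
class Element:
    name: str
    is_promoter_like: bool    # hypothetical flag: element already called promoter-like
    distance_to_tss_bp: int   # distance to the nearest TSS, in base pairs
    overlaps_cgi: bool        # whether the element overlaps a CpG island


def classify(element: Element) -> str:
    """Return a label such as 'PLS-CGI' or 'dELS-non-CGI'."""
    if element.is_promoter_like:
        base = "PLS"
    elif element.distance_to_tss_bp <= PROXIMAL_LIMIT_BP:
        base = "pELS"
    else:
        base = "dELS"
    suffix = "CGI" if element.overlaps_cgi else "non-CGI"
    return f"{base}-{suffix}"
```

For example, an enhancer-like element 1.5 kb from a TSS that overlaps a CpG island would be labeled `pELS-CGI` under this rule.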
Fig. 1.

Combined analyses of methylation patterns in CRCs with that of tubulovillous adenomas and serrated adenomas (detailed in Dataset S1) (1, 14) show that most of the PLS-CGI DNA methylation alterations observed in CRCs are conserved in the polyps (Fig. 1B). For the most variable PLS-CGI probes in the TCGA dataset (COAD), colon polyps harbor the range of promoter DNA methylation patterns similar to that of CRCs and segregate well with the different CIMP classes (Fig. 1B). Further, as in previous reports (11, 39), the tubulovillous and serrated polyps clearly separate out within this spectrum of CIMP categories with the latter clustering with the CIMP-H and CIMP-L CRCs. The dysplastic serrated polyps appear very similar to that of the nondysplastic serrated polyps (SI Appendix, Table S1). These data recapitulate that genome-wide CIMP features at CGI promoters are established very early during CRC development and support the model that serrated and conventional tubular/tubulovillous adenomas represent precursors that lead to CRCs having distinct epigenetic alterations along the CIMP spectrum (11, 39).
Concordant DNA Hypermethylator Phenotypes Occur at Enhancer Elements during CRC Development.
We show that the cancer-specific gain of DNA methylation significantly extends to the pELS and dELS enhancer regulatory regions in a manner quantitatively similar to the spectrum of CIMP phenotypes. To relate the CIMP subtypes, which are based on promoter-CGI methylation, to the degree of enhancer methylation, we have introduced the term CpG-enhancer methylator phenotype, or CEMP, for a classification based on the methylation changes in the pELS and dELS regions. The enhancer methylation is similar in CRCs and polyps and occurs in both CGI (Fig. 1 C and D) and non-CGI regions (SI Appendix, Fig. S1 A and B). Further, this spectrum of methylation patterns is also observed for an independent set of putative enhancer regions identified as non-promoter-associated accessible chromatin using ATAC-seq data for a subset of colon cancers in the TCGA (40) (SI Appendix, Fig. S1C). Hierarchical clustering of methylation at these putative CRC-specific enhancer elements mirrors the same spectrum of CIMP-associated hypermethylation patterns observed in the CRCs and adenomas. Importantly, enhancer regions overall have a cancer-specific gradient in hypermethylator phenotypes (CEMP) like that classically shown for promoter CGI regions (CIMP). This is observed in the high correspondence of the CRC samples assigned to the epigenetic subtypes (CIMP-H, L, L3, and L4) based on methylation at the CGI promoter regions (classical promoter CIMP subtypes) and those based on methylation at the enhancer regions (pELS and dELS; CEMP-H, L, L3, and L4) (Fig. 1 E and F).
Relation between Polycomb (PcG) Marked Chromatin and Promoter CGI Hypermethylation Extends to Enhancer Hypermethylation.
Previous studies have shown that promoter-CGIs associated with the PcG mark (H3K27me3) in progenitor tissue stem cells and/or embryonic stem cells have a high likelihood of being hypermethylated in cancers (41–44). We show a similar likelihood for enhancer hypermethylation to be associated with the PcG mark in polyps and CRCs by analyzing the degree of hyper- and hypomethylation in enhancer regulatory elements categorized into various activity states based on H3K27me3, H3K4me1, and H3K27ac marks (ChromHMM patterns from ENCODE) (45). In both CRCs and polyps, the degree of hypermethylation at promoter and enhancer elements harboring the various chromatin marks follows the gradient of hypermethylator phenotypes, with CIMP-H showing the highest fraction of probes methylated in each category of regulatory elements (Fig. 2 A and B). The degree of hypermethylation varies by the chromatin marks and presence of CGI, with the PcG-associated promoter elements in the context of CGI showing the highest proportion of hypermethylated probes: For example, in the CRC samples, about 52% of probes in PcG-TSS-CGI are hypermethylated in CIMP-H CRCs, and this drops to ~35% for PcG-TSS-non-CGI regions, i.e., PcG-containing promoter elements not associated with CGI (SI Appendix, Fig. S2A). In contrast and as expected, active promoters exhibit a lower degree of hypermethylation in both CGI and non-CGI (the highest observed being ~7 and 8%, respectively, for the CIMP-H CRCs). As observed for the PcG-marked promoters, PcG-marked enhancers, harboring the bivalent enhancer mark (H3K27me3 + H3K4me1), have a high frequency of hypermethylated probes in the context of CGI (~60%), which drops to ~40% in the non-CGI regions (SI Appendix, Fig. S2A).
Although the proportions of methylated probes significantly decrease for the non-PcG marked poised and active enhancers, the fraction of probes hypermethylated in these regulatory elements follow the same increasing trend from low to high hypermethylator phenotypes, indicating that tumor-specific hypermethylation along the CIMP spectrum is a general phenomenon at these gene regulatory elements (Fig. 2A and SI Appendix, Fig. S2A). In relation to H3K27me3-marked regions, H3K9me3-marked constitutive heterochromatin showed weaker relation to DNA hypermethylation along the CIMP spectrum. Only H3K9me3-marked regions in the context of CGI showed an increasing trend to be methylated with increasing CIMP phenotype, but this was far less compared to that observed for H3K27me3-marked regions, while the H3K9me3-marked regions associated with non-CGI have a low tendency to be methylated across the CIMP spectrum. Similar hypermethylation patterns and association with the chromatin marks are observed in the adenoma samples for these ChromHMM promoter and enhancer regulatory element classes (Fig. 2 A and B and SI Appendix, Fig. S2A). Further, the genome-wide hypermethylation patterns at the different regulatory elements are tightly linked as the fraction of probes methylated in each of these regulatory element categories are highly correlated in CRC and adenoma samples (SI Appendix, Fig. S2 C and D). Taken together, the above analyses using multiple independent enhancer annotations (Figs. 1 and 2) show that methylation patterns in the enhancer elements across the CRCs and the polyp datasets have the same relationship with hypermethylator subtypes classically observed for the promoter regions and that these linked promoter (CIMP) and enhancer (CEMP) hypermethylator patterns occur early during colon tumorigenesis.
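The "fraction of probes hypermethylated" per regulatory-element category can be illustrated with a short sketch. The cutoff used to call a probe hypermethylated is not given in this section, so the 0.2 difference in beta value below is purely an assumption, as are the function name and the input layout.

```python
from collections import defaultdict

# Assumed cutoff: the section does not state how a probe is called hypermethylated.
DELTA_BETA_CUTOFF = 0.2


def hypermethylated_fraction(probes):
    """probes: iterable of (category, tumor_beta, normal_beta) tuples.

    Returns {category: fraction of probes whose tumor beta exceeds the
    normal beta by more than DELTA_BETA_CUTOFF}.
    """
    total = defaultdict(int)
    hyper = defaultdict(int)
    for category, tumor_beta, normal_beta in probes:
        total[category] += 1
        if tumor_beta - normal_beta > DELTA_BETA_CUTOFF:
            hyper[category] += 1
    return {category: hyper[category] / total[category] for category in total}
```

Computing this fraction separately for each CIMP subtype and each ChromHMM category would give the kind of gradient summarized in the text, although the exact values depend on the cutoff chosen.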
Fig. 2.

Similar analyses as above, but for the degree of hypomethylation across the CIMP subtypes, show that in contrast to the increases in hypermethylation along the spectrum of CIMP phenotypes in CRCs and polyps (Fig. 2 A and B), hypomethylation is relatively similar across all CIMP subtypes (Fig. 2 C and D). Genomic regions harboring active marks, especially in the context of non-CGI, tend to show a higher degree of hypomethylation compared to inactive marks or those associated with CGI. The degree of hypomethylation at the different regulatory elements is less correlated to each other in both the CRCs and adenoma samples (SI Appendix, Fig. S2 E and F) in contrast to the highly correlated hypermethylation occurring genome-wide at these regulatory elements (SI Appendix, Fig. S2 C and D). The similarities in hyper- and hypomethylation changes in polyps and CRCs indicate that both methylation gains and losses are established early during tumor development.
Role for Transcription Factor Repertoire Downregulation in DNA Hypermethylator Phenotype in CRCs.
The high correlation between the CIMP and CEMP phenotypes indicates that the genome-wide hypermethylation alterations at gene-regulatory elements constitute systemic epigenetic dysregulation phenomenon during early stages of tumorigenesis which potentially has common mechanistic bases. We identified that among genes whose expression is positively or negatively correlated with the hypermethylator phenotypes, there is a significant enrichment for the class of DNA binding TFs (229 TFs enriched in 1813 genes that are negatively correlated with hypermethylator phenotype) (SI Appendix, Fig. S3 A and B). As reduction in TF binding to target sites has been directly related to de novo methylation alterations (46, 47), we examined the role of TF expression changes in the hypermethylator phenotypes. TFs with regulatory roles in the hypermethylated and hypomethylated promoter and enhancers cCREs were identified in the TCGA colon cancer dataset using the ELMER approach, which scans for TFs with binding sites in these regions and for which the expression of corresponding TFs is negatively correlated with the cancer-specific differential methylation alterations in these regions (see workflow in Fig. 1A, model in Fig. 3A, and details in Materials and Methods) (33). These TF sets associated with hypermethylated or hypomethylated promoter (PLS) and enhancer (pELS and dELS) elements significantly overlap with each other (Fig. 3B, TFs listed in Dataset S2). As expected, overall expression of TFs associated with hypermethylated cCRE sites is down-regulated, while the TFs associated with hypomethylated cCRE sites are up-regulated, in the CRC samples compared to normal samples (Fig. 3C). 
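The core idea of linking a TF to differentially methylated regions, namely that its expression falls as methylation near its binding sites rises, can be sketched with a plain correlation screen. This is a simplified stand-in and not the ELMER procedure itself; the Pearson statistic, the `-0.5` cutoff, and all function and variable names are assumptions for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def candidate_tfs(tf_expression, mean_methylation, cutoff=-0.5):
    """Keep TFs whose expression is negatively correlated with methylation.

    tf_expression: dict mapping TF name -> per-sample expression values.
    mean_methylation: per-sample mean methylation at probes near that TF's
        motifs (simplified here to one shared vector for all TFs).
    cutoff: hypothetical correlation threshold, not taken from the paper.
    """
    hits = {}
    for tf, expression in tf_expression.items():
        r = pearson(expression, mean_methylation)
        if r <= cutoff:
            hits[tf] = r
    return hits
```

A real analysis would use a separate methylation vector per TF (restricted to probes carrying that TF's motif) and would correct for multiple testing.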
The degree of downregulation of TFs with binding sites enriched in hypermethylated cCREs is significantly higher (~threefold) than the degree of upregulation of TFs with binding sites enriched in hypomethylated cCRE, as seen in the higher negative median log gene expression ratio (−1.54) for the down-regulated TFs compared to that for the up-regulated TFs (+0.80) in relation to normal samples (Fig. 3C). Importantly, these factors are also deregulated to a similar extent in the polyp dataset, indicating that the TFs and their expression trends identified in the CRC dataset represent early events associated with neoplastic transformation. Further, the hypermethylated cCRE–associated TFs that are down-regulated are primarily enriched for TFs relevant to cancer-related pathways, while the hypomethylated cCRE–associated TFs that are up-regulated are enriched in pathways regulating cell cycle, viral response pathways, among other pathways (Fig. 3D).
Fig. 3.

Given that TFs have a direct role in modulating methylation (46, 47), deregulated expression of these cCRE methylation–associated TFs may potentially influence the DNA methylation landscape at regulatory regions and the manifestation of the hypermethylator phenotypes. In support of this, the expression of the cCRE methylation–associated TFs occurs in a manner correlated with the quantitative CIMP phenotypes. We identified the TFs associated with hyper- and hypomethylated cCRE (PLS and dELS) for the two extremes of the hypermethylator phenotypes, highest (CIMP-H) and lowest (CIMP-L4), to determine the relation between TF expression changes and the CIMP phenotypes (Fig. 4 A and E; TFs listed in Dataset S2; data for PLS and dELS shown in the following figures; similar results obtained for pELS due to the high overlap of the TFs among these regulatory elements). While the TFs associated with hypermethylated PLS or dELS regions are expected to be down-regulated, the following key characteristics associated with the degree of their downregulation and correlation with the hypermethylator phenotypes suggest that TF expression profiles strongly influence the cancer-specific global hypermethylator patterns. Although TFs commonly associated with hypermethylated PLS (Fig. 4B) and dELS (Fig. 4F), regulatory elements in both the CIMP-H and L4 subtypes are down-regulated in both these subtypes, CIMP-H cancers tend to display higher levels of downregulation. The latter is exemplified in the median expression ratio of the CIMP-H- and CIMP-L4-specific cCRE hypermethylation–associated TFs in the CIMP-H and L4 subtypes with respect to normal samples (Fig. 4 L, Top). Noticeably, the CIMP-H-specific cCRE hypermethylation–associated TFs are markedly down-regulated in the CIMP-H cancers (Fig. 4 C and L), which is in stark contrast to the degree of downregulation of the CIMP-L4-specific cCRE hypermethylation–associated TFs in CIMP-L4 cancers (Fig. 4 D and L). 
The cCRE hypomethylation–associated TFs identified in CIMP-H or CIMP-L4 are up-regulated to similar levels across the CIMP subtypes (Fig. 4 B–D and F–H and SI Appendix, Fig. S4A), which is consistent with findings that the degree of hypomethylation is similar across the CIMP subtypes (shown in Fig. 2 C and D and SI Appendix, Fig. S2 B, E, and F). Importantly, the cCRE hypermethylation–associated TFs specific to CIMP-H, in contrast to those specific to CIMP-L4, display a robust trend for decreasing expression with increasing degree of the hypermethylator phenotype from CIMP-L4 to H (SI Appendix, Fig. S4 A, Left).
Fig. 4.

We independently validated the above relationship of the augmented TF downregulation in the CIMP-H subtype by linking TF expression changes to the methylation frequency (measured as the relative risk for methylation) at TSS distal sites (>5 kb from TSS) occurring near the TF binding motifs of corresponding up- and down-regulated TFs in each tumor sample. In these analyses, the relative risk for hypermethylation at sites near TF motifs for up- and down-regulated TFs in each CRC sample was estimated without a priori requiring negative correlation of TF expression and target site DNA methylation. Relative risk for hypermethylation is higher at sites near motifs of down-regulated TFs, as opposed to up-regulated TFs, across all the CIMP categories, and these relationships are particularly highest for the CIMP-H subtype (SI Appendix, Fig. S4B). Further, the associated TFs are markedly down-regulated as a function of the increasing hypermethylator phenotype, with most downregulation in the CIMP-H subtype (SI Appendix, Fig. S4C).
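A relative risk of this kind is a ratio of two proportions. The sketch below assumes the comparison is between sites near a TF's motifs and a background set of sites; the section does not spell out the reference group, so that choice and the parameter names are assumptions.

```python
def relative_risk(meth_near, total_near, meth_background, total_background):
    """Ratio of the hypermethylation rate near motifs to the background rate.

    meth_near / total_near: hypermethylated and total sites near the motifs.
    meth_background / total_background: the same counts for the assumed
    reference set of sites.
    """
    risk_near = meth_near / total_near
    risk_background = meth_background / total_background
    return risk_near / risk_background
```

With 30 of 100 motif-proximal sites hypermethylated against 10 of 100 background sites, the relative risk is about 3; a value above 1 for down-regulated TFs and near or below 1 for up-regulated TFs would match the pattern described in the text.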
As previous studies have shown that expression of epigenetic modifiers, such as UHRF1, is directly linked to the CIMP profiles (48), we compared the impact of expression of genes encoding key enzymes involved in DNA methylation modification (DNMT1, DNMT3A, DNMT3B, TET1, TET2, TET3, and UHRF1), in relation to the median expression of the TFs associated with the hypermethylated cCRE elements, on the variability in hypermethylation at promoters across the CRC CIMP spectrum. DNMT1 and UHRF1 both are highly expressed in the CIMP-H samples, which is in direct relation to their potential increased activity being associated with increased hypermethylation in the CIMP-H CRC and adenoma samples (SI Appendix, Fig. S4 D and E). However, the standardized regression coefficients associated with the expression of hypermethylation-associated TFs as well as these epigenetic modifiers computed using multiple linear regression model show that TF expression has the strongest and most significant association with the degree of hypermethylation across the CIMP spectrum (SI Appendix, Fig. S4F).
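Standardized regression coefficients put predictors measured on different scales on a common footing, which is what allows TF expression to be compared with DNMT1 or UHRF1 expression. The following is a minimal sketch using only the standard library: every variable is z-scored and the normal equations are solved directly. The function names and input layout are assumptions, and no P-values are computed.

```python
import math


def zscore(values):
    """Center to mean 0 and scale to sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def standardized_betas(predictors, response):
    """predictors: dict name -> per-sample values; response: per-sample values.

    With all variables z-scored the intercept is zero, so the coefficients
    are the solution of (Z'Z) beta = Z'y.
    """
    names = list(predictors)
    zx = [zscore(predictors[name]) for name in names]
    zy = zscore(response)
    xtx = [[sum(p * q for p, q in zip(zi, zj)) for zj in zx] for zi in zx]
    xty = [sum(p * q for p, q in zip(zi, zy)) for zi in zx]
    return dict(zip(names, solve(xtx, xty)))
```

In this framing, the predictor with the largest absolute coefficient explains the most variation in the response per standard deviation of change, which is the sense in which TF expression is described as having the strongest association.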
Furthermore, as immune cell infiltration varies based on the CRC subtypes, especially with the MSI samples enriched in the CIMP-H subtype showing favorable immune cell infiltration (49, 50), we tested whether the CRC tumor purity differences among the CIMP subtypes due to differences in the immune microenvironment may be directly related to the cCRE methylation–associated TF expression differences in the CIMP subtypes. We applied the CIBERSORT algorithm to identify the proportions of immune cell infiltration in the TCGA CRC cohort, which shows an increased immune activity in the CIMP-H subtype in agreement with previous reports (49, 50) (SI Appendix, Fig. S4G). Multiple regression analyses of the variability in hypermethylation at promoters across the CIMP categories as a function of the proportion of various immune cell type infiltration along with the median expression of TFs associated with the hypermethylated cCRE elements show that TF expression has the highest and significant standardized regression coefficient explaining the variability in hypermethylation across the CIMP spectrum (SI Appendix, Fig. S4H).
TF Repertoire Downregulation Is Associated with CIMP-H Phenotypes across Other Major Cancer Types.
As CIMP subtypes have been reported in cancers of various tissues of origin, we queried whether downregulation of TF expression is associated with increasing methylator phenotypes in other cancers. We analyzed TCGA samples for which sufficient numbers of matched methylation and expression data are available and where CIMP categories have been well described. Accordingly, gastric, esophageal, and glioma cancer datasets (TCGA) were identified to satisfy these criteria and were analyzed to identify TFs significantly associated with hypermethylated and hypomethylated dELS regulatory elements. For esophageal cancer and glioma, the CIMP-H and L (intermediate methylation) subtypes were combined into a high methylation group (CIMP-H’), and subtypes L3 and L4 were combined into a low methylation group (CIMP-L’) due to low sample numbers. The TFs commonly associated with the hypermethylated cCREs in both the CIMP-H/H’ and L4/L’ subtypes are down-regulated to a similar extent in gastric, esophageal, and glioma cancers (Fig. 4 I–K). As observed in the CRCs, the most significant downregulation is consistently observed for TFs associated with hypermethylated cCREs in the CIMP-H subtype of these cancers (CIMP-H and L panels in Fig. 4 I–K and Fig. 4 L, Top). These data thus suggest that the increased downregulation of subsets of TFs is associated with the CIMP-High phenotypes across cancer types.
Enhancer Associated Target Gene Expression Alterations Are Established Early in CRC Development.
Analyses of gene expression relationships of putative enhancer-target genes reveal that the enhancer methylation alterations are functionally linked to expression of their associated target genes early during CRC development. Enhancer gene targets were identified in the CRC data using the ELMER algorithm, which identifies target genes based on their proximity to enhancer probes and significant cancer-specific negative correlation in expression of the gene and enhancer methylation. To evaluate the functional relevance and early roles, these enhancer methylation and target gene expression relationships were then explored in the independent polyp dataset. As expected, based on the ELMER approach for identification of enhancer–gene pairs, the target genes of the hypermethylated dELS enhancer regions are down-regulated in CRCs compared to normal samples (Fig. 5A). Importantly, a similar trend for decreased expression of these target genes is also observed in the independent polyp dataset. In the case of hypomethylated dELS regions, the overall expression of target genes in CRCs is higher than that in normal samples, and again, this relationship is mirrored in the polyps (Fig. 5B). The negative correlations between dELS methylation and expression of its target gene, identified in the CRC samples, are also present in polyps for substantial numbers of enhancer–target gene pairs, with 49% (367/750) and 75% (544/725) of the hyper- and hypomethylated enhancer probe–gene pairs showing significant negative correlation (P < 0.05) in adenomas (Fig. 5 C and D). Similarly, for the hypermethylated CGI promoters in CRCs whose direct target genes have lower expression in CRCs, these genes are down-regulated in polyps (SI Appendix, Fig. S5A) with a significant negative correlation between promoter methylation and expression in the polyps (SI Appendix, Fig. S5B). 
Examination of individual genes highlights the relationship between enhancer methylation changes and target gene expression in CRCs and polyps (SI Appendix, Fig. S5 C–F). These example genes include those important in cancer pathways, like SOX17 (51, 52) and LMX1A (53, 54) that are down-regulated in conjunction with hypermethylation of their cognate enhancer probes, and MYC (55, 56) and SLC6A6 (57) that are up-regulated in conjunction with hypomethylation (SI Appendix, Fig. S5 C–F).
Fig. 5.

Consistent with the functional nature of individual genes described above, the genes affected by enhancer methylation are enriched for biological pathways relevant to tumorigenesis. Our previous studies have shown that promoter-methylated genes in CRCs are important regulators of tumor suppression and pathways important in cancers, development, and differentiation (44, 58). This is again the case for the target genes of hypermethylated enhancers which are enriched for pathways such as cancer development and Wnt signaling (Fig. 5 E, Bottom). In contrast, the hypomethylated enhancer target genes are involved in basic cell functions such as cell cycle, RNA transport, and DNA replication, important for increased cell turnover (Fig. 5 E, Top).
Alterations in cCRE Methylation–Associated TF Expression Link Epigenetic and Transcriptome-Based CMSs.
As introduced earlier, recent transcriptome-based profiling of CRCs into CMS subtypes has revealed clinically significant subgroups linked to important molecular pathways and the CIMP subtypes (2). We thus analyzed the relation between expression of the cCRE methylation–associated TFs and the transcriptome-based CMS subtypes to determine whether expression of these TFs may form a basis for the CIMP–CMS relationships. To determine the CIMP–CMS relationship, we applied TF expression–based clustering on CRC datasets (TCGA and GSE39582 cohort, which were used in previous studies for developing the CMS subtyping) (2). Clustering CRCs based on expression of CIMP-specific cCRE-associated TFs shows that these TFs not only distinguish the CIMP-H subtype but also separate the CMS categories into their respective subgroups (Fig. 6A). Using the K-means clustering approach, we identified three major CRC clusters (K1, K2, and K3) that showed significant overall association with both CIMP and CMS subtypes (SI Appendix, Fig. S6F, Fisher’s test, P < 0.001). In this combined cohort, CIMP status is designated as CIMP-H, CIMP-Low, and CIMP-Negative (CIMP-Neg, constituting CIMP-L3 and L4), based on methods used to identify the CIMP subtypes in the non-TCGA cohort (2). The majority (~77%) of CIMP-H samples separated in the distinct K1 cluster, which is enriched for CMS1 and CMS3 subtypes (Fig. 6B), indicating that expression of CIMP-specific cCRE methylation–associated TFs may provide a strong basis linking the CIMP-H and CMS1 and CMS3 subtypes. The CIMP-L and negative subtypes are mainly enriched in K2 and K3 clusters, which are discretely enriched for CMS2 and CMS4 subtypes, respectively. To test the enrichment of each CIMP/CMS subtype in the three K-means clusters, we performed Fisher’s tests of the pairs CIMP-H vs. non-CIMP-H, CIMP-Low vs. non-CIMP-Low, CIMP-Neg vs. non-CIMP-Neg, (CMS1+CMS3) vs. (CMS2+CMS4), CMS2 vs. (CMS1+CMS3+CMS4), and CMS4 vs. (CMS1+CMS2+CMS3) against the three K-means clusters (SI Appendix, Fig. S6F). Except for the CIMP-Low vs. non-CIMP-Low comparison, every CIMP and CMS subtype is significantly enriched in specific K-means clusters. Taken together, expression of cCRE methylation–associated TFs not only separates the CIMP-H group but also sufficiently recapitulates the CMS subtypes. Thus, expression of these TFs, which potentially gives rise to the methylation spectrum defining the CIMP-H subtype, may provide a strong basis linking the CIMP and CMS subtypes.
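The enrichment tests described here can be illustrated with Fisher's exact test on a 2×2 table. This is a simplified sketch: the analysis in the text compares each subtype pair against all three K-means clusters, whereas the function below tests one subtype against one cluster (inside versus outside). The function name and the table layout are assumptions.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Example layout (hypothetical): rows = subtype of interest vs. all others,
    columns = inside vs. outside one K-means cluster.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denominator = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum all tables at least as extreme (no more probable than the observed one)
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
```

A small P-value for, say, CIMP-H versus non-CIMP-H in cluster K1 would indicate that CIMP-H samples are unevenly distributed with respect to that cluster.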
Fig. 6.

Alterations in cCRE Methylation–Associated TF Expression Define Survival Differences in the CIMP-H Subtype.
CIMP-H CRCs have been variably associated with survival outcomes (17, 19, 21, 59, 60). A major indicator of survival outcome in the context of CIMP-H is MSI status, with the CIMP-H/MSS phenotype being an unfavorable indicator of overall survival (OS) and MSI-H a favorable one (18, 21, 61–63). We show that stratification of CIMP-H samples based on expression of CIMP-H-specific hypermethylated cCRE-associated TFs allows better discrimination of patients by OS than MSI-based stratification (Fig. 6 C and D). Stratification of CIMP-H CRCs based on expression of TFs associated with hypomethylated regions specific to both the CIMP-H and CIMP-L subtypes showed no significant differences in OS (SI Appendix, Fig. S6 C and D). In these analyses, the CIMP-H cases in general did not show overall differences in survival compared to CIMP-L and CIMP-Neg cases (SI Appendix, Fig. S6A). Stratifying the CIMP-H cases by MSI status (MSI-high and MSS) shows only a marginal trend toward better OS in MSI-high cases (P-value = 0.22, HR = 0.65) (SI Appendix, Fig. S6B). In contrast, CIMP-H patients with higher expression of cCRE-associated TFs are significantly linked to worse survival (Fig. 6C). While CIMP-H cases are known to have a high tendency to harbor mutations in BRAF compared to KRAS, no significant association of BRAF or KRAS mutation status was observed in the CIMP-H cases with high and low expression of CIMP-H-specific hypermethylated cCRE-associated TFs, indicating that these TF expression profiles are independent of oncogenic mutations (Fig. 6D).
CIMP-H cases with low expression of hypermethylated cCRE-associated TFs, which have better survival, showed a tendency to be enriched for MSI-H cases, with borderline significance (P-value = 0.08, OR = 2.09) (Fig. 6D). Thus, we further tested whether the differences in OS between CIMP-H cases with high or low expression of the CIMP-H-specific hypermethylation-associated cCRE TFs are related to MSI status (Fig. 6E). The favorable prognosis for CIMP-H patients with low expression of hypermethylated cCRE-associated TFs remained even after adjusting for MSI status (Fig. 6E, Cox regression analyses, HR = 0.45, P-value = 0.047). Furthermore, the association of OS with expression differences of the hypermethylated cCRE-associated TFs is specific to the CIMP-H patients, as OS did not differ among CIMP-L and CIMP-Neg patients stratified by expression of these TFs. Taken together, the above analyses show that the CIMP-specific hypermethylation-associated cCRE TFs allow prognostication within the CIMP-H subtype that performs better than MSI-based subtyping.
Discussion
DNA methylation alterations are a universal feature in all cancers and, along with genetic alterations, are important contributors to the molecular basis of malignant transformation (64, 65). The CIMP hypermethylator phenotype was classically defined by increased methylation at CpG-island promoters in a subset of CRCs, and subsequently, various human malignancies have been shown to display such subsets harboring hypermethylator phenotypes (3–5, 14, 15, 20). In this work, we show that the hypermethylator phenotype is a general phenomenon encompassing regulatory elements throughout the genome, which includes both promoters and enhancers, which we designated as CIMP and CEMP, respectively. In effect, these represent a hypermethylator phenotype encompassing genome-wide CpG sites related to regulatory elements. These methylation patterns at promoters and enhancer regions occur very early during tumor development. The high congruence between CRC samples classified into the increasing degrees of promoter and enhancer methylation categories suggests that these hypermethylator phenotypes are one and the same with a common basis. Accordingly, we have identified cancer-specific alterations in expression of TFs that have binding sites enriched at enhancers and promoters with altered methylation and show that this TF expression repertoire is tightly associated with the epigenetic subtypes. Our data suggest that the TFs, which in association with chromatin modifiers have a role in modulating DNA methylation (47, 66), may play an important role in the observed spectrum of hypermethylator phenotypes linking promoter and enhancer methylation.
The above findings suggest how deregulated expression of a TF repertoire may form a strong basis underlying the various hypotheses and mechanisms proposed to explain why tumors of the same tissue type display a quantitative spectrum of DNA methylation frequencies, especially the CIMP-H phenotype, observed in cancers of various tissues. It should be noted that although the methods used to determine the TFs associated with hypermethylated regulatory elements rely on anticorrelation of TF expression and hypermethylation across the CIMP types, it is striking that the TFs specifically identified in CIMP-H cancers show markedly increased downregulation. Mechanistically, this supports the idea that the CIMP-H phenotype may be driven by loss of protection from DNA methylation machinery as a result of reduced expression of specific TFs, consistent with the concept that TFs protect from DNA methylation during normal cellular homeostasis (47, 67). Extreme downregulation of factors in subsets of cancers may lead to loss of protection from DNA methylation at these sites by a combination of direct competition with targeting of DNMTs and/or by changes to the histone code (47, 66, 68). This would involve combinatorial patterns of TFs whose concerted downregulation may drive methylation changes. Moreover, we show that this model is pertinent even in gliomas, where the CIMP phenotype is attributed to the IDH1-mutant-driven oncometabolite, 2-hydroxyglutarate, which inhibits the TET enzymes and prevents demethylation. Even in this scenario, absence of TFs with binding sites at the regulatory elements may be the required primary step for de novo methylation of the target sites, with the increased methylation frequency being compounded by the lack of TET activity.
In general, our data suggest that alterations in TF expression, which is an early event during tumorigenesis, might “expose” promoter and enhancer regulatory elements to the DNA methylation machinery and initiate the primary and important step toward generation of the aberrant cancer epigenome. The augmented downregulation of transcription factors in subsets of cancers may thus further result in the extreme hypermethylator phenotypes.
Our data are important because DNA methylation at promoters is classically known to be important in regulation of gene expression, especially for key tumor suppressor genes (64, 69). The findings here reinforce that this hypermethylation of CGI associated with promoters is an early event during CRC tumorigenesis, occurring in adenomas and aberrant crypt foci (6). Recent studies have revealed that DNA methylation alterations in enhancer regions have a profound impact on gene expression regulation in cancers (70–72). Our studies here show that tumor-specific changes in enhancers are also an early event in CRC tumorigenesis. Interestingly, significant numbers of CGI not associated with promoters, termed “orphan” CGI, have been shown to be associated with enhancer regions and to undergo cancer-specific hypermethylation (31). For the enhancer analyses, we have used predicted enhancer cCREs based on pan-tissue analyses from ENCODE (38) or chromatin marks from normal colon mucosa (45) or those identified using DNA methylation and putative target gene expression associations. These observations regarding tumor-specific methylation alterations at enhancers can be further refined by identification of functional enhancers in CRCs. Furthermore, previous studies have linked the presence of the PcG mark (H3K27me3) to the high likelihood for promoter CGI hypermethylation in cancers (41–44). We show that PcG-marked enhancers, like the PcG-marked promoters, are the main regulatory regions that become hypermethylated in the CRC samples. In the context of CGI, these PcG-marked promoter and enhancer regulatory elements harbor the highest degree of DNA methylation across every CIMP category. Current models for enhancer activity suggest that mechanisms involving recruitment of TFs at promoters also control transcriptional activity at enhancers (29, 30).
Identification of large numbers of common TFs associated with altered methylation at promoters and enhancers in the current work, both in the presence or absence of CGI, suggests a common basis involving altered expression of these TFs for the cancer-specific genome-wide methylation changes during tumor development. Consistent with the above, we observe a similar extent of methylation changes at both promoters and enhancers in the adenomas and the CRCs. Functionally, the enhancer methylation changes may have roles during early tumorigenesis as the putative target genes of these enhancers are enriched in cancer development pathways. We also observe that expression of the cCRE methylation–associated TFs is deregulated in adenomas. Altered expression of these TFs is thus potentially important during the early initiation and progression of the neoplastic state. Elucidating what drives TF expression changes, especially the downregulation of specific TFs, and linking it to methylation changes early on during tumor initiation will help in understanding the molecular steps involved in the de novo DNA methylation changes important for CRC initiation and development. Further, our analyses do not distinguish 5-hydroxymethylcytosine (5 hmC) from 5-methylcytosine (5 mC). As 5 hmC and 5 mC can have differential roles in gene regulation (73), future studies should analyze alterations in TF activities in relation to 5 mC and 5 hmC alterations.
Finally, the TF expression repertoire associated with the epigenetic landscape in CRCs provides a common basis linking the epigenetic-based hypermethylator subtypes and the transcriptome-based CMS subtypes. Due to the prognostic relevance of the epigenetic and CMS-based subtyping, facile approaches to unify these subtypes will be of clinical importance. The TF repertoire linking the epigenetic and CMS subtypes may represent markers not only for integrating the CMS and CIMP subtypes but also for stratification of CRC prognosis and therapeutic approaches. We show, using available public datasets that have both methylation/CIMP and gene expression data, that the hypermethylated cCRE-associated TFs can predict OS in the CIMP-H patients, with the subset of patients with low expression of these TFs exhibiting better prognosis even after adjusting for MSI status. This is particularly interesting because multiple studies have shown that CIMP-H patients have worse survival when the phenotype occurs in the context of MSS, while those in the context of MSI-H are associated with better prognosis (18, 21, 61–63) and respond to immune checkpoint inhibitors (22–25). Expression profiling of relevant TFs, in conjunction with the CMS/CIMP categories, may thus help develop better prognostic and therapeutic strategies across the CMS and CIMP subtypes. Future studies that combine methylation mapping with targeted expression of minimal sets of these TFs to build classification models for integrating CMS with the hypermethylator subtypes in archived samples will help advance the clinical application of these subtyping approaches.
Materials and Methods
See also SI Appendix, Materials and Methods.
Profiling DNA Methylation of Tubular Adenomas.
DNA was extracted from two to three slices (10 µm) from the FFPE samples (Qiagen), quantified using PicoGreen, and checked for quality using the Illumina FFPE QC Kit. The DNA was then subjected to bisulfite conversion (EZ DNA methylation kit, Zymo Research) and controlled by two in-house qPCRs. Next, 250 ng DNA was subjected to Illumina’s FFPE restoration protocol. Data were processed in two batches: the first batch was profiled on the Infinium HumanMethylation450 BeadChip, and the second batch was run on the Infinium MethylationEPIC BeadChip. The methylation data retained for analysis consisted of the β-values (ratio of the methylated probe intensity to the overall intensity) of the 390,277 probes that are common to both array types.
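As a minimal sketch of the two operations described above, the following computes a β-value as the methylated intensity divided by the overall (methylated plus unmethylated) intensity and restricts the analysis to probes shared between array types. The function names, the optional `offset` argument (some pipelines add a small constant to the denominator), and the probe identifiers are assumptions for illustration and do not describe the processing code used in this study.

```python
def beta_value(meth, unmeth, offset=0.0):
    """Beta-value: methylated intensity / (methylated + unmethylated [+ offset]).

    `offset` is an optional stabilizing constant; it defaults to 0 so that the
    result is the plain ratio described in the text.
    """
    total = meth + unmeth + offset
    return meth / total if total > 0 else float("nan")


def common_probes(*probe_id_lists):
    """Return the sorted probe IDs present on every array platform."""
    shared = set(probe_id_lists[0])
    for ids in probe_id_lists[1:]:
        shared &= set(ids)
    return sorted(shared)


# Hypothetical probe IDs and intensities, for illustration only
probes_450k = ["probe_A", "probe_B", "probe_C"]
probes_epic = ["probe_B", "probe_C", "probe_D"]
print(common_probes(probes_450k, probes_epic))  # probes kept for analysis
print(beta_value(meth=300.0, unmeth=100.0))     # 300 / (300 + 100)
```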
Profiling Gene Expression of Tubular Adenomas.
RNA was extracted from one slice (10 µm) from the FFPE samples (Qiagen RNeasy FFPE kit). Amplification and labeling of total RNA were performed using the GeneChip® PICO Reagent Kit following the manufacturer’s protocol (ThermoFisher 2016, P/N 902790). Biotin-labeled target samples were hybridized to the Affymetrix HT HG-U133+ PM Array. Target hybridization was processed on the GeneTitan® Instrument according to the manufacturer’s instructions provided for Expression Array Plates (P/N 702933). Images were analyzed using the GeneChip® Command Console Software (GCC) (ThermoFisher).
Public Datasets Used.
Following are the public datasets used in this study: 1) TCGA CRC (colon tumors 279 vs. normal 19) (Methylation and Expression); 2) sessile serrated adenomas (n = 78) (ArrayExpress, E-MTAB-7854) (Methylation); and 3) colon cancer dataset GSE39582 (n = 516) (Expression).
Analysis of DNA Methylation Data.
We integrated and normalized the DNA methylation array data from TCGA colon tumors (n = 317), TCGA colon normal (n = 38), tubular adenoma polyps (accrued by Janssen Pharmaceuticals, n = 127), and sessile serrated adenomas (n = 78) (ArrayExpress, E-MTAB-7854) (74). Package “Minfi” was used to process the methylation data (75). See SI Appendix, Materials and Methods, for a detailed description on analysis of methylation data.
Analysis of Expression Data.
Expression dataset for the Janssen polyps was derived using the Affymetrix Array platform (HT HG-U133 Plus PM) and was processed using the affy package (76) with Robust Multichip Average normalization. The Janssen polyp cohort contained 61 samples that had matched methylation and expression profiles. Expression data for both TCGA colon and Janssen polyps were integrated and normalized for removing batch effects by using the “ComBat” subroutine from the SVA package (77) (SI Appendix, Fig. S5G). Analyses of enhancer methylation and associated TFs and target gene expression were done only for the tubular/tubulovillous samples as the expression profiles are available for only these samples.
Estimation of the contribution of immune cell infiltration to CIMP phenotypes was carried out using expression data and CIBERSORT (78). See SI Appendix, Materials and Methods, for details.
TF Identification and Analysis of Chromatin Marks.
See SI Appendix, Materials and Methods, for a detailed description of TF identification and the presence of chromatin marks. In brief, for TF analysis, TCGA level 3 data matched to both DNA methylation and expression were downloaded directly from the GDC Data Portal by using the ELMER program (33). The Limma package (79) was then used to identify differentially methylated probes associated with the various cCRE elements in the promoter and enhancer regions. Differentially hypermethylated cCRE sites were defined as sites that are unmethylated in normal samples, satisfying mean β-value (µβnormal) <= 0.2 and µβcancer − µβnormal >= 0.2, with FDR <= 0.05. Differentially hypomethylated cCRE sites were defined as sites that are methylated in normal samples, satisfying µβnormal >= 0.6 and µβcancer − µβnormal <= −0.2, with FDR <= 0.05. We used the TCGA colon tumor data for TF analyses as these samples represent the comprehensive spectrum of CIMP phenotypes, while the polyp samples with expression and DNA methylation data mainly consisted of tubular adenomas. The ELMER algorithm was used to identify TFs whose expression is correlated with the methylation of their binding motifs. To obtain the various chromatin states of interest, we used ChromHMM emission states derived from ChIP-seq of H3K4me3, H3K27ac, H3K4me1, H3K27me3, and H3K9me3 on normal human colon mucosa from the ENCODE project. Accordingly, Illumina methylation array probes overlapping these ChromHMM states were summarized into the various activity states of the promoter, enhancer, and constitutive heterochromatin elements. All annotations were performed using the hg19 genome version.
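The thresholds stated above for calling differentially methylated cCRE sites can be written as a short decision rule. This is a sketch under stated assumptions: the function name `classify_ccre_site` and its parameter names are hypothetical, and the FDR is taken as an input (in the study it comes from the limma analysis, which is not reproduced here).

```python
def classify_ccre_site(mean_beta_normal, mean_beta_cancer, fdr,
                       unmeth_max=0.2, meth_min=0.6, delta=0.2, fdr_max=0.05):
    """Label a cCRE probe using the cutoffs given in the text.

    hypermethylated: normal mean beta <= 0.2, cancer - normal >= 0.2, FDR <= 0.05
    hypomethylated:  normal mean beta >= 0.6, cancer - normal <= -0.2, FDR <= 0.05
    """
    if fdr > fdr_max:
        return "unchanged"
    diff = mean_beta_cancer - mean_beta_normal
    if mean_beta_normal <= unmeth_max and diff >= delta:
        return "hypermethylated"
    if mean_beta_normal >= meth_min and diff <= -delta:
        return "hypomethylated"
    return "unchanged"


# Hypothetical probe summaries: (mean beta in normal, mean beta in cancer, FDR)
examples = {
    "probe_1": (0.10, 0.55, 0.001),
    "probe_2": (0.85, 0.40, 0.010),
    "probe_3": (0.10, 0.55, 0.200),
}
for probe, (b_norm, b_canc, fdr) in examples.items():
    print(probe, classify_ccre_site(b_norm, b_canc, fdr))
```

Values lying exactly on a cutoff are subject to floating-point rounding when the difference is computed, so borderline probes may need an explicit tolerance.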
K-means Clustering and Survival Analysis.
TCGA CRC data and GSE39582 (colon cancers, Affymetrix expression dataset downloaded from Synapse, Synapse ID syn2623706) datasets were normalized together using the “ComBat” algorithm (77) for batch effect removal (SI Appendix, Fig. S6E). After filtering for samples with known CMS subtypes, there were 208 TCGA CRC samples and 516 GSE39582 samples for analysis. We used the CIMP-specific cCRE-associated TFs (N = 476) and performed unsupervised K-means clustering to categorize the samples into specific classes. The choice of the optimum number of K-means clusters (N = 3) was guided by the elbow method and gap statistics.
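To illustrate how an elbow analysis guides the choice of the number of clusters, the sketch below runs a plain Lloyd-style K-means for several values of K and reports the within-cluster sum of squares (inertia); the "elbow" is the K after which the inertia stops dropping sharply. All names are hypothetical, the deterministic farthest-point initialization is a simplification chosen for reproducibility, the toy points stand in for sample-by-TF expression vectors, and the gap statistic is not implemented. This is not the clustering code used in the study.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_farthest(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(nxt)
    return [list(c) for c in centroids]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. Returns (labels, centroids, inertia)."""
    centroids = init_farthest(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


# Toy 2-D data with three well-separated groups (illustration only)
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (0, 10), (0, 11), (1, 10), (1, 11)]
for k in range(1, 6):
    _, _, inertia = kmeans(points, k)
    print(f"K = {k}: within-cluster sum of squares = {inertia:.2f}")
```

On data with real structure the inertia curve flattens after the appropriate K; on expression data the bend is usually less sharp, which is one reason a second criterion such as the gap statistic is often consulted as well.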
Survival analyses were carried out on the combined CRC dataset (TCGA and GSE39582) by using the Kaplan–Meier method. To identify the low and high expression groups associated with the CIMP-H-specific hypermethylated cCRE-associated TFs (N = 145), the gene set variation analysis (GSVA) method was used (80). The GSVA method assigns an expression score to each sample by comparing the expression of genes inside and outside of the given gene set. The samples were grouped into high and low expression classes based on the median cutoff applied to the expression scores. Multivariate Cox regression analysis was used to estimate the hazard ratio while adjusting for MSI, age, gender, and cancer stage.
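The median split and the Kaplan–Meier estimate described above can be sketched as follows. The per-sample scores are hypothetical placeholders for GSVA scores (GSVA itself is not implemented here), the function names are illustrative, and assigning scores equal to the median to the low group is an assumption not specified in the text. Cox regression is not shown.

```python
from statistics import median


def split_by_median(scores):
    """Label each sample 'high' if its score exceeds the median, else 'low'."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


def kaplan_meier(times, events):
    """Kaplan-Meier estimator.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect everyone whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


# Hypothetical scores and follow-up data, for illustration only
scores = [0.10, 0.40, -0.20, 0.30, -0.50, 0.60]
times = [12, 30, 45, 8, 60, 15]
events = [1, 1, 0, 1, 0, 1]
groups = split_by_median(scores)
for group in ("low", "high"):
    idx = [i for i, g in enumerate(groups) if g == group]
    curve = kaplan_meier([times[i] for i in idx], [events[i] for i in idx])
    print(group, curve)
```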
Data, Materials, and Software Availability
Methylation and expression datasets for the Janssen polyps are accessible through GEO Series accession number GSE233616 (81). The source codes for generating the transcription factor lists and carrying out K-means clustering are available at https://github.com/Baylin-Easwaran-Labs/TF_analysis (82). Anonymized DNA Methylation, Gene expression data have been deposited in GEO TBD. All study data are included in the article and/or supporting information.
Acknowledgments
This research was supported by NIH, United States: R01CA230995 (H.E.) and R01CA229240 (S.B.B. and H.E.); National Institute of Environmental Health Sciences, United States: R01ES011858 (S.B.B.); National Cancer Institute, United States: R21CA212495 (H.E.); Sam Waxman Research Foundation and National Institute on Aging, United States: U01AG066101 (S.B.B. and H.E.); Janssen Initiative; Commonwealth Grant (H.E.); and Grollman Glick Scholarship (H.E.). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. We acknowledge Ben Berman for constructive comments on the manuscript. We acknowledge Ludmila Danilova, Alexander Favorov, Leslie Cope, and Elana Fertig for helpful discussion on statistical analysis. We acknowledge the administrative support from JoAnn Murphy during the manuscript submission. A competitive grant was awarded to Dr. H.E. and Dr. S.B.B. by Janssen Pharmaceuticals.
Author contributions
Y.R.B., J.R., L.V.W., K.E.B., and H.E. designed research; Y.R.B., J.R., L.V.W., K.E.B., S.B.B., and H.E. performed research; Y.R.B., V.K., S.-J.T., K.E.B., S.B.B., and H.E. analyzed data; Y.R.B., S.-J.T., and H.E. prepared figures; and Y.R.B., V.K., R.P., S.P., E.G., O.K., P.G., S.B.B., and H.E. wrote the paper.
Competing interests
V.K., P.G., J.R., L.V.W., and K.E.B. are employees of Janssen Research and Development Inc, and hold stock and stock options in this company. Methylation-specific PCR (MSP): S.B.B. consults for MDxHealth. MSP is licensed to MDxHealth in agreement with Johns Hopkins University (JHU). S.B.B. and JHU are entitled to royalty shares received from sales. Multiple co-authors are employed by Janssen Pharmaceuticals.
Supporting Information
Appendix 01 (PDF)
Dataset S01 (XLSX)
Dataset S02 (XLSX)
References
1
TCGA, Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330–337 (2012).
2
J. Guinney et al., The consensus molecular subtypes of colorectal cancer. Nat. Med. 21, 1350–1356 (2015).
3
M. Toyota et al., CpG island methylator phenotype in colorectal cancer. Proc. Natl. Acad. Sci. U.S.A. 96, 8681–8686 (1999).
4
J.-P.J. Issa, L. Shen, M. Toyota, CIMP, at last. Gastroenterology 129, 1121–1124 (2005).
5
J. P. Issa, CpG island methylator phenotype in cancer. Nat. Rev. Cancer 4, 988–993 (2004).
6
M. P. Hanley et al., Genome-wide DNA methylation profiling reveals cancer-associated changes within early colonic neoplasia. Oncogene 36, 5035–5044 (2017).
7
C. Liu et al., DNA methylation changes that precede onset of dysplasia in advanced sessile serrated adenomas. Clin. Epigenetics 11, 90 (2019).
8
S. Yang, F. A. Farraye, C. Mack, O. Posnik, M. J. O’Brien, BRAF and KRAS mutations in hyperplastic polyps and serrated adenomas of the colorectum: Relationship to histology and CpG island methylation status. Am. J. Surg. Pathol. 28, 1452–1459 (2004).
9
W. C. Fernando et al., The CIMP phenotype in BRAF mutant serrated polyps from a prospective colonoscopy patient cohort. Gastroenterol. Res. Pract. 2014, 374926 (2014).
10
P. Minoo et al., Extensive DNA methylation in normal colorectal mucosa in hyperplastic polyposis. Gut 55, 1467–1474 (2006).
11
Y. Luo et al., Differences in DNA methylation signatures reveal multiple pathways of progression from adenoma to colorectal cancer. Gastroenterology 147, 418–429.e8 (2014).
12
S. Dehghanizadeh et al., Active BRAF-V600E is the key player in generation of a sessile serrated polyp-specific DNA methylation profile. PLoS One 13, e0192499 (2018).
13
D. J. Weisenberger et al., CpG island methylator phenotype underlies sporadic microsatellite instability and is tightly associated with BRAF mutation in colorectal cancer. Nat. Genet. 38, 787–793 (2006).
14
T. Hinoue et al., Genome-scale analysis of aberrant DNA methylation in colorectal cancer. Genome Res. 22, 271–282 (2012), https://doi.org/10.1101/gr.117523.110.
15
T. Hinoue et al., Analysis of the association between CIMP and BRAF in colorectal cancer by DNA methylation profiling. PLoS One 4, e8357 (2009).
16
P. W. Ang et al., Comprehensive profiling of DNA methylation in colorectal cancer reveals subgroups with distinct clinicopathological and molecular features. BMC Cancer 10, 227 (2010).
17
C. Gallois et al., Prognostic value of methylator phenotype in stage III colon cancer treated with oxaliplatin-based adjuvant chemotherapy. Clin. Cancer Res. 24, 4745–4753 (2018).
18
A. I. Phipps et al., Association between molecular subtypes of colorectal cancer and patient survival. Gastroenterology 148, 77–87.e2 (2015).
19
J. M. Bae et al., Distinct clinical outcomes of two CIMP-positive colorectal cancer subtypes based on a revised CIMP classification system. Br. J. Cancer 116, 1012–1020 (2017).
20
H. Suzuki, E. Yamamoto, R. Maruyama, T. Niinuma, M. Kai, Biological significance of the CpG island methylator phenotype. Biochem. Biophys. Res. Commun. 455, 35–42 (2014).
21
A. I. Phipps et al., Association between molecular subtypes of colorectal tumors and patient survival, based on pooled analysis of 7 international studies. Gastroenterology 158, 2158–2168.e4 (2020).
22
D. T. Le et al., PD-1 blockade in tumors with mismatch-repair deficiency. N. Engl. J. Med. 372, 2509–2520 (2015).
23
D. T. Le et al., Phase II open-label study of pembrolizumab in treatment-refractory, microsatellite instability-high/mismatch repair-deficient metastatic colorectal cancer: KEYNOTE-164. J. Clin. Oncol. 38, 11–19 (2020).
24
M. J. Overman et al., Nivolumab in patients with metastatic DNA mismatch repair-deficient or microsatellite instability-high colorectal cancer (CheckMate 142): An open-label, multicentre, phase 2 study. Lancet Oncol. 18, 1182–1191 (2017).
25
T. André et al., Pembrolizumab in microsatellite-instability-high advanced colorectal cancer. N. Engl. J. Med. 383, 2207–2218 (2020).
26
E. Fessler, J. P. Medema, Colorectal cancer subtypes: Developmental origin and microenvironmental regulation. Trends Cancer 2, 505–518 (2016).
27
W. Wang et al., Molecular subtyping of colorectal cancer: Recent progress, new challenges and emerging opportunities. Semin. Cancer Biol. 55, 37–52 (2019).
28
M. A. Komor et al., Consensus molecular subtype classification of colorectal adenomas. J. Pathol. 246, 266–276 (2018).
29
A. Panigrahi, B. W. O’Malley, Mechanisms of enhancer action: The known and the unknown. Genome Biol. 22, 108 (2021).
30
P. Cramer, Organization and regulation of gene transcription. Nature 573, 45–54 (2019).
31
J. S. K. Bell, P. M. Vertino, Orphan CpG islands define a novel class of highly active enhancers. Epigenetics 12, 449–464 (2017).
32
L. Burger, D. Gaidatzis, D. Schübeler, M. B. Stadler, Identification of active regulatory regions from DNA methylation data. Nucleic Acids Res. 41, e155 (2013).
33
L. Yao, H. Shen, P. W. Laird, P. J. Farnham, B. P. Berman, Inferring regulatory element landscapes and transcription factor networks from cancer methylomes. Genome Biol. 16, 105 (2015).
34
S. Kim, H. J. Park, X. Cui, D. Zhi, Collective effects of long-range DNA methylations predict gene expressions and estimate phenotypes in cancer. Sci. Rep. 10, 3920 (2020).
35
A. Kel et al., Walking pathways with positive feedback loops reveal DNA methylation biomarkers of colorectal cancer. BMC Bioinformatics 20, 119 (2019).
36
F. Bormann et al., Cell-of-origin DNA methylation signatures are maintained during colorectal carcinogenesis. Cell Rep. 23, 3407–3418 (2018).
37
D. C. Koestler et al., Distinct patterns of DNA methylation in conventional adenomas involving the right and left colon. Mod. Pathol. 27, 145–155 (2014).
38
ENCODE Project Consortium et al., Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710 (2020).
39
M. Bettington et al., Clinicopathological and molecular features of sessile serrated adenomas with dysplasia or carcinoma. Gut 66, 97–106 (2017).
40
M. R. Corces et al., The chromatin accessibility landscape of primary human cancers. Science 362, eaav1898 (2018).
41
J. E. Ohm et al., A stem cell-like chromatin pattern may predispose tumor suppressor genes to DNA hypermethylation and heritable silencing. Nat. Genet. 39, 237–242 (2007).
42
M. Widschwendter et al., Epigenetic stem cell signature in cancer. Nat. Genet. 39, 157–158 (2007).
43
Y. Schlesinger et al., Polycomb-mediated methylation on Lys27 of histone H3 pre-marks genes for de novo methylation in cancer. Nat. Genet. 39, 232–236 (2007).
44
H. Easwaran et al., A DNA hypermethylation module for the stem/progenitor cell signature of cancer. Genome Res. 22, 837–849 (2012).
45
J. Ernst, M. Kellis, ChromHMM: Automating chromatin-state discovery and characterization. Nat. Methods 9, 215–226 (2012).
46
M. J. Bonder et al., Disease variants alter transcription factor levels and methylation of their binding sites. Nat. Genet. 49, 131–138 (2017).
47
F. Lienert et al., Identification of genetic elements that autonomously determine DNA methylation states. Nat. Genet. 43, 1091–1097 (2011).
48
T. Niinuma et al., UHRF1 depletion and HDAC inhibition reactivate epigenetically silenced genes in colorectal cancer cells. Clin. Epigenetics 11, 70 (2019).
49
E. Becht et al., Immune and stromal classification of colorectal cancer is associated with molecular subtypes and relevant for precision immunotherapy. Clin. Cancer Res. 22, 4057–4066 (2016).
50
H.-O. Lee et al., Lineage-dependent gene expression programs influence the immune landscape of colorectal cancer. Nat. Genet. 52, 594–603 (2020).
51
D. Sinner et al., Sox17 and Sox4 differentially regulate beta-catenin/T-cell factor activity and proliferation of colon carcinoma cells. Mol. Cell Biol. 27, 7802–7815 (2007).
52
W. Zhang et al., Epigenetic inactivation of the canonical Wnt antagonist SRY-box containing gene 17 in colorectal cancer. Cancer Res. 68, 2764–2772 (2008).
53
L. Feng, Y. Xie, Z. Zhao, W. Lian, LMX1A inhibits metastasis of gastric cancer cells through negative regulation of β-catenin. Cell Biol. Toxicol. 32, 133–139 (2016).
54
H.-H. Chung et al., A novel prognostic DNA methylation panel for colorectal cancer. Int. J. Mol. Sci. 20, 4672 (2019).
55
J. Schuijers et al., Transcriptional dysregulation of MYC reveals common enhancer-docking mechanism. Cell Rep. 23, 349–360 (2018).
56
K. Dave et al., Mice deficient of Myc super-enhancer region reveal differential control mechanism between normal and pathological growth. Elife 6, e23382 (2017).
57
M. Yasunaga, Y. Matsumura, Role of SLC6A6 in promoting the survival and multidrug resistance of colorectal cancer. Sci. Rep. 4, 4852 (2014).
58
Y. Tao et al., Aging-like spontaneous epigenetic silencing facilitates Wnt activation, stemness, and Braf(V600E)-induced tumorigenesis. Cancer Cell 35, 315–328.e6 (2019).
59
E. M. F. De Sousa et al., Poor-prognosis colon cancer is defined by a molecularly distinct subtype and develops from serrated precursor lesions. Nat. Med. 19, 614–618 (2013).
60
K. J. Kang et al., The role of the CpG island methylator phenotype on survival outcome in colon cancer. Gut Liver 9, 202–207 (2015).
61
F. A. Sinicrope et al., Molecular markers identify subtypes of stage III colon cancer associated with patient outcomes. Gastroenterology 148, 88–99 (2015).
62
D. J. Sargent et al., Defective mismatch repair as a predictive marker for lack of efficacy of fluorouracil-based adjuvant therapy in colon cancer. J. Clin. Oncol. 28, 3219–3226 (2010).
63
C. M. Ribic et al., Tumor microsatellite-instability status as a predictor of benefit from fluorouracil-based adjuvant chemotherapy for colon cancer. N. Engl. J. Med. 349, 247–257 (2003).
64
S. B. Baylin, P. A. Jones, Epigenetic determinants of cancer. Cold Spring Harb. Perspect. Biol. 8, a019505 (2016).
65
S. B. Baylin, P. A. Jones, A decade of exploring the cancer epigenome - biological and translational implications. Nat. Rev. Cancer 11, 726–734 (2011).
66
M. B. Stadler et al., DNA-binding factors shape the mouse methylome at distal regulatory regions. Nature 480, 490–495 (2011).
67
T. Baubec, D. Schübeler, Genomic patterns and context specific interpretation of DNA methylation. Curr. Opin. Genet. Dev. 25, 85–92 (2014).
68
C. Gebhard et al., General transcription factor binding at CpG islands in normal cells correlates with resistance to de novo DNA methylation in cancer cells. Cancer Res. 70, 1398–1407 (2010).
69
H. Easwaran, S. B. Baylin, "Origin and mechanisms of DNA methylation dynamics in cancers" in The DNA, RNA, and Histone Methylomes, S. Jurga, J. Barciszewski, Eds. (Springer International Publishing, 2019), pp. 27–52.
70
J. Charlet et al., Bivalent regions of cytosine methylation and H3K27 acetylation suggest an active role for DNA methylation at enhancers. Mol. Cell 62, 422–431 (2016).
71
W. A. Flavahan, E. Gaskell, B. E. Bernstein, Epigenetic plasticity and the hallmarks of cancer. Science 357, eaal2380 (2017).
72
D. Aran, S. Sabato, A. Hellman, DNA methylation of distal regulatory sites characterizes dysregulation of cancer genes. Genome Biol. 14, R21 (2013).
73
K. D. Rasmussen, K. Helin, Role of TET enzymes in DNA methylation, development, and cancer. Genes Dev. 30, 733–750 (2016).
74
A. Athar et al., ArrayExpress update - from bulk to single-cell expression data. Nucleic Acids Res. 47, D711–D715 (2019).
75
M. J. Aryee et al., Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics 30, 1363–1369 (2014).
76
L. Gautier, L. Cope, B. M. Bolstad, R. A. Irizarry, affy–Analysis of affymetrix genechip data at the probe level. Bioinformatics 20, 307–315 (2004).
77
J. T. Leek, W. E. Johnson, H. S. Parker, A. E. Jaffe, J. D. Storey, The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 28, 882–883 (2012).
78
B. Chen, M. S. Khodadoust, C. L. Liu, A. M. Newman, A. A. Alizadeh, Profiling tumor infiltrating immune cells with CIBERSORT. Methods Mol. Biol. 1711, 243–259 (2018).
79
M. E. Ritchie et al., limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
80
S. Hänzelmann, R. Castelo, J. Guinney, GSVA: Gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics 14, 7 (2013).
81
Y. R. Bhandari et al., Transcription factor expression repertoire basis for epigenetic and transcriptional subtypes of colorectal cancers. Gene Expression Omnibus. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi. Deposited 26 May 2023.
82
Y. R. Bhandari, S. B. Baylin, H. Easwaran, Transcription factor analysis. GitHub. https://github.com/Baylin-Easwaran-Labs/TF_analysis. Accessed 3 May 2023.
Copyright
Copyright © 2023 the Author(s). Published by PNAS. This article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND).
Data, Materials, and Software Availability
Methylation and expression datasets for the Janssen polyps are accessible through GEO Series accession number GSE233616 (81). The source code for generating the transcription factor lists and carrying out K-means clustering is available at https://github.com/Baylin-Easwaran-Labs/TF_analysis (82). Anonymized DNA methylation and gene expression data have been deposited in GEO (TBD). All study data are included in the article and/or supporting information.
Submission history
Received: February 3, 2023
Accepted: June 15, 2023
Published online: July 24, 2023
Published in issue: August 1, 2023
Acknowledgments
This research was supported by NIH, United States: R01CA230995 (H.E.) and R01CA229240 (S.B.B. and H.E.); National Institute of Environmental Health Sciences, United States: R01ES011858 (S.B.B.); National Cancer Institute, United States: R21CA212495 (H.E.); Sam Waxman Research Foundation and National Institute on Aging, United States: U01AG066101 (S.B.B. and H.E.); Janssen Initiative; Commonwealth Grant (H.E.); and Grollman Glick Scholarship (H.E.). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. We acknowledge Ben Berman for constructive comments on the manuscript. We acknowledge Ludmila Danilova, Alexander Favorov, Leslie Cope, and Elana Fertig for helpful discussions on statistical analysis. We acknowledge the administrative support from JoAnn Murphy during the manuscript submission. A competitive grant was awarded to H.E. and S.B.B. by Janssen Pharmaceuticals.
Author contributions
Y.R.B., J.R., L.V.W., K.E.B., and H.E. designed research; Y.R.B., J.R., L.V.W., K.E.B., S.B.B., and H.E. performed research; Y.R.B., V.K., S.-J.T., K.E.B., S.B.B., and H.E. analyzed data; Y.R.B., S.-J.T., and H.E. prepared figures; and Y.R.B., V.K., R.P., S.P., E.G., O.K., P.G., S.B.B., and H.E. wrote the paper.
Competing interests
V.K., P.G., J.R., L.V.W., and K.E.B. are employees of Janssen Research and Development Inc, and hold stock and stock options in this company. Methylation-specific PCR (MSP): S.B.B. consults for MDxHealth. MSP is licensed to MDxHealth in agreement with Johns Hopkins University (JHU). S.B.B. and JHU are entitled to royalty shares received from sales. Multiple co-authors are employed by Janssen Pharmaceuticals.
Notes
Reviewers: T.A.C., Cleveland Clinic; and J.F.C., University of California San Francisco.
Cite this article
Transcription factor expression repertoire basis for epigenetic and transcriptional subtypes of colorectal cancers, Proc. Natl. Acad. Sci. U.S.A. 120 (31) e2301536120, https://doi.org/10.1073/pnas.2301536120 (2023).