G-quadruplex DNA structure is a positive regulator of MYC transcription

Significance
DNA G-quadruplexes (G4s) are four-stranded DNA structures enriched in regulatory regions of the human genome; however, their functional role in transcription remains an incompletely answered question. Using CRISPR genome editing, genomic approaches for chromatin profiling, and biophysical assays, we demonstrate that a G4 structure folds endogenously within the upstream promoter region of the critical MYC oncogene to positively regulate transcription. Key transcription factors and chromatin proteins bind to the MYC promoter via preferential interaction with a G4 DNA structure, rather than with the duplex primary sequence. Overall, this study demonstrates how G4 structures, rather than DNA sequence, alter the local chromatin landscape and nucleosome occupancy to positively promote transcription.


Supporting text
We investigated the minimum number of mutations required to abolish G4 formation in the endogenous genetic context of MYC in HEK293T cells. It was essential to consider Pu27 (27 bp, five G-runs) within the context of an extended 48 bp region (i.e., MYC WT, eight G-runs), as flanking G-runs can contribute to G4 folding when central G-runs are mutated (1-3).
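As an illustration only, the notion of a G-run and the effect of a single point mutation on it can be sketched as below. The Pu27 sequence shown is the commonly cited one and is an assumption here, since the oligonucleotides actually used are listed in table S8; the helper names, the run threshold of three guanines and the G-to-T substitution are likewise hypothetical choices.

```python
import re

# Assumption: commonly cited Pu27 sequence; table S8 lists the oligonucleotides actually used.
PU27 = "TGGGGAGGGTGGGGAGGGTGGGGAAGG"

def g_runs(seq, min_len=3):
    """Return (start, run) for every maximal run of at least min_len guanines."""
    pattern = re.compile(r"G{%d,}" % min_len)
    return [(m.start(), m.group()) for m in pattern.finditer(seq.upper())]

def mutate(seq, positions, base="T"):
    """Return seq with the given 0-based positions replaced by base (illustrative G-to-T change)."""
    chars = list(seq)
    for p in positions:
        chars[p] = base
    return "".join(chars)

if __name__ == "__main__":
    print(len(g_runs(PU27)))                 # number of G-runs of length >= 3
    print(len(g_runs(mutate(PU27, [1]))))    # mutating the edge G of a GGGG run
    print(len(g_runs(mutate(PU27, [2]))))    # mutating an inner G of the same run
```

In this sketch, mutating the edge guanine of a four-G run still leaves a run of three, whereas mutating an inner guanine removes the run, which is one reason the position of each point mutation matters.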
Using circular dichroism (CD) spectroscopy, we investigated oligonucleotides (fig. S1A, table S8) carrying single point mutations within the 27 bp or 48 bp context. Mutating a single G at a time in each of the eight G-runs resulted in CD spectra with maxima at ~260 nm and minima at ~240 nm, characteristic of G4 structure formation (permutations 1-13) (4, 5). We then added mutations to each G-run, starting from the central ones (permutations 14-20), to establish the threshold of mutations that abrogates G4 structure formation. We found that mutations in each of the eight G-runs within the 48 bp context were needed to completely abolish G4 formation as judged by CD (fig. S1B, permutation 19). We designated permutation 19 as minimally mutated MYC (MUT MIN). Mutations to each of the five central G-runs within the Pu27 core (permutation 18) within the 48 bp context were designated MUT CORE and were found to retain canonical G4 spectral features, indicating residual G4-forming potential (fig. S1B).
We further explored G4 folding under 10 mM K+ or Li+ conditions. G4 oligonucleotides generally show a higher molar ellipticity in K+ than in Li+ (6). We observed no preference for K+ over Li+ for MYC MUT or MUT MIN, suggesting a lack of G4 folding. However, MYC WT and MUT CORE showed a preference for K+ conditions, indicative of G4 folding (fig. S1B).
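The spectral criterion described above (a maximum near 260 nm and a minimum near 240 nm) can be expressed as a simple check. This is a minimal sketch and not the procedure used in this study, where spectra were judged by inspection; the function names and the 5 nm tolerance are assumptions made for illustration.

```python
def cd_extrema(wavelengths, ellipticity):
    """Return (wavelength of maximum, wavelength of minimum) of a CD spectrum."""
    idx = range(len(ellipticity))
    i_max = max(idx, key=lambda i: ellipticity[i])
    i_min = min(idx, key=lambda i: ellipticity[i])
    return wavelengths[i_max], wavelengths[i_min]

def has_g4_signature(wavelengths, ellipticity, tol_nm=5.0):
    """True if the maximum lies near 260 nm and the minimum near 240 nm.

    tol_nm is an arbitrary illustrative tolerance, not a published threshold.
    """
    peak, trough = cd_extrema(wavelengths, ellipticity)
    return abs(peak - 260.0) <= tol_nm and abs(trough - 240.0) <= tol_nm

def cation_preference(ellipticity_k, ellipticity_li):
    """Difference between the peak ellipticity measured in K+ and in Li+."""
    return max(ellipticity_k) - max(ellipticity_li)
```

A positive value from cation_preference corresponds to the higher molar ellipticity in K+ described in the text; how large a difference counts as a preference is left to the analyst.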
To confirm that the flanking regions in the MUT CORE oligonucleotide contributed to G4 formation, we performed CD on a short 27 bp version of MUT CORE at 10 mM and 100 mM K+ and observed a profile characteristic of a non-G4 structure (fig. S1C). Overall, this shows the involvement of the three flanking G-runs in G4 folding in vitro.
To measure structural transitions, we used UV thermal melting spectroscopy (7). As the 48 bp MYC G4 sequence construct had not been previously characterised, we performed experiments at near-physiological conditions (100 mM K+) and titrated the K+ concentration (10, 20, 50, 100 mM), determining that 10 mM K+ was optimal to capture structural transitions for our constructs in thermal melting measurements (fig. S2A). We measured the UV spectra for MYC WT, MYC MUT, MUT MIN and MUT CORE oligonucleotides at 20°C and 90°C at 10 mM and 100 mM K+. The wavelength giving the greatest fold-change between the folded and unfolded states in the thermal difference spectra was calculated to be ~300 nm (fig. S2B, fig. S2C). Thermal melting measurements were thus taken at this wavelength. MYC MUT and MUT MIN did not display a melting transition (fig. S2D). MUT CORE showed a clear structural transition at ~45°C in 100 mM K+, whereas MYC WT displayed a transition consistent with greater structural stability (~60°C in 100 mM K+) (fig. S2D).
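The two calculations described above, a thermal difference spectrum and the location of a melting transition, can be sketched as follows. This is a simplified illustration under stated assumptions: the difference is taken as high-temperature minus low-temperature absorbance, and the transition is estimated as the temperature at which the signal crosses the midpoint between its first and last values, with no baseline fitting. The function names are hypothetical and the analysis software used in the study is not specified here.

```python
def thermal_difference(abs_low_t, abs_high_t):
    """Per-wavelength difference: absorbance at high T minus absorbance at low T."""
    return [high - low for low, high in zip(abs_low_t, abs_high_t)]

def wavelength_of_minimum(wavelengths, tds):
    """Wavelength at which the thermal difference spectrum is most negative."""
    i = min(range(len(tds)), key=lambda k: tds[k])
    return wavelengths[i]

def melting_midpoint(temps, signal):
    """Temperature where the signal crosses halfway between its first and last values.

    Uses linear interpolation between neighbouring points and returns None
    when no crossing is found (for example, a flat trace with no transition).
    """
    half = (signal[0] + signal[-1]) / 2.0
    for i in range(len(temps) - 1):
        a0, a1 = signal[i], signal[i + 1]
        if a0 != a1 and (a0 - half) * (a1 - half) <= 0:
            t0, t1 = temps[i], temps[i + 1]
            return t0 + (half - a0) * (t1 - t0) / (a1 - a0)
    return None
```

For a flat trace, as described for MYC MUT and MUT MIN, melting_midpoint returns None rather than a temperature.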

Fig. S1. Circular dichroism MYC G4 guanine-contribution mutational study
(A) Circular dichroism (CD) spectra for oligonucleotides with different mutations, used to identify G contributions to G4 folding within the MYC G4 27 bp and 48 bp sequence contexts. A minimum of nine mutations is required for loss of the G4 spectral signature (permutation 19). Six G mutations in the core Pu27 sequence retain some G4-forming potential in the 48 bp sequence (permutation 18). (B) Cation-dependency study to interrogate G4 formation in vitro. MYC G4 shows a clear increase in molar ellipticity in the presence of K+ compared to Li+. MYC MUT and MUT MIN show no differences between K+ and Li+ conditions. MUT CORE shows a degree of K+ preference. (C) CD spectra of MYC G4 and MUT CORE short oligonucleotides (27 bp) in the presence of 10 mM and 100 mM K+. The G4 spectral signature is lost for the short MUT CORE at both K+ concentrations, but was not lost in the 48 bp context. This result suggests that the flanking regions contribute to G4 formation. All measurements were taken in 20 mM lithium cacodylate buffer as previously described (7).

Fig. S2. UV biophysical characterization of the G4 structural perturbations
(A) UV thermal melting measurements for the short (27 bp) and long (48 bp) MYC G4 sequence contexts at different K+ concentrations. Measurements were taken at 300 and 305 nm. (B) UV spectra at 20°C and 90°C for MYC G4 (WT), MYC MUT, MUT MIN and MUT CORE at 10 mM and 100 mM K+. (C) Calculated thermal difference spectra for each oligonucleotide at 10 mM and 100 mM K+. MYC G4 shows a minimum at ~300 nm. (D) UV thermal melting measurements for the short and long oligonucleotides at 10 mM and 100 mM K+. The 48 bp MUT CORE shows melting at ~45°C and the 48 bp MYC G4 at ~69°C. No melting is observed with MYC MUT or MUT MIN. All measurements were taken in 20 mM lithium cacodylate buffer as previously described (7).

Fig. S4. Genotyping of the generated cell lines for the study
Sanger sequencing chromatograms for an amplicon spanning the edited region (black boxes) and flanking sequences in wild type and edited cell lines. The chromatograms confirm homozygosity of the targeted region in the MYC locus for both wild type and edited HEK293T cell lines. Guanine runs (GGG/GGGG) for the wild type MYC G4 are underlined in blue (top chromatogram). Point mutations in MYC MUT cells are underlined in red. Guanine runs (GGG/GGGGG) for the KRAS SWP are underlined in purple. The sequence for the MYC Flip cell line is highlighted in light blue (bottom track). Guanine (G), cytosine (C), thymine (T) and adenine (A) bases are shown in black, blue, red and green, respectively.

Fig. S5. Western blot analysis of MYC protein levels
Western blot showing a reduction in MYC protein level in MYC MUT cells compared to MYC WT and KRAS SWP (left). The drop in MYC protein intensity (~60% of WT; *: p ≤ 0.05; ns: not significant; n = 3; right) was estimated as the area under the curve relative to β-ACTIN, and P values were calculated using the Wilcoxon rank-sum exact test.

Fig. S3. Assessment of the degree of similarity between DNA sequences
Distance matrices showing how dissimilar two sequences are, scored with the Needleman-Wunsch algorithm using a reward for matches and penalties for mismatches and gaps (match = +10, mismatch = -5, gap = -7). This shows that MYC WT and KRAS SWP are dissimilar (NW score = -16), while MYC WT and the MYC mutants are similar (NW score = 9).
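For readers unfamiliar with the scoring scheme, a minimal sketch of a Needleman-Wunsch global alignment score with the stated parameters (match = +10, mismatch = -5, linear gap = -7) is given below. It is illustrative only: it returns the raw optimal score, does not attempt to reproduce the values in the figure, and makes no assumption about any normalization or software used to generate them. The function name is hypothetical.

```python
def needleman_wunsch_score(a, b, match=10, mismatch=-5, gap=-7):
    """Optimal global alignment score of sequences a and b (linear gap penalty)."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j, cb in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]

if __name__ == "__main__":
    print(needleman_wunsch_score("GGGTGGG", "GGGTGGG"))
    print(needleman_wunsch_score("GGGTGGG", "GTGTGTG"))
```

Only two rows of the dynamic programming matrix are kept, which is sufficient when the score, rather than the alignment itself, is needed.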

Fig. S7. Ontology analysis for the generated G4-edited cell lines
Pathway enrichment analysis illustrating upregulated and downregulated pathways in the absence of the MYC promoter G4 (MYC MUT). Upregulated pathways include signaling pathways and pro-apoptotic programs. Downregulated pathways include MYC targets, mRNA splicing and translation. The score is calculated using gene set enrichment analysis (GSEA) as the normalized enrichment score (NES).

Fig. S12. Motif analysis for G4 binders
Motif analysis derived from the generated CUT&Tag data showing the binding sequences for SP1 and CNBP. The preferential binding motifs include G-rich sequences that can fold into G4s.

Fig. S14. Nucleosome positioning by G4 structures
(A) MNase-seq genome tracks showing nucleosome positioning at the G4-edited site across different MNase digestion time points (20 min, 40 min). (B) 2% agarose gel with the MNase-digested genomes of MYC WT and MYC MUT cells after 5, 10, 20 and 40 min of digestion at 37°C. The bottom bands correspond to the mono- and di-nucleosomal fragments.

Fig. S15. Heatmaps of the binding of histone modifiers and histone methylation distribution with respect to G4 sites
Heatmaps showing genome-wide binding of the histone methyltransferase MLL1 and the activating H3K4me1 and H3K4me3 marks at G4 sites. A repressive mark (H3K27me3) shows no G4 overlap. Tracks show normalized coverage values. Profiles are centered at G4s and cover +/-2 kb.

Fig. S16. MLL4 protein affinity enrichment by G4-folded oligonucleotides and controls
Affinity enrichment and western blot analysis of MLL4 protein for double-stranded (ds) and single-stranded (ss) MYC G4, ss/ds MYC MUT, and ss/ds KRAS G4.

Fig. S19. Heatmaps of the RNAPIIS5P and BG4 signals upon triptolide treatment
Heatmaps showing genome-wide binding of RNAPII and BG4 at transcription start sites (TSS). The tracks show normalized coverage values. The RNAPIIS5P signal is lost over time upon treatment. Profiles are centered at G4s and cover +/-2 kb.

Fig. S23. Peak overlap across biological samples for the MYC interactome
UpSet plots and pairwise diagrams illustrating the overlap across three biological samples for CUT&Tag of CNBP, SP1, MLL1, MLL4, H3K4me1 and H3K4me3. The most abundant subset is the one in which the peak is present in all three replicates, indicating high technical reproducibility across samples.
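The kind of tally that underlies an UpSet plot can be sketched as follows: peaks from all replicates are merged into non-overlapping regions, and each region is assigned to the set of replicates that contribute to it. This is a minimal illustration under the assumption of half-open (chrom, start, end) intervals; the function name, the replicate labels and the coordinates are hypothetical, and the tools actually used for the figure are not specified in this text.

```python
from collections import Counter

def upset_counts(replicates):
    """Count merged peak regions by the set of replicates supporting them.

    replicates: dict mapping replicate name -> list of (chrom, start, end).
    Returns a Counter keyed by frozenset of replicate names.
    """
    tagged = sorted(
        (chrom, start, end, name)
        for name, peaks in replicates.items()
        for chrom, start, end in peaks
    )
    counts = Counter()
    current = None  # [chrom, end of merged region, names seen in region]
    for chrom, start, end, name in tagged:
        if current and current[0] == chrom and start < current[1]:
            # Overlaps the open region: extend it and record the replicate.
            current[1] = max(current[1], end)
            current[2].add(name)
        else:
            if current:
                counts[frozenset(current[2])] += 1
            current = [chrom, end, {name}]
    if current:
        counts[frozenset(current[2])] += 1
    return counts

if __name__ == "__main__":
    # Hypothetical coordinates for illustration only.
    reps = {
        "rep1": [("chr8", 100, 200), ("chr8", 500, 600)],
        "rep2": [("chr8", 150, 250)],
        "rep3": [("chr8", 180, 220), ("chr8", 900, 950)],
    }
    for members, n in upset_counts(reps).items():
        print(sorted(members), n)
```

In this scheme, the subset keyed by all three replicate names corresponds to peaks present in every replicate.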

Table S15. DESeq2 analysis on RNA-seq
Comparison of gene expression (fold changes) between MYC MUT and KRAS SWP relative to control, as revealed by DESeq2 analysis of the RNA-seq dataset.