Functional and mechanistic studies of XPC DNA-repair complex as transcriptional coactivator in embryonic stem cells

Significance

Because of their unique ability to self-renew and generate all cell lineages of an organism (pluripotency), embryonic stem cells represent a versatile model for developmental biology, and a promising avenue for regenerative medicine. Understanding the molecular mechanisms regulating self-renewal and pluripotency provides a productive path to effectively use embryonic stem cells, and to improve current methods for induction/differentiation of pluripotent stem cells and direct somatic cell reprogramming. This study provides novel insights into transcriptional regulation of the stem cell state by characterizing interactions between key transcription factors SOX2 and OCT4, and a recently identified, multifunctional, stem cell coactivator (the xeroderma pigmentosum, complementation group C DNA repair complex) to control pluripotency gene-expression networks.

The embryonic stem cell (ESC) state is transcriptionally controlled by OCT4, SOX2, and NANOG with cofactors, chromatin regulators, noncoding RNAs, and other effectors of signaling pathways. Uncovering components of these regulatory circuits and their interplay provides the knowledge base to deploy ESCs and induced pluripotent stem cells. We recently identified the DNA-repair complex xeroderma pigmentosum C (XPC)-RAD23B-CETN2 as a stem cell coactivator (SCC) required for OCT4/SOX2 transcriptional activation. Here we investigate the role of SCC genome-wide in murine ESCs by mapping regions bound by RAD23B and analyzing transcriptional profiles of SCC-depleted ESCs. We establish OCT4 and SOX2 as the primary transcription factors recruiting SCC to regulatory regions of pluripotency genes and identify the XPC subunit as essential for interaction with the two proteins. The present study reveals new mechanistic and functional aspects of SCC transcriptional activity, and thus underscores the diversified functions of this regulatory complex.

transcription | pluripotency | protein-protein interactions | ChIP-seq | RNA-seq

Embryonic stem cells (ESCs) can be maintained in culture to retain their defining properties of self-renewal (propagation without loss of cellular identity) and pluripotency (ability to generate all embryonic lineages upon appropriate developmental stimuli). In the case of murine ESCs (mESCs), a minimal culture medium supplemented with serum and LIF (leukemia inhibitory factor) can perpetuate the pluripotency state (1). Such culture conditions, among others, provide the external cues to counteract differentiation programs hardwired in ESCs [e.g., the autocrine FGF4 signaling (2)], by fueling an intricate network of transcription factors (TFs) at the core of which stands the autoregulated and self-sustained OCT4 (POU class 5 homeobox 1), SOX2 (sex determining region Y-box 2), and NANOG (Nanog homeobox) circuit (3,4).
These "core" pluripotency factors orchestrate ESC transcriptional programs in conjunction with "ancillary" TFs (e.g., ESRRB, KLF2, KLF4, SALL4, TBX3, TFCP2L1), various cofactors (5), noncoding RNAs (6), histone modifiers, and chromatin remodelers (7), ultimately conveying regulatory inputs to specialized basal transcriptional machineries (8–10). Identifying components of these complex regulatory circuitries and their interplay has an obvious impact on both developmental biology and regenerative medicine by guiding the effective and safe use of ESCs, as well as current methods for induction and differentiation of pluripotent stem cells and direct somatic cell reprogramming.
Unbiased approaches, such as transcriptional, epigenetic, and proteomic profiling, siRNA screenings, large-scale proteomics, and genome-wide TF occupancy studies (ChIP-seq), have been used by many investigators to uncover new players in the maintenance of pluripotency. Although effective, such experimental strategies often fail to establish a direct role for identified elements in the transcriptional regulation of pluripotency. To circumvent this limitation, we recently established an unbiased in vitro transcription-biochemical complementation assay and used it to search for the minimal components required for OCT4- and SOX2-mediated transcriptional activation in murine and human ESCs. We uncovered three distinct activities, one of which has been identified as the xeroderma pigmentosum, complementation group C (XPC)-RAD23B-CETN2 trimeric complex, referred to as the stem cell coactivator (SCC) in our studies (11,12).
The SCC complex was previously well established as a DNA damage sensor in the global genome nucleotide excision repair (NER) pathway (13,14). Within SCC, the XPC subunit scans the genome and recognizes helix-distorting lesions (e.g., UV-induced pyrimidine dimers) through its DNA-binding domain. CETN2 (centrin, EF-hand protein, 2) and RAD23B (RAD23 homolog B) stabilize the XPC protein and potentiate its interaction with damaged DNA (15–17). XPC also initiates the NER repair cascade through direct recruitment of XPA and general transcription factor IIH (TFIIH), which mediate damage verification and DNA unwinding, respectively (18,19). Subsequent DNA damage excision, DNA synthesis through the gap, and ligation complete the repair process (14).
Concomitantly to our study that identified SCC as an OCT4/SOX2 ESC coactivator (11), SCC was shown to assemble onto promoters of activated genes in HeLa cells, along with the entire NER machinery, to facilitate DNA demethylation and gene looping (20,21). In ESCs, however, the SCC coactivator activity we uncovered is uncoupled from DNA repair, and no other NER factors emerged from our biochemical assays, suggesting a certain degree of specificity for OCT4/SOX2 transcription. In fact, a preliminary analysis of RAD23B genomic occupancy highlighted a striking overlap with OCT4/SOX2 binding sites, and depletion of SCC by RNA interference showed that it contributes in vivo to maintaining the mESC state and to generating induced pluripotent stem cells from somatic cells (11). Nevertheless, several key issues remained unaddressed in our initial study: is SCC working as a coactivator in ESCs for TFs other than OCT4 and SOX2? What is the transcriptional network orchestrated by SCC in ESCs? Which transcriptional changes occur globally upon SCC depletion? Are these resulting from SCC transcriptional activity or deriving from a DNA damage response? How is SCC recruited to chromatin? What are the biochemical bases for SCC interaction with OCT4, SOX2, and possibly other TFs? To address some of these issues, we delved into a more thorough analysis of SCC chromatin occupancy in ESCs, combined with whole transcriptome profiling of SCC-depleted ESCs. These genome-wide scale studies revealed that SCC is indeed recruited to gene regulatory regions mostly through OCT4 and SOX2. Transcriptional perturbations following SCC depletion resemble those seen upon OCT4 and SOX2 knockdown, rather than reflect a p53-mediated DNA damage response. We also demonstrate that, within SCC, XPC is the major subunit directly interacting with OCT4 and SOX2, and truncation experiments indicate that nonoverlapping XPC domains are responsible for interaction with the core TFs.
At a more fundamental level, our studies also underscore the importance of multifunctional complexes serving as key regulatory molecular machines that cells have evolved to coordinate different biological processes. By investigating mechanistic and functional aspects of the interaction between OCT4, SOX2, and the XPC-RAD23B-CETN2 DNA repair complex, our findings provide new insights into how this multisubunit factor executes and integrates functions beyond DNA repair.

RAD23B Targets Gene Promoters and Distal Regulatory Regions in mESCs. To more thoroughly investigate how SCC works as a transcriptional coactivator responsive to OCT4/SOX2 and possibly other sequence-specific TFs in mESCs, we performed chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) using murine D3 ESCs. Because none of the commercially available antibodies gave reliable ChIP signals, we raised and affinity-purified polyclonal antibodies against the RAD23B and XPC subunits of SCC. Although antibodies against both proteins were specific and efficient at immunoprecipitating SCC complex ( . We therefore used RAD23B-specific antibodies to immunoprecipitate the holo-SCC complex from D3 cells. Two RAD23B ChIP-seq biological replicates identified 59,768 and 42,364 RAD23B binding sites in the mESC genome, respectively. Of these, 40,671 RAD23B binding sites were common between the two independent experiments (R = 0.86), indicative of high experimental reproducibility (Fig. S1 D and E). The top 50%, high-confidence RAD23B binding sites with the strongest enrichment (P < 10⁻⁵, 29,884 peaks) were selected for further analysis (see Dataset S1 for a full list). To determine where RAD23B binds with respect to genes, for each ChIP-seq peak midpoint we calculated the distance to the transcription start site (TSS) of the closest RefSeq gene. About 18% of RAD23B binding occurs within 200 bp of a TSS, a significant enrichment with respect to the control dataset (preimmune IgGs) (Fig. 1A). This genome-wide distribution is consistent with our previous biochemical data suggesting that SCC facilitates transcriptional initiation by OCT4 and SOX2 (11). However, promoters (±500 bp from TSS) only account for one-fourth (24%) of all RAD23B binding sites, whereas proximal (500 bp < TSS < 5 kb) and distal (5 kb < TSS < 50 kb) gene regions contain the majority of RAD23B target sites (Fig. 1B).
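The peak-to-TSS assignment described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the single-chromosome simplification, and the example coordinates are assumptions made for illustration, and only the distance cutoffs (500 bp, 5 kb, 50 kb) are taken from the text.

```python
from bisect import bisect_left


def nearest_tss_distance(peak_mid, tss_sorted):
    """Absolute distance (bp) from a peak midpoint to the closest TSS.

    tss_sorted is a non-empty, sorted list of TSS coordinates on the
    same chromosome as the peak (a simplifying assumption).
    """
    i = bisect_left(tss_sorted, peak_mid)
    # Only the TSS just before and just after the midpoint can be closest.
    candidates = tss_sorted[max(i - 1, 0):i + 1]
    return min(abs(peak_mid - t) for t in candidates)


def classify_region(distance):
    """Bin a peak by its distance to the TSS using the cutoffs quoted in the text."""
    if distance <= 500:
        return "promoter"
    if distance <= 5_000:
        return "proximal"
    if distance <= 50_000:
        return "distal"
    return "other"


# Hypothetical coordinates, for illustration only.
tss = [10_000, 80_000, 200_000]
peaks = [10_150, 12_000, 120_000, 400_000]
labels = [classify_region(nearest_tss_distance(p, tss)) for p in peaks]
print(labels)
```

A real analysis would repeat this per chromosome and use an annotated TSS table (e.g., RefSeq); the binary search simply avoids comparing every peak against every TSS.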
To probe whether these other RAD23B binding regions are, in fact, distal regulatory elements, we compared them to prototypic enhancer features mapped by others in mESCs such as H3K4me1 and H3K27ac chromatin marks, as well as p300 coactivator binding sites (22). We observed a significant (twofold and higher) overlap between RAD23B proximal and distal binding sites and all of the tested enhancer features, irrespective of whether they mark active (H3K27ac) or poised (H3K4me1, p300) enhancers (Fig. S2 A and D). In line with this observation, we found both active and inactive genes targeted by RAD23B (i.e., genes with at least one RAD23B ChIP-seq peak within 5 kb of their TSS) (Fig. 1C).
A gene ontology (GO) analysis of these putative targets revealed an overrepresentation of genes involved in transcriptional regulation among both active and inactive genes (Fig. 1D; full list in Dataset S1). There was also a specific enrichment of developmentally regulated, Polycomb-bound genes among the inactive target genes (Fig. 1D and Fig. S2B). Interestingly, RAD23B binding sites are enriched for regions co-occupied by cohesin and Mediator, but not CTCF (Fig. S2 C and D), which have been reported to favor enhancer-promoter looping and transcriptional activation in mESCs (23). Taken together, our genome-wide analyses reveal that RAD23B binds to promoter and distal gene regulatory regions of both actively transcribed and silenced genes in mESCs.

Fig. 1 legend (partial; the beginning is missing from the source): ... Lower) genes with a RAD23B binding site within 5 kb from their TSS compared with all ON/OFF genes in D3 mESCs. In parentheses is the number of target genes in each GO term. Bonferroni P value < 10⁻⁵ for all categories (complete table in Dataset S1; Polycomb binding in Fig. S2B). (E) Overlap between RAD23B binding sites and ChIP-seq peaks of major transcription factors in mESCs (24,70). Plotted is the percentage of RAD23B peaks overlapping with a given TF relative to the overlap in control (IgG) ChIP-seq peaks (dashed line: ratio to IgG = 1, background levels). In blue is the overlap when considering all TF binding sites; in gray is the overlap when considering only those TF binding sites that do not overlap with OCT4/SOX2 peaks. Additional overlap analyses in Fig. S2. Data in A-E are from one of two highly overlapping RAD23B ChIP-seq experiments in D3 mESCs (Fig. S1).
RAD23B Extensively Colocalizes with OCT4/SOX2 Binding Sites in mESCs. As part of the DNA-damage recognition and repair machinery, the XPC subunit of SCC has been reported to bind DNA directly (13), although the interaction is neither sequence-specific nor required for SCC transcriptional activity (11). We therefore hypothesized that the observed RAD23B recruitment to regulatory regions was more likely to depend on protein-protein-mediated transactions involving other sequence-specific TFs. Compared with a series of mESC-enriched TFs, RAD23B binding sites showed the strongest colocalization with OCT4/SOX2 (>sevenfold enrichment over control), followed by NANOG (>fivefold), and STAT3, ESRRB, TFCP2L1, and KLF4 (two- to threefold) (Fig. 1E, blue bars). Importantly, RAD23B overlap with TFs other than OCT4 and SOX2 dropped to background levels when we subtracted from the analysis peaks located near OCT4/SOX2 sites (Fig. 1E, gray bars), suggesting that the observed colocalization between these TFs and RAD23B mostly depended on concurrent OCT4/SOX2 binding. Given the strong correlation observed between OCT4 and SOX2 (O/S) binding and RAD23B enrichment at mESC regulatory regions, we further characterized their interaction with RAD23B. About 60% of the strongest (P < 10⁻⁹) RAD23B binding sites do, in fact, overlap with O/S, and for the most part the colocalization occurs away from core promoters (>500 bp from TSS) (Fig. 2A), in line with preferential binding of O/S at distal regulatory regions (24). This finding still holds true for weaker (10⁻⁹ < P < 10⁻⁵) RAD23B binding sites, although the O/S overlap drops to ∼25%, indicating a direct correlation between RAD23B enrichment and O/S colocalization. De novo motif discovery within DNA sequences surrounding RAD23B peaks (±125 bp from peak midpoint) identified two prominent motifs: the top-ranking one was virtually identical to the O/S composite recognition element (P < 10⁻¹⁰) (Fig. 2B); a second motif showed moderate resemblance to both KLF4 (P < 10⁻⁵) and SP1 (P < 10⁻⁴) binding sites (Fig. S3A). The overlap between RAD23B and O/S binding is even more robust when superimposing their ChIP-seq tracks at specific loci such as Nanog, Pou5f1, Klf4, Sox2, Lefty2, and Stat3 (Fig. 2C, Lower, and Fig. S3), confirming the selective enrichment of RAD23B at pluripotency genes that we previously observed by ChIP-qPCR (Fig. 2C, Upper) (11).
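The colocalization metric used in this section (overlap with a TF's binding sites, expressed relative to the overlap seen with control IgG peaks) can be sketched as follows. This is an illustrative Python example only: the function names, the half-open interval convention, the single-chromosome simplification, and all coordinates are assumptions, not the study's actual analysis code.

```python
def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) share a base."""
    return a[0] < b[1] and b[0] < a[1]


def fraction_overlapping(peaks, sites):
    """Fraction of peaks overlapping at least one site (same chromosome assumed)."""
    if not peaks:
        return 0.0
    hits = sum(any(overlaps(p, s) for s in sites) for p in peaks)
    return hits / len(peaks)


def enrichment_over_control(test_peaks, control_peaks, tf_sites):
    """Ratio of overlap fractions; a value of 1.0 corresponds to background."""
    control_frac = fraction_overlapping(control_peaks, tf_sites)
    if control_frac == 0:
        raise ValueError("control has no overlap; ratio is undefined")
    return fraction_overlapping(test_peaks, tf_sites) / control_frac


def without_overlap(sites, excluded):
    """Keep only sites that do not overlap any excluded interval
    (analogous to removing TF sites that coincide with O/S peaks)."""
    return [s for s in sites if not any(overlaps(s, e) for e in excluded)]


# Hypothetical intervals, for illustration only.
tf_sites = [(100, 200), (1000, 1100), (5000, 5100)]
rad23b = [(150, 250), (1050, 1150), (5050, 5150), (9000, 9100)]  # 3 of 4 overlap
igg = [(180, 280), (3000, 3100), (7000, 7100), (9500, 9600)]     # 1 of 4 overlap
ratio = enrichment_over_control(rad23b, igg, tf_sites)
print(ratio)
```

Recomputing the ratio after passing the TF sites through `without_overlap` with a set of O/S peaks mirrors the logic of the gray bars in Fig. 1E. The nested loops are quadratic and only suitable for small examples; genome-scale data would call for an interval tree or a dedicated tool.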
RAD23B Recruitment Follows Activator Binding. If SCC works mostly as a canonical O/S transcription coactivator, one would expect it not to bind DNA directly but rather rely on OCT4 or SOX2 for recruitment onto chromatin. Indeed, our in vitro biochemical studies demonstrated that SCC DNA binding activity is dispensable for transcription (11). To confirm in vivo that SCC is not preloaded onto chromatin through its DNA-binding subunit XPC, but rather recruited to gene regulatory elements by activators, we performed ChIP-quantitative PCR (qPCR) of RAD23B in D3 mESCs at selected loci upon knockdown of activators by RNA interference. For OCT4, we selected two lentiviral shRNA constructs that gave us the strongest OCT4 mRNA and protein depletion with the lowest effect on SOX2 and RAD23B levels at 24, 48, and 72 h postinfection compared with uninfected cells (Fig. 3 A and B). Under these experimental conditions, OCT4 levels remained high enough to prevent overt differentiation of mESCs into trophectoderm, although we did observe a reduction in cell proliferation at the latest time point. As expected, OCT4 binding to DNA dropped almost to background levels 72 h postinfection with shRNA constructs, with different kinetics depending on the investigated genomic locus (Oct4/Pou5f1, Nanog, or Klf4) (Fig. 3C). To ensure that the drop was not a result of intrinsic differences in chromatin properties and processing across time points, we performed ChIP on the same samples with Pol II and TBP antibodies, and checked their enrichment at the promoter of the housekeeping β-actin gene (Actb) (Fig. 3C, Bottom Right). Both Pol II and TBP signals were actually somewhat higher in OCT4-depleted cells than in uninfected ones, at both 48 and 72 h postinfection, thus controlling for any experimental bias. We next proceeded to check for SOX2 and RAD23B binding at various pluripotency loci upon OCT4 knockdown. 
Consistent with cooperative OCT4/SOX2 binding to their target genes (25), SOX2 recruitment was significantly reduced 72 h after OCT4 knockdown, even though SOX2 protein levels remained unchanged (Fig. 3 A and C). Interestingly, at an earlier time point (48 h postinfection), SOX2 ChIP signal on Klf4 and Nanog enhancers was equal to uninfected cells, if not higher, indicating that at these loci OCT4 depletion is initially compensated by an increase in SOX2 binding. This result is not unreasonable, given that both OCT4 and SOX2 were shown to independently bind to the O/S composite motif (26) and that single-molecule imaging indicates that SOX2 engages the target DNA first, followed by OCT4 (27). Most importantly, when we checked SCC chromatin binding in OCT4-depleted cells using RAD23B antibody, we observed that it closely followed SOX2 kinetics at all tested loci, reaching background levels 72 h post OCT4 depletion (Fig. 3C). We also performed complementary experiments knocking down SOX2 and checking OCT4 and RAD23B levels on chromatin, but failed to obtain conditions where SOX2 levels could be reduced without also affecting RAD23B and OCT4 protein levels. Nonetheless, from the OCT4 knockdown experiments we can conclude that RAD23B requires at least a prebound SOX2 to be recruited to gene regulatory regions.
Impaired Stemness of SCC-Depleted mESCs. We previously reported that depletion of SCC by RNA interference compromises transcription of some pluripotency genes, resulting both in impaired pluripotency of ES cells and defective somatic cell reprogramming (11). To confirm these earlier results obtained by RNA interference and to gain a more comprehensive view of the transcriptional program coregulated by SCC in mESCs, we generated an independent Rad23b knockout mESC line (Rad23b−/− JM8.N4). Next, we depleted Xpc by RNA interference to obtain two Rad23b-ablated/Xpc shRNA-depleted cell lines (SCC KD1 and SCC KD2 JM8.N4), and compared genome-wide transcription profiles of these lines to WT mESCs by poly(A)-RNA-seq (see Materials and Methods and Fig. S4 for details on the cell line generation). RNA-seq analysis revealed that ∼15% of protein-coding genes in mESCs are either up- or down-regulated (1.5-fold or more) in Rad23b−/− and SCC KD1/KD2 cells compared with WT cells (Fig. S5A; full gene list in Dataset S2). As expected, Rad23b (in Rad23b−/− mESCs) and both Rad23b and Xpc (in SCC KD1/KD2 mESCs) are among the most dramatically down-regulated genes (Fig. S5A). To assess the impact of SCC depletion on mESC properties, we selected the genes up- and down-regulated in both SCC KD1 and SCC KD2 mESCs, averaged their expression levels, and compared them to WT cells. We then manually curated a list of genes involved in ESC maintenance ("pluripotency signature") or differentiation ("differentiation signature") and compared their transcript levels in WT and SCC KD mESCs (see Materials and Methods for details). SCC KD cells showed a preferential down-regulation of "pluripotency signature" genes (e.g., Tfcp2l1, Klf4, Esrrb, Nanog, Lefty1, Lefty2) and a concomitant up-regulation of "differentiation signature" genes, including several trophectoderm markers (e.g., Hand1, Cdx2, Wnt7b, Gata3, Bmp4, Ascl2) (Fig. 4A; RT-qPCR validation in Fig. S5B).
This phenotype is consistent with the underlying hypothesis of SCC working as an O/S transcriptional coactivator, because knockdown of either TF results in loss of pluripotency and induction of trophectoderm differentiation (26, 28–30). Down-regulation of pluripotency genes also translated to a reduction in protein levels (Fig. 4B and Fig. S5C) and impaired clonogenic ability of SCC KD cells compared with WT cells (Fig. 4C). Interestingly, three of the most down-regulated genes (Tfcp2l1, Klf4, Lefty2) were previously reported as downstream of the LIF/STAT3 pathway (31–33). A further analysis revealed that Stat3 itself was down-regulated in both Rad23b−/− and SCC KD1 samples, but did not pass the threshold in the SCC KD2 sample, and was thus initially designated as "not changed." Indeed, RT-qPCR confirmed Stat3 down-regulation in Rad23b−/− and both SCC KD samples (Fig. S5B), and protein analysis revealed reduced levels of STAT3 and a consequent defect in STAT3 activation, as measured by decreased phosphorylation of tyrosine 705. These data suggest that SCC KD cells are defective in LIF/STAT3 signaling, possibly because of an altered transcriptional response.
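The thresholding logic described above (a 1.5-fold cutoff, with a gene counted as deregulated only when both knockdown lines agree) can be illustrated with a short Python sketch. The function names, gene labels, and expression values are hypothetical, and the sketch ignores normalization and statistical testing, which a real RNA-seq analysis would require.

```python
def classify_fold_change(kd, wt, threshold=1.5):
    """Label a gene 'up', 'down', or 'unchanged' from KD and WT expression values."""
    if wt <= 0 or kd <= 0:
        raise ValueError("expression values must be positive")
    ratio = kd / wt
    if ratio >= threshold:
        return "up"
    if ratio <= 1 / threshold:
        return "down"
    return "unchanged"


def consensus_call(wt, kd1, kd2, threshold=1.5):
    """Call a gene deregulated only if both knockdown lines agree on the direction."""
    call1 = classify_fold_change(kd1, wt, threshold)
    call2 = classify_fold_change(kd2, wt, threshold)
    return call1 if call1 == call2 else "unchanged"


# Hypothetical expression values (arbitrary units): (WT, KD1, KD2).
expression = {
    "geneA": (100.0, 40.0, 50.0),  # passes the cutoff in both lines
    "geneB": (100.0, 60.0, 80.0),  # passes the cutoff in KD1 only
    "geneC": (20.0, 45.0, 60.0),   # increased in both lines
}
calls = {gene: consensus_call(*values) for gene, values in expression.items()}
print(calls)
```

The `geneB` case is analogous to the situation reported for Stat3: a gene that narrowly misses the cutoff in one sample is labeled unchanged by a strict consensus rule, which is why an independent validation such as RT-qPCR can still reveal a real change.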
To further validate these results, we also performed an unbiased GO analysis on the list of deregulated transcripts in SCC KD mESCs (Fig. 4D; full table in Dataset S2). In accordance with the accentuated "differentiation signature" of SCC KD cells, among the up-regulated genes we observed a significant overrepresentation of categories related to tissue development and morphogenesis (placenta, urogenital system, heart, blood vessels, and so forth). The same GO analysis performed on down-regulated genes was less informative, with overrepresentation of gene categories like RNA processing, chromatin organization, and M-phase regulation.
Interestingly, the GO analysis also highlighted an overrepresentation of genes involved in the positive regulation of cell death among the up-regulated transcripts (Fig. 4D). This finding agrees with our observation that SCC KD cells exhibit reduced cell growth (Fig. S5F). Knowing that SCC KD mESCs are defective for DNA repair of UV-induced damage (34), we became concerned that some of the observed phenotypes (reduced pluripotency, increased differentiation, and cell death) could result from a DNA damage response mediated by the tumor protein p53 rather than a direct transcriptional defect. Indeed, several reports suggest that p53 can suppress pluripotency and self-renewal in ESCs and activate differentiation programs (reviewed in ref. 35). To control for potential complicating p53 effects in our analysis, we checked p53 RNA and protein levels, as well as p53 activation and induction of p53-response genes (p21, Mdm2, Gadd45α) in SCC KD mESCs (Fig. 4B and Fig. S5G). We did not detect any elevated p53 activation in SCC KD cells relative to WT cells, and under normal culture conditions, no p53-mediated DNA damage response was elicited, suggesting that the phenotypes we observed are likely p53-independent.
To identify genes that might be direct SCC transcriptional targets, we correlated transcriptional deregulation in SCC KD mESCs with RAD23B binding by juxtaposing ChIP-seq and RNA-seq data. Globally, there is no preferential RAD23B binding within 5 kb of TSSs of genes, whether unchanged, down-regulated, or up-regulated upon SCC knockdown (Fig. S5D). Up-regulated genes are actually somewhat underrepresented among RAD23B ChIP targets compared with unchanged or down-regulated genes, suggesting that the up-regulation of differentiation genes is likely an indirect effect of depleting SCC. However, when we repeated the same analysis to include only those genes with both RAD23B and O/S binding within 5 kb of their TSS, there was clear down-regulation upon SCC knockdown (Fig. 4E), suggesting a direct, positive role for SCC and O/S in the transcriptional regulation of these target genes. This finding prompted us to investigate whether genes affected by SCC KD are also direct targets of O/S. For this purpose, we used a previously described list of OCT4 direct targets in ESCs (36) and checked how many of them are deregulated in our SCC KD cell lines. Again, we observed that genes positively regulated by OCT4, and hence down-regulated upon OCT4 KD, are more likely to be down-regulated than up-regulated upon SCC KD. The reverse does not hold true: genes repressed by OCT4 (i.e., up-regulated in OCT4 KD ESCs) are not preferentially up-regulated in SCC KD cells (Fig. S5E), consistent with these genes being perhaps indirect, downstream targets of OCT4 and thus not necessarily SCC-dependent. Collectively, these RNA-seq data confirm that SCC indeed functions as an O/S coactivator in vivo on a genome-wide scale, directly fueling the expression of a subset of pluripotency genes.
Loss of SCC partially impairs mESC self-renewal and pluripotency, leading to a derepression of differentiation genes, mostly related to the trophectoderm lineage, via a mechanism that seems to be independent of SCC DNA repair activity.
SCC Interaction with OCT4/SOX2 Requires XPC. Having established OCT4 and SOX2 as the key TFs driving SCC recruitment to specific genes in mESCs, we set out to further characterize the molecular mechanism underpinning the relationship between SCC and its partner TFs. As previously shown (11), mouse SCC coimmunoprecipitates (co-IP) with OCT4 and SOX2 when all four proteins (RAD23B, XPC, OCT4, and SOX2) are ectopically expressed in 293T cells by transient transfection (Fig. 5 A-D, set 1). To assess which protein subunits dictate this interaction, we repeated the co-IP assays excluding one protein at a time in all possible combinations (Fig. 5 A-D, sets 2-5). We did not evaluate the requirement of CETN2 for the interaction, because previous data suggested that this small subunit is dispensable for SCC transcriptional activity (11). Without the large XPC subunit of SCC, RAD23B failed to pull down either SOX2 or OCT4 (Fig. 5B, compare sets 1-3 to 4). Similarly, in the absence of XPC, neither SOX2 (Fig. 5C, compare sets 1 to 3-4) nor OCT4 (Fig. 5D, compare sets 1-2 to 4) was able to interact with RAD23B. As expected, loss of XPC had no effect on the ability of OCT4 to bind SOX2 (Fig. 5 C and D, sets 1 and 4). On the other hand, excluding RAD23B from the transfection mixture did not significantly alter the interaction of XPC with either SOX2 or OCT4 (Fig. 5A, set 5). We conclude that within holo-SCC, the large XPC subunit is necessary and likely sufficient for interaction with OCT4 and SOX2 (Fig. 5E). This finding agrees well with our previous observation that XPC alone, but not RAD23B, can partly serve to coactivate OCT4 and SOX2 transcription (11).
To confirm in vivo that SCC interaction with OCT4 and SOX2 occurs via XPC, we again mapped RAD23B binding sites by ChIP-seq in XPC-depleted mESCs (Xpc−/−) and compared them to the sites of RAD23B interaction found in WT cells (see Materials and Methods for details on the cell line generation). ChIP efficiency of RAD23B in Xpc−/− mESCs was overall much lower than in WT cells, revealing a total of only 828 binding sites, about 65% of which (534) correspond to the RAD23B ChIP-seq peaks mapped in WT cells. These 534 peaks were used for further analysis (full list in Dataset S1). About 30% of the RAD23B binding sites retained in Xpc−/− mESCs still colocalize with O/S, but OCT4 and SOX2 signal at these overlapping peaks is much weaker than what was normally found in WT cells (Fig. 5F). In fact, XPC depletion abolishes RAD23B binding at strong, prototypic O/S targets like pluripotency genes (Oct4/Pou5f1, Sox2), whereas it only partially affects RAD23B binding at those genomic regions with little O/S enrichment (Fig. 5F and Fig. S6B). Concordantly, de novo motif discovery within DNA sequences surrounding the RAD23B peaks retained in Xpc−/− mESCs identifies only a motif that matches either SP1 (P < 10⁻⁷) or KLF4 (P < 0.01) binding sites but no O/S composite recognition elements (Fig. 5G). These observations confirm on a genome-wide scale the biochemical evidence that XPC is the main and direct determinant of SCC interactions with OCT4 and SOX2.

[Fig. 4D, GO categories with number of genes in parentheses. Down-regulated genes: M phase (33), chromatin organization (36), histone acetylation (10), proteasomal protein catabolic process (7), RNA processing. Up-regulated genes: cell-cell signaling (26), regulation of transcription, DNA dep. (97), embryonic organ development (23), positive regulation of cell death (21), tube development (21), blood vessel morphogenesis (20), heart development (19), urogenital system development (15), placenta development (11).]
OCT4 and SOX2 Independently Interact with XPC. In addition to identifying XPC as the major driver of SCC interaction with O/S, the co-IP experiments above also revealed that SCC can independently bind to either SOX2 or OCT4, because the absence of one has no effect on the pull-down efficiency of the other through either XPC or RAD23B (Fig. 5 A and B, sets 1-3). Searching for domains within XPC that might mediate interactions with these two distinct TFs, we overexpressed either full-length XPC or progressive C-terminal truncations in 293T cells and individually tested their binding to cotransfected SOX2 or OCT4 by co-IP (Fig. 6A). Removal of the C-terminal CETN2/TFIIH-interaction domain of XPC (17,19) did not impair its interaction with either SOX2 or OCT4 (Fig. 6 B and C, 1-808: "-Benz" panels). This finding was consistent with previous observations that this region is dispensable for SCC transcriptional activity in vitro (11). In contrast, deletion of a region encompassing the DNA binding domain and part of the RAD23B interacting residues (18,19) drastically reduced XPC interaction with SOX2 but not with OCT4 (Fig. 6 B and C; 1-599, "-Benz" panels). OCT4 binding to XPC was only abolished upon removal of the whole RAD23B interaction domain (Fig. 6 B and C, 1-511, "-Benz" panels). XPC thus independently binds OCT4 and SOX2, whereas its N terminus is dispensable for interaction with both. We note that the first 200 amino acids of XPC, rather than its C terminus, were recently reported to recruit TFIIH onto damaged chromatin (37). To further verify that SCC coactivation of O/S transcription occurs independently of TFIIH recruitment, we performed in vitro transcription assays with recombinant human SCC containing either WT or N-terminal truncated XPC (Δ1-195, hXPCΔN) (Fig. 6 A and D) (11).
ΔN-SCC enhanced OCT4/SOX2-dependent activation of the Nanog promoter to levels comparable with those obtained using WT-SCC, confirming that SCC transcriptional activity is unlikely related to the ability of XPC to bind TFIIH. For SOX2, we recapitulated the above truncation results using internal XPC fragments and verified that the XPC DNA binding domain is critical for its interaction with SOX2 (Fig. S7 A and B). We next tested whether nucleic acids play any role in XPC binding to SOX2. Benzonase treatment of cell lysates before co-IP, as well as co-IP assays with the XPC DNA-binding mutant W683S (38), confirmed that XPC binding to nucleic acids is indeed important for its interaction with SOX2 but not OCT4 (Fig. 6 B and C, "+ Benz" panels, and Fig. S7 C and D). Interestingly, the W683S mutation enhanced XPC binding to OCT4 (Fig. S7D); we surmise that this alteration may compensate for the loss of SOX2 interaction and could explain why the XPC W683S DNA-binding mutant retains near normal transcription coactivator activity in vitro (11).
To further confirm that nucleic acids mediate SCC/SOX2 interaction, and to investigate which class might be involved, we carried out reconstitution experiments using an orthologous system in which the human SCC complex was recombinantly expressed in Sf9 insect cells, purified, and mixed with lysates of 293T cells overexpressing human SOX2 (Fig. 6E). Reactions were either left untreated or treated with ethidium bromide, benzonase, or RNase A to assess dependence of SCC/SOX2 interaction on double-stranded DNA, nucleic acids in general, or RNA, respectively. Although benzonase treatment was confirmed to abrogate SOX2 interaction with SCC, ethidium bromide had minimal, if any, effect on this interaction. More interestingly, RNase A treatment completely abolished SOX2 binding to SCC, suggesting that an RNA component might be involved in bridging the two proteins or influencing either the stability or the conformation of one for binding to the other. Finally, we verified that RNA can boost SCC interaction with SOX2 in the absence of any other protein component, performing co-IPs with purified proteins in the presence or absence of total RNA extracted from 293T cells (Fig. 6F). The addition of extra RNA increased SCC pull-down efficiency through SOX2 by a factor of 2- to >10-fold, depending on the experiment. Importantly, addition of equal amounts of total RNA extracted from Escherichia coli also enhanced SCC/SOX2 interaction, suggesting that the required RNA species are not mammalian, or even eukaryotic-specific. To exclude that the boost of SCC/SOX2 interaction is simply a result of RNA acting as an ion exchanger, we performed a similar experiment in the presence of increasing amounts of heparin, and observed that this ionic polymer actually hampered, rather than facilitated, SCC binding to SOX2, probably competing away the required RNA.
Taken together, these experiments point to a model in which different domains of XPC independently interact with OCT4 and SOX2, and highlight the potential involvement of an RNA scaffold in mediating SCC/SOX2 interactions.

Discussion
In this study we analyzed transcriptional function and mechanism of action of the XPC-RAD23B-CETN2 DNA repair complex, which we recently established as an O/S selective coactivator in embryonic stem cells (SCC) (11).
Our high-throughput genome-wide mapping of RAD23B binding sites in mESCs reveals enrichment at TSSs and enhancers of both active and developmentally poised genes. Such a pattern is more consistent with SCC acting as a TF rather than as a repair complex randomly scanning the genome. One alternative explanation we considered was the possibility that RAD23B recruitment at regulatory regions might represent a byproduct of XPC interacting with TFIIH (39), the latter being an intrinsic component of transcription initiation platforms assembled at promoters and tissue-specific enhancers (40). Two lines of evidence suggest that TFIIH is not the driver: (i) truncated versions of XPC lacking putative TFIIH interaction domains (19,37) remain competent to interact with O/S (Fig. 6 and Fig. S7) and stimulate their transcriptional activity (Fig. 6D) (11); (ii) after probing a variety of ESC TFs, RAD23B was found to predominantly colocalize with O/S (Fig. 1E), whereas O/S bipartite motifs are faithfully retrieved from RAD23B peaks (Fig. 2B) and RAD23B binding relies upon O/S (Fig. 3C). These results strongly suggest that SCC is primarily recruited by sequence-specific TFs rather than by components of the basal transcription machinery, such as TFIIH.
Using an inducible knockout strategy, Niwa and coworkers recently reported that XPC may be dispensable for mESC pluripotency and O/S transcription, and that its loss has a modest impact on global gene expression, as measured by microarray analyses (41). This finding is in agreement with previous studies showing that Xpc knockout mice are viable with no overt developmental defects, although they are UV sensitive (42,43). Similarly, our transcriptome analysis of SCC-depleted mESCs shows a relatively mild gene deregulation (1.5- to 4-fold).
We caution, however, that SCC is only one of three coactivators required for fully activating O/S transcription in vitro, and its knockdown alone might not be sufficient to severely hamper transcription in vivo (11,12). Nonetheless, we observed a clear trend upon SCC depletion in mESCs, with down-regulation of pluripotency genes and concomitant up-regulation of differentiation markers, accompanied by reduced clonogenic ability (Fig. 4 and Fig. S5). It is also worth pointing out that in Ito et al. (41), floxing the Xpc gene reduces its mRNA level by half even before Cre excision; in these Xpc fl/fl ESCs, Nanog transcripts are reduced compared with WT cells as much as we observe upon SCC depletion. These results taken in aggregate suggest that, although SCC might be dispensable for mESC self-renewal and pluripotency because of partial functional redundancy, SCC remains an important player in stabilizing stem cell transcriptional programs.
To confirm the results from our biochemical assays showing that SCC interacts with specific chromatin sites mainly via direct interactions of O/S and XPC, here we performed ChIP-seq of RAD23B in Xpc −/− mESCs. Indeed, in the absence of XPC we observed a reduced colocalization of RAD23B with O/S, and we retrieved no obvious O/S motif from sequences surrounding the remaining RAD23B binding sites. However, to our surprise we found regions that retained RAD23B binding even in the absence of XPC, pointing at an XPC-independent loading of RAD23B onto chromatin. We speculate that this might reflect RAD23B functions beyond NER (i.e., its role as a proteasome shuttling factor) (44). Indeed, several studies have documented a role for the 26S proteasome in transcription, through proteolytic as well as nonproteolytic activities (45). In mESCs, the proteasome safeguards pluripotency by degrading preinitiation complexes preassembled at tissue-specific enhancers to allow transcription at later stages of development (46). Proteasome recruitment was verified at a few permissive loci, and our SCC ChIP-seq reveals that the very same sites are also bound by RAD23B. A transcriptional role for RAD23B has also been suggested in yeast, again through its interaction with the proteasome regulatory subunit (47). Further studies are needed to test the contribution of RAD23B alone to transcriptional regulation, which might partially explain the developmental abnormalities observed in Rad23b −/− mice (48,49).
Among the various mechanistic aspects of SCC and O/S interactions revealed by our studies, the most intriguing one is probably that RNA species might help mediate direct SOX2-XPC interactions. Interestingly, it was reported by others that binding to a noncoding RNA allows SOX2 recruitment to promoter regions of neurogenic TFs (50). It is tempting to speculate that a similar role for RNA exists at pluripotency genes regulated by SOX2 in ESCs. Our preliminary data indicate that the RNA facilitating SOX2-XPC binding is neither ESC- nor eukaryotic-specific, because we observed comparable results when using total RNA extracted from either 293T or bacterial cells. However, heparin failed to boost SCC association with SOX2, indicating that some property of RNA other than its strong charge (e.g., secondary structure) is responsible for the effect. Although our data are very preliminary and more experiments will be needed to fully address this issue, we regard it with particular interest, in light of recent emerging evidence for a more pervasive role of both specific and promiscuous RNA binding in the regulation of Pol II-dependent transcription (51-54).
In conclusion, consistent with observations that ESCs are exceptionally sensitive to reduced levels of transcriptional cofactors and chromatin regulators (23, 55-57), herein we provide evidence that the DNA repair trimeric complex XPC-RAD23B-CETN2 participates in the maintenance of mESC identity by working as a transcriptional coactivator targeted by the core TFs O/S. The mechanistic and functional insights presented herein add to our understanding of how multiprotein complexes can accommodate and provide seemingly unrelated and orthogonal functions that stand at the point of convergence between diverse biological processes (i.e., transcription and DNA repair) to generate the coordinated responses required to maintain cellular identity.
Primers. Primers used in this study are listed in Table S1.
ChIP and ChIP-seq Analysis. ChIP and ChIP-seq experiments were performed as previously described (11). ChIP-seq reads from our RAD23B data as well as other publicly available mESC TFs were analyzed with Partek Flow (v1.1.1219) and Genomics Suite (v6.12.0103) software (Partek), using mm9 as a reference genome. All RefSeq genes with a RAD23B/SCC binding site within 5 kb from their TSS were considered putative SCC targets. Poly-A RNA-seq performed in D3 mESCs by Liu et al. (8) was used to assess target gene expression levels; RefSeq genes were classified as active when having a reads per kilobase per million mapped reads (RPKM) value ≥ 1 and inactive when having an RPKM < 1. For GO analysis we used DAVID 6.7 Functional Annotation Tool (59, 60) (accessed March 2014). Overlaps between RAD23B/SCC and other mESC TFs were calculated with Galaxy (61-63), requiring a minimum 1-bp overlap between the ChIP-seq peaks.
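The three rules stated above (a binding site within 5 kb of a TSS, the RPKM ≥ 1 activity threshold, and the minimum 1-bp peak overlap) can be illustrated with a minimal Python sketch. It is not the Partek or Galaxy implementation; the function names, the half-open coordinate convention, and the choice to measure distance from the TSS to the nearest peak edge are assumptions made only for illustration.

```python
from typing import List, Tuple

# Assumption: peaks are half-open intervals [start, end) on a single chromosome.
Interval = Tuple[int, int]


def distance_to_peak(tss: int, peak: Interval) -> int:
    """Distance in bp from a TSS to the nearest base of a peak (0 if inside)."""
    start, end = peak
    if tss < start:
        return start - tss
    if tss >= end:
        return tss - (end - 1)  # end - 1 is the last base covered by the peak
    return 0


def is_putative_target(tss: int, peaks: List[Interval], window: int = 5000) -> bool:
    """A gene is a putative target if any peak lies within `window` bp of its TSS."""
    return any(distance_to_peak(tss, p) <= window for p in peaks)


def expression_class(rpkm: float, threshold: float = 1.0) -> str:
    """Active when RPKM >= threshold, inactive otherwise."""
    return "active" if rpkm >= threshold else "inactive"


def peaks_overlap(a: Interval, b: Interval, min_overlap: int = 1) -> bool:
    """True when two peaks share at least `min_overlap` bp."""
    shared = min(a[1], b[1]) - max(a[0], b[0])
    return shared >= min_overlap
```

With these conventions, two peaks that merely abut (one ends where the other starts) share 0 bp and are not counted as overlapping.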
DNA Constructs, Lentiviral Vector Production, and Infection of mESCs. pLKO.1 shRNA lentiviral constructs (Sigma-Aldrich) were the following: TRCN0000009611 (KD1), TRCN0000009612 (KD2), TRCN0000009614 (KD3), and TRCN0000009615 (KD4) targeting Oct4; TRCN0000240683 (KD1) and TRCN0000240685 (KD2) targeting Xpc; TRCN0000127120 targeting Rad23b. As a control, we used a nontargeting pLKO.1 construct (hairpin sequence: 5′-CCGG-CAACAAGATGAAGAGCACCAA-CTCGAG-TTGGTGCTCTTCATCTTG-TTGTTTTT-3′). Lentiviral particles for stable knockdown experiments in D3 mESCs were prepared by transient transfection of 293T cells, collected and concentrated as described (64), and titrated on 293T cells by real-time qPCR following the EPFL transgenic core facility protocol (tcf.epfl.ch/). The PG00023_C_2_C03 Rad23b targeting vector was obtained from KOMP. The neomycin resistance was replaced with a hygromycin cassette by assembly PCR. We amplified from the targeting vector a fragment containing part of the LacZ reporter cassette (upstream of the BssHII restriction site) up to the self-cleaving T2A peptide. The fragment was then assembled with a PCR amplicon containing the hygromycin gene from pCoHygro (Life Technologies) preceded by the last 20 nucleotides of the T2A peptide and followed by two stop codons and a BssHII restriction site. The assembled PCR amplicon was then cloned into the original PG00023_C_2_C03 vector digested with BssHII. Before electroporation into mESCs, the hygromycin targeting vector was linearized with SalI and XhoI, and gel-purified from low-melting agarose through phenol-chloroform extraction.
A lentiviral vector coexpressing an improved Cre recombinase (iCre) and an EGFP was generated from the previously described pCCL.GFP/ΔLNGFR bidirectional construct (65), by replacing GFP and ΔLNGFR reporters with iCre and EGFP, respectively. For transient iCre expression in JM8.N4 mESCs, we used a third-generation, integrase-defective lentiviral packaging system (66), and titrated the viral supernatant on 293T cells by flow cytometry. Next, 3 × 10⁶ JM8.N4 mESCs were seeded to 10-cm plates 24 h before infection and transduced overnight at multiplicity of infection of 10 in the presence of 4 μg/mL polybrene. Forty-eight hours posttransduction GFP-expressing cells were sorted by FACS and expanded for further analyses.
For 293T pull-down experiments, cDNAs for mouse Pou5f1/Oct4 (NM_013633.3), Sox2 (NM_011443.3), and Xpc (NM_009531.2) genes were cloned into the pFLAG-CMV-5a mammalian expression vector, and Rad23b (NM_009011.4) cDNA was cloned into the p3XFLAG-CMV-10 plasmid, both from Sigma-Aldrich. To overexpress mouse HA-SOX2 protein, the tag was introduced by PCR at the C terminus followed by two stop codons, and the PCR product was again cloned into a pFLAG-CMV-5a construct. XPC truncations and internal fragments were generated by PCR from full-length Xpc cDNA and cloned into pFLAG-CMV-5a. The W683S XPC mutant was obtained by a c.2048G>C substitution from full-length, WT Xpc cDNA and cloned into pFLAG-CMV-5a. For reconstitution experiments, human HA-tagged SOX2 cDNA was cloned into pFLAG-CMV-5a and HA-tagged RFP into pLKO.1 expression plasmid (Life Technologies). Constructs for expression of recombinant SCC were previously described (11).
Generation of SCC Knockdown and Xpc −/− mESCs. Xpc −/− mESCs, in which Xpc exon 11 and a portion of each of the flanking introns are replaced with a selectable marker (43), were a generous gift from J. H. Hoeijmakers (Department of Genetics, Erasmus University Medical Center, Rotterdam, The Netherlands). Detailed molecular analysis of the cell line showed that the targeted Xpc locus still produces two species of functional mRNAs, translated into truncated proteins (Fig. S6A). To avoid interference of these truncated products with our experiments, we further depleted Xpc mRNA with shRNA lentiviral constructs (see below). Infected cells were selected with puromycin (1.5 μg/μL) and used to prepare ChIP-seq libraries with the RAD23B antibody.
To generate SCC-KD mESCs, we started from a KOMP-generated JM8.N4 mESC line (allele Rad23b tm1a(KOMP)Wtsi, project CSD40373). In these cells, one of the two Rad23b alleles is targeted by a reporter-tagged insertion with conditional knockout potential, floxing exon 2 (Fig. S4A). To obtain a complete Rad23b knockout, we electroporated these cells with a modified targeting vector containing hygromycin instead of the original neomycin resistance to target the second Rad23b allele, and selected for hygromycin-resistant colonies. We obtained six clones, and analyzed their Rad23b locus by Southern blot. In five of six clones the modified, hygromycin-targeting cassette simply replaced the neomycin one, leaving an intact WT allele. In one clone (#6) we could target the WT Rad23b allele, resulting in a double-targeted, Rad23b conditional knockout mESC line (Fig. S4B). Notably, clone #6 had already reduced, but still detectable, Rad23b mRNA levels compared with WT cells (Fig. S4C), because the targeting cassette contains a splicing acceptor site and a polyadenylation signal that can splice with Rad23b exon 1 and terminate transcription early in the first intron. However, a fully functional Rad23b mRNA can still be formed whenever splicing proceeds directly from exon 1 to exon 2. To obtain a full Rad23b knockout, we removed the floxed exon 2 by infecting the cells with an integrase-defective lentiviral vector that would transiently express the Cre recombinase and a GFP reporter gene. GFP-expressing cells were sorted by FACS and checked for complete exon 2 removal by genomic PCR (Fig. S4C). RT-qPCR and Western blot analyses confirm that very little Rad23b mRNA and no protein are present in these Rad23b −/− cells (Figs. S4C and S5C). XPC protein levels are already reduced in Rad23b −/− mESCs compared with WT cells (Fig. S5C), in accordance with previous data that RAD23B protects its partner XPC from proteasomal degradation (15,67).
Therefore, we could readily obtain an almost complete depletion of SCC by infecting Rad23b −/− cells with two independent shRNA lentiviral constructs targeting Xpc, yielding two SCC-depleted cell lines (SCC KD1 and 2) (Figs. S4 A and D and S5C). WT and Rad23b −/− JM8.N4 mESCs were also infected with a nontargeting shRNA construct and used as a control for RNA-seq and all further analyses.
RT-qPCR and RNA-Seq Analysis. Total RNA was purified from cell pellets using RNeasy Plus Mini kit (Qiagen) and quantified by spectrophotometer. For RT-qPCR, 1 μg of total RNA was retrotranscribed to cDNA with oligo(dT) primers (Ambion, Life Technologies) and SuperScript III (Invitrogen). Two microliters of 1:20 cDNA dilutions were used for qPCR with SYBR Green PCR Master Mix (Applied Biosystems) on an ABI 7300 Real Time PCR system. For RNA-seq, 6 μg of total RNA were used to prepare poly-A libraries following the TruSeq RNA Sample Preparation Guide (Illumina). Libraries were indexed with TruSeq adapters, checked for quality and concentration by Bioanalyzer (2100 DNA, Agilent Technologies), Qubit (Life Technologies), and qPCR, and sequenced in one lane of the HiSeq2000 platform (single-end reads, 50 bp; Illumina). Reads were mapped against Ensembl genes using mm9 as a reference genome by TopHat (v2.0.6), and differentially expressed genes between WT and Rad23b −/− or SCC KD cells were determined using Cufflinks (v2.0.2) (68). Genes with RPKM < 1 in both WT and KO/KD samples were not considered for further analysis. For all of the other genes, the cut-off for differential expression was set at 1.5-fold change. The list of genes involved in pluripotency and differentiation of Fig. 4A was manually curated starting from the TaqMan Array Mouse Stem Cell Pluripotency Panel (Applied Biosystems) and the Qiagen Mouse Embryonic Stem Cells PCR Array.
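The expression filter and fold-change cut-off described above can be sketched in a few lines of Python. This illustrates only the two thresholds and does not reproduce the Cufflinks statistical testing. The function names, the inclusive treatment of the 1.5-fold boundary, and the handling of a zero WT value are assumptions introduced for this example.

```python
import math


def fold_change(rpkm_wt: float, rpkm_kd: float) -> float:
    """KD/WT expression ratio; infinite when a gene is detected only in KD."""
    if rpkm_wt == 0:
        return math.inf if rpkm_kd > 0 else 1.0
    return rpkm_kd / rpkm_wt


def classify_gene(rpkm_wt: float, rpkm_kd: float,
                  min_rpkm: float = 1.0, min_fold: float = 1.5) -> str:
    """Apply the RPKM filter first, then the fold-change cut-off."""
    # Genes below the expression threshold in BOTH samples are dropped.
    if rpkm_wt < min_rpkm and rpkm_kd < min_rpkm:
        return "not_considered"
    fc = fold_change(rpkm_wt, rpkm_kd)
    if fc >= min_fold:
        return "up"
    if fc <= 1.0 / min_fold:
        return "down"
    return "unchanged"
```

For example, a gene going from an RPKM of 20 in WT cells to 10 in KD cells has a ratio of 0.5 and is classified as down-regulated, whereas a gene below an RPKM of 1 in both samples is excluded regardless of its ratio.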
Clonal Assays. WT and SCC KD JM8.N4 mESCs were trypsinized to single-cell suspension, counted, and diluted to 30,000 cells/mL. Ten microliters of this cell suspension (containing 300 cells) were then plated to six-well plates in standard mESC medium with LIF (1,000 U/mL) and grown for 6-10 d. Emerging colonies were stained for alkaline phosphatase activity (Millipore) and counted.
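The plating arithmetic above (10 μL of a 30,000 cells/mL suspension giving 300 cells) can be checked with a short Python sketch. The helper names are hypothetical, and the plating-efficiency ratio is shown only as one common way to summarize colony counts; the colony number used in the usage note below is an invented illustration, not a result from this study.

```python
def cells_plated(cells_per_ml: float, volume_ul: float) -> float:
    """Number of cells contained in `volume_ul` microliters of suspension."""
    return cells_per_ml * volume_ul / 1000.0  # 1 mL = 1,000 uL


def plating_efficiency(colonies: int, cells_seeded: float) -> float:
    """Fraction of seeded cells that gave rise to a counted colony."""
    if cells_seeded <= 0:
        raise ValueError("cells_seeded must be positive")
    return colonies / cells_seeded
```

Calling `cells_plated(30000, 10)` returns 300.0, matching the number stated in the text; a hypothetical count of 75 colonies from those 300 cells would correspond to a plating efficiency of 0.25.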
Pull-Down Assays. Detailed experimental procedures are available in SI Materials and Methods.
Datasets and Accession Numbers. The ChIP-seq and RNA-seq data discussed in this publication have been deposited in National Center for Biotechnology Information's Gene Expression Omnibus (69) and are accessible through GEO Series accession number GSE64040. OCT4 and SOX2 ChIP-seq data were from