GENETICS
Protein-DNA interaction mapping using genomic tiling path microarrays in Drosophila





*Department of Genetics and
Biostatistics Division, Department of
Epidemiology and Public Health, Yale University School of Medicine, New Haven,
CT 06520;
Netherlands Cancer Institute, 1066
CX Amsterdam, The Netherlands; and
Chromatin and
Cell Biology Laboratory, Institute of Human Genetics, Centre National de la
Recherche Scientifique, 34396 Montpellier, Cedex 5, France
Communicated by Walter J. Gehring, University of Basel, Basel, Switzerland, June 4, 2003 (received for review November 11, 2002)
| Abstract |
|---|
β-globin locus and binding sites of E2F in promoters of
genes expressed during cell cycle entry
(12-14).
The results from yeast demonstrate that intergenic arrays can be extremely
valuable for the study of transcriptional regulatory networks, and the results
from human show that, in principle, the technology can be applied to study
complex genetic loci.
Here we demonstrate the use of genomic DNA tiling path microarrays to map protein-DNA interactions at high resolution along large segments of genomic DNA from D. melanogaster. We used DNA microarrays tiled across two genomic regions: 2.9 Mbp of the Adh-cactus region on chromosome 2 and 85 kb of the 82F region on chromosome 3. These arrays allowed us to assay protein-DNA interactions in coding and noncoding genomic sequence that contains at least 220 genes (15-18). The arrays were composed of overlapping fragments with sizes of 850-920 bp each across the Adh-cactus region and 430-500 bp each across the 82F region. To map protein-DNA interactions, we used the DamID chromatin profiling technique (8, 19). This technique involves in vivo expression of a trace amount of a chromatin protein of interest fused to Escherichia coli DNA adenine methyltransferase (Dam). As a result, DNA in the target loci of the chromatin protein is preferentially methylated by the tethered Dam. Subsequently, methylated DNA fragments are purified, labeled with a fluorescent dye, and hybridized to a microarray. To correct for nonspecific binding of Dam and local differences in DNA accessibility, methylated DNA fragments of control cells transfected with Dam alone are labeled with a different fluorescent dye and cohybridized. The resulting ratio of the two dye signals reflects the extent of protein binding to the probed DNA sequence (8).
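The dye ratio described above can be illustrated with a short sketch. It is a minimal example under stated assumptions, not the analysis pipeline used in this study: the function name `log2_binding_ratio`, the pseudocount, and the intensity values are all hypothetical, and array normalization is omitted.

```python
import math

def log2_binding_ratio(fusion_signal, dam_signal, pseudocount=1.0):
    """Return log2(Dam-fusion signal / Dam-only signal) for one array spot.

    The pseudocount is an assumption of this sketch (it avoids division by
    zero); it is not part of the published protocol.
    """
    return math.log2((fusion_signal + pseudocount) / (dam_signal + pseudocount))

# Hypothetical background-subtracted intensities: (Dam-fusion, Dam alone).
spots = {
    "frag_A": (4095.0, 1023.0),
    "frag_B": (511.0, 511.0),
    "frag_C": (255.0, 1023.0),
}
ratios = {name: log2_binding_ratio(f, d) for name, (f, d) in spots.items()}

for name, value in ratios.items():
    print(f"{name}: log2 ratio = {value:+.2f}")
```

A positive log ratio marks a fragment methylated more strongly by the fusion protein than by Dam alone; a value near zero indicates background methylation only.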
We performed high-resolution binding site mapping of a sequence-specific
DNA-binding factor, GAF (20),
and the heterochromatin protein HP1
(21). Binding profiles of both
proteins have previously been determined in a study using cDNA arrays
containing
300 cDNA fragments
(8). Only binding sites in the
immediate vicinity of transcribed regions can be detected by using cDNA
arrays. However, localization of chromatin-associated proteins is often
distant from transcribed regions. Here we demonstrate that genomic tiling path
arrays can be used for comprehensive and high-resolution mapping of
chromatin-associated proteins in the Drosophila genome. We discovered
dozens of new GAF-binding sites in the 3 Mb of genomic DNA surveyed, and we
were able to initially map these sites to a few hundred base pairs in most
cases. The use of computational sequence analysis methods allowed many sites
of chromosomal association to be pinpointed to within several nucleotides.
Furthermore, ChIP analyses verified several randomly selected sites identified
through this analysis, providing validation by using an independent method for
direct mapping of GAF-DNA interactions. In addition to the
high-resolution mapping of GAF protein, we found new patterns of HP1
association with transposable elements throughout this region of the
genome.
| Materials and Methods |
|---|
DamID. The DamID procedure was performed in Drosophila Kc167 cells as described (8, 19), except that methylated DNA fragments were not obtained by DpnI digestion and subsequent sucrose gradient centrifugation, but selectively amplified by PCR.
Genomic DNA from Kc167 cells transfected with Dam or a Dam-fusion protein was isolated as described (8). In brief, 10^8 cells from one 10-cm plate were collected, pelleted, and resuspended in 1 ml of ice-cold T10E10 (10 mM Tris·HCl, pH 7.5/10 mM EDTA). One milliliter of freshly prepared TENSK buffer [100 mM NaCl/0.5% SDS/200 µl of Proteinase K (Roche Molecular Biochemicals) in T10E10] was added and mixed by inversion. After incubation for 2 h at 55°C, 2.0 ml of buffer-saturated phenol/chloroform/isoamyl alcohol was added, followed by mixing by inversion and spinning for 10 min at 3.5 krpm. The water phase was transferred to 2.0 ml of isopropanol and 0.2 ml of 3 M sodium acetate (pH 5.2), and mixed; the DNA was recovered by spooling on a yellow tip, completely dissolved in 0.3 ml of T10E10 with 2 µg of DNase-free RNase (Roche Molecular Biochemicals), and incubated at 37°C for at least 1 h. Next, 0.3 ml of TENSK was added, followed by incubation for 30 min at 55°C. A second phenol-chloroform extraction followed, after which the water phase was transferred to 0.6 ml of isopropanol and 60 µl of 3 M sodium acetate (pH 5.2). The solution was mixed by inversion and the DNA precipitate was recovered, rinsed in 70% ethanol, and dissolved in 50 µl of T10E10 by incubation at 37°C for several hours.
For selective PCR amplification of methylated DNA fragments, 40 µg of the isolated genomic DNA was digested for 16 h at 37°C with 40 units of DpnI (New England Biolabs) in the presence of 12.5 ng of DNase-free RNase A (Roche Molecular Biochemicals) in a total volume of 50 µl of buffer 4 (New England Biolabs). After inactivation of DpnI at 80°C for 20 min, 4 µg of the DpnI-digested genomic DNA was ligated to 40 pmol of a double-stranded unphosphorylated adaptor (top strand: 5'-CTAATACGACTCACTATAGGGCAGCGTGGTCGCGGCCGAGGA-3', bottom strand: 5'-TCCTCGGCCG-3') for 2 h at 16°C with 5 units of T4 ligase (Roche Molecular Biochemicals) in a total volume of 20 µl of ligation buffer. To prevent amplification of DNA fragments containing unmethylated GATCs, 1 µg of the adaptor-ligated DNA was cut with 2 units of DpnII (New England Biolabs) for 1 h at 37°C in a total volume of 20 µl of DpnII buffer. Next, amplification was performed by using 0.5 µg of DpnII-cut DNA, 1 µl of Advantage cDNA PCR polymerase mix (CLONTECH), 10 nmol of each dATP, dCTP, dGTP, and dTTP, and 62.5 pmol of primer (5'-GGTCGCGGCCGAGGATC-3') in 50 µl total volume of Advantage PCR buffer, under the following cycling conditions: activation of the polymerase and nick translation for 10 min at 68°C, followed by one cycle of 1 min at 94°C, 5 min at 65°C, and 15 min at 68°C; 3 cycles of 1 min at 94°C, 1 min at 65°C, and 10 min at 68°C; and 14 cycles of 1 min at 94°C, 1 min at 65°C, and 2 min at 68°C. The PCR products were purified by using the QIAquick PCR purification kit (Qiagen) and labeled with Cy3 or Cy5 as described (8).
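The adaptor, primer, and cycling program given above can be checked for internal consistency with a few lines of code. The sketch below uses only the sequences and times stated in the text; the variable names are hypothetical, and the reading that the primer's final TC corresponds to the half-site left at each DpnI-cut fragment end is our interpretation rather than a statement from the protocol.

```python
TOP = "CTAATACGACTCACTATAGGGCAGCGTGGTCGCGGCCGAGGA"   # adaptor top strand, 5'->3'
BOTTOM = "TCCTCGGCCG"                                  # adaptor bottom strand, 5'->3'
PRIMER = "GGTCGCGGCCGAGGATC"                           # amplification primer, 5'->3'

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))

# The short bottom strand should pair with the 3' end of the top strand.
bottom_pairs_with_top_end = TOP.endswith(reverse_complement(BOTTOM))

# The primer should equal the last 15 nt of the top strand plus "TC",
# so that primer plus fragment end spans a reconstituted GATC.
primer_matches_adaptor = PRIMER == TOP[-15:] + "TC"
primer_ends_in_gatc = PRIMER.endswith("GATC")

# Cycling program as (repeats, step durations in minutes); the first entry
# is the initial 68 degree step, which is not counted as a cycle.
PROGRAM = [(1, (10,)), (1, (1, 5, 15)), (3, (1, 1, 10)), (14, (1, 1, 2))]
total_cycles = sum(repeats for repeats, _ in PROGRAM[1:])
total_minutes = sum(repeats * sum(steps) for repeats, steps in PROGRAM)

print(bottom_pairs_with_top_end, primer_matches_adaptor, primer_ends_in_gatc)
print(total_cycles, "cycles,", total_minutes, "min of programmed hold time")
```

The hold-time total excludes ramping between temperatures, so the actual run time on a thermocycler will be longer.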
Finally, labeled experimental (Dam-protein fusion) and reference (Dam) DNA samples were mixed and hybridized to microarrays in 3× SSC (450 mM sodium chloride/45 mM sodium citrate, pH 7.0) supplemented with 0.22% SDS, 20 µg of poly(dA-dT), 100 µg of yeast tRNA, and 25 µg of unlabeled DpnI-digested plasmid encoding the fusion protein used for transfection. After a 15-min incubation at 42°C, hybridization was performed at 63°C for 16 h, followed by sequential washing at room temperature in 1.14× SSC plus 0.0285% SDS, 1.14× SSC, 0.228× SSC, and 0.057× SSC. Immediately after washing, arrays were spun dry at 1,000 × g for 5 min in a table-top centrifuge.
Motif Analysis. Consensus binding motifs were inferred from the complete set of binding log-ratios by using three different algorithms: the motif-based linear regression method REDUCE, which exploits the correlation between the occurrence of sequence motifs near exons of genes and the expression of those exons (23); the method proposed by Keles et al. (24), which is conceptually similar to REDUCE but uses a different motif selection scheme; and the MDscan method, which uses a modified Gibbs sampling strategy to search for common patterns in the segments with high binding ratios (25).
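The regression idea behind the first two methods can be sketched in a few lines. This is a deliberately simplified stand-in, not the REDUCE or Keles et al. implementation: it scores each candidate motif by the squared correlation between its per-fragment count (both strands) and the binding log-ratio, and picks the best one. The fragment sequences, log-ratios, and candidate list are hypothetical.

```python
def reverse_complement(seq):
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))

def count_occurrences(seq, motif):
    """Count overlapping occurrences of motif in seq."""
    width = len(motif)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == motif)

def count_both_strands(seq, motif):
    """Count a motif on the given strand and, if distinct, its reverse complement."""
    total = count_occurrences(seq, motif)
    rc = reverse_complement(motif)
    if rc != motif:
        total += count_occurrences(seq, rc)
    return total

def r_squared(x, y):
    """Squared Pearson correlation; 0.0 if either variable is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)

# Hypothetical tiled fragments and their binding log-ratios.
fragments = ["GAGAGTTGAGAG", "ACGTACGTACGT", "TTCTCTCTTTTT", "CCCCAAAATTTT"]
log_ratios = [2.0, 0.1, 1.1, -0.1]
candidates = ["GAGAG", "ACGTA", "AAAAT"]

scores = {
    motif: r_squared([count_both_strands(f, motif) for f in fragments], log_ratios)
    for motif in candidates
}
best_motif = max(scores, key=scores.get)
print(best_motif, round(scores[best_motif], 3))
```

The published methods go further, for example by selecting several motifs and assessing significance, which this sketch does not attempt.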
ChIP of GAF Binding Fragments. ChIP was performed by using
formaldehyde cross-linking, and by using anti-GAF antibody with chromatin
extracts of both Kc cells and Drosophila embryos as described
(26). Primers were designed to
amplify five GAF-binding fragments identified with DamID and seven fragments
that did not show any GAF binding in the DamID experiments. All selected
fragments contained GAGAG sites, regardless of whether they were
positive for GAF binding in the DamID assay. PCR products were run on an
agarose gel (1.4%) and transferred to a nylon membrane for Southern blot
analysis. Blots were hybridized either with a probe made from a mock
immunoprecipitation (IP) sample or with a probe from GAF ChIP. Each hybridized
membrane was then subjected to a 24-h exposure in a phosphorimager cassette,
and results were quantified as presented in
Table 2. Tested fragments were
scored as "ChIP positive" if the ratio of mock IP to GAF IP was
2.0 in ChIP with embryo chromatin extracts. Our positive controls (Fab7,
Mcp, and bxd from the Bithorax complex regulatory region) were enriched,
although the enrichment value in Kc167 cells is not as high as usually found
in embryos (26). We therefore
lowered the criterion for enrichment in GAF ChIP for Kc cells to 1.5-fold. ChIP
experiments were performed in duplicate.
| Results |
|---|
To begin, we consider the characteristic patterns of microarray data expected when these tiling path microarrays are used with the DamID technique, which compares genomic methylation patterns in the presence of a Dam-fusion protein to background methylation from expression of Dam alone (19). In the simplest case, the association of a Dam-fusion protein would occur at a single point along the chromosome. At that point, the signal ratio from a DamID experiment (Dam-fusion protein/Dam alone) would be high. One expects that targeted methylation levels of DNA in either direction from that point will decrease progressively with distance, with a concomitant decrease in the signal ratio. The quantitative result from the microarray experiment will accordingly be represented as a curve with its maximum over the point (Fig. 1c). This curve would be expected to be monotonic if GATC sequences targeted by Dam are randomly distributed around the focal point of the DNA-protein interaction. Multiple binding sites in a region may produce bimodal or other complex distributions (Fig. 1d). It is also important to consider that for protein-DNA binding assays using microarrays, repetitive DNA associated with the assayed protein in one part of the genome may cross-hybridize with DNA from another region printed on the microarray. At the place where sequence identity is lost between the DNA tiling path elements and the cross-hybridizing sequences from remote genomic location(s), signal intensity would be expected to drop off sharply and no curve is expected outside the cross-hybridizing sequences, whereas inside the cross-hybridizing sequences, one expects either no curve at all (Fig. 1e) or a curve that reflects real binding to the remote sequences. We determined the actual distributions of signal intensities by using two Dam-fusion proteins, GAF-Dam and Dam-HP1. We refer to these data as GAF or HP1 binding profiles.
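The expected profile shapes can be made concrete with a small classifier over the log-ratios of adjacent tiled fragments. This is an illustrative sketch only: the category names, the thresholds `min_height` and `cliff_fraction`, and the example profiles are hypothetical and were not used in this study. A "single peak" here corresponds to what we read as the monotonic curve of the text, and a "half profile" to a curve that drops off sharply on one side.

```python
def local_maxima(profile, min_height):
    """Return indices of values that exceed both neighbours and min_height."""
    peaks = []
    for i, value in enumerate(profile):
        left = profile[i - 1] if i > 0 else float("-inf")
        right = profile[i + 1] if i < len(profile) - 1 else float("-inf")
        if value >= min_height and value > left and value > right:
            peaks.append(i)
    return peaks

def classify_profile(profile, min_height=1.0, cliff_fraction=0.75):
    """Label a run of adjacent-fragment log-ratios by its overall shape."""
    peaks = local_maxima(profile, min_height)
    if not peaks:
        return "no peak"
    if len(peaks) > 1:
        return "multiple peaks"
    i = peaks[0]
    peak = profile[i]
    left_drop = peak - profile[i - 1] if i > 0 else 0.0
    right_drop = peak - profile[i + 1] if i < len(profile) - 1 else 0.0
    sharp_left = left_drop >= cliff_fraction * peak
    sharp_right = right_drop >= cliff_fraction * peak
    # A cliff on exactly one side suggests a truncated (half) profile.
    if sharp_left != sharp_right:
        return "half profile"
    return "single peak"

# Hypothetical log2 ratios along consecutive tiled fragments.
examples = {
    "single": [0.1, 0.5, 1.2, 2.0, 1.3, 0.4, 0.0],
    "bimodal": [0.2, 1.5, 0.6, 1.8, 0.3],
    "half": [0.0, 0.1, 2.0, 1.4, 0.8, 0.2],
    "flat": [0.1, 0.2, 0.1],
}
labels = {name: classify_profile(values) for name, values in examples.items()}
print(labels)
```

Flat-topped plateaus are not reported as peaks by this strict comparison, so real data would need smoothing or a tie-breaking rule.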
GAF Binding Profiles. We used a local linear weighted regression method that is more sensitive than a standard t test to identify 169 genomic DNA fragments with significantly elevated GAF-Dam/Dam methylation ratios (see Supporting Methods and ref. 27). These fragments congregated into 46 chromosomal areas (groups of adjacent fragments) (Table 4, which is published as supporting information on the PNAS web site). Because the affinity of GAF binding may be reflected in the microarray measurements, we imposed an additional criterion of a threshold cut-off to divide the 169 significant fragments into a set that shows a >2-fold differential ("high binding ratio") and a set that does not ("low binding ratio") (all ratios >2 also were significant by using a standard t test with P < 0.025; see Supporting Methods). We found 54 fragments in 23 areas in the 2.9-Mb Adh-cactus region, and 10 fragments in three areas in the 82F region (26 areas total) that showed high GAF binding ratios (Table 4, Fig. 2). Most of the 26 areas display a GAF-binding profile consistent with direct associations between chromosomal DNA fragments and GAF. Among the 23 areas with high GAF binding ratios in the Adh-cactus region, 15 display monotonic binding profiles (Fig. 2c). Of the remaining eight areas, four appear to contain multiple GAF binding sites because they display either bimodal (three areas) or multipeak profiles (one area) (Fig. 2d); the other four exhibit profiles that appear as half of a monotonic curve with signal precipitously dropping off. In two of these latter four, we observed two half monotonic binding profiles arranged as near mirror images of one another (Fig. 2e).
We interpret these profiles as either direct binding of GAF to the ends of transposons elsewhere in the genome, or GAF binding near the ends of transposons elsewhere in the genome, because the sharp decrease of binding outside the ends of the transposon is what one would expect in the case of cross-hybridization of the entire transposon. The average number of GATC sites in these cases does not differ on either side of the GAF binding site, so this mirror image "half-site" profile cannot be due to scarcity of methylation targets on only one side of binding sites. As in the Adh-cactus region, the binding profiles in the 82F region also fall into three categories: half monotonic curve, monotonic curve, or multipeak patterns (Fig. 2b). The 20 areas of low GAF binding displayed a range of binding profiles similar to that of the 26 areas of high GAF binding (Table 4).
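Grouping significant fragments into areas and splitting them at the 2-fold cut-off can be sketched as follows. The function names and the example values are hypothetical, and labeling an area "high" when its strongest fragment exceeds the threshold is an assumption of this sketch; the statistical test that selects significant fragments is not reproduced here.

```python
def group_into_areas(significant_indices):
    """Group tiling-path indices into runs of adjacent fragments."""
    areas = []
    for idx in sorted(set(significant_indices)):
        if areas and idx == areas[-1][-1] + 1:
            areas[-1].append(idx)   # extends the current run
        else:
            areas.append([idx])     # starts a new area
    return areas

def label_area(area, fold_ratio, threshold=2.0):
    """Label an area by its strongest fragment (assumed rule, see lead-in)."""
    peak = max(fold_ratio[i] for i in area)
    return "high" if peak > threshold else "low"

# Hypothetical fold ratios (fusion/Dam) for fragments that passed the
# significance test, keyed by position along the tiling path.
fold_ratio = {3: 1.6, 4: 3.2, 5: 1.8, 10: 1.4, 17: 2.0, 18: 1.9}

areas = group_into_areas(fold_ratio.keys())
labels = [label_area(area, fold_ratio) for area in areas]
print(list(zip(areas, labels)))
```

Note that a ratio of exactly 2.0 is labeled "low" here because the cut-off in the text is a strict >2-fold differential.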
Most of the areas in the Adh-cactus region associated with high GAF binding ratios are within the vicinity of sequences that contain annotated genes, with 15 that are <3 kb from the nearest start codon, and 18 that are within 10 kb of the nearest start codon (Table 4). Although high GAF binding ratios were commonly associated with putative regulatory sequences 5' or 3' of transcription units (nine instances), 5 of the 23 GAF binding sites are contained within 5' or 3' UTRs and 9 occurred within introns (Table 4). None occurred within coding regions. There was a single instance where no annotation features were identified in a 10-kb vicinity of GAF binding (the closest gene was >25 kb away). This may be caused by regulatory sequences acting from a distance, it may be caused by functionally irrelevant GAF binding, or it may be caused by the existence of genes not yet annotated. Considering the Adh-cactus and 82F regions as representative samples from the genome, and extrapolating from these results, we expect that there are likely >1,000 sites with high GAF binding genome-wide, and >750 more sites with low but detectable GAF binding by using the DamID assay.
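The genome-wide extrapolation is a simple proportion, sketched below. The surveyed length (2.9 Mb plus 85 kb) and the area counts (26 high, 20 low) come from the text; the genome size of 120 Mb is an assumption introduced here for illustration only and is not stated in this article.

```python
SURVEYED_MB = 2.9 + 0.085       # Adh-cactus region plus 82F region
ASSUMED_GENOME_MB = 120.0       # ASSUMPTION: approximate euchromatic genome size
HIGH_AREAS = 26                 # areas with high GAF binding ratios
LOW_AREAS = 20                  # areas with low but significant GAF binding

def extrapolate(count, surveyed_mb=SURVEYED_MB, genome_mb=ASSUMED_GENOME_MB):
    """Scale a count from the surveyed regions to the whole genome."""
    return count * genome_mb / surveyed_mb

est_high = extrapolate(HIGH_AREAS)
est_low = extrapolate(LOW_AREAS)
print(f"high: ~{est_high:.0f} sites, low: ~{est_low:.0f} sites")
```

Under that assumed genome size the proportion gives roughly 1,045 high and 804 low sites, which is consistent with the >1,000 and >750 figures quoted above; a different genome size would shift both estimates proportionally.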
GAF Binding Motif Analyses. In vitro, GAF binds to the sequence GAGAG (28). By using three independent motif-finding methods, each of which takes genomic sequence data and GAF binding data from the tiling path microarrays as input, we recovered the expected GAGAG consensus binding motif for GAF. Table 1 shows the results of analyses based on Regulatory Element Detection Using Correlation of Expression (REDUCE) (23), the Keles et al. method (24), and the Motif Discovery scan (MDscan) (25). The first two methods are similar; they were developed to perform motif selection based on a least-squares fit of a linear predictive model for expression log-ratios, but can be used without modification to analyze binding log-ratios. The MDscan method compares how often a motif occurs in the top-ranking sequences, ranked by binding ratio, with how often it occurs in the background sequences. The success of all three of these algorithms in identifying the correct binding site indicates that DNA tiling path microarrays combined with DamID mapping of binding sites will provide a robust source of data for cis-regulatory motif-finding algorithms.
Scanning of genome sequence revealed that GAGAG/CTCTC motifs were contained in almost all DNA fragments showing peak levels of signal in the 46 areas we identified, but also in 2,115 DNA fragments without appreciable binding signal. All of the areas with high levels of binding contained at least one GAGAG/CTCTC site in the DNA fragments that showed peak signal on the microarrays, allowing the precise coordinates of GAF binding to be predicted. The average number of such sites in DNA fragments with peak signal was 4.3, whereas the median was 3 sites. In the 20 areas with low levels of binding, often more than one adjacent fragment showed indistinguishable levels of peak signal. The average number of GAGAG/CTCTC sites in these DNA fragments was 2.2, whereas the median was 1 site. Thus, we find an overall correlation between signal strength and the number of potential binding sites for GAF. For one case, no GAGAG/CTCTC sites were identified even though the binding patterns observed were monotonic and in nonrepetitive DNA sequence. This case could be caused by weak binding site(s) that do not match the exact consensus, or perhaps this is a false positive.
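Locating GAGAG/CTCTC sites within a peak fragment, as described above, amounts to a scan of the fragment sequence on both strands. The sketch below is a minimal illustration; the function name `find_gaf_sites` and the example fragment are hypothetical. Offsets are 0-based positions within the fragment, so a genomic coordinate would be the fragment start plus the offset.

```python
def find_gaf_sites(seq, motif="GAGAG", motif_rc="CTCTC"):
    """Return (offset, strand) for every overlapping GAGAG or CTCTC match."""
    seq = seq.upper()
    width = len(motif)
    sites = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if window == motif:
            sites.append((i, "+"))
        elif window == motif_rc:
            sites.append((i, "-"))
    return sites

# Hypothetical fragment sequence, not taken from the arrays in this study.
fragment = "ATGAGAGAGTTCTCTCAA"
sites = find_gaf_sites(fragment)
print(len(sites), "sites:", sites)
```

Overlapping matches are counted separately here (GAGAGAG yields two sites); whether the site counts reported in the text treat overlaps this way is not stated.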
Independent Verification of GAF Binding Sites Using ChIP. We verified several candidate GAF binding sites by using ChIP with both Kc167 cell and embryonic chromatin extracts (26). We tested five fragments shown by DamID to bind GAF and seven fragments that were not positive in the GAF DamID assay. Among the five fragments that were positive for Dam-GAF binding, four were from the high-level GAF-binding fragment list and one was from the low-level GAF-binding fragment list. All of the fragments tested, both those positive and those negative for GAF binding in the DamID assay, contained at least one copy of the GAF-binding motif (GAGAG/CTCTC) (28).
All five of the GAF DamID-positive DNA fragments also were positive for binding in the GAF ChIP assays from both Kc167 cells and embryos (Table 2). No difference between DamID- and ChIP-positive GAF binding sites was noted in Kc167 cells, and only one of the seven DamID-negative fragments was ChIP-positive in the embryonic chromatin extracts. These results indicate that the ChIP and DamID assays both accurately reflect bona fide GAF binding sites in vivo. Although the correspondence between the DamID and ChIP assays was striking, there was only a moderate correlation between the quantitative values of the DamID and ChIP data for Kc167 cells (0.54), and thus the quantitative results from the two techniques are complementary. Finally, these results also indicate that GAF distribution in embryos and in embryonically derived Kc167 cells is largely overlapping, but qualitatively and perhaps quantitatively different.
HP1 Binding Profile. We identified 17 areas in the 2.9-Mb Adh-cactus region, and one area in the 150-kb 82F region, that were associated with significant Dam-HP1:Dam ratios (Fig. 3 a and b and Table 5, which is published as supporting information on the PNAS web site). Fifteen of the seventeen areas yielded high HP1 binding ratios. All but one of these areas contain transposons or other repeat elements (Table 5), in agreement with previous studies showing that signal from HP1 fusion protein experiments is associated with transposable elements (8). To distinguish between cross-hybridization and direct association of HP1, we examined the local pattern of signal intensities for each area that contained repetitive DNA or transposable elements. Only 6 of the 17 areas in the Adh-cactus region showed distributions that strongly indicated direct association of HP1, with four showing monotonic, and four showing bimodal or complex patterns of signal distribution (Fig. 3c). The single area of HP1 binding in the 82F region also displayed a multimodal signal distribution, indicating multiple binding sites. All other areas we identified showed profiles that indicated cross-hybridization (Fig. 3d). Thus, the use of tiling path microarrays allowed us in at least some cases to distinguish between bona fide association of HP1 with chromosomes and cross-hybridization. Microarrays with more sparsely spaced DNA fragments do not allow this distinction to be made.
We found patterns indicating real HP1 binding within the coding sequence of only one gene, crinkled (ck), which encodes a non-muscle myosin involved in bristle formation (Fig. 3c) (15, 29). Whether ck has any role in Kc167 cells is unknown, but based on microarray analyses, ck is expressed at moderate levels in these cells (L.V.S. and K.P.W., unpublished results). Thus, HP1 binding within ck does not prevent its expression. Interestingly, HP1 appears to bind to the transcribed region of the ck locus rather than to the promoter region. No repetitive elements are present in the region of strong HP1 binding in ck (the nearest repeat element/transposon is >30 kb away), indicating that HP1 is recruited to the ck gene through a mechanism distinct from its strong association with repetitive DNA.
Finally, we compared HP1 and GAF binding sites and found that they rarely overlapped (Fig. 4a, which is published as supporting information on the PNAS web site). In only one case did we observe GAF and HP1 binding profiles very near one another (Fig. 4b). These results indicate that, on a local level, GAF and HP1 binding sites are largely independent of one another in the Adh-cactus region.
Summary. Our results demonstrate the feasibility of protein-DNA interaction mapping with tiling path DNA microarrays that cover large tracts of a complex genome. We found that data from genomic tiling path arrays allowed the sites of chromosomal association to be readily discerned for both a site-specific transcription factor and a general heterochromatin-associated protein. Because all of the DamID-identified GAF-binding fragments that we tested were verified with ChIP, either approach is capable of yielding accurate and high-resolution binding site mapping for chromatin-associated proteins. ChIP can be complementary to DamID, and when suitable antibodies are available for a DNA-associated protein, ChIP can be used either with candidate targets or with microarrays to cross-validate binding sites. Additional studies will be required to determine the biological relevance of the dozens of GAF and HP1 binding sites we observed. Nevertheless, these results indicate that genomic tiling path microarrays will be valuable for mapping the binding sites of a wide range of regulatory proteins in Drosophila. These methods should be equally applicable to mapping DNA-protein interactions in cells isolated from animals, and will aid in the comprehensive delineation of genome-wide regulatory networks that control gene expression and development.
| Footnotes |
|---|
¶ To whom correspondence may be addressed. E-mail: b.v.steensel{at}nki.nl or kevin.white{at}yale.edu.