Branch migration displacement assay with automated heuristic analysis for discrete DNA length measurement using DNA microarrays
- Nader Pourmand*,†,
- Stefano Caramuta*,
- Andrea Villablanca*,
- Silvia Mori*,
- Miloslav Karhanek*,
- Shan X. Wang‡, and
- Ronald W. Davis*,†
- *Stanford Genome Technology Center, 855 California Avenue, Palo Alto, CA 94304; and
- ‡Department of Materials Science and Engineering and Department of Electrical Engineering, Stanford University, Stanford, CA 94305-4045
Contributed by Ronald W. Davis, February 5, 2007 (received for review January 6, 2007)
Abstract
The analysis of short tandem repeats (STRs) plays an important role in forensic science, human identification, genetic mapping, and disease diagnostics. Traditional STR analysis utilizes gel- or column-based approaches to analyze DNA repeats. Individual STR alleles are separated and distinguished according to fragment length; thus the assay is generally hampered by its low multiplex capacity. In contrast, a DNA microarray requires only simple hybridization and detection steps, which suits it to field forensics and biology. Here we demonstrate a rapid, highly sensitive method for STR analysis that utilizes DNA microarray technology. We describe two adaptations to accomplish this: the use of competitive hybridization to remove unpaired ssDNA from an array and the use of neural network classification to automate the analysis. The competitive displacement technique mimics the branch migration process that occurs during DNA recombination. Our technique will facilitate the rapid deduction of identity, length, and number of repeats for the multiple STRs in an unknown DNA sample.
DNA short tandem repeats (STRs) occur in many locations throughout the human genome and most other genomes. Microsatellites are STRs with three to seven base pair repeats; tens of thousands of microsatellites exist in the human genome, and the exact repeat length at any given locus varies among individuals (1–3). Microsatellite genotyping has been used for linkage analyses in many Mendelian diseases (4) to screen for markers with a distinct phenotype. DNA fingerprinting (or DNA profiling) using STRs has become the method of choice in forensic science, allowing the identification of an individual by examining that individual's unique pattern of STRs (1). STRs used in forensics typically contain 3–15 repeats of a short sequence. The Federal Bureau of Investigation uses 13 separate STR loci for forensic analysis (3); if two samples carry the same alleles at all 13 STR loci, the courts accept this as evidence that the samples come from the same individual. In the future, additional loci are likely to be added to routine forensic analysis. Forensic DNA fingerprinting utilizes a miniature system of electrophoretic columns to analyze the length of each of the 13 STR loci (5).
DNA microarray technology has emerged as a powerful tool to analyze vast amounts of genetic information simultaneously (6, 7). DNA microarrays are routinely used to measure RNA levels in experimental samples, for DNA polymorphism analysis in diseases, and to identify targets of specific DNA-binding proteins (8–11). The ability to analyze thousands of loci at once on a microarray has paved the way for use of microarrays in disease diagnosis and investigation. However, microarray usage in forensic analysis has proved difficult because STRs that are similar in size have very similar hybridization efficiencies, making it difficult to distinguish different STR lengths on an array. In addition, in multiplex microarray-based STR analysis, different STRs with identical repeat sequences would cause cross-hybridization. A high-stringency method utilizing electrically active DNA microarrays to analyze STRs has been described (12). Kemp et al. (13) described a method using standard, readily available microarray technology to determine STR length. They described proof-of-principle experiments with several innovations. First, probe sequences of all possible repeat lengths were attached to an array (a “variable-length probe array”). Second, a clamp DNA sequence adjacent to the STR was used to ensure that the unknown target sequence hybridized to the known probes in the proper register. Third, enzymatic digestion was used to remove unhybridized, overhanging ssDNA. However, enzymatic digestion may be difficult to optimize in a commercial device. Here, we devised an assay, the branch migration displacement assay, while making use of the clamp sequence to hybridize biotin-labeled targets of unknown length to an array carrying probes of all possible lengths in the proper register. 
After the first hybridization, we used two additional hybridization steps to remove unhybridized, overhanging ssDNA, so that only targets identical in length to the probe on the array would remain hybridized. This competitive hybridization is somewhat similar to branch migration during DNA recombination (14), although it involves three DNA strands and not four. Three-stranded DNA branch migration has been studied previously (15–17). In solution, three-stranded branch migration at 37°C has been reported to occur at a rate of one nucleotide per 12 microseconds (18). We also describe the use of multilayer neural network (19) software to automate STR identification. We demonstrate the feasibility of this assay by analyzing two STRs in 20 human DNA samples with known STR patterns. Our method utilizes widely available microarray technology and could easily be adapted to allow rapid determination of individual identity or for genetic mapping.
Results
Strategy to Determine STR Length.
Our general approach (depicted in Fig. 1) is as follows. First, oligonucleotide probes with STRs of all possible lengths are attached via their 5′ ends to an array. Probes of different length are present in different spots on the array. Note that these oligonucleotides contain the clamp sequence at their 5′ end. An example array schematic with probes of two through six repeats is diagrammed in Fig. 1 A. Second, an unknown target DNA is 5′-end-labeled with biotin and then hybridized to the array. Note that the clamp sequence ensures hybridization in the proper register. A hypothetical target with four repeats is diagrammed in Fig. 1 B. Next, a second round of hybridization using an oligonucleotide complementary to the longest hypothetical target is used to remove target DNA that is longer than the probe DNA on that spot (Fig. 1 C and D). The long targets will preferentially hybridize to this oligonucleotide, which releases them from the array by branch migration and displacement. This removes the biotin label from the array at any spot where the target is longer than the probe (Fig. 1 D). In the next step, a third round of hybridization using an oligonucleotide complementary to the longest hypothetical probe is used to remove targets that are shorter than individual probes. The long oligonucleotide will preferentially bind to those probe sequences, releasing the biotin-labeled target (Fig. 1 E and F). This removes the target and its attached biotin label from any spot where the target is shorter than the probe. Finally, streptavidin–fluorophore particles are added to the array, and the array is then scanned by using a standard microarray scanner. The biotin-streptavidin–fluorophore signal will remain only on the spots where the number of repeats in the unknown target is equal to the number of repeats on the arrayed probe (Fig. 1 F).
Fig. 1. Principle of branch migration assay for STR typing. (A) Probes of all possible repeat length (two to six in this example) are coupled to a microarray surface at their 5′ end. (B) An unknown biotin-labeled target (four repeats in this example) is then hybridized to the array. (C and D) First competitive hybridization to remove labeled targets that are longer than a particular probe spot. (E and F) Second competitive hybridization to remove labeled targets that are shorter than a particular probe spot. The biotin signal remains only on the probe spots where the target length is equivalent to the probe.
At each feature on the microarray, there are three possibilities. The unknown target could be shorter than, the same length as, or longer than the probe attached to the microarray. If the target is longer than a particular probe, it will anneal during the first hybridization step (Fig. 1 B) but will subsequently be removed in the second hybridization (Fig. 1 D). If the target is shorter than a particular probe, it will anneal to the probe in the first hybridization (Fig. 1 B) but will be removed in the third hybridization (Fig. 1 F). Only when the target and probe are the same length will the biotin-tagged DNA remain on the array after both hybridization steps. The biotin tag is assayed with a fluorescently labeled streptavidin particle. We infer that the number of repeats in a particular STR target DNA is equivalent to the number of repeats in the probe with maximum fluorescence.
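The three cases above reduce to a simple rule for an idealized assay. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes complete displacement with no residual signal on mismatched spots, which real arrays do not achieve.

```python
def spot_retains_label(target_repeats: int, probe_repeats: int) -> bool:
    """Idealized outcome at one probe spot after both competitive hybridizations."""
    if target_repeats > probe_repeats:
        # Overhanging target is captured by the first competitor and displaced.
        return False
    if target_repeats < probe_repeats:
        # Exposed probe is bound by the second competitor, displacing the target.
        return False
    # Equal length: neither competitor has a single-stranded foothold.
    return True


def ideal_pattern(alleles, max_repeats=22):
    """Return a 0/1 list, one entry per probe spot (1 to max_repeats repeats)."""
    return [
        1 if any(spot_retains_label(a, p) for a in alleles) else 0
        for p in range(1, max_repeats + 1)
    ]


# A single four-repeat target on a six-spot array (probes with 1 to 6 repeats).
print(ideal_pattern([4], max_repeats=6))  # [0, 0, 0, 1, 0, 0]
```

A heterozygous sample is modeled by passing two allele lengths, for example `ideal_pattern([9, 12])`, which marks two spots.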
Demonstration of the Feasibility of the Branch Migration Displacement Assay.
To demonstrate the feasibility of this approach, we tested our assay by generating a microarray with probes of all possible lengths for two human STR loci, D7 and D16. We then applied 20 commercially available human DNA samples to the array, carried out the procedure, and determined the STR profile to evaluate whether the predicted profile matched the known STR profile for these DNAs.
We designed probe oligonucleotides (Table 1) that contained (from 5′ to 3′) the chemistry necessary for coupling to the microarray, a clamp sequence that flanks the human STRs of interest, and 1–22 repeats of a 4-mer corresponding to human STR loci D7 and D16. All probes were spotted as individual features on an array as described in Methods.
Table 1. Oligonucleotides used for microarray preparation and branch migration displacement assay
Using PCR, we generated target DNA with a known STR profile from commercially available DNAs (Table 2). These targets were amplified by using oligonucleotides containing a clamp DNA sequence and a 5′ biotin label on one oligonucleotide (Table 3). These “test” targets were hybridized to the microarray by conventional means. The array was subsequently treated with two additional rounds of hybridization by using synthetic oligonucleotides with 22 STRs (Table 1). This removed the biotin-tagged targets from probe spots that were unequal in STR length. Finally, the array was treated with a streptavidin-coupled fluorophore, which binds to the biotin label, and fluorescence on the array was quantitated.
Table 2. Commercially available DNA samples with known STR patterns that were used in the preparation of biotinylated targets for the branch migration displacement assay
Table 3. Oligonucleotides used to amplify the two STRs of interest and control oligonucleotides used to test the array
In control experiments in which an internal control oligonucleotide was hybridized to the array, the fluorescence intensity was similar on all probe spots (Fig. 2 A). When test target DNA was hybridized to the array without the subsequent two additional rounds of hybridization, the fluorescence on all 22 probe spots was similar in intensity (Fig. 2 B). However, after the two additional hybridization steps were carried out, the fluorescent signal from the features where the probe and test target differed in length was significantly weaker than the signal from the features where the two lengths matched (compare Fig. 2 C and Table 2). Thus, the number of repeats in the target could be inferred from the known identities of the probes attached to the features with the highest fluorescent signal. In cases where individuals were heterozygous for the STR loci, fluorescence peaked on two probes that were identical in length to the known STR profile (compare Fig. 2 C and Table 2). In cases where individuals were homozygous, fluorescence peaked on one probe (see “Pattern 1” in Fig. 3). The intensities of adjacent probes (±1 repeat) were consistently higher than those of probes that differed more in repeat length from the target, but considerably below that of the perfect-length probe. In addition, the background on probes longer than the perfect match was higher, possibly because of the quality of the probes as well as the branching probe.
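For samples with well-separated peaks, the inference described above can be sketched as a simple peak search. This is a hedged illustration, not the authors' rule-based algorithm: the function name, the strict local-maximum test, and the `second_peak_fraction` threshold of 0.6 are all assumptions chosen for the example. By construction it cannot call heterozygotes whose alleles differ by one repeat, which is the case the paper assigns to the neural network.

```python
def call_alleles(intensities, second_peak_fraction=0.6):
    """Call repeat numbers from per-spot intensities.

    intensities[i] is the mean fluorescence of the probe with i + 1 repeats.
    Returns a sorted list with one (homozygous) or two (heterozygous) calls.
    """
    n = len(intensities)
    top = max(range(n), key=lambda i: intensities[i])

    # Collect strict local maxima other than the main peak.
    candidates = []
    for i in range(n):
        if i == top:
            continue
        left = intensities[i - 1] if i > 0 else float("-inf")
        right = intensities[i + 1] if i < n - 1 else float("-inf")
        if intensities[i] > left and intensities[i] > right:
            candidates.append(i)

    calls = [top + 1]
    if candidates:
        second = max(candidates, key=lambda i: intensities[i])
        # Accept a second allele only if it is comparable to the main peak.
        if intensities[second] >= second_peak_fraction * intensities[top]:
            calls.append(second + 1)
    return sorted(calls)


# Synthetic example: flat background with peaks on the 9- and 12-repeat probes.
signal = [100.0] * 22
signal[8], signal[7], signal[9] = 1000.0, 300.0, 300.0
signal[11], signal[10], signal[12] = 900.0, 280.0, 280.0
print(call_alleles(signal))  # [9, 12]
```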
Fig. 2. Results of branch migration assay for STR typing. Upper depicts the fluorescent scan of the STR array, and Lower depicts the quantitated fluorescence intensity at each probe spot. (A) Before hybridization: internal control oligonucleotide hybridized to the array. (B) After hybridization: test probe hybridized to the array. (C) After first and second branches: the two subsequent rounds of competitive hybridization.
Fig. 3. Flow diagram of sample analysis by ANN. Pattern 1 (homozygous) and pattern 2 (heterozygous) samples represent cases with fluorescence intensities that are somewhat similar, and therefore ANN is used for identifying the type of the sample. Pattern 3 (heterozygote with clearly identifiable fluorescent peaks) can be identified by a simple mathematical algorithm but can also be identified by ANN.
Design of Multilayer Neural Network and Development of Software to Automate STR Array Analysis.
To automate the microarray analysis and extraction of repeat length from raw data, we designed a multilayer artificial neural network (ANN) and developed software to analyze the raw array fluorescence data and extract the specific number of repeats in the test target sequence. The main task of the ANN software is to differentiate between homozygous samples with one fluorescence peak and heterozygous samples with two peaks. This problem is particularly challenging for heterozygotes that differ very little in repeat length. These two overlapping patterns are difficult to classify by a simple set of algebraic or logistic rules, but the use of neural network heuristics helped us to overcome this difficulty. For this purpose we could also use other heuristic methods based, for example, on statistical learning theory, such as support vector machines. However, the training and optimization tools provided with our neural network package were convincing factors to use it in final processing. We developed ANN software using BrainMaker Professional 3.52 software (California Scientific Software, Berkeley, CA); our neural network contains two main components: (i) an ANN module with customized feed-forward run-time code including an STR-trained network and (ii) a mathematical rule-based algorithm to find the peak location corresponding to repeat length (Fig. 3).
The input to our ANN software was the fluorescence intensities associated with each feature on the array (Fig. 3). These intensities were processed by a trained network of connections with learned weights and converted by a transfer function to specified output values. Output values, in this case, were statistical weight sums identifying homozygous or heterozygous samples. The larger the output sum for either type, the higher the confidence in the identification of the sample type. Output values were scaled by ANN design as decimal values from 0 to 1. These values were considered as likelihood values of ANN recognition for identifying homozygous or heterozygous samples (where the sum of their output values is close to 1) (19).
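The feed-forward step described above can be sketched in a few lines of Python, using the layer sizes reported in Methods (22 inputs, hidden layers of 18 and 13 neurons, 2 outputs). Everything else here is an assumption made for illustration: the weights are random rather than trained, a logistic sigmoid stands in for the unspecified transfer function, inputs are scaled by the maximum intensity, and the helper names are hypothetical. It does not reproduce the BrainMaker run-time code.

```python
import math
import random

LAYER_SIZES = [22, 18, 13, 2]  # input, hidden 1, hidden 2, output


def sigmoid(x):
    """Logistic transfer function; maps any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def init_network(sizes, seed=0):
    """Create untrained layers as (weights, biases) pairs with small random weights."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def normalize(intensities):
    """Scale raw intensities to the 0..1 range by the brightest spot (assumed scaling)."""
    peak = max(intensities)
    return [v / peak for v in intensities]


def forward(layers, inputs):
    """Propagate one 22-value intensity histogram through the network."""
    activations = list(inputs)
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


network = init_network(LAYER_SIZES)
histogram = [100.0] * 22
histogram[10] = 1000.0  # synthetic single peak on the 11-repeat probe
outputs = forward(network, normalize(histogram))
# Two values in (0, 1); which output means "homozygous" is an assumption of this sketch.
print(outputs)
```

With trained weights, the larger of the two outputs would give the predicted sample type, and its value would serve as the likelihood score described in the text.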
We performed extensive training and testing of our STR neural network with BrainMaker software. To train the network, we used the data from 65 STR microarrays containing targets of known repeat type and length (see Methods). Each run consisted of 22 measured mean fluorescence intensities from spots corresponding to the number of probe repeats and can be thus represented by a histogram of 22 values. Seven STR microarrays were then used as “blind” runs for testing the ANN. To maximize the difficulty for ANN pattern recognition, our test cases included some of the most challenging potential scenarios. These included several homozygous samples with similar numbers of repeats, as well as heterozygous samples containing two repeats similar in length. Even in these difficult cases, the calls from our ANN software were identical to the known STR lengths (Table 2) in all cases, including homozygotes and heterozygotes.
Discussion
In this work, we have demonstrated the feasibility of a microarray-based method for STR analysis that is based on a series of competitive hybridizations resulting in branch migration and displacement of an imperfectly registered target. We have also developed ANN software to automate the correct identification of STR length in an unknown sample.
Our microarray assay utilizes existing technology with two key innovations. The first innovation was described previously by Kemp et al. (13) and makes use of a clamp DNA sequence on the probe and target to ensure that hybridization occurs in the proper register. Second, we have developed a branch migration displacement assay that utilizes competing oligonucleotides to remove targets that are partially registered to the probes. This is achieved by using two extra rounds of competitive hybridization in the array procedure.
We chose ANN recognition as the most convenient way to automate the analysis for several reasons. First, in heterozygotes with STRs of similar length, it was often difficult to determine visually which probe spots had peak fluorescence intensity, whereas an ANN is a good pattern recognition tool when the underlying mechanism is not known or is too complex for visual identification (20). Our trained neural network was able to correctly score homozygous and heterozygous STRs, even when the repeats had similar lengths. Second, no specific dependent variables are used in pattern recognition by ANN, and this prevents bias in STR scoring (21). Nevertheless, other heuristic methods (19) could be used to automate classification and analysis with the same or similar results (data not shown). Training and testing with larger sample sizes are needed to thoroughly evaluate this ANN technique with different optimization parameters.
Our assay could be readily adapted to forensic STR fingerprinting. The sequences that flank STRs in the human genome are logical choices for the clamp sequence. Because of the high density of DNA on microarrays, an array could easily be developed for all possible STR lengths for dozens (or even hundreds) of STRs in the genome. For every STR analyzed, probes of all possible lengths with attached clamp sequence must be present on the array. Individual probe spots could be present on the array in multiple copies to ensure accuracy.
Our assay will also be useful in a variety of other applications that use STR analysis, including individual identification, paternity testing, and cancer diagnosis. We are currently developing conditions to adapt this technology for microsatellite analysis using STRs. Until recently, one of the greatest limiting factors in the analysis of genetic markers was parallel amplification of many loci. Development of multiplex amplification assays such as MIP, Golden Gate, and TnT has circumvented this obstacle. Thus, implementation of any of these techniques for multiplex amplification of STRs in combination with branch migration displacement assay for length identification would facilitate analysis of thousands of STRs in parallel. This rapid analysis could also be used in the field for forensic or military identification. By moving from gel- or column-based methods to microarrays, one could vastly increase the throughput and accuracy (by using more STRs) of traditional forensic and genotyping analysis.
Methods
Microarray Preparation.
Oligonucleotides were obtained from Integrated DNA Technologies (Coralville, IA) or the oligo synthesis facility at the Stanford Genome Technology Center. Oligonucleotides used as probes on the array (Table 1) consisted of (from 5′ to 3′) a 5′ amine group (for attachment to the array), a 5-bp poly(T) sequence, a 35-bp allele-specific clamp sequence homologous to the STR-flanking DNA, and 1–22 tandem repeats of the 4-bp STR sequence. Probes were attached to the microarray essentially as described previously (22). Each probe was printed in quadruplicate or quintuplicate, and two complete arrays were present on each chip. The postprinting processing of the microarrays was performed as recommended by the slide manufacturer. Control oligonucleotides used to verify array quality included a poly(T) (20 bp) with 5′-amine and internal-biotin modification as a labeling control (amino-P), a 5′-amine-modified oligonucleotide with internal-Cy3 (amino-B) as an internal control for each spot's quality, and a 5′-amine-modified poly(T) (20 bp) as a DNA spacer (Table 3).
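The probe layout described above (poly(T) spacer, clamp, then 1–22 repeats) can be generated programmatically. The sketch below is a minimal illustration under stated assumptions: the clamp is a placeholder string of 35 `N` characters because the real sequences are given only in Table 1, the repeat unit `GATA` is used purely as an example 4-mer, and `build_probe` is a hypothetical helper. The 5′ amine modification is a synthesis option and is not represented in the sequence string.

```python
POLY_T_SPACER = "T" * 5           # 5-nt poly(T) sequence at the 5' end
PLACEHOLDER_CLAMP = "N" * 35      # stand-in for the 35-nt clamp (real sequence: Table 1)
EXAMPLE_REPEAT_UNIT = "GATA"      # example 4-mer only; an assumption of this sketch


def build_probe(clamp: str, repeat_unit: str, n_repeats: int) -> str:
    """Return a probe sequence (5' to 3'): spacer, clamp, then n tandem repeats."""
    if n_repeats < 1:
        raise ValueError("a probe needs at least one repeat")
    return POLY_T_SPACER + clamp + repeat_unit * n_repeats


# One probe per possible repeat number, 1 through 22.
probes = {
    n: build_probe(PLACEHOLDER_CLAMP, EXAMPLE_REPEAT_UNIT, n)
    for n in range(1, 23)
}

print(len(probes))      # 22
print(len(probes[1]))   # 44  (5 + 35 + 4)
print(len(probes[22]))  # 128 (5 + 35 + 88)
```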
Target Preparation.
Two STR regions were amplified by PCR from 20 human subjects by using commercially available genomic DNA (Serological Research Institute, Richmond, CA). These DNA samples have known STR profiles (Table 2). PCR primers F-D7 and R-D7 were used to amplify locus D7S820; primers F-D16 and R-D16 were used to amplify locus D16S539 (Table 3). These primers were designed to amplify the entire STR plus flanking DNA that is the reverse complement of the clamp sequence on the probe oligonucleotides.
PCR to amplify the STR loci was carried out in three steps. First, we used the AmpFlSTR Profiler Plus PCR Amplification Kit, which amplifies 13 different STR loci, following the suggested protocol with ≈1 ng of total genomic DNA in a 25-μl reaction volume. PCR was performed as follows: 95°C for 15 min, 28 cycles of 95°C, 59°C, and 72°C for 1 min each, and a 60-min final extension at 60°C. Second, 0.5 μl of this PCR product was used as template for a second round of PCR to amplify the two STR loci of interest (D7S820 or D16S539) using a concentration of 0.2 μM for each primer (F-D7/R-D7 or F-D16/R-D16). The size of the PCR products was verified by agarose gel electrophoresis. Third, a biotinylated, single-stranded target was generated by reamplifying the targets with biotinylated F-D7 or F-D16 primers. A total of 1 μl of the previous PCR product was used to reamplify the target with 0.4 pmol of biotinylated F-D7 or F-D16 primer using Titanium Taq DNA Polymerase (BD Biosciences Clontech). PCR was performed as follows: 95°C for 10 min, 30 cycles of 95°C, 55°C, and 72°C for 30 sec each, and a 5-min final extension at 72°C.
Array Hybridizations.
The Branch Migration Assay involves three sequential hybridization steps. In the first hybridization, the biotinylated single-stranded target was applied to the microarray. For the initial target hybridization step, we used 50 μl of target DNA in 1× hybridization buffer (100 mM Mes, 1 M [Na+], 20 mM EDTA, and 0.01% Tween 20) and 1.25× Denhardt's solution. The hybridization was performed at 42°C for 12–16 h. After hybridization, the microarray was washed with wash buffer (6× SSPE and 0.1% Tween 20) twice for 2 min at 50°C and once for 2 min at room temperature.
The second hybridization was performed by using a competing mixture (5 μl of 1 M MgCl2 and 60 μl of 4× SSC) containing 10 μl of the competing probe 1 (D7-Comp1 or D16-Comp1, 100 μM final concentration) (Table 1). This hybridization was carried out for 2 h at 50°C. After this hybridization, the slide was washed three times in wash buffer. After the third wash, the third hybridization was performed by using 10 μl of the competing probe 2 (D7-Comp2 or D16-Comp2, 100 μM final concentration) (Table 1) in the mixture. After this final hybridization step, the microarray was washed three times in wash buffer and then labeled for 10 min at 50°C with a solution containing streptavidin-allophycocyanin (1 mg/ml final concentration), 6× SSPE, 1× Denhardt's solution, and 0.01% Tween 20.
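As a rough check on the composition of the competing mixture, the final concentrations can be estimated from the stated volumes. This sketch assumes that the 5 μl of MgCl2, 60 μl of 4× SSC, and 10 μl of competing probe are simply combined into a 75-μl mixture with additive volumes; the text does not state the total volume explicitly, so treat the numbers as an estimate. The function name is hypothetical.

```python
def final_concentration(stock, volume_added_ul, total_volume_ul):
    """Dilution rule C_final = C_stock * V_added / V_total (units follow the stock)."""
    return stock * volume_added_ul / total_volume_ul


mgcl2_ul, ssc_ul, probe_ul = 5.0, 60.0, 10.0
total_ul = mgcl2_ul + ssc_ul + probe_ul  # assumed total volume: 75 ul

mgcl2_mM = final_concentration(1000.0, mgcl2_ul, total_ul)  # 1 M stock = 1000 mM
ssc_x = final_concentration(4.0, ssc_ul, total_ul)          # 4x SSC stock

print(f"total {total_ul:.0f} ul, MgCl2 {mgcl2_mM:.1f} mM, SSC {ssc_x:.1f}x")
# total 75 ul, MgCl2 66.7 mM, SSC 3.2x
```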
The microarray was scanned for fluorescence intensity at 532 and 635 nm by using a GenePix 4000 fluorescent scanner (Axon Instruments, Foster City, CA) set to scan at 450 PMT. GenePix Pro software was used to determine the total fluorescent signal from each spot on the array.
Development of ANN Software to Analyze Primary Array Data.
We used ANN software (20) to recognize patterns of fluorescence peak amplitudes characteristic for a specific number of repeats in our STR microarray analysis. The neural network training process was performed with the back-propagation neural network package BrainMaker Professional 3.52 (California Scientific Software), which includes elaborate training tools for optimizing the number of neurons, the number of hidden layers, and the training/testing parameters. Our STR-optimized network used 18 neurons in the first hidden layer and 13 neurons in the second hidden layer. The number of input neurons, 22, is equal to the number of repeats in the longest probe on the array. Two neurons were used in the output layer, which correspond to the heterozygous and homozygous classification. The neural network training set consisted of 65 STR microarray runs (one run represents one slide with 22 probe spots) with at least two replicates for each sample. We then tested the neural network on seven test samples of known STR length (≈10% of total samples) with peaks in the repeat range of trained samples. The results showed that this trained neural network evaluated test samples with a 100% success rate and with at least a 90% likelihood score, indicating that our ANN is sufficient for robust pattern recognition of STR samples [supporting information (SI)].
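The architecture and data split stated above imply a few numbers that are easy to verify. The short Python calculation below counts the connections in a fully connected 22-18-13-2 network and computes the held-out fraction. Full connectivity between adjacent layers and one bias term per non-input neuron are assumptions of this sketch; the text does not say how BrainMaker handles thresholds.

```python
layer_sizes = [22, 18, 13, 2]  # input, hidden 1, hidden 2, output

# Connections between each pair of adjacent layers: 22*18 + 18*13 + 13*2.
n_weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

# One bias per hidden and output neuron (assumed): 18 + 13 + 2.
n_biases = sum(layer_sizes[1:])

training_runs, test_runs = 65, 7
test_fraction = test_runs / (training_runs + test_runs)

print(n_weights)                 # 656
print(n_weights + n_biases)      # 689
print(f"{test_fraction:.1%}")    # 9.7%
```

With 65 training runs and several hundred adjustable parameters, the network has far more parameters than training examples, which is consistent with the authors' own caution that larger sample sizes are needed to evaluate the technique thoroughly.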
Acknowledgments
We thank Keith Anderson, Mike Jenson, Dan Bruno, and Julianna Erickson for technical support. This study was supported by Defense Advanced Research Projects Agency/Navy Grant N00014-02-1-0807 and National Institutes of Health Grant P01-HG000205.
Footnotes
- †To whom correspondence may be addressed. E-mail: pourmand{at}stanford.edu or dbowe{at}stanford.edu
Author contributions: N.P. and R.W.D. designed research; S.C., A.V., and S.M. performed research; M.K. contributed new reagents/analytic tools; N.P., M.K., S.X.W., and R.W.D. analyzed data; and N.P. wrote the paper.
The authors declare no conflict of interest.
This article contains supporting information online at www.pnas.org/cgi/content/full/0700921104/DC1.
- Abbreviations: STR, short tandem repeat; ANN, artificial neural network.
Freely available online through the PNAS open access option.
- © 2007 by The National Academy of Sciences of the USA








