Hox gene–specific cellular targeting using split intein Trojan exons

Significance
The brain has hundreds of cell types, many of unknown function. Determining the functions of all brain cell types requires methods for selectively monitoring and perturbing their activity. This is best accomplished by genetic methods, but these methods frequently disrupt the function of the native genes whose regulatory information is co-opted to achieve cell type–specific targeting. Here, we present a method for targeting specific cell types that limits disruption of the genes used to gain genetic access to them. We demonstrate the method’s utility by using it to identify and manipulate neurons that express different Hox transcription factor genes and are therefore located in anatomically distinct regions along the anterior–posterior axis of the fly nervous system.


Destabilized EGFP (dsGFP) and DsRed (dsRFP) constructs:
The plasmids "d2EGFP" (a kind gift from Dr. Gyeong-Hun Baeg of Macau University of Science and Technology) and pDsRed-Express-DR (Takara Bio USA, Inc.) were used as templates to amplify dsGFP and dsRFP with forward and reverse primers that included 15 bp of sequence homologous to the pUASTattB vector (Addgene). The PCR products were cloned into pUASTattB (digested with EcoR I/Not I) using the In-Fusion Cloning Kit (Takara Bio USA, Inc.) to generate pUASTattB-dsGFP and pUASTattB-dsRFP. These constructs were used to introduce UAS-dsGFP and UAS-dsRFP into the attP40 landing site on the Drosophila 2nd chromosome by ΦC31-mediated recombination.
Constructs pPac-PL-LexA-Myc-EY-CfaN-T2A-3XHA and pPac-PL-HA-T2A-CfaC-CFN-nlsp65AD for in vitro testing in S2 Cells:
To generate coding sequences of the CfaN and CfaC split inteins, the corresponding protein sequences from Stevens et al. (2) were reverse translated and the resulting DNA sequences were codon-optimized for Drosophila. To ensure efficient splicing, nucleotides encoding the amino acid residues Glu-Tyr and Cys-Phe-Asn were added to the 5' end of the CfaN sequence and the 3' end of the CfaC sequence, respectively.

Hox-specific siTrojan exon drivers:
The seven Hox gene Gal4 drivers and the Scr siTE-p65AD hemidriver were made using CRISPR/Cas9-mediated homologous recombination. Guide RNAs (gRNAs) conforming to the sequences shown in the table below were selected to make pBS-U6A1-sgRNA constructs for each of the Hox genes. Preference was given to sequences in introns that did not interrupt protein domains or contain highly conserved DNA sequence. For Dfd, gRNAs were selected for two introns (i.e., introns 2 and 3) with reading frame phases of 1 and 0, respectively. For several other genes (pb, abd-A and Abd-B), multiple gRNAs were selected when use of the initially selected gRNA(s) failed to generate transformants. Synthesized, complementary oligos were annealed to form each gRNA and ligated into the Bbs I-digested pBS-U6A1-sgRNA-short vector of Rem, et al. (4). Ligated plasmid DNA products were transformed into DH5α competent cells from Thermo Fisher Scientific (Waltham, MA). For the siTrojan constructs, DNA fragments representing 100-500 bp of the sequence flanking the gRNA sequences were synthesized to serve as left and right homology arms. These arms were then cloned into siTrojan exon vectors digested with Not I (left arm) or Asc I (right arm).

Immunoblotting and Immunohistochemistry:
Immunoblotting PIPES, 1 mM MgCl2, 1 mM EDTA, pH 6.9)/heptane after vigorous shaking for 15 s. After removal of the fixation buffer in the lower phase, methanol was added and the vials were again shaken vigorously for 15 s. After removal of the entire solution (including floating, damaged embryos), the embryos at the bottom were given a final methanol wash and mounted on glass coverslips. Z-series (1 µm step size) were collected through each embryo using both fluorescence and transmitted light. A maximum projection image of the mCherry fluorescence for each embryo was merged, using Adobe Photoshop (Adobe, Inc., San Jose, CA), with a cropped (two Z-steps) projection of the transmitted light image of the embryo structure.
CNS preparations: Adult nervous systems were dissected in phosphate buffered saline (PBS), fixed in 4% paraformaldehyde/PBS for 30 min and then incubated in 4% paraformaldehyde/PBS plus 0.5% Triton X-100 for 15 min. Samples were immunostained as described previously (6), mounted in VectaShield (Vector Laboratories) and imaged by confocal microscopy with Z-series collected in 1 μm increments on a Nikon A1R microscope. Unless otherwise noted, the images shown are maximum projections of the entire Z-stack. Larval nervous systems were dissected in Baines' solution, and excised brains were adhered to poly-lysine-coated coverslips and fixed for 7 min in 4% paraformaldehyde solution as previously described (7). Brains were washed 3X with PBS containing 0.1% Triton X-100 (PBT), blocked for 1 hr in PBT containing 2% normal donkey serum (NDS), and stained with primary antibody in PBT at 4 °C overnight. After washing, samples were stained with secondary antibody at room temperature for 1 hr, washed again, and dehydrated through a graded series of ethanol in water (30, 50, 70, 95, and 100%) to replace the PBT. Samples were then cleared in xylene and mounted in DPX (Sigma, MI). Images were acquired using a Zeiss LSM 800 confocal microscope and were processed and analyzed using ImageJ.
Analysis of Eve+ U motor neuron immunostaining: CNS preparations from animals expressing either UAS-dsRFP or UAS-dsGFP under the control of Antp siTE-, Ubx siTE-, or Abd-B siTE-Gal4 were immunostained with anti-Eve, anti-Hox, and anti-mCherry (for dsRFP) or anti-GFP (for dsGFP) antibodies and analyzed as follows. In each of three preparations per Hox siTE-Gal4 driver, we identified 3 to 5 Eve+ cells (U motor neurons) in each of 5-6 VNC hemisegments in which the Hox gene was expressed and scored them for anti-GFP/mCherry and anti-Hox labeling. Cells whose signal was not above background in a given channel were scored as non-expressing for that channel. The Eve+ cells that were both Hox+ and GFP/RFP+ were counted, and "Driver Fidelity" was determined by dividing the sum of these cells across all three preparations by the total number of Eve+ cells examined and multiplying by 100. Similarly, we counted cells with mismatched labeling of two types: (1) Hox+ but not GFP/RFP+ ("false negatives") and (2) GFP/RFP+ but not Hox+ ("false positives"). The percentage of each type was calculated from the sum across the three animals divided by the total number of Eve+ cells examined.
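The scoring arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in the study: the `PrepCounts` record, the function name, and all cell counts are hypothetical, and the sketch assumes that every Eve+ cell examined falls into exactly one of the four classes listed.

```python
from dataclasses import dataclass


@dataclass
class PrepCounts:
    """Hypothetical tally of Eve+ cells scored in one CNS preparation."""
    both: int           # Hox+ and GFP/RFP+
    hox_only: int       # Hox+ but not GFP/RFP+ ("false negatives")
    reporter_only: int  # GFP/RFP+ but not Hox+ ("false positives")
    neither: int        # not above background in either channel

    @property
    def total(self) -> int:
        return self.both + self.hox_only + self.reporter_only + self.neither


def fidelity_summary(preps):
    """Pool counts across preparations and express each class as a
    percentage of all Eve+ cells examined."""
    total = sum(p.total for p in preps)
    if total == 0:
        raise ValueError("no Eve+ cells were scored")

    def pct(n):
        return 100.0 * n / total

    return {
        "driver_fidelity": pct(sum(p.both for p in preps)),
        "false_negative": pct(sum(p.hox_only for p in preps)),
        "false_positive": pct(sum(p.reporter_only for p in preps)),
    }


# Invented counts for three preparations of one driver.
example = [PrepCounts(18, 1, 1, 0), PrepCounts(19, 0, 1, 0), PrepCounts(17, 2, 0, 1)]
print(fidelity_summary(example))
```

Pooling the counts before dividing, rather than averaging per-preparation percentages, follows the description of summing across the three preparations.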

Analysis of GNG volume:
Four CNS preparations from 5-7 d old adults were immunostained with anti-nc82 antibody (DSHB, University of Iowa) for each of the following genotypes: w1118;; (control), ;;Dfd-Gal4 1/TM3,Sb, and ;;Dfd-Gal4 2/TM3,Sb (experimental). Each CNS was then imaged in 1 µm sections using a Nikon A1R confocal microscope. Gnathal ganglia (GNG) neuropil was identified according to the criteria of Ito et al. (8) and outlined in each Z-section of each imaged preparation using the Freehand tool of Fiji. The outlined areas were then multiplied by the section depth (1 µm) and summed over the entire preparation to obtain the GNG volume for each preparation.
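The volume estimate is a sum of slab volumes, one per Z-section. A minimal Python sketch of that calculation is given below. The function name and the area values are hypothetical, and the sketch assumes that the outlined areas have been exported from Fiji as a list of values in µm².

```python
def neuropil_volume_um3(section_areas_um2, section_depth_um=1.0):
    """Sum (outlined area x section depth) over all Z-sections.

    section_areas_um2: outlined GNG area in each Z-section, in square microns.
    section_depth_um:  Z-step between sections (1 um in the protocol above).
    """
    if section_depth_um <= 0:
        raise ValueError("section depth must be positive")
    return sum(area * section_depth_um for area in section_areas_um2)


# Invented areas for three consecutive sections of one preparation.
areas = [1200.0, 1350.5, 1280.0]
print(neuropil_volume_um3(areas))
```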

Behavioral Assays
Larval behavior analysis: Larval behavior was analyzed as described previously (9).
Briefly, L1 larvae screened for expression of UAS-Cs.Chrimson.mVenus were moved to apple juice caps with either yeast paste (-ATR) or yeast paste with 0.5 mM all-trans retinol (+ATR) and were raised to third instar. At L3, larvae were transferred to arenas made with 2 mm of 2% agarose in water. Agarose arenas were placed on a circular plexiglass sheet illuminated with 850 nm LEDs (HK-F3528IR60-X from ledlightsworld.com) and behavior was recorded using a Point Grey GS3-U3-41C6NIR camera with a 16 mm Standard Schneider Compact VIS-NIR Lens and an 825 nm high-performance long-pass filter (Edmund Optics, Barrington, NJ). The camera was attached to a Zeiss Stemi 508 dissecting microscope and images were acquired at 10 fps.
Optogenetic stimulation was achieved with 1200 lumens of 605 nm light (measured at 50% PWM) using a 1 ½ ft strip of 28 5050SMD LEDs (superbrightleds.com, NFLS-Y30X3-Bk-LC2) mounted on a plexiglass circle (6 in diameter). Larval locomotion was routinely recorded 10 s prior to LED stimulation, 10 s during LED stimulation, and 10 s after stimulation. Stimulating light and camera shutter were controlled with an Arduino Uno microcontroller.
Larval images were processed in Fiji (10). To measure speed of locomotion, the change in position of the larva's posterior tip was calculated as distance (in µm) over time. To measure segment lengths, the multi-point selection tool in ImageJ was used to distinguish segment boundaries and establish regions as shown in Fig. 4A'. The values were exported into MATLAB and segment lengths were calculated for each stimulus condition (for code see doi: 10.6084/m9.figshare.c.7018752).
(stimulation condition). Behavior was recorded using a Sony NEX-VG10 video camera mounted on an Olympus SZX16 microscope.
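The speed measurement for larval locomotion (distance traveled by the posterior tip over time) can be illustrated with a short Python sketch. The function name and coordinates are hypothetical. The sketch assumes one tracked (x, y) position per frame, in µm, at the 10 fps acquisition rate, and it sums frame-to-frame displacements, which is one reasonable reading of "distance over time" and not necessarily the exact procedure used in the study.

```python
import math


def mean_speed_um_per_s(tip_positions_um, fps=10.0):
    """Mean speed of a tracked point: path length divided by elapsed time.

    tip_positions_um: sequence of (x, y) positions, one per frame, in microns.
    fps:              acquisition rate in frames per second.
    """
    if len(tip_positions_um) < 2:
        raise ValueError("need at least two frames to measure displacement")
    path_length = sum(
        math.dist(a, b) for a, b in zip(tip_positions_um, tip_positions_um[1:])
    )
    elapsed_s = (len(tip_positions_um) - 1) / fps
    return path_length / elapsed_s


# Invented track: three frames, 50 um of travel between consecutive frames.
track = [(0.0, 0.0), (30.0, 40.0), (60.0, 80.0)]
print(mean_speed_um_per_s(track))
```

Applying the function separately to the frames recorded before, during, and after stimulation would give one speed per 10 s epoch.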

Adult behavioral analysis:
Abdominal curvature was measured using the Kappa-Curvature Analysis tool in Fiji (10) from the second or third abdominal segment to the posterior tip. For each animal, abdominal curvature was measured in six video frames, spaced 30 s apart, for each of the time intervals immediately preceding and immediately following the onset of stimulation (i.e., temperature elevation). The mean curvature for each condition was calculated by averaging the six measurements, and the ratio of the means (before stimulation vs during stimulation) was used to calculate the normalized curvature observed for each animal.
The means and standard deviations of normalized curvatures were calculated across animals of each genotype and the significance of the difference of means from 1 (i.e. the normalized curvature without stimulation) was estimated by t-test.
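The normalization and the comparison against 1 can be sketched as follows. This is an illustrative Python example with hypothetical function names and invented values. It assumes that normalized curvature is the mean during stimulation divided by the mean before stimulation (the text does not state the direction of the ratio), and it computes only the one-sample t statistic and its degrees of freedom; obtaining a p-value would require a t distribution from a statistics package.

```python
import math
from statistics import mean, stdev


def normalized_curvature(before, during):
    """Ratio of mean curvature during stimulation to mean curvature before it
    (assumed direction; a value of 1 means no change)."""
    return mean(during) / mean(before)


def one_sample_t(values, mu=1.0):
    """t statistic and degrees of freedom for testing mean(values) against mu."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two animals")
    sd = stdev(values)
    if sd == 0:
        raise ValueError("zero variance: t statistic is undefined")
    standard_error = sd / math.sqrt(n)
    return (mean(values) - mu) / standard_error, n - 1


# Invented normalized curvatures for three animals of one genotype.
per_animal = [1.5, 2.0, 2.5]
t_stat, dof = one_sample_t(per_animal)
print(t_stat, dof)
```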

Supporting Figures
Fig. S1. Design of the siTrojan exon construct using the Cfa split intein.
(A) Schematic of the siTrojan Gal4 constructs, indicating the nucleic acid insertions adjacent to the splice acceptor (SA) and splice donor (SD) sites required to maintain phase for each of the three possible reading frames during protein translation. The choice of reading frame and of the left and right homology arms (HAL and HAR) is determined by the specific intron into which the siTrojan construct is to be inserted by CRISPR/Cas9 genome editing. All other details of the three constructs are the same. Each is flanked by attP (P) sites to permit ΦC31 integrase-mediated cassette exchange. Flippase recognition target (FRT) sites flank a removable 3XP3-RFP selection marker cassette. (B) Depending on the reading frame of the intron into which the siTrojan construct is inserted and the identity of the codon interrupted, the insertion will lead to the incorporation of a slightly different hexa- or heptapeptide sequence into the native protein at the site of interruption by the intron. All peptides will share the EYCFN sequence of the split intein moieties. All other possible amino acid additions on either side of this pentapeptide are indicated. These depend on the identity of the nucleotide(s) preceding the splice junction in the native gene and the codon formed after splicing in the siTrojan exon. Phase 2 intronic insertions must be avoided if the nucleotides preceding the splice junction are UA, which will occur if a Tyr residue encoded by a UAU or UAC codon has been interrupted, because this will introduce a UAG stop codon into the protein-coding sequence.
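The phase 2 caveat in panel (B) amounts to a simple check on the codon that forms across the splice junction. The Python sketch below illustrates it. The function names are hypothetical, and the default exon nucleotide of G is an assumption inferred from the UAG codon named in the legend, not a detail taken from the construct sequences.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def junction_codon(preceding_nt, exon_start_nt):
    """Codon formed by joining the nucleotides that precede the splice junction
    in the native gene with the first nucleotide(s) supplied by the inserted exon.
    Accepts RNA (U) or DNA (T) letters and returns DNA letters."""
    codon = (preceding_nt + exon_start_nt).upper().replace("U", "T")
    if len(codon) != 3:
        raise ValueError("preceding and exon nucleotides must total three")
    return codon


def phase2_insertion_creates_stop(preceding_two_nt, exon_first_nt="G"):
    """True if a phase 2 insertion would complete a stop codon at the junction.
    exon_first_nt defaults to G (assumed; see lead-in)."""
    return junction_codon(preceding_two_nt, exon_first_nt) in STOP_CODONS


# An interrupted Tyr codon (UAU or UAC) leaves UA before the junction.
print(phase2_insertion_creates_stop("UA"))
# An interrupted Gln codon (CAA or CAG) leaves CA before the junction.
print(phase2_insertion_creates_stop("CA"))
```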
Supporting information: Figures S1 to S6; Tables S1 to S2; Legends for Movies S1 to S2; SI References. Other supporting materials for this manuscript include the following: Movies S1 to S2.
Unless otherwise noted, cloning reactions and bacterial transformations used In-Fusion 2X mix and Stellar Gold Competent Cells from Takara Bio USA, Inc. (San Jose, CA).
-CFN-nlsp65AD constructs, pPacPL-PL-mCD8-D2A-Gal4 plasmid DNA from Diao and White (3) was first digested with BamH I, Not I and Kpn I and then mixed with the gBlock DNA fragment encoding either LexA-Myc-EY-CfaN-T2A-3XHA or HA-T2A-CfaC-CFN-nlsp65AD in In-Fusion 2X Mix. The mixture was incubated for 15 min at 50 °C and then transformed into competent cells.

siTrojan Gal4, p65AD, and Gal4DBD Constructs:
The siTrojan constructs were modified from the original Trojan exon constructs described by Diao et al. (1) using the pTGEM(0) vector as a backbone. DNA sequences for the CfaN and CfaC split inteins were similar to those described above and included coding sequences for the flanking EY and CFN amino acids. In addition, to accommodate insertion into all three possible reading frames, extra nucleotides were added before the sequences encoding EY and/or after those encoding CFN, as indicated in Fig. S1A. Briefly, DNA constructs in the form attP-SA-EY-CfaN-T2A-(Gal4/p65AD)-T2A-CfaC-CFN-SD-FRT-3XP3-RFP-FRT-attP were synthesized by Integrated DNA Technologies, Inc. (Coralville, Iowa) for all three reading frames. (Constructs used to convert existing insertions substituted attB for attP and Gal4DBD for Gal4 or p65AD, when needed.) For some constructs, DNA insertion fragments could not be synthesized directly. In these cases, 2-3 short DNA fragments with overlapping ends were synthesized and combined by PCR. DNA constructs were cloned into pTGEM(0) plasmid DNA digested with Not I and Asc I.
Gene   gRNA sequence              Intron   Driver
lab    GTTGAGAGTAATGCACTAAG GGG   1        lab siTE-Gal4
pb     TTCCAAATTTCCTATAAATT AGG   3        (none)
pb     GAGTCAGTCATCTATAGTGA TGG   7        pb siTE-Gal4
Dfd    TCCGTTGAGATTTGATAGG AGG    2        Dfd siTE-Gal4 1
Dfd    CTCGATTACGCATTCTCTCT TGG   3        Dfd siTE-Gal4 2
Scr    GAACTTCCCGTCTAGTCTAT AGG   2        Scr siTE-p65AD
Antp   TGGTTGGCGAAATATGCGGG AGG   3        Antp siTE-Gal4
Ubx    GAAAAGACAAACACCTTAT TGG    1        Ubx siTE-Gal4

For lines made using CRISPR/Cas9-mediated homologous recombination, plasmid DNA for each siTrojan exon construct was microinjected into embryos of {nos-Cas9} attP40 flies (4) together with sgRNA. Transformants were identified by crossing injected animals with w;+;Dr/TM3,Sb flies and screening for the presence of the 3XP3-RFP selection marker (i.e., red fluorescent eyes) in the progeny. Hox gene hemidriver lines and the ChaT siTE and crc siTE Gal4 driver lines were made by ΦC31-mediated conversion as follows. Midiprep plasmid DNAs containing the desired siTrojan exons in the correct reading frame were injected together with ΦC31 plasmid DNA into the embryos of flies with Hox gene siTrojan drivers made by CRISPR/Cas9 or flies with MiMIC insertions (5). Transformants were isolated by crossing injected animals with w;;Dr/TM3,Sb flies (for siTrojan Gal4 transformants) or yw;;Dr/TM3,Sb flies (for MiMIC convertants) and screening for loss of the 3XP3-RFP selection marker or yellow selection marker, respectively. Transformants were crossed with w;;Dr/TM3,Sb flies to make stable lines and their expression patterns were confirmed by crossing to w;UAS-2X EGFP flies (for Gal4 lines), yw;Tub-Gal4DBD;UAS-2X EGFP flies (for p65AD hemidrivers) or yw;Tub-VP16AD,UAS-2X EGFP flies (for Gal4DBD hemidrivers). The Vmat-Gal4DBD line was likewise made by ΦC31-mediated conversion of a Trojan exon Gal4 driver, Vmat MI07680-TG4.0, available from the BDSC.

S2 Cell Culture:
S2 cells were grown in serum-free HyQ-CCM3 medium from VWR International (Radnor, PA) to a density of 10^6 cells/ml in 6-well plates. Cells in each well were transfected with 1.0 μg of each purified DNA construct using the FuGENE6 Transfection Reagent from Roche Diagnostics USA (Indianapolis, IN). pPac-PL-nlsLexA-Myc-EYCfaN-T2A-3XHA, pPac-PL-3XHA-T2A-CfaC-CFN-nlsp65AD and 13X lexAop2-6XmCherry plasmid DNAs were co-transfected in experiments in which LexA-p65AD activity was monitored by 13X lexAop2-6XmCherry expression. In each of two negative controls, either pPac-PL-nlsLexA-Myc-EY-CfaN-T2A-3XHA or pPac-PL-3XHA-T2A-CfaC-CFN-nlsp65AD was co-transfected with 13X LexAop2-6XmCherry plasmid DNA. Cells were analyzed for fluorescence using a Nikon C2 personal confocal microscope after 3 d of incubation at 25 °C. To quantify the density of mCherry-labeled cells in each of the three conditions (i.e., the two control conditions, each with only one of the two split intein constructs, and the experimental condition with both split intein constructs), two wells containing cells transfected under each condition were photographed and a mask with the same six randomly chosen circular fields was overlaid on each photograph. The number of labeled cells was then counted in the 12 fields for each condition and the cell counts were averaged.
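The field-based quantification at the end of this section reduces to averaging counts over 2 wells × 6 fields per condition. A minimal Python sketch follows, with a hypothetical function name and invented counts; it assumes that the counts are recorded as one list of six field counts per well.

```python
from statistics import mean


def mean_labeled_cells(field_counts_by_well):
    """Average the number of mCherry-labeled cells per field over all fields
    from all wells of one transfection condition."""
    counts = [n for well in field_counts_by_well for n in well]
    if not counts:
        raise ValueError("no fields were counted")
    return mean(counts)


# Invented counts: two wells, six circular fields each (12 fields in total).
experimental = [[12, 9, 15, 11, 10, 13], [14, 8, 12, 10, 11, 9]]
print(mean_labeled_cells(experimental))
```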
All flies were raised at 18 °C to prevent activation of the dTrpA1 channel, which was used for thermogenetic neuronal activation. Experiments were conducted at 18 °C or 31 °C as indicated in the text. For recording behavioral responses, a 10 µl pipette tip was glued to the dorsal thorax of each fly using Devcon 5 Minute Epoxy and the fly was positioned above a Peltier plate. Plate temperatures were adjusted to yield final temperatures at the fly of 22 °C (inactive condition) or 31 °C

Fig. S2. In vitro demonstration of Cfa split intein efficacy in Drosophila cells with a split LexA-p65AD. (A) Schematic illustrating constructs encoding: (1) the Myc-tagged DNA-binding LexA protein fused to the CfaN split intein moiety followed by T2A and 3 copies of the HA antigen; and (2) the p65 transcription activation domain fused to the CfaC split intein moiety preceded by an HA tag and T2A (top). Translation of these constructs, with attendant ribosomal skipping after translation of the T2A sequences, is predicted to yield the four products shown (middle). The two products at the extreme right and left will be

Fig. S3. siTrojan-Gal4 insertions into Dfd do not alter SEZ morphology. (A) The larval CNS expression pattern of the Dfd TE-Gal4 line, made by insertion of a conventional Trojan exon (i.e., TGEM) into site 1 (see Fig. 1F). SEZ expression in this line is similar to that of the Dfd siTE-Gal4 lines made with siTrojan-Gal4 insertions, but consistent labeling of VNC neurons is also observed, as can be seen from the lateral axon projections (arrowheads). Similar ectopic labeling was also seen in some Dfd siTE-Gal4 1 preparations, which have insertions into the same intron. Green, UAS-EYFP. (B) Volume-rendered confocal images of brain neuropil labeled with nc82 antibody for control (w1118) flies and flies heterozygous for siTrojan-Gal4 insertions into the Dfd gene at sites 1 and 2 (see Fig. 1F). The SEZ is outlined in yellow and was similar for all genotypes. (C) Box plots comparing the volume of the gnathal ganglia (GNG, the part of the SEZ in which the Dfd gene is expressed) for each of the three genotypes indicated in (B). GNG volume was calculated from confocal z-stacks using Fiji as described in the Extended Methods and was not significantly different across genotypes, as determined by t-test (n=4 for each genotype).

Fig. S4. Intronic sites of insertion of siTrojan exons into Drosophila Hox genes. siTrojan-Gal4 insertions were attempted for all eight Hox genes of the fly. Green arrows indicate successful insertions for which stable, viable lines were established. Red arrows indicate attempted insertions that failed to yield viable transformants. Note that for the Scr gene, both a green arrow and a dotted red arrow are shown at the same site. No viable siTrojan-Gal4 insertion was obtained at the indicated site, but a viable siTrojan-p65AD insertion was obtained. For reasons not fully understood, Trojan insertions of p65AD tend to be healthier and more viable than insertions of Gal4 and Gal4DBD into the same site. More generally, however, the success or failure of transformation for a given siTrojan insertion is likely to depend on two main factors: 1) how efficient the split intein is in the protein context defined by the specific site at which it is inserted, and 2) how well tolerated the insertion peptide incorporated into the full-length protein is at that site after ligation.

Fig. S6. Gal4- and UAS-nucLacZ-independent labeling of nerves by anti-β-galactosidase antibody. Variable labeling of nerves by anti-β-galactosidase antibody (anti-betagal) was frequently observed in preparations lacking either (A) Gal4 or even (B) UAS-nucLacZ. This labeling was uniform along the length of the nerves rather than isolated to nuclei, as is expected for the β-galactosidase reporter encoded by nucLacZ, which has a nuclear localization signal. This uniform nerve labeling, which contrasts with the punctate, nuclear labeling seen in the examples indicated by yellow arrows in Fig. 2B, thus appears to be an artifact of the anti-betagal staining, although the source of its variability in terms of the number of nerves labeled, both across and within genotypes, remains unclear. (A) Anti-betagal (green) labeled larval CNS (top) from an animal with the genotype: w;+/(Sp or CyO);UAS-nucLacZ/TM3,Sb. Inset (bottom) shows an enlarged view of the