A dominant negative variant of RAB5B disrupts maturation of surfactant protein B and surfactant protein C

Significance
The Rab5 GTPase functions in early endosome (EE) fusion in the endocytic pathway. Here, we propose that RAB5B also has a noncanonical vesicular fusion function in the regulated secretion pathway that produces mature surfactant proteins SP-B and SP-C in the lung. This function was revealed from investigation of a proband with interstitial lung disease suggestive of a surfactant dysfunction disorder who carried a de novo Asp136His variant in the RAB5B gene. Our modeling in C. elegans provided information on the genetic and cell biological mechanism, and analyses of proband and normal lung biopsies suggested a function for RAB5B and EEs in surfactant protein processing/trafficking. This work indicates that RAB5B p.Asp136His causes a surfactant dysfunction disorder.


This file includes:
SI Materials and Methods
Tables S1 to S5
Figures S1 to S16
SI References

Exome sequencing and Variant analysis. DNA from the proband and parents was extracted from peripheral blood mononuclear cells for trio exome sequencing (ES) in CLIA-certified laboratories, and variants were confirmed by Sanger sequencing. Trio ES was performed at Baylor Genetics using methods described previously (1). Sequence reads were aligned to the GRCh37 (hg19) human genome reference assembly, and variants were then called using the Edico Dragen BioIT Platform to generate a variant call file. The annotation platform leverages the GenomOncology Knowledge Management System API and provides annotations using open source data sets such as gnomAD, EVS, and ClinVar, and professional resources such as HGMD Pro. Baylor Genetics provided a clinical analysis of the proband's trio ES.
For the research analysis of the trio exome data, Codified Genomics (www.codifiedgenomics.com) was used to prioritize novel, de novo variants that were not observed in gnomAD and biallelic variants in the coding region or near intron-exon boundaries with an allele frequency of less than 1% (2) (Table S1). In addition, copy number variant (CNV) analysis was performed using the XHMM tool (3), which compares depth of coverage in exome sequencing data. XHMM was run on >200 exomes sequenced at Baylor Genetics using the same library and sequencing parameters to generate baseline coverage statistics and identify outliers for CNV discovery. The list of genotyped CNVs was then annotated for genes, gnomAD frequency, ClinVar, and other relevant annotations using AnnotSV (4). Custom scripts were then used to identify rare and de novo CNVs for additional investigation.
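The prioritization rule described above can be illustrated with a minimal Python sketch. This is not the Codified Genomics implementation; the `Variant` fields, the inheritance labels, and the gene names are hypothetical placeholders. It also assumes that the coding/splice-region restriction applies to the biallelic class only and that a variant absent from gnomAD counts as rare, neither of which the text states explicitly.

```python
from dataclasses import dataclass
from typing import Optional

AF_CUTOFF = 0.01  # "allele frequency of less than 1%" in the text


@dataclass
class Variant:
    gene: str                    # placeholder gene label, not a real finding
    inheritance: str             # hypothetical labels: "de_novo", "biallelic", "other"
    gnomad_af: Optional[float]   # None = not observed in gnomAD
    near_coding: bool            # coding region or near an intron-exon boundary


def is_candidate(v: Variant) -> bool:
    """Return True if the variant passes the prioritization rules sketched here."""
    if v.inheritance == "de_novo":
        # Novel de novo variants: must be absent from gnomAD.
        return v.gnomad_af is None
    if v.inheritance == "biallelic":
        # Biallelic variants: coding or splice-region, and rare (AF < 1%).
        # Assumption: absence from gnomAD counts as rare.
        rare = v.gnomad_af is None or v.gnomad_af < AF_CUTOFF
        return v.near_coding and rare
    return False


variants = [
    Variant("GENE_A", "de_novo", None, True),      # kept: de novo, absent from gnomAD
    Variant("GENE_B", "de_novo", 0.0002, True),    # dropped: observed in gnomAD
    Variant("GENE_C", "biallelic", 0.004, True),   # kept: rare and near coding sequence
    Variant("GENE_D", "biallelic", 0.03, True),    # dropped: AF >= 1%
    Variant("GENE_E", "biallelic", 0.004, False),  # dropped: outside coding/splice region
]

candidates = [v.gene for v in variants if is_candidate(v)]
print(candidates)
```

In practice the same predicate would be applied to annotated records from the variant call file rather than to hand-written objects.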
C. elegans CRISPR-Cas9 gene editing. Single nucleotide changes were introduced by injecting VC2010 animals with Cas9 protein, tracrRNA, crRNA, and single-stranded DNA oligonucleotide (ssODN) repair template (5,6). The repair templates were designed to have greater than 33-bp homology on each arm and include synonymous changes that destroy gRNA re-binding and create or destroy a restriction site for genotyping. Lines with the synonymous changes, but not the proband variant, were generated as controls to verify that the silent mutations did not contribute to the observed phenotype (referred to as control edits). The dpy-10 co-conversion strategy (7) was employed to enrich for edited events. Edited alleles were considered independent when obtained from different injected animals. The rab-5 gene in the edited strains was Sanger sequenced to verify that no extraneous changes in the gene were introduced. Strains were backcrossed twice to remove non-linked background mutations. We retained and analyzed two independent control edits and three independent variant edits to assess possible phenotypes from unrelated variations that might have arisen from off-target editing or segregation in VC2010. Preliminary experiments indicated that control edit strains D135D #1 and #2 were phenotypically similar and not different from VC2010; we, therefore, used #1 in all subsequent experiments. Similarly, preliminary experiments indicated that variant edited strains D135H #1, #2, and #3 were phenotypically similar; we therefore used #1 and #2 in subsequent experiments. See Table S2 for the complete strain genotype.
A single copy rab-5 transgene was integrated at the safe harbor Mos1 transposon insertion site ttTi5605 on Chromosome II (II: 0.77cM) (8) through CRISPR-Cas9. The single copy transgene allele was generated via the Self Excising Cassette (SEC) method (9). The SEC contains the sqt-1 roller marker and a hygromycin antibiotic selection marker to facilitate screening (9). The small guide RNA (sgRNA) sequence targeting the ttTi5605 MOS-SCI site (atatcagtctgtttcgtaa) was cloned into plasmid DR274 U6 through the BsaI site. The rab-5 transgene contains the genomic DNA sequence from 1.3 kb upstream of the rab-5 start codon to 0.6 kb downstream of the stop codon (chromosome I: 9,307,157 to 9,310,370). Two synonymous changes were introduced, at amino acid positions 84 (TTG to TTA) and 133 (AAG to AAA), to allow the mRNA levels of the rab-5 wild type locus and the transgene locus to be compared in RNA-seq analysis.
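As a quick check that the two marker changes are silent, the following Python sketch looks up each codon pair in the standard genetic code. The codon table is deliberately restricted to the four codons named in the text, and the function and variable names are illustrative only.

```python
# Standard genetic code, restricted to the four codons named in the text.
CODON_TO_AA = {"TTG": "Leu", "TTA": "Leu", "AAG": "Lys", "AAA": "Lys"}


def is_synonymous(ref: str, alt: str) -> bool:
    """True if the codons differ but encode the same amino acid."""
    return ref != alt and CODON_TO_AA[ref] == CODON_TO_AA[alt]


def n_differences(ref: str, alt: str) -> int:
    """Number of nucleotide positions at which two codons differ."""
    return sum(1 for r, a in zip(ref, alt) if r != a)


# Amino acid position -> (endogenous codon, transgene codon), as given in the text.
marker_edits = {84: ("TTG", "TTA"), 133: ("AAG", "AAA")}

for position, (ref, alt) in marker_edits.items():
    print(position, CODON_TO_AA[ref], ref, "->", alt,
          "synonymous:", is_synonymous(ref, alt),
          "nt changes:", n_differences(ref, alt))
```

Each change is a single third-position substitution, so a read covering either site can be assigned to the endogenous or transgene copy without altering the encoded protein.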
C. elegans locomotion and length analysis. Worm length and crawling speed were measured using the WormLab system (MBF Bioscience), a video recording instrument fitted with software for tracking and analyzing individual worms (10). Age- and stage-matched young adults were used. Briefly, 35 late L4 stage larvae per strain were picked onto a thin-lawn assay plate 24 hours before the assay. On the day of the assay, 10 animals were transferred to each assay plate, allowed to recover for 20 minutes, then video recorded for 2 minutes. Three plates per strain were recorded per trial and three independent trials were performed for each experiment, totaling up to 90 animals assessed per strain. Within each trial, plates were scrambled to avoid systematic effects. Videos were analyzed using WormLab software. When multiple tracks were generated from one worm, tracks were manually joined whenever possible. If a worm moved out of and re-entered the field, tracks were not manually joined and the longer track was used. Any worm tracked for less than 15 seconds was censored from the analysis. A custom R script was written to compile the data files and aggregate the worm length (mean worm length) and speed (center point speed) data (see GitHub repository: https://github.com/samorrison19/WormLabAnalysis). A two-tailed Student's t-test was used for statistical analysis, and p values less than 0.01 were considered significant.
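The track selection and comparison steps can be sketched in Python as follows. This is not the authors' R script from the repository above; the record layout (`plate`, `worm`, `duration_s`, `speed`) and all numeric values are hypothetical. The sketch computes only the equal-variance t statistic and its degrees of freedom; obtaining the two-tailed p value additionally requires the t distribution, which the Python standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance

MIN_TRACK_SECONDS = 15  # from the text: worms tracked for less time are censored


def select_tracks(tracks):
    """Keep the longest track per worm, then drop worms tracked for < 15 s."""
    longest = {}
    for t in tracks:
        key = (t["plate"], t["worm"])
        if key not in longest or t["duration_s"] > longest[key]["duration_s"]:
            longest[key] = t
    return [t for t in longest.values() if t["duration_s"] >= MIN_TRACK_SECONDS]


def student_t(a, b):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical tracks for one strain (values are made up for illustration).
tracks = [
    {"plate": 1, "worm": 1, "duration_s": 40, "speed": 200.0},
    {"plate": 1, "worm": 1, "duration_s": 70, "speed": 210.0},   # same worm, longer track kept
    {"plate": 1, "worm": 2, "duration_s": 9, "speed": 150.0},    # censored (< 15 s)
    {"plate": 2, "worm": 1, "duration_s": 120, "speed": 190.0},
]

kept = select_tracks(tracks)
speeds = [t["speed"] for t in kept]
print(len(kept), mean(speeds))

# Comparing two small made-up samples.
t_stat, dof = student_t([1, 2, 3], [2, 3, 4])
print(round(t_stat, 4), dof)
```

Selecting the longest track before applying the 15-second cutoff ensures that a worm with one short and one long track is retained rather than censored.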

Obtaining animals of the appropriate genotype by crosses for phenotyping.
For heterozygous animals tested in WormLab, where there is no visible marker to differentiate self and cross progeny, animals were raised on fem-1 RNAi plates and females were picked to cross with corresponding males. Strains for endocytosis analysis, specifically arIs37[myo-3p::ssGFP], pwIs23[vit-2::GFP], and cdIs85[pcc1::2xFYVE::GFP], were crossed with VC2010 males to obtain heterozygous marker males, which were then crossed into control or rab-5[D135H]/tmC18 females. The resulting cross progeny that lost the balancer fluorophore but obtained the marker signal were picked at L4 stage and imaged 24 hours later.

Analysis of rab-5 endogenous and transgene transcript levels by RNA-seq.
RNA-seq analysis was conducted in triplicate to compare the mRNA level/read counts between the endogenous rab-5 wild type locus (Chromosome I) and single copy transgene locus on Chromosome II, using the strain UDN100087. In brief, 1-day old young adults were collected to isolate total RNA with Trizol reagent following the manufacturer's instruction. Total RNA was further purified with DNase digestion followed by column extraction. Libraries for Next Generation Sequencing were prepared using the manufacturer's recommended protocol by the Genome Technology Access Center. Library fragments were sequenced on an Illumina NovaSeq-6000 using paired-end reads extending 150 bases. The resulting reads were then quasi-aligned and quantitated against the Ensembl WBcel235.90 transcriptome with copies of the wild type and transgenic rab-5 using the expectation-maximization algorithm in Salmon (11) to generate allele-specific expression in transcripts per million (TPM). The results indicate that the mRNA level of the single copy rab-5 transgene is not significantly different from the endogenous locus (Fig. S3), and thus can be considered as supplying an equivalent gene dose as the wild type rab-5 gene.
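For readers unfamiliar with the TPM unit, the normalization step can be sketched in Python. Salmon's expectation-maximization procedure, which assigns reads to the endogenous and transgene rab-5 copies, is not reproduced here; the sketch assumes estimated read counts and effective lengths are already available. The transcript labels and all numbers are invented for illustration and are not results from this study.

```python
def tpm(est_counts, eff_lengths):
    """Convert estimated read counts to transcripts per million (TPM)."""
    # Reads per base of effective transcript length.
    rates = {name: est_counts[name] / eff_lengths[name] for name in est_counts}
    total = sum(rates.values())
    # Scale so that all transcripts sum to one million.
    return {name: rate / total * 1e6 for name, rate in rates.items()}


# Made-up inputs: two rab-5 alleles plus one lumped "other" transcript.
est_counts = {"rab-5_endogenous": 480.0, "rab-5_transgene": 520.0, "other": 9000.0}
eff_lengths = {"rab-5_endogenous": 1000.0, "rab-5_transgene": 1000.0, "other": 1500.0}

values = tpm(est_counts, eff_lengths)
ratio = values["rab-5_transgene"] / values["rab-5_endogenous"]
print(values, ratio)
```

Because the two alleles have essentially the same length, their TPM ratio reduces to the ratio of their estimated read counts.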
C. elegans western blot. For analysis of C. elegans proteins, 100 1-day old young adults were picked into RIPA lysis buffer (G-Biosciences) supplemented with Pierce™ Halt™ protease inhibitor cocktail (Thermo Fisher Scientific) and subjected to two cycles of freeze-thaw with liquid N2. Samples were then placed into 2x Laemmli Sample Buffer (Bio-Rad) supplemented with 2-mercaptoethanol (MP Biomedicals) and boiled for 10 min. Samples were centrifuged at 500 x g for 2 min and the supernatant was separated on 12% Mini-PROTEAN® TGX Stain-Free™ precast gels (Bio-Rad) under reducing conditions, electrotransferred to a polyvinylidene difluoride (PVDF) membrane via the Trans-Blot® Turbo transfer system (Bio-Rad), and blocked in Intercept® blocking buffer (LI-COR) for 1 hour at room temperature. Proteins were detected using an anti-RAB-5 rabbit polyclonal antibody (1:100 dilution, gift from Anne Spang) (12) and an anti-UNC-15 monoclonal antibody (1:500 dilution, DSHB) overnight at 4°C. Primary antibody binding was detected with anti-rabbit IRDye 800CW (LI-COR) and anti-mouse StarBright B700 (Bio-Rad) secondary antibodies. Signal was detected using Image Lab™ (Bio-Rad). Western blot quantification was performed in ImageJ with the built-in Analyze Gels function. Because there was considerable variability between western blots, signals were normalized to the relevant control strain, depending on the comparison to be made (Figure S3D, F, H).
rab-5 RNA interference. rab-5 feeding RNAi bacteria from the Ahringer library (13) were employed to test the specificity of the anti-RAB-5 antibody for both western blotting and cytological staining. Ten gravid adults were transferred to rab-5 RNAi plates, and progeny that had grown to adults after 3 or 4 days were picked for analysis.

Cytological staining of C. elegans intestinal and gonadal preparations.
Immunostaining was performed essentially as previously described (14). Anti-RAB-5 antibody staining in the wild type intestine, at the level of the lumen, revealed puncta of various sizes as well as diffuse cytoplasmic staining. After wild type animals were fed rab-5 RNAi bacteria, their intestines showed cytological staining equivalent to that of untreated animals. In contrast, western blots showed a significant reduction in the accumulation of the ~23 kD band in similarly RNAi-treated animals (Fig. S3). These results indicate that the cytological staining pattern observed was not RAB-5-specific and likely arose from cross reaction with the proteins larger or smaller than RAB-5 that were observed in the western blot.
Infant normal lung scRNA-seq. Single-cell RNA-seq data were downloaded from the LungMAP consortium portal on April 7, 2020 (15) for analysis with Seurat version 3.1.5. We visualized the cells in two dimensions using the UMAP algorithm and identified clusters in an unsupervised manner using the Louvain algorithm (16). From the day-1 and 21-month normal lung datasets, the type II pneumocyte UMAP cluster was identified from reads of marker genes SFTPB and SFTPC (17). We assessed gene expression differences between the RAB5A, RAB5B, and RAB5C genes using the Wilcoxon Rank Sum test in R.
Immunostaining human lung sections. Formalin fixed tissue sections on glass slides were rehydrated for staining with hematoxylin and eosin or antibodies as indicated (Table S4). For immunochemistry, tissue sections were heated in antigen unmasking solution pH 6.0 (Vector Laboratories) in a pressure cooker (Biocare Medical) for 5 min then cooled for 15 min. Tissues were then incubated in tissue blocking buffer (fish gel, 2%, Sigma-Aldrich; donkey serum, 5%, Sigma-Aldrich; Triton X-100, 0.2% in PBS) for 40 min at room temperature (RT). Primary antibodies diluted in blocking buffer were applied to samples overnight at 4°C. After washing with Tween-20 (0.1%) in PBS, samples were incubated with secondary antibodies for 30 min at RT. Tissues were washed and counterstained with 4',6-diamidino-2-phenylindole (DAPI) in mounting media (Fluoroshield with DAPI, Sigma) prior to adding the coverslip. Lung sections from individuals with disorders of surfactant dysfunction were used for comparison with the immunostaining from the proband (de novo RAB5B p.Asp136His variant): an infant with homozygous SFTPB null variants, p.Pro133Glufs*95 (variant previously known as '121ins2') ('SFTPB null') (18); an infant compound heterozygous for ABCA3 null variants, c.817_821del/c.1729_1730del ('ABCA3 null'); and an adolescent heterozygous for SFTPC p.Ile73Thr ('SFTPC missense variant') (19).
Image acquisition and analysis. For the ssGFP and 2xFYVE::GFP imaging, animals were placed in 5 μl of 100 mM NaN3 (Millipore-Sigma) in PBS, in a 35 mm cover glass bottom dish (MatTek). Approximately 15 to 20 adult stage animals were transferred to the NaN3 solution and covered with a 12-mm circular coverslip and then a 25-mm square coverslip. Confocal images were taken with a Leica SP8X tandem scanning confocal microscope with a white light laser, using a 40x 1.3 NA oil PlanApo objective over ≥20 z-planes and a pinhole size of 1.00 (Leica Microsystems). Images were displayed as maximum intensity projections. Images were rendered and analyzed using LASX (Leica Microsystems) and Volocity (v6.3; Quorum Technologies, CAN) software.
For the VIT-2::GFP imaging, animals were anesthetized with levamisole, transferred to an agar pad formed on a slide, imaged with a Zeiss compound microscope, and analyzed with Axiovision.
For lung sections, images were acquired using an epifluorescence microscope interfaced with imaging software (LAS X, Leica) or laser scanning confocal microscope (LSM 710; Carl Zeiss, Oberkochen). Images were globally adjusted for brightness and contrast in Photoshop (Adobe). Fluorescence intensity per cell in lung tissues was quantified using the Analyze function for Set Measurement of Integrated density in FIJI (21).

Table S1 (legend). Variants identified by research analysis of the trio exome sequencing. De novo variants that are not observed in gnomAD and biallelic variants in the coding region or near intron-exon boundaries with an allele frequency of less than 1% are listed above.
1 Variant observed in one read in the father (1/141) but not in mother (0/125).
2 Normal homocysteine indicated that a diagnosis of cobalamin C deficiency was unlikely.
3 Synonymous change.

Fig. S16)                               TCGGGCAAAGACATGGGTAAA
shRNA3                                  CAAAGGACAGTTCCATGAATA
shRNA4                                  GGAAGTCTAGCCTGGTGTTAC
shRNA5 (renamed as shRNA2 in Fig. 6)    TATGAAGAGGCTCAGGCATAT

RAB5B protein is significantly knocked down in shRNAs 1 and 5 (renamed as shRNA2 in Figure 6), while total RAB5 protein, including RAB5A and RAB5C, is largely unaffected.