Solenodon genome reveals convergent evolution of venom in eulipotyphlan mammals

Significance Multiple representatives of eulipotyphlan mammals (shrews, hedgehogs, moles, and solenodons) are venomous, but little is known about the evolutionary history and composition of their oral venom systems. Herein we characterized venom from the endangered Hispaniolan solenodon (Solenodon paradoxus) and find that it consists of hypotensive proteins likely used to facilitate vertebrate prey capture. We demonstrate that venom has evolved independently on at least 4 occasions in eulipotyphlans, and that molecular components of these venoms have also evolved convergently, with kallikrein-1 proteins coopted as toxins in both solenodons and shrews following their divergence over 70 million years ago. Our findings present an elegant example of convergent molecular evolution and highlight that mammalian venom systems may be subjected to evolutionary constraints.

Proteomics SDS-PAGE gel electrophoresis. We first visualised the protein composition of the two venom samples and the saliva sample by one-dimensional SDS-PAGE gel electrophoresis. Ten micrograms of each sample (1 mg/ml) were added to reducing protein loading buffer at a ratio of 1:1 and incubated at 100°C for ten minutes. The samples were then loaded onto ten-well Mini-PROTEAN TGX precast AnykD gels (Bio-Rad), alongside 5 μl of protein marker (Broad Range Molecular Marker, Promega), and run at 100 V for 60 mins using a Mini-PROTEAN Tetra system (Bio-Rad). The gel was then stained with Bio-Safe Coomassie Stain (Bio-Rad) and destained with water for visualisation.
Bottom-up proteomics. Bottom-up analyses were carried out on either crude or decomplexed venom and saliva from the Hispaniolan solenodon (S. paradoxus). For our shotgun experiments, we digested 5 µg venom and saliva with trypsin (sequence grade, Sigma Aldrich) and analysed by LC-MS/MS. Briefly, samples were dried and re-dissolved in 4 M urea, 10% acetonitrile (ACN), 100 mM ammonium bicarbonate, pH 8. Cysteines were then reduced by incubating with 5 mM dithiothreitol (DTT) at 70 °C for 5 min and alkylated with 10 mM iodoacetamide at 37 °C for 90 min. The reduced and alkylated samples were then digested by incubating with 30 ng/µl trypsin overnight at 37 °C in 2 M urea, 10% ACN, 100 mM ammonium bicarbonate, pH 8, at a final substrate to enzyme ratio of approximately 100:1. The digested samples were desalted using a C18 ZipTip (ThermoFisher, Waltham, MA, USA), dried using vacuum centrifugation, dissolved in 0.5% formic acid (HFo) and 2 µg analysed on an AB Sciex 5600 TripleTOF (AB SCIEX, Framingham, MA, USA) equipped with a Turbo-V source heated to 550 °C. Tryptic peptides were fractionated on a Shimadzu (Kyoto, Japan) Nexera UHPLC with an Agilent Zorbax stable-bond C18 column (Agilent, Santa Clara, CA, USA) (2.1 x 100 mm, 1.8 µm particle size, 300 Å pore size), using a flow rate of 180 µl/min and a gradient of 1-40% solvent B (90% ACN, 0.1% HFo) in 0.1% HFo over 60 min. MS1 spectra were acquired at 300-1800 m/z with an accumulation time of 250 ms and selecting the 20 most intense ions for MS2 scans acquired at 80-1400 m/z with an accumulation time of 100 ms and optimized for high resolution. Precursor ions with a charge of +2 to +5 and an intensity of at least 120 counts/s were selected, with a unit mass precursor ion inclusion window of ±0.7 Da and excluding isotopes within ±2 Da for MS/MS. 
To identify venom and salivary proteins, we used Protein Pilot v5.0 (AB SCIEX, Framingham, MA, USA) to search the MS/MS spectra against all translated predicted protein encoding genes in the S. paradoxus genome, allowing for both biological modifications and amino acid substitutions. False positives were identified using decoy-based false discovery rates (FDR) as estimated by Protein Pilot, and only protein identifications with a corresponding local FDR of <0.5% were considered significant. Genome matches were then annotated in Blast2GO (20) and gene ontology (GO) terms ascribed. Analysis of decomplexed venom was performed as described previously (21). In short, 1 mg of crude S. paradoxus venom was dissolved in aqueous solution, including 1% HFo and 5% ACN, to a final concentration of 10 mg/ml. Insoluble material was removed by centrifugation at 20,000 x g for 5 min. Dissolved venom was injected onto an Agilent 1260 semi-preparative reverse-phase (RP) HPLC system (Agilent, Waldbronn, Germany) coupled to a Supelco Discovery Biowide C18 column (300 Å pore diameter, 4.6 x 150 mm column size, 3 µm particle size). The venom components were eluted with a linear gradient of 0.1% HFo in water (solution A) and 0.1% HFo in ACN (solution B) with a flow rate set to 1 mL/min. The gradient started isocratically (5% B) for 5 min, followed by linear gradients of 5-40% B for 95 min, 40-70% B for 20 min, 70% B for 10 min, and finally ended with a re-equilibration at 5% B for 10 min. Peak detection was performed by means of UV detection at λ = 214 nm using a diode array detector (DAD). The peak fractions were collected manually and dried overnight in a vacuum centrifuge. The fractions containing peptides, previously determined by intact mass profiling, were re-dissolved in 20 µL of 5% ACN containing 0.1% HFo and directly submitted to LC-MS/MS analysis.
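As an illustration, the semi-preparative gradient described above can be written as a piecewise-linear programme and queried for the solvent composition at any time point. This is a minimal sketch only: the names `GRADIENT` and `percent_b` are hypothetical, and the instantaneous step back to 5% B at 130 min is an assumption, since the text does not state how quickly the column was returned to starting conditions.

```python
# Gradient breakpoints as (time in min, % solvent B), taken from the text:
# 5% B for 5 min, 5-40% B over 95 min, 40-70% B over 20 min,
# 70% B for 10 min, then re-equilibration at 5% B for 10 min.
# ASSUMPTION: the return to 5% B at 130 min is modelled as an instant step.
GRADIENT = [(0, 5), (5, 5), (100, 40), (120, 70), (130, 70), (130, 5), (140, 5)]


def percent_b(t_min):
    """Return % solvent B at time t_min by linear interpolation."""
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t1 > t0 and t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise ValueError("time outside the 0-140 min gradient programme")
```

For example, `percent_b(110)` falls halfway through the 40-70% B segment.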
The protein-containing fractions were then chemically reduced with DTT, subsequently heated at 90 °C for 10 min and separated via SDS-PAGE (15% polyacrylamide gels). Thereafter, Coomassie-stained bands were excised from the gel and subjected to in-gel digestion. In the first step, bands were treated with a reduction solution (10 mM DTT in 25 mM (NH4)HCO3, pH 8.3, for 30 min at 65 °C), then an alkylation step (50 mM iodoacetamide in 50 mM (NH4)HCO3, pH 8.3, for 30 min at 25 °C in the dark) was applied, before in-gel trypsin digestion (12 h at 37 °C with 66 ng sequencing-grade trypsin/mL in 25 mM (NH4)HCO3, 10% ACN; 0.25 mg/sample). Samples were dried in a vacuum centrifuge, and tryptic peptides were re-dissolved in 20 µl of 5% ACN containing 0.1% HFo and subsequently submitted to LC-MS/MS analysis using an Orbitrap XL hybrid mass spectrometer (Thermo, Bremen, Germany) coupled with an Agilent 1260 HPLC system (Agilent, Waldbronn, Germany), using a flow rate of 0.3 ml/min. The HPLC system was connected to a Grace Vydac 218MSC18 column (2.1 x 15 mm, 5 µm). A gradient was applied using 0.1% HFo in water (solution A) and ACN (solution B) and started isocratically with 5% B for 2 min, followed by an increase over 10 min from 5 to 40% B, then 40-99% B over 15 min; 99% B was held for 5 min, with a final re-equilibration phase at 5% B for 5 min. MS experiments were performed on an Orbitrap analyzer with R = 15,000 at m/z 400 and maximum filling time of 200 ms for both survey and first product ion scans. MS/MS fragmentation of the most intense ion was performed in the LTQ using CID (30 ms activation time); the collision energy was set to 35%. Precursor-ion isolation was performed within a mass window of m/z 2. Dynamic exclusion was set up for a mass window of m/z 3 for up to 50 precursor ions with a repeat of 2 within 30 s (22). MS2 spectra were searched against all translated predicted protein encoding genes in the S. paradoxus genome, and a set of typical protein contaminants (common Repository of Adventitious Proteins; CRAP), in total 18,272 sequences. Mass accuracy of X!Tandem was set to 10 ppm for precursor mass and 0.2 m/z for MS2 level. Alkylation of Cys was set as a fixed modification, and acetylation of the N-terminus and Lys, as well as oxidation of Met, were allowed as variable modifications. FDR was estimated through a target-decoy approach and a cut-off of 0.01 was applied. All PSMs were validated manually and only protein IDs with at least two PSMs were considered.
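The target-decoy filtering described above can be sketched as follows. This is a simplified illustration, not the procedure implemented by X!Tandem or Protein Pilot: the function name `fdr_threshold`, the input format, and the plain decoys/targets estimator are assumptions made for the example.

```python
def fdr_threshold(psms, max_fdr=0.01):
    """Find the lowest score cut-off that keeps the estimated FDR <= max_fdr.

    psms: iterable of (score, is_decoy) pairs; a higher score is a better match.
    The FDR at each cut-off is estimated as (#decoy hits) / (#target hits)
    among all matches scoring at or above that cut-off (a common simple
    estimator; ASSUMPTION, search engines may use refined variants).
    Returns None if no cut-off satisfies the requested FDR.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    best = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            best = score
    return best
```

Matches scoring at or above the returned value would then be retained; the additional requirement of at least two PSMs per protein would be applied afterwards.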
Top-down proteomics. For top-down proteomics, we dissolved 0.2 mg crude venom and saliva in 20 µl aqueous 1% (v/v) HFo, to a final concentration of 10 mg/ml, and centrifuged at 20,000 x g for 5 min. Dissolved samples were then mixed with 60 µl of citrate buffer (0.1 M, pH 4.2) and divided into two 40 µl aliquots. One aliquot was mixed with 10 µl ultra-pure water, while the other was mixed with 10 µl of tris(2-carboxyethyl)phosphine (TCEP, 0.5 M) to chemically reduce existing disulfide bonds. The reaction mixtures were incubated for 30 min at 65 °C. Subsequently, the samples were centrifuged at 20,000 x g for 5 min, before 20 µl of both reduced and non-reduced samples were submitted to HPLC-high-resolution (HR) MS/MS measurements. Top-down LC-ESI-HR-MS experiments were performed on an LTQ Orbitrap XL mass spectrometer (Thermo, Bremen, Germany) coupled to an Agilent 1260 HPLC system (Agilent, Waldbronn, Germany), using a Supelco Discovery 300 Å C18 (2 x 150 mm, 3 µm particle size) column. The flow rate was set to 0.3 ml/min and a gradient of 0.1% HFo in water (solution A) and 0.1% HFo in ACN (solution B) was used. The gradient started isocratically (5% B) for 5 min, followed by an increase from 5 to 40% B over 85 min, 40-70% B over 20 min, thereafter a washout at 70% B for 10 min, and ended with a re-equilibration phase at 5% B for 10 min. ESI settings were 7 L/min sheath gas; 25 L/min auxiliary gas; spray voltage, 4.8 kV; capillary voltage, 30 V; tube lens voltage, 170 V and capillary temperature, 330 °C. The survey scan was performed with mass resolution (R) of 100,000 (at m/z 400). The MS2 spectra were obtained in a data-dependent acquisition (DDA) mode with R = 100,000 (at m/z 400) and two scan events, where the most abundant ion of the survey scan with calculable charge was selected. The FTMS measurements were performed with 2 micro scans and 500 ms maximal fill time. AGC targets were set to 10^6 for full MS scans and 3 × 10^5 for MS/MS scans.
The normalized collision energy (CID) was adjusted to 30% and the high energy C-trap dissociation (HCD) to 35%. The activation time was set to 30 msec and the default charge state to z = 10 for the CID scan event or z = 7 for the HCD scan event. The precursor selection window was set to 2 m/z. Dynamic exclusion was performed with a 3 m/z exclusion window for precursor ions with 2 repeats within 10 s. The exclusion list contained maximal 50 ions for a duration of 20 s. For data analysis, raw data were converted to .mzXML files using MSconvert of the ProteoWizard package (version 3.065.85) and multiple charged spectra were deconvoluted using MS-Deconv (version 0.8.0.7370). The maximum charge was set to 30, maximum mass was set to 50,000, signal-to-noise threshold was set to 2, and m/z tolerance was set to 0.02 amu. Protein spectra matching was performed using TopPIC version 1.0.0 (http://proteomics.informatics.iupui.edu/software/toppic/) against all translated predicted protein encoding genes in the S. paradoxus genome, and a set of protein sequences found as typical contaminants from the common Repository of Adventitious Proteins (CRAP), in total 18,272 sequences. TopPIC mass error tolerance was set to 20 ppm. A false discovery rate (FDR) cut-off was set to 0.01. Maximal allowed unexpected PTMs was set to one. The deconvolution of isotopically resolved spectra was carried out by using the XTRACT algorithm of Xcalibur Qual Browser (Thermo, Bremen, Germany). The intact mass extracted ion chromatograms (XICs) were generated by deconvolution of the MS raw data using XTRACT of the Xcalibur Qual Browser version 2.2 (Thermo, Bremen, Germany). MZmine2 (version 2.25) was used for intact mass feature findings of the mono-isotopic deconvoluted MS spectra for the native venom and saliva samples. The mass alignment for the creation of XICs was performed with a minimum peak width of 30 s and 3.0e4 peak height. 
A 1.0e4 signal intensity threshold for the peak selection was used and mass error tolerance was set to 10 ppm. The baseline cutoff algorithm for chromatographic deconvolution was set to 1.0e4 signal threshold and the maximum peak width was set to 10 min. Feature alignment was performed with 10 ppm mass accuracy and 0.5 min retention time tolerance.
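To make the alignment tolerances concrete, the following minimal sketch shows how a relative mass error in ppm is computed and how two intact-mass features could be tested against the 10 ppm and 0.5 min limits stated above. It is not MZmine2 code; `ppm_error`, `features_match`, and the (mass, retention time) tuple layout are hypothetical names chosen for illustration.

```python
def ppm_error(observed_mass, reference_mass):
    """Relative mass difference in parts per million."""
    return (observed_mass - reference_mass) / reference_mass * 1e6


def features_match(feature_a, feature_b, ppm_tol=10.0, rt_tol_min=0.5):
    """Decide whether two (mass in Da, retention time in min) features align.

    Defaults mirror the 10 ppm mass accuracy and 0.5 min retention time
    tolerance given in the text.
    """
    (mass_a, rt_a), (mass_b, rt_b) = feature_a, feature_b
    within_mass = abs(ppm_error(mass_a, mass_b)) <= ppm_tol
    within_rt = abs(rt_a - rt_b) <= rt_tol_min
    return within_mass and within_rt
```

For a 10 kDa protein, a 10 ppm window corresponds to a mass difference of 0.1 Da.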
Details of all peptide/protein spectrum matching for the various proteomic experiments are displayed in SI Appendix File S1. Data repository. Mass spectrometry based proteomics data (.mgf, .raw, .mzXML and PSM/ PrSM output files as well as deconvoluted spectra) have been deposited to ProteomeXchange (http://proteomecentral.proteomexchange.org) with the ID PXD009593 via the Mass Spectrometry Interactive Virtual Environment (MassIVE, https://massive.ucsd.edu/) with the accession number MSV000082307.

Evolutionary analyses Phylogenetic analyses.
To infer the molecular evolution of tetrapod kallikreins, we used the 26 nucleotide sequences identified in the Solenodon paradoxus genome as queries, and retrieved homologous sequences for representative vertebrates, with a focus on mammals, from NCBI's 'non-redundant' database (http://www.ncbi.nlm.nih.gov/) and vertebrate genomes in Ensembl (23). This dataset was supplemented with sequences sourced from a prior phylogenetic analysis of the kallikrein gene family (24). The resulting nucleotide sequences were aligned using MUSCLE (25), and are displayed in SI Appendix File S2. Next, Bayesian inference was implemented in MrBayes v3.2.6 (26) for phylogenetic reconstructions. The analysis was executed for 2 × 10^8 generations with four parallel runs each with six simultaneous Markov chain Monte Carlo simulations, sampling every 100th tree and parameter set, and using a mixed model of evolution. The log-likelihood score of every sample was plotted against the number of generations to identify the point at which the log-likelihood scores reached an asymptote. The first 25% of the sampled trees and model parameters were discarded as burn-in, while the remainder were used for generating the consensus tree. The posterior probabilities for the nodes were estimated by creating a majority-rule consensus tree from all trees retained after burn-in. Phylogenetic reconstructions were also performed using a maximum likelihood approach implemented in PhyML 3.0 (27). The optimal model of nucleotide substitution was identified as GTR+G+I, and the tree topology was determined by implementing this model and using the Subtree Pruning and Regrafting (SPR) method, with 100 bootstrapping replicates. For amino acid analyses, we used a pruned dataset that retained representative sequences from Homo, Mus and Solenodon for all kallikrein paralogs (and other more distantly related serine proteases), in addition to a broad range of KLK1 sequences sourced from a diverse array of mammalian taxa.
The resulting dataset was aligned using MUSCLE (25), and the alignment can be found in SI Appendix File S3. Bayesian inference analysis was undertaken as described above (2 × 10^8 generations in six Markov chains, sampling every 100th tree and model parameters, 25% burn-in), except that we implemented a WAG+G model of sequence evolution selected by ModelGenerator (28).
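The sampling and burn-in arithmetic, and the way node posterior probabilities follow from the retained trees, can be sketched as below. This is an illustration under simplifying assumptions rather than MrBayes code: trees are represented as sets of clades (each clade a frozenset of taxon names), the function names are hypothetical, and the sample taken at generation 0 is ignored.

```python
def retained_samples(generations, sample_every, burnin_fraction):
    """Return (total, discarded, retained) sample counts for one run."""
    total = generations // sample_every
    discarded = int(total * burnin_fraction)
    return total, discarded, total - discarded


def clade_support(sampled_trees, clade, burnin_fraction=0.25):
    """Posterior probability of a clade: its frequency among post-burn-in trees.

    sampled_trees: list of trees in sampling order, each a set of frozensets.
    """
    start = int(len(sampled_trees) * burnin_fraction)
    kept = sampled_trees[start:]
    return sum(clade in tree for tree in kept) / len(kept)


# 2 x 10^8 generations sampled every 100th generation with 25% burn-in
total, discarded, retained = retained_samples(200_000_000, 100, 0.25)
```

Under these settings each run yields 2,000,000 samples, of which 500,000 are discarded and 1,500,000 contribute to the consensus tree.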

Selection analyses.
To assess the nature of natural selection underpinning the evolution of kallikreins, we used various site-, branch-, and branch-site maximum likelihood models implemented in CodeML of the PAML (Phylogenetic Analysis by Maximum Likelihood) package (29). The sequence alignment and consensus tree generated above were used for these analyses. To detect positive selection, the ratio of non-synonymous to synonymous substitutions (ω) was estimated, and nested models M7 (null model) and M8 (alternate model) were compared using a likelihood ratio test (LRT) to determine statistical significance. The Bayes Empirical Bayes (BEB) method implemented in site model 8 (M8) (30) was used to identify amino acid sites under the influence of positive Darwinian selection. Mixed Effects Model of Evolution (MEME) (31) and Fast Unconstrained Bayesian AppRoximation (FUBAR) analyses implemented in the Datamonkey webserver (32) were performed to determine the episodic and pervasive influence of diversifying selection. For assessing the influence of selection on the eulipotyphlan KLK1 clade (which incorporates KLK1 and those arbitrarily annotated as KLK2 and KLK3 in Homo sapiens), the branch-specific two-ratio model (33,34) was employed by selecting the KLK1 clade as the foreground branch, and: i) constraining (assuming no positive selection); and ii) relaxing (indicating positive selection) the ω parameter. An LRT was conducted comparing the null model (neutral evolution) against the alternate model (positive selection). Subsequently, the Adaptive Branch-Site Random Effects Likelihood (aBSREL) approach (35), an improved version of the common branch-site models, was used to identify branches experiencing diversifying selection.
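The likelihood ratio tests mentioned above compare twice the difference in log-likelihood between nested models against a chi-square distribution. The sketch below shows that calculation using closed-form chi-square tail probabilities for one and two degrees of freedom. The function name and the example log-likelihood values are hypothetical, and the degrees of freedom used here (2 for M7 versus M8, 1 for a constrained versus relaxed ω on the foreground branch) are the conventional choices, assumed rather than taken from the text.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt, df):
    """Likelihood ratio test for nested models.

    Returns (test statistic, p-value) where the statistic is
    2 * (lnL_alt - lnL_null), compared with a chi-square distribution.
    Closed-form survival functions are used for df = 1 and df = 2 only.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    if df == 1:
        p_value = math.erfc(math.sqrt(stat / 2.0))
    elif df == 2:
        p_value = math.exp(-stat / 2.0)
    else:
        raise ValueError("only df = 1 or df = 2 are supported in this sketch")
    return stat, p_value


# Hypothetical log-likelihoods for an M7 (null) versus M8 (alternate) comparison
stat, p_value = lrt_pvalue(lnl_null=-1000.0, lnl_alt=-997.0, df=2)
```

With these illustrative values the statistic is 6.0 and the p-value is just below 0.05.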
Structural Analyses. Three-dimensional homology models were generated for various tetrapod kallikrein homologs using the Phyre2 server (36), and ConSurf (37,38) was used for highlighting the evolutionary variability in amino acid sites. PyMOL 2.2 (The PyMOL Molecular Graphics System, Version 2.0, Schrödinger, LLC.) was used for visualization and for the generation of images of the homology models.

Sequence analyses.
A sequence alignment of representative eulipotyphlans was constructed by pruning the dataset described above to only include KLK1 representatives from Blarina brevicauda, Sorex araneus, Condylura cristata, Erinaceus europaeus, Solenodon paradoxus, and Homo sapiens as an outgroup comparator. The sequences were translated in MEGA v.7 (39), realigned with MUSCLE (25) and then manually inspected. Jalview v.2.10.1 (40) was then used to annotate the alignment with sequence conservation grading. Next, we identified the catalytic triad residues and the five regulatory loops described by Aminetzach et al. (41), and subjected the amino acid residues found within loops 1, 2, 3 and 5 of the various venom-derived and non-venom KLK1s to analyses of hydropathy and charge. Regulatory loop 4 was excluded from this analysis due to the short length of the loop (three amino acids). The hydropathicity and net charge of each loop were calculated for each sequence using the ProtParam tool of the ExPASy Bioinformatics Resource Portal (https://web.expasy.org/protparam/). Hydropathicity was calculated as the grand average of hydropathicity, and charge was determined based on the presence of the negatively charged amino acids aspartic acid and glutamic acid, and the positively charged amino acids arginine and lysine. Statistical comparisons of the hydropathicity and net charge of venom and non-venom KLKs were performed using unpaired two-tailed t-tests in Graphpad Prism (La Jolla, USA).
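For readers wishing to reproduce the loop-level calculations without the web tool, the two quantities can be sketched as follows. The grand average of hydropathicity is the mean Kyte-Doolittle value across residues, and net charge is taken here as the count of Arg and Lys minus the count of Asp and Glu, following the description above. The function names and the example peptide strings used to exercise them are hypothetical and are not loop sequences from this study.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathicity: mean hydropathy per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def net_charge(sequence):
    """(Arg + Lys) minus (Asp + Glu), ignoring His and the termini."""
    positive = sum(sequence.count(aa) for aa in "RK")
    negative = sum(sequence.count(aa) for aa in "DE")
    return positive - negative
```

A positive GRAVY value indicates a hydrophobic loop on average, and a negative value a hydrophilic one.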
Genomic organisation of kallikreins. For the de novo annotation of KLK exons and synteny comparisons of mammalian genomic data we used the method described in Koludarov and Aird (42). We extracted exons that corresponded to KLK genes according to published annotations, and then used BLAST (BLASTn, e-value of 0.05, default restrictions on word count and gaps) to determine homology of those exons. This step was necessary, since many ab initio annotated KLK genes have a varying number of exons. As anticipated, this was an annotation-related artefact and in the final analysis no gene had more than five exons. By removing all unique exons, we created the initial exon database that was used to BLAST-search genomic sequences from a wide taxonomic variety of mammalian species used in this study (see Fig. 4). This approach uncovered exons that were absent from published annotations, and by including those newly found sequences in our database, we were able to refine our process and subsequently repeated the BLAST search using the tblastn function of NCBI-BLAST, with e-value cutoffs of 0.01. This process was repeated until no new exons were discovered. We then manually assessed each result and established exon boundaries using Geneious v11 (https://www.geneious.com), relying on previously existing transcriptome-verified exon annotations wherever possible. We paid close attention to variations in exon boundaries between the different KLK genes and between different lineages.
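The iterative search described above, in which newly discovered exons are added to the query database until no further exons are found, is a fixed-point loop. The sketch below captures only that control flow. The `search` callable is a hypothetical stand-in for a BLAST run against the genomic sequences and does not correspond to any real BLAST API; the function name is likewise an assumption.

```python
def iterate_exon_search(seed_exons, search):
    """Expand an exon database until a search round finds nothing new.

    seed_exons: iterable of exon identifiers forming the initial database.
    search: HYPOTHETICAL callable taking one exon identifier and returning
            an iterable of homologous exon identifiers (a stand-in for a
            BLAST search with the chosen e-value cut-off).
    Returns the final set of exon identifiers.
    """
    database = set(seed_exons)
    while True:
        hits = set()
        for exon in database:
            hits.update(search(exon))
        new_exons = hits - database
        if not new_exons:
            return database
        database |= new_exons
```

In practice each round would be followed by the manual inspection of hits and exon boundaries described in the text.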
Reconstructing the evolutionary origin of venom. Ancestral trait reconstruction of venom in Eulipotyphla was performed with Ape (version 5.2) (43) and Phytools (version 0.4.98) (44) packages in R (45). We treated venom as a trait with a binary distribution (i.e., venomous and non-venomous), and ancestral states were estimated using 'equal rate' (ER) and 'all rates different' (ARD) models, where both forward and backward rates are, respectively, fixed or flexible. The marginal ancestral states (empirical Bayesian posterior probabilities) were estimated for each node in a eulipotyphlan species tree derived from prior studies (46,47). Finally, a stochastic character mapping analysis (48) was performed for 1,000 simulations, and a trait density map was generated to depict the posterior probabilities of states across all edges and nodes of the species tree.
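The difference between the 'equal rate' and 'all rates different' models can be made explicit for a binary trait. Under a two-state continuous-time Markov model, the probability of changing state along a branch has a closed form, and the ER model simply constrains the gain and loss rates to be equal. The sketch below is a minimal illustration of that model and not the Ape or Phytools implementation; the function name, the state coding (0 = non-venomous, 1 = venomous) and the rate values are hypothetical.

```python
import math


def transition_probs(q01, q10, t):
    """Transition probability matrix P(t) for a two-state Markov model.

    q01: rate of change from state 0 to state 1 (e.g. gain of venom)
    q10: rate of change from state 1 to state 0 (e.g. loss of venom)
    t:   branch length
    Returns [[P00, P01], [P10, P11]].
    """
    total = q01 + q10
    decay = math.exp(-total * t)
    p01 = q01 / total * (1.0 - decay)
    p10 = q10 / total * (1.0 - decay)
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


# ER model: a single rate shared by gains and losses (illustrative value)
er_matrix = transition_probs(q01=0.2, q10=0.2, t=1.0)
# ARD model: gains and losses estimated separately (illustrative values)
ard_matrix = transition_probs(q01=0.3, q10=0.1, t=1.0)
```

Under ER the matrix is symmetric, whereas under ARD the long-branch limit of the gain probability approaches q01 / (q01 + q10).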
In vitro assessments of venom function Serine protease chromogenic assay. We applied a chromogenic assay, using the serine protease-specific chromogenic substrate S-2288 (Cambridge Biosciences), to measure the serine protease activity of the solenodon and snake venoms and of the solenodon saliva. The reactions were plated in triplicate onto 384-well plates and changes in absorbance were measured at 405 nm for ~30 minutes (kinetic cycle of 21 s) using a FLUOstar Omega microplate reader (BMG Labtech GmbH, Ortenberg, Germany). We initially added 15 μl of diluted venom or saliva (1 μl of venom (1 μg) + 14 μl PBS) to the plate, followed by a 3 min incubation at 37 °C. We then added 15 μl Tris buffer (100 mM Tris, 100 mM NaCl, pH 8.5) and incubated for another 3 min at 37 °C. Lastly, we added 15 μl of the 6 mM S-2288 chromogenic substrate to each well and set the reaction to run at 37 °C in the plate reader. A negative control consisting of no venom (15 μl PBS + 15 μl Tris buffer + 15 μl substrate) was used in every experiment, and a positive control, containing 1 μg Bitis arietans snake venom, was used to validate the assay. The mean absorbance was plotted against time to compare venom activity with the baseline (negative controls) and positive control readings (SI Appendix Fig. S1). We then subtracted the mean of the negative control readings from each of the venom/saliva and positive control readings and calculated the rate of substrate consumption for each sample by measuring the slope after a specific time interval (t1 ~ 5 min), as follows:

$$\text{rate} = \frac{A_{405}(t_1) - A_{405}(t_0)}{t_1 - t_0}$$

where t0 denotes the start of the interval and the absorbances at 405 nm represent the adjusted absorbances from which the negative control has been subtracted. We then plotted the means and the standard error of the mean (SEM) for each sample (n = 3 independent repeats), and performed statistical comparisons of solenodon venom and saliva using an unpaired two-tailed t-test in Graphpad Prism (La Jolla, USA).
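The blank-subtracted slope calculation described above can be sketched as a short function. This is an illustration only: the function name, the input layout, the use of the first reading as the start of the interval, and the example absorbance values are assumptions, and the readings shown are invented for demonstration rather than measured data.

```python
def substrate_rate(times_min, sample_abs, blank_abs, t1_min=5.0):
    """Rate of substrate consumption as the change in adjusted A405 per minute.

    times_min:  reading times in minutes, ascending
    sample_abs: mean A405 of the sample at each time
    blank_abs:  mean A405 of the negative control at each time
    t1_min:     end of the interval (about 5 min in the text)
    ASSUMPTION: the interval starts at the first reading.
    """
    adjusted = [s - b for s, b in zip(sample_abs, blank_abs)]
    # Index of the first reading taken at or after t1
    i1 = next(i for i, t in enumerate(times_min) if t >= t1_min)
    return (adjusted[i1] - adjusted[0]) / (times_min[i1] - times_min[0])


# Invented example readings (not data from this study)
rate = substrate_rate(
    times_min=[0.0, 2.5, 5.0, 7.5],
    sample_abs=[0.10, 0.35, 0.60, 0.85],
    blank_abs=[0.10, 0.10, 0.10, 0.10],
)
```

With these invented readings the adjusted absorbance rises by 0.5 units over 5 min.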
Kininogen and fibrinogen degradation gel electrophoresis. We used a degradation SDS-PAGE gel electrophoresis approach (50) to determine whether fibrinogen (Sigma-Aldrich) or high molecular weight kininogen (HMWK) (HK1300, Enzyme Research Laboratories) was cleaved by solenodon venom or saliva. We included the following experimental samples containing: 5 μg of fibrinogen; 5 μg of venom; 5 μg of fibrinogen and 5 μg of venom; 5 μg of saliva; and 5 μg of fibrinogen and 5 μg of saliva. Similarly, we used 5 μg of HMWK instead of fibrinogen and the same setup as above to test whether the degradation of HMWK occurred. Samples were either incubated for 60 minutes at 37 °C before the addition of a reducing protein loading buffer at a ratio of 1:1, or directly mixed with the loading buffer and loaded onto gels (no preincubation). The samples were loaded onto 12-well 4-20% Novex Tris-Glycine gels (ThermoFisher) alongside a protein marker (Broad Range Molecular Marker, Promega) and run at 120 V for 1 h. The resulting gels were stained with Coomassie brilliant blue for 1 h, and then destained (4.5:1:4.5 methanol:acetic acid:H2O) for visualisation.
Plasminogen activation assay. We used a kinetic assay, modified from that recently described (51), to indirectly monitor the plasminogen cleavage activity of the solenodon or snake venoms and of the solenodon saliva. The resulting plasmin activity was then detected via the cleavage of the H-D-Val-Leu-Lys-AMC fluorescent substrate (I-1390, Bachem). The samples were prepared in a final volume of 10 μl in assay buffer (100 mM Tris-HCl pH 7.5, 0.1% BSA) and contained 1 μg of venom or saliva. As a negative control, 10 μl samples containing assay buffer only were prepared. In addition, as a positive control, samples containing 600 ng of recombinant kallikrein (ab117200, Abcam) were included. The samples were pipetted in triplicate onto 384-well plates, followed by the addition of 50 μl/well of a mix containing the plasminogen (SRP6518, Sigma-Aldrich) and fluorescent substrate in assay buffer (final mix concentrations: 200 ng/ml plasminogen and 5 µM substrate). The increase in fluorescence was monitored on a FLUOstar Omega microplate reader for 45 min (kinetic cycle of 9 seconds, 300 cycles) using an excitation wavelength of 355 nm and an emission wavelength of 460 nm. The areas under the curve (AUCs) were calculated for the 0-30-minute interval, which was chosen as the point where the fluorescence in the solenodon venom- and saliva-containing samples had both reached a plateau. Using the AUCs, mean values and SEM were calculated for each sample (n = 3 independent replicates), and then statistical comparisons of solenodon venom and saliva were undertaken using an unpaired two-tailed t-test in Graphpad Prism (La Jolla, USA).
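The area-under-the-curve and summary statistics described above can be sketched as follows. The trapezoidal rule is assumed here as the integration method, since the text does not state which was used, and the function names and example traces are hypothetical.

```python
import math
import statistics


def auc_trapezoid(times, values, t_max):
    """Area under a kinetic trace from the first reading up to t_max.

    ASSUMPTION: trapezoidal integration; intervals ending after t_max
    are excluded rather than interpolated.
    """
    area = 0.0
    points = list(zip(times, values))
    for (t0, y0), (t1, y1) in zip(points, points[1:]):
        if t1 > t_max:
            break
        area += (t1 - t0) * (y0 + y1) / 2.0
    return area


def mean_sem(replicates):
    """Mean and standard error of the mean for independent replicates."""
    mean = statistics.mean(replicates)
    sem = statistics.stdev(replicates) / math.sqrt(len(replicates))
    return mean, sem


# Invented linear trace (not data from this study), integrated over 0-30 min
area = auc_trapezoid([0, 10, 20, 30, 40], [0, 10, 20, 30, 40], t_max=30)
```

The per-replicate AUCs would then be passed to `mean_sem` before the statistical comparison.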
Identification of plasminogen activating toxins via nanofractionation. Solenodon venom was fractionated by liquid chromatography (LC) in parallel with at-line nanofractionation, with subsequent mass spectrometric analyses for the identification of bioactive components. For LC separation, we used a Shimadzu UPLC system ('s Hertogenbosch, The Netherlands) and a Shimadzu SIL-30AC autosampler for the injection of 50 µl of 5 mg/mL venom. A total flow rate of 500 µl/min was provided by two Shimadzu LC-30AD pumps. For the separation of venom we used a 250x4.6 mm Waters Xbridge Peptide BEH300 C18 analytical column with a 3.5-μm particle size and a 300-Å pore size, and the Shimadzu CTD-30A column oven was kept at 30°C. The mobile phase A comprised of 98% H2O, 2% acetonitrile (ACN) and 0.1% formic acid (HFo), and mobile phase B comprised of 98% ACN, 2% H2O and 0.1% HFo. The following gradient was used: an increase of mobile phase B from 0% to 50% in 20 min, followed by an increase from 50% to 90% B in 4 min and a 5 min isocratic separation at 90% B. Subsequently, the starting conditions were reached at 1 min, followed by column equilibration for 10 min at 0% B. There was a post-column flow split in a 1:9 ratio. 90% of the fraction was sent to a nanofraction collector, a modified Gilson 235P autosampler. The other 10% was sent to a Shimadzu SPD-M30A photodiode array detector and a maXis impact quadrupole-time-of-flight (qTOF) mass spectrometer (Bruker, Bremen, Germany). The mass spectrometer was equipped with an electrospray ionization source (ESI) and was operated in positive-ion mode. The following ESI source parameters were used: source temperature 180°C, capillary voltage 4.5 kV, nebulizer at 0.4 Bar and dry gas flow 4 l/min. Full MS spectra were recorded in a m/z 50-3000 range at 1 spectrum/s rate. Bruker Compass software was used for controlling the instrument and data analysis. 
The nanofractions were collected in a 384-well plate in a column serpentine-like fashion using Ariadne, in-house custom software, which allowed for 6-s nanofractions to be collected into black 384-well plates (Greiner Bio One, Alphen aan den Rijn, The Netherlands) with a maximum of six plates in one sequence. 350 wells of the 384-well plate were collected for each chromatographic run. The plates were evaporated overnight after nanofractionation, for approximately 16 h using a Christ Rotational Vacuum Concentrator (Salm en Kipp, Breukelen, The Netherlands) RVC 2-33 CD plus. After freeze-drying, the plates were stored at -20 °C until further use for either tryptic digestion or for bioassaying. The previously developed plasmin assay referred to above (Zietek et al. 2018) was adapted in order to screen for plasminogen activating activity on the resulting nanofractionated plates. The assay was performed by making a bioassay mix containing 200 ng/ml plasminogen and 5 µM of the fluorogenic substrate H-D-Val-Leu-Lys-AMC (I-1390, Bachem) dissolved in 100 mM TRIS-HCl buffer (pH 7.5), containing 0.1% BSA (w/v). The mixture was prepared by adding the same volumes of enzyme and substrate solutions into the buffer. Immediately after preparation, the mixture was dispensed onto the nanofractionated plates using a Multidrop 384 reagent dispenser (Thermo Scientific, Ermelo, The Netherlands). A VarioSkan LUX microplate multimode reader (Thermo Scientific, Ermelo, The Netherlands) was then used to measure the fluorescence of each well kinetically at 380 and 460 nm excitation and emission wavelengths, respectively, with a bandwidth of 12 nm. The temperature inside the plate reader was kept at 37 °C throughout. The measurements consisted of 30 cycles, resulting in the construction of a kinetic curve and its corresponding slope, which was plotted against the time of the collected fractions, resulting in a bioactivity chromatogram.
We identified proteins in the wells responsible for conferring plasminogen activating activity via nanoLC-MS/MS analysis of tryptic digests. We used an UltiMate 3000 RSLCnano system (Thermo Fisher Scientific, Ermelo, The Netherlands) for NanoLC separation of the tryptic digests, with full-loop injection mode run by the autosampler. A 1 µl injection volume was used by the autosampler, and after injection the samples were separated on an analytical Aqua C18 capillary column (150 mm x 75 µm) packed in-house (3 µm particle size and 200 Å pore diameter; Phenomenex, Utrecht, The Netherlands). The mobile phases consisted of eluent A (98% H2O, 2% ACN, 0.1% HFo) and eluent B (98% ACN, 2% H2O, 0.1% HFo). The following gradient was used for separation: 2 min isocratic at 5% solvent B, linear increase to 80% solvent B in 15 min, 3 min isocratic at 80% solvent B, down to 5% solvent B in 0.5 min and finally column equilibration for 9 min. The column oven was kept at 30°C throughout. A Variable Wavelength Detector set at 254 nm followed by a Bruker Maxis q−TOF mass spectrometer (Bruker, Bremen, Germany) were used for detection. The mass spectrometer had an electrospray ionization (ESI) source and was operated in positive-ion mode. The parameters for the ESI source of the MS instrument comprised of: capillary voltage 4.5 kV, gas flow 10 l/min and source temperature 200 °C. Mass spectra were obtained in the range of 50 to 3000 m/z and at 1 spectrum/s. Data-dependent mode was used to obtain MS/MS spectra by using 35-eV collision energy in the CID collision cell. Bruker Compass software was used for the instrument control and data analysis. Finally, we used MASCOT (Matrix Science, London, United Kingdom) for protein identification of the analysed tryptic digests, via a search against translations of the genes annotated in the Solenodon paradoxus genome. 
The following search parameters were used: instrument type: ESI-QUAD-TOF; digestion enzyme: semiTrypsin, allowing for one missed cleavage; fixed modification: carbamidomethyl on cysteine; variable modifications: amidation (Protein C-term) and oxidation on methionine; fragment mass tolerance: ± 0.05 Da; and peptide mass tolerance: ± 0.2 Da. Details of all peptide/protein spectrum matching for the identification of bioactive proteins are displayed in SI Appendix File S4. The raw data have also been deposited to ProteomeXchange (http://proteomecentral.proteomexchange.org) with the ID PXD009593 via the Mass Spectrometry Interactive Virtual Environment (MassIVE, https://massive.ucsd.edu/) with the accession number MSV000082307.
Electrophysiology. We performed patch-clamp electrophysiology experiments using TE671 human rhabdomyosarcoma cells endogenously expressing embryonic muscle-type nicotinic acetylcholine receptors (nAChR) (52) and Nav1.7 voltage-gated sodium channels (VGSC) (53,54). Cells were cultured in 4.5 g/l glucose Dulbecco's modified Eagle's Medium (DMEM, Sigma) supplemented with 10% foetal bovine serum, 2 mM glutamine, 10 IU/ml penicillin and 20 µg/ml streptomycin (Sigma). Cells were plated in 35 mm Petri dishes over heat-sterilised sections of glass coverslips in 2 ml of DMEM and kept in a 36.5 °C incubator with 5% CO2. Locust (Schistocerca gregaria) primary neurons (natively expressing insect neuronal nAChR) were dissected from the mushroom bodies of 6th instar locusts and dissociated by incubating in Rinaldini's saline (135 mM NaCl, 25 mM KCl, 0.4 mM NaHCO3, 5 mM D-glucose, 5 mM HEPES, pH 7.2 with NaOH, with 2 mg/ml collagenase and 0.5 mg/ml dispase) for 15 minutes at 36.5 °C. Cells were gently triturated with a 200 µl Gilson pipette tip and plated over poly-L-lysine coated glass coverslips in 5:4 DMEM:Schneider's medium and incubated for 12 hours at 36.5 °C with 5% CO2. Patch-clamp experiments were performed within 36 hrs of dissection. Patch pipettes with a resistance of 5-7 MΩ were pulled from thick-walled borosilicate glass (World Precision Instruments, USA) using a P-97 Flaming/Brown micropipette puller (Sutter Instrument Co., USA) and filled with a caesium pipette solution (140 mM CsCl, 10 mM NaCl, 1 mM MgCl2, 11 mM EGTA and 5 mM HEPES, pH 7.2 with CsOH). The bath solution for TE671 cells was 135 mM NaCl, 5.4 mM KCl, 1 mM CaCl2, 1 mM MgCl2, 5 mM HEPES and 10 mM D-glucose (pH 7.4 with NaOH), and for locust neurons was 180 mM NaCl, 10 mM KCl, 2 mM CaCl2, 10 mM HEPES, pH 7.2. Whole-cell currents were monitored using an Axopatch 200A (Axon Instruments, USA) patch-clamp amplifier and recorded using WinWCP V4.5.7 software (Dr John Dempster, University of Strathclyde).
Venom and agonist were applied to cells using a DAD-12 Superfusion system (Adams and List Associates, USA) fitted with a 100 µm polyamide coated quartz output tube with a solution exchange time of 30-100 ms. Control responses and those in the presence of venom were obtained on the same cell and repeated for n≥4 separate cells. Series resistance was compensated by 75% to minimize any voltage errors, and data were filtered at 10 kHz. Data were analysed using GraphPad Prism 7.

In vivo assessments of venom function
Locust toxicity assay. Fifth instar desert locusts (Schistocerca gregaria) were injected at 0.05 ml/g with venom at various doses (0.1, 1, 10 and 50 µg/g solenodon venom, n=4 per dose) in sterile filtered (0.22 µm, Sarstedt) insect Ringer's saline (4.35 mM CaCl2(+2H2O), 3.81 mM NaHCO3, 4.3 mM KCl and 0.17 mM NaCl, adjusted to pH 7.2 with NaOH). The injection was made between the 1st and 2nd ventral tergite, towards the thorax, using a 300 µl Micro-Fine insulin syringe. Ringer alone was used for controls (n = 4). Injected locusts were placed individually in ventilated livefood tubs and kept at 26 °C. Their status was recorded at 24, 48 and 72 hours post injection as either alive, incapacitated (unable to self-right within 10 s when placed on back) or dead.
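Because the injection volume is fixed per gram of body mass (0.05 ml/g), each dose in µg/g implies a single venom concentration in the injected solution. The following minimal sketch shows the arithmetic: the function names are hypothetical, and the 1.5 g body mass in the usage example is an illustrative assumption, not a value from the study.

```python
# Values from the text: injection volume and the four venom doses.
INJECTION_VOLUME_ML_PER_G = 0.05
DOSES_UG_PER_G = [0.1, 1, 10, 50]


def venom_concentration_ug_per_ml(dose_ug_per_g: float,
                                  volume_ml_per_g: float = INJECTION_VOLUME_ML_PER_G) -> float:
    """Concentration needed so that volume_ml_per_g delivers dose_ug_per_g."""
    return dose_ug_per_g / volume_ml_per_g


def injection_volume_ul(body_mass_g: float,
                        volume_ml_per_g: float = INJECTION_VOLUME_ML_PER_G) -> float:
    """Volume to inject, in microlitres, for an animal of the given mass."""
    return body_mass_g * volume_ml_per_g * 1000.0


if __name__ == "__main__":
    for dose in DOSES_UG_PER_G:
        conc = venom_concentration_ug_per_ml(dose)
        print(f"{dose} ug/g -> {conc:.0f} ug/ml")
    # Hypothetical 1.5 g locust (illustrative mass only).
    print(f"Injection volume for 1.5 g: {injection_volume_ul(1.5):.0f} ul")
```

Under these assumptions the four doses correspond to 2, 20, 200 and 1000 µg/ml solutions, and the injected volume scales with the mass of each animal.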

Centipede toxicity assay.
To test the toxicity of solenodon venom against centipedes we injected juvenile giant centipedes (Ethmostigmus rubripes; 4-5 cm total length) bred in the lab from specimens collected in Brisbane, Queensland, Australia, with 2 µl of venom at various doses (20 µg/g and 100 µg/g solenodon venom in insect Ringer's saline, n = 5). Negative controls were injected with Ringer's saline only (n = 5). The injection was done using a 1 ml 29G insulin syringe into the 5th trunk segment. Injected centipedes were placed individually in ventilated food tubs with a piece of wet paper for moisture, but no substrate, and kept at 25 °C. Their status was recorded at 30 mins, 60 mins and 24 hours post injection as either alive, incapacitated (unable to respond to contact from forceps) or dead.

Pulse-oximeter measurements of mice.
To measure the physiological responses of mice dosed with solenodon venom in a non-invasive manner, we used a MouseOx pulse-oximeter monitoring system (MouseOx, Harvard Apparatus), coupled to MouseOxPlus software (Starr Life Sciences) for data collection and analysis. The experiment consisted of groups of male 20 g CD1 mice (Charles River) receiving either 500 µg solenodon venom (n=3) or PBS, pH 7.2 (n=3) via intravenous injection (100 µl dose/animal via the tail vein). Throughout the experiment, we collected the following physiological parameters: oxygen saturation (% oxygen), pulse rate (beats per minute), respiration rate (breaths per minute) and pulse distension (mmHg). The data were collected via a monitoring collar attached to the recording device, which was applied to the neck of each mouse and held in place until eight values were measured on the resulting trace (typically over a period of ~1 min). To ensure data robustness, readings were only retained where 'error' and 'activity' were zero. Each experimental animal was subjected to five independent measurements (each consisting of the eight values mentioned above) at different timepoints: at baseline (prior to the administration of venom or saline) then at 1 min, 15 min, 30 min and 45 min post-administration. For each time point, mean readings for each experimental animal were calculated, prior to the calculation and plotting of group means and standard deviations expressed as the percentage of baseline readings. This in vivo animal experiment was conducted using protocols approved by the Animal Welfare and Ethical Review Boards of the Liverpool School of Tropical Medicine and the University of Liverpool, and performed in specific pathogen-free conditions under licenced approval of the UK Home Office, in accordance with the Animals (Scientific Procedures) Act 1986 (UK) and institutional guidance on animal care.
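The summary procedure described above (per-animal means at each time point, then group means and standard deviations as a percentage of baseline) can be sketched as follows. This is an illustration under stated assumptions, not the analysis actually run in MouseOxPlus: the function names and example readings are hypothetical, each animal is assumed to be normalised to its own baseline before group statistics are taken, and the sample standard deviation is assumed.

```python
from statistics import mean, stdev


def percent_of_baseline(readings, baseline_key="baseline"):
    """Per-animal means at each time point, as a percentage of that animal's baseline.

    readings maps a time point label to the list of retained readings
    (those with 'error' and 'activity' equal to zero).
    """
    means = {tp: mean(values) for tp, values in readings.items()}
    base = means[baseline_key]
    return {tp: 100.0 * m / base for tp, m in means.items()}


def group_summary(animals, baseline_key="baseline"):
    """Group mean and sample SD of the percent-of-baseline values per time point."""
    per_animal = [percent_of_baseline(a, baseline_key) for a in animals]
    summary = {}
    for tp in per_animal[0]:
        values = [p[tp] for p in per_animal]
        summary[tp] = (mean(values), stdev(values))
    return summary


if __name__ == "__main__":
    # Hypothetical pulse-rate readings (beats per minute) for three animals.
    animals = [
        {"baseline": [600, 620], "1 min": [300, 310]},
        {"baseline": [500, 500], "1 min": [300, 300]},
        {"baseline": [400, 400], "1 min": [280, 280]},
    ]
    for tp, (m, sd) in group_summary(animals).items():
        print(f"{tp}: {m:.1f}% of baseline (SD {sd:.1f})")
```

Normalising each animal to its own baseline before averaging removes between-animal differences in resting values; if the group means were instead normalised after averaging, the resulting standard deviations would differ.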
Mean arterial blood pressure in rats. The effect of S. paradoxus venom on blood pressure was examined in anaesthetized rats, as described previously (55). We first anaesthetized male rats (Sprague-Dawley; 250-320 g) with 100 mg/kg ketamine and 10 mg/kg xylazine (i.p.). A midline incision was made and cannulae inserted into the trachea, jugular vein and carotid artery to enable artificial respiration if required, administration of venom, and recording of arterial blood pressure, respectively. The carotid artery cannula was connected to a PowerLab/400 system via a Gould Statham P23 pressure transducer. Before the injection of venom, blood pressure was allowed to stabilize for at least 10 min. Body temperature was maintained at approximately 37°C using an overhead lamp and heated table. Venom (1 mg/kg; n=5) was administered through the jugular vein and flushed with saline (0.2 ml). Control traces were obtained by flushing only with saline. Animals were assigned randomly to experimental groups, and the experimenters were not blinded to the condition. This procedure was approved by the Monash Animal Research platform (MARP) Animal Ethics Committee, Monash University, Australia (#MARP/2017/147).

Dietary analyses of wild solenodons
To assess the contribution of vertebrate prey to the diet of S. paradoxus, fieldwork was carried out in the Dominican Republic during two seasons: in January to February 2015 (dry season) and July to August 2015 (wet season). Fieldwork was undertaken with permission from the Secretaría de Estado de Medio Ambiente y Recursos Naturales, Dominican Republic. Samples were collected in the Bahoruco-Jaragua-Enriquillo UNESCO Biosphere Reserve region in Pedernales Province along the southwestern edge of Parque Nacional Sierra de Bahoruco at 300-400 masl (~18°08' N, 71°39' W). The natural vegetation of the sampled area is mid-elevation broadleaf forest, although increasing anthropogenic impacts on this landscape have created a mosaic agriculture-forest habitat consisting of unmanaged pastures, small-scale mixed cropland, and primary and secondary forest fragments. Abundant plant families include Malvaceae and Euphorbiaceae, with the leguminous species Acacia macracantha and Prosopis juliflora common in disturbed areas. Geologically, Miocene limestone karst dominates the landscape, often serving as burrows for solenodon family groups and supplying potential food items such as land snails and arachnids. We identified active solenodon foraging sites using "nose-pokes", which are diagnostic conical holes in the soil and leaf litter made by the solenodon's probing long nose. These tracks are often accompanied by diagnostic tail impressions and diggings made by the animal's robust forelimbs. Solenodon faeces can be readily identified by the presence of abundant chitinous millipede exoskeletons in the faecal matrix, and the invertebrate prey diversity of S. paradoxus has previously been described (56). To detect vertebrate prey, we collected fresh faeces (< 2 days) opportunistically when encountered near foraging sites and active burrows.
Individual collected samples were flash preserved in 95% ethanol for <24 hours initially, then decanted with the ethanol completely removed by drying with silica beads. The faeces were then stored dry until later analysis. We collected a total of 64 solenodon faecal samples, with 40 collected in the dry season and 24 in the wet season. DNA was extracted using a PowerFecal® DNA Isolation Kit (MO BIO Laboratories Inc, Carlsbad, CA; Catalog No. 12830-50) to facilitate removal of PCR inhibitors specific to faeces. We then used previously designed vertebrate primer sets for 12S (57) and 16S ("16smam") (58) ribosomal genes to probe for the presence of vertebrate prey DNA in solenodon faeces. Resulting DNA was sequenced on an Illumina MiSeq machine at the National High-throughput DNA Sequencing Centre of Denmark. DNA operational taxonomic unit (OTU) sequences were then compared to Genbank's non-redundant nucleotide and the Barcode of Life Data Systems (BOLD) databases to identify prey items above an 85% identity threshold. To calculate the frequency of occurrence of vertebrate prey, we summed the presence of a food item across all 64 samples (e.g. following the approach of (59)) and calculated the percent frequency occurrence using the following formula: $\mathrm{FO}_i = (N_i / N) \times 100$, where N is the total number of faecal samples considered and Ni is the number of samples containing the food item i (60).
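The percent frequency of occurrence defined above is the share of faecal samples in which a food item is detected (Ni divided by N, multiplied by 100). A minimal sketch of the calculation follows: the function name and the example sample contents are hypothetical and do not represent the study data.

```python
def percent_frequency_of_occurrence(samples, item):
    """Percentage of samples that contain the given food item.

    samples is a list of sets, one per faecal sample, each holding the
    food items detected in that sample.
    """
    n_total = len(samples)  # N: total number of samples considered
    if n_total == 0:
        raise ValueError("at least one sample is required")
    n_with_item = sum(1 for s in samples if item in s)  # Ni
    return 100.0 * n_with_item / n_total


if __name__ == "__main__":
    # Hypothetical detections in four samples (illustrative only).
    samples = [{"bird"}, {"lizard", "bird"}, set(), {"frog"}]
    for item in ("bird", "frog", "mammal"):
        fo = percent_frequency_of_occurrence(samples, item)
        print(f"{item}: {fo:.1f}%")
```

Each sample counts at most once per item however many sequence reads match it, so the measure reflects presence across samples rather than the amount consumed.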
As the identification of vertebrate prey items was reliant upon available reference sequences in Genbank and BOLD, we were unable to identify OTUs to the genus and species level, and our resulting frequency of occurrence value of 12.3% is therefore a conservative estimate. In addition, the sampled region represents just one of many biomes inhabited by S. paradoxus across Hispaniola, and due to the aforementioned impact of anthropogenic disturbance on this region, it is possible that this generalist species may alter its diet, and therefore the proportion of vertebrate prey, to reflect local conditions. As Hispaniola experienced massive vertebrate extinctions in the Mid-Holocene and historic period, it is also possible that their preferred prey items have been lost, forcing solenodons to decrease vertebrate protein in their diet. However, as introduced mammals have largely replaced native mammals in body size (61), it is unclear whether solenodons would have experienced an overall decline in the availability of vertebrate prey.

Figures

Fig. S1. The in vitro activity of S. paradoxus venom. Raw plotted data used to calculate areas under the curve displayed in Fig. 2 for the chromogenic serine protease assay (A) and fluorescent plasminogen activating assay (B). C) Solenodon venom shows no evidence of enzymatic phospholipase activity as measured by fluorescent enzyme assay. D) Solenodon venom and saliva both degrade the alpha and beta chains of fibrinogen (indicated by arrows), but not in a potent manner, as incubation is required to detect extensive degradation. The data displayed in A-C represent every third reading collected (to aid the display of data) and each data point represents the mean of triplicate measurements, with error bars representing SEM.

(41,62,63) are shaded yellow. Orange shading highlights novel insertions within regulatory loops observed in Solenodon venom KLK1s.
Black asterisks highlight the previously proposed catalytic triad residues (41) and red asterisks highlight positively selected sites detected by the Bayes Empirical Bayes approach (see SI Appendix Table S4). Homo sapiens KLK1 is included as an outgroup comparator. B) Hydropathicity and net charge analyses of the regulatory loops of representative eulipotyphlan KLK1 amino acid sequences. Bar charts show comparisons of hydropathicity (grand average of hydropathicity), and diamonds show comparisons of net charge. Data from kallikreins identified from venom, either in this study or prior studies (62,63), are coloured red, and those from non-venomous taxa in blue. Data is not displayed for regulatory loop 4 due to the short length of this loop (three amino acids long). Note the change in scale for net charge in loop 2.

Inset: voltage protocol for the step depolarisation to -10 mV from a holding potential of -80 mV. B) Conductance-voltage relationship in the absence and presence of 100 μg/ml solenodon venom. Current data were transformed into and normalised to maximum peak conductance (±SEM) and the curves displayed are fits of a Boltzmann's equation that enabled estimation of half activation voltages (V50.act) (see also SI Appendix Table S8). Inset: Step depolarisations from -80 to +40 mV in 5 mV increments for 25 ms from -80 mV holding voltage. C) Voltage dependence of steady-state inactivation of VGSC expressed by TE671 cells in the absence (control) and presence of 100 μg/ml solenodon venom. Normalised peak currents plotted as a function of the pre-pulse potentials and fitted with a Boltzmann sigmoidal relationship for the computation of the V50.inact value (n=4) (see also SI Appendix Table S8). Inset: The voltage protocol showing pre-pulse potentials from -120 mV to -20 mV for 200 ms before a -10 mV depolarisation step for 25 ms.

Table S7. Adaptive Branch-Site Random Effects Likelihood analysis of kallikreins. Out of 461 branches, a total of 13 branches were found to have undergone episodic diversifying selection. All branches detected are found within the KLK1 clade (which includes Homo sapiens KLK1, KLK2 and KLK3). Four of the 13 branches detected are KLK1 genes from Solenodon paradoxus, and three of these encode proteins detected in solenodon venom (highlighted red). A total of 49 branches were formally tested for diversifying selection. Statistical significance was computed using the Likelihood Ratio Test at a threshold of p ≤ 0.05.

Branch | LRT p-value | ω distribution over sites

SI Appendix Files
File S1. Detailed information on the peptide/protein spectrum matching for the shotgun, decomplexed, top-down and plasminogen-activating experiments.
File S2. A DNA sequence alignment of tetrapod kallikreins used to construct the Bayesian phylogeny displayed in SI Appendix Fig. S4.
File S3. An amino acid sequence alignment of tetrapod kallikreins used to construct the Bayesian phylogeny displayed in Fig. 4A.
File S4. Detailed information on the peptide/protein spectrum matching for the identification of plasminogen activating venom proteins.