Volatile biomarkers of symptomatic and asymptomatic malaria infection in humans

Significance
Malaria elimination efforts are hindered by the prevalence of asymptomatic infections, which frequently go undetected and untreated. Consequently, there is a pressing need for improved diagnostic screening methods. Based on extensive collections of skin odors from human populations in Kenya, we report broad and consistent effects of malaria infection on human volatile emissions. Furthermore, we found that predictive models based on machine learning algorithms reliably determined infection status based on volatile biomarkers and, critically, identified asymptomatic infections with 100% sensitivity, even in the case of low-level infections not detectable by microscopy. These findings suggest that volatile biomarkers have significant potential for the development of robust, noninvasive screening methods for detecting symptomatic and asymptomatic malaria infections under field conditions.


Supplemental methods:
Determination of malaria status
Light Microscopy: Thin and thick blood smears were prepared for microscopy and stained in 10% Giemsa prepared using acid buffered water (pH 7.2). The films were then read in duplicate under high-power magnification with oil immersion on a light microscope (Optika B-192) by two independent, experienced microscopists. Parasites were counted per 200 white blood cells (WBCs) on the thick film, and parasite species were identified on the thin film.
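As an illustration of how a thick-film count per 200 WBCs is commonly converted to a parasite density, the sketch below applies the usual scaling by a white blood cell concentration. The value of 8,000 WBCs/µL is a widely used convention and an assumption here; the text above does not state which WBC concentration, if any, was used, and the function name is hypothetical.

```python
def parasite_density_per_ul(parasites_counted, wbc_counted=200, assumed_wbc_per_ul=8000):
    """Estimate parasites per microliter of blood from a thick-film count.

    assumed_wbc_per_ul is an ASSUMPTION (a common convention), not a value
    taken from the methods described above.
    """
    if wbc_counted <= 0:
        raise ValueError("wbc_counted must be positive")
    if parasites_counted < 0:
        raise ValueError("parasites_counted cannot be negative")
    return parasites_counted * assumed_wbc_per_ul / wbc_counted


# Two independent readers each report a count per 200 WBCs (made-up numbers).
reader_counts = [48, 52]
densities = [parasite_density_per_ul(c) for c in reader_counts]
mean_density = sum(densities) / len(densities)
print(densities, mean_density)
```

Averaging the two readers is only one possible way to combine duplicate reads; the source does not say how discordant counts were reconciled.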
Rapid Diagnostic Tests (RDT): Tests employed the SD Bioline™ kit (Standard Diagnostic, INC.) in accordance with the manufacturer's instructions. The kit identifies Plasmodium falciparum malaria based on a specific antibody reaction (histidine-rich protein II) and also includes a non-specific enzyme-based test (Plasmodium lactate dehydrogenase) to detect other malaria species present in the area (P. ovale and P. malariae).
DNA extractions: Three blood spots were collected on qualitative filter paper for confirmation of infection status and infecting species by nested PCR coupled to high resolution melting (HRM) targeting the 18S rRNA gene, in a method adapted from Kipanga et al. (1) with modifications. Briefly, an individual blood spot was punched using a 3 mm punch that was first decontaminated with 10% bleach followed by 70% ethanol, and then with ethanol only between all subsequent punches. The spots were placed in individual wells of a 96-well plate. For each plate, four wells were set aside for two negative controls (blank filter paper) and two positive controls (DBS prepared from the WHO Plasmodium falciparum standard). Each well was then filled with 100 µL of Tris-EDTA (TE) buffer (pH 8.0), and the plate was covered with an Eppendorf heat sealing foil (Eppendorf AG, Hamburg, Germany) and placed on a shaker for 30 min at 1000 rpm at room temperature. After shaking, the plate was centrifuged at 13,000 rcf for 5 minutes at 4°C (Eppendorf 5417R). The DNA pellet obtained was washed and resuspended three times in TE buffer. After the final wash, 10 µL of proteinase K buffer (1.5 mM MgCl2, 50 mM KCl, 0.5% Tween 20, 100 µg/mL proteinase K and 10 mM Tris-HCl; pH 8.3) was added to each well, and the plate was incubated at 55°C for 1 hr, followed by a second incubation at 95°C for 10 minutes to inactivate the proteinase K. The DNA extracts were then stored at -20°C for downstream processing.
nPCR-HRM: Extracted DNA was amplified using two sets of primers: PL-1459-F and PL-1706-R for the primary reaction, and PL-1473-F and PL-1679-R for the nested amplification reaction. The primary reaction was carried out in a Veriti Thermocycler (Applied Biosystems) with thermal conditions consisting of an initial denaturation step of 95°C for 5 minutes, then 35 cycles of denaturation at 94°C for 20 seconds, decreasing annealing temperatures from 65°C to 50°C for 25 seconds (cycles 1-5), 50°C for 40 seconds (cycles 6-10), 50°C for 50 seconds (cycles , extension at 72°C for 30 seconds, and a final extension at 72°C for 3 minutes. The nested reaction was carried out in a real-time PCR-HRM instrument, Rotor-Gene (QIAGEN, Germany), with HRM conditions consisting of an incremental temperature increase of 0.2°C from 75°C to 90°C and fluorescence acquisition at each 2-second temperature increment. Representative samples from each cluster of melt curves that differed from those of the WHO Plasmodium falciparum control were cleaned up using the ExoSAP-IT™ protocol and sent for sequencing at Macrogenlab (Seoul, Korea). The sequences obtained were aligned to reference sequences of Plasmodium ovale and P. malariae (GenBank: AB182490 and GenBank: LT594624) using Geneious 6.1.6 software.
GC/MS Methods for K1: Compounds from a 2 µL injection were separated on an SLB-5ms column (30 m x 0.25 mm ID x 0.1 µm film thickness; Supelco, USA), using the following temperature program: 35°C for 0.5 min, then raised by 7°C/min to 270°C, with helium at a constant flow rate of 1.2 mL/min. Compounds were detected with an electron impact single quadrupole mass spectrometer (70 eV; ion source 230°C; quadrupole 150°C; mass scan range 30-350 amu). The analyses were otherwise similar to those described for K2.
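The oven program stated above (0.5 min hold at 35°C, then 7°C/min to 270°C) implies a ramp of roughly 34 minutes. The following sketch works through that arithmetic. The function names are hypothetical, and any final hold at 270°C is not given in the text, so it is not modeled.

```python
def time_to_final_min(start_c=35.0, hold_min=0.5, rate_c_per_min=7.0, final_c=270.0):
    """Minutes from injection until the oven first reaches the final temperature."""
    return hold_min + (final_c - start_c) / rate_c_per_min


def oven_temperature_c(minutes, start_c=35.0, hold_min=0.5, rate_c_per_min=7.0, final_c=270.0):
    """Programmed oven temperature at a given time (no final hold modeled)."""
    if minutes < 0:
        raise ValueError("minutes cannot be negative")
    if minutes <= hold_min:
        return start_c  # still in the initial isothermal hold
    # Linear ramp, capped at the final temperature.
    return min(final_c, start_c + rate_c_per_min * (minutes - hold_min))


print(round(time_to_final_min(), 2))   # about 34.07 min
print(oven_temperature_c(10.5))        # 10 min into the ramp
```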

Data analyses
Discriminant Analysis of Principal Components (DAPC) was implemented in the adegenet R package v2.0.1 (2-3). The function find.clusters was used to determine the optimal number of PCs (to avoid unstable assignments of individuals to clusters, we used a maximum number of PCs corresponding to the sample size divided by three).
For our predictive models, the data were partitioned into training and testing datasets, using 70% and 30% of the data, respectively. The training dataset was used to build the models using the methods "Adaboost.M1" (4), "rf" (5) and "rrf" (6) in the R caret package (7). Model parameters (ntree and mtry for random forest (rf), and mfinal for AdaBoost) were tuned before running each model, and the best combination of parameters was chosen using accuracy as the performance metric. This final set of parameters was used to train the final version of the models, and their performance was tested on the independent test dataset.
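The partitioning and scoring steps can be sketched as follows. This is a minimal Python illustration, not the R caret workflow itself: it assumes the 70/30 split was stratified by infection class (the text does not say so), uses made-up labels, and the helper names are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx), splitting each class separately.

    Stratification is an ASSUMPTION; the methods only state a 70/30 split.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for label in sorted(by_class):
        idx = by_class[label][:]
        rng.shuffle(idx)
        n_train = round(train_fraction * len(idx))
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return sorted(train), sorted(test)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up labels: U = uninfected, S = symptomatic, AS = asymptomatic.
labels = ["U"] * 10 + ["S"] * 10 + ["AS"] * 10
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Keeping the test indices untouched until the tuned model is final is what makes the reported test performance an independent estimate.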
Random forest is a bagging technique based on individual, independent classification trees that are run in parallel on different subsets of the data. In this ensemble, each tree is built on a different bootstrap subsample of the samples, and each split of the tree is constructed with a randomly selected subset of compounds. During the training process, random forest uses the out-of-bag (OOB) error rate as an estimate of the classification accuracy of the model (5). In AdaBoost, on the other hand, the models are trained sequentially, and each new model "learns" from the previous one by focusing on samples that are difficult to classify (4).
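The two resampling ideas described above can be illustrated in a few lines. The sketch below draws one bootstrap sample and its out-of-bag set, as a single random forest tree would see them, and applies one round of the standard AdaBoost.M1 weight update. It is a simplified illustration with hypothetical function names, not the caret implementation, and it omits tree fitting entirely.

```python
import math
import random


def bootstrap_indices(n, rng):
    """Sample n indices with replacement; the unsampled ones are out-of-bag."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    oob = sorted(set(range(n)) - set(in_bag))
    return in_bag, oob


def adaboost_m1_reweight(weights, misclassified):
    """One AdaBoost.M1 round: down-weight correctly classified samples.

    weights must sum to 1; misclassified is a list of booleans.
    Returns the renormalized weights and the learner's vote, log(1 / beta).
    """
    err = sum(w for w, m in zip(weights, misclassified) if m)
    if not 0 < err < 0.5:
        raise ValueError("weighted error must lie strictly between 0 and 0.5")
    beta = err / (1 - err)
    new = [w if m else w * beta for w, m in zip(weights, misclassified)]
    total = sum(new)
    return [w / total for w in new], math.log(1 / beta)


rng = random.Random(1)
in_bag, oob = bootstrap_indices(20, rng)
print(len(set(in_bag)), "distinct in-bag samples;", len(oob), "out-of-bag")

new_weights, vote = adaboost_m1_reweight([0.25] * 4, [True, False, False, False])
print(new_weights, vote)
```

After the update, the misclassified sample carries half of the total weight, which is how the next learner is pushed toward the samples that are difficult to classify.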
To further explore the date effect in selected compounds, we performed a two-way ANOVA in the R package ARTool (8), with infection status (AS + S vs U) or (S vs U) and collection date as main effects (Tables S4-5). In this analysis, all collections on a given date took place at a single location.

Supporting figures and tables:
Figure S1. Group separation using DAPC showing differences between Asymptomatic (blue), Symptomatic (red), and Uninfected (cyan) groups for foot and arm volatiles in K1. Vertical line: axis 1 (PC1); horizontal line: axis 2 (PC2). Points represent individual samples, with colours denoting malaria condition and inclusion of 95% inertia ellipses.

(15) and breath (16); Plasmodium in vitro (17). Produced by Clostridium and other bacterial genera found in the human gut microbiome (18).
C-9 hexanal. Human skin (15) and Plasmodium in vitro (19). Produced by Lactobacillus and Enterococcus bacteria (20), which are present in the human gut microbiome (21). Also formed by the peroxidation of fatty acids in cell membrane phospholipids (
Produced by Streptococcus mutans, which has been linked to human oral decay (31).
C-61 nonanal. Human skin (15) and breath (23). Formed by the peroxidation of fatty acids in cell membrane phospholipids (22).