Histone H3 lysine 4 methylation signature associated with human undernutrition

Significance The early life environment can exert a profound effect on long-term health. However, differences in developmental epigenetic patterns in response to environmental challenges are not well understood in humans, where nutrient insufficiency and pathogen exposure in early infancy can impact immune system function and metabolic health into adulthood. Here we report a comprehensive global picture of the patterns of the epigenetic modification histone H3 lysine 4 trimethylation (H3K4me3) in undernourished infants and their mothers. Comparisons of the emergent patterns of H3K4me3 within the first year of life reveal large-scale changes consistent with the impact of a poor environment, and uncovered a candidate gene with a role in the response, which was validated in a mouse model.


Supplementary Information Text Section 1: Experimental Methods and Materials
Isolation of peripheral blood mononuclear cells (PBMCs) from human whole blood PBMCs were isolated at icddr,b (International Centre for Diarrhoeal Disease Research, Bangladesh) in Dhaka, Bangladesh. Peripheral blood samples from 18-week-old and 52-week-old children and from their corresponding mothers enrolled in the PROVIDE study (1) were collected in tubes supplemented with anticoagulants (e.g. heparin, EDTA, citrate, ACD-A or citrate phosphate dextrose). Blood samples were processed within 8 hours of collection. To isolate PBMCs, per 2 mL blood: blood was diluted with 2-4X the blood volume of buffer (PBS supplemented with 2% heatinactivated FBS ((cat #D-S11550, Atlanta Biologicals, Flowery Branch, GA, USA); FBS was heatinactivated by thawing to room temperature followed by incubation at 56ºC for 30 min and then cooling). Approximately 6 mL of diluted cell suspension was carefully layered over 3 mL Ficoll-Paque (GE Healthcare Life Sciences, Pittsburgh, PA, USA) in a 15 mL conical tube and centrifuged at 1000 x g for 15 min at 22ºC in a swinging-bucket rotor without brake. The upper plasma layer was aspirated, and the mononuclear cell layer was carefully transferred to a new 15 mL conical tube. The sample volume was adjusted to approximately 10 ml with PBS, mixed, and centrifuged at 400 x g for 10 min at 22ºC. The cells were counted using a hemocytometer, and the washing step was repeated in order to remove any residual Ficoll. The supernatant was carefully removed and PBMCs were resuspended with gentle swirling in 1 mL freezing medium (90% heat-inactivated FBS with 10% DMSO (cat #D5879, Sigma-Aldrich, St. Louis, MO, USA)). Cells were placed on ice and transferred into a cryopreservation vial by slowly pipetting. The cryovial was capped, placed into a pre-cooled freezing container filled with isopropanol (such as a "Mr. Frosty" (cat #5100-0001), Thermo Fisher Scientific, Waltham, MA, USA), placed at -80ºC for 24 h, then moved to storage in liquid N 2 . Frozen vials containing sample PBMCs were shipped to the University of Virginia using a dry shipper charged with liquid N 2 , and vials were subsequently stored at -80ºC prior to crosslinking.

PBMC Crosslinking and Lysis
Frozen PBMC samples stored at -80°C were crosslinked and lysed using protocols adapted from the NIH Roadmap Epigenomics Mapping Consortium [http://www.roadmapepigenomics.org/protocols] and Gilfillan et al (2012) (2). Work with unfixed cells was performed in a biological safety cabinet. Each cryovial of frozen PBMCs contained 4x10 6 to 2x10 7 cells. Typically four specimens were processed at a time. Cells were thawed by warming the vials in gloved hands, contents (~1 mL) were immediately transferred to 15 mL conical tubes and tubes placed on ice. Vials were rinsed once with 1 mL ice cold PBS, and the rinse added to each corresponding 15 mL tube. Conical tubes were centrifuged in a tabletop centrifuge at 1500 x g for 5 min at 4°C. Supernatants were aspirated, and cell pellets rinsed with 1 mL ice cold PBS. Conical tubes were centrifuged again at 1500 x g for 5 min at 4°C and the supernatant aspirated. To crosslink samples, 5 mL room temperature FIX solution (1% formaldehyde in DMEM (cat #11995-065, Gibco/Thermo Fisher Scientific, Waltham, MA, USA)) was added to each tube, and tubes were incubated for 10 min at room temperature with gentle inversion. Fixation was stopped by adding 5 mL room temp STOP-FIX solution (250 mM glycine in PBS) to each tube and incubating tubes with gentle inversion for 5 min at room temp. PBMCs were pelleted by centrifugation at 4000 rpm for 5 min at 4°C and supernatant was aspirated. Cell pellets were rinsed with 5 mL ice cold PBS, tubes were centrifuged at 4000 rpm for 10 min at 4°C, and supernatant aspirated, with any remaining PBS removed as carefully as possible. One mL of 4°C SDS Lysis Buffer (1% SDS, 10 mM EDTA pH 8.0, 50 mM Tris HCl pH 8.1, 1 cOmplete protease inhibitor tablet, EDTA-free (cat #11873580001, Roche, Indianapolis, IN, USA) per 20 mL buffer) was then added to each cell pellet, and pellets were suspended by pipetting. If SDS precipitates were present in the buffer, the buffer tube was warmed until the precipitates were solubilized. The resulting cell lysates were transferred to 1.5 mL microfuge tubes and incubated on ice a minimum of 10 min before sonicating.

PBMC Sample Sonication
Lysed PBMCs were sonicated using conditions adapted from Adli & Bernstein 2011 (3) using a Branson Digital Sonifier Model 250 (Branson Ultrasonics Corporation, Danbury, CT, USA) to generate fragment sizes in the range of 150-200 bp to 500-600 bp as visualized by 2% agarose gel electrophoresis. Samples were kept on ice at all times. Each 1.5 mL microfuge tube was placed in a wet ice bath in a plastic beaker during sonication, with the probe tip submerged as deeply as possible in the sample tube without touching tube bottom or sides. Settings were: 10 min of total sonication, 0.7 sec "on", 1.3 sec "off", 40% power. Sonication was paused after every minute of sonication to add ice and/or remove water from the wet ice bath. The probe was rinsed thoroughly between samples with 70% ethanol followed by sterile distilled water. After all samples had been sonicated, sample tubes were microfuged at 4°C for 10 min at maximum speed to pellet insoluble debris. If SDS precipitates were visible in sample tubes, tubes were warmed in gloved hands to resolubilize SDS before centrifuging. Supernatants were removed to new 1.5 mL microfuge tubes, placed on ice, and protein concentration of the resulting chromatin solution was measured with the Bradford Protein Assay (cat #5000006, Bio-Rad, Hercules, CA, USA) using BSA as the standard. Lysates were frozen on dry ice and stored at -80°C.

Chromatin Immunoprecipitation (ChIP)
Samples were kept at 4°C at all times. Sonicated chromatin lysate aliquots of 100 µg total protein were used for each ChIP sample. For libraries containing spike-in chromatin, 0.2 µg of sonicated Drosophila chromatin (Active Motif #53083) was also added. Input controls consisted of 40 µL aliquots of sonicated chromatin lysate, which were used to check DNA fragment size, and after processing, in qPCR to estimate ChIP sample enrichment. ChIP and Input samples were diluted 1:5 with ice cold ChIP Dilution Buffer (0.01% SDS, 1.1% Triton X-100, 1.2 mM EDTA (pH 8.0), 16.7 mM Tris-HCl pH 8.1, 167 mM NaCl, 1 cOmplete protease inhibitor tablet, EDTA-free (cat #11873580001, Roche, Indianapolis, IN, USA) per 20 mL buffer). Diluted input samples were stored at 4°C prior to crosslinking reversal. Anti-H3K4me3 Ab (12.5 µl per 100 ug chromatin protein solution, cat #9751, Cell Signaling Technology, Danvers, MA, USA) was added to each sample and incubated overnight at 4°C with rotation. After overnight incubation, 50 µl of a 1:1 mixture of Dynabeads Protein A and G (cat #10001D and #10003D, Life Technologies/Thermo Fisher Scientific, Waltham, MA, USA) was used to capture the DNA-protein-antibody complexes. Beads were washed twice with 2X the original bead volume of cold ChIP Dilution Buffer. The beads were resuspended in 1X bead volume of cold ChIP Dilution Buffer. Washed beads were added to each sample tube that had incubated overnight with antibody at 4°C. Sample tubes were incubated with beads for 2 hours at 4°C with rotation and then washed a total of 6 times: (a) 2 washes with ice cold Low Salt Immune Complex Wash Buffer (0.1% SDS, 1% Triton X-100, 2 mM EDTA, 20 mM Tris-HCl pH 8.1, 150 mM NaCl), (b) 2 washes with 1 mL ice cold LiCl Immune Complex Wash Buffer (0.25 M LiCl, 1% IGEPAL CA-630/Nonidet P-40, (cat #I1112 Spectrum Chemical Mfg. Corp., New Brunswick, NJ, USA, or cat #I8896, Sigma-Aldrich, St. Louis, MO, USA), 1% sodium deoxycholate (Sigma-Aldrich, St. Louis, MO, USA, cat #D6750), 1 mM EDTA, 10 mM Tris-HCl pH 8.1), and (c) 2 washes with 1 mL ice cold TE buffer pH 8. After washing, bead pellets were suspended in 125 µL room temperature Elution Buffer (10 mM Tris-HCl pH 8.1, 1 mM EDTA pH 8.0, 1% SDS, 150 mM NaCl) with 5 mM DTT. Samples were incubated at 65°C for 10 min, the supernatant recovered, and the elution repeated. Input samples were diluted with Elution Buffer as well, and crosslinks in all samples were reversed by incubation at 65°C overnight.
After the 65°C overnight incubation, 250 µL of Proteinase K in Elution Buffer without DTT (5 µL of 20 mg/mL Proteinase K (cat #25530015, Invitrogen/Thermo Fisher Scientific, Waltham, MA, USA) and 245 µL of Elution Buffer without DTT) were added to each sample and incubated at 37°C for 2 hr. Samples were then extracted twice with Tris-saturated phenol pH 7.9 (cat #BP1750I100, Fisher BioReagents/Thermo Fisher Scientific, Waltham, MA, USA) and once with chloroform:isoamyl alcohol (24:1). The aqueous samples were ethanol precipitated, washed with 70% ethanol and resuspended in 50 µL of 1X TE pH 8.0 containing 0.2 mg/ml RNase A (cat #EN0531, Thermo Fisher Scientific, Waltham, MA, USA), then incubated at 37°C for 30 min prior to storing at -80 o C. Recovered DNA samples were quantified using a Qubit fluorometer (Life Technologies/Thermo Fisher Scientific, Waltham, MA, USA) and the Qubit dsDNA High Sensitivity Assay kit (cat #Q32854 (reagents), #Q32856 (assay tubes), Life Technologies/Thermo Fisher Scientific, Waltham, MA, USA) as recommended by the manufacturer.

ChIP-seq Library Construction and Sequencing
H3K4me3 ChIP-seq libraries were constructed using 5-10 ng ChIP or input DNA with the Illumina TruSeq ChIP Library Preparation Kit (cat #IP-202-1012, Illumina, San Diego, CA, USA) as recommended by the manufacturer with the following exceptions. Gel slices included 200-650 bp DNA fragments, and two MinElute columns were used rather than one. ChIP-Seq and input library DNA was quantitated using the Qubit fluorometer as described above, and the size distributions of fragments in completed libraries were assessed using DNA 1000 microfluidics chips (cat #5067-1504, Agilent Technologies, Santa Clara, CA, USA) on an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). Enrichments for ChIP-Seq libraries were estimated by qPCR using primer sets for enriched and un-enriched control genomic regions, and each ChIP-seq library had fold enrichments consistent with published guidelines (4,5). Libraries were sequenced on an Illumina MiSeq instrument (Illumina, San Diego, CA, USA) in the University of Virginia DNA Sciences Core Facility. H3K27ac libraries were constructed as above except using an H3K27ac antibody from Diagenode (Cat# C15410-196, lot# A1723-0041D) and in addition, 0.02% (microgram/microgram) Drosophila spike-in chromatin was added prior to immunoprecipitation. Multiplexed H3K27ac samples were sequenced in the UVA Biomolecular Core Facility using an Illumina NextSeq500 system with high capacity cartridge.

RNA-seq library construction and sequencing
We obtained six PBMC samples from females at one year of age residing in Dhaka. Total RNA was extracted using the Zymo research Quick-RNA miniprep kit (Cat# 11-327). ERCC spike-in RNAs (Cat#4456740 Illumina) were added to 1mg of total RNA as recommended by the manufacturer, and ribosomal RNA depletion was performed using the RiboZero gold kit (Cat#MRZG126). cDNA libraries were prepared with 20ng of rRNA-depleted RNA using the NEBNext Ultradirectional RNA Lib Prep Kit (Cat #E74205) and NEBNext Multiplex Oligos (Cat#E73355). Library quality was assessed on the Agilent 2100 Bioanalyzer and 75bp paired-end reads were obtained in a single multiplexed run on an Illumina NextSeq500 system with high capacity cartridge in the UVA Biomolecular Core Facility.

Droplet digital PCR
Five hundred µL of blood collected from stunted (HAZ ≤ -2) and non-stunted (HAZ > -2) children at 52 weeks of age enrolled in the PROVIDE cohort (1) was suspended in 4 mL of TRIzol LS, mixed gently, and frozen for later use. Water (833 µL) was added to maintain the 3:1 volume of TRIzol LS to sample, and 2 mL of sample was purified for LRP1 expression analysis. RNA was isolated from whole blood suspended in TRIzol LS (Thermo Fisher Scientific), according to the manufacturer's instructions for RNA isolation from biological fluids with small sample quantities. The sample was then treated with DNase I (Qiagen), and RNA was purified and concentrated using RNeasy MinElute Cleanup Kit (Qiagen). LRP1 and ENO3 transcripts were amplified using PrimePCR ddPCR expression probe assays (assay ID dHsaCPE5026596 for LRP1 and dHsaCPE5045082 for ENO3, Bio-Rad) and the one-step advanced RT-ddPCR kit for probes (Bio-Rad). Droplets were generated, and samples were run on the QX200 Droplet Digital PCR System (Bio-Rad). Samples were analyzed using QuantaSoft software (Bio-Rad). LRP1 and ENO3 expression analysis was performed in triplicate for each sample, and each data point represents the average transcript number.

Western blotting
Formaldehyde crosslinked cells were resuspended in 1% SDS, 10mM EDTA, 50mM Tris-HCl (pH 8.1). Crosslinks were then reversed by incubating cell suspensions in 1x Laemmli buffer at 100°C for 30 mins. Extracts were then clarified by centrifugation at 21,000 x g for 10 mins. Twenty (20) micrograms of total protein, estimated by Pierce BCA Protein Assay (ThermoFisher Scientific #23225), were separated on 4-20% gradient gels and transferred to Immobilon-P SQ membrane (Millipore #ISEQ00010) according to Vettese-Dadey et al (1996) (6). Membranes were blocked in 3% BSA (w/v) in TBS-T and incubated with rabbit anti-H3K4me3 C42D8 (Cell Signaling #9751) at a dilution of 1:1,000 in TBS-T. Secondary antibody was either ECL Plex goat anti-rabbit IgG-Cy5 (GE Healthcare #PA45011) diluted 1:2,500 in TBS-T or ECL donkey anti-rabbit IgG-HRP (GE Healthcare #NA934) diluted 1:10,000 in TBS-T. Total H3 from the same extracts was measured using a rabbit anti-H3 C-terminal antibody (Active Motif #39163) diluted 1:10,000 in TBS-T as the primary antibody. Levels of H3Kme3 were quantified relative to total histone H3 by fluorescent western blotting detected using a Typhoon Trio laser scanner. Data are presented for every 52-week sample for which there was sufficient material remaining after ChIP-seq analysis to include on the blots.

Flow cytometry
Cryopreserved PBMC samples from a subset of 53-week-old Bangladeshi children were thawed into complete RPMI medium containing 10% FBS. Cells were rested for one hour in the presence of benzonase (Sigma) at 37°C for one hour then washed twice in FACS buffer (PBS with 0.1% bovine serum albumin, 2 mM EDTA, and 0.05% sodium azide). Cells were stained with the following anti-human antibodies in the presence of Fc receptor block and live/dead aqua zombie (all Biolegend) for 30 minutes: FITC-CD20, PerCP-Cy5.5-CD33, PE-Cy7-CD27, APC-CD38, Alexa700-CD14, APC-Cy7-CD3, Pacific Blue-CD16, Brilliant Violet 650-CD19. Samples were tested for either surface or intracellular LRP1 protein expression. Surface staining was done using anti-LRP1-PE antibody (eBioscience). For intracellular LRP1 staining, cells were fixed, permeabilized, and stained with anti-LRP1 (Abcam) followed by an anti-rabbit secondary PE antibody. After washing, data was collected using an LSRII instrument (BD) and data analysis was performed using Flowjo (Treestar).
Samples that showed few intact cells (<= 30% of total events) or few live cells (<= 60% of singlets) were excluded from the analysis. Subjects were defined as stunted if HAZ score at 53 weeks was below or equal to -2.

Mice and whole-body LRP1 deletion
All mouse procedures were approved by the Institutional Animal Care and Use Committee of the University of Virginia. LRP1 floxed, UBC-Cre-ERT2+ (LRP1 fl/fl Cre+) and LRP1 floxed, UBC-Cre-ERT2-(LRP1 fl/fl Cre-) mice on a C57BL/6 background were bred and maintained at the University of Virginia. The mice used to generate breeder and experimental mice, UBC-Cre-ERT2 and LRP1 floxed mice, were originally obtained from Jackson Laboratories and Joachim Herz of the University of Texas Southwestern Medical Center, respectively. To whole-body delete LRP1, LRP1 fl/fl Cre+ were administered tamoxifen (Sigma-Aldrich) 3 times at a dose of 75 mg/kg body weight via the intraperitoneal route every 10 days. Depletion of LRP1 protein in LRP1-/-mice was confirmed by LRP1 immunoblotting using extracts from brain, liver, and lung tissue (not shown). Body weight was measured using a digital scale. Mice were individually housed for measuring food intake, and chow consumed was measured by averaging the difference between the weight of chow before and after a 24-hour period over 3 days.

Intestinal permeability assay
Prior to the administration of FITC dextran (Sigma Aldrich), mice were fasted for at least 4 hours and then fed FITC dextran in PBS (Thermo Fisher Scientific) at a dose of 40 mg/100 g body weight by oral gavage. Four hours after FITC dextran administration, mice were bled, serum was obtained (BD Microtainer, BD), and serum was assayed for fluorescence using Synergy H4 microplate reader (BioTek).

Fat and lean mass determination
Fat and lean mass were assayed before and after LRP1 deletion using the EchoMRI-500 (EchoMRI) according to the manufacturer's instructions.

Metabolic caging
Volume of O 2 consumed, volume CO 2 expelled, respiratory exchange ratio, and activity in the X and Y direction were measured and calculated using the Oxymax-Comprehensive Animal Monitoring System (Columbus Instruments), according to the manufacturer's instructions.

Section 2: Computational Methods
Power and sample size estimates for H3K4me3 ChIP-seq analysis In order to estimate the sample size, we re-analyzed a DNA methylation data set generated by Khulan et al (10) in which Gambian women were given a supplement, UNIMMAP tablets, before pregnancy, followed by FeFol once pregnant and compared to control mothers who were given a placebo. Circulating blood was collected from supplement-treated and control mothers' infants at 9 months of age and their DNA was analyzed using Illumina HumanMethylation27 BeadChip microarrays. There were 5 female and 4 male infants from treated mothers and 8 female and 7 male infants from control mothers. We quantile normalized the fraction of methylated alleles, beta; calculated the mean and standard deviation of the difference between treated and control betas; calculated the pooled variance; and fit the pooled variance to an inverse gamma distribution (i.e., determined the shape and scale parameters of the inverse gamma distribution). We input these results into the R package ssize.fdr (11) (12), which calculates the sample size for fixed power, false discovery rate (FDR) and proportion of non-differentially methylated loci. We estimated that we had 90% power to detect 138 differentially methylated loci (i.e., proportion of non-differentially methylated loci equal to 0.995) at a 5% FDR using a two-sided t-test with 12 infant female control and treatment samples or 16 infant male control and treatment samples.

Processing of ChIP-seq datasets
H3K4me3 ChIP-seq datasets obtained from 21 children at 18 weeks of age and 16 children at 52 weeks of age, and included samples from boys and girls in roughly equal numbers and from control children and children who were or became stunted (height-for-age (HAZ) z-score score < -2) at one year of age (Table S1). In addition, datasets were obtained from 24 mothers of children in the same cohort. Samples were chosen based on these phenotypic attributes from among those not otherwise obligated for other investigations as part of the PROVIDE Study (1). We also obtained four input datasets from children at each age (samples from 2 males and 2 females, one stunted and one control child for each sex) plus two input datasets for the maternal samples, one from a mother of a control boy and one from a mother of a control girl. After these datasets were generated, to validate the normalization strategy, we also generated and analyzed H3K4me3 datasets from four 52-week samples with Drosophila spike-in chromatin as described below. We obtained ~30M single-end, 151 bp, raw reads from each library.
Raw reads in FASTQ format were mapped to the hg19 human genome using Bowtie-1.0.0 (13). Excluding hardware-specific arguments for improved speed and efficiency, default settings were used with the following exceptions: -m 3 --best --strata. The resulting SAM files were converted to BAM, unmapped reads were removed and then the files were sorted and indexed using SAMtools v. 0.1. 19-44428cd (14). Spike-in datasets were mapped to the hg19 and Drosophila dm6 genomes using Bowtie2-2.2.6 (15) with default mapping parameters, then converted to BAM, unmapped reads were removed and the mapped reads were sorted and indexed using SAMtools as for the other datasets. Sorted and indexed BAM files were converted to BigWig using BEDtools-2.18.2 bamtobed and genomecov (16) and the bedGraphToBigWig converter (17). The resulting BigWig files were used for browsing the data with IGV (18). The browser screenshot in the main text was obtained by scaling sorted and indexed BAM files based on the total mapped read counts from each of the individual datasets using BEDTools genomecov with an appropriate value for the -scale argument. Peaks of H3K4me3 enrichment were identified in each dataset using MACS-1.4.2 (19) with a sex-matched input dataset as control and with default parameters as detailed previously (20). Approximately 25,000 H3K4me3 peaks were identified for each dataset.
Datasets from children at 18 and 52 weeks, as well as maternal data, were first analyzed independently. Non-redundant lists of all peaks identified in one or more datasets were generated and read counts for each dataset distributed to each of the non-redundant genomic intervals using BEDTools-2.18.2 merge and multicov (16) with default parameters. The resulting tables of read counts were used as input to DESeq2 (21) for identification of differentially affected peaks. DESeq2 was implemented for differential analysis based on phenotypic differences (e.g. control versus stunted datasets) as well as biomarkers including HAZ score, delta HAZ, maternal height, and other biomarkers collected for individuals enrolled in the PROVIDE Study (1). Additional details regarding implementation of DESeq2 and differential peak analysis are described below.
H3K27ac ChIP-seq datasets were obtained using 14 samples from one-year-old children which were also used for H3K4me3 ChIP-seq. The average yield was ~56M single-end, 151 bp reads per sample. Raw reads were mapped to the hg19 genome using bowtie2-2.2.6 (15) with default settings. The data processing pipeline was identical to the processing of H3K4me3 data with the exception that peak calling was performed using MACS2-2.1.1.20160309 (19) with an input dataset for control and the arguments --broad --broad-cutoff 0.01. Approximately 34,000 H3K27ac peaks were identified for each dataset. DESeq2 was implemented as described above and using default normalization and with delta HAZ scores in DESeq2 design.

Estimation of H3K4me3 ChIP-seq Background and Mistargeted Signal
Proper normalization is a critical component of ChIP-seq data analysis and currently remains an area of active investigation (22)(23)(24)(25)(26)(27). We acquired the H3K4me3 ChIP-seq datasets whose analysis is presented here beginning in ~2013, as concerns were emerging (initially from gene expression analysis) that typical "median-based normalization methods would likely miss global effects on expression level (28). As shown in Fig. 2, a global reduction in H3K4me3 signal was observed at "canonical" peaks proximal to TSSs. For datasets with comparable total read numbers, then, we expected the read numbers in regions outside these peaks to be increased in stunted compared to control children. If non-peak reads exclusively represented background, then given that the amount of non-specific ChIP DNA extracted from each cell of each sample should be comparable regardless of sample type (i.e., stunted or control), the true background levels should be the same and hence could serve as a good normalization factor for each H3K4me3 sample. However, as described in the main text, the comparable total levels of H3K4me3 across samples (determined by Western blotting) and the redistribution of H3K4me3 from TSS regions to other regions of the genome in stunted children strongly suggested that the increased signal in non-peak regions represents a mixture of background and mainly non-targeted H3K4me3 signal. If this were the case, then low read count "background" should not necessarily be comparable. To assess whether the low read count data was pure background and hence could be used to normalize samples, or mixed background and low-lying mistargeted H3K4me3 signal, we first performed a rigorous analysis of low read count data.
After exploration of different background models including the Negative Binomial and Poisson distributions, we found that a simple exponential background model outperformed the Negative Binomial and Poisson models and fit the low read count or "background" reads extremely well (Fig.  2D,E). Notably, this was also the case for ENCODE H3K4me3 data obtained from human PBMCs (Fig. S2A). More specifically, we modeled the number of bins with background reads, ( ), as an exponential model = (0) !!" where (0) and are the number of bins with no reads and exponential decay parameter, respectively. As is common with ChIP-seq data (regardless of the background model used), we found that the observed number of bins with no reads was in some cases much larger (inflated) compared to a value of (0) consistent with an exponential model fit with the rest of the background data. Consequently, we fit the natural log of the number of bins with background reads versus for background bins containing at least one read (i.e., ≥ 1) with the following linear model, which fit the background data well (Fig. 2D,E). With estimates of the parameters from the fit to background data, (0) and , we summed the number of bins with background reads multiplied by the number of reads per bin ( ) over all (incurring a negligible error by summing to infinity) to arrive at an estimate of the total background read count, Because background read values tended to be in the tens of millions, we divided ! by ten million (10 7 ) to arrive at the total background-based normalization, ! ! = ! ! !" ! . We were then able to arrive at estimates of the scaled (or normalized) signal, , for each sample by first subtracting ! from the total read count for every sample, , and then dividing that difference by . For the H3K4me3 differential enrichment analysis using DESeq and correlation analysis of biomarkers and H3K4me3 peaks across samples, we normalized each sample by dividing read counts in peak regions by ! ! . Notably, we performed an experiment in order to determine whether using ! ! or DESeq2's default normalization compared more favorably to the normalization suggested by chromatin spike-in read counts. The analysis discussed below details how we found that DESeq2's default normalization matched that of the spike-in read counts much better than ! ! .

Comparison of ChIP-seq normalization methods
The results reported in the main text utilized DESeq2 default normalization. After these datasets had already been generated and concerns about global changes in modification became more apparent, much like the strategy used for global normalization of RNA-seq data (28), spike-in chromatin was developed to test for global changes in signal that would otherwise be masked (24). Thus, to test in an alternative way for a global change in H3K4me3 levels in stunted compared to control children, and to assess our alternative normalization approaches, we generated four datasets with spike-in reads post hoc. These consisted of two datasets from control and two from stunted children. Using these 4 datasets, comparative analyses were run with three types of normalization: (1) DESeq2 default normalization, (2) spike-in read count-defined normalization and (3) normalization based on the background model described in the previous section. Normalization factors based on method (2) were obtained by dividing the spike-in read counts by ten million. As shown in Table S2, we observed notable consistency in the normalization factors obtained using spike-in read counts and the DESeq2 default normalization method, in strong contrast to the normalization factors obtained by considering non-peak reads as background only and not representing background plus true, mistargeted, H3K4me3 signal.
Differential peak analysis using DESeq2 in combination with each of the three sets of normalization factors indicated that DESeq2 default normalization and spike-in normalization gave rise to very similar patterns of differentially affected peaks when comparing the two stunted datasets to the two control datasets (Fig. S2C and E). In contrast, normalization using the background model normalization factors showed a very different pattern in which the majority of peaks were strongly decreased in stunted compared to control children, as would be expected if the increased H3K4me3 signal in non-peak regions were considered as background and normalized away. Moreover, there was broad overlap in sets of significantly affected peaks identified using DESeq2 default normalization and spike-in normalization methods, with the differentially affected peaks from spike-in normalization nearly entirely contained within the set of differential peaks identified using DESeq2 default normalization (Fig. S2F). Consistent with the broad overlap in the differentially affected peak sets using these two normalization methods, we found that the genes associated with the two sets of differential peaks were enriched in similar sets of gene ontology terms, including terms related to immune system (dys)function and transcription as we report for the full set of 52 week data (Fig. S3). We conclude from this analysis that DESeq2 default normalization and spike-in normalization identify very similar sets of affected peaks and associated with very similar underlying biology.

Differential peak analysis and functional annotation
Our original set of H3K4me3 data consisted of 24 datasets from children at 18 weeks of age and 21 datasets from children at 52 weeks of age. Two datasets from each of these age groups were set aside as they had quantile read distributions indicative of library over-amplification. An additional 18week dataset was attributed to a male but had the epigenetic signature of a female, indicating a probable sample mix-up at some point between the blood draw and archiving of the resulting PBMC sample. This yielded 21 datasets from 18-week children that were used in the analysis. Similarly, two of the original datasets from children at 52 weeks of age were of poor quality (read distributions reminiscent of input samples) and were set aside. The available clinical data suggest that stunting has multiple contributing factors that can be associated with more than one pathway for disease emergence (29). Indeed, we saw by Principal Component Analysis that 3/21 datasets (one control and two stunted samples) did not group with phenotypically similar samples (not shown). It is likely that these three datasets represent the patterns in either children with other disease conditions or children having arrived at the same overt presentation via a different route(s). These three outlier datasets were set aside, and the analyses presented in the main text were obtained from 16 52-week samples (~76% of the original data) and as described in the main text. All 24 maternal datasets were used in the analysis of maternal H3K4me3.
Based on the background and spike-in analyses described above, differential peak analysis of 18-week, 52-week and maternal data was performed in R using DESeq2 and employing the DESeq2 default normalization method (21). We compared samples to obtain differentially affected peaks based on phenotypic differences (e.g. control versus stunted) in which each sample of a given type was treated essentially as a replicate. A more powerful approach was to use various biomarkers including HAZ score and delta HAZ score as quantitative measures of the degree of stunting. In this type of DESeq2 analysis, significantly affected peaks were identified as those with statistically significant changes correlating with the biomarker value across all samples. In these cases, the log2FC values represent the incremental change in peak size per unit of the particular biomarker. Significant differential peaks were defined as those with FDR-adjusted p-values < 0.05. PCA analysis was performed in R using prcomp with regularized log transformation of the DESeq2 results as input. The results presented here for the children were obtained using both male and female datasets together. As shown in Fig. 3A, males and females were readily distinguishable by PCA. Analyses were also conducted using only the autosomal peaks, and of females alone and males alone. Leaving out peaks on the sex chromosomes had very little impact on the results overall and provided no insight into possible sex differences in stunting-associated autosomal peaks. In the analysis of datasets by sex, fewer significantly affected peaks were identified, most likely due to the reduced statistical power associated with the relatively small number of datasets used in the analyses of only one sex at a time.
Significantly affected peaks were computationally associated with genes using GREAT (30), selecting the whole hg19 genome as the background region and with default gene association settings. Transcription start site histograms were generated using the distance of the peak to the nearest TSS as defined by GREAT. GREAT provided gene set enrichment results as well; although the gene set enrichment results from GREAT are not reported here, they were broadly consistent with those obtained using other enrichment tools. To define the possible functional significance of H3K4me3 peak changes, genes from GREAT were re-associated with the DESeq2 peak log2FC and/or adjusted p-values in python in order to obtain ranked gene lists, which were used for functional enrichment analysis in DAVID (31), GOrilla (32), MSigDB and GSEA (33), and Pathway Express (34). For each gene set enrichment tool, we submitted gene lists of the recommended maximum size and analyzed genes associated with peaks with positive log2FC values separately from genes associated with peaks with negative log2FC values. As described in the main text, when analyzed with respect to delta HAZ, the peaks with positive log2FC values were overwhelmingly in stereotypical locations in or around transcription start sites whereas the peaks with negative log2FC associations with delta HAZ were rarely if ever proximal to TSSs. For this reason, we focused mainly on understanding the functional significance of genes with positive log2FC peaks as they likely are transcribed at rates that scale with the health of the child. In general, functional analysis of gene sets explored using each tool above resulted in broadly similar sets of results.
Pathway enrichment and upstream regulator results shown in Figs. 4A, B and C were obtained using Ingenuity® Pathway Analysis (IPA®, QIAGEN Redwood City, www.qiagen.com/ingenuity). As explained above and in the main text, for analysis of data from one-year-old children, we focused on genes associated with peaks showing positive log2FC values versus delta HAZ score. However, very similar Canonical Pathway enrichment results were obtained by analyzing genes associated with differential peaks located within 1.5 kb of the TSS (includes those with both positive and the few with negative log2FC values) as well as genes associated with peaks that decreased in the categorical comparison of stunted versus control children. Analysis of maternal data focused on all peaks associated with maternal height. For analysis of the one-year old data, a list of the top 3000 genes with positive log2FC values ranked by DESeq2-derived adjusted p-value was used as input. Significantly affected pathways were defined as those with an IPA p-value of < 0.05. We have focused on gene ontology and pathway enrichment related to protein coding genes, but our analysis uncovered many links to miRNAs and other regulatory molecules which were not explored further.
Gene or peak average plots were generated using ngs.plot (35). In addition to settings to specify rendering for publication (font size, line width, etc), -MW 9 was used.

Enhancer analysis
To determine the relationship between enhancers and delta HAZ affected H3K4me3 peaks in one year old children, BEDTools was used to identify peaks that overlap by one base pair or more with enhancers that are active in various cell types as reported previously (36). "Positive peaks", those that increase with increasing delta HAZ score, and "negative peaks", those that decrease with increasing delta HAZ score, were analyzed separately. The peak-enhancer overlap results are summarized in Table S2. Genes targeted by the subset of enhancers with overlapping differential H3K4me3 peaks were identified from the list of enhancer-gene associations reported in the same study that identified the catalog of active enhancers (36).

Motif analysis
The DNA sequences of the enhancers identified above were retrieved from the UCSC repository (http://genome.ucsc.edu/) (37). Enriched DNA sequence motifs in these enhancers were then identified using MEME (http://meme-suite.org) (38). The total enhancer sequence length was far too great to search in its entirely, so subsets of randomly selected enhancer sequences were chosen using the MEME Suite tool fasta-subsample, resulting in 142 and 155 enhancers associated with H3K4me3 peaks that increased ("positive peaks") or decreased ("negative peaks"), respectively, with delta HAZ score. The subsample sequence numbers were determined empirically to result in close to the maximum input file sizes for analysis by MEME. MEME v4.11.2 was run using -maxsize 60000mod zoops -nmotifs 3 -minw6 -maxw 50 -revcomp and appropriate background files. Background files were generated using fasta-get-markov with -m 3. Significantly enriched motifs were associated with known transcription factor motifs using the MEME Suite tool TOMTOM with the HOCOMOCO Human v10 database.
For the analysis of enhancers associated with negative peaks, the two highest scoring motifs are shown in Fig. 3F. The ETS1-associated motif (155 sites in 155 enhancers) was discovered with a TOMTOM E-value of 2.12 x 10 -3 whereas the FOXO1-associated motif (47 sites in 155 enhancers) was discovered with an E-value of 3.40 x 10 -3 . The third motif (not shown) had no statistically significant association with known motifs in TOMTOM. The enhancers associated with positive peaks included an A-rich motif (44 sites in 142 enhancers; E-value: 2.0 x 10 -57 ) and a C-rich motif (87 sites in 142 enhancers; E-value: 1.3 x 10 -46 ). Much like the A-rich motif discovered for the negative peaks, the A-rich motif identified with the positive peaks was associated with FOX-family transcription factor motifs in TOMTOM (E-value for FOXJ3, the top scoring motif: 3.29 x 10 -2 ). The C-rich motif was found using TOMTOM to be similar to MAZ, EGR1, and Sp family transcription factor motifs, among others, with the E-value for the top scoring MAZ association 1.56 x 10 -6 . The third motif identified using MEME associated with the positive peaks had no significant association with known transcription factor motifs using TOMTOM.

H3K4me3 peak-adult height snp overlap analysis
To determine the relationship between all H3K4me3 peaks associated with delta HAZ score at one year of age and single nucleotide polymorphisms (snps) associated with adult height, BEDTools was used to count peaks that overlapped by one base pair or more with 602 snps linked to adult height and reported previously (39). The results are summarized in Table S4. As H3K4me3 peak size at the TSS is generally correlated with transcription, those overlapping snps near the TSS may be eQTLs.

Correlation analysis of 18-week, 52-week and maternal datasets
Global correlation analysis shown in Fig. 4E was performed using a collection of nine IDs for which we obtained 18-week, 52-week and maternal datasets for each. Raw counts for all autosomal peaks for each dataset were divided by DESeq2-defined normalization factors, and Pearson correlations were computed for all pairwise combinations of the datasets. The matrix of correlations was visualized using the heatmap.2 function in R with color specified by the viridis library.

Re-analysis of published H3K4me3 ChIP-seq datasets
H3K4me3 datasets acquired from Hct116 cells grown in the presence or absence of methionine (40) were retrieved from the Gene Expression Omnibus (accession number GSE72131) and analyzed essentially as described above. Raw sequence reads were mapped to the hg19 genome using Bowtie2, the resulting SAM files were converted to BAM format using SAMTools and peaks were called using MACS with input sequences as the control. Reads were distributed to the union set of all called peaks and differential peak analysis was performed using DESeq2, with significantly affected peaks (FDR adjusted p-values < 0.05) associated with genes using GREAT as described above.

H3K4me3 heatmap
The normalized counts for the significant differential peaks (FDR-adjusted p-value < 0.05) were used with the heatmap.2 function in R and with the scale = "row" argument and z-score normalization of peaks across samples. The default row (peak) clustering method was used and samples (columns) were ordered by delta HAZ score.

Roadmap Epigenomics Project Data
Publicly available Roadmap Epigenomics Project ChIP-seq datasets for H3K4me3 and H3K27ac in PBMCs from male and female individuals were obtained from the ENCODE site, reference epigenome series ENCSR841LMD.

RNA-seq data analysis
Raw datasets in FASTQ format were evaluated for quality and uniformity using FASTQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Files for each sample from each of the four NextSeq500 flow cells were then merged and paired-end reads were mapped to a human hg19-ERCC reference genome using HISAT2 (41). The resulting SAM files were converted to BAM format using samtools (14), and transcripts were assembled and quantified with StringTie (42) using combined GENCODE (gencode.v2lift37)-ERCC annotations. Gene-and and transcript-level count tables were generated using prepDE.py (http://ccb.jhu.edu/software/stringtie/index.shtml?t=manual). Results presented here were obtained using the transcript-level count table. Differential analysis of RNA expression was performed using DESeq2 as described above with delta HAZ as a quantitative variable and following filtering to remove transcripts in which four or more samples had zero reads. Other analyses were performed as described above.

Power, sample size, randomization and exclusion criteria in mouse studies
To estimate the sample numbers required for comparison of control and LRP-mice, we first calculated the mean and standard deviations of the weight-for-age z-scores at one year of age for the control and stunted children whose samples were used for H3K4me3 analysis. These values were used in two-sample tests (43) to estimate sample sizes required to detect a difference in weight associated with loss of LRP1 in mice. This analysis showed that four mice in each group would be sufficient for detection of a difference with alpha = 0.05 and power = 0.99 and is a conservative underestimate of statistical power as the children showed modest differences in LRP1 expression whereas the gene was deleted in knock-out mice.
Runts, mice that weighed significantly less than their sex-matched littermates (prior to tamoxifen injection), were excluded from experiments. Mice were randomly assigned to control and tamoxifen treatment groups, and the mouse experiments were performed without blinding. Tests for data distribution were performed using Prism 6 (GraphPad) software and statistical tests were chosen in part based on the distribution test results. The variance was typically greater in the LRP-mice than the LRP+ mice.
R and other software Differential ChIP-seq peak analysis was performed in R (v3.3.1) and employed the Biobase_2.32.0, DESeq2_1.12.4, BiocGenerics_0.18.0, genefilter_1.54.2, GenomicRanges_1.24.2, IRanges_2.6.1, and SummarizedExperiment_1.2.3 packages. RNA-seq data analysis was performed using HISAT2 v2.0.4 and StringTie v1.3.4d. Plots were mainly generated using ggplot2_2.1.0, but some images were generated using gplots_3.0.1 and/or lattice_0.20-34 along with RColorBrewer_1.1-2. Venn diagrams were generated using Biovenn (44) and then edited for presentation using Adobe Illustrator CS6. Statistical results not derived from a particular tool or package described above were obtained in R using t.test() or fisher.test(). Fig. S1. A, Gene average H3K4me3 profile at transcription start sites (TSSs) in PBMCs as measured by the Epigenome Roadmap Project and available through ENCODE (4,45). B, Gene average H3K27ac profile at TSSs in PBMC data reported by ENCODE. We expect the genome-wide distributions of H3K4me3 and H3K27ac to be correlated (46), and indeed, the Pearson correlation coefficients between these two marks in datasets from children at 52 weeks of age ranged from 0.722 to 0.825, with a median value of 0.785 and comparable to the correlation in ENCODE data of 0.846. Moreover, there was no decay in correlation value with stunting. Collectively, these results reinforce the conclusion that the H3K4me3 data associated with stunted individuals is of high quality. C, H3K27ac average profiles at blood cell enhancer regions (36) in PBMCs from control and stunted children at 52 weeks or (D) as reported by ENCODE. The results in (C) and (D) show that H3K27ac in control and stunted children is localized at active enhancers as expected (47). E, Representative western blots of total histone H3 and total H3K4me3 in chromatin samples from one-year-old children. Each lane was loaded with 5 mg of a different sample; samples from controls are labeled in blue, samples from stunted children are labeled in red. Fig. S2. A, Exponential background model fit of ENCODE H3K4me3 data obtained from human PBMCs. Compared to the H3K4me3 datasets presented here (Figs. 2E,F), the ENCODE data show differences in the distribution of reads in peaks versus background as would be expected given differences in sample preparation, sequencing platform and sequence read depth. Nonetheless, the exponential background model fits the ENCODE data comparably well. B, Numbers of H3K4me3 peaks called in datasets obtained from stunted or control children at one year of age. The high degree of overlap indicates that there was high concordance in the number and location of H3K4me3 peaks despite the overall reduced level of signal in peaks in stunted datasets. C-E, Comparison of differential peak calls (stunted versus control children at one year of age) using three different normalization methods. This analysis was performed using the 4 datasets described in SI Methods. The plots show the mean normalized read counts versus log2 fold change obtained after read count normalization by the method indicated in the plot title. Red denotes FDR-corrected p-value < 0.05. Note the similarity in the results obtained with DESeq2 default normalization and spike-in normalization. F, Overlap (gray) in differential peaks identified using DESeq2 default normalization (pink) or spike-in normalization (cyan). The results were obtained using the 4 spike-in datasets described in SI Methods. 99.84% of the significant (FDR-corrected p-value < 0.05) differential peaks identified using spike-in normalization were identified using DESeq2 default normalization.

Fig. S3. (A) and (B) show overlaps in MSigDB Canonical
Pathway Terms associated with differentially affected H3K4me3 peaks identified using DESeq2 default normalization or spike-in normalization. Differentially affected peaks were computationally associated with genes, then the gene lists were tested for functional enrichment using MSigDB as described in Methods and Supplementary Information. A, Overlap in Canonical Pathway terms with binomial FDR Q values < 0.05. DESeq2 default normalization captured 97.8% of the enriched terms identified using spike-in normalization, plus other terms. B, Since the analysis only included 4 datasets and was therefore likely underpowered for defining functional enrichment, also shown is the overlap in all functional terms identified regardless of FDR Q value. The figure shows that the overlap in overall Canonical Pathway Terms was 76.2%. In comparison to the total number of Canonical Pathway Terms in the MSigDB database, overlaps in both A and B were found to be highly significant (p < 2.2 x 10 -16 using Fisher's Exact Test). C, Overlap in differential H3K4me3 peaks identified in a comparison of datasets from control and stunted children at one year of age versus those peaks with significant log2 fold-change per unit of the child's delta HAZ score. Differential peak analysis using delta HAZ score captured 93.4% of the peaks identified in the categorical comparison of control and stunted children, and in addition revealed 5520 additional peaks associated with growth within the first year of life. D, The heatmap shows the 16,477 peaks (rows) with significant H3K4me3 differences versus delta HAZ score at one year of age. Samples (columns) are ordered from lowest (left) to highest (right) delta HAZ score. Control samples are labeled Ctrl and stunted samples (HAZ score < -2) are labeled Mal. The first letter M or F in each sample name designates male or female, respectively. Counts in each peak were z-score normalized and peaks were clustered using default settings in R using heatmap.2. Row names were omitted for clarity.  S5. A, Changes in growth and proliferation pathways in one-year-old children based on differential H3K4me3 peaks associated with growth trajectory (delta HAZ score) using Ingenuity Pathway Analysis. Positive z-scores indicate predicted pathway activation with increasing overall health; negative z-scores indicate predicted pathway activation with stunting. Note that all but one of these pathways had positive activation z-scores, meaning that the pathways were predicted to be more activated in healthy compared to stunted children as would be expected. B, Differentially affected H3K4me3 peaks are associated with genes whose expression is driven by the indicated transcription factors (TFs). Positive z-scores indicate an increasing likelihood that the TF is activated with increasing health; the negative z-score for KDM5A indicates that KDM5A is increasingly inhibited with increasing health, thus, increased KDM5A (H3K4me3 demethylase) activity in stunted children. Inset: HNF4 had a -log(p-value) = 31.05. The list also includes general regulators of cellular growth (MYC, RB1, TP53), regulators of metabolism and stress in blood cells (TCF7L2, NFE2L2, HIF1A), and regulators of immune cell growth and development (FOS, ARNTL, EBF1, STAT4). NFKB subunit genes were also associated with differentially affected H3K4me3 peaks, providing support for the association of altered inflammatory responses with stunting. FOXO1 was also identified in this analysis as a key transcription factor, supporting the observation above that FOXO1-targeted enhancers are specifically affected in stunted children. Since AKT-mediated phosphorylation of FOXO1 regulates chromatin occupancy and transcriptional activity (48), reduced TOR and AKT signaling in stunted children (A) would further potentiate FOXO1 misregulation.  (significantly affected peaks that increase with health) and significantly affected genes identified by RNA-seq. Up-regulated genes were those whose expression increased with health; down-regulated genes were those whose expression decreased with health. Both overlaps are highly significant: p < 1.71e-9 for the up-regulated genes overlap and p < 3.08e-11 for the down-regulated genes overlap.