Improved RNA stability estimation through Bayesian modeling reveals most Salmonella transcripts have subminute half-lives

Significance Together with transcription and translation, RNA decay is one of the major processes governing protein production. Here, we have developed a statistical approach that corrects for confounding effects when estimating RNA decay rates from RNA-seq in bacteria. Our more accurate decay rate estimates indicate that Salmonella transcripts have half-lives about three times shorter than previously thought. This approach allowed us to measure the effects of RNA-binding proteins (RBPs) on decay rates, identifying large cohorts of transcripts with changes in stability following RBP deletion and conditions where posttranscriptional regulation affects survival. Our method should lead to a reevaluation of RNA stability estimates across diverse bacteria and insights into the role of RBPs in shaping the transcriptome.


Figures S1 to S15 Supporting References
Other supporting materials for this manuscript include the following: Dataset S1: Half-lives obtained by fitting the LNM to the RIF-seq data set.Dataset S2: Steady-state log-fold changes of the RIF-seq data set from edgeR.Dataset S3: Genetic features with ProQ-binding sites (CLIP-seq) in the 3'UTR or within 100 bases of the stop codon which are destabilized upon proQ deletion.Dataset S4: Genetic features with CspC/E-binding sites (CLIP-seq) in the CDS or 5'UTR which are destabilized upon cspC/E deletion.Dataset S5: Significant ProQ peaks obtained by re-analyzing the ProQ CLIP-seq data set (14).Dataset S6: Significant CspC CLIP-seq peaks.Dataset S7: Significant CspE CLIP-seq peaks.

Ratio between differential gene expression and stability in ΔproQ vs. ΔcspCE
The ratio between the differences in RNA half-life in the ΔproQ and ΔcspCE mutant strain (as compared to WT) was calculated by selecting only transcripts with significant stability changes in the same direction in the two RBP deletion strains (Figure S9C).Similarly, we selected only genes with significant log 2 -fold changes in the same direction in both RBP deletion strains (Figure 4F).

Hydrogen peroxide exposure
Bacterial cultures of all strains (Salmonella WT, ΔproQ, proQ++ and ΔoxyRS) were grown overnight.All strains contain the pJV300 plasmid.10 mL of culture were inoculated 1:1000 in LB and grown at 37°C for 5 h.The cultures were then diluted 1:100 in 10 mL of LB and incubated for 2 h at 37°C with 1. ) which were then incubated overnight at 37°C.

RNA secondary structure
The ViennaRNA web server with default settings was used to obtain the secondary structures of RNA sequences (1).Forna (2) was used for visualization.

Geneset enrichment analysis
In order to identify pathways with transcripts either stabilized or destabilized in the absence of ProQ or CspCE, the genes in the analysis were ranked according to the quantity .For pathway analysis of log fold-changes, we used the quantity − (∆ for ranking.We created a gene set database combining the terms for the − () 10 () strain SL1344 from the eggnog database (3), QuickGo (4) and KEGG (5).We used the r package GSEA 1.2 (6) to calculate the enrichment scores and the corresponding adjusted p values.gsea.type was set to 'preranked' and shuffling.typeto 'gene.labels'.Gene sets with sizes between 3 and 50 genes were analyzed.For the CLIP-seq data, we performed a hypergeometric test with the R stats function fisher.test.The FDR corrected p value was obtained using the Benjamini Hochberg procedure.For the hypergeometric test, the significance cutoff on the CLIP-seq data was chosen as ( ) for CsrA, Hfq, and ProQ.For CspC/E,   ≤ 0. 1 the value was reduced to to obtain a comparable number of interaction partners.  ≤ 0. 01

RNase E cleavage sites in random sequences
To estimate expected overlap between CspC/E CLIP-seq peaks and RNase E cleavage sites (7), we generated 100 random peaks of the same length within the same transcript as the actual CspC/E CLIP-seq peak, then tested how many of these random peaks overlapped with RNase E cleavage sites.Across the 100 simulations, this resulted in a mean of 331 of overlapping incubated for 1 hr at 50℃ and the RT enzyme later on inactivated at 70℃ for 15 min.For the cDNA amplification, 10 µl of each cDNA library was mixed with 25 µl of LongAmp Taq 2x Master mix, 1.2 µl of SR primer, 12.5 µl of nuclease free water and 1.2 µl of index primer (one different for each library).Amplification conditions applied were the following: 94℃ for 30 sec; 18 cycles of 94℃/15sec; 62℃/30sec; 70℃/15sec and a final step of 70℃ for 5 min.After amplification, samples were loaded of TBE gels and bands from amplification between 130 to 200 bp were selected by gel extraction.DNA was eluted from crushed gel pieces with 500 µl of DNA elution buffer after 2 hr incubation at RT.After collection of the supernatant using corning costar spin-X centrifuge tube filters, precipitation mix was added, and samples were placed at 80℃ for 1 hr.
After centrifugation and washing steps, dried pellets were resuspended in nuclease free water.Size, quantity, and absence of primers dimers were checked by bioanalyzer before sequencing.High-throughput sequencing was performed by Vertis.The libraries were pooled on an Illumina Nextseq500 platform and sequencing done for single end 1x150 bp.

Processing of Sequence Reads and Mapping CLIP-seq
The CspC/E and ProQ CLIP-seq data (9) was analyzed following the procedure described in (8) with a few alterations.First, putative PCR duplicates were removed using FastUniq v1.1 (10).The read pairs were trimmed together using Cutadapt v4.1 (11) and reads with fewer than 12 remaining bases were discarded.Additionally, we performed quality trimming with a minimum phred score of 20.Read pairs longer than 25 nt were eliminated for peak calling.The remaining reads were mapped to the Salmonella Typhimurium SL1344 chromosome (NCBI Acc.-No: NC_016810.1)and plasmid (NCBI Acc.-No: NC_017718.1,NC_017719.1,NC_017720.1)reference sequences using segemehl version 0.3.4(12) with an accuracy cutoff of 80%.Only uniquely mapping reads were considered for all subsequent analysis.For quantification of peak regions, no upper limit was imposed on read length.Reads were aligned to the Salmonella Typhimurium SL1344 chromosome and plasmids using STAR (13).

CLIP-seq Peak Calling
Segemehl read alignments were converted from BAM to BED format using BEDTools v2.17.0 and reformatted to satisfy blockbuster's input requirements.Subsequently, peaks were defined by applying blockbuster v0.0.1.1 (-minBlockHeight 10 -distance 1).This resulted in a large set of clusters with overlapping blocks of reads.In clusters with only one block the peak region was defined by the position of the block.In clusters with multiple blocks, peaks were chosen iteratively.First, the block with the highest count was selected and a peak region was defined by joining together all blocks which overlapped by at least 50% with this block.Then, all reads overlapping with this block were removed.This procedure was repeated until the largest block contained less than 1% of the reads in the corresponding cluster.A formalized description of this algorithm is given in (8).The peaks were exported to gff format and htseq-count v2.0.2 with default parameters was used to count the uniquely mapped reads in the STAR alignments.
Differential peak abundance analysis of CLIP-seq Data DEseq2 ( 14) was used to identify peaks with differential abundance in the cross-linked vs. the non-cross-linked libraries.Log-fold changes were shrunk using apeglm (15).We required a log-fold change of at least 1.For ProQ, we chose the same adjusted p-value cutoff as (8) chose for CsrA and Hfq (padj<0.1, Figure S7B).For the CSPs, the adjusted p-value cutoff was reduced to 0.01 to obtain a comparable number of peaks (Figure S7C).

Comparison between count-based and log-normal models
The modeling of RNA-seq count data conventionally employs a negative binomial (NB) distribution, as demonstrated by tools such as edgeR (16) and DESeq2 (14).Notably, the authors of the limma package (17) have previously shown that modeling log-transformed counts with normal distributions yields comparable (or even superior) results to count-based models in simulations where data was explicitly generated from a negative binomial distribution, provided that measurement precision is accounted for.Given the similarity of our RIF-seq library sizes (Figure S2M) and, consequently, measurement precision across replicates, we did not explicitly model measurement precision differences in our LNM model.
To validate this choice, we conducted a comparative analysis between the LNM, a count-based NB model, and a log-normal model with count-dependent variance (LNMcdv), similar to the limma-voom package (17).Our investigation revealed a very high correlation between wild-type decay rates, differences in decay rates, and p-values obtained from our original LNM and the two alternative models (NB, LNMcdv, see Figure S15A-D).Furthermore, we confirmed that the results are still highly similar when working with a reduced dataset (replicates 1,2 and 3 of the ProQ dataset, Figure S15E).However, the negative binomial model in particular came with significant computational costs, requiring between 3 and 4 fold more CPU time for sampling (Figure S15F).Considering the computational efficiency and the consistent results obtained with the three models, the application of the simple LNM for our RIF-seq dataset is well-founded.

Figure S4 .Figure S5 .Figure S6 .gAMN fl gBCDEFGHIJKL fl hBAE fl iAZY fl iDST fl iB, fl iE fl iFGHIJK fl iLMNOPQR fl gMN fl gKL fl iAZY fl iDST fl iC, fl jBA m
Figure S4.Posterior predictive p values und false discovery rate (FDR) (A) Posterior distribution of the difference in half-life for a transcript in the RBP deletion strain vs. the WT.Under the null hypothesis, MCMC samples should fall within an interval around zero (blue, dashed lines).The Bayesian p value is given by the fraction of samples that overlap with the null hypothesis (yellow).(B) p-value distribution for differential stability data for the ∆proQstrain compared to the distribution under the null hypothesis.(C) Correlation between simulated and fitted differences in half-life (Pearson ρ = 0.86).(D) ROC curves obtained from the simulated data for different simulated standard deviations of relative log-counts.An FDR of 0.1 is marked with a black triangle.The number of transcripts which pass the cutoff is indicated in parentheses.At an FDR of 0.1, differentially decaying transcripts with a simulated standard deviation below 0.05 are identified with a sensitivity of 0.82.(E) We determine the FDR at a given p value cutoff from the simulated data (black dots) and fit a LOESS curve to it to map p-values to FDR (blue line).More details on the calibration of p values can be found in the Methods.

Figure S9 .
Figure S9.Integrative analysis of RBP binding and transcript stability (A) Regulation of SPI-1/2 effectors by ProQ and CspC/E.The picture of the epithelial cell was taken from BioRender.com.(B) Decay curve of significantly destabilized transcript of oxyR.(C) ProQ CLIP-seq peak in the 3'UTR of oxyR identified by re-analyzing (14), FDR=0.047.(D) Exposure of various Salmonella strains to varying levels of hydrogen peroxide.(E) WT half-life distributions for the same groups of transcripts as in Figure 5A.(F) Abundance dependence of CLIP-seq results.(G) Distribution of log-counts for the groups of transcripts used in (F).

Figure S10 .
Figure S10.Integrative analysis of RNA secondary structures (A) Secondary structure of the ProQ/CspC/E-bound mRNA of the bacteriolytic lipoprotein EcnB including RNase E cleavage sites (47).(B) Secondary structure of the ProQ/CspC/E-bound sRNA CsrB including RNase E cleavage sites (47).(C) Normalized data for CsrB, including the fitted decay curves.(D) Overlap in interaction partners between various CLIP-seq data sets with major RBPs.

Figure S11 .
Figure S11.Top destabilized transcripts in the absence of ProQ and oxidative stress response

FFigure S15 .
Figure S14. Figure S14.Transcription factors, subunits of RNA polymerase complex 5 mM or 2 mM of H 2 O 2 .Controls were incubated without H 2 O 2 .Viability of

Flagellar genes with negative log-fold change in proQ and cspC/E deletion strains
Times are given for 1 chain with 1,000 warmup and 1,000 sampling iterations.