Light and flow regimes regulate the metabolism of rivers

Significance This paper provides a comprehensive analysis of the annual patterns of ecosystem productivity and respiration for more than 200 rivers, comparing the magnitude and phenology of river metabolic regimes with annual estimates from more than 150 terrestrial ecosystems. Although mean annual temperature and mean annual precipitation explain much of the variation in terrestrial productivity and are used to define biomes, for rivers the most important controls are annual light availability and flow stability. Attention to these gradients will substantially improve our ability to scale and model river ecosystem dynamics and may fundamentally change the way rivers are studied.


Materials and Methods
Lotic Metabolism data: Mean daily discharge (m 3 s -1 ) and daily estimates of metabolism for lotic sites were compiled from (1) and the StreamPULSE data portal (https://data.streampulse.org). In both datasets, estimates of gross primary production (GPP) and ecosystem respiration (ER) were produced using the streamMetabolizer R package (2), which uses inverse modeling based on sub-daily time series of dissolved oxygen (DO), water temperature, discharge, depth, and photosynthetically active radiation (PAR). We limited our analysis to subset these datasets to only include sites within the continental United States so they could be linked to other nationally-consistent datasets. Biologically impossible values, negative GPP or positive ER, were excluded from the dataset and then individual site-years were filtered to only include years where the model generated reliable metabolism estimates for more than 60% of all days of the year. Gaps in discharge or metabolism data were filled using a structural timeseries in the R statistical language (3). To facilitate comparison to terrestrial data, estimates of GPP and ER from lotic sites were converted from g O2 m -2 d -1 to g C m -2 d -1 using a photosynthetic quotient of 1.25, which assumes a molar C:O ratio of 1:1.

FLUXNET :
We used the FLUXNET2015 dataset (4) for terrestrial GPP and ER data. The FLUXNET2015 dataset contains data at multiple temporal resolutions and we used the annual sums of GPP and ER (g C m -2 y -1 ) which were included in this dataset. However, only site-years with >60% of days having estimated rates were used in our analysis to be consistent with our approach to filtering lotic sites.

Stream Light estimates:
Estimates of photosynthetically active radiation (PAR) at the stream surface were generated using the model of (5), which was implemented in the StreamLight R package (6). Specifically, we used the stream_light() function, which effectively assumes constant bankfull conditions for the purposes of determining wetted channel widths. Time series of incoming shortwave radiation (W m -2 ) (7) and leaf area index (m 2 m -2 ) (MCD15A2Hv006) (8) were used as inputs to the model in addition to several channel characteristics described below. The hourly estimates of PAR (µmol m -2 s -1 ) were aggregated to daily sums (mol m -2 d -1 ) so they would match the daily resolution of the stream metabolism estimates.

MODIS NPP data:
For each of the lotic site-years in our final dataset, annual MODIS net primary production (NPP) data (kgC m -2 y -1 ) (MOD17A3HGFv006)(9) was downloaded based on the site location through the Application for Extracting and Exploring Analysis Ready Samples (AρρEEARS) (10). NPP data were not available for all sites through AρρEEARS, possibly due to the proximity of pixels to MODIS water masks, so NPP data for the remaining sites were downloaded for a 1km buffer around the site location using the MODISTools package (11). For these sites, the mean value for all pixels in the buffer was calculated for each site-year of NPP. Finally, NPP was converted to g C m -2 y -1 to be consistent with the other data sources used.

StreamPULSE Channel Characterization:
Site locations were joined to their nearest flowline reach within the National Hydrography Dataset (12) using the nhdplusTools R package (13). From the paired flowlines, we assigned the Strahler stream order for each site. We also used the unique identification number for each flowline reach (i.e., the COMID) to gather relevant site information from additional linked datasets. Channel widths were estimated for each site location using either field-based measurements or geomorphic scaling relationships between width and cumulative upstream area. For the majority of sites, channel width was approximated as the median of width measurements available during the period of record for the metabolism estimates, acquired either from NWIS for sites co-located at USGS gages, or from the StreamPULSE data portal. For sites with no available width records, we estimated channel width based on empirical equations relating upstream area with width, regionalized at the HUC2 scale (14). Upstream watershed area was acquired from either the NHDPlus vaa tables based on the associated COMID. We calculated channel azimuths using the NHDPlusV2.1 flowlines, where the azimuth at each site was calculated as the circular mean of all segments within the associated reach. Tree heights were derived using 30m resolution global canopy height estimates from (15). Heights were extracted using the site coordinates and extending a 60m buffer into the riparian zone. The 90th percentile of canopy heights within the buffer for each site was returned. The 90th percentile of heights was chosen to capture tall canopy elements that could shade the channel.

StreamPULSE Hydrologic Regimes:
The skewness in daily average discharge was calculated following (16). Specifically, skewness refers to L-skew. L-moments have been found to be useful for analyzing streamflow since they are relatively robust to outliers when compared to product moment estimators (17).