Soil carbon debt of 12,000 years of human land use
See allHide authors and affiliations
Edited by William H. Schlesinger, Cary Institute of Ecosystem Studies, Millbrook, NY, and approved July 14, 2017 (received for review April 12, 2017)

Significance
Land use and land cover change has resulted in substantial losses of carbon from soils globally, but credible estimates of how much soil carbon has been lost have been difficult to generate. Using a data-driven statistical model and the History Database of the Global Environment v3.2 historic land-use dataset, we estimated that agricultural land uses have resulted in the loss of 133 Pg C from the soil. Importantly, our maps indicate hotspots of soil carbon loss, often associated with major cropping regions and degraded grazing lands, suggesting that there are identifiable regions that should be targets for soil carbon restoration efforts.
Abstract
Human appropriation of land for agriculture has greatly altered the terrestrial carbon balance, creating a large but uncertain carbon debt in soils. Estimating the size and spatial distribution of soil organic carbon (SOC) loss due to land use and land cover change has been difficult but is a critical step in understanding whether SOC sequestration can be an effective climate mitigation strategy. In this study, a machine learning-based model was fitted using a global compilation of SOC data and the History Database of the Global Environment (HYDE) land use data in combination with climatic, landform and lithology covariates. Model results compared favorably with a global compilation of paired plot studies. Projection of this model onto a world without agriculture indicated a global carbon debt due to agriculture of 133 Pg C for the top 2 m of soil, with the rate of loss increasing dramatically in the past 200 years. The HYDE classes “grazing” and “cropland” contributed nearly equally to the loss of SOC. There were higher percent SOC losses on cropland but since more than twice as much land is grazed, slightly higher total losses were found from grazing land. Important spatial patterns of SOC loss were found: Hotspots of SOC loss coincided with some major cropping regions as well as semiarid grazing regions, while other major agricultural zones showed small losses and even net gains in SOC. This analysis has demonstrated that there are identifiable regions which can be targeted for SOC restoration efforts.
The incredible rise of human civilizations and the continuing sustainability of current and future human societies are inextricably linked to soils and the wide array of services soils provide (1⇓–3). Human population and economic growth has led to an exponential rise in use of soil resources. Roughly 50 million km2 of soils are currently being managed to some degree by humans for food, fiber, and livestock production (4), leading to the declaration that we live on a “used planet” (5). The consequences of human domination of soil resources are far ranging (6, 7): accelerated erosion, desertification, salinization, acidification, compaction, biodiversity loss, nutrient depletion, and loss of soil organic matter (SOM).
Of these soil threats, loss of SOM has received the most attention, due to the critical role SOM plays in the contemporary carbon cycle (8, 9) and as a key component of sustaining food production (10, 11). Despite the intense research interest in SOM and soil organic carbon (SOC) as the dominant component of SOM, there remain many unknowns (12) that impede progress in implementing sound land management strategies to rebuild SOC stocks (13).
Conversion of native soil to agricultural uses typically leads to a decline in SOC levels (14⇓–16). The rate and extent of decline in SOC stocks should vary greatly across the globe, due to differences in soil properties, climate, type of land-use conversion, and, importantly, the specific management implementation of a given form of land use. Loss of SOC under agricultural land use is not universal; modest gains are seen when soil of naturally low fertility is improved and the previous constraint (e.g., moisture, fertility, and hardpan) on plant growth is alleviated (17⇓–19). However, for the vast majority of land, SOC loss is more common. In fact, in a metaanalysis of the available literature (SI Appendix), we found median SOC loss values of 26% for the upper 30 cm and 16% for the top 100 cm of soil, but ranges of −36 to 78% and −25 to 61%, respectively, have been reported for these two depth increments (SI Appendix, Fig. S2). Scaling these limited point measurements to calculate a cumulative SOC loss for the world’s agricultural land has been difficult, with estimates ranging from 40 Pg C to over 500 Pg C (20). Recent estimates from dynamic global vegetation models run with actual land use versus with potential natural vegetation have put this figure at 30 Pg C to 62 Pg C for the industrial post-1850 period (21, 22).
A credible estimate of the global total and spatial distribution of SOC loss is a critical step in understanding the potential for soil carbon sequestration to be an effective climate abatement strategy. To quantify the cumulative impact of human land use on changes in SOC at the global scale, we have developed a machine learning-based data-driven statistical model, which is based on a global compilation (
Results and Discussion
Model Performance and Predictors.
Models explaining SOC density distribution as a function of depth, climate, relief, lithology, land cover, and land use were evaluated using fivefold cross-validation with repeated fitting. For the 0- to 2-m depth interval, R-square was 54% with a root mean square error (RMSE) of 12 kg C
While climate and topographic attributes were the most important variables in both models (SI Appendix, Fig. S3), multiple land-use variables have been shown to be also important for explaining the distribution of current SOC. Individual correlation plots for the land-use variables generally decline with increasing intensity of land use within a pixel (SI Appendix, Fig. S3). The relatively high importance of the HYDE land use classes “grazing” and “cropland” in our SOC models validates our conceptual approach to estimating historic SOC stocks in the absence of human land use.
Current SOC Stocks.
Projecting our data-driven statistical model across the globe for the year 2010 suggested that global SOC stocks were 863 Pg C, 1,824 Pg C, and 3,012 Pg C in the upper 0.3 m, 1 m, and 2 m of soil, respectively (SI Appendix, Fig. S5A and Table S3). On a per hectare basis, land classified as cropland [International Geosphere-Biosphere Programme (IGBP) class 12 (24)] contained an average of 62 [31 to 96, 95% confidence intervals (CI)], 127 (61 to 200, 95% CI), and 198 (96 to 315, 95% CI) Mg C
Historic SOC Stocks.
Reprojection of our SOC model to a no land-use (NoLU) condition with all other variables held constant resulted in global SOC stocks of 899 Pg C, 1,899 Pg C, and 3144 Pg C in the upper 0.3 m, 1 m, and 2 m of soil, respectively (SI Appendix, Fig. S5B and Table S3), suggesting that human-driven land-use decisions have resulted in substantial reductions in global SOC levels. In the absence of accurate SOC data from past millenia, we have attempted to assess the accuracy of these historic projections by comparing the modeled NoLU SOC stocks to SOC measurements taken in remnant patches of native vegetation which were compiled from the literature (SI Appendix, Fig. S6). Given the limitations of this comparison between point measurements and 10-km model output, model results for NoLU SOC compared favorably to the measured literature values (R-square = 0.33 to 0.34, with RMSE values of 17 Mg C
SOC Loss Due to Land Use.
Subtracting current (2010) SOC stocks from historic (NoLU) SOC stocks, we found that 37 Pg C, 75 Pg C, and 133 Pg C have been lost due to land-use change in the upper 0.3 m, 1 m, and 2 m of soil (Fig. 1B and SI Appendix, Table S3). The mean absolute loss due to land use to 2 m across all pixels with some degree of land use was 17.7 Mg C
Global distribution of cropping and grazing in 2010 from (A) HYDE v3.2 and (B) modeled SOC change in the top 2 m. In A, color gradients indicate proportion of grid cell occupied by given land use. In B, legend is presented as histogram of SOC loss (Mg C
Comparison of model results with the native remnant database suggests that modeled SOC loss due to land use was likely a conservative estimate. First, comparing measured SOC in native patches to predicted SOC stocks with NoLU, we found a negative bias of −10 Mg C
This analysis clearly demonstrates that, while, on average, agricultural land use leads to SOC loss, there are important spatial patterns and contrasts (Fig. 1) suggesting that simple accounting methods for SOC change [e.g., Intergovernmental Panel on Climate Change default emission factors (28)] will misrepresent SOC change in many regions of the world. The majority of data used to generate published emission factors come from North America and Europe (29). In our own metaanalysis (SI Appendix), 82 of the 140 paired comparisons were from these two regions. Model results from the agricultural heartland of the United States and much of Europe showed large losses (Fig. 1B), primarily from cropping, that are consistent with the average documented changes in SOC stocks from field investigations (14, 15).
The largest per pixel losses were found to coincide with cropping regions (Fig. 1B); however, grazing, especially in arid and semiarid regions, with its larger spatial extents (Fig. 1A), was responsible for at least half of the total SOC loss (SI Appendix, Table S3). The grassland and savanna IGBP land classification categories collectively lost more SOC than the cropland and crop/natural vegetation mosaic categories (48 Pg C vs. 35 Pg C). In particular, the rangelands of Argentina, southern Africa, and Australia stand out as hotspots of SOC loss when viewed as a percent of historic SOC (SI Appendix, Fig. S7).
While land use is the underlying anthropogenic driver of SOC loss, the degree to which land use results in SOC loss is at least partially dependent upon the degree to which the soil resource has been exploited (2). Using the Global Land Degradation Information System biophysical status of land index, a quantitative expression of a given land area’s ability to provide four categories of ecosystem services (biomass, soil, water, and biodiversity) (30), we found a strong correlation between land degradation and SOC loss (SI Appendix, Fig. S9). Results from SI Appendix, Fig. S9, coupled with the finding that grazing was the single most important land-use variable in the model (SI Appendix, Fig. S3), suggest that grazing of relatively unmanaged rangelands may be a stronger driver of SOC loss than previously acknowledged.
Agricultural land uses do not always result in large losses of SOC. For example, the 2-million-km2 seasonally dry moist savanna region in Brazil known as “the Cerrado” was thought to have soils too poor to support intensive agriculture but, over the past few decades, has been transformed into one of the most productive agricultural regions of the world, with ∼750,000 km2 of crops and 800,000 km2 of pasture, through liming, fertilization, and weed control (31). Because of the naturally infertile soil state in the Cerrado, agricultural expansion has resulted in little loss in SOC (Fig. 1B), and there are large areas that have actually accumulated modest amounts of SOC after the advent of agriculture, which is consistent with results from field studies (17, 32, 33).
Historic SOC Loss Trend.
By using the HYDE v3.2 land-use dataset, we predicted the temporal evolution of SOC stocks due to changes in land use alone (Fig. 2). Globally, SOC loss follows the exponential rise in used land, but not in a linear fashion (Fig. 2, Inset). There were low annual rates of SOC loss (<0.05 Pg C
Historic reconstruction of loss in SOC relative to 10,000 BC (assumed NoLU). Temporal evolution of cropland and grazing land is given in stacked area plots. (Inset) Biplot of SOC loss (Pg C) v. total used land area (106 km2) for each predicted time interval.
Limitations of This Study.
While great care has been taken to ensure that the input data were of the highest quality possible (23, 27), there remain several limitations in the underlying datasets and therefore predicted SOC change. First, the training dataset used to build spatial predictions models was not ideal for testing the hypotheses. ISRIC’s soil profile dataset is a compilation of national inventories from a large number of nations with different reasons for undertaking soil surveys and different methods of laboratory analysis. Data were collected over a 50-y period, which is likely smoothing out some of the SOC loss in the model. In addition, the mismatch in scale between a soil pedon (0.5 m on a side) and the pixel size of the HYDE v3.2 land-use data (10 km) can create situations where the dominant soil properties of a pixel are not represented by the particular soil pedons that were sampled within that pixel (34). Relatedly, regions with low sampling density may be overly influenced by a few data points that may not be representative of that region as a whole. These last two issues are the major drivers of the spatial distribution of model error (SI Appendix, Fig. S4).
The HYDE dataset itself also presents a few limitations. First, these land-use data are a combination of national and subnational level statistics and remotely sensed land-use change (4), which, in some cases, can create artificial changes in SOC along political boundaries. This became particularly apparent when we used the reconstructed land-use histories in prior centuries to estimate SOC stocks for those time periods (Fig. 2). Second, only very coarse land-use categories are represented, so management-specific practices which can influence SOC levels (13), such as tillage practices, rotations, and cover crops, are not represented. Relatedly, HYDE does not contain direct information on forest or wetland loss, both known drivers of SOC loss. Third, in our model formation, there is no indication of the duration of a given land use. SOC stocks, while often declining most rapidly in the first decade after land-use change (35), often take many decades to over a century to reach a new steady state (36, 37). Finally, the HYDE dataset describes the extent of land use but not the intensity. This limitation may be particularly important for the grazing category, as SOC levels have been shown to decline with increasing grazing pressure (38), although this effect appears to be dependent upon grass species composition, with C3 grasses showing large declines and C4 grasses showing small gains in SOC with increased grazing pressure (39). It is very likely that, taking these limitations together, our estimate of soil carbon debt covers only a smaller fraction of the actual debt due to human influence.
Implications.
This analysis indicates that the majority of the used portions of planet Earth have lost SOC, resulting in a cumulative loss of ∼133 Pg C due to agricultural land use. These SOC losses are on par with estimates of carbon lost from living vegetation primarily due to deforestation (40) and are nearly 100 Pg C higher than earlier estimates of land use and land use change-driven losses of SOC (41). Importantly, as Fig. 1 demonstrates, there are hotspots of SOC loss, associated with extensive cropping regions but also with highly degraded grazing land (SI Appendix, Fig. S9), suggesting that there are identifiable regions which should be targets for SOC restoration efforts.
The potential to recover lost SOC may be more limited than is often assumed. The amount of SOC that has been lost historically can be thought of as the carbon sink potential of the soil (42). Our analysis has found that this sink potential is ∼133 Pg C (SI Appendix, Table S3). A widely repeated figure is that, with adoption of best management practices, two thirds of lost SOC can be recovered (42). If the two-thirds figure is accurate, then SOC sequestration has the potential to offset 88 Pg C (322 Pg CO2) of emissions. However, bottom-up estimates of the maximum biophysical potential for carbon sequestration on cropping and grazing land range from 0.4 Pg C
Conclusions
Our data-driven statistical analysis confirms that agricultural land use is a significant driver of SOC levels. Importantly, we have generated estimates for the global cumulative loss of SOC which potentially represent a maximum estimate of the SOC sink capacity, and have demonstrated that there are hotspots of SOC loss which are closely associated with land that has been identified as highly degraded. This analysis also demonstrated that not all land use is associated with large losses in SOC, particularly in regions with naturally infertile soils. These results provide a basis for national and international policies to target SOC restoration efforts but also suggest that more effort needs to be put into collecting, integrating, and using legacy soil profile data, especially historic data 50+ y old, so that even more reliable models of SOC dynamics can be produced.
Materials and Methods
SOC varies in complex but mostly predictable ways across the landscape. It can be best modeled as a function of climate, potential vegetation, topographic relief, soil parent material, and time (46, 47). Numerous spatially explicit data layers now exist that cover most of these state factors of soil formation. This is the foundation of the current state of the art in predictive soil mapping (27, 48),
Soil Profile Data.
ISRIC) (www.isric.org) curates the largest repository of spatially explicit soil profile observations and samples (
Spatial Prediction Model for Organic Carbon Density.
We overlay the training points and environmental covariates and fit spatial prediction models following the formula
In addition to spatial prediction of OCD values, we also modeled associated spatial prediction uncertainty using the following procedure: (i) Run fivefold cross-validation using the machine learning framework and derive prediction residuals; (ii) overlay prediction residuals (absolute values) and covariates and fit a new spatial prediction model using the same machine learning framework; and (iii) apply this error model over the whole area of interest to produce maps of absolute errors.
Environmental Covariates.
For modeling purposes, we use a large stack of spatially explicit covariate raster data layers. These covariate layers have been compiled from numerous sources representing the major state factors mentioned above, including the following: (i) The HYDE 3.2 Historic land-use dataset (ftp://ftp.pbl.nl/hyde/hyde3.2/) (4) contains the distribution of main agricultural systems from 10,000 BC (prehistoric NoLU condition) to present time. Each raster layer represents the area (square kilometers) of each pixel occupied by a given land-use category, with the 10 categories being as follows: total cropping, total grazing, pasture (improved grazing land), rangeland (unimproved grazing land), total rainfed cropping, and total irrigated cropping with further subdivisions for rice and nonrice cropping systems for both rainfed and irrigated cropping. (ii) The Climate Research Unit TS2.1 climatic surfaces for period 1960 to 1990 (www.ipcc-data.org/observ/clim/) (50) include precipitation, ground frost frequency, near-surface air temperature (daily minimum, daily maximum, and mean), water vapor pressure, wet day frequency, cloud area fraction, near-surface air temperature, and diurnal range. (iii) Current and historic forest extent was estimated from the United Nations Environment Programme World Conservation Monitoring Centre Generalised Original and Current Forest cover map (www.unep-wcmc.org/resources-and-data/generalised-original-and-current-forest). (iv) Topographic parameters were derived from Global EarthEnv-Digital Elevation Model 90 (51) including slope, curvature, topographic index, topographic openness, valley depth, and multiresolution valley bottom index. Topographic properties were derived using the System for Automated Geoscientific Analyses Geographic Information System (52) at finer resolution (250 m) and then resampled to 10-km resolution. (v) Landform and geologic substrate were determined from the United States Geologic Survey Global Ecophysiography landform classification and lithological map (rmgsc.cr.usgs.gov/outgoing/ecosystems/Global/) (53).
If not already available at 10-km resolution, spatial data layers were resampled to 10-km resolution using the Geospatial Data Abstraction Library software (54).
Statistical Modeling and Prediction.
The statistical modeling was accomplished using machine learning techniques implemented in R environment for statistical computing (55). We used an ensemble prediction of two algorithms: (i) random forest as implemented in the package ranger (56) and (ii) gradient boosting as implemented in the package xgboost (57). For model fitting, we used all soil profiles, then used this model to predict
All computing was run on ISRIC High Performance Computing servers with 48 cores and 256 GB RAM. Total computing time required to produce all outputs from scratch is about 18 h of optimized computing (or about 1,000 central processing unit hours). All derived maps (as geotiff files) and R code used in this analysis can be found at Woods Hole Research Center’s github repository (https://github.com/whrc/Soil-Carbon-Debt). Metaanalysis data can be found at https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/QQQM8V.
Acknowledgments
We thank Emily Cheney for compiling the remnant native vegetation soil carbon database and Rebecca McCulley for contributing data to the remnant native vegetation soil database. Funding was provided by The Nature Conservancy and the Doris Duke Charitable Foundation. ISRIC – World Soil Information is a nonprofit foundation primarily funded by The Netherlands government.
Footnotes
↵1J.S. and T.H. contributed equally to this work.
- ↵2To whom correspondence should be addressed. Email: jsanderman{at}whrc.org.
Author contributions: J.S. and T.H. designed research; J.S., T.H., and G.J.F. performed research; J.S., T.H., and G.J.F. contributed new reagents/analytic tools; J.S., T.H., and G.J.F. analyzed data; and J.S. and T.H. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Data deposition: Meta-analysis data have been deposited on Harvard University’s website, https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/QQQM8V. Code and model outputs have been archived on GitHub, https://github.com/whrc/Soil-Carbon-Debt.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1706103114/-/DCSupplemental.
Freely available online through the PNAS open access option.
References
- ↵.
- Koch A, et al.
- ↵.
- Amundson R, et al.
- ↵.
- Montanarella L, et al.
- ↵
- ↵.
- Ellis EC,
- Fuller DQ,
- Kaplan JO,
- Lutters WG
- ↵.
- Food and Agriculture Organization of the United Nations and Intergovernmental Technical Panel on Soils
- ↵.
- Keesstra S, et al.
- ↵.
- Schlesinger W
- ↵
- ↵
- ↵.
- Oldfield EE,
- Wood SA,
- Palm CA,
- Bradford MA
- ↵
- ↵
- ↵
- ↵.
- Don A,
- Schumacher J,
- Freibauer A
- ↵
- ↵.
- Neill C, et al.
- ↵.
- Hoyle F,
- D’Antuono M,
- Overheu T,
- Murphy D
- ↵.
- Churchman GJ,
- Noble A,
- Bailey G,
- Chittleborough D,
- Harper R
- ↵
- ↵.
- Pugh T, et al.
- ↵.
- Smith P, et al.
- ↵.
- Batjes NH, et al.
- ↵
- ↵.
- Scharlemann JP,
- Tanner EV,
- Hiederer R,
- Kapos V
- ↵.
- Batjes N
- ↵.
- Hengl T, et al.
- ↵.
- Eggleston HS,
- Buendia L,
- Miwa K,
- Ngara T,
- Tanabe K
- Intergovernmental Panel on Climate Change
- ↵
- ↵.
- Nachtergaele F, et al.
- ↵.
- Spera SA,
- Galford GL,
- Coe MT,
- Macedo MN,
- Mustard JF
- ↵
- ↵.
- Fujisaki K, et al.
- ↵.
- Pachepsky Y,
- Hill RL
- ↵
- ↵.
- Sanderman J,
- Baldock JA
- ↵
- ↵
- ↵.
- McSherry ME,
- Ritchie ME
- ↵
- ↵.
- Houghton R
- ↵.
- Lal R
- ↵.
- Smith P, et al.
- ↵.
- Eggleston H,
- Buendia L,
- Miwa K,
- Ngara T,
- Tanabe K
- ↵.
- Smith P, et al.
- ↵.
- Jenny H
- ↵
- ↵
- ↵.
- Köchy M,
- Don A,
- van der Molen MK,
- Freibauer A
- ↵
- ↵.
- Robinson N,
- Regetz J,
- Guralnick RP
- ↵.
- Conrad O, et al.
- ↵.
- Sayre R, et al.
- ↵.
- Mitchell T
- ↵.
- R Development Core Team
- ↵.
- Wright MN,
- Ziegler A
- ↵.
- Chen T,
- Guestrin C
Citation Manager Formats
Article Classifications
- Physical Sciences
- Sustainability Science