Carbon emissions reductions from Indonesia’s moratorium on forest concessions are cost-effective yet contribute little to Paris pledges

Significance More than a decade after the global adoption of REDD+ as a climate change mitigation strategy, countries have started accessing results-based payments. However, the extent to which payments are actually based on results is unknown, necessitating program evaluations to establish the contribution of REDD+ to the Paris NDCs. We undertake a microeconometric evaluation of one of the most globally significant REDD+ initiatives, Indonesia’s moratorium on forest concessions, in which a payment has been awarded. At the agreed US$5/tCO2-eq, the value of our estimated cumulative carbon emissions far exceeds the proposed payment from the donor, Norway. Although cost-effective, the emissions reductions only contribute 3 to 4% of Indonesia’s NDC. This contribution could be increased in new initiatives with better-designed incentives and institutional arrangements.

. Share of pre-2011 palm oil, timber and logging concessions (combined) by district.

Ben Groom, Charles Palmer and Lorenzo Sileci 3 of 61
Around 3% of land designated as concessions pre-2011 is located in forestland within the Moratorium's 2011 boundaries, which are shown in Fig. S2. Covering a substantial proportion of Indonesia's primary forest estate, the 2011 Moratorium was renewed every two years before being made permanent in 2019 (Presidential Instruction No. 10/2011, extended under No. 6/2013, then No 8/2015 No. 6/2017 regarding a Moratorium on the Granting of New Licences and Improvement of Natural Primary Forest and Peatland Governance). The Moratorium boundaries established in 2011 have been periodically updated since. Initially covering 69 million hectares, the total area under the Moratorium has fluctuated since 2011, declining to about 64 million ha before rising to 66 million by 2018. These fluctuations were due to new spatial mapping inputs and surveys that changed the classification or designation of land (2). Moratorium boundaries denote the treatment areas for our empirical analysis, and are assumed constant throughout our treatment periods. Sources: Authors' elaboration on forest cover data (6). Moratorium shapefile from (7). recreation forests" (17.4 million ha), and "production forest" (46.2 million ha). The last, the biggest forest category by area, contains forests designated for the exploitation of timber and other forest products. According to Indonesia's FREL submission to the UNFCCC in 2016, up 20 million ha of permanent forest (specifically that which, by law, is slated for conversion) and APL forest is excluded from the estimation of Indonesia's FREL (20). Further inference is not possible due to the focus of estimates of the RSB and the FREL on the levels of deforestation rather than on levels of forest cover.
For both the RSB and FREL, annual forest cover change was estimated by overlaying land cover maps of two subsequent periods. Deforestation is defined as the change of natural forests from a forest class to a non-forest class at a given location in the period 2006-2016, while degradation occurs when a change from a primary to a secondary forest class is identified. Calculation of the RBP baseline used emission factors for each forest class and are identical to those used in Indonesia's first FREL. These emission factors were estimated using data primarily collected by the National Forest Inventory (NFI). Average annual emissions from deforestation and forest degradation in the period 2006-2016 were estimated to be 236.9 MtCO2-eq/yr and 41.6 MtCO2-eq/yr, respectively. Actual emissions in 2017 were estimated to be 228.3 MtCO2-eq/yr and 42.7 MtCO2-eq/yr from deforestation and forest degradation, respectively.
Against the 2006-2016 RBP baseline, Indonesia reduced emissions in 2017 by 7.4 MtCO2-eq/yr: 8.6 MtCO2-eq/yr from avoided deforestation (3.6% below the baseline) net of an increase in emissions from forest degradation, 1.2 MtCO2-eq/yr (2.9% above the baseline). Uncertainties related to the forest data and emissions factors used were calculated following IPCC guidelines (2006). To account for statistical uncertainty, the risk of carbon reversals and leakage, a "set aside factor" of 35% was applied to the estimates of emissions reductions, giving a net reduction of 4.8 MtCO2-eq/yr. Also estimated were emissions from changes in peatland (swamp) forest, both from peat decomposition and peat fires. The former was defined as the changing process of peat form as a result of a decline in water levels caused by deforestation and forest degradation, and land utilization. When peat is drained and exposed, it oxidises thus generating carbon dioxide. Emission factors for peat decomposition also followed IPCC guidelines, based on the assumption that all utilized areas are drained. The calculation of emissions from peat decomposition was otherwise identical to the method used for calculating emissions from deforestation and forest degradation but with the inclusion of 'inherited' emissions from peat decomposition. In 2017, 256.7 MtCO2-eq/yr was emitted from the decomposition of peat, which when compared to the RBP baseline of 260.6 MtCO2-eq/yr, generated an emissions reduction of 3.9 MtCO2-eq/yr.
Emissions from peat fires were estimated using data on burn scar areas. Fire incidence in peatland was 13,555 hectares in 2017. Combined with estimates of the mass of fuel available for combustion, to generate emission factors for burned peat, this amounted to 12.5 MtCO2-eq/yr. Against an annual emission baseline of 249 MtCO2-eq/yr between 2006 and 2016, there was an estimated emission reduction of 236.4 MtCO2-eq/yr, in 2017. In contrast to estimates of emissions from deforestation and forest degradation, these estimates of peat fire emissions were subject to considerably more uncertainty, which was not possible to quantify.
Perhaps due to this uncertainty, Indonesia's government initially excluded emissions from peat fires in its calculations. However, Norway's government factored in peat fires, coming up with a total emissions reduction estimate of 11.2 MtCO2-eq/yr for 2017 (1). Multiplied by a carbon price of US$5 per ton, set by Norway, this generated an estimated payment of $US 56.2 million to be paid by Norway to Indonesia for emissions reductions from deforestation and forest degradation in 2017.

Main and Additional Results
ATT results: year-on-year. The ATT estimates in the following tables are our main results, which are used to derive estimates of avoided forest cover loss and carbon emissions reductions.  Notes: Abadie-Imbens (21) standard errors.

Estimated avoided forest cover loss and carbon emissions reductions.
The estimates of avoided forest cover loss and carbon emissions reductions in the following tables can be seen graphically in Fig. 2.  Robustness checks (2017 endpoint). The following tables show the results from testing the robustness of our main results. Notes: Abadie-Imbens (21) standard errors.
* p < 0.1, ** p < 0.05, *** p < 0.01.        "Intact primary" forest (22, 23) results. The following tables show the results from using a tighter definition of "forest" determined by extent of forest intactness and contiguity (22,23).     Spatially heterogeneous effects of the Moratorium, 2011-17. Fig. S5 shows the heterogeneous effects of the Moratorium on dryland forest cover using district-level ATT generated from the aggregate ATT. Relatively large positive effects (shaded dark green) are revealed in the mountainous regions of Sumatra, districts in East and South Kalimantan, as well as parts of Java and Papua. Some of these districts are also the location of high shares of the total area of Indonesia's land designated as concessions pre-2011 ( Fig. S1) but this is not a consistent pattern across the country. For example, there is little or no effect in the lowlands of Sumatra, nor in parts of Central Kalimantan, where many palm oil and timber concessions are located.

Methods
Summary of empirical approach. The impacts of the 2011 Moratorium on forest cover and carbon emissions are estimated for the period starting in 2011 and ending variously in years 2012-2018. Up until 2017, the estimates are undertaken separately for dryland and peatland forest but for the period 2011-2018 the estimates are for dryland forest only due to the additional restrictions on peatland implemented in 2017 (17). Pre-and post-Moratorium panel data on forest cover spanning the period 2000-2018 are used. Identification of the causal effect stems from a difference-in-differences (DD) research design, which allows for a comparison of the treated (Moratorium) grid cells with untreated, control (non-Moratorium) grid cells, while controlling for pre-existing differences and a secular counterfactual trend in forest cover. A matching approach is taken so that counterfactual cells have the same or similar characteristics as the treated cells. A key identification assumption is that of parallel trends in unobservable cell characteristics between Moratorium and non-Moratorium cells. Although impossible to test directly, we undertake several placebo tests in order to check this assumption. The main analysis uses pre-treatment data from 2004. The placebo tests require data from before the time horizon considered in the main analysis and so also use data from 2000 to 2004. DD can be implemented in a number of ways, and decisions have to be taken about the appropriate DD estimator to deploy. We select among a number of different parametric and non-parametric (matched) DD and triple difference (DDD) estimators through a process of empirical testing of typical DD identification assumptions, and a sensitivity analysis of matching estimators and standard parametric (linear) models. This process of model selection culminates with a preference for a matched triple differences estimator following (24,25). After outlining the identification problem, the following sub-sections provide: an account of the estimators; our process of model selection via empirical testing of their identification assumptions, including placebo tests; and, the robustness checks applied to our main results, including sensitivity to matching procedures and a test of the Standard Unit Treatment Value Assumption (SUTVA), another identification assumption for DD.
The identification problem. The core empirical problem with evaluating the causal impact of the Moratorium is one of selection bias: the forest areas treated by the Moratorium differ in their observable and unobservable characteristics. Any imbalance in these characteristics implies that a direct comparison of forest cover in Moratorium and untreated non-Moratorium areas will capture pre-existing imbalances in, for example, their suitability for palm oil or timber production, level of forest cover or susceptibility to deforestation. These factors would then confound the estimate of the treatment effect. This generic programme evaluation problem, and the associated imbalances in underlying characteristics, is common in the analysis of area-based conservation policies (26)(27)(28). Protected areas, for instance, are often established in economically-marginal land with "few profitable alternative uses" (26,29). Forest areas covered by the Moratorium are similarly distinctive from those not covered by the Moratorium. Fig. 1 shows the average level of forest cover for grid cells outside concessions and cells within concessions, for both Moratorium and non-Moratorium areas. Areas under the Moratorium start with higher levels of forest cover in 2000, and this remains true throughout the period of observation and for both peatland and dryland forest, and for all concession types (palm oil, logging and timber). In all but one within-concession case, there are clear differences in forest levels between the treated and control cells. Fig. 1 also illustrates forest cover trends over our observation period. The general picture is one of steady decline, with forest cover declining by at least 10 percentage points over the 18-year observation period. These trends are steeper in concession cells than in non-concession cells irrespective of whether forest is part of the Moratorium or not. Tables S27-S34 present descriptive statistics illustrating additional differences between the Moratorium and non-Moratorium forest areas in terms of the key variables that determine the likelihood of forest areas becoming new concessions.
The essence of the DD design is depicted in Fig. 1. We have pre-and post-Moratorium data, and spatially well-defined treatment (Moratorium) and control (non-Moratorium) groups. A simple parametric DD would identify the impact of the Moratorium by estimating the difference in forest cover before and after the Moratorium was implemented in areas covered by the Moratorium, and takes from this the difference in forest cover in the non-Moratorium areas. Under certain assumptions DD approaches identify causal impacts, removing both the pre-existing differences between Moratorium and non-Moratorium areas, which would confound a simple comparison of treatment and control, and the trend in non-Moratorium areas under the assumption that this trend would have continued in the post-implementation period in Moratorium areas. Two key assumptions for the control area trend to act as a counterfactual for the treatment and DD estimates to be interpreted as causal are: (1) the unobservable trends in both Moratorium and non-Moratorium areas are the same on average; and, (2) the Stable Unit Treatment Value Assumption (SUTVA), that is, there are no spillovers ('leakage') from Moratorium areas to non-Moratorium areas which would confound the estimated treatment effect. In what follows we explain how we address these identification issues.
Two further issues are noteworthy for our empirical approach. First, the Moratorium mandates that district governments stop issuing new concession licenses in forest areas covered by the Moratorium. Yet, after 2011 licenses continued to be issued in forest areas that dropped out of the Moratorium determined by boundaries established in 2011. Our measure of impact is based on a treatment group with boundaries that were established in 2011, and we assume that any concession-driven deforestation observed within these boundaries after 2011 violated the Moratorium. Second, our empirical strategy must overcome the fact that after 2011 we do not observe where new oil palm, timber and logging concessions would have located in the absence of the Moratorium, in forest areas covered by the Moratorium. For this reason, the empirical strategy must control for characteristics of grid cells that determine the likelihood becoming a new concession, and ensure that these characteristics are the same or similar between Moratorium and non-Moratorium areas. The hope is that by balancing the observable characteristics, the confounding effect of unobservable characteristics will also be balanced or controlled for (30)(31)(32).
We take several steps to ensure that the grid cells compared in the Moratorium and non-Moratorium forest areas are similar in terms of their observable characteristics. First, we exclude cells which are part of the Indonesian protected area network, both within and outside the Moratorium, as conversion in these cells is already strictly prohibited. Second, we remove all cells outside of concessions with an elevation of 1,000 metres or more above sea level. The likelihood of these cells being a realistic proposition for a concession in either Moratorium or non-Moratorium areas is close to zero because above 1,000 metres land is unsuitable for palm oil cultivation (33) and for Acacia mangium, the main tree species employed for the production of wood pulp and paper (34). Robustness to this choice of sample is discussed below, in the final sub-section. Third, as noted earlier, we undertake separate analyses for dryland and peatland areas. These three steps can be thought of as a type of 'matching' on elevation and land types, whereby Moratorium and non-Moratorium grid cells are constrained to be similar in these dimensions. Lastly, we use a suite of matching approaches to ensure the analysis takes place on a sample whose Moratorium grid cells can be matched to non-Moratorium grid-cells on the basis of a set of more specific grid-level characteristics. The main analysis uses the sample arising from a one-to-one propensity score caliper matching procedure. Other matching procedures are used to check robustness to this particular procedure. In the propensity score estimates, the sample ensures common support in the propensity score (30). The matched dataset for the main analysis is used for parametric and non-parametric estimators. For the analysis of the impact of the Moratorium across forest types (peatland, dryland) and land uses (concession, non-concession), matching approaches are applied separately. For instance, the propensity score is estimated separately for each forest type, and the matched dataset upon which the impact estimated differs from one forest type to another. Our process of model selection involves a sequence of placebo and falsification tests, which provide insights on the likelihood that the identification assumption of DD, parallel trends, holds. Finally, we test the SUTVA assumption. This process leads us to select a triple differences (DDD) estimator, which provides a correction for non-parallel trends. The estimators and the process of model selection are now explained in detail.

Estimators.
We use a number of estimators to estimate the causal impact of the Moratorium. We estimate the Average Treatment on the Treated (AT TDD) using a DD design, using both parametric and a non-parametric propensity score matching estimator.
The observed data on the outcome variable, Yit, for individual grid-cell i at time t, are generated via a switching process governed by the occurrence of the Moratorium, which is indicated by the variable Dit : where Dit is an indicator variable that interacts a variable for pre-and post-treatment periods, rt, with an indicator for treatment status after the treatment period, di : Dit = rtdi (35, p100-101). Y0it and Y1it are the potential outcomes for grid cell i in the treated state (1) and untreated state (0) at time t. Potential outcomes can be separated into their expected values, µ1it and µ0it, and a mean zero random component, ε kit : Suppose that the ε kit are comprised of an individual fixed effect, λi, and a transitory component, u kit : ε kit = λi + u kit . If the pre-treatment period is T0 and the post-treatment period is T1, then the difference-in-differences estimator uses the observed data from the switching equation to identify the Average Treatment on the Treated (AT TDD) defined in terms of the potential outcomes as follows: where Eq. 1 uses the observed data, Eq. 2 reflects the potential outcomes, and Eq. 4 is simply a definition. Eq. 2 follows because the individual fixed effects, λi, drop out when first differences are taken. Eq. 3 follows under the assumption of parallel trends in unobservables, that is if: and Eq. 4 is the definition of the Average Treatment Effect and follows because by definition: meaning that potential outcomes pre-treatment are identical for the treated group. Identification of AT TDD using the DD estimator above does not rely on parametric assumptions, and is in principle a non-parametric estimator. With the assumption of a separable disturbance term: ε kit = λi + u kit , with individual heterogeneity, λi, fixed over time and between potential outcomes, the existence of panel data allows us to take differences within each individual grid cell, i, remove these fixed effects and effectively control for any endogeneity that might be associated with individual heterogeneity of this type. A parametric fixed effects regression would use a linear model to control for individual heterogeneity in a similar way. We use a range of parametric and non-parametric matching DD estimators to estimate the treatment effect of the Moratorium. The parametric DD estimator is specified as: where Yit is forest cover in (non-concession) grid cell i in year t, Dit is the time-varying Moratorium treatment indicator, X kit are n time-varying control variables, which could include including climatic controls (minimum and maximum temperature, precipitation). Other controls, such as slope, elevation, and distance from cities and roads, are unnecessary for this particular estimation strategy because grid-cell fixed effects adjust for time-invariant controls. Dit is the "policy" variable taking the value 1 after 2010 for Moratorium areas, otherwise 0 and β1 is the ATT. This basic model controls for time-invariant characteristics via the individual grid-level fixed effects, λi, and time fixed effects, θt, which capture shocks common to all grid cells such as weather shocks. The estimate of β1 in Eq. 5 provides an estimate of ATT that reflects the average of the annual impacts over the post-treatment period (from period T1 to T ). Also interesting is to estimate the impact year-by-year, which requires the following specification (see e.g. 36): In Eq. 6, the β1s coefficients represent the DD estimates of ATT for each year s. The coefficients are comparable in interpretation to the estimates from our alternative estimators, which belong to the non-parametric, matched DD family of estimators. Matching DD methods have some advantages over parametric DD in that they do not rely on parametric assumptions to control for differences in observable characteristics (30) and hence, are more likely to obtain like-for-like comparisons between treated grid cells and untreated ones. Matching estimators allow for arbitrary heterogeneity in the treatment effect compared to parametric approaches where the heterogeneity must be specified explicitly (37). As discussed in the following sub-section, while differencing accounts for unobservable fixed effects in DD estimators, under the identifying assumptions matching ensures that the heterogeneous trend in the transitory component, it, is balanced between treatment and control groups (see e.g. (24)). Matching methods have been used to evaluate other area-based conservation policies due to these properties (27,28,38,39). One typical matching approach matches on the propensity score: the probability of a unit being in the treated or untreated group. We use caliper one-to-one propensity score matching so that each Moratorium cell is matched with a similar non-Moratorium cell on the basis of the likelihood of being in the Moratorium. While propensity score matching does rely on parametric assumptions in estimating the propensity score, the estimation of the treatment effect is otherwise non-parametric, so we maintain the distinction between parametric and non-parametric (matching) DD approaches in what follows.
Propensity score matching DD conditions the differences in forest cover on the propensity score P (X) to remove bias in levels and trends from confounding variables (24,30,40) under the assumption of selection on observables: di is independent of (Y0, Y1) when outcomes are conditioned on P (X). Suppose that T is our final post treatment period, then together with the parallel trends assumption: where T ranges from 2012-2018, reflecting the different time-horizons over which the Moratorium's impact is estimated, T1 = 2011 is the treatment year, and Tm < T1 is the pre-treatment year(s) from which the matching variables' data are taken.
To estimate AT TDD,T we use one-to-one caliper propensity score matching on data Y M it from the Moratorium grid cell i, matched with data Y j,N M it in cell j from the non-Moratorium grid-cells: where I0 is an indicator variable that is equal to 1 if a grid cell i in the Moratorium has a counterfactual grid cell j in the non-Moratorium area whose propensity scores pi and pj fall within the caliper: where ε is a predetermined distance in propensity score space. The caliper defines the set of one-to-one matches from the non-Moratorium area, C (i) , such that j ∈ C (i). Therefore, I0 = I (min |pi − pj| : |pi − pj| < ). Our large dataset allows us to overcome the chief concern in the literature on optimal caliper choice: the loss of sample size with increased precision (41).
We use a precise caliper of 0.0001 which leads to a small percentage of observations being dropped where Moratorium cells do not have matches within this caliper. These are relatively low-quality matches and do not appear in C (i). In the process of model selection, described in the following sub-section, we show the results are in any event insensitive to the choice of a wider caliper of 0.01. The remaining matched sample is of size NM . The average of the counterfactual matched outcome is taken in case of ties in the non-Moratorium group. Our process of model selection suggests that the parallel trends assumption, which underpins the identification of the treatment effect when using the DD estimator, may not hold. Triple Difference estimators (DDD) are argued to have weaker identification assumptions than DD estimators, e.g. (42, p.11), and can be used as a robustness check or corrective measure (25,42). We use DDD as corrective measure following (24)). Our DDD estimator takes the form: where the second line reflects the correction for the non-parallel pre-treatment trends between T0 and T 1 with a correction to adjust the trend-correction for potentially different pre-and post-treatment time horizons.
The additional adjustment in the DDD estimator identifies the treatment effect in the event that the expected divergence of trends between the Moratorium and non-Moratorium areas is linear in time and the conditioning variables include pre-treatment outcomes including the year immediately prior to the treatment, in our case 2010. To demonstrate, leaving the conditioning on observable characteristics implicit, in a typical DD framework the AT TDD is non-parametrically identified as follows: where Yit are the observable data, ∆T,T 1 represents the change operator between the period T and treatment period T1, AT TDD,T, Yi0t] (and as above Y ikt represent the potential outcomes for treated: k = 1, and untreated: k = 0 grid cells), and the last two terms in Eq. (11) are the trends in the unobservable characteristics in the treated and untreated states, which sum to zero if the parallel trends assumption holds. Placebo tests are necessary to test whether or not the pre-trends are parallel in all cases for the simple DD estimator even when matching on several pre-treatment outcomes. If they are not parallel then this would imply that the DD estimator over the time horizon T1 to T is potentially contaminated by a diverging trend in unobservables: the difference between the last two terms of Eq. 11. To try and resolve this issue, which could be problematic over long-time horizons, the DDD approach (proposed by (24)) could be adopted. The main identification assumptions are twofold. First, the non-parallel trend is linear: Second, the trend holds for all t and is unaffected by the treatment. With these assumptions, the confounding term in Eq. 11 becomes γ (T − T1). For pre-treatment time periods T0 to T 1 the DD estimator in Eq. 11 becomes: because AT T DD,T 1 ,T 0 = 0 in the pre-treatment phase. This expression is estimated using propensity score matching DD in the pre-treatment period for Moratorium and non-Moratorium areas. Having conditioned on pre-treatment outcomes in the matching routine, a trend correction is required for the period T1 to T , which requires a conversion factor: T −T 1 to relate the estimate in Eq. 12 to the required treatment effect in Eq. 10. This is a minor augmentation of the DDD approach proposed in (25). Step 1: Parametric versus non-parametric estimators. Although AT TDD is non-parametrically identified above, both parametric and non-parametric estimators can be used to estimate AT TDD. Each controls for time invariant unobservables and each shares the following identification assumptions: (i) selection on observables; and, (ii) parallel trends in unobservable characteristics. Parametric estimators are typically less well-suited to dealing with individual heterogeneity in estimating treatment effects (43). In the difference-in-differences framework, the parallel trends assumption is often argued to be more likely to hold when treated units are closely matched to non-treated (43) and experimental-empirical evidence often supports this view (e.g. 44). Furthermore, parametric approaches to conditioning on observable characteristics often place restrictive assumptions on functional forms of the relationship between conditioning variables and outcome variables, which are absent from non-parametric approaches and their estimation of treatment effects. Since these shortcomings can introduce bias in the estimates, we have an a priori preference for non-parametric estimates, among which we include propensity score matching. Nevertheless, matching estimators can also be subject to biases of their own (see e.g. (24,45)). To transparently investigate potential differences between estimators, we undertake some exploratory sensitivity analysis of different estimators. We estimate parametric DD models of the form described in Eq. 5 using a fixed-effects estimator. We compare these estimates to a propensity score, one-to-one caliper, matched DD estimator (NP-DD). To rule out the possibility that the balance in the sample of observable characteristics between Moratorium and non-Moratorium areas causes differences between the parametric and non-parametric estimates, we use the same matched sample for the parametric DD estimation as is used for matched DD estimators (see (30)). We then undertake some basic sensitivity analysis on the non-parametric estimators by relaxing the precision of the matching procedure in two ways: (i) widening the caliper; and, (ii) sampling matches without replacement. We then compare the results with the parametric estimates and draw tentative conclusions.
Results. Table S35 presents the results (the ATT for 2017) of this comparative analysis. The results do not undermine, and rather strengthen, our a priori preference for non-parametric matching estimators. The parametric estimators (P-1 to Conley-HAC 4) are based on Eq. 6. Table S35 reports the results of these specifications. P-1 controls for grid-cell (λi) and year (θt) fixed effects. Model P-2 also controls for district-by-year fixed effects. Standard errors are clustered at the district level. These interaction effects capture heterogeneous trends by district. Models Conley 1 to Conley 4 are based on the same estimating equation as models P-1 and P-2, but employ Conley standard errors with respectively 10, 20, 50 and 100km thresholds for spatial autocorrelation decay. Specifications labelled Conley-HAC 1 to Conley-HAC 4 augment these latter models with Newey-West corrections for heteroskedasticity and temporal autocorrelations (10 lags). Comparing estimators P-1 and P-2, we find that the parametric DD is sensitive to whether or not district-by-year interaction effects are included in the estimation. The estimated ATT for P-1 is 3.76, whereas with district-by-year fixed effects the estimate drops by almost 80% to 0.79. Models Conley 1 to Conley-HAC 4 differ from P-2 exclusively with respect to the standard errors around the point estimates. We find that for thresholds of spatial autocorrelation decay below 20Km, Conley and Conley-HAC standard errors are smaller than those clustered at the district level; these are similar in magnitude to Conley and Conley-HAC standard errors using a 50km radius around observational units, while employing a 100km radius results in larger standard errors and slightly lower statistical significance. Nonetheless, the parametric results remain strongly significant for any standard error specification we deploy. The sensitivity of the ATT's magnitude to the inclusion of district-by-year interactions most likely reflects that concessions, and land uses in general, are licensed and administered at the district level and these administrative processes may differ across districts, causing trends to diverge in a way that is correlated with the location of the Moratorium. This suggests that district-level trends should be controlled for when estimating treatment effects.
Rather than control for this possibility in a parametric model, we use a matching estimator which matches on grid-level characteristics. The matching variables explicitly capture important differences between the Moratorium and non-Moratorium grid cells, their dynamics and suitability for future concessions. We use pre-treatment values of: distance to concessions (palm oil, timber and logging), distance to roads and cities, population (2005 and 2010), forest cover for each year from 2005 to 2010, elevation, slope, peat depth and above-ground carbon stock in the year 2000. With regard to the heterogeneous trends that were shown to be an important factor in the parametric estimator, we control for heterogeneous pre-treatment trends by matching on pre-treatment trends in forest cover and population for several pre-treatment years: 2005-2010 for forest cover, 2005 and 2010 for population. We prefer this approach to the simple inclusion of district-by-year fixed-effects since the approach in principle controls for more sources of heterogeneous trends due to conditioning on other matching variables.
Comparing model P-2 with the propensity score matched result from the main analysis, NP-DD (caliper = 0.0001), we find that the NP-DD estimates are almost identical: NP-DD = 0.81 and P-2 = 0.79. Table S43 shows the extensive list of matching variables that underpin the NP-DD estimate, which help to control for heterogeneity. When the matches are made less precise by obtaining matches without replacement (NP-1), the matching ATT decreases significantly: NP-1 = 0.36. NP-1, however, drops ten times the number of observations (> 70,000) in the process, and so is less favourable. Widening the caliper from 0.0001 to 0.01 (NP-2) drops fewer observations (2 in total), but this has little effect on the estimates: NP-2 = 0.81, as does removing the caliper altogether (NP-3), which is equivalent to nearest neighbour matching: NP-3 = 0.81. With no caliper and no replacement (NP-4), estimates increase towards the potentially problematic parametric estimates that do not control for district-by-year effects: NP-4 = 1.6. From these comparisons we conclude that the matching estimates are sensitive only to extreme reductions in precision of the matching (no replacement or no caliper), and there are some reasons to prefer the NP-DD estimator.
While these comparisons of estimators are exploratory, we prefer the propensity score matching estimator because it allows arbitrary heterogeneity in the treatment effect and closely matches treatment and control cells on the basis of pre-treatment trends, thereby accommodating heterogeneous pre-trends at district or indeed other levels, without imposing any additional structure on the estimation process. It is possible that controlling for district-by-year fixed effects in the parametric model may attenuate the estimate of the treatment effect by controlling for district level post-treatment responses, which would be a legitimate channel for the Moratorium being successful. Furthermore, linear parametric approaches to causal estimation also have been shown to be problematic in some multi-period cases (46). We therefore pursue a variety of non-parametric matching estimators to estimate the impact of the Moratorium while noting that the appropriate parametric approaches (P-2 to Conley-HAC 4) provide estimates of essentially the same magnitude, albeit slightly lower. We then subject our preferred estimator to rigorous robustness testing of the matching approach algorithm and identification assumptions.
Step 2: Placebo tests in time. Our model selection approach turns to the identification assumptions of the DD estimator, namely the parallel trends assumption. Using the preferred non-parametric matched DD estimates of the treatment effect of the Moratorium for peatland and dryland forest, we attempt to empirically falsify the parallel trends assumption. To do this we undertake two distinct temporal placebo tests. The relevant time periods for the placebo tests can be seen in Fig. S6, which reproduces the trends for non-concession dryland grid cells for the period 2000-2018, from Fig. 1.
The first placebo test, Placebo Test 1, maintains the one-to-one matches between the Moratorium grid cells and the non-Moratorium grid cells that were used in the original DD estimates shown in Fig. 1 of the main text. In this case, matching variables are drawn from the period between T0 to T 1 in Fig. S6. These cells are then subjected to a placebo Moratorium in 2005 rather than the actual Moratorium in 2011. We use the difference in forest cover post-and pre-placebo treatment, respectively, the period between 2000 and T0, and T0 to T 1 , in Fig. S6, and estimate the placebo treatment effect AT TDDP . This tests how successful the strategy of matching on pre-treatment outcomes in 2005-2010 is in ensuring, on average, parallel trends between Moratorium and non-Moratorium grid cells. The null hypothesis is that these pre-treatment controls ensure that the parallel trends assumption cannot be falsified and hence, we ought not to observe a significant effect of the placebo treatment. The alternative hypothesis is that parallel trends assumption can be falsified despite conditioning on outcomes between 2005-2010. Placebo Test 1 is motivated by (24,25), who showed that matching on pre-treatment outcomes as a means of obtaining parallel pre-trends can sometimes fail and actually introduce bias. We match on several pre-treatment lags of the outcome variable, and so our expectation was that the test would fail to reject the null and hence, fail to falsify the parallel trends assumption. Year Forest Cover (%) Non-concession, outside moratorium Non-concession, inside moratorium The second placebo test in time, Placebo Test 2, is a more traditional placebo test which shifts the entire analysis in time. That is, matching takes place on pre-placebo data (2000 to T0) rather than on pre-(real) treatment data (T0 to T 1 ) as in Placebo Test 1. Again, the null hypothesis is that the parallel trends assumption of the matching estimator is not falsified. Both placebo tests are undertaken on each forest type. Together, these placebo tests can also be seen in light of the concerns raised by (47) about regression to the mean when matching on pre-treatment outcomes.
Results. The placebo/falsification tests indicate that the parallel trends assumption fails in some cases for the DD estimator. Tables S36-S40 report the results of all temporal placebo tests for dryland and peatland. For dryland areas, Placebo Test 1 yields a positive treatment effect in some cases (See Table S36). For stringency we undertake sensitivity on the definition of forested land, at the grid-cell scale, to be included in the analysis ranging from all land (untrimmed) to 30% forest cover in the base year, 2005, to 60% forest cover in 2005. Our central results use the unrestricted definition. In all cases but the 60% definition, we reject the null of Placebo Test 1. On the other hand, Placebo Test 2 is unable to falsify the parallel trends assumption, with well-defined zero impacts estimated across all three of our definitions of forest cover.
For peatland areas, Placebo Test 1 fails to reject the null hypothesis and falsify the parallel trends assumption, suggesting that matching on pre-treatment outcomes may be sufficient for identification. However, Placebo Test 2 rejects the null, with a statistically significant and negative effect of the placebo in 2005. This suggests that the parallel trends assumption may not hold in general for peatland. The failure of either of the placebo tests for dryland and peatland suggests remedial action is necessary.
Step 3: Response to placebo tests in time. We now argue that NP-DDD estimator shown in Eq. 10, which makes a correction in such cases, is therefore preferred. Rejection of the null hypothesis of the Placebo Test 1 is a sign that controlling for pre-treatment variables may have failed to ensure parallel pre-trends between Moratorium and non-Moratorium grid cells. Trends in unobservables diverge post-placebo and therefore are likely to do so post-treatment in the main analysis, when the real treatment happens in 2011 (T1). Rejection of Placebo Test 2 has a similar implication: falsification of the parallel trends assumption. While often framed as a test of the parallel trends assumption, these tests are only indicative of unobservable phenomena. Failure to reject the null is an encouraging sign, rejection is suggestive of a need for remedial measures.
Following (24,25), falsification can be ameliorated by a triple differences (DDD) approach that removes the pre-treatment difference in trends from the NP-DD estimate. Formally, this estimator (NP-DDD) is given by Eq. 10, where the last terms in square brackets represents the difference in the pre-trend between Moratorium and non-Moratorium areas between 2005 and 2010 at the matched grid-cell level. In Eq. 12, the assumption of a linear trend in the divergence of parallel trends is assumed, and the effect of this trend is corrected for the time horizon of the treatment effect being estimated (the choice of T in Fig. S6). This specification is an adaptation to the proposal by (25) and its application to deforestation in (24).
Results. In practice, the NP-DDD estimator in Eq. 10 requires estimating and removing an adjusted (for time horizon) version the estimated ATT in the placebo test. For instance, in Table S36 we see that 0.181 of the estimated effect in the main NP-DD estimate (untrimmed) of effect of the Moratorium may well arise because of non-parallel trends. The NP-DDD estimator simply removes this estimated effect, adjusting for the different time-horizon, from the NP-DD estimate for each year T in which it is estimated. Finally, we perform a third placebo test in time, Placebo Test 3, which aims at establishing whether the DDD methodology identifies a positive effect of the Moratorium on dryland cells when it is applied over the 2000-2010 (2011) placebo interval, with placebo treatment set to start in 2005. For the DDD estimator, we fail to reject the null of parallel trends in the cases where DD rejected the null. Our main results are derived using the NP-DDD estimator implemented in this way, and Fig. 2 compares the results from NP-DD and NP-DDD estimators graphically to illustrate the differences.
Step 4: Placebo tests in space. Finally, as a final check of the robustness of the NP-DDD estimator, we undertake another placebo-falsification test. We run spatial placebo tests using untreated cells in the Moratorium (treated) area as the placebo treatment cells. This spatial placebo test exploits a loophole in the Moratorium rules, which allows for continued deforestation in pre-2011 (pre-Moratorium) concession areas irrespective of whether the concessions are in the Moratorium or not. Since grid cells in concessions in the Moratorium area should be unaffected by the Moratorium, they are obvious candidates for Moratorium placebo treatment cells.
The null hypothesis of the spatial placebo test is that the NP-DDD estimator is well-identified: the parallel trends assumption cannot be falsified and there is no impact of the Moratorium on Moratorium concession cells compared to non-Moratorium concession cells. The alternative hypothesis is that there remain some residual unobservable differences between the Moratorium and non-Moratorium areas that confound the estimate of the impact of the Moratorium. This would be the implication of a non-zero estimate of the Moratorium in the spatial placebo test. If the null hypothesis is rejected, it can be interpreted as a falsification of the parallel trends assumption. In principle, rejection of the null can inform a spatial triple differences DDD design to remove any deviation of parallel trends from the central NP-DDD estimate (31,35,48,49). We run the spatial placebo test for all concession types, and also investigate the placebo for individual concession and forest types using the non-parametric (25) NP-DDD matching estimator of the treatment effect in 2011 (T1). The empirical model can be thought of as Eq. 10, with the exception that the treatment and control data come from the concession cells in the Moratorium and concession cells in the non-Moratorium areas, instead of Moratorium and non-Moratorium non-concession forested areas.
Results . Tables S41-S42 show the results of these spatial placebo tests, all of which apply the NP-DDD estimator. Table S41 shows that for dryland areas, there are no differences in forest cover between Moratorium and non-Moratorium concession cells. Logging concessions show a negative effect, which is only statistically significant at the 10% level. For peatland areas, a significantly positive effect is detected for logging concessions only (Table S42), while for all concessions we fail to reject the null of the placebo test. We take these results to mean that on average, the NP-DDD is not susceptible to differences in trends due to unobserved differences between Moratorium and non-Moratorium areas.
With the propensity score, one-to-one, caliper matching NP-DDD estimator shown to satisfy the spatial placebo test, we undertake some additional checks and robustness analyses. First, we illustrate the success of our propensity score matching procedure at balancing the characteristics between Moratorium and non-Moratorium areas. We then test the robustness of the results to different types of propensity score matching approaches and a non-propensity score matching procedure, a series of Coarsened Exact Matching estimates. Finally we test the Stable Unit Treatment Value Assumption (SUTVA). (P-1) Parametric DD, on matched sample from NP-1, year fixed effects, SEs clustered at the district level.
(P-2) Parametric DD, on matched sample from NP-1, year and district*year fixed effects, SEs clustered at the district level.
(Conley 4) Same as P-2, Conley SEs with 100km radius for spatial autocorrelation decay.      Balance tests for one-to-one caliper matching. The descriptive statistics in Tables S27-S34 show that in the raw data there are large differences between the Moratorium and non-Moratorium areas. Matching variables are selected on the basis of these differences, shown in Table S43-S54 by forest type and land use. For example, Table S43 describes the successful results of the matching procedure for dryland forest areas, and the success in balancing the treatment and control grid cells is similar across all of the remaining matching tables. In Table S43, forest cover in the six years prior to the treatment (2011) in the matched sample has a mean difference of no more than 0.5 ha compared to approximately 15 ha before matching. The standardised difference is no more than 0.55. These small mean differences are also complemented by two comparisons of the distributions of these characteristics: the Kolmogorov Smirnov test and a test of differences in the cumulative distribution function (eCDF). Both reveal highly balanced control variables in the matched sample, with small differences in the CDFs of the variables. Our conclusion is that the matching procedure performs well.

Robustness
Sensitivity to matching procedures. We conduct additional sensitivity analyses on the assumptions used in propensity score matching estimates and a robustness analysis of the central NP-DDD on alternative approaches to matching procedures that do not rely on the propensity score. In part these test the possibility that propensity score matching may introduce biases due to the way in which it reduces the dimensionality of the matching problem to matching on a single dimension: the propensity score (e.g. (45)). On the sensitivity of the propensity score estimates to the parameter choices in implementation, the analyses are: (1) k > 1 nearest neighbours analysis in which we test the sensitivity of our results to matching on k = 2, 3, 5 nearest neighbours; and, (2) caliper choices, where we test calipers of 0.01 and 0.001 in one-to-one nearest neighbour matching. With regard to alternative, non-propensity score, matching estimators we undertake Coarsened Exact Matching (CEM) (50,51). The use of CEM means that the sample sizes are slightly more sensitive to particular choices of matching variables and the coarseness of the matching. For this reason we undertake four separate CEM approaches with various degrees of coarseness to understand these sensitivities. We use the STATA 16 cem routine and default coarseness settings unless otherwise stated (See  table notes). Tables S13 and S14 shows the results of the sensitivity analyses on dryland and peatland forests. The results show that the main results are not being driven by the choice of the propensity score matching estimator over CEM. Tables S13 and S14 show that the CEM estimates do not differ to any great extent for peatland and dryland forest. Furthermore, Tables S17 and S18 show that the main results are robust to the number of nearest neighbours chosen for 2,3 and 5 neighbours. Finally, our choice of sample removed areas above 1,000 m for agronomic reasons. Tables S17 and S18 also show that this selection of the sample is not driving the results either. Overall, these results are strongly supportive of our estimation approach leading to our main results.
Testing the SUTVA assumption using Regression Discontinuity: 'Leakage'? . A final identification assumption of the DD and DDD estimators is the Stable Unit Treatment Value Assumption (SUTVA): the treatment causes leakage to untreated forest areas via spatial, behavioural or general equilibrium effects, a common confounder in the evaluation of area-based policies (27,28,38,39,52). If protection via the Moratorium induces the displacement of forest clearing to outside the Moratorium's boundaries, deforestation rates inside and outside these boundaries are contemporaneously affected in opposite directions resulting in treatment effects of a higher magnitude than the "true" effects. To check for leakage, we use a regression discontinuity design (53, 54) with a sharp cut-off at the Moratorium's boundaries. Regression discontinuity estimates the LATE of the Moratorium in the proximity of its boundaries with non-Moratorium land. In particular, we estimate the following equation (following (55)): where Di is deforestation (in pixels) in grid cell i; M oratoriumi is a binary indicator representing treatment assignment, Xi is a vector of cross-sectional control variables, namely slope, elevation, distance from cities and roads, and above-ground biomass content, and f (Distancei) = M oratoriumi * fMoratorium(Distancei)+

+ (1 − M oratoriumi)fNon−Moratorium(Distancei)
is a polynomial term which weighs observations with respect to their distance from the Moratorium's boundaries.
We specify separate linear polynomials on both sides of the boundary cut-off (following (56) and (55)), and estimate treatment effects via OLS regressions with robust standard errors clustered at the district level. Our preferred results are obtained via separate linear polynomials and optimal bandwidth selection through the Calonico (54) method. In Fig. S7, we also present results using the Imbens-Kalyanaraman (57) bandwidth selection algorithm (panel B).
In Fig. S8-S12, we visualise the full dataset employed in the spatial regression discontinuity designs, letting the bandwidth size vary according to either the Calonico (54) or the Imbens-Kalyanaraman (57) method. First, we report two plots which relate the forcing variable, distance from the Moratorium border, to forest cover in pixels, for 2004 and 2010 (Fig. S8). We treat 2004 as the "base year" since we estimate regression discontinuity LATEs for 2005-2018. As shown in Fig. S8 (left panel), forest cover is not distributed smoothly around the threshold at the baseline, with Moratorium grid cells discontinuously more forested than the surrounding areas. Fig. S8 (right panel) confirms the trend for 2010, the actual baseline year prior to the implementation of the Moratorium. These descriptive results are a confirmation of the economic marginality of Moratorium grid cells, which tended to be cleared at a slower rate than non-Moratorium grid cells in the immediate vicinity, even before the Moratorium was implemented.
In Fig. S9 and S10, we report the discontinuity in deforestation rates around Moratorium boundaries, restricted to the bandwidth level calculated with our preferred Calonico (54) procedure. Fig. S9 examines pre-treatment years while Fig. S10 visualises data for 2011-2018. Clearly, deforestation is persistently higher outside of Moratorium boundaries, starting from 2005 and up until 2018, with only slight changes in the underlying data distribution. In Fig. S11 and S12, we report analogous regression discontinuity plots using the Imbens-Kalyanaraman (57) optimal bandwidth selection procedure, which yield similar results: deforestation rates are always higher outside the Moratorium, enhancing our confidence in ruling out the occurrence of spatial leakage.
We perform several robustness checks to assess the validity of our regression discontinuity estimates, which rule out the potential role of leakage in our results. In particular, we first run regressions with alternative bandwidths, namely: a 5 km fixed bandwidth; a 10 km fixed bandwidth; and, a 20 km fixed bandwidth (reported in the top row of Fig. S13). Given the spatial configuration of the Moratorium (Fig. S2), our bandwidths cover a large extent of non-Moratorium cells in which leakage could feasibly occur. One reason why we might not observe a leakage effect is because the Moratorium's boundaries have changed after forest areas dropped out of the Moratorium. This can be ruled out, at least partially, by the application of these different bandwidths.
The sensitivity of our results is tested with respect to alternative polynomial specifications, employing a quadratic function of distance on both sides of the cut-off using: the Imbens-Kalyanaraman (57) optimal bandwidth; a 10 km fixed bandwidth; and, the Calonico (54) optimal bandwidth. These results are reported in the bottom row of Fig. S13. Notably, the 95% confidence interval for the regression discontinuity LATE always includes zero, thereby reassuring us about the stability of our estimates with respect to alternative specifications. We are therefore able to rule out the role of leakage in affecting our main results.
Year Deforestation, ha