# On the relationship between aerosol model uncertainty and radiative forcing uncertainty


Edited by John H. Seinfeld, California Institute of Technology, Pasadena, CA, and approved December 11, 2015 (received for review April 10, 2015)

## Abstract

The largest uncertainty in the historical radiative forcing of climate is caused by the interaction of aerosols with clouds. Historical forcing is not a directly measurable quantity, so reliable assessments depend on the development of global models of aerosols and clouds that are well constrained by observations. However, there has been no systematic assessment of how reduction in the uncertainty of global aerosol models will feed through to the uncertainty in the predicted forcing. We use a global model perturbed parameter ensemble to show that tight observational constraint of aerosol concentrations in the model has a relatively small effect on the aerosol-related uncertainty in the calculated forcing between preindustrial and present-day periods. One factor is the low sensitivity of present-day aerosol to natural emissions that determine the preindustrial aerosol state. However, the major cause of the weak constraint is that the full uncertainty space of the model generates a large number of model variants that are equally acceptable compared to present-day aerosol observations. The narrow range of aerosol concentrations in the observationally constrained model gives the impression of low aerosol model uncertainty. However, these multiple “equifinal” models predict a wide range of forcings. To make progress, we need to develop a much deeper understanding of model uncertainty and ways to use observations to constrain it. Equifinality in the aerosol model means that tuning of a small number of model processes to achieve model−observation agreement could give a misleading impression of model robustness.

All models of environmental systems are uncertain because the model design cannot be a faithful reproduction of reality, and the magnitudes of many model quantities (parameters) are poorly defined by theory or experiment. To make models “accurate,” they are therefore evaluated against measurements so that the model state matches the observed state as closely as possible. However, models need to make predictions outside the conditions for which they were evaluated, and they need to be able to predict quantities for which observational constraints do not exist. Because of the difficulty of performing a sufficient number of model simulations that sample all possible settings of uncertain processes, it is rarely demonstrated how observational constraints truly reduce the uncertainty in related predicted quantities. Here we show that, even when extensive measurements are available to constrain a particular model output, other closely related outputs may not be well constrained at all.

We focus on aerosol radiative forcing as an important climate model uncertainty and attempt to reduce the uncertainty based on idealized present-day (PD) aerosol measurements. Changes in aerosols between the preindustrial (PI) and PD periods have caused changes in cloud properties, resulting in a very uncertain radiative forcing of between −1.33 W⋅m^{−2} and −0.06 W⋅m^{−2} (1) calculated across multiple global climate models. This large uncertainty remains despite our increased understanding of aerosol science and the improvement in model fidelity and continues to limit our ability to estimate the climate sensitivity (2–4). Although the modulating effect of aerosols on clouds can be observed in the modern atmosphere (5), the PI to PD aerosol−cloud radiative forcing is not directly measurable because the PI aerosol state is unknown. The PI aerosol state has been shown to be a major component of the forcing uncertainty (6–8). Satellite measurements of aerosol and cloud properties have been used to estimate the forcing (1, 9, 10), but they are also dependent on unverifiable assumptions about the PI aerosol and cloud state (1, 11, 12), and cloud−aerosol relationships may not extrapolate well to the appropriate low-aerosol conditions (13). We therefore rely on models evaluated against PD conditions to predict how historical changes in aerosols affect cloud radiative forcing.

In developing and evaluating models, there is an implicit and reasonable assumption that, if we can reduce the uncertainty of the aerosol component of a model, we will also reduce the forcing uncertainty. In this paper, we investigate this assumption for aerosol−cloud forcing by quantifying the relationship between the uncertainty in an aerosol model and the uncertainty in aerosol−cloud forcing predicted by that model.

Fig. 1 outlines the nature of the model constraint problem. We simulate aerosol properties *A* [such as cloud condensation nucleus concentration (CCN) or aerosol optical depth] with uncertainties that are determined by a large set of uncertain processes in the aerosol model. The combined and interacting effects of all of the process uncertainties result in an uncertainty in the magnitude of the aerosol property *A*, represented by a probability distribution in each global grid cell (Fig. 1*A*). We use the model to predict the forcing *F*, which will have an associated uncertainty caused by the uncertainty in the aerosol model. In this paper, we examine the link between the uncertainty in the aerosol component of the global model and the effect it has on the uncertainty in forcing.

The task of aerosol model constraint involves finding models that predict aerosol properties *A* within the observed range (Fig. 1*C*) and quantifying the spread of forcings from this smaller set of models (Fig. 1*D*). Normally, because of computational demands, we do not know the probability distribution of the model outputs. Rather, we have only a small number of structurally different models (14) or a few sensitivity tests in which single parameters are varied within one model. Because these approaches provide a very limited sampling of model uncertainties, it has not previously been possible to quantify the relationship between uncertainty in *A* and *F*. However, based on refs. 15 and 6, we have information about the probability distribution of aerosol properties and forcing from a sampling of 140,000 runs exploring all dimensions of 28 uncertain parameters of a global aerosol model (from an emulator), which allows us to quantify this relationship for the first time, to our knowledge.

The question we pose is: If we could constrain a model using aerosol measurements *A* with known uncertainty in every grid box of the model (Fig. 1*C*), how well would we be able to constrain the uncertainty in forcing *F* (Fig. 1*D*) caused by the uncertainty of the aerosol model? There are several factors that may limit the extent to which aerosol observations will help narrow the forcing uncertainty.

First, *F* depends on model processes that do not affect *A*. This is, of course, the case because *F* is additionally sensitive to cloud distribution as well as model processes like updraft speed and cloud entrainment that do not greatly affect aerosols. However, here we explore only the effects of uncertainty in the aerosol component of the model and design the experiment such that uncertainties in *F* and *A* are fully defined by the same set of parameters (i.e., there are no hidden sources of uncertainty that are unique to *F*). Although this is only part of the forcing uncertainty problem, the uncertainty range due to aerosol processes alone is comparable to that calculated from multiple models (6).

Second, although uncertainties in *F* and *A* depend on the same set of processes, the dependencies can be different. If *F* is very sensitive to a particular parameter but *A* is less sensitive, then observations of *A* can only weakly constrain *F*. Nevertheless, we might expect that, by making global measurements of *A*, we can sample environments in which, eventually, all parameter sensitivities become important and can be constrained by the observations.

Third, *F* depends on the change in aerosols between an historical baseline period and the present day, but we can observe *A* only in the present day. A large fraction of the uncertainty in forcing can be attributed to natural aerosols that determine PI conditions (6), and natural aerosol sensitivities are suppressed by pollution (16). Nevertheless, the modeled PI and PD aerosol properties are determined by the same model processes, so we expect a model that is constrained using PD measurements to have increased skill in simulating PI conditions also. Furthermore, if we can measure *A* in environments that have changed little since the baseline period (12), then we might be able to constrain the processes that have the greatest influence on aerosols in PI environments.

Fourth, *F* is nonlinearly related to *A* (6). At very low aerosol concentrations, *F* depends quite steeply on changes in *A*, but, in polluted high-aerosol environments, *F* becomes increasingly insensitive to changes in *A*. Under clean conditions, *F* is much more sensitive to the constraint of *A*. This itself is likely to be an irreducible uncertainty in *F* unless the PI baseline aerosol can be defined exactly.

Fifth, there are many compensating model factors that affect *A*, so the values of the process parameters will not be uniquely defined by measurements of *A*. For example, errors in deposition and emission rates could be difficult to separate using measurements of *A* alone because they have compensating effects. This problem is termed “equifinality” (17) and is well recognized in other fields of environmental science. Equifinality is not a problem if the aim is to develop a model of *A* for PD conditions. However, equifinality can become a problem when trying to make predictions outside the conditions for which the model was constrained. Nevertheless, the dependence of *A* on the model parameters will vary substantially across the globe and throughout the year. The question is whether there are enough different environments providing enough relationships between *A* and the parameters that we can eventually overcome equifinality.

The above considerations lead to the following questions, which we address in this paper.

How much does the uncertainty in modeled forcing fall when we constrain the aerosol model using measurements of known uncertainty? Can we quantify how a constraint on aerosol uncertainty translates to a constraint on forcing uncertainty and identify the best observations to constrain forcing uncertainty? Can we identify the global environments in which different parameter sensitivities are important for aerosol uncertainty to ensure our observations sample these different environments and maximize the possible constraint on forcing?

## Methodology

To study the effects of observational constraints, we use a parametric uncertainty analysis of global aerosol (15) and aerosol−cloud forcing (6) (Fig. 2 *C* and *D*).

We define a parameter to be any scalar value in the global aerosol model that can be perturbed to investigate the uncertainty in different model processes, including scaling or absolute perturbations to nucleation, deposition, emission rates, and emitted particle sizes. Both studies were based on the same perturbed parameter ensemble of 28 parameters of the GLObal Model of Aerosol Processes (GLOMAP) global aerosol model, which was run over 1 y driven by meteorological reanalyses. The uncertainties and associated parameter sensitivities were calculated from sampling validated Bayesian emulators (18, 19) of CCN and forcing. The emulators are fast surrogate models of GLOMAP that define 28-dimensional response surfaces of a monthly mean model output in each grid box in terms of variations in model parameters related to the aerosol component of the global model. The emulators were conditioned on a 168-member perturbed parameter ensemble of GLOMAP with the 28 parameter values sampled in such a way that the emulators capture covariations between parameter effects. To test the reliability of the emulators, they were validated against a further 84 GLOMAP simulations.
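The emulation step can be illustrated with a deliberately simplified sketch. The code below is not the study's emulator: it fits only the posterior mean of a Gaussian-process surrogate with a fixed squared-exponential kernel to a hypothetical 28-parameter function (`toy_model`), using 168 training runs and 84 validation runs to mirror the ensemble sizes quoted above. The Latin hypercube design, the kernel, the length scale, and the nugget are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N_PARAMS, N_TRAIN, N_VALID = 28, 168, 84

def latin_hypercube(n, d, rng):
    """One sample per stratum in every dimension, scaled to [0, 1]."""
    strata = np.column_stack([rng.permutation(n) for _ in range(d)])
    return (strata + rng.uniform(size=(n, d))) / n

# Hypothetical stand-in for the expensive simulator (NOT GLOMAP):
# a weighted sum of the parameters plus one pairwise interaction.
weights = rng.normal(size=N_PARAMS)

def toy_model(theta):
    return theta @ weights + 2.0 * theta[:, 0] * theta[:, 1]

def rbf_kernel(a, b, length_scale):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq = (a**2).sum(axis=1)[:, None] + (b**2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

class MeanOnlyEmulator:
    """Gaussian-process posterior mean with fixed (assumed) hyperparameters."""

    def __init__(self, length_scale=3.0, nugget=1e-6):
        self.length_scale = length_scale
        self.nugget = nugget

    def fit(self, x, y):
        self.x = x
        self.offset = y.mean()
        k = rbf_kernel(x, x, self.length_scale) + self.nugget * np.eye(len(x))
        self.alpha = np.linalg.solve(k, y - self.offset)
        return self

    def predict(self, x_new):
        return self.offset + rbf_kernel(x_new, self.x, self.length_scale) @ self.alpha

# Condition the surrogate on the training ensemble, then check it on held-out runs.
x_train = latin_hypercube(N_TRAIN, N_PARAMS, rng)
y_train = toy_model(x_train)
x_valid = rng.uniform(size=(N_VALID, N_PARAMS))
y_valid = toy_model(x_valid)

emulator = MeanOnlyEmulator().fit(x_train, y_train)
y_pred = emulator.predict(x_valid)
rmse = float(np.sqrt(np.mean((y_pred - y_valid) ** 2)))
print(f"validation RMSE / output spread: {rmse / y_valid.std():.3f}")
```

A validation error that is small relative to the spread of the held-out outputs is the kind of check that would justify sampling the surrogate densely in place of the simulator. A full Bayesian emulator would also return a predictive variance, which this sketch omits.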

Three aerosol properties are used to evaluate model constraint, which we choose based on the wide availability of measurements. The CCN concentration at a defined supersaturation is a measured aerosol quantity of direct relevance to cloud droplet concentrations and hence cloud albedo. We also use the total particle concentration (particles larger than 3 nm diameter, *N*_{3}) and black carbon (BC) mass concentration (6). Although *N*_{3} and BC appear to be less closely related to cloud droplet concentrations and cloud albedo than CCN, the key consideration is which processes control the aerosol property. If two aerosol properties are controlled by shared parameters (such as wet deposition), then constraint of one aerosol property will influence the constraint of the other. Below we show that *N*_{3} measurements can also constrain forcing, but BC less so. Relationships between aerosol and cloud properties are often used to constrain model processes and forcing (e.g., ref. 9). We do not use such observational constraints here because our aim is to understand how uncertainty in the aerosol model itself affects forcing uncertainty. Most of the uncertainty in aerosol−cloud relations will be related to cloud physics processes, which were not part of the experimental design of the ensembles we use (6, 15). These processes would therefore increase the forcing uncertainty range reported in Carslaw et al. (6) and would need to be constrained using aerosol−cloud process-based metrics.

The parametric uncertainty of annual global mean aerosol−cloud forcing (figure 1 in ref. 6) has a 95% credible interval of −0.7 W⋅m^{−2} to −1.6 W⋅m^{−2}. Our uncertainty estimate is of comparable size to the multimodel ensemble range of −0.06 W⋅m^{−2} to −1.33 W⋅m^{−2} estimated by the Intergovernmental Panel on Climate Change (IPCC) (1). The forcing in each grid box is assumed to be determined solely by changes in cloud albedo with the global distribution of other cloud properties (coverage, depth, etc.) the same in each model run—i.e., we explore the first indirect or cloud brightening effect rather than the effective radiative forcing that allows for cloud adjustment to changes in aerosols (1). Therefore, the uncertainty in the forcing is determined by uncertainty in the changes in cloud droplet concentration and hence only by the 28 perturbations applied to the aerosol component of the model. We can therefore determine precisely how aerosol measurements with well-defined uncertainty feed through to forcing. In reality, rapid adjustments to cloud properties in response to changes in aerosol can occur (1), which will introduce further uncertainties in forcing that we do not account for here. Additional observation types, such as cloud liquid water path, will be required to reduce the further uncertainties introduced by rapid adjustments, so we are considering only a subset of all sources of forcing uncertainty, albeit with a sufficiently large uncertainty range.

To constrain the aerosol model, we first generate a very large ensemble of model outputs by sampling densely from the emulators across the full uncertainty space of the 28 parameters. This generates a probability distribution of aerosol properties and forcing in each grid box (or an average over any region), shown schematically in Fig. 1. We then reduce the sample by discarding ensemble members that predict aerosol properties that are deemed implausible compared with the measurements. By reducing the range of allowable aerosol properties, the set of uncertain parameter values is also reduced. Approximately 3 million samples were drawn from the emulators, using a Monte Carlo approach. This is a sufficiently large number that, when the 28-dimensional parameter space is reduced, there remain enough points to perform statistical analysis.
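The sample-then-discard procedure can be sketched as follows. This is a toy illustration under stated assumptions, not the study's code: `toy_ccn` and `toy_forcing` are hypothetical stand-ins for the emulators, the parameters are drawn uniformly on [0, 1] rather than from expert-elicited ranges, and plausibility is reduced to a simple tolerance band around one synthetic measurement.

```python
import numpy as np

rng = np.random.default_rng(0)
N_PARAMS, N_SAMPLES = 28, 100_000

# Hypothetical emulator stand-ins. Parameter 0 acts like an emission scaling,
# parameter 1 like a removal rate, parameter 2 like a natural emission flux.
def toy_ccn(theta):
    # CCN (cm^-3): emission and removal compensate each other.
    return 300.0 * (0.5 + theta[:, 0]) / (0.5 + theta[:, 1])

def toy_forcing(theta):
    # Forcing (W m^-2): shares parameters 0 and 1 with CCN, but is also
    # sensitive to parameter 2, which toy_ccn ignores.
    return -0.6 - 0.2 * (theta[:, 0] - theta[:, 1]) - 0.6 * theta[:, 2]

# Step 1: sample densely across the full parameter space.
theta = rng.uniform(size=(N_SAMPLES, N_PARAMS))
ccn, forcing = toy_ccn(theta), toy_forcing(theta)

# Step 2: discard members whose CCN lies outside the measurement uncertainty.
observed_ccn, tolerance = 300.0, 0.30
plausible = np.abs(ccn - observed_ccn) <= tolerance * observed_ccn

# Step 3: compare the prior and constrained variances of each output.
def relative_constraint(prior, constrained):
    return 1.0 - constrained.var() / prior.var()

ccn_constraint = relative_constraint(ccn, ccn[plausible])
forcing_constraint = relative_constraint(forcing, forcing[plausible])

print(f"retained {plausible.mean():.1%} of the samples")
print(f"CCN variance reduced by     {ccn_constraint:.1%}")
print(f"forcing variance reduced by {forcing_constraint:.1%}")
```

Because the retained members are a subset of the original sample, any other emulated output can be re-evaluated on the same subset without drawing new samples, which is why the initial sample has to be large.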

This approach, often called history matching (or precalibration), has been used effectively in other fields (20–22). When dealing with a complex multidimensional uncertainty space, history matching is an effective way to constrain a model within observed limits, which is impossible by tuning individual parameters. Similarly, generalized likelihood uncertainty estimation (23) constrains the model by identifying behavioral and nonbehavioral parameter sets (or models), with the addition of assigning probability to the remaining uncertain parameter sets based on their closeness to observations.

## Results and Discussion

### Defining Optimum Measurement Locations.

Our aim is to constrain the aerosol component of the global model in every grid cell. However, we show below that an analysis of the uncertainties in the model allows this to be achieved by making a small number of measurements in locations that represent the dominant parameter sensitivities.

Fig. 2 shows the effect of constraining PD CCN concentrations using synthetic measurements in just two locations—over the Southern Ocean (Fig. 2*E*) and over Central Europe (Fig. 2*G*). The observational constraint is actually an idealized CCN measurement (with an assumed very small uncertainty of ±0.5%) taken from a model simulation with all parameters set to the expert-defined median value. This approach guarantees that a reduced part of parameter space can be found by history matching (see *Methodology*).

As hypothesized, each measurement constrains the model far beyond the location of the measurement. Measurements in the Southern Hemisphere marine region, for example, lead to a reduction in CCN uncertainty over most marine regions, showing that, in this model, the processes and emissions that account for CCN uncertainty are fairly common to all marine environments. Likewise, a CCN measurement over Central Europe (Fig. 2*G*) leads to very large reductions in uncertainty over similarly polluted regions of North America, Europe, and Asia, extending over much of the North Atlantic. There is an assumption in our approach that the parameter uncertainties can be applied uniformly across the globe (*SI Methodology*). Therefore, for example, if CCN measurements over Europe allow a part of parameter space to be eliminated that has high deposition rates, then this part of parameter space will be eliminated globally. As a result, our approach is likely to overestimate the feasible uncertainty reduction for both CCN and forcing. Whether local constraints actually work globally in reality could easily be confirmed by using multiple real measurements.

The spatial patterns of uncertainty reduction in Fig. 2 *E* and *G* suggest that uncertainty could be reduced globally by making a small number of measurements in regions that are representative of the model’s sources of uncertainty. We defined 11 measurement locations that are representative of larger model environments using *k*-means cluster analysis (24) to group locations in which similar parameter combinations account for the CCN uncertainty (Fig. S1). The two measurement locations used in Fig. 2 were placed at points within their clusters that are most representative of the cluster uncertainties. Although we have sampled parameter values across 28-dimensional space, typically about 90% of the CCN and forcing variance in any one location is accounted for by fewer than 10 parameters (Fig. S2). Examination of the average parameter contribution to uncertainty in each cluster (Fig. S2) shows that physical meaning can be applied to the cluster environments despite being defined through purely statistical methodology.
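As a rough illustration of the clustering idea, the sketch below groups synthetic grid cells by the fraction of CCN variance that each parameter contributes. Everything here is assumed for illustration: the three contribution patterns, the five parameters, the noise level, and the use of a plain Lloyd-style *k*-means with farthest-point initialization, which is not necessarily the variant used in the study.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical variance-contribution patterns (rows sum to 1) for three
# environments; columns are five illustrative parameters.
patterns = np.array([
    [0.70, 0.10, 0.10, 0.05, 0.05],
    [0.10, 0.70, 0.10, 0.05, 0.05],
    [0.05, 0.05, 0.10, 0.10, 0.70],
])
cells_per_env = 60
contributions = np.repeat(patterns, cells_per_env, axis=0)
contributions = contributions + rng.normal(scale=0.03, size=contributions.shape)

def kmeans(x, k, n_iter=50):
    """Lloyd-style k-means with deterministic farthest-point initialization."""
    centroids = [x[0]]
    for _ in range(1, k):
        current = np.array(centroids)
        nearest = np.linalg.norm(x[:, None, :] - current[None, :, :], axis=2).min(axis=1)
        centroids.append(x[np.argmax(nearest)])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        dist = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Keep the old centroid if a cluster happens to be empty.
        updated = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids

labels, centroids = kmeans(contributions, k=3)

# The most representative cell of each cluster is the one closest to its centroid.
representative = [
    int(np.argmin(np.linalg.norm(contributions - c, axis=1))) for c in centroids
]
print("cluster sizes:", np.bincount(labels))
print("representative cells:", representative)
```

Choosing the cell nearest each centroid is one simple way to pick a measurement location that is representative of the cluster uncertainties; the study's own selection criterion may differ.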

As we show in *Constraint of CCN and Forcing*, further measurements within the same cluster have a progressively weaker effect on model constraint, because the same parameters are repeatedly constrained. Based on this result, we propose that, for the purposes of model uncertainty reduction, it is not necessary to make measurements with global coverage. Rather, the key objective should be to make measurements in regions that are representative of the model uncertainty to be constrained.

### Constraint of CCN and Forcing.

For a single measurement made at a geographical location that is most representative of a model uncertainty cluster, the CCN variance is reduced in individual grid boxes of the model by between 0% and greater than 90% (compare the uncertainty reduction maps in Fig. 2 *E* and *G* with the cluster maps in Fig. S1). The constraint becomes weaker toward the geographical margins of the cluster where the sources of uncertainty differ most from where the measurement was made. Uncertainty is reduced elsewhere (outside the cluster geographical margins) because most clusters share at least some common parameter relationships that are important for CCN uncertainty.

Fig. 2 shows that there is much less constraint on the forcing uncertainty than on the CCN uncertainty. Also, the regions of forcing constraint are different from those for CCN, even accounting for the spatial pattern of cloudiness. For example, a measurement in the Southern Hemisphere marine region reduces the forcing uncertainty over parts of the continental Northern Hemisphere (Fig. 2*F*) because of shared but spatially different combinations of parameters that control CCN and forcing variance.

Fig. 3 shows the probability distribution of the prior and constrained CCN and forcing in the North Atlantic as a result of making 10 CCN measurements throughout the northern hemispheric marine region. The measurements have an assumed uncertainty of 30%, leading to constraint of the CCN range as expected. However, the forcing is hardly constrained at all, with almost the full range between 0 W⋅m^{−2} and −0.9 W⋅m^{−2} being plausible for a model with a much narrower plausible range of PD CCN. This implies that eliminating model variants that predict low (∼0–200 cm^{−3}) or high (∼400–800 cm^{−3}) CCN concentrations in the PD North Atlantic has little effect on the range of forcings predicted by the model.

Fig. 4 summarizes the constraint on CCN and forcing in terms of the global mean relative constraint, that is, the reduction in variance relative to the prior range.

Fig. 4 is an ideal way to directly compare the magnitudes of CCN and forcing uncertainty reduction, but it might present a misleading picture of how much the forcing is constrained if the global mean values are dominated by large relative constraints of small absolute forcings. We tested this by varying the threshold value of forcing above which each grid box is included in the calculation of the global mean relative constraint, which shows that the threshold of 1 W⋅m^{−2} used in Fig. 4 provides a useful measure of mean forcing uncertainty reduction representative of regions where forcing is important (Fig. S3). Other choices of the threshold would not change our conclusions.
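The threshold test can be made concrete with a small sketch. It takes the relative constraint in each grid box to be one minus the ratio of constrained to prior variance, compares the threshold against the magnitude of the mean forcing, and uses area weighting. All three are assumed readings of the text, and the four grid boxes are invented numbers chosen only to show the effect.

```python
import numpy as np

def global_mean_relative_constraint(prior_var, constrained_var,
                                    mean_forcing, area, threshold):
    """Area-weighted mean relative constraint over grid boxes whose
    forcing magnitude is at least `threshold` (W m^-2)."""
    relative = 1.0 - constrained_var / prior_var
    included = np.abs(mean_forcing) >= threshold
    if not included.any():
        return float("nan")
    return float(np.average(relative[included], weights=area[included]))

# Four hypothetical grid boxes; the last has a tiny forcing but a large
# relative constraint, which is the situation the threshold guards against.
prior_var = np.array([1.0, 2.0, 4.0, 0.5])
constrained_var = np.array([0.5, 1.0, 3.0, 0.05])
mean_forcing = np.array([-2.0, -1.5, -0.5, -0.05])
area = np.array([1.0, 1.0, 2.0, 2.0])

for threshold in (0.0, 0.1, 1.0):
    value = global_mean_relative_constraint(
        prior_var, constrained_var, mean_forcing, area, threshold)
    print(f"threshold {threshold:>4} W m^-2 -> mean relative constraint {value:.3f}")
```

In this invented case, including the weakly forced box raises the global mean, so sweeping the threshold shows how far the headline number depends on regions where the forcing itself is negligible.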

Multiple measurements at 10 random locations within a single cluster provide further constraint of CCN (Fig. 4, small open symbols), but the additional constraint from successive measurements decreases because similar parameters are repeatedly constrained and the unique information content of further observations is reduced. This result shows that the value of multiple measurements lies more in determining a representative aerosol state than in obtaining additional unique information for uncertainty reduction. For these multiple measurements, we assumed a CCN measurement uncertainty of 30% to account for the likely real CCN variability within the clusters, although the choice of the assumed CCN uncertainty is not important because we are comparing CCN and forcing relative constraints. The CCN was also constrained successively by measurements representative of each of the 11 clusters assuming thresholds of 40%, 50%, and 90% on CCN variability. These globally distributed measurements constrain CCN more than multiple measurements from within a single cluster. This occurs because the sensitive parameters are different in each cluster, so different regions of the 28-dimensional uncertainty space can be constrained.

The global mean CCN relative constraint from single and multiple measurements ranges from about 5% to 85% reduction in variance relative to the prior range, whereas the forcing uncertainty constraint is in the range 3–40%. The ratios of forcing constraint to CCN constraint (dashed lines in Fig. 4) lie mostly in the range 0.125–0.75, with a mean of about 0.25; that is, forcing is constrained on average only about one-quarter as much as CCN, even though the same set of parameters is being constrained and there are no additional sources of uncertainty unique to forcing in this study.

### Causes of Weak Forcing Constraint.

By analyzing the dependence of CCN and forcing on the parameters, we can explore the reasons for weak forcing constraint. CCN and forcing uncertainties depend on the same 28 parameters, so the region of plausible model performance defined by the CCN measurements will be a complex 28-dimensional parameter space. We therefore illustrate the problem in just two dimensions in Fig. 5 (a schematic is shown in Fig. S4). The model results in Fig. 5 come from the representative grid box in the anthropogenic source region (Fig. S1). These diagrams were generated by densely sampling from the emulator while keeping 26 of the 28 parameters at their median settings.

Fig. 5 *A* and *B* shows a case in which uncertainty in forcing is caused by parameters that PD CCN concentrations are insensitive to. This problem can occur because the forcing is sensitive to parameters that determine the PI baseline aerosol state (6), but the effects of these parameters on PD CCN can be masked by air pollution (12, 16). In Fig. 5*A*, a CCN measurement of 600–700 cm^{−3} strongly constrains the Aitken mode width but leaves the dimethyl sulfide (DMS) flux essentially unconstrained because CCN is not strongly sensitive to DMS in the anthropogenic source region in the model. However, the forcing here depends strongly on DMS flux (6) (Fig. 5*B*), which leaves the modeled forcing unconstrained in the range −0.72 W⋅m^{−2} to −1.0 W⋅m^{−2}.

Fig. 5 *C* and *D* shows a case in which two parameters contribute similarly to the uncertainty in forcing and CCN but the dependencies are very different. In this case, both parameters would contribute approximately equally to the variance in CCN and forcing, which would suggest that uncertainty in both would fall if CCN concentrations were constrained by measurements. In this case, a CCN measurement would tightly constrain the relationship between the two parameters, but neither of the individual parameters is well constrained, thereby leaving the prior forcing range essentially unconstrained. To further illustrate this problem, Fig. S5 shows the contributions of parameters to uncertainty in CCN and forcing in the Northern Hemisphere marine location shown in Figs. 3 and 6 and Fig. S1. The sources of uncertainty in CCN and forcing are similar. However, our results show that 10 CCN measurements with a 30% uncertainty across the Northern Hemisphere marine uncertainty cluster reduce the CCN variance in a North Atlantic grid box by ∼75% as expected, but the forcing uncertainty remains very close to the unconstrained range of 0 W⋅m^{−2} to −0.9 W⋅m^{−2}. Therefore, even when the sources of uncertainty in CCN and forcing are very similar, the reductions in the magnitudes of the uncertainties are very different.

The examples in Fig. 5 show fundamentally why a tightly constrained model of PD aerosol concentrations does not necessarily imply that the modeled forcing will be any better constrained as a result, even when the same set of uncertainty sources is considered. The key problem is equifinality (17): There are multiple parameter combinations (or model variants) that produce an equally good (or equifinal) model of CCN, but these models predict a wide range of forcings. In the case shown in Fig. 5, the dark shaded region shows that CCN is well constrained, but the forcing is essentially unconstrained. Although we have shown only two dimensions, the problem extends to all of the dimensions of the important parameters, which typically number about 10 at any one location. Therefore, when aerosol measurements are used to constrain the aerosol state, a commensurate constraint on forcing will be applied only when the dependence of the aerosol state on the parameters is the same as the dependence of forcing on the same parameters. In Fig. 5, this means that the CCN and forcing contour lines would need to be parallel to each other, and the more they deviate from parallel, the less the forcing will be constrained by making CCN measurements. According to our results, such similarity of parameter sensitivities appears to be uncommon, even when sampling across all global aerosol environments; otherwise, the CCN measurements would have constrained more of the forcing uncertainty (Fig. 4). Any mismatch of the parameter dependencies results in unconstrained forcing parameter space. There are many examples in the atmosphere where such compensation of parameter effects occurs, such as erroneously high emissions being compensated by high aerosol removal rates. There are also likely to be many such cancellation effects in other components of climate models.
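The role of contour alignment can be illustrated with a two-parameter toy problem. The sketch assumes linear responses and independent, standardized Gaussian parameters, neither of which is taken from the study. CCN depends on the difference between the two parameters, mimicking an emission offset by a removal rate, and the forcing gradient is rotated away from the CCN gradient by a chosen angle.

```python
import numpy as np

rng = np.random.default_rng(3)
p = rng.normal(size=(200_000, 2))  # two standardized, independent parameters

# CCN responds to p1 - p2, so many (p1, p2) pairs give the same CCN.
ccn_direction = np.array([1.0, -1.0]) / np.sqrt(2.0)
ccn = p @ ccn_direction

# A tight synthetic CCN measurement selects a narrow band of parameter space.
plausible = np.abs(ccn) <= 0.1

def forcing_constraint(angle_deg):
    """Relative reduction in forcing variance when the forcing gradient is
    rotated by angle_deg away from the CCN gradient."""
    angle = np.deg2rad(angle_deg)
    rotation = np.array([[np.cos(angle), -np.sin(angle)],
                         [np.sin(angle), np.cos(angle)]])
    forcing = p @ (rotation @ ccn_direction)
    return 1.0 - forcing[plausible].var() / forcing.var()

results = {angle: forcing_constraint(angle) for angle in (0, 45, 90)}
for angle, value in results.items():
    print(f"angle between contours {angle:>2} deg -> forcing constraint {value:.2f}")
```

When the contours are parallel (0 degrees) the CCN measurement constrains the forcing almost completely. As the angle grows, the band of equifinal parameter combinations runs increasingly across the forcing contours, and at 90 degrees the same measurement carries essentially no information about the forcing.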

### Multiple Measurement Constraints.

It might be argued that we cannot expect the observational constraint of one aerosol variable to fully constrain forcing. Nevertheless, as we argued in the Introduction, because uncertainties in CCN and forcing are caused by exactly the same 28 parameters, we might expect global measurements to eventually reduce forcing uncertainty. Equifinality means that this does not occur. We now explore the effect of additional constraints on the same 28 parameters.

The constraint provided by other state variable measurements will depend on how the state variable depends on the uncertain parameters. Surprisingly, measurements of the total particle concentration (*N*_{3}) with an uncertainty threshold of 90% constrain forcing more than CCN measurements (Fig. 4) in July, even though most of these particles do not form cloud droplets. This result occurs because, in many locations, *N*_{3} depends on more of the natural parameters that are important for the forcing uncertainty. Measurements of *N*_{3} also provide additional constraint for CCN beyond that provided by making just CCN measurements, even when the measurements are made at the same location. The additional constraint occurs because the *N*_{3} parameter relationships are similar to those for CCN at locations where we have not previously taken CCN measurements.

BC mass concentration measurements with an uncertainty threshold of 90% barely constrain forcing at all. Although BC as a state variable is essentially unrelated to cloud drop concentrations and forcing, its uncertainty is controlled by many of the same uncertain parameters. For example, carbonaceous particle emission fluxes and dry deposition rates both affect CCN and BC, so constraint of BC could help to constrain forcing. The very weak constraint has not been investigated further but is likely caused by a high degree of equifinality.

Multiple measured variables can be used together. However, measurements of CCN, *N*_{3} and BC in all 11 clusters constrain forcing only about one-third to one-half as much as they constrain CCN. Again, the ability of these measurements to further constrain forcing is limited by equifinality, so the posterior uncertainty will not converge to zero unless the parameter relationships are the same as for forcing.

## Conclusions

Although it is well recognized that there are compensating processes in environmental models (21, 25, 26), the effect on our ability to constrain unobservable model quantities through state variable measurements has not previously been quantified. We posed the question: If we could make aerosol measurements with known uncertainty in every grid box of the model, how well would we be able to constrain the uncertainty in forcing caused by these aerosol uncertainties? We first showed that it is possible to define regions of the world according to the “information content” of measurements for CCN constraint. This could be valuable information for defining observational strategies aimed at reducing model uncertainty—a few representative measurements are as effective as widely distributed measurements. We then used CCN measurements from these representative locations to constrain the 28-dimensional parametric uncertainty space and calculated the resulting constraint on modeled aerosol−cloud forcing. We found that, although globally CCN could be constrained well by measurements made in representative environments, the 28-dimensional parametric uncertainty space was not constrained in such a way that resulted in a similarly constrained forcing uncertainty.

The reasons for the limited forcing constraint relate back to our original hypotheses. We first hypothesized that the effectiveness of aerosol measurements to constrain uncertainty in forcing would be reduced if the two quantities depended on different processes. However, we eliminated this possibility by ensuring that the same 28 parameters in a single global aerosol model were the only sources of uncertainty in both aerosol and forcing. We further hypothesized that if aerosols and forcing had different sensitivities to the parameters, then this would reduce the effectiveness of aerosol measurements for constraining forcing. Nevertheless, we hoped to find enough environments globally to constrain most of the 28 uncertain dimensions and therefore eventually constrain forcing uncertainty. In fact, we did identify regions in which the uncertainties in aerosol and forcing were controlled by similar parameters, but we found that, even in these environments, the constraint on the forcing was limited. The cause of the weak constraint is related to the fact that uncertain parameters have compensating effects in the model, which leads to many ways in which combinations of different parameters lead to plausible aerosol models compared with measurements. This is termed equifinality. Our results show that equifinality in the global aerosol system significantly limits our ability to constrain the uncertainty in forcing caused by uncertainty in the aerosol model, even when using apparently closely related aerosol measurements that have the same sources of uncertainty. The result is that these equifinal aerosol models predict a narrow range of aerosols but a wide range of forcings.

There are several important implications of equifinality in the aerosol system. First, because forcing must be predicted and cannot be measured directly, and because the impact of equifinality can only be estimated by dense sampling of model parameter space, it is likely to be a hidden source of uncertainty in most model studies. A model that is tuned to match an observed aerosol state will predict a particular forcing, but there will exist a range of equally plausible forcings that will not be detected unless the model’s full parameter space is sampled. Therefore, obtaining an accurate model for aerosol state variables is a necessary but not sufficient condition for obtaining an accurate model of radiative forcing, even if we account only for uncertainties in the aerosol component of the model. Second, several models, each achieving very good agreement with aerosol measurements, could calculate very different forcings just as we have found in a single model. Current models do not achieve universally good agreement with aerosol measurements (14), which suggests we are a long way from reducing the persistent diversity in aerosol−cloud forcing (1).

The results shown here also have implications for the use of emergent constraints to constrain climate models if parametric uncertainty is not accounted for. Emergent constraints are used when a relationship between two model outputs emerges from collections of climate model simulations, which allows the observational constraint of one output to be used to constrain the other, of relevance to long-term climate. In our model, a relationship between CCN and cloud albedo forcing emerges from multiple model simulations (Fig. 6), as expected. These results are based on a sample from the forcing and CCN emulators of 3 million parameter combinations (based on the same data as Fig. 3). Although a clear positive correlation of forcing with CCN emerges from these model runs, there is very large spread due to the 28-dimensional parametric uncertainty. Clearly, even high-precision CCN measurements will have a relatively small effect on forcing uncertainty when the emergent CCN−forcing relationship is so uncertain. Proper account needs to be taken of equifinality before robust conclusions can be drawn about the usefulness of emergent behavior among multiple models.

The difficulty of aerosol model constraint is related to model complexity. In earlier climate models, only the aerosol mass concentrations were simulated, and cloud drop concentrations were predicted in terms of empirical relationships (27). Although observational constraint of such models would still suffer from equifinality (e.g., compensating emission and removal rates), there were fewer processes to constrain. In an aerosol microphysics model, there are tens of important parameters (6, 15) and multidimensional equifinality. Therefore, although model fidelity may have improved, the uncertainty may now be more difficult to constrain.

Our results are based on the assumption that the parametric uncertainties apply globally. In some places, this will lead to an overestimate of the uncertainty range given to some parameters locally. We have also applied the history matching technique globally so that a constraint on the parameter set from CCN anywhere applies globally, which is likely to lead to an overestimate of the constraint from CCN measurements. Taking this together with the tight constraints we have applied, we believe that we are still likely to be overestimating the constraint on CCN and therefore on forcing.

Additional steps could be taken to reduce uncertainty further. First, measurements could be used to constrain relationships between modeled outputs (such as cloud albedo versus aerosol load), which has been shown to be a valuable approach (9). Given what we have learned about compensating model effects (equifinality), we suggest that the effectiveness of aerosol−cloud relations for constraining forcing should be assessed in a way that accounts for model uncertainty. Second, given the problem of compensating processes, it may be more effective to design atmospheric experiments in ways that enable single processes to be isolated. To do this, experiments will need to be guided by models that can point to where the effects of single processes can be isolated, for example, by examining the results of comprehensive sensitivity analyses (e.g., ref. 15). Laboratory measurements can quantify individual processes, but we need to ensure that we understand the uncertainties introduced by parameterizing the processes in global models. Third, given the uncertainty due to the PI aerosol state, we could consider quantifying forcings over recent decades where we have observational constraints. Over more recent periods, we might also expect changes in forcing to scale with aerosol emissions and concentrations (28), thereby reducing some of the uncertainty compensation issues highlighted in this paper. Fourth, and most importantly, we need to treat uncertainty as one of the main scientific problems to be understood and solved. Neglect of important issues like equifinality may render any apparent improvements in model performance illusory.

There are many reasons why the large PI to PD aerosol−cloud forcing uncertainty has persisted through all IPCC reports. Observational constraints on models are essential for reducing the uncertainty. Our results suggest that we need to carefully evaluate how the available observations constrain models, and how this feeds through to forcing uncertainty, which cannot be directly constrained by measurements. Here we have tackled only the uncertainty in the aerosol model. Other cloud properties of importance to Earth’s radiative balance not included here, such as cloud cover and thickness, also respond to changes in aerosols (7) through radiative and dynamical couplings. Some of the uncertainties in these cloud properties are likely to be correlated with the uncertainty in albedo studied here, but there will be additional sources of uncertainty related to cloud and atmospheric physics responses. By neglecting these interactions, we have presented a best-case scenario of how improvements in aerosol models will result in improved simulations of forcing. We can expect equifinality to be a significant issue in models of the fully coupled aerosol−cloud−atmosphere system just as it is in the aerosol−albedo relation. However, the relationship between the uncertainty in the aerosol model and the uncertainty in the coupled climate model response will be even more challenging to demonstrate.

## SI Methodology

### Global Aerosol Model.

GLOMAP (29, 30) is an aerosol microphysics and chemistry model run within the TOMCAT global 3-D offline chemical transport model (31, 32) at a horizontal resolution of 2.8° × 2.8° with 31 vertical levels between the surface and 10 hPa. GLOMAP-mode simulates the evolution of the particle size distribution and size-resolved chemical composition of aerosol particles within seven lognormal modes defined by size and solubility. The aerosol chemical components are sulfate, sea salt, BC, particulate organic matter (POM), and dust. Secondary organic aerosol (SOA) is produced from the first-stage oxidation products of biogenic monoterpenes and anthropogenic volatile organic compounds (VOCs), and is assumed to condense with zero equilibrium vapor pressure. SOA is combined with the POM component after condensation on the aerosol particles. The aerosol and chemical species are transported by 3-D meteorological fields read in from the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA-Interim reanalyses for 2008. Low-level stratiform clouds are read in separately from International Satellite Cloud Climatology Project (ISCCP)-D2 data (33). GLOMAP-mode has been widely evaluated against global measurements of particle number concentrations (34, 35), CCN (36, 37), aerosol chemical components (38⇓–40), and cloud droplets (41).

### Emissions.

The emission fluxes were perturbed by scaling baseline values of PD (2008) and PI (1750) emissions from a range of inventories specified in Table S1.

### Radiative Forcing.

The aerosol−cloud forcing was calculated as the difference of top-of-the-atmosphere net shortwave and longwave (SW+LW) radiative fluxes between the PD and the PI. Year 2008 analyzed meteorological fields were used for both the PI and PD simulations; thus the PI and PD simulations are identical in every respect except for the anthropogenic emissions. The modeled aerosol properties were used to calculate the cloud drop number concentrations (CDNCs) in the PI and PD, from which the forcing was calculated for each 2-D grid point of the model.

An activation parameterization (42) was used to calculate CDNCs from the modeled monthly mean aerosol size distribution and composition in each grid cell for each perturbed parameter run. These calculations account for the coupling between the uncertain aerosol particle size distribution (and composition) and the number of particles activated into cloud drops. An updraft speed of 0.15 m⋅s^{−1} was used over marine regions and 0.3 m⋅s^{−1} was used over land, which is typical of cloud-base speeds in low-level stratus and stratocumulus clouds (6).

To calculate the top-of-the-atmosphere net radiative fluxes, we used the off-line version of the Edwards and Slingo radiative transfer model (43) with six bands in the SW and nine bands in the LW, with a delta-Eddington two-stream scattering solver at all wavelengths. We used a monthly mean climatology for water vapor, temperature, and ozone based on ECMWF reanalysis data, together with surface albedo and cloud optical depth fields from ISCCP-D2 (33, 44) for the year 2000.

The cloud albedo forcing between the PI and PD experiments was quantified by modifying the cloud drop effective radius *r*_{e} for low-level and midlevel water clouds up to 600 hPa,

### Perturbed Parameters.

The ensemble of model runs was designed for emulation. The ensemble consists of 168 combinations of parameter settings from 28 parameters (Table S2) representing aerosol and precursor gas emissions, microphysical processes, and aerosol model structures. The uncertainty range for each parameter was chosen based on expert elicitation, defining a maximin Latin Hypercube sampling of the parameter space (15). The parameter perturbations were applied in a globally uniform way. For example, the uncertainty in grid point aerosol emissions does not vary regionally. In the case of scaled parameter uncertainties (Table S2), the scaling means that there will be regional variations in the absolute value of the parameter. For example, dry deposition rates are calculated at each location and time step, but the scaling of these local values is globally uniform. The assumption of globally applicable uncertainties remains to be tested by real observations.
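A maximin Latin hypercube design can be sketched as follows. This is a minimal illustration, not the procedure used to build the ensemble: it assumes a simple random-search strategy (generate several Latin hypercubes and keep the one with the largest minimum pairwise distance), and the function names `latin_hypercube`, `maximin_lhs`, and `scale_to_ranges` are hypothetical. The small run and parameter counts in the usage lines are chosen only to keep the example fast.

```python
import random


def latin_hypercube(n_runs, n_params, rng):
    """Return n_runs points in [0, 1)^n_params with one point per stratum in each dimension."""
    columns = []
    for _ in range(n_params):
        # One random draw inside each of n_runs equal-width strata, in shuffled order.
        column = [(k + rng.random()) / n_runs for k in range(n_runs)]
        rng.shuffle(column)
        columns.append(column)
    return [[columns[j][i] for j in range(n_params)] for i in range(n_runs)]


def min_pairwise_distance(design):
    """Smallest Euclidean distance between any two design points."""
    best = float("inf")
    for a in range(len(design)):
        for b in range(a + 1, len(design)):
            d = sum((x - y) ** 2 for x, y in zip(design[a], design[b])) ** 0.5
            best = min(best, d)
    return best


def maximin_lhs(n_runs, n_params, n_candidates=50, seed=0):
    """ASSUMPTION: random search; keep the candidate hypercube whose closest pair is farthest apart."""
    rng = random.Random(seed)
    candidates = [latin_hypercube(n_runs, n_params, rng) for _ in range(n_candidates)]
    return max(candidates, key=min_pairwise_distance)


def scale_to_ranges(design, lower, upper):
    """Map unit-cube points onto elicited parameter ranges (globally uniform scalings)."""
    return [[lo + u * (hi - lo) for u, lo, hi in zip(row, lower, upper)] for row in design]


# Illustrative use: 20 runs over 3 hypothetical scaling parameters.
unit_design = maximin_lhs(n_runs=20, n_params=3)
scaled_design = scale_to_ranges(unit_design, lower=[0.5, 0.5, 0.1], upper=[2.0, 2.0, 10.0])
```

Each row of `scaled_design` would correspond to one perturbed-parameter model run. The stratification guarantees that every parameter range is sampled evenly, while the maximin criterion spreads the runs through the joint space, which is what an emulator needs.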

### Model Emulation.

Gaussian process emulation (15, 18, 19) was used to estimate model predictions at untried points throughout the space bounded by the upper and lower limits of the uncertain model parameters. An emulator was built for the simulated monthly mean CCN at 915 hPa and the monthly mean PI to PD top-of-the-atmosphere cloud albedo forcing for every 2-D grid point. The emulator was validated in each case using 84 additional model runs to ensure that the emulator uncertainty around its mean was low compared with the parametric uncertainty (15). The unconstrained (or prior) parametric uncertainty was calculated as the variance of 140,000 parameter sets sampled from within the defined uncertainty space from the grid box emulators before observational constraints were applied.
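The idea of emulation can be illustrated with a very small Gaussian process interpolator. The sketch below is an assumption-laden toy, not the emulator used in this study: it fixes a squared-exponential covariance with a hand-picked length scale instead of estimating hyperparameters, uses a constant mean, and replaces the aerosol model with a hypothetical two-parameter function `toy_simulator`. All names are illustrative.

```python
import math


def sq_exp_kernel(x1, x2, length_scale=0.3, variance=1.0):
    """Squared-exponential covariance between two parameter vectors (fixed hyperparameters)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return variance * math.exp(-0.5 * d2 / length_scale ** 2)


def cholesky(a):
    """Lower-triangular factor low with low * low^T = a, for symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def chol_solve(low, b):
    """Solve (low * low^T) x = b by forward then back substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


class GPEmulator:
    """Constant-mean Gaussian process conditioned on a design of simulator runs."""

    def __init__(self, design, outputs, nugget=1e-8):
        self.design = design
        self.mean = sum(outputs) / len(outputs)
        cov = [[sq_exp_kernel(a, b) + (nugget if i == j else 0.0)
                for j, b in enumerate(design)] for i, a in enumerate(design)]
        self.low = cholesky(cov)
        self.alpha = chol_solve(self.low, [v - self.mean for v in outputs])

    def predict(self, x):
        """Return the emulator mean and variance at an untried parameter setting x."""
        k = [sq_exp_kernel(x, xi) for xi in self.design]
        mean = self.mean + sum(ki * ai for ki, ai in zip(k, self.alpha))
        v = chol_solve(self.low, k)
        var = sq_exp_kernel(x, x) - sum(ki * vi for ki, vi in zip(k, v))
        return mean, max(var, 0.0)


def toy_simulator(p):
    """Hypothetical stand-in for one grid-box model output as a function of two parameters."""
    return math.sin(3.0 * p[0]) + p[1]


# Train on a 5 x 5 design in the unit square, then check against a held-out run.
design = [(i / 4.0, j / 4.0) for i in range(5) for j in range(5)]
outputs = [toy_simulator(p) for p in design]
emulator = GPEmulator(design, outputs)
held_out = (0.4, 0.6)
emulated_mean, emulated_var = emulator.predict(held_out)
validation_error = abs(emulated_mean - toy_simulator(held_out))
```

Comparing `validation_error` with the spread of `outputs` mirrors, in miniature, the check that emulator uncertainty is small relative to parametric uncertainty. Once validated, `predict` is cheap enough to evaluate at a very large number of parameter settings.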

### Cluster Analysis of CCN Uncertainty Sources.

K-means clustering (24) was applied to the percentage that each parameter contributes to the monthly mean CCN variance throughout the year 2008 (the main effects in ref. 15). K-means clustering partitions a dataset into *k* clusters of varying size, with the cluster center defined as the mean of the main effect (*ME*) values of all grid boxes in the cluster. The clusters are chosen such that the within-cluster sum of squares of the *ME* values over the *i* grid boxes is minimized across all of the data. The dataset is the grid box main effect of monthly mean CCN from each of the 28 uncertain parameters through the year 2008, where the number of grid boxes is 8,192 and the number of main effects in each grid box is 28 × 12 = 336. Eleven clusters were chosen by inspecting the difference in the between-cluster sum of squares with varying cluster numbers, and whether repeated runs of the algorithm resulted in the same clusters. The 11 grid boxes chosen to provide CCN measurement constraint on the forcing come from the 11 clusters, to provide good coverage of the important uncertainties globally. Fig. S1 shows a map of the grid box assignment to each of the 11 clusters. Fig. S2 shows the mean of the *ME* values in every grid box assigned to that cluster. It can be seen that, despite the sensitivity analysis and cluster analysis being purely statistical methods, the cluster means can be interpreted physically in terms of the processes that one would expect to be important in the different regions covered by each cluster; this has helped us to name each cluster accordingly. For example, the boreal fire region is dominated by the biomass burning parameters in the summer seasons where it is known that fires occur in the region, and the anthropogenic source region is dominated by parameters related to anthropogenic emissions. Despite the general interpretability of the cluster centers, the occurrence of some dominating features has been a surprise. For a full discussion on the meaning of the CCN sensitivity to the uncertain parameters in GLOMAP, see ref. 15.
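The clustering step can be sketched with a plain Lloyd-style K-means loop. This is an illustrative toy under stated assumptions: each grid box is reduced to three hypothetical main-effect percentages rather than 336, the data are synthetic, and the centers are seeded deterministically by a farthest-first rule (a choice made here for reproducibility, not one stated in the text).

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two main-effect vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centers(rows, k):
    """ASSUMPTION: deterministic seeding; each new center is the row farthest from those chosen."""
    centers = [list(rows[0])]
    while len(centers) < k:
        nxt = max(rows, key=lambda r: min(sq_dist(r, c) for c in centers))
        centers.append(list(nxt))
    return centers


def kmeans(rows, k, n_iter=100):
    """Return labels, cluster centers, and the within-cluster sum of squares."""
    centers = farthest_first_centers(rows, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each grid box joins its nearest cluster center.
        new_labels = [min(range(k), key=lambda c: sq_dist(row, centers[c])) for row in rows]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each center becomes the mean of its member grid boxes.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # keep the previous center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    wcss = sum(sq_dist(row, centers[lab]) for row, lab in zip(rows, labels))
    return labels, centers, wcss


# Synthetic main effects (% of CCN variance) for three hypothetical parameters:
# sea spray flux, biomass burning emissions, anthropogenic emissions.
prototypes = [[70.0, 20.0, 10.0], [10.0, 75.0, 15.0], [15.0, 10.0, 75.0]]
grid_boxes = [[p[0] + d, p[1] - d, p[2]] for p in prototypes for d in (-4.0, 0.0, 4.0)]
labels, centers, wcss = kmeans(grid_boxes, k=3)
```

Repeating the call for several values of `k` and comparing the resulting sums of squares is one way to judge how many clusters the data support, in the spirit of the inspection described above. Each returned center is a mean main-effect profile that can then be interpreted physically.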

### Idealized CCN Observations.

The idealized observations are taken from a GLOMAP simulation where all parameters were set to their median values defined by expert elicitation. The median value for each parameter can be found in Table S2. The median value of all parameters was applied globally in the model, although, when the parameter is applied as a scaling, the absolute value of the parameters will vary spatially and temporally during the model runtime.

### History Matching and Observation Uncertainty.

The concept of history matching is used to constrain parameter uncertainties. History matching is a technique originally developed in the field of oil reservoir modeling (20). The idea of history matching is that any run of a model must be consistent with history, including what is known about its uncertainties; otherwise, it is considered an implausible model run. No probabilities are assigned to the plausible model runs. The parameter set associated with an implausible model run is also considered implausible; thus, history matching is a way of constraining uncertain model parameter space rather than just excluding single members of a sparse ensemble. Removing these implausible parameter sets from the multidimensional parameter uncertainty space leaves a new observationally constrained parameter uncertainty space, with equal probability attached to the plausible parameter sets. This technique is sometimes called precalibration (21), although the calibration step does not necessarily follow, where calibration aims to find the best fitting parameter set, given some observations.

In this paper, we have simplified the implausibility measure and used an idealized observation with an accurate emulator so that only model parameter uncertainties are considered. The implausibility measure in this paper compares *y*_{em} with *y*_{obs}, where *y*_{em} is the emulated model output given some parameter set in the 28-dimensional uncertainty space and *y*_{obs} is the model output from the median GLOMAP run. When multiple grid boxes are used in the constraint, the constraint is applied sequentially; for example, the plausible space is found by comparing the emulated CCN over the initial parameter space with the idealized CCN observation in a single grid box, then the CCN is simulated via the emulator sampling only the plausible parameter sets to compare with the idealized CCN observation in the next grid box, until all grid box constraints have been applied. The threshold value is dependent on the assumed measurement uncertainty, which needs to account for measurement precision as well as the representativeness error associated with comparing a measurement with the model (for example, whether a measurement can be made under the same meteorological conditions as used in the model). For single grid box measurement constraint, we use an arbitrarily small uncertainty of 0.5% on the measurement to demonstrate maximum feasible uncertainty reduction. When using multiple measurements, we present results for thresholds between 30% and 90%, which range between a realistic estimate of the approximate precision of a CCN measurement and the typically observed scatter in measurements that probably represents, partly, the measurement representativeness (37). The varying thresholds give some indication of the potential constraint from observations with varying levels of uncertainty.
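Sequential history matching, and the way equifinality limits what it can achieve, can be sketched with a two-parameter toy. Everything here is hypothetical: the implausibility is assumed to be the relative absolute difference between emulated and observed output, the "emulators" are simple closed-form functions of a natural and an anthropogenic aerosol source, and the forcing is assumed to scale with the logarithm of the present-day to preindustrial ratio.

```python
import math
import random
import statistics


def implausibility(y_em, y_obs):
    """ASSUMPTION: relative absolute difference between emulated and observed output."""
    return abs(y_em - y_obs) / abs(y_obs)


def history_match(samples, emulators, observations, threshold):
    """Apply each grid-box constraint in turn, keeping only the still-plausible parameter sets."""
    plausible = list(samples)
    for emulate, y_obs in zip(emulators, observations):
        plausible = [p for p in plausible if implausibility(emulate(p), y_obs) <= threshold]
    return plausible


def ccn_present_day(p):
    """Toy emulator: present-day CCN is the sum of natural and anthropogenic contributions."""
    natural, anthropogenic = p
    return natural + anthropogenic


def forcing(p):
    """Toy emulator: forcing depends on the present-day to preindustrial CCN ratio."""
    natural, anthropogenic = p
    return -math.log((natural + anthropogenic) / natural)


def relative_constraint(output, constrained, unconstrained):
    """1 - var_constrained / var_unconstrained for one emulated output."""
    var_c = statistics.pvariance([output(p) for p in constrained])
    var_u = statistics.pvariance([output(p) for p in unconstrained])
    return 1.0 - var_c / var_u


rng = random.Random(0)
prior = [(rng.uniform(50.0, 200.0), rng.uniform(50.0, 400.0)) for _ in range(20000)]
y_obs = ccn_present_day((125.0, 225.0))  # idealized observation from the "median" run
plausible = history_match(prior, [ccn_present_day], [y_obs], threshold=0.05)

ccn_rc = relative_constraint(ccn_present_day, plausible, prior)
forcing_rc = relative_constraint(forcing, plausible, prior)
print(f"plausible parameter sets: {len(plausible)} of {len(prior)}")
print(f"relative constraint on CCN: {ccn_rc:.2f}; on forcing: {forcing_rc:.2f}")
```

In this toy, many combinations of natural and anthropogenic sources reproduce the observed present-day CCN, so the retained parameter sets lie along a band rather than around a point. The CCN variance collapses, but the forcing, which depends on how the total is split between the two sources, remains far less constrained. Passing further emulators and observations in the two lists applies additional grid-box constraints in sequence.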

### Calculating the Constrained CCN and Forcing Variances and the Relative Constraint.

The final plausible parameter set is used to sample from the emulator of monthly mean CCN and forcing in every 2-D grid box and the resulting CCN or forcing variance calculated. The variance calculated using the plausible parameter sample is referred to as the constrained variance. The variance is shown to be a reasonable measure of spread in the model outputs given the unconstrained and the constrained parameter set, because the corresponding emulated mean and median are close. The relative constraint in every grid box, for both CCN and forcing, is calculated as 1 − (*var*_{constrained}/*var*_{unconstrained}). The global relative constraint for both CCN and forcing is calculated as 1 − Σ*w*_{i}^{−1}(*var*_{i,constrained}/*var*_{i,unconstrained}), where the weights (*w*) account for the grid box area (which varies with latitude) and *i* represents each individual grid box. For the forcing global relative constraint, all weights *w*_{i} are 0 when the emulated mean forcing is <1 W⋅m^{−2}. The effect of setting a threshold of 1 W⋅m^{−2} is shown in Fig. S3, where the actual threshold is shown to be unimportant as long as grid boxes with zero forcing are removed and we are not left with only a few grid boxes of large forcing.
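The grid-box and global relative constraints can be sketched as below. This is one possible reading, stated as an assumption: the global value is taken to be one minus an area-weighted mean of the per-grid-box variance ratios, with weights proportional to the cosine of latitude and normalized to sum to one, and the forcing mask is applied to the magnitude of the emulated mean forcing. The function names are hypothetical.

```python
import math


def relative_constraint(var_constrained, var_unconstrained):
    """Fractional reduction in variance in a single grid box."""
    return 1.0 - var_constrained / var_unconstrained


def global_relative_constraint(lats_deg, var_constrained, var_unconstrained,
                               mean_forcing=None, min_forcing=1.0):
    """One minus the area-weighted mean of the grid-box variance ratios.

    ASSUMPTIONS: weights are cos(latitude), normalized to sum to one; when
    mean_forcing is given, grid boxes with |mean forcing| < min_forcing (W m^-2)
    receive zero weight.
    """
    weights = [math.cos(math.radians(lat)) for lat in lats_deg]
    if mean_forcing is not None:
        weights = [w if abs(f) >= min_forcing else 0.0
                   for w, f in zip(weights, mean_forcing)]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no grid boxes carry weight")
    ratio = sum(w * vc / vu
                for w, vc, vu in zip(weights, var_constrained, var_unconstrained)) / total
    return 1.0 - ratio


# Two illustrative grid boxes, at the equator and at 60 degrees latitude.
lats = [0.0, 60.0]
ccn_global = global_relative_constraint(lats, [1.0, 3.0], [4.0, 4.0])
forcing_global = global_relative_constraint(lats, [1.0, 3.0], [4.0, 4.0],
                                            mean_forcing=[-2.0, -0.5])
```

In the second call only the equatorial grid box contributes, because the other box has a mean forcing magnitude below the 1 W⋅m^{−2} cutoff.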

## Acknowledgments

We thank Kirsty Pringle, Graham Mann, and Alex Rap, who provided the original model runs for the Natural Environment Research Council (NERC) Grant AEROS (NE/G006172/1), and all the experts involved in the elicitation of the uncertain parameters included in this study. This work is funded under the NERC Grant GASSP (NE/J024252/1). K.S.C. is a Royal Society Wolfson Merit Award holder. We made use of the N8 HPC facility funded from the N8 consortium and an Engineering and Physical Sciences Research Council Grant (EP/K000225/1) and the JASMIN facility (www.jasmin.ac.uk/) via Centre for Environmental Data Analysis funded by NERC and the UK Space Agency and delivered by the Science and Technology Facilities Council.

## Footnotes

^{1}To whom correspondence should be addressed. Email: k.s.carslaw{at}leeds.ac.uk.

Author contributions: L.A.L. and K.S.C. designed research; L.A.L. and C.L.R. performed research; L.A.L. and K.S.C. analyzed data; and L.A.L. and K.S.C. wrote the paper.

This paper results from the Arthur M. Sackler Colloquium of the National Academy of Sciences, “Improving Our Fundamental Understanding of the Role of Aerosol–Cloud Interactions in the Climate System,” held June 23−24, 2015, at the Arnold and Mabel Beckman Center of the National Academies of Sciences and Engineering in Irvine, CA. The complete program and video recordings of most presentations are available on the NAS website at www.nasonline.org/Aerosol_Cloud_Interactions.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1507050113/-/DCSupplemental.

## References

- ↵
- Boucher O, et al.

- ↵
- ↵
- ↵
- Myhre G,
- Myhre CEL,
- Samset BH,
- Storelvmo T

- ↵
- Rosenfeld D,
- Sherwood S,
- Wood R,
- Donner L

- ↵
- ↵
- Koren I,
- Dagan G,
- Altaratz O

- ↵
- ↵
- Quaas J,
- Boucher O,
- Bellouin N,
- Kinne S

- ↵
- Ma X,
- Yu F,
- Quaas J

- ↵
- ↵
- Hamilton DS, et al.

- ↵
- Penner JE,
- Xu L,
- Wang M

- ↵
- ↵
- ↵
- ↵
- ↵
- O’Hagan A

- ↵
- ↵
- Craig PS,
- Goldstein M,
- Seheult AH,
- Smith JA

- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- Regayre LA, et al.

- ↵
- ↵
- ↵
- Stockwell D,
- Chipperfield M

- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- Schmidt A, et al.

- ↵
- ↵
- ↵
- ↵
- Cofala J,
- Amann M,
- Klimont Z,
- Schopp W

## Article Classifications

- Physical Sciences
- Earth, Atmospheric, and Planetary Sciences