Communicating scientific uncertainty

Edited by Dietram A. Scheufele, University of Wisconsin–Madison, Madison, WI, and accepted by the Editorial Board February 24, 2014 (received for review November 5, 2013)
September 15, 2014
111 (supplement_4) 13664-13671

Abstract

All science has uncertainty. Unless that uncertainty is communicated effectively, decision makers may put too much or too little faith in it. The information that needs to be communicated depends on the decisions that people face. Are they (i) looking for a signal (e.g., whether to evacuate before a hurricane), (ii) choosing among fixed options (e.g., which medical treatment is best), or (iii) learning to create options (e.g., how to regulate nanotechnology)? We examine these three classes of decisions in terms of how to characterize, assess, and convey the uncertainties relevant to each. We then offer a protocol for summarizing the many possible sources of uncertainty in standard terms, designed to impose a minimal burden on scientists, while gradually educating those whose decisions depend on their work. Its goals are better decisions, better science, and better support for science.
Decision making involves uncertainty. Some of that uncertainty concerns facts: What will happen if we make a choice? Some of that uncertainty concerns values: What do we want when we cannot have everything?
Scientific research can reduce both kinds of uncertainty. Research directed at clarifying facts can provide imperfect answers to questions such as how well an oil pipeline will be maintained and monitored, how long the recovery period will be after bariatric surgery, and how much protection bicycle helmets afford. Research directed at clarifying values can provide imperfect answers to questions such as which charitable contributions give donors the greatest satisfaction, what happens when people commit to long-term goals, how much pleasure people can expect from windfalls (e.g., lottery winnings) and how much pain from misfortunes (e.g., severe accidents) (1, 2).
Taking full advantage of scientific research requires knowing how much uncertainty surrounds it. Decision makers who place too much confidence in science can face unexpected problems, not realizing how wary they should have been. Decision makers who place too little confidence in science can miss opportunities, while wasting time and resources gathering information with no practical value. As a result, conveying uncertainty is essential to science communication.
Much of scientists’ own discourse is about uncertainty. Journals require authors to disclose the assumptions and ambiguities underlying their work. Scientific debates focus on uncertainties requiring attention. Peer review scrutinizes the uncertainty in individual studies, protecting science from unwarranted faith in flawed results. A healthy scientific community rewards members who raise problems before their critics and penalizes those who overstate results.
By revealing uncertainties, scientific discourse is an essential resource for communications about them. However, it typically provides both more and less detail than decision makers need (3). At one extreme, scientists’ discourse can involve minutiae that overwhelm even experts. At the other, it can omit vital uncertainties that are common knowledge within a field, hence go without saying, and uncertainties that a field routinely ignores, either because they appear unimportant or because its scientists have nothing to say about them.
As a result, communicating scientific uncertainty requires both simplifying and complicating normal scientific discourse. On the one hand, the uncertainties that it addresses must be reduced to their decision-relevant elements. On the other hand, the uncertainties that scientists fail to mention must be uncovered. Which uncertainties to subtract and add depends on the decisions that the communications are meant to serve. Some decisions (e.g., whether to reexamine disaster plans) may be robust to any plausible uncertainty (e.g., how storm surge estimates are revised). Others (e.g., whether to buy monetized securities) may be highly sensitive to details (e.g., the assumptions made about the liquidity of capital markets).
Once communicators know what science is worth knowing, they can study what facts decision makers know already and how best to convey missing knowledge. As with all communication, though, the first task is determining what to say. That will depend on the decision, which might fall into one of three broad categories. (i) Decisions about action thresholds: Is it time to act? (ii) Decisions with fixed options: Which is best? (iii) Decisions about potential options: What is possible?
Communicating uncertainty for each class of decision requires (i) characterizing uncertainty, by identifying the issues most relevant to the choice; (ii) assessing uncertainty, by summarizing that information in a useful form; and (iii) conveying uncertainty, by creating messages that afford decision makers the detail that their choices warrant. After considering the scientific foundations for addressing these tasks, we offer a protocol for eliciting uncertainty from the scientists who know it best, in a standard form designed to serve those who need to learn about it.
From this perspective, communications are effective when they help people identify choices that serve their own, self-defined best interests. In contrast to such nonpersuasive communications, persuasive messages are effective when they convince people to behave in ways that someone else has chosen (e.g., public health officials, marketers, politicians). With persuasive communications, shading or hiding uncertainty might be justified. With nonpersuasive ones, honesty is the only policy.

Decisions About Action Thresholds: Is It Time to Act?

The simplest but sometimes most fateful communications inform decisions to act, triggered by evidence passing a threshold for action. Science may provide that trigger, in communications such as those telling people to head for (or leave) storm shelters, start (or stop) medical treatments, and sell (or buy) securities. However, setting a threshold for action inevitably poses value questions. Which tornado warning level best balances protection and disruption? Which medical guideline embodies the best tradeoff between the uncertain risk and benefits of a treatment? Which portfolio rebalancing policy is best for a retirement plan? As a result, communications must address the uncertainties surrounding both scientists’ facts and decision makers’ values.

Characterizing Uncertainty.

People will judge science’s value by the apparent wisdom of recommendations based on it. Do calls to action seem to lead to saving lives, health, and profits or to needlessly taking shelter, undergoing surgery, or selling valuable securities? Those evaluations should depend on both the uncertainty in the science and the appropriateness of the decision rule translating it into a recommendation. In retrospect, such evaluations can be colored by outcome bias, arising when decisions are judged by their outcomes rather than by their wisdom (4), and by hindsight bias, arising when people exaggerate how well those outcomes could have been foreseen (5).
Even when decision makers resist such biases, they can judge science fairly only if they can see its evidence through the filter of the recommendations. If a call to action proves unnecessary, then the science might have been highly uncertain, making mistakes inevitable, or the science might have been strong but the threshold overly cautious. Conversely, if a needed call to action was missed, it could mean uncertain science or insufficient caution.
With a single event, decision makers cannot disentangle the evidence and the decision rule on their own but must be told how the advice was reached. That explanation might be, “We recommend this risky, painful surgery because we are confident that it increases 5-y survival rates” or, “The evidence for a market collapse is weak, but it is our fiduciary responsibility to issue a ‘sell’ call for retirement accounts.” With multiple events, though, decision makers can infer the quality of the science and the decision rule from the pattern of decisions and outcomes. As formalized by signal detection theory, quality is represented by d′ (how well experts can discriminate among states of the world), whereas decision rules are represented by β (the tradeoffs made among possible outcomes) (6).
Fig. 1 illustrates the kind of research needed to estimate d′ and β (7) from a study examining physicians’ decisions about whether to transfer emergency room patients from regional hospitals to medical centers. As with all empirical studies, it has heterogeneity in both parameters. Experts vary in their discrimination ability and diagnoses vary in their difficulty. Experts vary in their decision rules and cases pose different tradeoffs. Statistical analyses are needed to estimate these parameters. Unless decision makers receive such estimates, they must rely on their own intuitive statistics for inferring the uncertainty of the science and the caution of the decision rule.
Fig. 1.
Signal detection theory in evaluating decisions to transfer ER patients. Emergency physicians provided written recommendations for the next steps in treating patients depicted in detailed profiles drawn from actual records. Higher values for the decisional threshold indicate more cautious decisions (β). Higher values for perceptual sensitivity indicate better discrimination ability (d′). American College of Surgeons–Committee on Trauma (8) guidelines were applied to the profiles to identify the appropriate decision. (Reproduced with permission from ref. 7.)
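For readers who want the mechanics, the sketch below shows how d′ and β can be recovered from hit and false-alarm rates under the standard equal-variance Gaussian signal detection model; the counts are hypothetical, not the estimates behind Fig. 1.

```python
# Minimal sketch: estimating d' and beta from a 2x2 table of recommendations vs.
# outcomes, assuming the standard equal-variance Gaussian signal detection model.
# The counts are hypothetical, not those analyzed in ref. 7.
from scipy.stats import norm

def d_prime_and_beta(hits, misses, false_alarms, correct_rejections):
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    z_hit, z_fa = norm.ppf(hit_rate), norm.ppf(fa_rate)
    d_prime = z_hit - z_fa                      # discrimination ability
    beta = norm.pdf(z_hit) / norm.pdf(z_fa)     # likelihood ratio at the decision criterion
    return d_prime, beta

# Hypothetical transfer decisions: 45 correct transfers, 5 missed,
# 30 unnecessary transfers, 70 correct non-transfers.
dp, beta = d_prime_and_beta(45, 5, 30, 70)
print(f"d' = {dp:.2f}, beta = {beta:.2f}")      # beta < 1 here: a liberal (quick-to-transfer) rule
```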
If decision makers guess incorrectly about d′, then they may place inappropriate confidence in the science, pay inappropriate attention to new information, and hold inappropriate resources in reserve. If they guess incorrectly about β, then they may take undesired risks, follow advice unsuited to their needs, and feel misled by the experts. Thus, although calls to action (or inaction) can demonstrate science’s usefulness, they can also generate distrust when those messages are misinterpreted or misaligned with decision makers’ values. Communications increase the chance of better outcomes when they make the uncertainties underlying recommendations transparent. Doing so requires both analytical research for assessing uncertainty and empirical research for conveying it.

Assessing Uncertainty.

The receiver operating characteristic (ROC) curve is the standard summary of the uncertainty underlying categorical messages (e.g., whether to shelter from a storm or get a medical treatment) (6). It shows the tradeoffs possible with a given ability to discriminate states of the world. The greater that ability is, the more favorable the decision options become. For example, as medical imaging improves, a decision maker who tolerates a 5% chance of missing a cancerous tumor will face less risk of exploratory surgery that finds nothing.
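A minimal sketch of that tradeoff follows: for a given d′, each setting of the decision criterion yields one false-alarm rate and one hit rate, and sweeping the criterion traces the curve. The d′ values and criteria are illustrative.

```python
# Minimal sketch: the hit-rate/false-alarm tradeoffs available at a given
# discrimination ability d', traced by sweeping the decision criterion.
import numpy as np
from scipy.stats import norm

def roc_points(d_prime, criteria):
    false_alarm = 1 - norm.cdf(criteria)            # act when only noise is present
    hit = 1 - norm.cdf(criteria - d_prime)          # act when the signal is present
    return false_alarm, hit

criteria = np.linspace(-2, 4, 7)                    # illustrative decision thresholds
for d in (0.5, 1.5, 2.5):                           # stronger science -> higher d' -> better tradeoffs
    fa, hit = roc_points(d, criteria)
    print(f"d'={d}:", ", ".join(f"({f:.2f}, {h:.2f})" for f, h in zip(fa, hit)))
```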
How precise estimates of d′ and β must be depends on the decision. Sometimes decision makers need only a rough idea of how much experts know and what values the decision rule embodies. Sometimes they need greater precision. Estimation is harder when they must infer both parameters, compared with when they know one and must estimate the other. Experts can eliminate uncertainty about values (β) by applying and communicating explicit decision rules, leaving just uncertainty about their knowledge (d′).
Estimating d′ or β means evaluating recommendations in the light of subsequent events. Sometimes, those events can be observed directly, for example, seeing how often floods (or tumors) follow warnings and reassurances. Doing so requires defining the events clearly (e.g., precisely what “tumor” means), lest performance be judged too harshly or leniently (“we did not mean to include benign tumors”). Sometimes, the evidence is indirect, as when diseases are inferred from biomarkers or tornadoes from damage patterns. The weaker the science supporting those inferences, the greater the uncertainty is about what scientists knew (d′) and valued (β). For example, the study in ref. 7 used American College of Surgeons guidelines (8) to define the event “patient requires transfer to a major medical center.” If those guidelines predict medical outcomes, using them simplifies the analysis (compared with evaluating cases individually). If not, then using them may bias the analysis, perhaps even imposing a standard that some physicians reject.
An alternative to observing experts’ beliefs and values is to elicit them (9). Like survey research, expert elicitation must balance decision makers’ desire for precise answers with respondents’ ability to translate their knowledge into the requested terms. Research into how people access their own beliefs can guide that process (10). For example, rather than asking people why they reached a conclusion, it is better to ask them to think aloud as they make the inference, in order to avoid having the answers contaminated by their intuitive theories of mental processes.

Conveying Uncertainty.

Effective communication requires clear, mutually understood terms. One threat arises when scientists use common words in uncommon ways. For example, probability-of-precipitation forecasts can mislead people who do not know that “precipitation” means at least 0.01 inches at the weather station (11). Survey researchers have documented the difficulty of conveying what experts mean by such seemingly simple terms as “room,” “safe sex,” and “unemployed” (12, 13). Clarity is an empirical question, answered by user testing. When scientists’ favored term fails to communicate, a more familiar one may be needed (e.g., using “given up trying to find a job” rather than “out of the labor market”). After Hurricane Sandy, the National Oceanic and Atmospheric Administration concluded that it had confused the public by downgrading the storm when it fell below formal hurricane status, despite being more powerful than many people could imagine (14).
Once the terms are clear, decision makers can consider d′ and β. As mentioned, given multiple observations, decision makers can estimate these parameters, subject to the biases afflicting such intuitive statistics (e.g., flawed memory, illusory correlation, wishful thinking, denial). Given a single observation, though, they must rely on their beliefs about experts’ knowledge and values when asking questions such as “Why is there so much apparent controversy in that field?” and “How much do those experts know, or care, about people like me, when giving advice?” (15, 16).
Serious miscommunication can arise when recipients do not realize that categorical advice always reflects both parameters. Physicians who fail to transfer seriously ill patients to medical centers might be faulted for their decision rule (e.g., wanting to keep the business at their facility) when the problem lies with their judgment (e.g., overlooking critical symptoms). Observers may overestimate the uncertainty of scientists whose professional norms preclude giving simple answers to complex questions (e.g., “Is this storm due to climate change?”). They may place unwarranted faith in pundits who face no sanctions for going far beyond the evidence. Observers familiar with the norms of science might misinterpret scientists who abandon their customary caution for advocacy or style themselves as public intellectuals (17).
As with all communications, the content of science-based messages should depend on what recipients know already. At times knowing the gist of β might allow inferring d′ (18). For example, “FDA approved” (Food and Drug Administration) means more to people who know that the agency’s β for approving a drug means “potentially suitable for some patients,” implying a range of acceptable tradeoffs. At other times, decision makers need more detail. For example, recommendations against routine mammography screening for younger women evoked public furor in 1997 and again in 2009 (19, 20). They might have fared better had they communicated more clearly the d′ for mammography (which has difficulty detecting treatable cancers in younger women) and the parties whose welfare its β considered (patients? physicians? insurers?) (21).

Decisions with Fixed Options: Which Is Best?

Categorical communications recommend choices. Other communications merely seek to inform choices, allowing recipients to apply their values by themselves. Information about scientific uncertainty can serve that end, for example, by telling patients how well new and old drugs are understood; by telling investors how predictable bonds or stocks are, given their use in complex financial products; or by telling climate change advocates what surprises to expect from future research, with campaigns focused on ocean acidification or sea-level rise.

Characterizing Uncertainty.

Sensitivity analysis is the common formalism for assessing how much people need to know about how much is known (22). It asks questions like, “Would any plausible value for expected mean global temperature change affect your preferred energy portfolio?” and “Would any prostate-specific antigen test result affect your prostatectomy decision?” If the answer is no, then that uncertainty is immaterial to the decision, whatever the value of having reduced it that far.
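In code, such a check can be a few lines: compute the preferred option at each plausible value of the uncertain quantity and see whether the choice ever changes. The options, payoffs, and plausible range below are hypothetical.

```python
# Minimal sketch of a one-way sensitivity analysis: does the preferred option
# change anywhere over the plausible range of an uncertain quantity?
# The options, payoffs, and range are hypothetical.
import numpy as np

def net_benefit(option, temp_rise_c):
    # Hypothetical payoffs: mitigation costs more up front but loses less as warming grows.
    if option == "mitigate":
        return 8.0 - 1.0 * temp_rise_c
    return 12.0 - 4.0 * temp_rise_c               # "wait"

plausible_range = np.linspace(0.5, 4.5, 41)       # plausible mean temperature changes (deg C)
choices = {"mitigate" if net_benefit("mitigate", t) > net_benefit("wait", t) else "wait"
           for t in plausible_range}

if len(choices) == 1:
    print(f"Choice is insensitive to this uncertainty: always '{choices.pop()}'")
else:
    print("Choice flips within the plausible range; this uncertainty is material")
```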
If the answer is yes, then one can elicit experts’ judgments of the uncertainties that matter to the decision. A standard representation of uncertainty is a probability distribution over possible parameter values (23). For example, Morgan and Keith elicited the judgments of 16 climate experts for questions such as globally averaged surface temperatures given a doubling of atmospheric CO2 (24). For some decisions (and decision makers), knowing an extreme fractile will suffice (e.g., “with even a 5% chance of a 3 °C increase, I support an aggressive carbon tax”). For others, such as investors in carbon permits, knowing the full probability distribution may have value.
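A minimal sketch of one way to turn elicited fractiles into a usable distribution, and then answer a decision-relevant tail question, follows; a normal distribution is assumed for simplicity, and the numbers are illustrative rather than Morgan and Keith's data.

```python
# Minimal sketch: fitting a distribution to elicited fractiles and answering a
# decision-relevant tail question. Numbers are hypothetical.
from scipy.stats import norm

q05, q50, q95 = 1.0, 2.5, 4.5                 # elicited 5th/50th/95th percentiles (deg C)
mu = q50                                      # for a normal, the median equals the mean
sigma = (q95 - q05) / (2 * norm.ppf(0.95))    # spread implied by the 5th-95th range

expert = norm(loc=mu, scale=sigma)
print(f"Implied P(warming >= 3 C): {expert.sf(3.0):.2f}")
print(f"Check on fit: 5th = {expert.ppf(0.05):.2f}, 95th = {expert.ppf(0.95):.2f}")
```

Asymmetric judgments would call for a different distributional family, but the logic is the same: the fitted distribution makes the expert's uncertainty usable for any fractile a decision maker cares about.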
Whether such probability distributions capture all useful uncertainty has been a topic of lively academic debate (25–28). Some decision theorists argue for the usefulness of higher-order uncertainties (e.g., ranges of probability distributions) when probabilities are not well known. Others contend that all beliefs should be reducible to a single distribution. One practical reason for assessing higher-order uncertainties is preparing decision makers for surprises. For example, one of Morgan and Keith's climate experts gave two probability distributions (for temperature change with a doubling of atmospheric CO2), depending on whether the North Atlantic thermohaline circulation collapses. That expert declined to assign a probability to that event, arguing that it was beyond current scientific knowledge, thereby precluding a weighted combination of the two conditional distributions. On other questions, some experts predicted that scientific uncertainty would increase over time, as research revealed unforeseen complications. Decision makers need to know whether uncertainty assessments include all factors or just those that scientists are comfortable considering.
A second practical reason for going beyond a single summary distribution is identifying opportunities to reduce uncertainty. Decision makers can then ask whether the benefits of that additional information outweigh the direct costs of collecting it and the opportunity costs of waiting for it. Those costs will depend on the source of uncertainty. When it arises from sampling variation, more observations (e.g., additional clinical trial patients) will provide greater precision, with readily calculated costs and benefits. When uncertainty arises from poor measurement, calculating the cost of reducing it might be straightforward (e.g., using a better test) or difficult (e.g., developing a better test). When uncertainty arises from unstudied issues, the costs and benefits of new knowledge are matters for expert judgment. Whatever the case, decision makers need to know how certain the science could be, as well as how certain it currently is.
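When the uncertainty comes from sampling variation, the calculation is straightforward, as in the sketch below: the standard error shrinks with the square root of the sample size, so the precision bought by each additional clinical-trial patient can be set against a (hypothetical) cost per patient.

```python
# Minimal sketch: the precision bought by additional sampling, set against a
# hypothetical per-patient cost. Assumes the standard error shrinks as 1/sqrt(n).
import math

sigma = 12.0              # assumed standard deviation of the outcome measure
cost_per_patient = 4000   # hypothetical direct cost of one more enrolled patient

for n in (50, 100, 200, 400, 800):
    half_width = 1.96 * sigma / math.sqrt(n)   # 95% CI half-width for the mean estimate
    print(f"n={n:4d}: CI half-width ~ {half_width:5.2f}, total cost ~ ${n * cost_per_patient:,}")
```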

Assessing Uncertainty.

Scientists routinely apply statistical methods to assess the variability in their data. Those calculations add uncertainty to inferences about the data through the assumptions that they make (e.g., normal distributions, independent errors). Additional uncertainty arises from how the data are treated before analysis. For example, political polls may include or delete incomplete surveys or ones evoking negative answers to screening questions (e.g., “Are you following the lieutenant governor’s race closely enough to answer questions about it?”). Methodological research can inform assessments of such uncertainties (29, 30). As elsewhere, their importance depends on the decision. Issues that matter for tight elections may be irrelevant for blowouts.
Statistical methods can also summarize the variability in the estimates produced by integrative models (e.g., simulations of climate, pandemics, or national economies). The assumptions that modelers make when creating and estimating their models add uncertainty to that already found in the data that they use. Scientific communities organized around models sometimes convene consensus-seeking bodies to characterize those uncertainties [e.g., the Intergovernmental Panel on Climate Change (IPCC), the Federal Reserve Bank, the Particle Data Group]. Needless misunderstandings may arise when they fail to communicate how they handle uncertainty.
Morgan and Keith's elicitation procedure (24) asked its experts to integrate uncertainty from all sources (22, 29). To that end, it included day-long elicitation sessions with individual experts. Each session began with detailed instructions, discussion of potential judgmental biases, and practice questions, familiarizing participants with the process. Its questions were formulated precisely enough to allow evaluating the judgments in terms of their consistency and accuracy. It used probes designed to help experts assess and express the limits to their knowledge.
A complementary approach asks experts to audit existing studies for their vulnerability to key sources of uncertainty. One such procedure is the Cochrane Collaboration’s risk-of-bias assessment for medical clinical trials (32). Fig. 2 adapts that procedure to assess the uncertainty in field trials of programs designed to reduce residential electricity consumption. It shows that most trials had a high risk of biases likely to overestimate the effectiveness and underestimate the uncertainty of the programs being evaluated (33). For example, without blinding, participants’ mere knowledge that they are being studied may affect their behavior. Although the possibility of such Hawthorne effects has long been known (34), their size is uncertain. A recent experiment reduced this uncertainty for electricity field trials, finding a 2.7% reduction in consumption during a month in which residents received weekly postcards saying that they were in a study. That Hawthorne effect was as large as the changes attributed to actual programs in reports on other trials (35).
Fig. 2.
Methodological flaws in field trials of interventions for reducing home electricity consumption. (Reproduced with permission from ref. 33.)
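Computationally, such an audit is a simple tabulation: each trial is rated on each bias domain, and the summary counts how often each domain poses a high risk. The trials and ratings below are hypothetical, not those behind Fig. 2.

```python
# Minimal sketch of a risk-of-bias audit summary (in the spirit of Fig. 2);
# the trials and ratings are hypothetical.
from collections import Counter

ratings = {  # trial -> {bias domain: "low" | "high" | "unclear"}
    "trial A": {"selection": "high",    "attrition": "low",     "performance": "high"},
    "trial B": {"selection": "unclear", "attrition": "high",    "performance": "high"},
    "trial C": {"selection": "low",     "attrition": "unclear", "performance": "high"},
}

for domain in sorted({d for r in ratings.values() for d in r}):
    counts = Counter(r[domain] for r in ratings.values())
    print(f"{domain:12s}: {dict(counts)}")
```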

Conveying Uncertainty.

An audit like that in Fig. 2 creates a profile of the uncertainties surrounding scientific results. Unless scientists provide such summaries, observers must infer them. The well-known heuristics-and-biases research program captures some important patterns in such intuitive statistics (36, 37). Although known for its demonstrations of bias, that research predicts that performance will vary by task. The availability heuristic can produce good estimates when people receive and recall a representative sample of examples (38). The anchoring-and-adjustment heuristic can be useful when an informative value (or anchor) is salient. However, when appearances are deceiving, the same heuristics can produce unwittingly biased judgments, leaving people confidently wrong.
Effective communication is especially important when uncertainty arises from poor data quality (e.g., small samples, weak measurement) (39). Even when people recognize those problems, they may be insufficiently sensitive, as seen in biases such as the base-rate fallacy, belief in the law of small numbers, and insufficiently regressive predictions (37). When uncertainties arise from limits to the science, decision makers must rely on the community of scientists to discover and share problems, so as to preserve the commons of trust that it enjoys (40).
The precision in the summaries that scientists produce for one another provides the foundation for communications that reduce others’ need to rely on intuition. As mentioned, meteorologists’ precise probability-of-precipitation forecasts are readily understood, once the meaning of “precipitation” is clear (41). It has also been found that most people can extract needed information from drug facts boxes with numerical summaries of risks and benefits and narrative summaries of threats to validity (e.g., a clinical trial’s size, duration, and inclusion criteria). With imprecise summaries, however, lay observers are left guessing. For example, there is wide variation in how laypeople interpret the expressions of uncertainty improvised by the IPCC, in hopes of helping nonscientists (42).

Decisions About Potential Options: What Is Possible?

Some decision makers are neither waiting for a signal to deploy a preselected option nor choosing from fixed options, but, rather, trying to create ones. Such people need to understand the science relevant to how their world works in order to devise ways to deal with it. Uncertainty is part of what they need to know. Greater uncertainty may prompt them to act sooner (to reduce it) or later (hoping that things become more predictable). When they choose to act, they may wish to create options with more certain outcomes in order to know better what they will get, or less certain ones in order to confuse rivals.

Characterizing Uncertainty.

Fig. 3 reflects one way to organize the science relevant to creating options, for decisions about drinking water safety (43). In this influence diagram (Fig. 3), the nodes are variables and the links are relationships (44–46); an arrow means that knowing the value of the variable at its tail should influence predictions of the variable at its head. Such models make predictions by simulation. Each model run samples a value for each variable, and then uses them to predict outcomes. The distribution of those predictions is the computed uncertainty about the outcomes.
Fig. 3.
Influence diagram showing the expertise needed for systematic assessment of the uncertainty in responses to drinking water contamination. (Reproduced with permission from ref. 43.)
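The structure of such a model can be written down directly: each node lists its parents and a function that samples its value given theirs, and a single run walks the nodes in order. The sketch below uses hypothetical variables and distributions, not those of the model in ref. 43.

```python
# Minimal sketch of an influence-diagram-style model: each node lists its parents
# and a function that samples its value given theirs. Variables and distributions
# are hypothetical stand-ins.
import random

nodes = {  # name -> (parent names, sampling function of parent values)
    "contamination":   ((), lambda: random.lognormvariate(0.0, 1.0)),        # oocysts per liter
    "detection_delay": ((), lambda: random.uniform(1, 7)),                    # days before notice
    "exposure":        (("contamination", "detection_delay"),
                        lambda c, d: c * d * random.uniform(0.5, 1.5)),
    "illnesses":       (("exposure",), lambda e: 0.02 * e),                   # toy dose-response
}

def run_once():
    values = {}
    for name, (parents, sample) in nodes.items():   # declared in topological order
        values[name] = sample(*(values[p] for p in parents))
    return values

print(run_once())   # one simulated outcome; repeated runs yield the uncertainty distribution
```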
Although consumers in developed countries may give drinking water safety little thought, they still must monitor their environment for decision points (e.g., “Is my tap water brownish just because of turbidity?” “Should I take that ‘boil water’ notice seriously?”). When such decisions arise, they must create and evaluate response options (e.g., home testing, bottled water). Officials, too, must create and evaluate options, both to remedy problems (e.g., stopping contamination) and to inform consumers. Their confidence in the science will help them determine when and where to look for better options.
Like scientific theories, models have uncertainties in both their variables and their relationships. For example, predicted storm surges at one location might be based on observations at another point along a coast with varying microclimates and offshore geomorphology. Those predictions might reflect historical relationships from periods with weaker storms and lower seas (47). Additional uncertainty arises when models omit variables or relationships, whether because they seem unimportant or because the model’s formalisms cannot accommodate them. For example, energy models often neglect social factors (e.g., how quickly people adopt new technologies that they oppose). It takes insight and humility to identify these uncertainties. It takes research to assess their impact.

Assessing Uncertainty.

Assessing uncertainty typically involves running a model with values sampled from probability distributions over the possibilities and seeing how sensitive its predictions are to those uncertainties. If the predicted outcomes are all unattractive, then adding factors may allow for the creation of additional, and better, options. For example, simulations using the model in Fig. 3 found that even the best communications could not reduce the health effects of cryptosporidium contamination because the pathogen was so difficult to detect. However, better outcomes were possible by expanding the model to include options for protecting vulnerable populations (e.g., providing bottled water to people with AIDS). Adding other factors could create options for developing countries with endemic cryptosporidiosis (48).
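A minimal version of that assessment samples the uncertain inputs many times, pushes them through a toy outcome model, and examines both the spread of the predictions and which input uncertainties they track most closely (here, via rank correlations). All distributions below are hypothetical.

```python
# Minimal sketch: propagate input uncertainty through a toy outcome model and see
# which input uncertainties the predictions are most sensitive to.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n = 10_000
contamination = rng.lognormal(mean=0.0, sigma=1.0, size=n)
detection_delay = rng.uniform(1, 7, size=n)
compliance = rng.beta(4, 2, size=n)              # fraction heeding a "boil water" notice

illnesses = contamination * detection_delay * (1 - 0.8 * compliance)   # toy model

lo, med, hi = np.percentile(illnesses, [5, 50, 95])
print(f"Predicted outcome: median {med:.1f}, 90% interval [{lo:.1f}, {hi:.1f}]")
for name, x in [("contamination", contamination),
                ("detection_delay", detection_delay),
                ("compliance", compliance)]:
    rho = spearmanr(x, illnesses)[0]             # rank correlation as a crude sensitivity index
    print(f"  sensitivity to {name}: {rho:+.2f}")
```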
Just as modelers must assess the uncertainty created by omitting factors from their models, so must scientists assess the uncertainty created by factors that their science typically ignores or takes for granted. Table 1 identifies four common properties of decision-making experiments that could affect the performance observed in them. Where everyday life has those properties, then the research results could be robust. Otherwise, the research might produce “orchids”—exotic, theoretically informative phenomena that are rarely observed in the wild, hence an uncertain foundation for predicting behavior there.
Table 1.
Four common properties of decision-making experiments and their potential effects on participants’ performance
i) The tasks are clearly described: can produce better decisions, if it removes the clutter of everyday life, or worse decisions, if it removes vital context, such as the choices that other people are making.
ii) The tasks have low stakes: can produce better decisions, if it reduces stress, or worse decisions, if it reduces motivation.
iii) The tasks are approved by university ethics committees: can produce better decisions, if it reduces participants’ worry about being deceived, or worse decisions, if it leads to artificiality.
iv) The tasks focus on researchers’ interests: can produce better decisions, if researchers seek decision makers’ secrets of success, or worse decisions, if researchers are committed to documenting biases.
Any discipline could summarize its generic uncertainties. For example, fields that use computational models could assess the uncertainty created by omitting hard-to-quantify factors. Fields that rely on qualitative analysis could assess the uncertainty from such expert judgment. The Numeral Unit Spread Assessment Pedigree (NUSAP) protocol offers a general approach, characterizing any field by its pedigree, giving higher scores to fields that use direct outcome measures, rather than surrogates; have experimental evidence, rather than just statistical relationships; and use widely accepted methods, rather than investigator-specific ones (49, 50).
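One way to make such a pedigree concrete is an ordinal scorecard, sketched below in the loose spirit of NUSAP; the criteria, the 0–4 scale, and the example ratings are illustrative rather than the published scheme.

```python
# Minimal sketch of a pedigree scorecard, in the loose spirit of NUSAP;
# criteria, 0-4 scale, and example ratings are illustrative only.
CRITERIA = ("direct outcome measures", "experimental evidence", "accepted methods", "validation")

def pedigree_score(ratings):
    # Average of 0-4 ratings, one per criterion; higher = stronger pedigree.
    assert set(ratings) == set(CRITERIA)
    return sum(ratings.values()) / len(ratings)

example = {"direct outcome measures": 3, "experimental evidence": 2,
           "accepted methods": 4, "validation": 2}
print(f"Pedigree score: {pedigree_score(example):.1f} out of 4")
```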
When research communities exclude relevant disciplines, uncertainty may be seriously underestimated. For example, the landmark Reactor Safety Study (51) assessed physical threats to nuclear power plants but neglected human factors, such as design flaws, lax regulatory regimes, unfriendly user interfaces, and punishing work schedules—all seen at Three Mile Island, Chernobyl, Fukushima, Browns Ferry, and other incidents. Conversely, aviation has reduced uncertainty by addressing human factor problems in instrument design (52) and cockpit team dynamics (53). Decision makers need to know which factors a field neglects and what uncertainty that creates.

Conveying Uncertainty.

To create options, people need to know the science about how things work. A common formulation for that knowledge is having a mental model of the domain (54–56). Mental models have been studied for topics as diverse as logical relationships, spatial patterns, mechanical systems, human physiology, and risky technologies. The research typically contrasts lay models with expert ones, such as the rules of syllogistic reasoning, schematics of biological systems, maps, and analyses like Fig. 3. The contrast shows where people need help and which facts go without saying. Lay beliefs can go astray in each element of a model. People can misunderstand or ignore factors or relationships; they can know current science but not the uncertainty surrounding it; and they can integrate the pieces poorly when the processes are unintuitive (e.g., dynamic, nonlinear) or demand too much mental computation.
Communications designed to improve mental models target specific problems, in order to avoid overwhelming recipients by saying too much or boring (and even insulting) them by repeating things that they know already. Messages addressing specific uncertainties include “Future storms are so hard to predict that you should be ready for anything,” “Experience with these treatments is so limited that you might wait until we know more,” and “Today’s financial markets move too fast to rely on previously successful trading strategies.”
When the processes are unintuitive, people may need explanations, such as “if you’ve never experienced a rip tide, you can’t imagine its power” (57) or “young women’s breast tissue is too thick for mammography to be useful” (58). If the processes require difficult mental arithmetic, then the communication may need to “run the numbers” (e.g., showing how small risks accumulate over repeated exposures) (59). If the kind of science is unfamiliar, then science communication may require some science education (e.g., how models work). If the discourse of science is bewildering, then communications may need to explain the role of controversy in revealing and resolving uncertainties, and how passionate theoretical disagreements might have limited practical importance. If scientists tend to overstate their results, then a warning may be needed. As elsewhere, the test of success is empirical: Do people grasp the science well enough to create and evaluate options (60)?

Eliciting Uncertainty

Science communication is driven by what audiences need to know, not by what scientists want to say. Thus, it is unlike communicating with students, who must accept curricular definitions of what matters. However, relevance poses challenges. Mastering the uncertainties of many decisions forces laypeople to become experts in everything. Addressing the needs of many decision makers forces scientists to become experts in everyone.
One way to reduce that load is by adopting standard reporting formats, which experts can learn to create and decision makers can learn to use. Subjective probability distributions are one such format, with seemingly growing acceptance. Signal detection theory could be another, were the research conducted to make its quantitative estimates (d′, β) as clear as its concepts (discrimination ability, caution). Influence diagrams offer a representation with the precision needed for analyzing a problem and a graphic format for conveying its gist (18).
Table 2 offers a standard protocol for eliciting and reporting expert assessments of scientific uncertainty. It is designed to require little additional effort from scientists, by making explicit concerns that are already on their minds, in terms that should be familiar.
Table 2.
A protocol for summarizing scientific uncertainty, illustrated in the context of medical clinical trials
Step i. Identify key outcomes for decision makers (e.g., stroke) and how to measure them (e.g., annual probability).
Step ii. Summarize variability.
Step iii. Summarize internal validity.
 Selection bias: Do the initial groups differ from randomly assigned ones? Were the groups drawn from the same population, over the same time periods, and with the same inclusion and exclusion criteria?
 Attrition bias: Do the final groups differ from the initial ones? Did the groups differ as a result of which participants dropped out (e.g., because the treatment did not seem to be working or their lives were too disorderly to continue) or were excluded from analyses (e.g., for incomplete data or seemingly anomalous responses)?
 Administration: Was the study conducted as intended? Were instructions followed in administering the treatment and analyzing the results?
 Performance bias: Does the manipulation have unintended effects? Were participants affected by knowing that they were in the study (or in a study), perhaps trying to satisfy researchers’ (real or perceived) expectations?
Step iv. Summarize external validity.
 Population bias: Do treatment groups differ from the general population? Might they be relatively sensitive to positive effects or to unintended side effects?
 Intervention bias: Are treatments administered differently in different conditions? Might they be applied less consistently, intensively, or obviously?
 Control group bias: Do untreated groups differ in other ways? Might they receive more (or less) of other treatments, with more (or less) supervision?
 Scenario bias: Do other conditions differ from those of the study? Might other factors diminish (or enhance) the treatment’s effect? Might the world have changed?
Step v. Summarize the strength of the basic science.
 Directness: How well do a field’s measures capture key outcomes? The strongest sciences measure outcomes directly rather than relying on proxy measures (e.g., biomarkers that appear related to health states).
 Empirical basis: How strong are the best available estimates? The strongest sciences base their theories on large, well-controlled experiments rather than on datasets that are small or collected under variable conditions (e.g., dose–response relationships derived from epidemiological data).
 Methodological rigor: How strong are the best methods? The strongest sciences have methods with well-understood strengths and weaknesses, and extensive experience in their application.
 Validation: How well are theoretical results confirmed? The strongest sciences have foundations (theories, hypotheses, relationships) that are strongly confirmed by evidence from multiple, independent sources.
Step vi. Summarize uncertainty (e.g., as a 95% credible interval), with statements of the form, “Considering the variability of the evidence (step ii) and my assessments of the internal validity of the studies that collected it (step iii), their relevance to the decision-making domain (step iv), and the strength of the underlying science (step v), I am 95% certain that the true value of the critical outcome (step i) is between Y and Z.”
Steps iii and iv are based on CONSORT criteria for evaluating medical clinical trials (31, 62). Step v is based on the NUSAP criteria for evaluating the strength of sciences (49). Ref. 9 summarizes research regarding step vi.
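For communicators building reporting tools, the protocol maps naturally onto a structured record, sketched below with hypothetical field names and entries.

```python
# Minimal sketch: the Table 2 protocol as a structured record, with hypothetical entries.
from dataclasses import dataclass, field

@dataclass
class UncertaintyReport:
    outcome: str                       # step i: key outcome and its measure
    variability: str                   # step ii: e.g., "standard error 0.4 percentage points"
    internal_validity: dict = field(default_factory=dict)    # step iii: bias -> assessment
    external_validity: dict = field(default_factory=dict)    # step iv: bias -> assessment
    strength_of_science: dict = field(default_factory=dict)  # step v: criterion -> assessment
    credible_interval: tuple = (None, None)                  # step vi: (Y, Z) at 95%

    def summary(self):
        y, z = self.credible_interval
        return (f"Considering the variability of the evidence and my assessments of "
                f"validity and the underlying science, I am 95% certain that the true "
                f"value of '{self.outcome}' is between {y} and {z}.")

report = UncertaintyReport(
    outcome="annual probability of stroke",
    variability="standard error 0.4 percentage points",
    internal_validity={"selection bias": "low", "attrition bias": "unclear"},
    external_validity={"population bias": "trial patients younger than typical patients"},
    strength_of_science={"directness": "direct outcome measure"},
    credible_interval=(0.01, 0.04),
)
print(report.summary())
```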

Variability.

All measurement has variability, arising from variations in procedure (e.g., how long a thermometer is left in), phenomena (e.g., how body temperature changes diurnally), and measured individuals (e.g., how much they run “hot” or “cold”). Scientists routinely estimate variability. Routinely sharing those estimates should cost little, while freeing decision makers from having to guess them. Numerical estimates are needed because verbal quantifiers (e.g., “stable measurement,” “widely varying”) communicate poorly to people unfamiliar with scientists’ conventional usage of the terms (61–63). The ±X% format used in survey research, and required by its professional organizations (64), is one example.
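The survey margin of error is one such numerical estimate; the sketch below shows the usual approximation for a proportion at 95% confidence, assuming simple random sampling.

```python
# Minimal sketch: the usual +/- X% survey margin of error for a proportion,
# at 95% confidence, assuming simple random sampling.
import math

def margin_of_error(p, n, z=1.96):
    return z * math.sqrt(p * (1 - p) / n)

p_hat, n = 0.52, 1000                    # hypothetical poll: 52% support, 1,000 respondents
print(f"Estimate: {100*p_hat:.0f}% +/- {100*margin_of_error(p_hat, n):.1f} points")
```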

Internal Validity.

Scientists ask common questions when evaluating studies. The protocol reports their assessments for the uncertainties that have been found to pose the greatest threats to the validity of medical clinical trials (32, 65–67). Although expressed in terms of experimental research, these threats have analogs in observational studies. For example, selection bias arises when individuals are not randomly assigned to experimental groups. In observational studies, the equivalent is ignoring confounds (e.g., the effects of cohort differences on correlations between age and behavior).

External Validity.

Decision makers need to know how confidently they can extrapolate results from the contexts that scientists have studied to the ones that interest them. Sometimes scientists have evidence that can limit the uncertainty (e.g., “although the study involved women, gender differences are rare with such behaviors”). Sometimes, they can say little (e.g., “no one has ever studied men”). The protocol in Table 2 proposes reporting on four such threats.

Strength of Science.

The protocol adopts the NUSAP framework (49) for four aspects of a field’s pedigree. The best fields have strong theoretical foundations supported by robust experimental methods and converging results. The weakest ones have poorly understood observational data. The weaker a field, the greater its uncertainty.

Credible Intervals.

Quantitative summaries of uncertainty take the form, “I am XX% certain that the true value is between Y and Z.” A credible interval is the same size as the confidence interval when scientists consider only observed variability. It is wider when scientists question those observations or the strength of the underlying science. It is narrower when scientists have enough confidence in the basic science to discount anomalous observations. It can be higher or lower if scientists perceive bias (e.g., clinical trials designed to get the maximum effect; climate change estimates made conservatively).
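One stylized way to see that relation: start from the confidence interval implied by observed variability alone, widen it by judged factors reflecting doubts about validity and the underlying science, and shift it for judged bias. The adjustment factors below are hypothetical expert judgments, not a standard formula.

```python
# Stylized sketch: a credible interval that starts from the sampling-based
# confidence interval and is widened/shifted by judged factors. The adjustments
# are hypothetical expert judgments, not a standard formula.
import math

mean, sd, n = 2.0, 4.0, 100               # observed effect, spread, sample size
half_width = 1.96 * sd / math.sqrt(n)     # 95% confidence interval from variability alone

validity_inflation = 1.5                  # judged extra uncertainty from internal/external validity
science_inflation = 1.2                   # judged extra uncertainty from weak basic science
bias_shift = -0.3                         # judged bias (e.g., trial designed for maximum effect)

center = mean + bias_shift
credible_half_width = half_width * validity_inflation * science_inflation
print(f"95% confidence interval: [{mean - half_width:.2f}, {mean + half_width:.2f}]")
print(f"95% credible interval:   [{center - credible_half_width:.2f}, {center + credible_half_width:.2f}]")
```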
Unless threats to the internal validity, external validity, and pedigree of scientific results are negligible, credible intervals should differ in size from confidence intervals. Nonetheless, many scientists are uncomfortable providing them. Table 3 lists four common reasons and potential responses (68). Decision makers who receive a full report (Table 2, steps i–v) might be able to infer the credible interval that it implies. Whether they can is an empirical question.
Table 3.
Frequently asked questions addressing four concerns of scientists reluctant to express their uncertainty in credible-interval form
Concern 1: If I give credible intervals, people will misinterpret them, inferring greater precision than I intended.
 Response: Behavioral research has found that most people (i) like receiving explicit quantitative expressions of uncertainty (such as credible intervals), (ii) can interpret them well enough to extract their main message, and (iii) misinterpret verbal expressions of uncertainty (e.g., “good” evidence, “rare” side effect). For audiences that receive the reports created with the protocol (Table 2), understanding should be greater if they receive credible intervals than if they have to infer them (63).
Concern 2: People cannot use probabilities.
 Response: Behavioral research has found that laypeople can often provide reasonably consistent probability judgments if asked clear questions and extract needed information if provided with well-designed displays (41, 60, 74). Whether they do so well enough to satisfy their decision-making needs is an empirical question, which should be answered with evidence rather than speculation.
Concern 3: My credible intervals will be used unfairly in performance evaluations.
 Response: Such judgments can protect experts from unfair evaluations, unjustly accusing them of having conveyed too much or too little confidence, especially when supported by the rationale for those judgments. The protocol provides such protection, if the experts’ management stands behind it.
Concern 4: People do not need such judgments.
 Response: Decision makers must act with some degree of uncertainty. Not helping them means preferring the risk of having them guess incorrectly over the risk of expressing oneself poorly.

Conclusion

Communicating uncertainty requires identifying the facts relevant to recipients’ decisions, characterizing the relevant uncertainties, assessing their magnitude, drafting possible messages, and evaluating their success. Performing these tasks demands commitment from scientists and from their institutions. It also demands resources for the direct costs of analysis, elicitation, and message development, and for the opportunity costs of having scientists spend time communicating uncertainty rather than reducing it (through their research). Making this investment means treating communication as part of scientists’ professional responsibility and rewarding them for strengthening the public goodwill that science needs (40).
This investment would be more attractive if it advanced science, as well as making it more useful. A standard reporting protocol might help do that. Although assessing uncertainty is at the heart of all science, such protocols are not (67, 68). Indeed, the Consolidated Standards of Reporting Trials (CONSORT) scheme, reflected in Table 2 and Fig. 2, was a response to the inconsistent reporting of medical clinical trials. The FDA has adopted a standard format for summarizing the evidence and uncertainties underlying its drug approval decisions (70). The drug facts box uses a similar strategy (41). The open-data movement’s reporting standards seek to record scientific uncertainty and reduce the file-drawer problem, whereby scientists report results that affirm their hypotheses while finding reasons to discard inconsistent ones. Such practices add inestimable uncertainty to published science by obscuring its capitalization on chance (71–73).
For scientists, uncertainty obscures theoretical questions. For people who rely on science, uncertainty obscures choices. Those awaiting a signal for action need to know whether the evidence is certain enough to pass the threshold defined by their decision rule. Those choosing among fixed options need to know how far to trust predictions of valued outcomes. Those creating options need to know how well the processes shaping their outcomes are understood.
Table 2 offers a protocol for summarizing those uncertainties. For scientists, it should require little additional work, merely asking them to report, in standard form, judgments that they make already. For decision makers, it should make uncertainties easier to grasp by putting them in a common format. For communicators, it should allow economies of scope in creating ways to address recurrent issues. The result should be better science and better decisions.

Acknowledgments

Preparation of this manuscript was supported by the National Science Foundation (SES-0949710).

References

1
B Fischhoff, J Kadvany Risk: A Very Short Introduction (Oxford Univ Press, Oxford, 2011).
2
S Lichtenstein, P Slovic, eds, The Construction of Preferences (Cambridge Univ Press, New York, 2006).
3
D von Winterfeldt, Bridging the gap between science and decision making. Proc Natl Acad Sci USA 110, 14055–14061 (2013).
4
J Baron, JC Hershey, Outcome bias in decision evaluation. J Pers Soc Psychol 54, 569–579 (1988).
5
B Fischhoff, Hindsight ≠ foresight: The effect of outcome knowledge on judgment under uncertainty. J Exp Psychol Hum Percept Perform 1, 288–299 (1975).
6
DM Green, JA Swets Signal Detection Theory and Psychophysics (Wiley, New York, 1966).
7
D Mohan, et al., Sources of non-compliance with clinical practice guidelines in trauma triage: A decision science study. Implement Sci 7, 103 (2012).
8
American College of Surgeons, Resources for Optimal Care of the Injured Patient (ACS, Chicago, 2006).
9
A O’Hagan, et al. Uncertain Judgements: Eliciting Expert Probabilities (Wiley, Chichester, UK, 2006).
10
A Ericsson, HA Simon Verbal Protocols as Data (MIT Press, Cambridge, MA, 1993).
11
AH Murphy, S Lichtenstein, B Fischhoff, RL Winkler, Misinterpretations of precipitation probability forecasts. Bull Am Meteorol Soc 61, 695–701 (1980).
12
National Research Council, Survey Measure of Subjective Phenomena (National Academy Press, Washington, 1982).
13
S Macintyre, P West, ‘What does the phrase “safer sex” mean to you?’ Understanding among Glaswegian 18-year-olds in 1990. AIDS 7, 121–125 (1993).
14
National Oceanic and Atmospheric Administration (2013) Hurricane/Post-Tropical Cyclone Sandy, October 22–29, 2012 (NOAA, Washington).
15
D Kahneman Thinking, Fast and Slow (Farrar, Straus and Giroux, New York, 2011).
16
B Fischhoff, Giving advice. Decision theory perspectives on sexual assault. Am Psychol 47, 577–588 (1992).
17
B Fischhoff, Nonpersuasive communication about matters of greatest urgency: Climate change. Environ Sci Technol 41, 7204–7208 (2007).
18
VF Reyna, A new intuitionism: Meaning, memory, and development in Fuzzy-Trace Theory. Judgm Decis Mak 7, 332–339 (2012).
19
National Institutes of Health, Breast cancer screening for women ages 40–49. Consensus Statement Online, January 21–23, 1997; 15, 1–35 (1997).
20
US Preventive Services Task Force, Screening for breast cancer: US Preventive Services Task Force recommendation statement. Ann Intern Med 151, 716–726 (2009).
21
S Woloshin, et al., Women’s understanding of the mammography screening debate. Arch Intern Med 160, 1434–1440 (2000).
22
H Raiffa Decision Analysis (Addison-Wesley, Reading, MA, 1968).
23
C Howson, P Urbach Scientific Reasoning: The Bayesian Approach (Open Court, Chicago, 1989).
24
MG Morgan, DW Keith, Subjective judgments by climate experts. Environ Sci Technol 29, 468A–476A (1995).
25
P Gärdenfors, N Sahlin, Unreliable probabilities, risk taking, and decision making. Synthese 53, 361–386 (1982).
26
W Edwards, H Lindman, J Savage, Bayesian statistical inference for psychological research. Psychol Rev 70, 193–242 (1963).
27
A Gelman, CR Shalizi, Philosophy and the practice of Bayesian statistics. Br J Math Stat Psychol 66, 8–38 (2013).
28
T Seidenfeld, Why I am not an objective Bayesian: Some reflections prompted by Rosenkrantz. Theory Decis 11, 413–440 (1979).
29
JS Armstrong Persuasive Advertising (Palgrave Macmillan, New York, 2010).
30
EC Poulton Behavioral Decision Making (Lawrence Erlbaum, Hillsdale, NJ, 1994).
31
Morgan MG (2014) Use (and abuse) of expert elicitation in support of decision making for public policy. Proc Natl Acad Sci USA 111(20):7176–7184.
32
Higgins JPT, Altman DG, Sterne JAC, eds (2011) Assessing risk of bias in included studies. Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0 (Updated March 2011), The Cochrane Collaboration 2011, eds Higgins JPT, Green S, Chap 8. Available at www.cochrane-handbook.org. Accessed May 4, 2014.
33
AL Davis, T Krishnamurti, B Fischhoff, W Bruine de Bruin, Setting a standard for electricity pilot studies. Energy Policy 62, 401–409 (2013).
34
FJ Roethlisberger, WD Dickson Management and the Worker (Harvard Univ Press, Cambridge, MA, 1939).
35
D Schwartz, B Fischhoff, T Krishnamurti, F Sowell, The Hawthorne effect and energy awareness. Proc Natl Acad Sci USA 110, 15242–15246 (2013).
36
D Kahneman, P Slovic, A Tversky, eds, Judgment Under Uncertainty: Heuristics and Biases (Cambridge Univ Press, New York, 1982).
37
A Tversky, D Kahneman, Judgment under uncertainty: Heuristics and biases. Science 185, 1124–1131 (1974).
38
A Tversky, D Kahneman, Availability: A heuristic for judging frequency and probability. Cognit Psychol 5, 207–232 (1973).
39
D Kahneman, A Tversky, On the psychology of prediction. Psychol Rev 80, 237–251 (1973).
40
Fiske ST, Dupree C (2014) Gaining trust as well as respect in communicating to motivated audiences about science topics. Proc Natl Acad Sci USA 111:13593–13597.
41
LM Schwartz, S Woloshin, The Drug Facts Box: Improving the communication of prescription drug information. Proc Natl Acad Sci USA 110, 14069–14074 (2013).
42
DV Budescu, SB Broomell, HH Por, Improving communication of uncertainty in the reports of the intergovernmental panel on climate change. Psychol Sci 20, 299–308 (2009).
43
EA Casman, B Fischhoff, C Palmgren, MJ Small, F Wu, An integrated risk model of a drinking-water-borne cryptosporidiosis outbreak. Risk Anal 20, 495–511 (2000).
44
WJ Burns, RT Clemen, Covariance structure models and influence diagrams. Manage Sci 39, 816–834 (1993).
45
B Fischhoff, W Bruine de Bruin, U Guvenc, D Caruso, L Brilliant, Analyzing disaster risks and plans: An avian flu example. J Risk Uncertain 33, 133–151 (2006).
46
RA Howard, Knowledge maps. Manage Sci 35, 903–922 (1989).
47
Wong-Parodi G, Strauss BH (2014) Team science for science communication. Proc Natl Acad Sci USA 111:13658–13663.
48
E Casman, et al., Climate change and cryptosporidiosis: A qualitative analysis. Clim Change 50, 219–249 (2001).
49
SO Funtowicz, J Ravetz Uncertainty and Quality in Science for Policy (Kluwer, London, 1990).
50
G Bammer Uncertainty and Risk, ed M Smithson (Earthscan, London, 2008).
51
Nuclear Regulatory Commission, Reactor Safety Study, WASH-1400 (NRC, Washington, 1974).
52
CD Wickens, JS McCarley Applied Attention Theory (CRC Press, Boca Raton, FL, 2008).
53
E Salas, CS Burke, CA Bowers, KA Wilson, Team training in the skies: Does crew resource management (CRM) training work? Hum Factors 43, 641–674 (2001).
54
W Bruine de Bruin, A Bostrom, Assessing what to address in science communication. Proc Natl Acad Sci USA 110, 14062–14068 (2013).
55
PN Johnson-Laird How We Reason (Oxford Univ Press, New York, 2006).
56
MG Morgan, B Fischhoff, A Bostrom, C Atman Risk Communication: The Mental Models Approach (Cambridge Univ Press, New York, 2001).
57
GD Webster, D Agdas, FJ Masters, CL Cook, AN Gesselman, Prior storm experience moderates water surge perception and risk. PLoS ONE 8, e62477 (2013).
58
E Silverman, et al., Women’s views on breast cancer risk and screening mammography: A qualitative interview study. Med Decis Making 21, 231–240 (2001).
59
J Cohen, EI Chesnick, D Haran, Evaluation of compound probabilities in sequential choice. Nature 232, 414–416 (1971).
60
B Fischhoff, The sciences of science communication. Proc Natl Acad Sci USA 110, 14033–14039 (2013).
61
J Cohen, EJ Dearnley, CEM Hansel, A quantitative study of meaning. Br J Educ Psychol 28, 141–148 (1958).
62
S Fillenbaum, A Rapoport Structures in the Subjective Lexicon (Academic, New York, 1971).
63
I Erev, BL Cohen, Verbal versus numerical probabilities: Efficiency, biases, and the preference paradox. Organ Behav Hum Decis Process 45, 1–18 (1990).
64
American Association of Public Opinion Research (2010) Standards and Ethics. Available at www.aapor.org/Standards_and_Ethics/5102.htm. Accessed May 4, 2014.
65
WA Mahon, EE Daniel, A method for the assessment of reports of drug trials. Can Med Assoc J 90, 565–569 (1964).
66
F Mosteller, JP Gilbert, B McPeek, Reporting standards and research strategies for controlled trials: Agenda for the editor. Control Clin Trials 1, 37–58 (1980).
67
D Moher, et al., CONSORT 2010 explanation and elaboration: Updated guidelines for reporting parallel group randomised trials. BMJ 340, c869 (2010).
68
B Fischhoff, Communicating uncertainty: Fulfilling the duty to inform. Issues Sci Technol 28, 63–70 (2012).
69
PBL Netherlands Environmental Assessment Agency, Guidance for Uncertainty Assessment, 2nd Ed (PBL, Amsterdam, 2012).
70
Food and Drug Administration, Structured Approach to Benefit-Risk Assessment for Drug Regulatory Decision Making (FDA, Washington, 2013).
71
J Cohen, The statistical power of abnormal-social psychological research: A review. J Abnorm Soc Psychol 65, 145–153 (1962).
72
B Nosek, Y Bar-Anan, Scientific utopia: Opening scientific communication. Psych Inquiry 23, 217–243 (2012).
73
AL Davis, B Fischhoff, Communicating uncertain experimental evidence. J Exp Psychol Learn Mem Cogn 40, 261–274 (2014).
74
B Fischhoff, N Brewer, JS Downs, eds, Communicating Risks and Benefits: An Evidence-Based User's Guide (Food and Drug Administration, Washington, DC, 2011).

Submission history

Published online: September 15, 2014
Published in issue: September 16, 2014

Keywords

science communication, expert judgment, expert elicitation, risk, mental models

Notes

This paper results from the Arthur M. Sackler Colloquium of the National Academy of Sciences, “The Science of Science Communication II,” held September 23–25, 2013, at the National Academy of Sciences in Washington, DC. The complete program and video recordings of most presentations are available on the NAS website at www.nasonline.org/science-communication-II.
This article is a PNAS Direct Submission. D.A.S. is a guest editor invited by the Editorial Board.

Authors

Affiliations

Baruch Fischhoff1 [email protected]
Departments of Engineering and Public Policy and Social and Decision Sciences, Carnegie Mellon University, Pittsburgh, PA 15213-3890
Alex L. Davis
Department of Engineering and Public Policy, Carnegie Mellon University, Pittsburgh, PA 15213-3890

Notes

1
To whom correspondence should be addressed. Email: [email protected].
Author contributions: B.F. and A.L.D. wrote the paper.

Competing Interests

The authors declare no conflict of interest.
