Years of good life is a well-being indicator designed to serve research on sustainability

Significance
Attempts at comprehensive quantitative assessments of sustainable development can focus on either determinants or constituents of long-term human well-being. While much research on determinants has relied on economic concepts of capital and inclusive wealth, here we focus on the constituents of well-being using a demographic approach. We construct a tailor-made metric based on life expectancy and indicators of objective and subjective well-being. The future trend in this metric has the potential to serve as a sustainability criterion and marks a crucial step in the endeavor to comprehensively assess sustainable development. At this stage, it is only applied to observed past and current conditions. To address sustainability, it will be combined with scenarios addressing future changes including feedback from environmental change.


Six desiderata for defining a wellbeing indicator that can potentially be used as a sustainability criterion
The field of wellbeing indicators is currently mushrooming. On the one hand, this reflects a deep dissatisfaction with conventional indicators of wellbeing that fail at "Measuring Tomorrow" (1) in the sense that they do not account for sustainability and resilience within social and natural systems and have therefore been described as "broken compasses for policy" (2). On the other hand, there is an evident need for quantitative indicators to help us assess whether developments are moving in the desired direction and to compare and benchmark such developments across populations (an overview and comparison of alternative wellbeing indicators currently available follows in the next section of this Supplementary Material). As Donella Meadows put it: "If we can't define what our ultimate ends are, how can we know whether we are approaching them?" (3). The official UN-led statistical process assessing and evaluating the implementation of the SDGs, for example, has identified 230 indicators covering the 169 targets and 17 broader goals that should be estimated and compared across all national populations. But among such a flood of indicators it is difficult to see the big picture, in particular when their relative importance, potential synergies and trade-offs remain disputed (4).
An alternative strategy is to aim at one composite metric, a wellbeing indicator which incorporates a number of key constituents of wellbeing. Here, at least four possible approaches in terms of combining the different constituents can be distinguished:
i. One can leave the weighting of different aspects of the indicator to the users, as is done e.g. by the OECD Better Life Index (5). While seemingly user-friendly, this approach leads to noncomparable values depending on individual tastes and preferences (6).
ii. One can give fixed weights to the different dimensions within the indicator, thus already implicitly making a choice about the substitutability among components, as exemplified by the UNDP Human Development Index (HDI). This approach suffers from the implicit assumption of problematic trade-offs between sub-indices (7).
iii. One can apply a data-driven approach to select the relative weights of different dimensions within the indicator based on e.g. the frequency with which a specific type of hardship occurs within a population (8) or the quality of the data available for each of the dimensions to be aggregated (9). While this approach seems to save researchers the trouble of applying explicit normative judgements, the lack of theoretical reasoning, the implicit judgement about the choice of indicators and the serendipity of data quality in the chosen indicators make this approach unconvincing (10).
iv. Finally, one can develop a fully integrated indicator dominated by one metric (life expectancy in the case of YoGL) plus additional conditioning factors that must all be above a minimum level. This is the case of YoGL, where the conditioning factors are reduced to one: the Boolean conjunction of the constituent dimensions, with the result that someone's years of life will contribute to the indicator only if the minimum standards are met simultaneously in all (four) dimensions. This way, YoGL does not apply a standard weighting structure to its constituents.
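The difference between approach ii (fixed weights) and approach iv (Boolean conjunction of minimum standards) can be illustrated with a minimal sketch. The dimension names, scores, weights and thresholds below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
# Hypothetical illustration: all scores, weights and minimums are invented.

def weighted_score(scores, weights):
    """Compensatory aggregation: a high score in one dimension can offset a low one."""
    return sum(scores[d] * weights[d] for d in scores)

def meets_all_minimums(scores, minimums):
    """Non-compensatory aggregation: every dimension must reach its minimum."""
    return all(scores[d] >= minimums[d] for d in minimums)

person = {
    "income": 0.9,
    "cognition": 0.9,
    "physical_health": 0.1,   # well below the assumed minimum
    "life_satisfaction": 0.9,
}
weights = {d: 0.25 for d in person}    # equal fixed weights (assumption)
minimums = {d: 0.5 for d in person}    # common threshold (assumption)

average = weighted_score(person, weights)          # stays high despite poor health
counts_as_good = meets_all_minimums(person, minimums)  # fails on one dimension
print(average, counts_as_good)
```

Under fixed weights the low physical health score is averaged away, whereas under the conjunction the person-year does not count as a good year.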
As a first step towards operationalizing this approach, following the literature we specify six key desiderata that the indicator should meet in order to serve as a wellbeing indicator whose trend over time can be used as a sustainability criterion.
(1) It should reflect widely shared values in terms of ultimate ends.
The use of a metric as a wellbeing indicator only makes sense insofar as there is near-universal agreement that it reflects a highly desirable target (3). While extremist views may contradict and individual preferences may vary to some degree, the aspiration here is to capture the single most important ultimate end that broader groups of people with very different orientations, values and cultural backgrounds would be ready to subscribe to.
Survival and the avoidance of unnecessary premature mortality, either of ourselves or people we care about, is a prime candidate for such a universally shared goal. "Being able to live to the end of a human life of normal length; not dying prematurely, or before one's life is so reduced as to be not worth living" is ranked first in Martha Nussbaum's list of "central human functional capabilities" (11). This includes altruistic behaviors that seem to contradict the universality of survival as a goal, such as a mother's willingness to sacrifice her own life to save the life of her child or a resistance fighter's determination to risk his or her own life in defense of other lives or a group's freedom from oppression. These people contribute to life expectancy and the subjective wellbeing of the group (12) under the premise expressed in the second part of Nussbaum's statement of mere survival not being considered enough for life to be called "worth living". Rather, for most people minimum standards in terms of quality of life (QOL) need to be met.

(2) It should be based on bottom-up information that can be flexibly aggregated to sub-populations.
An often-voiced criticism against composite wellbeing indicators is that they are not based on individuals and thus lose important information on correlations at the individual level in the aggregation process (13). Moreover, there is a growing recognition, not only in sustainability science, of the importance of considering heterogeneity at the sub-national level and developing local or social group-specific indicators of sustainability (14). The use of national level indicators is also problematic under the long-term perspective of sustainable development because nations come and go and may change their boundaries.
As a second requirement, therefore, it should be possible to specify the indicator "bottom-up", i.e. based on characteristics that can be measured (or at least estimated quantitatively) for individuals and then aggregated to sub-populations. Such a focus on sub-populations, rather than nations, is essential for answering many of the important questions in sustainability science, in particular those of distribution and inequality: how does wellbeing differ by gender or by various ethnic or socio-economic groups in a population; how does it differ by urban/rural place of residence or other geographic units? The need to focus on sub-populations rules out indicators that are computed only at the national level, such as conventional GDP estimates, and it also renders the widely used HDI unfit as a candidate for such an indicator.
(3) It should be comparable over time and across sub-populations.
For the purpose of comparing the wellbeing of certain populations at two different points in time and to see whether there has been improvement or deterioration, the indicator must have a meaning in its absolute value and not be defined on a relative scale. As an example, the life expectancy component of the HDI is defined as a fraction of the maximum national life expectancy observed in any given year. Hence, when comparing this fraction for a given population at two different points in time, it is impossible to see whether survival conditions in this population actually improved and by how much. In its relative form the index can only show whether the given population improved its standing relative to the country with the highest life expectancy. Paradoxically, a country's index value can even improve without the country actually having improved in any of the HDI dimensions (15). This is why an absolute rather than a relative metric is preferred for the indicator.
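The paradox can be made concrete with a small numerical sketch. It follows the simplified description given above (own life expectancy as a fraction of the maximum observed in the same year) and is not the official HDI formula; all numbers are hypothetical.

```python
# Simplified relative index as described in the text, not the official HDI formula.
# All life expectancy values are hypothetical.

def relative_index(own_life_expectancy, max_life_expectancy):
    """Own life expectancy as a fraction of the highest value observed that year."""
    return own_life_expectancy / max_life_expectancy

own = 70.0                                   # unchanged between the two years
index_year_1 = relative_index(own, 84.0)     # leading country at 84 years
index_year_2 = relative_index(own, 80.0)     # leading country drops to 80 years

# The relative index rises although survival conditions did not change at all.
print(round(index_year_1, 3), round(index_year_2, 3))
```

An absolute metric such as life expectancy itself would report 70 years in both periods and thus no change.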
(4) It should be theory-based and reflect key constituents of wellbeing.
There seems to be broad consensus in the literature that any one dimension alone would not be sufficient for adequately capturing the ultimate end of human development or measuring human wellbeing in its multi-dimensional nature (16). Any one-dimensional indicator can be subject to similar criticism as GDP per person in terms of mis-measuring our lives (17). Even life expectancy, which has been suggested as a good and stable indicator covering and reflecting many key dimensions of wellbeing (7), would not suffice because of its focus on mere survival rather than quality of life. On the other hand, bearing in mind the possibility of complex interactions, complementarity and limited substitutability between different dimensions (18), one should be parsimonious in the number of constituents considered and choose only key dimensions with strong theoretical grounding (19,20). Theories and conceptual frameworks should support the definition of well-being, thus informing the decisions over which dimensions to include and how to relate them with each other. In particular, the theory should highlight the weighting structure and the trade-offs implied in a specific indicator. One way of dealing with this is to set minimum standards with respect to which no compromise or compensation across dimensions could be possible at the individual level. This is the choice we made in the combination of minimum standards and Boolean conjunction of constituent dimensions in the definition of YoGL.
(5) There should be sufficient empirical information for different sub-populations and time points to be fit for serving as the dependent variable in panel regressions.
Since the indicator is to be estimated for many (sub-)populations at different points in time, its constituents need to be based on empirical information from survey items or other sources that are readily available. Ideally, all pieces of information necessary for the calculation of the indicator are available for the same individuals participating in a survey to satisfy desideratum #2. However, since this will not always be the case, reasonable data integration methods could be applied to obtain missing dimensions.
(6) It should be interpretable in terms of a real-life analogy.
An additional strength of a good indicator of wellbeing is its interpretability in terms of a real-life analogy. As Veenhoven (21) points out, the strength of GDP per person, for example, lies in its "clear substantive meaning". It is suggestive of the amount of money the average citizen has at his/her disposal to purchase goods and services. Meanwhile, life expectancy gives the number of years one can expect to live on average. The HDI, on the other hand, does not have an intuitive interpretation and gives us an abstract number that can hardly be associated with anything tangible. Unlike the previous ones, this property of the indicator is desirable but not absolutely necessary.
YoGL as defined and discussed in the main text of this paper meets each of these desiderata and can thus be applied to operationalize the definition of sustainability, demanding that for a change in living conditions to be called sustainable, it must not lead to a decline in wellbeing for any sub-population of interest over time. The time frame that is relevant here varies by the type of intervention. Many political reforms can affect living conditions negatively in the short run, e.g. when increased taxation has thus far only led to a reduction in disposable income while the induced positive effects still need time to diffuse. This view is consistent with Laurent, who states that 'measuring sustainability needs to evaluate well-being in the long run, both after the occurrence of shocks and during normal times' (1).
In the following section we will discuss wellbeing indicators that have been suggested in the literature and assess whether they meet the above-described desiderata.

Comparing 31 other wellbeing indicators to YoGL
Over the past 50 years, a large number of institutions and researchers around the world have contributed to the development of human wellbeing indicators for the purpose of supporting governments in devising meaningful policy interventions and to spur the debate on how to best raise people's quality of life in different national and cultural contexts. Many of these indicators serve as advocacy tools, educating the public on national and international differences in quality of life and leading to the expansion in research on wellbeing (22). GDP per capita continues to be by far the most prominent and widely used indicator of human wellbeing to this day, despite warnings by its inventor against its use for this purpose (23). Yet after heavy criticism of the concept (17), the majority of modern wellbeing indicators look beyond the measurement of national income and pay more attention to social and ecological dimensions of human development, including health, social capital, governance, civil liberties or environmental quality (24-26).
The desire to go beyond GDP has been highlighted by the prominent report "Mis-Measuring our Lives" where the economists Stiglitz, Sen and Fitoussi (17) discuss appropriate metrics aside from GDP per capita. In their conclusions, the authors clearly state that sustainability assessments require a well-identified dashboard of indicators. But they also stress that assessment of sustainability must be examined separately from the question of current well-being and warn against mixing or blending indicators of current well-being with indicators of sustainability that mostly will play out in the future.
While many of the more recent proposals for indicators follow the suggestion of producing a dashboard by covering indicators from a broad range of domains, they do not follow the suggestion with respect to separating the measurement of current well-being from the issue of sustainability. The Happy Planet Index (8, see also item #15 in Table S1), for instance, combines current mortality conditions in different countries (as summarized by life expectancy and data on life satisfaction) with the ecological footprint. While mortality and stated life satisfaction are measures of current wellbeing, the ecological footprint is not directly reflected in current conditions but instead measures possible impacts on future conditions, thus referring to sustainability. It therefore has a dual function which makes its direct interpretation difficult.
The conceptually clearer way, following the suggestion by Stiglitz et al. (17), would be to have a period indicator that reflects only current conditions at every point in time and allows observing trends over longer time periods (including possibly projections for the future) in order to follow its evolution in response to e.g. deteriorating environmental conditions. The OECD Better Life Index (item #19 in Table S1) is consistent in this respect by reflecting only current conditions (6). Through an interactive online interface, it allows the user to choose from eleven domains, ranging from current conditions in housing and income to life satisfaction and work-life balance. These can be weighted according to the user's own assessment before aggregation of the indicator across domains.
While the above approach is internally consistent and reflects only current conditions, the OECD Better Life Index is not designed for the analysis of sustainable development. Like many other multi-dimensional indicators, such as the Multi-Dimensional Poverty Index (28), it depends on a large number of empirical measurements. These are typically collected in surveys and can hardly be projected into the future based on a model, which is necessary for making forecasts. The simpler the indicator, the easier it is to build such a model because fewer assumptions need to be made and fewer feedbacks need to be considered. For this reason, meaningful model-based long-term scenarios exist for certain indicators, such as life expectancy and GDP per person, but it would be hard to come up with a model for forecasting the Better Life Index.
For a comprehensive literature overview of the contemporary field of wellbeing indicators see Lijadi (29). Selected examples of prominent indicators summarizing different approaches currently in use can be found in Table S1 below, ordered by year of appearance. What all of these indicators have in common is that they cover different aspects of human wellbeing, thus acknowledging the multi-dimensional nature of human wellbeing, while increasingly shifting the focus away from the dominance of the economic system to individuals and households. But as suggested by OECD, it is also important to "reconcile the objective measures of wellbeing with those based on individual perceptions, which reflect how people actually experience and assess their life circumstances." (30) In addition, a human wellbeing indicator useful for the study of sustainable development should be derived bottom-up to be applicable to subpopulations and focus primarily on long-term developments that can be measured in time series (31).
YoGL fulfills all these requirements and several more. It still takes material conditions into account but places them on a par with other essential dimensions of human wellbeing. The choice of the objective and subjective YoGL dimensions that are superimposed on life expectancy is based on solid theoretical foundations, considering not just economic theories of wellbeing (32-34) but also theories of subjective wellbeing (35,36), while recognizing the important role of both the cognitive (37,38) and physical health (39-41) dimensions of human wellbeing.
Finally, YoGL is unequivocal in not confounding means and ends of wellbeing. For example, rather than using years of schooling -which is a means to an end -as a proxy for cognitive wellbeing, YoGL measures the years a person can expect to live in a state of numeracy or functional literacy and without cognitive limitations restricting them in conducting their daily life. Schooling serves as the means of achieving numeracy and literacy, but YoGL is calculated based on the "ends", not the "means" of a good life (i.e. functional literacy, not years of schooling). This property of the indicator is crucial for using YoGL as a dependent variable in a "wellbeing production function".

Data and Methods
To exemplify the application of YoGL, we accompany the conceptual part of the paper with results demonstrating the indicator's potential to compare wellbeing between countries, sub-populations and over time. In this section, we describe how YoGL is approximated in view of limited data availability. Despite our efforts to collect and carefully impute harmonized individual-level data for a large group of countries over time, the empirical results demonstrating the application of YoGL still have to be treated as an exemplary illustration and thus need to be interpreted with caution. We hope that future data availability will allow scholars and policy makers to tap the full potential of YoGL.
According to desideratum #2 above, for YoGL to serve as an indicator of sustainable wellbeing, it needs to be computed "bottom-up", i.e. from individual-level survey data. This represents one of the major advantages of YoGL and allows for the indicator to be compared across countries but also different subpopulations. Unfortunately, most existing surveys do not yet collect all the necessary information on each of the four individual dimensions necessary to calculate YoGL. We thus utilize survey data whenever feasible but add imputations and out-of-sample predictions when needed.
The first step in deriving YoGL is to identify a database that contains information for a large enough number of individual observations from different countries, years and sub-populations in harmonized form. The ideal database for this purpose is the Survey of Health, Ageing and Retirement in Europe (SHARE) (61), which includes indicators of all four individual characteristics needed for YoGL based on tested data. This is what is used for the computations of YoGL at age 50 presented in Figure 2 of the main text. However, SHARE is available only for a very limited sample of countries. For the results presented in Table 1 and Figures 3 and 4 of the manuscript, where the goal was to make YoGL comparisons for a diverse set of countries and over time, we had to find an alternative data source. Since the subjective dimension of YoGL is by far the most volatile one and therefore the most difficult to infer, this source needed to contain life satisfaction for a large number of countries, so that the missing variables could then be imputed for the sample population. The best available source for that purpose is the World Values Survey (WVS) (62). The missing dimensions are imputed from SHARE (61), the Study of Global Ageing and Adult Health (SAGE) (63) and the Multi-Country Survey Study on Health and Responsiveness (MCSS) (64), respectively.
An alternative and simpler solution to compute YoGL would be to calculate age-specific prevalence rates of each of the four dimensions using different surveys and to apply those as weights to the number of person-years lived in each age group within a population. However, that would ignore important correlations that exist between the different dimensions at the individual level. These correlations can change over time, just as much as they can vary over the individual life course, e.g. as people's income situation improves with age, their life satisfaction may deteriorate because of increased stress and responsibility.
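Why the simpler solution can mislead is easiest to see in a small example. The sketch below compares the share of respondents who meet all four conditions at once with the product of the four separate prevalence rates. The data are hypothetical and deliberately extreme (perfectly correlated dimensions).

```python
# Hypothetical respondents in one age group. Each tuple holds four flags:
# (out_of_poverty, cognitively_fit, physically_fit, satisfied_with_life).
respondents = [
    (True, True, True, True),
    (True, True, True, True),
    (False, False, False, False),
    (False, False, False, False),
]

def joint_prevalence(rows):
    """Share of respondents who meet all conditions simultaneously."""
    return sum(1 for r in rows if all(r)) / len(rows)

def product_of_marginals(rows):
    """Product of the separate prevalence rates, which ignores correlations."""
    n = len(rows)
    product = 1.0
    for k in range(len(rows[0])):
        product *= sum(1 for r in rows if r[k]) / n
    return product

joint = joint_prevalence(respondents)           # individual-level approach
independent = product_of_marginals(respondents)  # survey-by-survey approach
print(joint, independent)
```

Here half of the respondents are in a good state, while multiplying the four marginal rates of one half each suggests only one in sixteen. The two figures agree only if the dimensions are statistically independent within the age group.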

Survival
As specified in the main text, the calculation of YoGL requires information on several different dimensions of wellbeing. The essential one of those is survival, which is conventionally described in demography by life tables. Life tables are derived from individual death records which, as a minimum requirement for the computation of life tables, have to specify age at death. As survival conditions tend to differ quite heavily by gender, life tables are often reported separately for men and women within a population. The life tables we are using to calculate YoGL by gender and over time are taken from Eurostat (65) (Figure 1) or the latest available revision of the UN World Population Prospects (66) that cover all countries of the world (Table 1 and Figures 3 and 4 in the main text).
Another important determining factor for differences in survival is education. While the pattern of highly educated sub-populations outliving less educated ones can be assumed to be almost universal, the extent of the education advantage can vary widely across countries and over time (67,68). Education-specific life tables are not as widely available as the breakdown by gender, but for a small sample of European countries and for selected years, Eurostat (69) reports remaining life expectancies in single year steps from age zero for three broad education groups and by gender. These are used to derive YoGL at age 50 by education-group and gender presented in Figure S1.

Capable Longevity
As mere survival is not enough, YoGL incorporates four additional dimensions of wellbeing, three of which are combined under the name of "capable longevity". The dimensions that are chosen to calculate capable longevity are (1) being out of poverty, (2) being free from cognitive limitations and (3) being free from physical activity limitations. Only when a respondent scores above a certain threshold on all three of these dimensions, while also being satisfied with their life (see the Life Satisfaction section below), will the respondent be counted as contributing a person-year to the total number of good years lived within the population in that year.
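How such person-year counts might be turned into an expectancy can be sketched as follows. The sketch assumes a prevalence-weighted life table calculation, which this section does not spell out: the share of respondents in a good state per age group weights the life table person-years, and the sum is divided by the number of survivors at the starting age. All numbers are hypothetical, and the table is truncated at age 80 for brevity.

```python
# Assumed prevalence-weighted life table calculation; all inputs are hypothetical.

def share_of_good_years(rows):
    """Share of respondents meeting all four conditions at once."""
    return sum(1 for r in rows if all(r)) / len(rows)

def years_of_good_life(person_years, good_shares, survivors_at_start_age):
    """Good person-years above the starting age per survivor at that age."""
    good_person_years = sum(L * s for L, s in zip(person_years, good_shares))
    return good_person_years / survivors_at_start_age

# Flags per respondent: (out_of_poverty, cognitively_fit, physically_fit, satisfied)
survey = {
    "50-59": [(True, True, True, True)] * 4 + [(True, True, False, True)],
    "60-69": [(True, True, True, True)] * 2 + [(True, False, True, True)] * 2,
    "70-79": [(True, True, True, True)] + [(False, True, True, True)] * 3,
}
person_years = [950_000, 800_000, 500_000]   # life table person-years per age group
survivors_at_50 = 100_000                    # life table survivors at age 50

good_shares = [share_of_good_years(rows) for rows in survey.values()]
yogl_at_50 = years_of_good_life(person_years, good_shares, survivors_at_50)
life_expectancy_at_50 = sum(person_years) / survivors_at_50  # truncated at age 80
print(good_shares, yogl_at_50, life_expectancy_at_50)
```

By construction the result can never exceed the corresponding life expectancy, since each share lies between zero and one.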
Ideally, all three dimensions of capable longevity are to be taken from objective assessments of people's income situation, cognitive and physical health, respectively. However, apart from SHARE, no individual cross-national survey fulfills this strict requirement for the accurate computation of YoGL. Therefore, when objectively assessed information on the three dimensions of capable longevity is missing, we use different methods of imputation and inference at the individual respondent level.

Being out of poverty
Objective information on people's income situation and/or material conditions is particularly hard to come by and often needs to be inferred from proxy information, such as the availability of amenities and large consumer durables in the household. However, for reasons explained above, YoGL needs to be computed at the individual level rather than the household level. For results based on SHARE (see Figure 2 in the main text and Figure S1 in the SI), the poverty dimension of YoGL is thus covered by the survey item "total household income", which is obtained by aggregating all single income components at the household level. Individual level data is then derived by applying an equivalence scale, i.e. assigning weights to each member of the household. In particular, the square root scale is employed, which divides the total household income by the square root of the household size. This scale is used in recent OECD publications (70,71). Finally, the equivalized household income is converted to international dollars per day, using purchasing power parity (PPP) conversion rates, and then compared to the World Bank (WB) poverty line for upper-middle income countries ($5.50/day). Individuals that fall below the WB poverty line are classified as being poor.
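The income conversion described above can be sketched in a few lines. The function and parameter names are hypothetical, the income is assumed to be annual, and a 365-day year is assumed for the conversion to daily amounts. Only the square root scale and the $5.50 line come from the text.

```python
import math

POVERTY_LINE_PER_DAY = 5.50  # WB line for upper-middle income countries (from the text)
DAYS_PER_YEAR = 365.0        # assumption for converting annual to daily income

def equivalized_daily_income(annual_household_income, household_size, ppp_rate):
    """Equivalized income per day in international dollars.

    ppp_rate is a hypothetical input: local currency units per international dollar.
    """
    equivalized = annual_household_income / math.sqrt(household_size)  # square root scale
    return equivalized / ppp_rate / DAYS_PER_YEAR

def is_out_of_poverty(annual_household_income, household_size, ppp_rate):
    """Individuals below the poverty line are classified as poor."""
    daily = equivalized_daily_income(annual_household_income, household_size, ppp_rate)
    return daily >= POVERTY_LINE_PER_DAY

# Two hypothetical four-person households with a PPP rate of 1.0
print(is_out_of_poverty(16_000, 4, 1.0), is_out_of_poverty(3_000, 4, 1.0))
```

With a household of four, the square root scale divides income by two rather than four, which reflects economies of scale in shared consumption.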
In WVS, which is used for Figures 3 and 4 and Table 1 in the main text, as well as for Figure S3 in the SI, the situation is more complex, since objectively assessed information on respondents' material living conditions is not available. Thus, we observe within-country variation of poverty based on self-assessed data and weight this information so that the between-country variation of poverty matches poverty information provided by the WB (72). The within-country variation of poverty is based on two subjectively assessed survey items: (1) people's self-declared position within their national income distribution ("On this card is an income scale on which 1 indicates the lowest income group and 10 the highest income group in your country. We would like to know in what group your household is. Please, specify the appropriate number, counting all wages, salaries, pensions and other incomes that come in."), and (2) people's self-declared saving behavior ("During the past year, did your family save money, just get by, spent some savings or spent savings and borrowed money"). We specify that all individuals who report a higher position on the national income scale than the second step and who are members of households that were able to save money during the past year are out of poverty.
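The within-country classification rule can be written down directly. In the sketch below the answer labels for the saving item are hypothetical stand-ins for the survey codes, and the subsequent adjustment to WB country-level figures is not included.

```python
# Hypothetical answer labels standing in for the WVS codes of the saving item.
SAVED_MONEY = "save money"

def out_of_poverty_wvs(income_step, saving_behavior):
    """Within-country rule: income scale above step 2 and household saved money."""
    return income_step > 2 and saving_behavior == SAVED_MONEY

examples = [
    (5, "save money"),         # both conditions met
    (5, "just get by"),        # income step high enough, but no savings
    (2, "save money"),         # saved money, but income step too low
]
flags = [out_of_poverty_wvs(step, saving) for step, saving in examples]
print(flags)
```

Both conditions must hold at the same time, so a respondent in a high income group who merely gets by is still classified as poor under this rule.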
The self-reported poverty information might be subject to bias, for example, when individuals do not know about the income distribution in their country or their position in that distribution. Moreover, when asked about their income/poverty status, individuals compare themselves with their peers, e.g. with individuals from their own country. Thus, on average, individuals from high income countries "overestimate" their poverty and individuals from low income countries "underestimate" their poverty. We account for this bias by adjusting the self-reported country-mean derived from WVS to the country mean reported by the WB (72). While this indicator does not meet all necessary characteristics of capable longevity presented in the main text, it can serve as a reliable approximation of poverty allowing us to exemplify the computation and utilization of YoGL.

Being free from cognitive limitations
Results presented in Figure 2 of the main text are based on SHARE. The survey provides a range of cognitive indicators, including basic functional literacy, numeracy, and word recall. We conducted sensitivity analyses of how YoGL changes when any one of those is used to derive the cognitive dimension. Figure S2 compares prevalence of cognitive fitness according to two different items in the SHARE survey. As these measures are heavily correlated across age groups, our choice of which one we use to calculate YoGL does not affect the results strongly. The results presented in Figure 2 of the main text are based on a numeracy test, consisting of five simple numerical questions. For example, the respondent is asked: "One hundred minus 7 equals what?". Respondents who answered two or more questions correctly are classified as cognitively fit.
Unfortunately, tested information on cognition is not available in WVS, which is why we have to rely on proxy information to obtain the results presented in Table 1 and Figures 3 and 4 of the main text as well as in Figure S3 of the SI. In particular, we utilize the assessments of the interviewers from wave 6 of the WVS on whether the respondent was able to read the questions on their own during the survey. We cross-checked the resulting country-specific age distributions of literacy against data that is available from national level literacy surveys and found them to be fairly similar. Since this proxy of cognition is not available in earlier waves of WVS, we prepare out-of-sample predictions for the other waves employing country-specific binary logit models. Age, education and gender are available throughout all waves of the survey and have the strongest predictive power; they are therefore used as predictor variables.
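The out-of-sample prediction step can be sketched for a single country as follows. This is a minimal illustration, not the estimation code used for the paper: the data are invented, the variable coding is hypothetical (age in decades, education in three levels, a female dummy), and the logit model is fitted with plain gradient ascent instead of a statistical package.

```python
import math

def predict(weights, x):
    """Predicted probability from a logit model; the last weight is the intercept."""
    z = weights[-1] + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(rows, labels, learning_rate=0.05, iterations=5000):
    """Fit a binary logit model by gradient ascent on the mean log-likelihood."""
    n_features = len(rows[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(iterations):
        gradient = [0.0] * (n_features + 1)
        for x, y in zip(rows, labels):
            error = y - predict(weights, x)
            for k in range(n_features):
                gradient[k] += error * x[k]
            gradient[-1] += error
        weights = [w + learning_rate * g / len(rows) for w, g in zip(weights, gradient)]
    return weights

# Hypothetical wave 6 respondents: (age in decades, education level 0-2, female)
wave6_rows = [(2, 2, 0), (3, 2, 1), (4, 1, 0), (5, 1, 1), (6, 0, 0), (7, 0, 1), (3, 0, 1),
              (6, 2, 0), (7, 1, 1), (4, 0, 0), (5, 2, 1), (8, 1, 0), (4, 1, 0)]
# Interviewer assessment: 1 if the respondent could read the questions unaided
wave6_can_read = [1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0]

weights = fit_logit(wave6_rows, wave6_can_read)

# Out-of-sample prediction for respondents of an earlier wave (hypothetical)
p_young_educated = predict(weights, (3, 2, 0))
p_old_uneducated = predict(weights, (7, 0, 0))
print(p_young_educated, p_old_uneducated)
```

The predicted probabilities could then be converted into a binary flag, for example by drawing from a Bernoulli distribution or by applying a cut-off. Which of these the authors use is not stated in this section.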

Being free from physical limitations
Another requirement for a person-year to be counted as a good year of life in YoGL is for this year to be spent free from physical activity limitations. While this does not imply that a physically impaired person cannot live many good and meaningful years of life, especially if equipped with alternative means to compensate for their physical limitations, as exemplified by the late Stephen Hawking, the majority of people living with severe activity limitations today still suffer from a lack of participation possibilities, be it for reasons of discrimination or for lack of barrier-free infrastructure. Moreover, we are merely using conventional indicators of physical activity limitations to study the age pattern in health decline as one important element of human wellbeing, as is common practice and widely accepted in the field of epidemiology.
In SHARE we can choose between several different measures of physical limitation, such as the ability to stand up from a chair, walking speed and grip strength, but not all measures are available in each wave. While grip strength has repeatedly been shown to correlate with subsequent adverse health outcomes and to predict physical disability decades later, chair stand reflects a person's current health status (73). Since current health status is what we are interested in, given that YoGL is a period indicator derived from people's good person-years during the survey period, we construct a measure of physical health based on a chair stand test, in which respondents were asked to rise from a chair without using their arms, after confirming that they felt safe to do so. Respondents who did not feel safe to do the test or were unable to rise from the chair without using their arms were classified as not physically healthy. This health indicator was used for the results in Figure 2 of the main text as well as Figure S1 in the SI.
The results presented in Table 1 and Figures 3 and 4 of the main text as well as in Figure S3 are again based on WVS, in combination with other data sources. The reason for this mix of data sources is that objective measures of physical limitations are not available in WVS, while SHARE and SAGE only provide objective measures for the population aged 50 and older in certain countries. MCSS, on the other hand, includes information on objective health for younger ages, yet only for a small sample of countries. Thus, in a first step we append SHARE and SAGE with MCSS. Second, we match all countries that are not available in SHARE, SAGE and/or MCSS with similar countries from those surveys. We match countries based on life expectancy at birth, as well as geographical proximity. Third, we regress tested health measures from SHARE and SAGE on age, gender, education and subjective health. Based on the estimated coefficients, we conduct out-of-sample predictions for the matched countries. This is possible since all predictor variables (age, gender, education and subjective health) are available in WVS. Due to the combination of different data sources, the results for the health dimension have to be taken with a grain of salt. We do believe, however, that the extrapolations allow us to demonstrate how YoGL can be calculated, once the required data is available.
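The country matching in the second step might look like the sketch below. The exact matching rule is not given in this section, so the rule used here is an assumption: geographical proximity is approximated by a shared region label, and among the remaining donor countries the one with the closest life expectancy at birth is chosen. Country names, regions and values are hypothetical.

```python
# Assumed matching rule; country names, regions and values are hypothetical.

def match_donor(target, donors):
    """Pick the donor country most similar to the target country."""
    same_region = [d for d in donors if d["region"] == target["region"]]
    candidates = same_region or donors  # fall back to all donors if the region is empty
    return min(candidates,
               key=lambda d: abs(d["life_expectancy"] - target["life_expectancy"]))

donors = [  # countries covered by the surveys with tested health measures
    {"name": "Donor A", "region": "North", "life_expectancy": 81.0},
    {"name": "Donor B", "region": "South", "life_expectancy": 72.0},
    {"name": "Donor C", "region": "South", "life_expectancy": 66.0},
]
target = {"name": "Target X", "region": "South", "life_expectancy": 70.5}

donor = match_donor(target, donors)
print(donor["name"])
```

The coefficients estimated for the matched donor would then be applied to the WVS respondents of the target country, as described in the third step.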

Life Satisfaction
As described earlier, the subjective dimension in YoGL is the one that would be most difficult to derive from sources of information other than the direct assessment of the survey respondents themselves. By relying exclusively on surveys that already contain information on life satisfaction (WVS, SHARE), we avoid having to model subjective wellbeing from its determinants, which explain it only to a very small degree. Figure S3 presents YoGL and its individual components at age 20 for women for 38 countries. The results are based on WVS data in combination with other data sources, as described above. A corresponding figure presenting the results for men can be found in the main text (Figure 3). Figure S4 provides the main results along with sensitivity analyses based on SHARE data, as discussed in Section "YoGL with complete data" in the main text.