ScienceDirect Policy and Society 32 (2013) 303–317 www.elsevier.com/locate/polsoc
Beyond indices: The potential of fuzzy set ideal type analysis for cross-national analysis of policy outcomes
John Hudson *, Stefan Kühner *
University of York, Department of Social Policy & Social Work, York, United Kingdom
Abstract
League tables ranking performance outcomes within or between countries have become commonplace in most policy sectors in recent decades and there are numerous examples to be found in practice. UNICEF’s 2007 Overview of Child Well-Being in Rich Countries offered a comprehensive and widely cited comparative analysis of children’s well-being in 21 of the richest countries of the world. Utilising an additive index, the authors distilled a large amount of quantitative data relating to children’s well-being and were able to provide the most comprehensive snapshot of outcomes to date. Whilst an advantage of the method – and certainly a key factor in generating media coverage – was the way it allowed for a ranking of nations, recent debates in the comparative policy analysis literature have pointed to the advantages of methods that aim to classify nations into qualitatively distinct types rather than ranking them in league tables. These debates have particular force when multiple components of analysis are conceptually distinct or cases have widely varying contexts. Fuzzy set ideal type analysis (FSITA) has become an increasingly popular alternative approach to the additive index, precisely because it addresses these concerns. In this paper we explore the potential for using FSITA for the comparative analysis of children’s well-being. Drawing on the same data and conceptual foundations as the 2007 UNICEF study, we explore the potential advantages of utilising a diversity-oriented method such as FSITA as a tool for policy evaluation that eschews ranking in favour of classifying.
© 2013 Policy and Society Associates (APSS). Elsevier Ltd. All rights reserved.
1. Introduction
League tables ranking performance outcomes within or between countries have become commonplace in most policy sectors in recent decades and there are numerous examples to be found in practice. So, for example, the UNDP’s Human Development Index (HDI – which assesses all member states) and its more focused Human Poverty Index 2 (HPI-2, which only assesses high income countries) are both relatively straightforward cumulative indices that rank nations on the basis of outcomes in three conceptually distinct areas: health, wealth and knowledge (see UNDP, 2008). The OECD’s Better Life Index (OECD, 2011) offers a more wide ranging attempt to measure well-being in high income countries via a cumulative index. In terms of the breadth of data included, UNICEF’s 2007 Overview of Child Well-Being in Rich Countries, published as its Innocenti Report Card 7 (hereafter referred to as the report card or UNICEF Index), provides one of the best examples of a comparative analysis of outcomes based on an additive index (see UNICEF, 2007). It offered a comprehensive and widely cited comparison of children’s well-being in 21 of the richest countries of the world. The
* Corresponding authors. E-mail addresses: [email protected] (J. Hudson), [email protected] (S. Kühner). 1449-4035/$ – see front matter © 2013 Policy and Society Associates (APSS). Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.polsoc.2013.10.003
report card commanded significant media attention, particularly in the UK where the country’s bottom placed ranking in the report card’s league table of child well-being sparked some heated political debate (see BBC, 2007; Knight, 2007; Womack, 2007). Whilst the ranking of nations in the report card certainly played an important role in generating media coverage of the UNICEF Index, recent debates in the comparative social policy literature have pointed to the advantages of methods that aim to classify nations into qualitatively distinct types rather than ranking them by averaging scores across a range of indicators (Kvist, 2006, 2007). In particular, fuzzy set ideal type analysis (FSITA) has become an increasingly popular alternative approach to the additive index (Hudson & Kühner, 2010). In this paper we explore the potential for using FSITA for the comparative analysis of children’s well-being. Drawing on the same data and conceptual foundations as the Innocenti Report Card 7, we explore the potential advantages of utilising a diversity-oriented method such as FSITA as a tool for both academic analysis and policy evaluation and draw out differences in the substantive findings the two approaches produce. Indeed, we find that while both techniques produce fairly robust classifications of cases that are strong or weak across the board, the diversity-oriented approach of FSITA is better able to describe the particular strengths and weaknesses of individual cases with more mixed child well-being outcomes. The findings produced by FSITA are thus potentially very meaningful for policy makers seeking inspiration for policy design and development from cross-country comparisons and should – at least – be reported alongside more traditional index based league tables.
2. The UNICEF child well-being index
Established in 2000, UNICEF’s Innocenti Report Card series focuses on the well-being of children in high-income countries.
A key feature of the series is that: ‘Each Report Card includes a league table ranking the countries of the OECD according to their record on the subject under discussion’ (UNICEF, 2011). Nine report cards have appeared to date (UNICEF, 2000, 2001a,b, 2002, 2003, 2005, 2007, 2008, 2010), on a range of subjects, but with the topics becoming increasingly broad based and multi-dimensional over time. Indeed, early reports focused on comparatively discrete issues such as child deaths by injury (UNICEF, 2001a), teenage births (UNICEF, 2001b) or child poverty (UNICEF, 2000), whilst more recent reports have focused on more multi-dimensional issues such as children’s well-being (UNICEF, 2007) and inequality in children’s well-being (UNICEF, 2010). Perhaps in order to generate media coverage and political debate, these reports not only include a league table ranking of countries but, in fact, begin with one, and all but two reports (UNICEF, 2005, 2007) carry the phrase ‘a league table of’ in their title. Though Innocenti Report Card 7 was an exception in dropping ‘a league table of’ from its title, the report card itself begins with a league table ranking all countries covered by the study and, indeed, the text makes repeated reference to the ‘league table’ (e.g. UNICEF, 2007: 3, 10, 12, 18, 19, 29, 32, 39, 42, 44). The need to produce some kind of final league table of children’s well-being no doubt influenced the methods adopted in the study and, in particular, the decision to create an additive index that combined many indicators into, potentially, a single summary measure. Given the high profile nature of the Innocenti Report Cards, a robust approach was essential, and the method chosen by the research team allowed a tremendous number of quantitative indicators of children’s well-being to be used to create their overall league table ranking.
Indeed, the report card is subtitled, with good reason, as a ‘comprehensive assessment of the lives and well-being of children and adolescents in the economically advanced nations’ (UNICEF, 2007: cover page). In order to assess the value of an alternative to this approach, we need first to explain the methods and indicators used in the report card itself. Some of the precise details of the methods used to construct the UNICEF Index are complex, but the general principles underpinning them are not. Simplifying a little, it was constructed using the process described below (see UNICEF, 2007 and, for a more detailed technical discussion, Bradshaw, Hoelscher, & Richardson, 2007).
Step one: drawing on the conceptual literature, the researchers identified six ‘domains’ of importance to child well-being in rich countries, namely: 1. Material situation 2. Health and safety
3. Education 4. Children’s relationships 5. Subjective well-being 6. Behaviour and lifestyles
Step two: again influenced by the literature, they broke each of the six domains into three components. So, for instance, the ‘material situation’ domain was broken down into the following three components: child income poverty; deprivation; joblessness. A full list of components used in the report card can be found in the Online Appendix of this paper.
Step three: the research team then undertook a systematic review of the major cross-national data sets in order to identify specific indicators relating to these components. Some components were captured by a single indicator; for instance, child poverty was captured using the percentage of children (aged 0–17) in households with equivalised income less than 50% of the median. However, some components were measured by taking an average of a number of different indicators; for example, deprivation consisted of an average of (a) the percentage of children aged 15 years old reporting less than six educational possessions, (b) the percentage of children aged 15 years old reporting less than ten books in the home and (c) the percentage of children aged 11, 13 and 15 years old reporting low family affluence.
Step four: in order to allow the different indicators to be combined into a single index score for each component and domain, the researchers needed to determine a method for standardising them into a common format. They used ‘z-scores’ that measure the distance from the mean: a z-score of 1.5 for a nation’s poverty figures, for instance, would indicate the nation’s poverty rate is 1.5 standard deviations above the mean of all countries in the sample. Similarly, a z-score of −1.5 for a nation’s poverty figures would indicate they are 1.5 standard deviations below the mean.
Step five: before adding the indicators together to create the index score for each component and domain, the researchers needed to ensure the z-scores ‘pointed’ in the right direction.
Most of the indicators measured factors that were negative for the well-being of children (e.g. percentage of children in poor households), but others measured positive factors (e.g. percentage of young people who eat fruit every day). The z-scores for negative factors were multiplied by −1.0 in order to reflect this.
Step six: finally, the different pieces of data were combined in order to produce an index score for each component and domain and, indeed, a single measure for the final league table. Here, the researchers needed to ensure that each component counted equally within each domain. As we noted above, if more than one indicator was used in a component then the final score for that component was determined by calculating an average of the z-scores of each of the indicators. In other words, each component was given equal weighting in its relevant domain. On top of this, because each domain consisted of three components, they too were equally weighted in the final index. Whilst adding the z-scores for each domain would provide a simple single measure for the report card, they instead calculated the final index by calculating the ranks for each domain and then calculating the average rank across the six domains (Fig. 1). This allowed for simpler presentation of the data and, similarly, z-scores were presented as an index measure with the average being 100 rather than 0, but the more detailed working paper that accompanied the report card does not simplify in this way (see Bradshaw et al., 2007). As noted above, in utilising an additive index the authors distilled a large amount of quantitative data relating to children’s well-being and were able to provide the most comprehensive snapshot to date.
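The six steps can be sketched in a few lines of code. The following is a minimal illustration only, not the report card's actual implementation: the function names, the nested-dictionary layout, the use of the population standard deviation, and the simple rank routine (which does not average tied ranks) are all assumptions made for the example, and the figures in the usage note are invented.

```python
import numpy as np

def z_scores(values):
    """Distance of each country from the sample mean, in standard deviations.
    Assumption: population SD (ddof=0); the source does not say which was used."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std(ddof=0)

def average_rank_index(domains):
    """Hypothetical helper.
    domains: {domain: {component: [(raw_values, is_negative), ...]}}
    Returns the average rank of each country across domains (lower = better)."""
    domain_ranks = []
    for components in domains.values():
        component_scores = []
        for indicators in components.values():
            # Step five: flip the sign of indicators that are bad for children.
            zs = [(-1.0 if negative else 1.0) * z_scores(raw)
                  for raw, negative in indicators]
            # Component score = mean of its indicators' z-scores.
            component_scores.append(np.mean(zs, axis=0))
        # Domain score = mean of its (equally weighted) components.
        domain_score = np.mean(component_scores, axis=0)
        # Rank 1 goes to the highest domain score (ties are not averaged here).
        domain_ranks.append((-domain_score).argsort().argsort() + 1)
    # Step six: final index = average rank across domains.
    return np.mean(domain_ranks, axis=0)
```

With three invented countries, a 'material' domain built from two components and an 'education' domain built from one, `average_rank_index` returns one average rank per country; a country that is first in one domain and second in the other scores 1.5.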
Whilst the method offers a robust approach to ranking nations on the basis of a wide range of indicators – and, clearly, this is a feature that UNICEF see as a central part of the Innocenti Report Card series – some recent empirical applications in the comparative policy analysis literature have favoured methods that aim to classify nations into qualitatively distinct types rather than ranking them in league tables (Kvist, 2006, 2007; for a summary see also: Hudson and Kühner, elsewhere in this issue). Does this distinction between index-based league tables and FSITA matter – both methodologically and substantively – for the measurement of child well-being across rich countries? It is to this debate that we now turn.
3. The challenges of multi-dimensional comparative policy analysis
In the technical working paper accompanying the report card, the authors note in a review of conceptions of child well-being that:
Fig. 1. UNICEF child well-being index summary table.
‘While there is no consensus about frameworks and definitions, all concepts have in common that they are inherently multi-dimensional, taking into account the complexity of children’s lives and relationships’ (Bradshaw et al., 2007:1). Similarly, reflecting that the index should build on the holistic view of the child found in the UN Convention on the Rights of the Child, they argued that: ‘Concepts of child well-being accordingly need to be multi-dimensional and ecological, recognising both children’s outcomes and the conditions they need for their development’ (Bradshaw et al., 2007:6). These conclusions fed directly into the methods they used to construct the report card: ‘The objective of this study was to use whatever data of acceptable quality was available to produce an index of child well-being. In searching for data we were guided by our understanding of the concept of child well-being. We searched for data to represent an ecological, multi-dimensional understanding [of child well-being]’ (Bradshaw et al., 2007:20 – emphasis added). In short, there is no doubting that a key aim of the report card was to offer a nuanced, multi-dimensional analysis of child well-being across the high income countries. Indeed, the report card justified the choice of approach on the basis that it ‘represents a significant advance on previous titles in this series which have used income poverty as a proxy measure for overall child well-being in the OECD countries’ (UNICEF, 2007:2). Comparative social policy analysts have long been troubled by the search for robust measures of welfare state intent and outcomes and the so-called ‘dependent variable debate’ has become increasingly prominent in the field (e.g. Clasen & Sigel, 2007; Kühner, 2007). Given the complexity of phenomena under analysis – and the limitation of key
cross-national data sets – simple proxy measures have long been deployed despite their known flaws. In response to these limitations, a trend across the field has been to move away from simple proxies towards more complex measures that combine a range of indicators into a single index, be it in theoretical works such as Esping-Andersen’s (1990) The Three Worlds of Welfare Capitalism that devised a ground breaking ‘decommodification index’ to capture multiple dimensions of social rights, or in more applied work such as the UNDP Human Development Reports (e.g. UNDP, 2008) that are built around a global Human Development Index and two Human Poverty Indices. The oft-quoted final report of the Stiglitz–Sen–Fitoussi commission (2009:16), which builds on the work of UNDP, states “statistical offices should provide the information needed to aggregate across quality-of-life dimensions, allowing the construction of different [cumulative] indexes.” Indeed, “while assessing quality-of-life requires a plurality of indicators”, they see “strong demands to develop a single summary measure” of economic performance and social progress. However, in an earlier study, we (Hudson & Kühner, 2010) argue that while more complex indices have advanced analysis significantly, they bring with them a new set of problems. In particular, we warn of the problems associated with methods – such as the z-score based additive index – that rely on averaging processes because: ‘while such approaches work well when dealing with a single component of welfare, they struggle to cope with more complex pictures of welfare that highlight multiple, conceptually distinct, components of welfare’ (Hudson & Kühner, 2010:177). More specifically, it was suggested (Hudson & Kühner, 2010:169) that such methods suffer from two key problems. Firstly, they are prone to outlier effects: if a country is exceptionally strong or weak in one dimension then this can mask the true nature of a country’s overall situation.
For example, if a study examined six dimensions of activity and one country was average in all, it could be ranked in the same place as a qualitatively very different country that is very strong in half of the dimensions but very weak in the other half. Or, similarly, a country that is around average in five dimensions but way above average in one might find itself towards the top of the table on the basis of this one exceptional score. In principle, such outlier problems can be dealt with by careful data cleaning prior to index construction in large-N designs. In contrast, dropping outliers due to exceptional dimension or component scores may not be appropriate in small-N contexts, for instance, if policy makers rely on a full ‘population’ of OECD or EU countries. This hints at a second, and more serious, problem, which is that additive indices find it difficult to identify or, indeed, pay due regard to, conceptually important differences signified by the data used to compile the index. To build on the previous example, it is probably very useful to know which nations are average at everything and which are extremely high on half the dimensions and extremely low on the other half, for they probably represent different patterns of outcomes or activities. This is particularly so when the different dimensions relate to conceptually distinct areas of activity. But, in an additive index, all numbers count equally, irrespective of their meaning (see also: Rubinson & Ragin, 2007). Crucially, both of these issues are relevant to the Innocenti Report Card. We have already noted that the authors aimed to capture data relating to multiple dimensions of child well-being. Added to this, however, it is worth flagging here that they concluded that ‘in the end the analysis has been data driven’ (Bradshaw et al., 2007:20), not least because of limits in terms of what data they could find but also because of the sheer volume of data included in their study.
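The masking problem described above can be made concrete with a small numerical illustration. The z-scores below are invented for the purpose and are not UNICEF data; the `summarise` helper and the cut-off of one standard deviation used to count 'strong' and 'weak' dimensions are likewise assumptions of the sketch.

```python
import numpy as np

# Hypothetical z-scores for three countries on six dimensions.
profiles = {
    "A": np.zeros(6),                                     # average throughout
    "B": np.array([1.5, 1.5, 1.5, -1.5, -1.5, -1.5]),     # strong in half, weak in half
    "C": np.array([0.0, 0.0, 0.0, 0.0, 0.0, 3.0]),        # one exceptional score
}

def summarise(z, cutoff=1.0):
    """Return (additive score, number of strong dimensions, number of weak dimensions)."""
    return (float(np.mean(z)), int(np.sum(z > cutoff)), int(np.sum(z < -cutoff)))

summary = {name: summarise(z) for name, z in profiles.items()}
for name, (score, strong, weak) in summary.items():
    print(f"{name}: additive score {score:+.2f}, strong in {strong}, weak in {weak}")
```

Countries A and B receive the same additive score although their profiles differ sharply, while C rises above both on the strength of a single dimension. A classification that keeps the dimensions separate would place the three in different groups.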
A question that follows, then, is whether viable alternative methods exist that could overcome some of these limitations. We have argued that the recently developed fuzzy set ideal type analysis (see Kvist, 2006, 2007) can overcome these issues for small-N studies with a modest number of conceptual dimensions under analysis. In the remainder of the paper, we aim to explore whether it can be applied to a study such as the Innocenti Report Card 7 that is based on a relatively large number of variables.
4. Applying FSITA to the analysis of child well-being
The starting point for any qualitative comparative analysis (QCA) ought to be a consideration of the key concepts to be analysed (Ragin, Strand, & Rubinson, 2008; again: see Hudson and Kühner, elsewhere in this issue). Given that our FSITA aims to replicate the Innocenti Report Card we have not started our study with a thorough review of the background conceptual literature. The report card itself was based on a prior review of this literature. However, the authors themselves noted that several competing conceptions exist and that their own approach was both based on the rather generalised notions carried in the UN Convention on the Rights of the Child and, ultimately, was also rather
data driven due to limitations in available quantitative information (Bradshaw et al., 2007:20). In other words, if we were to start from scratch then perhaps we would not end up with the same dimensions of child well-being as those included in the report card. We need, therefore, a technique for checking the robustness of these dimensions and, in particular, for testing their coherence.
4.1. Identifying conceptually coherent sets
Strictly speaking, the conceptual coherence of the sets – and the components and indicators underpinning them – should be determined through a thorough review of the background literature. With a potential maximum of 40 indicators combined in the Innocenti Report Card we faced a practical challenge here, particularly as many relate to rather discrete issues on which conceptual literature is minimal. For instance, one indicator concerns the percentage of children reporting less than six educational possessions, but what conceptual literature does this relate to? In the report card this indicator was used as a proxy for part of material well-being, concerning issues around reported deprivation, but might equally be tied to issues around educational well-being. Reviewing all potentially relevant literature on these often very discrete issues would involve reviewing many thousands of studies so, instead, we adopted some arithmetic short cuts to identify the presence of potential conceptually distinct components or variables within each of the six dimensions used within the original study. Specifically, we adopted the following four steps:
Step 1: Compute simple OLS regression including post hoc Cook’s D.
Step 2: Identify and exclude bivariate influential cases (within dimensions).
Step 3: Compute bivariate correlations and principal components analysis.
Step 4: Identify linearly uncorrelated indicators and components (within dimensions).
Simple OLS regression of all indicators within each dimension of the child well-being index served as a first indication of the extent to which components in fact capture independent latent concepts. While there are multiple ways in which concepts, such as deprivation or educational attainment, can be operationalised, the replaceability principle postulates that indicators measuring the same concepts should produce broadly comparable findings across cases (Schnell, Hill, & Esser, 2011). Put differently, we expect correlations between the indicators within a component to be at least moderately high and statistically significant (at the 0.1 level) to suggest that they are indeed part of one conceptually distinct ‘indicator universe’. However, since small-N correlation analysis may be subject to bias caused by influential cases (Anscombe, 1973), we use simple OLS regression mainly to compute the Cook’s D statistic to identify and exclude cases that would otherwise heavily influence subsequent analyses. After accounting for bivariate outliers in this way, bivariate correlations and principal components analysis were employed to test whether bundles of indicators capture single components within dimensions. Indeed, this was not always the case and indicators were assigned to different components within dimensions or excluded altogether based on these statistical criteria. To provide one illustration of this chosen approach, one may look at the first of the six dimensions of the Innocenti Report Card, material child well-being. Here, the percentage of children reporting less than six educational possessions in Greece (61%) was identified as an influential outlier according to the Cook’s D statistic. After excluding Greece, the bivariate correlation between the percentage of children reporting low affluence and the percentage of children reporting less than six educational possessions was high (r = 0.847) and statistically significant at the 0.01 level.
The percentage of children reporting less than ten books in the home, however, was only weakly correlated with both the percentage of children reporting low affluence and the percentage of children reporting less than six educational possessions. Furthermore, principal component analysis suggested a two factor solution across the five indicators in the material dimension: (a) the percentage of children living in poverty, (b) the percentage of children reporting low affluence, (c) the percentage of children reporting less than six educational possessions and (d) the percentage of working-age households with children without an employed parent all loaded high on the first factor; (e) the percentage of children reporting less than ten books in the home loaded on the second factor. Consequently, we retained (a)–(d) and dropped (e) from further analysis of the new component, which we labelled ‘family affluence’. Space restricts the ability to discuss decisions to retain/drop individual indicators in all other dimensions/components of the original Innocenti Report Card, but further information can be found in the Online Appendix to this paper.
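The screening procedure can be sketched with NumPy alone. This is an illustrative reconstruction rather than the authors' SPSS workflow: the function names are hypothetical, the Cook's D cut-off of 4/n is a common rule of thumb that the source does not specify, and the principal components are taken from the eigen-decomposition of the correlation matrix without rotation.

```python
import numpy as np

def cooks_distance(x, y):
    """Cook's D for each case in a simple OLS regression of y on x."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])          # intercept + slope
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    mse = resid @ resid / (n - p)
    # Leverage: diagonal of the hat matrix X (X'X)^-1 X'.
    leverage = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    return resid**2 / (p * mse) * leverage / (1.0 - leverage) ** 2

def screen_pair(x, y, threshold=None):
    """Drop influential cases, then correlate the remaining ones.
    Assumption: threshold defaults to the 4/n rule of thumb."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    d = cooks_distance(x, y)
    if threshold is None:
        threshold = 4.0 / len(d)
    keep = d <= threshold
    r = np.corrcoef(x[keep], y[keep])[0, 1]
    return keep, r

def principal_components(data):
    """Eigenvalues and unrotated loadings of the indicators' correlation matrix.
    data: array of shape (cases, indicators)."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]                   # largest first
    eigvals = np.clip(eigvals[order], 0.0, None)
    loadings = eigvecs[:, order] * np.sqrt(eigvals)
    return eigvals, loadings
```

In use, `screen_pair` would be applied to each pair of indicators within a dimension, and `principal_components` to the full set of indicators once influential cases have been removed. How many components to retain (for instance, those with eigenvalues above one) is a further judgement that the sketch leaves to the analyst.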
However, these steps provided initial indications rather than rigid markers: where potentially discordant elements were detected, the researchers examined them in more detail and considered whether or not the data indicated a conceptual dissonance. We report these deliberations in our analysis sections. Given that the report card was data driven, this reflection on concepts was particularly valuable here, as it is essential that our sets represent coherent, conceptually distinct dimensions of child well-being. Indeed, based on the above we suggest that the original dimensions in the report card lacked some clarity and offer an alternative set of dimensions.¹ We do not have space to offer a detailed discussion here, but further information can be found online in the statistical appendix to this paper. Here we merely note that the revised dimensions for our analysis are as follows: Material situation; Education; Children’s relationships; Subjective well-being; Behaviour and lifestyles. Within each of these dimensions we found a number of distinct components. A summary of the dimensions and components – and the indicators we find that constitute them – can be found in Table 1.
4.2. Calibrating the sets
After assessing the robustness of our sets, we also needed to calibrate them. As noted above, not all variation matters in FSITA and the method allows us to set conceptually rooted floors and ceilings for set membership in order to eliminate noise arising from outlier effects. With a potential maximum of 40 indicators (we use 19 in the end) we faced another practical challenge here, particularly as many relate to rather discrete issues on which conceptual literature is limited. For instance, building on the indicator of the percentage of children living in households with fewer than 10 books, what percentage of households should this be in order for a nation to be fully out of the ‘book loving’ set? We decided, therefore, to again use some arithmetic short cuts. This no doubt goes against the grain of FSITA/QCA and how the majority of its proponents would approach the issue (Kvist, 2006, 2007; Ragin et al., 2008: chap. 4 & 5, but see, e.g. Eliason, Stryker, & Tranby, 2008). Our view, however, is that some deviation from this is required if the technique is to be used more widely in policy evaluation work, and that some compromises are likely to be necessary in order to allow researchers to apply the technique in a manner that does not require a systematic review of the literature for every single indicator. Here we adopted a four-step approach to calibrating the data for each indicator:
Step 1: Compute box-plots.
Step 2: Identify and exclude outliers and extreme cases.
Step 3: Compute adjusted means and standard deviations.
Step 4: Compute upper/lower cut-off points as the adjusted mean ± one standard deviation.
¹ This passage from the report card’s technical report shows the difference in approaches well: ‘If we had been using an effect model we would have expected that changes in a component would have had an impact on all the indicators making up the component. They are dependent on the component. So with effect models one would expect co-variance and one could determine the weighting of an indicator in constructing a component by assessing its contribution to the component by a scalability test such as Cronbach’s Alpha or by establishing the underlying component by using factor analysis or principal component analysis. However, we have no justification for doing any of that because we are using a causal indicator model in this analysis. In a causal indicator model it is the indicators that determine a latent indicator (the component) rather than the reverse. We are assuming that the indicators that make up the component cause the component. We would not expect a change in the component to impact equally on our indicators. Thus they can be considered independent contributors to our component. We do not necessarily expect our indicators to correlate with each other. If the indicators in a component do correlate highly we might consider dropping one, particularly if there was another indicator in the component that is not correlated with them – on the grounds that the correlated indicators might be measuring the same thing and thus overweighting that thing. In the case of the health from birth component we have selected two indicators which we have decided all contribute something to that construct. The two are in fact statistically significantly correlated, but not closely enough to believe that they are each contributing the same thing to the component’ (Bradshaw et al., 2007:24–5).
Table 1
Alternative dimensions of child well-being.

Dimension: Material situation
Components: Family affluence
Indicators:
(a) Percentage of children (0–17) in households with equivalent income less than 50% of the median: most recent data.
(b) Children reporting low family affluence, 11, 13 and 15 years (%): 2001/02.
(c) Percentage of children reporting less than six educational possessions, aged 15 years: 2003.
(d) Percentage of working-age households with children without an employed parent, OECD: most recent data.

Dimension: Education
Components: Educational attainment; Educational enrolment
Indicators:
(a) Reading literacy attainment, aged 15 years: 2003.
(b) Mathematics literacy attainment, aged 15 years: 2003.
(c) Science literacy attainment, aged 15 years: 2003.
(d) Percentage of the 15–19 population not in education and unemployed in the total population: 2003.

Dimension: Children’s relationships
Components: Traditional family structures; Family and peer relations
Indicators:
(a) Percentage of young people living in single parent family structures, 11, 13 and 15 year-olds: 2001/02.
(b) Percentage of young people who report living in step family structure, 11, 13 and 15 year-olds: 2001/02.
(c) Percentage of students whose parents eat their main meal with them around a table several times a week, aged 15 years: 2000.
(d) Young people finding their peers kind and helpful, aged 11, 13 and 15 years: 2001/02.

Dimension: Subjective well-being
Components: Positive attitude to life; Feeling of belonging
Indicators:
(a) Young people rating their health as fair or poor, aged 11, 13 and 15 years: 2001.
(b) Young people with scores above the middle of the life satisfaction scale, aged 11, 13 and 15 years: 2001/02.
(c) Students who strongly agree with the statement ‘I feel like an outsider or left out of things’, aged 15 years: 2003.
(d) Students who strongly agree with the statement ‘I feel lonely’, aged 15 years: 2003.

Dimension: Behaviour and lifestyles
Components: Risk behaviour; Health behaviour
Indicators:
(a) Teenage pregnancy (adolescent fertility rate), births per 1000 women 15–19: 2003.
(b) Young people who are overweight according to BMI, aged 13 and 15 years: 2001/02.
(c) Cigarette smoking at least once per week, aged 11, 13, 15-year-olds: 2001/02.
As mentioned above, fuzzy set ideal type analysis typically goes beyond utilising such sample characteristics to calibrate fuzzy set membership, since summary measures of central tendency and dispersion tend to change once single cases are included or excluded (Ragin, 2000). This is particularly an issue if a sample of cases is heterogeneous, and the impact of outliers and extreme cases is generally exacerbated in small-N specifications. We proceed in two steps: first, in order to reduce heterogeneity in our sample, we concentrate our analysis on 19 European countries (this also became a necessity due to missing values in a considerable number of indicators for countries outside this geographical region); second, we identify outliers and extreme cases within the distribution of values for each indicator by computing box-plots for indicators within each dimension. PASW Statistics/SPSS 18 identifies outliers as cases with values between 1.5 and 3 box lengths from the upper or lower edge of the box, where the box length is the interquartile range. Extreme cases are defined as cases with values more than 3 box lengths from the upper or lower edge of the box. We excluded any such cases and computed adjusted means and standard deviations, on which we then based the calibration of fuzzy set membership scores for each indicator.

This difference is important empirically since, particularly at the crossover point, even one outlier or extreme case can determine whether a country is more in or more out of the set. Admittedly, this is so because of the method chosen to assign membership scores to each fuzzy set: generally, and for want of any more objective calibration rules, the upper and lower cut-off points for each indicator are set at the adjusted mean ± one adjusted standard deviation. Cases at or above the adjusted mean plus one adjusted standard deviation receive a score of 1 – fully in the set. Cases at or below the adjusted mean minus one adjusted standard deviation score 0 – fully out of the set. Using this approach, we were able to scale set scores for each indicator within each dimension in a systematic and transparent fashion.
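The calibration rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure run in PASW Statistics/SPSS: the function names, the quartile rule (`method="inclusive"`), the use of the sample standard deviation and the linear scaling of scores between the two cut-off points are all assumptions, since the text fixes only the cut-offs themselves. The indicator values are hypothetical.

```python
from statistics import mean, quantiles, stdev


def adjusted_mean_sd(values):
    """Mean and SD after dropping cases beyond 1.5 box lengths from the box edges."""
    # Assumption: quartiles via the 'inclusive' rule; SPSS box-plots may differ slightly.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    box_length = q3 - q1  # interquartile range
    low_fence = q1 - 1.5 * box_length
    high_fence = q3 + 1.5 * box_length
    kept = [v for v in values if low_fence <= v <= high_fence]
    # Assumption: sample standard deviation (n - 1 denominator).
    return mean(kept), stdev(kept)


def calibrate(value, adj_mean, adj_sd):
    """Fuzzy membership score with cut-offs at the adjusted mean +/- one adjusted SD."""
    lower, upper = adj_mean - adj_sd, adj_mean + adj_sd
    if value >= upper:
        return 1.0  # fully in the set
    if value <= lower:
        return 0.0  # fully out of the set
    # Assumption: linear scaling between the cut-offs, so the adjusted mean maps to 0.5.
    return (value - lower) / (upper - lower)


raw = [10, 12, 14, 16, 18, 60]  # hypothetical indicator values; 60 is an outlier
adj_mean, adj_sd = adjusted_mean_sd(raw)
scores = [round(calibrate(v, adj_mean, adj_sd), 2) for v in raw]
print(scores)  # [0.0, 0.18, 0.5, 0.82, 1.0, 1.0]
```

In this hypothetical sample the value 60 lies outside the box-plot fences, so it is left out of the mean and standard deviation but still receives a membership score. Including it would pull the mean from 14 to roughly 21.7 and shift every other case towards non-membership, which is the sensitivity the adjustment is meant to limit.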
Again, space prevents a detailed discussion of the construction of each of the sets, but further information can be found in the Online Appendix to this paper.

4.3. Deriving scores for sets

Significantly, having improved the conceptual clarity of the components so that they are composed of conceptually congruent indicators, we can use arithmetic logic to derive the scores for each component. We do so here by using the mean average of each component’s indicators to create a component score, on the basis that – within components – the indicators are capturing different measures of the same concept.² However, we do not adopt this approach for reducing the component scores to dimension scores, on the basis that the components represent conceptually distinct entities and so we do not wish to see compensation effects occur. In other words, if (as we suggest in Table 1) we think that educational well-being requires both a high level of educational attainment and a high level of enrolment, then countries should be a member of both of these component sets in order to be a member of the combined dimension set. More specifically, we use the minimum rule to reach final membership scores for our dimensions – meaning countries have to be a member of all component sets to be a member of the dimension set – except for the children’s relationships set.

This is a potentially controversial decision, but good comparative research needs to account for the varying cultural context of nations in the sample. While most of the components we have identified remain largely value neutral (though, of course, there remain significant political debates around issues such as how to define poverty, see e.g. Neff elsewhere in this issue), the question of whether or not traditional family structures are significant to child well-being seems particularly awkward. Added to this, there may be a sense in which the two components of children’s relationships that we have identified are unrelated in at least some countries: while in some cultures traditional family structures remain key, the raw data seem to suggest that in others more diverse structures have combined well with good family and peer relations. Consequently, we use the maximum principle here, allowing entry into the set on the basis of either component, but flag in the analysis these different entry routes. This allows us to value strong traditional family structures in their own right, but not to allow the absence of these to be an indicator of poor children’s relationships if there are alternative data indicating positive outcomes here.

5. Combining the sets: the final analysis

In the final stage of analysis we present FSITA of different types of child well-being based on a property space that includes each logical combination of the different dimensions of child well-being outlined in Table 1. Table 2 provides a summary of the data and Fig. 2 details the final fuzzy set ideal type memberships in a Venn diagram. Though there are 32 logical combinations of our five dimensions, empirically we found between 10 and 12 combinations, depending on how borderline memberships are handled. The remaining combinations are treated as ‘logical remainders’, which will not be considered in subsequent discussions (see Hudson & Kühner elsewhere in this issue; for a more detailed discussion of limited diversity in QCA see Ragin & Sonnett, 2005). Here we present 10 ideal types, but flag the problematic cases. We will run through these ideal types one-by-one.

5.1. Comprehensive well-being

While not offering a league style ranking that can identify ‘winners’ and ‘losers’, the ideal types grouping still provides an opportunity to identify the country or countries with the most extensive model of child well-being by pinpointing those with membership of all five sets. In our analysis, only one country claims membership of this type, namely Sweden, though we should note that the Netherlands is on the borderline for this ideal type as it is half-in and half-out of the subjective well-being dimension.³ Sweden, we should note, scores very low on the traditional family structures component but high on the family and peer relations component.

² Note that we use the inverse of the original fuzzy set scores based on the actual indicator to make sure that membership of each of the sets depicts higher, rather than lower, child well-being.

³ The Netherlands is italicised in Fig. 2, as technically the Venn diagram should not plot it together with Sweden. We present the data in this way mainly to facilitate the substantive interpretation of our findings. We appreciate this may present a simplification of our findings for some readers familiar with the detail of QCA.
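The aggregation rules set out in Section 4.3 can be illustrated with a short Python sketch. The helper names are hypothetical, the indicator-level scores in the first example are invented for illustration, and `negate` uses the standard fuzzy-set complement (1 minus the score), which we assume is what footnote 2 refers to. The component scores for Finland and Sweden are those reported in Table 2.

```python
from statistics import mean


def negate(score):
    """Fuzzy-set complement, for indicators where a high raw score means low well-being."""
    return 1.0 - score


def component_score(indicator_scores):
    """Within a component, indicators measure the same concept, so they are averaged."""
    return mean(indicator_scores)


def dimension_score(component_scores, rule="min"):
    """Across components: 'min' requires membership of every component, 'max' of any one."""
    if rule == "min":
        return min(component_scores)
    if rule == "max":
        return max(component_scores)
    raise ValueError(f"unknown rule: {rule}")


# Hypothetical indicator memberships for one component; the first is inverted.
hypothetical_component = component_score([negate(0.30), 0.60, 0.80])
print(round(hypothetical_component, 2))  # 0.7

# Finland, education (Table 2): attainment 1.00, enrolment 0.06.
finland_education = [1.00, 0.06]
print(dimension_score(finland_education))            # 0.06 (minimum rule: out of the set)
print(round(component_score(finland_education), 2))  # 0.53 (an average would compensate)

# Sweden, children's relationships (Table 2): traditional family structures 0.00,
# family and peer relations 0.82.
sweden_relationships = [0.00, 0.82]
print(dimension_score(sweden_relationships, rule="max"))  # 0.82 (maximum rule: in the set)
print(dimension_score(sweden_relationships, rule="min"))  # 0.0
```

The Finnish example shows the compensation effect the minimum rule is designed to avoid: averaging the two education components would place Finland just inside the set despite its very low enrolment score.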
Fig. 2. Universe of child well-being ideal type memberships.
In some ways, these findings differ little from those in the report card, except that it ranked the Netherlands first and Sweden second, the latter losing marks for its failure to protect traditional family structures. Interestingly, the report card also ranked the Netherlands number 1 for subjective well-being, but in our study it only just enters on the border of this set. This is because some extreme values on some of these indicators masked lower scores on others, so clear compensation effects were at play. In short, the more diversity oriented approach we have adopted here underlines the strengths of the Swedish model, bringing it – rather than the Netherlands – to the fore as the ideal example of a pro-child well-being nation and suggesting it should be treated by academics and policy makers accordingly.

5.2. Near comprehensive well-being

There was one country that held clear membership of four of the five dimensions, and we dub this the ‘near comprehensive well-being’ type. Denmark has membership of all sets bar education. Indeed, Denmark is very close to delivering comprehensive well-being: in the education set it has full membership of the enrolment component and is just short on the attainment component. Small improvements would take it to fully comprehensive status. We should flag that, as with Sweden, its traditional family structures score is weak and it joins the relationships set on the basis of its high score on the other component. We have already noted above that the Netherlands is situated at the point of maximum ambiguity between the comprehensive and near-comprehensive types on the basis of its subjective well-being scores and, while technically not a member of the ‘near comprehensive well-being’ type, is nevertheless at its borderline (Ragin et al., 2008).
Table 2 Summary of well-being set memberships.

| Country | Material situation: Family affluence | Education: Educational attainment | Education: Educational enrolment | Children’s relationships: Traditional family structures | Children’s relationships: Family and peer relations | Subjective well-being: Positive attitude to life | Subjective well-being: Feeling of belonging | Behaviour and lifestyles: Risk behaviour | Behaviour and lifestyles: Health behaviour |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Austria | 0.71 | 0.37 | 0 | 0.49 | 0.49 | 0.50 | 0.36 | 0 | 0.34 |
| Belgium | 0.74 | 0.89 | 0.49 | 0.72 | 0.82 | 0.69 | 0.26 | 0.72 | 0.75 |
| Czech Republic | 0.32 | 0.68 | 0.71 | 0.15 | 0.06 | 0.46 | 0.08 | 0 | 0.47 |
| Denmark | 0.72 | 0.37 | 1 | 0.00 | 0.79 | 0.54 | 0.70 | 0.93 | 0.89 |
| Finland | 0.82 | 1.00 | 0.06 | 0.11 | 0.34 | 0.97 | 0.65 | 0.79 | 0.13 |
| France | 0.57 | 0.64 | 0 | 0.48 | 0.50 | 0.37 | 0.28 | 0.79 | 0.58 |
| Germany | 0.51 | 0.44 | 0.89 | 0.37 | 0.74 | 0.35 | 0.56 | 0.51 | 0.31 |
| Greece | 0.35 | 0.00 | 0.13 | 1.00 | 0.12 | 1.00 | 0.43 | 0.31 | 0.50 |
| Hungary | 0.05 | 0.23 | 0.55 | 0.45 | 0.32 | 0.28 | 0.00 | 0 | 0.32 |
| Ireland | 0.21 | 0.68 | 0.82 | 0.90 | 0.43 | 0.63 | 0.87 | 0.45 | 0.72 |
| Italy | 0.40 | 0.03 | 0 | 1.00 | 0.51 | 0.54 | 0.80 | 0.93 | 0.34 |
| Netherlands | 0.73 | 1.00 | 0.91 | 0.72 | 0.90 | 0.50 | 1.00 | 1 | 0.86 |
| Norway | 0.90 | 0.34 | 1 | 0.00 | 0.86 | 0.02 | 0.44 | 0.79 | 0.68 |
| Poland | 0.02 | 0.38 | 1 | 0.91 | 0.32 | 0.19 | 0.00 | 0.38 | 0.80 |
| Portugal | 0.28 | 0.00 | 0.22 | 0.81 | 0.90 | 0.00 | 0.74 | 0 | 0.20 |
| Spain | 0.41 | 0.05 | 0.46 | 1.00 | 0.42 | 0.89 | 1.00 | 0.86 | 0.12 |
| Sweden | 0.96 | 0.73 | 0.97 | 0.00 | 0.82 | 0.55 | 0.57 | 0.86 | 0.89 |
| Switzerland | 0.83 | 0.82 | 0.35 | 0.54 | 1.00 | 0.98 | 0.32 | 1 | 0.82 |
| United Kingdom | 0.38 | 0.82 | 0.11 | 0.00 | 0.00 | 0.07 | 0.67 | 0 | 0.09 |

Note: Fuzzy set membership scores of all indicators can be found in the Online Appendix of this paper. Bold indicates membership of a set. Italics indicates country is at the point of maximum ambiguity.
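To show how the component scores in Table 2 translate into ideal type memberships, the following Python sketch derives dimension scores and locates a few countries in the property space. It is illustrative only: the names are hypothetical, the dimension labels are shortened, and treating a score strictly above 0.5 as membership and a score of exactly 0.5 as borderline is an assumption about how the crossover point is applied. Only six unambiguous or explicitly discussed rows of Table 2 are included.

```python
from itertools import product

DIMENSIONS = ("material", "education", "relationships", "subjective", "behaviour")

# Component scores copied from Table 2, in column order: family affluence,
# attainment, enrolment, traditional family structures, family and peer
# relations, positive attitude to life, feeling of belonging, risk, health.
TABLE2_EXCERPT = {
    "Sweden":         (0.96, 0.73, 0.97, 0.00, 0.82, 0.55, 0.57, 0.86, 0.89),
    "Netherlands":    (0.73, 1.00, 0.91, 0.72, 0.90, 0.50, 1.00, 1.00, 0.86),
    "Denmark":        (0.72, 0.37, 1.00, 0.00, 0.79, 0.54, 0.70, 0.93, 0.89),
    "Finland":        (0.82, 1.00, 0.06, 0.11, 0.34, 0.97, 0.65, 0.79, 0.13),
    "Czech Republic": (0.32, 0.68, 0.71, 0.15, 0.06, 0.46, 0.08, 0.00, 0.47),
    "United Kingdom": (0.38, 0.82, 0.11, 0.00, 0.00, 0.07, 0.67, 0.00, 0.09),
}


def dimension_scores(row):
    """Minimum rule for every dimension except relationships, which uses the maximum."""
    fa, ea, ee, tf, fp, pa, fb, rb, hb = row
    return (fa, min(ea, ee), max(tf, fp), min(pa, fb), min(rb, hb))


def ideal_type(row, crossover=0.5):
    """Return (membership pattern, dimensions sitting exactly on the crossover)."""
    scores = dimension_scores(row)
    members = tuple(s > crossover for s in scores)
    borderline = tuple(d for d, s in zip(DIMENSIONS, scores) if s == crossover)
    return members, borderline


# Five dimensions, each in or out: 2 ** 5 = 32 logically possible ideal types.
property_space = list(product((False, True), repeat=len(DIMENSIONS)))
print(len(property_space))  # 32

for country, row in TABLE2_EXCERPT.items():
    members, borderline = ideal_type(row)
    label = " and ".join(d for d, m in zip(DIMENSIONS, members) if m) or "none"
    print(f"{country}: {label}; borderline: {', '.join(borderline) or 'none'}")
```

Under these assumptions Sweden is a member of all five dimension sets, Denmark of all but education, Finland of material and subjective well-being only, the Czech Republic of education only and the United Kingdom of none, while the Netherlands is flagged as borderline on subjective well-being, matching the discussion in Sections 5.1 to 5.9.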
5.3. Wealthy and happy

There are no other nations with membership of four sets, but Finland is perhaps worth a special mention. Though it is a member of just two dimensions – material and subjective well-being – it comes close on education (low enrolment is an issue, but attainment is high) and on relationships (where family and peer relations are near membership, but traditional family structures are not). It is also strong on risk behaviours but not health behaviours.⁴ That said, ultimately Finland is only a member of two sets and we have dubbed its ideal type ‘wealthy and happy’ (with ‘and’ indicating our use of the Boolean ‘minimum rule’ to arrive at final membership scores for each dimension, see above). In the original report card Finland ranked 4th overall, as its strong performance in some areas compensated for weak performances in others. Our analysis suggests that Finland’s overall performance is rather patchy, but that it has the potential to move towards the comprehensive model with relatively small improvements in some key areas.

5.4. Happy and home and school

Ireland comes close to membership of four sets too. It is clearly out of the material well-being set but falls just short of the behaviour and lifestyles set. Though income poverty is comparatively high and there are signs of risky behaviours, Ireland scores well in subjective well-being and educational well-being and maintains strong family structures. We dub this type ‘happy and home and school’ on this basis.

5.5. Healthy and wealthy and secure relationships

Trumping Finland in terms of the total number of set memberships we have a cluster of four countries – Belgium, France, Norway and Switzerland – that are all members of the same type, which we dub ‘healthy and wealthy and secure relationships’. In each case, these countries are members of the material, relationships and behaviours sets. However, there are still some variations between them worth noting.
Though they fail to join the education set, each is a member of one component. Perhaps reflecting their corporatist roots, Belgium, France and Switzerland fail on enrolment, whereas Norway’s more social democratic route sees it succeed here; it just misses out on attainment, however, and so just misses membership of the near comprehensive set. We might also usefully note that Norway places much less emphasis on traditional family structures than the others, again linking with the typical features of their social democratic and corporatist welfare regimes. The similarities between the types of well-being in these nations – particularly Belgium, France and Switzerland – are interesting to note and cannot be easily detected in the original report card, where the three countries are presented as falling into three different shaded areas of the league table (based on the top, middle and bottom thirds of the ranking). What our analysis makes clearer is that the gaps in their child well-being are in the same area for this cluster of neighbouring countries. This, in turn, perhaps hints at a common weakness in the specific model of welfare in this cluster. We should also note that Belgium is on the borderline of the educational membership set, scoring 0.49 for this dimension. With rounding to 0.5 it is also, therefore, a member of an ideal type of its own, but to simplify our types here we group it in this ideal type.

5.6. Wealthy and secure relationships

Our remaining countries are all members of two or fewer sets. Austria and Germany, perhaps not surprisingly given the similarities in welfare regime, fall into a type not too dissimilar to that found in Belgium, France and Switzerland. Both are members of the material and relationship sets and we dub this type ‘wealthy and secure relationships’. We should note that Austria is on the borderline (scoring 0.49, which we round to 0.5) for the relationships set and so, technically, is also at the borderline of a separate type with membership of material well-being only. As above we simplify here and classify Austria only as a member of this ideal type.⁵

⁴ Finland thus presents a very good illustration of the limitations of our sample-driven direct method of calibration based on adjusted means and standard deviations. While not implemented for the reasons discussed previously, we appreciate that a calibration based on a more conceptual reasoning would be preferable and almost certainly lead to a much more precise separation of membership scores around the point of maximum ambiguity.

5.7. Happy and secure relationships

Italy and Spain are also members of two sets – subjective well-being and relationships – and we call their type ‘happy and secure relationships’. With regard to the relationships dimension, though, we should note that both score more highly on traditional family structures than on relations, although they are in the sets for both these indicators. This fits well with the ‘Mediterranean rim’ model of welfare perhaps, where traditional family values are both protected and valued. They remain some distance from membership of the other sets and both are particularly weak on education. Interestingly these two countries held high rankings in the report card (Spain 5th, Italy 8th) but it seems these high rankings stemmed from significant compensation effects, where very strong scores in family relations and aspects of subjective well-being (particularly, perhaps, those that are most likely to be connected with strong family structures) made up for some much weaker scores in other areas. Here we find a particularly useful example, therefore, of the advantages of a set based approach over an index based league table.

5.7.1. Traditional family values

Greece and Poland are members of another set where traditional family values appear to play a key role and, indeed, we label this type ‘Traditional Family Values’. Both have membership of the relationships and behaviours dimensions, though we should note that – in contrast to Italy and Spain – they are in the traditional family structures component but out of the family and peer relations one. This – and the membership of the behaviours set that might, in part, be a consequence of traditional family values – influences our naming of this ideal type.
The data on subjective well-being for Greece are worthy of attention: it ranked third on this dimension in the report card, but is out of this set in our study. Closer examination reveals that this is an outcome of our data reduction exercise, which pinpointed two components on which Greece’s performance varies widely: it is fully in the Positive Attitude to Life component but some way from membership of the Feeling of Belonging component. The raw data used for the report card show that there is a bifurcation in the data for Greece, so not only were compensation effects at play (the very strong scores for some indicators exaggerating the overall depth of subjective well-being) but some very useful contextual information is lost too. Indeed, the mismatch between, for example, the very high life satisfaction scores and the below average scores on feelings of loneliness in Greece is worthy of further investigation in its own right. It could be that cultural factors affect these results, or it could simply be that the former was based on responses for 11, 13 and 15 year olds and the latter only for 15 year olds, and that a finer grained analysis might find stronger levels of well-being amongst younger children in Greece.

5.7.2. Family focused

In Portugal, perhaps, we find a less extensive version of the model in Italy and Spain. Portugal attains membership of only the relationships set, but like Italy and Spain is a member of both components. It does not come close to entry in any other sets. Though it joins the same space in our Venn diagram as Greece and Poland, we highlight it separately here given it joins through a combination of both parts of the relationships dimension.

5.8. Human capital focused

Similarly, the Czech Republic attains membership of just one set: education. Again, it is some way from membership of other sets and we dub it ‘human capital focused’. The Czech Republic might be deemed a poor performer on this basis (as might others that are members of just one set) and we could, possibly, re-label them as ‘Near Comprehensively Low Well-Being’. Yet, in the original report card the Czech Republic out-ranked Austria (member of two sets) and France (member of three sets) largely on the basis of their widely varying educational scores.

⁵ See Footnote 1.
5.9. Comprehensively low well-being

Finally, we have those countries that were unable to establish membership of any set: we dub this type ‘Comprehensively Low Well-Being’, it being the polar opposite of our first type. Two countries are members: Hungary and the United Kingdom. This is consistent with the original report card, where the UK ranked bottom and Hungary third bottom (separated by the USA, which does not appear in our study).

6. Concluding discussion

What can we conclude about the potential for FSITA for comparative policy evaluation based on our attempt to repeat the Innocenti Report Card 7 study using this method? Firstly, we should not pretend that adopting an FSITA approach presents a ‘magic bullet’ for the macro-comparative analysis of child well-being. This is not insignificant: good policy analysis needs to speak to non-academic policy actors and the complexity of these methods (or, at the very least, their unfamiliarity) could be a major barrier to their usefulness. It is also true that many policy analyses use equally complex quantitative methods that policy makers rarely have the knowledge to interrogate, but that has not stopped the successful penetration of findings from such research into policy discourse.

Much then, perhaps, depends also on how clearly findings can be presented and articulated, and here we have the second key conclusion. There is certainly a raw appeal to the kind of league table produced in the report card. Indeed, this is one of the reasons the whole Innocenti Report Card series features them. Whether they are always helpful for the debate is another matter. Our typology of child well-being types breaks down complex data into a simpler set of summary headings and does so in a way that both (i) allows policy actors to see where the weaknesses in each country’s child well-being outcomes lie and (ii) provides some ‘model’ cases from which others can learn. Crucially, with respect to (i) this is a clear advantage over the league table approach and with respect to (ii) we have Sweden joining (in truth, replacing) the Netherlands as the exemplar.

This, in turn, links to a third key point. In the report card all data counted equally and, from our analysis, it seems that the true extent of child well-being in Scandinavia was underplayed because of ‘league points’ lost through the lack of traditional family structures. In allowing different routes to achieving membership of the children’s relationships dimension, the FSITA method has clear advantages over the linear approach, respecting diversity when there are clear conceptual reasons for so doing.

Our approach has also detected some cases where compensation effects allowed strong outliers in some parts of the data set to mask the true depth of well-being both across the board and within dimensions. The case of subjective well-being in Greece is particularly interesting, but in more general terms so too are the high overall rankings in the report card of Spain and Italy, which seem undeserved on the basis of FSITA, not least because there were clear weaknesses in some of the dimensions in these cases. The high league table ranking of Spain, in particular, might lead policy makers to falsely believe that their policy arrangements have provided comparatively strong protections across the board.

In short, much in our analysis suggests that FSITA methods allow for a qualitatively rich approach to cross-national policy evaluation that avoids some of the unintended consequences of decision making based on league tables. Our analysis also illustrated how complex data can be presented in an intuitively appealing fashion by means of a Venn diagram and – maybe more importantly – without the loss of crucial information typically involved in the production of one-dimensional league tables, particularly for cases with mixed outcomes across child well-being dimensions. Moreover, in offering conclusions that require some degree of interpretation, perhaps the approach encourages proper use of evaluation too, creating ‘tin openers’ that help policy makers discover where they should look rather than acting as illusory ‘dials’ that accurately record the speed of travel (Carter et al., 1995).

Acknowledgements

We would like to thank the anonymous reviewers and the editors of this journal for their constructive comments and helpful suggestions. We are also grateful to Dominic Richardson of the OECD for providing the original data set used for the UNICEF Innocenti Report Card. A very early version of this article was presented at the 3rd Conference of the International Society for Child Indicators, University of York, United Kingdom, 27–29th July 2011. We are grateful to those who attended this event for the useful feedback they provided.
Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.polsoc.2013.10.003.

References

Anscombe, F. J. (1973). Graphs in statistical analysis. American Statistician, 27(1), 17–21.
BBC News. (2007). Child study finds big divisions. BBC News Online, http://news.bbc.co.uk/1/hi/6360477.stm (14.02.07).
Bradshaw, J., Hoelscher, P., & Richardson, D. (2007). Comparing child well-being in OECD countries: Concepts and methods. Innocenti Working Paper, No. 2006-03, Florence: UNICEF Innocenti Research Centre.
Carter, N., Day, P., & Klein, R. (1995). How organisations measure success. London: Routledge.
Clasen, J., & Sigel, N. (2007). Investigating welfare state change: The dependent variable problem in comparative analysis. London: Edward Elgar.
Esping-Andersen, G. (1990). The three worlds of welfare capitalism. Princeton: Princeton University Press.
Eliason, S. R., Stryker, R., & Tranby, E. (2008). The welfare state, family policies and women’s labor force participation: Combining fuzzy-set and statistical methods to assess causal relations and estimate causal effects. In L. Kenworthy & A. Hicks (Eds.), Method and substance in macrocomparative analysis (pp. 135–195). New York: Palgrave/Macmillan.
Hudson, J., & Kühner, S. (2010). Beyond the dependent variable problem: The methodological challenges of capturing productive and protective dimensions of social policy. Social Policy and Society, 9(2), 167–179.
Knight, S. (2007). Outcry after Unicef identifies UK’s ‘failed generation of children’. The Times, http://www.timesonline.co.uk/tol/news/uk/article1384238.ece (14.02.07).
Kühner, S. (2007). Country-level comparisons of welfare state change measures: Another facet of the dependant variable problem within the comparative analysis of the welfare state? Journal of European Social Policy, 17(1), 3–16.
Kvist, J. (2006). Diversity ideal types and fuzzy sets in comparative welfare state research. In B. Rihoux & H. Grimm (Eds.), Innovative comparative methods for policy analysis: Beyond the quantitative–qualitative divide. New York: Springer.
Kvist, J. (2007). Fuzzy set ideal type analysis. Journal of Business Research, 60(5), 474–481.
OECD. (2011). How’s life? Measuring well-being. Paris: OECD.
Ragin, C. (2000). Fuzzy-set social science. Chicago: University of Chicago Press.
Ragin, C. C., & Sonnett, J. (2005). Between complexity and parsimony: Limited diversity, counterfactual cases and comparative analysis. In S. Kropp & M. Minkenberg (Eds.), Vergleichen in der Politikwissenschaft (pp. 180–197). Wiesbaden: VS Verlag für Sozialwissenschaften.
Ragin, C. C., Strand, S. I., & Rubinson, C. (2008). User’s guide to fuzzy-set/qualitative comparative analysis. Department of Sociology, University of Arizona. www.compass.org.
Rubinson, C., & Ragin, C. C. (2007). New methods for comparative research? In L. Mjøset & T. H. Clausen (Vol. Eds.), Comparative social research: Vol. 24. Capitalism compared: A symposium on methodology in comparative research (pp. 373–389).
Schnell, R., Hill, P. B., & Esser, E. (2011). Methoden der Empirischen Sozialforschung (Vol. 9). Munich: Oldenbourg Academic Publishers.
Stiglitz, J. E., Sen, A., & Fitoussi, J.-P. (2009). Report by the commission on the measurement of economic performance and social progress. http://www.stiglitz-sen-fitoussi.fr/en/index.htm (30.12.12).
UNDP. (2008). Human development report. New York: UNDP.
UNICEF. (2000). A league table of child poverty in rich nations, Innocenti Report Card No. 1. Florence: UNICEF Innocenti Research Centre.
UNICEF. (2001a). A league table of child deaths by injury in rich nations, Innocenti Report Card No. 2. Florence: UNICEF Innocenti Research Centre.
UNICEF. (2001b). A league table of teenage births in rich nations, Innocenti Report Card No. 3. Florence: UNICEF Innocenti Research Centre.
UNICEF. (2002). A league table of educational disadvantage in rich nations, Innocenti Report Card No. 4. Florence: UNICEF Innocenti Research Centre.
UNICEF. (2003). A league table of child maltreatment deaths in rich nations, Innocenti Report Card No. 5. Florence: UNICEF Innocenti Research Centre.
UNICEF. (2005). Child Poverty in Rich Countries, 2005, Innocenti Report Card No. 6. Florence: UNICEF Innocenti Research Centre.
UNICEF. (2007). Child poverty in perspective: An overview of child well-being in rich countries, Innocenti Report Card 7. Florence: UNICEF Innocenti Research Centre.
UNICEF. (2008). The child care transition, Innocenti Report Card 8. Florence: UNICEF Innocenti Research Centre.
UNICEF. (2010). The Children Left Behind: A league table of inequality in child well-being in the world’s rich countries, Innocenti Report Card 9. Florence: UNICEF Innocenti Research Centre.
UNICEF. (2011). Publications: Innocenti Report Card. UNICEF. http://www.unicef-irc.org/publications/series/16/ (accessed 24.07.11).
Womack, S. (2007). British youngsters get worst deal, says UN. The Daily Telegraph, http://www.telegraph.co.uk/news/uknews/1542649/British-youngsters-get-worst-deal-says-UN.html (14.02.07).