Journal of Statistical Planning and Inference 102 (2002) 189–210
www.elsevier.com/locate/jspi
Comparability in multi-country survey programmes

Vijay Verma ∗

ORC Macro International Social Research, Angel Corner House, 1 Islington High Street, London N1 9AH, UK

Received 1 June 1999
Abstract

The rise of multi-country programmes for the generation of comparable data is one of the most salient developments in the area of practical survey work. Starting from the central concept of “comparability”, this paper identifies a dozen criteria determining the degree of comparability achieved in multi-country survey programmes, and describes and characterises some twenty major such programmes in terms of those criteria. Multi-country survey programmes differ greatly in the relative importance given to international aspects concerning centralisation of operations and standardisation of procedures, on the one hand, and national aspects concerning their sensitivity to specific circumstances and statistical priorities of individual countries, on the other. The paper also identifies some special technical issues arising specifically from the international nature of the programmes. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Multi-country (international) survey programmes; Comparability; Harmonisation; Standardisation; International comparisons; National survey capability; Micro-data; Data access
1. Introduction: the rise of international survey programmes

It is an honour to dedicate this paper to the memory of Professor P.V. Sukhatme. Given his truly outstanding contribution to the development of international statistics, what more appropriate topic than the important but difficult issue of comparability in international survey programmes?¹

Diverse factors have contributed to the rise of international survey programmes, and there are many consequences of this development. “Though international comparisons in statistics ... have been made for a long time, the deliberate design of valid and efficient multinational surveys is new and is increasing” (Kish, 1994). This is in response to an expanding need for data for comparisons and cumulation across countries, and for monitoring trends across individual countries and groups of countries. This information need is manifest at both the national and the international levels.

(i) Countries need to assess their place in relation to other countries, especially their geo-political neighbours.
(ii) International agencies, and national agencies of some of the more developed countries as well, require similar data on different countries for their international programmes and policies.
(iii) This is helped by the development of an intellectual atmosphere in which researchers are increasingly looking for internationally comparable datasets. An international dimension lends greater weight and significance to their research and makes it more easily funded and more publishable.
(iv) For agencies such as the United Nations interested in promoting statistical development and capability, there is also the attraction of the economy of scale which standardisation of content, design and procedures offers.
(v) With increasing communication and exchange, there is increased scope for learning from others’ practices and joining in co-operative ventures.
(vi) Recipient countries see advantages in joining international survey programmes for the financial and technical support they offer.²
(vii) In addition, there can also be an element of “coercion”: data needs and research agenda of international agencies and researchers have, perhaps not so infrequently, determined the recipient countries’ participation in programmes which are not necessarily of high priority for the countries concerned.

∗ Tel.: +44-20-76751063; fax: +44-20-76751906. E-mail address: [email protected] (V. Verma).

¹ An earlier version of this paper formed an invited contribution at the Special Session in Memory of Professor P.V. Sukhatme at the Joint Statistical Meetings of the American Statistical Association, Anaheim, California, August 10–14, 1997, organised by the International Indian Statistical Association.

0378-3758/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0378-3758(01)00189-6
This increasing need/demand for more internationally comparable information and greater standardisation becomes effective to the extent that it attracts more resources, that methods and instruments addressed to specific problems of international research improve, and that the international capacity to organise and the local capacity to implement such research develop.

² There are numerous examples where national surveys could be undertaken only on the basis of assistance available through international programmes; and in many cases surveys, originally intended to be ongoing, were discontinued once the international assistance ceased. An interesting consequence has been that countries “that received technical assistance ... have reaped benefits not generally enjoyed by others who did not participate in multi-country survey programs and have, in the short run at least, leapfrogged [in certain aspects] a developing country such as India despite its long-standing tradition of survey technology” (Som, 1996).

1.1. Comparability of statistical data

An essential feature of any multi-country survey programme has to be comparability of the statistical data generated. What is comparability? What factors have made comparison important? How may comparable data be generated?

The comparability of statistical data refers to their usefulness in drawing comparisons and contrasts among different populations; it is a fundamental requirement for any data to be used in multi-population comparisons and contrasts. Though a complex concept, difficult to assess in precise or absolute terms independent of specific objectives of the analysis, comparability is an important and useful concept. By this we mean that data (estimates) for different populations can be legitimately (i.e. in a statistically valid way) put together (aggregated), compared (differenced), and interpreted (given meaning) in relation to each other and/or against some common standard. Comparability is a relative concept: we can only have “degrees of comparability”, not absolute comparability.

Most statistical datasets pertain to multiple populations. This is clear in the case of a dataset covering several countries, but the same applies to any dataset for a single country. We are invariably interested in diverse subpopulations defined in terms of characteristics of individuals, geographical or administrative divisions, and different time periods. Hence the total population has to be seen as made up of many subpopulations defined by diverse characteristics of the constituent units, spatial divisions and time segments. The results for these subpopulations (1) have to be aggregated to construct the total picture; (2) have to be contrasted to study differentiation; and (3) even for individual subpopulations, meaningful interpretation can only be given on the basis of shared concepts, definitions and classifications. A degree of comparability is the essential basis for all these three operations.

In relation to the basic requirements for generating comparable data, a distinction can be drawn between the measurement and estimation aspects of a data generation system.

Measurement aspects. These concern obtaining information on the given set of units in the study, such as units in a sample. These include definition of concepts, variables and survey population; methods of measurement and data collection; and the related substantive analysis. These should be strictly standardised so as to control (make similar) biases of measurement in the comparisons.

Estimation aspects. These concern drawing conclusions about the population which the observed units are meant to represent. These include sampling frames, sample size and design, many operational aspects, as well as weighting, estimation and other aspects of statistical analysis. Generally, these have to be chosen flexibly to suit the conditions and requirements of individual populations in the comparison. What is required is not identical procedures, but adherence to common standards.

Comparability requires control of the measurement aspects so as to ensure that the same type of information is obtained. In principle, the estimation aspects can be chosen flexibly without affecting comparability, as long as valid and common standards are followed.³

³ Kish (1994) has proposed a similar distinction between ‘survey aspects’ and ‘sampling aspects’; however, I find the above schema more general and clearer.

Standardisation. In addition, there are in practice often powerful reasons for aiming at standardisation and control of many details in surveys aiming at generating comparable data, going well beyond the development and provision of common concepts, definitions, survey instruments, and the main statistical outputs. Standardisation is a useful tool for ensuring that conditions for comparability are actually met. There is often also a considerable economy of effort in adopting a uniform package of procedures for data collection, processing and analysis, in contrast to custom-design for each case (Verma, 1985). We shall return to this in Section 4 below.

Harmonisation. A more general concept is termed harmonisation, which we take to encompass consistency, similarity, standardisation, etc., depending on the context. We may identify its various dimensions which influence comparability of the resulting data from different sources or from different populations: harmonisation of technical standards, of design, implementation and statistical analysis, of the format of the resulting micro-data, and so on.

In this paper, I will propose a number of criteria to do with comparability, harmonisation and standardisation (Section 2) to describe characteristics of some 20 international programmes of surveys (Section 3). In an international programme “a number of major issues arise that do not arise in individual country programs; it is thus more than the sum of its component country programs” (Som, 1996). Section 4 identifies some of these special issues. These issues have not received much attention in the literature.
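The two basic operations named in Section 1.1, putting estimates together (aggregation) and comparing them (differencing), can be illustrated with a minimal sketch. The function names, the use of population sizes as aggregation weights, the assumption that country samples are independent, and all figures below are illustrative assumptions, not taken from the paper.

```python
from math import sqrt


def pooled_estimate(estimates, weights):
    """Aggregate per-population estimates (e.g. means or proportions)
    into one figure, weighting each by its population size.
    The weights need not sum to 1; they are normalised here."""
    total_weight = sum(weights)
    return sum(w * e for w, e in zip(weights, estimates)) / total_weight


def difference_with_se(est_a, se_a, est_b, se_b):
    """Difference between two estimates and its standard error,
    assuming the two samples were drawn independently."""
    return est_a - est_b, sqrt(se_a ** 2 + se_b ** 2)


# Hypothetical figures for two countries, for illustration only.
rates = [0.20, 0.30]               # estimated proportions
populations = [1_000_000, 3_000_000]

pooled = pooled_estimate(rates, populations)               # about 0.275
diff, se = difference_with_se(0.30, 0.03, 0.20, 0.04)      # about 0.10 and 0.05
```

Both operations are only meaningful if the two proportions measure the same thing in each country, which is the point of controlling the measurement aspects; the arithmetic itself places no constraint on how each national sample was designed.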
2. Criteria for description and classification

There already exist a large number of important survey programmes with an international dimension. It is not possible here to list them all, even less to identify and describe their main characteristics. Nevertheless, I would suggest a dozen criteria in terms of which such programmes may be described and classified. The criteria fall into three groups: concerning programme objectives, scope and content; aspects of harmonisation; and technical assistance and financing (Table 1). The earlier noted distinction between the measurement aspects (which need to be strictly controlled) and the estimation aspects (which can be chosen more flexibly) provides a useful framework for assessing the extent to which cross-national comparability of survey results is achieved.

2.1. Objectives and scope

The objectives include, apart from (1) the generation of internationally comparable data, (2) the provision of information to meet specific needs at the country level, and (3) the enhancement of national capability for data gathering and research in the areas concerned. The specified aims of one of the best known international programmes, the World Fertility Survey, express this very succinctly:

“The explicit aims of the WFS project were to assist interested countries to describe and interpret the fertility of their populations by conducting scientifically designed sample surveys; to enhance national capability to undertake demographic surveys; and to produce internationally comparable data on human fertility”.

International programmes differ according to the relative importance given to each of these objectives. I admit that there are difficulties (and subjectivity) in classifying programmes according to these criteria. For one thing, there can be much debate and disagreement about the meaning of what constitutes “international comparability”, “specific national need as distinct from international requirements”, and above all, the “enhancement of national capability”. Perhaps any programme worthy of the prefix “international” can claim to have all these three objectives, but in Table 1 I have tried to indicate objectives which may be considered “significant enough” to serve as distinguishing characteristics.

Scope concerns (4) coverage and (5) content. Some programmes are truly international, covering national samples in many countries from several continents. Others are confined to particular regions or only to a handful of countries. The coverage may also be limited if the countries included lack fully national samples. Content refers to the extent to which more than one major topic is covered in the programme.
2.2. Harmonisation

The second set of criteria concerns the degree of harmonisation in various dimensions. As noted earlier, “harmonisation” is a more general concept, encompassing consistency, comparability, similarity, standardisation, etc., depending on the context. Its main dimensions identified in the table include the following.

(6) Harmonisation of standards, such as those concerning objectives, concepts, definitions, classifications, variables, measures or statistics, and the population and units of analysis. I would also include under standards the choice of units and methods of data collection and substantive aspects of data analysis. All these aspects need to be similar and controlled for the statistics generated to be comparable.

(7) Design concerns the creation of the survey structure for implementation of the common standards. It covers sampling and operational as well as substantive aspects: sample size, allocation and design; survey timing, fieldwork duration, reference periods and other temporal aspects such as sample rotation and overlaps; and also the translation of concepts and variables into actual survey questions, response categories, measurement scales, respondent rules, etc. All these aspects need to follow common technical requirements (such as the use of probability samples) for the resulting data to be comparable, but not all need be the same or standardised across the surveys for the purpose. Indeed, the choice of sampling aspects is determined by the requirement of statistical efficiency, and therefore should be flexible and differ as much as necessary to suit national conditions. However, there are also aspects of the design which need to be specified and controlled at the international level. If, for instance, the total budget and hence the overall sample size is fixed at the international level, it is clearly important in the programme to appropriately determine the allocation of the sample among individual countries.
More critical and general is the requirement to develop common questionnaires to ensure comparable operationalisation of common concepts and content.⁴ Another very important practical reason for standardisation of the design is the economy of effort, and the greater insurance it offers against serious errors at the national level. It may be useful to separate out substantive (mainly questionnaire) aspects of the design from statistical (mainly sampling) aspects. The former generally require more uniformity, the latter more flexibility.

(8) Implementation covers aspects such as recruitment and training of survey staff, organisation of fieldwork, supervision and quality control. International programmes differ in the extent to which direct influence and control is applied from the centre and common procedures are followed in different countries.

(9) Statistical analysis. This includes various steps involved in the preparation of analysable microdatasets from the survey, such as editing, imputation, variable construction and weighting, as well as statistical procedures for estimation, variance computation, evaluation of other errors and biases, tabulation and other forms of analysis. Some of these aspects may require uniformity to ensure comparability, but mostly specific procedures and tools can be selected flexibly to suit circumstances without affecting comparability, so long as certain basic standards are met. The justification for any operational standardisation has to do with efficiency of effort and the need to ensure uniform adherence to the specified standards.

(10) Datasets. The creation, maintenance and dissemination of standardised microdatasets is obviously a critical requirement for comparative research, yet this is far from being a universal feature of international survey programmes, as seen from Table 1. In organising the dissemination of micro-data, it is important to ensure easy and affordable access so as to promote and facilitate their use.

Table 1
Characteristics of some international survey programmes

Criteria (columns):
Objectives and scope: 01 International comparisons; 02 National data; 03 Survey capability; 04 International coverage; 05 Content (multi-topic)
Harmonisation: 06 Standards; 07 Design; 08 Operations; 09 Statistical analysis; 10 Datasets
Assistance: 11/12 Financial/technical

Programmes (rows):
Programmes with strong comparative dimension
1 Demographic and health surveys (DHS; INFS)
2 World fertility survey (WFS; CIFS)
3 European community household panel (EU-ECHP)
4 Contraceptive prevalence survey (CPS)
5 Gulf child health survey (GCHS; PAP-CHILD)
6 EuroBarometer (EB)
Post-hoc harmonisation for international comparisons
7 Luxembourg income study (LIS; PACO)
8 Time use surveys (TUS)
9 International social science programme (ISSP)
Programmes with stronger national dimension
10 EU labour force survey (EU-LFS)
11 EU household budget surveys (EU-HBS)
12 Reproductive health surveys (CDC-RHS; FPS)
13 Living standard measurement study (LSMS; SDA)
National programmes
14 ILO labour force surveys (ILO-LFS)
15 Income-expenditure surveys (ILO-IES)
16 Population, agricultural and other censuses (UN-Censuses)
17 Mid-decade indicator surveys (UNICEF-MIS)
18 Expanded programme of immunisation survey (EPI)
19 National household survey capability programme (NHSCP)

[Table 1 cell entries (∗∗, ∗, blank or ? for each programme against each criterion) are not reproduced: the column alignment of the ratings was lost in extraction.]
Comparative research can be greatly facilitated by the availability of data in a highly standardised format, with identically defined and structured and fully documented data files. However, it is also important to retain control so as to protect data confidentiality and guard against their improper use. These can be particularly sensitive considerations in an international programme, where ‘ownership’ of the data is shared in some manner between the international agency and individual countries (Verma, 1996a).

⁴ Here I speak of comparability, not necessarily of identity. Because of differing conditions and institutional frameworks, different formulations of the actual questions may well be required in different countries to obtain the same sort of information.

2.3. Technical and financial support

Finally, international programmes differ in the amount and form of technical and financial assistance, and this is generally an important factor affecting the degree of comparability actually achieved. It is useful to distinguish between (11) technical (advisory) assistance and (12) financial support to meet local operational costs. Some major programmes have fully financed both components. This tends to maximise international control over national surveys. More commonly, the contribution is confined mainly to technical assistance, while some international projects have been organised primarily on a cooperative basis among countries without external financing. Another important distinction can be whether the funding is provided centrally through the programme, or is secured at the level of individual countries from diverse sources. As a rule, the former permits greater control and central direction. Information on financing is less readily available than technical aspects of the programmes, and this makes our information in Table 1 rather tentative. I have also not attempted to distinguish between the two components (11) and (12).
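Criterion (7) in Section 2.2 notes that when the overall sample size is fixed at the international level, its allocation among countries has to be determined centrally. The paper does not prescribe a rule for this. The sketch below uses power allocation, a common compromise between equal and proportional allocation, purely as an assumed illustration; the function name, the `power` parameter and the population figures are hypothetical.

```python
def allocate_sample(total_n, populations, power=0.5):
    """Split a fixed total sample size across countries in proportion
    to population ** power.

    power = 1 gives proportional allocation (good for pooled estimates),
    power = 0 gives equal allocation (good for country comparisons),
    values in between are a compromise.
    Rounding means the allocated sizes may not sum exactly to total_n.
    """
    size_measure = {c: p ** power for c, p in populations.items()}
    scale = total_n / sum(size_measure.values())
    return {c: round(m * scale) for c, m in size_measure.items()}


# Hypothetical populations, for illustration only.
populations = {"A": 1_000_000, "B": 4_000_000, "C": 9_000_000}

compromise = allocate_sample(6000, populations, power=0.5)
equal = allocate_sample(6000, populations, power=0)
proportional = allocate_sample(6000, populations, power=1)
```

With these figures the square-root rule gives the smallest country one sixth of the sample rather than the one fourteenth it would receive under proportional allocation, which shows why the choice matters when country-level estimates are also required. The design within each country can still be chosen flexibly once its sample size is set.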
3. Characterisation of some of the main international programmes

International programmes have two different but related aspects: international and national. The international aspects are concerned with how the international survey is organised, its objectives, and the size, composition, form, and effectiveness of the financial and technical assistance provided, and the degree to which the resulting information is comparable across countries. The national aspects are concerned with issues such as the relevance of the survey results to the country; the relationship of the survey to other national statistical activities; the extent to which the survey procedures and arrangements are designed to promote the development of national capability and, in particular, are applicable to other survey work; and also the numerous technical issues of design, cost, implementation, and management of surveys at the national level.

Multi-country survey programmes vary greatly in the relative importance given to these two aspects. A few very comprehensive programmes have been strong in both international and national dimensions, while most have emphasised one or the other aspect. In the following, the programmes have been divided into four broad classes: (i) those particularly strong on international comparability, though generally also generating good national data; (ii) programmes primarily concerned with international comparisons, though mostly only through post-hoc harmonisation of outputs; (iii) programmes with a strong national dimension, but also achieving a degree of international comparability through the use of common standards; and (iv) those concerned primarily with national data, though supported internationally.

In characterising various international programmes in more detail in Table 1, I have tried to rate the significance (hopefully actual, not merely intended or claimed) given to each of the above dimensions in relative terms, roughly as high (∗∗), medium (∗), and low (no star). In relation to the harmonisation criteria, these reflect levels of standardisation or control. A few doubtful cases are indicated as (?): normally this indicates some level of presence, but lower than (∗). The most important distinction in the present context is in terms of the relative emphasis on the international (comparative) versus the national (country-specific) dimensions.
3.1. Programmes with strong comparative dimension

Programmes (1–5) in the first set include the generation of both internationally comparable and pertinent national data as major objectives, with a greater emphasis on the former. The Demographic and Health Surveys (1) and the World Fertility Survey (2) represent the most extensive and thorough programmes in this respect. The WFS has a special place because it was among the first major efforts of this type (Cleland and Verma, 1989); and the DHS because it follows the WFS lead and in certain respects goes even further, with somewhat greater standardisation of the designs on the one hand, and greater flexibility of the content to meet country-specific needs on the other. The DHS is also richer in content, especially in the area of health. In formal statements of their objectives, and in actual practice as well, both WFS and DHS also aimed to make a significant contribution towards improving national survey capacities in demographic research.

As important supplements to the DHS, we must also mention the Indian National Family Health Surveys. Highly comparable surveys based on the DHS model, but each with its own sampling and other aspects of design and separately organised survey operations, have been carried out in most Indian States.⁵ Given the size of the population covered and the size and number of the surveys, NFHS can be regarded as a comparative programme in its own right. Furthermore, in some respects its “international” dimension was stronger than that of DHS itself: as a result of stronger constraints on the overall budget, sample size and survey timing; a unified institutional structure; the use of common sampling, computing, technical assistance and other facilities; possibilities of much closer technical co-operation between survey teams in different Indian States; and the greater standardisation of objectives and procedures for similar reasons. Similarly, the WFS had an important supplement in the series of ten highly comparable In-depth Fertility Surveys in China conducted after formal conclusion of the WFS programme in the mid-1980s (China In-depth Fertility Surveys, 1986).

The European Community Household Panel (3) is a similarly standardised survey, but more limited in coverage, being confined to Member States of the European Union. However, the programme leaves survey organisation and execution to individual countries and, given the existing survey capacities of the participating countries, the building of national capabilities does not amount to a separate and significant objective of the ECHP.

The creation of highly standardised micro-data sets is an important output of all the above programmes. This is a crucial element of data comparability in practice. Central financial and technical support are the essential instruments in ensuring the high level of standardisation of the surveys in these programmes.

⁵ The first round of these surveys in all Indian States was conducted during 1992–93, and the second round in 1999 (National Family Health Surveys 1992–1993).
It is instructive to note the type of steps taken in, for instance, the ECHP to enhance cross-national comparability. From its inception, and in design and implementation, ECHP has been an EU-wide undertaking. Its design has a number of features introduced to enhance cross-national comparability: a common survey structure and procedures, in this case annual interviewing of a representative panel using specified follow-up rules; common sampling requirements and standards (concerning sample size, probability selection procedures, respondent and call-back rules, etc.), coupled with flexibility and variation in the actual designs to suit national conditions; common standards, and where possible common arrangements as well, for data processing and statistical analysis, including editing, variable construction, imputation, weighting, estimation and variance computation; standardised micro-data files; and common frameworks for analysis through a collaborative network of researchers. Of course, there remain some important limitations in the comparability of the information obtained in ECHP. This applies in particular in a number of countries (Belgium, The Netherlands, Luxembourg, Germany and the United Kingdom) where existing national panels, differing to some extent from the common model, had to be incorporated into the programme’s framework.

A central feature of the above programmes is the use of a common “blue-print” questionnaire which serves as the point of departure for all national surveys. The use of a common instrument ensures not only common concepts and content for the surveys, but also their common operationalisation. The development of a common questionnaire serves several objectives in multi-country surveys. It defines the information to be provided by national surveys in precise terms. That is, it can be read as a list of variables to be produced.
Common information requirements elaborated in the form of actual questions help to standardise basic concepts, definitions and classifications to be used, as well as the survey arrangements and methods of measurement. The blue-print questionnaire greatly facilitates the development of national questionnaires, since many aspects can be directly used or adapted. However, this requirement of comparability of the information generated does not necessarily imply the need to use identical questionnaires in all countries. On the contrary, because of differing legal and institutional frameworks, different questions are sometimes required in different countries to obtain the same sort of information (Verma and Clémenceau, 1996).

A number of other international programmes have been undertaken in a similar mode, but with generally more limited coverage and scope, and less developed elements of comparability/harmonisation/standardisation. Examples include the Contraceptive Prevalence Surveys (4) which preceded the DHS during the 1970s, and regional programmes such as the Gulf Child Health Survey and Pan-Arab Project for Child Development (5).

Finally, the EuroBarometer (6) represents a special type of arrangement. It consists of a series of biannual or even more frequent surveys with varying focus, carried out simultaneously in European countries, and financed by and conducted specifically for an international client (the European Commission). Highly standardised questionnaires, procedures and outputs are imposed, but generally the implementation is contracted out to independent agencies within countries. The surveys need to be completed quickly, and samples are usually small and may diverge more or less seriously from the requirements of probability sampling and national coverage.

3.2. Post hoc harmonisation for international comparisons

In contrast to the above, there are programmes (7–9) concerned with harmonising only the outputs from existing surveys or other sources. While the original surveys undoubtedly have national objectives, the harmonisation programme itself is concerned only with comparisons.

The Luxembourg Income Study (7) and projects such as PACO (PAnel COmparability project) attempt to create, post hoc, comparable micro-data from existing surveys with common content. When the basic survey content and structures are similar, the resulting standardised data may also be comparable, as in PACO which is based on similar household panel surveys from some Western European countries. But diversity of the input material limits comparability, as illustrated in the case of the Luxembourg Income Study. The LIS project began in 1983 with the stated aim to increase the degree of cross-national comparability through the creation of commonly defined variables on “comprehensive measures of income and economic well-being for a set of modern welfare states”, with the micro-data sets gathered in one central location (Luxembourg) (Luxembourg Income Study, 1988). However, major limitations to the comparability remain. The surveys cover quite a wide range of time periods, corresponding to different economic conditions in different countries. Furthermore, “the type of survey data used by LIS are not uniform in nature, purpose or objective. The lowest common denominator the LIS requires is the existence of a substantial level of detail concerning income sources and totals. The surveys themselves are quite diverse ... Some surveys are designed first and foremost to collect income data; others are derived from income-tax records; and still others come from special supplements to labour force surveys.
Some LIS datasets are based on income questions taken from expenditure surveys; : : : others are separate waves of household panel data; and still others are taken, at least in part, from government administrative data” (OECD, 1995). The Time Use Surveys (8) of the mid-1960s in various European countries represent a di1erent model: self-.nanced voluntary co-ordination among national surveys with own national design and execution, but with a high level of standardisation in the form and content of the outputs produced. Yet another model is provided by the International Social Science Programme (9), which co-ordinates research goals of existing projects, with the objective to add a cross-national perspective to individual national studies. There is talk of developing a European Social Survey, perhaps on similar lines. 3.3. Programmes with stronger national dimension While retaining international comparability as a goal, some other programmes (10 –13) put greater emphasis on data on individual countries.
Three major household surveys being conducted in the European Union countries provide rather contrasting pictures in terms of the standardisation of the content, design, organisation and other aspects which affect comparability of the results. In addition to the ECHP described above, these include the labour force and household budget surveys in European Union countries.

The European Labour Force Survey (10) is conducted quarterly or continuously in countries of the European Union. It closely follows the “labour force” approach (Hussmanns et al., 1990). A number of steps are taken to improve cross-country comparability: the use of common definitions and classifications; the recording of the same characteristics in each country; close correspondence between the common list of items and the national questionnaires; synchronisation of survey timing; and central processing of the common data by Eurostat (Eurostat, 1988). Nevertheless, it remains a fact that, while the surveys are standardised in terms of the concepts used and the variables generated etc., they lack the same standardisation in many other aspects affecting comparability: (i) the design of questionnaires (Bastelaer, 1992); (ii) basic structure such as the pattern of sample rotation over time; (iii) the mode, organisation and other “essential conditions” of data collection; (iv) the response rates achieved and the methods of dealing with non-response; and more generally, (v) the procedures for weighting and other aspects of the statistical analysis of the results. The lack of standardisation in how the basic concepts are operationalised in the form of actual questions is perhaps the most important aspect limiting comparability of the surveys (Verma, 1993).

Future directions of harmonisation of EU-LFS include two aspects.
(i) Consolidation of the existing surveys by “establishing a system of quality control whose essential aim is to determine the extent to which the national questionnaires and coding of data to Community format actually provide comparable data conforming to ILO recommendations and Community specifications ...”.
(ii) Defining a “target structure” to ensure convergence of future developments: “a target structure for a more frequent Labour Force Survey, with an improved measure of the volume of work and of under-employment, and computation of the [monthly] unemployment rates. The target structure will set out organisational ways and means for the survey (reference period, sample rotation, periodicity of results) together with the content and presentation of questionnaire” (Eurostat, 1995).

The European Household Budget Surveys (11) evolved basically as a set of independent national surveys with differing objectives, designs and approaches, yet the surveys are increasingly required to furnish comparable information across countries of the European Union. While major differences in the methodologies of similar surveys may reflect different objectives and conditions across countries, it is my belief that many of these differences are arbitrary and purely historical. A common search for improved methodologies is possible in many cases. Perhaps another factor contributing to this diversity is the lack of internationally accepted guidelines, at least of the type available for labour force statistics. The surveys differ in timing and frequency, there being three distinct patterns: (i) surveys carried out at five-yearly or longer intervals; (ii) annual surveys (generally with fieldwork throughout the year); and (iii) continuous
surveys with small samples where annual reporting involves pooling the data over more than one year. Generally, (iii) is becoming more common, replacing the other two, and (ii) more common, replacing (i). Data collection involves a combination of one or more personal interviews and diaries maintained by households or individuals, but this arrangement lacks a common structure and the length and type of the period for the diary recording vary greatly. More or less serious differences also exist in the concepts, variables and data collection methods. Sample designs and sampling arrangements also vary. Of course, this variation as such does not affect the comparability of results, except for the facts that in some cases probability sampling is lacking, non-response is high, and/or the range of sample sizes varies too much to suit the needs of comparative analysis (Verma, 1992).

Consequently, given the increasing need for comparative information and the lack of alternative sources, it has been necessary to undertake a progressive, step-by-step programme of work to enhance the quality and comparability of the existing national household budget surveys. This involves: better utilisation of existing survey data through harmonisation of classifications and coding of variables and archiving of micro-data; the construction of standard tabulations from all surveys; recommendations for the improvement of survey timing, content and methodology; and documentation and evaluation of the differences which still remain (Verma and Gabilondo, 1993). More recently, the recommendations on common methodology have been expanded to include: a greater emphasis on defining a common approach to measure consumption expenditure and income of households; refinement of a common set of variables to be constructed from each national survey; the introduction of a new harmonised nomenclature classifying consumption; and further development of a common survey model towards which some of the national surveys may evolve (Eurostat, 1997).

Another example of nationally oriented surveys, but with a degree of comparability, is provided by the Reproductive Health Surveys (12) supported in a number of countries by the US Centers for Disease Control, Division for Reproductive Health. While not formally constituting an internationally standardised survey programme, factors such as the use of common objectives, concepts, instruments and survey procedures often impart a significant degree of comparability between the surveys in different countries, particularly among countries in the same region. International funding support from a common source undoubtedly promotes uniform procedures and hence more comparable results. Technical assistance from a common source and transfer of experience from other participating countries are also significant factors. Here, for example, is an extract from a national RHS report. “The questionnaire was first drafted by CDC/DRH consultants based on a core questionnaire used in the 1993 Czech Republic RHS. This core questionnaire was modified, including the adding of modules targeted to explore important issues for Romania [and] then reviewed by Romanian experts ... as well as by AID [US Agency for International Development] and AID co-operating agencies who have worked in Eastern Europe” (Preliminary Report, Reproductive Health Survey Romania, 1993). The Family Planning Surveys constituted an earlier programme with similar characteristics.
The Living Standards Measurement Study (13) involves a series of complex multi-topic surveys, generally with strong international technical and operational control. Sometimes the important motivation is to meet international requirements for data on individual countries, as distinct from broader comparative objectives or more strictly national needs. The Social Dimensions of Adjustment was a similar undertaking, but confined to sub-Saharan Africa.

3.4. National programmes

The last set in Table 1 lists international programmes (14–19) but with predominantly national orientation. The ILO-supported labour force surveys (14) and income-expenditure surveys (15), and the population, agricultural and other censuses supported by the United Nations and its Specialised Agencies (16) provide the prime examples of nationally oriented programmes with wide coverage, and with fairly strong technical assistance and capability-building functions. Though generally international comparisons and aggregations are not explicitly the prime objective, the development and promotion of international standards has helped to impart a degree of comparability to the results.

Rather different are the Multiple Indicator Surveys (17) and the Expanded Program on Immunisation Surveys (18) programmes, both designed to meet urgent national needs quickly and cheaply. The strict standards and controls required for international comparability took second place behind these needs. The WHO-EPI involves thousands of surveys. In 1977 WHO’s Expanded Program on Immunisation established the objective of immunising all children throughout the world against six major childhood diseases. WHO developed a “quick and dirty” methodology for estimating immunisation levels of children in target areas. Mostly, a rigid design was used (30 sample clusters with 7 children in each cluster in each “target” area), with non-probability selection of units and limitations on the population covered.
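To give a feel for what the 30 by 7 design implies, the following minimal sketch computes the sample size per target area and a nominal 95% margin of error for an estimated coverage proportion. The design effect of 2, the coverage level of 50% and the function name are assumptions made purely for illustration and do not come from the text. Because the units were selected by non-probability methods, a variance formula of this kind is at best a rough guide.

```python
import math

def nominal_margin(clusters=30, per_cluster=7, p=0.5, deff=2.0, z=1.96):
    """Sample size and nominal margin of error for a cluster survey.

    deff (design effect) and p (coverage level) are illustrative
    assumptions, not values taken from the paper.
    """
    n = clusters * per_cluster
    # Simple-random-sampling variance of a proportion, inflated by deff
    standard_error = math.sqrt(deff * p * (1.0 - p) / n)
    return n, z * standard_error

n, margin = nominal_margin()
print(f"children per target area: {n}")
print(f"nominal margin of error: {margin:.3f}")
```

Under these assumptions the margin comes out at a little under 10 percentage points, which is adequate for monitoring a single target area but coarse for comparing countries.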
The UNICEF-MIS have been, on the whole, more scientifically designed and aimed at achieving a more representative national coverage. The surveys were undertaken in response to special data needs. At the beginning of the 1990s, most governments pledged themselves to a Declaration and Plan of Action for Children, setting up goals on child health and welfare for the year 2000. Subsequently, specific “Mid-Decade Goals” were identified as “the moral minimum that all countries need to achieve by the end of 1995 as stepping stones for the goals for the year 2000”. Since many countries lacked the basic statistics required for the measurement of progress (or lack of it) towards these goals, UNICEF promoted a programme of national surveys for the purpose. Given the severe constraints in technical and financial resources and in the available time, an easily manageable, practical approach had to be adopted for the design and implementation of these surveys, hoping at the same time that the information generated would be reliable enough for the guidance of social policy. To assist the process, technical guidelines on the content and design of the surveys were developed, but some technical deficiencies remain (Verma, 1996b).
Finally, the United Nations National Household Survey Capability Programme (19) followed quite different objectives and approach. Its stated aim was not to produce data on any particular topics, but to help countries formulate integrated national survey programmes to meet whatever national requirements were considered a priority by the country concerned, and through that to enhance the national survey capabilities (Rao and Verma, 1982). It developed and disseminated various methodological guidelines and reviews, but did not in itself result in the creation of internationally comparable surveys or datasets (United Nations Statistical Office, 1983).
4. Some special issues

Clearly, international survey programmes raise special issues and problems not normally encountered in single-country surveys. “Most problems of multinational designs also occur for major provinces of national surveys, especially when the provinces differ greatly in custom, conditions, languages. However, the co-ordination of multi-national surveys accentuates some of these problems and also raises new ones ... First, for each nation separate centres for decision making and policy aims must be found and convinced; also separate sources of funds; also institutions with separate survey capabilities. Second, separate survey designs must be completed with translation of concepts and questionnaires. Third, separate national samples must be designed and executed”.⁶ In this section, a few such special problems are identified.

⁶ Designs and uses of multi-population samples. I am grateful for the privilege to refer to this unpublished paper (Kish, 1997).

4.1. Diverse organisational and operational issues

Many of these problems are organisational and operational in nature (Som, 1996), such as

• Possible conflicts between objectives: national versus international data needs on the one hand, and data generation versus improving longer-term national capabilities on the other.
• Co-ordination between different programmes with differing objectives, modes of operation, funding, sponsors, etc. in the same countries.
• Synchronisation of the timing of surveys in different countries.
• Management and operational control: centralisation versus decentralisation; respective roles of the international and national organisations.
• The form and amount of international support: centralised programme funding versus funding at the level of individual countries; technical assistance versus operational support; long-term (residential advisors) versus more selective and targeted short-term consultancies; technical co-operation among countries themselves, etc.
The actual choices are likely to be highly situation-specific. But all such issues need to be made explicit and discussed in the planning of international programmes. As in most practical situations, acceptable compromises should be, and normally can be, achieved in the face of conflicting objectives. I would like to draw attention to four aspects in particular.

4.1.1. National capabilities

In relation to developing countries, international survey programmes must address squarely the issue of building national survey capability. Ignoring this issue would constitute a most serious flaw in any major international programme. Often, programme objectives in terms of data gathering, whether for national purposes or for international comparisons, are clearly specified, but the issue of capability is dispensed with in the vaguest of terms. Yet because of its complexity, this issue requires all the more careful consideration. All too frequently, any achievement in this area has resulted primarily from spontaneous factors and individual initiatives, rather than from deliberate planning (Verma and Palan, 1987).

4.1.2. The cost of international technical assistance

The “cost equation” in an international programme can be very different from that in independent national surveys. This is because of the high cost of international technical assistance in relation to the prevailing income and cost-of-living levels in many developing countries, and hence in relation to the local operational costs of the surveys. “It has never failed to surprise me how substantial a part [technical assistance] forms of the total assistance of externally supported projects, at least in the field of statistics. Indeed, from the international perspective, methodological issues should be concerned less with whether, for example, segmentation is more efficient than [complete] listing [of sample areas] in the sample design, than with whether the provision of technical assistance is managed in an efficient manner focused on critical areas, and whether institutional arrangements and methodologies are promoted in a way to minimise the need for such assistance in the future” (Verma, 1985).

4.1.3. The need for caution

I think that a good dictum in the choice of survey methodology and procedures is that “in the absence of sufficient knowledge, wisdom lies in being cautious”. The bigger the operation, the more important this principle becomes. It is important in national surveys in individual countries, but they still have more room for experimentation and for choosing efficiency over security; it is crucial in an international programme.

4.1.4. Standardisation

An important general methodological issue to resolve in any international survey programme concerns the balance between standardisation of design and procedures, on the one hand, and flexibility and variation, on the other. The latter may be required because of special needs
and circumstances, but it often also results merely from historical differences between national practices, and from more or less arbitrary preferences or even prejudices.

From a technical point of view, comparability in the resulting data may not be affected by variations in many aspects of statistical design and operations, so long as they meet certain standards. However, there are often powerful practical reasons for aiming at standardisation and control of many details in international programmes, going well beyond the development and provision of common concepts, definitions, survey instruments, and the main statistical outputs. This is especially useful when existing technical capability of individual countries varies and is inadequate in some.

Standardisation is generally economical and convenient. For an international organisation, there is considerable economy of effort in designing a uniform package of procedures for data collection, processing and analysis, in contrast to custom-designing survey tools and procedures for each country. For given resources, it can provide a much more intensive technical support to country surveys. If implemented democratically (after proper consideration and discussion, consensus if possible), it enhances the chance of selecting “the best practices”. In any case, it lessens the risk of failure.

But standardisation can also make the programme less sensitive to national needs and circumstances, and less flexible. Undue insistence on uniform methods and procedures can be wasteful, and more critically, can damage comparability by failing to take into account differences which must be compensated for. Also, it can limit the scope of the programme’s broader contribution. Permanence, building of capability, integration and co-ordination at the national level are increasingly important in choosing the appropriate design and methodology as the number of surveys within each country increases.

Hence the form and degree of standardisation in survey design and operations in an international programme are crucial both from the international angle and from the national. It is important to appreciate the nature, extent and rationale of this standardisation as it defines both the strengths and limitations of the programme.

4.2. Sampling issues

New and diverse sampling issues arise because of multiple objectives in an international programme. The results are required at different levels: at the level of individual countries; for comparisons between countries; for aggregation over regions or other groups, including the world level; comparisons between such aggregates; and comparisons and aggregations over time, especially when the coverage of the same countries extends over many years.

4.2.1. Coverage

For building regional or other aggregates, we require “representative” samples from those domains. But since in almost all situations the sampling procedures must take individual countries as the design and implementation domains (world or regional samples in that sense do not exist, as yet), this requires complete coverage of all the countries in the aggregate, and also probability samples with national coverage
in each country. These requirements also apply to comparisons and country-specific analysis. Yet in most international programmes, countries participate by choice and complete coverage cannot be achieved. The impact is greater when the largest countries have to be excluded (such as India and Brazil from the World Fertility Survey, or China from the Demographic and Health Surveys, otherwise the most extensive programmes). The problem is accentuated when, because of financial or other practical reasons, the coverage within countries is also restricted, such as the surveys being confined to particular regions, urban areas, selected cities or other “sites” in the country. How to produce statistically valid, or at least “acceptable” estimates with such incomplete coverage and/or shortfalls from probability sampling? Yet such estimates are and must be produced.

4.2.2. Sample size and allocation

The problem of sample allocation among countries does not always arise explicitly in an international programme, because the total “programme” size may not be fixed and often decisions have to be made country-by-country. This certainly has been the case with most major survey programmes. Yet appropriate allocation is important for efficiency of comparisons and aggregations in an international programme. What appropriate criteria should be used, and can actually be used in practice? My own preference would be to start by determining the minimum size required for any country to produce the essential country-level estimates and comparisons; the maximum size affordable and manageable for any country, irrespective of its size; and some simple allocation within that range as a function of country size. Other models have been proposed (Kish, 1988).

4.2.3. Weighting of units

There is a choice in the type of units to be used in aggregating results over countries. The most common practice in pooling data across countries is to take individual persons (or households) as the units, i.e., aggregate national results weighted according to population size; however, where there are large differences in country sizes, this means that the results are largely determined by the large countries, and the samples from the small ones are largely wasted. Others may argue that the primary interest in an international survey is the comparison between countries and the construction of the picture of the “average country”; with this objective, each country should be given equal weight, irrespective of its population size. But neither of these two extremes may be suitable in all circumstances. Would it not be more meaningful in certain circumstances to look for appropriate compromises in the choice of analysis units, i.e. in the choice of weights to be given to constituent country results? The problem is accentuated when survey coverage is incomplete, especially when the biggest countries are missing (Verma and Pannuzi, 1996).
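As a rough illustration of Sections 4.2.2 and 4.2.3, the sketch below first allocates country sample sizes between a minimum and a maximum as a simple function of country size, and then pools country estimates under weights proportional to population raised to a power alpha. The square-root allocation rule, the power family of weights, the function names and all the figures are assumptions made for illustration only; the text does not prescribe any particular formula.

```python
def allocate_sample_sizes(populations, n_min, n_max, power=0.5):
    """Sample size per country, clipped to the range [n_min, n_max].

    The rule (largest country gets n_max, others scale with
    population ** power) is a hypothetical choice for illustration.
    """
    largest = max(populations.values())
    sizes = {}
    for country, pop in populations.items():
        raw = n_max * (pop / largest) ** power
        sizes[country] = round(min(n_max, max(n_min, raw)))
    return sizes

def country_weights(populations, alpha):
    """Weights proportional to population ** alpha, summing to 1.

    alpha = 1 gives population weights (persons as units);
    alpha = 0 gives equal weights (countries as units);
    values in between are one possible compromise.
    """
    raw = {c: pop ** alpha for c, pop in populations.items()}
    total = sum(raw.values())
    return {c: w / total for c, w in raw.items()}

def pooled_estimate(estimates, weights):
    """Weighted average of country-level estimates."""
    return sum(weights[c] * estimates[c] for c in estimates)

# Invented populations and country-level proportions
populations = {"A": 1_000_000, "B": 4_000_000, "C": 100_000_000}
estimates = {"A": 0.30, "B": 0.40, "C": 0.10}

sizes = allocate_sample_sizes(populations, n_min=2000, n_max=10000)
print(sizes)
for alpha in (1.0, 0.5, 0.0):
    weights = country_weights(populations, alpha)
    print(alpha, round(pooled_estimate(estimates, weights), 3))
```

With population weights the pooled figure sits close to that of the largest country, while with equal weights it is the simple mean of the three countries; intermediate values of alpha fall between the two.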
4.3. Access to and use of micro-data

Carefully constructed and well-documented micro-data sets, rather than conventional publications of detailed cross-tabulations, are increasingly becoming the most relevant output from statistical surveys. Both the production of micro-data and the conditions for their distribution and use are of crucial importance in international programmes, because, by definition, their uses and users must cut across national boundaries. Traditionally, national practices concerning access to official statistics often tend to be highly restrictive, especially (but not only) in developing countries, and especially concerning data transfers across national frontiers. On the other hand, international agencies normally encourage, and can insist upon when they control the funding for the survey operation, the dissemination of micro-data.

The major concern is of course to protect confidentiality of the data. In this context, I would like to point to a common misunderstanding of what is really needed: a misunderstanding which has had a very serious negative effect on data availability in a number of countries (Verma, 1998). It is extremely important to note a major difference between social data based on sample surveys of small and numerous units such as households and persons, on the one hand, and data involving complete enumeration or pertaining to a small number of large units where there is a real danger of exposure at the individual level. Confidentiality rules which are developed for the latter type of situation are too often inappropriately applied to restrict the accessibility to microdata for the former.

Another aspect which acquires added importance in multi-country programmes is that of data archiving. This is because internationally comparable data have more diverse users and uses. There are a number of technical and organisational aspects involved in the establishment of an archive. It deserves emphasising that the development and servicing of a data archive goes considerably beyond merely processing the data for one-time use, such as for the production of a report from a single survey. Most data are of much longer-term and wider value, and need to be used repeatedly and in conjunction with other data. These requirements can only be met through the establishment of a proper facility fulfilling a number of technical requirements, such as the following:

• high standards of data preparation and documentation;
• construction of analysis files: restructuring, imputations, derived variables, etc. to facilitate data use;
• standardisation and linkages: a system for storing and retrieving data according to a common structure which facilitates data access and matching;
• quality control, which needs to be more thorough and systematic for data for general distribution, especially for comparative analysis using data from diverse sources;
• management, e.g. the establishment of criteria and mechanisms for data distribution, keeping accurate records of the data sets and versions, providing information to users on data errors, special features, conditions determining availability, etc.;
• technical support to users; and
• the establishment of appropriate procedures for data access.
References

Bastelaer, A. van, 1992. Differences in the measurement of employment in the Labour Force Surveys of the European Community. Training of European Statisticians. Seminar on International Comparison of Survey Methodologies, Athens.
Cleland, J., Verma, V., 1989. The world fertility survey: an appraisal of methodology. J. Am. Statist. Assoc. 84 (407), 756–768.
China In-depth Fertility Surveys, 1986. Beijing: State Statistical Bureau.
Eurostat, 1995. The Future of European Social Statistics: Guidelines and Strategies. Series 0D.
Eurostat, 1988. Labour Force Survey: Methods and Definitions.
Eurostat, 1997. Household Budget Surveys in the EU: Methodology and Recommendations for Harmonisation. Series 3E, ISBN 92-827-9805-4.
Hussmanns, R., Mehran, F., Verma, V., 1990. Surveys of the Economically Active Population, Employment, Unemployment and Underemployment: An ILO Manual on Concepts and Methods. International Labour Office, Geneva.
Kish, L., 1988. Multipurpose sample designs. Survey Methodology 14 (1), 19–32.
Kish, L., 1994. Multi-population survey designs: five types with seven shared aspects. Int. Statist. Rev. 62 (2), 167–186.
Kish, L., 1997. Designs and uses of multi-population samples. Draft, unpublished paper.
Luxembourg Income Study, 1988. LIS Information Guide. Working Paper 7.
National Family Health Surveys, 1992–1993. Mumbai: International Institute of Population Sciences, 1995.
OECD, 1995. Income Distribution in OECD Countries: Evidence from Luxembourg Income Study. Social Policy Studies, no. 18.
Preliminary Report, Reproductive Health Survey Romania, 1993. Institute for Mother and Child Care, Ministry of Health, Romania, and Division for Reproductive Health, Centers for Disease Control and Prevention, USA, January 1994.
United Nations Statistical Office, 1983. National Household Survey Capability Programme: Selected issues of design and implementation. Bull. Internat. Statist. Inst. 50 IP(2), 1365–1380.
Rao, V.R., Verma, V., 1982. The United Nations National Household Survey Capability Programme. Special Session on International Statistical Programs, JSM/ASA, Cincinnati.
Som, R.K., 1996. Practical Sampling Techniques, Appendix VI. Marcel Dekker, New York.
Verma, V., 1985. WFS survey methods. In: Cleland, J., Hobcraft, J. (Eds.), Reproductive Change in Developing Countries. Oxford University Press, Oxford.
Verma, V., 1992. Household surveys in Europe: some issues in comparative methodologies. Training of European Statisticians. Seminar on International Comparison of Survey Methodologies, Athens.
Verma, V., 1993. Comparative surveys in Europe: problems and possibilities. Bull. Internat. Statist. Inst. 55 CP(2), 527–528.
Verma, V., 1996a. The strategy for making the microdata base accessible to users: dual requirement – promotion and direction. In: Gais, B. (Ed.), The Future of European Social Statistics: Use of Administrative Registers and Dissemination Strategies. Luxembourg: Office for Official Publications of the European Communities, Series 0D, pp. 161–165.
Verma, V., 1996b. Sampling for Unicef’s Mid-decade Surveys. Essex’96: International Social Science Conference, University of Essex.
Verma, V., Clémenceau, A., 1996. Methodology of the European Community Household Panel. Statistics in Transition 2 (7), 1023–1062.
Verma, Gabilondo, 1993. Family Budget Surveys in the EC: Methodology and Recommendations. Luxembourg: Statistical Office of the European Community, Series 3E.
Verma, V., Palan, V.T., 1987. Contribution to survey capability. In: Cleland, J., Scott, C. (Eds.), The World Fertility Survey: An Assessment. Oxford University Press, Oxford.
Verma, V., Pannuzi, N., 1996. Procedures, analytical methods and comparisons in the European Community Household Panel. Social Science Research Conference, University of Sienna.
Verma, V., 1998. Data sources and access for comparative analysis. In: Walker, R., Taylor, M. (Eds.), Information Dissemination and Access in Russia and Eastern Europe. IOS Press, Amsterdam.