Outcomes and effectiveness in reproductive health


Soc. Sci. Med. Vol. 47, No. 12, pp. 1925–1936, 1998
© 1998 Elsevier Science Ltd. All rights reserved
Printed in Great Britain
PII: S0277-9536(98)00334-7
0277-9536/98/$ - see front matter

OUTCOMES AND EFFECTIVENESS IN REPRODUCTIVE HEALTH

WENDY J. GRAHAM*

Dugald Baird Centre for Research on Women's Health, Department of Obstetrics and Gynaecology, Aberdeen University, Cornhill Road, Aberdeen AB25 2ZD, U.K.

Abstract – Measuring reproductive health is problematic. Awareness of the problems needs to be raised both among those collecting and those using data on reproductive health. This paper discusses two major measurement questions – one related to ascertainment and the other to attribution. The first question is to what extent the observed levels and patterns of reproductive health outcomes in women are valid as opposed to artefacts of the data sources and the data collection methods? The second question is can lack of evidence of effectiveness for any reproductive health intervention ever confidently be separated into no effects vs an inability to measure effects? Determining the effectiveness of health interventions is notoriously difficult. Reproductive health may not be a case for special pleading in the competition for scarce resources, but equally it should not be a case for special standards of proof of the effectiveness of interventions – standards which have not indeed been met by many other, and yet unquestioned, health care priorities. "What works" in reproductive health should in fact be judged from at least four different perspectives: from that of women and their families, health professionals, the scientific community, and national and international policy-makers. © 1998 Elsevier Science Ltd. All rights reserved

Key words – Reproductive health, Measurement, Effectiveness, Evidence, Outcomes

INTRODUCTION

Measuring women's reproductive health is problematic. Whether or not it is a case for special pleading – in other words, whether it is more difficult than other dimensions of health or indeed men's reproductive health – is not a particularly useful line of enquiry. Energies are better spent on acknowledging the measurement barriers and their implications. But why are these barriers important and is there any point to discussion at a general, global level?

Measuring women's reproductive health is necessary for two main reasons – to assess levels of health and so identify needs and priorities for action, and to evaluate changes in health, particularly in relation to actions taken. Assessing needs and evaluating effectiveness are relevant to all "providers" of health care – whether "providers" are defined narrowly or broadly, operating formally or informally, and locally or internationally.

Barriers to measurement affect both the quality and the scope of information on women's reproductive health. Review of these barriers is worthwhile from a global perspective since women's reproductive health has recently been elevated in the consciousness and action agenda of governments throughout the world, in part through the International Conference on Population and Development (ICPD, 1994) in Cairo in 1994. Although specific priorities clearly do vary between parts of the world, women's reproductive health outcomes exist on a continuum which is common to all populations – a continuum from death to complete physical, social and mental well-being. The aim of this paper is to highlight two measurement-related challenges of global relevance: the ascertainment of women's reproductive health outcomes and their attribution.

*Tel.: +44-1224-681818 ext. 53924/53621; Fax: +44-1224404925; E-mail: [email protected].

THE CONTINUUM OF REPRODUCTIVE HEALTH OUTCOMES

Defining what is encompassed by the term "reproductive health" has occupied much attention in the literature over the years. In many industrialised countries, it is still regarded as essentially synonymous with the specialty of obstetrics and gynaecology. This clinical view of reproductive health is manifest both in a focus on negative outcomes, and in the separation of services dealing with pregnancy and delivery, sexually transmitted diseases, adolescent health, gynaecological disorders, family planning and psycho-sexual health. A more comprehensive view of reproductive health has emerged over the last ten years, as now summarised in the definition from the Cairo conference:

Reproductive health is a state of complete physical, mental and social well-being and not merely the absence of disease or infirmity, in all matters relating to the reproductive system and its functions and process.
ICPD Programme of Action (ICPD, 1994, paragraph 7.2)


Translation of this definition into areas of effective service provision is well-recognised to present a serious challenge, and for many populations integrated reproductive health programmes are unlikely to become a reality for many years to come (McGinn et al., 1996). Some progress can, however, be regarded as having taken place by virtue of this broadening of the definition. It is clear that, previously, some aspects of reproductive health simply did not feature on the agenda for action because of the narrower perspective. These aspects had fallen foul of a measurement trap (Graham and Campbell, 1992) whereby lack of information was both the cause and effect of continuing ignorance of the subject. Domestic violence and female genital mutilation are just two examples of these "hidden burdens" on women's reproductive health. Although there is certainly a greater global awareness of these topics now than say five years ago, there is still the potential for them to slip from view in some of the approaches recently adopted for assessing the global burden of disease (see for example, Barker and Green, 1996; Murray and Lopez, 1996).

The Cairo definition of reproductive health has not yet been translated into a list of outcomes which are unequivocally regarded as priorities for action. There is, however, a broad consensus, at least at the international level, on the programme or service areas encompassed by "reproductive health", as indicated in Table 1. Looking across these programme areas, it is possible to recognise a continuum of reproductive health outcomes which would fall within their remit. The desire to measure health outcomes has become a world-wide preoccupation in the 1990s and an extensive literature has emerged on the various conceptual and methodological issues. Whilst definitions of "outcome" are thankfully still not as common as those for "health", there has been a proliferation of terminology.

Table 1. Components of a comprehensive reproductive health programme
* Safe motherhood
* Health of the newborn and breastfeeding
* Family planning
* Abortion care
* Prevention and management of STDs/HIV/AIDS
* Infertility
* Adolescent reproductive health
* Maternal nutrition
* Female genital mutilation
* Violence against women
* Reproductive tract cancers

An "outcome" is still generally regarded as an endpoint or result, but it is increasingly acknowledged that a health "outcome" can be used to refer to a change in health status even though the causal factors (the attribution) are not measured or even known (Shanks and Frater, 1993); the uncertainty specifically surrounding the attribution of reproductive health outcomes will be returned to later in the paper.

So what is the range or continuum of outcomes through which reproductive health is manifest? The idea of a continuum from negative to positive health outcomes is not a new one (Chiang and Cohen, 1973; Bowling, 1991). Neither is the recognition of the predominance given to the negative end of the spectrum (Hansluwka, 1985), which is sometimes referred to in terms of the "5 D's" – death, disease, disability, discomfort, dissatisfaction (Siegmann, 1979). More recent interpretations of the continuum are found in the "disability weights" used in the calculation of disability adjusted life years (DALYs), which range from death to "perfect health" (World Bank, 1993). Figure 1 is a schematic representation of the continuum of reproductive health outcomes. Examples of outcomes in the positive half of the continuum remain essentially speculative. Although the lack of positive health outcomes has been decried for many years and across many health fields, real progress at both conceptual and methodological levels has been dismal; reproductive health is no exception. Concerted research effort is needed if those activities related to well-being, as opposed to ill-health, which were endorsed in the ICPD Plan of Action are to be tracked for progress.

The prevalence of the broad categories of reproductive health outcomes flagged in Fig. 1 clearly varies enormously between populations throughout the world. The patterns are fairly predictable at the negative end, but less so with progression towards the positive end. Thus, for example, the level of maternal mortality is generally highest in the poorest countries with the least developed health infrastructure, but it does not follow that positive reproductive health outcomes are most prevalent in the richest countries.
Indeed, there is ample evidence both of high levels of dissatisfaction with general health status among affluent populations and, conversely, of "good health at low cost" in comparatively poor countries, such as Costa Rica and Sri Lanka (World Bank, 1993). The reproductive health continuum can also be linked with the concept of the epidemiological transition, with population health profiles shifting away from the negative extremes as these outcomes become rarer over time. Such a shift also occurs in relation to the non-reproductive health outcomes, and the balance of attention within women's health is clearly different between industrialised and lesser industrialised nations (Macran et al., 1996; Paolisso and Leslie, 1995). The extent to which a shift along the continuum of reproductive health outcomes is possible for many of the poorest countries is, of course, highly debatable. Certainly there is little evidence of such a shift according to the latest figures for one key indicator at the negative end of the spectrum – maternal mortality (WHO, 1996). But can we rely on these figures? Are the measures we use of reproductive health outcomes reliable and sensitive to change? This brings us to the measurement-related challenge of ascertainment.

Fig. 1. The continuum of reproductive health outcomes

ASCERTAINMENT OF REPRODUCTIVE HEALTH OUTCOMES

For the majority of outcome measures for reproductive health, it is important to start by acknowledging the difficulty of disentangling the "truth" from the artefacts of the data source and the data collection method. This is a dilemma regardless of the intended use of the information – be this, for example, needs assessment or service evaluation. Difficulties of ascertainment occur along the continuum of outcomes, although their relative importance will differ between populations depending on the health profile. In Latin America, for instance, one important question is whether the reported patterns of abortion-related mortality are reliable or primarily reflect the highly selective routine statistics which are available. Another example further along the continuum is menorrhagia which accounts for a significant proportion of gynaecological outpatient consultations in the United Kingdom (Nuffield Institute for Health et al., 1995). But how do you measure the discomfort women feel? Are the reports women give in outpatient clinics largely a reflection of the manner in which the question is asked or the sensitivity of the person asking it, and what effects can this have on treatment offered? For instance, the socio-economic differentials among women in the United Kingdom undergoing hysterectomy or dilatation and curettage (Kuh and Stirling, 1995) may be due partly to variations in the process of ascertainment – a process influenced by women's education and social class, and complicated by differences between professionals in their tendency to intervene. Moving further towards the positive end of the continuum, the difficulty is essentially one of lack of outcome measures rather than inappropriate methods or sources.

It is the two negative extremes of the continuum depicted in Fig. 1 – death and disease – which remain the focus of attention for health service provision throughout the world, even in those settings in which mortality and specific morbidity outcomes, such as syphilis, are now extremely rare. These are sometimes referred to as "hard outcomes", implying a degree of objectivity apparently lacking in "soft outcomes", such as dissatisfaction. Although the appropriateness of the word "hard" can certainly be challenged in this sense, mortality and morbidity outcomes are certainly hard to measure. The measurement difficulties can be related to both the source of the data and the data collection method, as indicated in Table 2. Clearly, the purpose for which the data are required is an intermediary here since some uses require, for example, greater precision and representativeness than others – a subject we will return to shortly. Nevertheless, there are four main barriers which emerge across the data sources and collection methods: rarity, validity, reliability and selectivity.

Table 2. Measurement constraints on ascertaining mortality and morbidity outcomes, by data source and data collection method

Record review
* Mortality, health facilities – review of medical records: misclassification; quality of records; selectivity; availability; small number of deaths
* Mortality, "community" – not applicable
* Morbidity, health facilities – review of medical records: quality of records; availability; selectivity
* Morbidity, "community" – review of patient-held records: selectivity (only those using facilities and retaining records); quality of records; large sample sizes

Interviews
* Mortality, health facilities – interviews with health workers: reliability of reports; choice of respondent; selectivity; small number of deaths
* Mortality, "community" – interviews with relatives, household members: large sample sizes; reliability of reports; choice of respondent
* Morbidity, health facilities – interviews with patients as part of medical consultation or exit interviews: validity and reliability of self-reported morbidity; selectivity
* Morbidity, "community" – interviews with women using structured questionnaires or open-ended questions: validity and reliability of self-reported morbidity; large sample sizes

Observation/physical examination
* Mortality, health facilities – autopsies: comparative rarity of deaths and autopsies; selectivity
* Mortality, "community" – not applicable
* Morbidity, health facilities – examination of women: selectivity; skills of examiner; reliability
* Morbidity, "community" – examination of women: refusals to participate; need for trained fieldworkers; reliability

Laboratory/physical test
* Mortality, health facilities – not applicable
* Mortality, "community" – not applicable
* Morbidity, health facilities – tests conducted on women: selectivity; validity and reliability of tests/instruments; availability of test results
* Morbidity, "community" – tests conducted on women: refusals to participate; need for trained fieldworkers; validity and reliability of tests/instruments; large sample sizes; field logistics

Rarity is an obvious difficulty but one which is still not fully appreciated, particularly among initiatives to reduce maternal morbidity and especially mortality. Irrespective of whether these outcomes are being measured to establish current levels or for looking at trends, rarity has serious implications for sample sizes and this in turn has repercussions for data collection. In order to accumulate sufficient cases to produce stable estimates, reference periods often need to be extensive. This may compromise the reliability of the data owing to recall errors, and prevents the creation of an up-to-date picture. Both the qualitative and quantitative methods as well as the sources used to explore rare outcomes often involve a compromise between scientific and practical concerns. The sisterhood method, for instance, provides a simple and comparatively low-cost approach to deriving a community-based estimate of maternal mortality (Graham et al., 1989), and has been used widely since its development in the late 1980s (for instance, David et al., 1991; Shahidullah, 1995; Danel et al., 1996). The method is able to overcome the problem of rarity of events by using an unrestricted recall period and thus accumulating maternal deaths over many years. The price to be paid, however, for the small sample size requirements of the sisterhood method is that the resulting estimate of maternal mortality is not a current one. This therefore makes the method unsuitable for measuring short-term programme impact (Graham, 1994; Hanley et al., 1996).

The characteristics of measurement referred to as validity and reliability are usually discussed together, although they are quite different concepts. Validity refers to the extent to which the measurement tool used actually taps or represents reality, whereas reliability is the extent to which the tool produces the same result when used more than once to measure precisely the same phenomenon (Veney and Kaluzny, 1984). In Table 2 it can be seen that they tend to operate jointly as constraining factors, particularly in relation to the measurement of self-reported morbidity.

Growing recognition of the difficulties of measuring mortality outcomes has encouraged the search for alternative indicators at the negative end of the continuum.
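
The effect of rarity on sample size, discussed above, can be illustrated with a rough calculation. The sketch below is not taken from the studies cited; it assumes that the count of maternal deaths behaves approximately as a Poisson variable, so that the relative standard error of a mortality ratio is about one over the square root of the number of deaths observed. The function name, the 20% relative margin and the two ratios used (500 and 10 per 100,000 live births) are illustrative assumptions only.

```python
import math


def births_needed(mmr_per_100k, relative_margin, z=1.96):
    """Approximate deaths and live births needed to estimate a maternal
    mortality ratio to within +/- relative_margin (e.g. 0.2 for 20%).

    Assumption: deaths are Poisson-distributed, so the relative standard
    error of the ratio is roughly 1 / sqrt(number of deaths).
    """
    # Deaths required so that z * (1 / sqrt(deaths)) <= relative_margin.
    deaths = math.ceil((z / relative_margin) ** 2)
    # Live births expected to yield that many deaths at the given ratio.
    births = math.ceil(deaths * 100_000 / mmr_per_100k)
    return deaths, births


# Hypothetical high-mortality and low-mortality settings.
high = births_needed(500, 0.2)
low = births_needed(10, 0.2)
print(high, low)
```

Under these assumptions the number of deaths required is the same in both settings, but the number of births that must be covered to observe them rises fifty-fold as the ratio falls from 500 to 10 per 100,000, which is why extended reference periods or very large samples become necessary.
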
In developing country settings, concerns over the accuracy and selectivity of facility-based sources of data on reproductive morbidity have encouraged researchers to explore the use of women's self-reports on health (Bulut et al., 1995; Stewart and Festin, 1995; Filippi et al., 1997; Goodburn and Graham, 1997; Zurayk and Kabakian, 1997). These studies have developed questionnaires and health examination protocols, and tested their validity and reliability, both for morbidities related to pregnancy and for other broader aspects of reproductive ill-health. Although they have used separate tools, which obviously undermines direct comparisons, the studies do yield two general observations: considerable variability between different conditions with regard to validity and reliability, but also overall unsatisfactory degrees of sensitivity and specificity. The consensus which seems to be emerging is that self-reported reproductive morbidity is not the answer to the measurement-related barriers of mortality and facility-based morbidity, particularly when gathered in large-scale cross-sectional surveys, such as the Demographic and Health Surveys (Stewart and Festin, 1995). This is disappointing on several grounds, not the least of which is the grist it adds to the ancient mill undervaluing the views of women about their own health (Oakley et al., 1994).

A hasty rejection of self-reported morbidity should, however, be cautioned against. Not only do we need to question whether the tools used were "at fault" and if they could be improved, but also the underlying rationale for morbidity measurement if individuals' perceptions of their own health status are totally ignored in favour of the prevailing medical models of ascertainment. Greater concern for the validity of tools should be promoted in all aspects of reproductive health, and the tendency to "borrow" techniques developed for other purposes and population sub-groups should be avoided unless validity is reassessed. For example, the use of the trait component of the widely applied Spielberger State-Trait Anxiety Scale has recently been shown (Hundley et al., 1998) to be unstable when used on pregnant women before and after delivery. This raises the familiar discussion on the uniqueness of childbirth as a life event and the consequent need for dedicated measurement tools.

Reliability can also be considered at the common sense level and synonymously with accuracy. Table 2, for example, highlighted the question of the quality of routine facility-based data. Problems of missing information, misclassification and misreporting have been presented by many studies of reproductive health outcomes in a wide variety of settings. These difficulties have fired the long-standing debate on the worth of routine health information systems, and a systematic approach to assessing the quality of facility-based data has been widely called for (see for instance, Nirupam and Yuster, 1995; Paterson et al., 1991).
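
The sensitivity and specificity referred to above in the validation of self-reports can be made concrete with a small worked example. The sketch below assumes a clinical examination is taken as the reference standard against which women's self-reports are compared. The counts are hypothetical and are not drawn from any of the studies cited.

```python
def sensitivity_specificity(true_pos, false_neg, false_pos, true_neg):
    """Return (sensitivity, specificity) of self-report against a
    reference standard, from the four cells of a 2x2 table."""
    # Sensitivity: share of women with the condition who report it.
    sensitivity = true_pos / (true_pos + false_neg)
    # Specificity: share of women without the condition who do not report it.
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical validation sample of 200 women:
# 50 with the condition on examination (30 report it, 20 do not),
# 150 without it (45 report it, 105 do not).
sens, spec = sensitivity_specificity(
    true_pos=30, false_neg=20, false_pos=45, true_neg=105
)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this invented sample the self-report misses two in five true cases and wrongly flags three in ten women without the condition, the kind of result that would lead a survey-based prevalence estimate to be treated with caution.
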
Regardless of the source of data, the potential for observer bias needs to be acknowledged. The skills of the observer, who may be a physician in a hospital or a lay interviewer visiting women's homes, clearly affect ascertainment, and so does the observer's status as perceived by the interviewee. Many studies have noted the apparent overreporting of health problems when the observer is a health professional (see for instance, Filippi et al., 1997). Similarly, the operation of a Hawthorne effect can produce biased results which are further complicated by the general tendency for ascertainment to improve over time. For example, a recent longitudinal study of postpartum maternal health in Bangladesh (Goodburn and Graham, 1997) found a spurious increase in the prevalence of specific self-reported reproductive morbidities as the study progressed over a period of 18 months.

The extent to which different information sources and data collection methods are prone to selectivity bias is the fourth and final measurement barrier to be highlighted. This is primarily an issue of generalisability and of knowing not so much who is captured by facility-based data but who is not. In some settings, it is virtually impossible to determine if the level yielded from a health facility for a particular problem is an under- or overestimate of that in the general population; in the case, say, of complicated induced abortions it may be an underestimate as some complications occur without contact with any health facilities, or an overestimate as the facility may be a referral centre for emergency complications. A full appreciation of the characteristics of the catchment population of a facility which would be necessary to disentangle selection biases is rare. This is illustrated by the current controversy in many industrialised countries over the need to adjust for case-mix before hospital statistics can be interpreted and compared (see for example, Scottish Office, 1996).

The constraints on the ascertainment of reproductive health outcomes using different data sources and collection methods do, as mentioned previously, have a differential impact according to the intended uses of the information. Demonstrating, for instance, the impact of a safe motherhood programme using indicators such as the maternal mortality rate or ratio is not feasible in most developing country settings (Graham et al., 1996). The number of maternal deaths, on the other hand, may be a useful indicator to assess the need for action between different district hospitals. The question of the "best" indicators of reproductive health outcomes continues to be a subject debated among international agencies (WHO, 1997). "Best" can only be judged in relation to purpose and this brings us to the second major measurement-related challenge – attribution.

ATTRIBUTION OF REPRODUCTIVE HEALTH OUTCOMES

The drive towards ascertaining health outcomes originated among those with an interest in assessing the quality and effectiveness of health services (Shanks and Frater, 1993). This is not, of course, to deny that measurement is also sometimes undertaken to define needs and priorities for care, as well as for characterising populations irrespective of service provision. The primary concern nevertheless has been and continues to be the ascertainment of outcomes in order to attribute these to specific health interventions. It is important here to highlight the distinction between two approaches to this concern; these are represented schematically in Fig. 2. The first flow diagram reflects an approach which places the health outcome as the focal point and then seeks to identify the influencing factors, and the second diagram shows the alternative approach which starts with a given health intervention and assesses its effects. These two perspectives clearly imply different measurement processes as regards attribution. They also vary in terms of their appeal to different interest groups. For example, a donor agency supporting a particular health intervention is more likely to take the perspective illustrated in Fig. 2 which places the intervention at the centre and then asks how its effects can be measured, whereas a district medical officer may start with a particular priority health outcome, such as perinatal death, and seek to identify the relative impact of different initiatives. The role of different interest groups in defining effectiveness will be returned to later in the paper.

Fig. 2. Alternative perspectives on measuring effectiveness

Figure 2 also highlights three other important points. Firstly, health outcomes or effects are not solely the result of health interventions but are now acknowledged to reflect a wide range of social, economic, cultural and environmental forces (Oakley, 1994). Secondly, it is important to note the distinction between those health outcomes which are indeed the result of health interventions from those which are not, and between those effects of health interventions which are manifest in changes in health status and those which are evident in some other way (Shanks and Frater, 1993). Finally, only a fraction of the relationships between health outcomes and health interventions have been scientifically proven. Clearly different outcomes are attributable to health care to differing extents, and this degree of uncertainty is not just peculiar to reproductive health. Nor is the world-wide concern to establish effectiveness – to know "what works" – the sole province of reproductive health. What is perhaps different is both the quest for a degree of proof of effectiveness, which has not yet been provided even in the most long-established and well-researched aspects of health care, and the adverse impact that failing to reach this goal is likely to have on funds allocated, at national and international levels, to the programme areas of reproductive health shown earlier in Table 1.

In order either to attribute specific reproductive health outcomes to specific interventions, or to assess a health intervention in terms of its effects, the difficulties of both ascertaining health outcomes and assigning attribution have to be faced. Moreover, just as the level of a particular health outcome measure, such as the prevalence of chlamydia, may be an artefact of the data source and collection method, so the inability to attribute an effect to a health intervention, such as STD targeting through family planning, may be an artefact of the measurement process. In the major programme area of safe motherhood, for instance, there have been extremely few demonstrations of impact; examples of these are given in Table 3. Is this because there has not been an effect manifest in the chosen health outcomes, or because it has not been detected? To disentangle this important question, two further questions need to be posed: how and with what can effectiveness be established, and who decides what is effective?


Table 3. Safe motherhood interventions: selected projects reporting on effectiveness

Randomised controlled trials
* Eclampsia multicentre trial: magnesium sulphate most effective treatment against recurrent convulsions (Duley, 1996)
* Reduced antenatal care trial – Harare: reduced visits had no adverse effects on maternal or neonatal outcomes (Munjanja et al., 1996)

Observational studies
* Partogram multicentre study: reduction in number of prolonged labours, stillbirths and unnecessary caesarean sections, and reduction in rate of postpartum infection (Lennox and Kwast, 1995)
* High risk prenatal screening, Nigeria and Ethiopia: reduction in hospital-based maternal mortality (Brennan, 1989; Poovan et al., 1990)
* Training traditional birth attendants, Indonesia, Guatemala, Brazil: "successful" detection and referral of women with complications during labour and delivery (Janowitz et al., 1985; Alisjahbana, 1991; Schieber, 1991)
* Training traditional birth attendants, Indonesia and Gambia: no impact on maternal mortality ratio (Greenwood et al., 1987; Alisjahbana, 1991)
* Maternity waiting homes, Ethiopia: contributed to reducing maternal mortality (Poovan et al., 1990)
* Maternity care programme, Bangladesh: decline in direct obstetric deaths in intervention area owing to combined effects of community midwives, clinic physicians and referral system (Fauveau et al., 1991; Maine et al., 1996)

Although in some areas of reproductive health the priority is still to identify biologically efficacious interventions, the predominant concern now lies with establishing effectiveness. Effectiveness can be defined on a number of bases, with outcomes increasingly measured not only in terms of health but also cost and operational efficiency. The choice of study designs and outcome indicators with which to demonstrate health impact is primarily a technical issue. Much has been written on the difficulties of establishing causality in general (Seedhouse, 1986), and for reproductive health in particular (Chowdhury et al., 1996). The problems of ascertainment and attribution interact to produce a formidable array of obstacles: adequacy of sample size, control of confounding and intervening variables, appropriate time lag, and valid and reliable instruments, to name but a few. The situation is further complicated by the nature of the interventions in reproductive health, many of which have significant behavioural components and involve multiple, often ill-defined inputs rather than discrete "magic bullets". Needless to say, the current shift towards implementing comprehensive reproductive health programmes is not likely to ease these difficulties of establishing effectiveness.

At one level, the choice of outcomes with which to demonstrate effectiveness seems intuitively obvious – being those the specific intervention is intended to impact upon. This however re-introduces the uncertainty of attribution and highlights a circular argument which should be challenged, as indicated schematically in Fig. 3. Priority health outcomes become the targets of interventions which, in turn, are evaluated according to health outcomes. The key point is that the health outcomes used in the initial prioritisation need not necessarily be the optimum ones for judging effectiveness. As illustrated previously (Fig. 2), there are two separate issues involved: what is the "best" (most effective) intervention to improve health outcomes, and what is the "best" outcome with which to show effectiveness? For example, although essential obstetric care (EOC) may be the optimum intervention to reduce maternal deaths, maternal mortality is not necessarily the best outcome with which to show the effectiveness of EOC owing to the substantial difficulties in ascertaining the levels and trends for this negative outcome.

Fig. 3. Differentiating between health outcomes for targeting and evaluating health interventions

So what are the alternatives to the prioritised health outcomes for evaluating an intervention? Fig. 3 suggests that, in addition to other health outcomes, there are two further types of indicators: process and cost. In recent years, there has been considerable debate on the relative merits of process vs outcome measures, particularly for assessing quality of care (Orchard, 1994). In part, a reliance on process evaluation has been both a cause and an effect of the lack of impact evaluation in reproductive health. The power of process measures to detect failures in quality of care lies in their ability to overcome or side-step many of the problems besetting outcomes data (Davies and Crombie, 1995). The case for process measurement is further strengthened since it identifies specific shortcomings, points the way towards change, and the data required are often already being gathered as part of service or programme monitoring. The major drawback is that measures of process are valuable proxy indicators of quality only when the causal pathways between inputs and outcomes are well-supported by research evidence. This brings us full circle back to the question of uncertain attribution.
Process measurement remains, however, a prerequisite to attempting to establish effectiveness. If an intervention is not being implemented in the manner or degree to which it was intended, why should any health impact be expected?

The use of outcome measures based solely on cost to assess reproductive health interventions is still comparatively rare (McGinn et al., 1996; Ratcliffe et al., 1996). In part, this is also a reflection of the emphasis which has been placed on process measurement. Whilst at an international level various assessments have been made of the cost-effectiveness of different "packages" of interventions (World Bank, 1993; WHO, 1994), at a district or programme-specific level, the difficulties of gathering reliable data, particularly on the direct and indirect costs to service users, are major obstacles. At present, outcomes based on cost are thus not significantly easier to measure nor more readily available than those based on health.

The technical matter of how and with what should effectiveness be established is, however, conditional on the question of who decides "what works". Acknowledging the existence of multiple perspectives on the same outcome or on the same intervention is still not widespread globally nor across the sub-components of women's reproductive health. Four major interest groups or stake-holders should be recognised: women and their families, health professionals, the scientific community, and national and international policy-makers and donors. The common concern which unites these disparate groups is the question of evidence, since all implicitly or explicitly act upon the information available to them, be this from the care they themselves received or the published findings of a randomised controlled trial. How each group defines and judges the evidence clearly varies considerably and in ways which are not yet adequately understood. Not only is there variation in their technical ability to evaluate different types of information, but also in the relative importance they attach to so-called grades of evidence as well as in their preference for "hard" or "soft" data. A policy decision-maker may, for example, be unwilling to provide resources to shift the national model of maternity care on the basis of one or two research projects (Graham, 1997a), but a woman and her family may make their decision to use a particular traditional birth attendant on the basis of a favourable experience of one other relative. Similarly, whilst a health service manager may be more comfortable to act upon the quantitative data from a recent district-wide consumer survey, a social scientist may feel that the results of an in-depth study of a few women provide a more valid basis for judging satisfaction with care.
However, acknowledging the value of these different perspectives to deciding "what works" in reproductive health is not enough, and now needs to be accompanied by greater understanding of their relevance to resource allocation, as well as to service provision and utilisation. In recent years, there has been a marked shift in many industrialised countries towards so-called "patient power". Significantly, and in recognition that pregnancy is not a disease, the words "client" or "consumer" rather than "patient" are increasingly being used to define women care-seekers (Graham, 1997b). The need to give individuals choice in the care they receive and to seek their views on the quality of this care have been driving forces behind many new initiatives, such as Changing Childbirth which is reforming maternity care provision in the United Kingdom (Department of Health, 1993). Of course, the willingness of women to report frankly their judgement on the quality or effectiveness of care, particularly when asked by service providers or in service settings, is known to be problematic. There are many examples of studies yielding extremely high levels of satisfaction, which have often been interpreted as signs of significant response bias (Lumley, 1985). Again the need to question the validity of the tools used, and to disentangle the process of measurement from the "truth", is as pertinent to patient-centred assessments of quality of care as for so-called "hard" measures of morbidity. In industrialised countries, an array of approaches has evolved to integrate women's perspectives on what works, including conjoint analysis and willingness to pay (Ryan, 1996; Ryan and Hughes, 1997), as well as evaluations based on "softer" outcomes, such as anxiety and quality of life (Bowling, 1991). The extent to which these approaches can be applied in less well-resourced settings, where the choice is often between any care or no care rather than types of care, has not yet been explored. There are, however, clear indications that women and their families are making a judgement on the quality or effectiveness of care. Many of the barriers and delays in using services are overcome, particularly in the case of emergencies, when the care women will receive is perceived as high quality – effective, appropriate and acceptable (Paolisso and Leslie, 1995). Conversely, perceptions of poor quality of care are barriers to uptake and thus can add to the burden of mortality and morbidity (Sundari, 1992; PMMN, 1995). The need to explore both the evidence and the process by which women make their own judgements of effectiveness should be regarded as a research and development priority in reproductive health.

The perspectives on effectiveness held by the health professional can be expected to be closer to those of the scientific community since the promotion of evidence-based practice (Sackett et al., 1996), although clearly there are significant barriers to implementing best practice in routine service settings, such as cost and organisational structure (Gray, 1997).
The concept of grades of evidence (Table 4) and of prioritising practices which are based on scientifically-reliable evidence has profound implications for reproductive health care, both positive and negative. In terms of maternity care, for example, The Cochrane Pregnancy and Childbirth Database (Cochrane Collaboration, 1995) includes a classification of forms of care into six categories: beneficial, likely beneficial, trade-off, unknown effectiveness, unlikely to be beneficial, and likely to be ineffective or harmful. The largest category is in fact the one labelled "unknown" and this clearly illustrates the lack of basic research on effectiveness and strengthens the case for more resources to help bridge these significant information gaps. However, although the promotion of practice and decision-making based on sound data is entirely laudable, it is important that the standards applied to proof of effectiveness of reproductive health interventions are not disproportionate to those prevailing in other areas of health care. Both the ascertainment and the attribution of reproductive health outcomes are problematic, as shown in this paper, and the study design advocated as the gold standard for establishing effectiveness is no more nor less applicable to this than to any other area of health care (Black, 1996). Whilst, for example, specific drug treatments for the management of eclampsia were amenable to assessment through a randomised controlled trial (RCT) (Duley, 1996), an intervention to raise community awareness of the need to eliminate harmful traditional practices, such as female genital mutilation, would be much more difficult to evaluate using this design. Increasingly, observational inference is being promoted as a complementary approach to RCTs (Shanks and Frater, 1993; Black, 1996).

Finally, the perspective of the policy-makers and donors at national and international levels on "what works" is perhaps the one least explored. Intuitively, their priority would be expected to be one of maximisation – of maximum health gain for minimum cost, and so readily at conflict with the priorities of providers and of consumers. Evidence of effectiveness for policy-makers would thus be based primarily on such measures as deaths averted and costs.

Table 4. Levels of evidence of effectiveness for health interventions

Ia    systematic review of 2 or more RCTs
Ib    an RCT
IIa   a cohort study
IIb   a case-control study
IIc   well-conducted uncontrolled study
III   respected opinions, expert groups
IVa   "someone told me..."
IVb   "I know..."

Adapted from: Critical Appraisal Skills Programme (1995).
The difficulty mentioned earlier of differentiating between no apparent health impact owing to the ineffectiveness of the intervention as opposed to the inability to measure change, is a serious obstacle in the competition for finite health resources. The key question is whether other areas of resource allocation are able to confidently demonstrate good value, or whether double standards are being applied. Reproductive health in its post-Cairo formulation is a comparatively new bidder for resources. If it is to remain a priority into the next millennium, much more needs to be found out about the requirements for evidence of policy-makers and donors, as well as the political and technical constraints on their use of this evidence.

CONCLUSIONS

Despite significant improvements over the last decade in the amount of information on women's reproductive health, there are still huge gaps. The reality is that perfect data – in scope and quality – are unattainable, and the need to utilise the information available is therefore obvious. The reality is also that actions to improve outcomes along the reproductive health continuum must go ahead even if the data are inadequate. What must, however, accompany these realities is both a continuing push for better data (with all its resource implications), and a raising of awareness among data users of the significant uncertainties in both the ascertainment and the attribution of reproductive health outcomes. These uncertainties can be couched in terms of two key questions:

• are the observed levels and patterns of reproductive health outcomes valid, or are they artefacts of the data sources and data collection methods?
• is a lack of evidence of effectiveness due to no effects or to an inability to measure effects?

These uncertainties are clearly interrelated and impose serious limitations on assessing needs for interventions as well as evaluating change in reproductive health outcomes. Determining the effectiveness of health interventions is notoriously difficult and the desire for all care to be evidence-based is still a dream for most health specialties. Reproductive health may not be a case for special pleading in the competition for scarce resources, but equally it should not be a case for special standards of proof of the effectiveness of interventions. "What works" in reproductive health should be judged from at least four different perspectives: from that of women and their families, health professionals, the scientific community, and national and international policy-makers. Each party has its own view of evidence and its own approach to interpreting and acting upon such evidence, and this has implications at both individual and population levels. A woman who perceives there is no evidence that delivery in the local health centre will improve the outcome for her and her baby may not present herself even in the event of a complication.
An international donor who is unable to receive evidence of the "success" of an educational intervention on STDs may be unwilling to make the case for further funding. Real progress in reproductive health now and into the 21st century requires a greater understanding of these different perspectives on effectiveness – "what works" is neither a simple nor single question.

Acknowledgements – This paper draws heavily on the methodological research the author has been involved in over several years and, as such, benefits from collaborative work with many individuals – too numerous to mention all here – both in the United Kingdom and overseas. Specific thanks for reading earlier drafts of this paper are due to Dr Marion Hall, Consultant Obstetrician/Gynaecologist at Aberdeen Maternity Hospital, and Ms Veronique Filippi, Lecturer at the London School of Hygiene and Tropical Medicine. Appreciation is also given to Mrs Elaine Stirton from the Dugald Baird Centre for her secretarial support.

REFERENCES

Alisjahbana, A. (1991) Regionalization of care in West Java. Presentation at the Workshop on Guidelines for Safe Motherhood Programming. The World Bank/MotherCare, Washington, DC.
Barker, C. and Green, A. (1996) Opening the debate on DALYs. Health Policy and Planning 11(2), 179–183.
Black, N. (1996) Why we need observational studies to evaluate the effectiveness of health care. British Medical Journal 312(May), 1215–1218.
Bowling, A. (1991) Measuring Health. A Review of Quality of Life Measurement Scales. Open University Press, Milton Keynes.
Brennan, M. (1989) Training traditional birth attendants. Postgraduate Doctor Africa 11(1), 16–18.
Bulut, A., Yolsal, N., Filippi, V. and Graham, W. J. (1995) In search of truth: a comparison of alternative sources of information on reproductive tract infections. Reproductive Health Matters 6, 31–39.
Chiang, C. L. and Cohen, R. (1973) How to measure health: a stochastic model for an index of health. International Journal of Epidemiology 2, 7–13.
Chowdhury, S., Egero, B., Myntti, C. and Rees, H. (1996) Sexual and Reproductive Health: The Challenge for Research, pp. 1–26. Swedish International Development Cooperation Agency/World Health Organisation.
Cochrane Collaboration (1995) The Cochrane Pregnancy and Childbirth Database, issue 2, pp. 1–26. BMJ Publishing Group.
Critical Appraisal Skills Programme (CASP) (1995) Critical appraisal skills programme. Making sense of evidence about clinical effectiveness. The Oxford Institute of Health Sciences, Oxford.
Danel, I., Graham, W. J., Stupp, P. and Castillo, P. (1996) Applying the sisterhood method for estimating maternal mortality to a health facility-based sample: A comparison with results from a household-based sample. International Journal of Epidemiology 25(5), 1017–1022.
David, P., Kawar, S. and Graham, W. J. (1991) Maternal mortality in Djibouti: an application of the sisterhood method. International Journal of Epidemiology 20(2), 551–557.
Davies, H. T. O. and Crombie, I. K. (1995) Assessing the quality of care. British Medical Journal 311, 796.
Department of Health (1993) Changing Childbirth: Report of the Expert Maternity Group, pp. 1–108. HMSO, London.
Duley, L. (1996) Magnesium sulphate regimens for women with eclampsia: messages from the Collaborative Eclampsia Trial. British Journal of Obstetrics and Gynaecology 96, 103–105.
Fauveau, V., Stewart, K., Khan, S. A. and Chakroborty, J. (1991) Effect on mortality of community-based maternity care programme in rural Bangladesh. Lancet 338, 1183–1186.
Filippi, V., Marshall, T., Bulut, A., Graham, W. J. and Yolsal, N. (1997) Asking questions about women's reproductive health: validity and reliability of survey findings from Istanbul. Tropical Medicine and International Health 2(1), 47–56.
Goodburn, E. A. and Graham, W. J. (1997) Methodological lessons from a study of postpartum morbidity in rural Bangladesh. In Innovative Approaches to the Assessment of Reproductive Health, ed. O. Campbell. Proceedings of IUSSP Seminar. Manila, Philippines. September 24–27 1996. Ordina/IUSSP, Liege.
Graham, W. J. (1994) The sisterhood method for estimating the level of maternal mortality: seven years' experience. The Kangaroo (December), 82–87.
Graham, W. J. (1997a) Devolving maternity services – recommendations for research and development. Health Bulletin 55(4), 265–275.

Graham, W. J. (1997b) Midwife-led care. Commentary. British Journal of Obstetrics and Gynaecology 104, 396–398.
Graham, W. J. and Campbell, O. M. R. (1992) Maternal health and the measurement trap. Soc. Sci. Med. 35(8), 967–977.
Graham, W. J., Brass, W. and Snow, R. W. (1989) Estimating maternal mortality: the sisterhood method. Studies in Family Planning 20(3), 125–135.
Graham, W. J., Filippi, V. and Ronsmans, C. (1996) Demonstrating programme impact using maternal mortality. Health Policy and Planning 11(1), 16–20.
Gray, J. A. M. (1997) Evidence-Based Healthcare. Churchill Livingstone, London.
Greenwood, A. M., Greenwood, B. M., Bradley, A. K., Williams, K., Shenton, F. C., Tulloch, S., Byass, P. and Oldfield, F. S. J. (1987) A prospective survey of the outcome of pregnancy in a rural area of the Gambia. Bulletin of the World Health Organisation 65, 635–643.
Hanley, J. A., Hagen, C. A. and Shiferaw, T. (1996) Confidence intervals and sample size calculations for the sisterhood method for estimating maternal mortality. Studies in Family Planning 27(4), 220–227.
Hansluwka, H. E. (1985) Measuring the health of populations: indicators and interpretations. Soc. Sci. Med. 20(12), 1207–1224.
Hundley, V., Gurney, E., Graham, W. and Rennie, A. M. (1998) Can anxiety in pregnant women be measured by the Speilberger State-Trait Anxiety Inventory. British Journal of Midwifery (in press).
ICPD (1994) Programme of Action of the United Nations International Conference on Population and Development. United Nations, New York.
Janowitz, B., Wallace, S., Araujo, G. and Araujo, L. (1985) Referrals by traditional birth attendants in Northeast Brazil. American Journal of Public Health 75(7), 745–748.
Kuh, D. and Stirling, S. (1995) Socio-economic variation in admission for diseases of female genital system and breast in a national cohort aged 15–43. British Medical Journal 311(September 30th), 840–843.
Lennox, C. E. and Kwast, B. E. (1995) The partograph in community obstetrics. Tropical Doctor 95, 56–63.
Lumley, J. (1985) Assessing satisfaction with childbirth. Birth 12(3), 141–145.
Macran, S., Clarke, L. and Joshi, H. (1996) Women's health: dimensions and differentials. Soc. Sci. Med. 42(9), 1203–1216.
Maine, D., Akalin, M. Z., Chakraborty, J., de Francisco, A. and Strong, M. (1996) Why did maternal mortality decline in Matlab? Studies in Family Planning 27(4), 179–187.
McGinn, T., Maine, D., McCarthy, J. and Rosenfield, A. (1996) Setting priorities in international reproductive health programs: A practical framework, pp. 1–50. Center for Population and Family Health, Columbia School of Public Health, New York.
Munjanja, S. P., Lindmark, G. and Nystrom, L. (1996) Randomised controlled trial of a reduced-visits programme of antenatal care in Harare, Zimbabwe. Lancet 96, 364–369.
Murray, C. J. L. and Lopez, A. D. (1996) The Global Burden of Disease. A Comprehensive Assessment of Mortality and Disability from Diseases, Injuries, and Risk Factors in 1990 and Projected to 2020, pp. 1–43. Harvard University Press.
Nirupam, S. and Yuster, E. A. (1995) Emergency obstetric care: measuring availability and monitoring progress. International Journal of Gynaecology and Obstetrics 50(2), S79–S88.
Nuffield Institute for Health, NHS Centre for Reviews and Dissemination and Royal College of Physicians (1995) The management of menorrhagia. Effective Health Care 9, 2–14.
Oakley, A. (1994) Who cares for health? Social relations, gender, and the public health. Journal of Epidemiology and Community Health 48, 427–434.
Oakley, A., Rigby, A. S. and Hickey, D. (1994) Life stress, support and class inequality. Explaining the health of women and children. European Journal of Public Health 4, 81–91.
Orchard, C. (1994) Comparing healthcare outcomes. British Medical Journal 308(June 4th), 1493–1496.
Paolisso, M. and Leslie, J. (1995) Meeting the changing health needs of women in developing countries. Soc. Sci. Med. 40(1), 55–65.
Paterson, C. M., Chapple, J. C., Beard, R. W., Joffe, M., Steer, P. J. and Wright, C. S. W. (1991) Evaluating the quality of the maternity services – a discussion paper. British Journal of Obstetrics and Gynaecology 98, 1073–1078.
Prevention of Maternal Mortality Network (1995) Situation analyses of emergency obstetric care: examples from eleven operations research projects in West Africa. Soc. Sci. Med. 40(5), 657–667.
Poovan, P., Kifle, F. and Kwast, B. E. (1990) A maternity waiting home reduces obstetric catastrophes. World Health Forum 11, 440–445.
Ratcliffe, J., Ryan, M. and Tucker, J. (1996) The costs of alternative types of routine antenatal care for low-risk women: shared care vs care by general practitioners and community midwives. Journal of Health Services Research and Policy 1(3), 135–140.
Ryan, M. (1996) Using willingness to pay to assess the benefits of assisted reproductive techniques. Health Economics 5, 543–558.
Ryan, M. and Hughes, J. (1997) Using conjoint analysis to assess women's preferences for miscarriage management. Health Economics 6, 261–273.
Sackett, D. L., Rosenberg, W. M. C., Gray, J. A. M. and Haynes, R. B. (1996) Evidence-based medicine: What it is and what it isn't. British Medical Journal 312, 71–72.
Schieber, B. (1991) The Quetzaltenango Project. Presentation at the Workshop on Guidelines for Safe Motherhood Programming. November 17–21, Washington, DC. The World Bank/MotherCare.
Scottish Office (1996) Clinical outcome indicators. Clinical Outcomes Working Group, pp. 3–35. HMSO, Edinburgh.
Seedhouse, D. (1986) Health: The foundations for achievement. John Wiley and Sons Ltd.
Shahidullah, M. (1995) The sisterhood method of estimating maternal mortality: The Matlab experience. Studies in Family Planning 26(2), 101–106.
Shanks, J. and Frater, A. (1993) Health status, outcome, and attributability: is a red rose red in the dark? Quality in Health Care 2, 259–262.
Siegmann, A. (1979) A classification of socio-medical health indicators: perspectives for health administrators and health planners. In Socio-medical Health Indicators, eds. J. Elinson and A. Siegmann, pp. 197–213. Baywood, Farmingdale, NY.
Stewart, M. K. and Festin, M. (1995) Validation study of women's reporting and recall of major obstetric complications treated at the Philippine General Hospital. International Journal of Gynaecology and Obstetrics 48 (Suppl.), S53–S66.
Sundari, T. K. (1992) The untold story: how the health care systems in developing countries contribute to maternal mortality. International Journal of Health Services 22(3), 513–528.
Veney, J. E. and Kaluzny, A. D. (1984) Evaluation and Decision Making for Health Services Programs. Prentice-Hall, Inc., NJ.


World Bank (1993) World Development Report 1993. Investing in Health, pp. 1–329. Oxford University Press, Washington.
World Health Organisation (1994) Mother–Baby Package: Implementing Safe Motherhood in Countries. WHO/FHE/MSM/94.11, pp. 1–89. World Health Organisation, Geneva.
World Health Organisation (1996) Revised 1990 Estimates of Maternal Mortality. A New Approach by WHO and UNICEF. WHO/FRH/MSM/96.11, pp. 1–16. World Health Organisation, Geneva.

World Health Organisation (1997) Reproductive Health Indicators for Global Monitoring. Report of an Inter-Agency Technical Group. WHO 9–11 April 1997. World Health Organisation, Geneva.
Zurayk, H. and Kabakian, T. (1997) Measurement of reproductive morbidity: the usefulness of perceived/reported morbidity on reproductive tract infections. In Innovative Approaches to the Assessment of Reproductive Health, ed. O. Campbell. Proceedings of IUSSP Seminar. Manila, Philippines. September 24–27 1996. Ordina/IUSSP, Liege.