Severity Scoring Systems: Tools for the Evaluation of Patients and Intensive Care Units


PART VIII

ADMINISTRATIVE, ETHICAL, AND PSYCHOLOGICAL ISSUES IN CARE OF THE CRITICALLY ILL


CHAPTER 74

Severity Scoring Systems: Tools for the Evaluation of Patients and Intensive Care Units

Rui P. Moreno and Philipp G. H. Metnitz

- Historical Perspective
- Severity of Illness Assessment and Outcome Prediction
  - Recalibrating and Expanding Existing Models
  - New Models Available
- Developing Predictive Models
  - Selecting the Target Population
  - Outcome Selection
  - Data Collection
  - Selection of Variables
  - Validation of the Model
  - Updating Severity Scores
- Using a Severity of Illness Score
  - Application of a Severity of Illness Score
- Organ Dysfunction/Failure Scoring Systems
  - Multiple Organ Dysfunction Score
  - Sequential Organ Failure Assessment Score
  - Logistic Organ Dysfunction System Score
  - Comparison of Scoring Systems
- Scoring Systems for Specific Clinical Conditions
  - Septic Patients
  - Trauma Patients
  - Cardiac Surgical Patients
- Directions for Further Research

When you can measure a phenomenon about which you are talking and express it in numbers, you know something about it. But, when you can not express it in numbers, your knowledge is vague and unsatisfactory. It may be the beginning of knowledge, but you progressed very little toward the state of science. Lord Kelvin (1824-1907)

The goal of intensive care is to provide the highest quality of treatment in order to achieve the best outcomes for critically ill patients. Although intensive care medicine has developed rapidly over the years, scientific evidence pointing to optimal treatments and practices is minimal. Moreover, intensive care faces new economic challenges, increasing the need to provide evidence on both the effectiveness (the probability of benefit to patients from a medical technology applied for a given medical problem under average conditions of use) and the efficiency (effectiveness of an intervention with respect to the resources used) of care. Intensive care is a complex process that is carried out in a very heterogeneous population and is influenced by variables that include cultural background and differences in the structure and organization of health care systems. It is therefore extremely difficult to reduce the quality of intensive care to something measurable, to quantify it, and then to compare it between different institutions.

Although quality encompasses a variety of dimensions, the main interest to date focuses on effectiveness and efficiency: Other issues are less relevant if the care being provided is either ineffective or harmful. The priority must be to evaluate effectiveness, and the instrument available to measure effectiveness in intensive care is outcomes research. The starting point for this research was the high degree of variability in medical processes identified in the first part of the 20th century, when epidemiological research was developing. The variation in clinical practice, including the lack of standardization, led to the search for "optimal" therapy. Because the undertaking of randomized controlled trials in intensive care is fraught with ethical and other difficulties, observational studies frequently are employed to evaluate the effects of intensive care treatment. Outcomes research provides the methods necessary to compare different groups of patients and institutions. Risk adjustment (also called case mix adjustment) is the method of choice to standardize different groups of patients; its purpose is to take into account all of the characteristics of patients known to affect their outcome, irrespective of the treatment received.

This chapter describes the different methods and systems currently available for assessing and comparing severity of illness and outcome in critically ill patients. A short historical outline of the development of scoring systems is presented first, followed by a discussion of how such systems have been designed and constructed. Available systems, with their applications and limitations, are described next. Finally, potential applications of these systems, both at the patient level and at the ICU level, are reviewed.


HISTORICAL PERSPECTIVE

Scoring systems have been in broad use in medicine for several decades. In 1953 Virginia Apgar1 published a simple scoring tool, the first general severity score designed to be applicable to a general population of newborn children. It was composed of five variables, easily evaluated at the bedside, that reflect cardiopulmonary and central nervous system function. Its simplicity and accuracy have never been improved on, and any child born in a hospital today receives an Apgar score at 1 and 5 minutes after birth. More than 50 years ago Dr. Apgar commented on the state of research in neonatal resuscitation: "Seldom have there been such imaginative ideas, such enthusiasms and dislikes, and such unscientific observations and study about one clinical picture." She suggested that part of the solution to this problem would be a "simple, clear classification or grading of newborn infants which can be used as the basis for discussion and comparison of the results of obstetric practices, types of maternal pain relief and the effects of resuscitation." Some 30 years later, physicians working in intensive care units (ICUs) found themselves saying the same thing about the state of adult critical care.

Efforts to improve risk assessment during the 1960s and 1970s were directed at improving clinicians' ability to quickly select patients most likely to benefit from promising new treatments. For example, Child and Turcotte2 created a score to measure the severity of liver disease and estimate mortality risk for patients undergoing portosystemic shunting. In 1967 Killip and Kimball classified the severity of acute myocardial infarction by the presence and severity of signs of congestive heart failure.3 In 1974 Teasdale and Jennett introduced the Glasgow Coma Scale (GCS) for reproducibly evaluating the severity of coma.4 The usefulness of the GCS has been confirmed by the consistent relationship between poor outcome and a reduced score among patients with a variety of diseases.
The GCS is reliable and easy to perform, but problems with the timing of evaluation, the use of sedation, interobserver variability, and its use in prognostication have caused controversy.5 Nevertheless, the GCS remains the most widely used neurologic measure for risk assessment.

The 1980s saw an explosive increase in the use of new technology and therapies in critical care. The rapidity of change and the large and growing investment in these high-cost services prompted demands for better evidence for the indications and benefit of critical care. In response, several researchers developed systems to evaluate and to compare severity of illness and outcome in critically ill patients. The first of these systems was the Acute Physiology and Chronic Health Evaluation (APACHE) system, published by Knaus and associates in 1981,6 followed soon after by the Simplified Acute Physiology Score (SAPS) from Le Gall and coworkers.7 The APACHE system subsequently was updated to APACHE II,8 and another system, the Mortality Probability Models (MPM), joined the group.9 By the beginning of the 1990s, multiple systems were available to describe and classify ICU populations, to compare ICU patients with respect to severity of illness, and to predict mortality within the ICU. These systems performed well, but concerns included errors in prediction caused by differences in patient selection and also lead-time bias (the effect of time and previous therapeutic interventions on the calculation of the score). Other concerns were related to the adequacy of the size of the database and to data accrual methods. These concerns, in part, led to the development of revised systems such as the APACHE III,10 the SAPS II,11 and the MPM II,12 all published between 1991 and 1993. During the mid-1990s, the need to quantify not only mortality but also morbidity in specific groups of patients became evident and led to the development of organ dysfunction scores, such as the Multiple Organ Dysfunction Score (MODS),13 the Logistic Organ Dysfunction System (LODS) score,14 and the Sequential Organ Failure Assessment (SOFA) score.15

SEVERITY OF ILLNESS ASSESSMENT AND OUTCOME PREDICTION

The evaluation of severity of illness in the critically ill patient is made through the use of severity scores and prognostic models. Severity scores are instruments that aim at stratifying patients by severity of illness, assigning to each patient a score that increases as illness severity increases; beyond this stratification ability, prognostic models also aim at predicting a certain outcome (usually the vital status at hospital discharge) on the basis of a given set of prognostic variables and a certain modeling equation. The development of these types of systems, applicable to heterogeneous groups of critically ill patients, started in the 1980s (Table 74-1).

The first general severity of illness score applicable to most critically ill patients was the APACHE.6 Developed at George Washington University Medical Center in 1981 by Knaus and coworkers, the APACHE system demonstrated the ability to evaluate, in an accurate and reproducible form, the severity of disease in this population.16-18 Two years later, Le Gall and coworkers published a simplified version of this model, the SAPS.19 This model soon became very popular in Europe, especially in France. Another simplification of the original APACHE system, the APACHE II, was published in 1985 by the authors of the original model.8 This system introduced the possibility of predicting mortality, using for this purpose the selection of a major reason for ICU admission from a list comprising 50 operative and nonoperative diagnoses. A further contribution to increasing prognostic ability was the MPM,20 developed by Lemeshow using logistic regression techniques.
Further developments in this field include the third version of the APACHE system (APACHE III)10 and the second versions of the SAPS (SAPS II)11 and of the MPM (MPM II).12 All use multiple logistic regression to select and weight the variables and can compute the probability of in-hospital mortality for groups of critically ill patients. It has been demonstrated that they perform better than their previous counterparts21,22 and, as of the end of the last century, represented the state of the art in this field. Because of the lack of ongoing calibration of these models, however, their performance slowly deteriorates as time passes. Changes occurring over time in the baseline characteristics of admitted patients, the circumstances of ICU admission, and the availability of general and specific therapeutic measures have the potential to produce an increasing gap between actual mortality and predicted mortality.23 An increase in mean age of the admitted patients, a larger number of chronically sick and immunosuppressed patients, and an increase in the number of admissions due to sepsis were noted in the last years of the previous decade.24,25 Although most of the models kept acceptable discrimination, their calibration (or prognostic accuracy) deteriorated to a point at which major changes were needed. Use of these instruments outside their sampling space was responsible for some misapplication, especially for risk adjustment in clinical trials,26,27 as demonstrated recently.28

A new generation of general outcome prediction models has now been developed: the MPM III, developed in the IMPACT database in the United States29; new models based on computerized analysis by hierarchical regression, developed by some authors of the APACHE systems30; the APACHE IV model31; and the SAPS 3 admission model, developed by hierarchical regression using a worldwide database.32,33 Models based on other statistical techniques, such as artificial neural networks and genetic algorithms, also have been proposed.34,35 These approaches have been reviewed recently36 and are summarized later in the chapter.

Table 74-1. General Severity Scores and Outcome Prediction Models

| Characteristic | APACHE | SAPS | APACHE II | MPM* | APACHE III | SAPS II | MPM II† | SAPS 3 | APACHE IV |
|---|---|---|---|---|---|---|---|---|---|
| Year | 1981 | 1984 | 1985 | 1988 | 1991 | 1993 | 1993 | 2005 | 2006 |
| No. of countries | 1 | 1 | 1 | 1 | 1 | 12 | 12 | 35 | 1 |
| No. of ICUs | 2 | 8 | 13 | 1 | 40 | 137 | 140 | 303 | 104 |
| No. of patients | 705 | 679 | 5,815 | 2,783 | 17,440 | 12,997 | 19,124 | 16,784 | 110,558 |
| Variable selection/weighting | Panel of experts | Panel of experts | Panel of experts | Logistic regression | Logistic regression | Logistic regression | Logistic regression | Logistic regression | Logistic regression |
| Age | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Origin | No | No | No | No | Yes | No | No | Yes | Yes |
| Surgical status | No | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Chronic health status | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Physiology | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Acute diagnosis | No | No | Yes | No | Yes | No | Yes | Yes | Yes |
| No. of variables | 34 | 14 | 17 | 11 | 26 | 17 | 15‡ | 20 | 142 |
| Score | Yes | Yes | Yes | No | Yes | Yes | No | Yes | Yes |
| Mortality prediction | No | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes |

*These models are based on previous versions developed by Lemeshow and colleagues.
†The numbers presented are those for the admission component of the model (MPM II0). MPM II24 was developed from data for 15,925 patients from the same ICUs.
‡MPM II24 uses only 13 variables.
ICU, intensive care unit; APACHE, Acute Physiology and Chronic Health Evaluation; SAPS, Simplified Acute Physiology Score; MPM, Mortality Probability Models.
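All of these regression-based systems convert the model output into a probability of death through a logistic equation. The minimal sketch below uses the coefficients commonly reported for the SAPS II model; the score of 40 in the usage line is hypothetical, and the original publication remains the authoritative source for the equation.

```python
import math


def saps2_mortality_probability(saps2_score: float) -> float:
    """Convert a SAPS II score into a predicted probability of in-hospital death.

    Uses the logistic equation commonly reported for SAPS II:
        logit = -7.7631 + 0.0737 * score + 0.9971 * ln(score + 1)
        p = 1 / (1 + e^(-logit))
    The coefficients are illustrative transcriptions, not a substitute for
    the original publication.
    """
    logit = -7.7631 + 0.0737 * saps2_score + 0.9971 * math.log(saps2_score + 1)
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical score of 40 yields a predicted in-hospital mortality of roughly 25%.
p = saps2_mortality_probability(40)
```

As the chapter stresses, such probabilities describe groups of patients with similar characteristics; they are not individual prognoses.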

Recalibrating and Expanding Existing Models

All of the existing general outcome prediction models use logistic regression equations to estimate the probabilities of a given outcome in a patient with a certain set of predictive variables. Consequently, the first approach to improving the calibration of a model, when the original model is not able to adequately describe the population, is to customize the model.37 Several methods have been proposed for this exercise,38 usually based on one of two strategies:
■ First-level customization, or customization of the logit, introducing slight modifications in the logistic equation (without changing the weights of the constituent variables), as proposed by Le Gall or Apolone.39,40
■ Second-level customization, or customization of the coefficients of all the variables in the model, as described for the MPM II0 model.37

Both of these methods have been used in the past, with some success in increasing the prognostic ability of the models.37,41 Both fail, however, when the problem with the score lies in discrimination (of observations with a positive outcome from those with a negative outcome) or in poor performance in subgroups of patients (poor uniformity of fit).42 The addition of new variables to an existing model may be useful in this context.43,44 This approach may, however, lead to very complex models, requiring the collection of special data with a considerable increase in cost and time expenditure. The tradeoff between the burden of data collection and accuracy should be addressed on a case-by-case basis.

It should be noted that the aim of first-level customization, which is nothing more than a mathematical translation of the original logit in order to obtain a different probability of mortality, is to improve the calibration of a model and not its discrimination. This approach should therefore not be considered when improvement of discrimination is important. Also of potential value would be a third level of customization, through introduction into the model of new prognostic variables and recalculation of the weights and coefficients for all variables, but this technique straddles the other two approaches, customizing a model and building a new predictive model.

All of these approaches have been tried recently. In France, Le Gall and colleagues customized the SAPS II model, using a retrospective database containing input for 77,490 patients hospitalized in 106 French ICUs between January 1, 1998, and December 31, 1999.45 On the basis of these data, the investigators evaluated the goodness of fit (calibration and discrimination) of the original SAPS II model, of a customized SAPS II, and of an expanded SAPS II developed in the training set by adding six admission variables: age, sex, length of pre-ICU hospital stay, patient location before ICU admission, clinical category, and presence or absence of drug overdose. They concluded that the calibration of the original SAPS II was poor, with marked underestimation of observed mortality, whereas discrimination was good. Customization improved calibration but gave poor uniformity of fit, and discrimination was unchanged from that reported originally.37 The expanded SAPS II exhibited good calibration, good uniformity of fit, and better discrimination than the original model.
It should be noted that some ICUs had better and others worse risk-adjusted mortality with the expanded SAPS II than with the customized SAPS II. The investigators concluded that customization improved the statistical qualities of the model but gave poor uniformity of fit, whereas adding simple variables to create an expanded SAPS II model led to better calibration, discrimination, and uniformity of fit.

Also in France, Aegerter and colleagues performed a retrospective analysis of prospectively collected data for a multicenter database including 33,471 patients from 32 ICUs belonging to the Cub-Rea* database.46 On the basis of this dataset, these investigators estimated two logistic regression models based on SAPS II: one model using first-level customization (having only the SAPS II score as independent variable) and a second model reevaluating the original items of SAPS II with integration of the preadmission location and chronic comorbid conditions. Again, the more complex model had better calibration than the original SAPS II for in-hospital mortality, but its discrimination was not significantly higher. Second-level customization and integration of new items improved uniformity of fit for various categories of patients, except for diagnosis-related groups. The rank order of ICUs was modified according to the model used.

*Cub-Rea, Collège des Utilisateurs de Bases de Données en Réanimation.
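First-level customization can be sketched concretely: every variable weight of the original model is kept fixed, and only an intercept a and slope b applied to the original logit are refit on local data, so that new_logit = a + b * original_logit. The plain gradient descent routine below is an illustrative stand-in for the maximum-likelihood logistic regression software used in the cited studies.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def first_level_customize(logits, died, lr=0.1, epochs=3000):
    """Refit only intercept a and slope b on the original logits.

    `logits` are the original model's logit values for a local patient sample;
    `died` holds the observed hospital outcomes (1 = died, 0 = survived).
    Minimizes the log-loss by plain gradient descent (a sketch, not
    production-grade maximum-likelihood fitting).
    """
    a, b = 0.0, 1.0  # start from the original, untranslated model
    n = len(logits)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for x, y in zip(logits, died):
            err = sigmoid(a + b * x) - y  # derivative of log-loss w.r.t. logit
            grad_a += err / n
            grad_b += err * x / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b
```

Second-level customization would instead refit the coefficient of every constituent variable, which requires the raw variable values rather than the combined logit.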

Finally, in the United Kingdom, Harrison and colleagues used a massive database, with input for 141,106 patients from a total of 163 adult general critical care units in England, Wales, and Northern Ireland participating in the ICNARC* database during the period December 1995 to August 2003.47 These researchers compared the published versions of the APACHE II,8 APACHE II UK,48 APACHE III,10 SAPS II,11 and MPM II,12 demonstrating that all models showed good discrimination but imperfect calibration. Recalibration of the models was performed by the Cox method with re-estimation of the coefficients, leading to improved discrimination and calibration, although all models still showed significant departure from perfect calibration.

*ICNARC, Intensive Care National Audit and Research Centre.

New Models Available

Two other general outcome prediction models have been developed and published: the SAPS 3 admission model in 2005 and the APACHE IV in 2006. A third model, the MPM III, is also available, although results evaluating it are yet to be published.

The SAPS 3 Admission Model

Developed by a group of investigators working on behalf of the SAPS 3 Outcomes Research Group, the SAPS 3 model was published in 2005.32,33 The study used a total of 19,577 patients consecutively admitted to 307 ICUs all over the world from October 14 to December 15, 2002. This multinational database was designed to reflect the heterogeneity of current ICU case mix and typology throughout the world, including areas outside of Western Europe and the United States. The SAPS 3 database accordingly reflects important differences in patients' and health care systems' baseline characteristics that are known to affect outcome. These include, for example, differences in genetic makeup and in lifestyle and related factors; heterogeneous distribution of major diseases within different regions; issues such as access to the health care system in general and to intensive care in particular; and differences in availability and use of major diagnostic and therapeutic measures within the ICU. Although the integration of ICUs outside Europe and the United States surely has increased representativeness, the extent to which the SAPS 3 database reflects case mix in ICUs worldwide cannot be determined.

On the basis of data collected at ICU admission (±1 hour), the researchers developed regression coefficients by using multilevel logistic regression to estimate the probability of hospital death. The final model, which comprises 20 variables, exhibited good discrimination, without major differences across patient typology, and calibration was satisfactory. Customized equations for major areas of the world were devised and demonstrated overall goodness of fit. Of interest, determinants of hospital mortality have changed remarkably since the early 1990s,10 with chronic health status and circumstances of ICU admission now being responsible for almost three fourths of the prognostic power of the model.

To provide all interested intensivists with the ability to calculate and use SAPS 3 scores, an electronic tool kit is available free of charge at the original publisher's website (www.springer.com), including complete and detailed descriptions of all variables as well as additional information on SAPS 3 performance. Moreover, the SAPS 3 Outcomes Research Group provides several additional resources at the project website (www.saps3.org).

The APACHE IV Model

In early 2006, Zimmerman, one of the authors of the original APACHE models, in collaboration with colleagues from Cerner Corporation (in Vienna, Virginia), published the APACHE IV model.31 The study was based on a database of 110,558 consecutive admissions during 2002 and 2003 to 104 ICUs in 45 U.S. hospitals participating in the APACHE III database. The APACHE IV model uses the worst values during the first 24 hours in the ICU and a multivariate logistic regression procedure to estimate the probability of in-hospital death. Predictor variables were similar to those in APACHE III, but new variables were added and different statistical modeling was used. The accuracy of APACHE IV predictions was analyzed in the overall database and in major patient subgroups. APACHE IV had good discrimination and calibration. For 90% of 116 ICU admission diagnoses, the ratio of observed to predicted mortality was not significantly different from 1.0. Predictions also were compared with those of the APACHE III versions developed 7 and 14 years previously: Little change was observed in discrimination, but aggregate mortality was systematically overestimated as model age increased. When examined across diseases, predictive accuracy was maintained for some diagnoses but for others seemed to reflect changes in practice or therapy. A predictive model for risk-adjusted ICU length of stay also was published by the same group.49 More information about the model, including the possibility of determining the probability of death for individual patients, is available at the website of Cerner Corporation (www.criticaloutcomes.cerner.com).

The MPM III Model

The MPM III originally was described by Higgins and coworkers in 2005.29 This model was developed using data from U.S. ICUs participating in the Project IMPACT database, but no data evaluating its behavior have been published.

DEVELOPING PREDICTIVE MODELS

Selecting the Target Population

Most of the existing general predictive models are not applicable to all ICU patients. Data for patients with burns, patients hospitalized with coronary ischemia (or to rule out myocardial infarction), young patients (younger than 16 or 18 years of age), post–cardiac surgery patients, and patients with a very short length of ICU stay were explicitly excluded from the development of a majority of systems. This limitation is especially important in evaluating specialized ICUs, with a predefined homogeneous case mix, but also can be important in evaluating general ICUs. In many cases, the application of exclusion criteria leads to an analysis of only a small proportion of the admitted patients, resulting in significant modeling errors for general use.

Outcome Selection

Outcome selection identifies the end point of interest. At a minimum, the selected outcome should have the following characteristics:
■ A relatively common event
■ Ease of definition, recognition, and measurability
■ Clinical relevance
■ Independence from therapeutic decisions

Mortality meets all of these criteria; however, confounding factors must be considered with use of mortality as an outcome. The location of the patient at the time of death can considerably reduce hospital mortality rates. For example, in a study of 116,340 ICU patients, a significant decline in the ratio of observed to predicted mortality was attributed to a decrease in hospital mortality as a result of earlier transfer of patients with a high severity of illness to skilled nursing facilities.50 In the APACHE III study, a significant regional difference in mortality was due entirely to variations in hospital length of stay.51 Variations in any of these factors will lead to differences between observed and predicted mortality that have little to do with case mix or effectiveness of therapy. Increases in the use of advance directives, do-not-resuscitate orders, and limitation or withdrawal of therapy all potentially increase hospital mortality. Improvements in therapy, such as the use of thrombolysis in myocardial infarction or steroids in Pneumocystis jiroveci pneumonia and the acquired immunodeficiency syndrome,52 can dramatically reduce hospital mortality. Predictive instruments for measuring long-term mortality provide accurate prognostic estimates within the first month of hospital discharge, but their accuracy falls off considerably thereafter, because other factors, such as human immunodeficiency virus infection or malignancy, dominate the long-term survival pattern. Accordingly, mortality is the most useful outcome for designing general severity of illness scores and predictive instruments.

Other outcome measures represent important issues in improving ICU care. These include the following:
■ Morbidity and complication rates
■ Organ dysfunction
■ Resource use
■ Duration of mechanical ventilation, use of pulmonary artery catheters
■ Quality of life after ICU or hospital discharge
■ Length of stay in the ICU

Case mix adjustment is indispensable for studying morbidity, resource utilization, and length of stay. Although these outcomes are difficult to define and are sensitive to local conditions, they are related to the cost of care and have therefore been useful in measuring and comparing ICU efficiency.

Current outcome prediction models aim to predict survival status at hospital discharge. It is therefore incorrect to use them to predict other outcomes, such as survival status at ICU discharge or vital status 28 days after ICU admission. Such inappropriate application will result in gross underestimation of mortality rates.53

Data Collection

The next step in the development of a general outcome prediction model is the evaluation, selection, and registration of the predictive variables. At this stage, major attention should be given to variable definitions, as well as to the time frames for data collection.54-56 Very frequently, models have been applied incorrectly; the most common errors involve
■ The definitions of the variables
■ The time frames for the evaluation and registration of the data
■ The frequency of measurement and registration of the variables
■ The applied exclusion criteria
■ Data handling before analysis

It should be noted that all existing models have been calibrated for nonautomated (manual) data collection. The use of electronic patient data management systems (with high sampling rates) has been demonstrated to have a significant impact on the results57,58: The higher the sampling rate, the more outliers will be found, and the higher the scores will be. The evaluation of intra- and interobserver reliability should always be described and reported, together with the frequency of missing values.
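The sampling-rate effect can be demonstrated directly: a score computed from a densely sampled electronic record can only find an equally or more extreme "worst" value than one computed from a sparser manual chart. The heart-rate trace below is hypothetical.

```python
# Hypothetical heart-rate trace sampled every hour for 24 hours by an
# electronic patient data management system.
hourly = [88, 90, 95, 102, 97, 93, 110, 125, 140, 132, 120, 111,
          105, 99, 96, 94, 92, 118, 131, 122, 108, 101, 97, 95]

# A manual chart review might record only one value every 6 hours.
manual_chart = hourly[::6]

# Taking "worst" as the maximum (higher heart rate scoring worse), the
# high-sampling-rate record captures the true peak; the sparse record misses it.
worst_automated = max(hourly)       # the true peak of the trace
worst_manual = max(manual_chart)    # the peak between charted times is missed
```

This is why scores derived from automated, high-frequency data should not be compared directly against models calibrated on manual data collection.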

Selection of Variables

The number of variables used in severity and prognostic systems is influenced by the data collection burden, statistical considerations, measurement reliability, and frequency of measurement. Variable selection reflects a balance between adding variables with a diminishing impact on outcome and limiting variables to the strongest predictors to ease data collection and minimize processing errors. Variables should have the following characteristics:
■ Readily available and clinically relevant
■ Plausible relationship to outcome; easily defined and measured
■ Independent of treatment processes
■ Verifiable by checks of data accuracy

Initial selection of variables can be either deductive (subjective), using terms that are known or suspected to influence outcome, or inductive (objective), using any deviation from homeostasis or normal health status. The deductive approach employs a group of experts who supply a consensus regarding the measurements and events most strongly associated with the outcome. This approach is faster and requires less computational work; APACHE I and SAPS I both started this way. A purely inductive approach, by contrast, derives the predictors empirically from patient data. Variables commonly included in such systems are the following:
■ Age
■ Chronic disease status or comorbid conditions
■ Circumstances of ICU admission
■ Physiologic measures
■ Reasons for ICU admission and admitting diagnoses
■ CPR or mechanical ventilation before ICU admission
■ Location and length of stay before admission
■ Emergency surgery and operative status

Predictor variables should be easily defined and reliably measured to ensure uniform data collection and minimize scoring variations. For statistical purposes, variables are considered dichotomous (e.g., surgery or not), categoric (e.g., disease classification or patient location before admission), or continuous (blood pressure or heart rate). With very large sample sizes, some continuous variables may be rendered dichotomous or categoric if it is discovered that strong and biologically sound threshold values exist beyond which their numeric value has no additional significance. Assigning weights for ICU admission diagnosis or reason for ICU admission (e.g., asthma versus acute respiratory distress syndrome) will significantly augment prognostic accuracy because a similar extent of physiologic derangement reflects substantial variations in mortality risk for different diseases. Of interest, circumstances of ICU admission, such as the planned or unplanned character of the admission, have been demonstrated to be very important. Systems that include weights for admitting diagnosis must include sufficient numbers of patients in each disease category to perform statistical analyses. Predictive instruments that ignore admitting diagnosis reduce the data collection burden but perform poorly in ICUs with a case mix that differs significantly from that for the development database. Location and length of stay before ICU admission accounts, at least partly, for lead-time bias, which has an important impact on outcome. For example, a patient who received treatment for 2 days and then was admitted to the ICU is at greater risk for death than a patient with the same diagnosis and severity of illness admitted from the emergency department. The accuracy of any scoring system depends on the quality of the database from which it was developed. 
Even with well-defined variables, significant interobserver variability has been reported.61,62 Several practical issues should be considered in calculating the scores.63,64 First, when multiple measurements of the same variable are available, which value should be used? For many of the simpler variables, several measurements will be taken during any 24-hour period. Should the lowest, the highest, or an average value be taken as representative of that day? By general consensus, the worst value in any 24-hour period should be used for the purposes of the score. Second, what about missing values? Should the last known value be carried forward until a new value is obtained, or should the mean of the two adjacent known values be taken? Both options make assumptions that may influence the reliability of the score. The first assumes no knowledge of how values evolve with time; the second assumes that changes usually are fairly predictable and regular. The second option seems preferable, because values may be missing for several days and repeating the last known value may introduce considerable error. In addition, changes in most of the measured variables (platelet count, bilirubin, urea) usually are, in fact, fairly regular, moving up or down in a systematic manner.
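The two conventions for handling missing daily values can be sketched in code. The function names below are illustrative, not part of any published scoring system:

```python
def carry_forward(values):
    """Option 1: repeat the last known daily value until a new
    measurement is obtained (None marks a missing day)."""
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out


def interpolate_gaps(values):
    """Option 2: linearly interpolate missing days between the two
    nearest known values; leading/trailing gaps fall back to the
    nearest known value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return values[:]
    out = values[:]
    for i, v in enumerate(out):
        if v is not None:
            continue
        prev = max((k for k in known if k < i), default=None)
        nxt = min((k for k in known if k > i), default=None)
        if prev is None:
            out[i] = values[nxt]          # leading gap
        elif nxt is None:
            out[i] = values[prev]         # trailing gap
        else:                             # interior gap
            frac = (i - prev) / (nxt - prev)
            out[i] = values[prev] + frac * (values[nxt] - values[prev])
    return out
```

For a platelet series of [100, None, None, 160], option 1 yields [100, 100, 100, 160], whereas option 2 yields approximately [100, 120, 140, 160], consistent with the assumption that such variables drift in a systematic manner.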

Validation of the Model

All predictive models developed for outcome prediction need to be validated to demonstrate their ability to predict the outcome under evaluation. Three aspects should be evaluated in this context: calibration, the degree of correspondence between the predictions of the model and the observed results; discrimination, the capability of the model to distinguish observations with a positive outcome from those with a negative outcome; and uniformity of fit, the performance of the model in various subgroups of patients. Calibration and discrimination taken together have been named goodness of fit.

Goodness of Fit: Calibration and Discrimination


As noted, goodness of fit comprises calibration and discrimination as evaluated in the analyzed population. Calibration evaluates the degree of correspondence between the estimated probabilities of mortality and the actual mortality in the analyzed sample. Four methods usually are proposed: the observed-to-estimated (O/E) mortality ratio, Flora's Z score,65 the Hosmer-Lemeshow goodness-of-fit tests,66-68 and calibration curves.

O/E mortality ratios are calculated by dividing the observed mortality (in other words, the number of deaths) by the predicted mortality (the sum of the probabilities of mortality of all patients in the sample). In a perfectly calibrated model, this value should be 1.

The Hosmer-Lemeshow goodness-of-fit tests are two chi-square statistics proposed for the formal evaluation of the calibration of predictive models.66-68 In the Hosmer-Lemeshow H test, patients are classified into 10 groups according to their probabilities of death. A chi-square statistic is then used to compare the predicted numbers of deaths and survivors with the observed numbers of deaths and survivors in each of the groups. The formula is:

    Ĥ = Σ_{l=1}^{g} (o_l − e_l)² / [e_l (1 − π̄_l)]
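The statistic can be computed directly from predicted probabilities and observed outcomes. The sketch below implements the equal-size-group (Ĉ) variant; the function name and grouping details are illustrative assumptions, not taken from the original publications:

```python
def hosmer_lemeshow(probs, outcomes, groups=10):
    """Hosmer-Lemeshow chi-square statistic.

    Patients are sorted by predicted probability of death and split
    into `groups` risk groups of (nearly) equal size, as in the
    C-hat variant; for each group l the term
        (o_l - e_l)**2 / (e_l * (1 - pibar_l))
    is accumulated, where o_l is the observed number of deaths,
    e_l the expected number (sum of probabilities), and pibar_l
    the mean predicted probability in the group.
    """
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    stat = 0.0
    for l in range(groups):
        chunk = pairs[l * n // groups:(l + 1) * n // groups]
        if not chunk:
            continue
        e = sum(p for p, _ in chunk)      # expected deaths
        o = sum(y for _, y in chunk)      # observed deaths
        pibar = e / len(chunk)            # mean predicted probability
        stat += (o - e) ** 2 / (e * (1 - pibar))
    return stat
```

A perfectly calibrated sample yields a statistic near 0; the result is then compared against a chi-square distribution with the degrees of freedom given in the text.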


with g being the number of groups (usually 10), o_l the number of events observed in group l, e_l the number of events expected in the same group, and π̄_l the mean estimated probability in group l. The resulting statistic is then compared with a chi-square distribution with 8 degrees of freedom (model development) or 10 degrees of freedom (model validation), to determine whether the observed differences can be explained exclusively by random fluctuation. The Hosmer-Lemeshow Ĉ test is similar, but the 10 groups contain equal numbers of patients. Hosmer and Lemeshow demonstrated that the grouping method used for the Ĉ statistic behaves better when most of the probabilities are low.66 These tests are now considered by most experts to be mandatory for the evaluation of calibration,69 although they have been criticized by some.70,71 It should be stressed that the analyzed sample must be large enough to give the test the power to detect a lack of agreement between predicted and observed mortality rates.38

Calibration curves also are used to describe the calibration of a predictive model. These graphics compare observed and predicted mortality. They can, however, be misleading, because the number of patients usually decreases from left to right (on moving from low probabilities to high probabilities); as a consequence, even small differences in high-severity groups appear visually more important than small differences in low-probability groups. It should be stressed that calibration curves are not a formal statistical test.

Discrimination evaluates the capability of the model to distinguish between patients who die and patients who survive. This evaluation can be made using a nonparametric test such as Harrell's C index, which uses the order of magnitude of the error.72 This index measures the probability that, for any two randomly chosen patients, the one with the greater predicted probability will have the outcome of interest (death).
This index is directly related to the area under the receiver operating characteristic (ROC) curve and can be obtained as the parameter of the Mann-Whitney-Wilcoxon statistic.73 Additional calculations can be used to compute the confidence interval of this measure.74

The concept of the area under the ROC curve is derived from psychophysical testing. For an ROC curve, a series of 2 × 2 contingency tables is built, ranging from the smallest to the largest score value. For each table, the true-positive rate (sensitivity) and the false-positive rate (1 minus the specificity) are calculated. The plot of all possible pairs of true-positive versus false-positive rates then gives the ROC curve. The interpretation of the area under the ROC curve is easy: A virtual model with perfect discrimination would have an area of 1.0, and a model with discrimination no better than chance, an area of 0.5. Discrimination is said to be satisfactory when the area is greater than 0.70; general outcome prediction models usually have areas greater than 0.80. Several methods have been described to compare the areas under two (or more) ROC curves,75-77 but they can be misleading if the shapes of the curves differ.78
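The equivalence between the C index and the area under the ROC curve can be made concrete with a small sketch (the helper name is illustrative; ties between predictions count one half, following the Mann-Whitney convention):

```python
def c_index(probs, outcomes):
    """Harrell's C / area under the ROC curve via the Mann-Whitney
    statistic: the probability that a randomly chosen non-survivor
    (outcome 1) received a higher predicted risk than a randomly
    chosen survivor (outcome 0); ties count 0.5."""
    deaths = [p for p, y in zip(probs, outcomes) if y == 1]
    survivors = [p for p, y in zip(probs, outcomes) if y == 0]
    wins = sum(
        1.0 if d > s else 0.5 if d == s else 0.0
        for d in deaths
        for s in survivors
    )
    return wins / (len(deaths) * len(survivors))
```

A value of 1.0 corresponds to perfect discrimination and 0.5 to discrimination no better than chance, as described above.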

Other measures based on classification tables have been used, describing sensitivity, specificity, positive and negative predictive values, and the correct classification rates. Because these calculations must use a fixed cutoff (usually 10%, 50%, or 90%), however, their value is limited. The relative importance of calibration and discrimination depends on the intended use of the model. Some authors advise that for group comparison, calibration is especially important,79 and that for decisions involving individual patients, both parameters are important.80

Uniformity of Fit

The evaluation of calibration and discrimination in the analyzed sample is now current practice. More complex is the identification of subgroups of patients in which the behavior of the model is not optimal. The presence of such subgroups can be viewed as an influential observation in model building, and their contribution to the global error of the model can be very large.81 The most important subgroups relate to case mix characteristics that may be related to the outcome of interest. Such characteristics include the following:

■ The intrahospital location before ICU admission
■ The surgical status
■ The degree of physiologic reserve (age, comorbid conditions)
■ The acute diagnosis (including the presence, site, and extent of infection on ICU admission)

Although some authors, such as Rowan and Goldhill in the United Kingdom48,82 and Apolone and Sicignano in Italy,40,83 have suggested that the behavior of a model can depend to a significant extent on the case mix of the sample, no consensus exists about the subpopulations for which such analysis should be mandatory.42

Updating Severity Scores

Changes in the characteristics of the populations, changes in the therapy of major diseases, and the introduction of new diagnostic methods all imply modifications that make periodic updates necessary. Moreover, the use of a model outside its development population may require its modification and adaptation.

Using a Severity of Illness Score

Calculating a Severity of Illness Score

Using the original score sheets (or a well-developed and validated computer software program), a score is assigned to each variable, depending on its deviation from normal values. The arithmetic sum of these variable scores (the sum score) represents the severity score for that patient, which is then used in an equation to predict hospital mortality. As described earlier, this approach was not chosen for the MPM systems, in which the variables are used directly to calculate a probability of death in the hospital by a logistic regression equation.

Transforming the Score into a Probability of Death
The transformation of the severity score into a probability of death in the hospital uses a logistic regression equation. The dependent variable (hospital mortality) is related to the set of independent (predictive) variables by the equation

    logit = b0 + b1x1 + b2x2 + … + bkxk

with b0 being the intercept of the model, x1 to xk the predictive variables, and b1 to bk the estimated regression coefficients. The probability of death is then given by:

    Probability of death = e^logit / (1 + e^logit)

The logistic transformation in this equation allows the S-shaped relationship between the two variables to become linear (on the logit scale). At the extremes of the score (very low or very high values), changes in the probability of death are small; for intermediate values, even small changes in the score are associated with large changes in the probability of death. This ensures that outliers do not overly influence the prediction.
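A minimal sketch of this transformation, with purely illustrative coefficients (the b0 and b1 used below are not those of any published model):

```python
import math


def probability_of_death(score, b0, b1):
    """Logistic transformation of a severity sum score:
    logit = b0 + b1 * score; probability = e**logit / (1 + e**logit)."""
    logit = b0 + b1 * score
    return math.exp(logit) / (1.0 + math.exp(logit))
```

With b0 = -5 and b1 = 0.1, a score of 50 maps to a probability of 0.5, and the same 10-point change in score moves the probability far more around the middle of the range than at the extremes, illustrating the S-shape described above.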

Application of a Severity of Illness Score

All existing models aim at predicting an outcome (vital status at hospital discharge) based on a given set of variables: They estimate the outcome for a patient with a certain clinical condition (defined by the registered variables), treated in a hypothetical reference ICU. Several issues, however, need to be taken into account in order to apply one of the previously described models in another population:

■ Patient selection
■ Evaluation and registration of the predictive variables
■ Evaluation and registration of the outcome
■ Computation of the severity score
■ Transformation of the score into a probability of death

After validation, the utility and applicability of a model must be evaluated. The literature is full of models developed in large populations that failed when applied in other contexts.40,41,48,84-88 Thus, this question can be answered only by validating the model in its final population. The potential applications of a model—and consequently its utility—are different for individual patients and for groups.89

Evaluating Individual Patients

Some evidence suggests that statistical methods perform better than clinicians in predicting outcome,90-97 or that they can help clinicians in the decision-making process.98-100 This opinion is, however, controversial,101-103 especially for decisions to withdraw or to withhold therapy.104 Moreover, the application of different models to the same patient frequently results in very different predictions.105 Thus, application of these models to individual patients for decision making is not recommended.106 It should not be forgotten that such statistical models are probabilistic in nature. A well-calibrated model applied to an individual patient may, for example, predict a hospital mortality rate of 46% for that person. The actual meaning of this statistic, however, is that for a group of 100 patients with a similar severity of illness, 46 are predicted to die; it makes no statement about whether the individual patient is among the 46% who will eventually die or the 54% who will eventually survive.

It should be noted that severity scores have been proposed for uses as diverse as determining the use of total parenteral nutrition107 or identifying futility in intensive care medicine.108 Some authors have demonstrated that knowledge of predictive information does not have an adverse effect on the quality of care, while helping to decrease the consumption of resources and to increase the availability of beds.109 One area in which the scientific community agrees that these models are useful is the stratification of patients for inclusion in clinical trials and the comparison of the balance of randomization across groups.110

Evaluating Groups of Patients

At the group level, general outcome prediction models have been proposed for two objectives: distribution of resources and performance evaluation.

Several studies have been published describing methods to identify and characterize patients with a low risk of mortality.111-115 These patients, who require only basic monitoring and general care, eventually could be transferred to other areas of the hospital.100,116 It could, however, also be argued that these patients have a low mortality precisely because they have been monitored and cared for in an ICU.117 Also, the use of current instruments is not recommended for the purpose of triage in the emergency department,118 and the use of early physiologic indicators outside the ICU has been questioned.119

Patient costs in the ICU depend on the amount of required (and utilized) nursing workload. Patient characteristics (diagnosis, degree of physiologic dysfunction) are not the only determinants; costs depend also on the practices and policies of a given ICU. Focusing attention on the effective use of nursing workload120 or on the dynamic evolution of the clinical course121,122 seems a more promising strategy than approaches based exclusively on the condition of the patient during the first hours in the ICU or on the O/E length of stay in the ICU.51,123,124

On the other hand, general outcome prediction models have been proposed to identify patients who require more resources.125 Unfortunately, these patients only rarely can be identified at ICU admission, because their degree of physiologic dysfunction during the first 24 hours in the ICU tends to be moderate, although variable.126-128 Even if such patients someday can be reliably identified, the question of what to do with this information remains. Another important area in which these types of models have been used is the evaluation of ICU performance.


Several investigators have proposed the use of standardized mortality ratios (SMRs) for performance evaluation, assuming that current models can take into account the main determinants of mortality.129 The SMR is calculated by dividing the observed mortality by the predicted mortality (the sum of the individual probabilities of mortality for all patients in the sample). Additional computations can be made to estimate the confidence interval for this ratio.130 The interpretation of the SMR is straightforward: A ratio lower than 1 implies a performance better than that of the reference population, and a ratio greater than 1, a performance worse than that of the reference population. This methodology has been used for international comparison of ICUs,17,48,62,85,131,132 comparison of hospitals,16,51,86,87,123,129,133,134 ICU evaluation,135-138 management evaluation,134,139,140 and study of the influence of organization and management factors on ICU performance.141

Before applying this methodology, six questions should always be answered:

1. Is it possible to evaluate and register all of the data needed for application of the models?
2. Can the models be used in the large majority of ICU patients?
3. Are existing models able to control for the main patient characteristics related to mortality?
4. Has the reference population been well chosen, and are the models well calibrated to this population?
5. Is the sample size sufficient for meaningful differences to be identified?
6. Is vital status at ICU discharge the main performance indicator?

Each of these assumptions has been questioned, and no definitive answer exists at present. Most investigators, however, believe that performance is multidimensional and consequently should be evaluated in several dimensions.23,142 The problem of sample size seems especially important with respect to the risk of a type II error (in other words, concluding that there are no differences when in fact they exist).
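The SMR computation can be sketched as follows. The normal-approximation confidence interval shown is one simple choice among the methods referenced in the text, and the function name is illustrative:

```python
import math


def smr_with_ci(observed_deaths, predicted_probs, z=1.96):
    """Standardized mortality ratio: observed deaths divided by the
    expected deaths (sum of individual predicted probabilities of
    mortality), with an approximate 95% confidence interval based on
    a Poisson/normal approximation of the observed death count."""
    expected = sum(predicted_probs)
    smr = observed_deaths / expected
    se = math.sqrt(observed_deaths) / expected
    return smr, (smr - z * se, smr + z * se)
```

For example, an ICU with 25 observed deaths and predicted probabilities summing to 20 has an SMR of 1.25; because the approximate confidence interval (roughly 0.76 to 1.74) includes 1, worse-than-reference performance could not be concluded from this sample alone.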
The comparison between observed and predicted mortality might make more sense if done separately for low-, intermediate-, and high-risk patients, because the performance of an ICU can change according to the severity of the condition of the admitted patients. This approach was advocated in the past on the basis of theoretical concerns143-145 but has been used in only a small number of studies.141,146 Multilevel modeling, with varying slopes, can be an answer for the developers of such models.23,147

ORGAN DYSFUNCTION/FAILURE SCORING SYSTEMS

Organ failure scores are designed to describe organ dysfunction, not to predict survival. In the development of organ function scores, three important principles need to be remembered.15 First, organ failure is not a simple all-or-nothing phenomenon; rather, a spectrum or continuum of organ dysfunction exists, ranging from very mild alteration in function to total organ failure. Second, organ failure is not a static process, and the degree of dysfunction may vary during the course of disease, so scores need to be calculated repeatedly. Third, the variables chosen to evaluate each organ need to be objective, simple, readily available, and reliable; routinely measured in every institution; specific to the organ in question; and independent of patient variables, so that the score can easily be calculated for any patient in any ICU.

Interobserver variability in scoring can be a problem with more complex systems,56,148 and the use of simple, unequivocal variables can avoid this potential problem. Ideally, scores should be independent of therapeutic variables, as stressed by Marshall and associates,13 but in fact this is virtually impossible to achieve, because all factors are more or less treatment dependent. For example, the PaO2/FIO2 ratio depends on ventilatory conditions and the use of positive end-expiratory pressure, the platelet count may be influenced by platelet transfusions, urea levels are affected by hemofiltration, and so on.

The process of organ function description is relatively new, and general agreement is lacking on which organs to assess and which parameters to use. Numerous scoring systems have been developed for assessing organ dysfunction,13-15,149-157 differing in the organ systems included in the score, the definitions used for organ dysfunction, and the grading scale used.64,158 A majority of scores include six key organ systems—cardiovascular, respiratory, hematologic, central nervous, renal, and hepatic—with other systems, such as the gastrointestinal system, less commonly included. Early scoring systems assessed organ failure as either present or absent, but this approach is very dependent on where the limits for organ function are set, and newer scores consider organ failure as a spectrum of dysfunction.
Most scores have been developed in the general ICU population, but some are aimed specifically at the septic patient.15,150,151,155,156 In this section, three of the more recently developed systems are discussed in some detail. The main difference among them is in the definition of cardiovascular system dysfunction (Table 74-2).

Multiple Organ Dysfunction Score

The MODS (Multiple Organ Dysfunction Score) system was developed using a literature review of clinical studies of multiple organ failure from 1969 to 1993.13 Optimal descriptors of organ dysfunction were identified and validated against a clinical database. Six organ systems were chosen, and a score of 0 to 4 was allotted for each organ according to function (with 0 indicating normal function and 4 the most severe dysfunction), with a maximum score of 24. With MODS, the worst score for each organ system in each 24-hour period is taken for calculation of the aggregate score. A high initial MODS correlated with ICU mortality, and the delta MODS (calculated as the MODS over the whole ICU stay less the admission MODS) was even more predictive of outcome.13 In a study of 368 critically ill patients, the MODS was found to describe outcome groups better than the APACHE II or the organ failure score, although the predicted risk of mortality was similar for all scoring systems.159


Table 74-2. Organ Dysfunction/Failure Scoring Systems

Organ System     MODS*                           SOFA†                                               LODS‡
Respiratory      PaO2/FIO2 ratio                 PaO2/FIO2 ratio; mechanical ventilation             PaO2/FIO2 ratio; mechanical ventilation
Cardiovascular   Pressure-adjusted heart rate    Mean arterial pressure; use of vasoactive agents    Systolic arterial pressure; heart rate
Renal            Creatinine                      Creatinine; urinary output                          Creatinine; urinary output; urea
Hematologic      Platelets                       Platelets                                           Platelets; leukocytes
Neurologic       Glasgow Coma Scale score        Glasgow Coma Scale score                            Glasgow Coma Scale score
Hepatic          Bilirubin                       Bilirubin                                           Bilirubin; prothrombin time

LODS, Logistic Organ Dysfunction System; MODS, Multiple Organ Dysfunction Score; SOFA, Sequential Organ Failure Assessment.
*From Marshall JC, Cook DA, Christou NV, et al: Multiple organ dysfunction score: A reliable descriptor of a complex clinical outcome. Crit Care Med 1995;23:1638-1652.
†From Vincent J-L, Moreno R, Takala J, et al: The SOFA (Sepsis-Related Organ Failure Assessment) score to describe organ dysfunction/failure. Intensive Care Med 1996;22:707-710.
‡From Le Gall JR, Klar J, Lemeshow S, et al: The logistic organ dysfunction system. A new way to assess organ dysfunction in the intensive care unit. JAMA 1996;276:802-810.

The MODS has been used to assess organ dysfunction in clinical studies of various groups of critically ill patients, including those with severe sepsis.160-163

Sequential Organ Failure Assessment Score

The SOFA (Sequential Organ Failure Assessment) scoring system was developed in 1994 during a consensus conference organized by the European Society of Intensive Care Medicine, in an attempt to provide a means of quantitatively and objectively describing the degree of organ failure over time, in individual patients and in groups of patients with sepsis.15 Initially termed the Sepsis-Related Organ Failure Assessment score, it was renamed the Sequential Organ Failure Assessment after the recognition that it could be applied equally to nonseptic patients. In devising the score, the participants of the conference decided to limit the number of systems studied to six: respiratory, coagulation, hepatic, cardiovascular, central nervous system, and renal. A score of 0 is given for normal function through 4 for the most abnormal, and the worst values on each day are recorded. Individual organ function can thus be assessed and monitored over time, and an overall global score can also be calculated. A high total SOFA score (SOFA max) and a high delta SOFA (the total maximum SOFA minus the admission total SOFA) have been shown to be related to a worse outcome,121,164 and the total score has been shown to increase over time in nonsurvivors compared with survivors.164 The SOFA score has been used for organ failure assessment in several clinical trials, including one in patients in septic shock.165-168
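The two summary measures mentioned for SOFA can be computed from the series of daily total scores; the function names below are illustrative:

```python
def sofa_max(daily_totals):
    """Highest total SOFA score reached during the ICU stay."""
    return max(daily_totals)


def delta_sofa(daily_totals):
    """Delta SOFA: the total maximum SOFA minus the admission
    (first-day) total SOFA, as defined in the text."""
    return max(daily_totals) - daily_totals[0]
```

For daily totals of [6, 9, 11, 8], sofa_max is 11 and delta_sofa is 5; higher values of either measure have been associated with worse outcome.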

Logistic Organ Dysfunction System Score

The LODS (Logistic Organ Dysfunction System) was developed in 1996 using multiple logistic regression applied to selected variables from a large database of ICU patients.14 To calculate the score, each organ system receives points according to the worst value for any variable for that system on that day. If no organ dysfunction is present, the score for that system is 0, rising to a maximum of 5. Because the relative severity of organ dysfunction differs between organ systems, the LODS allows the maximum of 5 points to be awarded only to the neurologic, renal, and cardiovascular systems; the pulmonary and coagulation systems can receive a maximum of 3 points for the most severe levels of dysfunction, and the most severe hepatic dysfunction receives only 1 point. Thus, the total maximum score is 22. The LODS score is designed to be used as a once-only measure of organ dysfunction in the first 24 hours of ICU admission, rather than as a repeated assessment measure. The LODS is quite complex and seldom used; nevertheless, it has been applied to assess organ dysfunction in clinical studies.169


Comparison of Scoring Systems

The main difference among the three described models is the method chosen for the evaluation of cardiovascular dysfunction: SOFA uses blood pressure and the level of adrenergic support; MODS uses a composite variable (the pressure-adjusted heart rate: heart rate × central venous pressure/mean arterial pressure); and LODS uses the heart rate and the systolic blood pressure. A comparative analysis, published only as an abstract, was presented at the 10th Annual Congress of the European Society of Intensive Care Medicine (Paris, 1997); its results suggest a greater discriminative capability for the MODS and SOFA scores than for the LODS score.170 Owing to the small sample size, however, this result requires further validation.
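The composite cardiovascular variable used by MODS follows directly from its definition above (illustrative function name; heart rate in beats/min, pressures in mm Hg):

```python
def pressure_adjusted_heart_rate(heart_rate, cvp, mean_arterial_pressure):
    """MODS cardiovascular descriptor: heart rate multiplied by the
    ratio of central venous pressure to mean arterial pressure."""
    return heart_rate * cvp / mean_arterial_pressure
```

A tachycardic, hypotensive patient with high filling pressures (e.g., heart rate 100, CVP 10, MAP 80, giving 12.5) therefore scores higher than a patient with normal hemodynamics.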


Mixed models, integrating organ failure assessment scores and general severity scores, have been published154,171 but have never gained widespread acceptance.

SCORING SYSTEMS FOR SPECIFIC CLINICAL CONDITIONS

Several scoring systems have been developed for application to subgroups of patients with specific clinical conditions, such as cardiac surgery, sepsis, trauma, and acute renal failure. This section briefly reviews the most important of these systems.

Septic Patients

In several areas, use of the scoring systems discussed earlier can be beneficial in patients with septic shock, as in other groups of critically ill patients.172 First, they can be invaluable in the classification and stratification of patients for enrollment in clinical trials of new antisepsis treatments. Mortality prediction scores can be used to stratify groups of patients and assess outcome in terms of mortality, and organ dysfunction scores can help evaluate the effects of new treatments on morbidity, thereby shifting the emphasis of outcome measurement from mortality to morbidity. Importantly, improved morbidity should be accompanied by a reduced, or a trend toward reduced, mortality. Second, such scoring systems can be used to describe patient populations in epidemiologic studies for comparison of patients over time or across institutions. Third, estimated probabilities of mortality and actual outcomes can be compared to create an SMR. SMRs from a cross section of different ICUs, or from the same ICU over time, could then be used to facilitate resource allocation. Before these scores can be used to compare ICU performance in different geographic areas or populations, however, they may need to be customized to the local population, and their use as a management instrument is limited.173 Doubts also exist about the appropriateness of the SMR as a performance indicator.42,174,175 Although these scores are useful for predicting mortality in a group of patients, they have not been validated to provide a precise prediction of outcome in individual patients.
Clinical decisions concerning individual patient care should not be based exclusively on any scoring system, although such scores may provide valuable information to be used in addition to clinical assessment,107,176,177 even in septic patients.178

As current understanding of the pathophysiologic mechanisms underlying sepsis has advanced, some authors have proposed that the inclusion of biologic markers of disease in scoring systems may be useful in certain categories of patients, such as those with sepsis.179,180 One biologic scoring system, developed by Casey and coworkers,181 measured levels of lipopolysaccharide and the cytokines tumor necrosis factor-α (TNF-α), interleukin-1 (IL-1), and interleukin-6 (IL-6) and devised a total lipopolysaccharide-cytokine score that correlated well with mortality in their population of 97 patients with sepsis syndrome. The accuracy of cytokine levels in the diagnosis of sepsis is controversial, however, and further study is needed to better define sepsis markers before such scores can be included in currently available disease severity scoring systems.

The most recent scoring system for the evaluation of septic patients was developed from a European multicenter study.182 The Risk of Infection to Severe Sepsis and Shock (RISSC) score was developed to examine the incidence of, and risk factors for, worsening sepsis in infected patients. The study found the incidence of worsening sepsis to be 20% at day 10 and 24% at day 30, and several factors were identified as independently associated with the risk of worsening sepsis. The investigators concluded that the RISSC score may be a valuable tool for stratifying septic patients.

Trauma Patients Several different scoring systems have been developed for the evaluation of trauma patients. Two different principles have been followed. The first is morphologic classification of the underlying traumatic injury, as proposed by the Committee on Injury Scaling in the so-called Abbreviated Injury Scale (AIS). This score classifies each injury according to the body region, the anatomic structures involved, and the level of injury. It has been revised several times and currently is available as the AIS 2005 revision (at www.carcrash.org). Subsequently, Baker and colleagues proposed the Injury Severity Score (ISS), based on the AIS.183 It uses the AIS to score the three most severely injured body regions. The second principle applies in physiology-based scores developed to quantify the underlying physiologic derangement in a trauma patient. The main representatives of this category are the Trauma Score developed by Champion and associates184 and its successor, the Revised Trauma Score (RTS).185 The latter system includes the Glasgow Coma Scale score, systolic blood pressure, and respiratory rate at the time of admission to the emergency department. The most successful score—the Trauma and Injury Severity Score (TRISS)186—was again developed by Champion and associates and in fact combines the two principles, merging the existing morphologic and physiologic systems for trauma assessment: the ISS and the RTS.185 It uses the ISS to describe the anatomic injury, the RTS to describe the physiologic malfunction, and, in addition, age as a variable to calculate a predicted probability of survival. A major problem with the TRISS results from the fact that the underlying database—namely, the Major Trauma Outcome Study (MTOS)—was a purely Anglo-American database, which translates poorly into different settings. Trauma patterns are, for example, completely different in European trauma centers. Another reported problem is a lack of prognostic accuracy in elderly patients presenting with various physiologic derangements and chronic diseases independent of the traumatic injury.187
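The two principles can be made concrete with a short sketch. The helper names below are our own; the TRISS coefficients shown resemble the classic MTOS-era blunt-trauma set but should be treated here only as an illustration of the logistic form, not as values for clinical use.

```python
import math

def injury_severity_score(ais_by_region):
    """ISS: sum of squares of the three highest AIS grades, each from a
    different body region; any unsurvivable injury (AIS 6) is conventionally
    assigned the maximum score of 75."""
    grades = list(ais_by_region.values())
    if any(a == 6 for a in grades):
        return 75
    top_three = sorted(grades, reverse=True)[:3]
    return sum(a * a for a in top_three)

def triss_survival(rts, iss, age_index, b=(-0.4499, 0.8085, -0.0835, -1.7430)):
    """TRISS combines the RTS (physiology), the ISS (anatomy), and an age
    index in a logistic model: Ps = 1 / (1 + exp(-b)).  The default
    coefficients illustrate the blunt-trauma model form only."""
    b0, b1, b2, b3 = b
    logit = b0 + b1 * rts + b2 * iss + b3 * age_index
    return 1.0 / (1.0 + math.exp(-logit))
```

For example, a patient with AIS grades of 4 (head), 3 (chest), and 2 (abdomen) has an ISS of 16 + 9 + 4 = 29; feeding that ISS and the RTS into the logistic equation yields the predicted probability of survival.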

1558

Ch074-A04841.indd 1558

9/13/2007 11:51:55 AM

General severity of illness scores, such as the SAPS II, on the other hand, adjust well for physiologic derangement but provide no means to describe the severity of trauma and therefore also do not perform well in trauma patients.188 Specialists in trauma care have expressed reservations about the accuracy of these methodologies,189,190 and comparisons between trauma ICUs have been rendered difficult by the poor performance of TRISS, as well as of SAPS II, in trauma patients: Because the TRISS score yields unrealistically high survival probabilities, these departments appear to perform less well than expected. A recent study by Reiter and colleagues therefore evaluated the combination of a general severity of illness score with the TRISS method.191 These investigators showed that the combination of both systems was superior in predicting outcome (i.e., survival at hospital discharge). Whether such a methodology could be of use for the assessment of trauma ICUs needs further clarification in prospective studies. In 1990, Champion and associates published another system for the assessment of trauma patients, the ASCOT (A Severity Characterization of Trauma) score.192 The score was later validated and used in different settings. Although ASCOT was found to be superior to the TRISS in predicting outcome,70,193,194 its prognostic performance was found to be low in other settings.195,196 Further modifications of TRISS-like methodology have been published by various groups.197-199 Glance and colleagues recently published a retrospective cohort study using more than 91,000 admissions from 69 hospitals in the National Trauma Databank.200 They used TRISS and ASCOT methodologies to calculate O/E ratios for each center and found substantial disagreement between the two methods in identifying quality outliers. Moreover, these investigators found both methods to be poorly calibrated in this population.200 Accordingly, they concluded that it is currently impossible to use either of these systems to determine "best practice" for trauma care and recommended updating the existing systems.

Cardiac Surgical Patients Several models have been developed to risk-stratify patients who require cardiac surgery.201-204 The most widely used system was the Parsonnet score, which was developed using a database of 3500 admissions and prospectively validated in a single-center study. It used 14 variables shown to be significant in a univariate regression analysis. The Parsonnet score remained the gold standard for preoperative risk assessment for more than a decade. Several models to assess the perioperative risk for cardiac surgery patients have been developed from hospital or regional databases, such as the Society of Thoracic Surgeons National Cardiac Surgery Database, the New York State Database, and the Veterans Affairs Database.205-208 A majority of these models, however, have been neither developed nor validated for use in the ICU. In 1999 the results of a multicenter European study were published: the European System for Cardiac Operative Risk Evaluation (EuroSCORE).209 A total of 19,030 patients from 128 centers who underwent cardiac surgery were included in this study. The score was constructed using multiple logistic regression analysis of 68 preoperative and 29 operative risk factors. The final score consisted of 20 variables and allowed, for the first time, a quick assessment of the patient's operative risk. The EuroSCORE has meanwhile been validated in a variety of settings.210-213 Moreover, it has been found useful for assessing costs and resource use among patients undergoing cardiac surgery214 and for evaluating the incidence of readmission in this population.215 In addition, EuroSCORE was found to be a good predictor of complications in the perioperative setting216 and to be associated with long-term outcome after cardiac surgery.217 The EuroSCORE system now exists in two versions: additive and logistic. After the initial score was published, the authors added a version that was developed with logistic regression methods and recently published the coefficients.218 A review by Gogbashian and colleagues suggested that the additive EuroSCORE may not be well calibrated: Overestimation of mortality in low-risk patients and underestimation in high-risk patients were consistently found.219 Accordingly, these investigators concluded that use of the additive EuroSCORE has the effect of penalizing those centers that take on high-risk cases. Moreover, they suggested that a systematic review of the prognostic performance of the logistic EuroSCORE should be undertaken as soon as studies using this score become available.

DIRECTIONS FOR FURTHER RESEARCH Recent years have seen the development of a new generation of general outcome prediction models. More complex than their older counterparts, relying heavily on computerized data registry and analysis (although scores with the SAPS 3 model can still be calculated easily by hand), and incorporating a more extensive array of the reasons and circumstances responsible for ICU admission, these instruments now need to be evaluated outside their development populations. The selection of a severity scoring system remains largely subjective and dependent on the reference database chosen by the user: the U.S. centers participating in the APACHE III database or a more heterogeneous sample of ICUs across all major regions of the globe. The absence of any fee for use of the SAPS 3 model and the availability of equations specific for each region of the world should be weighed against participation in a pay-for-use continuous database program that provides greater professional support and analysis of the data. No matter which model is chosen, users should keep in mind that the accuracy of these models is dynamic and should be periodically retested, and that when accuracy deteriorates the models must be revised or updated. Also, their use should be complementary and not alternative to clinical evaluation, because both predictive methods are prone to error,220 especially in the individual patient.221



KEY POINTS

■ Scoring systems have been broadly used in medicine for several decades, both for clinical research and for the evaluation of ICU effectiveness.

■ The evaluation of severity of illness in the critically ill patient is made through the use of severity scores and prognostic models. Severity scores are instruments that aim at stratifying patients according to their severity, assigning to each patient an increasing score as illness severity increases; in addition to the stratification process, prognostic models aim at predicting a certain outcome based on a given set of prognostic variables and a specific modeling equation.

■ Different systems are available to describe and classify ICU populations, to compare severity of illness, and to predict mortality. These systems perform globally well, but there are still concerns about errors in prediction caused by differences in patient selection, lead-time bias, sample size, representativeness of the databases used to develop the systems, and poor calibration within patient subgroups and across geographic locations.

■ The most widely used general outcome prognostic systems for adults are the APACHE II and III, the SAPS II, and the MPM II. Newer published models are the APACHE IV and the SAPS 3.

■ At the patient level, severity scores and prognostic models have been used for purposes as diverse as determining the use of total parenteral nutrition, identifying futility in intensive care medicine, evaluating new therapies in sepsis, stratifying patients for inclusion in clinical trials, and analyzing the balance of randomization among groups during clinical trials. At the group level, general outcome prediction models have been proposed for allocation of resources and for performance evaluation, through the use of the observed-to-expected mortality ratio or standardized mortality ratio.

■ Organ failure scoring systems are designed to measure the presence and degree of organ dysfunction or failure in critically ill patients. Most models evaluate six key organ systems (cardiovascular, respiratory, hematologic, central nervous, renal, and hepatic), with other systems, such as the gastrointestinal system, less commonly included. All of them use a combination of physiologic and therapeutic variables to assess organ dysfunction or failure.

■ Several scoring systems have been developed for application in populations with specific clinical conditions, such as cardiac surgery or trauma. Models specifically developed for use in neonates or children also are available.

■ The choice between existing systems remains largely subjective and will depend on the reference database selected by the user: the U.S. centers participating in the APACHE III database or a more heterogeneous sample of ICUs across all major regions of the globe. Complexity, cost, and the existence of equations specific for each region of the world should be weighed, as should participation in a continuous database program for professional support and analysis of the data. No matter which model is chosen, accuracy should be periodically retested, and as it deteriorates, the model must be revised or updated. The use of such models should be complementary and not alternative to clinical evaluation.
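The observed-to-expected (standardized) mortality ratio used for the performance evaluation described above is simple to compute once a prognostic model supplies a predicted probability of death for each patient. A minimal sketch (function names are ours; the Poisson-based interval is one common approximation among several):

```python
import math

def smr(deaths_observed, predicted_probs):
    """Standardized mortality ratio: observed deaths divided by the deaths
    expected under the prognostic model (the sum of per-patient predicted
    probabilities).  Values above 1 suggest worse-than-expected outcomes."""
    expected = sum(predicted_probs)
    return deaths_observed / expected

def smr_confidence_interval(deaths_observed, predicted_probs, z=1.96):
    """Approximate 95% confidence interval, treating the observed death
    count as Poisson-distributed around the expected count."""
    expected = sum(predicted_probs)
    ratio = deaths_observed / expected
    se = math.sqrt(deaths_observed) / expected
    return ratio - z * se, ratio + z * se
```

For example, 30 observed deaths against an expected count of 25 gives an SMR of 1.2; whether that excess is meaningful depends on whether the confidence interval excludes 1, and, as emphasized above, on how well the underlying model is calibrated for the unit's case mix.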

REFERENCES
1. Apgar V: A proposal for a new method of evaluation of the newborn infant. Anesth Analg 1953;32:260-267.
2. Child CG, Turcotte JG: Surgery and portal hypertension. Major Probl Clin Surg 1964;1:1-85.
3. Killip TK 3rd, Kimball JT: Treatment of myocardial infarction in a coronary care unit. Am J Cardiol 1967;20:457-464.
4. Teasdale G, Jennett B: Assessment of coma and impaired consciousness. Lancet 1974;2:81-84.
5. Bastos PG, Sun X, Wagner DP, et al: Glasgow Coma Scale score in the evaluation of outcome in the intensive care unit: Findings from the Acute Physiology and Chronic Health Evaluation III study. Crit Care Med 1993;21:1459-1465.
6. Knaus WA, Zimmerman JE, Wagner DP, et al: APACHE—Acute Physiology And Chronic Health Evaluation: A physiologically based classification system. Crit Care Med 1981;9:591-597.
7. Le Gall J-R, Loirat P, Alperovitch A: Simplified acute physiological score for intensive care patients. Lancet 1983;2:741.
8. Knaus WA, Draper EA, Wagner DP, Zimmerman JE: APACHE II: A severity of disease classification system. Crit Care Med 1985;13:818-829.
9. Lemeshow S, Teres D, Pastides H, et al: A method for predicting survival and mortality of ICU patients using objectively derived weights. Crit Care Med 1985;13:519-525.
10. Knaus WA, Wagner DP, Draper EA, et al: The APACHE III prognostic system. Risk prediction of hospital mortality for critically ill hospitalized adults. Chest 1991;100:1619-1636.
11. Le Gall JR, Lemeshow S, Saulnier F: A new Simplified Acute Physiology Score (SAPS II) based on a European/North American multicenter study. JAMA 1993;270:2957-2963.
12. Lemeshow S, Teres D, Klar J, et al: Mortality Probability Models (MPM II) based on an international cohort of intensive care unit patients. JAMA 1993;270:2478-2486.
13. Marshall JC, Cook DA, Christou NV, et al: Multiple organ dysfunction score: A reliable descriptor of a complex clinical outcome. Crit Care Med 1995;23:1638-1652.
14. Le Gall JR, Klar J, Lemeshow S, et al: The logistic organ dysfunction system. A new way to assess organ dysfunction in the intensive care unit. JAMA 1996;276:802-810.
15. Vincent J-L, Moreno R, Takala J, et al: The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure. Intensive Care Med 1996;22:707-710.
16. Knaus WA, Draper EA, Wagner DP, et al: Evaluating outcome from intensive care: A preliminary multihospital comparison. Crit Care Med 1982;10:491-496.
17. Knaus WA, Le Gall JR, Wagner DP, et al: A comparison of intensive care in the U.S.A. and France. Lancet 1982;642-646.
18. Wagner DP, Draper EA, Abizanda Campos R, et al: Initial international use of APACHE: An acute severity of disease measure. Med Dec Making 1984;4:297.
19. Le Gall JR, Loirat P, Alperovitch A, et al: A simplified acute physiologic score for ICU patients. Crit Care Med 1984;12:975-977.
20. Lemeshow S, Teres D, Avrunin J, Gage RW: Refining intensive care unit outcome by using changing probabilities of mortality. Crit Care Med 1988;16:470-477.


21. Castella X, Artigas A, Bion J, for the European/North American Severity Study Group: A comparison of severity of illness scoring systems for intensive care unit patients: Results of a multicenter, multinational study. Crit Care Med 1995;23:1327-1335.
22. Bertolini G, D'Amico R, Apolone G, et al: Predicting outcome in the intensive care unit using scoring systems: Is new better? A comparison of SAPS and SAPS II in a cohort of 1,393 patients. Med Care 1998;36:1371-1382.
23. Moreno R, Matos R: The "new" scores: What problems have been fixed, and what remain? Curr Opin Crit Care 2000;6:158-165.
24. Angus DC, Linde-Zwirble WT, Lidicker J, et al: Epidemiology of severe sepsis in the United States: Analysis of incidence, outcome and associated costs of care. Crit Care Med 2001;29:1303-1310.
25. Martin GS, Mannino DM, Eaton S, Moss M: The epidemiology of sepsis in the United States from 1979 through 2000. N Engl J Med 2003;348:1546-1554.
26. Bernard GR, Vincent J-L, Laterre P-F, et al: Efficacy and safety of recombinant human activated protein C for severe sepsis. N Engl J Med 2001;344:699-709.
27. Ely EW, Laterre P-F, Angus DC, et al: Drotrecogin alfa (activated) administration across clinically important subgroups of patients with severe sepsis. Crit Care Med 2003;31:12-19.
28. Moreno R, Metnitz P, Jordan B, et al: SAPS 3 28 days score: A prognostic model to estimate patient survival during the first 28 days in the ICU. Intensive Care Med 2006;32:S203 (Abstract).
29. Higgins T, Teres D, Copes W, et al: Preliminary update of the Mortality Prediction Model (MPM0). Crit Care 2005;9:S97 (Abstract).
30. Render ML, Kim M, Deddens J, et al: Variation in outcomes in Veterans Affairs intensive care units with a computerized severity measure. Crit Care Med 2005;33:930-939.
31. Zimmerman JE, Kramer AA, McNair DS, Malila FM: Acute Physiology and Chronic Health Evaluation (APACHE) IV: Hospital mortality assessment for today's critically ill patients. Crit Care Med 2006;34:1297-1310.
32. Metnitz PG, Moreno RP, Almeida E, et al: SAPS 3. From evaluation of the patient to evaluation of the intensive care unit. Part 1: Objectives, methods and cohort description. Intensive Care Med 2005;31:1336-1344.
33. Moreno RP, Metnitz PG, Almeida E, et al: SAPS 3. From evaluation of the patient to evaluation of the intensive care unit. Part 2: Development of a prognostic model for hospital mortality at ICU admission. Intensive Care Med 2005;31:1345-1355.
34. Dybowski R, Weller P, Chang R, Gant V: Prediction of outcome in critically ill patients using artificial neural network synthesised by genetic algorithm. Lancet 1996;347:1146-1150.
35. Engoren M, Moreno R, Reis Miranda D: A genetic algorithm to predict hospital mortality in an ICU population. Crit Care Med 1999;27:A52.
36. Moreno R, Afonso S: Ethical, legal and organizational issues in the ICU: Prediction of outcome. Curr Opin Crit Care 2006;12:619-623.
37. Moreno R, Apolone G: The impact of different customization strategies in the performance of a general severity score. Crit Care Med 1997;25:2001-2008.
38. Zhu B-P, Lemeshow S, Hosmer DW, et al: Factors affecting the performance of the models in the mortality probability model and strategies of customization: A simulation study. Crit Care Med 1996;24:57-63.
39. Le Gall J-R, Lemeshow S, Leleu G, et al: Customized probability models for early severe sepsis in adult intensive care patients. JAMA 1995;273:644-650.
40. Apolone G, D'Amico R, Bertolini G, et al: The performance of SAPS II in a cohort of patients admitted in 99 Italian ICUs: Results from the GiViTI. Intensive Care Med 1996;22:1368-1378.
41. Metnitz PG, Valentin A, Vesely H, et al: Prognostic performance and customization of the SAPS II: Results of a multicenter Austrian study. Intensive Care Med 1999;25:192-197.
42. Moreno R, Apolone G, Reis Miranda D: Evaluation of the uniformity of fit of general outcome prediction models. Intensive Care Med 1998;24:40-47.
43. Knaus WA, Harrell FE, Fisher CJ, et al: The clinical evaluation of new drugs for sepsis. A prospective study design based on survival analysis. JAMA 1993;270:1233-1341.
44. Knaus WA, Harrell FE, LaBrecque JF, et al: Use of predicted risk of mortality to evaluate the efficacy of anticytokine therapy in sepsis. Crit Care Med 1996;24:46-56.
45. Le Gall J-R, Neumann A, Hemery F, et al: Mortality prediction using SAPS II: An update for French intensive care units. Crit Care 2005;9:R645-R652.
46. Aegerter P, Boumendil A, Retbi A, et al: SAPS II revisited. Intensive Care Med 2005;31:416-423.
47. Harrison DA, Brady AR, Parry GJ, et al: Recalibration of risk prediction models in a large multicenter cohort of admissions to adult, general critical care units in the United Kingdom. Crit Care Med 2006;34:1378-1388.
48. Rowan KM, Kerr JH, Major E, et al: Intensive Care Society's APACHE II study in Britain and Ireland—II: Outcome comparisons of intensive care units after adjustment for case mix by the American APACHE II method. BMJ 1993;307:977-981.
49. Zimmerman JE, Kramer AA, McNair DS, et al: Intensive care unit length of stay: Benchmarking based on Acute Physiology and Chronic Health Evaluation (APACHE) IV. Crit Care Med 2006;34:2517-2529.
50. Sirio CA, Shepardson LB, Rotondi AJ, et al: Community-wide assessment of intensive care outcomes using a physiologically based prognostic measure: Implications for critical care delivery from Cleveland Health Quality Choice. Chest 1999;115:793.
51. Knaus WA, Wagner DP, Zimmerman JE, Draper EA: Variations in mortality and length of stay in intensive care units. Ann Intern Med 1993;118:753-761.
52. Montaner JSG, Lawson LM, Levitt N, et al: Corticosteroids prevent early deterioration in patients with moderate severe Pneumocystis carinii pneumonia and the acquired immunodeficiency syndrome. Ann Intern Med 1990;113:14-20.
53. Moreno R, Miranda DR, Matos R, Fevereiro T: Mortality after discharge from intensive care: The impact of organ system failure and nursing workload use at discharge. Intensive Care Med 2001;27:999-1004.
54. Abizanda Campos R, Balerdi B, Lopez J, et al: Fallos de prediccion de resultados mediante APACHE II. Analisis de los errores de prediccion de mortalidad en pacientes criticos [Failures of outcome prediction with APACHE II: Analysis of mortality prediction errors in critically ill patients]. Med Clin (Barc) 1994;102:527-531.
55. Fery-Lemmonier E, Landais P, Kleinknecht D, Brivet F: Evaluation of severity scoring systems in ICUs: Translation, conversion and definition ambiguities as a source of interobserver variability in APACHE II, SAPS, and OSF. Intensive Care Med 1995;21:356-360.
56. Rowan K: The reliability of case mix measurements in intensive care. Curr Opin Crit Care 1996;2:209-213.
57. Bosman RJ, Oudemans-van Straaten HM, Zandstra DF: The use of intensive care information systems alters outcome prediction. Intensive Care Med 1998;24:953-958.
58. Suistomaa M, Kari A, Ruokonen E, Takala J: Sampling rate causes bias in APACHE II and SAPS II scores. Intensive Care Med 2000;26:1773-1778.
59. Cleveland WS: LOWESS: A program for smoothing scatterplots by robust locally weighted regression. Am Stat 1981;35:54.
60. Ridgeway G: The state of boosting. Comput Sci Statist 1999;31:172-181.
61. Damiano AM, Bergner M, Draper EA, et al: Reliability of a measure of severity of illness: Acute physiology and chronic health evaluation II. J Clin Epidemiol 1992;45:93-101.
62. Moreno R, Reis Miranda D, Fidler V, Van Schilfgaarde R: Evaluation of two outcome predictors on an independent database. Crit Care Med 1998;26:50-61.
63. Guyatt GH, Meade MO: Outcome measures: Methodologic principles. Sepsis 1997;1:21-25.
64. Marshall JD, Bernard G, Le Gall J-R, Vincent J-L: The measurement of organ dysfunction/failure as an ICU outcome. Sepsis 1997;1:41.
65. Flora JD: A method for comparing survival of burn patients to a standard survival curve. J Trauma 1978;18:701-705.
66. Hosmer DW, Lemeshow S: Applied Logistic Regression. New York, John Wiley & Sons, 1989.
67. Lemeshow S, Hosmer DW: A review of goodness of fit statistics for use in the development of logistic regression models. Am J Epidemiol 1982;115:92-106.
68. Hosmer DW, Lemeshow S: A goodness-of-fit test for the multiple


logistic regression model. Comm Stat 1980;A10:1043-1069.
69. Hadorn DC, Keeler EB, Rogers WH, Brook RH: Assessing the Performance of Mortality Prediction Models. Santa Monica, CA, RAND/UCLA/Harvard Center for Health Care Financing Policy Research, 1993.
70. Champion HR, Copes WS, Sacco WJ, et al: Improved predictions from a severity characterization of trauma (ASCOT) over trauma and injury severity score (TRISS): Results of an independent evaluation. J Trauma 1996;40:42-49.
71. Bertolini G, D'Amico R, Nardi D, et al: One model, several results: The paradox of the Hosmer-Lemeshow goodness-of-fit test for the logistic regression model. J Epidemiol Biostatistics 2000;5:251-253.
72. Harrell FE Jr, Califf RM, Pryor DB, et al: Evaluating the yield of medical tests. JAMA 1982;247:2543-2546.
73. Hanley J, McNeil B: The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29-36.
74. Ma G, Hall WJ: Confidence bands for receiver operating characteristic curves. Med Decis Making 1993;13:191-197.
75. Hanley J, McNeil B: A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 1983;148:839-843.
76. McClish DK: Comparing the areas under more than two independent ROC curves. Med Decis Making 1987;7:149-155.
77. DeLong ER, DeLong DM, Clarke-Pearson DL: Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach. Biometrics 1988;44:837-845.
78. Hilden J: The area under the ROC curve and its competitors. Med Decis Making 1991;11:95-101.
79. Schuster DP: Predicting outcome after ICU admission. The art and science of assessing risk. Chest 1992;102:1861-1870.
80. Kollef MH, Schuster DP: Predicting intensive care unit outcome with scoring systems. Underlying concepts and principles. Crit Care Clin 1994;10:1-18.
81. Miller ME, Hui SL: Validation techniques for logistic regression models. Stat Med 1991;10:1213-1226.
82. Goldhill DR, Withington PS: The effects of casemix adjustment on mortality as predicted by APACHE II. Intensive Care Med 1996;22:415-419.
83. Sicignano A, Carozzi C, Giudici D, et al: The influence of length of stay in the ICU on power of discrimination of a multipurpose severity score (SAPS). Intensive Care Med 1996;22:1048-1051.
84. Castella X, Gilabert J, Torner F, Torres C: Mortality prediction models in intensive care: Acute Physiology and Chronic Health Evaluation II and Mortality Prediction Model compared. Crit Care Med 1991;19:191-197.

85. Sirio CA, Tajimi K, Tase C, et al: An initial comparison of intensive care in Japan and United States. Crit Care Med 1992;20:1207-1215.
86. Bastos PG, Sun X, Wagner DP, et al, for the Brazil APACHE III Study Group: Application of the APACHE III prognostic system in Brazilian intensive care units: A prospective multicenter study. Intensive Care Med 1996;22:564-570.
87. Moreno R, Morais P: Outcome prediction in intensive care: Results of a prospective, multicentre, Portuguese study. Intensive Care Med 1997;23:177-186.
88. Rivera-Fernandez R, Vazquez-Mata G, Bravo M, et al: The Apache III prognostic system: Customized mortality predictions for Spanish ICU patients. Intensive Care Med 1998;24:574-581.
89. Moreno R: From the evaluation of the individual patient to the evaluation of the ICU. Réanimation 2003;12:47S-48S.
90. Perkins HS, Jonsen AR, Epstein WV: Providers as predictors: Using outcome predictions in intensive care. Crit Care Med 1986;14:105-110.
91. Silverstein MD: Prediction instruments and clinical judgement in critical care. JAMA 1988;260:1758-1759.
92. Dawes RM, Faust D, Meehl PE: Clinical versus actuarial judgement. Sci Med Man 1989;243:1674-1688.
93. Kleinmuntz B: Why we still use our heads instead of formulas: Toward an integrative approach. Psychol Bull 1990;107:296-310.
94. McClish DK, Powell SH: How well can physicians estimate mortality in a medical intensive care unit? Med Decis Making 1989;9:125-132.
95. Poses RM, Bekes C, Winkler RL, et al: Are two (inexperienced) heads better than one (experienced) head? Averaging house officers' prognostic judgement for critically ill patients. Arch Intern Med 1990;150:1874-1878.
96. Poses RM, Bekes C, Copare FJ, et al: The answer to "what are my chances, doctor?" depends on whom is asked: Prognostic disagreement and inaccuracy for critically ill patients. Crit Care Med 1989;17:827-833.
97. Winkler RL, Poses RM: Evaluating and combining physicians' probabilities of survival in an intensive care unit. Manag Sci 1993;39:1526-1543.
98. Chang RWS, Lee B, Jacobs S, Lee B: Accuracy of decisions to withdraw therapy in critically ill patients: Clinical judgement versus a computer model. Crit Care Med 1989;17:1091-1097.
99. Knaus WA, Rauss A, Alperovitch A, et al: Do objective estimates of chances for survival influence decisions to withhold or withdraw treatment? Med Decis Making 1990;10:163-171.
100. Zimmerman JE, Wagner DP, Draper EA, Knaus WA: Improving intensive care unit discharge decisions: Supplementary physician judgment with predictions of next day risk for life support. Crit Care Med 1994;22:1373-1384.
101. Branner AL, Godfrey LJ, Goetter WE: Prediction of outcome from critical illness: A comparison of clinical judgement with a prediction rule. Arch Intern Med 1989;149:1083-1086.
102. Kruse JA, Thill-Baharozin MC, Carlson RW: Comparison of clinical assessment with APACHE II for predicting mortality risk in patients admitted to a medical intensive care unit. JAMA 1988;260:1739-1742.
103. Marks RJ, Simons RS, Blizzard RA, et al: Predicting outcome in intensive therapy units—a comparison of APACHE II with subjective assessments. Intensive Care Med 1991;17:159-163.
104. Knaus WA, Wagner DP, Lynn J: Short-term mortality predictions for critically ill hospitalized adults: Science and ethics. Sci Med Man 1991;254:389-394.
105. Lemeshow S, Klar J, Teres D: Outcome prediction for individual intensive care patients: Useful, misused, or abused? Intensive Care Med 1995;21:770-776.
106. Suter P, Armagandis A, Beaufils F, et al: Predicting outcome in ICU patients: Consensus conference organized by the ESICM and the SRLF. Intensive Care Med 1994;20:390-397.
107. Chang RW, Jacobs S, Lee B: Use of APACHE II severity of disease classification to identify intensive-care-unit patients who would not benefit from total parenteral nutrition. Lancet 1986;1:1483-1486.
108. Atkinson S, Bihari D, Smithies M, et al: Identification of futility in intensive care. Lancet 1994;344:1203-1206.
109. Murray LS, Teasdale GM, Murray GD, et al: Does prediction of outcome alter patient management? Lancet 1993;341:1487-1491.
110. Gattinoni L, Brazzi L, Pelosi P, et al: A trial of goal-orientated hemodynamic therapy in critically ill patients. N Engl J Med 1995;333:1025-1032.
111. Henning RJ, McClish D, Daly B, et al: Clinical characteristics and resource utilization of ICU patients: Implementation for organization of intensive care. Crit Care Med 1987;15:264-269.
112. Wagner DP, Knaus WA, Draper EA: Identification of low-risk monitor admissions to medical-surgical ICUs. Chest 1987;92:423-428.
113. Wagner DP, Knaus WA, Draper EA, et al: Identification of low-risk monitor patients within a medical-surgical ICU. Med Care 1983;21:425-433.
114. Zimmerman JE, Wagner DP, Knaus WA, et al: The use of risk predictors to identify candidates for intermediate care units. Implications for intensive care unit utilization. Chest 1995;108:490-499.
115. Zimmerman JE, Wagner DP, Sun X, et al: Planning patient services for intermediate care units: Insights based on care for intensive care unit low-risk monitor admissions. Crit Care Med 1996;24:1626-1632.
116. Strauss MJ, LoGerfo JP, Yeltatzie JA, et al: Rationing of intensive care unit services. An everyday occurrence. JAMA 1986;255:1143-1146.
117. Civetta JM, Hudson-Civetta JA, Nelson LD: Evaluation of APACHE II for cost containment and quality assurance. Ann Surg 1990;212:266-276.


134. Zimmerman JE, Rousseau DM, Duffy J, et al: Intensive care at two teaching hospitals: An organizational case study. Am J Crit Care 1994;3:129-138.
135. Chisakuta AM, Alexander JP: Audit in intensive care. The APACHE II classification of severity of disease. Ulster Med J 1990;59:161-167.
136. Marsh HM, Krishan I, Naessens JM, et al: Assessment of prediction of mortality by using the APACHE II scoring system in intensive care units. Mayo Clin Proc 1990;65:1549-1557.
137. Turner JS, Mudaliar YM, Chang RW, Morgan CJ: Acute Physiology and Chronic Health Evaluation (APACHE II) scoring in a cardiothoracic intensive care unit. Crit Care Med 1991;19:1266-1269.
138. Oh TE, Hutchinson R, Short S, et al: Verification of the acute physiology and chronic health evaluation scoring system in a Hong Kong intensive care unit. Crit Care Med 1993;21:698-705.
139. Zimmerman JE, Shortell SM, Rousseau DM, et al: Improving intensive care: Observations based on organizational case studies in nine intensive care units: A prospective, multicenter study. Crit Care Med 1993;21:1443-1451.
140. Shortell SM, Zimmerman JE, Rousseau DM, et al: The performance of intensive care units: Does good management make a difference? Med Care 1994;32:508-525.
141. Reis Miranda D, Ryan DW, Schaufeli WB, Fidler V (eds): Organization and Management of Intensive Care: A Prospective Study in 12 European Countries. Vol 29. Berlin/Heidelberg, Springer-Verlag, 1997.
142. Moreno R, Matos R: New issues in severity scoring: Interfacing the ICU and evaluating it. Curr Opin Crit Care 2001;7:469-474.
143. Teres D, Lemeshow S: Using severity measures to describe high performance intensive care units. Crit Care Clin 1993;9:543-554.
144. Teres D, Lemeshow S: Why severity models should be used with caution. Crit Care Clin 1994;10:93-110.
145. Teres D, Lieberman S: Are we ready to regionalize pediatric intensive care? Crit Care Med 1991;19:139-140.
146. Pollack MM, Alexander SR, Clarke N, et al: Improved outcomes from tertiary center pediatric intensive care: A statewide comparison of tertiary and nontertiary care facilities. Crit Care Med 1990;19:150-159.
147. Goldstein H, Spiegelhalter DJ: League tables and their limitations: Statistical issues in comparisons of institutional performance. J R Stat Soc A 1996;159:385-443.
148. Polderman KH, Thijs LG, Girbes AR: Interobserver variability in the use of APACHE II scores. Lancet 1999;353:380 (Letter).
149. Fry DE, Pearlstein L, Fulton RL, Polk HC: Multiple system organ failure. The role of uncontrolled infection. Arch Surg 1980;115:136-140.
150. Elebute EA, Stoner HB: The grading of sepsis. Br J Surg 1983;70:29-31.
151. Stevens LE: Gauging the severity of surgical sepsis. Arch Surg 1983;118:1190-1192.
152. Goris RJA, te Boekhorst TP, Nuytinck JKS, Gimbrère JSF: Multiple-organ failure. Generalized autodestructive inflammation? Arch Surg 1985;120:1109-1115.
153. Knaus WA, Draper EA, Wagner DP, Zimmerman JE: Prognosis in acute organ-system failure. Ann Surg 1985;202:685-693.
154. Chang RW, Jacobs S, Lee B: Predicting outcome among intensive care unit patients using computerised trend analysis of daily Apache II scores corrected for organ system failure. Intensive Care Med 1988;14:558-566.
155. Meek M, Munster AM, Winchurch RA, et al: The Baltimore Sepsis Scale: Measurement of sepsis in patients with burns using a new scoring system. J Burn Care Rehabil 1991;12:564.
156. Baumgartner JD, Bula C, Vaney C, et al: A novel score for predicting the mortality of septic shock patients. Crit Care Med 1992;20:953.
157. Bernard GR, Doig BG, Hudson G, et al: Quantification of organ failure for clinical trials and clinical practice. Am J Respir Crit Care Med 1995;151:A323 (Abstract).
158. Bertleff MJ, Bruining HA: How should multiple organ dysfunction syndrome be assessed? A review of the variations in current scoring systems. Eur J Surg 1997;163:405-409.
159. Jacobs S, Zuleika M, Mphansa T: The multiple organ dysfunction score as a descriptor of patient outcome in septic shock compared with two other scoring systems. Crit Care Med 1999;27:741-744.
160. Gonçalves JA, Hydo LJ, Barie PS: Factors influencing outcome of prolonged norepinephrine therapy for shock in critical surgical illness. Shock 1998;10:231-236.
161. Maziak DE, Lindsay TF, Marshall JC, et al: The impact of multiple organ dysfunction on mortality following ruptured abdominal aortic aneurysm repair. Ann Vasc Surg 1998;12:93-100.
162. Pinilla JC, Hayes P, Laverty W, et al: The C-reactive protein to prealbumin ratio correlates with the severity of multiple organ dysfunction. Surgery 1998;124:799-805.
163. Staubach KH, Schroder J, Stuber F, et al: Effect of pentoxifylline in severe sepsis: Results of a randomized, double-blind, placebo-controlled study. Arch Surg 1998;133:94-100.
164. Vincent J-L, de Mendonça A, Cantraine F, et al: Use of the SOFA score to assess the incidence of organ dysfunction/failure in intensive care units: Results of a multicentric, prospective study. Crit Care Med 1998;26:1793-1800.
165. Di Filippo A, De Gaudio AR, Novelli A, et al: Continuous infusion of vancomycin in methicillin-resistant staphylococcus infection. Chemotherapy 1998;44:63-68.
166. Fiore G, Donadio PP, Gianferrari P, et al: CVVH in postoperative care of liver transplantation. Minerva Anestesiol 1998;64:83-87.
167. Briegel J, Forst H, Haller M, et al: Stress doses of hydrocortisone reverse hyperdynamic septic shock: A

74

Severity Scoring Systems: Tools for the Evaluation of Patients and Intensive Care Units

118. Jones AE, Fitch MT, Kline JA: Operational performance of validated physiologic scoring systems for predicting in-hospital mortality among critically ill emergency department patients. Crit Care Med 2005;33: 974-978. 119. MERIT Study Investigators: Introduction of the medical emergency team (MET) system: A cluster-randomised controlled trial. Lancet 2005;365:2091-2097. 120. Moreno R, Reis Miranda D: Nursing staff in intensive care in Europe. The mismatch between planning and practice. Chest 1998;113:752-758. 121. Moreno R, Vincent J-L, Matos R, et al: The use of maximum SOFA score to quantify organ dysfunction/failure in intensive care. Results of a prospective, multicentre study. Intensive Care Med 1999;25:686-696. 122. Clermont G, Kaplan V, Moreno R, et al: Dynamic microsimulation to model multiple outcomes in cohorts of critically ill patients. Intensive Care Med 2004;30:2237-2244. 123. Zimmerman JE, Shortell SM, Knaus WA, et al: Value and cost of teaching hospitals: A prospective, multicenter, inception cohort study. Crit Care Med 1993;21:1432-1442. 124. Rapoport J, Teres D, Lemeshow S, Gehlbach S: A method for assessing the clinical performance and costeffectiveness of intensive care units: A multicenter inception cohort study. Crit Care Med 1994;22:1385-1391. 125. Teres D, Rapoport J: Identifying patients with high risk of high cost. Chest 1991;99:530-531. 126. Cerra FB, Negro F, Abrams J: APACHE II score does not predict multiple organ failure or mortality in postoperative surgical patients. Arch Surg 1990;125:519-522. 127. Rapoport J, Teres D, Lemeshow S, et al: Explaining variability of cost using a severity of illness measure for ICU patients. Med Care 1990;28:338-348. 128. Oye RK, Bellamy PF: Patterns of resource consumption in medical intensive care. Chest 1991;99:695-689. 129. Knaus WA, Draper EA, Wagner DP, Zimmerman JE: An evaluation of outcome from intensive care in major medical centers. Ann Intern Med 1986;104:410-418. 130. 
Hosmer DW, Lemeshow S: Confidence interval estimates of an index of quality performance based on logistic regression estimates. Stat Med 1995;14:2161-2172. 131. Rapoport J, Teres D, Barnett R, et al: A comparison of intensive care unit utilization in Alberta and Western Massachusetts. Crit Care Med 1995;23:1336-1346. 132. Wong DT, Crofts SL, Gomez M, et al: Evaluation of predictive ability of APACHE II system and hospital outcome in Canadian intensive care unit patients. Crit Care Med 1995;23:1177-1183. 133. Le Gall JR, Loirat P, Nicolas F, et al: Utilisation d’un indice de gravité dans huit services de réanimation multidisciplinaire. Presse Med 1983;12:1757-1761.

1563

Ch074-A04841.indd 1563

9/13/2007 11:51:56 AM

PART

VIII ADMINISTRATIVE, ETHICAL, AND PSYCHOLOGICAL ISSUES IN CARE OF THE CRITICALLY ILL

168.

169.

170.

171.

172.

173.

174.

175.

176.

177.

178.

179. 180.

181.

182.

prospective, randomized, doubleblind, single-center study. Crit Care Med 1999;27:723-732. Hynninen M, Valtonen M, Markkanen H, et al: Interleukin 1 receptor antagonist and E-selectin concentrations: A comparison in patients with severe acute pancreatitis and severe sepsis. J Crit Care 1999;14:63-68. Soufir L, Timsits JF, Mahe C, et al: Attributable morbidity and mortality of catheter-related septicemia in critically ill patients: A matched, risk-adjusted, cohort study. Infect Control Hosp Epidemiol 1999;20:396-401. Moreno R, Pereira E, Matos R, Fevereiro T: The evaluation of cardiovascular dysfunction/failure in multiple organ failure [abstract]. Intensive Care Med 1997; 23:S153. Timsit JF, Fosse JP, Troche G, et al: Accuracy of a composite score using daily SAPS II and LOD scores for predicting hospital mortality in ICU patients hospitalized for more than 72 h. Intensive Care Med 2001;27:1012-1021. Meade MO, Cook DJ: A critical appraisal and systematic review of illness severity scoring systems in the intensive care unit. Curr Opin Crit Care 1995;1:191. Reis Miranda D, Moreno R: ICU models and their role in management and utilization programs. Curr Opin Crit Care 1997;3:183-187. Boyd O, Grounds M: Can standardized mortality ratio be used to compare quality of intensive care unit performance? Crit Care Med 1994;22:1706-1708 (Letter). Moreno R: Performance of the ICU. Are we able to measure it? In Vincent JL (ed): 1998 Yearbook of Intensive Care and Emergency Medicine. New York, Springer-Verlag, 1998, pp 729-743. Hopfel AW, Taaffe CL, Herrmann VM: Failure of APACHE II alone as a predictor of mortality in patients receiving total parenteral nutrition. Crit Care Med 1989;17:414-417. Esserman L, Belkora J, Lenert L: Potentially ineffective care. A new outcome to assess the limits of critical care. JAMA 1995;274:1544-1551. Moreno R, Matos R, Fevereiro T, Pereira ME: À procura de um índice de gravidade na sépsis. Rev Port Med Intensiva 1999;8:43-52. 
Carlet J, Nicolas F: Specific severity of illness scoring systems. Curr Opin Crit Care 1995;1:233. Abecasis PB: Quantificação das alterações sistémicas como índice prognóstico em Medicina Intensiva. Lisbon, Universidade Nova de Lisboa, 1997. Casey LC, Balk RA, Bone RC: Plasma cytokine and endotoxin levels correlate with survival in patients with the sepsis syndrome. Ann Intern Med 1993;119:771. Alberti C, Brun-Buisson C, Chevret S, et al: Systemic inflammatory response and progression to severe sepsis in critically ill infected patients. Am J Respir Crit Care Med 2005;171:461-468.

183. Backer S, O’Neill B, Haddon Jr W, Long WN: The injury severity score: A method for describing patients with multiple injuries and evaluating emergency care. J Trauma 1974;14:187-196. 184. Champion HR, Sacco WJ, Carnazzo AJ, et al: Trauma score. Crit Care 1981;9:672-676. 185. Champion HR, Sacco WJ, Copes WS, et al: A revision of the Trauma Score. J Trauma 1989;29. 186. Champion HR, Sacco WJ, Hunt TK: Trauma severity scoring to predict mortality. World J Surg 1983;7:4-11. 187. Pickering SAW, Esberger D, Moran CG: The outcome following major trauma in the elderly. Predictors of survival. Injury 1999;30:703-706. 188. Sicignano A, Giudici D: Probability model of hospital death for severe trauma patients based on the Simplified Acute Physiology Score I: Development and validation. J Trauma 1997;43:585-589. 189. Unertl K, Kottler BM: [Prognostic scores in intensive care]. Anaesthesist 1997;46:471-480. 190. Barbieri S, Michieletto E, Feltracco P, et al: [Prognostic systems in intensive care: TRISS, SAPS II, APACHE III]. Minerva Anestesiol 2001;67:519. 191. Reiter A, Mauritz W, Jordan B, et al: Improving risk adjustment in critically ill trauma patients: The TRISS-SAPS score. J Trauma 2004;57:375-380. 192. Champion HR, Copes WS, Sacco WJ, et al: A new characterization of injury severity. J Trauma 1990;30:539-545. 193. Markle J, Cayten CG, Byrne DW, et al: Comparison between TRISS and ASCOT methods in controlling for injury severity. J Trauma 1992;33:326-332. 194. Hannan EL, Mendeloff J, Farrell LS, et al: Validation of TRISS and ASCOT using a non-MTOS trauma registry. J Trauma 1995;38:83-88. 195. Gabbe BJ, Cameron PA, Wolfe R, et al: Predictors of mortality, length of stay and discharge destination in blunt trauma. Austr N Z J Surg 2005;75:650-656. 196. Hannan EL, Farrell LS, Cayten CG: Predicting survival of victims of motor vehicle crashes in New York state. Injury 1997;28:607-615. 197. 
Schall LC, Potoka DA, Ford HR: A new method for estimating probability of survival in pediatric patients using revised TRISS methodology based on age-adjusted weights. J Trauma 2002;52:235-241. 198. Davis EG, MacKenzie EJ, Sacco WJ, et al: A new “TRISS-like” probability of survival model for intubated trauma patients. J Trauma 2003;55:53-61. 199. Osler TM, Rogers FB, Badger GJ, et al: A simple mathematical modification of TRISS markedly improves calibration. J Trauma 2002;53:630-634. 200. Glance LG, Osler TM, Dick AW; Evaluating trauma center quality: Does the choice of the severity-adjustment model make a difference? J Trauma 2005;58:1265-1271. 201. Parsonnet V, Dean D, Bernstein A: A method for uniform stratification of risk for evaluating the results of surgery in acquired adult heart disease. Circulation 1989;79:I3-I12.

202. Higgins TL, Estafanous FG, Loop FD, et al: Stratification of morbidity and mortality outcome by preoperative risk factors in coronary artery bypass patients. A clinical severity score. JAMA 1992;267:2344-2348. 203. Roques F, Gabrielle F, Michel P, et al: Quality of care in adult heart surgery: Proposal for a self-assessment approach based on a French multicenter study. Eur J Cardiothorac Surg 1995;9: 439-440. 204. Tuman KJ, McCarthy RJ, March RJ, et al: Morbidity and duration of ICU stay after cardiac surgery. A model for preoperative risk assessment. Chest 1992;102:36-44. 205. Shroyer AL, Plomondon ME, Grover FL, Edwards FH: The 1996 coronary artery bypass risk model: The Society of Thoracic Surgeons Adult Cardiac National Database. Ann Thorac Surg 1999;67:1205-1208. 206. Hannan EL, Kilburn H Jr, O’Donnell JF, et al: Adult open heart surgery in New York State. An analysis of risk factors and hospital mortality rates. JAMA 1990;264:2768-2774. 207. Grover FL, Shroyer AL, Hammermeister KE: Calculating risk and outcome: The Veterans Affairs database. Ann Thorac Surg 1996;62:S6-S11. 208. O’Connor GT, Plume SK, Olmstead EM, et al: Multivariate prediction of in-hospital mortality associated with coronary artery bypass graft surgery. Northern New England Cardiovascular Disease Study Group. Circulation 1992;85:2110-2118. 209. Nashef SA, Roques F, Michel P, et al: European System for Cardiac Operative Risk Evaluation (EuroSCORE). Eur J Cardiothorac Surg 1999;16:9-13. 210. Kawachi Y, Nakashima A, Toshima Y, et al: Risk stratification analysis of operative mortality in heart and thoracic aorta surgery: Comparison between Parsonnet and EuroSCORE additive model. Eur J Cardiothorac Surg 2001;20:961-966. 211. Sergeant P, de Worm E, Meyns B: Single centre, single domain validation of the EuroSCORE on a consecutive sample of primary and repeat CABG. Eur J Cardiothorac Surg 2001;20:1176-1182. 212. 
Kurki TS, Jarvinen O, Kataja MJ, et al: Performance of three preoperative risk indices; CABDEAL, EuroSCORE and Cleveland models in a prospective coronary bypass database. Eur J Cardiothorac Surg 2002;21:406-410. 213. Nashef SA, Roques F, Hammill BG, et al: Validation of European System for Cardiac Operative Risk Evaluation (EuroSCORE) in North American cardiac surgery. Eur J Cardiothorac Surg 2002;22:101-105. 214. Sokolovic E, Schmidlin D, Schmid ER, et al: Determinants of costs and resource utilization associated with open heart surgery. Eur J Cardiothorac Surg 2002;23:574-578. 215. Chung DA, Sharples LD, Nashef SA: A case-control analysis of readmissions to the cardiac surgical intensive care unit. Eur J Cardiothorac Surg 2002;22:282-286.

1564

Ch074-A04841.indd 1564

9/13/2007 11:51:56 AM

CHAPTER

218. Roques F, Michel P, Goldstone AR, Nashef SA: The logistic EuroSCORE. Eur Heart J 2003;24:881882 (Letter). 219. Gogbashian A, Sedrakyan A, Treasure T: EuroSCORE: A systematic review of international performance. Eur J Cardiothorac Surg 2004;25:695-700. 220. Sinuff T, Adhikari NKJ, Cook DJ, et al: Mortality predictions in the intensive

care unit: Comparing physicians with scoring systems. Crit Care Med 2006;34:878-885. 221. Booth FV, Short M, Shorr AF, et al: Application of a population-based severity scoring system to individual patients results in frequent misclassification. Crit Care 2006;9: R522-R529.

74

Severity Scoring Systems: Tools for the Evaluation of Patients and Intensive Care Units

216. Gurler S, Gebhard A, Godehardt E, et al: EuroSCORE as a predictor for complications and outcome. Eur J Cardiothorac Surg 2003;51: 73-77. 217. De Maria R, Mazzoni M, Parolini M, et al: Predictive value of EuroSCORE on long term outcome in cardiac surgery patients: A single institution study. Heart 2005;91:779-784.

1565

Ch074-A04841.indd 1565

9/13/2007 11:51:56 AM