Applied Nursing Research 19 (2006) 51–53
Ask an Expert
Multilevel models in health outcomes research. Part I: Theory, design, and measurement
Eileen T. Lake, RN, PhD
School of Nursing, University of Pennsylvania, Philadelphia, PA 19104, USA
Accepted 11 October 2005
Health outcomes research studies are frequently multilevel in nature. Often, the focus is on individuals (e.g., patients or nurses) within organizational (e.g., nursing unit, clinic, or hospital) contexts. The multilevel nature of such studies presents researchers with conceptual, measurement, and methodological challenges. This two-part column will introduce health outcomes research and illustrate some unique aspects of its theory and methods.

Nursing outcomes research can be defined most broadly as research to identify and quantify the effect of nursing practice on patient outcomes. Although most clinical nursing research could be considered outcomes research, the term "outcomes research" has come to be associated with a focus on how the organization of nursing affects nursing (e.g., burnout), system (e.g., retention), and patient (e.g., 30-day mortality) outcomes rather than on the efficacy of individual nursing interventions. The organizational focus makes this type of outcomes research multilevel in nature.

Health outcomes research in nursing is a fairly young field. Its official origin can be traced to the 1991 National Center for Nursing Research conference, which resulted in a recommendation to "[Do] research that has the goal of explaining variation in patient outcomes attributable to nursing practice."

Ideally, we might wish to understand how nursing practice within the one-to-one nurse-to-patient relationship influences patient outcomes. In practice, this approach is neither feasible nor useful. In most situations, more than one nurse cares for one patient; thus, the nursing effect is shared. Moreover, the nurse manager and nursing administrators have more control over the preferred constellation of nursing staff and the environment for nursing practice than over the one-to-one interactions between particular dyads of nurses and patients.
Thus, we could not isolate the effect of an individual nurse, nor could research findings inform the nurse manager about how to cultivate the practices of individual staff. For these reasons, outcomes research must focus first on how nursing care is organized rather than on what nurses do. In the future, outcomes research may move to examine how what nurses do influences patient outcomes. This is considered the "black box" that holds the causal link between how care is organized and the variation in patient outcomes. The black box likely contains nursing surveillance, judgment, and action, which are, broadly speaking, the bases for quality of care.

The essential first questions for nursing outcomes research are how the number and mix of staff, and the environment in which they practice, influence patient outcomes. The answers to these questions will support nurse managers in making decisions that will have the broadest impact on outcomes. Nurse staffing and the practice environment are organization-level phenomena. Therefore, we must study organizations (e.g., hospitals and nursing units) as well as individuals.

Does this mean that health outcomes are multilevel phenomena? No, the outcomes themselves are not. In most instances, we are interested in individual patients' health outcomes. What makes outcomes research multilevel are the research questions and the data structure. We are interested in the effects of organizational characteristics on individual outcomes. Our explanatory variables vary across organizational units. Often (but not always), our dependent variables vary across individuals. These factors imply a multilevel research design and analysis. Characteristics of higher organizational levels may be theorized to influence lower-level outcomes.
Fig. 1. Multilevel research question. The classic multilevel research question explores the direct effects of contextual characteristics on individual-level responses.
The most common type of outcomes research question focuses on how nursing unit or hospital characteristics, such as staffing and the practice environment, influence patient-level outcomes, such as mortality and satisfaction. This question and the associated data structure are diagrammed in Fig. 1. In this diagram, X2 indicates that the explanatory variable is measured at the second (higher) level, and Y1 indicates that the outcome variable is measured at the first (lower) level. Z1 indicates a set of control variables, such as patient demographic and severity-of-illness characteristics, that may influence the outcome of interest.

An example is a study that considered whether a hospital's nurse staffing characteristics were associated with the likelihood of patient death following common surgical procedures (Aiken, Clarke, Cheung, Sloane, & Silber, 2003). In this study, there were two explanatory variables (X2): the number of patients per nurse and the proportion of nurses with bachelor's degrees. Both explanatory variables measured nurse staffing at the hospital level. Y1 was a binary variable indicating whether the patient survived 30 days postadmission. Z1 was a set of 133 variables, including patient age, sex, surgery type, and comorbid conditions.
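To make this two-level data structure concrete, the following sketch in Python uses hypothetical variable names and simulated values (it is not the published analysis). It builds a Level-2 hospital file and a Level-1 patient file, merges them so that each hospital-level X2 value is repeated for every patient in that hospital, and fits a conventional single-level logistic regression. As the next column discusses, a traditional model of this kind ignores the clustering of patients within hospitals.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Level 2: hospital-level explanatory variables (X2) -- hypothetical values
n_hospitals = 20
hospitals = pd.DataFrame({
    "hospital_id": np.arange(n_hospitals),
    "patients_per_nurse": rng.uniform(4, 8, n_hospitals),
    "pct_bsn": rng.uniform(0.2, 0.6, n_hospitals),
})

# Level 1: patient-level records with a control variable (Z1)
n_patients = 2000
patients = pd.DataFrame({
    "hospital_id": rng.integers(0, n_hospitals, n_patients),
    "age": rng.normal(65, 10, n_patients),
})

# The multilevel data structure: each Level-2 value is repeated for every
# Level-1 record belonging to that hospital.
df = patients.merge(hospitals, on="hospital_id")

# Simulate a binary 30-day mortality outcome (Y1) that depends on both levels.
logit_p = -6 + 0.3 * df["patients_per_nurse"] - 1.0 * df["pct_bsn"] + 0.03 * df["age"]
df["died_30day"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

# A traditional (single-level) logistic regression of the patient outcome on
# hospital-level staffing measures and a patient-level control. It treats the
# repeated hospital-level values as independent observations -- the limitation
# that the multilevel models discussed in Part II are designed to address.
fit = smf.logit("died_30day ~ patients_per_nurse + pct_bsn + age", data=df).fit(disp=0)
print(fit.params)
```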
Some questions focus on how hospital-level characteristics influence hospital- or nursing unit-level rates or proportions, such as hospital-level mortality and nursing-unit pressure ulcer prevalence. A diagram (not shown) of the research question and data structure for this type of study would have all variables (X, Y, and Z) at Level 2. An example is a study that considered whether magnet hospitals had lower rates of Medicare patient mortality than a matched control group of hospitals (Aiken, Smith, & Lake, 1994). In this example, X2 was a binary variable indicating whether or not the hospital was a magnet hospital. There were 39 magnet hospitals and 195 control hospitals. Y2 was the observed 30-day mortality rate among Medicare discharges in each hospital; the mortality rate was approximately 11%. Z2 was the predicted mortality rate based on the following patient characteristics: age, sex, four comorbidities, type and source of admission, and the presence and risk of hospitalization within the previous 6 months. The predicted mortality rate was used to control for differences in patient mix across hospitals.
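For this all-Level-2 design, an analogous sketch (again with hypothetical names and simulated values, and a plain OLS fit rather than the matched-sample analysis reported by Aiken et al., 1994) regresses each hospital's observed mortality rate on the magnet indicator while controlling for its predicted rate. The unit of analysis is the hospital, so there is one row per hospital.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# All variables are at Level 2 (the hospital). Column names are assumptions.
n = 234                                                # e.g., 39 magnet + 195 control hospitals
hospitals = pd.DataFrame({
    "magnet": np.repeat([1, 0], [39, 195]),            # X2: magnet indicator
    "predicted_mortality": rng.normal(0.11, 0.02, n),  # Z2: expected rate from patient mix
})

# Y2: observed 30-day mortality rate, simulated around the predicted rate
# with a small hypothetical magnet effect.
hospitals["observed_mortality"] = (
    hospitals["predicted_mortality"]
    - 0.005 * hospitals["magnet"]
    + rng.normal(0, 0.01, n)
)

# Hospital-level (Level-2 only) regression: 234 observations, one per hospital.
fit = smf.ols("observed_mortality ~ magnet + predicted_mortality", data=hospitals).fit()
print(fit.params)
```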
Several basic terms help differentiate the organizational levels that pertain to the research question, a particular measure, and the analysis. The focal unit is the level to which generalizations are made; it may be the individual, the nursing unit, or the organization that is the focus of the study. In the study of surgical mortality noted above (Aiken et al., 2003), the focal unit was the patient. The study permitted generalizations about the likelihood of death following surgery for patients in hospitals with different staffing levels and compositions. In the study of Medicare patient mortality noted above (Aiken et al., 1994), the focal unit was the hospital. The study permitted generalizations about the rate of mortality in magnet versus nonmagnet hospitals.

The level of measurement is the level to which the data are directly attached (e.g., self-report data are generally individual-level data, whereas the proportion of staff with a BSN degree is group-level). In the surgical mortality study (Aiken et al., 2003), the proportion of nurses with a BSN degree was a hospital-level characteristic. In the Medicare mortality study (Aiken et al., 1994), the magnet characteristic was a hospital-level indicator, and the mortality rate was calculated at the hospital level. The level of analysis is the unit to which the data are assigned for statistical analysis and hypothesis testing. This level is the one at which the outcome is measured: the patient in the surgical mortality example (Aiken et al., 2003) and the hospital in the Medicare mortality example (Aiken et al., 1994).

A common measurement issue in multilevel research is that data derived from one level are combined to represent attributes of a higher level (Forbes & Taunton, 1994; Hughes & Anderson, 1994). Examples include hospital-level mortality and measures of the nursing practice environment that are derived from individual nurses' responses. The concerns here are validity and reliability. When we use individual-level data to indicate something about organizations, it may be an unjustified shift in level. For example, it may be inappropriate to consider a hospital's mortality rate when only individuals have the capacity to live or die. Are we justified in considering mortality as a hospital attribute?

An example of a measure that requires evidence of reliability and validity at the organizational level is a scale to measure the nursing practice environment. The practice environment is typically measured by surveying staff nurses about the characteristics of their job (e.g., the extent to which nurse-physician relations are collegial or nurse managers are supportive of professional practice; Lake, 2002). Nurse responses are then aggregated to generate a practice environment "score" for a hospital or nursing unit. The key question is whether nurses' perceptions are valid as organizational characteristics.

Construct validity at the organizational level implies that an aggregate variable represents an organization-level property. Construct validity can be established in several ways. The "known-groups" approach is to document significantly higher practice environment scores in hospitals known to have exemplary environments (magnet hospitals) than in hospitals without distinction for their practice environments (nonmagnet hospitals; Lake, 2002; Lake & Friese, 2006). Another approach to level-specific construct validity is to show that there is more variability in scores across organizational settings (hospitals) than within them. An analysis of variance (ANOVA) of the individual scores (dependent variable) by the hospitals (independent variable) should yield a significant F ratio to establish this. The F ratio is the quotient of the mean square between hospitals divided by the mean square within hospitals. The mean square between hospitals reflects the squared deviations of the hospital means from the overall mean; the mean square within hospitals reflects the squared deviations of respondents' scores from their hospital's mean. Thus, a simple interpretation of the F ratio is as a variability multiplier: how many times greater is the mean square between hospitals than the mean square within hospitals? For example, an F ratio of 16 indicates that variability across hospitals is 16 times greater than that within hospitals.

Another variance ratio calculated from this ANOVA procedure is the η² (eta-squared) statistic, which indicates the proportion of the variance in the individual responses that is explained by "membership" in a hospital. Although η² can range from 0 to 1, reported values have been in the range of 0.08–0.22 in a study of managerial practices and organizational processes across 42 intensive care units (Shortell, Rousseau, Gillies, Devers, & Simons, 1991) and in the range of 0.11–0.27 across 30 emergency departments for a measure of the quality of nursing care for 10 patient conditions (Georgopoulos, 1986). In the latter study, an η² of at least 0.16 was considered sufficient to treat the mean score across respondents as a hospital characteristic.

The reliability of aggregate measures is assessed in several ways. The internal consistency of the aggregated items may be evaluated using the same approaches as those for individual-level items: Cronbach's α, the average inter-item correlation, and the average item-total correlation. Just as combining the items of an individual's responses to a scale instrument reduces the error associated with any one item, aggregating one item across individuals up to the organizational level reduces the error associated with the individual-level data. The stability of the aggregate value across repeat samples is measured by the intraclass correlation coefficient, ICC(1,k). This statistic is important because it indicates whether similar scores would be obtained from alternate samples within the same organizational setting. Both reliability statistics should exceed 0.60.
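The statistics described above can all be computed from the same one-way ANOVA of individual nurse scores by hospital. The sketch below, using simulated data and hypothetical column names, aggregates nurse responses to hospital means and then computes the between- and within-hospital mean squares, the F ratio, η², and ICC(1,k) = (MSB - MSW) / MSB.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Hypothetical nurse-level practice environment scores: 25 nurses in each of
# 16 hospitals, with hospital means that genuinely differ.
n_hospitals, nurses_per_hospital = 16, 25
hospital_means = rng.normal(3.0, 0.4, n_hospitals)
df = pd.DataFrame({
    "hospital_id": np.repeat(np.arange(n_hospitals), nurses_per_hospital),
    "score": np.repeat(hospital_means, nurses_per_hospital)
             + rng.normal(0, 0.5, n_hospitals * nurses_per_hospital),
})

# Aggregation: each hospital's practice environment "score" is the mean of
# its nurses' responses.
hospital_scores = df.groupby("hospital_id")["score"].mean()

# One-way ANOVA components (equal group sizes for simplicity).
grand_mean = df["score"].mean()
group_means = df.groupby("hospital_id")["score"].transform("mean")
ss_between = ((group_means - grand_mean) ** 2).sum()
ss_within = ((df["score"] - group_means) ** 2).sum()
ms_between = ss_between / (n_hospitals - 1)
ms_within = ss_within / (len(df) - n_hospitals)

f_ratio = ms_between / ms_within                      # variability multiplier
eta_squared = ss_between / (ss_between + ss_within)   # proportion of variance explained
icc_1k = (ms_between - ms_within) / ms_between        # reliability of the hospital mean

print(f"F = {f_ratio:.1f}, eta^2 = {eta_squared:.2f}, ICC(1,k) = {icc_1k:.2f}")
```

Because the simulated hospital means differ substantially relative to the within-hospital spread, the F ratio is large and ICC(1,k) comfortably exceeds the 0.60 threshold noted above.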
In the example of the practice environment scale noted above (Lake, 2002), the average inter-item correlations across 16 hospitals ranged from 0.64 to 0.91, and the ICC(1,k) ranged from 0.86 to 0.97 across five subscales. This example illustrates how the psychometric properties of survey instruments are established at the organizational level.

Next issue's column focuses on how multilevel data challenge the assumptions underlying traditional statistical approaches. It details how multilevel analytical models overcome the limitations of traditional approaches and provide answers to multilevel questions that traditional approaches cannot.
References

Aiken, L. H., Clarke, S. P., Cheung, R. B., Sloane, D. M., & Silber, J. H. (2003). Educational levels of hospital nurses and surgical patient mortality. Journal of the American Medical Association, 290(12), 1617–1623.
Aiken, L. H., Smith, H. L., & Lake, E. T. (1994). Lower Medicare mortality among a set of hospitals known for good nursing care. Medical Care, 32, 771–787.
Forbes, S., & Taunton, R. L. (1994). Reliability of aggregated organizational data: An evaluation of five empirical indices. Journal of Nursing Measurement, 2(1), 37–48.
Georgopoulos, B. S. (1986). Organizational structure, problem solving, and effectiveness: A comparative study of hospital emergency services. San Francisco: Jossey-Bass.
Hughes, L. C., & Anderson, R. A. (1994). Issues regarding aggregation of data in nursing systems research. Journal of Nursing Measurement, 2(1), 79–101.
Lake, E. T. (2002). Development of the Practice Environment Scale of the Nursing Work Index. Research in Nursing and Health, 25(3), 176–188.
Lake, E. T., & Friese, C. R. (2006). Variations in nursing practice environments: Relation to staffing and hospital characteristics. Nursing Research (in press).
National Center for Nursing Research. (1991). Patient outcomes research: Examining the effectiveness of nursing practice. Paper presented at the State of the Science of Patient Outcomes Research conference, Rockville, MD.
Shortell, S. M., Rousseau, D. M., Gillies, R. R., Devers, K. J., & Simons, T. L. (1991). Organizational assessment in intensive care units (ICUs): Construct development, reliability, and validity of the ICU Nurse-Physician Questionnaire. Medical Care, 29(8), 709–723.