Validity of a Retrospective National Institutes of Health Stroke Scale Scoring Methodology in Patients With Severe Stroke
Christopher J. Lindsell, PhD,*† Kathleen Alwell, RN, BSN,‡ Charles J. Moomaw, PhD,‡ Dawn O. Kleindorfer, MD,‡ Daniel Woo, MD,‡ Matthew L. Flaherty, MD,‡ Ellen L. Air, MD, PhD,‡ Alexander T. Schneider, MD,‡ Irene Ewing, RN,‡ Joseph P. Broderick, MD,‡ Joel Tsevat, MD, MPH,*§ and Brett M. Kissela, MD‡
Objective: Quantifying stroke severity is essential for interpreting outcomes in stroke studies because severity affects outcomes. Because outcome studies often enroll patients some time after stroke and there is little standardization of the history and physical examination, objective measurement of stroke severity is limited. A method for retrospectively scoring the National Institutes of Health Stroke Scale (NIHSS) based on history and physical examination has been proposed, but has yet to be validated in patients with higher NIHSS scores. We evaluate the validity of this scoring method across the spectrum of NIHSS scores.
Methods: The retrospective scoring algorithm was applied to history and physical examinations documented for 58 patients with ischemic stroke presenting to any of 17 regional acute care facilities who had an NIHSS score recorded by a stroke team physician. The retrospective NIHSS score was obtained by standardized chart review. Linear regression was used to estimate scale-dependent and scale-independent bias. Limits of agreement quantify deviation of the retrospective NIHSS score from the prospective NIHSS score.
Results: Mean (SD) age at stroke was 66 (14) years; 27 (46.6%) patients were men, and 38 (65.5%) were white. The mean (SD) prospective NIHSS score was 13.6 (7.8); the mean (SD) retrospective NIHSS score was 13.7 (7.8). There were 23 (40%) prospective NIHSS scores above 15, and 13 scores (22%) above 20. The linear regression constant was 0.290 (95% confidence interval -0.107 to 0.687); the slope was 0.987 (95% confidence interval 0.962 to 1.013). The R² for the model was 0.991. Limits of agreement were -1.35 and 1.59.
Conclusion: The retrospective NIHSS appears valid across the entire spectrum of scores.
Key Words: Outcomes, prognosis, stroke assessment.
© 2005 by National Stroke Association
It is critical to quantify a patient’s stroke severity for proper interpretation of stroke outcomes research.1 Outcomes such as mortality, functional status, and quality of
life are dependent on the degree of impairment inflicted by the stroke.2,3 Also, health policy decisions often rely on comparisons of case mix and stroke outcomes4; a reliable
From the *Institute for the Study of Health, †Department of Emergency Medicine, and ‡Department of Neurology, University of Cincinnati Medical Center, Cincinnati, Ohio; and §Veterans Administration Medical Center, Cincinnati, Ohio. Received July 26, 2005; accepted August 5, 2005. Supported by grants K23NS045054 and R01NS030678 from the National Institute of Neurological Disorders and Stroke.
Address correspondence to: Christopher J. Lindsell, PhD, Institute for the Study of Health, University of Cincinnati Medical Center, PO Box 670840, Cincinnati, OH 45267-0840. E-mail: christopher.lindsell@uc.edu.
1052-3057/$ - see front matter © 2005 by National Stroke Association. doi:10.1016/j.jstrokecerebrovasdis.2005.08.004
Journal of Stroke and Cerebrovascular Diseases, Vol. 14, No. 6 (November-December), 2005: pp 281-283
Figure 1. Agreement between the retrospective National Institutes of Health Stroke Scale (NIHSS) and the NIHSS estimated by a stroke team physician at time of stroke. The regression equation relating retrospective NIHSS to prospective NIHSS is given. Markers indicate number of overlapping data points.
method to ascertain stroke severity is necessary to inform such decisions. In 1989, the National Institutes of Health Stroke Scale (NIHSS) for quantifying stroke severity based on a patient’s neurologic examination was proposed.5 The NIHSS has been validated as a reliable measure of stroke severity.6 The scale, which includes components related to motor loss, sensory loss, aphasia, vision, level of consciousness, level of alertness, and level of attention, was originally developed to be scored at the time a patient is evaluated. It often occurs, however, that stroke severity must be estimated by chart review because the admitting physician does not score or record the NIHSS. To quantify stroke severity based on chart review, Williams et al7 developed an algorithm for computing the NIHSS score from a patient’s history and physical examination as recorded in the medical record. This algorithm was validated in 38 patients who had an NIHSS score less than 15 in all but 4 cases; the NIHSS has a maximum value of 33 with higher values indicating more disabling strokes. Further validation of this retrospective scoring methodology was conducted with 59 patients at 3 centers. Again, the NIHSS score ranged only as high as 20 and the distribution of the raw data was not reported so it was not possible to determine the number of patients with higher scores.8 Similarly, Kasner et al9 did not report the distribution of the raw data sufficiently to estimate the number of patients with higher scores, although they
did observe an NIHSS score of 23 among their sample of 39 patients. The validity of the retrospective NIHSS scoring method for patients with higher scores remains unclear. The objective of this study was to test the validity of the retrospective scoring system among a patient group inclusive of higher NIHSS scores. We hypothesized that the retrospective NIHSS can be adequately scored from the initial history and physical examination documented by a stroke team physician, and that the magnitude of the NIHSS score does not influence the validity of the retrospective scoring method.
Materials and Methods
A cohort of 451 patients presenting to one of 17 acute care facilities with a diagnosis of stroke during 1999 was prospectively enrolled in an epidemiologic investigation of stroke.10 Ischemic stroke was confirmed in each case through physician review of medical record abstracts and neuroimaging results, and was based on a case definition of focal neurologic deficit of sudden onset lasting at least 24 hours. When a stroke team physician evaluated a patient for early thrombolytic therapy, the NIHSS was prospectively scored. For the retrospective NIHSS scoring, trained nurses conducted detailed medical record abstraction that included the prospectively scored NIHSS
and all of the separately documented elements required to retrospectively score the NIHSS.7 The institutional review boards of participating institutions approved the study, and patients or surrogates provided informed consent at enrollment. We used linear regression to test for scale-independent and scale-dependent bias in scoring the retrospective NIHSS; a significant intercept in the regression model suggests scale-independent bias, whereas a slope different from one suggests scale-dependent bias. We used limits of agreement to quantify the expected variation.11
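The analysis described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names (`regression_bias`, `limits_of_agreement`), the choice of the retrospective score as the dependent variable, the normal-approximation critical value, and the example scores are all assumptions made for the sketch.

```python
import math
import statistics

# Normal-approximation critical value (about 1.96); an assumption of this
# sketch. A t quantile with n - 2 degrees of freedom would be slightly wider.
Z_975 = statistics.NormalDist().inv_cdf(0.975)


def regression_bias(prospective, retrospective, crit=Z_975):
    """Ordinary least squares of retrospective on prospective scores.

    An intercept CI excluding 0 suggests scale-independent bias;
    a slope CI excluding 1 suggests scale-dependent bias.
    """
    n = len(prospective)
    mean_x = statistics.fmean(prospective)
    mean_y = statistics.fmean(retrospective)
    sxx = sum((x - mean_x) ** 2 for x in prospective)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(prospective, retrospective))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(prospective, retrospective))
    resid_var = rss / (n - 2)
    se_slope = math.sqrt(resid_var / sxx)
    se_intercept = math.sqrt(resid_var * (1 / n + mean_x ** 2 / sxx))
    return {
        "intercept": intercept,
        "intercept_ci": (intercept - crit * se_intercept,
                         intercept + crit * se_intercept),
        "slope": slope,
        "slope_ci": (slope - crit * se_slope, slope + crit * se_slope),
    }


def limits_of_agreement(prospective, retrospective):
    """Bland-Altman limits: mean difference +/- 1.96 SD of the differences."""
    diffs = [r - p for p, r in zip(prospective, retrospective)]
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)
    return mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff


# Hypothetical scores for illustration only (not study data).
pro = [0, 5, 10, 15, 20, 25]
retro = [0, 6, 10, 15, 21, 25]
print(regression_bias(pro, retro))
print(limits_of_agreement(pro, retro))
```

With real data, an intercept interval that contains 0 and a slope interval that contains 1 would be read as no detectable bias of either kind.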
Results
Of 451 patients, 58 (12.8%) had an NIHSS score estimated prospectively. There were 27 men (46.6%) in the sample; 38 (65.5%) patients were white. The mean (SD) age at time of stroke was 65.7 years (14.0 years). The mean (SD) prospective NIHSS score was 13.6 (7.8) whereas the mean (SD) retrospective NIHSS score was 13.7 (7.8). There were 23 (39.7%) prospective NIHSS scores above 15, and 13 scores (22.4%) above 20. Fig 1 shows the agreement between the prospective and retrospective NIHSS scores. The 95% confidence interval (CI) for the constant in the linear regression model is -0.107 to 0.687, whereas the 95% CI for the slope of the line is 0.962 to 1.013. The R² for the model is 0.991. The lower limit of agreement is -1.35 (95% CI -1.69 to -1.01) and the upper limit of agreement is 1.59 (95% CI 1.25 to 1.93).
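The reported limits of agreement can be checked for internal consistency with a short calculation. The sketch below assumes the limits were computed as mean difference ± 1.96 SD and that their confidence intervals use the approximate standard error sqrt(3s²/n) described by Bland and Altman;11 the paper does not state these details, and the helper name `limit_ci` and the t critical value (about 2.002 for 57 degrees of freedom) are assumptions of the sketch.

```python
import math

N = 58                      # patients with both scores
LOWER, UPPER = -1.35, 1.59  # reported limits of agreement
T_CRIT = 2.002              # approximate t quantile (0.975, 57 df); assumed

# Back-calculate the mean and SD of the differences from the limits.
mean_diff = (LOWER + UPPER) / 2
sd_diff = (UPPER - LOWER) / (2 * 1.96)


def limit_ci(limit, sd, n, t_crit=T_CRIT):
    """Approximate 95% CI for one limit of agreement, SE = sqrt(3 * sd^2 / n)."""
    half_width = t_crit * math.sqrt(3 * sd ** 2 / n)
    return limit - half_width, limit + half_width


print(round(mean_diff, 2), round(sd_diff, 2))
print([round(v, 2) for v in limit_ci(LOWER, sd_diff, N)])
print([round(v, 2) for v in limit_ci(UPPER, sd_diff, N)])
```

Under these assumptions the back-calculated mean difference is 0.12 points with an SD of 0.75, and the resulting intervals agree to two decimals with those reported for both limits.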
Discussion
Our results suggest that a retrospectively scored NIHSS is almost identical to the NIHSS score obtained prospectively by a stroke team physician. The limits of agreement indicate that a retrospective NIHSS score will most often be within two points of a prospectively estimated NIHSS score; limits are rounded because the NIHSS is an integer scale. In our sample, 10 of 58 scores (17.2%, 95% CI 9.0% to 29.9%) were discordant: 4 were within one point of the prospectively estimated NIHSS, 5 were an overestimate of two points, and one was an underestimate of 3 points. These data show that the retrospective NIHSS appears valid across the range of scores; we detected no significant scale-dependent or scale-independent bias.
Although we found strong agreement between a retrospectively scored NIHSS and the NIHSS scored at the time of stroke, there are some limitations to our study. Most importantly, chart reviewers were not blinded to the prospective estimate of the NIHSS. Nevertheless, this is unlikely to have significantly influenced our results because chart reviewers: (1) were blinded to the goals of this study (this study was conceived after the chart review was completed); (2) used standard definitions and objective criteria for abstraction; and (3) did not know which variables were combined, or how they were combined, to score the retrospective NIHSS.
It is not possible, using our data, to determine the factors responsible for the observed discordance between retrospective and prospective methods of scoring the NIHSS. Understanding reasons for the observed discrepancies may benefit researchers wishing to score the NIHSS retrospectively for studies of stroke outcomes. Future work will attempt to validate this scoring system by using the history and physical examination performed by physicians who are not trained in acute stroke care, and will consider factors that might routinely lead to discordance between prospective and retrospective scores.
In summary, these results support the validity of the retrospectively scored NIHSS across the whole range of scores. The retrospective NIHSS can be used to enhance understanding of the impact of stroke severity on outcomes, and it allows comparison between retrospective studies and acute stroke studies in which the NIHSS score is obtained prospectively.
References
1. Irwin P, Rudd A, for the Intercollegiate Working Party for Stroke. Case mix and process indicators of outcome in stroke: The Royal College of Physicians minimum data set for stroke. J R Coll Physicians Lond 1998;32:442-444.
2. Oxbury JM, Greenhall RCD, Grainger KMR. Predicting the outcome of stroke: Acute stage after cerebral infarction. BMJ 1975;3:125-127.
3. Westling B, Norrving B, Thorngren M. Survival following stroke: A prospective population-based study of 438 hospitalized cases with prediction according to subtype, severity and age. Acta Neurol Scand 1990;81:457-463.
4. Grieve R, Hutton J, Bhalla A, et al. A comparison of the costs and survival of hospital-admitted stroke patients across Europe. Stroke 2001;32:1684-1691.
5. Brott T, Adams HP Jr, Olinger CP, et al. Measurement of acute cerebral infarction: A clinical examination scale. Stroke 1989;20:864-870.
6. Goldstein LB, Bertels C, Davis JN. Interrater reliability of the NIH stroke scale. Arch Neurol 1989;46:660-662.
7. Williams LS, Yilmaz EY, Lopez-Yunez AM. Retrospective assessment of initial stroke severity with the NIH stroke scale. Stroke 2000;31:858-862.
8. Bushnell CD, Johnston DCC, Goldstein LB. Retrospective assessments of initial stroke severity: Comparison of the NIH stroke scale and the Canadian neurological scale. Stroke 2001;32:656-660.
9. Kasner SE, Chalela JA, Luciano JM, et al. Reliability and validity of estimating the NIH stroke scale from medical records. Stroke 1999;30:1534-1537.
10. Kissela B, Schneider A, Kleindorfer D, et al. Stroke in a biracial population: The excess burden of stroke among blacks. Stroke 2004;35:426-431.
11. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307-310.