Magnetic Resonance Imaging, Vol. 14, No. 6, pp. 649-655, 1996
Copyright © 1996 Elsevier Science Inc. Printed in the USA. All rights reserved
0730-725X/96 $15.00 + .00
PII S0730-725X(96)00119-4

Original Contribution

RELIABILITY OF BRAIN STRUCTURE MORPHOMETRY IN HYDROCEPHALIC CHILDREN USING MR IMAGES

MICHAEL E. BRANDT, TIMOTHY P. BOHAN, KELLY THORSTAD, STEVEN R. MCCAULEY, KEVIN C. DAVIDSON, DAVID J. FRANCIS, LARRY A. KRAMER, AND JACK M. FLETCHER
Departments of Psychiatry and Behavioral Sciences, Pediatrics, Neurology, and Radiology, University of Texas Medical School, Houston, Texas, USA 77030-1501, and Department of Psychology, University of Houston, Houston, Texas, USA 77204-5341.

To assess the ability of human operators to make decisions about region boundaries in significantly malformed brains, we performed a study of the reliability of morphometric measurements of specific brain structures from MRI in children with hydrocephalus and controls. Cross-sectional area measures of the corpus callosum, internal capsules, and centrum semiovale, and volumes of the lateral ventricles were made in 50 children. Independent measurements were made by two raters on T1- and T2-weighted MR images. Pearson’s correlation coefficients (r) and intraclass correlation coefficients (ICC) between the two raters’ sets of measures were computed for each structure across all subjects. ICCs ranged from a low of 0.7502 to a high of 0.9895. All ICCs were significant at the p < .0001 level and were less than or equal to the corresponding Pearson’s r value in every case. Therefore, the Pearson’s r may overestimate the reliability. The results of this study support the claim that the ICC should be used rather than the Pearson’s r when assessing interrater reliability in situations where large between-group differences are present. In addition, the results show that brains malformed by disorders such as hydrocephalus can be reliably assessed using morphometric measures of MR images.

Keywords: Hydrocephalus; Interrater reliability; Morphometric measurement; Intraclass correlation coefficient.
RECEIVED 2/22/95; ACCEPTED 4/25/96.
Address correspondence to Michael E. Brandt, Ph.D., University of Texas Medical School, Department of Psychiatry & Behavioral Sciences, 6431 Fannin Drive, Room 5.240, Houston, Texas 77030-1501. E-mail: [email protected]. uth.tmc.edu

INTRODUCTION

Hydrocephalus is a disorder that is characterized by an increased volume of cerebrospinal fluid (CSF) in the ventricular system. This can lead to increased intracranial pressure and loss of brain volume over time.1 The extent of hydrocephalus may be assessed by estimating the CSF volume in magnetic resonance images (MRI) that include the brain ventricles, as well as by measuring volumes of white and gray matter structures. Reliable quantification of brain tissue volumes in hydrocephalus would be of great benefit in its diagnosis, treatment, and general understanding. However, such assessments may be difficult to complete because of the degree of brain malformation associated with ventricular dilation and the conditions that lead to early hydrocephalus.

In a previous study,2 we reported on the relationship between morphometric measures of four specific brain structures and cognitive development in 44 children (32 with hydrocephalus). In this article we report results of a reliability study of the aforementioned morphometric measures. Two raters made independent measurements using a personal computer (PC) with image analysis software developed by our group. The measurements were performed on MR images from 50 children selected to depict the lateral ventricles and specific cerebral structures. The intraclass correlation coefficients (ICC) reported herein lend further credence to the ability of manual measurements to reliably assess cerebral structure volumes in those children with the major malformations characteristic of hydrocephalus.

Few brain structure morphometry studies report reliability coefficients. Using a multispectral MR approach (proton-density and T2-weighted MR images), Kohn et al.3 demonstrated reliabilities greater than 0.96 among three observers on 10 subjects. These workers used a semiautomated segmentation approach to separate brain parenchyma, lateral ventricles, and CSF in the parenchymal spaces. They also used Pearson’s correlation coefficient (r) to compute reliability. Use of the Pearson’s r may lead to inflated estimates of reliability when r is computed based on variance estimates that collapse across groups with large mean differences in the variables being correlated. The relationship of the Pearson’s r and group structure is well established.4 The total correlation is a function of both the within-group covariation and the between-group differences. For example, two groups may differ in mean levels on two variables with no relationship between the variables within either group. If the group structure is ignored, positive correlations would occur because of the bimodal distribution of the variables, even though there is no covariation within either group.

The use of Pearson’s r in reliability studies based on samples with an explicit group structure is common. Hynd et al.5 reported reliability coefficients (Pearson’s r) from morphometric MR measures of multiple brain regions ranging from 0.87 to 0.97. Again, the study included subgroups who differed in average values, so that the correlational estimates (r’s) of reliability may be inflated. In general, we found that even in studies that included subgroups with varying average measurements, Pearson’s r’s were invariably used for estimating reliability coefficients.
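The group-structure effect described above can be illustrated with a small simulation. The following Python sketch is not part of the original study: the group sizes, group means, and random seed are arbitrary assumptions, chosen only to show that two variables with no within-group relationship can still yield a large pooled Pearson's r when the groups differ in mean level.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate_group(n, mean, rng):
    """Draw n (x, y) pairs whose noise is independent, so x and y are
    unrelated within the group and share only the group mean."""
    x = [rng.gauss(mean, 1.0) for _ in range(n)]
    y = [rng.gauss(mean, 1.0) for _ in range(n)]
    return x, y


rng = random.Random(0)                      # fixed seed (assumption)
x_a, y_a = simulate_group(500, 0.0, rng)    # hypothetical group A
x_b, y_b = simulate_group(500, 10.0, rng)   # hypothetical group B, shifted mean

r_within_a = pearson_r(x_a, y_a)            # near zero
r_within_b = pearson_r(x_b, y_b)            # near zero
r_pooled = pearson_r(x_a + x_b, y_a + y_b)  # large, driven by the mean shift

print(f"within A: {r_within_a:.3f}, within B: {r_within_b:.3f}, "
      f"pooled: {r_pooled:.3f}")
```

With these assumed settings, the pooled correlation is dominated by the between-group mean difference rather than by any agreement within a group, which is the situation in which r overstates reliability.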
None of these studies examined reliability in subjects with grossly malformed brains, such as those found in children with hydrocephalus. Such malformations may increase operator error and reduce reliability. The purpose of this article is to evaluate the reliability of morphometric analyses made in a semiautomated fashion in children with and without hydrocephalus. Both Pearson’s r’s and ICC’s are reported for absolute and relative brain area measurements.

EXPERIMENTAL METHODS
Subjects

Fifty children were evaluated, including 13 normal controls. Two groups were composed of children who were predominantly shunted hydrocephalics, including 11 children with aqueductal stenosis (AS) and 16 of 18 children with spina bifida (SB) meningomyelocele. The other two children in the latter group had arrested hydrocephalus but no shunt. Both disorders are commonly associated with pathological changes in the corpus callosum, obstructive hydrocephalus, and other brain anomalies. The remaining group included 8 children with SB who had spinal dysraphisms other than meningomyelocele. Six of these children had meningocele, including one child with shunted hydrocephalus, two with arrested hydrocephalus, and three with no hydrocephalus. The other two children had spinal lipomas with no hydrocephalus. Table 1 summarizes the patient demographic data. There were no significant differences in age, race, or socioeconomic status between groups (p > .5). There was a trend for the SB groups to have more girls than boys, consistent with the epidemiology of the disorder. Additional details on the subject groups can be found in Fletcher et al.2

Imaging Parameters

MRI was performed using a General Electric Signa system with a 1.5 Tesla superconducting magnet. Conventional 2-D spin echo sequences were employed using standard multislice/multiecho techniques. For each subject, one sagittal series and two axial series of the brain were acquired. Sagittal images emphasizing differences in spin-lattice relaxation time (heavily T1-weighted) were obtained with a spin echo time (TE) of 20 ms and a pulse repetition time (TR) of 800 ms. T1-weighted axial images were acquired with TE = 25 ms and TR = 900 ms. T2-weighted axial images were acquired with TE = 80 ms and TR = 3000 ms, and proton-density (PD) weighted images were obtained with TE = 30 ms and TR = 2000 ms. ICC’s were not analyzed for the PD-weighted image set. For the T1-weighted sagittal and T2-weighted axial images, slice thickness was 5 mm and a 2.5-mm interslice gap was used to avoid cross-talk between adjacent images.
The T1-weighted axial images were 3 mm thick with a 1-mm interslice spacing. The field of view parameter for each image was 24 cm, and the image resolution was 256 by 256. The x and y dimension resolution was therefore 0.938 mm/pixel. Intensity resolution was 8 bits per image pixel. The digitized images were archived to magnetic tape and analyzed using a 486-based PC with image analysis software developed by our group.

Cross-sectional area measurements of the corpus callosum were made from the most midline sagittal slice depicting this structure. In addition, each sagittal slice that included the left and right lateral ventricles was identified. The internal capsules (projection fibers) and the centra semiovale (association fibers) above the lateral ventricles were identified on the T2-weighted axial images for subsequent area measurement. For the corpus callosum, internal capsules, and centra semiovale, absolute area measures were made from a single slice depicting the structure, together with the absolute area of brain or hemisphere for the respective slice. For the lateral ventricles, a total volume was estimated by summing the product of each slice area and the slice thickness, and included the intergap volumes. A volume for each gap was estimated as one-half the average of the slice volumes on either side of the gap.

Table 1. Age, race, gender, and socioeconomic status (SES) by group

Variable                Meningomyelocele   Meningocele/Other   Aqueductal Stenosis   Normal
Number                  18                 8                   11                    13
Age (mo.), mean ± SD    90.7 ± 9.5         93.7 ± 12.2         89.5 ± 13.4           91.4 ± 5.8
Race
  White                 12                 6                   7                     10
  Black                 1                  0                   3                     0
  Hispanic              3                  2                   0                     2
  Other                 2                  0                   1                     1
Gender
  M                     6                  3                   7                     8
  F                     12                 5                   4                     5
SES
  Low                   4                  3                   6                     3
  Middle                4                  1                   1                     4
  High                  10                 4                   4                     6

Morphometric Measurement Procedure

Images were reproduced using 64 colors. Each color corresponded to 4 gray scale levels in the original MR image (256 total gray scale values). The color palette was created by mixing low, medium, and high intensity values of red, green, and blue. The two operators subjectively reported that the use of color, as opposed to black and white, enhanced their ability to delineate and thus make morphometric measurements of the four main structures of interest. It is well known that the human eye can discern thousands of color shades and intensities, whereas the average observer can distinguish only one or two dozen gray shades at any point in an image.6 Histogram equalization6 was used to enhance some of the images prior to area measurement, but both raters performed this function on the same images so that the overall reliability between them would not be decreased.

A “mouse” was used as a pointing device, and also to draw delimiting curves onto the image freehand. These curves were used to separate portions of the image, such as cerebrum from cerebellum, prior to computation of structure area. The operator performed an area measurement by first placing the mouse cursor in the interior of the structure to be measured and then pressing the mouse button. The structure would be filled with the color white up to its boundaries, depending on the color of the pixel pointed to and the number of discriminating colors preselected. A pixel count of the filled region was simultaneously displayed onscreen. The operator could choose to select a region corresponding to the color of the pixel pointed to (cp), or could select a range of colors about cp. The possible ranges were: cp to cp + 4; cp - 4 to cp; cp ± 4; cp to cp + 8; cp - 8 to cp; cp ± 8; cp to cp + 12; cp - 12 to cp; and cp ± 12. The ranges selected were based on observations of the original films, the color images themselves, and visual comparisons of the results. Both raters used the same ranges to fill structures in the images.

A simple region filling procedure was used to count the total number of pixels enclosed within an irregularly shaped closed structure boundary.7 The structure area in millimeters squared was then computed from the number of pixels counted and the field of view parameter. For the lateral ventricles, the area within each image slice depicting a portion of the ventricle was multiplied by slice thickness and summed across the total number of slices that depicted each ventricle, including the estimates accounting for gap volume. The result was then expressed in cubic millimeters. Likewise, total volume
of brain for the slices depicting a portion of each lateral ventricle was computed in the same manner. A small amount of freehand drawing directly on the digital image was necessary to prevent “bleeding” of the region-filling method into adjacent areas of the image, for example, to separate sinuses and cerebellum from cerebrum or hemisphere. The majority of the quantification differences between the raters were due to the manual outlining that was necessary. Both operators were required to be conservative in their measurements by observing the following set of guidelines:

1. Attempt to make measurements on the raw MR images first. If that is not possible, then perform histogram equalization and remeasure.
2. Use the cp ± 4 option for measuring each structure.
3. If the structure to be quantified is discontinuous, measure the separate regions and sum them. Only include regions that are more than 100 mm² in area. This prevents endless additions of very small areas.
4. When making multiple measurements within an image for a particular structure, point to the same color with the mouse. This color does not have to be the same for all structures measured for a given subject.
5. Do not use freehand drawing (delimiting) unless bleeding occurs, in which case it becomes necessary.
6. When quantifying lateral ventricle size, do not include the “fuzzy” borders around them (mixture of tissue and CSF).
7. When quantifying centrum semiovale area from transaxial slice images, if bleeding occurs, limit it by drawing a border about 2-5 mm inside the outer boundary of the brain prior to measurement.

Figure 1 shows an example of the result of segmentation of the corpus callosum in a typical midsagittal MR grayscale image for one subject. The left side of Fig. 1 is the image prior to segmentation and the right side shows the segmented region displayed in white. Note that the fornix has been separated from the corpus callosum in two places using freehand drawing (black lines) prior to automatic segmentation. Table 2 provides means and standard deviations for each measure (area in mm² for corpus callosum, internal capsules, and centra semiovale; volume in mm³ for lateral ventricles) by group and by rater. Table 2 shows that there are large mean differences between groups. For example, both raters were able to measure the large decreases in corpus callosal area from the midsagittal
slice of the three hydrocephalus groups as compared to the control group. The hydrocephalus groups also had larger left and right lateral ventricle volumes and decreased internal capsule areas compared to controls. In data where there are large mean differences, the Pearson’s r may overestimate the relationship between two independent raters.

Fig. 1. Sample midsagittal T1-weighted MR image for measuring area of the corpus callosum, original image on left side, corpus callosum shown segmented in white on right side. Note manually drawn lines in black for separating fornix from corpus callosum prior to automatic segmentation.

Table 2. Means and standard deviations for absolute measurements of four brain structures, whole brain, and left and right hemispheres by rater and subject group

A. Rater 1 (KT)

                          Meningomyelocele      Meningocele/Other     Aqueductal Stenosis   Normal
Structure                    N    Mean   Stdev     N    Mean   Stdev     N    Mean   Stdev     N    Mean   Stdev
Corpus callosum             18     256      93     8     413     145    10     342     178    13     562     107
Whole brain                 18   11712    1023     8   10853    1071    10   12037    2339    13   10697    1160
Right lateral ventricle     18    8250   10936     8    4746    7420    11    6804   11750    13     448     270
Whole brain                 18  208219   67247     8  196103   39383    11  185273   87293    13  147410   58936
Left lateral ventricle      18   11879   15926     8    6862   12501    11    7829   13358    13     494     227
Whole brain                 18  193018   89730     8  194851   93605    11  224927   90543    13  139589   35034
Right internal capsule      16     131      49     8     127      49     9     133     103    13     146      65
Hemisphere                  16    7873     743     8    8913    1158     9    8328    1110    13    8737    1024
Left internal capsule       16     137      39     8     147      48     9     147     115    13      67     176
Hemisphere                  16    8009     832     8    8877    1299     9    8265    1316    13    8730     944
Right centrum semiovale     15    1223     655     8    1534     472     7     854     450    13    1100     529
Hemisphere                  15    6035    1142     8    6900     883     7    6885    1246    13    7012     963
Left centrum semiovale      15    1214     695     8    1273     648     7     806     608    13    1189     543
Hemisphere                  15    6147    1246     8    6967    1164     7    7066    1652    13    6822    1084

B. Rater 2 (SM)

                          Meningomyelocele      Meningocele/Other     Aqueductal Stenosis   Normal
Structure                    N    Mean   Stdev     N    Mean   Stdev     N    Mean   Stdev     N    Mean   Stdev
Corpus callosum             18     340      91     8     493     134    10     452     169    13     546     103
Whole brain                 18   11464    1221     8   10891    1062    10   11630    1984    13   11103     791
Right lateral ventricle     18   10170   12863     8    4502    6958    11    8931   13138    13     976     601
Whole brain                 18  202364   63840     8  190919   41855    11  182415   84813    13  148567   58506
Left lateral ventricle      18   13316   16173     8    7451   13643    11   10849   14667    13    1161     667
Whole brain                 18  195611   88714     8  183332   88764    11  211773  100299    13  132639   38657
Right internal capsule      16     158      45     8     146      55     9     169     112    13     165      71
Hemisphere                  16    7916     768     8    8909    1158     9    8456    1175    13    8751    1003
Left internal capsule       16     159      41     8     179      78     9     182     115    13     185      72
Hemisphere                  16    8166     763     8    8935    1245     9    8396    1367    13    8858     937
Right centrum semiovale     15    1246     792     8    1519     654     7    1216    1137    13    1341     727
Hemisphere                  15    6255    1092     8    6936     745     7    7214    1240    13    7777     913
Left centrum semiovale      15    1327     829     8    1170     686     7    1129    1127    13    1301     796
Hemisphere                  15    6231     978     8    7122     977     7    7387    1541    13    7684     967

Intraclass Correlation Computation

The two raters made independent measurements on each brain structure in all subjects. Hence, the number of raters is fixed and each rater evaluates the same targets (i.e., brain structures). This corresponds to Case 3 in the computation of ICC outlined by Shrout and Fleiss,4 yielding the following relation:

$$\mathrm{ICC} = \frac{\mathrm{BMS} - \mathrm{EMS}}{\mathrm{BMS} + (k - 1)\,\mathrm{EMS}}$$

where BMS is the between-targets mean square, EMS is the within-target residual (error) mean square, and k is the number of raters (two in our case). The BMS is simply the between-subjects mean square error, while the EMS is the within-subjects mean square error. Both are computed using a standard Rater × Brain Structure ANOVA. In this instance, an ANOVA is computed separately for each brain structure to obtain the BMS and EMS values necessary to calculate ICC from the above relation. Each ANOVA was computed using the SAS General Linear Models (GLM) procedure.
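As a minimal sketch of the relation above, the following Python function computes the Case 3 ICC from a targets-by-raters table using the mean squares of a two-way ANOVA without replication. This is not the authors' code (the study used the SAS GLM procedure); the function name `icc_case3` and the example ratings are illustrative assumptions and are not data from this study.

```python
def icc_case3(ratings):
    """Case 3 ICC (fixed raters, single measurement per rater).

    ratings: list of rows, one row per target (subject), each row holding
    the k values assigned to that target by the k raters.
    """
    n = len(ratings)        # number of targets
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    target_means = [sum(row) / k for row in ratings]
    rater_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for the two-way layout
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_targets = k * sum((m - grand) ** 2 for m in target_means)
    ss_raters = n * sum((m - grand) ** 2 for m in rater_means)
    ss_error = ss_total - ss_targets - ss_raters

    bms = ss_targets / (n - 1)                # between-targets mean square
    ems = ss_error / ((n - 1) * (k - 1))      # residual (error) mean square
    return (bms - ems) / (bms + (k - 1) * ems)


# Hypothetical ratings: six targets scored by four raters
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc_example = icc_case3(example)

# Adding a constant to one rater's scores leaves the Case 3 ICC unchanged,
# because the rater main effect is removed before the error term is formed.
shifted = [[row[0] + 3] + row[1:] for row in example]
icc_shifted = icc_case3(shifted)

print(round(icc_example, 4), round(icc_shifted, 4))
```

For the two-rater design used here, each row would hold the two raters' measurements of one structure in one subject, and the function would be called once per structure.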
RESULTS

Table 3 lists the Pearson’s r’s and ICC’s rounded to 4 decimal places for 14 absolute measurements with the number of subjects as shown. Sample sizes for corpus callosum, internal capsule, and centrum semiovale measures were reduced from a total N of 50 to those shown in Table 3, since these structures could not be accurately visualized in all subjects due to movement artifact or shunt-related image distortions. Left and right hemisphere measurements were made of the lateral ventricles, internal capsules, and centra semiovale from transaxial images. Corpus callosum area was measured from the midsagittal image, and lateral ventricle volume was measured from left and right hemisphere sagittal images. For the corpus callosum, a total brain area measurement in the corresponding image slice was also performed. For internal capsules and centra semiovale, hemispheric slice area measurements were made in the corresponding transaxial images following manual division along the interhemispheric fissure.

The Pearson’s r’s and ICC’s are listed in pairs in Table 3. The Pearson’s r’s range from 0.7569 to 0.9903 and the ICC’s from 0.7502 to 0.9895 (p < .0001). Note that the Pearson’s r’s in Table 3 are generally greater than or equal to their corresponding ICC, indicating that the Pearson’s r provides a slightly inflated measure of reliability in this sample. Table 4 lists Pearson’s r’s and ICC’s for structure to whole brain ratio (relative) measures. Observe that in this case as well, the Pearson’s r is greater than or equal to the corresponding ICC for all seven measures shown in Table 4.

Table 3. Pearson’s r’s and ICC’s for absolute measurements on four brain structures, whole brain, and left and right hemispheres

Structure                  Number   Pearson’s r   ICC*
Corpus callosum            49       0.7735        0.7717
Whole brain                49       0.7569        0.7502
Right lateral ventricle    50       0.9801        0.9713
Whole brain                50       0.9903        0.9895
Left lateral ventricle     50       0.9843        0.9829
Whole brain                50       0.9161        0.9159
Right internal capsule     46       0.9025        0.9009
Hemisphere                 46       0.9869        0.9869
Left internal capsule      46       0.8606        0.8582
Hemisphere                 46       0.9866        0.9863
Right centrum semiovale    43       0.8354        0.7952
Hemisphere                 43       0.8945        0.8939
Left centrum semiovale     43       0.8878        0.8558
Hemisphere                 43       0.8707        0.8696

*All ICC’s significant at p < .0001 level.
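The comparison of r and ICC for the absolute measures can be tabulated directly. The sketch below is not part of the original analysis. It computes r minus ICC for each row using values transcribed from Table 3; the expanded row labels are an editorial convenience, and the pairing of r and ICC values assumes the row order of the table.

```python
# (measure, Pearson r, ICC) transcribed from Table 3, in row order.
# Labels in parentheses are added only to tell the repeated rows apart.
TABLE3 = [
    ("Corpus callosum", 0.7735, 0.7717),
    ("Whole brain (corpus callosum slice)", 0.7569, 0.7502),
    ("Right lateral ventricle", 0.9801, 0.9713),
    ("Whole brain (right ventricle slices)", 0.9903, 0.9895),
    ("Left lateral ventricle", 0.9843, 0.9829),
    ("Whole brain (left ventricle slices)", 0.9161, 0.9159),
    ("Right internal capsule", 0.9025, 0.9009),
    ("Hemisphere (right internal capsule slice)", 0.9869, 0.9869),
    ("Left internal capsule", 0.8606, 0.8582),
    ("Hemisphere (left internal capsule slice)", 0.9866, 0.9863),
    ("Right centrum semiovale", 0.8354, 0.7952),
    ("Hemisphere (right centrum semiovale slice)", 0.8945, 0.8939),
    ("Left centrum semiovale", 0.8878, 0.8558),
    ("Hemisphere (left centrum semiovale slice)", 0.8707, 0.8696),
]

# Difference r - ICC per measure, rounded to the table's 4 decimal places
differences = [(name, round(r - icc, 4)) for name, r, icc in TABLE3]

all_r_at_least_icc = all(diff >= 0 for _, diff in differences)
largest_name, largest_diff = max(differences, key=lambda item: item[1])

for name, diff in differences:
    print(f"{name:45s} {diff:.4f}")
print("r >= ICC for every row:", all_r_at_least_icc)
print("largest difference:", largest_name, largest_diff)
```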
Table 4. Pearson’s r’s and ICC’s for relative measurements of seven brain structures with respect to whole brain measured in the same slice

Structure                              Number   Pearson’s r   ICC*
Corpus callosum/Whole brain            49       0.7973        0.7717
Right lateral ventricle/Whole brain    50       0.9674        0.9551
Left lateral ventricle/Whole brain     50       0.9784        0.9734
Right internal capsule/Whole brain     46       0.8850        0.8850
Left internal capsule/Whole brain      46       0.8095        0.8060
Right centrum semiovale/Whole brain    43       0.8160        0.8054
Left centrum semiovale/Whole brain     43       0.8511        0.8380

*All ICC’s significant at p < .0001 level.

DISCUSSION

Measures of reliability should be included in all morphometric MRI studies that use manual or semiautomated methodologies in order to assess the reliability of the structural measurements. There are important guidelines for choosing how ICC’s should be computed, including the number of factors in the ANOVA design, whether raters are to be considered fixed or random, and whether the measures are to be taken singly (one rater) or averaged over more than one rater.4 Most importantly, ICC’s should be used in any situation where subgroups that vary in the average of the measurements are included in the sample. The Pearson’s r will tend to be a less accurate measure of reliability, whereas the ICC will correct for the subgroup differences. In the data presented here, the maximal difference between Pearson’s r and ICC is small (about 0.04). The difference, however, will depend on several factors, including total N, the number of subgroups, and the group mean differences.

We reported here Pearson’s r’s and ICC’s on absolute and relative measures. Generally speaking, relative measures are inherently less reliable than absolute ones since there are sources of error in both the numerator and the denominator. This manifests itself in a tendency for ratio measure ICC’s to be smaller than their absolute counterparts. This can be seen when comparing ICC’s in Table 4 with those in Table 3. In the former, 6 of 7 relative ICC measures are less than or equal to their respective absolute ICC counterparts. Care should be taken when using ratio measures because of their lower reliability.

This study also demonstrates that manual morphometric measurements can be reliably performed even from MR images of brains grossly malformed as a result of disorders such as SB and hydrocephalus. Virtually all reliability coefficients reported here were greater than 0.75, which is quite acceptable.

Acknowledgment: Supported in part by NINDS grant NS25368, “Neurobehavioral Development of Hydrocephalic Children.”
REFERENCES

1. McCullough, D. Hydrocephalus: Etiology, pathologic effects, diagnosis, and natural history. In: R. McLaurin, J. Venes, L. Schut, G. Epstein (Eds). Pediatric Neurosurgery (second edition). Philadelphia: W.B. Saunders; 1989: pp. 180-199.
2. Fletcher, J.; Bohan, T.; Brandt, M.; Brookshire, B.; Beaver, S.; Francis, D.; Davidson, K.; Thompson, N.; Miner, M. Cerebral white matter and cognition in hydrocephalic children. Arch. Neurol. 49:818-824; 1992.
3. Kohn, M.; Tanne, N.; Herman, G.; Resnick, S.; Mozley, P.; Gur, R.E.; Alavi, A.; Zimmerman, R.; Gur, R.C. Analysis of brain and cerebrospinal fluid volumes with MR imaging, Part I. Methods, reliability, and validation. Radiology 178:115-122; 1991.
4. Shrout, P.; Fleiss, J. Intraclass correlations: Uses in assessing rater reliability. Psych. Bull. 86:420-428; 1979.
5. Hynd, G.; Semrud-Clikeman, M.; Lorys, A.; Novey, E.; Eliopulos, D. Brain morphology in developmental dyslexia and attention deficit disorder/hyperactivity. Arch. Neurol. 47:919-926; 1990.
6. Gonzalez, R.C.; Wintz, P. Digital Image Processing (second edition). Reading: Addison-Wesley; 1987.
7. Foley, J.; van Dam, A.; Feiner, S.; Hughes, J. Computer Graphics Principles and Practice (second edition). Reading: Addison-Wesley; 1990.