Reliability of Indiana Birth Certificate Data Compared to Medical Records
TERRELL W. ZOLLINGER, DRPH, MICHAEL J. PRZYBYLSKI, PHD, AND ROLAND E. GAMACHE, PHD
PURPOSE: The purpose of this study was to measure the reliability of data reported on Indiana electronic birth certificates. Knowing the accuracy of birth certificate data is crucial when identifying community health needs and evaluating birth outcomes interventions.
METHODS: This study compared 1996 electronic birth certificate data on a random sample of 1050 Indiana hospital births to data abstracted from the hospital medical records for the same patients. Kappa scores, Pearson r correlation values, sensitivity, specificity, and positive predictive values of the birth certificate data were used to measure agreement.
RESULTS: Parents’ demographic variables had the best agreement, followed by birth outcome variables. Delivery type, cesarean indications, pregnancy history, prenatal care and mother’s risk variables were found to have moderate agreement. Agreement was poor for variables measuring labor and delivery complications, obstetric procedures, concurrent illnesses, pregnancy complications, congenital anomalies, and abnormal conditions.
CONCLUSIONS: The results of this study clearly show that some important descriptive and outcome data are reliable while infrequent events are generally not. The results indicate a need to improve the quality of data reported on birth certificates.
Ann Epidemiol 2006;16:1–10. © 2006 Elsevier Inc. All rights reserved.
KEY WORDS: Birth Data, Data Quality.
INTRODUCTION
Natality statistics are vital to tracking trends and preventing adverse birth outcomes. National standards for registering babies born in the United States were developed in 1900 and continue to be updated periodically to reflect changing procedures (1). Because birth certificates record all events in a population, sampling error and sample-selection bias are eliminated. As technology advances, the use of birth-certificate data in epidemiological studies and community monitoring has increased, yet the accuracy of the birth-certificate information is often unknown to the policy makers using those data. Determining the accuracy of birth-certificate data is crucial when making decisions regarding appropriate study design, effective prevention strategies, and ways to accurately measure their impact.
From the Department of Family Medicine, Indiana University Bowen Research Center, Indiana University School of Medicine, Indianapolis (T.W.Z.); Indiana University Bowen Research Center, Indiana University School of Medicine, Indianapolis (M.J.P.); and Epidemiology Resource Center, Indiana State Department of Health, Indianapolis. Address correspondence to Terrell W. Zollinger, Dr.P.H., Bowen Research Center, Long Hospital 245, 1110 W. Michigan Street, Indianapolis, IN 46202-5102. Tel.: (317) 278-0300. fax: (317) 274-4444. E-mail:
[email protected] This study was funded in part by the Indiana State Department of Health Grant #36105198001446. Received May 22, 2002; accepted March 15, 2005.
1047-2797/06/$–see front matter doi:10.1016/j.annepidem.2005.03.005
Several studies of specific populations have compared birth-certificate data to other data sources to ascertain the accuracy of the data reported (2–5). At least seven other states have completed some form of validation study of birth-certificate data following the 1989 revisions (6–15); however, only three of these studies (6,8,15) focused on most of the birth-certificate variables in a sample of the general population. Authors of these studies found that the accuracy of birth-certificate information varied considerably and concluded that the use of birth-certificate data for health-policy development or resource allocation should be tempered with caution. Because birth-certificate data are population based, stable, and readily available, they are commonly used to identify and monitor health-related indicators in the population (16–19). The use of birth-certificate data to monitor progress toward the Healthy People 2010 Maternal, Infant, and Child Health objectives (18) is of particular concern. About one half of the 23 objectives related to the goal of improving the health and well-being of women, infants, children, and families may be evaluated using birth-certificate data. In addition, most of these objectives include target values or rates for specific racial, ethnic, or other demographic subpopulations. Errors in birth-certificate data could make it impossible to determine accurately whether the related objective values have been achieved. In addition, systematic errors and changes in systematic errors could
Zollinger et al. RELIABILITY OF BIRTH CERTIFICATE DATA
Selected Abbreviations and Acronyms
ISDH = Indiana State Department of Health
APNCU = adequacy of prenatal care utilization
AIDS = acquired immune deficiency syndrome
VBAC = vaginal birth after cesarean
cause spurious results, both positive and negative, to be reported as an artifact of data quality. Knowing the amount of error and the biases in the birth-certificate dataset is critical to correctly interpreting health indicators based on these data and accurately evaluating trends seen in these measures. This study was undertaken to assess the accuracy and completeness of the data reported on electronic birth certificates to the Indiana State Department of Health for a random sample of births during the 1996 calendar year. Reliability measures between data elements reported on birth certificates and data contained in hospital medical records were calculated. The results of this reliability analysis have been used to generate specific policy change recommendations to improve the completeness and accuracy of the birth-certificate data.
MATERIALS AND METHODS
Information on 85,554 births reported to the Indiana State Department of Health (ISDH) using the newly established electronic birth-certificate system was obtained for 1996. Excluded from this study were cases where the location of the facility or the residence of the mother was outside Indiana, and all cases of non-hospital births, leaving 80,073 hospital births at 108 hospitals for randomization into the study sample.
The data needed for the birth certificates are gathered and reported in various ways, depending on hospital policies, training of staff, and utilization of health department guidelines. The ISDH provided a birth-certificate worksheet to help hospital staff gather the data needed prior to entering the data electronically. Much of the information is available in the prenatal care records, when these are accessible at the time the birth-certificate data are gathered. In some facilities, the mother is asked to complete a short form providing the information needed. In other cases, a hospital staff person interviews the mother to gather the information needed. Mothers were not routinely asked to review the birth-certificate data before they were submitted.
A two-stage stratified cluster design was implemented to yield a representative sample of 1200 hospital births reported to the ISDH. Hospitals were stratified into three categories based on the number of deliveries performed in 1996: small-sized (fewer than 1000 deliveries per year, n = 82), medium
AEP Vol. 16, No. 1 January 2006: 1–10
sized (1000 to 1999 deliveries per year, n = 18), and large-sized (2000 or more deliveries per year, n = 8). In the first stage of the sampling strategy, ten facilities were selected at random from the small-facility stratum, and four each from the medium- and large-facility strata. In the second stage, random samples of 400 deliveries from each hospital group (small, medium, and large) were selected, resulting in a total sample of 1200 births. This sample size allowed for estimates of prevalence to the nearest 2% overall and for tests of differences of 10%, with a power of at least 0.80, when comparing agreement rates between the hospital-size groupings.
A private company specializing in health information was contracted to locate the sample of 1200 medical records from birthing facilities, to abstract specific data elements, and to enter these elements into an electronic file. If conflicting information was found in two or more areas within the medical record file, abstractors were instructed to choose the most authoritative source. The contracted record abstractors were debriefed periodically during the data collection phase by the research team to address issues, to ensure consistency in the data collection, and to obtain qualitative impressions about how birth-certificate data were gathered and reported within the hospitals.
For analysis, each case was weighted by the ratio of population births to sample births in all hospitals of that size class to ensure that the statistics calculated would be representative of the population of all births in the state. Data abstracted from the hospital medical records were subjected to a multiphase quality-assurance process at each level of data transfer. After the data were cleaned, the medical record files and the birth-certificate files were linked for analysis. Once the linking was complete and verified, the individually identifiable variables were deleted from the electronic files to protect confidentiality.
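The case weighting described above can be illustrated with a short Python sketch. This is a minimal illustration rather than the authors' procedure: the per-stratum population totals are hypothetical placeholders (the article reports only the overall figure of 80,073 hospital births), the sample count of 400 per stratum is the planned design size, and the names `Stratum`, `case_weight`, and `weighted_proportion` are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stratum:
    name: str
    population_births: int  # births at all hospitals of this size class
    sample_births: int      # sampled births from this size class


def case_weight(stratum: Stratum) -> float:
    """Weight per sampled case: population births / sample births."""
    if stratum.sample_births <= 0:
        raise ValueError(f"no sampled births in stratum {stratum.name!r}")
    return stratum.population_births / stratum.sample_births


# Hypothetical stratum totals, chosen only so that they sum to 80,073.
STRATA = [
    Stratum("small", 25_000, 400),
    Stratum("medium", 27_000, 400),
    Stratum("large", 28_073, 400),
]
WEIGHTS = {s.name: case_weight(s) for s in STRATA}


def weighted_proportion(cases):
    """Weighted share of cases that have a condition.

    cases: iterable of (stratum_name, has_condition) pairs.
    """
    cases = list(cases)
    if not cases:
        raise ValueError("no cases supplied")
    total = sum(WEIGHTS[name] for name, _ in cases)
    positive = sum(WEIGHTS[name] for name, flag in cases if flag)
    return positive / total
```

Under this scheme, the weights in each stratum sum back to that stratum's population total, so statistics computed on the weighted sample are scaled to the population of births in each hospital-size class.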
Hospital medical records were not located for 69 of the 1200 randomly selected cases, and the medical records scanned at the hospitals were not usable for 81 cases. Thus, the number of cases used in the analysis was 1050: 359 from the eight small hospitals, 344 from the four medium-sized hospitals, and 347 from the large hospitals. There were no significant differences in the demographic characteristics of the cases not in the analysis when compared to those that were included.
Measures of reliability were used to compare the data from the two sources. The primary agreement measures were the kappa statistic for the 120 discrete variables (weighted when appropriate) and the Pearson’s product-moment correlation coefficient (r) for the seven continuous variables. These measures provide a more accurate estimate of data agreement than simple match rates because many of the items refer to infrequent events and thus are absent on the vast majority of records. Agreement was considered “poor” if the values of the statistics were below 0.6, “moderate” if they
were between 0.6 and 0.8, and “good” if they were above 0.8. Even if an item has good agreement among these cases, a high rate of missing data may also indicate a problem. The Pearson’s r and kappa statistics were computed only for items in which the characteristic was noted in one or both of the sources. In addition, the sensitivity, specificity, and positive predictive value of the birth-certificate data matching the medical-record data are shown for each variable.
RESULTS
Of the 127 birth-certificate data elements included in the study, measures of agreement could be calculated on 115. No mention of the condition was found in either data source for the other 12 variables: previous small for gestational age infant, incompetent cervix, AIDS, cancer, cardiac disease, endocrinopathy, chorionic villi sampling, anesthetic complication, rupture of the uterus, seizures during labor, anemia in newborn, and fetal alcohol syndrome.
Demographic Profile
Five of the demographic variables for the mother and father had missing values in about a third of the medical records (mother’s level of education, father’s race, father’s ethnic origin, father’s age, and father’s level of education). The agreement measures between the two data sources were very good for mother’s race, age, education, and marital status, as well as father’s race, age, and education. The agreement on Hispanic origin was poor among the small proportion (2% to 4%) reported to be Hispanic, as shown in Table 1.
Prenatal Care, Pregnancy History, and Risk Factors
The rate of missing data from the medical record review for prenatal care variables ranged from 11% to 28%, as shown in Table 1. The adequacy of prenatal care utilization index (APNCU) had a high measure of agreement among known cases. However, the kappa statistic for the Kessner index of prenatal care indicated that the agreement between the two sources for this variable was poor.
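The agreement measures defined in Materials and Methods can be sketched in Python for a single dichotomous item. The sketch rests on several assumptions that the article does not spell out: it computes the unweighted Cohen's kappa only (the study used weighted kappa where appropriate), it treats the medical record as the reference standard for sensitivity, specificity, and positive predictive value, and it assigns values of exactly 0.6 or 0.8 to the moderate category. The class and function names and the example counts are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    both: int      # condition noted on birth certificate and in medical record
    bc_only: int   # noted on birth certificate only
    mr_only: int   # noted in medical record only
    neither: int   # noted in neither source

    @property
    def n(self) -> int:
        return self.both + self.bc_only + self.mr_only + self.neither


def _ratio(num: float, den: float) -> float:
    """Return num/den, or NaN when the denominator is zero."""
    return num / den if den else math.nan


def cohen_kappa(t: TwoByTwo) -> float:
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = t.n
    bc_pos = t.both + t.bc_only
    mr_pos = t.both + t.mr_only
    p_o = (t.both + t.neither) / n
    p_e = (bc_pos * mr_pos + (n - bc_pos) * (n - mr_pos)) / n**2
    return _ratio(p_o - p_e, 1 - p_e)


# Assumption: the medical record is the reference standard.
def sensitivity(t: TwoByTwo) -> float:
    return _ratio(t.both, t.both + t.mr_only)


def specificity(t: TwoByTwo) -> float:
    return _ratio(t.neither, t.neither + t.bc_only)


def positive_predictive_value(t: TwoByTwo) -> float:
    return _ratio(t.both, t.both + t.bc_only)


def agreement_label(value: float) -> str:
    """Poor below 0.6, moderate from 0.6 to 0.8, good above 0.8."""
    if value < 0.6:
        return "poor"
    if value <= 0.8:
        return "moderate"
    return "good"


# Hypothetical counts for one item among 1050 linked records.
example = TwoByTwo(both=20, bc_only=11, mr_only=11, neither=1008)
example_kappa = cohen_kappa(example)  # about 0.63, labeled "moderate"
```

Because kappa corrects the observed match rate for the agreement expected by chance, an item that is absent from nearly every record can show a match rate close to 1 and still have a low kappa.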
The agreement between the sources for the month in which prenatal care began was in the moderate category, and the agreement for number of prenatal visits was poor. For mother’s weight gain, the correlation between the two data sources was good, while the kappa statistic for the categorical weight-gain variable was lower. Values for the pregnancy history variables were often missing in the medical record; however, when the data were present, the agreement rates were good between the two sources. There was poor agreement on previous high-risk pregnancies between the two sources. Among behavioral risk factors of the mother, there was moderate agreement on smoking but poor agreement on
whether the mother used alcohol or other substances. More mothers were listed as smokers on the birth certificate than in the medical charts (225 versus 164). Very few mothers responded positively when asked whether they used alcohol or other substances during pregnancy.
Complications of Pregnancy
Relatively few birth certificates (11.4%) or medical records (6.8%) reported pregnancy complications, as shown in Table 2. There was poor agreement between the two sources on whether there were any pregnancy complications as well as on any specific complication. For several specific complications (eclampsia, fetal growth retardation, Rh sensitization, uterine bleeding, and uterine/cervical malformation) there was no agreement because those identified in one source were not listed in the other source.
Concurrent Illnesses
Concurrent illnesses of the mother were also in poor agreement between the two data sources. One illness, genital herpes, reached a moderate level of agreement. The seven others had poor agreement. It was noted that both sources showed a similar number of cases with complications overall but that some complications seem to be more prevalent in one data source than the other.
Method of Delivery and Indications for Cesarean Delivery
Agreement between these two sources was moderate to good for most of the variables in this group. Among the method of delivery variables, agreement was very good for cesarean delivery and moderate for the others. Two specific indications for cesarean delivery had good agreement: genital herpes and repeat cesarean. Three others had moderate levels of agreement: breech or malpresentation, placenta previa, and fetal distress. Agreement was poor for cephalopelvic disproportion, unsuccessful VBAC, and “other.” There was no agreement at all for the few cases of abruptio placenta and prolapse of cord.
Obstetric Procedures
Agreement was poor among the obstetric procedures variables, with an overall kappa statistic of 0.059.
In this group, only one variable, amniocentesis, had a moderate level of agreement. All other variables in this group had poor agreement, including two that were present in a majority of cases.
Delivery and Labor Complications
In general, agreement was also poor for the labor and delivery complications variables, as shown in Table 3. Placenta
TABLE 1. Agreement among parents’ demographics, characteristics, prenatal care, pregnancy history, and risk factors (n = 1050)
Characteristics
Mother’s demographic profile Race (% white) Hispanic origin (% Hispanic) Age in years (mean) Educational attainment (% high school graduate) Marital status (% married) Father’s demographic profile Race (% white) Hispanic origin (% Hispanic) Age in years (mean) Educational attainment (% high school graduate) Group average Prenatal care Mo. Pregnancy prenatal care began Number of prenatal care visits (mean) Date last normal menses began Kessner index of prenatal care (3 categories) APNCU index of prenatal care (4 categories) Weight gain during pregnancy (mean lbs) Weight gain during pregnancy (4 categories) Group average Pregnancy history Number of previous live births (mean) Number of previous terminations (mean) Group Average Number with previous high risk pregnancies None Any Previous 4000+ gram infant Previous pre-term infant Previous small for gestational age infant Other Group average Number of mothers with behavioral risk factors Mother smoked during pregnancy Mother used alcohol Mother used other substances Group average
Percent missing in
Kappa higher CI limit
Sensitivity
Specificity
Positive predictive value
0.824 0.430 * 0.895
0.926 0.768 * 0.965
0.958 0.988 * 0.924
0.976 0.667 * 0.992
0.827 0.992 * 0.961
0.949
0.927
0.971
0.983
0.986
0.963
0.826 0.626 0.975 0.919
0.759 0.461 * 0.876
0.697 0.791 * 0.962
1.000 0.987 * 0.916
0.965 0.652 * 0.991
0.733 0.989 * 0.946
* *
*
* *
* *
Birth certificate
Medical record
Medical records
Birth certificate
Pearson’s r Kappa
90.1 2.3 26.7 79.0
88.6 2.7 26.8 83.0
8.0 9.7 0.9 28.5
0.2 0.2 0.0 1.7
0.875 0.599 0.994 0.930
68.6
72.3
7.7
0.0
90.7 3.2 29.1 84.0
88.9 3.6 29.5 85.7
28.0 28.0 29.3 34.9
11.7 11.7 2.6 13.3
Kappa lower CI limit
0.848 2.8 10.7
2.9 11.1
20.2 19.6
3.5 0.0
0.753 0.583
* *
* 1.3
* 1.6
10.9 27.7
4.6 0.0
0.633 0.310
* 0.267
* 0.352
* *
* *
* *
3.0
3.0
26.1
0.0
0.938
0.913
0.962
*
*
*
32.1
30.4
24.9
0.0
0.820
*
*
*
*
3.0
2.9
20.5
4.5
0.776
0.814
*
*
*
* 0.742
0.712 1.0
1.2
20.4
1.0
0.935
*
*
*
*
*
0.4
0.7
31.8
0.4
0.839
*
*
*
*
*
0.887
984 88 13 14 0
810 240 9 17 0
** ** ** ** **
1.5 1.5 1.5 1.5 1.5
0.079
0.023
0.135
0.394
0.783
0.107
0.357 0.238 *
0.093 0.034 *
0.621 0.442 *
0.995 0.986 *
0.308 0.286 *
0.991 0.990 *
25
45
**
1.5
0.027 0.175
-0.051
0.105
0.957
0.080
0.977
225 13 15
164 16 9
** ** **
0.2 0.3 3.2
0.702 0.266 0.410 0.481
0.646 0.046 0.161
0.758 0.486 0.659
0.979 0.988 0.996
0.653 0.308 0.333
0.912 0.991 0.990
*Not applicable. **No distinction between missing and absence of characteristic.
previa showed perfect agreement for the two cases of this complication. This result illustrates that the kappa statistic can vary greatly among variables measuring infrequent events. Of the other variables in this group, only one, breech or malpresentation, had a moderate level of agreement, although four others were only slightly below this level: dystocia, cephalopelvic disproportion, dysfunctional labor, and chorioamnionitis. Three of the complications
TABLE 2. Number of cases and agreement among complications of pregnancy, concurrent illnesses and procedures (n = 1050) Number of cases
Complications of this pregnancy None Any Eclampsia Fetal growth retardation Hydramnios Incompetent cervix Preeclampsia Rh sensitization Uterine bleeding Uterine/cervical malformation Other Group average Concurrent illnesses None Any AIDS Anemia Cancer Cardiac disease Diabetes Endocrinopathy Epilepsy Obesity Renal disease Thrombophlebitis Urinary tract infection Sickle cell Chronic/pregnancy related hypertension Lung disease Gonorrhea Genital herpes Syphilis Other Group average Method of delivery Vaginal Cesarean Use of forceps Use of vacuum Group average Indications for Cesarean Delivery None Any Abruptio placenta Breech or malpresentation Cephalopelvic disproportion Fetal distress Genital herpes Placenta previa Prolapse of cord Repeat Cesarean delivery Unsuccessful VBAC Other Group average
Kappa
Kappa lower CI limit
0.215
0.129
0.301
0.220
0.953
0.403
0.000 0.000 0.106 * 0.456 0.000 0.000 0.000 0.098 0.097
* * -0.098 * 0.274 0.000 0.000
* * 0.310 * 0.638 0.000 0.000
0.996 0.998 0.997 * 0.964 1.000 1.000
0.000 0.000 0.071 * 0.550 0.000 0.000
0.997 0.996 0.988 * 0.991 0.995 0.996
0.014
0.182
0.966
0.077
0.918
0.216
0.140
0.292
0.305
0.896
0.368
* 0.000 * * 0.184 * 0.570 0.379 0.000 0.000 0.244 0.000 0.049
* * * * -0.008 * 0.122 0.137 * * 0.058 * -0.053
* * * * 0.376 * 1.018 0.621 * * 0.430 * 0.151
* 0.979 * * 0.982 * 0.989 0.999 * 0.999 0.979 * 0.998
* 0.000 * * 0.333 * 0.500 0.250 * 0.000 0.417 * 0.029
* 0.991 * * 0.884 * 0.988 0.986 * 0.996 0.993 * 0.968
3 5 21 2 29
0.000 0.000 0.542 0.000 0.201 0.168
* * 0.458 * 0.085
* * 0.826 * 0.317
0.997 0.995 0.991 0.998 0.982
0.000 0.000 0.750 0.000 0.167
0.987 0.997 0.996 0.998 0.948
750 238 26 63
757 228 32 65
0.775 0.934 0.658 0.684 0.763
0.731 0.908 0.610 0.588
0.819 0.960 0.806 0.780
0.830 0.991 0.987 0.980
0.940 0.929 0.750 0.714
0.847 0.979 0.994 0.982
812 238 2 37 42 45 4 4 1 69 10 64
826 222 2 31 34 43 5 7 0 78 5 21
0.618
0.775
0.863
0.832
0.969
0.886
0.000 0.775 0.582 0.876 0.888 0.726 0.000 0.803 0.396 0.284 0.541
* 0.863 0.446 0.558 0.688 0.422 * 0.729 0.076 0.154
* 0.887 0.718 0.794 1.110 1.030 * 0.877 0.718 0.414
0.999 0.995 0.989 0.988 0.998 0.997 * 0.982 0.998 0.992
0.000 0.730 0.535 0.687 1.000 1.000 * 0.870 0.300 0.203
0.998 0.990 0.980 0.885 1.000 1.000 * 0.991 0.993 0.990
Birth certificate
Medical record
918 132 3 4 14 0 20 8 5 3 91
976 72 2 2 4 0 26 1 1 0 20
878 174 0 9 0 0 9 0 3 20 0 2 12 1 34
906 144 0 22 0 0 22 0 3 7 3 1 27 0 4
2 3 15 2 66
Kappa higher CI limit
Sensitivity
Specificity
Positive predictive value
(continued)
TABLE 2. Continued Number of cases
Obstetric procedures None Any Amniocentesis Chorionic villi sampling Induction of labor Stimulation of labor Tocolysis Ultrasound Fetal monitoring Other Group average
Birth certificate
Medical record
81 969 22 0 180 163 15 728 894 122
145 905 18 0 105 106 3 699 834 3
Kappa
0.059 0.059 0.627 * 0.371 0.335 0.296 0.255 0.129 0.024 0.239
Kappa lower CI limit
Kappa higher CI limit
Sensitivity
Specificity
Positive predictive value
-0.009
0.127
0.868
0.213
0.930
0.449 * 0.283 0.253 0.030 0.193 0.061 -0.016
0.805 * 0.449 0.417 0.582 0.317 0.197 0.064
0.995 * 0.955 0.944 0.838 0.510 0.333 0.998
0.565 * 0.353 0.344 0.188 0.746 0.817 0.016
0.990 * 0.870 0.887 0.988 0.473 0.241 0.884
*Not applicable.
did not appear in either source, and another two had no agreement at all. All other variables in this group, including “other”, had poor levels of agreement. It was noted that these results differed somewhat from similar variables listed in the “Indications for Cesarean Delivery” section of the birth certificate.
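The sensitivity of kappa to a handful of records can be seen in a short worked example. The counts below are hypothetical and chosen only for illustration; they assume an unweighted kappa, n = 1050 linked records, and an item that each source reports as present for exactly two records.

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}

% Case A: both sources flag the same two records.
p_o = \frac{2 + 1048}{1050} = 1
\quad\Rightarrow\quad \kappa = 1

% Case B: one record is flagged by both sources, one by the birth
% certificate only, and one by the medical record only.
p_o = \frac{1 + 1047}{1050} = \frac{1048}{1050}, \qquad
p_e = \frac{2 \cdot 2 + 1048 \cdot 1048}{1050^2} = \frac{1\,098\,308}{1\,102\,500}

\kappa = \frac{1\,100\,400 - 1\,098\,308}{1\,102\,500 - 1\,098\,308}
       = \frac{2092}{4192} \approx 0.499
```

In this illustration, reclassifying a single record in each source moves kappa from perfect agreement to the poor category, even though the raw match rate stays above 99.8%.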
Birth Outcome Variables
Overall, the birth outcome variables had the second best agreement of any group of variables in this study. The mean birth weight was slightly higher on the birth certificates than on the medical records (3436 grams versus 3373 grams). In 11.5% of the cases, the birth weights were not listed in the mothers’ medical records; however, all of the birth certificates recorded a value for birth weight. The agreement rate between the two sources was poor for birth weight when matching to the nearest ounce. This poor performance was partly due to the need to convert data from pounds and ounces on many of the medical records, while the birth certificate reported the weight directly in grams. Rounding differences and differences in conversion factors likely account for much of this error. When the agreement criterion was relaxed to plus or minus two ounces (about 57 grams), the two sources were in agreement for 92% of the cases. Kappa statistics for low birth weight and very low birth weight showed good agreement when cases were classified into these categories. The estimate of gestation to the nearest week was in moderate agreement. However, the agreement was slightly better between the two sources on whether the infant should be classified as having early gestation (less than 37 weeks) and very early gestation (less than 32 weeks).
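The birth-weight comparison can be sketched as a unit conversion followed by a tolerance check. The code below is an illustrative assumption about how such a match might be computed, not the study's actual procedure; the function names are hypothetical, and the only fixed quantity is the standard conversion of 28.349523125 grams per avoirdupois ounce, under which two ounces is about 56.7 grams.

```python
GRAMS_PER_OUNCE = 28.349523125  # international avoirdupois ounce
OUNCES_PER_POUND = 16


def pounds_ounces_to_grams(pounds: float, ounces: float) -> float:
    """Convert a weight recorded in pounds and ounces to grams."""
    return (pounds * OUNCES_PER_POUND + ounces) * GRAMS_PER_OUNCE


def weights_agree(bc_grams: float, mr_pounds: float, mr_ounces: float,
                  tolerance_ounces: float = 2.0) -> bool:
    """True when the two recorded weights differ by no more than the tolerance.

    bc_grams: birth-certificate weight in grams.
    mr_pounds, mr_ounces: medical-record weight in pounds and ounces.
    """
    mr_grams = pounds_ounces_to_grams(mr_pounds, mr_ounces)
    return abs(bc_grams - mr_grams) <= tolerance_ounces * GRAMS_PER_OUNCE
```

For instance, a medical-record weight of 7 lb 8 oz converts to roughly 3402 g, so a birth-certificate value of 3350 g falls within the two-ounce tolerance while 3345 g does not. A rounded conversion factor, such as 28 or 30 grams per ounce, would shift values near the boundary across it.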
The agreement on the other birth-outcome variables was quite good, with measures above 0.900. Both the one- and five-minute Apgar scores were in good agreement. While the agreement between the two sources for plurality was good, the higher rate of missing values (9.8%) on the medical records indicates a potential problem with medical record documentation. Only five of the 1050 cases (0.48%) indicated that the baby died before discharge; however, the two sources agreed exactly on this item. Among the very few cases that were transferred to another facility, one in the medical record and three on the birth certificate, none were in agreement between the sources.
Newborn Congenital Anomalies and Abnormal Conditions
All of these are infrequent conditions, and agreement between the two data sources was very poor overall (kappa = 0.176), as has been documented in other studies (14). Among specific congenital anomalies, the kappa values indicate there was no agreement between the two data sources for anomalies of the central nervous system; cranial or facial area; gastrointestinal tract; genital or urinary tract; heart; chromosomal or orthopedic anomaly; or other anomalies. Agreement between the two sources for respiratory system anomalies was very poor. Similarly, poor agreement was seen for the variables measuring abnormal conditions of the newborn (kappa = 0.202 overall). Three conditions had no agreement: birth injury, meconium aspiration syndrome, and seizures. The rest that could be calculated had poor agreement: assisted ventilation, hyaline membrane disease, and other.
TABLE 3. Agreement between birth certificate and medical record information: delivery and labor complications (n = 1050) Birth certificate Labor and delivery complications None Any Abruptio placenta Anesthetic complication Excessive bleeding Breech or malpresentation Cephalopelvic disproportion Chorioamnionitis Cord prolapse Dysfunctional labor Dystocia Febrile Fetal distress Moderate/heavy meconium Placenta previa Precipitous labor Premature rupture of membrane Prolonged labor Rupture of uterus Seizures during labor Other Group average Birth outcome Birth weight in grams (mean) ± 1 oz Low birth weight (# <2500 g) Very low birth weight (# <1500 g) Gender (% female) Clinical estimate of gestation (mean weeks) Early gestation (# <37 weeks) Very early gestation (# <32 weeks) Plurality (% single births) One minute Apgar scores (mean) Five minute Apgar scores (mean) Died before discharge (% died) Transfer to another facility (# transferred) Date of birth Group average Congenital anomalies None Any Anomaly of the central nervous system Cranial or facial area Gastrointestinal tract Genital/Urinary tract Heart Respiratory system Chromosomal or orthopedic anomaly Other Group average Abnormal conditions of the newborn None Any Anemia Birth injury Fetal alcohol syndrome Hyaline membrane disease
799 251 1 0 7 31 31 7 0 14 7 9 36 41 2 11 26 5 0 0 71
Medical record 884 186 3 0 1 31 30 5 1 7 3 2 27 10 2 3 30 8 0 0 11
Kappa
Kappa lower CI limit
Kappa higher CI limit
Sensitivity
Specificity
Positive predictive value
0.371
0.303
0.439
0.434
0.904
0.588
0.499 * 0.000 0.824 0.563 0.543 * 0.568 0.598 0.362 0.330 0.290 1.000 0.282 0.326 0.138 * * 0.058 0.386
-0.113 * * 0.480 0.416 0.179 * 0.172 0.308 0.230 -0.008 0.126 1.000 -0.034 0.162 -0.118 * * -0.020
1.111 * * 0.768 0.720 0.907 * 0.488 0.828 0.966 0.732 0.454 1.000 0.598 0.490 0.394 * * 0.136
0.998 * 0.999 0.988 0.988 0.998 * 0.999 1.000 1.000 0.984 0.997 1.000 0.999 0.980 0.992 * * 0.993
1.000 * 0.000 0.645 0.583 0.500 * 0.429 0.428 0.222 0.306 0.190 1.000 0.182 0.357 0.200 * * 0.042
1.000 * 0.993 0.989 0.986 0.997 * 0.992 0.998 0.993 0.976 0.967 1.000 0.991 0.982 0.996 * * 0.935
3436 60 6 49% 39.3 77 12 98% 7.8810 8.8855 0.48% 3 *
3373 81 10 48% 39.1 84 16 98% 7.9070 8.8672 0.48% 1 *
0.364 0.887 0.748 0.931 0.660 0.791 0.710 0.972 0.919 0.910 1.000 0.000 0.909 0.750
* 0.799 0.504 0.909 * 0.719 0.514 0.916 0.893 0.879 1.000 0.000 *
* 0.935 0.992 0.953 * 0.863 0.906 1.028 0.945 0.940 1.000 0.000 *
* 0.991 0.996 0.962 * 0.979 0.994 0.947 * * 1.000 0.999 *
* 0.833 1.000 0.968 * 0.844 0.833 1.000 * * 1.000 0.000 *
* 0.992 1.000 0.955 * 0.987 0.998 1.000 * * 1.000 0.997 *
1019 31 5 1 5 3 6 7 6 3
1012 38 0 3 0 7 3 5 0 4
0.176
0.044
0.308
0.226
0.970
0.184
0.000 0.000 0.000 0.000 0.000 0.162 0.000 0.000 0.038
* * * * * -0.130 * *
* * * * * 0.454 * *
* 0.997 * 0.993 0.997 0.996 * 0.995
* 0.000 * 0.000 0.000 0.143 * 0.000
* 0.999 * 0.999 0.997 0.994 * 0.997
999 51 0 2 0 7
1009 41 0 0 0 1
0.202
0.082
0.322
0.216
0.989
0.262
* 0.000 * 0.249
* * * -0.155
* * * 0.653
* * * 1.000
* * * 0.143
* * * 0.994 (continued)
TABLE 3. Continued
Meconium aspiration syndrome Assisted ventilation Seizures Other Group average
Birth certificate
Medical record
0 12 1 36
2 3 0 16
Kappa 0.000 0.263 0.000 0.214 0.133
Kappa lower CI limit
Kappa higher CI limit
Sensitivity
Specificity
Positive predictive value
0.000 -0.037 * 0.058
0.000 0.563 * 0.370
0.998 0.999 * 0.990
* 0.167 * 0.167
1.000 0.990 * 0.971
*Not applicable.
Agreement by Number of Positive Cases and Hospital Size
Table 4 presents median kappa or correlation statistics for the variables by number of positive cases and by hospital size. In general, there was a gradient between the number of positive cases for a variable and its agreement rate. Agreement was good for most variables that had positive responses for more than 10% of the cases (100 or more positive cases). Agreement was much worse for variables that had few nonzero (or nonunitary) cases. The two groups with fewer than 50 positive cases (fewer than 5% of births had the condition) contained 74 of the 115 variables for which agreement scores could be calculated. Thus, a clear majority of the variables on the birth certificate refer to infrequent occurrences that agree poorly with the medical record. This contributes to an overall poor median agreement rate (0.364). The median agreement value was good (0.820) for the variables in which the condition was present in at least 10% of the cases. Of variables in this group that do not measure infrequent occurrences, more than half met the criteria for good agreement.
The median agreement rates for each category of variables by the three hospital size strata are also shown in Table 4. For all variables, the median agreement rate was highest at small hospitals (0.562), lower at medium hospitals (0.482), and lowest at large hospitals (0.422). The agreement by categories of variables did not clearly indicate better (or worse) agreement at hospitals of any individual size stratum. Large hospitals showed better agreement on
demographic variables. Better, although still poor, agreement on obstetric procedures was found for the large and medium size hospitals. Small hospitals showed better agreement in the other categories.
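The grouping behind the medians by number of positive cases can be sketched as follows. This is a minimal illustration under stated assumptions: the bin boundaries follow the row labels of Table 4, the input values are hypothetical, and the article does not say whether the number of positive cases was counted from the birth certificate or from the medical record.

```python
from statistics import median

# (label, lowest count, highest count or None for open-ended)
BINS = [
    ("Fewer than 10", 0, 9),
    ("10-49", 10, 49),
    ("50-99", 50, 99),
    ("100 or more", 100, None),
]


def bin_label(positive_cases: int) -> str:
    """Return the label of the bin that contains positive_cases."""
    for label, low, high in BINS:
        if positive_cases >= low and (high is None or positive_cases <= high):
            return label
    raise ValueError(f"invalid count: {positive_cases}")


def median_agreement_by_bin(variables):
    """Median agreement statistic per bin of positive-case counts.

    variables: iterable of (name, positive_cases, agreement_statistic).
    Bins that receive no variables are omitted from the result.
    """
    grouped = {label: [] for label, _, _ in BINS}
    for _name, count, statistic in variables:
        grouped[bin_label(count)].append(statistic)
    return {label: median(vals) for label, vals in grouped.items() if vals}


# Hypothetical variables: (name, positive cases, kappa or r).
demo = [
    ("item_a", 2, 0.0),
    ("item_b", 5, 0.5),
    ("item_c", 30, 0.3),
    ("item_d", 200, 0.9),
    ("item_e", 150, 0.8),
]
demo_medians = median_agreement_by_bin(demo)
```

The median is a natural summary here because many rare-event items have a kappa of exactly zero, which would pull a mean down sharply.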
DISCUSSION
The level of agreement between the birth-certificate data and the medical record data in this study varied widely. The results of this study clearly show that some important descriptive and outcome data are reliable while data on infrequent events generally are not. Thus, birth-certificate data on infrequently occurring events should be used with great caution, if at all, until improvements in documenting these conditions can be made. In addition, some of the abnormal conditions and anomalies, such as fetal alcohol syndrome, may not show up for some time after the baby’s birth. Thus, it may not be reasonable to expect that data on these variables would ever be reliable on the birth certificate.
The findings of this study appear to be similar in general to the results of other studies. In the New York study (15) and in the Tennessee study (8), the authors reported variability in the measures of agreement across the variables studied and expressed some concern about interpreting the prevalence of rare conditions. The kappa scores for this study and those reported by DiGiuseppe and colleagues (5) were also relatively similar for most of the data elements studied. For example, the reliability of tobacco use was fair to good, while the reliability of alcohol and other substances
TABLE 4. Median agreement statistics by number of positive cases and hospital size

                                                 Size of hospital
Number of positive cases      Small (n = 359)   Medium (n = 344)     Large (n = 347)   Total (n = 1050)
                              (< 1000 births)   (1000–1999 births)   (> 2000 births)
Fewer than 10 (40 variables)  0.495             0.000                0.000             0.000
10–49 (34 variables)          0.408             0.370                0.260             0.328
50–99 (7 variables)           0.677             0.774                0.644             0.684
100 or more (34 variables)    0.810             0.858                0.764             0.820
All 115 variables             0.562             0.486                0.435             0.364
was found to be much lower. However, reliability of the cesarean delivery measure was much higher in Indiana compared to the findings of the Washington State study (6) and the Georgia study (10). Differences in definition of variables and choice of measures of agreement limited the ability to compare the results of this study with the findings of other studies.

For the prenatal care variables, the information in the medical record or the birth certificate may have been taken from the care provider’s records or from the mother’s self report at time of delivery. It has been found that accurate information on prenatal care fell somewhere between the mother’s overestimate and the physician’s underestimate (20). Prenatal care data were very difficult to obtain from the medical record because of inconsistent data content and format of prenatal flow sheets used by doctors’ offices. Standardizing these flow sheets throughout the state would improve the reliability of the data available in the medical record and presumably the birth certificate.

For the measures of substance use, agreement between the sources is better for smoking than for alcohol or other substance use. Since the use of alcohol may be perceived as a greater threat to the baby’s health, and use of ‘‘other’’ substances refers to illegal activities, women may be more honest with their prenatal care provider than with the nurse in the maternity ward. The reported rates of alcohol use and use of other substances were lower than expected in both the medical records and in the birth-certificate data.

The results of this study indicate that some of the health indicators based on birth-certificate data that are needed to monitor Indiana’s progress toward the Healthy People 2010 objectives may be more reliable than others. The measures of adequacy of prenatal care (Objective 16-6) were found to be in very good agreement between the two sources. For example, the kappa value for APNCU was 0.938.
Also, there was very good agreement (kappa = 0.934) between the two data sources for cesarean delivery (Objective 16-9). There was good agreement between the two data sources for measures needed to assess several other Healthy People 2010 objectives. The weight gain during pregnancy measure (Objective 16-12) yielded a kappa score of 0.820, while measures of birth weight (Objectives 16-8, 16-10) yielded kappa scores of 0.867 for very low birth weight and 0.748 for low birth weight births. Similarly, the measure for preterm birth (Objective 16-11) yielded kappa scores of 0.791 for preterm and 0.710 for early preterm births. Thus it would be expected that using birth-certificate data to monitor progress toward these objectives should provide a good level of confidence in the conclusions. Only one of the measures for prenatal substance exposure (Objective 16-17) was found to be fairly reliable. Whether the mother smoked during pregnancy yielded a kappa score of 0.702. The measures for alcohol use and other substance
use were found to have poor reliability (kappa = 0.266 and 0.410, respectively). For the three measures as a group, the reliability is also poor (kappa = 0.481). Consequently, using birth-certificate data to track changes in tobacco use would provide a moderate level of confidence, but using these data would not provide confidence in conclusions about prevalence or trends in use of alcohol and other substances during pregnancy.

The measures needed to assess maternal illness and complications during pregnancy (Objective 16-5) showed very poor agreement between the two data sources. The kappa score for having any maternal illness was 0.216 and for the group of maternal illnesses the kappa score was 0.166. Similarly, the kappa score for any pregnancy complications was 0.215 and 0.097 for the group of pregnancy complications. Congenital anomaly measures were also not reliable (kappa = 0.176 for any anomaly). These measures are needed to address Objective 16-15 (spina bifida and other neural tube defects), for which the kappa score was 0.000, perhaps because of the small number of cases in this study. Based on these findings, using birth-certificate data would not allow policy makers to establish prevalence and track progress with confidence.

Experience in this study showed that, overall, medical records departments were well organized and the medical records were complete according to existing standards. However, assembling the record from multiple departments at some hospitals caused difficulties in the data collection of this study. If hospitals kept all records together in one department in one medical record jacket or one electronic file, there would be greater likelihood that the data needed for the birth certificate could be efficiently located. In the review of medical records, the worksheet, when available, was most often the only source of some information, such as demographic information on fathers.
It was also noted that the discharge summary was not part of the medical record for all births in this study, as it should be. Having more complete records for this study would likely not have changed the conclusions (21).

Recommendations to improve the quality of the birth-certificate data fell into three areas: improvement in the prenatal care record keeping, changes in the process of gathering the data at the birth facility, and increased training of individuals charged with gathering and reporting the birth-certificate data. There was an obvious lack of consistency in the information gathered and the format used by prenatal care providers to record prenatal information. Producing a standard prenatal care tracking system would substantially improve the ability of the hospital staff to quickly and accurately identify critical information needed not only for the patients’ care in the hospital, but for reporting purposes as well. This issue should be addressed with the vendors of electronic medical record systems for
care providers. There was also a lack of consistency in the use of the birth-certificate worksheet provided by the department of health to assist hospital staff in gathering information needed for the birth-certificate reporting. In addition, much of the information needed was only available in paper records, which were often disorganized and occasionally unavailable. Finally, some of the information, such as information about the father, was not available anywhere in the record and was gathered from the mother only for reporting on the birth certificate. Hospitals need to review their policies for gathering, storing, and reporting information needed for the birth certificate. Specific guidelines provided by the health department would be useful for this purpose. Finally, due at least partly to the frequent turnover of staff involved in gathering and reporting birth-certificate data, there appears to be a need for more frequent training and monitoring of these staff to ensure that they are following the health department and hospital guidelines.

The degree to which the birth-certificate data are used by policy makers for administrative and program decision making is not known. Presumably, the more accurate and complete these records are, the more valuable these data will be to policy makers. It would seem prudent to find ways to impress upon the hospital staff the value of these data, as well as to make sure systems are put into place and that the staff is well trained to ensure the quality of birth-certificate information.
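The agreement measures used throughout this study (kappa, sensitivity, specificity, and positive predictive value) can all be derived from a 2 × 2 cross-classification of the birth certificate against the medical record. The sketch below assumes the medical record is treated as the reference standard; the function name and all counts are hypothetical and are not the study's data. It illustrates why a rare condition can show very high raw agreement and still yield a kappa of zero.

```python
def agreement_stats(a, b, c, d):
    """Agreement measures from a 2 x 2 table of counts.

    a: condition on certificate and in record
    b: on certificate only
    c: in record only
    d: in neither source
    Assumption: the medical record is the reference standard.
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("table is empty")

    observed = (a + d) / n
    # Agreement expected by chance, from the marginal totals.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    # Kappa is undefined when chance agreement is already perfect.
    kappa = None if expected == 1 else (observed - expected) / (1 - expected)

    def ratio(numerator, denominator):
        # Undefined ratios (zero denominator) are reported as None.
        return numerator / denominator if denominator else None

    return {
        "observed": observed,
        "kappa": kappa,
        "sensitivity": ratio(a, a + c),
        "specificity": ratio(d, b + d),
        "ppv": ratio(a, a + b),
    }


# Hypothetical rare condition: 5 cases in the record, none on the certificate.
rare = agreement_stats(a=0, b=0, c=5, d=1045)

# Hypothetical common condition reported fairly consistently by both sources.
common = agreement_stats(a=300, b=20, c=25, d=705)

for name, stats in (("rare", rare), ("common", common)):
    print(name, stats)
```

In the rare-condition example the two sources agree on more than 99% of births, yet kappa is zero because that level of agreement is exactly what chance alone would produce when the certificate never records the condition.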
REFERENCES

1. Taffel SM, Ventura SJ, Gay GA. Revised US certificate of birth – new opportunities for research on birth outcome. Birth. 1989;16:188–193.
2. Buescher PA. Method of linking Medicaid records to birth certificates may affect infant outcome statistics. Am J Public Health. 1999;89:564–566.
3. Costakos DT, Love LA, Kirby RS. The computerized perinatal database: are the data reliable? Am J Perinatology. 1998;15:453–459.
4. Reichman NE, Hade EM. Validation of birth certificate data: A study of women in New Jersey’s HealthStart program. Ann Epidemiol. 2001;11:186–193.
5. DiGiuseppe DL, Aron DC, Ranbom L, Harper DL, Rosenthal GE. Reliability of birth certificate data: A multi-hospital comparison to medical records information. Matern Child Health J. 2002;6:169–179.
6. Buescher PA, Taylor KP, Davis MH, Bowling M. The quality of the new birth certificate data: A validation study in North Carolina. Am J Public Health. 1993;83:1163–1165.
7. Parrish KM, Holt VL, Connell FA, Williams B, LoGerfo JP. Variations in the accuracy of obstetric procedures and diagnoses on birth records in Washington state, 1989. Am J Epidemiol. 1993;138:119–127.
8. Piper JM, Mitchel EF, Snowden M, Hall C, Adams M, Taylor P. Validation of 1989 Tennessee birth certificates using maternal and newborn hospital records. Am J Epidemiol. 1993;137:758–768.
9. Clark K, Fu CM, Burnett C. Accuracy of birth certificate data regarding the amount, timing, and adequacy of prenatal care using prenatal clinic medical records as referents. Am J Epidemiol. 1997;145:68–71.
10. Green DC, Moore JM, Adams MM, Berg CJ, Wilcox LS, McCarthy BJ. Are we underestimating rates of vaginal birth after previous cesarean birth? The validity of delivery methods from birth certificates. Am J Epidemiol. 1998;147:581–586.
11. Dobie SA, Baldwin LM, Rosenblatt RA, Fordyce MA, Andrilla CH, Hart LG. How well do birth certificates describe the pregnancies they report? The Washington State experience with low-risk pregnancies. Matern Child Health J. 1998;2:145–154.
12. Braveman P, Pearl M, Egerter S, Marchi K, Williams R. Validity of insurance information on California birth certificates. Am J Public Health. 1998;88:813–816.
13. Smulian JC, Ananth CV, Hanley ML, Knuppel RA, Donlen J, Kruse L. New Jersey’s electronic birth certificate program: Variations in data sources. Am J Public Health. 2001;91:814–816.
14. Adams M. Validity of birth certificate data for the outcome of the previous pregnancy, Georgia, 1980–1995. Am J Epidemiol. 2001;154:883–888.
15. Roohan PJ, Josberger RE, Acar J, Dabir P, Feder HM, Gagliano PJ. Validation of birth certificate data in New York State. J Comm Health. 2003;28:335–346.
16. Ventura SJ, Hamilton BE, Mathews TJ, Chandra A. Trends and variations in smoking during pregnancy and low birth weight: Evidence from the birth certificate, 1990–2000. Pediatrics. 2003;111:1176–1180.
17. Honein MA, Paulozzi LJ, Watkins ML. Maternal smoking and birth defects: Validity of birth certificate data for effect estimation. Public Health Rep. 2001;116:327–335.
18. Centers for Disease Control and Prevention, Health Resources and Services Administration. Healthy People 2010: Objectives for Improving Health. http://www.healthypeople.gov/document/HTML/Volume2/16MICH.html. Accessed February 19, 2005.
19. Watkins ML, Edmonds L, McClearn A, Mullins L, Mulinare J, Khoury M. The surveillance of birth defects: the usefulness of the revised US standard birth certificate. Am J Public Health. 1996;86:731–734.
20. Forrest JD, Singh S. Timing of prenatal care in the United States: How accurate are our measurements? Health Serv Res. 1987;22:235–253.
21. Gould JB, Chavez G, Marks AR, Liu H. Incomplete birth certificates: A risk marker for infant mortality. Am J Public Health. 2002;92:79–81.