Accident Analysis and Prevention 53 (2013) 46–54
Accident Analysis and Prevention journal homepage: www.elsevier.com/locate/aap
Road traffic fatalities in Arkhangelsk, Russia in 2005–2010: Reliability of police and healthcare data
Alexander V. Kudryavtsev a,b,∗, Nikolai Kleshchinov c, Marina Ermolina d, Johan Lund e, Andrej M. Grjibovski a,b,f, Odd Nilssen a, Børge Ytterstad a
a Department of Community Medicine, University of Tromsø, N-9037 Tromsø, Norway
b International School of Public Health, Northern State Medical University, Troitsky Av. 51, Arkhangelsk 163000, Russia
c Medical Informational Analytic Centre, Ministry of Health and Social Development of the Arkhangelsk Region, Lomonosov Av. 311, Arkhangelsk 163045, Russia
d State Road Safety Inspectorate for Arkhangelsk, Smolny Buyan St. 20, Arkhangelsk 163002, Russia
e Institute of Health and Society, University of Oslo, PO Box 1130, Blindern N-0318, Norway
f Norwegian Institute of Public Health, Postbox 4404, Nydalen, 0403 Oslo, Norway
Article info
Article history: Received 15 February 2012; received in revised form 26 November 2012; accepted 19 December 2012
Keywords: Traffic accidents; Mortality; Reliability; Police data; Healthcare data; Russia
Abstract
Purpose: To estimate and compare the reliability of traffic mortality data of the police and the healthcare sector in Arkhangelsk, Russia.
Methods: The study matched traffic mortality data of the police and the regional healthcare statistics centre for the period from 2005 to 2010. Unmatched cases were investigated individually, and the underlying causes of the non-matches were established. The resulting distribution of non-matches by cause served as the basis for estimating the true numbers of traffic fatalities in the two sources, in accordance with the corresponding fatality definitions and registration rules. A data accuracy index (DAI) was calculated for each source using an adapted version of the formula for the accuracy of a diagnostic test, and was used as a measure of data reliability. Time trends in annual DAIs were estimated for each source by the χ² test for linear trend.
Results: During the 6-year period, the police and the healthcare statistics centre registered 217 and 237 traffic fatalities in Arkhangelsk, respectively. Matching of data from the two sources resulted in 162 matched cases, 55 unmatched cases in the police data, and 75 unmatched cases in the healthcare data. More than half (56%) of the non-matches were attributed to incompatibility of the definitions in the two data registration systems; 39% were attributed to failures in the healthcare data. Other non-matches were due to scarce identifying information (2%) or were not classifiable (2%). None of the non-matches were clearly attributable to failures in the police data. The 6-year DAI was 98% for the police data and 80% for the healthcare data. The DAI for the police data was stable over 2005–2010 (ranging from 96% to 100%). The DAI for the healthcare data increased from 66% in 2005 to 98% in 2010 (P trend < 0.001).
Conclusion: The findings suggest that the traffic mortality data of the police were more reliable than the healthcare data. However, the reliability of the healthcare data improved during the study period.
© 2012 Elsevier Ltd. All rights reserved.
∗ Corresponding author at: International School of Public Health, Northern State Medical University, Troitsky Av. 51, office 1501, Arkhangelsk 163001, Russia. Tel.: +7 8182 287936; fax: +7 8182 263226; mobile: +7 921 7212125. E-mail address: [email protected] (A.V. Kudryavtsev).
0001-4575/$ – see front matter © 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.aap.2012.12.022

1. Introduction
Russia has the second highest road traffic mortality (25.2 per 100,000 population) in the WHO European region (World Health Organization, 2009a). However, there is an acknowledged concern that reliability deficits in national data on traffic fatalities may lead to biased mortality estimates and threaten the validity of local road safety assessments and international comparisons (Elvik and Mysen, 1999; World Health Organization, 2004, 2009a,b; Derriks and Mak, 2007; Bhalla et al., 2009). Traffic mortality rates for Russia are based on the data collected by the State Traffic Safety Inspectorate of the Ministry of Internal Affairs (Government of the Russian Federation, 2009; World Health Organization, 2009a,b), hereafter referred to as the police. It is therefore essential to determine whether the traffic mortality data of the police are reliable.
Matching the police data on traffic fatalities with those collected by the health sector is a common way to assess the completeness and reliability of police reports (Razzak and Luby, 1998; Elvik and Mysen, 1999; Morrison and Stone, 2000; Rosman, 2001; Meuleners et al., 2006; Derriks and Mak, 2007; Petridou et al., 2009; Bhalla et al., 2010; Lateef, 2010; Hu et al., 2011). However, linkage and comparison of data from these two sources can be complicated by differences in systems of case accounting, non-corresponding definitions, and registration errors (Aptel et al., 1999; Morrison and Stone, 2000; Clark, 2004; Derriks and Mak, 2007; Lujic et al., 2008; Lyons et al., 2008; Petridou et al., 2009; Hu et al., 2011).
The purpose of this paper is to investigate and compare the reliability of police and healthcare traffic mortality data in a Russian urban area by matching fatality cases from the two sources and analysing the causes of inconsistencies.

[Figure 1: flow chart. Step 1: police data (N=217) and healthcare data (N=237). Step 2: combined dataset of police and healthcare data after matching (N=292), comprising matched fatality cases (N=162), fatality cases only in police data (N=55), and fatality cases only in healthcare data (N=75). Step 3: analysis of unmatched cases. Step 4: estimation of true numbers of cases for police data and healthcare data. Step 5: estimation of reliability of police data and healthcare data.]
Fig. 1. Steps of management and analysis of Arkhangelsk traffic mortality data for 2005–2010.

2. Materials and methods
2.1. Study design and site
This is a reliability study of traffic mortality data performed in Arkhangelsk, a city in Northwestern Russia with a population of 355,556 on 1 January 2011.
2.2. Data sources and description
There are two key sources of traffic mortality data for the study site: the police and the Regional Medical Informational Analytic Centre, hereafter referred to as the healthcare statistics centre.
The Arkhangelsk police have a computerized database of traffic accidents (crashes) with fatal and non-fatal injuries that occur in the Arkhangelsk region. The data are fed into the database from standardized police accident report forms that contain information about accident time, site, circumstances, involved vehicles,
personal and demographic data for the individuals involved and their health outcomes (Kudryavtsev et al., 2012a). According to the definition adopted by the police in 2009, a traffic fatality is a fatal injury resulting from a traffic accident and causing death within 30 days (Government of the Russian Federation, 2009). Previously, the police used a seven-day fatality definition. The police registration of traffic fatalities is linked to places and dates of accidents. The police database was the source of data on all traffic fatalities that had occurred in Arkhangelsk and were registered by the police over 2005–2010. For each fatality, the data variables include date of birth, gender, place of residence, date of the accident (injury), type of accident, type of motor vehicle used, road user type and related traffic violations.
The Arkhangelsk regional healthcare statistics centre follows the national regulations concerning medical data collection. Analogous centres exist in all the other regions of Russia. The centre routinely receives and records standardized medical reports from hospitals and other healthcare institutions (general practitioners, primary health care units, out-patient clinics, emergency ambulance services, morgues) on all cases of fatalities, diseases, and injuries in the resident population of the region and among temporary visitors. Before 2009, the data on fatalities originated only from death certificates; since then, they have also been supplemented by reports from hospitals (Ministry of Health and Social Development of the Russian Federation, 2009). Causes of death in both of these original sources are coded using the International Classification of Diseases, 10th revision (ICD-10). The data are fed into the regional mortality register and used for mortality reports. Registration of traffic fatalities in
the mortality register is linked to places and dates of deaths, is based on pathologists’ diagnosis of the major underlying cause of death, and is not tied to number of days between accidents and deaths. For this study, the regional mortality register was the source of the healthcare data on all traffic fatalities in the city over 2005–2010 (ICD-10 codes V02–04, V09, V12–14, V19, V20–79, and V86–89). For each case, the data variables include date of birth, date of injury, date of death, hospital of death (where applicable), ICD-10 code (containing information on road user type and type of vehicle), gender, and place of residence. Notably, the police and the healthcare sector in Russia are linked by the legislatively mandated system of data collection on road traffic casualties (Government of the Russian Federation, 2009). The system obliges hospitals to report all fatal and non-fatal traffic injuries to the police, and authorizes the police to verify their data on registered cases against the data of hospitals and other healthcare institutions.
2.3. Data management and analysis
Step 1. The police and the healthcare data on all registered traffic fatalities in Arkhangelsk in 2005–2010 were obtained (Fig. 1). Proportions of missing values were described for each of the studied variables.
Step 2. The police and the healthcare data were initially matched on four variables: date of accident (injury), date of birth, gender, and road user type. Place of accident could not be used as a matching variable as it is not recorded in the healthcare data. Similarly, place of death could not be a matching variable as it is not a part of the police data. Cases were considered matched if the date of injury was the same in the two sources, or differed by at most one day, and the other variables were identical. Further matching (for 52% of the total cases in the study) was facilitated by the mandated procedure of verifying police data against healthcare data (Government of the Russian Federation, 2009). This verification was performed by authorized employees of the police and the healthcare statistics centre. It involved matching of cases by name (first, middle, and last), gender, date of birth, date of injury and road user group (pedestrian or bicyclist; driver or passenger of a motorized vehicle with four or more wheels; motorcycle rider or passenger). Place of residence was used as a supplementary matching variable where possible. Matching was considered achieved if at least the first and last name, gender, road user group, and year of injury were the same in both datasets. Casualties that were 'unidentified' (name not established and year of birth only approximately estimated) in one or both datasets were considered matched if the date of accident, gender, and road user group were the same, while the estimated year of birth was similar (±10 years). All matching was performed manually.
Step 3. Cases in the police data without matches in the healthcare data were searched for in the regional mortality register of the healthcare statistics centre among all causes of death. Cases in the healthcare data without matches in the police data for the city were searched for in the regional police database of traffic accidents among all registered traffic fatalities and injuries in the Arkhangelsk region (that is, an area broader than the city of Arkhangelsk). If an unmatched case was identified in either of the sources, the cause of the failed matching was investigated. Based on this, the non-matches were categorized into five classes: (a) non-matches due to incompatibility of definitions in the two data registration systems; (b) non-matches due to police data failures (confirmed false positives and false negatives); (c) non-matches due to healthcare data failures (confirmed false positives and false negatives); (d) non-matches due to scarce identifying information (corresponding cases were regarded as potential false positives in the source where they were present with scarce information and as potential false negatives in the source where they were absent); and (e) non-matches due to non-established causes (with the same ambiguity regarding corresponding cases). Categorization of each case was elaborated and agreed upon by three persons: the police employee, the employee of the healthcare statistics centre, and the first author.
Step 4. The distribution of non-matches by causes was used to calculate estimates of true numbers (ETN) of traffic fatalities in Arkhangelsk in 2005–2010 for each source, in accordance with the corresponding registration systems and definitions:

$$
\begin{aligned}
\mathrm{ETN}_{\text{police data}} ={}& \text{cases in the original police data} \\
&- \text{confirmed false positives in the police data} \\
&+ \text{confirmed false negatives in the police data} \\
&- \text{potential false positives in the police data} \\
&+ \text{potential false negatives in the police data}
\end{aligned}
$$

$$
\begin{aligned}
\mathrm{ETN}_{\text{healthcare data}} ={}& \text{cases in the original healthcare data} \\
&- \text{confirmed false positives in the healthcare data} \\
&+ \text{confirmed false negatives in the healthcare data} \\
&- \text{potential false positives in the healthcare data} \\
&+ \text{potential false negatives in the healthcare data}
\end{aligned}
$$

Step 5. The reliability of the two sources was assessed by adapting the formula for diagnostic accuracy (Glasser, 2008; Cleophas et al., 2009; Parfrey and Barrett, 2009):

$$
\text{Accuracy} = \frac{\text{True positives} + \text{True negatives}}{\text{True positives} + \text{True negatives} + \text{False positives} + \text{False negatives}}
$$

Our calculated ETNs are, in effect, estimates of the total number of cases that should be present in a data source according to what is closest to the truth, and can be accepted as a sort of 'gold standard'. According to this estimated 'gold standard' for each data source, every case in the combined set of the police and the healthcare data was categorized as positive, negative, or 'unclear' (either positive in the source where present or negative in the source where absent). To calculate accuracy for each source, the obtained distribution of cases according to the estimated 'gold standard' was cross-tabulated with the presence or absence of cases in the original data. To solve the problem of the 'unclear' cases, we adapted the accuracy formula by including the corresponding potential false positives and potential false negatives in its denominator in addition to confirmed false positives and confirmed false negatives. This gave the formula of what we called a data accuracy index (DAI):

$$
\mathrm{DAI}_{\text{source}} = \frac{\text{True positives} + \text{True negatives}}{\begin{aligned}&\text{True positives} + \text{True negatives} + \text{Confirmed false positives} + \text{Confirmed false negatives} \\ &+ \text{Potential false positives} + \text{Potential false negatives}\end{aligned}} \times 100
$$
With the denominator loaded up by potential false positives and potential false negatives, a DAI gives a conservative estimate
of the data accuracy in a source. Otherwise, interpretation of a DAI value is identical to interpretation of a customary accuracy value: a DAI value of 100% reflects absolute accuracy of the data and implies high reliability, while a DAI value approaching zero means poor data accuracy and reliability. Therefore, DAIs were used to judge and compare the reliability of the two data sources over the study period. Finally, to estimate changes in the data reliability of our sources, 6-year time trends in annual DAIs for both sources were analysed using Cochran–Armitage χ² tests for linear trend, which were performed using the WinPepi program (Abramson, 2004).
2.4. Ethical considerations
Ethical approval for the study was obtained from the Ethical Committee of the Northern State Medical University in Arkhangelsk, Russia. To preserve confidentiality, the names were removed from all case records after the matching procedure and the search for non-matching cases in the police and healthcare databases were completed. Therefore, casualties could not be identified on a personal level by non-employees of the police and the healthcare statistics centre.
3. Results
The police registered 217 traffic fatalities in Arkhangelsk in 2005–2010, while the healthcare statistics centre registered 237 traffic fatalities during the same period. The date of injury was recorded for all cases in the police data, but was missing for 2% of cases in the healthcare data. Dates of birth were stated for all except 2% of the cases in the police data. Both police and healthcare data contained cases with approximately estimated years of birth (11% and 1%, respectively). Gender was registered for all except one case in the police data. Road user type was recorded for all cases in the police data, and was ascertained from ICD-10 codes for 87% of cases in the healthcare data. The ICD-10 codes of the remaining cases (V29.9, V43.9, V44.9, V45.9, V48.9, V49.8, V49.9, V58.9, and V89.2) did not allow drivers to be distinguished from passengers.
Names and places of residence were unknown for 8% of cases in the police data and 2% of cases in the healthcare data. The matching procedure resulted in a database of 292 cases, including 162 matched cases and 130 non-matched cases, 55 being only in the police data and 75 only in the healthcare data (Table 1).
3.1. Analysis of cases in the police data with no matches in the healthcare data
Most (89%) of the 55 cases in the police data without matches in the healthcare data were identified in the mortality register (that is, extended to all causes of death) of the healthcare statistics centre (Table 2A). Thirty-seven were wrongly categorized in the mortality register as fatalities in nontraffic accidents (ICD-10 codes V02.0, V03.0, V04.0, V09.0, V13.0, V43.0, V44.0, V47.0, V47.1, V48.0, V49.0, V49.3) although they met the police criteria for traffic fatalities. All could be classified by the type of traffic accident and were associated with concrete traffic violations (for example, 'pedestrian roadway crossing at improper place', 'ignoring traffic lights'). These cases were classified as false negatives in the healthcare data. Six cases were found as fatalities in traffic accidents (ICD-10 codes V03.1, V44.5), but were absent in the original healthcare data for unclear reasons. These were regarded as false negatives in the healthcare data. Four traffic fatalities were identified as fatalities due to other causes. Two of them were classified in the mortality register as
deaths due to ‘contact with blunt object, undetermined intent’ (Y29.0). According to the police, both cases were pedestrian casualties in traffic accidents on public roads that involved specified traffic violations (‘pedestrian roadway crossing at improper place’ and ‘ignoring pedestrian crossing by a driver’). Both were classified as false negatives in the healthcare data. The third fatality was registered as drowning (Y21.0), and was a passenger of a car that fell from a bridge, according to the police. The fourth case was identified as a fatality due to misadventure during medical care (Y65.4) which, as concluded from the police data, occurred to a pedestrian casualty. The latter two cases primarily resulted from traffic accidents, but died from other causes. Therefore, the two cases were treated as non-matches due to incompatibility of the definitions used by the two registration systems. One case was found as a fatality in an ‘unspecified transport accident’ (V99.0), although the case was clearly a traffic fatality according to the police data (a driver killed in collision due to ‘driving into opposite lane’), and thus was a false negative in the healthcare data. One more case was identified in the mortality register as a fatality in January 2011 that resulted from a traffic accident in December 2010. The classification was correct in both data sources: the police registered it in 2010 by the date of causal accident, while in the healthcare system it was registered in 2011 by the date of death. The case was classified as a non-match due to incompatibility of definitions in the two systems. Six (11%) fatalities in the police data without matches in the healthcare data were not identified in the mortality register. 
Two of them could not be found because they had no name and date of birth records in the police data (were recorded as ‘unidentified’), while the available police information (date of injury, gender, and road user type) did not obtain matches in the healthcare data. These non-matches were attributed to scarce identifying information. They could be regarded either as potential false negative cases in the healthcare data or false positives in the police data. The four remaining cases were casualties in traffic accidents according to the police and the purposely checked court documents, but were not in the mortality register. These were classified as false negatives in the healthcare data.
3.2. Analysis of cases in the healthcare data with no matches in the police data
Most (93%) of the 75 cases in the healthcare data without matches in the police data for the city were found in the regional police database of traffic accidents (Table 2, Part B). Forty were identified in the regional police database as fatalities resulting from accidents outside the city, although all died at the city hospitals according to the healthcare records. These cases were correctly classified by both data sources according to their registration rules (the police counted them as fatalities from accidents outside the city, and the healthcare statistics centre counted them as traffic deaths in the city). These cases were clearly non-matches due to the incompatibility of definitions in the two systems.
Fourteen fatalities were found registered in the regional police database as non-fatal injuries resulting from accidents in the city. Eleven of them died over 2005–2008 within 8–32 days after the accidents and, therefore, did not meet the 7-day fatality definition used by the police during this period. The three remaining cases died over 2009–2010 within 31–104 days after the accidents, and therefore did not meet the 30-day fatality definition adopted by the police from 2009. All 14 cases were correctly classified in both sources according to the definitions used, and all were classified as non-matches due to the discrepancy of definitions in the two systems.
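The effect of the time-limited police definition on the fourteen cases above can be illustrated with a minimal sketch. It assumes, as a simplification not stated in the text, that the change from the 7-day to the 30-day definition is keyed to the accident date and takes effect on 1 January 2009. The function names are hypothetical and do not come from the police registration system.

```python
from datetime import date

# Assumption for illustration: the 30-day definition applies to accidents
# from 1 January 2009 onwards, and the 7-day definition to earlier accidents.
DEFINITION_CHANGE = date(2009, 1, 1)


def police_fatality_window_days(accident_date: date) -> int:
    """Return the number of days within which a death counts as a traffic fatality."""
    return 30 if accident_date >= DEFINITION_CHANGE else 7


def counts_as_police_fatality(accident_date: date, death_date: date) -> bool:
    """Check whether a death falls inside the police fatality window."""
    days_to_death = (death_date - accident_date).days
    return 0 <= days_to_death <= police_fatality_window_days(accident_date)


# A death 8 days after a 2007 accident falls outside the 7-day window.
print(counts_as_police_fatality(date(2007, 3, 1), date(2007, 3, 9)))   # False
# A death 31 days after a 2009 accident falls outside the 30-day window.
print(counts_as_police_fatality(date(2009, 6, 1), date(2009, 7, 2)))   # False
```

Under this rule such deaths remain non-fatal injuries in the police data, while the healthcare register, which has no time limit, counts them as traffic fatalities.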
Table 1
Analysis of unmatched traffic fatality cases in police and healthcare data (N = 130).

| Types of cases | Number (%) | Non-matches due to incompatibility of registration systems | Police data failures: false negatives | Police data failures: false positives | Healthcare data failures: false negatives | Healthcare data failures: false positives | Non-matches due to scarce identifying information | Not classifiable non-matches |
| (A) Present only in police data, total | 55 (100.0) | | | | | | | |
| Identified in regional mortality register: | | | | | | | | |
| Fatalities wrongly coded as nontraffic accidents | 37 (67.3) | – | – | – | 37 | – | – | – |
| Omitted traffic fatalities in the city | 6 (10.9) | – | – | – | 6 | – | – | – |
| Fatalities due to other causes | 4 (7.3) | 2 | – | – | 2 | – | – | – |
| A fatality due to unspecified transport accident | 1 (1.8) | – | – | – | 1 | – | – | – |
| A fatality in 2011 | 1 (1.8) | 1 | – | – | – | – | – | – |
| Not identified in regional mortality register: | | | | | | | | |
| Lack of personal information in police data | 2 (3.6) | – | – | – | – | – | 2 | – |
| Not registered fatality cases | 4 (7.3) | – | – | – | 4 | – | – | – |
| (B) Present only in healthcare data, total | 75 (100.0) | | | | | | | |
| Identified in regional registry of traffic accidents: | | | | | | | | |
| Fatalities resulting from accidents outside the city | 40 (53.3) | 40 | – | – | – | – | – | – |
| Non-fatal injuries from accidents in the city | 14 (18.7) | 14 | – | – | – | – | – | – |
| Non-fatal injuries from accidents outside the city | 15 (20.0) | 15 | – | – | – | – | – | – |
| A fatality from accident in 2004 | 1 (1.3) | 1 | – | – | – | – | – | – |
| Not identified in regional registry of traffic accidents: | | | | | | | | |
| A fatality from accident with industrial vehicle | 1 (1.3) | – | – | – | – | 1 | – | – |
| Lack of personal information in healthcare data | 1 (1.3) | – | – | – | – | – | 1 | – |
| Unclear reason | 3 (4.0) | – | – | – | – | – | – | 3 |
| Grand total | 130 (100.0) | 73 (56.2) | 0 (0.0) | 0 (0.0) | 50 (38.5) | 1 (0.8) | 3 (2.3) | 3 (2.3) |
Table 2
Estimation of the 6-year true number of fatalities for the police data with the use of the police's traffic fatality definition (a), registration rules (b), and the obtained distribution of non-matches between the police and the healthcare data by causes.

| Formula components | Values |
| Number of cases in the original police data | 217 |
| Number of confirmed false positives in the police data | 0 |
| Number of confirmed false negatives in the police data | 0 |
| Number of potential false positives in the police data (unmatched cases in the police data due to scarce identifying information in the police data) | 2 (subtracted) |
| Number of potential false negatives in the police data, including 1 unmatched case in the healthcare data due to scarce identifying information in the healthcare data and 3 unmatched cases in the healthcare data due to not established causes (not classifiable non-matches) | 4 (added) |
| Estimated true number of cases in the police data | 219 |

(a) 7-day fatality definition in 2005–2008 and 30-day fatality definition in 2009–2010.
(b) Case registration is linked to places and dates of traffic accidents.
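The arithmetic behind the estimated true number can be checked with a short sketch of the ETN formula from Step 4. The function name and argument names are hypothetical; the police values are those listed in the table above, and the healthcare values are the ones reported for the healthcare data in the corresponding table of this paper.

```python
def estimated_true_number(
    original_cases: int,
    confirmed_false_positives: int,
    confirmed_false_negatives: int,
    potential_false_positives: int,
    potential_false_negatives: int,
) -> int:
    """Apply the ETN formula: false positives are subtracted, false negatives added."""
    return (
        original_cases
        - confirmed_false_positives
        + confirmed_false_negatives
        - potential_false_positives
        + potential_false_negatives
    )


# Police data: 217 - 0 + 0 - 2 + 4
etn_police = estimated_true_number(217, 0, 0, 2, 4)
# Healthcare data: 237 - 1 + 50 - 4 + 2
etn_healthcare = estimated_true_number(237, 1, 50, 4, 2)

print(etn_police)      # 219
print(etn_healthcare)  # 284
```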
Fifteen fatalities were identified in the regional police database as non-fatal injuries in accidents outside the city. According to the healthcare data, all these casualties died in Arkhangelsk hospitals. Having been injured in accidents outside the city, these cases could not be present in the police data among the casualties in Arkhangelsk. These non-matched cases were attributed to the incompatibility of definitions in the two systems.
One case of traffic death in January 2005 was identified in the regional police database as a fatality resulting from an accident in December 2004. According to the police registration rules, it was correctly excluded from the police mortality data for 2005–2010, and thus was regarded as a non-match due to the incompatibility of definitions in the two systems.
Five (7%) fatalities in the healthcare data without matches in the police data were not found in the regional police database of accidents with fatal and non-fatal injuries. One of them was coded in the healthcare data as a driver of a special industrial vehicle who died in a traffic accident (V83.0). The accident occurred on industrial territory (not on a public road), and thus the traffic fatality was a false positive in the healthcare data. One case could not be found in the regional police database because the name and date of birth were absent from the healthcare data. This non-match was attributed to scarce identifying information. The three remaining cases were not found in the regional police database for unclear reasons. They could be classified as (i) false negatives in the police data due to not being reported to the police by the hospitals, or as (ii) false positives in the healthcare data due to incorrect ICD-10 coding. Both causes were plausible and neither could be proved. Therefore, these non-matches were defined as not classifiable.
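The counts from the analysis of unmatched cases can be combined into the data accuracy index defined in Section 2.3. The sketch below is one possible reading of that calculation, not the authors' own procedure: it assumes that true positives are the cases present in a source minus its confirmed and potential false positives, and that true negatives are the cases present only in the other source minus the confirmed and potential false negatives. The function and variable names are hypothetical.

```python
def data_accuracy_index(present, absent, cfp, cfn, pfp, pfn):
    """Compute the DAI (%) for one source from the combined set of cases.

    present: cases recorded in this source (matched + only in this source)
    absent:  cases recorded only in the other source
    cfp/cfn: confirmed false positives / negatives for this source
    pfp/pfn: potential false positives / negatives for this source
    """
    true_positives = present - cfp - pfp   # assumed derivation
    true_negatives = absent - cfn - pfn    # assumed derivation
    denominator = true_positives + true_negatives + cfp + cfn + pfp + pfn
    return 100 * (true_positives + true_negatives) / denominator


matched, police_only, healthcare_only = 162, 55, 75

dai_police = data_accuracy_index(
    present=matched + police_only, absent=healthcare_only,
    cfp=0, cfn=0, pfp=2, pfn=4,
)
dai_healthcare = data_accuracy_index(
    present=matched + healthcare_only, absent=police_only,
    cfp=1, cfn=50, pfp=4, pfn=2,
)

print(f"{dai_police:.1f}")      # 97.9
print(f"{dai_healthcare:.1f}")  # 80.5
```

Under these assumptions the denominator equals the 292 cases in the combined dataset for both sources, and the two values round to the 6-year DAIs of 98% and 80% reported in this paper.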
3.3. Summarized results of the analysis of unmatched cases
In total, 73 (56%) of the 130 unmatched cases were due to the incompatibility of definitions in the two registration systems, 51 (39%) were due to failures in the healthcare data, and none could be distinctly attributed to failures in the police data. Three non-matches (2%) were due to scarce identifying information in either of the sources, and three (2%) could not be clearly classified.

3.4. Data reliability estimates
The 6-year ETN of traffic fatalities for the police data was 219 (Table 3), and the ETN for the healthcare data was 284 (Table 4). The difference between the two ETNs is a result of the difference between the two sources in traffic fatality definitions and registration rules.
The 6-year DAI for traffic mortality data of the police was 98%, while the DAI for the healthcare data was 80%. The DAI for the police data was stable in 2005–2010, and ranged from 96% to 100% (χ² for linear trend = 0.31, DF = 1, P = 0.580) (Fig. 2). The DAI for the healthcare data decreased from 66% in 2005 to 57% in 2007, and thereafter increased to 91%, 93% and 98% in 2008, 2009 and 2010, respectively (χ² for linear trend = 23.90, DF = 1, P < 0.001).

Table 3
Estimation of the 6-year true number of fatalities for the healthcare data with the use of the healthcare's traffic fatality definition (a), registration rules (b), and the obtained distribution of non-matches between the healthcare and the police data by causes.

| Formula components | Values |
| Number of cases in the original healthcare data | 237 |
| Number of confirmed false positives in the healthcare data | 1 (subtracted) |
| Number of confirmed false negatives in the healthcare data | 50 (added) |
| Number of potential false positives in the healthcare data, including 1 unmatched case in the healthcare data due to scarce identifying information in the healthcare data and 3 unmatched cases in the healthcare data due to not established causes (not classifiable non-matches) | 4 (subtracted) |
| Number of potential false negatives in the healthcare data (unmatched cases in the police data due to scarce identifying information in the police data) | 2 (added) |
| Estimated true number of cases in the healthcare data | 284 |

(a) The definition is based on the pathologist's diagnosis of the major underlying cause of death and is not related to the time between accident and death.
(b) Case registration is linked to places and dates of deaths.

[Figure 2: line chart of the data accuracy index (%, axis from 0 to 100) by year (2005–2010) for the police data (P trend = 0.580) and the healthcare data (P trend < 0.001).]
Fig. 2. Reliability of police and healthcare data on traffic fatalities in the city of Arkhangelsk in 2005–2010.

4. Discussion
To our knowledge, this is the first Russian study addressing the reliability of traffic mortality data of the police and the healthcare sector. Our findings suggest that the reliability of the police data on traffic fatalities was higher and more stable in 2005–2010 than the reliability of the healthcare data. However, the reliability of the healthcare traffic mortality data showed a substantial improvement over the study period.
4.1. Possible explanations of the findings
The higher reliability of the police data may be explained by a well-established statutory system used for the registration of traffic casualties (Government of the Russian Federation, 2009). The police receive information from hospitals, and verify their own data against the data of healthcare institutions. The opposite flow of information is not regulated; therefore, the hospitals get less case information than the police. On this basis, one can expect better reliability of the healthcare data if the healthcare sector is better supplied with police data on related traffic accidents. Similar conclusions were drawn in a study in New York, USA (Thihalolipavan et al., 2011). We hope that our study will facilitate mutual information exchange between the two sources.
Another plausible explanation of the lower reliability of the healthcare data is a higher proportion of errors, with incorrect ICD-10 coding being the most common problem. For
For an unknown reason, this problem occurred most often in 2007, and this explains the drop in the DAI in that year. The increase in reliability of the healthcare data in recent years can be explained primarily by the introduction of double reporting of traffic fatalities to the healthcare statistics centre. Thus, from 2009, hospitals reported each traffic fatality to the police by using a new standardized form (Ministry of Health and Social Development of the Russian Federation, 2009), and copies of these forms were received by the healthcare statistics centre in addition to the death certificate data. Other possible explanations are: (i) improved competence of the personnel responsible for data coding and (ii) a general increase in appreciation of the importance of accurate ICD-10 coding for public health.

4.2. Limitations

4.2.1. Restriction to mortality data

Restricting our study to traffic mortality data may be a limitation, because the reliability of police data on traffic fatalities is known to be high across countries (Elvik and Mysen, 1999; Derriks and Mak, 2007; Amoros et al., 2008; Petridou et al., 2009). Therefore, one could argue for a greater relevance of a similar study on non-fatal injuries, which are more frequently underreported (Ytterstad and Wasmuth, 1995; Razzak and Luby, 1998; Aptel et al., 1999; Elvik and Mysen, 1999; Morrison and Stone, 2000; Rosman, 2001; Amoros et al., 2006, 2008; Derriks and Mak, 2007; Jeffrey et al., 2009; Petridou et al., 2009). However, there are also recent studies showing that the reliability of traffic mortality data needs improvement globally. For example, the completeness of police data on traffic fatalities was estimated to be only 37% in China (Hu et al., 2011), 45% in Pakistan (Lateef, 2010), and 60% in Western Australia (Meuleners et al., 2006). Therefore, a reliability study on the Russian traffic mortality data seemed timely.

4.2.2. Restriction to small setting

Using data from only one city with a relatively small number of cases may also be a limitation of our study. The scope of the data was determined by the authors' involvement in safety activities at this particular setting (Kudryavtsev et al., 2012b). On the other hand, an average of 35 traffic fatalities per year at the setting allowed individual analysis of each case and facilitated accurate manual matching. For instance, high-confidence matching on date of accident, gender, and road user type alone was possible even for casualties with unregistered names and roughly estimated ages. The small number of cases also allowed in-depth investigations of all the non-matches. The latter would be problematic in a large study using computerized matching systems. The benefits of manual matching of relatively small numbers of cases have also been described by other authors (Whitfield and Kelly, 2002; Clark, 2004; Petridou et al., 2009). Nevertheless, we would recommend that further similar studies cover larger settings.
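The matching on date of accident, gender, and road user type mentioned above was done manually in the study. Purely as an illustration of the idea, a computerized analogue might look like the sketch below. The record structure, field names, and example records are all invented for this sketch, and the rule of sending ambiguous or missing candidates to manual review is an assumption rather than the authors' documented procedure.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FatalityRecord:
    # Hypothetical fields; real police and healthcare records differ.
    record_id: str
    accident_date: date
    gender: str
    road_user_type: str


def match_key(record: FatalityRecord) -> Tuple[date, str, str]:
    """The three variables used for matching when names are missing."""
    return (record.accident_date, record.gender, record.road_user_type)


def match_records(
    police: List[FatalityRecord], healthcare: List[FatalityRecord]
) -> Tuple[List[Tuple[FatalityRecord, FatalityRecord]],
           List[FatalityRecord], List[FatalityRecord]]:
    """Return (matched pairs, unmatched police, unmatched healthcare).

    A police record is matched only if exactly one unused healthcare
    record shares its key; anything else is left for manual review.
    """
    by_key: Dict[Tuple[date, str, str], List[FatalityRecord]] = {}
    for h in healthcare:
        by_key.setdefault(match_key(h), []).append(h)

    matched, unmatched_police, used_ids = [], [], set()
    for p in police:
        candidates = [
            h for h in by_key.get(match_key(p), [])
            if h.record_id not in used_ids
        ]
        if len(candidates) == 1:
            matched.append((p, candidates[0]))
            used_ids.add(candidates[0].record_id)
        else:
            unmatched_police.append(p)  # no candidate or ambiguous

    unmatched_healthcare = [
        h for h in healthcare if h.record_id not in used_ids
    ]
    return matched, unmatched_police, unmatched_healthcare


# Invented example records, for illustration only.
police_example = [
    FatalityRecord("P1", date(2007, 3, 1), "M", "pedestrian"),
    FatalityRecord("P2", date(2007, 3, 5), "F", "driver"),
]
healthcare_example = [
    FatalityRecord("H1", date(2007, 3, 1), "M", "pedestrian"),
    FatalityRecord("H2", date(2007, 4, 20), "F", "driver"),
]
pairs, only_police, only_healthcare = match_records(
    police_example, healthcare_example
)
```

In this invented example, P1 and H1 are paired, while P2 and H2 remain unmatched because their dates differ. Such a discrepancy is the kind of non-match that the study resolved by individual investigation rather than by an automatic rule.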
Although our study is rather local, we believe it has touched upon issues that should be considered in future traffic injury and mortality research on larger territories. The major issue is the inappropriateness of comparing police and healthcare data unless the differences between the registration systems are taken into account. Another is the problem of imprecise ICD-10 coding in the healthcare data (Brenner, 1994; Morrison and Stone, 2000; Derriks and Mak, 2007; Lyons et al., 2008; Thihalolipavan et al., 2011).

4.2.3. Unidentified false negative and false positive cases within the two sources

A potential problem of cross-validating two dependent data sources against each other is the probability of casualties being unregistered in both of them (Hook and Regal, 1995; Elvik and Mysen, 1999). For that reason, there can be unidentified false negative cases in both sources. This problem may have caused underestimation of the ETNs for both sources and overestimation of their reliability (International Working Group for Disease Monitoring and Forecasting, 1995). However, we believe the problem is not large enough to bias our conclusions, as a traffic death is too serious an event to be often missed by both the police and the healthcare sector in a Russian city. Notably, we considered using insurance companies as a third traffic mortality data source. This was not done for two reasons: (i) there is a variety of insurance companies in the city, and collecting data from all of them is problematic; (ii) the insurance data depend on the police and healthcare data (insurance is not paid in Russia until death is certified by healthcare, and cause of death is documented by the police). Another problem could have arisen from false positive cases in both sources. For instance, it has been shown that some of the registered traffic fatalities may in fact be natural deaths or suicides (Routley et al., 2003).
In the 6-year period, the police identified only four natural deaths during driving and not a single traffic suicide. That might reflect a lack of autopsies and insufficiently in-depth investigations of event histories. Therefore, some natural deaths or suicides may be false positive cases in our study. Similarly to the problem of unidentified false negative cases, this could have led to an overestimation of data reliability.

4.2.4. Likelihood of false negative and false positive matches between the two sources

Another potential problem might have originated from the imperfect reliability of the matching procedure itself. Thus, aside from the unidentified false positive and false negative cases within the two sources, we might have some false positive and false negative matches between the two sources. False negative matches might have resulted from data inaccuracies and a lack of compatibility between the compared sources, and might have led to an underestimation of the reliability of both sources (Brenner, 1994; Hook and Regal, 1995). We addressed that validity threat by a thorough individual investigation of the causes of all the non-matches, so the number of potential false negative matches should not be large enough to bias our conclusions. On the other hand, our efforts to minimize the number of possible false negative matches might have resulted in some false positive matches, and these might have led to overestimated reliability of both sources. In response, we note that the matching was based on several variables, including names, so the number of false positive matches should also be too small to substantially affect our conclusions.

4.3. Methodological issues

4.3.1. Disrupted matching on date variables

In our study, we encountered frequent inconsistencies between dates of accidents in the police data and dates of injuries in the healthcare data (25% of matched cases, with differences ranging from 1 to 47 days). This was explained at the healthcare statistics centre by the common practice of recording the date of injury as identical to the date of death when no exact date of injury had been specified in the medical records. An approximately estimated date of birth was also a common problem in the police data. Similar data deficiencies were acknowledged by other authors (Rosman and Knuiman, 1994; Clark, 2004). These deficiencies disrupted our initial plan of matching on four variables (date of the injury, date of birth, gender, and road user type).

4.3.2. Inapplicability of capture–recapture method

The capture–recapture method is often used to test the completeness and estimate the reliability of traffic injury and mortality data (Razzak and Luby, 1998; Morrison and Stone, 2000; Meuleners et al., 2006; Lateef, 2010; Hu et al., 2011). However, this method was not applicable to our study.
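For orientation, the two-source estimate that a capture–recapture analysis would produce can be sketched as follows. This calculation was not performed in the study and is shown only to make the method concrete. It uses the standard Lincoln–Petersen estimator and Chapman's bias-adjusted variant, applied to the counts reported in the abstract (217 police records, 237 healthcare records, 162 matched). The function names are hypothetical. Because the method's assumptions do not hold for these two sources, the resulting numbers should not be read as valid estimates of the true number of fatalities.

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Two-source Lincoln-Petersen estimate of the total number of cases.

    n1, n2: cases registered in source 1 and source 2
    m:      cases found in both sources (matched cases)
    """
    if m <= 0:
        raise ValueError("at least one matched case is required")
    return n1 * n2 / m


def chapman(n1: int, n2: int, m: int) -> float:
    """Chapman's bias-adjusted version of the same estimator."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Counts reported in the abstract for 2005-2010.
police_cases, healthcare_cases, matched_cases = 217, 237, 162

naive_total = lincoln_petersen(police_cases, healthcare_cases, matched_cases)
adjusted_total = chapman(police_cases, healthcare_cases, matched_cases)
print(round(naive_total, 1), round(adjusted_total, 1))
```

Both estimators treat every non-match as a case missed by one of the sources. That reading is not appropriate here, because most non-matches in this study arose from differing definitions rather than from missed registrations.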
The four main assumptions of the method are: (i) record-linkage in capture and recapture sources is based on a common definition, is perfect, and has no errors; (ii) capture and recapture sources are independent from each other; (iii) the studied population is closed for in- and out-migration during the study period; and (iv) all cases in the studied population have the same probability of being ascertained (Wittes et al., 1974; Brenner, 1994; Hook and Regal, 1995; International Working Group for Disease Monitoring and Forecasting, 1995; Razzak and Luby, 1998; Morrison and Stone, 2000; Whitfield and Kelly, 2002; Lateef, 2010). At least the first two of these assumptions would have been violated in our study. The major problem is the absence of a common fatality definition in the two sources: the police use a definition with a specified time limit (the seven-day fatality definition before 2008 and the 30-day limit since 2009) and register cases according to the places of traffic accidents, while the healthcare sector defines a case according to the pathologist's diagnosis of the major underlying cause of death (that is, with no time restraints) and records cases according to the places of deaths. As a result: (i) the cases are defined differently, and (ii) the police data de facto contain no information on places of deaths, while the healthcare data contain no information about the places of the accidents. These data inconsistencies between the two sources make the capture–recapture method clearly inappropriate. Other obstacles are the dependency between the sources and the described registration errors. For these reasons, we refrained from using the capture–recapture method and employed methods of assessing diagnostic accuracy to estimate and compare the reliability of the two data sources.

5. Conclusion

Our findings suggest that traffic mortality data of the Arkhangelsk police were more reliable in the study period, compared to the healthcare data. Therefore, the police data seem more suitable for local road safety assessment as well as for international comparisons, particularly from 2009, after the 30-day traffic fatality definition was adopted. Reliability of the healthcare data was lower, but it improved substantially over the study period.

Funding

The study was supported by SpareBank1 Nord-Norge and the Department of Community Medicine, University of Tromsø.

Acknowledgements

We thank Alexey Maximov, Andrei Stolyarov, Alexander Milyakov, Olga Gushchina, and other employees of the Arkhangelsk State Road Safety Inspectorate and the Arkhangelsk Regional Medical Informational Analytic Centre for their collaboration in data collection and verification.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.aap.2012.12.022. These data include Google maps of the most important areas described in this article.

References

Abramson, J.H., 2004. Winpepi (PEPI-for-Windows): computer programs for epidemiologists. Epidemiologic Perspectives and Innovations 1, 6.
Amoros, E., Martin, J.-L., Lafont, S., Laumon, B., 2008. Actual incidences of road casualties, and their injury severity, modelled from police and hospital data, France. The European Journal of Public Health 18 (4), 360–365.
Amoros, E., Martin, J.-L., Laumon, B., 2006. Underreporting of road crash casualties in France. Accident Analysis and Prevention 38 (4), 627–635.
Aptel, I., Salmi, L.R., Masson, F., Bourde, A., Henrion, G., Erny, P., 1999. Road accident statistics: discrepancies between police and hospital data in a French island. Accident Analysis and Prevention 31 (1–2), 101–108.
Bhalla, K., Navaratne, K.V., Shahraz, S., Bartels, D., Abraham, J., Dharmaratne, S., 2010. Estimating the incidence of road traffic fatalities and injuries in Sri Lanka using multiple data sources. International Journal of Injury Control and Safety Promotion 17 (4), 239–246.
Bhalla, K., Shahraz, S., Bartels, D., Abraham, J., 2009. Methods for developing country level estimates of the incidence of deaths and non-fatal injuries from road traffic crashes. International Journal of Injury Control and Safety Promotion 16 (4), 239–248.
Brenner, T., 1994. Application of capture–recapture methods for disease monitoring: potential effects of imperfect record linkage. Methods of Information in Medicine 33 (5), 502–506.
Clark, D.E., 2004. Practical introduction to record linkage for injury research. Injury Prevention 10 (3), 186–191.
Cleophas, T.J., Cleophas, E.P., Cleophas, T.F., Zwinderman, A.H., 2009. Statistics Applied to Clinical Trials. Springer Netherlands, Dordrecht.
Derriks, H.M., Mak, P.M., 2007. Underreporting of traffic casualties. IRTAD Special Report. Ministry of Transport, Public Works and Water management, The Netherlands. Retrieved from: .
Elvik, R., Mysen, A.B., 1999. Incomplete accident reporting: meta-analysis of studies made in 13 countries. Transportation Research Record: Journal of the Transportation Research Board 1665, 133–140.
Glasser, S.P., 2008. Essentials of Clinical Research. Springer Netherlands, Dordrecht.
Hook, E.B., Regal, R.R., 1995. Capture–recapture methods in epidemiology: methods and limitations. Epidemiologic Reviews 17 (2), 243–264.
Hu, G., Baker, T., Baker, S.P., 2011. Comparing road traffic mortality rates from police-reported data and death registration data in China. Bulletin of the World Health Organization 89, 41–45.
Government of the Russian Federation, 2009. Decree ‘on approval of rules for accounting of road traffic accidents’. No. 647 of 29.06.1995, edition of 14.02.2009. Retrieved from: (in Russian).
International Working Group for Disease Monitoring and Forecasting, 1995. Capture–recapture and multiple-record systems estimation. I: history and theoretical development. American Journal of Epidemiology 142 (10), 1047–1058.
Jeffrey, S., Stone, D.H., Blamey, A., Clark, D., Cooper, C., Dickson, K., Mackenzie, M., Major, K., 2009. An evaluation of police reporting of road casualties. Injury Prevention 15 (1), 13–18.
Kudryavtsev, A.V., Nilssen, O., Lund, J., Grjibovski, A.M., Ytterstad, B., 2012a. Road traffic crashes with fatal and non-fatal injuries in Arkhangelsk, Russia in 2005–2010. International Journal of Injury Control and Safety Promotion, http://dx.doi.org/10.1080/17457300.2012.745576
Kudryavtsev, A.V., Nilssen, O.R., Sumarokov, Y., Ytterstad, B., 2012b. Injury prevention and safety promotion course in a Russian master of public health programme. International Journal of Injury Control and Safety Promotion 19 (3), 290–296.
Lujic, S., Finch, C., Boufous, S., Hayen, A., Dunsmuir, W., 2008. How comparable are road traffic crash cases in hospital admissions data and police records? An examination of data linkage rates. Australian and New Zealand Journal of Public Health 32 (1), 28–33.
Lyons, R.A., Ward, H., Brunt, H., Macey, S., Thoreau, R., Bodger, O.G., Woodford, M., 2008. Using multiple datasets to understand trends in serious road traffic casualties. Accident Analysis and Prevention 40 (4), 1406–1410.
Meuleners, L.B., Lee, A.H., Cercarelli, L.R., Legge, M., 2006. Estimating crashes involving heavy vehicles in Western Australia, 1999–2000: a capture–recapture method. Accident Analysis and Prevention 38 (1), 170–174.
Ministry of Health and Social Development of the Russian Federation, 2009. Order ‘On approval of statistical tools for accounting victims of road traffic accidents’. No. 18 of 26.01.2009. Retrieved from: (in Russian).
Morrison, A., Stone, D.H., 2000. Capture–recapture: a useful methodological tool for counting traffic related injuries? Injury Prevention 6 (4), 299–304.
Lateef, M.U., 2010. Estimation of fatalities due to road traffic crashes in Karachi, Pakistan, using capture–recapture method. Asia-Pacific Journal of Public Health 22 (3), 332–341.
Parfrey, P.S., Barrett, B., 2009. Clinical Epidemiology. Humana Press, New York, NY.
Petridou, E.T., Yannis, G., Terzidis, A., Dessypris, N., Germeni, E., Evgenikos, P., Tselenti, N., Chaziris, A., Skalkidis, I., 2009. Linking emergency medical department and road traffic police casualty data: a tool in assessing the burden of injuries in less resourced countries. Traffic Injury Prevention 10 (1), 37–43.
Razzak, J.A., Luby, S.P., 1998. Estimating deaths and injuries due to road traffic accidents in Karachi, Pakistan, through the capture–recapture method. International Journal of Epidemiology 27 (5), 866–870.
Rosman, D.L., 2001. The western Australian road injury database (1987–1996): ten years of linked police, hospital and death records of road crashes and injuries. Accident Analysis and Prevention 33 (1), 81–88.
Rosman, D.L., Knuiman, M.W., 1994. A comparison of hospital and police road injury data. Accident Analysis and Prevention 26 (2), 215–222.
Routley, V., Staines, C., Brennan, C., Haworth, N., Ozanne-Smith, J., 2003. Suicide and natural deaths in road traffic: review. Monash University Accident Research Centre, Victoria. Retrieved from: .
Thihalolipavan, S., Madsen, A., Smiddy, M., Li, W., Begier, E., Zimmerman, R., 2011. Etiology of nonspecific cause of death coding in New York city motor vehicle crash-related fatalities. Traffic Injury Prevention 12 (1), 18–23.
Whitfield, K., Kelly, H., 2002. Using the two-source capture–recapture method to estimate the incidence of acute flaccid paralysis in Victoria, Australia. Bulletin of the World Health Organization 80, 846–851.
Wittes, J.T., Colton, T., Sidel, V.W., 1974. Capture–recapture methods for assessing the completeness of case ascertainment when using multiple information sources. Journal of Chronic Diseases 27 (1–2), 25–36.
World Health Organization, 2004. World Report on Road Traffic Injury Prevention. World Health Organization, Geneva, Switzerland.
World Health Organization, 2009a. European Status Report on Road Safety: Towards Safer Roads and Healthier Transport Choices. WHO Regional Office for Europe, Copenhagen.
World Health Organization, 2009b. Global Status Report on Road Safety: Time for Action. World Health Organization, Geneva.
Ytterstad, B., Wasmuth, H.H., 1995. The Harstad injury prevention study: evaluation of hospital-based injury recording and community-based intervention for traffic injury prevention. Accident Analysis and Prevention 27 (1), 111–123.