Forensic Science International 208 (2011) 173–179
An analytical evaluation of eight on-site oral fluid drug screening devices using laboratory confirmation results from oral fluid

Tom Blencowe a,*, Anna Pehrsson a, Pirjo Lillsunde a, Kari Vimpari a, Sjoerd Houwing b, Beitske Smink c, René Mathijssen b, Trudy Van der Linden d, Sara-Ann Legrand d, Kristof Pil d, Alain Verstraete d

a Alcohol and Drug Analytics Unit, National Institute for Health and Welfare, P.O. Box 30, FI-00271 Helsinki, Finland
b SWOV Institute for Road Safety Research, P.O. Box 1090, 2260 BB, Leidschendam, The Netherlands
c Department of Pathology and Toxicology, Netherlands Forensic Institute, P.O. Box 24044, 2490 AA, The Hague, The Netherlands
d Department of Clinical Chemistry, Microbiology and Immunology, Ghent University, De Pintelaan 185, 9000 Ghent, Belgium
ARTICLE INFO
Article history: Received 10 September 2010; received in revised form 1 November 2010; accepted 26 November 2010; available online 22 December 2010

ABSTRACT
The performance of eight on-site oral fluid drug screening devices was studied in Belgium, Finland and the Netherlands as a part of the EU-project DRUID. The main objective of the study was to evaluate the reliability of the devices for testing drivers suspected of driving under the influence of drugs (DUID). The performance of the devices was assessed by their ability to detect substances using cut-offs which were set at sufficiently low levels to allow optimal detection of positive DUID cases. The devices were evaluated for the detection of amphetamine(s), cannabis, cocaine, opiates and benzodiazepines when the relevant test was incorporated. Methamphetamine, MDMA and PCP tests that were included in some devices were not evaluated since there were too few positive samples. The device results were compared with confirmation analysis results in oral fluid. The opiates tests appeared to perform relatively well with sensitivity results between 69 and 90%. Amphetamines and benzodiazepines tests had lower sensitivity, although the DrugWipe test evaluated was promising for amphetamine. In particular, it is evident that the cannabis and cocaine tests of the devices still lack sensitivity, although further testing of the cocaine tests is desirable due to the low prevalence and low concentrations encountered in this study. © 2010 Elsevier Ireland Ltd. All rights reserved.
Keywords: On-site testing; Oral fluid; Drugs of abuse; Driving under the influence
1. Introduction

Enforcement of legislation on driving under the influence of drugs (DUID) is based on detection. The form of detection that may be used is specific to the legislation in consideration. With regard to this, several EU countries have introduced zero-tolerance (per se) legislation for drugged driving, making it sufficient merely to prove that a driver has used an illegal or prescribed substance rather than to demonstrate actual impairment. On-site drug screening can be used to indicate drug use; however, the device should be easy to use, rapid and reliable. Such an on-site drug screening device is a requirement that police officers involved in road safety have expressed for a number of years [1]. Nonetheless, even with efficient on-site testing the necessity for confirmation analysis of a body fluid sample from the suspect driver is not negated. Previously, selective police enforcement on drugged driving has been performed in a limited number of EU member states, mostly using urine based screening devices. The experiences have shown that performing a urine on-site screening procedure is very
* Corresponding author. Tel.: +358 20 610 7762; fax: +358 20 610 8553. E-mail address: thomas.blencowe@thl.fi (T. Blencowe). 0379-0738/$ – see front matter © 2010 Elsevier Ireland Ltd. All rights reserved. doi:10.1016/j.forsciint.2010.11.026
complicated at the roadside. Additionally, the presence of drugs, and their metabolites, in urine does not necessarily indicate the use of those substances immediately preceding, or even within a few hours before, driving [2]. However, it is already apparent that with new national legislations continuously arising, on-site screening devices are being used more and more. Germany was the first European country to introduce zero-tolerance legislation for drugs, prohibiting driving under the influence of the drugs cannabis, cocaine, heroin, morphine, amphetamine and the designer drugs MDMA and MDEA in 1998 (Road Traffic Code, Section 24 a StVG). In most of the Federal States of Germany roadside drug tests have been introduced on a routine basis where there is suspicion of drug use; these can take the form of a urine, sweat or oral fluid (OF) screening device [3]. Studies have suggested that random roadside drug screening can act as an effective deterrent to DUID [1,4,5]. Indeed, random OF screening of drivers in the state of Victoria, Australia has already proven to be a useful means of deterring drug driving. In December 2004, police in Victoria started a random drug testing program for drivers, modeled on more than 30 years of random testing for alcohol in Victoria and throughout Australia [4]. The observed prevalence in screened drivers for the drugs which are currently included in the legislation of Victoria
(MDMA, methamphetamine and cannabis) have all decreased [5]. Furthermore, awareness of random OF testing has increased from 78% to 92% of drivers in the initial period following the introduction of testing [1]. By using two on-site screening devices in combination and a confirmation analysis, a false positive rate of about 1% over 5 years was also attained. For this purpose a false positive case was defined as both devices giving positive results which are not confirmed in the laboratory toxicological analysis [5].

The first on-site screening devices for drug testing in OF were developed in the 1990s [6]. Large-scale multinational evaluations of the performance of on-site OF screening devices have been conducted in the ROSITA project [7] and the following Rosita-2 project [1]. In addition, numerous different on-site OF screening devices have been evaluated in several publications, e.g. [8–22]. The Rosita projects, especially Rosita-2, provide a reasonable baseline for comparison with the results of this study. The latter publications were based on widely varying study protocols, and accordingly, their contents cannot be directly compared with this study without more thorough consideration.

The focus of the European Integrated Project DRUID (Driving under the Influence of Drugs, Alcohol and Medicines) has been on the improvement of road safety related to the problem of alcohol, drugs and medicines used or abused by drivers of vehicles in the road transport system. The objectives of DRUID were to give scientific support to the EU transport policy by providing a solid basis to generate harmonized, EU-wide regulations for driving under the influence of alcohol, drugs and medicine.

2. Materials and methods

2.1. General information

This analytical evaluation of OF screening devices was carried out as an integral part of the DRUID project from October 2007 to December 2009.
The reliability and accuracy of the screening devices were assessed by comparing the results from the devices with the results of the confirmation analyses. For this purpose, OF samples were collected from the participants. The performance of the tests was assessed based on sensitivity, specificity and accuracy for the individual substance tests of the devices. These were assessed based on cut-offs which were set within the DRUID project to allow optimal detection of drug positive cases.

2.2. Devices

Eight on-site OF screening tests were evaluated in the study: BIOSENS® Dynamic (Biosensor Applications Sweden AB, Solna, Sweden), Cozart® DDS 806 (Concateno UK plc, Abingdon, UK), DrugWipe® 5+ (Securetec Detektions-Systeme AG, Munich, Germany), Dräger DrugTest® 5000 (Dräger Safety AG & Co. KGaA, Lübeck, Germany), OraLab®6 (Varian, Inc., Palo Alto, CA, USA), OrAlert™ (Innovacon, Inc., San Diego, CA, USA), Oratect® III (Branan Medical Corporation, Irvine, CA, USA), and RapidSTAT® (Mavand Solutions GmbH, Mössingen, Germany). RapidSTAT® was tested in all three countries and DrugTest® 5000 in Belgium and the Netherlands. All other devices were tested in only one country. As a result of product development by the manufacturers some of the on-site OF screening tests evaluated, i.e. DrugWipe® 5+, Dräger DrugTest® 5000 and RapidSTAT®, were modified during the course of the study. The manufacturers of the devices provided the tests for free. Where provided by the manufacturer the target compound/cut-off of each individual test is indicated in parentheses.

The BIOSENS® Dynamic OF test consists of a collector and a reader. The detection system is based on a piezoelectric quartz crystal microbalance (QCM) technology using a monoclonal antibody as the specific element. Samples for testing are collected with a collection device by wiping the tongue of the subject. The sample is then processed in the BIOSENS® oral screening system.
A complete analysis takes approximately 2–3 min including sample acquisition. The detected drugs are: methamphetamine and MDMA, Δ9-THC, cocaine, opiates and benzodiazepines. At the time of the study the manufacturer stated that the cut-off values for this device were undetermined.

The Cozart® DDS system comprises a collector swab, a buffer bottle, a disposable test cartridge and a handheld instrument for result interpretation. When testing, the collector swab is swabbed around the gums, tongue and inside the cheek until the sample presence indicator turns completely blue. The swab is placed into the buffer bottle and the contents of the bottle are mixed. Four drops of fluid from the dropper are applied across the sample well of the test cartridge. As soon as fluid appears on each of the four white cartridge membrane strips (between 2 and 30 s), the cartridge is inserted into the DDS instrument and a new test is initiated. Once the test has been completed, the results are displayed on the instrument. The
manufacturer states that 2 drugs can be tested in 90 s, and 5/6 drug classes in 5 min. The DDS 806 device is a six-drug test kit for amphetamine (50 ng/ml), methamphetamines (methamphetamine, 50 ng/ml), cannabis (Δ9-THC, 31 ng/ml – validation for cut-off performed using real patient samples), cocaine (benzoylecgonine, 30 ng/ml), opiates (morphine, 30 ng/ml) and benzodiazepines (temazepam, 20 ng/ml).

The DrugWipe® 5+ device consists of an OF collector, a detection element and an integrated liquid ampoule. To carry out the test, the OF collector is separated from the test body. The tongue or the cheek of the tested person is wiped and the collector is then attached to the test body. The ampoule is pressed so that it opens and the buffer solution flows to the test strips. The results can be read when control lines appear on both strips. Red control lines indicate a successful test while other red lines indicate a positive result. The total time needed for testing is 3–10 min. The device incorporates four individual tests: amphetamines (amphetamine, 50 ng/ml) including methamphetamine (25 ng/ml) and MDMA (25 ng/ml), cannabis (Δ9-THC, 30 ng/ml), cocaine (benzoylecgonine, 30 ng/ml) and opiates (codeine, 10 ng/ml).

The Dräger DrugTest® 5000 test system comprises the Dräger DrugTest® 5000 Analyzer and a test kit. The test kit consists of a test cassette with an OF collector. OF is collected by using the OF collector on the test cassette. The built-in indicator turns blue when collection is ready. Then the test cassette and the cartridge are placed into the analyzer. Results are shown within a few minutes. The following substances are detected: amphetamine (50 ng/ml), methamphetamines (methamphetamine 35 ng/ml), cannabis (Δ9-THC, 5 ng/ml: newer version and 25 ng/ml: previous version), cocaine (cocaine, 20 ng/ml), opiates (morphine, 20 ng/ml) and benzodiazepines (diazepam, 15 ng/ml).

The OraLab®6 test consists of a collection stick and an expresser vial.
The collector stick is kept in the mouth of the tested person for 3 min until the foam is thoroughly soaked. The collector is then placed foam-first into the expresser. The cap of the expresser is used to push the collector all the way down into the expresser and the cap is twisted tightly into place. The OF collected will flow to the vial, and results can be read from between 10 and 15 min. Drug classes tested are amphetamine (50 ng/ml), methamphetamine (50 ng/ml), cannabis (Δ9-THC, 50 ng/ml), cocaine (20 ng/ml), opiates (morphine, 40 ng/ml) and PCP (10 ng/ml).

The OrAlert™ test consists of a sucrose and citric acid coated collector stick and of a test device. OF is collected with the sponge of the collector stick. When the sponge softens slightly, the tested person is instructed to gently press the sponge between the tongue and teeth to ensure saturation of the collector. The sample is collected for 3 min. The collector stick is inserted into the test device. After 1 min the collection chamber is turned counterclockwise. The results can be read after another 9 min. The total test time is approximately 10 min. The device enables testing of six (types of) drugs simultaneously: amphetamine (50 ng/ml), methamphetamines (methamphetamine, 50 ng/ml) including MDMA (50 ng/ml), cannabis (Δ9-THC, 100 ng/ml), cocaine (benzoylecgonine, 20 ng/ml), opiates (morphine, 40 ng/ml) and PCP (10 ng/ml).

Oratect® III is a one-step device for qualitative detection of drugs in OF. Prior to carrying out the test, the cap is removed from the device. The collection pad is rubbed inside the mouth and then placed underneath the tongue to collect OF. Once a blue line indicator for the sufficient collection of OF appears, the collection pad is removed from the mouth and the device is recapped. The test is laid on a flat surface and results can be read within 5–30 min. Red lines indicate a negative result. The collection pad can be used for confirmation analysis in a laboratory.
For this, the pad is detached from the collector and put into the buffer vial included in the kit. The device evaluated in this study incorporates six tests: amphetamine (25 ng/ml), methamphetamines (methamphetamine, 25 ng/ml) including MDMA (25 ng/ml), cannabis (Δ9-THC, 40 ng/ml), cocaine (cocaine, 40 ng/ml), opiates (morphine, 10 ng/ml) and benzodiazepines (5 ng/ml).

The RapidSTAT® consists of a collection device, with aroma field, a buffer solution and a test strip. The aroma field is used in order to increase salivation. OF is collected by rotary movements of the microfiber collector stick inside the cheeks and gums. The collector stick is then put into the buffer bottle and agitated before removal. Seven drops of the buffer fluid mixture are pipetted to each well of the test device. The lid is closed to the first position and left for 4 min. The test is started by pressing down the lid completely so that the buffer flows to the test strips. If all lines have formed the test is interpreted as negative. The results should be read within 8 min of the buffer flowing. The total time needed for testing is 7–12 min. The test results can be read either straight from the test device or by inserting the device into a mobile reader (Reader 600). The device evaluated in this study comprised six individual tests: amphetamine (25 ng/ml), methamphetamines (methamphetamine, 25 ng/ml) including MDMA (50 ng/ml), cannabis (Δ9-THC, 15 ng/ml), cocaine (benzoylecgonine, 12 ng/ml), opiates (morphine, 25 ng/ml) and benzodiazepines (oxazepam, 25 ng/ml).

2.3. Study populations

The Belgian part of the study was carried out in centers for drug addiction and a small part during the epidemiological roadside survey of the DRUID project. Five devices were tested: OraLab®6 (250 tests), Dräger DrugTest® 5000 (139 tests), Cozart® DDS (138 tests), RapidSTAT® (138 tests) and OrAlert™ (125 tests). Altogether 767 tests were conducted successfully.
The study protocol was approved by the ethics committee of Ghent University Hospital. An informed consent form was signed by all participants.
In Finland, the subjects were recruited from the DRUID roadside study, suspect DUI drivers apprehended in routine police work, patients at a rehabilitation clinic and the personnel of the Drug Analytics Unit. Altogether 221 subjects participated in the study of which 136 subjects were tested with the DrugWipe® 5+ and 132 with the RapidSTAT®. Ethical approval for the study was obtained from the coordinating ethical committee of Hospital District of Helsinki and Uusimaa and written consent was obtained from all of the participants.

In the Netherlands, initially the subjects were selected at random during the DRUID roadside survey. Due to the low number of suspected DUI drivers after two years of roadside testing the OF screening tests were carried out in a coffeeshop, a location where the purchase and consumption of small quantities of cannabis was allowed. In the coffeeshop, smoking of cannabis was only allowed in a designated area and the testing with the devices was carried out in a separate room. Four devices were evaluated in the Netherlands: RapidSTAT® (79 tests), Dräger DrugTest® 5000 (84 tests), Oratect® III (58 tests), and BIOSENS® (118 tests; only 39 cannabis test results were taken into account due to a production error in the device during one testing session). Three of the devices were tested at the roadside and at the coffeeshop. The BIOSENS® was only tested in the coffeeshop. Evaluation of the Oratect® III at the roadside was aborted because the device frequently failed to collect OF in a sufficiently short time. For the Dräger DrugTest® 5000, 64 tests were performed at the roadside and 20 at the coffeeshop. After the roadside testing of the RapidSTAT® (35 tests) the collection method of the device was modified. For the testing at the coffeeshop the modified version of the device was used (44 tests). This latter version was the same as that tested in Belgium and Finland.
Ethical approval was granted by the ethical committees associated with the three hospitals that were involved in the DRUID study in the Netherlands. However, written informed consent by the participants was not required.

2.4. Substances analyzed and cut-offs

The substances analyzed within the DRUID project that are relevant to the on-site tests, and their respective cut-offs, are listed in Table 1. Most substances were mandatorily analyzed by each country for the DRUID project, as denoted. Additional substances were analyzed according to the respective countries’ national DUI situation as specified in Table 1.
2.5. Analytical methods

For OF collection, the Belgian and Finnish studies used the StatSure SalivaSampler™ (StatSure Diagnostic Systems, Inc., Framingham, MA, USA) device. This device, consisting of an OF collector and a transport tube, was chosen for sample collection based on a previous study [23]. The StatSure device has an indicator which turns blue when 1 ml of OF is collected. Due to variation, the concentrations of analytes were adjusted for the volume of OF sample based on the weight of the device after collection. In the Netherlands OF was collected in polypropylene containers (Deltalab, S.L.U., Barcelona, Spain), which did not contain any buffer solution. After transportation to the laboratory, all OF samples were stored at −20 °C until analysis.

In Belgium and the Netherlands, the results of the on-site tests were evaluated based on confirmation analysis of OF samples using ultra-performance liquid chromatography–mass spectrometry (UPLC–MS/MS) [24,25]. In Finland, the confirmation analysis was performed with either gas chromatography-electron impact ionization/mass spectrometry (GC-EI/MS) or gas chromatography-negative chemical ionization/mass spectrometry (GC-NICI/MS) [25]. Almost without exception the limit of quantitation (LOQ) for each analyte was equal to, or below, the DRUID cut-offs in each of the analytical methods used in the separate countries. The only exception was benzoylecgonine in the method used in the Netherlands; however, the limit of detection (0.1 ng/ml) was still lower than the DRUID cut-off. External quality control was organized by regular participation in Round Robin tests organized by RTI International (NC, USA) for all the DRUID project toxicology laboratories. Based on the results, the performance of all the analytical methods was excellent [26].

2.6. Interpretation of results

The evaluation of the results is based on classification into the following categories: true positives (TP) = cases with a positive test result and a positive confirmation analysis result from the OF sample; false positives (FP) = cases with a positive test result and a negative confirmation analysis result; true negatives (TN) = cases with a negative test result and a negative confirmation analysis result; and false negatives (FN) = cases with a negative test result and a positive confirmation analysis result.

Using these classifications, and based on the DRUID cut-offs, several parameters for the evaluation can be calculated, namely sensitivity, specificity and accuracy (Table 2). In order to obtain more reliable sensitivity and specificity calculations, at least six positive cases (for sensitivity) or six negative cases (for specificity) according to the confirmation analysis were required.

3. Results
Table 1. List of relevant core substances and their DRUID cut-offs in oral fluid.

Substance                                OF cut-off (ng/ml)   Country
6-Acetylmorphine                         5                    Mandatory
Alprazolam                               1                    Mandatory
Amphetamine                              25                   Mandatory
Benzoylecgonine                          10                   Mandatory
Clonazepam                               1                    Mandatory
Cocaine                                  10                   Mandatory
Codeine                                  20                   Mandatory
Diazepam                                 5                    Mandatory
Flunitrazepam                            1                    Mandatory
Lorazepam                                1                    Mandatory
3,4-Methylenedioxyamphetamine            25                   Mandatory
3,4-Methylenedioxy-N-ethylamphetamine    25                   Mandatory
3,4-Methylenedioxymethamphetamine        25                   Mandatory
Methamphetamine                          25                   Mandatory
Morphine                                 20                   Mandatory
Nordiazepam                              1                    Mandatory
Oxazepam                                 5                    Mandatory
Δ9-Tetrahydrocannabinol                  1                    Mandatory
Aminoclonazepam                          1                    BE
Aminoflunitrazepam                       1                    BE and NL
Bromazepam                               5                    FI, BE and NL
Chlordiazepoxide                         10                   FI and NL
Clobazam                                 5                    NL
Desalkylflurazepam                       2                    NL
Flurazepam                               1                    NL
Lormetazepam                             1                    NL
Midazolam                                2                    FI and NL
Nitrazepam                               2                    FI and NL
Temazepam                                10                   FI and NL
Triazolam                                1                    NL

BE, Belgium; FI, Finland; NL, The Netherlands.
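The way a confirmation result is combined with a device result to give the TP, FP, TN and FN categories of Section 2.6 can be sketched as follows. This is only an illustrative sketch: the cut-off values are copied from Table 1, but the grouping of morphine, codeine and 6-acetylmorphine into a single opiates class, the use of "at or above the cut-off" as the positivity rule, and the function names `confirmed_positive` and `classify` are assumptions made here, not details taken from the study protocol.

```python
from typing import Dict

# DRUID cut-offs (ng/ml) copied from Table 1. Grouping these three analytes
# under one "opiates" class is an assumption made for this sketch.
OPIATE_CUTOFFS_NG_ML: Dict[str, float] = {
    "morphine": 20.0,
    "codeine": 20.0,
    "6-acetylmorphine": 5.0,
}


def confirmed_positive(concentrations: Dict[str, float],
                       cutoffs: Dict[str, float] = OPIATE_CUTOFFS_NG_ML) -> bool:
    """Return True if any analyte reaches its cut-off.

    Assumption: a concentration equal to the cut-off counts as positive.
    Analytes missing from `concentrations` are treated as not detected.
    """
    return any(concentrations.get(analyte, 0.0) >= cutoff
               for analyte, cutoff in cutoffs.items())


def classify(screen_positive: bool, confirmation_positive: bool) -> str:
    """Map a device result and a confirmation result to TP, FP, TN or FN."""
    if screen_positive:
        return "TP" if confirmation_positive else "FP"
    return "FN" if confirmation_positive else "TN"


# Hypothetical sample: device reads negative, laboratory finds 25 ng/ml morphine.
sample = {"morphine": 25.0}
category = classify(False, confirmed_positive(sample))
```

With a low cut-off, such as the 1 ng/ml used for Δ9-tetrahydrocannabinol, more samples become confirmed positives, so a device with a higher built-in cut-off accumulates FN cases.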
Results for sensitivity, specificity and accuracy using the DRUID cut-offs are presented according to the substance category of the individual tests. Despite the need to consider all the results together, sensitivity remains perhaps the primary parameter of interest. Sensitivity values, with 95% confidence intervals, for all the successfully evaluated tests are shown in Fig. 1. The confidence intervals were calculated with the modified Wald method [27].

For devices tested in two or three countries (i.e. RapidSTAT® and DrugTest® 5000), the results from each country were combined to obtain a comprehensive evaluation. In addition, regarding those devices that were modified during the study, results for both versions of the device were combined. Average values for sensitivity, specificity and accuracy were calculated based on results from all the on-site tests performed which incorporated the relevant type of substance test. Where appropriate, results for modified devices are also presented.

Instead of applying the strict criteria proposed in the ROSITA project [7] (sensitivity and specificity >90%; accuracy >95%), 80% was set as a desirable target value for sensitivity, specificity and accuracy. This is by no means an established target value, but may rather be considered as the minimum desirable for DUID testing, which is at the same time attainable; none of the devices evaluated in the Rosita-2 project met the former criteria for all individual tests [1]. Evaluations of methamphetamine, MDMA and PCP tests for the devices in which these were included were not made due to the lack of positive cases for these substances.
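As an illustration of how a 95% confidence interval for a sensitivity value might be obtained, the sketch below implements the adjusted Wald (Agresti–Coull) interval. It assumes that the "modified Wald method" of reference [27] corresponds to this interval; the function name `modified_wald_interval` and the example counts are hypothetical.

```python
import math
from typing import Tuple


def modified_wald_interval(successes: int, n: int,
                           z: float = 1.96) -> Tuple[float, float]:
    """Adjusted Wald (Agresti-Coull) interval for a binomial proportion.

    Assumption: this is the interval meant by "modified Wald method".
    z = 1.96 gives an approximate 95% interval.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    n_adj = n + z ** 2                          # adjusted sample size
    p_adj = (successes + z ** 2 / 2) / n_adj    # adjusted proportion
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clip to [0, 1] because the raw interval can overshoot the boundaries.
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


# Hypothetical example: 9 true positives among 10 confirmed positive cases.
lower, upper = modified_wald_interval(9, 10)
```

With so few confirmed positives the interval is wide, which is the behaviour described for substances with low prevalence in the study populations.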
Table 2. Calculations used for the device performance evaluation.

Parameter      Calculation
Sensitivity    $TP / (TP + FN)$
Specificity    $TN / (TN + FP)$
Accuracy       $(TP + TN) / (TP + TN + FP + FN)$

TP, true positive; TN, true negative; FP, false positive; FN, false negative.
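The calculations of Table 2, together with the minimum of six confirmed positive or negative cases required in Section 2.6, can be sketched in a few lines. The names `Performance`, `evaluate` and `MIN_CASES`, and the example counts, are hypothetical and serve only to illustrate the arithmetic.

```python
from typing import NamedTuple, Optional

MIN_CASES = 6  # minimum confirmed positives (or negatives) per Section 2.6


class Performance(NamedTuple):
    sensitivity: Optional[float]
    specificity: Optional[float]
    accuracy: Optional[float]


def evaluate(tp: int, fp: int, tn: int, fn: int) -> Performance:
    """Compute the Table 2 parameters from the four case counts.

    Sensitivity or specificity is returned as None when fewer than
    MIN_CASES confirmed positive or negative cases are available.
    """
    positives = tp + fn   # confirmed positive cases
    negatives = tn + fp   # confirmed negative cases
    total = positives + negatives
    sensitivity = tp / positives if positives >= MIN_CASES else None
    specificity = tn / negatives if negatives >= MIN_CASES else None
    accuracy = (tp + tn) / total if total > 0 else None
    return Performance(sensitivity, specificity, accuracy)


# Hypothetical counts for one substance test of one device.
result = evaluate(tp=9, fp=4, tn=109, fn=12)
```

Returning None rather than a number for under-populated categories keeps unreliable values out of any averages computed later.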
Fig. 1. Sensitivity plots for individual tests of each device.
Comprehensive screening result data (Tables 3–7) and concentration ranges of findings (Tables 8–15) for each test are presented as supplementary data.

3.1. Amphetamines tests

For the amphetamines tests, the average sensitivity, specificity and accuracy was 60%, 97% and 93% respectively. DrugWipe® 5+ showed the best performance reaching over 80% for sensitivity (87%), specificity (95%) and accuracy (93%). DrugTest® 5000 and Cozart® DDS were above the average device performance for specificity and accuracy (98–99% for both devices), but already had some problems with sensitivity (67% for both devices). OraLab®6 and RapidSTAT® were below average in sensitivity (58% and 54% respectively), but above average in specificity (100% and 97% respectively) and accuracy (94% and 92% respectively). For the evaluation of BIOSENS®, the sensitivity was 0%, but the specificity and accuracy were 84% and 92% respectively. Not enough amphetamine positive cases were gathered to evaluate sensitivity for the OrAlert™ and Oratect® III devices. A separate evaluation of sensitivity of the earlier version of the RapidSTAT® device was also not possible (specificity 94%, accuracy 86%). It is of concern that positive test results by the OrAlert™ and BIOSENS® devices were frequently shown to be false positives in the confirmation analysis.

The standard error bars for the amphetamines tests (Fig. 1) show a large variability which can be attributed to the small numbers of positive cases generally. The margin of error is relatively small for the DrugWipe® 5+, a device that was solely evaluated in Finland, a country with a relatively high prevalence of amphetamine use in DUID – a group from which a number of suspect cases were sampled.

3.2. Cannabis tests

The performance of the BIOSENS® Dynamic device was evaluated in two sessions because, initially, sensitivity of the device for cannabis turned out to be low. The manufacturer claimed this was caused by a production error.
A second session was conducted after the error was rectified, and the evaluation of the BIOSENS® cannabis test is based on the results of the latter ‘successful’ session.

The manufacturer of the DrugWipe® 5+ launched a new version of the device with an enhanced test strip for cannabis during the testing period. In this study, 12 tests with the improved cannabis test strip were performed. All of the cases tested negative for cannabis, although one case contained 6.4 ng/ml of Δ9-THC.

Results for the RapidSTAT® from the coffeeshop (sensitivity 88%, specificity 50% and accuracy 84%), using the device with a
modified collection method differed from those at the roadside in the Netherlands (sensitivity 27%, specificity 100% and accuracy 77%), for which testing was carried out using the previous version. For the testing in Belgium and Finland the modified version of the device was used. The results of the latter two studies varied: sensitivity 32%, specificity 89% and accuracy 68% in the Belgian study (50 cannabis positive cases tested) compared to sensitivity 68%, specificity 89% and accuracy 86% in the Finnish study (19 cannabis positive cases tested). Still, neither sensitivity result was close to that achieved in the coffeeshop.

The version of the Dräger DrugTest® 5000 tested in Belgium had a lower cut-off for Δ9-THC in OF (5 ng/ml) than the device tested in the Netherlands (25 ng/ml), which was an older version. Sensitivity results achieved in the separate Belgian and Dutch studies were similar when the evaluation at the coffeeshop was not included (53% and 56% respectively). Corresponding values for specificity and accuracy were 99% and 84%, respectively, for the Belgian study and 89% and 80% in the Dutch study. Sensitivity achieved at the coffeeshop in the Netherlands with the older version was even higher (76%); accuracy was 80%, and it was not possible to evaluate specificity due to the low number of cannabis negative cases [25]. Results were, nevertheless, combined since the tests were evaluated against the DRUID cut-off (1 ng/ml).

For cannabis tests, the average sensitivity, specificity and accuracy was 38%, 95% and 73% respectively. None of the devices were near 80% sensitivity. The best sensitivity (59%) was observed for DrugTest® 5000. In addition to this, three devices, RapidSTAT® (56%), BIOSENS® (50% in the successful evaluation) and DrugWipe® 5+ (43%) had above average sensitivity.
It is important to note that the DRUID cut-off for Δ9-THC (1 ng/ml) is low, and indeed much lower than the manufacturers' cut-offs (typically 25–50 ng/ml), which can at least partially explain the low sensitivity for all the devices. Nonetheless, for the study in the Netherlands at least part of the evaluation of devices (DrugTest® 5000, RapidSTAT®, BIOSENS® and Oratect® III), and in the case of BIOSENS® all of it, was performed in the coffeeshop. In this setting very recent use of cannabis is to be expected, resulting in higher concentrations of Δ9-THC in OF which are easier to detect.

For specificity, all devices had values between 90 and 100%. The problems in sensitivity are, to some extent, reflected in the accuracy results. DrugWipe® 5+ had the highest accuracy score (88%), and the second highest score was 82% for DrugTest® 5000. Cozart® DDS (71%), OrAlert™ (78%) and RapidSTAT® (78%) were all above average for accuracy. However, the high proportion of negative cases in the study populations for DrugWipe® 5+ (113 negative cases from 134 tests) and OrAlert™ (83 from 110) should be noted in relation to the high accuracy results of these devices.
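The effect of a high proportion of negative cases on accuracy follows directly from the Table 2 definitions: accuracy equals prevalence × sensitivity + (1 − prevalence) × specificity, where prevalence is the fraction of confirmed positive cases. A minimal sketch is given below. The prevalence of 21/134 follows from the 113 negative cases among 134 DrugWipe 5+ tests quoted above and the sensitivity of 43% is the reported value, but the specificity of 96% is an illustrative assumption, and the function name `accuracy_from_rates` is hypothetical.

```python
def accuracy_from_rates(sensitivity: float, specificity: float,
                        prevalence: float) -> float:
    """Accuracy as a prevalence-weighted mean of sensitivity and specificity.

    Follows from TP = sensitivity * positives and TN = specificity * negatives,
    with accuracy = (TP + TN) / total.
    """
    return prevalence * sensitivity + (1 - prevalence) * specificity


# 21 confirmed positives among 134 tests (134 - 113 negatives).
prevalence = 21 / 134
# Specificity of 0.96 is an assumed value used only for illustration.
low_prevalence_accuracy = accuracy_from_rates(0.43, 0.96, prevalence)
# The same test applied to a population with 50% positives.
balanced_accuracy = accuracy_from_rates(0.43, 0.96, 0.5)
```

Under these assumed values the same test scores close to 88% accuracy in the mostly negative population but below 70% when half of the cases are positive, which is why accuracy should be read alongside the composition of the study population.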
The standard error bars for the cannabis tests (Fig. 1) generally show less variability than for the amphetamines, which reflects the higher prevalence of cannabis in the sampled populations.

3.3. Cocaine tests

For cocaine tests, the average sensitivity, specificity and accuracy was 36%, 100% and 94% respectively. None of the devices were near 80% for sensitivity. The highest performance was achieved by the DrugTest® 5000 and OrAlert™ devices (sensitivity 50% and specificity 100% for both devices; accuracy 94–95%). For the Oratect® III and BIOSENS® devices not enough positive cases were found to evaluate sensitivity. The BIOSENS® cocaine test was not performed for 14 cases due to removal of the cocaine testing component. No positive cases at all were found in the evaluation of the DrugWipe® 5+.

For testing in the coffeeshop, the sensitivity of the RapidSTAT® device with a modified collection method was 33% compared to 50% for the earlier version in the Dutch study; however, in both assessments only six cocaine positive cases were screened. Specificity and accuracy were the same for both devices (100% and 91%). The Belgian study, in which there were 10 cocaine positive cases, achieved values of 30% sensitivity, 98% specificity and 92% accuracy.

Sensitivity values for all the individual cocaine tests incorporated in the devices are shown in Fig. 1. Again the small number of positive cases in the study populations gives large variability for the standard error bars. In addition, the concentrations of cocaine (and benzoylecgonine) found in the positive cases were generally low which could, to some extent, account for the low sensitivity of the devices for cocaine.

3.4. Opiates tests

Since not all opiates (e.g. pholcodine) were included in the confirmation analysis, some FP results could in fact be TPs. Concerning the opiates tests, three devices reached 80% or more in sensitivity, specificity and accuracy.
These devices were RapidSTAT® (sensitivity 90%, specificity 97% and accuracy 96%), DrugTest® 5000 (sensitivity 89%, specificity 94% and accuracy 92%) and Cozart® DDS (sensitivity 83%, specificity 95% and accuracy 91%). For the OraLab® 6 and OrAlert™ devices, sensitivity was 69% and 73% respectively, but values for specificity (98% and 81% respectively) and accuracy (84% and 75% respectively) were higher. It should be noted that the devices evaluated in the Belgian study (DrugTest® 5000, RapidSTAT®, OraLab® 6, Cozart® DDS and OrAlert™) were all evaluated in treatment centers for drug addiction, with a high prevalence of opiates, often at high concentrations; three of these devices were those, previously mentioned, that achieved the best performance. For DrugWipe® 5+, Oratect® III and BIOSENS®, there were not enough positive cases to make reliable sensitivity calculations; however, the specificity and accuracy of each of these devices were in the range from 96% to 100%. In addition, a separate evaluation of the sensitivity of the earlier version of the RapidSTAT® device was not possible (specificity and accuracy were both 100%).

Sensitivity plots for all the individual opiates tests incorporated in the devices are shown in Fig. 1. Variability of the standard error bars for the devices, all of which were tested in Belgium, is small.

3.5. Benzodiazepines tests

Since not all benzodiazepines were included in the confirmation analysis, some FPs could be TPs. None of the tests for benzodiazepines exceeded 80% for all the calculated parameters. The average sensitivity was 62%; however, it is
notable that the average sensitivity was only calculated from results for three devices. There were no TP or FN cases for benzodiazepines in the evaluations of the BIOSENS® and Oratect® III, hence sensitivity calculations for these devices were not possible. RapidSTAT® and DrugTest® 5000 were both above average sensitivity with 67% and 65%, respectively, while the sensitivity of Cozart® DDS was 48%. Specificity was again above 94% for each of these devices, whereas the accuracy of the RapidSTAT® and DrugTest® 5000 devices was 90% and 92% respectively and that of the Cozart® DDS lower at 77%. BIOSENS® and Oratect® III both had excellent scores for specificity and accuracy, probably at least partly due to the absence of any positive cases. The earlier version of the RapidSTAT® gave a specificity of 100% and accuracy of 97%; not enough positive screening results were obtained to calculate sensitivity.

Sensitivity values for all the individual benzodiazepines tests incorporated in the devices are shown in Fig. 1. The variability of the error bars for the DrugTest® 5000 and Cozart® DDS devices is somewhat larger than for the RapidSTAT®. This is partly to be expected because the latter device was also evaluated in Finland, a country which has a relatively high prevalence of benzodiazepine use in DUID cases.

4. Discussion

The results of the evaluation of each device need to be viewed in the context of the study population on which they were tested. For some of the devices, a full performance evaluation was not possible for all of the test strips on the panel due to low prevalence of the substance(s) in question. A reliable sensitivity result is easier to obtain if the study population has a high prevalence for the screened drugs. In addition, sensitivity is probably enhanced to some extent if the concentrations of the drugs present in the samples from the study population are high because of recent consumption.
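The link between the number of positive cases and the reliability of a sensitivity estimate can be illustrated with a short calculation. The sketch below assumes an adjusted Wald (Agresti–Coull) interval of the kind described in reference [27]; this section does not state exactly how the error bars in Fig. 1 were computed, so the function name, the 95% level (z = 1.96) and the example counts are illustrative assumptions, not the study's own procedure.

```python
import math


def agresti_coull_interval(successes, trials, z=1.96):
    """Approximate confidence interval for a binomial proportion.

    Hypothetical helper for illustration: 'successes' would be the true
    positives and 'trials' the confirmed positive cases (TP + FN).
    """
    if trials <= 0:
        raise ValueError("at least one confirmed positive case is required")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    # Agresti-Coull adjustment: add z^2 pseudo-observations, half of them successes.
    n_adj = trials + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clip to the valid range for a proportion.
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


if __name__ == "__main__":
    # Same observed sensitivity (50%), different numbers of positive cases.
    for tp, positives in [(3, 6), (30, 60)]:
        low, high = agresti_coull_interval(tp, positives)
        print(f"{tp}/{positives} detected: interval {low:.2f} to {high:.2f}")
```

With only six positive cases the interval spans most of the possible range, whereas the same observed proportion from sixty positive cases gives a much narrower interval, which is consistent with the wide error bars seen for low-prevalence substances.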
Conversely, when interpreting specificity values it should be noted that testing a population with many non-users of a specific substance category can be expected to result in a high proportion of true negative cases, and specificity can be expected to be high. Such a population can also be expected to result in higher accuracy results in a similar manner. The accuracy of a device is the percentage of true negative and true positive results among all the test results. However, a high accuracy alone does not imply that the devices are necessarily good at correctly identifying positive cases. The respective sensitivity, specificity and accuracy of the devices therefore need to be considered together when evaluating performance, along with the study population used for the testing.

Prevalence of drugs within the study population needs to be considered when assessing the evaluation results. The sensitivity, or proportion of positive cases that are correctly identified by the test, as well as the number of positive cases included in the study, have to be examined to allow a better understanding of how well the device performs. Similarly, the specificity, or proportion of negative cases that are correctly identified by the test, and the number of negative cases included in the study are important. It is notable that in this study all of the specificity results for individual drug tests of the devices are in the range of 90–100%, except for the OrAlert™ opiates test (81%). The study populations in general contained a high number of negative cases; notable exceptions to this are populations with a high proportion of cannabis and opiates positive cases in the Netherlands and Belgium respectively.

The sensitivity, specificity and accuracy targets used were set at the beginning of the study. In retrospect, a sensitivity goal of 80% is reasonable, but the specificity goal of 80% is too low.
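The definitions used above can be written out as a short calculation. The following is a minimal sketch, not the study's analysis code: the function names are hypothetical, and the counts and prevalence values in the usage example are invented purely for illustration. It also includes the standard formulas for positive and negative predictive values, which the conclusions of this paper recommend considering for the intended population.

```python
def screening_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and accuracy from confirmed screening counts.

    Sensitivity or specificity is returned as None when there are no
    confirmed positive or negative cases, since it cannot then be calculated.
    """
    positives = tp + fn
    negatives = tn + fp
    total = positives + negatives
    if total == 0:
        raise ValueError("no test results supplied")
    return {
        "sensitivity": tp / positives if positives else None,
        "specificity": tn / negatives if negatives else None,
        "accuracy": (tp + tn) / total,
    }


def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values for a given drug prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else None
    npv = true_neg / (true_neg + false_neg) if (true_neg + false_neg) else None
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical counts: few positive cases, half of them missed.
    print(screening_metrics(tp=3, fp=0, tn=27, fn=3))
    # Same hypothetical device performance at two assumed prevalences.
    for prevalence in (0.02, 0.30):
        ppv, npv = predictive_values(0.80, 0.95, prevalence)
        print(f"prevalence {prevalence:.0%}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

In the first example, accuracy is about 91% even though only half of the positive cases are detected, because true negatives dominate the total. The second shows that, for fixed sensitivity and specificity, the positive predictive value falls sharply when prevalence is low.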
Overall, when the evaluations carried out in each country were combined, only six of the 36 specificity scores successfully evaluated were below 95%. In fact, only the previously mentioned opiates test incorporated in
the OrAlert™ device would have failed to attain the 90% goal set in ROSITA. None of the devices reached the target value of 80% for sensitivity, specificity and accuracy for all the separate tests they comprised.

Compared with the results obtained in the Rosita-2 project [1], there is some slight improvement for the amphetamine results. In Rosita-2, sensitivity and specificity varied from 40% to 83% and from 80% to 100%, respectively. Corresponding results for sensitivity and specificity in this study were 54–87%, except for BIOSENS® (0%), and 92–100%. For the opiate and benzodiazepine tests some minor development can also be seen. Opiate sensitivity was 51–100% in the Rosita-2 project and 69–90% in DRUID (average sensitivity 79%). For benzodiazepines, sensitivity was 33–60% in the Rosita-2 study whereas in this study values of 48–67% were observed. Specificity values for the opiate and benzodiazepine tests obtained in DRUID are somewhat higher than the values obtained in Rosita-2, but it should be remembered that for many of the devices tested in this study there were no positive cases for these substances.

Unfortunately, no significant improvements can be seen for the cannabis and cocaine tests since the Rosita-2 project. On the contrary, the sensitivity values for cocaine did not reach the level achieved in the earlier study. It should, however, also be remembered that at least some of the devices tested in Rosita-2 are from different manufacturers than those tested in this study. Nonetheless, in general, devices can be expected to improve in performance as the state of the art develops.

Comparison of the analytical cut-offs used in Rosita-2 and DRUID shows that they are generally about the same, or only slightly lower for the DRUID project. Thus comparison of the Rosita-2 results to the DRUID results gives a good insight into the direction of the development of on-site testing. The main differences observed in cut-offs are for the benzodiazepines (e.g.
1 ng/ml instead of 5 ng/ml for nordiazepam, lorazepam and clonazepam). In addition, some of the benzodiazepines analyzed in the Rosita-2 project were not included as compulsory substances in DRUID so that direct comparison is impossible. Nevertheless, better results against lower cut-off values would appear to indicate progress for these benzodiazepines tests.

The evaluation of devices described here is useful for demonstrating some of the strengths and weaknesses of the assessed on-site OF screening tests and also provides evidence of some overall improvement in the field of OF screening. Nevertheless, there were numerous limitations to this study. Chief amongst these was the failure to successfully evaluate some of the individual substance tests, in particular for methamphetamines, due to low numbers of positive cases, or even none at all. Furthermore, the lack of uniformity of study design for the separate devices, i.e. the individual study populations used, means that comparison of the evaluation results is not straightforward. For example, the testing of devices in the coffeeshop in the Netherlands, where cannabis use could be expected to be very recent, led to a relatively high prevalence of positive cases, often at higher concentrations than found in the other study populations. This also complicates comparison of the evaluations of the two versions of the RapidSTAT® carried out in that country: since the older version (sensitivity = 27%) was only tested at the roadside, it cannot be ascertained to what extent the higher sensitivity of the modified version (88%) was due to the improved collection method. Similarly, comparison of the Dräger DrugTest® 5000 evaluations in the Netherlands and Belgium is affected by the different study populations sampled.

Several device failures were noted in the study. The reasons for device failure varied. For example, the device may have been used incorrectly or only part of the integrated device was successful (e.g.
appearance of a control line, indicating a successful screening, in only one of the test strips). Therefore, for at least some of the tested devices, only some of the individual drug tests failed. Fifteen OrAlert™ devices (12%) were observed to fail completely. Failures of the
RapidSTAT® devices were observed in 5–11 cases depending on the individual substance test. The BIOSENS® cocaine test could not be performed for 14 cases for other reasons (i.e. removal of the cocaine testing component to allow testing for cannabis to continue). For all the other tests, there were two or fewer failures. In the Netherlands, the evaluation of Oratect® III was stopped at the roadside because collection of OF with the device frequently took too long. However, all tests with Oratect® III were successful in the coffeeshop. These data illustrate that the failure rate was markedly decreased compared to the Rosita-2 study.

It is worth noting that when a substance category, e.g. benzodiazepines, includes a large number of compounds, the on-site tests do not necessarily cover all the compounds that should be screened for. It is also possible that the tests might have cross-reactivity for some substances even though the manufacturers do not report this. This can occur simply because the manufacturer has not tested the device for all the compounds within a category, for example, benzodiazepines (e.g. bromazepam), or due to unexpected cross-reactivity (e.g. venlafaxine for PCP tests [28]).

During the study, performance of the on-site screening tests and sampling for the confirmation analysis took place almost simultaneously. In actual police enforcement practice, however, sampling for confirmation purposes, whether it be OF or whole blood, may be seriously delayed. For substances with a short half-life in particular, e.g. cocaine, this may result in an increased number of false positive screening results.

5. Conclusions

Taken as a whole, the results of the study indicate that the opiates tests appear to perform relatively well when the DRUID cut-offs are used as a target level. For amphetamines and benzodiazepines, the tests did not perform very well when considering sensitivity, although there was one promising test for amphetamine.
In particular, it is evident that the cannabis and cocaine tests of the devices still lack sensitivity, although further testing of the cocaine tests is desirable due to the low prevalence of cocaine in each of the study populations and low concentrations encountered in this study. Somewhat disturbingly, none of the devices in the study performed at above 80% for sensitivity, specificity and accuracy for all of the separate tests that they comprised. The suitability of the device for the intended national DUID population, i.e. the expected type and prevalence of drugs, should be considered. Since on-site tests are relatively expensive, the suitability and performance of all the individual substance tests incorporated in the device are important. Positive and negative predictive values, which can be calculated with drug prevalence specific to the population for which the screening is intended, should also be considered when assessing the suitability of a particular on-site device for use.

Disclaimer

This article has been produced under the project Driving Under Influence of Drugs, Alcohol and Medicines (DRUID) financed by the European Community within the framework of the EU 6th Framework Program. This report reflects only the authors' view. The European Community is not liable for any use that may be made of the information contained therein.

Acknowledgements

The authors would like to thank all the police officers, laboratory staff and students involved in the collection and
analysis of samples and data from each of the countries involved. The collaboration of the Medical Social Centres in Belgium, the rehabilitation centre in Finland and the coffeeshop in the Netherlands is gratefully acknowledged. Thanks must also go to all members of the respective national DRUID teams, past and present, who have contributed to this study. The participation of Inger Marie Bernhoft, of the Department of Transport, Technical University of Denmark (DTU), as task leader for the DRUID project is also acknowledged. Finally, we much appreciate the assistance of all the device manufacturers, in particular for supplying the on-site testing devices and training free of charge. Co-funding for the Belgian part of the study was obtained from the National Lottery of Belgium.
Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.forsciint.2010.11.026.

References

[1] A.G. Verstraete, E. Raes, Rosita-2 project. Final report. Academia Press, Ghent, 2006.
[2] O.H. Drummer, The Forensic Pharmacology of Drugs of Abuse, Arnold, London, 2001.
[3] C. Kuijten, Evaluation of oral fluid screening devices by TISPOL to Harmonise European police Requirements (ESTHER), DRUID Report, 2009.
[4] O.H. Drummer, D. Gerostamoulos, M. Chu, P. Swann, M. Boorman, I. Cairns, Drugs in oral fluid in randomly selected drivers, Forensic Sci. Int. 170 (2–3) (2007) 105–110.
[5] O. Drummer, Summary from the experience in Victoria, AUS, in: T. Blencowe, P. Lillsunde (Eds.), Protocol of "Workshop on drug driving detection by means of oral fluid screening". DRUID report, 2009, pp. 66–68.
[6] A.G. Verstraete, Oral fluid testing for driving under the influence of drugs: history, recent progress and remaining challenges, Forensic Sci. Int. 150 (June (2–3)) (2005) 143–150.
[7] A.G. Verstraete (Ed.), ROSITA, Roadside Testing Assessment Final Report, Academia Press, Ghent, 2001.
[8] P. Kintz, V. Cirimele, B. Ludes, Codeine testing in sweat and saliva with the Drugwipe, Int. J. Legal Med. 111 (February (2)) (1998) 82–84.
[9] N. Samyn, C. van Haeren, On-site testing of saliva and sweat with Drugwipe and determination of concentrations of drugs of abuse in saliva, plasma and urine of suspected users, Int. J. Legal Med. 113 (April (3)) (2000) 150–154.
[10] M. Grönholm, P. Lillsunde, A comparison between on-site immunoassay drug-testing devices and laboratory results, Forensic Sci. Int. 121 (1–2) (2001) 37–46.
[11] S. Pichini, M. Navarro, M. Farre, J. Ortuño, P.N. Rose, R. Pacifici, P. Zuccaro, J. Segura, R. De la Torre, On-site testing of 3,4-methylenedioxymethamphetamine (ecstasy) in saliva with drugwipe and drugread: a controlled study in recreational users, Clin. Chem. 48 (January (1)) (2002) 174–176.
[12] J.M. Walsh, R. Flegel, D.J. Crouch, L. Cangianelli, J. Baudys, An evaluation of rapid point-of-collection oral fluid drug-testing devices, J. Anal. Toxicol. 27 (October (7)) (2003) 429–439.
[13] T. Biermann, B. Schwarze, B. Zedler, R. Betz, On-site testing of illicit drugs: the use of the drug-testing device "Toxiquick®", Forensic Sci. Int. 143 (June (1)) (2004) 21–25.
[14] S.L. Kacinko, A.J. Barnes, I. Kim, E.T. Moolchan, L. Wilson, G.A. Cooper, C. Reid, D. Baldwin, C.W. Hand, M.A. Huestis, Performance characteristics of the Cozart® RapiScan oral fluid drug testing system for opiates in comparison to ELISA and GC/MS following controlled codeine administration, Forensic Sci. Int. 141 (April (1)) (2004) 41–48.
[15] D.J. Crouch, J.M. Walsh, R. Flegel, L. Cangianelli, J. Baudys, R. Atkins, An evaluation of selected oral fluid point-of-collection drug-testing devices, J. Anal. Toxicol. 29 (May–June (4)) (2005) 244–248.
[16] P. Kintz, W. Bernhard, M. Villain, M. Gasser, B. Aebi, V. Cirimele, Detection of cannabis use in drivers with the Drugwipe device and by GC–MS after Intercept® device collection, J. Anal. Toxicol. 29 (October (7)) (2005) 724–727.
[17] J.M. Walsh, D.J. Crouch, J.P. Danaceau, L. Cangianelli, L. Liddicoat, R. Adkins, Evaluation of ten oral fluid point-of-collection drug-testing devices, J. Anal. Toxicol. 31 (January–February (1)) (2007) 44–54.
[18] L. Wilson, J. Ahmed, C. Hand, G. Cooper, R. Smith, Evaluation of a rapid oral fluid point-of-care test for MDMA, J. Anal. Toxicol. 31 (2) (2007).
[19] D.J. Crouch, J.M. Walsh, L. Cangianelli, O. Quintela, Laboratory evaluation and field application of roadside oral fluid collectors and drug testing devices, Ther. Drug Monit. 30 (2) (2008) 188–195.
[20] A. Pehrsson, T. Gunnar, C. Engblom, H. Seppä, A. Jama, P. Lillsunde, Roadside oral fluid testing: comparison of the results of Drugwipe 5 and Drugwipe benzodiazepines on-site tests with laboratory confirmation results of oral fluid and whole blood, Forensic Sci. Int. 175 (March (2–3)) (2008) 140–148.
[21] P. Kintz, B. Brunet, D.F. Muller, W. Serra, M. Villain, V. Cirimele, P. Mura, Evaluation of the Cozart DDSV test for cannabis in oral fluid, Ther. Drug Monit. 31 (February (1)) (2009) 131–134.
[22] S.M.R. Wille, N. Samyn, M. del Mar Ramírez-Fernández, G. De Boeck, Evaluation of on-site oral fluid screening using Drugwipe-5+®, RapidSTAT® and Drug Test 5000® for the detection of drugs of abuse in drivers, Forensic Sci. Int. 198 (May (1)) (2010) 2–6.
[23] K. Langel, C. Engblom, A. Pehrsson, T. Gunnar, K. Ariniemi, P. Lillsunde, Drug testing in oral fluid: evaluation of sample collection devices, J. Anal. Toxicol. 32 (July–August (6)) (2008) 393–401.
[24] A.S. Goessaert, K. Pil, J. Veramme, A. Verstraete, Analytical evaluation of a rapid on-site oral fluid drug test, Anal. Bioanal. Chem. 396 (April (7)) (2010) 2461–2468.
[25] T. Blencowe, A. Pehrsson, P. Lillsunde (Eds.), Analytical evaluation of oral fluid screening devices and preceding selection procedures. DRUID report, 2010.
[26] K. Pil, F.M. Esposito, A.G. Verstraete, External quality assessment of multi-analyte chromatographic methods in oral fluid, Clin. Chim. Acta 411 (15–16) (2010) 1041–1045.
[27] A. Agresti, B.A. Coull, Approximate is better than "Exact" for interval estimation of binomial proportions, Am. Stat. 52 (1998) 119–126.
[28] K. Pil, S. Van Heule, A.S. Goessaert, J. Veramme, A. Verstraete, False positive results for phencyclidine on on-site oral fluid test Varian Oralab 6 caused by venlafaxine and o-desmethylvenlafaxine, Ann. Toxicol. Anal. 21 (S1) (2009) 87.