Accident Analysis and Prevention 37 (2005) 852–861
Modeling drug detection and diagnosis with the ‘drug evaluation and classification program’ Edna Schechtman, David Shinar ∗ Industrial Engineering and Management, Ben Gurion University of the Negev, Ben Gurion Blvd, Beer Sheva 84105, Israel Received 16 August 2004; received in revised form 4 April 2005; accepted 4 April 2005
Abstract In this study, we propose formal models and algorithms to detect drug impairment and identify the impairing drug type, on the basis of data obtained by a Drug Evaluation and Classification (DEC) investigation. The DEC program relies on measurements of vital signs and observable signs and symptoms. A formal model, based on data collected by police officers trained to detect and identify drug impairments, yielded sensitivity levels greater than 60% and specificity levels greater than 90% for impairments caused by cannabis, alprazolam, and amphetamine. For codeine, with a specificity of nearly 90% the sensitivity was only 20%. Using logistic regression, the formal model was much more accurate than the trained officers in identifying impairments from cannabis, alprazolam, and amphetamine. Both the formal model and the officers were quite poor in identifying codeine impairment. In conclusion, the joint application of the DECP procedures with the formal model is useful for drug detection and identification. © 2005 Elsevier Ltd. All rights reserved. Keywords: Drug evaluation and classification program; Drugs and driving; Drug impairment
∗ Corresponding author. Tel.: +972 8 647 2215; fax: +972 8 647 2958. E-mail address: [email protected] (D. Shinar).
0001-4575/$ – see front matter © 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.aap.2005.04.003

1. Introduction

The use of drugs is a significant traffic safety issue, since multiple epidemiological surveys indicate that some impaired drivers have drugs in their blood. To combat this, the Los Angeles Police Department, the International Association of Chiefs of Police, and the National Highway Traffic Safety Administration developed in the early 1980s a Drug Evaluation and Classification Program (DECP). The program trains police officers (certified as Drug Recognition Experts, DREs) to detect and identify drugs used by suspects by interviewing them, giving them a series of behavioral assessments, and evaluating the observed signs and symptoms. With the above skills, they are presumed to be able to (a) detect impairment from drugs, and (b) classify the type of drug as belonging to one of seven categories. The categories are: CNS stimulants, CNS depressants, narcotic/analgesics, phencyclidine (PCP), cannabis, hallucinogens, and inhalants. In a separate paper (Shinar and Schechtman, 2005), we analyzed the accuracy of the DRE assessments when the only cues they can rely on are the observable signs and symptoms, and reviewed some of the relevant literature.

Over the years the DECP has become better known and has often been cited as an exemplary program that enables the identification of drug impairment on the basis of a combination of an interview and observable signs and symptoms. The DECP has been officially sanctioned in several countries, such as parts of Canada, Australia, Norway and Sweden (Senate of Canada, 2002), and as of early 2005, in 37 states in the US (Governors Highway Safety Association, 2005). Despite its widening use, to the best of our knowledge, the DECP has never been subjected to a serious effort to find an optimal way of combining the information obtained from the multiple signs and symptoms observed as part of the DRE’s information gathering process, in order to diagnose drug impairment based on the signs and symptoms.
E. Schechtman, D. Shinar / Accident Analysis and Prevention 37 (2005) 852–861
The DECP does provide a matrix of signs and symptoms that may be associated with different drug impairments, but the relative weights (and even the absolute importance) that should be assigned to these signs have not been determined to date.

In this study, we propose formal models and algorithms to detect drug impairment and identify the impairing drug type, on the basis of data obtained by a Drug Evaluation and Classification (DEC) investigation. The DEC program (DECP) includes a procedure to detect and diagnose major drug categories based on observable signs and symptoms and interviews with the impaired persons. This study used data limited to observed signs and symptoms and suggested models for identifying drugs belonging to one of the following four categories: (1) cannabis, represented by marijuana cigarettes; (2) depressant, represented by alprazolam; (3) narcotic analgesics, represented by codeine; and (4) stimulant, represented by amphetamine. The prediction power of the model was compared to detection performance and drug diagnoses made by experienced drug recognition experts (DREs), using stepwise logistic regression models that related drug impairment to the signs and symptoms noted on the DECP forms.

In two respects, this study was conservative and partial. First, the study design is particularly severe in its criteria relative to the naturalistic setting in which DREs typically detect and identify drug impairment. Typically, the DRE has access to the arresting officer’s report and therefore knows the conditions under which the suspect was arrested, as well as the drugs that are most prevalent in that environment, and therefore the prior probabilities of the different drugs are relatively well known. Second, in the complete DECP procedure, in addition to the observable signs and symptoms, the DRE also interviews the subject. In the course of this interview the DRE may obtain additional significant information as well as an actual admission of drug use.
Despite these differences, the ‘limited’ scope studied here is important because the scientific validity of the DECP relies primarily on the value of the physical evidence. Therefore, it was important to see to what extent the physical signs and symptoms by themselves can account for specific drug impairments. One recent study that attempted to evaluate the utility of the signs and symptoms as interpreted in the coding procedures of the DECP, and the drug assessment, did indicate some relationships between the two. Unfortunately, the data set was biased (involving only cases with confirmed drug presence, as sampled from impaired drivers, thus tending to reflect whatever biases the officers might already have), and the statistics performed on the data were rudimentary and did not include any statistical tests (Smith et al., 2002).

The goals of this evaluation were limited to (1) assessing the net value of the observable signs and symptoms for identifying drug impairment, and (2) providing DREs with a formal decision support tool when they make their assessment.
2. Method

2.1. Subjects, procedures, and study design

The analysis conducted in this report was based on data collected as part of a larger effort, and the data collection method is described in detail in other papers (see Heishman et al., 1996; Shinar et al., 2005). In brief, the subjects were 48 self-admitted regular users of the study drugs (codeine for users of heroin, morphine, or prescription opiates; marijuana for users of hashish; alprazolam for users of any benzodiazepine, barbiturates, or alcohol; d-amphetamine for users of cocaine) in good health. In a double-blind design, each subject was assigned to one of the four drugs. In the course of six sessions, each subject participated in two sessions with a high dose, two sessions with a low dose, and two sessions with a placebo only.

Although in reality at any session the subject was under the influence of one drug only or placebo, the DREs were told that the subjects might be under the influence of none, one, or a combination of two or more drugs. They were further told that the drugs could belong to any of the categories they were trained to identify except hallucinogens and inhalants. Consequently, they could have expected impairments from ethanol, CNS depressants, CNS stimulants, PCP, narcotic analgesics, cannabis, and any combinations of these drugs. The DECP guidelines allow the DREs to note up to two drug categories, and if one of them is correct the DRE’s conclusion is considered correct.

The evaluation included examination of vital signs, psychophysical tests for motor control, nystagmus, and pupil response to light. A detailed description of the DECP testing procedure is provided elsewhere (Kosnoski et al., 1998; NHTSA, 1991; Smith et al., 2002; Shinar et al., 2005). The DRE’s observations are all recorded on a structured form called the Drug Influence Evaluation Form.
For the purpose of this study, the DREs were not allowed to interview the subjects and their conclusions had to rely strictly on the signs and symptoms they observed in the course of the physical exam. To assist the DREs in their interpretation of signs and symptoms, the DECP provides them with a ‘matrix’ that specifies the patterns of signs and symptoms (vital signs and pupil response to light) associated with impairments from each of the seven drug categories. The seven signs and symptoms included in the DRE matrix relate to nystagmus, convergence, pupil size, pupil reaction to light, pulse rate, blood pressure, and body temperature. Although the complete set of 6 evaluations was completed only for the amphetamine-dosed group, the total number of sessions with each drug and each level of each drug was approximately the same, varying from a low of 23 sessions to a high of 28 sessions. A total of 39 DREs participated in the study. The rotation of the DREs did not match the schedule of the subjects and consequently each subject was typically tested by more than one DRE and each DRE evaluated more than one subject.
2.2. Analytical methods

The primary purpose of the analyses reported below was to derive models that would predict specific drug impairments, based on specific signs and symptoms. Since the goal of this evaluation was to provide the best prediction possible from the recorded signs and symptoms (rather than to describe the relationships among all measures), stepwise logistic regressions (rather than multiple logistic regressions) were run on each of the four drugs. The dependent variable was binary: 1 if a person is drug impaired, and 0 otherwise. The predictor variables used in the regression analyses were the signs and symptoms listed in the DRE ‘matrix’ plus other tests that are expected to reflect drug impairment according to the DECP. These include effects of selected drug categories on muscle tone (possibly flaccid for narcotic analgesics), speech (slowed and raspy for narcotic analgesics), and reddening of the eye sclera (for cannabis). Consequently, the analysis below tested the relationship between the signs observed and recorded by the DREs and the actual drug administered. Finally, since the DECP included four psychophysical tests, these were also included in the regression models (SAS, 1995).

The outcome for each regression was the set of estimated probabilities, $\hat{P}$s, that the subjects were not drug impaired (i.e., placebo-dosed), based on signs and symptoms chosen separately for each drug. Thus, each subject had four $\hat{P}$-values, one for each drug. When applied to individual observations, the logistic regressions derived from the discrimination between the placebo and dosed conditions yielded a distribution of $\hat{P}$ (the estimated probability that a person is not drugged). These estimated probabilities were then used to calculate the concordance and discordance percentages, by making all possible pairwise comparisons between all the drug-dosed subjects and all the placebo-dosed subjects.
A concordance for each comparison was noted whenever the regression score (in terms of $\hat{P}$, the probability of not being impaired) of the drug-dosed subject was lower than that of the placebo-dosed subject.

Another way of evaluating the strength of the regression model is to determine the sensitivity and specificity that it can provide. This is done by comparing the actual dosing to the model-based classification. The usual default in assignment of a subject to one of the two categories in a logistic regression is to select a cut-off probability of $\hat{P} = 0.5$. Then, whenever $\hat{P} > 0.5$, the subject is classified as not drugged. In the present analysis, the $\hat{P}$-values were distributed over a large range and some subjects obtained $\hat{P}$-values slightly above or below the decision cut-off level of $\hat{P} = 0.5$, making the allocation in some cases questionable. Given the overall levels of sensitivity and specificity obtained from these data, it was decided to adopt a more conservative rule. According to this rule, subjects with $\hat{P}$-values close to 0.5 ($0.3 < \hat{P} < 0.7$) are considered ‘questionable’, and become candidates for a retest. While this approach may reduce the sensitivity of the test (by letting some subjects pass as questionable), it does reduce the number of false alarms. This is particularly important for a drug-screening device where the need to reduce false alarms is more important than the need to increase correct detections. Furthermore, with the present strategy, a person obtaining an estimated probability around the $\hat{P} = 0.5$ level would be re-tested rather than be classified erroneously.

The DECP and its associated matrix are not always specific as to how performance on the different tests is to be scored. Instead, they usually indicate that performance may be above or below normal, slow or fast, etc. Unfortunately, the ‘normal’ range provided for most measures is so liberal that it is difficult to apply absolute categories of fast, slow, etc. Instead, the validation criterion here was based on whether or not performance in the drug-impaired condition was significantly above or significantly below that of the performance in the placebo condition.

The proposed procedure does not provide an immediate identification of the impairing drug category. Instead, once the DRE feeds the signs and symptoms data into a computer, the recommended procedure is to apply the data to each of the four category-specific models. The models that yield probabilities of being drug-impaired ($1 - \hat{P}$) greater than the predetermined threshold (0.7 in the current analysis) indicate the best candidates for the impairing drugs. This whole procedure can be easily automated as a decision support system, so that once the data have been entered, the DRE is provided with the $\hat{P}$-values of each of the drug categories evaluated, as is often done in computer-assisted medical diagnosis.

Separate analyses of the placebo versus high-dose only and the placebo versus low dose and high dose together yielded similar results. Therefore, to increase the power of the statistical tests we only report the results obtained from combined dose levels.
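The three-way decision rule described above can be illustrated with a minimal Python sketch. The function names are hypothetical and not part of the study; the treatment of values exactly equal to 0.3 or 0.7 is an assumption, chosen so that only 0.3 < P-hat < 0.7 counts as undecided, as the text states.

```python
from math import exp


def placebo_probability(logit: float) -> float:
    """Convert a linear predictor ln(P/(1-P)) into P-hat, the estimated
    probability of being in the placebo (not impaired) condition.
    Written in a numerically stable form to avoid overflow."""
    if logit >= 0:
        return 1.0 / (1.0 + exp(-logit))
    z = exp(logit)
    return z / (1.0 + z)


def classify(p_hat: float, lower: float = 0.3, upper: float = 0.7) -> str:
    """Three-way rule: values strictly between the cut-offs are 'undecided'
    (candidates for a retest). Boundary handling is an assumption."""
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must be a probability")
    if p_hat >= upper:
        return "unimpaired"
    if p_hat <= lower:
        return "impaired"
    return "undecided"


if __name__ == "__main__":
    for logit in (-2.0, 0.1, 2.0):
        p = placebo_probability(logit)
        print(f"logit={logit:+.1f}  P-hat={p:.2f}  ->  {classify(p)}")
```

Widening the undecided band trades correct detections for fewer false alarms, which is the trade-off the text argues is appropriate for a screening device.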
Statistical tests (ANOVAs for continuous measures, and Chi squares for categorical measures) showed that under the placebo condition there were no significant differences among the groups. We, therefore, decided that for all tests of significance for a drug effect, the control condition would be the placebo condition for the subjects from all four groups; i.e., regardless of the drug category to which the subject was assigned.
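The concordance and discordance percentages used in this section come from all pairwise comparisons between drug-dosed and placebo-dosed observations. The following Python sketch shows one way to compute them; the function name and the example values are hypothetical, and ties are taken to be exactly equal probabilities, which is an assumption (a statistical package may group near-equal values).

```python
def concordance_summary(dosed_p_hats, placebo_p_hats):
    """Compare every drug-dosed P-hat with every placebo-dosed P-hat.

    P-hat is the estimated probability of NOT being impaired, so a pair is
    concordant when the dosed observation scores lower than the placebo one.
    """
    if not dosed_p_hats or not placebo_p_hats:
        raise ValueError("both groups need at least one observation")
    concordant = discordant = ties = 0
    for dosed in dosed_p_hats:
        for placebo in placebo_p_hats:
            if dosed < placebo:
                concordant += 1
            elif dosed > placebo:
                discordant += 1
            else:
                ties += 1
    pairs = len(dosed_p_hats) * len(placebo_p_hats)
    return {
        "pairs": pairs,
        "concordant": concordant,
        "discordant": discordant,
        "ties": ties,
        "concordance_pct": 100.0 * concordant / pairs,
        "discordance_pct": 100.0 * discordant / pairs,
        "ties_pct": 100.0 * ties / pairs,
    }


if __name__ == "__main__":
    # Made-up P-hat values for illustration only.
    summary = concordance_summary([0.1, 0.4, 0.8], [0.9, 0.4])
    print(summary)
```

The number of pairs is simply the product of the two group sizes, so 49 dosed and 100 placebo observations give 4900 comparisons.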
3. Results The following analyses compare the effects of the four different drugs – relative to the placebo condition – on different signs and symptoms recorded by the DRE on the Drug Influence Evaluation Form. First, the power of specific signs and symptoms recorded by the DRE to predict drug impairment is evaluated. Then, the DECP guidelines for drug impairment are validated, and finally, models that would predict specific drug impairments are derived. The relationship between impairment on each of the drugs and each of the psychophysical tests, vital signs, and ocular measures was assessed by conducting Chi square analyses
on the categorical and ordinal measures, and t-tests on the interval measures.
3.1. Identification of cannabis impairment

Cannabis dosing was significantly associated with bloodshot eyes, slowed reaction to light, larger pupil size both in darkness and under direct light, and an elevated pulse rate that dropped in the course of the session.

Bloodshot eyes were much more common in the cannabis-dosed condition than in the placebo-dosed condition (Chi square = 10.4, p < 0.001). Under placebo 47.5% of the subjects had normal eyes, whereas under cannabis only 22.2% had normal eyes. In contrast, bloodshot eyes were noted for 73.3% of the cannabis cases versus only 44.4% of the placebo cases.

The reaction to light was more likely to be impaired among the cannabis-dosed subjects than among the placebo-dosed subjects. Seventy-nine percent of the placebo-dosed subjects had a ‘normal’ reaction to light, while only 55% of the dosed subjects had a normal reaction (Chi square = 9.38, p = 0.009). A significant effect was also obtained when the reaction to light was coded as an interval variable (1.23 for placebo versus 1.49 for cannabis-dosed; t = −2.97, d.f. = 148, p = 0.004).

Pupil sizes in the dark and in direct light were larger in the cannabis condition than in the placebo condition. In the dark, the diameters were 7.38 mm versus 6.88 mm (t = −3.13, d.f. = 118, p = 0.002), and under direct light, they were 4.13 mm versus 3.57 mm (t = −3.39, d.f. = 148, p < 0.001), respectively.

The pulse rate at the start of the session was significantly lower in the placebo condition than in the cannabis-dosed condition: 67.9 beats/min versus 84.8 beats/min (t = −7.66, d.f. = 63, p < 0.001). Over the course of the session, there was a significant drop in pulse rate in the cannabis condition (−5.7 beats/min), but none in the placebo (−0.3 beats/min; t = 3.70, d.f. = 55, p < 0.001).

Stepwise logistic regression analyses (without the psychophysical tests that did not add any explanatory power) yielded three significant variables: average pulse rate, pupil reaction to light, and pupil size under direct light. The following is the regression model, based on the three variables, and its associated levels of prediction:

$$\ln\frac{\hat{P}}{1-\hat{P}} = 14.186 - 0.532 \times (\text{pupil diameter in direct light}) - 1.056 \times (\text{reaction to light}) - 0.135 \times (\text{average pulse rate}),$$

where $\hat{P}$ is the estimated probability of being in the placebo condition. With a total of 4900 pairs of comparisons, the concordance level was 86.7% and the discordance level was 12.9%, with 0.4% ties. When we classified all subjects with $0.3 < \hat{P} < 0.7$ as ‘undecided’, this regression model yielded the frequencies listed in Table 1.

Table 1
Allocation of subjects to cannabis-impaired vs. unimpaired, using a logistic regression model based on cannabis-dosed and placebo-dosed subjects

                    Predicted condition
True condition      Unimpaired   Undecided   Cannabis-impaired   Total
Placebo             77           21          2                   100
Cannabis            12           13          24                  49
Total               89           34          26                  149

This function provides:
Sensitivity     24/49     0.49
Specificity     77/100    0.77
False alarms    2/100     0.02
Misses          12/49     0.24
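Because the classification has three outcomes, the four rates reported under Table 1 keep the undecided cases in the denominators. A short Python sketch of that arithmetic follows; the function name is hypothetical, and the counts are those of Table 1.

```python
def screening_rates(placebo_row, dosed_row):
    """Each row is (unimpaired, undecided, impaired) counts for one true
    condition. Undecided cases stay in the denominators, so sensitivity
    and misses need not sum to 1."""
    p_unimpaired, p_undecided, p_impaired = placebo_row
    d_unimpaired, d_undecided, d_impaired = dosed_row
    n_placebo = sum(placebo_row)
    n_dosed = sum(dosed_row)
    return {
        "sensitivity": d_impaired / n_dosed,       # dosed, flagged impaired
        "specificity": p_unimpaired / n_placebo,   # placebo, cleared
        "false_alarms": p_impaired / n_placebo,    # placebo, flagged impaired
        "misses": d_unimpaired / n_dosed,          # dosed, cleared
        "undecided": (p_undecided + d_undecided) / (n_placebo + n_dosed),
    }


if __name__ == "__main__":
    # Counts taken from Table 1 (cannabis model).
    rates = screening_rates((77, 21, 2), (12, 13, 24))
    for name, value in rates.items():
        print(f"{name:13s}{value:.2f}")
```

The same function applies unchanged to the counts of the other drug-specific tables.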
In summary, the three variables (pupil diameter in direct light, reaction to light, and average pulse rate) provide a relatively good prediction of cannabis dosing, with a very low rate of false alarms and an approximately 50% rate of correct identifications of drug impairment. Furthermore, this good performance came with an acceptable rate (23%) of undecided cases.

3.2. Identification of alprazolam (depressant) impairment

Alprazolam dosing was significantly associated with poorer performance on all four tests of psychophysical performance, slowed reaction to light, nystagmus, smaller pupil size in darkness and under direct light, raised pulse rate (which dropped over the session), and lower temperature.

As would be expected for a depressant (and consistent with the typical results obtained from alcohol impairment), alprazolam-dosed subjects performed significantly worse on the one-leg-stand (total score of 1.65 for placebo versus 4.61 for alprazolam; t = −6.62, d.f. = 73.3, p < 0.001); on the Romberg balance test their average sway was twice as large as in the placebo-dosed condition (1.3 degrees for placebo versus 2.8 degrees for alprazolam; t = −3.33, d.f. = 56.8, p = 0.002); in the walk-and-turn test they had more points deducted (1.71 versus 4.04, t = −6.55, d.f. = 69.3, p < 0.001); and they lost more points in the finger-to-nose test (1.23 versus 1.90, t = −2.43, d.f. = 72.9, p = 0.017).

Reaction to light was significantly associated with alprazolam dosing. In the alprazolam-dosing condition, slowed or no pupil reaction to light was observed in 45% of the cases, whereas in the placebo condition it was observed in only 21% of the cases (Chi square = 10.14, p = 0.006). A significant effect was also obtained when the reaction to light was coded as an interval variable (1.23 for placebo versus 1.53 for alprazolam-dosed; t = −2.92, d.f. = 72.9, p = 0.005).
The likelihood of observing nystagmus in the alprazolam-dosed condition was twice as high as in the placebo condition (53.1% versus 25.7%, Chi square = 10.87, p = 0.001). A more detailed analysis with another measure of nystagmus, based on the total number of indicators out of a maximum of six, also showed a highly significant difference (Chi square = 22.01, p = 0.001).

Pupil sizes, both in the dark and under direct light, were slightly but significantly smaller in the alprazolam condition than in the placebo condition. In the dark, the diameters were 6.30 mm versus 6.88 mm (t = 2.98, d.f. = 147, p = 0.003), and under direct light the diameters were 3.17 mm versus 3.57 mm (t = 2.73, d.f. = 147, p = 0.007), respectively.

Pulse rate was higher under alprazolam dosing, being 75.5 beats/min versus 67.9 beats/min in the placebo condition (t = −5.24, d.f. = 149, p < 0.001). Furthermore, over the course of the session the pulse rate of alprazolam-dosed subjects continued to drop (−2.37 beats/min), while the pulse rate of the placebo-dosed subjects remained essentially unchanged (−0.3 beats/min; t = 2.01, d.f. = 64.5, p = 0.049).

Temperature in the placebo condition was significantly higher than in the alprazolam-dosed condition (97.6 versus 96.9; t = 5.54, d.f. = 148, p < 0.001).

Stepwise logistic regression analysis yielded five significant measures: convergence, average pulse rate, temperature, horizontal gaze nystagmus (based on the sum test) and the walk-and-turn test. The following is the regression model based on these five significant variables, and its associated levels of prediction:

$$\ln\frac{\hat{P}}{1-\hat{P}} = -181.4 + 0.60 \times (\text{convergence}) - 0.16 \times (\text{average pulse rate}) + 1.99 \times (\text{temperature}) - 0.25 \times (\text{HGN-sum}) - 0.54 \times (\text{walk-and-turn}),$$

where $\hat{P}$ is the estimated probability of being in the placebo condition. With a total of 4462 pairs of comparisons, the concordance level was 92.1% and the discordance level was 7.8%, with 0.1% ties. When we classified all cases with $0.3 < \hat{P} < 0.7$ as ‘undecided’, this regression model yielded the frequencies listed in Table 2.

Table 2
Allocation of subjects to alprazolam-impaired vs. not impaired, using a logistic regression model based on alprazolam-dosed and placebo-dosed subjects

                    Predicted condition
True condition      Unimpaired   Undecided   Alprazolam-impaired   Total
Placebo             82           10          5                     97
Alprazolam          10           9           27                    46
Total               92           19          32                    143

This function provides:
Sensitivity     27/46    0.59
Specificity     82/97    0.85
False alarms    5/97     0.05
Misses          10/46    0.22
In summary, the five variables (convergence, average pulse rate, temperature, HGN-sum, and walk-and-turn) provide quite a good prediction of alprazolam dosing, with a very low rate of false alarms (5%), and a good rate of correct identifications of impairment (59%). Furthermore, this good performance came with a fairly low rate (13%) of undecided cases.

3.3. Identification of codeine (narcotic/analgesic) impairment

Codeine dosing was significantly associated only with a slightly smaller pupil size (both in the dark and in direct illumination) and a very slightly lower temperature than under placebo dosing.

Pupil sizes in the dark and under direct light were slightly but significantly smaller in the codeine condition than in the placebo condition: in the dark the diameters were 6.28 mm versus 6.88 mm for codeine-dosed and placebo-dosed subjects, respectively (t = 2.78, d.f. = 74.5, p = 0.007). Under direct light, the pupils were, as expected, smaller, but the difference between the two conditions remained, with 3.08 mm under codeine dosing versus 3.57 mm under placebo dosing (t = 3.35, d.f. = 148, p = 0.001).

Temperature in the placebo condition was very slightly but significantly higher than in the codeine-dosed condition (97.6 versus 97.4; t = 2.40, d.f. = 126.6, p = 0.018).

Stepwise logistic regression analyses (without the psychophysical tests that did not add any explanatory power) yielded only two predictive measures: pupil diameter in direct light, and temperature. The following is the regression model based on these two significant variables and its associated levels of prediction:

$$\ln\frac{\hat{P}}{1-\hat{P}} = -61.53 + 0.74 \times (\text{pupil diameter in direct light}) + 0.61 \times (\text{temperature}),$$

where $\hat{P}$ is the estimated probability of being in the placebo condition. With a total of 4559 pairs of comparisons, the concordance level was 67.1% and the discordance level was 30.8%, with 2.1% ties. When we classified all cases with $0.3 < \hat{P} < 0.7$ as ‘undecided’, this regression model yielded the frequencies listed in Table 3.
Table 3
Allocation of subjects to codeine dosed vs. not dosed, using a logistic regression model based on codeine-dosed and placebo-dosed subjects

                    Predicted condition
True condition      Unimpaired   Undecided   Narcotic/anal. impaired   Total
Placebo             53           43          1                         97
Codeine             1            32          14                        47
Total               54           75          15                        144

This function provides:
Sensitivity     14/47    0.30
Specificity     53/97    0.53
False alarms    1/97     0.01
Misses          1/47     0.02

In summary, the two variables (pupil diameter in direct light and temperature) are not very sensitive to codeine dosing. However, using the 0.3 and 0.7 cut-off levels, we still obtained a very low rate of false alarms (2%) and a moderate rate (30%) of correct identifications of codeine impairments. These acceptable rates of performance came at the price of an approximately 50% rate of ‘undecided’ cases (i.e., in need of a retest).

3.4. Identification of amphetamine impairment

Amphetamine dosing was significantly associated with slightly poorer convergence, but with a marked increase in pulse rate (that remained elevated throughout the session), and a marked increase in systolic and diastolic blood pressures.

Convergence was categorized as normal for 44.4% of the cases, partial failure for 35.3%, total failure of one eye for 10.5%, and total failure of both eyes for 9.8%. There was a marginal difference between the two conditions in convergence when analyzed as a categorical variable, but it was in the opposite-than-expected direction, with 40.2% of the subjects in the placebo condition having normal convergence and 52.9% of the amphetamine-dosed subjects having normal convergence (Chi square = 7.40, p = 0.06). Only when analyzed as an interval variable was convergence significantly poorer in the amphetamine-dosed condition than in the placebo-dosed condition (3.4 versus 3.0; t = −2.74, d.f. = 130.6, p = 0.007).

Pulse rate under amphetamine dosing remained high throughout the evaluation session. Average pulse rate under amphetamine dosing was 80.8 beats/min, whereas under placebo dosing it was 67.9 beats/min (t = −9.25, d.f. = 151, p < 0.001).

Blood pressure under amphetamine dosing was significantly higher than under placebo dosing. This was manifested in the systolic blood pressure (128 versus 120; t = −4.25, d.f. = 151, p < 0.001), in the diastolic blood pressure (80 versus 74; t = −4.14, d.f. = 151, p < 0.001), and in the total blood pressure (sum of the two; 209 versus 194, t = −4.45, d.f. = 79.4, p < 0.001).

Stepwise logistic regression analyses (with or without the inclusion of the data from the four psychophysical tests) yielded only two significant predictors: average pulse rate and the total blood pressure. The following is the regression model based on the two variables and its associated levels of prediction:

$$\ln\frac{\hat{P}}{1-\hat{P}} = 24.92 - 0.18 \times (\text{average pulse rate}) - 0.05 \times (\text{total blood pressure}),$$

where $\hat{P}$ is the estimated probability of being in the placebo condition. With a total of 4656 pairs of comparisons, the concordance level was 90.7% and the discordance level was 9.1%, with 0.2% ties. When we classified all cases with $0.3 < \hat{P} < 0.7$ as ‘undecided’, this regression model yielded the frequencies listed in Table 4.

Table 4
Allocation of subjects to stimulant-impaired vs. not impaired, using a logistic regression model based on amphetamine-dosed and placebo-dosed subjects

                    Predicted condition
True condition      Unimpaired   Undecided   Stimulant-impaired   Total
Placebo             75           18          4                    97
Amphetamine         7            18          23                   48
Total               82           36          27                   145

This function provides:
Sensitivity     23/48    0.48
Specificity     75/97    0.77
False alarms    4/97     0.04
Misses          7/48     0.15
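The four category-specific models can be combined into the screening step proposed in Section 2.2, in which each model is evaluated and those with an impairment probability above 0.7 are reported as candidates. The Python sketch below uses the published coefficients; the dictionary keys, function names, and example measurements are hypothetical, and the example assumes the measures are coded as in the study (temperature in degrees Fahrenheit, interval-coded convergence and reaction to light).

```python
from math import exp

# Coefficients of ln(P/(1-P)), where P is the probability of the placebo
# condition. Keys are illustrative names for the study's predictors.
MODELS = {
    "cannabis": {"intercept": 14.186, "pupil_light_mm": -0.532,
                 "reaction_to_light": -1.056, "avg_pulse": -0.135},
    "alprazolam": {"intercept": -181.4, "convergence": 0.60,
                   "avg_pulse": -0.16, "temperature_f": 1.99,
                   "hgn_sum": -0.25, "walk_and_turn": -0.54},
    "codeine": {"intercept": -61.53, "pupil_light_mm": 0.74,
                "temperature_f": 0.61},
    "amphetamine": {"intercept": 24.92, "avg_pulse": -0.18,
                    "total_bp": -0.05},
}


def placebo_probability(logit: float) -> float:
    """Numerically stable inverse logit."""
    if logit >= 0:
        return 1.0 / (1.0 + exp(-logit))
    z = exp(logit)
    return z / (1.0 + z)


def impairment_probabilities(measurements: dict) -> dict:
    """Return 1 - P-hat for each drug category."""
    result = {}
    for drug, coefs in MODELS.items():
        logit = coefs["intercept"] + sum(
            weight * measurements[name]
            for name, weight in coefs.items() if name != "intercept")
        result[drug] = 1.0 - placebo_probability(logit)
    return result


def candidate_drugs(measurements: dict, threshold: float = 0.7) -> list:
    """Drug categories whose impairment probability exceeds the threshold."""
    probs = impairment_probabilities(measurements)
    return [drug for drug, p in probs.items() if p > threshold]


if __name__ == "__main__":
    # Illustrative input built from the placebo-condition means in the text.
    placebo_means = {"pupil_light_mm": 3.57, "reaction_to_light": 1.23,
                     "avg_pulse": 67.9, "convergence": 3.0,
                     "temperature_f": 97.6, "hgn_sum": 2.62,
                     "walk_and_turn": 1.71, "total_bp": 193.5}
    print(impairment_probabilities(placebo_means))
    print(candidate_drugs(placebo_means))
```

Because several models share predictors such as pulse rate, a single elevated measure can flag more than one category, which is consistent with the paper's description of the output as a list of candidates rather than a single diagnosis.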
In summary, the two variables (average pulse rate and total blood pressure) provide a good prediction of amphetamine dosing, with a very low rate of false alarms (4%), and a moderate rate (48%) of correct identifications. This comes at a cost of 25% undecided cases that must be re-evaluated.

3.5. Validation of the DECP diagnostic guidelines

The DECP diagnostic guidelines are summarized in the form of a matrix that specifies which signs and symptoms should be associated with each of the drug categories. It was therefore interesting to compare the DECP matrix with the results obtained in this study. The comparisons between the DECP guidelines and the effects obtained in this validation study are summarized in Table 5. The table contains the average score obtained for each drug for each of the measures evaluated, and an indication of whether or not that average score was significantly different from the
placebo condition (based on ANOVAs and Dunnett’s post hoc tests for interval variables, and Chi square tests for categorical measures). Starred entries indicate that the mean for that drug is significantly different (at α = 0.05) from the mean in the placebo condition. The ‘Up’ and ‘Dn’ codes next to some of the entries are the drug impairment indicators noted in the DECP matrix. They refer to scores that are expected to be higher (Up) relative to no impairment or lower (Dn) relative to no impairment. When a symptom is simply supposed to be present (such as nystagmus) it is denoted by a ‘Yes’. Cells marked in bold indicate an agreement between the DECP matrix and the results obtained here.

Table 5
Mean effects of each drug on ocular and vital signs (see text for explanations)

Measure                                 Cannabis     Alprazolam   Codeine      Amphetamine   Placebo   ANOVA
Nystagmus+ (sum score)                  2.78         3.80* Yes    2.39         2.51          2.62      F(4,295) = 8.11, p < 0.001
Pupil diameter (in dark)                7.37* Up     6.30*        6.28* Dn     6.89 Up       6.88      F(4,290) = 9.48, p < 0.001
Pupil diameter (in light)               4.13* Up     3.17*        3.08* Dn     3.73 Up       3.57      F(4,293) = 12.45, p < 0.001
Pupil diameter (dark–light)             3.24         3.11         3.20         3.14          3.33      F(4,292) = 0.68, p = 0.61
Pulse average                           84.8* Up     75.5* Dn     66.2         80.8* Up      67.9      F(4,295) = 41.81, p < 0.001
Pulse (last − first)                    −5.67*       −2.37        −1.16        0.04          −0.32     F(4,295) = 7.69, p < 0.001
Blood pressure (systolic)               118.3 Up     117.1 Dn     123.1 Dn     128.4* Up     119.7     F(4,295) = 8.86, p < 0.001
Blood pressure (diastolic)              72.7 Up      75.1 Dn      72.5 Dn      80.3* Up      73.8      F(4,295) = 7.13, p < 0.001
Blood pressure (systolic + diastolic)   191.1 Up     192.2 Dn     195.6 Dn     208.6* Up     193.5     F(4,295) = 10.24, p < 0.001
Temperature                             97.5         97.0*        97.4 Dn      97.7 Up       97.6      F(4,294) = 10.66, p < 0.001
RT to light (% slowed)                  44.9#        44.9 Yes     31.3 Yes     23.5 Yes      20.8      Chi2 = 15.13, p = 0.004
Convergence (% impaired)                71.4# Yes    77.6# Yes    53.1         47.1          59.8      Chi2 = 13.44, p = 0.009

* Significantly different from the placebo condition, based on Dunnett test with α = 0.05.
+ An analysis conducted on the dichotomous variable HGN revealed a similar effect, with 53% of the alprazolam-dosed subjects noted as having nystagmus vs. 26% of the placebo-dosed subjects (Chi square = 18.34, p < 0.001).
# There are no post hoc tests for Chi square, therefore only the most extreme differences from placebo are noted as significant.

From the table above, it can be seen that there is a partial agreement between the results obtained in this controlled drug dosing environment and the DECP matrix. Of the 29 expected impairment signs (indicated by ‘Up’, ‘Dn’, or ‘Yes’), 13 were confirmed in this study (starred and in bold). Of the 16 signs that were not confirmed, 15 were not significantly different from the placebo condition; one (pulse average) was significantly different from the placebo but in the opposite-than-expected direction. Four were significantly different from the placebo but not predicted to be so.

It is important to note, before making a judgmental evaluation of the adequacy of the DECP guidelines, that the DECP manual is very liberal with its definitions of the ‘normal’ ranges that it lists for the different signs. Thus, according to the DECP, pupil size under normal illumination
can be 3.0–6.5 mm; normal pulse rate is 60–90 beats/min; normal systolic blood pressure is 120–140 mmHg; diastolic blood pressure is 70–90 mmHg; total blood pressure (by extrapolation) can be 190–230 mmHg; and the temperature is 97.6–99.6 F. The DECP procedure includes the psychophysical tests, but it does not mention how performance is affected by these tests for the different drug impairments. However, since these tests have a good predictive validity for alcohol impairment, and they are conducted as a part of the DRE drug impairment evaluation, we also evaluated the four drug effects on the summary scores of the one leg stand test (OLSSUM), the Romberg balance test (SWAYAVG), the walk-and-turn test (WTSUM), and the finger-to-nose test (FTNSUM) (which is not part of the standardized sobriety field test). The results are summarized in Table 6. Although no specific predictions were made with respect to the psychophysical measures, it is interesting – and gratifying – to find that the only significant effects were obtained for alprazolam. Being a depressant, alprazolam should elicit responses similar to those observed for alcohol, and all of these tests were originally designed to detect alcohol impairment. The one test that was not sensitive to alprazolam was the finger-to-nose test. The one-leg-stand and the walk-andturn yielded both the highest levels of significance and also showed the greatest deterioration relative to placebo; with alprazolam-impaired subjects scoring approximately three times as bad as the placebo-dosed subjects.
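The point about the liberal 'normal' ranges can be made concrete with a short Python sketch that checks group means from Table 5 against the DECP ranges quoted above. The names `DECP_NORMAL_RANGES`, `TABLE5_MEANS`, and `outside_normal` are illustrative and not part of the DECP; only the numeric ranges and the means are taken from the text and Table 5, and the ranges are assumed here to be inclusive.

```python
# DECP 'normal' ranges as quoted in the text: (low, high), assumed inclusive.
DECP_NORMAL_RANGES = {
    "pulse_avg": (60.0, 90.0),       # beats/min
    "bp_systolic": (120.0, 140.0),   # mmHg
    "bp_diastolic": (70.0, 90.0),    # mmHg
    "bp_total": (190.0, 230.0),      # mmHg, by extrapolation
    "temperature": (97.6, 99.6),     # degrees Fahrenheit
}

# Group means copied from Table 5 (amphetamine and cannabis columns).
TABLE5_MEANS = {
    "amphetamine": {"pulse_avg": 80.8, "bp_systolic": 128.4,
                    "bp_diastolic": 80.3, "bp_total": 208.6,
                    "temperature": 97.7},
    "cannabis": {"pulse_avg": 84.8, "bp_systolic": 118.3,
                 "bp_diastolic": 72.7, "bp_total": 191.1,
                 "temperature": 97.5},
}


def outside_normal(means, ranges=DECP_NORMAL_RANGES):
    """Return the measures whose mean falls outside its 'normal' range."""
    return [name for name, value in means.items()
            if not (ranges[name][0] <= value <= ranges[name][1])]


amphetamine_flags = outside_normal(TABLE5_MEANS["amphetamine"])
cannabis_flags = outside_normal(TABLE5_MEANS["cannabis"])
print("amphetamine:", amphetamine_flags)
print("cannabis:", cannabis_flags)
```

Under these ranges none of the amphetamine means is flagged, although four of them (pulse average and the three blood pressure measures) differ significantly from placebo in Table 5. This is consistent with the caution expressed above about judging the guidelines against such wide ranges.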
Table 6
Drug effects on the psychophysical measures

| Measure | Cannabis | Alprazolam | Codeine | Amphetamine | Placebo | ANOVA |
|---|---|---|---|---|---|---|
| OLSSUM | 2.12 | 4.61* | 1.65 | 1.18 | 1.65 | F(4,295) = 22.28, p < 0.001 |
| SWAYAVG | 1.18 | 2.81* | 1.31 | 1.32 | 1.33 | F(4,295) = 6.79, p < 0.001 |
| WTSUM | 1.96 | 4.04* | 1.73 | 1.31 | 1.71 | F(4,295) = 19.69, p < 0.001 |
| FTNSUM | 1.42 | 1.90 | 1.37 | 1.26 | 1.23 | F(4,295) = 2.24, p = 0.06 |

* Significantly different from the placebo condition, based on Dunnett test with α = 0.05.
3.6. The added value of the regression models relative to the DREs' performance

To evaluate the advantage of the logistic regression models based on data derived from the DECP, relative to the performance of experienced DREs, the conclusions of the DREs who observed the subjects and recorded the data used in this analysis were compared to the predictions based on the logistic regression models. Since the DREs did not have the benefit of retesting the subjects in case of doubt, this analysis did not provide for an 'undecided' category. Instead, two different approaches were used. The first approach was to set the decision criterion for each drug impairment at P̂ = 0.50, so that whenever the subject's predicted probability of being unimpaired was below 0.50 the subject was judged impaired by the drug to which the regression function applied. The second approach was to base the decision criterion on a socially acceptable, conservative 'false alarm' error rate. The level chosen for the purpose of this analysis was 5%, which is the same as setting the acceptable type I error, α, at 0.05.

Table 7 lists the frequencies of correct identifications of impairment, correct identifications of unimpairment (by the specific drug), incorrect judgments of drug impairment, and incorrect judgments of unimpairment for each of the four drugs, as predicted by the logistic regression model with a cutoff value of P̂ = 0.50.

Table 7
The frequencies of predictions for each drug, using the logistic regression models in predicting drug impairment, with a decision criterion P̂ = 0.50

| Drug | Chi-square | Correct impairment | Correct unimpairment | False impairment | False unimpairment |
|---|---|---|---|---|---|
| Cannabis | 46.23** | 29 | 88 | 9 | 17 |
| Alprazolam | 52.30** | 31 | 88 | 9 | 15 |
| Codeine | 2.51 n.s. | 10 | 86 | 11 | 37 |
| Amphetamine | 54.86** | 32 | 89 | 8 | 16 |

n.s., not significant, p > 0.10.
** p < 0.001.

From the above frequency tables several measures of the association between the DECP logistic regression model predictions and the actual drug dosing were derived for each drug. The results are presented in Table 8, alongside the results obtained from the DREs' performance (from Shinar and Schechtman, 2005).

Table 8
The validity of the logistic regression models and the DREs' decisions in predicting drug impairment, using P̂ = 0.50 as a decision criterion (the DRE data are from Table 4 in Shinar and Schechtman, 2005)

| Drug | Logistic vs. DRE | Sensitivity (%) | Specificity (%) | False alarms (%) | Misses (%) | Phi correlations | Uncertainty coefficients |
|---|---|---|---|---|---|---|---|
| Cannabis | Logistic | 63 | 91 | 9 | 37 | 0.57 | 0.25 |
| Cannabis | DRE | 49 | 69 | 31 | 51 | 0.14 | 0.02 |
| Alprazolam | Logistic | 67 | 91 | 9 | 33 | 0.61 | 0.29 |
| Alprazolam | DRE | 47 | 80 | 20 | 53 | 0.23 | 0.05 |
| Codeine | Logistic | 21 | 89 | 11 | 79 | 0.13 | 0.01 |
| Codeine | DRE | 45 | 72 | 28 | 55 | 0.14 | 0.02 |
| Amphetamine | Logistic | 67 | 92 | 8 | 33 | 0.62 | 0.30 |
| Amphetamine | DRE | 10 | 91 | 9 | 90 | 0.01 | 0.00 |

The entries in Table 8 reveal a striking advantage of the quantitative approach, based on the combination of DECP signs and symptoms as recorded by the DRE on the Drug Influence Evaluation Form, for all drugs except codeine (narcotic/analgesic prediction). The summary measures of Phi correlations and uncertainty coefficients for cannabis, amphetamine, and alprazolam show a much higher level of association with the logistic regression models than with the DREs' decisions. In the case of codeine, the formal model and the DREs' decisions are approximately the same, with the performance of both being not much better than chance. For cannabis, alprazolam, and amphetamine, the sensitivity and specificity levels are higher and the misses and false alarms are lower with the formal logistic regression models than with the DREs' actual decisions (with the exception of the miss rate for cannabis, which is slightly lower for the DREs than the formal model: 31% versus 37%). In the case of codeine, the DREs' sensitivity is twice as high as that of the regression model, but that advantage comes at the cost of a false alarm rate that is also twice as high as that of the regression model.

When the decision criterion for the formal model was based on a false alarm rate of 5%, we obtained the frequencies listed in Table 9. Setting the false alarm rate at 5% implied that the frequency of false impairments is always five and the frequency of correct unimpairments is always 92. This is because the placebo condition included a total of 97 cases. The measures of association derived from these frequencies are presented in Table 10.

Table 9
The frequencies of predictions of each drug, using the logistic regression models in predicting drug impairment, with a false alarm rate = 5%

| Drug | Chi-square | Correct impairment | Correct unimpairment | False impairment | False unimpairment |
|---|---|---|---|---|---|
| Cannabis | 51.50** | 27 | 92 | 5 | 19 |
| Alprazolam | 54.56** | 28 | 92 | 5 | 18 |
| Codeine | 2.60 n.s. | 6 | 92 | 5 | 41 |
| Amphetamine | 45.89** | 26 | 92 | 5 | 22 |

n.s., not significant, p = 0.11.
** p < 0.001.

Table 10
The validity of the logistic regression models in predicting drug impairment, when the false alarm rate is set at 5%

| Drug | Sensitivity (%) | Specificity (%) | False alarms (%) | Misses (%) | Phi correlations | Uncertainty coefficients |
|---|---|---|---|---|---|---|
| Cannabis | 58.7 | 94.8 | 5.2 | 41.3 | 0.60 | 0.28 |
| Alprazolam | 60.9 | 94.8 | 5.2 | 39.1 | 0.62 | 0.30 |
| Codeine | 12.8 | 94.8 | 5.2 | 87.2 | 0.13 | 0.01 |
| Amphetamine | 54.2 | 94.8 | 5.2 | 45.8 | 0.56 | 0.24 |

From the results in Table 10, it is clear that when the false alarm rate is set to an acceptable level of 5%, the regression model still provides a rate of correct identifications of drug impairment slightly above 50% (54–61%) for all drugs except codeine. With a sensitivity to codeine impairment of only about 13%, the regression model is not a useful tool for identification of impairment from this narcotic/analgesic. Thus, it appears that even with the best predictive regression model, predicting codeine impairment is not significantly better than chance.
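The validity measures in Tables 8 and 10 follow arithmetically from the 2 × 2 frequencies in Tables 7 and 9. The Python sketch below illustrates this for the cannabis row of Table 7. The function name `validity_measures` and its argument names are illustrative, and the formulas are the standard textbook definitions of sensitivity, specificity, the Phi coefficient, and the uncorrected Pearson Chi-square for a 2 × 2 table; they are assumed here rather than quoted from the analysis, and the uncertainty coefficient is omitted.

```python
import math


def validity_measures(correct_imp, correct_unimp, false_imp, false_unimp):
    """Validity measures for a 2 x 2 table of predicted vs. actual impairment.

    Arguments are subject counts named after the columns of Tables 7 and 9.
    """
    dosed = correct_imp + false_unimp          # actually given the drug
    placebo = correct_unimp + false_imp        # actually given placebo
    judged_imp = correct_imp + false_imp       # predicted impaired
    judged_unimp = correct_unimp + false_unimp # predicted unimpaired
    n = dosed + placebo

    sensitivity = 100.0 * correct_imp / dosed
    specificity = 100.0 * correct_unimp / placebo
    # Phi: Pearson correlation between two dichotomous variables.
    phi = (correct_imp * correct_unimp - false_imp * false_unimp) / math.sqrt(
        dosed * placebo * judged_imp * judged_unimp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_alarms": 100.0 - specificity,
        "misses": 100.0 - sensitivity,
        "phi": phi,
        # Without a continuity correction, chi-square equals n * phi squared.
        "chi_square": n * phi ** 2,
    }


# Cannabis row of Table 7 (decision criterion 0.50).
cannabis = validity_measures(correct_imp=29, correct_unimp=88,
                             false_imp=9, false_unimp=17)
for name, value in cannabis.items():
    print(f"{name}: {value:.2f}")
```

With these inputs the sketch gives a sensitivity of about 63%, a specificity of about 91%, a Phi of about 0.57, and a Chi-square of about 46.2, in line with the cannabis entries of Tables 7 and 8. Passing the cannabis row of Table 9 (27, 92, 5, 19) gives the 58.7% sensitivity and 94.8% specificity listed in Table 10.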
4. Conclusions and recommendations

This validation effort was based only on DREs' recordings of signs and symptoms they observed in drug-dosed and placebo-dosed subjects. The full DECP procedure includes an interview with the subjects, and, in the context of law enforcement, the DREs typically have the arresting officer's report of the suspect's behavior upon arrest. Thus, with the full DECP the DREs have much more information about the suspect. Therefore, it is likely that the validity estimates provided by this evaluation are very conservative. This is because the officer's report and the DRE's interview can sensitize the DRE to make better observations as well as provide him/her with direct information on the drug ingested. With these limitations in mind, the following conclusions and recommendations emerge from the preceding analyses.

The results of this validation study demonstrate that the DECP can be a significant tool in the detection of drug-impaired people, and in the identification of three of the four drug categories evaluated: cannabis, depressants, and stimulants. While the information collected by the DREs is for the most part quite useful, the DECP guidelines for data interpretation and drug identification are not totally consistent with the regression models that yielded the high levels of drug identification. One benefit of the quantitative model is that its decision cut-off levels can be adjusted to different, socially acceptable, rates of false identifications of drug impairments (false alarms) without sacrificing the sensitivity of the identification. Thus, with the exception of codeine, for which the quantitative model did not provide better than chance predictions, the models derived for cannabis, alprazolam, and amphetamine can be very useful to improve the DREs' performance.

For the most part, the signs and symptoms that were significantly associated with each of the drugs were consistent with those indicated by the DECP, thus validating the DECP. The specific consistent associations (all relative to the placebo conditions) were:

1. For cannabis impairment: enlarged pupil, raised pulse rate, and lack of convergence.
2. For alprazolam (depressant) impairment: horizontal gaze nystagmus, slowed pupil reaction, and lack of convergence.
3. For codeine (narcotic/analgesic) impairment: constricted pupil.
4. For amphetamine (stimulant) impairment: raised pulse rate and blood pressure.

This study did not confirm some of the signs and symptoms expected from the DECP. The signs and symptoms not found were:

1. For cannabis impairment: raised blood pressure.
2. For alprazolam impairment: low pulse rate and low blood pressure.
3. For codeine impairment: low blood pressure, low temperature, and slowed pupil reaction.
4. For amphetamine impairment: enlarged pupils, raised temperature, and slowed pupil reaction.

Of the four drugs, only alprazolam, the depressant, was consistently associated with poor performance on the psychophysical tests. This is consistent with the expected effects of any depressant on this behavior. However, this also means that the DREs' indiscriminate expectation of impaired motor and sensori-motor coordination is not warranted, and probably reduces their accuracy in identifying specific drug impairments.
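The adjustable decision cut-off noted in these conclusions can be illustrated with a short Python sketch. It assumes that each subject has a predicted probability of being unimpaired from one of the logistic regression models, and that a subject is judged impaired when this probability falls below the cut-off, as with the 0.50 criterion of Section 3.6. The probabilities below are synthetic placeholders, not study data, and the function names `judged_impaired` and `cutoff_for_false_alarms` are hypothetical.

```python
def judged_impaired(p_unimpaired, cutoff=0.50):
    """Judge a subject impaired when P(unimpaired) falls below the cutoff."""
    return p_unimpaired < cutoff


def cutoff_for_false_alarms(placebo_probs, allowed_false_alarms):
    """Return a cutoff that flags exactly `allowed_false_alarms` placebo cases.

    Assumes there are no tied probabilities at the cutoff and that
    `allowed_false_alarms` is smaller than the number of placebo cases.
    """
    ordered = sorted(placebo_probs)
    # Flagged cases lie strictly below the cutoff, so the cutoff is the
    # smallest probability that must remain unflagged.
    return ordered[allowed_false_alarms]


# Synthetic, evenly spaced probabilities for 97 placebo subjects
# (illustration only; not data from the study).
placebo_probs = [(i + 1) / 98 for i in range(97)]

cutoff = cutoff_for_false_alarms(placebo_probs, allowed_false_alarms=5)
false_alarms = sum(judged_impaired(p, cutoff) for p in placebo_probs)
specificity = 100.0 * (len(placebo_probs) - false_alarms) / len(placebo_probs)
print(false_alarms, round(specificity, 1))
```

With 97 placebo cases, allowing five false alarms corresponds to the 5.2% false alarm rate and 94.8% specificity reported in Table 10. The same function can be called with a different number of allowed false alarms to explore other socially acceptable rates.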
It appears that the signs and symptoms examined as a part of the DECP are not fully utilized by the DREs. The validity of predictions based on regression models derived from the actual recorded data was significantly higher than the validity of the DREs' decisions in the case of cannabis, alprazolam, and amphetamine. Therefore, to improve the DREs' performance, we recommend that the guidelines for the interpretation of the signs and symptoms be changed to conform to the regression models obtained in this study, so that they reflect more accurately which signs and symptoms are indicative of specific drug impairments.

DRE training should also include the use of the regression models obtained here as a means of improving the diagnosis, or at least of reconsidering it before relying on intuitive combinations of signs and symptoms. These models, and the recommended thresholds for identifying impairments due to cannabis, alprazolam, and amphetamine, are easy to apply to actual evaluations.

Finally, it is noteworthy that placebo-dosed subjects tend to perform at the same levels on all signs and symptoms, regardless of their drug ingestion history (as represented in the four different drug groups). These people may still differ from people who have no drug history at all, but the findings do reduce the suspicion that the observed signs and symptoms are due to long-lasting behavioral and physiological drug effects.

References

Governors Highway Safety Association, 2005. ww.naghsr.org/html/ state info/laws/dre perse laws.html. March 17.

Heishman, S.J., Singleton, E.C., Crouch, D.J., 1996. Laboratory validation study of drug evaluation and classification program: ethanol, cocaine, and marijuana. J. Anal. Toxicol. 20, 466–483.

Kosnoski, E.M., Yolton, R.L., Citek, K., Hayes, C.E., Evans, R.B., 1998. The drug evaluation classification program: using ocular and other signs to detect drug intoxication. J. Am. Optom. Assoc. 69 (4), 211–227.

NHTSA, 1991. Drug Evaluation and Classification Program. Briefing Paper. National Highway Department of Transportation, U.S. Department of Transportation, Washington, DC.

SAS Institute Inc., 1995. Logistic Regression Examples Using the SAS System. SAS Institute Inc., Cary, NC.

Senate of Canada, 2002. Special Committee on Illegal Drugs. Cannabis: Our Position for a Canadian Public Policy, September.

Shinar, D., Schechtman, E., 2005. Drug detection performance on the basis of observable signs and symptoms. Accid. Anal. Prev. 37 (5), 843–851.

Smith, J.A., Hayes, C.E., Yolton, R.L., Rutledge, D.A., Citek, K., 2002. Drug recognition expert evaluations made using limited data. Forensic Sci. Int. 130, 167–173.