Forensic Science International 128 (2002) 98–103
External quality assessment schemes for toxicology
John Fawcett Wilson*
Department of Pharmacology, Therapeutics & Toxicology, University of Wales College of Medicine, Heath Park, Cardiff CF14 4XN, UK
Abstract
A variety of external quality assurance (EQA) schemes monitor quantitative performance for routine biochemical analysis of agents such as paracetamol, salicylate, ethanol and carboxyhaemoglobin. Their usefulness for toxicologists can be lessened where the concentrations monitored do not extend fully into the toxic range or where the matrix is synthetic, of animal origin or serum as opposed to whole human blood. A scheme for quantitative determinations of a wider range of toxicological analytes such as opioids, benzodiazepines and tricyclics in human blood has been piloted by the United Kingdom National External Quality Assessment Scheme (UKNEQAS). Specialist schemes are available for drugs of abuse testing in urine and for hair analysis. Whilst these programmes provide much useful information on the performance of analytical techniques, they fail to monitor the integrated processes that are needed in investigation of toxicological cases. In practice, both qualitative and quantitative tests are used in combination with case information to guide the evaluation of the samples and to develop an interpretation of the analytical findings that is used to provide clinical or forensic advice. EQA programs that combine the analytical and interpretative aspects of case studies are available from EQA providers such as UKNEQAS and the Dutch KKGT program (Stichting Kwaliteitsbewaking Klinische Geneesmiddelanalyse en Toxicologie).
© 2002 Elsevier Science Ireland Ltd. All rights reserved.
Keywords: External quality assurance; Analytical performance; Data interpretation
1. Schemes for therapeutic drug monitoring
An external quality assessment (EQA) scheme for anticonvulsant drug assays was established by Professor Alan Richens in 1972 following his demonstration of a wide variation in measurements of phenytoin in a survey of laboratories in the London area [1]. The scheme was subsequently moved to Cardiff and, since 1983, the Cardiff Heathcontrol schemes have been part of the network of United Kingdom National External Quality Assessment Schemes (UKNEQAS). The schemes started in the academic department of the University and are now operated by Cardiff Bioanalytical Services Ltd. under the direction of Professor Richens, Dr. John Williams and Dr. John Wilson by a full-time staff led by Mrs. Kathleen Barnett. The scientific direction of the schemes is controlled by a Steering Committee of representatives from the day-to-day
* Tel.: +44-29-2074-2068; fax: +44-29-2074-8316. E-mail address: [email protected] (J.F. Wilson).
organizers, participants and expert advisors chaired by Dr. Ian D. Watson of Aintree University Hospital, Liverpool. The original survey by Alan Richens showed a coefficient of variation (CV) in measurements of phenytoin of 26.7%. The definitive study of this type was an America-wide blind survey undertaken by Pippenger et al. [2], in which samples were submitted to laboratories from several distribution centers as if they were patient samples. The observed CVs ranged from 49.3% for phenytoin to 269.3% for ethosuximide. These appallingly bad results led to the introduction of regular EQA services for a number of therapeutic drug groups and a marked improvement in analytical standards. The relevance of such schemes for toxicologists is that they closely mirrored developments in other areas of clinical chemistry that had occurred, or were occurring, by the early 1970s. They have become the model on which most toxicology EQA schemes now operate. Many countries have developed national programs for monitoring drug measurements. The major differences lie in the ethos behind the schemes. The UKNEQAS schemes are at the educational end of the
0379-0738/02/$ – see front matter © 2002 Elsevier Science Ireland Ltd. All rights reserved. PII: S0379-0738(02)00166-4
J.F. Wilson / Forensic Science International 128 (2002) 98–103
spectrum whereas some schemes in the USA, for example, exist largely for regulatory purposes of laboratory registration. There is, however, much overlap in purpose. The schemes operating to support therapeutic drug monitoring are of limited value to toxicologists, samples being based on a serum matrix and with drug concentrations that are therapeutic. The UKNEQAS program for psychoactive drugs includes compounds of toxicological interest. It was founded in 1979 for four tricyclic compounds. Over the years, the range of analytes has expanded to 30 compounds ranging from amitriptyline, through SSRIs, to zuclopenthixol. At the very least, the scheme provides a useful source of calibration material.
2. Quantitative schemes for toxicology
EQA schemes to monitor quantitative measurements of toxicological interest have been introduced by many scheme providers around the world. The Heathcontrol programs are a typical example. A serum scheme for toxicological concentrations of analgesics was introduced in 1991 and for ethanol in 1995. A blood sample was added in 1995 when a scheme for carboxyhaemoglobin was launched and a urine ethanol sample added. Salicylate and paracetamol were added to the blood sample in 1997 and the most recent introduction is a quantitative scheme in blood which aims to range widely over drugs of toxicological interest and includes regular distributions containing opiates, opioids, benzodiazepines and tricyclics. Of necessity, the variety of potential toxicological analytes means that other drug groups will appear irregularly. Quantitative EQA schemes of this type generally offer participants comparable statistical evaluations (Fig. 1). It is quite common to display the raw data as a histogram. The main choice is what value to use as the target value. The usual choices are the weighed-in or spike value and the consensus mean from all or a sub-group of participants. The laboratory result is then compared to the target value and bias values given as a percentage or in units of standard deviation as a z-score. A very useful additional function of EQA schemes is to present data broken down into different technique subgroups. This information can be collected and used to make comparisons between techniques. For example, in a comparison of the accuracy and precision of a range of techniques for the analysis of carboxyhaemoglobin, accuracy was generally within half of one percent of the spike value whilst precision was between 1 and 2% with the dedicated machines performing better than spectrophotometric techniques [3]. This type of data is clearly helpful when laboratories consider a replacement instrument but has other applications.
From the precision data, one might take two standard deviations as being a minimum difference between specimens that might be considered significant. Though 3–4% might be a reasonable difference in clean quality control
Fig. 1. Example laboratory report from the Heathcontrol external quality assessment scheme for salicylate for a single sample distributed in July 2001.
sample material, in true forensic samples, precision may halve and only differences greater than 6% should be treated as real. Where a quantitative EQA scheme distributes several samples containing different analyte concentrations at one time, more complex and informative statistics can be provided (Fig. 2). It is possible to identify if a laboratory has combinations of errors of four types: errors of bias, slope errors, errors of curvature or random noise [4]. The example shows a pure slope error of 10% in blood and serum measurements of ethanol. Such errors are uncommon, most errors being of bias or random in nature. A third type of graphical display common to EQA schemes is of time-dependent changes (Fig. 3). In the example shown from the Heathcontrol program, data for 12 months are plotted vertically as differences from the consensus mean against months identified by their initial letter. The bias index score (BIS) units approximate to multiples of the standard deviation times 100. Running across the plot are shaded areas set at levels within which we would expect a laboratory to perform. The laboratory started the period reporting serum data (solid symbols) that were slightly high. Their blood data (open symbols) were suspect running into the shaded regions. In February or March, corrective action was taken and measurements came into alignment.
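The single-result statistics described in this section (percentage bias, z-score and the bias index score) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and not the calculation used by any particular scheme: the function names and the example results are hypothetical, the consensus mean is taken as the target value, and the BIS function simply applies the approximation given in the text (multiples of the standard deviation times 100).

```python
from statistics import mean, stdev


def percent_bias(result, target):
    """Bias of one laboratory result as a percentage of the target value."""
    return 100.0 * (result - target) / target


def z_score(result, target, sd):
    """Bias of one laboratory result in units of standard deviation."""
    return (result - target) / sd


def approx_bis(result, target, sd):
    """Approximate bias index score: SD multiples times 100 (assumption)."""
    return 100.0 * z_score(result, target, sd)


# Hypothetical participant results (mg/l) for one distributed sample.
results = [480.0, 495.0, 500.0, 505.0, 520.0]
consensus = mean(results)   # consensus mean used as the target value
spread = stdev(results)     # between-laboratory sample standard deviation

lab_result = 520.0
print(percent_bias(lab_result, consensus))
print(z_score(lab_result, consensus, spread))
print(approx_bis(lab_result, consensus, spread))
```

With the weighed-in (spike) value as target, the same functions apply unchanged; only the `target` argument differs.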
Fig. 2. Example laboratory report from the UKNEQAS for ethanol showing a weighted linear regression analysis from three serum samples (*) and one blood sample (*) distributed in July 2001.
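A slope error of the kind shown in Fig. 2 can be illustrated by regressing a laboratory's results on the target values of several samples distributed together. The sketch below is a simplified illustration only: it uses ordinary (unweighted) least squares, whereas the report uses a weighted regression whose weights are not described here, and the function name, concentrations and units are hypothetical.

```python
def fit_line(targets, results):
    """Least-squares fit of results = intercept + slope * targets."""
    n = len(targets)
    mean_x = sum(targets) / n
    mean_y = sum(results) / n
    sxx = sum((x - mean_x) ** 2 for x in targets)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(targets, results))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical target concentrations and a laboratory reading 10% high
# in proportion to concentration (a pure slope error, no constant bias).
targets = [50.0, 100.0, 200.0, 300.0]
results = [1.1 * t for t in targets]

intercept, slope = fit_line(targets, results)
slope_error_percent = 100.0 * (slope - 1.0)

# Residuals about the fitted line indicate the random component of error.
residuals = [y - (intercept + slope * x) for x, y in zip(targets, results)]

print(intercept, slope, slope_error_percent)
```

In this reading, an intercept away from zero suggests a constant bias, a slope away from one a proportional error, and large residuals random noise; detecting curvature would need an additional quadratic term, which is omitted here.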
Although the quantitative schemes described can offer much important information on assay performance to the laboratory, they can mislead if the influence of the nature and quality of the sample matrix is overlooked. Two examples will demonstrate the difficulty. Table 1 shows the CV of measurements reported by Heathcontrol participants in October 2000 for a range of analytes in both serum and in blood. For analytes measured routinely such as ethanol, salicylate and paracetamol, blood measurements show a consistently higher variability than in serum. Measurements of the tricyclics in serum, which have been part of our program since 1979, are good but their measurements in the blood toxicology program introduced in 2000 are no better than the serum measurements were in the early days of therapeutic drug monitoring. There is work to be done to improve the situation before toxicological measurements currently being made in blood can be taken as having much value. The second example is of variation in measurements of tricyclics in serum in the Heathcontrol program from the start of the EQA scheme in 1980 until the present day (Fig. 4). Initially, CVs fell but in the late 1980s they started to increase again. The reason was clear at the time. With the arrival of human immunodeficiency virus infection, supplies
Fig. 3. Example laboratory report from the Heathcontrol external quality assessment scheme for salicylate for July 2001 showing the bias index score (BIS) for the previous 12 months. Months are indicated by their initial letter. Data for the three serum samples distributed each month are shown by solid symbols and for the single monthly blood sample by open symbols. Shaded areas show regions 2 and 3 intra-laboratory standard deviations from the consensus mean.
of human serum used to prepare the samples became very difficult to obtain and material of poor quality was used. The scheme changed the matrix to a clean animal serum and CVs dropped dramatically and have continued to fall gradually ever since. Thus, not only should the choice of sample matrix be correct for the analysis to be monitored, but the quality of matrix can have a profound effect. Performing well in clean samples typical of EQA programs should not be taken as a complete guide to performance in real forensic material.
Table 1
Coefficient of variation (%) of measurements in serum and blood reported by participants in the Heathcontrol external quality assessment scheme in October 2000

Analyte          Serum    Blood
Ethanol            6.8      9.0
Salicylate         6.3     19.7
Paracetamol       10.9     17.2
Amitriptyline     10.0     46.0
Nortriptyline     13.4     48.4
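The comparison in Table 1 rests on the coefficient of variation, the standard deviation expressed as a percentage of the mean. The following sketch shows that calculation and the blood-to-serum ratio of the tabulated CVs. The function and variable names are hypothetical and the three example results are invented; only the CV values in the dictionary are taken from Table 1.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation: sample SD as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)


# CVs (%) from Table 1, stored as (serum, blood) for each analyte.
table1_cv = {
    "Ethanol": (6.8, 9.0),
    "Salicylate": (6.3, 19.7),
    "Paracetamol": (10.9, 17.2),
    "Amitriptyline": (10.0, 46.0),
    "Nortriptyline": (13.4, 48.4),
}

# Ratio of blood CV to serum CV; a value above 1 means that the
# measurements were more variable in blood than in serum.
blood_to_serum = {
    analyte: blood / serum for analyte, (serum, blood) in table1_cv.items()
}

for analyte, ratio in blood_to_serum.items():
    print(f"{analyte}: blood CV is {ratio:.1f} times the serum CV")

# Invented results from three laboratories, to show the CV calculation.
print(cv_percent([90.0, 100.0, 110.0]))
```

On these figures the ratio exceeds 1 for every analyte and is largest for the tricyclics, which is the pattern described in the text.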
Fig. 4. Variation in coefficient of variation of measurements of four tricyclic drugs in serum samples distributed by the UKNEQAS between 1980 and 2000.
3. Schemes for other matrices
Since the matrix is so important to toxicological analysis, a number of schemes have been introduced for toxicologists that circulate samples of drugs in matrices other than blood and serum. There are a number of schemes for drugs in urine aimed at clinical, forensic and workplace testing groups. Schemes for matrices other than urine are less common. Two examples will illustrate what can be achieved. The French scheme for drugs in hair [5] attempts to circulate authentic samples whilst an American scheme circulates solid tissues and decomposed samples [6]. Whilst these initiatives deserve support, a note of caution is needed. The next up-and-coming area requiring EQA is drug detection in oral fluids. Should this proliferation of separate schemes continue? The possibility of using case-orientated schemes that circulate a variety of matrices to limit expansion in schemes for different matrices will be discussed in the following section. The UKNEQAS for drugs of abuse in urine commenced operation in 1986. It was an extension of an existing scheme
operated from the Maudsley Hospital, London by Mr. Brian Smith. The UKNEQAS for drugs in urine has continued to be directed over the years largely by Brian Smith as a member of the Steering Committee for the Scheme. The key feature of the scheme is that the samples are not generally analytically clean like most commercially available calibrators but they simulate realistic clinical samples with complex mixtures of interferents, therapeutic and abused drugs. Fig. 5 shows the percentage of reports that were in error for samples distributed between the start of the scheme and the present day for laboratories failing to detect amfetamine when present at a concentration between 1.2 and 1.6 mg/l and for those giving false positive reports for amfetamine in samples containing ephedrine. Historically, 7–10% of laboratories missed amfetamine at the 1.2–1.6 mg/l level in clinically realistic samples. Recent improvement has followed the introduction of improved immunoassays by the kit manufacturers. Misidentification of ephedrine as amfetamine is perhaps a greater problem. At the start of the scheme, over 30% of laboratories got this important distinction wrong. Some learning seemed to take place and recent improvements in immunoassay specificity have helped. Similar data are shown in Fig. 6 for morphine. The percentage of false negative reports is displayed for samples containing a free morphine concentration of 1.2–1.6 mg/l. As the variability in error rate demonstrates, this is a more difficult analyte than amfetamine to detect against a variable background of different non-opiate interferences present in UKNEQAS samples. The second line shows false positive reports for morphine in samples containing pholcodine as the only opiate. Typically, about 20% of laboratories will report morphine as present. If you repeat the exercise immediately, as in the autumn of 1993, participants remember their mistake and the number of false positives drops.
Omit the challenge for a couple of years and performance is as bad as ever. The important message here is that not only does the available technology affect performance, but a degree of operator interpretation is needed to
Fig. 5. Percentage of reports for amfetamine that were in error for samples containing amfetamine and ephedrine distributed by the UKNEQAS between 1987 and 2000.
Fig. 6. Percentage of reports for morphine that were in error for samples containing morphine and pholcodine distributed by the UKNEQAS between 1987 and 2001.
make biological sense of the data produced by analytical instruments.
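The error percentages plotted in Figs. 5 and 6 are simple proportions of participant reports that disagree with the known content of the sample. The sketch below shows one way such a figure could be calculated. It is an illustration only: the function name and the report counts are hypothetical and are not data from the UKNEQAS scheme.

```python
def percent_in_error(reports, analyte_present):
    """Percentage of qualitative reports that disagree with the truth.

    reports: list of booleans, True where a laboratory reported the
             analyte as detected.
    analyte_present: True if the sample really contained the analyte.
    """
    wrong = sum(1 for reported in reports if reported != analyte_present)
    return 100.0 * wrong / len(reports)


# Hypothetical distribution 1: the sample contains amfetamine and
# 4 of 50 laboratories fail to detect it (false negatives).
spiked_reports = [True] * 46 + [False] * 4
false_negative_rate = percent_in_error(spiked_reports, analyte_present=True)

# Hypothetical distribution 2: the sample contains ephedrine but no
# amfetamine and 15 of 50 laboratories report amfetamine (false positives).
interferent_reports = [False] * 35 + [True] * 15
false_positive_rate = percent_in_error(interferent_reports, analyte_present=False)

print(false_negative_rate, false_positive_rate)
```

Tracking the two rates separately matters because, as the amfetamine and morphine examples show, they respond to different causes: assay sensitivity in the first case and specificity or operator interpretation in the second.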
4. Schemes to monitor interpretation of analytical data
Schemes designed to monitor the ability of the laboratory to interpret their analytical data are available for toxicologists. These schemes circulate samples together with a case scenario and the laboratory is asked both to analyse the samples and to interpret their results. The credit for first attempting this type of program must go to the Dutch Stichting Kwaliteitsbewaking Klinische Geneesmiddelanalyse en Toxicologie (KKGT) program. Dr. Ido Dijkhuis started the Dutch scheme in 1978 [7]. The comparable UKNEQAS scheme was launched in 1995. The major difference to note between the schemes is that KKGT circulates only clinical cases whilst UKNEQAS circulates a mixture of clinical and forensic cases. The rationale underlying these schemes was stated at the outset by Ido Dijkhuis: “Toxicological analysis should be approached via clinical questioning and, vice versa, laboratory data should be translated into medical prose” [7]. Inserting the word forensic instead of clinical gives the appropriate meaning for forensic toxicologists. Only by considering a sample in its correct context can you decide what analyses to undertake, in what order, to what accuracy and what the limits on time and cost might be. Further, it is necessary to interpret the findings as you progress and work towards the final report to be returned to the client. This rule will be broken immediately here by presenting some data from the UKNEQAS toxicology case interpretation scheme in two halves. Table 2 lists the major analytes to be detected in 10 different samples circulated by the scheme. Cases are sorted into increasing order of success by the laboratories. For unusual drugs such as propranolol and nicotine, about half the laboratories missed the drugs concerned. The number of successful identifications increased towards 60% for chloral hydrate, thioridazine and ethylene
glycol. The tricyclics were done better, as was strychnine, and virtually all laboratories spotted an opiate, though only 50% could identify it as dihydrocodeine. The best performance was for paracetamol where quantification was carried out by 97% of participants. If analytical performance was mixed, how did laboratories interpret their results? The range in response was extreme. Some participants provided nothing in addition to an analytical result whilst others provided large amounts of undigested and irrelevant material and left the reader to sort it out. Neither of these two extremes is appropriate. Two further examples will illustrate problems with the interpretation itself. First, in a case of paracetamol overdose without adequate knowledge of the time of overdose and complicated by alcohol, 10% of participants said incorrectly that toxicity was unlikely. Secondly, in the thioridazine case, participants were specifically asked what additional samples they would like in order to differentiate between a therapeutic and an overdose case. Only 17% requested the
Table 2
Analytical results from 10 toxicology case studies distributed by the UKNEQAS between 1995 and 2000

Case                              Compound                Laboratories reporting (%)
Nicotine                          Nicotine or cotinine    50
Propranolol                       Propranolol             51
Chloral hydrate                   Trichloroethanol        57
Thioridazine                      Phenothiazine           58
Ethylene glycol                   Glycol or methanol      64
Lofepramine + ethanol             Tricyclic               74
                                  Ethanol                 77
Strychnine                        Strychnine              85
Amitriptyline                     Tricyclic               90
Dihydrocodeine (DHC) + ethanol    Opiate (DHC)            98 (50)
                                  Ethanol                 90
Paracetamol + ethanol             Quantitative data       97
Fig. 7. Frequency distribution of mean performance scores from six samples distributed to participants in the UKNEQAS for toxicology cases between 1995 and 2000.
necessary tissue sample. Training is clearly needed to improve interpretation by laboratory personnel. Support for education is an important remit of many EQA programs. A method of assessing the integrated activities of analysis and interpretation is required in order to encourage laboratories to improve. The Steering Committee for the UKNEQAS toxicology case interpretation scheme is currently developing an integrated laboratory performance score based on a peer review panel allocating scores to participant reports [8]. Fig. 7 shows a frequency histogram of the mean laboratory score for six cases. The maximum score is 20 and penalties can be awarded for incorrect analyses and interpretation. There is currently a small group of laboratories scoring over 15 who are detecting all relevant drugs and providing good interpretation. At the other extreme is a group of laboratories scoring 6 or less. On inspection, these laboratories are seen to be undertaking a fixed and limited panel of assays: ethanol, salicylate, paracetamol, and possibly carboxyhaemoglobin and lithium. Several offer no interpretation of data. It is the view of the Steering Committee that these laboratories are not offering a toxicological service worthy of the name and should be advised to stop. The middle band of participants with mean scores from 8 to 14 is attempting to offer a toxicological service and to provide interpretation. We hope that the education and support offered by the scheme will encourage them to match the best performers.
5. Discussion
A number of significant problems have been identified by EQA schemes in toxicological analyses. The way forward
will not, however, be dependent solely on actions by laboratories. EQA providers and laboratories must work in partnership. EQA organisers have come a long way in providing schemes with the correct drugs in the correct matrices at the correct concentrations. There are examples of good practice where authentic and realistic samples are provided. The combinatorial effect of an increasing range of drugs of toxicological potential with every possible matrix cannot sensibly be matched by proliferation of relevant EQA schemes. The development of schemes that integrate analysis with interpretation may provide a solution. If such integrated schemes were to include a variable range of matrices appropriate to each particular case then, over time, the totality of the performance of a toxicology laboratory could be monitored. The need to be able to demonstrate satisfactory performance by such means has become, or will soon become, a necessity for all toxicological laboratories as assessment by outside agencies becomes universal. Laboratories would be wise not to view EQA as merely a necessary chore. There are currently opportunities to become involved in discussions via EQA schemes about standards and to help define the criteria against which laboratory performance will be judged. Please make use of them.
References
[1] A. Richens, Quality control for drug monitoring, Br. Med. J. 1 (1977) 1411.
[2] C.E. Pippenger, J.K. Penry, B.G. White, D.D. Daly, R. Buddington, Interlaboratory variability in determination of plasma antiepileptic drug concentrations, Arch. Neurol. 33 (1976) 351–355.
[3] K. Barnett, J.F. Wilson, Quantitation of carboxyhaemoglobin in blood: external quality assessment of techniques, Br. J. Biomed. Sci. 55 (1998) 123–126.
[4] J.F. Wilson, R.G. Newcombe, R.W. Marshall, J. Williams, A. Richens, Analysis of assay errors in drug measurements from the Heathcontrol interlaboratory quality assessment schemes, Clin. Chim. Acta 143 (1984) 203–216.
[5] M. Deveaux, P. Kintz, J.P. Goulle, J. Bessard, G. Pepin, D. Gosset, The hair analysis proficiency testing program of the French Society of Analytical Toxicology, Forensic Sci. Int. 107 (2000) 389–394.
[6] A.K. Chaturvedi, The FAA’s postmortem forensic toxicology self-evaluated proficiency test program: the first 7 years, J. Forensic Sci. 45 (2000) 422–428.
[7] I.C. Dijkhuis, Performance of clinical toxicology laboratories, Lancet 2 (1982) 1341.
[8] I.D. Watson, J.F. Wilson, Quality assurance of toxicological interpretation, Ther. Drug Monit. 23 (2001) 498.