Prog. Neuro-Psychopharmacol. & Biol. Psychiat. 1982, Vol. 6, pp. 601-606. Printed in Great Britain. All rights reserved.
0278-5846/82/060601-06$03.00/0 Copyright © 1982 Pergamon Press Ltd.
PATIENT ASSESSMENT IN CLINICAL TRIALS
WILLIAM GUY
Dept. of Psychiatry, Vanderbilt University School of Medicine, Nashville, TN, U.S.A.
(Final form, August 1982)
Abstract
1. Appropriate clinical assessment procedures are critical for the successful execution of a psychotropic drug trial and essential for the proper documentation of results.
2. Clinical instruments may be classified in four general areas: demography, efficacy, safety, administration.
3. Since questions concerning the precise characteristics of the research sample are among the first to be raised, the detailed assessment of sample characteristics is mandatory and should include demographic, historical and diagnostic data.
4. Instruments for the assessment of efficacy are, for the most part, rating scales which provide measures of initial psychopathology and subsequent change. This area of assessment has been the focus of much research over the past two decades and, for many diagnostic populations, reliable and sensitive standard instruments are available.
5. The assessment of safety, in contrast to efficacy, is less well defined. The earlier checklist approach is being replaced by newer, more sophisticated instruments but, as yet, standardized procedures for the assessment of safety are not in general use.
6. Lastly are the administrative documents which record the events of a trial, e.g., dosage, concomitant medication, intercurrent events, disposition. Regarded as "bookkeeping" chores, these instruments are frequently neglected--often to the detriment of the exposition and interpretation of results.
Key words: clinical assessment, demographic data, documentation, efficacy, rating scales, safety
Introduction
Judging how sick a patient was or how much that patient changed after some kind of intervention is an ancient exercise and is the foundation of all clinical assessment. For a long time, particularly in the area of mental illness, assessment remained entirely at this global level. Gradually, as classification became more refined and etiology better understood, the global level was supplemented by the separate rating of those symptoms or syndromes thought to characterize a particular condition. Global assessments, however, were not discarded; rather, they remain to this day sensitive and valid measures of patient status. Since they represent an amalgamation of total clinical intuition, they often incorporate unique aspects of the illness not covered by the rating scales. Unfortunately, these aspects are submerged and, therefore, unidentified within the single total score.
The rating of sets of specific symptoms was initially an idiosyncratic process; i.e., each investigator devised his own set of symptoms and rating contexts. This diversity made it difficult to interpret the results of a single trial and almost impossible to compare results from one trial to another.
The last two decades have seen significant advances in clinical assessment. Assessment instruments have become "standardized" through the use of common items and rating contexts. Their employment in clinical trials has become the rule rather than the exception and they have served as the basis for computer-based documentation systems. Most of these systems are designed for monitoring and actuarial purposes (e.g., clinical census, distribution of symptomatology in large populations) and/or for sequential assessment across time (e.g., clinical trials). Some systems employ a fixed set of assessment instruments which are considered broad enough to be applicable to most clinical situations. The Association for Methodology and Documentation in Psychiatry (AMDP) system (Guy and Ban, 1982) is an example of a fixed instrument system which can be used for both monitoring and sequential assessment and is now available in French, Spanish and English as well as the original German. Other systems make use of an expansible battery of independently developed assessment instruments from which subsets of scales may be selected by the investigator for his specific studies. The German International Collegium for Psychiatric Scales (CIPS, 1977) and the American NCDEU/BLIPS systems (Guy, 1976) are prime examples of this type and both are primarily employed for sequential assessment. All of these systems, as well as those developed by individual pharmaceutical firms, have extended the notion of standardized assessment to standardization of documentation, thereby facilitating critical review by creating a degree of uniformity in data displays.
Selection of Assessment Instruments
Appropriate clinical assessment procedures are critical for the successful execution of psychotropic drug trials and essential for the proper documentation of results.
In selecting instrumentation, it is necessary to tailor the choices to the aims and purposes of the trial. Since, in drug development involving multiple sequential trials, the aims of the several developmental phases vary, the extensiveness and emphasis of assessment differ accordingly. For example, early Phase 2 studies are basically concerned with safety and dosage levels and only secondarily with efficacy. In contrast, late Phase 2 and Phase 3 trials center attention on comparative efficacy. Outpatient studies may be much more concerned with social adjustment than inpatient studies. Thus, the focus of assessment interest changes. This does not necessarily mean that a totally different battery of scales is required at each phase--rather that, within a single battery, the importance of a given subset is emphasized over other subsets. Thus, essentially the same core battery--with additions and deletions--can be employed throughout a series of trials. This enables documentation and analyses to be more consistent, cohesive and, above all, more comprehensible.
Assessment batteries are assembled with the primary purpose of encompassing all substantive areas in which the drug has presumed action. Such comprehensiveness is usually attained without any great difficulty. The problem more often involves redundancy; i.e., the use of multiple scales which perform a similar function. Such duplication does not increase precision; rather it increases errors and creates interpretive ambiguity. Although the same item may seem to be present in the several scales, each scale may define and score that item quite differently. Since raters generally fill out all assigned assessment scales after completing a single interview, such differences may be ignored or misinterpreted, leading to incongruent or contradictory results.
Rater training and group discussions prior to the initiation of a trial are vital in making certain that the chosen battery is employed as it was intended to be. As important are the elements of fatigue and frustration generated by such scale duplication. It is difficult for raters to maintain research zeal over the extended period of a trial when the battery is excessively burdensome. To sum up, assessment batteries should be relevant in focus, unambiguous in content and facile in use.
Areas of Assessment
In its broadest sense, clinical assessment involves the collection and evaluation of all relevant patient data necessary for the documentation of the events of a clinical procedure. Given full and comprehensive documentation, a reviewer can reconstruct the who, where, when, what and sometimes the why of a trial. Assessment instrumentation for clinical trials can be classified in four general areas: demography, efficacy, safety, administration. For most clinicians, the assessment of efficacy and safety is the "raison d'être" for the trial and, hence, data in these areas are usually collected willingly and with few omissions. Demographic and administrative data, though equally necessary for documentation, are of much less clinical interest and, consequently, rater indifference and/or resistance leads to many more errors.
Demographic Data
Demographic data encompass historical, current status and diagnostic information. These data are collected primarily to characterize study samples and to provide means of comparison or contrast among treatment groups and treatment outcomes. Though there are very few standard demographic scales per se, there is a great deal of commonality among the items usually included in demographic inventories. Variables such as age, sex, diagnosis, educational and employment history, mental status, past medical and psychiatric history, family psychiatric illnesses and previous treatment history are almost always included in very similar formats. The most difficult decisions in the selection of demographic content are the extensiveness and availability of the data to be collected. While extensive data collection can be valuable for epidemiological purposes, it is often over-elaborate for a specific study and all too often unused in subsequent analyses.
The degree of precision which may be attained with demographic data is limited by the reliability of the various sources utilized; e.g., patient, relatives, case records. Further, these data are often gathered and encoded by several individuals whose training may vary greatly. Historical data are notoriously difficult to obtain and even more difficult to verify. Thus items which require great detail are often only partially answered or answered in a suspect manner. For the usual clinical trial, it is far more satisfactory to collect only those demographic data necessary to identify and characterize clearly the clinical sample in respect to the aims and objectives of the study.
Of particular contemporary importance are diagnostic data which must provide systematic evidence that the trial sample is truly representative of the targeted diagnostic group. The recently introduced DSM-III--along with its predecessor, the Research Diagnostic Criteria--provides schemata for detailed descriptions of nosological groups. More specific classifications exist for subgroups; e.g., Schneider's first rank symptoms, Klein's depressive subtypes, Leonhard's schizophrenic subtypes. These schemata permit the use of checklists which clearly document the variables which justify the individual diagnosis and answer the frequent and vexing criticisms regarding the diagnostic composition of the research sample. Along with a checklist for diagnosis, a checklist documenting adherence to inclusion/exclusion criteria is a useful adjunct for demographic documentation, and its value should be emphasized.
Efficacy Data
Instruments for the assessment of efficacy in clinical trials consist, for the most part, of rating scales which provide measures of initial psychopathology and subsequent change. Assessment here is the most psychometrically sophisticated of the four areas under discussion. A large number of reliable and valid instruments for use by clinicians, paraprofessionals, patients and relatives are available for the assessment of psychopathology, behavior, social adaptation and attitudes. Generally, the psychopathological scales are based on behavioral rather than inferential judgments and survey either a broad spectrum of psychopathology or a targeted group of symptoms.
Broad spectrum scales, by virtue of their comprehensiveness, have advantages in the assessment of heterogeneous populations and for research where classification and epidemiological factors are important. Conversely, they are time-consuming, usually require a rater knowledgeable in the intricacies of psychopathology and are excessively detailed for use in the usual homogeneous samples employed in clinical trials. For the typical trial, much of the data are inappropriate, wasted or subsumed under larger analytic entities; e.g., factors, clusters, etc. Nevertheless, good use has been made of such instruments by completing the entire scale at pretreatment and only appropriate subsets of items at subsequent ratings. Among the well-known instruments of this type is the Present State Examination (PSE) (Wing et al., 1974). The PSE has been used in a number of large cross-cultural studies which have provided valuable data on the prevalence of psychopathological symptoms among different cultures as well as serving as the basis for the development of nosological constructs. The Schedule for Affective Disorders and Schizophrenia (SADS) (Spitzer and Endicott, 1977) is an example of an instrument which utilizes alternate forms that can be used for either monitoring or sequential assessment. The German AMDP Psychopathological Symptoms Scale and the Scandinavian Comprehensive Psychiatric Rating Scale (CPRS) (Asberg et al., 1978) are two further examples of dual function broad spectrum instruments.
More frequently employed in clinical trials are those scales which focus on a more narrow
segment of the psychopathological spectrum. Their advantages are brevity and specificity. Brevity, of course, is the major attribute of any scale seeking rater acceptability. Specificity, on the other hand, can be a two-edged sword--focusing on symptomatology which is anticipated and missing that which is unexpected. In earlier phase trials this may result in the failure to detect or highlight, within the rating scales, secondary or dual therapeutic actions of a new drug. This risk is counter-balanced, however, by the speed with which the scale can be completed--shortening significantly the overall time needed to complete a clinical trial.
Best known among the many scales of this type are the psychiatrist-rated Hamilton Depression Scale (HAMD) (Hamilton, 1967) and the Brief Psychiatric Rating Scale (BPRS) (Overall and Gorham, 1962). Both of these scales were developed during the early years of the psychopharmacological era and have attained the position of "standards" for subsequent scale development; i.e., new scales are almost always compared with these two "warhorses." While the HAMD is specifically targeted for affective illnesses, the BPRS is applicable to a more diverse diagnostic population. Both scales define their items in rather specific fashion which may often appear arbitrary to the rater. In the BPRS, for example, the symptom "Anxiety" refers to the psychic component while "Tension" refers to the somatic component of the more general two-component concept of anxiety. Should the rater have problems in accepting the stated contexts or fail to adhere to the scoring directions, the ratings themselves may become suspect. The HAMD has, unfortunately, been plagued by another problem.
Through no fault of the author, the original scale has been modified, expanded and redefined so often that it is necessary at each of its appearances in research to ascertain which version has been employed. Although it is the very nature of scales of this type to be restricted in breadth, investigators are nevertheless prone to "add an item or two" to cover some unique feature they feel vital for their study. Doing so, however, without clearly identifying the modification, defeats the very purpose of standardization.
While the endurance of a scale such as the HAMD reflects its deserved merit, advancements in the field tend to "date" any scale. Current interests in the bipolar/unipolar dimension, cognitive aspects, trait/state differences, etc. among affective illnesses cannot be fully addressed with the HAMD as originally formatted. Similarly, the BPRS or the Nurses' Observation Scale for Inpatient Evaluation (NOSIE) (Honigfeld and Klett, 1965) are not entirely satisfactory for assessing geriatric patients. Nevertheless, these scales and others originally developed for adult patients have been taken over for assessment in the geriatric population without regard for their applicability. Investigators should be aware that standard scales may not be appropriate instruments when new or unusual syndromes or treatments are to be studied. New scale development is, and always has been, an energetic field, however. Fortunately, valid scales are being introduced continually to meet the new needs of research.
Although the emphasis has been upon psychiatrist-rated scales, traditional measures of behavior and performance are also valuable in the evaluation of efficacy since they are quantifiable and less prone to judgmental bias. They are less commonly employed because they frequently require skilled administrators and a degree of cooperation not always found in acutely disturbed patients.
Ratings by other professionals, e.g., nurses and social workers, extend the sometimes narrow psychopathological content of psychiatrist-rated scales. Patient-rated scales are very frequently used but with mixed results. They are often hard to administer at pretreatment and are highly influenced by the mood of the moment. They can, however, provide cues for the treatment and research teams concerning those effects and/or changes which are important to the patient.
Safety Data
An integral part of the assessment of safety consists of clinical laboratory tests, EKG, EEG and other standard medical tests. Assessment procedures and documentation for these tests have been well standardized, although interpretation of minor deviations can vary greatly, particularly when automated readout systems for laboratory tests are used. Since such systems flag any value outside predetermined limits, there is a need for an additional clinical judgment of the "abnormality" in order to reduce the number of irrelevant citations.
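The flagging problem described above can be illustrated with a minimal sketch in Python. It is not taken from any laboratory system discussed in the text: the `ReferenceRange` type, the `flag_value` function, the 10% `margin` and the numeric limits in the usage note are all hypothetical. The idea is only to separate values lying just outside the predetermined limits from marked deviations, so that the reviewing clinician can see at a glance which citations most need judgment; the sketch does not replace that judgment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceRange:
    """Predetermined limits for one laboratory test (hypothetical values)."""
    low: float
    high: float


def flag_value(value: float, ref: ReferenceRange, margin: float = 0.10) -> str:
    """Classify a result as 'within', 'minor' or 'marked'.

    'minor' means the value lies outside the limits by no more than
    `margin` times the width of the range. The margin is an assumption
    made for illustration only, not a clinical standard.
    """
    if ref.low <= value <= ref.high:
        return "within"
    tolerance = margin * (ref.high - ref.low)
    if ref.low - tolerance <= value <= ref.high + tolerance:
        return "minor"
    return "marked"
```

With illustrative limits of 3.5 to 5.0, a result of 5.1 would be classed as "minor" and a result of 5.5 as "marked"; both would still be cited, but the first is the kind of deviation for which additional clinical judgment is most likely to dismiss the flag.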
The rating of adverse reactions has remained on a psychometrically primitive level. At this writing, there are no standardized adverse reaction scales of the stature of the HAMD or BPRS. There is little agreement regarding the manner in which adverse reactions should be assessed and, for that matter, no consensus as to what constitutes an adverse reaction.
Two general methods of collecting data have dominated assessment in this area. The open-ended approach concentrates only on those symptoms which are observed or which are reported spontaneously by the patient. Suggestibility is kept to a minimum by this technique. The second approach is the checklist approach in which the patient is asked directly about a set of specific symptoms. Suggestion, obviously, plays a greater role here, but the probability of missing a symptom which is present is much less with this approach.
Determining whether an adverse reaction is due to the prescribed drug is a much more complicated problem. Somatic complaints such as headache, constipation, assorted aches and pains, etc. are present at any given time in a significant proportion of the general population--making differentiation very difficult. Further, many of the more frequently reported "side effects" can be both effects due to drug and somatic symptoms of various psychopathological conditions. While not completely solving the problem, a thorough initial survey of somatic symptomatology at pretreatment can serve to identify those symptoms which are truly treatment emergent.
Among the currently available instruments are targeted scales, such as the Simpson-Angus Extrapyramidal Rating Scale (1970) and the Abnormal Involuntary Movement Scale (AIMS) (Guy, 1976), which focus on extrapyramidal symptomatology. The Simpson-Angus Scale provides well-defined scale points for rating each of its 10 items, while the AIMS provides a relatively well-structured procedure for assessment.
Both scales have made an effort to be more psychometrically sophisticated and, consequently, have received wide acceptance as measures of this narrow set of symptoms.
Among the instruments applicable to a wider range of adverse reactions are two developed by the NIMH ECDEU program--the Dosage Record and Treatment Emergent Symptoms Scale (DOTES) and the Treatment Emergent Symptom Scale (TESS) (Guy, 1976). Both of these scales incorporate judgments of severity, relationship of the symptom to the prescribed drug and action undertaken as a consequence of the emergence of the symptom. While the judgmental levels are defined with reasonable clarity, the symptoms themselves are not and neither are the instructions for administration. Variants of the DOTES and TESS, however, are in general use despite their shortcomings. Scales with defined symptoms have recently been introduced; e.g., the Somatic Signs of the AMDP (Guy and Ban, 1982), which presage further sophistication in this much neglected area. Nevertheless, it must be said that the assessment of adverse reactions is the least satisfactory and the most controversial area in clinical assessment.
The area of administrative assessments, which includes data on dosage, concomitant medication, intercurrent events, compliance, disposition, etc., is also a neglected area but not as a result of unsatisfactory instruments. Regarding these documents as "bookkeeping chores," clinicians apparently find the task of completing them onerous and, hence, avoid or delegate the task to others--often to the detriment of their study. Quite often, these data provide the matrix which permits a clearer and more cogent exposition of the results of the efficacy and safety assessments. One of the major reasons for urging all investigators to designate one person as data coordinator responsible for the collection and completeness of data is to overcome the data deficiencies usually rampant in this area.
As in the demographic area, standardized instruments per se are rare, although the data collected on administrative documents are remarkably uniform. The choice of format should be based on the simplicity of the encoding procedures. Dosage data, for example, are often replete with errors because the encoder records the date the change is written rather than the date the change commences. A format which lists dosage changes as a "running account" avoids the problem. Clear recording of remedial and/or concomitant medications is vital to verify adherence to the protocol and, frequently, to document differential usage among treatment groups. Disposition of patients after the completion of a trial also can serve as an indicator of treatment effectiveness. Therefore, despite their lack of rater appeal, administrative instruments must be given more emphasis during the training of raters and data coordinators.
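The "running account" format for dosage can be sketched in a few lines of Python. This is one possible reading of the idea, not a format prescribed by any of the systems cited: each entry records the date on which a dose commences (not the date the order was written), and the dose in force on any trial day is read from the most recent entry on or before that day. The names `DoseEntry` and `dose_on`, and the dates and doses in the usage note, are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class DoseEntry:
    """One line of the running account (hypothetical record layout)."""
    effective: date   # date the new dose commences, not the order date
    daily_mg: float   # total daily dose from that date onward


def dose_on(log: List[DoseEntry], day: date) -> Optional[float]:
    """Return the daily dose in force on `day`, or None before the first entry."""
    current = None
    for entry in sorted(log, key=lambda e: e.effective):
        if entry.effective <= day:
            current = entry.daily_mg
        else:
            break
    return current
```

For a log holding 50 mg from 1 March 1982 and 100 mg from 8 March 1982, `dose_on` gives 50.0 for 7 March and 100.0 for 8 March. Because only commencement dates are stored, a change written on one day and started on another cannot be encoded against the wrong date.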
Summary
Given the number of excellent surveys of assessment instruments available, a catalogue of scales has been omitted. It has been stressed that the choice of instrumentation should be determined by the nature and purpose of the trial. The final selection should consist of scales which will provide data for the verification of hypotheses and should not attempt to be an all-inclusive collection net gathering obscure or tangential facts. The importance of demographic and administrative data for the full documentation of a trial has been noted. Finally, the desirability of a uniform cohesive battery for use across the phases of drug development was emphasized.
References
ASBERG, M., MONTGOMERY, S.A., PERRIS, C., SCHALLING, D. and SEDVALL, G. (1978). A comprehensive psychopathological rating scale. Acta Psychiat. Scand. Suppl. 271: 5-27.
CIPS (1977). Internationale Skalen für Psychiatrie. CIPS Secretariat, Berlin.
GUY, W. (1976). ECDEU Assessment Manual for Psychopharmacology, DHEW Pub. No. (ADM) 76-338, Washington, D.C.
GUY, W. and BAN, T.A. (1982). The AMDP System: Manual for the Assessment and Documentation of Psychopathology. Springer-Verlag, Heidelberg.
HAMILTON, M. (1967). Development of a rating scale for primary depressive illness. Brit. J. Soc. Clin. Psychol. 6: 278-296.
HONIGFELD, G. and KLETT, C. (1965). The Nurses' Observation Scale for Inpatient Evaluation (NOSIE): A new scale for measuring improvement in schizophrenia. J. Clin. Psychol. 21: 69-71.
OVERALL, J.E. and GORHAM, D.R. (1962). The Brief Psychiatric Rating Scale. Psychol. Rep. 10: 799-812.
SIMPSON, G.M. and ANGUS, J.W.S. (1970). A rating scale for extrapyramidal side effects. Acta Psychiat. Scand. Suppl. 212: 11-19.
SPITZER, R.L. and ENDICOTT, J. (1977). Schedule for Affective Disorders and Schizophrenia. Biometrics Research, New York State Psychiatric Institute, New York.
WING, J.K., COOPER, J.E. and SARTORIUS, N. (1974). The Measurement and Classification of Psychiatric Symptoms. Cambridge University Press, Cambridge, England.
Inquiries and reprint requests should be addressed to:
William Guy, Ph.D.
Tennessee Neuropsychiatric Institute
1501 Murfreesboro Road
Nashville, TN 37217
U.S.A.