A comparison of clinical ratings with vocal acoustic measures of flat affect and alogia


Journal of Psychiatric Research 36 (2002) 347–353 www.elsevier.com/locate/jpsychires

Murray Alpert (a,*), Richard J. Shaw (b), Enrique R. Pouget (a), Kelvin O. Lim (c)

(a) Department of Psychiatry, New York University Medical Center, HN323, 550 First Avenue, New York, NY 10016, USA
(b) Stanford University School of Medicine, Palo Alto, California, USA
(c) University of Minnesota, Minnesota, USA

Received 25 May 2001; received in revised form 19 February 2002; accepted 12 March 2002

Abstract

In this report we compare clinical ratings of flat affect and alogia with objective measures of the patient’s speech prosody and productivity. Thirty schizophrenic patients were evaluated with the Scale for the Assessment of Negative Symptoms (SANS) and the St. Hans Rating Scale for extrapyramidal side effects. Their speech was recorded and analyzed acoustically for measures of prosody and productivity. Correlations between pairs of SANS items and acoustic measures (e.g. Vocal Inflection and Fundamental Frequency Variance) were weak. The SANS item and global ratings were strongly related. Ratings of bradykinesia overlapped with the SANS ratings but not with the acoustic measures. The SANS ratings appear to be derived from global impressions, with diffuse confounding of flat affect with alogia, and with bradykinesia. Acoustic analysis has the potential to provide objective measures that may help develop operational definitions of these constructs and enhance clinical assessment. © 2002 Elsevier Science Ltd. All rights reserved.

Keywords: Flat affect; Alogia; Acoustic analysis; Scale for the Assessment of Negative Symptoms

* Corresponding author. Tel.: +1-212-263-5716; fax: +1-212-263-7513. E-mail address: [email protected] (M. Alpert).
0022-3956/02/$ - see front matter © 2002 Elsevier Science Ltd. All rights reserved. PII: S0022-3956(02)00016-X

1. Introduction

Many authors have promoted the concept of distinct dimensions of symptoms in patients with schizophrenia, and differentiated positive from negative symptoms (Andreasen et al., 1995; Carpenter et al., 1988; Crow, 1985). Andreasen et al. (1995) have further subdivided positive symptoms into psychotic and disorganized dimensions, and suggested that these may show different patterns of exacerbation and remission during the course of the illness. Negative symptoms have been associated with structural brain abnormalities, including ventricular enlargement and decreased activity of the enzyme monoamine oxidase B in frontal and temporal cortices and the amygdala (Andreasen et al., 1990). Negative symptoms are considered an important prognostic indicator, and are associated with psychosocial impairment, poor performance on cognitive testing, early age of onset, and poor response to treatment (Andreasen et al., 1990). Interest in negative symptoms is also strengthened by the development of new medications that have efficacy in their treatment, independent of positive symptoms.

Although the importance of negative symptoms in schizophrenia is well recognized, their assessment presents a challenge. Part of the difficulty arises from the need to make a subjective rating of the absence or reduction of a normal behavior. Andreasen (1984) defines flat affect as “the characteristic impoverishment of emotional expression, reactivity, and feeling,” and alogia as “the impoverished thinking and cognition that often occur in schizophrenia ... thinking processes that seem empty, turgid or slow.” Added to this difficulty is the finding that negative symptoms often overlap with other frequently co-occurring processes such as depression, neuroleptic-induced bradykinesia or environmental understimulation (Flaum and Andreasen, 1995). It is clinically important to differentiate among these conditions since they reflect different etiologies and respond to different interventions.

The Scale for the Assessment of Negative Symptoms (SANS) is one of the primary measures used to rate
negative symptoms (Andreasen, 1984). The SANS includes five subscales directed at Andreasen’s (1984) view of the components of the negative syndrome: Flat Affect, Alogia, Anhedonia-Asociality, Avolition-Apathy and Impaired Attention. Andreasen designed the SANS to proceed from part to whole: “...detailed observations are made in order to achieve the global rating” (Andreasen, 1990, p. 90), a ‘bottom-up’ process. The major evidence supporting current SANS assessment procedures derives from the high reliability of assessment (often in the range of r=0.8).

However, there is substantial evidence that clinicians using the SANS confound the different aspects of the negative syndrome. In one study (Alpert et al., 1995a), we altered recordings of the patient’s speech electronically to extend or decrease the duration of pauses, leaving other aspects of speech unchanged. Raters, unaware of the manipulation, judged the extended pause version to show both increased alogia and increased flat affect. The shortened pause versions changed judgments in the opposite direction. In a recent study (Alpert et al., 2000), raters could not discriminate between negative syndrome outcomes with olanzapine and haloperidol whereas acoustic measures could. It appears that SANS clinical ratings may lack precision and sensitivity.

The glossary of the Diagnostic and Statistical Manual of Mental Disorders (American Psychiatric Association, 1994) distinguishes objective signs from subjective symptoms. Signs, at least potentially, can be operationalized and objectively measured. SANS ratings of Flat Affect and Alogia, although sometimes called symptoms, are observable behaviors, apparently without subjective content. One way to assess these signs more objectively is to evaluate acoustic measures of those aspects of the patient’s free speech that relate specifically to the constructs of flat affect and alogia.

1.1. Operational constructs for flat affect and alogia

In this study, we use a system of acoustic analysis called VOXCOM (Alpert et al., 1986, 1993) to provide objective correlates of SANS item ratings with the goal of operationalizing the constructs of flat affect and alogia (Table 1).

Table 1
VOXCOM acoustic correlates of SANS item ratings

Negative symptom   SANS item                       VOXCOM acoustic measure
Flat affect        Lack of Vocal Inflection        Frequency Variance (hertz)
                                                   Amplitude Variance (dB)
Alogia             Poverty of Speech               Percent Time Talking (%)
                   Blocking                        Interviewer Response Latency (s)
                   Increased Latency of Response   Subject Response Latency (s)

Flat Affect: This SANS subscale contains ratings of Unchanging Facial Expression, Decreased Spontaneous Movement, Paucity of Expressive Gestures, Poor Eye Contact, Affective Nonresponsivity, and Lack of Vocal Inflection, as well as a Global Rating of Flat Affect. Vocal inflection usually refers to variation in voice fundamental frequency (F0: F zero; the vibration rate of the vocal cords). The SANS guidelines suggest that variation in voice level (emphasis) also be rated under this item. The combination of inflection with emphasis may be referred to as vocal stress. A speaker uses vocal stress to direct the listener’s attention to significant words or to clarify syntax or pragmatics. For example, a rising pitch at the end of a sentence will indicate that the statement is a question. Falling stress may indicate that the speaker is at the end of a turn and wants to yield the floor (Walker and Trimboli, 1984). Vocal stress to highlight salient words is less prominent in flat affect schizophrenia (Alpert and Anderson, 1977).

Our acoustic analysis separates pitch (associated with the fundamental frequency of voice) from voice level (associated with voice amplitude), and we will examine each separately as a correlate of the SANS item, Lack of Vocal Inflection. We report the variance of F0, called Frequency Variance, and the variance of voice level, called Amplitude Variance, across syllables. We detect the sound pulses frequently associated with vowel production, usually syllables, and measure F0 at the point of maximal loudness of each syllable. To compensate for differences in the mean F0 of different speakers, we record these measures in semitones relative to the speaker’s mean F0. In this way a speaker with a mean F0 of 100 hertz and a variance of 1 semitone can be directly compared with a speaker whose mean F0 is 200 hertz, with a similar range. In a bell-shaped distribution, the variance is not correlated with the mean.
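The semitone normalization described above can be illustrated with a minimal sketch. It assumes the standard definition of a semitone (12 times the base-2 logarithm of a frequency ratio) and takes the arithmetic mean of the per-syllable F0 values as the speaker's reference; neither detail is spelled out in the text, and the function names and F0 values are hypothetical. This is not the VOXCOM implementation.

```python
import math
from statistics import variance


def to_semitones(f0_hz, ref_hz):
    """Signed distance in semitones between a frequency and a reference."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def frequency_variance(syllable_f0_hz):
    """Sample variance of per-syllable F0, expressed in semitones
    relative to the speaker's own mean F0."""
    # Assumption: the reference is the arithmetic mean of the F0 values.
    ref_hz = sum(syllable_f0_hz) / len(syllable_f0_hz)
    semitones = [to_semitones(f0, ref_hz) for f0 in syllable_f0_hz]
    return variance(semitones)


# Hypothetical per-syllable F0 values (Hz) for a low-pitched speaker ...
low_voice = [95.0, 100.0, 104.0, 98.0, 103.0]
# ... and the same pitch contour one octave higher.
high_voice = [2.0 * f0 for f0 in low_voice]

# Both speakers get the same Frequency Variance despite different mean F0.
print(frequency_variance(low_voice))
print(frequency_variance(high_voice))
```

Because the measure depends only on frequency ratios, scaling every F0 value by the same factor leaves the variance unchanged, which is the property the comparison between a 100-hertz and a 200-hertz speaker relies on.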
Similarly, we measure emphasis in decibels (dB) in reference to a calibration level. With several hundred syllables in a 15-min speech sample, the variances of F0 and voice level are stable measures.

Alogia: This SANS subscale is concerned with the quantity of speech. Items include ratings of Poverty of Speech (also of speech content), Blocking, and Patient Response Latency, as well as a Global Rating of Alogia. We calculate the percent of the patient’s floor time that is used for talking, called Percent Time Talking, as a correlate of the SANS Poverty of Speech item. The numerator is the sum of the time that the patient’s speech signal is above the silence threshold. The denominator is the sum of speaking time, the patient’s response latency, and the time spent pausing. A number of other measures of speech quantity, such as syllables per sentence or per turn, speech rate, etc., could be used. However, the percent of floor time spent talking is simple to describe and obtain, and we report this as a correlate of Poverty of Speech. It is highly correlated (r > 0.95) with the number of words produced by the subject.

The Alogia scale includes two items that are without a clear referent. Blocking is rated to reflect an ictal-like process whereby a patient trails off in the middle of a sentence. Rating guidelines for the SANS (Andreasen, 1984) suggest that the patient is expected to indicate an interruption in their train of thought. In reviewing the text of the 30 interviews analyzed for this study, we found that no patient made such a comment. Yet, blocking was rated as present for about a third of the interviews. Perhaps, if the patient stops in the middle of a sentence without completing their thought, the interviewer might wait some time before realizing what had happened. The interviewer would then show a long response latency. We examine measures of Interviewer Response Latency as a correlate of ratings of the SANS Blocking item. In addition, the referent for the item, Poverty of Content of Speech, is unclear but does not involve reduced speech quantity.

Finally, there is a SANS item to rate patients for Increased Latency of Response. As a correlate of this item we measure the time from the end of the interviewer’s question or comment to the beginning of the patient’s response, called Subject Response Latency. Sometimes interviewers make encouraging sounds to keep the patient talking. We separate responses that follow brief comments or ‘uhms’ and ‘ahs’, and report only response latencies that follow at least one second of interviewer speech. Latencies tend to produce asymmetrical distributions, and we transform (square root) measures of time before averaging them.

Hypotheses

1. We will investigate Andreasen’s (1990) hypothesis that SANS clinical ratings use a ‘bottom-up’ path from item to global ratings.
We will examine: (i) the correlation between the objective acoustic measures of Flat Affect and Alogia and the related SANS item; and (ii) the correlation between SANS subscale items and SANS Global ratings of Flat Affect and Alogia. Although a zero-order correlation cannot establish a causal path, if correlations between SANS item and Global ratings are strong while those with the acoustic measures are not, the hypothesis of a ‘bottom-up’ path to the global rating is less tenable.

2. To explore the ability of raters to discriminate between Flat Affect and Alogia, we will examine the matrix of correlations between acoustic and SANS items for the Flat Affect and Alogia scales. If there is a strong correlation between the SANS item ratings of Flat Affect and Alogia but no correlation between the acoustic measures of Flat Affect and Alogia, it would suggest that raters are not able to reliably differentiate these items.

3. Finally, to examine the degree to which SANS ratings of flat affect and alogia are confounded by neuroleptic motor side effects, we will examine the percent of variance (r²) that is shared by ratings of bradykinesia and the other measures.
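The two alogia correlates defined in Section 1.1, Percent Time Talking and Subject Response Latency, can be sketched as follows. This is only an illustration of the definitions given in the text: the function names, the input layout, and all numbers are hypothetical, and the back-transformation of the averaged square roots to seconds follows the description given later in the Methods.

```python
import math


def percent_time_talking(speaking_s, response_latency_s, pausing_s):
    """Share of the patient's floor time spent producing speech, in percent."""
    floor_time_s = speaking_s + response_latency_s + pausing_s
    return 100.0 * speaking_s / floor_time_s


def mean_response_latency(turns, min_prompt_s=1.0):
    """Square-root-averaged response latency, back-transformed to seconds.

    `turns` holds (interviewer_speech_s, latency_s) pairs. Latencies that
    follow less than `min_prompt_s` of interviewer speech are dropped.
    """
    kept = [latency for prompt, latency in turns if prompt >= min_prompt_s]
    mean_root = sum(math.sqrt(latency) for latency in kept) / len(kept)
    return mean_root ** 2


# Hypothetical totals (seconds) for one interview.
ptt = percent_time_talking(speaking_s=60.0, response_latency_s=10.0, pausing_s=30.0)

# Hypothetical turns; the one following a 0.3 s back-channel is excluded.
turns = [(2.0, 1.0), (3.0, 4.0), (0.3, 9.0)]
latency = mean_response_latency(turns)

print(ptt, latency)  # prints 60.0 2.25
```

In this example the arithmetic mean of the two retained latencies would be 2.5 s, while the square-root average is 2.25 s, which shows how the transformation reduces the pull of long latencies in a skewed distribution.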

2. Methods

2.1. Subjects

Thirty clinically stable, male veteran inpatients who met DSM-III-R criteria for schizophrenia provided written consent to participate in the study. The mean age of the subjects was 42 ± 9.0 years. By self-report, 21 of the subjects were Caucasian, six African American, and one each Hispanic, Asian, and Other. Eighty-six percent of the subjects had completed at least high school or the equivalency test. Diagnosis was reached by a consensus between clinical evaluation by a senior clinician and SCID-III-R interview (Spitzer et al., 1992) conducted by a trained research assistant. Typically, the schizophrenic disorder showed a subchronic course with a recent exacerbation requiring admission to the hospital a month or so prior to participation in this study. Mean age at first hospitalization was 25.7 ± 5.85 years. The subjects averaged 10 ± 10.3 prior admissions. All subjects were right handed, as assessed by the Handedness Quantification Test (Crovitz and Zener, 1962), except for one who was ambidextrous.

Eight of the patients did not show a satisfactory clinical response to their assigned medication and were switched to a different neuroleptic after the recording and rating. Twenty-two subjects were considered nearly ready for discharge. This indicates some heterogeneity in the condition of the subjects at the time of the recording. We report results for 30 interviews with their concurrent clinical ratings.

Subjects were medicated with either risperidone (N=7), clozapine (N=5) or a conventional antipsychotic agent (N=18) combined with an anticholinergic drug. Medication was determined by a non-blind ward psychiatrist and was not influenced by this study. There were no significant differences in the severity of bradykinesia ratings, as assessed by the St. Hans Rating Scale (Gerlach et al., 1993), associated with the different agents.

2.2. Procedures

Subjects participated in a 15–20 min recorded interview.
Interviews were conducted in ward offices that lacked sound isolation. The subject and the interviewer each wore a head-mounted microphone to reduce background noise and improve separation. Each microphone was recorded on a separate track of a stereo digital audio tape-recorder. Following a semi-structured interview (about 5 min) the patient was asked to describe a happy and a sad experience from their life, with each narrative lasting about 5 min. They were instructed to recall the situation, to remember who was present, to reconstruct what had led to the situation, and to describe what happened later. The order of task valence was counter-balanced across subjects. The interviewer asked open-ended questions if the patient’s description was too brief or if the patient strayed from the topic.

Subjects also completed a clinical interview during the same session for the purposes of providing a rating on the SANS, and an examination for the St. Hans Rating Scale, which assesses neuroleptic motor side effects. Two trained raters who were blind to the purpose of the study independently performed these ratings. Both raters attended the same interview, and the scores are an average of their ratings.

2.3. St. Hans Rating Scale

The St. Hans Rating Scale (Gerlach et al., 1993) is a reliable, sensitive, validated multidimensional scale used to rate neuroleptic-induced symptoms of hyperkinesia, parkinsonism, akathisia and dystonia. The Parkinsonism subscale consists of eight items, including bradykinesia, as well as a global rating. Each item is rated on a scale from 0 (not present) to 6 (present to an extreme degree). Inter-rater reliability for this scale is high (0.82–0.97). The Parkinsonism scale has high construct validity, as reflected in the homogeneity coefficient of Cronbach (0.82).

2.4. VOXCOM acoustic analysis

The tapes were analyzed using VOXCOM (Alpert et al., 1986, 1993) by assistants who were blind to the clinical ratings. VOXCOM is a computer-driven program that separates the voice signal into two channels, amplitude (loudness) and frequency (pitch).
The program samples the amplitude signal, and through software logic locates peaks in the speech stream, which represent the acoustic equivalent of syllables, noting their duration, maximal amplitude, and the fundamental frequency at the amplitude peak. Utterances consist of one or more peaks uninterrupted by pauses greater than 0.2 s. The program also measures the duration of all utterances and pauses. These durations are square-root transformed for averaging and back transformed so they can be reported in seconds. Inflection is measured in semitones, and emphasis in decibels referred to a calibration signal that has been recorded on the tape.

Alpert et al. (1989) have applied VOXCOM to examine differences between schizophrenic and parkinsonian patients. In addition, it has proved to be a reliable and objective measure of the negative syndrome in schizophrenia (Alpert et al., 2000). VOXCOM has been used to provide vocal markers for ratings of flat affect and alogia (Alpert et al., 1995a,b), and to examine characteristics of depressed speech (Alpert et al., 2001).
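The utterance rule stated above (one or more peaks uninterrupted by pauses greater than 0.2 s) can be sketched as a simple grouping step. This is a hypothetical illustration, not VOXCOM code: it assumes the syllable peaks have already been located and are available as (onset, offset) pairs in seconds, and the function names are invented for this example.

```python
def group_utterances(peaks, max_gap_s=0.2):
    """Merge syllable peaks into utterances.

    `peaks` is a list of (onset_s, offset_s) pairs. Consecutive peaks
    separated by a silence of at most `max_gap_s` belong to one utterance.
    """
    utterances = []
    for onset, offset in sorted(peaks):
        if utterances and onset - utterances[-1][1] <= max_gap_s:
            # Short gap: extend the current utterance.
            start, end = utterances[-1]
            utterances[-1] = (start, max(end, offset))
        else:
            # Gap longer than the threshold (or first peak): new utterance.
            utterances.append((onset, offset))
    return utterances


def pauses_between(utterances):
    """Silent intervals separating consecutive utterances."""
    return [(prev_end, next_start)
            for (_, prev_end), (next_start, _) in zip(utterances, utterances[1:])]


# Hypothetical peaks: the first two are 0.1 s apart, the third follows 0.5 s of silence.
peaks = [(0.0, 0.2), (0.3, 0.5), (1.0, 1.2)]
utterances = group_utterances(peaks)
pauses = pauses_between(utterances)

print(utterances)  # prints [(0.0, 0.5), (1.0, 1.2)]
print(pauses)      # prints [(0.5, 1.0)]
```

The durations of the resulting utterances and pauses are the quantities that the text describes as being square-root transformed before averaging.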

3. Results

Table 2 presents descriptive statistics for four acoustic measures, the St. Hans Rating Scale Global Rating of Bradykinesia, and six SANS measures. To reduce confusion we underline the acoustic measure, italicize the SANS item rating and bold the SANS Global Rating. McNemar (1963) recommends that skewness be considered to depart from symmetry if it exceeds 2.58 times its standard error. With N=30, the standard error is 0.45, which yields limits of ±1.15. Acoustic response latency is outside this range. This measure had an outlier: one subject’s score was three sigma from the mean. We listened to this recording again, and found that the patient was quite deliberate in his responses and markedly slower than the other subjects. Since response latency already is incorporated into the Percent Time Talking measure as part of the subject’s floor time, it will not be examined in the correlation matrix.

To explore the SANS rating process we examine the Pearson correlation between the acoustic and SANS items. Among the SANS items, Poverty of Speech and Blocking suggest some asymmetry, because of many zero scores. We could find no acoustic referent for ratings of Blocking, either in patient pausing or in interviewer pausing, and we do not report this item in the results. In regard to Poverty of Speech, we repeated the Pearson correlations with Spearman correlations, but found none where the latter departed from the former. The standard error for Kurtosis is twice that of Skewness, and with only 30 subjects, Kurtosis can be evaluated with less confidence. However, Kurtosis is less of a concern, and it appears that attention to Skewness also handles concerns with Kurtosis.

Table 2
Descriptive statistics for SANS and acoustics (N=30)

Item                              Mean    SD      Skewness   Kurtosis
(A) Acoustic measures
  Frequency Variance (semitones)  0.5     0.28    0.89       1.02
  Amplitude Variance (dB)         18.9    5.34    0.19       0.69
  Percent Time Talking            63.9    9.46    0.09       0.78
  Subject Response Latency (s)    1.1     0.56    2.36       1.59
(B) St. Hans Bradykinesia         0.6     0.11    0.52       0.91
(C) SANS Items
  Lack of Vocal Inflection        1.1     0.86    1.05       1.06
  Global Rating of Flat Affect    1.5     0.86    0.26       0.52
  Poverty of Speech               0.6     0.76    0.71       1.22
  Blocking                        0.4     0.86    10.04      2.94
  Increased Latency of Response   0.5     0.54    0.44       0.87
  Global Rating of Alogia         1.1     0.80    1.70       1.03

Table 3 presents the Pearson correlations between each acoustic measure, the isomorphic SANS item and the SANS Global Rating. With 28 degrees of freedom, an r ≥ 0.36 is required for significance at alpha ≤ 0.05 for a two-tailed test. The acoustic measures show few interrelations, with the exception of Percent Time Talking, which correlates with the Subject Response Latency. Across domains, Frequency Variance is modestly correlated with Global Rating of Flat Affect. The Percent Time Talking correlates with the Poverty of Speech item and, also, with Lack of Vocal Inflection and Global Rating of Flat Affect. The bradykinesia ratings are unrelated to the acoustic measures but correlate with all of the SANS ratings. The SANS measures are all interrelated, as strongly across the Flat Affect and Alogia subscales as within their own subscale.

Hypothesis 1. The path to SANS ratings

Do raters arrive at SANS Global ratings based on the SANS item ratings, as intended by Andreasen (a ‘bottom-up’ approach)? Or are SANS item ratings derived from global impressions (a ‘top-down’ approach)? We compared the average correlation between the four acoustic and SANS item pairs with the average correlation between the SANS item and Global Rating pairs. Distances between points on a scale of correlations do not satisfy the interval assumptions required for averaging; a correlation of 0.75 is much further from 0.70 than is a correlation of 0.35 from a correlation of 0.30. To adjust correlations to permit averaging, we transformed each r to Fisher’s Z, averaged the Zs, and back transformed. We did not include the SANS Blocking item because it has a skewed distribution and is unrelated to any of the

patient’s acoustic measures. The average correlation between the acoustic measure and the SANS item is r=0.28; for the SANS item with its Global measure the average correlation is r=0.82. Item ratings are much closer to, and appear to derive from, Global impressions.

What, then, is the basis for the global impression? For the Alogia rating there are a number of measures that reflect aspects of speech productivity. We examined the relations of some of these measures, entered as a block, to the Global Rating of Alogia in a hierarchical Multiple Regression Analysis (MRA). With only 30 subjects, we restricted the number of predictors to two or three. Several of the equations tended to explain about 20% of the variance (R²) in the Global Rating of Alogia. It appears that the Global impression is diffusely formed, extending beyond speech quantity to other patient behaviors. The impression is not arbitrary, since substantial inter-rater reliabilities can be demonstrated. This reliability appears to be at the expense of, rather than in the service of, rating validity.

Hypothesis 2. The differentiation of Flat Affect from Alogia

The correlation between Global Rating of Flat Affect and Global Rating of Alogia (r=0.79) is large. The Poverty of Speech item crosses scales to correlate with Global Rating of Flat Affect (r=0.72). Similarly, the affect item, Lack of Vocal Inflection, correlates with Global Rating of Alogia (r=0.76). The items are promiscuous. While Poverty of Speech and Lack of Vocal Inflection are strongly correlated (r=0.70), the acoustic measures, Frequency Variance and Percent Time Talking, are not (r=0.25). It appears that ratings on the different SANS scales are quite confounded. This suggests that SANS ratings derive from a general negative dimension, a super-top ‘top-down’ impression. A number of authors have examined the structure of schizophrenic pathology through factor analysis of such ratings. These approaches may be more revealing of processes in the rater than in the patient.

Table 3
Intercorrelations between acoustic measures, Gerlach Bradykinesia Ratings and SANS Flat Affect Ratings (N=30)

Item                              2       3       4       5       6       7       8       9       10      11
Acoustic measures
 1. Frequency Variance            0.18    0.25    0.12    0.06    0.34    0.37*   0.32    0.18    0.29    0.33
 2. Amplitude Variance                    0.32    0.15    0.06    0.01    0.01    0.12    0.05    0.16    0.12
 3. Percent Time Talking                          0.52**  0.12    0.47**  0.46**  0.49**  0.00    0.24    0.16
 4. Subject Response Latency                              0.08    0.06    0.01    0.10    0.19    0.14    0.23
 5. St. Hans Bradykinesia                                         0.44*   0.64**  0.43*   0.36*   0.54**  0.53**
SANS
 6. Lack of Vocal Inflection                                              0.88**  0.70**  0.54**  0.71**  0.76**
 7. Global Rating of Flat Affect                                                  0.72**  0.55**  0.74**  0.79**
 8. Poverty of Speech                                                                     0.18    0.50**  0.44*
 9. Blocking                                                                                      0.61**  0.74**
10. Increased Latency of Response                                                                         0.66**
11. Global Rating of Alogia
* P < 0.05. ** P < 0.01.

Hypothesis 3. The confounding of SANS ratings by extrapyramidal side effects

The assessment of flat affect is challenged by the presence of neuroleptic-induced bradykinesia. Both domains are concerned with similar aspects of reduced expressiveness. We examined the contribution of ratings of bradykinesia, done with the St. Hans Rating Scale, to Global Rating of Flat Affect and Global Rating of Alogia. The squared zero-order correlation (as a measure of shared variance) between bradykinesia and Global Rating of Flat Affect is 0.41 (r=0.64), and between bradykinesia and Global Rating of Alogia is 0.28 (r=0.53). Between approximately 30 and 40% of the variance in the Global impressions overlaps with impressions of extrapyramidal processes. This does not imply a one-way path from bradykinesia to negative signs. The overlap reflects shared variance, not cause. The clinician is unable to identify the source of the patient’s reduced expressiveness. By contrast, the acoustic measures show null correlations with bradykinesia ratings.
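Three of the calculations used in the Results can be checked with a short sketch. The skewness limit assumes the large-sample standard error sqrt(6/N), which is an assumption here but reproduces the reported value of 0.45 for N=30. The Fisher averaging follows the transform, average, back-transform steps described under Hypothesis 1, and the correlations passed to it are illustrative values only, not the pairs the authors averaged. All function names are hypothetical.

```python
import math


def skewness_limit(n, z=2.58):
    """Symmetry limit: z times an assumed standard error of skewness, sqrt(6/n)."""
    return z * math.sqrt(6.0 / n)


def fisher_average(correlations):
    """Average correlations on Fisher's Z scale, then back-transform to r."""
    zs = [math.atanh(r) for r in correlations]
    return math.tanh(sum(zs) / len(zs))


def shared_variance(r):
    """Proportion of variance shared by two measures (squared correlation)."""
    return r * r


# Skewness criterion for N=30, as used for Table 2.
limit = skewness_limit(30)
print(round(limit, 2))   # prints 1.15
print(2.36 > limit)      # Subject Response Latency skewness; prints True

# Illustrative correlations only: the Fisher average is pulled toward the larger r.
print(fisher_average([0.30, 0.90]) > (0.30 + 0.90) / 2)  # prints True

# Shared variance between bradykinesia and the two Global ratings.
print(round(shared_variance(0.64), 2))  # prints 0.41
print(round(shared_variance(0.53), 2))  # prints 0.28
```

Averaging on the Z scale matters most when the correlations being combined are large, because equal steps in r near 1 correspond to much larger steps in Z than the same steps near 0.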

4. Discussion

In this paper, we explore some of the difficulties related to the rating of the negative syndrome. We contrast the use of the SANS with an acoustic method that offers the possibility for a more objective assessment of these clinical phenomena. We examined the process for completing SANS ratings within a psychophysical paradigm, the physical being the stimulus presented to the interviewer by the patient’s speech paralinguistics, the psychological being the rater’s clinical impressions. The model assumes that flat affect and alogia are measurable signs and not symptoms that require attention to the content of the patient’s report. The interview, a primary psychiatric assessment tool, provides the opportunity for the patient to report subjective distress while the clinician observes the patient’s speech, demeanor and dyadic interactions. In addition, the model assumes that paralinguistic aspects of talking provide a sufficient basis for the ratings.

To enhance the reliability of the SANS ratings, we base the rating on the consensus of two calibrated raters. However, “Reliability is a necessary but not sufficient condition for validity” (Nunnally, 1978, p. 192), and our study suggests that SANS ratings, while reliable, are not measuring what they claim. For example, we found that the relation between the SANS rating for Increased Latency of Response and the acoustic measure of the average time required for the patient to respond to the interviewer produced a correlation that was virtually zero. Yet, the acoustic measure has face validity: the switching pause is the response latency.

It appears that the SANS is completed on a ‘top-down’ basis. With the use of objective measures it is possible to demonstrate that there is an indirect path from patient behavior to global impression. This impression, then, leads to the item rating. Andreasen (1990), in psychometric studies of the SANS, reported results consistent with this view. Her report noted a Cronbach’s alpha of 0.86 for the five global ratings. The five scales behaved like members of a single scale. Such high coherence suggests that SANS ratings derive from an undifferentiated global impression, which would constitute a ‘top-down’ rating.

A number of factors may contribute to this ‘top-down’ rating process. The SANS requires discriminations which exceed the abilities of clinicians. For example, the mean response latency of our subjects was about 1 s, with a standard deviation of about 0.5 s. To rate the SANS item, Increased Latency of Response, the rater might be expected to assign a zero (normal) to latencies at or below 1 s. Latencies around 2 s, or more, would be assigned a five, the highest SANS score. Between the zero rating and the five rating would be increments of 0.2 s each: a score of one for latencies of 1.2 s, etc. We find that our research assistants cannot reliably detect a 0.2 s pause simply while listening to a recorded interview. They need to review the sample repeatedly while examining an oscillographic display of the voice. A typescript of the interview improves performance. The demands of this rating exceed the sensory capacity of the rater. Newman and Mather (1938) reported an early systematic attempt to operationalize important clinical signs of depression. They found that an experienced speech pathologist required many repetitions of the recordings of the patient’s speech before the type and severity of vocal changes could be established with confidence.
Yet, clinicians are expected to capture subtle differences ‘on line’ and while distracted by other tasks and ratings. A similar analysis could be applied to the item, Reduced Vocal Inflection. The rater is expected to judge the variance of F0 across syllables, and to discriminate fractions of a semitone. The subtleties of the required discriminations exceed the abilities of most raters, especially if we consider that the rater is expected to process the several speech parameters simultaneously. The ratings were obtained within the assessment core of a clinical research center, and reflect state of the art procedures. Raters are trained to reliability criteria, and are checked periodically to prevent drift. Another factor that might contribute to a ‘top-down’ process is that a number of the items, such as Blocking or Poverty of Content of Speech, or the distinction between Spontaneous Movements and Expressive Gestures are not well operationalized. To rate blocking, the


patient should stop in the middle of a sentence and report that they had forgotten what they had intended to say. We examined the typescripts of the interviews, and this never happened. Yet, about a third of the patients were rated as showing this sign. Blocking was rated when increased Global Alogia was rated. The impression of blocking appears to result from global impressions of alogia. The lack of operational definitions invites a ‘top-down’ approach. It appears that the high rater reliability is at the expense of rater validity.

There has not been a great deal of enthusiasm for investing in the infrastructure of clinical laboratory procedures in psychiatry; rather there is a preference for the more exciting, cutting-edge technology of imaging and genetics. However, progress in these exciting areas is limited by the imprecision of the assessment infrastructure. High tech findings based on heterogeneous samples are unlikely to survive replication attempts.

We have focused, narrowly, on a specific sign of disturbance of schizophrenia. Such an approach has the risk of learning more and more about less and less. We have noted that the acoustic inflection measure might serve as a surrogate for the range of expressive gestures that are important in affects (Alpert et al., 2000). Emphasis does not appear to share this role. Inflection, rather than emphasis, is responsive to atypical neuroleptics. Recently, negative symptoms have been added to DSM as an area contributing to diagnosis, heightening the importance of valid assessment. Our finding that the acoustic measures can separate outcomes for typical and atypical neuroleptics, while clinical ratings could not (Alpert et al., 1996, 2000), provides converging evidence of the usefulness of these methods. There are other areas in DSM that might profit from a focus on validity measures.
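The thought experiment in the Discussion about rating Increased Latency of Response (zero at or below 1 s, five at about 2 s or more, with steps of 0.2 s) can be made concrete with a small sketch. The mapping below is purely hypothetical: the SANS does not define such a formula, and rounding to the nearest step is an assumption made only to illustrate how fine the implied discriminations would be.

```python
def latency_to_rating(latency_s, normal_s=1.0, step_s=0.2, max_rating=5):
    """Hypothetical mapping from a mean response latency (s) to a 0-5 rating.

    Latencies at or below `normal_s` score 0; each further `step_s` adds
    one point, up to `max_rating`.
    """
    steps = round((latency_s - normal_s) / step_s)
    return max(0, min(max_rating, steps))


# Illustrative latencies in seconds.
latencies = [0.8, 1.0, 1.2, 1.6, 2.0, 3.0]
ratings = [latency_to_rating(s) for s in latencies]
print(ratings)  # prints [0, 0, 1, 3, 5, 5]
```

Under this mapping, two patients whose mean latencies differ by only 0.2 s would receive different scores, which is the size of difference the research assistants could not detect reliably by ear.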

Acknowledgements

This study was supported by the US Department of Veterans Affairs, MH 30854, and the US Public Health Service. An earlier version of this work was presented at the 150th Annual Meeting of the American Psychiatric Association in San Diego, California. We are grateful to Melissa Dong, who made valuable technical contributions.

References

Alpert M, Anderson LT. Imagery mediation of vocal emphasis in flat affect. Archives of General Psychiatry 1977;34:208–12.
Alpert M, Merewether F, Homel P, et al. Voxcom: a system for analyzing natural speech in real time. Behavior Research Methods, Instruments, and Computers 1986;18:267–72.
Alpert M, Rosen A, Welkowitz J, et al. Vocal acoustic correlates of flat affect in schizophrenia: similarity to Parkinson’s disease and right hemisphere disease and contrast with depression. British Journal of Psychiatry 1989;154:51–6.
Alpert M, Pouget ER, Welkowitz J, Cohen J. Mapping schizophrenic negative symptoms onto measures of the patient’s speech: set correlational analysis. Psychiatry Research 1993;48:181–90.
Alpert M, Pouget ER, Silva R. Cues to the assessment of affects and moods: speech fluency and pausing. Psychopharmacology Bulletin 1995a;31:421–2.
Alpert M, Pouget ER, Sison C, et al. Clinical and acoustic measures of the negative syndrome. Psychopharmacology Bulletin 1995b;31:321–424.
Alpert M, Allan E, Sison C, et al. A comparison of clozapine or olanzapine vs haloperidol in treatment resistant schizophrenic patients: acoustic measures of change. Presented at the annual meeting. Boca Raton, Florida: NCDEU; 1996.
Alpert M, Pouget ER, Smith RC, et al. Acoustics measures of the patient’s free speech can identify differences in the actions of olanzapine or haloperidol. Presented at the annual meeting. Boca Raton, Florida: NCDEU; 2000 [also submitted for publication].
Alpert M, Pouget ER, Silva R. Improvement in depression is reflected in measures of the patient’s speech flow. Journal of Affective Disorders 2001;66:59–69.
American Psychiatric Association. Diagnostic and statistical manual of mental disorders, 4th ed. Washington, DC: American Psychiatric Association; 1994.
Andreasen NC. The scale for the assessment of negative symptoms (SANS). Iowa City: University of Iowa Press; 1984.
Andreasen NC. Methods for assessing positive and negative symptoms. In: Andreasen NC, editor. Schizophrenia: positive and negative symptoms and syndromes. Modern problems in pharmacopsychiatry, vol. 24. Basel: Karger; 1990. p. 73–88.
Andreasen NC, Flaum M, Swayze II VW, et al. Positive and negative symptoms in schizophrenia. A critical reappraisal. Archives of General Psychiatry 1990;47:615–21.
Andreasen NC, Arndt S, Miller D, et al. Correlational studies of the scale for the assessment of negative symptoms and the scale for the assessment of positive symptoms: an overview and update. Psychopathology 1995;28:7–17.
Carpenter Jr WT, Heinrichs DW, Wagman AM. Deficit and nondeficit forms of schizophrenia: the concept. American Journal of Psychiatry 1988;145:578–83.
Crovitz HF, Zener K. A group test for assessing hand- and eye-dominance. American Journal of Psychology 1962;75:271–6.
Crow TJ. The two-syndrome concept: origins and current status. Schizophrenia Bulletin 1985;11:471–86.
Flaum M, Andreasen NC. The reliability of distinguishing primary versus secondary negative symptoms. Comprehensive Psychiatry 1995;36:421–7.
Gerlach J, Koorsgard S, Clemmesen P, et al. The St. Hans rating scale for extrapyramidal syndromes: reliability and validity. Acta Psychiatrica Scandinavica 1993;87:244–52.
McNemar Q. Psychological statistics, 3rd ed. NY: John Wiley & Sons; 1963.
Newman S, Mather VG. Analysis of spoken language of patients with affective disorders. American Journal of Psychiatry 1938;94:913–42.
Nunnally JC. Psychometric theory, 2nd ed. NY: McGraw Hill; 1978.
Spitzer RL, Williams JB, Gibbon M, et al. The structured clinical interview for DSM-III-R (SCID). I: history, rationale, and description. Archives of General Psychiatry 1992;49:624–9.
Walker MB, Trimboli C. The role of nonverbal signals in coordinating speaking turns. Journal of Language and Social Psychology 1984;3:257–327.