JOURNAL OF VERBAL LEARNING AND VERBAL BEHAVIOR 4, 118-128
(1965)
Verbal Responses and Contextual Constraints in Language¹
ANNE M. TREISMAN
Medical Research Council Psycholinguistics Research Unit, Institute of Experimental Psychology, Oxford, England
¹ I wish to thank Professor R. C. Oldfield, who supervised and provided the facilities for this research, and the Medical Research Council, which supported it financially.
Estimates of the probability and information content of words were obtained for a set of statistical approximations to English, 2 passages of normal prose and a passage of "syntactical English." The estimates came from the guesses of 100 Ss at each of 20 missing words in each passage. The information in the missing word and the entropy in the distribution of guesses were shown to be linearly related to the degree of contextual constraint weighted for distance by the formula $\sum_{i=1}^{n} i^{-1}$, where n is the order of approximation. The entropy in the distributions of parts of speech and the entropy of conceptual units ("synonym clusters") were also calculated. The choice of grammatical category makes a relatively independent contribution to the change in redundancy across passages. By contrast, the change in entropy of "meaning" parallels the change in entropy of the particular words.
This paper reports an experiment on the emission of verbal responses to missing words in prose passages whose sequential structure is systematically varied. The investigation had two main aims. The first was to estimate the probability and information content of words in a set of statistical approximations to English; these estimates could provide quantitative measures for other experiments using the same passages in reading, repeating and translating tasks (Treisman, 1961). The second was to throw some light on the processes underlying verbal guessing behavior. The distributions of parts of speech and synonym clusters in the responses were also analyzed to help determine the functional sources of the sequential dependencies revealed by the guesses. One possibility is that the order of approximation to English affects the contextual constraints at only one level. In that case the difference in entropy between words and parts of speech would be constant for all passages; for example, once the grammatical category was determined, the number of possible words might be a constant multiple for all passages. At the other extreme, grammatical or conceptual constraints might make quite independent contributions to redundancy for passages with differing contextual coherence.
Salzinger, Portnoy, and Feldman (1962) have recently published a report of a similar experiment discussing some of the same problems. It will be of interest to compare their results with the present ones (Treisman, 1961).
There are two common ways of estimating the information content in Ss' guesses. One was devised by Shannon (1951): it uses the ranks at which the correct guesses occur as a coded form of the original message and bases the estimate on their frequencies of occurrence. The other, based on Taylor's "cloze procedure" (1953), uses a number of Ss and takes the information measure directly from the frequencies in their distributions of guesses. The rationale here is that Ss choosing words to replace the blanks constitute a normal sample of language behavior, so their guesses are determined by all the usual constraints of grammar, sense and familiar usage.
Shannon's method has the drawback that it becomes impracticable when one uses words rather than letters. Yet words may be preferable to letters, since they are probably more natural units for Ss to handle in speech. Shannon's measure may also be biased to some extent because it takes consecutive guesses from one S, and these are unlikely to be independent of one another. One difficulty with the cloze procedure is that it may exaggerate high probabilities: speakers do not always choose the most probable words, but Ss trying to guess the correct word in one try may do so. This bias can be counteracted to some extent by instructing Ss to give any word that comes to mind and makes a coherent sequence with the context, rather than always the most probable word.
The cloze procedure was the main method used in the present investigation, but an attempt was also made to study single Ss' consecutive guesses at missing words. Neither method is likely to represent exactly the "average" S's response hierarchy aroused by a given context, but it may be of interest to see how they compare.
METHOD
Stimulus Materials. A set of statistical approximations to English constructed by Moray and the writer (Taylor and Moray, 1960) were used. These were each 100 words long and included a 1st, 2nd, 4th, 6th, 8th and 16th order approximation. Two passages of normal prose were also used. One came from a simple children's story about camping, which appeared to be very predictable, and the other from Conrad's novel Lord Jim. A passage of "Syntactical English" was also constructed by selecting words from Lord Jim at random, under one constraint: each word had to be the same part of speech as the word in the same position in a sample 100-word extract, which was chosen to provide a grammatical skeleton. An example of a sequence composed in this way reads as follows:
"Up that scene the way had forgotten, maddening lumpily down a beard. He is perfunctorily soft with them to scatter you if he was called and held to process . . . . "
Guessing Procedure. A sample of 20 words was removed from each 100-word passage. These were divided into two groups of 10 words each, and each group was guessed by a separate group of Ss. Each S could therefore use a consecutive context of up to 9 words on either side of the word he guessed. One set of missing words consisted of the 5th and every succeeding 10th word, and the other of the 7th and every succeeding 10th word of the passages. These words were replaced by blanks in 2 sets of typescripts. Ss were asked to guess any word that would make a good approximation to coherent prose, and they wrote their guesses into the typescripts at their own rate. Two hundred Ss were used, 100 for each sample of omitted words; nearly all were students, research workers or lecturers at Oxford University. Seven of these Ss produced a second set of guesses at the same missing words several months later. Two Ss were asked to make a large number of guesses at a sample of missing words (2 each from the Conrad, 16th, 8th, 6th and 2nd order approximations), over an interval of 3 minutes for each blank.
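The two stimulus-preparation steps described above can be sketched in Python. This is an illustrative sketch, not the author's procedure, and several parts of it are assumptions:

- The function names (`syntactical_english`, `blank_positions`, `context_sizes`) are hypothetical.
- The source text is assumed to be supplied as hand-labelled (word, part-of-speech) pairs.
- "At random" is taken to mean drawing uniformly from word tokens, so that frequent words in Lord Jim are drawn more often.

The second part reproduces the deletion scheme: the 5th and every succeeding 10th word for one group of Ss, and the 7th and every succeeding 10th word for the other.

```python
import random
from collections import defaultdict


def syntactical_english(skeleton_tags, source_words, seed=None):
    """Fill a grammatical skeleton with randomly drawn words of matching part of speech.

    skeleton_tags: part-of-speech label for each position of the skeleton extract.
    source_words:  (word, tag) pairs from the source text (hypothetical input format).
    Drawing from the token list, not from a set of types, is an assumption that
    keeps the source text's word frequencies.
    """
    rng = random.Random(seed)
    pool = defaultdict(list)
    for word, tag in source_words:
        pool[tag].append(word)
    sequence = []
    for tag in skeleton_tags:
        if not pool[tag]:
            raise ValueError(f"no source word tagged {tag!r}")
        sequence.append(rng.choice(pool[tag]))
    return sequence


def blank_positions(start, length=100, step=10):
    """1-based positions of deleted words: word `start` and every `step`-th word after it."""
    return list(range(start, length + 1, step))


def context_sizes(positions, length=100):
    """(words before, words after) available to each blank before the next blank or passage edge."""
    bounds = [0] + positions + [length + 1]
    return [
        (bounds[k] - bounds[k - 1] - 1, bounds[k + 1] - bounds[k] - 1)
        for k in range(1, len(bounds) - 1)
    ]


set_a = blank_positions(5)  # 5th, 15th, ..., 95th words
set_b = blank_positions(7)  # 7th, 17th, ..., 97th words
print(set_a, set_b)
print(context_sizes(set_a))
```

With 100-word passages each set has 10 blanks. The first blank of the 5th-word set has only 4 words before it, and the last has only 5 after it. The figure of 9 words is therefore a maximum, not the context available at every blank.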
Calculation of Information Content and Method of Analyzing Responses. (1) Words. The first step was to use the guesses to estimate the probabilities and information content of the original missing words for the Ss. The probability of the correct word was taken from its frequency in the 100 guesses. The information in each such word was taken as $-\log_2 p_c$, and the mean information in the 20 missing words was calculated for each passage. This will be referred to as word information. Secondly, the competing responses in the total ensemble of Ss' guesses might also affect Ss' performance in other tasks. The average information in all the guesses made was therefore calculated; in other words, the entropy of the distribution of responses for each blank. This was taken as $-\sum p \log_2 p$ and will be referred to as the distribution entropy.
The probabilities and word-information values for the 1st order approximation were not directly comparable with the others. No S guessed any correct word in this passage, so the values were obtained by calculating each word's probability of occurrence from the mean frequency given in the Thorndike and Lorge word count (1944). To avoid this difficulty for the other passages, the correct word was always assigned a minimum probability of 0.01. This allowed for the one guess made by the S who contributed the word in the original procedure of constructing the passages. (In all these cases only 99 other guesses were used.) The cut-off maximum for word information was therefore 6.64 bits. This was also necessarily the maximum distribution entropy for all passages, since only 100 guesses were made.
(2) Conceptual Constraints. A tentative analysis of the conceptual constraints determining the guesses was made by counting the number of words of "similar meaning." There was no attempt to define "similarity" exactly, but a subjective criterion was kept as constant as possible in judging the different passages. Words classed as similar were grouped and called one "synonym cluster." The distribution entropy of these synonym clusters was calculated in the same way as that of the individual words.
(3) Grammatical Constraints. The degree of grammatical constraint shown by the guesses was also examined to throw light on its contribution to the entropy at word level. The distribution entropy was again calculated, this time with parts of speech as the basic units. The guesses were sorted into somewhat arbitrary categories: nouns, verbs, pronouns, adjectives, adverbs, prepositions, conjunctions and articles. If all had been equiprobable, the 8 categories would give a maximum of 3 bits. With the unequal distribution of 1st order frequencies in normal English (French, Carter and Koenig, 1930), this reduces to about 2.7 bits, compared with the 9.2 bits calculated for the 1st order words.
The frequency with which different parts of speech were interchanged as the contextual constraints decreased was also examined. For this, 2 matrices were drawn up relating the frequencies of each guessed part of speech to each correct part of speech. One covered (a) the two prose passages and the 16th order, and the other (b) the 6th, 4th and 2nd order passages.
The other approach to the contribution of grammar and syntax to contextual constraints was a passage of "Syntactical English," with constraints at this level only. The word information and distribution entropy for words, synonym clusters and parts of speech were calculated for the guesses with this passage in the same way as for the others. Suppose the two types of constraint, grammatical and meaningful, operated quite independently. Then the entropy in the guesses at particular words and particular concepts should approximate that for the 1st order passage, although it would remain a little lower because of the restriction to particular grammatical categories. The entropy in the parts of speech, on the other hand, should be as low as that in the normal prose passage from the Conrad novel on which its grammar was modelled.
RESULTS
Information in Choice of Words. The probability of guessing the correct word, the word information, and the distribution entropy are given in Table 1. For the correct choices at least, the probabilities appear to be fairly stable and reliable, judged from a split-half comparison. The differences in probability for the first and second 50 guesses ranged between 0.005 and 0.027 for the different passages. The distribution entropy might be expected to change more, since its maximum depends on the number of guesses: 6.64 bits with 100 guesses, 7.64 bits with 200, and so on. A second set of 100 guesses was therefore obtained for the 1st order passage, which would be expected to show least agreement and therefore maximum change. With the total of 200 guesses, the average entropy increased from 5.37 bits (at 100 guesses) to 5.93 bits. Figure 1 shows how the estimated distribution entropy changes up to 125 guesses. It appears to level off as the guesses increase and to be close to the asymptote for the more coherent passages. Clearly, however, the values for the low orders given in Table 1 are underestimates.
As Salzinger also found, a linear scale should not be used in plotting the results for the different orders of approximation. A more plausible assumption is that adding a word to the context relevant to a blank will, on average, increase the constraint in inverse proportion to the word's distance from the blank. Specifically, it was hypothesized that the constraint, measured by the word information and distribution entropy of the guesses, would be a linear function of the quantity $\sum_{i=1}^{n} i^{-1}$, where n is the order of approximation of a passage. Figure 2 shows that the mean results fit this function very closely, despite the wide scatter of individual words within each passage (shown by the size of the standard errors). This scatter is not surprising given the great variation in ensemble size for different parts of speech. The 1st order word information was not included in calculating the regression line, since it was estimated in a different way and without the cut-off maximum of 6.64 bits.
[Table 1: probability of the correct word, word information and distribution entropy for each passage.]
[Figure 1: Changes in entropy with increase in number of guesses. Distribution entropy plotted against number of guesses for several passages, including the 1st order, 8th order and Conrad passages.]
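The word-information and distribution-entropy measures defined above can be written out as follows. This is a minimal sketch, and it assumes each blank's responses are available as a list of guess strings. The function names are hypothetical. The floor is a simplification of the paper's rule of assigning the correct word a minimum probability of 0.01, which caps word information at $-\log_2 0.01 \approx 6.64$ bits. The sketch does not cover the 1st order passage, whose probabilities came from the Thorndike-Lorge counts instead. Passing a mapping as `unit` gives the synonym-cluster or part-of-speech entropies in the same way.

```python
import math
from collections import Counter


def word_information(guesses, correct, floor=0.01):
    """Bits of information in the correct word at one blank: -log2 p_c.

    p_c is the correct word's relative frequency among the guesses, floored at
    `floor` (a simplification of the 0.01 minimum described in the text).
    """
    p_c = Counter(guesses)[correct] / len(guesses)
    return -math.log2(max(p_c, floor))


def distribution_entropy(guesses, unit=None):
    """Entropy -sum(p log2 p) of the responses at one blank, in bits.

    `unit` optionally maps each guess to a coarser unit (for example a
    synonym-cluster or part-of-speech label) before counting.
    """
    labels = guesses if unit is None else [unit(g) for g in guesses]
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def passage_means(blanks):
    """Mean word information and mean distribution entropy over a passage.

    `blanks` is a list of (guesses, correct_word) pairs, one per blank.
    """
    info = [word_information(g, c) for g, c in blanks]
    ent = [distribution_entropy(g) for g, _ in blanks]
    return sum(info) / len(info), sum(ent) / len(ent)
```

Entropy over m guesses can never exceed $\log_2 m$. The ceiling therefore rises from 6.64 bits at 100 guesses to 7.64 bits at 200, which is why the low-order values in Table 1 are described as underestimates.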
Constraints of Grammar and Meaning. Figure 2 also shows the regressions for distribution entropy in the synonym clusters and in the parts of speech. The main question here was how far the choice of unit at one level determined the choice at another level, for passages with differing contextual constraints. Suppose the slope of the regression were the same for words and for synonym clusters, or for words and for parts of speech. This would imply that the order of approximation affected the choice at one level only, and that the unit at the other level was a fixed fraction or multiple of it for all passages. If the slope was different, on the other hand, the choice of unit at each level would be contributing independently to the change in information between different orders of approximation.
Figure 2 shows that the regression for synonym clusters is almost parallel to that for particular words. In fact, the average number of words per synonym cluster remained almost constant at 1.5 for all passages. This figure may simply be the mean number of words of similar meaning currently used in English, given the particular criterion of similarity applied here. The regression for parts of speech, on the other hand, is considerably less steep. To clarify the point, two further regression lines were calculated:
- the regression of the entropy in synonym clusters on the entropy in words, $E_{s.c.} = 1.11\,E_w - 1.19$;
- the regression of the entropy in parts of speech on the entropy in words, $E_{p.s.} = 0.33\,E_w - 0.21$.
These results confirm that the entropy in synonym clusters differs from the entropy in words mainly by a reduction of 1.2 bits in the constant. The entropy in the parts of speech, by contrast, is closer to a constant proportion of that in the words. The actual proportions for Prose, Conrad, 16th to 1st order and Syntactical English, respectively, were 0.24, 0.24, 0.29, 0.26, 0.26, 0.27, 0.33, 0.30 and 0.27. This implies that the number of different words selected approximates a power function of the number of different parts of speech. The effect of order of approximation on the choice of particular words thus seems to be not independent of its effect on the choice of meaning. Grammatical structure, however, does make a separate contribution to the redundancy of speech as the degree of contextual constraint is varied.
[Figure 2: Word information and distribution entropy of words, parts of speech and synonym clusters as a function of contextual constraints. Panels: "Orders of approximation," plotted against $\sum_{i=1}^{n} i^{-1}$, and "Prose and Syntactical English" (Prose (easy), Conrad, Synt. Eng.). Curves: word information; distribution entropy (D.E.) for words; D.E. for synonym clusters; D.E. for parts of speech. The last is labelled $\mathrm{D.E.}_{p.s.} = 2.0 - 0.4\sum_{i=1}^{n} i^{-1}$.]
Table 2 shows in more detail how the guesses were distributed between different parts of speech, and how this changed as the contextual constraints decreased. For the coherent passages, the least determined parts of speech are the adjectives and articles. This is mainly because they are very interchangeable, both in practice here and in grammatical usage. As the contextual constraints decrease, the prepositions drop most markedly in accuracy of guessing, followed by conjunctions, pronouns, adverbs and verbs; most of the grammatical, non-referential words show the most marked decreases. The proportion of nouns, verbs and adjectives guessed clearly increases as the contextual constraints decrease. The grammatical structure words, by contrast, decrease not only in accuracy but in number.
[Table 2: distribution of the guesses between different parts of speech, and its change as the contextual constraints decrease.]
The information values for individual words, described earlier, were also analyzed by grammatical class. Table 3 gives the mean information in the choice of particular words and in the choice of parts of speech, over all passages. The ratio of the information in the part of speech to that in the particular word is high for articles, conjunctions and pronouns, and low for nouns, adjectives and verbs.
Syntactical English. Finally, the results for the passage of Syntactical English are also given in Fig. 2. If the constraints of grammar and meaning operated independently on this passage, the word entropy should approximate that for the 1st order. The entropy in the parts of speech should equal that for the passage of Conrad. In fact, the information content of the correct words was as high as that in the 1st order passage, since extremely few words were ever correctly guessed. Here again, the choice of particular words seems to be relatively independent of the grammatical framework. The information in the choice of parts of speech for this "syntactical" passage, on the other hand, lay between the 2nd and 4th order approximations. This is surprisingly high, since its grammatical structure was that of normal prose. In the 4th, 6th and 8th order passages the sentence structure was partly lost. Even so, the greater conceptual coherence of their immediate context seems to have determined the part of speech of the missing word more effectively than the long-range grammatical structure of the "syntactical" passage.
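The step from "the entropy in parts of speech is roughly a constant proportion k of the entropy in words" to "a power function" can be made explicit. The derivation below rests on an interpretation that the text does not spell out: an entropy of E bits is treated as an effective number $N = 2^{E}$ of equally likely alternatives. The second line uses the same reading for the constant-difference case, which applies only approximately to synonym clusters.

$$
\begin{aligned}
E_{p.s.} \approx k\,E_w
&\;\Longrightarrow\; N_{p.s.} = 2^{E_{p.s.}} \approx \left(2^{E_w}\right)^{k} = N_w^{\,k}
\;\Longrightarrow\; N_w \approx N_{p.s.}^{\,1/k}, \\
E_w - E_{s.c.} \approx c
&\;\Longrightarrow\; \frac{N_w}{N_{s.c.}} = 2^{\,E_w - E_{s.c.}} \approx 2^{c}.
\end{aligned}
$$

The reported proportions put k between 0.24 and 0.33, so the exponent 1/k lies roughly between 3.0 and 4.2. For example, $E_w = 4$ bits and $k = 0.25$ give $N_w = 16$ and $N_{p.s.} = 16^{0.25} = 2$. A roughly constant entropy difference, by contrast, corresponds to a roughly constant ratio of alternatives.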
Consecutive Guesses from Individual Ss. Two Ss made a large number of consecutive guesses over a 3-minute interval at a sample of missing words. Figure 3 plots the mean probability of these 2 Ss' guesses, taken from the group's first-choice frequencies, against the guesses' ordinal positions. For comparison it also shows the maximum probability, i.e. the mean probability in decreasing rank order without replacement. The probability of the consecutive guesses decreased rapidly between the first and fourth guess, but remained well below the maximum. This was partly because the Ss missed some of the high-probability words. The chief reason, however, was that their guesses were interspersed with many words that never occurred in the group's ensemble of first choices. They did, however, follow the trend of this ensemble in the order of their guesses, the most frequent words tending to occur early. The interspersed low-probability words usually followed a high-probability word and were synonyms, associated words, or occasionally similar-sounding words. It was as if the high-probability word brought with it a cluster of word associations, which were given before a break was made (cf. Bousfield, 1953, using a verbal recall task). These clusters of associated words also tended to come in rapid temporal succession, separated by slight pauses. It might be that if one of several probable responses occurs, it temporarily blocks the others.
TABLE 3
INFORMATION IN CHOICE OF PARTICULAR WORDS AND OF PARTS OF SPEECH FOR DIFFERENT PARTS OF SPEECH DELETED
Part of speech   Word (bits)   Part of speech (bits)   Ratio P.S./W.
Noun             3.96          0.30                    0.076
Pronoun          1.19          0.49                    0.412
Conjunction      0.97          0.49                    0.505
Adverb           2.84          1.12                    0.394
Verb             3.23          0.62                    0.192
Preposition      2.17          0.60                    0.276
Adjective        3.87          0.56                    0.145
Article          1.18          0.84                    0.712
Word = information in choice of correct word; Part of speech = information in choice of part of speech.
[Figure 3: Mean probability of consecutive guesses over 3-minute interval by 2 individual Ss. Mean probability plotted against ordinal position of guess.]
Seven Ss repeated the guessing procedure for the first choice after an interval of several months; 4 of them also gave a second and third choice. Six of them obtained very similar average probabilities on the two occasions (mean 0.225). Yet the proportion of their guesses that were the same on both occasions was only 0.326. In other words, they agreed with their own earlier first choices only about 10 percentage points more often than with the total sample of 100 Ss. Adding the second and third choices (for the 4 Ss who made them) raised the mean agreement between the two occasions by only a further 0.068. For comparison, the first choices that differed between the two occasions had an average probability of 0.113. This evidence is fragmentary, but it is interesting. It suggests that the hierarchy of first-choice frequencies from 100 Ss may reflect the probabilities of words occurring to an individual S at least as well as his own consecutive guesses on one occasion, some of which appear to be drawn from different ensembles.
DISCUSSION
This investigation has served two purposes. The first was to obtain some quantitative and comparative estimates of the information Ss extract from passages of prose, and of how this depends on the coherence of the context. The second was to throw some light on the determinants of word selection in guessing missing words. The context was found to exert a decreasing effect on language responses as it became more remote, the contribution of each word being inversely proportional to its distance. The relative values for the information content of the different passages have been shown to predict Ss' mean performance on various speech reception and transmission tasks quite well (Treisman, 1961).
It should be stressed, however, that the findings given in this paper are statistical means. Their standard deviations are large, and they probably conceal a number of different variables. Moreover, the passages were statistically constructed with a deliberately fragmented linguistic structure. The experiment therefore could not reflect some important features of normal language behavior, such as the planning and syntactical organization of sentences as wholes (Chomsky, 1957). The method of constructing the passages, taken directly from Miller and Selfridge (1950), omits all punctuation marks. This may distort the hypothesis of increasing approximation to normal English: in the higher-order passages the grammatical structure sometimes becomes unnaturally complex, because no sentence can ever be completed. It would be interesting to repeat some of the experiments with statistical approximations in whose construction Ss had contributed punctuation marks, where appropriate, as well as words.
The probabilities found here for missing words are similar to those obtained by Salzinger et al. (1962) with Miller and Selfridge's approximations (1950), although most of the present ones are rather lower. The prose passages are an exception: both gave considerably higher probabilities here than in Salzinger's experiment (0.57 and 0.41, compared with 0.28). The mean information content of a word in normal prose found here is considerably lower than Shannon's values (1951) or Sumby and Pollack's (1954), which sum the information in a word's consecutive letters. Those sums give 4 to 6 bits for Shannon and 9 to 10 bits for Sumby and Pollack, instead of the 1 or 2 bits found here. These values are, of course, estimated in different ways and assume different units to be coded. The low value found here might be attributed to the fact that Ss were given a bilateral rather than a unilateral context. On the other hand, possible biases in Shannon's method may also lead to overestimates; examples are response dependencies, and the fact that the letter may be a less natural unit for Ss to handle. The discrepancy confirms that one must be cautious in estimating human channel capacity from maximum rates of speech transmission based on these estimates of word information. Such estimates may give only comparative rather than absolute values.
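The comparison plotted in Fig. 3 can be sketched as two curves over ordinal guess position:

- The "maximum probability": the mean over blanks of the k-th largest group first-choice probability (rank order without replacement).
- The mean group probability of an individual's k-th consecutive guess. Words absent from the group's ensemble count as probability 0.

The input format (one list of guess strings per blank) and the function names are assumptions for illustration.

```python
from collections import Counter


def max_probability_curve(group_guesses, depth):
    """Mean over blanks of the k-th largest group probability, k = 1..depth.

    This is the best any guesser could score if the k-th guess were always the
    most probable word not yet given (rank order without replacement).
    """
    curve = [0.0] * depth
    for guesses in group_guesses:
        n = len(guesses)
        probs = sorted((c / n for c in Counter(guesses).values()), reverse=True)
        for k in range(min(depth, len(probs))):
            curve[k] += probs[k]
    return [total / len(group_guesses) for total in curve]


def individual_curve(group_guesses, own_guesses, depth):
    """Mean group probability of one S's k-th consecutive guess, k = 1..depth.

    Words missing from the group's first choices count as probability 0, as do
    positions beyond the last guess the S gave for a blank.
    """
    curve = [0.0] * depth
    for guesses, own in zip(group_guesses, own_guesses):
        freq, n = Counter(guesses), len(guesses)
        for k, word in enumerate(own[:depth]):
            curve[k] += freq[word] / n
    return [total / len(group_guesses) for total in curve]
```

Words that an individual gives but that never occur among the group's first choices pull the second curve below the first. This is the effect described above for the two Ss.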
To consider the results in more detail: with the 1st order passage, Ss clearly show considerably less entropy in their incorrect guesses than if they were selecting words at random with 1st order probabilities. Random selection would give the 9.2 bits derived from the Thorndike-Lorge frequencies. One reason may be the instructions to choose words that made the passages coherent. It is interesting, however, that the guesses made here have a higher subjective probability than the original "correct" word. Ss thus introduce functional transition probabilities into a sequence of objectively random words, and this may influence their performance in tasks other than guessing. Similarly, with the other passages Ss may not guess according to the n-gram probabilities with which the passages were constructed. In the 2nd order passage, for example, they may be affected by words beyond the ones immediately preceding and following the blank. The entropy of the guesses obtained here is therefore likely to differ from the entropy for isolated digrams.
The distribution entropy for the 1st order guesses lies very close to the regression line. This suggests that the "subjective" sequential dependencies aroused by a context of random words lie on the same dimension as those for more constrained contexts. If the regression line were extrapolated downwards, the easy prose passage would be equivalent to a 45th order approximation. The average sentence length in this passage was 33 words. There is clearly a limit, however, to how far the regression line can meaningfully be extrapolated. In addition, the lack of punctuation in the higher-order passages may put them on a somewhat different dimension from normal English. The regression line for the correct guess cuts the axis at about the 100th order, implying that such an approximation would convey no information at all, which is unlikely. A cut-off point may instead be set by some other factor. One candidate is the length of context Ss can hold in mind while guessing; another is breaks in the sequential structure of normal prose, for instance between sentences or paragraphs.
A series of consecutive guesses from 2 Ss was investigated to test an assumption: that the individual's internal ranking of word probabilities for each context is reflected in the response hierarchy of first-choice guesses from a group of Ss. This assumption may be implied whenever an information estimate based on the group's guesses is applied to the performance of individuals in different verbal tasks. In fact, the correspondence between individual and group distributions was far from perfect. However, the consecutive guesses showed obvious, strong response dependencies. These suggest that the method of successive choices may itself give a misleading picture of the single S's initial response hierarchy aroused by the verbal context. One factor tending to make the second choice depart from the initial ensemble would be the recently made first choice: still in the S's mind, it influences the second choice by becoming part of its context. A sequence of response hierarchies, each different from the first, might thus be aroused as each guess is made.
The 7 Ss who repeated the experiment after several months are relevant here. Their first choices were predicted almost as well by the group frequencies as by their own previous first choices, and their second and third choices were actually better predicted by the group frequencies. This gives some confirmation that the group data can usefully be applied to individuals' performance. We still need to know much more, however, about how the verbal context defines the hierarchy of possible responses through which the search for the proper word is made. We also need to know how the different units (words, parts of speech and concepts) relate to one another in determining that hierarchy.
Entropy at all three levels seems to be linearly related to the degree of contextual constraint weighted for distance by the formula $\sum_{i=1}^{n} i^{-1}$. The slope and the constant of the regression both differ between words and parts of speech, whereas only the constant differs between words and synonym clusters. The latter finding suggests that measuring information at the word level is not too arbitrary, since it seems to be closely paralleled by the information in the "meaningful" units of speech. It also gives some support to Miller and Selfridge's belief (1950) that their technique for constructing statistical approximations to English provides a "scale for what can be loosely called 'meaningfulness.'" Thus, the greater the number of words determining the choice of succeeding ones, the greater the precision with which a particular meaning is conveyed.
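The extrapolations discussed above use the distance-weighted scale $H_n = \sum_{i=1}^{n} i^{-1}$. On this scale the easy prose passage behaves like a 45th order approximation ($H_{45} \approx 4.39$), and the regression line for the correct word reaches zero near a 100th order approximation ($H_{100} \approx 5.19$). The sketch below shows the scale, an ordinary least-squares line fitted on it, and the inversion from a fitted value back to an "equivalent order." It is a hypothetical sketch: the function names are illustrative. The text does not give the paper's regression coefficients for word information, so the test uses synthetic coefficients, not the paper's data.

```python
def harmonic(n):
    """Distance-weighted context measure: sum of 1/i for i = 1..n."""
    return sum(1.0 / i for i in range(1, n + 1))


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    m = len(xs)
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def equivalent_order(y, a, b, max_n=10_000):
    """Smallest order n whose fitted value a + b*H(n) falls to y or below.

    Assumes b < 0 (information falls as the weighted context grows).
    Returns None if y is not reached by max_n.
    """
    h = 0.0
    for n in range(1, max_n + 1):
        h += 1.0 / n
        if a + b * h <= y:
            return n
    return None


# Orders with a measured word information (the 1st order was left out of that fit).
orders = [2, 4, 6, 8, 16]
print([round(harmonic(n), 3) for n in orders])
```

$H_n$ grows only logarithmically, so each extra unit on this scale needs roughly 2.7 times as many context words. This is why a modest extrapolation of the line corresponds to very high orders of approximation.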
The difference in slope for the grammatical categories suggests that these contribute to the redundancy relatively independently of the choice of particular words. The selection of parts of speech, however, is itself considerably affected by the general conceptual coherence of the passage, as the results for the "Syntactical English" show. It may be tentatively concluded that meaningful associations to the context come to mind first and are then adapted to fit the grammatical framework. Where passages lack a complete grammatical structure, the short-term associations tend to predominate. This does not imply that grammatical structure can never be manipulated independently, for instance with nonsense words. The claim is only that grammatical constraints may be subordinated to others when the general coherence of a passage using familiar words is stressed.
Table 2 shows an interesting contrast between two groups of words. The relational words (prepositions, adverbs, conjunctions and pronouns) seem to be determined by longer-term sequential dependencies than the "reference" words (nouns, adjectives and verbs). They depend more on the "whole sentence" structure, that is, on the formal skeleton or Chomsky's "kernel" with its transformation rules (1957), and they lose their function and identity as the sentence is fragmented. The relatively strong influence of immediate context on the choice of parts of speech is thus exerted primarily on nouns, adjectives and verbs. For the grammatical, non-reference words, the constraints may be more similar to those operating on the choice of particular words.
REFERENCES
BOUSFIELD, W. A. The occurrence of clustering in the recall of randomly arranged associates. J. gen. Psychol., 1953, 49, 229-240.
CHOMSKY, N. Syntactic structures. The Hague: Mouton & Co., 1957.
FRENCH, N. R., CARTER, C. W., AND KOENIG, W. The words and sounds of telephone conversations. Bell Syst. tech. J., 1930, 9, 290-324.
MILLER, G. A., AND SELFRIDGE, J. A. Verbal context and the recall of meaningful material. Amer. J. Psychol., 1950, 63, 176-185.
SALZINGER, K., PORTNOY, S., AND FELDMAN, R. S. The effect of order of approximation to the statistical structure of English on the emission of verbal responses. J. exp. Psychol., 1962, 64, 52-57.
SHANNON, C. E. Prediction and entropy of printed English. Bell Syst. tech. J., 1951, 30, 50-64.
SUMBY, W. H., AND POLLACK, I. Short-time processing of information. HFORL Report, TR-54-6, 1954.
TAYLOR, W. L. "Cloze procedure": a new tool for measuring readability. Journ. Quart., 1953, 30, 415-433.
TAYLOR, A. M., AND MORAY, N. Statistical approximations to English and French. Lang. and Speech, 1960, 3, 7-10.
THORNDIKE, E. L., AND LORGE, I. The teacher's word book of 30,000 words. New York: Bureau of Publications, Teachers College, Columbia Univer., 1944.
TREISMAN, A. M. Attention and speech. Unpublished doctoral thesis, University of Oxford, 1961.
(Received March 11, 1964)