JOURNAL OF VERBAL LEARNING AND VERBAL BEHAVIOR 1, 443-450 (1963)
Word Frequency and Serial Position Effects 1
WILLIAM H. SUMBY
Operational Applications Laboratory, Electronic Systems Division, Bedford, Massachusetts
1 The research reported in this paper was sponsored by the Air Force Electronic Systems Division, Air Force Systems Command. The paper is identified as AFSC Technical Documentary Report No. ESD-TDR-62-15. Further reproduction is authorized to satisfy needs of the U.S. Government.

Hall (1954) demonstrated that, within limits, recall is significantly greater after the presentation of a series of randomly arranged, high-frequency words than after the presentation of a list of words each appearing infrequently in the total language. Analysis of data presented by Noble (1952) yielded a statistically significant correlation (P approximately .03) between the frequency with which a word appears in the language and the number of word associations, actually associations of meaning, which can be made to the stimulus word within a limited period of time. That is, high-frequency words tend to elicit more associations than low-frequency words. The same finding was explicitly stated by Cofer and Shevitz (1953).

Miller and Selfridge (1950) showed that the recall of verbal material, presented sequentially, is correlated positively with the degree of contextual constraint or structuring present in the passages, i.e., approximations to English. Deese and Kaufman (1957) further examined the influence of sequential structuring in verbal material upon the order of recall of individual words and upon the serial position curve for frequency of recall. Their results show that when unstructured verbal material is presented and free recall is allowed, the last words of a series generally are recalled most frequently and the middle of the series least. When textual material is presented, however, the highest frequency of recall is at the beginning of the series and the lowest just past the middle. The latter result is typical of the classical recall function when the method of serial anticipation is followed with unstructured verbal material.

The related questions examined in the present study are as follows. (1) Are different intraserial learning effects exhibited in free recall for unstructured verbal material as a function of word frequency? (2) If differences are indicated, are they attributable to variables other than simply the frequency of occurrence of a single word in the language?

The hypothesis under consideration is that there is a strong tendency for individuals to associate semantically high-frequency words with other high-frequency words. The tendency toward interword association might be less pronounced with words of lower frequency. Thus, a certain amount of structuring will develop when disconnected words are presented serially. If this is the case, the serial-position recall function for high-frequency words should tend to be similar to the free recall function for textual material. The low-frequency function, on the other hand, should tend to be skewed in the direction of the end of the series, comparable to the results of Raffel (1936) for nonsense syllables.

An additional question is also under examination. What effect does word frequency have on learning specific word orders when the orders to be learned contain words of equivalent response strengths? Presumably,
in this case the effect of word frequency would be considerably reduced and differences appearing in the learning rate must be attributed to other factors. If a difference occurs, it is again suggested that a major causative factor is the tendency of individuals to form particular word series from apparently disconnected words. The implication is that with high-frequency words associations will develop more rapidly.

METHOD

The study was conducted in two phases. In the first phase a comparison was made of the serial-position curves in the free recall of series differing in word frequency. In the second phase an examination was made of the serial-position curves resulting from a task in which the learning of a particular word order was attempted after the same words had been presented serially in eight different orders.
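The serial-position curve compared in both phases is, in effect, the per cent of Ss who recall the word occupying each position of the presented series. A minimal sketch of that tally follows. The function name, the data layout, and the five-word example are assumptions made for illustration; they are not taken from the experimental materials.

```python
from typing import List, Sequence


def serial_position_curve(presented: Sequence[str],
                          recalls: Sequence[Sequence[str]]) -> List[float]:
    """Per cent of subjects recalling the word at each serial position.

    presented: the words in the order shown on one trial.
    recalls:   one collection of written words per subject; under free
               recall the order of writing is ignored.
    """
    if not recalls:
        raise ValueError("need at least one subject's recall")
    recall_sets = [set(written) for written in recalls]
    curve = []
    for word in presented:
        hits = sum(1 for written in recall_sets if word in written)
        curve.append(100.0 * hits / len(recall_sets))
    return curve


# Hypothetical data: a five-word series and three subjects' free recall.
presented = ["cat", "sun", "map", "pen", "cup"]
recalls = [
    ["cup", "pen", "cat"],
    ["cup", "cat"],
    ["pen", "cup", "sun"],
]
curve = serial_position_curve(presented, recalls)
print(curve)
```

Scoring by set membership reflects the free-recall criterion, under which a word counts as correct wherever it is written. A task that requires placing words in their presented positions would need a position-by-position comparison instead.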
Phase 1

In the first phase, words were chosen from four word-frequency intervals, 0-1, 9-11, 90-110, and 900-1,100 word occurrences per four and one-half million words according to the Lorge Magazine Count (Thorndike and Lorge, 1944). Twelve monosyllabic word lists were selected, three lists of 15 words for each frequency interval. The words selected are either without homophones, or the addition of the frequency of the homophone does not cause the combined frequency to exceed the prescribed range. The lists were arranged by first selecting words on the basis of their frequencies in order to obtain equal mean frequencies for each list within each interval. The order of the words in a list was determined randomly.

In this phase ten students from Regis College for Women served as paid Ss. Prior to the experiment proper, two practice lists of random word-frequency were learned to a criterion of one correct recall. The words were presented visually at a rate of one word every 2 sec. Following this, in each of four additional weekly sessions and at the same rate, each S was shown one series from one frequency interval per session, until a criterion of one perfect recitation was met, i.e., all words but in any order. The same word order was maintained during a session. Five of the Ss began with a series of low-frequency words and five with a series of high-frequency words. Two seconds after the fifteenth word, instructions were given to write as many of the words as possible. Approximately 30 sec. were allowed for recall and immediately after recall the series was again presented.

Phase 2

In the second phase each S was shown two lists from each of the extreme frequency ranges, 0-1 and 900-1,100. One list was presented in each of four weekly sessions. The lists were arranged in different orders for each of the first eight presentations. The method of free recall was again used. After the responses to the eighth presentation were recorded by the S, instructions were given to write the words in their given serial position for the remainder of the trials, but free recall was permitted. A single fixed order was shown from the ninth through fifteenth presentations. The presentation rate was the same as in the previous trials. The order of frequency presentation was counterbalanced. Twenty Regis students served as Ss; none had previously participated.

RESULTS

Phase 1
In this phase the serial order of the words remained fixed to criterion. It was not required, however, that S respond in the presented order. The rates at which the criterion of one complete recitation was reached are presented in Fig. 1, in terms of the average number correct as a function of the mean word frequency of the list. Hall's (1954) results are confirmed. Within limits, recall is significantly greater after the presentation of a series of high-frequency words than after low-frequency words, when either of the high-frequency classes is compared with either of the low-frequency classes. An analysis of variance yielded a P of less than .01 (F = 6.79, df = 3/39) for the comparison. The differences between the two high-frequency classes and the differences between the two low-frequency classes were not statistically significant. The evidence suggests that even though recall tends to be less difficult for high-frequency words, the difference between average frequencies must be great in order for the recall to reach statistical significance.

FIG. 1. Average number of words correct as a function of trial when the same word order was repeated to criterion. [Figure not reproduced. Abscissa: trial; ordinate: average number correct; one curve for each of the four word-frequency intervals.]

The curves shown in Figs. 2 and 3 represent the per cent correct for all Ss on Trials 1, 2, and 3 as a function of the serial position of the word when the serial order of the words remained fixed to criterion. Figure 2 represents the high-frequency series and Fig. 3 the low. The abscissa represents the average of each three successive positions, so as to reduce the variability attributable to the small number of Ss used. A bow-shaped curve summarizes the results from the first trial with the high-frequency material. It also is evident that the point of most difficult recall on the third trial is approaching the second half of the word series. The point did, in fact, reach the middle of the series on the fifth trial. With the low-frequency material the tendency to recall the words of the second half of the series more often than the words of the first half is indicated, and the bow-shaped effect is not as apparent, especially for the first trial. Table 1 presents the mean number of words recalled for each half of the
series for the first three trials. A Wilcoxon matched-pairs signed-ranks test showed that statistically significant differences between halves occur only in the first and second trials of the low-frequency material.

TABLE 1
MEAN NUMBER OF WORDS RECALLED FROM FIRST HALF OF SERIES COMPARED WITH MEAN NUMBER RECALLED FROM SECOND HALF

                  Trial    First half    Second half
Low Frequency       1         1.4           4.9
                    2         4.7           6.1
                    3         5.7           6.4
                  Mean        3.9           5.8
High Frequency      1         4.1           4.7
                    2         6.9           5.9
                    3         8.0           8.3
                  Mean        6.3           6.3

The rank-order correlations (ρ's) shown in Table
2 represent the correlations between the numbers of words recalled on each of the first three trials as a function of serial position. Such correlations offer evidence of the degree of similarity in the shapes of the curves. It is apparent that the high-frequency serial curves do not change radically from trial to trial. The low-frequency curves, on the other
hand, tend to be different from one trial to the next. It can be noted, however, that the shape of the low-frequency function rapidly approached that of the high.

FIG. 2. Per cent words correct on Trials 1, 2, and 3 for series of high-frequency words as a function of serial position and when the same word order was repeated to criterion. [Figure not reproduced. Abscissa: serial position; ordinate: per cent correct; one curve per trial.]

FIG. 3. Per cent words correct on Trials 1, 2, and 3 for series of low-frequency words as a function of serial position and when the same word order was repeated to criterion. [Figure not reproduced. Abscissa: serial position; ordinate: per cent correct; one curve per trial.]

TABLE 2
RANK-ORDER CORRELATIONS (ρ) BETWEEN THE NUMBERS OF WORDS RECALLED ON EACH OF THE FIRST THREE TRIALS AS A FUNCTION OF SERIAL POSITION
(H1 represents first trial, high frequency, etc.)

        H2        H3        L1        L2
H1     .91***    .81**     .27
H2      --       .79**    -.02
H3      --        --       .15
L1      --        --        --       .59*

[The remaining entries, for columns L2 and L3, cannot be assigned to individual cells from the source; in source order they read .79**, .70**, .56, .60*, .73**, .71**, .54, .67*.]

* .01 < P < .05    ** .001 < P < .01    *** P < .001

Phase 2
That high-frequency words are learned with greater ease than low-frequency ones is again demonstrated in the left half of Fig. 4. The functions here summarize the data from the free recall of different orders of the same words for each of the first eight trials, and then a single order for Trials 9-15. During this phase only word-frequency intervals 0-1 and 900-1,100 were used. Between Trials 8 and 9 the Ss were instructed that the words should now be placed in the same serial position as presented, although free recall was still allowed. The right half of Fig. 4 indicates that learning the order of the series was somewhat easier with high-frequency words; the level is higher although the actual learning rates appear to be approximately the same.

FIG. 4. Per cent words correct as a function of trial. Left half: when word order was changed for each trial; right half: when one order was repeated for the same words as those given in Trials 1 through 8. [Figure not reproduced. Abscissa: trials; ordinate: per cent correct; curves: high frequency and low frequency.]

FIG. 5. Per cent words correct on Trials 1, 2, and 3 compared to Trials 9, 10, and 11 as a function of serial position for series of high and low frequency. [Figure not reproduced. Abscissa: midpoint of serial interval; ordinate: per cent correct; curves: high frequency, Trials 1-3 and 9-11; low frequency, Trials 1-3 and 9-11.]

Figure 5 is a comparison of the serial position functions for both frequency groups for Trials 1-3 when the word order was changed for each trial, and Trials 9-11, i.e., the initial three trials of the same word order when the task was to place the words in their given positions. The functions for Trials 1 through 3 are approximately the same for both groups, but the low-frequency function is approximately 15% lower throughout. The functions summarizing Trials 9 through 11 are markedly similar, although the point of poorest recall is more to the right for the high-frequency series. Such a result is typical of results obtained by the method of serial anticipation with noncontextual material. The low-frequency function is very nearly
symmetrical. A Wilcoxon matched-pairs signed-ranks test performed on the data from both trial groups indicated that the differences between frequencies were significant at the .02 level or less for a two-tailed test throughout the series. The same relationships between frequencies hold when all the trials are averaged, that is 1-8 and 9-15, except that the ordered series function for the low-frequency words is now approximately the same shape as the high.

It was thought that the vertical distance between the high and low-frequency functions would tend to increase with successive differently arranged series since the high-frequency material should offer greater opportunity for associations to develop. In other words, even though the actual learning rates were approximately the same, the greater associative potential of high-frequency words would tend to evoke more responses. Since, however, the distance remained approximately the same with successive trials, the possibility was considered that the strengths of average associative potentials, the tendency for a particular word to be recalled after another particular word has been recalled, for both frequency classes are closer than anticipated. A frequency analysis of the occurrences of pairs of words revealed that this was, indeed, the case; that is, there was a strong tendency for words to be elicited in particular pairs regardless of the presentation order. It was interesting to learn, however, that the types of associations formed between words are quite different for the two frequency classes. While the associations for high-frequency words were typically semantic in origin, the low-frequency associations appeared to be entirely phonetic. Table 3 illustrates such associations. Column A shows the six most common word pairs for the high-frequency lists, and column B the six most common for low-frequency. Examination of several cases indicated that repetitions of a three-word grouping were uncommon for both
classes when the sequence order was changed for each trial.

TABLE 3
MOST FREQUENTLY OCCURRING WORD PAIRS FOR HIGH AND LOW FREQUENCIES WHEN THE ORDER OF PRESENTATION WAS RANDOMIZED FOR EACH TRIAL

(A) High frequency    (B) Low frequency
lip-touch             wert-weft
kiss-lip              shrew-shrike
dark-black            prate-pard
hot-cold              pith-plash
fall-drop             flange-flux
tree-dark             swale-scab

DISCUSSION

When unstructured verbal material is serially presented and a single word-order maintained over successive trials, different intraserial learning effects are exhibited as a function of stimulus word-frequency with free recall. A bow-shaped function is apparent from the first trial on with high-frequency material. The early trial functions for low-frequency series, on the other hand, are skewed to the second part of the series and are much like the earlier results of Raffel (1936), who used nonsense syllables as stimulus material. The present results obtained with unstructured word series apparently define two functions on a continuum of functions between those for the contextual material of Deese and Kaufman (1957) and the nonsense-syllable material of Raffel. The meaningful associations between words which undoubtedly form with the high-frequency material cause the function to approach that for the contextually constrained series. The function for the low-frequency series, on the other hand, approaches that for nonsense syllables; little redundancy is developed.

When the material was successively presented in different orders the serial functions for high and low frequencies are essentially the same shape after the first trial, but displaced, i.e., the high-frequency function is
similar to the low, plus a constant. In addition to the displacement, the orders of responding were different. Deese and Kaufman (1957) found that when the structured material was presented the order of recall was typically in the order of presentation. For unstructured material the last words were emitted first, then the initial words and finally the middle. This latter finding is consistent with the present results for both classes of material; the last words were typically emitted first. In addition, it was found that with successive trials such a tendency is reduced, which is probably attributable to the increasing response strength of the words through repetition of presentation. The tendency appears to be stronger with the low-frequency material, however, and the gradual change from such a tendency is not quite as marked, as is shown in Table 4. Apparently, the response strengths for the high-frequency words increase to a particular level more rapidly than the low, the associations formed are more powerful, and the word clusters become longer.

TABLE 4
MEDIAN SERIAL POSITION OF FIRST THREE WORDS RECALLED WHEN SERIAL ORDER WAS CHANGED FOR EACH TRIAL

                          Trials
Word series         2       4       6       8
High frequency     10.2    10.2     9.4     8.8
Low frequency      11.4    10.7    10.4    10.1

When the same order was successively presented and the requirement was to place the words in their presented positions, again the functions are highly similar when all trials are totaled, although the bow for the low-frequency series is actually steeper than the function describing the results for the high-frequency series. The order of response emission was quite different for the two frequencies for the first three or four trials. For the high-frequency material the tendency was to emit the first items first and in the order of presentation and then the last items, again in order, from some point near the end of the series. Such behavior was observed as being typical for 14 of the 20 Ss. For the low-frequency material almost the reverse was true. Most frequently the Ss began responding with the last word and worked toward the middle and then attempted the initial words, in this case from the beginning toward the middle of the series, but with less success. This was the case for 17 of the 20 Ss. Longer successive word clusters were recorded for the high-frequency material. This finding suggests that there is a greater tendency for high-frequency words to form some association with another high-frequency word, semantically or phonetically, than for a low-frequency word to associate with another low-frequency word. Much of the difference between functions can be attributed to such a tendency.

Immediately before the successive presentations of the same serial order the words were approximately equal in immediate response strength, for at least part of the group, and the recall differences between frequencies were maintained. The greater recall of high-frequency series, then, is only partially attributable to word frequency per se. Other factors play a role, and most probably the high semantic association potential of high-frequency words with each other is of major importance. The association potential appears to be a major determinant of the shape of the verbal serial-position function.

SUMMARY
The influence of word frequency on serial learning, using the method of free recall, is investigated in this study. The results indicate the following:

(1) There is a tendency for Ss to learn series of high-frequency words more rapidly than series of low-frequency words. A large separation between the average word frequencies of the series is, however, required to yield statistically significant differences between the recall scores.

(2) When groups of words are known to S, it is easier to learn a particular order of high-frequency than low-frequency words.

(3) The order of recall for high-frequency series is different from that for low-frequency series when a particular word order is to be learned. For high-frequency series the first part of the list is emitted first, and for the low the words at or near the end of the list are emitted first.

(4) The bowed shape of the serial position curves is more pronounced for the high-frequency series than the low.

(5) It is suggested that there is a tendency for high-frequency words to be associated semantically and low-frequency words to be associated phonetically.

REFERENCES
COFER, C. N., AND SHEVITZ, R. Word association as a function of the Thorndike-Lorge frequency of the stimulus words. ONR Contract N70-NR397, TR 13 (1953).
DEESE, J., AND KAUFMAN, R. A. Serial effects in recall of unorganized and sequentially organized verbal material. J. exp. Psychol., 1957, 54, 180-187.
HALL, J. F. Learning as a function of word frequency. Amer. J. Psychol., 1954, 67, 138-140.
MILLER, G. A., AND SELFRIDGE, J. A. Verbal context and the recall of meaningful material. Amer. J. Psychol., 1950, 63, 176-185.
NOBLE, C. E. The role of stimulus meaning (m) in serial verbal learning. J. exp. Psychol., 1952, 43, 437-446.
RAFFEL, G. Two determinants of the effect of primacy. Amer. J. Psychol., 1936, 48, 654-657.
THORNDIKE, E. L., AND LORGE, I. The teacher's word book of 30,000 words. New York: Bureau of Publications, Teachers Coll., Columbia Univer., 1944.

(Received October 15, 1962)
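
Two nonparametric statistics carry the comparisons reported in the Results: the Wilcoxon matched-pairs signed-ranks test and the rank-order correlation (ρ). The sketch below shows one conventional way to compute each. It should not be read as the procedure actually followed in the study, since the paper gives no computational details. The function names, the use of mean ranks for ties, the exact-enumeration P value, and the example scores are all assumptions made for illustration.

```python
from itertools import product
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x: Sequence[float],
                         y: Sequence[float]) -> Tuple[float, float]:
    """Return (T, two-tailed P) for paired scores x and y.

    Zero differences are dropped. T is the smaller of the two rank sums.
    P is found by enumerating every assignment of signs to the observed
    ranks, which is practical only for a small number of pairs.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    total = sum(ranks)
    positive = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t = min(positive, total - positive)
    extreme = 0
    for signs in product((False, True), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= t:
            extreme += 1
    return t, extreme / 2 ** len(ranks)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Rank-order correlation: Pearson r computed on mean ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores: words recalled from each half of a series by ten Ss.
first_half = [1, 2, 1, 0, 2, 1, 3, 1, 2, 1]
second_half = [5, 4, 6, 4, 5, 3, 6, 5, 4, 6]
t_value, p_value = wilcoxon_signed_rank(first_half, second_half)
print(t_value, p_value)

# Hypothetical per cent correct at five serial positions on two trials.
trial_a = [60, 35, 20, 45, 80]
trial_b = [70, 50, 40, 45, 90]
print(spearman_rho(trial_a, trial_b))
```

Computing ρ as a product-moment correlation on mean ranks, rather than from the sum of squared rank differences, keeps the value correct when positions tie in per cent correct.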