JOURNAL OF VERBAL LEARNING AND VERBAL BEHAVIOR 17, 621-633 (1978)
On the Relation Between Recall and Recognition

MICHAEL J. WATKINS AND AMY K. TODRES
Princeton University
The relation between recallability and recognizability was determined in four experiments. The procedures involved presenting a study list and then giving a recall test for some items, followed by a recognition test for all items. The "reduction method" (Tulving & Watkins, 1975) was then used to derive estimates of the recallability-recognizability contingency relation without recourse to data for items encountered in a prior test. The results were consistent across experiments in suggesting that the set of recallable items was substantially, if not wholly, included within the set of recognizable items.
The authors are grateful for the helpful comments of John Gardiner, Ronald Kinchla, Russell Leaf, Jerry Rudy, and Olga Watkins. Requests for reprints should be addressed to Michael J. Watkins, Department of Psychology, Princeton University, Princeton, NJ 08540.

0022-5371/78/0175-0621$02.00/0 Copyright © 1978 by Academic Press, Inc. All rights of reproduction in any form reserved. Printed in Great Britain.

Its long history notwithstanding (e.g., Brown, 1965a; Hanawalt, 1937; Hollingworth, 1913; Zangwill, 1937), the question of how recall and recognition are related is of greater interest today than ever before. The nature of the relation has many facets, as can be seen from a glance at the recent volume, Recall and Recognition, edited by John Brown (1976). The concern of this paper is with but one facet, namely, the relation between recallability and recognizability. In other words, we are concerned with the relation between two properties of a target set of episodes: (1) whether they can be recollected under conditions of recall, and (2) whether they can be recollected under conditions of recognition.

To be sure, as a general rule, recognition is more probable than recall. This assertion is intuitively reasonable and well documented experimentally (e.g., Kirkpatrick, 1894; MacDougall, 1904), but it does not completely specify how recallability and recognizability relate. Knowing that the proportion of a set of items that can be recognized is greater than the proportion that can be recalled tells us that some items can be recognized but not recalled, but it does not tell us whether any items can be recalled but not recognized. Do the recallable items form a proper subset of the recognizable items, or is there some proportion of recallable items that cannot be recognized? The answer should not be sought from a logical consideration of the formal properties of the recall and recognition tests; it should be sought empirically. It turns out, however, that the empirical determination of the recallability-recognizability relation is no simple matter.

This paper explores one approach to the problem, that of giving both a recall test and a recognition test for one and the same set of item presentations. With this procedure the items can be categorized according to whether they are (a) both recalled and recognized, (b) recalled but not recognized, (c) recognized but not recalled, and (d) neither recalled nor recognized. It might be supposed that such fourfold contingency data directly and completely specify the relation between recallability and recognizability, in which case our problem would be solved. Unfortunately, such an analysis is overly simplistic. The problem is that performance on the second test might reflect not only the encounter with the target items in the study list but also the subsequent encounter during the first test. If so, all four entries of the contingency table would be in error.

Possible influences of one test on another have long been acknowledged (Ballard, 1913), and have recently become the subject of a good deal of experimental research (see Cooper & Monk, 1976, for a review). The experiments described here were designed not to evaluate these influences, but rather to get around them. To this end we used the "reduction method," described by Tulving and Watkins (1975). The reduction method requires some additional data, which allow us to avoid using confounded measures. Applied to the present case it requires two testing conditions: a Recall-Recognition condition, in which a recall test is followed by a recognition test, and a Recognition condition, which involves only a recognition test. From these two conditions three measures are obtained: (a) the proportion of items in a recallable state, given directly by performance in the first test of the Recall-Recognition condition; (b) the proportion in a recognizable state, obtained directly from the Recognition condition; and (c) the proportion in a recognizable but not recallable state, obtained by considering the results of both tests in the Recall-Recognition condition. Note that none of these three measures concerns performance on items encountered on a previous test. In the experiments to be described here these measures will be used to derive the complete recallability-recognizability contingency table.

This general approach to exploring the recallability-recognizability relation has been adopted by Broadbent and Broadbent (1975, 1977; see also Broadbent, 1973, pp. 91-94). In fact, though the Broadbents were not directly concerned with the complete fourfold contingency relation, they too compared the proportion of items recognized in a Recognition condition and the proportion of unrecalled items recognized in a Recall-Recognition condition. Their finding was intriguing: the proportions did not differ. This result was obtained in each of the seven experiments that the Broadbents (1975) reported. The Broadbents thus concluded that knowing whether an item is recallable tells us nothing about whether it is recognizable, and that therefore recallability and recognizability are independent attributes.

Given that the Broadbents replicated this finding so many times, their conclusion might appear secure. However, two potentially disturbing features are common to all of their experiments. First, the subjects were informed prior to studying the to-be-remembered list whether or not a recall test would be given. In light of evidence (e.g., Carey & Lockhart, 1973; Hall, Grossman, & Elwood, 1976) that the manner in which the list items are encoded is influenced by the subject's anticipation of the nature of the memory test to be given, it is likely that the Broadbents' procedure violated the critical assumption that the memory traces involved in the two testing conditions were equivalent.¹ It should be pointed out, however, that such a bias cannot explain the failure to find a positive association between recallability and recognizability, for the effects of the bias would be to make the association more positive.

¹ Indeed, the very notion of test-appropriate encoding strategies is adopted by the Broadbents to explain a subsidiary finding. After subjects had experienced several lists, recognition for unrecalled items in the Recall-Recognition condition fell significantly below that for items of the Recognition condition, implying that recallability and recognizability were, for these lists, positively associated. The Broadbents assumed that across successive lists encoding strategies became more test appropriate, and hence the consequent bias more pronounced. Although this assumption appears reasonable, it seems less reasonable to assume a total lack of bias for the first list or two.

The second problem with the Broadbents' procedure has to do with the interval between list presentation and the recognition test for the two conditions. For each of their experiments, the Broadbents were careful to equate this interval across conditions. In four of their seven experiments, this temporal matching was
achieved at the data analysis stage rather than in the experimental procedure. Thus, although in the Recognition condition the recognition test was given at the same point in time as the recall test in the Recall-Recognition condition, the recognition test took longer to complete than the recall task, so that, by discarding the recognition data from the first part of the test in the Recognition condition together with those from the last part in the Recall-Recognition condition, the Broadbents were able to obtain recognition data matched for retention interval. In the other three experiments, the Broadbents equated the retention intervals by interpolating a filler task in the Recognition condition. This task involved the generation of words (4-letter words, place names, and forenames beginning with prescribed letters). Although this method enjoys the advantage over the first method of using all of the collected data, it shares with the first method the assumption that the intervening activity (word generation or recognition testing) has the same effect on subsequent recognition performance as recall has upon recognition for unrecalled items. For any given intervening activity this assumption is almost certainly wrong. There arises, therefore, the question of whether the resultant error is likely to have an appreciable effect in estimating the recallability-recognizability relation, or whether it can be dismissed as negligible. This is no easy matter to judge, but it is worth noting that, for both types of filler task used by the Broadbents, the subjects in the Recognition condition were likely to have encountered more items than would be produced by the subjects performing the recall task. It could be argued that this factor may have reduced recognition performance in the Recognition condition relative to that in the Recall-Recognition condition, with a consequent underestimation of the association between recallability and recognizability.
In any event, a between-condition difference in prerecognition activities could bias estimation of the recallability-recognizability relation.
The critical procedural innovation in the experiments to be described here came with the use of both test conditions for one and the same study list. In each of four experiments, half of the study items were tested for recall and then all of them were tested for recognition. This procedure ensures that precisely the same activity precedes the recognition test for the two conditions. Also, by not telling the subjects which items will be involved in the recall test, we may assume that item encoding in the two conditions will be statistically equivalent.
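The logic of the comparison made both by the Broadbents and in the experiments below can be stated numerically. The sketch assumes the two-state (dichotomous) view of recallability and recognizability adopted in this paper and contrasts two limiting hypotheses: independence, which the Broadbents inferred, and nesting, in which every recallable item is also recognizable. The function name and the illustrative proportions are ours and are not data from any experiment.

```python
def recognizable_given_unrecallable(p_recallable: float, p_recognizable: float,
                                    hypothesis: str) -> float:
    """Predicted P(recognizable | unrecallable) under two limiting hypotheses.

    Both inputs are proportions of items in a recallable or recognizable state.
    """
    if hypothesis == "independent":
        # Knowing that an item is unrecallable tells us nothing about recognizability.
        return p_recognizable
    if hypothesis == "nested":
        # Every recallable item is recognizable, so the only recognizable items
        # among the unrecallable ones are the (p_recognizable - p_recallable) excess.
        return (p_recognizable - p_recallable) / (1.0 - p_recallable)
    raise ValueError(f"unknown hypothesis: {hypothesis!r}")


# Illustrative values only: half the items recallable, three quarters recognizable.
for h in ("independent", "nested"):
    print(h, recognizable_given_unrecallable(0.50, 0.75, h))
# independent 0.75
# nested 0.5
```

Under independence, recognition of unrecalled items should match recognition in a Recognition condition, which is what the Broadbents observed; under nesting it should fall clearly below it.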
EXPERIMENT 1
Method

All subjects studied the same list, which comprised words drawn from two distinct categories. Following list presentation, the subjects engaged in irrelevant activity, and then tried to recall the words of just one of the categories. Half of the subjects recalled from one category, and half from the other. Finally, all subjects were given the same recognition test, which included all of the words from the study list.

Materials and design. Seventy-two words, 36 from each of two taxonomic categories, were selected from the Battig and Montague (1969) norms. The two categories were "four-footed animals" and "parts of the human body." The words within each of these categories were assigned at random to two 18-word sets, one to be presented as target items, the other as recognition lures. The target words were formed into a single 36-word list. Within this list, the words were arranged into six 6-item blocks, with each odd-numbered block comprising words from one category, and each even-numbered block words from the other. Each block was headed with its category name, so that the presentation sequence comprised a total of 42 items: 36 targets and three presentations of each category name. The items were typed in upper case, with the
category names underlined, and were displayed by means of a projector. Four additional items (States of the Union) were prepared for the purpose of familiarizing the subjects with the mode of presentation. The recall sheet was headed with the name of one of the categories. For 10 subjects, this was "four-footed animals"; for the other 10 it was "parts of the human body." The recognition test sheet included 36 pairs of words, typed in two 18-pair columns. Each pair consisted of a target word together with a lure drawn from the same category. There were nine pairs from each category in each of the two columns. The recall and recognition test sheets, together with a number of blank sheets which prevented previewing, were stapled into a test booklet.

Subjects. Twenty young adults from the Princeton community participated for pay.

Procedure. The subjects were tested in small groups. They were told that they would be shown a list of words, after which they would be tested "to see how many they could remember." The structure of the list was explained. Before the list was presented, the subjects were shown four words (unrelated to the critical words) to allow them to become accustomed to the presentation conditions. After responding to any questions, the experimenter presented the study list at a rate of one item every 2 sec. Immediately following list presentation, the experimenter read out a set of eight digits, and instructed the subjects to record them in order on the first page of their booklets. Two more digit sets were presented and recorded in the same way. After this distraction, subjects turned over two blank pages of their booklets to find the recall test sheet, headed with the name of one of the categories. They were allowed 2 min to recall, in any order, as many items as they could from this category. Finally, they turned to the recognition test sheet. They were told to circle from each pair the word most likely to have been presented in the study list. No time limits were imposed on this test.
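The construction of the Experiment 1 presentation sequence can be sketched as follows, assuming the 18 targets per category have already been drawn at random. The function name is ours, and the placeholder words are hypothetical stand-ins for the Battig and Montague items, which are not listed here.

```python
from typing import List


def build_study_sequence(cat_a: str, targets_a: List[str],
                         cat_b: str, targets_b: List[str],
                         block_size: int = 6) -> List[str]:
    """Interleave category blocks, each preceded by its category name.

    Odd-numbered blocks (1st, 3rd, 5th) come from category A and
    even-numbered blocks from category B.
    """
    blocks_a = [targets_a[i:i + block_size] for i in range(0, len(targets_a), block_size)]
    blocks_b = [targets_b[i:i + block_size] for i in range(0, len(targets_b), block_size)]
    sequence: List[str] = []
    for block_a, block_b in zip(blocks_a, blocks_b):
        sequence.append(cat_a)      # category heading (underlined on the slide)
        sequence.extend(block_a)
        sequence.append(cat_b)
        sequence.extend(block_b)
    return sequence


# Hypothetical placeholder words for the 18 targets of each category.
animals = [f"ANIMAL{i:02d}" for i in range(1, 19)]
body_parts = [f"BODYPART{i:02d}" for i in range(1, 19)]
seq = build_study_sequence("FOUR-FOOTED ANIMALS", animals,
                           "PARTS OF THE HUMAN BODY", body_parts)
print(len(seq))  # 42: 36 targets plus three presentations of each category name
```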
Results

Of primary interest is the comparison of recognition performance for items not tested for recall with that for items that were tested but were not in fact recalled. It turned out that there was a difference in favor of untested items. Thus, the proportion of correct choices in the recognition test was .87 for untested items and .75 for unrecalled items. Put another way, a target item was about twice as likely to be missed in recognition if it was missed in recall than if it was not tested for recall (miss rates of .25 and .13, respectively). This recognition inferiority of the unrecalled items was significant [t(19) = 2.62, p < .01], and it indicates that recallability and recognizability are related such that an item is more likely to be recognizable if it is recallable than if it is not.

Before considering in more detail the nature and extent of this association, we should address the problem of "guessing." It is usually assumed that recognition testing induces more guessing than does recall testing, which suggests the need for guessing corrections. The details of the correction procedure will, of course, follow from the model of guessing adopted. In the present case, the model of guessing should conform to our treatment of recallability and recognizability as dichotomous variables, and the obvious choice is the "high-threshold" model. According to this model an item is in either a recallable or an unrecallable state; if the former, it will be recalled with a probability of 1.0, and if the latter, it will be guessed with some probability g; the same holds for recognition. Let p_H (the "hit rate") be the total proportion of items produced in the recall test or circled in the recognition test, and let g (the "guessing rate") be the proportion of items in an unrecallable or unrecognizable state that are nonetheless produced or circled. Then p_H = p_R + (1 - p_R)g, where p_R is the proportion of items in a recallable or recognizable state, so that p_R can be computed as follows:
$$
p_R = \frac{p_H - g}{1 - g} \tag{1}
$$
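Equation 1 can be expressed as a small function. This is a sketch only; the function name is ours. The example uses the Experiment 1 values reported in the text: g = .5 for the two-alternative forced-choice test, and g = .04 for recall, estimated from the rate at which recognition lures intruded into recall.

```python
def correct_for_guessing(p_hit: float, g: float) -> float:
    """High-threshold correction of Equation 1: p_R = (p_H - g) / (1 - g).

    p_hit: observed proportion produced (recall) or chosen (recognition).
    g:     probability that an item in the unrecallable or unrecognizable
           state is nonetheless produced or chosen.
    """
    if not 0.0 <= g < 1.0:
        raise ValueError("g must lie in [0, 1)")
    return (p_hit - g) / (1.0 - g)


# Experiment 1 values from the text.
print(round(correct_for_guessing(0.87, 0.5), 2))   # 0.74 (untested items)
print(round(correct_for_guessing(0.75, 0.5), 2))   # 0.5  (unrecalled items)
print(round(correct_for_guessing(0.53, 0.04), 2))  # 0.51 (recall)
```

Starting from the rounded proportions, the sketch gives .74 and .50 for recognition. The text reports .73 and .49 "allowing for rounding errors," presumably because those values were computed from unrounded data.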
TABLE 1
RECALLABILITY-RECOGNIZABILITY CONTINGENCY DATA FOR EXPERIMENT 1

Upper panel: entries yielded directly by the experiment
                    Recallable   Unrecallable   Total
  Recognizable                       .24          .73
  Unrecognizable
  Total                .51                       1.00

Lower panel: complete table (remaining entries derived by subtraction)
                    Recallable   Unrecallable   Total
  Recognizable         .49           .24          .73
  Unrecognizable       .02           .25          .27
  Total                .51           .49         1.00
The probability g can be derived if we assume that all nontarget items are in the unrecallable or unrecognizable state. The two-alternative forced-choice procedure fixes g at 0.5, rather than leaving it to the mercy of the subjects' "response criteria." Thus, from Equation 1 the estimated probabilities of the untested and of the unrecalled items being in a recognizable state are, allowing for rounding errors, .73 and .49, respectively. The probability of an item being guessed at recall is directly estimated by the probability of a recognition lure being intruded into recall, which was found to be .04. The observed probability of recall was .53, so that by Equation 1 the corrected probability is .51.²

Consider now the specification of the recallability-recognizability association in the form of a fourfold contingency table. The corrected recallable and recognizable marginals are, as we have just noted, .51 and .73, respectively. The other entry necessary to allow completion of the table is that for the unrecallable-recognizable cell. We have already seen that the probability of an unrecallable item being recognizable is .49. Since the probability of an item being unrecallable is also .49 (that is, 1 - .51), the probability of an item being unrecallable but recognizable is (.49 × .49 =) .24. The experiment therefore yields the entries shown in the upper panel of Table 1. The remaining entries are derived by subtraction (for example, the recallable-recognizable cell is .73 - .24 = .49, and the recallable-unrecognizable cell is .51 - .49 = .02), and the complete data are shown in the lower panel of Table 1. Of particular interest is the entry in the recallable-unrecognizable cell. This entry estimates the probability of a target item being in a recallable state but not in a recognizable state to be virtually zero.

² Note that the correction preserves the interval relation among the subjects' between-condition difference scores. Hence, in this and in subsequent experiments, correcting for guessing has no effect on the statistical conclusions.

Discussion
The results of this experiment imply a positive association between recallability and recognizability. In fact, they are in excellent accord with the notion that recallable items are necessarily recognizable. At the very least, they strongly suggest that a recallable item is more likely than an unrecallable item to be in a recognizable state, and in this respect they are in sharp contrast with the findings of Broadbent and Broadbent (1975). A recallability-recognizability association could, of course, arise for many reasons. Among the potentially relevant factors are the physical conditions of item presentation. For
instance, an instruction likely to enhance between-item variation in the amount of attention or in "depth of processing" is likely to enhance the recallability-recognizability association. In the present experiment, items were presented at an even rate and no attempt was made to cause subjects to vary their attention among the items. Nevertheless, it is possible that the observed association could be interpreted, at least in part, in terms of serial position effects. To take an obvious possibility, primacy items might tend to be both recallable and recognizable. The Broadbents controlled for possible effects of serial position by yoking items across the Recall-Recognition and Recognition conditions with respect to serial position, and discarding from the recognition analysis not only recalled items but also the yoked items in the Recognition condition. The design of the present experiment precludes any comparable analysis, and so it is difficult to be sure of the extent, if any, to which the observed association reflects the serial position factor. The question is, however, addressed in Experiment 2.
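The subtraction steps that complete Table 1 can be written as a short function. This sketch assumes the three measures have already been corrected with Equation 1; the class and function names are ours. Because the recallable-unrecognizable cell is obtained by subtraction, nothing forces it to be non-negative, so sampling error alone can push it slightly below zero.

```python
from dataclasses import dataclass, astuple


@dataclass
class Contingency:
    recallable_recognizable: float
    recallable_unrecognizable: float
    unrecallable_recognizable: float
    unrecallable_unrecognizable: float


def reduction_method(p_recallable: float, p_recognizable: float,
                     p_recog_given_unrecalled: float) -> Contingency:
    """Fill the fourfold table from three guessing-corrected measures.

    p_recallable:             first (recall) test of the Recall-Recognition condition
    p_recognizable:           Recognition condition
    p_recog_given_unrecalled: recognition of unrecalled items, Recall-Recognition condition
    """
    p_unrecallable = 1.0 - p_recallable
    unrec_recog = p_unrecallable * p_recog_given_unrecalled
    rec_recog = p_recognizable - unrec_recog       # from the Recognizable row total
    rec_unrecog = p_recallable - rec_recog         # from the Recallable column total
    unrec_unrecog = p_unrecallable - unrec_recog   # from the Unrecallable column total
    return Contingency(rec_recog, rec_unrecog, unrec_recog, unrec_unrecog)


table1 = reduction_method(0.51, 0.73, 0.49)  # Experiment 1 values from the text
print(tuple(round(v, 2) for v in astuple(table1)))  # (0.49, 0.02, 0.24, 0.25)
```

With the values reported for Experiment 2 (.44, .74, .46) and Experiment 3 (.32, .59, .32), the same arithmetic reproduces Tables 2 and 3, including their small negative recallable-unrecognizable cells.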
EXPERIMENT 2

The strategy of including the two testing conditions within one and the same testing sequence as a means of equating retention activity for the two recognition conditions was preserved in Experiment 2. However, controlling for the serial position of item presentation required a somewhat more elaborate design than that of Experiment 1. In the first place, buffer categories were added to the lists in the primacy and recency positions. These categories were never tested, their purpose being merely to reduce any influence of serial position. A related modification was that a presentation list included, in addition to the two buffer categories, four rather than two critical categories, with the items of each category being presented in a single block. But the most important change was that in Experiment 2 each subject was presented with two lists rather than one, and was tested for recall on complementary portions of the two lists. In other words, when an item in a given serial position in one of the lists was a target in the recall test, then the item presented in that position in the other list was not. By dropping from analysis of the Recognition condition the items presented in the same serial positions as the recalled items in the other list, we were able to equate for the two testing conditions the effects of serial position on the recognizability estimates.
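The complementary-portion design and the position yoking can be sketched for one direction of one counterbalancing group. The sketch assumes that serial positions are numbered from the start of each list, with the 12-item primacy buffer and four 12-item critical blocks given in the Method that follows; the function names and the recalled positions in the example are hypothetical.

```python
from typing import Iterable, Set

BUFFER = 12         # primacy-buffer items preceding the critical categories
CATEGORY_SIZE = 12  # target items per critical category
N_CRITICAL = 4      # critical categories per list


def category_positions(k: int) -> Set[int]:
    """1-based serial positions of critical category k (1..4) within a list."""
    start = BUFFER + (k - 1) * CATEGORY_SIZE + 1
    return set(range(start, start + CATEGORY_SIZE))


def positions(categories: Iterable[int]) -> Set[int]:
    return set().union(*(category_positions(k) for k in categories))


# One counterbalancing group: recall categories 1 and 3 of list 1, 2 and 4 of list 2.
tested_list1 = positions([1, 3])
tested_list2 = positions([2, 4])
all_critical = positions(range(1, N_CRITICAL + 1))

# Complementary design: every critical position is recall-tested in exactly one list.
assert tested_list1.isdisjoint(tested_list2)
assert tested_list1 | tested_list2 == all_critical


def yoked_untested(recalled_list1: Set[int]) -> Set[int]:
    """Untested list-2 positions kept for the Recognition-condition comparison:
    those matching an UNRECALLED tested position in list 1."""
    untested_list2 = all_critical - tested_list2
    unrecalled_list1 = tested_list1 - recalled_list1
    return untested_list2 & unrecalled_list1


print(len(yoked_untested({13, 14, 40})))  # 21 of the 24 untested list-2 positions
```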
Method

Materials and design. Twenty-four instances from each of eight categories were selected from the Battig and Montague (1969) norms. The items of each category were randomly assigned to two equal sets, one of which was designated as the target set and the other as the recognition lure set. The eight target sets were formed into two 4-category lists. To each of these lists were added primacy and recency buffers, with each buffer set consisting of 12 items from a new category. All subjects saw the same two presentation lists. Following each list they engaged in the same sequence of tasks: distractor activity, a recall test, and a recognition test. Both the distractor task and the recognition test were the same for all subjects. In the recall test all subjects were required to recall the items from two of the four critical categories, but the pair of categories selected differed between subjects. Specifically, half of the subjects recalled items from the first and third critical categories of the first list and from the second and fourth critical categories of the second list, while the others recalled the remaining two critical categories of each list. In each recognition test the four critical categories, though not the items within the categories, were tested in the same order in which they had been presented. Each category included 12 target-lure pairs with the positions of the targets determined randomly. Thus, for each list the recognition test sheet
comprised four columns of 12 word-pairs, with each column headed by a category name.

Subjects. Ten young adults from the Princeton community participated for pay.

Procedure. The subjects were given rather general instructions about the nature of the study lists and were told that their memory for the lists would be tested. The first list was then presented by means of a projector at a rate of one item every 2 sec. List presentation was followed by a 5-min distractor task, which involved circling the odd numbers of a random numbers sheet. Subjects were then given a recall sheet, headed with the names of two of the categories from the study list. They were instructed to recall as many items from these categories as they could, and were urged to distribute their attention equally between the categories. Three minutes were allowed for this task, and the subjects were informed when the first and the second minutes had elapsed. The recognition test was given immediately following the recall test. The subjects were told to circle from each pair the word most likely to have occurred in the study list. No time restrictions were imposed on this test. After a 2-min pause, the whole procedure, from presentation to recognition test, was repeated with the second list.
Results and Discussion

The purpose of Experiment 2 was to test the hypothesis that the positive association between recallability and recognizability observed in Experiment 1 was mediated by target presentation position. If this were so, then
recognition for unrecalled items should equal that for items not tested for recall but presented in corresponding positions in the study list (even though, as in Experiment 1, it might be lower than recognition for the untested set of items taken as a whole). The results, however, gave no support for this hypothesis. Recognition performance for the whole set of untested items was, in fact, the same as that for the position-yoked subset. In both cases the probability of a target item being circled was .87, which reduces (with high-threshold correction) to a likelihood of .74 of being in a recognizable state. By comparison, the mean probability of an unrecalled item being nominated in the recognition test was .73, which gives a mean recognizability estimate of .46. This probability was significantly lower than that for untested items, irrespective of whether the comparison is with all untested items [t(9) = 3.38, p < .01] or just the subset yoked by position with unrecalled items [t(9) = 3.31, p < .01]. In the recall test subjects produced the target items with a mean probability of .45, which reduced very slightly with guessing correction to .44.

The fourfold contingency data were derived as described for Experiment 1 and are shown in Table 2. Note, in particular, that the estimate for the recallable-unrecognizable cell turned out to be slightly, though of course not significantly [t(9) = 1.27, p > .2], less than zero; there was thus no hint of the recallable items being anything other than a subset of the recognizable items. In short, the results of this experiment imply that there is a positive association between the recallability and the recognizability of studied items which is mediated by factors other than serial position. Moreover, the association appears to be substantial, apparently to the point that the set of recallable items is totally included in the set of recognizable items.

TABLE 2
RECALLABILITY-RECOGNIZABILITY CONTINGENCY DATA FOR EXPERIMENT 2

                    Recallable   Unrecallable   Total
  Recognizable         .48           .26          .74
  Unrecognizable      -.04           .30          .26
  Total                .44           .56         1.00

EXPERIMENT 3

The purpose of Experiments 3 and 4 was to explore the generality of the conclusion reached in Experiments 1 and 2. We were especially interested in whether the strong recallability-recognizability association would be found under other conditions of recall, and in particular under conditions involving the cueing of individual items. Although it may be reasonable to generalize our conclusion of a strong recallability-recognizability association to situations involving a typical free recall procedure (in which subjects are given slightly less recall information than were subjects in Experiments 1 and 2), it remains an empirical question whether such a strong association would occur when the recall test involves a specific cue for each target item. In the present experiment, the target items were drawn from taxonomic categories of the kind used in the previous two experiments, but this time there was only one item per category, so that the category names given as cues in the recall test each specified a single item.
Method

Materials and design. Thirty-six categories were selected from among those listed by Murdock (1976) and from the Battig and Montague (1969) norms. Two exemplars were selected from each of these categories; one was allocated to one version of the study list and the other to a complementary version. To each of these lists were added five primacy and 15 recency buffer items; none of these items was an instance of any of the 36 critical categories. There were thus two versions of the study list that differed only in the identity of the category instances, the order of the categories within each list being the same. Half of the subjects saw one version of the study list, and half the other. Items were tested in an order which was the same for all subjects though random with respect to study order. The testing conditions of successive items were alternated. For half of the subjects (selected under the constraint that the two versions of the study list were equally represented) the odd-numbered items were tested in the Recall-Recognition condition and the even-numbered items in the Recognition condition; for the remaining subjects this arrangement was reversed. Recognition testing involved the same two-alternative forced-choice test for all subjects. A given test item was therefore a target for half of the subjects and a lure for the other half.

Subjects. Twenty Princeton University undergraduates participated for pay.

Procedure. The subjects were tested in small groups. They were first given instructions of a general nature, being told that their memory for the study list would be tested. The experimenter then read the study list at a rate of one item every second. Immediately following presentation of the list, the subjects were given a random numbers sheet and were told to circle the odd numbers. After engaging in this distractor activity for 2 min, they were given a test sheet numbered from 1 to 36. The test items were presented orally. For an item tested in the Recall-Recognition condition, the experimenter first read out the relevant category name and allowed the subjects up to 20 sec to write the target item in a provided space; she then presented the target item together with a lure drawn from the same category, and the subjects selected one of these alternatives and entered it in the appropriate space on the test sheet. Testing in the Recognition condition involved only the second part of this procedure. The testing procedure continued in this manner for all 36 target items, with testing conditions alternating between items.
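The counterbalancing in this design can be checked mechanically. A minimal sketch, assuming the 20 subjects fall into four equal groups formed by crossing study-list version (labeled "A" and "B" here; the labels are ours) with the odd/even assignment of test items to conditions.

```python
from itertools import product
from typing import Set

N_ITEMS = 36
CONDITIONS = ("Recall-Recognition", "Recognition")

# Four equal-sized groups: list version crossed with parity assignment.
# odd_recall=True means odd-numbered test items go to the Recall-Recognition condition.
GROUPS = list(product(("A", "B"), (True, False)))


def condition(item: int, odd_recall: bool) -> str:
    """Testing condition of test item 1..36 under a given parity assignment."""
    is_odd = item % 2 == 1
    return CONDITIONS[0] if is_odd == odd_recall else CONDITIONS[1]


def versions_in_condition(item: int, cond: str) -> Set[str]:
    """List versions (hence which exemplar is the target) for which the item falls in cond."""
    return {version for version, odd_recall in GROUPS if condition(item, odd_recall) == cond}


for item in range(1, N_ITEMS + 1):
    for cond in CONDITIONS:
        # Both exemplars of every category serve as target in both conditions.
        assert versions_in_condition(item, cond) == {"A", "B"}

for _, odd_recall in GROUPS:
    # Conditions alternate, so every group has 18 items in each condition.
    assert sum(condition(i, odd_recall) == CONDITIONS[0] for i in range(1, N_ITEMS + 1)) == 18

print("counterbalancing checks passed")
```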
Results

Recallability and recognizability were again found to be positively associated. That is, recognition for items unselected with respect to whether they could be recalled was higher than that for items that could not be recalled. Thus the proportion of correct choices in the recognition test was .80 for the Recognition condition and .66 for the unrecalled items in the Recall-Recognition condition; these proportions reduce with correction for guessing to .59 and .32, respectively. The difference between them is significant [t(19) = 3.55, p < .01]. The mean probability of recall was .35, which reduced with guessing correction to .32. By applying the reduction method, we derived the recallability-recognizability contingency data shown in Table 3. Once again note the lack of evidence for items being in a recallable but unrecognizable state. The small negative estimate for this cell was, of course, not significantly different from zero [t(19) = 1.19, p > .2]. It appears that the set of recallable items is substantially, if not entirely, included within the set of recognizable items.
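The t tests in these experiments compare subjects' scores across the two conditions, and the footnote to the Experiment 1 results states that correcting for guessing leaves the statistical conclusions unchanged. The reason is that Equation 1 with a common g is a linear transformation, so every difference score is multiplied by 1/(1 - g) and the t ratio is unaffected. A minimal check, assuming a paired (within-subject) t test on per-subject proportions; the scores below are invented for illustration and are not the data of any experiment reported here.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def paired_t(x: Sequence[float], y: Sequence[float]) -> float:
    """Paired t statistic computed on the difference scores x - y."""
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))


def correct(p: float, g: float = 0.5) -> float:
    """High-threshold correction (Equation 1) with forced-choice guessing rate g."""
    return (p - g) / (1.0 - g)


# Hypothetical per-subject proportions, NOT data from this paper.
recognition_cond = [0.85, 0.80, 0.75, 0.90, 0.70]
unrecalled_items = [0.70, 0.65, 0.70, 0.75, 0.60]

t_raw = paired_t(recognition_cond, unrecalled_items)
t_corrected = paired_t([correct(p) for p in recognition_cond],
                       [correct(p) for p in unrecalled_items])
print(math.isclose(t_raw, t_corrected))  # True: each difference is scaled by 1/(1 - g)
```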
EXPERIMENT 4

Experiment 4 also tested the recallability-recognizability relation when the recall test involved target-specific cueing. It differed from Experiment 3 in the nature of the cue-to-target relation. Whereas this relation was of a taxonomically hierarchical kind in Experiment 3, it was of a more general associative nature in Experiment 4. The stimulus materials were culled from among the association norms reproduced in the Postman and Keppel (1970) collection. Thirty-six word triads were selected, with each triad consisting of a stimulus word and two response words; care was taken to minimize the association between any given response term and the remaining 35 "inappropriate" stimulus terms. Examples of stimulus-response sets are: diapers-INFANT, BABY; up-BELOW, DOWN; mouse-DOG, CAT. Thus one version of the study list included, among its 36 words, INFANT, BELOW, and DOG, while the complementary version included BABY, DOWN, and CAT. The cued recall test included the cues diapers, up, and mouse, and the recognition test the pairs INFANT-BABY, BELOW-DOWN, and DOG-CAT. Sixteen subjects were tested in this experiment. The only significant procedural departure from Experiment 3 concerned the presentation mode of the study list: each word was printed on a separate flash card and was presented visually. The various counterbalancing procedures, number of buffer items, rate of presentation of both study and test items, mode of testing, use of a 2-min distractor task, and so on, were the same as in Experiment 3.
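The way each triad feeds the two list versions, the cued-recall test, and the recognition test can be sketched with the three example triads given in the text. The class, function names, and the version labels "A" and "B" are ours.

```python
from typing import List, NamedTuple, Tuple


class Triad(NamedTuple):
    stimulus: str    # cue word presented in the cued-recall test
    response_a: str  # studied by subjects who saw one version of the list
    response_b: str  # studied by subjects who saw the complementary version


# The three example triads from the text; the full set had 36.
TRIADS = [
    Triad("diapers", "INFANT", "BABY"),
    Triad("up", "BELOW", "DOWN"),
    Triad("mouse", "DOG", "CAT"),
]


def study_list(triads: List[Triad], version: str) -> List[str]:
    """Words studied under list version "A" or "B"."""
    if version not in ("A", "B"):
        raise ValueError("version must be 'A' or 'B'")
    return [t.response_a if version == "A" else t.response_b for t in triads]


def recall_cues(triads: List[Triad]) -> List[str]:
    return [t.stimulus for t in triads]


def recognition_pairs(triads: List[Triad]) -> List[Tuple[str, str]]:
    # Each pair holds one target and one lure, whichever version was studied.
    return [(t.response_a, t.response_b) for t in triads]


print(study_list(TRIADS, "A"))  # ['INFANT', 'BELOW', 'DOG']
print(study_list(TRIADS, "B"))  # ['BABY', 'DOWN', 'CAT']
print(recall_cues(TRIADS))      # ['diapers', 'up', 'mouse']
```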
Results

Once again, items for which no cued recall test was given were better recognized than items that could not be produced in response to the recall cue, indicating that recallability and recognizability are positively associated.
TABLE 3
RECALLABILITY-RECOGNIZABILITY CONTINGENCY DATA FOR EXPERIMENT 3

                  Recallable   Unrecallable   Total
Recognizable          .37          .22          .59
Unrecognizable       -.05          .46          .41
Total                 .32          .68         1.00
The probability of a correct choice in the recognition test in the Recognition condition (.78, which corrected to .56) was significantly greater than that for unrecalled items of the Recall-Recognition condition (.64, which corrected to .29) [t(15) = 3.19, p < .01]. In the Recall-Recognition condition, the proportion of target items recalled was .38, which reduced with guessing correction to .31. The recallability-recognizability contingency data are given in Table 4. Consistent with Experiments 1 to 3, there was no evidence of any items being recallable yet unrecognizable; again, the small negative estimate for this cell was not significantly different from zero [t(15) = 1.18, p > .2].

A PUZZLE

The four experiments reported here consistently indicate that recallable items are more likely to be recognizable than are unrecallable items, whereas Broadbent and Broadbent's (1975) experiments consistently indicate that they are not. The reasons for this discrepancy are unclear. Conceivably, it has to do with the Broadbents' failure to equate the retention activity preceding the recognition test across the two conditions. Unpublished research from our own laboratory, however, suggests that this is only a partial explanation. When we replaced the recall test in our Recognition condition with activity of the sort used by the Broadbents, we obtained a pattern of results intermediate between those reported here and those reported by the Broadbents. Performance in the Recognition condition was significantly higher than that for unrecalled items in the Recall-Recognition condition, but not to an extent that would suggest that all recallable items are recognizable. The matter remains puzzling.

DISCUSSION OF ASSUMPTIONS

Our conclusions rest on a number of assumptions, some of which warrant further discussion. We first consider assumptions of a general sort and then those specifically concerned with the reduction method.
TABLE 4
RECALLABILITY-RECOGNIZABILITY CONTINGENCY DATA FOR EXPERIMENT 4

                  Recallable   Unrecallable   Total
Recognizable          .36          .20          .56
Unrecognizable       -.05          .49          .44
Total                 .31          .69         1.00

General Assumptions

As noted at the beginning of the paper, the relation between recall and recognition has several facets. This paper has been concerned with what is perhaps the most tractable facet, the relation between recallability and recognizability. But even this relation, specified as a fourfold contingency table, requires at least two general assumptions. First, recallability and recognizability must be treated as dichotomous variables; that is, an item is treated as either recallable or not recallable and as either recognizable or not recognizable. It is well known, however, that both recall and recognition performance vary with the details of the testing conditions. It follows that any simple contingency relation should be qualified with respect to the conditions of recall and recognition involved. The present experiments have shown the same relation despite considerable variation in conditions of recall, but there are certainly conditions of recall for which this relation would not hold. In particular, with Tulving and Thomson's (1973) "recognition failure" procedure, in which each target item is studied in a different context that is subsequently re-presented as the recall cue, recallability would not imply recognizability.

The second general assumption concerns guessing. In the experiments reported here, both recall and recognition were corrected according to the high-threshold model of guessing, which distinguishes between a proportion "truly" recalled or recognized and a proportion "guessed." There are, however, serious problems in applying this model to human memory (cf., e.g., Brown, 1965b; Murdock, 1965). Indeed, as the widespread use of the signal detection model in human memory makes clear, many theorists would deny the possibility, or even the meaningfulness, of estimating recallable or recognizable proportions corrected for guessing. Such a view implies that to specify in any simple way the relation between recall and recognition, we must operationalize these variables not as recallability and recognizability, but rather in terms of the target items being given in a recall test and being picked out in a recognition test. The obtained relation would then have to be qualified with respect to the extent to which nontarget items are given in the recall test and picked out in the recognition test. It is perhaps worth noting, however, that the pattern of results obtained in the research reported here (as well as the statistical conclusions; see Footnote 2) is the same whether the recall and recognition scores are left in their raw, uncorrected form or corrected according to the high-threshold model. Either way, the distribution of the recognition-related attribute subsumes that of the recall-related attribute.
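The claim that the pattern holds with or without guessing correction can be checked roughly from the group-level proportions reported for Experiments 3 and 4. The Python sketch below applies the reduction arithmetic that reproduces Tables 3 and 4 to both the raw and the corrected proportions. It is an approximation under stated assumptions. It uses rounded group means rather than the subject-level data behind the reported analyses. The corrected recall values are taken as reported rather than recomputed. The names (`Contingency`, `reduce_contingency`, `REPORTED`) are hypothetical and ours, not part of the original method.

```python
from dataclasses import dataclass


@dataclass
class Contingency:
    """Fourfold recallability x recognizability table (proportions of items)."""
    recalled_recognized: float
    recalled_unrecognized: float
    unrecalled_recognized: float
    unrecalled_unrecognized: float


def reduce_contingency(p_recall: float, p_recog: float,
                       p_recog_unrecalled: float) -> Contingency:
    """Reduction arithmetic (our reconstruction): overall recognition comes
    from the Recognition condition; recall and recognition of unrecalled
    items come from the Recall-Recognition condition."""
    p_unrecalled = 1 - p_recall
    # Unrecalled-but-recognizable cell: conditional recognition x P(unrecalled).
    unrec_rec = p_recog_unrecalled * p_unrecalled
    # The remaining cells follow from the row and column margins.
    rec_rec = p_recog - unrec_rec
    return Contingency(
        recalled_recognized=rec_rec,
        recalled_unrecognized=p_recall - rec_rec,
        unrecalled_recognized=unrec_rec,
        unrecalled_unrecognized=p_unrecalled - unrec_rec,
    )


# Group-level proportions quoted in the text, as tuples of
# (recall, recognition in Recognition condition, recognition of unrecalled items).
REPORTED = {
    "Experiment 3": {"raw": (0.35, 0.80, 0.66), "corrected": (0.32, 0.59, 0.32)},
    "Experiment 4": {"raw": (0.38, 0.78, 0.64), "corrected": (0.31, 0.56, 0.29)},
}

if __name__ == "__main__":
    for experiment, scores in REPORTED.items():
        for kind, (recall, recog, recog_unrecalled) in scores.items():
            table = reduce_contingency(recall, recog, recog_unrecalled)
            print(f"{experiment} ({kind}): recallable-but-unrecognizable = "
                  f"{table.recalled_unrecognized:+.3f}")
```

Run as a script, it prints one estimate per experiment and scoring method. With these inputs, all four recallable-but-unrecognizable estimates come out at or slightly below zero. The raw Experiment 4 value, about -.003, is the closest to zero. Because it works from group means, the sketch cannot reproduce the t tests in the text, which presumably rest on per-subject estimates.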
Assumptions of the Reduction Method

Consider now the assumptions associated with the reduction method. Of these, by far the most important concern the state of the memory traces of the critical, unrecalled items at the time of the recognition test. Three assumptions may be distinguished.

(i) Traces of unrecalled items are assumed not to be altered by the actual instruction to recall. This contention has been identified as the central and most questionable assumption underlying the reduction method, and as such it has been considered at some length elsewhere (Tulving & Watkins, 1975; Watkins & Tulving, in press). For the present, it is perhaps enough to note that there is no contrary evidence, either from internal analyses of data subjected to the reduction method (Tulving & Watkins, 1975) or from other procedures (Watkins & Tulving, in press).

(ii) It is assumed that recalling some of the target items does not affect the recognition of the remaining items. This assumption would become tenuous if overall recognition performance were found to be lowered by a prior recall test, and an effect of this kind has from time to time been reported in the literature. The seriousness of the problem, however, is tempered by two considerations. First, the effect appears to be of the now-you-see-it-now-you-don't variety: while some investigators (e.g., Kay & Skemp, 1956) have succeeded in demonstrating it, others (e.g., Hanawalt & Tarr, 1961) have not. It might also be noted that no evidence for the effect was obtained in the experiments reported here. It is perhaps prudent, therefore, to accept the conclusion of Postman, Kruesi, and Regan (1975; see also Cooper & Monk, 1976) that the question should remain open. The second consideration is that even if recall had a detrimental effect on overall recognition, and even if the effect were substantial, it would affect the observed relation between recallability and recognizability only if the inhibitory effect were category specific.
Although more direct research on this point is needed, the available evidence suggests that, although encountering some items may well produce category-specific interference in the recall of other items of the same category (e.g., Mueller & Watkins, 1977; Roediger, 1973), there are no such interference effects on recognition (Slamecka, 1975).

(iii) The third assumption concerning traces of unrecalled items is that they remain unaffected by encounters with earlier items in the recognition test. This assumption is clearly acceptable for Experiments 3 and 4, because there the recognition test for each item was given directly after the attempted recall. In Experiments 1 and 2, which involved a single recognition test, retroactive interference from prior test items would be expected (cf., e.g., Donaldson & Murdock, 1968). It turned out, however, that recognition performance showed no decline at all from the first to the second half of the test in either experiment. Moreover, even substantial retroactive inhibition would not necessarily affect the essential pattern of association given by a fourfold contingency table. Indeed, if the relation between recallability and recognizability in the absence of any prior recognition testing is either one of total inclusion or one of total independence, then any decline in recognition performance during the recognition test should have no effect whatsoever on the observed relation.
CONCLUSION

It was argued earlier in the paper that the primary-level problems in applying a double test procedure to determine the contingency relation between recallability and recognizability could be avoided by combining data from the Recognition and Recall-Recognition testing conditions according to the reduction method (Tulving & Watkins, 1975). We have just concluded that even the secondary-level problems incurred by a double testing strategy are likely to be minor. In short, the procedure and analyses described here should provide a reasonably valid specification of the contingency relation. It may therefore be concluded with some confidence that, under the general conditions of the present experiments, the set of recallable items is substantially, if not wholly, included within the set of recognizable items.

REFERENCES

BALLARD, P. B. Obliviscence and reminiscence. British Journal of Psychology Monographs, 1913, 1, 1-82.
BATTIG, W. F., & MONTAGUE, W. E. Category norms for verbal items in 56 categories: A replication and extension of the Connecticut category norms. Journal of Experimental Psychology Monograph, 1969, 80 (3, Pt. 2).
BROADBENT, D. E. In defence of empirical psychology. London: Methuen, 1973.
BROADBENT, D. E., & BROADBENT, M. H. P. The recognition of words that cannot be recalled. In P. M. A. Rabbitt & S. Dornic (Eds.), Attention and performance V. New York: Academic Press, 1975.
BROADBENT, D. E., & BROADBENT, M. H. P. Effects of recognition on subsequent recall: Comments on "Determinants of recognition and recall: Accessibility and generation" by Rabinowitz, Mandler, and Patterson. Journal of Experimental Psychology: General, 1977, 106, 330-335.
BROWN, J. A comparison of recognition and recall by a multiple-response method. Journal of Verbal Learning and Verbal Behavior, 1965, 4, 401-408. (a)
BROWN, J. Multiple response evaluation of discrimination. British Journal of Mathematical and Statistical Psychology, 1965, 18, 125-137. (b)
BROWN, J. (Ed.) Recall and recognition. London: Wiley, 1976.
CAREY, S. T., & LOCKHART, R. S. Encoding differences in recognition and recall. Memory & Cognition, 1973, 1, 297-300.
COOPER, A. J. R., & MONK, A. Learning for recall and learning for recognition. In J. Brown (Ed.), Recall and recognition. London: Wiley, 1976.
DONALDSON, W., & MURDOCK, B. B., JR. Criterion change in continuous recognition memory. Journal of Experimental Psychology, 1968, 76, 325-330.
HALL, J. W., GROSSMAN, L. R., & ELWOOD, K. D. Differences in encoding for free recall vs. recognition. Memory & Cognition, 1976, 4, 507-513.
HANAWALT, N. G. Memory trace for figures in recall and recognition. Archives of Psychology, 1937, 31, 1-89.
HANAWALT, N. G., & TARR, A. G. The effect of recall upon recognition. Journal of Experimental Psychology, 1961, 62, 361-367.
HOLLINGWORTH, H. L. Characteristic differences between recall and recognition. American Journal of Psychology, 1913, 24, 532-544.
KAY, H., & SKEMP, R. Different thresholds for recognition: Further experiments on interpolated recall and recognition. Quarterly Journal of Experimental Psychology, 1956, 8, 153-162.
KIRKPATRICK, E. A. An experimental study of memory. Psychological Review, 1894, 1, 602-609.
MACDOUGALL, R. Recognition and recall. Journal of Philosophy, Psychology, and Scientific Methods, 1904, 1, 229-233.
MUELLER, C. W., & WATKINS, M. J. Inhibition from part-set cuing: A cue-overload interpretation. Journal of Verbal Learning and Verbal Behavior, 1977, 16, 699-709.
MURDOCK, B. B., JR. Signal-detection theory and short-term memory. Journal of Experimental Psychology, 1965, 70, 443-447.
MURDOCK, B. B., JR. Item and order information in short-term serial memory. Journal of Experimental Psychology: General, 1976, 105, 191-216.
POSTMAN, L., & KEPPEL, G. (Eds.) Norms of word association. New York: Academic Press, 1970.
POSTMAN, L., KRUESI, E., & REGAN, J. Recognition and recall as measures of long-term retention. Quarterly Journal of Experimental Psychology, 1975, 27, 411-418.
ROEDIGER, H. L., III. Inhibition in recall from cueing with recall targets. Journal of Verbal Learning and Verbal Behavior, 1973, 12, 644-657.
SLAMECKA, N. J. Intralist cueing of recognition. Journal of Verbal Learning and Verbal Behavior, 1975, 14, 630-637.
TULVING, E., & THOMSON, D. M. Encoding specificity and retrieval processes in episodic memory. Psychological Review, 1973, 80, 352-373.
TULVING, E., & WATKINS, M. J. Continuity between recall and recognition. American Journal of Psychology, 1973, 86, 739-748.
TULVING, E., & WATKINS, M. J. Structure of memory traces. Psychological Review, 1975, 82, 261-275.
WATKINS, M. J., & TULVING, E. When retrieval cueing fails. British Journal of Psychology, in press.
ZANGWILL, O. L. An investigation of the relationship between the processes of reproducing and recognizing simple figures, with special reference to Koffka's trace theory. British Journal of Psychology, 1937, 27, 250-276.

(Received March 27, 1978)