Consciousness and Cognition 42 (2016) 41–50
The influence of retrieval practice on metacognition: The contribution of analytic and non-analytic processes
Tyler M. Miller (a, ⇑), Lisa Geraci (b)
(a) Department of Psychology, South Dakota State University, United States
(b) Department of Psychology, Texas A&M University, United States
Article info
Article history: Received 22 July 2015; Revised 4 March 2016; Accepted 6 March 2016
Keywords: Metacognition; Overconfidence; Retrieval practice; Inferential processes
Abstract
People may change their memory predictions after retrieval practice using naïve theories of memory and/or by using subjective experience – analytic and non-analytic processes, respectively. The current studies disentangled the contributions of each process. In one condition, learners studied paired-associates, made a memory prediction, completed a short run of retrieval practice, and made a second prediction. In another condition, judges read about a yoked learner’s retrieval practice performance but did not participate in retrieval practice and therefore could not use non-analytic processes for the second prediction. In Study 1, learners reduced their predictions following moderately difficult retrieval practice whereas judges increased their predictions. In Study 2, learners made lower adjusted predictions than judges following both easy and difficult retrieval practice. In Study 3, judge-like participants used analytic processes to report adjusted predictions. Overall, the results suggested that non-analytic processes play a key role in leading participants to reduce their predictions after retrieval practice.
© 2016 Elsevier Inc. All rights reserved.
⇑ Corresponding author at: Department of Psychology, South Dakota State University, Brookings, SD 57007, United States. E-mail address: [email protected] (T.M. Miller).
http://dx.doi.org/10.1016/j.concog.2016.03.010
1053-8100/© 2016 Elsevier Inc. All rights reserved.
1. Introduction
Metacognition refers to people’s awareness, knowledge, and control of their cognitive abilities (Flavell, 1979). Metacognitive awareness, or monitoring, can be measured in a variety of ways (e.g., Nelson & Narens, 1990). For example, the ability to monitor one’s memory can be measured using prospective judgments, in which the person makes a prediction about future memory performance, and using retrospective judgments, in which the person makes a judgment about past performance (Arbuckle & Cuddy, 1969; Hart, 1965). In both cases, the accuracy of these metacognitive monitoring judgments can be assessed by determining how well these judgments correspond with past or future performance using difference scores, gamma correlations, and receiver operating characteristics among other measures (Cheng, 2010; Fleming & Lau, 2014; Nelson, 1996). What is more difficult to determine, though, is how people make these judgments and what information they use to assess how well they will or have performed on a cognitive test (for one account of what information people use to make monitoring judgments see Nelson & Dunlosky, 1991; for a review see Schwartz, 1994).
The cues people use to make monitoring judgments can be broadly categorized as either analytic (or theory-based) or nonanalytic (experience-based; Koriat, Bjork, Sheffer, & Bar, 2004; Kelley & Jacoby, 1996). Kelley and Jacoby suggested that analytic cues arise from people’s beliefs or theories about memory and the factors that influence memory performance, whereas nonanalytic cues arise from people’s subjective experiences of performing the task. Similarly, Koriat (1997) offered
a framework for describing the types of cues that affect memory predictions. Intrinsic cues were defined as ones that are inherent to the studied items, such as the relationship between studied associates (e.g., nurse–doctor). Extrinsic cues were defined as ones that are not inherent to the studied items that could include a variety of factors such as people’s lay theories of memory (e.g., how repetition influences memory). Mnemonic cues were defined as those that are related to the person’s experience studying the information (thoughts about the items, reminiscence about prior related experiences, etc.). Of course, people likely use all of these types of cues to make monitoring judgments. For example, a person could study a distinctive item (an item printed in large font amongst other small font items), and use the item features and the fluent processing experience, along with a theory about distinctive items or fluent processing being well remembered, to inform their prediction about whether an item will be remembered at a later point. In this paper we will focus on the distinction between analytic and nonanalytic processes. There have been a few attempts to assess the separate contributions of analytic and nonanalytic processes to metacognitive judgments. For example, research shows that participants judge that they will remember words printed in larger fonts better than they will remember words printed in smaller fonts (Rhodes & Castel, 2008; see also Kornell & Bjork, 2009; Kornell, Rhodes, Castel, & Tauber, 2011). In the Rhodes and Castel study, participants made a judgment of learning (JOL) after studying words that were printed in large and small fonts and were later asked to recall these words. Results showed that participants gave higher JOLs for words presented in large fonts compared to those presented in small fonts, whereas recall was not affected by font size. 
Thus, the authors concluded that the perceptual font information led to an illusion of memory. One explanation for this illusion is that the font manipulation influenced participants’ non-analytic (or experience-based) processing. The idea is that participants interpreted the relative ease of reading or processing words printed in large font as a benefit for later memory performance. The assumption is that participants have different subjective experiences for larger compared to smaller font words and that they use these experiences as cues to make judgments about the future memorability of these items. But it could be that people gave higher JOLs to items printed in larger fonts than to those printed in smaller fonts because they believed that more salient perceptual features would benefit memory performance, an analytic (or theory-based) process. Recently, evidence has emerged that counts against the non-analytic account (that large fonts are processed more fluently) and supports the analytic interpretation instead (Mueller, Dunlosky, Tauber, & Rhodes, 2014). Mueller et al. showed that items presented in large fonts were not processed more fluently than items presented in small fonts, as evidenced by similar lexical decision and study times for large and small font items, despite being assigned higher JOLs. Further, they found that people predicted that others would recall more large- than small-font items. Here, the possibility of using non-analytic (or experience-based) processes to inform recall predictions was eliminated and yet font size affected predictions, suggesting that people use analytic (theory-based) processes when predicting performance. Finally, participants gave higher JOLs to large-font relative to small-font items before they even saw the items (using a pre-study JOL paradigm; Castel, 2008), again suggesting that people used analytic processes to make JOLs for large and small font items. 
Thus, this study provided evidence for the role of analytic processes in making JOLs that are based on perceptually salient cues.
There have been other attempts to estimate the separate contributions of analytic and non-analytic processes by preventing access to one process. One way to do this is to use a method similar to the one used by Mueller et al. (2014) in which participants make a judgment about another participant’s recall, thus preventing non-analytic (experience-based) processes from contributing to their judgments (e.g., Matvey, Dunlosky, & Guttentag, 2001; Vesonder & Voss, 1985). In this paradigm some participants are assigned to be learners, and experience the full range of experimental tasks, potentially using both analytic and non-analytic processes to assess their learning. Other participants are assigned to be judges, and are simply given information about the learners’ performance, and are asked to make a future performance prediction. By design, the judges can only use analytic processes to make judgments about the learner’s experience and performance, while the learners can use either or both types of processes. In the Matvey et al. (2001) study, learners studied cue-target pairs for a later recognition memory test and were asked either to read the cue and the target item, or to generate a rhyming target in response to the cue (e.g., cave - s _ _ _, for cave - save). After each item, learners made a JOL about their future memory performance for that item. Judges saw the outcome of a learner’s attempt to generate the rhyming target (thus removing the experience of attempting to generate the rhyming word, a nonanalytic process), and then made a prediction about how likely the learner would be to recognize the target. For learners, the longer it took them to generate the target in the rhyme condition the less likely they were to predict that they would recognize the target on a later memory test. 
Their subjective experience of generating the targets served as a cue for the monitoring judgment. In contrast, the JOLs judges reported for the learners were not influenced by the learner’s time to produce the target item at study. The results demonstrate a case in which people use nonanalytic processes to make JOLs. Thus, there is good evidence from a variety of paradigms showing that participants can use both types of processes to make future memory predictions. In the Mueller et al. study described earlier, participants who relied on analytic cues gave similar monitoring judgments to learners (who used both analytic and non-analytic processes). On the other hand, in the Matvey et al. study, participants who only used analytic processes gave JOLs that were significantly different than the learners’ JOLs. More recently research using item-by-item JOLs indicates that subsequent predictions are affected primarily by non-analytic processes (Serra & Ariel, 2014). Serra and Ariel suggest that participants use a memory for past test (MPT) strategy whereby they make JOLs based on their previous performance on the same to-be-remembered items. Taken together, these studies show that people can use both analytic and non-analytic processes to make JOLs and that the contribution of the two inferential processes can be separated.
The goal of the current studies was to examine the contribution of analytic and non-analytic processes to participants’ changes in their memory predictions following retrieval practice. Research shows that people can improve the accuracy of their memory predictions if they participate in retrieval practice (even a very brief amount of practice) prior to making a final test prediction (e.g., Miller & Geraci, 2014). In the Miller and Geraci study, participants studied paired associates, made a global prediction about their future memory performance, attempted to retrieve a sample of the studied items, and finally made a second global performance prediction before taking the criterion memory test. The results indicated that participants’ first performance predictions exhibited overconfidence and that failing to retrieve practice items led to the most change in participants’ second performance predictions, compared to succeeding at retrieval practice. The interpretation was that failure or difficulty retrieving practice items served as a powerful metacognitive cue that allowed participants to improve their subsequent metacognitive predictions. However, the Miller and Geraci (2014) study did not disentangle the contributions of analytic and nonanalytic cues that might have led participants to change their memory performance predictions. For example, if participants failed to retrieve 2 out of the 4 items during retrieval practice, they could use analytic information to inform their following prediction. They might reason that if they don’t remember the items now, they probably won’t remember them on the final test. Or they could use nonanalytic information and reason that if the practice items didn’t come to them very quickly and some didn’t come to them at all, they probably won’t remember them on the final test. 
Identifying the relative contributions of these analytic and nonanalytic processes is useful for developing a complete theory about how participants change their predictions following retrieval practice and for identifying appropriate interventions to improve people’s metacognitive accuracy. The present study used a learner-judge paradigm to examine the contribution of both types of inferences to changes in global performance predictions following retrieval practice. If the changes participants make to their performance predictions after retrieval practice are due to non-analytic (experience-based) cues, then when this experiential component of retrieval practice is removed, as is the case in the judge condition, people should be insensitive to retrieval practice and show little to no change from original to adjusted prediction. On the other hand, if the changes participants make to their predictions following retrieval practice are due to analytic (theory-based) cues, then judges and learners should both change their performance predictions following retrieval practice.
2. Study 1
2.1. Method
2.1.1. Participants
Fifty participants (27 females) aged 18–37 years (M = 19.69) participated in the study for partial course credit. One participant in the judge condition did not report all predictions; therefore, the data from that judge and the yoked learner were excluded. All analyses that follow are based on 48 participants.
2.1.2. Design
Participants were randomly assigned to condition (Learner or Judge). Each participant made two performance predictions – an Original and an Adjusted performance prediction. The dependent variable was prediction change (Adjusted − Original). Therefore, a positive mean prediction change indicated that the participant increased their adjusted prediction and a negative mean prediction change indicated that the participant decreased their adjusted prediction. 
For example, if the participant originally predicted they would recall 16 (40% of the items or .40) paired associates but then decreased their prediction to 10 (.25) paired associates after retrieval practice, the value for prediction change would be −.15.
2.1.3. Materials and procedure
A sample of 40 Lithuanian-English paired-associates taken from Grimaldi, Pyc, and Rawson (2010) was used for study. Four items were chosen to be the retrieval practice items. The four items comprised two difficult- and two easy-to-remember items based on normative data and had a mean recall of .27 after one study session (Grimaldi et al., 2010). We used this composition of retrieval practice items because a previous study suggested that this type of retrieval practice led to improvements in prediction accuracy (Miller & Geraci, 2014). Participants in the Learner condition studied paired-associates presented on the computer one at a time for 8 s per word pair. Following study, learners made a first performance prediction (called the original prediction) and attempted to retrieve the four practice items. Prior to making the second (adjusted) performance prediction, learners read the following instructions – “Based on your experience attempting to recall the English equivalent of ‘sesuo’, ‘namas’, ‘muilas’, and ‘kamuolys’ please make a new performance prediction so that it can be as accurate as possible. Your prediction about how many word pairs you will be able to remember on the memory test can go up, go down, or stay the same. Enter a number from 0-40 below for your prediction.” Participants in the Judge condition studied paired associates in the same manner as the learners. However, judges did not make an original performance prediction or practice retrieving any items. Rather, they were informed of a yoked learner’s original performance prediction and the results of the learner’s retrieval practice; that is, judges were told how the yoked
learner responded for each practice item. For example, for the first practice item “sesuo-sister” judges may have read, “The participant then typed that the English equivalent was sister.” Finally, judges were asked to report an adjusted performance prediction as a whole number for the learner for the upcoming memory test that would contain 40 items. The following is an example instruction that one judge received:
You just studied 40 Lithuanian - English word pairs. Another participant was asked to study the same 40 words pairs. Following study, the participant was asked to predict his or her memory performance before the upcoming memory test. The participant predicted that he or she would remember 20 out of 40 items. The participant was then given the opportunity to answer 4 practice items. The participant was given the Lithuanian word and attempted to recall its English equivalent.
At this point in the procedure, judges read each of the four practice items (e.g., sesuo – sister) and were given information about how the learner performed on each practice item. Next the judge read:
The participant was then asked to adjust his or her prediction based on attempts to answer the practice items. If you were the participant, what would your adjusted performance prediction be from 1-40?
2.2. Results and discussion
Results showed that following retrieval practice, learners decreased their predictions by nearly two items (M difference score = −.06, SD = .13) whereas judges increased their predictions by nearly three items (M difference score = .08, SD = .12) (see Table 1, Fig. 1). Comparing these difference scores showed that participants were influenced by their assignment to either the Learner or Judge condition (F(1, 46) = 13.40, MSE = .01, p < .001, partial η² = .23). These results demonstrate that retrieval practice information reduced performance predictions only for learners, who had access to non-analytic information. 
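The prediction-change measure defined in the Design section (adjusted minus original prediction, expressed as a proportion of the 40 studied items) can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' analysis code; the function name `prediction_change` and its parameters are hypothetical, and the input values are the worked example given in the Design section.

```python
def prediction_change(original_items, adjusted_items, total_items=40):
    """Return (adjusted - original) as a proportion of the total items.

    Negative values mean the prediction was lowered after retrieval
    practice; positive values mean it was raised.
    """
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    return (adjusted_items - original_items) / total_items


# Worked example from the Design section: 16 items (.40) lowered to 10 (.25).
change = prediction_change(original_items=16, adjusted_items=10)
print(f"{change:+.2f}")
```

Keeping the sign (rather than taking an absolute value) is what lets a single score distinguish learners, who lowered their predictions, from judges, who raised them.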
Next we report the retrieval practice performance and monitoring accuracy. Overall, learners recalled fewer than half of the retrieval practice items (M = .37, SE = .06). Learners who recalled half or more of the retrieval practice items (n = 12) did not change their predictions (M difference, adjusted prediction − original prediction = .01, SE = .13). In contrast, learners who recalled fewer than half of the retrieval practice items (n = 13) reduced their predictions (M difference, adjusted prediction − original prediction = −.11, SE = .09). In terms of monitoring accuracy, learners reported more accurate adjusted predictions than original predictions (t(24) = 2.18, p = .039). In fact, after retrieval practice, learners were extremely accurate in their predictions. There was less than a 1% difference between their adjusted predictions and their memory performance, suggesting that learners went from being overconfident about their memory performance to being metacognitively accurate. Learners also reported more accurate adjusted predictions than judges (t(47) = 2.91, p = .006).
Study 1 showed that learners reduced their performance predictions following retrieval practice whereas judges did not, indicating that retrieval practice influenced non-analytic processes to which learners had primary access. In Study 2 we manipulated retrieval practice difficulty and examined the relative contribution of analytic and non-analytic processes to participants’ performance predictions. Differences in retrieval practice difficulty may affect subsequent performance predictions because they affect non-analytic processes: retrieval practice creates either an experience of easy retrieval or one of difficult retrieval. Retrieval practice difficulty could also affect subsequent performance predictions because people observe that a person has either difficulty or ease recalling information. 
If these analytic processes (observing failure) contribute primarily to changes in performance predictions following retrieval practice, then there should be no difference between learners’ and judges’ adjusted predictions. If only non-analytic processes contribute to performance predictions following retrieval practice, then only learners should be affected by retrieval practice. And finally, if non-analytic processes contribute to changes in performance predictions following retrieval practice above and beyond analytic processes, then learners should be more affected by retrieval difficulty than judges. Specifically, learners’ adjusted predictions should be lower than judges’ adjusted predictions because they have the additional subjective experience associated with personal retrieval practice. The converse could also be true: analytic processes could contribute above and beyond non-analytic processes. However, based on the results from Study 1, we hypothesize that non-analytic processes will play a unique role in influencing predictions following retrieval practice.
Table 1
Original and adjusted performance predictions, retrieval practice performance, and memory performance for learners and judges in Study 1 expressed as a proportion of total items.

Condition   Original prediction   Retrieval practice performance   Adjusted prediction   Memory performance
Learner     .27 (.03)             .37 (.06)                        .22 (.04)             .22 (.03)
Judge       .27 (.03)             n/a                              .35 (.03)             n/a

Note. Judges did not engage in retrieval practice or take the final memory test. Standard error values in parentheses.
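As a rough illustration of how the learner means in Table 1 relate to over- and underconfidence, the sketch below computes a signed calibration difference (prediction minus memory performance). This is an assumption about the accuracy measure: the section does not spell out the exact formula, and the authors' tests were run on per-participant scores rather than group means. The function name `calibration_bias` is hypothetical.

```python
def calibration_bias(predicted, actual):
    """Signed difference between predicted and actual proportion recalled.

    Positive = overconfident, negative = underconfident, zero = calibrated.
    """
    return predicted - actual


# Learner group means from Table 1 (proportions of the 40 items).
original_bias = calibration_bias(predicted=0.27, actual=0.22)
adjusted_bias = calibration_bias(predicted=0.22, actual=0.22)

print(f"original: {original_bias:+.2f}, adjusted: {adjusted_bias:+.2f}")
```

Because the signed difference is linear, the bias computed from group means equals the mean of the per-participant biases; that would not hold for an absolute-error measure, which would need the individual scores.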
Fig. 1. Learners’ and judges’ original and adjusted performance predictions as a proportion of total items in Study 1. Error bars represent standard error.
3. Study 2
3.1. Method
3.1.1. Participants
One hundred participants (78 females) aged 16–33 years (M = 18.97) participated in the study for partial course credit.
3.1.2. Design
Participants were randomly assigned to condition (Learner or Judge) and retrieval practice difficulty (Easy or Difficult). There were two performance predictions for each participant – an Original and an Adjusted performance prediction. The dependent variable was prediction change (Adjusted − Original).
3.1.3. Materials and procedure
We used the same materials as those used in Study 1, with the exception that the difficulty of the practice items was varied. We used four practice items that led to either good or poor memory performance based on previous norming (Grimaldi et al., 2010). Previous norming indicated that the four easy retrieval practice items had a mean recall of .48 and the difficult retrieval practice items had a mean recall of .04 after one study session (Grimaldi et al., 2010). The procedure for Study 2 was nearly identical to the procedure for Study 1 except that there were two levels of retrieval practice difficulty for the learners (Easy or Difficult). Participants were not told whether they were in the easy or difficult retrieval practice condition.
3.2. Results and discussion
Similar to Study 1, results showed that the way participants changed their predictions was influenced by condition (F(1, 96) = 5.35, MSE = .27, p = .023, partial η² = .05). Regardless of retrieval practice difficulty, learners decreased their predictions (M = −.07, SD = .18) whereas judges increased their predictions (M = .04, SD = .27) (see Table 2, Fig. 2). There was also a main effect of retrieval practice difficulty (F(1, 96) = 7.24, MSE = .37, p = .008, partial η² = .07). Participants in the Easy retrieval practice condition increased their performance predictions (M = .04, SD = .24) whereas participants in the Difficult retrieval practice condition decreased their performance predictions (M = −.08, SD = .22). 
Finally, the interaction effect was not significant (F < 1, p = .739). Follow-up t-tests comparing learners’ and judges’ adjusted predictions showed that learners’ predictions were numerically, although not statistically, lower than judges’ predictions in the easy retrieval practice condition
Table 2
Original and adjusted performance predictions, retrieval practice performance, and memory performance for learners and judges in each retrieval practice difficulty condition of Study 2 expressed as a proportion of total items.

Condition                       Original prediction   Retrieval practice performance   Adjusted prediction   Memory performance
Easy retrieval practice
  Learner                       .36 (.03)             .61 (.05)                        .36 (.03)             .27 (.03)
  Judge                         .36 (.03)             n/a                              .44 (.05)             n/a
Difficult retrieval practice
  Learner                       .35 (.03)             .18 (.03)                        .21 (.03)             .34 (.02)
  Judge                         .35 (.03)             n/a                              .33 (.04)             n/a

Note. Judges did not engage in retrieval practice or take the final memory test. Standard error values in parentheses.
Fig. 2. Learners’ and judges’ original and adjusted performance predictions for both easy and difficult retrieval practice conditions as a proportion of total items in Study 2. Error bars represent standard error.
(t(48) = 1.34, p = .185, d = .38) and statistically lower in the difficult retrieval practice condition (t(48) = 2.46, p = .018, d = .69). Both learners and judges were affected by the difficulty manipulation, but learners’ adjusted predictions were always lower, suggesting that non-analytic processes played a critical role above and beyond that of analytic processes in reducing predictions. As in Study 1, we also examined retrieval practice performance and metacognitive monitoring accuracy. Overall, learners in the Easy Retrieval Practice condition recalled more than half of the practice items (M = .61, SE = .05) and learners in the Difficult Retrieval Practice condition recalled less than one item (M = .18, SE = .13). Learners, in either condition, who recalled half or more of the items (n = 24) did not change their predictions (M difference, adjusted prediction − original prediction = .02, SE = .03). In contrast, learners who recalled fewer than half of the retrieval practice items (n = 26) reduced their predictions (M difference, adjusted prediction − original prediction = −.15, SE = .03). For monitoring accuracy, learners’ adjusted predictions were more accurate than judges’ adjusted predictions (F(1, 98) = 4.77, MSE = .06, p = .031, partial η² = .05). Furthermore, learners in the difficult retrieval practice condition were more underconfident than learners in the easy retrieval practice condition (F(1, 48) = 22.31, p < .001, partial η² = .32), which is consistent with previous findings in the literature (Miller & Geraci, 2014).
4. Study 3
One possible limitation of the previous studies is that participants in the Judge condition studied the Lithuanian-English paired-associates just like participants in the Learner condition. So, even though Judges did not complete any retrieval practice, they could still be using mnemonic cues when they reported an adjusted performance prediction for the yoked learner. 
To examine this possibility, we conducted a study in which we eliminated all experience with the to-be-remembered paired associates. Judges read a short vignette about a fictional participant who had completed the study earlier that day (participants did not know they were reading about a fictional participant). Judges read information about the fictional participant’s original prediction and retrieval practice performance. Then they were asked to report what they believed was the fictional participant’s adjusted prediction. Importantly, all participants in Study 3 read the same vignette, in which the participant predicted 14 items originally and recalled 2 out of 4 (50%) items during retrieval practice. If Judges use only analytic information to make adjusted predictions, then the results of Study 3 should parallel those of the previous studies. That is, judges will report increased adjusted predictions; such a change would not be due to any non-analytic processing.
4.1. Method
4.1.1. Participants
Thirty participants participated in the study for partial course credit. Although no data on demographic variables were collected, all participants were recruited from the same participant pool as the previous studies.
4.1.2. Design
All participants made up a single group for a one-sample t-test. Participants’ reported prediction was the dependent variable. Our hypothesis was that judges would use analytic cues when reporting an adjusted prediction. This hypothesis would be supported by a non-statistically significant t-test comparing participants’ prediction to a reference value of 20, which is a direct extrapolation of 50% retrieval practice performance to 50% memory test performance.
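The one-sample comparison described in the Design section can be sketched as follows. This is a minimal illustration of the standard one-sample t statistic, not the authors' analysis script; the helper name `one_sample_t` is hypothetical, and the four predictions in the example are invented placeholders, not data from the study. Only the reference value (50% of 40 items = 20) comes from the text.

```python
import math
import statistics


def one_sample_t(values, reference):
    """Return (t, df) for a one-sample t-test of the mean against `reference`."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(values)
    # Standard error of the mean, using the sample (n - 1) standard deviation.
    standard_error = statistics.stdev(values) / math.sqrt(n)
    if standard_error == 0:
        raise ValueError("t is undefined when all observations are identical")
    return (mean - reference) / standard_error, n - 1


# Reference value from the Design section: 50% of the 40 test items.
reference_value = 0.5 * 40

# Hypothetical adjusted predictions, for illustration only (not study data).
hypothetical_predictions = [18, 20, 22, 24]

t_stat, df = one_sample_t(hypothetical_predictions, reference_value)
print(f"t({df}) = {t_stat:.3f}")
```

Note the logic of the design: the hypothesis is supported by a failure to reject the reference value, so a t statistic close to zero is the outcome consistent with judges extrapolating directly from practice performance.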
4.1.3. Materials and procedure
In the vignette, participants read that the participant reported an original prediction of 14 items and recalled 2 out of the 4 items (or 50%) during retrieval practice. We chose an original prediction of 14 items and 50% retrieval practice performance because both figures were approximately what participants reported and approximately how they performed during retrieval practice in Study 2 (see Table 2). The vignette each participant read was as follows:
Earlier today, a participant studied 40 items of Lithuanian words and their English equivalents. For example, “sesuo-sister” was one item the participant studied. After studying the 40 items, the participant predicted he would remember 14 items, or 35%, on a memory test in which he was given the word “sesuo” and had to write down “sister”. But before the memory test occurred, the same participant completed a PRACTICE TEST with 4 items that looked exactly the same as what the real memory looked like. So for example, the participant was given “sesuo” and had to write down “sister”. In all, the participant completed 4 practice test items and correctly remembered 2 out of the 4 practice items. In other words, he correctly recalled 50% of the practice items. Using his experience from the practice test, if the same participant was asked to make a NEW PREDICTION about his performance on the real memory test, how many items out of 40 do you think he would predict he would remember? Recall that his first prediction was 14 and he correctly remembered 50% of the practice test items.
After reading the vignette, participants reported a whole-number adjusted prediction for the fictional participant and provided a rationale for the new prediction. We predicted that removing Judges’ experience with the to-be-remembered items would lead participants to report an adjusted prediction that was higher than the original prediction of 14 items. 
Finally, participants were asked to provide a rationale for their adjusted predictions. They were asked:
Thinking back to what you wrote down in the blank above, why do you think you predicted that the person would get that many items correct on the memory test? Use the space below to explain how you came up with the prediction.
4.2. Results and discussion
Participants in the follow-up study reported an average adjusted prediction that was higher than the fictional participant’s original performance prediction (M = 20.9, SE = .87). In fact, 29 out of 30 participants reported an increased adjusted prediction compared to the fictional participant’s original prediction of 14 items. A one-sample t-test comparing Study 3 participants’ adjusted predictions to 20 revealed a non-significant difference (t(29) = 1.07, p = .30). That participants’ adjusted predictions were so similar to the fictional participant’s retrieval practice performance suggests that the participants were using that information to report an adjusted prediction. Examining participants’ rationale for reporting the adjusted prediction revealed that participants considered the fictional participant’s retrieval practice performance (50%). Twenty-two out of 30 participants wrote something about the fictional participant’s retrieval practice performance. Some participants then reasoned that the fictional participant’s retrieval practice performance was proportionally greater than the original prediction and thought it was reasonable for the adjusted prediction to be increased. In fact, 20 participants explicitly referred to the participant’s 50% retrieval practice performance and many of those directly stated that because the participant recalled 50% of the items on the practice test, he would get 50% correct on the real memory test. 
For example, one participant wrote “In the practice test, he got 50% right so I think that he would get 50% on the actual test which is 20.”

The pattern of results from the follow-up study is consistent with the pattern of results from both Studies 1 and 2. Learners in the previous studies decreased their performance predictions following retrieval practice whereas Judges tended to report increased performance predictions following retrieval practice. In the follow-up study, participants, whose experiences were similar to those of judges in the previous studies, reported increased performance predictions. Thus, taken together, the results from Studies 1–3 suggest that judges used analytic processes to make their adjusted predictions, and that doing so resulted in a different pattern of adjusted predictions from the learners, who also used non-analytic processes to make their predictions.

5. General discussion

Prior research shows that participants decrease their performance predictions following retrieval practice, particularly following difficult retrieval practice (see Miller & Geraci, 2014). This effect could occur because retrieval practice affects people’s beliefs about their memory (an analytic process) or because it affects their subjective experiences with memory (a non-analytic process). The results of Study 1 showed that learners, who had access to analytic (theory-based) and non-analytic (experience-based) processes, decreased their predictions following retrieval practice. Further, the results of Study 2 indicated that non-analytic processes contributed more to prediction changes following different levels of retrieval practice difficulty than did analytic processes. In contrast, judges, who had access only to analytic (theory-based) information, either did not change their predictions or they increased them after reading about the learners’ retrieval practice.
Study 3 provided additional evidence that judges used analytic processes to make their adjusted predictions, and that doing so resulted in
a different pattern of adjusted predictions from the learners. Previous work that has investigated these two sources of information has suggested that, while both processes can influence participants’ predictions, analytic processes exert more influence on JOLs (Matvey et al., 2001). Our data demonstrate that following retrieval practice, non-analytic processes exert more influence over subsequent performance predictions. Thus, the current results suggest that people rely on their subjective experiences during retrieval (a non-analytic, experience-based process) when making subsequent performance prediction adjustments.

Without direct experience with retrieval practice, judges had to rely on other information to predict performance. For example, in Study 1 the learners recalled fewer than half of the retrieval practice items (M = .37). The average adjusted prediction of the judges was very similar (M = .35), suggesting that judges were in fact taking the learners’ retrieval practice performance into account. Similarly, in Study 3, the vignette stated that the fictional learner recalled half of the items during retrieval practice (i.e., .50) and judges reported that the learner would recall about half of the items (M = .52), again suggesting that judges were taking retrieval practice performance into account. The judges’ adjusted predictions in Study 2 reveal a similar pattern: their adjusted predictions were closer to the retrieval practice performance than the original predictions were. That is, learners in the easy retrieval practice condition recalled over half of the items (M = .61) and the judges’ average adjusted prediction was M = .44, up from the original prediction (M = .36). Furthermore, learners in the difficult retrieval practice condition recalled under half of the items (M = .18) and the judges’ average adjusted prediction was M = .33, down from the original prediction (M = .35).
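The comparison in the preceding paragraph can be laid out as a small worked check. The sketch below is illustrative only: it uses the group means reported above, the names `JudgeSummary`, `gap`, and `moved_toward_practice` are hypothetical, and the Study 1 original prediction is left unset because this passage does not restate it. Because it operates on means, it says nothing about individual judges or about statistical reliability.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JudgeSummary:
    """Group means for one study or condition, as proportions of items."""
    label: str
    practice: float            # learners' retrieval practice performance
    adjusted: float            # judges' mean adjusted prediction
    original: Optional[float]  # original prediction; None if not restated here


# Means taken from the paragraph above (Study 3 original: 14 of 40 items).
SUMMARIES = [
    JudgeSummary("Study 1", practice=0.37, adjusted=0.35, original=None),
    JudgeSummary("Study 2, easy", practice=0.61, adjusted=0.44, original=0.36),
    JudgeSummary("Study 2, difficult", practice=0.18, adjusted=0.33, original=0.35),
    JudgeSummary("Study 3", practice=0.50, adjusted=0.52, original=14 / 40),
]


def gap(s: JudgeSummary) -> float:
    """Absolute distance between adjusted prediction and practice performance."""
    return round(abs(s.adjusted - s.practice), 2)


def moved_toward_practice(s: JudgeSummary) -> Optional[bool]:
    """True if the adjusted prediction is closer to practice performance
    than the original prediction was; None if the original is unknown."""
    if s.original is None:
        return None
    return abs(s.adjusted - s.practice) < abs(s.original - s.practice)


if __name__ == "__main__":
    for s in SUMMARIES:
        print(s.label, gap(s), moved_toward_practice(s))
```

Under these assumptions, every condition with a known original prediction shows the adjusted prediction lying closer to retrieval practice performance than the original did, which is the pattern the text describes.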
Our interpretation of the results is that the judges used the learner’s performance on the practice items and their naïve theories, or beliefs, about how memory operates to make their predictions of the learners’ eventual memory performance. Judges may have taken the learner’s retrieval practice performance as an indicator of how the learners would perform on the test. That is, if the learner performed well on the practice test, the judge may have expected the learner to perform as well or even better on the actual memory test. Judges may have considered retrieval practice performance and extrapolated to the memory test by reporting a higher prediction. For example, if the learner correctly retrieved 2 out of the 4 practice items (50%), the judge might have extrapolated 50% performance during retrieval practice to 50% during the memory test. Analyses of the follow-up study results (Section 4.2) converge on this interpretation. That is, after reading the vignette about the fictional participant’s original prediction and 50% retrieval practice performance, almost all of the judge-like participants reported higher adjusted predictions. Furthermore, their reported adjusted performance predictions were not statistically different from 20, or 50%. Therefore, judges were likely considering retrieval practice performance as an analytic cue for eventual memory test performance.

We showed that following a short run of retrieval practice, learners improved their prediction accuracy. In contrast, judges did not improve prediction accuracy, suggesting that the improvement in prediction accuracy following retrieval practice arises from participants’ use of non-analytic processes. Of course, we used only four practice test items, and if given many more practice items one would predict that eventually both the learners and the judges would become highly accurate.
Dunlosky, Rawson, and McDonald (2002) suggested that for practice tests to be effective for improving monitoring accuracy, they would need to be diagnostic of the final test, something they referred to as the diagnosticity assumption. The most diagnostic practice test would include all 40 items, and research shows that if participants make retrospective judgments (now called postdictions), their predictions are highly accurate (e.g., Maki & Serra, 1992; Pierce & Smith, 2001).

Serra and Ariel (2014) reported that participants used the memory for past test (MPT) heuristic in a multiple study-test trial paradigm. They, too, found that non-analytic (experience-based) processes affected performance predictions more so than analytic (theory-based) processes. In contrast, in another experiment using pre-study JOLs, participants studied related and unrelated word pairs and made either pre-study JOLs or immediate JOLs. Participants gave higher estimates for related than for unrelated pairs, suggesting that participants’ beliefs at least partially drive the relatedness effect on JOLs (Mueller, Tauber, & Dunlosky, 2013).

With a learner/judge paradigm, participants have access to different information but they are also making slightly different judgments (either self- or other-relevant judgments). It could be that the simple act of making a judgment about oneself leads people to try to be more accurate (and lower their predictions in this case). It might also be that, when making self-predictions, people are motivated to show self-consistency, and not change their predictions, or to believe in themselves and maintain high performance predictions. Future research should examine the effect of making a self- versus other-judgment on participants’ willingness or ability to adjust their predictions following an intervention. We do not think that simply making the self-judgment (vs.
other-judgment) is sufficient to lead people to lower their predictions, as we know that this does not occur in other studies without some intervention (see control condition of Miller & Geraci, 2014). But it could be that following an intervention people are more or less willing to change their predictions when they are predicting their own versus another’s performance.

Exactly how learners translated their retrieval practice performance into an adjusted performance prediction is unknown. In Study 1, learners recalled fewer than half of the 4 retrieval practice items (M = .37) and reduced their subsequent performance predictions. In Study 2, learners in the Easy retrieval practice condition recalled over half the items (M = .61) and made very little change to their subsequent predictions while those in the Difficult retrieval practice condition had poor performance (M = .18) and reduced their predictions by 40%. These results suggest that there is not a one-to-one correspondence between retrieval practice performance and adjusted performance predictions. One possibility is that people focused on any failure during retrieval practice and this failure had a disproportionally large influence on their subsequent performance prediction. Such a disproportionate influence of negative information on experience is consistent with several findings from the negativity bias literature. For example, much has been written about the balance of positive and negative
emotions that leads to happiness and fulfillment (cf. Frederickson, 2013; Frederickson & Losada, 2005). This research suggests that negative emotions have a larger influence on people than positive emotions do. Similarly, we know that people are highly attentive to negative feedback and that their response to such feedback is predictive of whether or not they will make the same mistake on a future trial (Gehring, Liu, Orr, & Carp, 2012; Van der Helden, Boksem, & Blom, 2009). Indeed, errors are memorable and may be crucial for learning. People are more likely to respond correctly on subsequent trials if they had previously responded incorrectly with high rather than low confidence, a finding known as the hypercorrection effect (Butler, Fazio, & Marsh, 2011; Metcalfe & Finn, 2011). The evidence supporting the disproportional influence of negative over positive aspects of one’s experience has led some to suggest that the imbalance “may in fact be a general principle or law of psychological phenomena” (Baumeister, Bratslavsky, Finkenauer, & Vohs, 2001, p. 323). Thus, our data, showing that a small amount of retrieval failure has a relatively large effect on subsequent performance predictions, could be seen as consistent with these various literatures.

While learners were able to decrease their performance predictions following retrieval practice, in some cases, such as following moderately difficult or easy retrieval practice, judges actually increased their subsequent performance predictions relative to the baseline prediction. Why did they increase their predictions relative to baseline? The current study cannot say for sure, but there is good evidence that some people overestimate their own performance and underestimate others’ performance (Hartwig & Dunlosky, 2014; Kruger & Dunning, 1999).
Thus, it is possible that while performing poorly on a practice test led learners to lower their subsequent performance predictions (because they were overconfident to begin with), the same amount of practice success (or failure) may be interpreted differently by judges. If people generally expect others to perform poorly, then it is possible that judges may have expected the learners to perform more poorly than they did on the practice items, and might consider even modest success to be surprising and indicative of greater future success. This hypothesis is speculative and awaits future testing. For now, our primary finding is that learners were able to use retrieval practice experiences to reduce their overconfidence.

5.1. Conclusions

The current studies add to the literature by demonstrating that retrieval failure is beneficial for metacognitive monitoring. Most importantly, the results demonstrate that while both types of processes – analytic and non-analytic – can contribute to changes in memory predictions following retrieval practice, the subjective experience of retrieval practice plays a key role in leading participants to reduce their performance predictions.

References

Arbuckle, T. Y., & Cuddy, L. L. (1969). Discrimination of item strength at time of presentation. Journal of Experimental Psychology, 81, 126–131.
Baumeister, R. F., Bratslavsky, E., Finkenauer, C., & Vohs, K. D. (2001). Bad is stronger than good. Review of General Psychology, 5, 323–370.
Butler, A. C., Fazio, L. F., & Marsh, E. J. (2011). The hypercorrection effect persists over a week, but high confidence errors return. Psychonomic Bulletin & Review, 18, 1238–1244.
Castel, A. D. (2008). Metacognition and learning about primacy and recency effects in free recall: The utilization of intrinsic and extrinsic cues when making judgments of learning. Memory & Cognition, 36, 429–437.
Cheng, C. (2010). Accuracy and stability of metacognitive monitoring: A new measure. Behavior Research Methods, 42, 715–732.
Dunlosky, J., Rawson, K. A., & McDonald, S. L. (2002). Influence of practice tests on the accuracy of predicting memory performance for paired associates, sentences, and text material. In T. J. Perfect & B. L. Schwartz (Eds.), Applied metacognition (pp. 68–92). Cambridge, UK: Cambridge University Press.
Flavell, J. (1979). Metacognition and cognitive monitoring: A new area of cognitive developmental inquiry. American Psychologist, 34, 906–911.
Fleming, S. M., & Lau, H. C. (2014). How to measure metacognition. Frontiers in Human Neuroscience, 8. Published online 2014 Jul 15.
Frederickson, B. L. (2013). Updated thinking on positivity ratios. American Psychologist, 68, 814–822.
Frederickson, B. L., & Losada, M. F. (2005). Positive affect and the complex dynamics of human flourishing. American Psychologist, 60, 678–686.
Gehring, W. J., Liu, Y., Orr, J. M., & Carp, J. (2012). The error-related negativity (ERN/Ne). In S. J. Luck & E. Kappenman (Eds.), Oxford handbook of event-related potential components (pp. 231–291). New York: Oxford University Press.
Grimaldi, P. J., Pyc, M. A., & Rawson, K. A. (2010). Normative multitrial recall performance, metacognitive judgments, and retrieval latencies for Lithuanian–English paired associates. Behavior Research Methods, 42, 634–642.
Hart, J. T. (1965). Memory and feeling-of-knowing experience. Journal of Educational Psychology, 56, 208–216.
Hartwig, M. K., & Dunlosky, J. (2014). The contribution of judgment scale to the unskilled-and-unaware phenomenon: How evaluating others can exaggerate over- (and under-) confidence. Memory & Cognition, 42, 164–173.
Kelley, C. M., & Jacoby, L. L. (1996). Adult egocentrism: Subjective experience versus analytic bases for judgment. Journal of Memory and Language, 35, 157–175.
Koriat, A. (1997). Monitoring one’s own knowledge during study: A cue-utilization approach to judgments of learning. Journal of Experimental Psychology: General, 126, 349–370.
Koriat, A., Bjork, R. A., Sheffer, L., & Bar, S. K. (2004). Predicting one’s own forgetting: The role of experience-based and theory-based processes. Journal of Experimental Psychology: General, 133, 643–656.
Kornell, N., & Bjork, R. A. (2009). A stability bias in human memory: Overestimating remembering and underestimating learning. Journal of Experimental Psychology: General, 138, 449–468.
Kornell, N., Rhodes, M. G., Castel, A. D., & Tauber, S. K. (2011). The ease of processing heuristic and the stability bias: Dissociating memory, memory beliefs, and memory judgments. Psychological Science, 22, 787–794.
Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it: How difficulties in recognizing one’s own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology, 77, 1121–1134.
Maki, R. H., & Serra, M. (1992). Role of practice tests in the accuracy of test predictions on text material. Journal of Educational Psychology, 84, 200–210.
Matvey, G., Dunlosky, J., & Guttentag, R. (2001). Fluency of retrieval at study affects judgments of learning (JOLs): An analytic or nonanalytic basis for JOLs? Memory & Cognition, 29, 222–233.
Metcalfe, J., & Finn, B. (2011). People’s hypercorrection of high-confidence errors: Did they know it all along? Journal of Experimental Psychology: Learning, Memory, and Cognition, 37, 437–448.
Miller, T. M., & Geraci, L. (2014). Improving metacognitive accuracy: How failing to retrieve practice items reduces overconfidence. Consciousness and Cognition, 29, 131–140.
Mueller, M. L., Dunlosky, J., Tauber, S. K., & Rhodes, M. G. (2014). The font-size effect on judgments of learning: Does it exemplify fluency effects or reflect people’s beliefs about memory? Journal of Memory and Language, 70, 1–12.
Mueller, M. L., Tauber, S. K., & Dunlosky, J. (2013). Contributions of beliefs and processing fluency to the effect of relatedness on judgments of learning. Psychonomic Bulletin & Review, 20, 378–384.
Nelson, T. O. (1996). Gamma is a measure of the accuracy of predicting performance on one item relative to another item, not of the absolute performance of an individual item: Comments on Schraw (1995). Applied Cognitive Psychology, 10, 257–260.
Nelson, T. O., & Dunlosky, J. (1991). When people’s judgments of learning (JOLs) are extremely accurate at predicting subsequent recall: The “delayed-JOL” effect. Psychological Science, 2, 267–270.
Nelson, T. O., & Narens, L. (1990). Metamemory: A theoretical framework and new findings. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 26, pp. 125–173). New York: Academic Press.
Pierce, B. H., & Smith, S. M. (2001). The postdiction superiority effect in metacomprehension of text. Memory & Cognition, 29, 62–67.
Rhodes, M. G., & Castel, A. D. (2008). Memory predictions are influenced by perceptual information: Evidence for metacognitive illusions. Journal of Experimental Psychology: General, 137, 615–625.
Schwartz, B. L. (1994). Sources of information in metamemory: Judgments of learning and feelings of knowing. Psychonomic Bulletin & Review, 1, 357–375.
Serra, M. J., & Ariel, R. (2014). People use the memory for past-test heuristic as an explicit cue for judgments of learning. Memory & Cognition, 42, 1260–1272.
Van der Helden, J., Boksem, M. A. S., & Blom, J. H. G. (2009). The importance of failure: Feedback-related negativity predicts motor learning efficiency. Cerebral Cortex, 20, 1596–1603.
Vesonder, G. T., & Voss, J. F. (1985). On the ability to predict one’s own responses while learning. Journal of Memory and Language, 24, 363–376.