JOURNAL OF MEMORY AND LANGUAGE 37, 240–267 (1997)
ARTICLE NO. ML972515
The Use of Categories Affects Classification
Brian H. Ross
University of Illinois
Once an item is classified as a member of a category, knowledge about that category may be used. Most research has focused on classification rather than the use of category knowledge. Seven experiments show that in learning to classify and use categories, the use may affect later classifications. Five of the experiments employed a common classification paradigm in which symptom sets were classified into disease categories. After each classification, subjects used the category to decide what treatment should be given. The symptoms that were important for the treatments were later classified more accurately, generated earlier from the disease, and judged to have occurred more frequently. The last two experiments extended this work to new paradigms in which the category use required simple problem solving. Again, the use affected later classifications. The discussion addresses the implications of these results for classification theories and for the study of categories. © 1997 Academic Press
This research was supported by Grant 89-0447 from the Air Force Office of Scientific Research. Research for this paper was conducted at the Beckman Institute for Advanced Science and Technology. I thank Gregory Murphy for a number of discussions and readings, and Lawrence Barsalou, Gary Dell, Mary Lassaline, Valerie Makin, Arthur Markman, Douglas Medin, Colleen Seifert, Edward Shoben, Edward Smith, and John Taplin for comments on this research. I also thank Amanda Lorenz, Amanda Schulze, and Lalita Pourchot for their excellent help in conducting these experiments. Address correspondence and reprint requests to Brian H. Ross, Beckman Institute, University of Illinois, 405 N. Mathews Ave., Urbana, IL 61801. E-mail: bross@s.psych.uiuc.edu.
Categories are essential for dealing intelligently with the large number of new items we encounter. Once an item is classified, or assigned to a category, we may use what we know about the category in many ways. Knowing an object belongs to a certain category allows us to make predictions about unseen features of that object. Classifying a problem into a category often provides important information about how best to solve the problem. Deciding that a person is of a particular type may allow us to understand and explain various aspects of that person’s behavior. Thus, after the classification part of category-related processes, we often make use
of the category to help predict, problem solve, explain, and understand. Learning new categories involves both learning to classify and learning to make use of the category. Often one learns to classify in the context of some other task or goal. When learning new concepts in a domain, such as in mathematics, we learn to classify problems into problem types, because we can then apply different procedures to solve the different types of problems. Although classification is crucial, it is usually not an end in itself. Categories are useful because the classifications provide access to much other knowledge. For example, medical diagnosis is not very helpful as an end, but it is important because it allows crucial predictions about the efficacy of different treatments. If we classify a person as an extrovert, this classification may allow us to explain past behaviors and predict future behaviors.
Despite the importance of learning to use categories, most of the work on category learning has examined how people learn to classify new instances (e.g., see Medin & Smith, 1984; Ross & Spalding, 1994; Smith & Medin, 1981, for reviews). In the typical classification learning experiment, an item is presented, the subject responds with one of a small number of experimenter-defined category labels, feedback is given, the subject is allowed to study the item and correct category label, and then the next item is presented, with the cycle continuing until some learning criterion is met or for a fixed number of trials. In these experiments, classification is the goal and no use is made of the classification to make predictions, problem solve, explain, or understand.
One reason that classification has been the subject of so much investigation may be that it is seen as a first step in any use of categories. That is, before knowledge about a category can be applied to an item, the item must first be classified. This approach has led to a much better understanding of classification and a number of impressive models (e.g., Estes, 1994; Heit, 1994; Kruschke, 1992; Medin & Schaffer, 1978; Nosofsky, 1986; Nosofsky, Palmeri, & McKinley, 1994). The eventual success of such an approach, however, depends upon the extent to which classification is an isolable or separable process. It seems likely that any use of the category (such as for prediction) might affect what one knows about the category. Because the classification and use may be learned together, it is possible that they each affect what is learned about the other. If so, then understanding classification alone is not guaranteed to provide an understanding of classification when categories are being used for some further purpose.
The argument for looking at classification as part of a larger goal is similar to other claims in cognition that we need to understand processes working in context or as part of some common goal (e.g., the different processes in analogical problem solving, Ross, 1989, or teaching concepts within the context of problem solving in tutors, Anderson, Corbett, Koedinger, & Pelletier, 1995). Much may be learned by a careful study of an isolated process, but some aspects of its operation may only become visible when it is part of some larger task.
For example, speech production can be studied using single, isolated syllables or words, and much can be learned about the mechanisms. However, when speech is examined for longer productions, further clues
about the mechanisms become available from co-articulations, speech errors, etc. (e.g., Bock, 1995; Dell, 1986). Analogously, what one learns about classification from studying classification alone is important, but may not be complete. The point is that while we have learned much from studying classification in isolation, it may be necessary to examine both the uses of categories and the interactions between these uses and classifications. These uses of categories not only provide important functions for the categories, but also may influence what we know about the categories, including how we classify new items. The goal of this paper is to examine whether category use may affect classification during learning. When the classification is being learned to allow some use of the category, the use may affect how the later items are classified. To go back to some of the examples, in medical diagnosis, an understanding of how the different treatments work for a disease may lead one to view some symptoms or constellations of symptoms as particularly important, and this additional information may lead those symptoms to be focused on more in classification. If one is learning new math problem types and how to solve them, the solution procedure might help to highlight some important features of the problem types, which may then be used in classifying later instances. The use of the category may affect what one learns about the category, including knowledge used to classify later items. This paper provides a preliminary investigation of this topic, focusing on how the use might affect the classification during learning. This is a crucial issue for theories of categories for two reasons. First, if theories of categories are to be more than theories of classification, then we need to better understand how people make use of categories and the effects that these uses might have on category knowledge. 
Second, even if one is interested only in a theory of classification, it may be that the uses made of a category affect classification knowledge. If so, then even theories of classification will need to incorporate the effects of category use. This possibility is tested in the
current experiments. This issue is of general interest because of the importance of categories and their use across many research domains. A discussion of related work is postponed until after the experiments, so that the reader may better understand the specific issues being addressed.
THE CURRENT EXPERIMENTS
In the seven experiments to be reported, category learning consists of both learning to classify and learning to use the category to do some additional task, such as make a prediction or solve a problem. Each learning trial begins as in the usual classification paradigm: an item is presented, the subject classifies the item into one of the experimenter-defined categories, and feedback is given. However, rather than immediately proceeding to the next trial, the subject must then use the category and item to make some further response, and feedback is given on this use of the category as well. Thus, each trial requires the subject to make a classification and to use the category. The experiments examine how this use of the category affects the category representation, including knowledge used to make subsequent classifications.
In the first five experiments, subjects learn to classify patients’ symptoms into two disease categories and then use the category (and symptoms) to decide which of two treatments the patient should receive. Each disease has two possible treatments, which are different for the two diseases. This classification of symptom sets into disease categories is a task that has often been used in studying classification (e.g., Gluck & Bower, 1988; Medin & Edelson, 1988). The major change from earlier work is that the current procedure includes, on each trial, an additional task that makes use of the categories. This second task, since it involves deciding on treatments for a disease, might be expected to influence what is known about the treatments for each disease.
The question of interest here is whether it also influences the knowledge used for making disease classifications. When one is learning, how does experience with using a category affect the knowledge used to make classifications? As will be discussed following the experiments, many theories of classification assume that classification knowledge is affected only by feedback about classification, not by experience using the category. These theories would predict no effect of the treatment task on disease classifications.
The article begins by examining a simple case of this situation in which the critical symptoms are perfectly predictive of the disease. This case is then extended to situations in which the prediction is probabilistic and in which the classification is less related to the use. The next pair of experiments extends this work further to ask what information subjects have learned about the categories. The final two experiments introduce new paradigms to extend this work to cases in which the use involves simple problem solving.
Classification performance can be affected by a variety of changes in the representation, including the weights of features or relations, or the re-interpreting of features. As discussed later, it is possible that the use of categories could affect the classification in any of these ways. In the first five experiments, a simple case is examined and any observed effect on classification is likely to be due to changes in the feature weights. The final two experiments investigate a more complex case in which the effect is probably due to learning relations among the features.
EXPERIMENT 1
In this first experiment, the question was whether using the categories to make a further decision during learning might influence later classifications. The items were fictitious ‘‘patients,’’ presented as sets of symptoms. The categories were fictitious diseases and the further decisions were fictitious drug treatments to be given.
The classification of diseases involved four perfectly predictive symptoms for each disease, while the decision about treatments involved two of these symptoms (i.e., with each of these two symptoms perfectly predictive of a treatment). That is, any of four symptoms were equally (and perfectly) predictive of a disease, but two of these symptoms also each perfectly predicted one of the two treatments for that disease. The critical tests were later disease classifications. If the treatment decisions did not affect the disease classifications, then there should be no differences among the four perfectly predictive symptoms for later classifications. If, however, the symptoms that were predictive of the treatment also came to be viewed as more central to the disease, then the two symptoms that were also predictive of the treatments should be more readily classified to this disease. Such a difference in classification performance would not be predicted by most current theories of classification.
Method
Subjects. The subjects were 20 University of Illinois undergraduates who participated for pay. The sessions lasted from 30 to 50 min.
Materials. The materials were loosely adapted from Medin and Edelson (1988) and consisted of diseases, drug treatments, and symptoms. There were two fictitious diseases (buragamo and terrigitis), four fictitious drug treatments (lamohillin and pexlophene for buragamo; galudane and veptendrine for terrigitis), and 12 symptoms (fever, runny nose, dizziness, abdominal pain for buragamo; cough, nausea, sore muscles, earache for terrigitis; and skin rash, numb fingers, swollen tongue, and inflamed knee as nonpredictive symptoms). All four symptoms for each disease were perfectly predictive of the disease. That is, whenever any of these symptoms was present, the disease occurred. Of these symptoms, two were also perfectly predictive of a treatment, and these symptoms will be called relevant-use symptoms. For example, if fever and runny nose were the relevant-use symptoms for buragamo, then any patient with a fever would have the disease buragamo and be treated by lamohillin, while any patient with a runny nose would have the disease buragamo and be treated by pexlophene.
The other two symptoms for a disease were irrelevant-use symptoms and occurred half the time with each of the disease treatments. Continuing the example, any patient with dizziness would also have the disease buragamo, but half the time would be treated by lamohillin (because the patient also had a fever) and half the time would be treated by pexlophene (because the patient also had a runny nose).
Each ‘‘patient’’ consisted of three symptoms: two that were from one of the diseases (one a relevant-use symptom and one an irrelevant-use symptom) and one symptom from the four nonpredictive symptoms. Sixteen patients were constructed in this way, with eight from each disease category and four from each treatment. The four nonpredictive symptoms occurred once each with each treatment (so twice with each disease). Thus, one patient might have a fever, dizziness, and numb fingers. This patient would have buragamo (as indicated by both of the first two symptoms) and would be treated by lamohillin (as indicated by the first, relevant-use symptom, a fever). A partial design with sample items is given in Table 1. The symptoms for a patient were typed onto a 3 in × 5 in (7.6 cm × 12.7 cm) index card and three cards were made for each patient (counterbalancing the order of symptoms). Each study block consisted of one presentation of each of the 16 patients. To ensure that any difference was not due to the particular symptoms, the relevant-use and irrelevant-use symptoms were counterbalanced over subjects. These two types of symptoms were switched and another set of cards was made for half the subjects.
The test materials consisted of single symptom tests, double symptom tests, disease rankings, and treatment tests. For the single symptom tests, each of the 12 symptoms was presented individually for the subject to classify by disease. For the 16 double symptom tests, two symptoms were presented. For 8 of these tests, a relevant-use symptom from one disease was paired with an irrelevant-use symptom from the other disease.
The other 8 double symptom tests consisted of two symptoms from the same disease: both irrelevant-use symptoms (so zero relevant-use symptoms), one relevant-use symptom and one irrelevant-use symptom, or both relevant-use symptoms. The purpose of these tests was to examine differences as a function of the symptom type. The disease rankings and treatment tests consisted of a list of the eight symptoms that were predictive of the diseases, with a disease printed at the top. The exact questions asked of subjects for these tests are included in the procedure.

TABLE 1
PARTIAL DESIGN OF EXPERIMENT 1: SAMPLE STUDY MATERIALS FOR DISEASE BURAGAMO (a)

                             Patient 1      Patient 2        Patient 3        Patient 4
Relevant-use symptom (b)     Fever          Fever            Runny nose       Runny nose
Irrelevant-use symptom       Dizziness      Abdominal pain   Dizziness        Abdominal pain
Nonpredictive symptom (c)    Numb fingers   Skin rash        Swollen tongue   Inflamed knee
Treatment                    Lamohillin     Lamohillin       Pexlophene       Pexlophene

(a) The cards given to subjects contained just the three symptoms listed for each patient.
(b) Fever is a relevant-use symptom for subjects getting these materials because whenever it occurs, the treatment to be given is lamohillin. Runny nose always is treated with pexlophene.
(c) The nonpredictive symptoms are also presented with the other disease, terrigitis.

Procedure. Subjects were told that they would be learning about some disease categories by diagnosing patients who had sets of symptoms, and then deciding what drug should be used to treat this disease. A sheet with the two diseases and the two drugs for each disease was visible throughout the experiment. The instructions stressed that the subjects should try not to use prior medical knowledge and that the same symptom may occur with different diseases. On each study trial, the subject received a ‘‘patient’’ card with three symptoms and responded with one of the diseases. Feedback was given and the subject then responded with one of the treatments for the correct disease. Feedback was given and the subject was allowed to study the card for as long as he or she wanted. All subjects participated for six study blocks. Each block consisted of one presentation of each of the 16 patients, presented in a random order. The first three study blocks had different orders of symptoms for each patient.
The last three study blocks used the same cards as the first three blocks. Following the study blocks, four types of tests were given: single symptom tests, double symptom tests, disease rankings, and treatment tests. For the single symptom tests, each of the 12 symptoms was presented individually and the subject responded with the disease they thought was most likely for a patient who had this symptom and a confidence rating (from a low of 1 to a high of 7). For the 16 double symptom tests, two symptoms were presented and the subject again responded with a disease name and confidence rating. The disease rankings consisted of a list of the eight symptoms that were predictive of the diseases. Half the subjects were asked to rank how important each feature is in diagnosing buragamo (from 1, most important, to 8, least important). They were then given the same list and asked to do a ranking for terrigitis. The other half of the subjects ranked the terrigitis symptoms first. The treatment tests consisted of the same lists, but subjects were now required to mark down which treatment they thought would be given to a patient who had this symptom and the disease on the top of the page. They were allowed to mark down either of the treatments for that disease or an X to indicate that they thought a person with this symptom would not be likely to get either treatment. Following these tests, the subjects were debriefed and any questions they had were answered.
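The study set described under Materials can be sketched in a few lines of Python. This is only an illustrative reconstruction, not the author's materials: the text fixes the counts (16 patients, 8 per disease, 4 per treatment, each nonpredictive symptom once per treatment), and Table 1 shows four of the buragamo patients, but the pairing of nonpredictive symptoms for the remaining patients, the choice of cough and nausea as the relevant-use symptoms for terrigitis, and the names `DISEASES`, `NONPREDICTIVE`, and `build_patients` are all assumptions made here.

```python
from collections import Counter

# Relevant-use symptoms map to the treatment they perfectly predict.
# The buragamo assignment follows the example in the text; the terrigitis
# assignment is an assumption made for illustration.
DISEASES = {
    "buragamo": {
        "relevant": {"fever": "lamohillin", "runny nose": "pexlophene"},
        "irrelevant": ["dizziness", "abdominal pain"],
    },
    "terrigitis": {
        "relevant": {"cough": "galudane", "nausea": "veptendrine"},
        "irrelevant": ["sore muscles", "earache"],
    },
}
NONPREDICTIVE = ["numb fingers", "skin rash", "swollen tongue", "inflamed knee"]


def build_patients():
    """Return 16 patients, each with one relevant-use, one irrelevant-use,
    and one nonpredictive symptom."""
    patients = []
    for disease, spec in DISEASES.items():
        for k, (relevant, treatment) in enumerate(spec["relevant"].items()):
            offset = 2 * k  # assumed rotation of the nonpredictive symptoms
            for i in range(4):
                irrelevant = spec["irrelevant"][i % 2]       # each used twice per treatment
                filler = NONPREDICTIVE[(i + offset) % 4]     # each used once per treatment
                patients.append({
                    "symptoms": (relevant, irrelevant, filler),
                    "disease": disease,
                    "treatment": treatment,
                })
    return patients


patients = build_patients()
per_disease = Counter(p["disease"] for p in patients)
per_treatment = Counter(p["treatment"] for p in patients)
# Treatments received by patients showing the irrelevant-use symptom "dizziness".
dizziness_treatments = Counter(
    p["treatment"] for p in patients if "dizziness" in p["symptoms"]
)
```

Under these assumptions, `dizziness_treatments` splits evenly between lamohillin and pexlophene, which is the property that makes dizziness an irrelevant-use symptom: it predicts the disease perfectly but carries no information about the treatment.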
Design. The main manipulation, of relevant-use symptoms versus irrelevant-use symptoms for a disease, was within subject. The only between subject manipulations were the counterbalancing variables: half the subjects received each set of cards (in which the particular relevant-use and irrelevant-use symptoms had been counterbalanced) and for each set of cards, half the subjects were tested on buragamo first for the disease ranking and treatment tests, and the other subjects were tested first on terrigitis.
Results and Discussion
Subjects learned the diseases and treatments well, with performance on the final study block of .93 on the diseases and .88 on the treatments. The critical comparisons are whether the four relevant-use symptoms (two for each disease) were responded to differently on tests than the four irrelevant-use symptoms that were also perfectly predictive of the diseases (but were not predictive of the treatment). Does the additional treatment task affect disease classification?
Single symptom tests. If the treatment task is affecting classification, then the relevant-use symptoms should lead to more accurate disease classifications than the irrelevant-use symptoms. They do. For classification accuracy, relevant-use symptoms led to the correct disease .96 of the time, compared to .80 for irrelevant-use symptoms, t(19) = 3.32, p < .01. Throughout the paper, an additional measure will be used that combines the accuracy score with the confidence rating. For each test, the accuracy (1 for correct and 0 for incorrect) is multiplied by the confidence and the average of this accuracy-confidence score is given.¹ This accuracy-confidence score also showed an advantage for the relevant-use symptoms, 5.1 versus 3.6, t(19) = 2.86, p < .05.
¹ The accuracy-confidence measure was used as the primary score for single tests to provide a more sensitive measure, because the classification accuracy proportion is based on only four observations per condition for each subject.
Double symptoms. There were two different types of questions addressed by the double symptom tests. For eight of the tests, a relevant-use symptom from one disease was paired with an irrelevant-use symptom for the other disease. The prediction, if the relevant-use symptom is viewed as a better disease predictor, is that these tests will more often be classified by the disease of the relevant-use symptom. The results are consistent with that prediction for .73 of the responses, t(19) = 3.59, p < .01.
The other double tests were included to contrast three types of same category pairs: both irrelevant-use symptoms (zero relevant-use symptoms), one relevant-use symptom and one irrelevant-use symptom, and two relevant-use symptoms. If the two symptoms were both irrelevant-use ones, the disease classification was correct only .78 of the time. (Note that this proportion is about equal to the .80 for a single symptom of this type.) However, with one relevant-use symptom, the proportion correct classification was .93 and it was .98 with two relevant-use symptoms. The overall effect of the number of relevant-use symptoms was clearly reliable, F(2,38) = 5.26, p < .01, MSe = .041, with a repeated measures ANOVA. For all such double tests in the experiments, two planned comparisons were conducted, one contrasting zero relevant-use symptoms against the average of one and two, and one contrasting one relevant-use symptom versus two; both use the same mean-square error as the full analysis. The items with one or two relevant-use symptoms led to better classification than the items without a relevant-use symptom, F(1,38) = 9.90, p < .01, but there was no difference between the one and two relevant-use symptom conditions, F(1,38) = .61. The accuracy-confidence measure showed the same pattern with corresponding means of 3.8, 5.6, and 6.0 (F(2,38) = 10.48, p < .01, MSe = 2.59 for the overall test; F(1,38) = 20.45 for the zero versus other test; F(1,38) = .51 for the one versus two test).
Disease rankings.
All eight predictive symptoms were ranked from 1 (most important) to 8 (least important). The prediction is that the relevant-use symptoms would be ranked as more important than the irrelevant-use symptoms, and they were, 2.5 versus 4.1, t(19) = 3.42, p < .01. (Note that if all subjects ranked the relevant-use symptoms for a disease as 1 and 2, the mean would be 1.5.) Although the irrelevant-use symptoms for a disease were not ranked as highly as the relevant-use symptoms, the subjects again were aware that these symptoms were predictive of the disease category, since the ranking was significantly better than the 5.7 given to symptoms from the other disease, t(19) = 4.08, p < .01.
Treatment ratings. The final test asked subjects to say which treatment, if any, would be given to a person who had a particular disease and a particular symptom. The main reason for including this test was to ensure that subjects knew the relevant-use symptoms were predictive of treatments, which would be important if there had been no effect on the earlier tests. Given the large effects on the earlier tests, the treatment ratings are not very informative. The correct treatment was given for .84 of the relevant-use symptoms.
Experiment 1 provides evidence for the basic effect of interest: the use of the category affected later classification judgments. In particular, the knowledge learned about the symptoms from making the treatment judgments affected the disease classification. These results are not predicted by most current views of classification learning. The relevant-use symptoms and irrelevant-use symptoms were equally valid for disease classification, they were presented equally often, and they received the same feedback for the disease classification. Despite being alike on the factors that are usually thought to be crucial for classification, the relevant-use symptoms were much more accurately classified than were the irrelevant-use symptoms. When classification and use are learned together, the use can influence how classifications are made.
This effect that using categories has on classification, which will be referred to here as the category use effect, has important implications for theories of classification. The main point is that classification learning cannot be totally separated from category use, as many current theories assume. Rather, we need theories of categorization in which the classification learning can be affected by nonclassification use. The experiments in the rest of this paper provide information about some possible constraints on the circumstances for this category use effect and a consideration of what knowledge is being learned. Following these experiments, the General Discussion includes a more detailed examination of the implications for theories of classification.
When a new effect is first found, it is important to ensure that it is not due to some extraneous characteristic. The next two experiments each examine a particular constraint of Experiment 1 that seemed a potential source of such a problem. First, the symptoms (both relevant-use and irrelevant-use) were perfectly predictive of the disease category, unlike many classification experiments. This structure was used for efficiency because a more probabilistic structure would be more difficult and variable in learning, but it is important to be sure that the effect does not depend upon having perfectly predictive features. Experiment 2 uses a design in which the symptoms occur with both diseases, though they occur more often with one of the diseases. Second, it is possible that the basic effect in Experiment 1 occurs because of some special relationship between treatments and diseases, rather than because of the use of the category. In particular, the use of disease categories and treatments may lead subjects to a particular representation of the materials based on their extensive beliefs about how treatments affect disease symptoms. To ensure that this special disease-treatment knowledge is not necessary to find the category use effect, in Experiment 3 an arbitrary use is substituted for the treatment decision.
Because subjects still have to make use of the category, the prediction is that the category use effect will occur here as well.
EXPERIMENT 2
In Experiment 1, the use of the categories (along with the symptoms) to make treatment decisions affected later classification decisions. The symptoms were perfectly predictive of both the disease and treatments to facilitate the learning. However, most categories do not have such sets of perfectly predictive features. Thus, before extending this work, a check was made to ensure that the effect was not limited to such a situation. In Experiment 2, the symptoms were only probabilistically related to the disease. The prediction is that the category use effect will still occur, so that the relevant-use symptoms will be viewed as more central to the disease than the irrelevant-use symptoms.
Method
Subjects. The subjects were 20 University of Illinois undergraduates who participated for pay. The sessions lasted from 60 to 75 min.
Materials. The materials were as in Experiment 1, but one additional symptom was added to both buragamo (itchy eyes) and terrigitis (skin rash), and the nonpredictive symptom skin rash was replaced by blurred vision. Each disease had five symptoms that were predictive of the ‘‘correct’’ disease 5/6 of the time. That is, of every six times they occurred, five of those times they occurred with one disease (which will be referred to as the correct disease) and one time with the other. As before, two symptoms were perfectly predictive of a treatment (relevant-use symptoms) when they occurred with the correct disease, and the three irrelevant-use symptoms occurred about half the time (2/5 or 3/5) with each of the treatments for that disease. The general design was similar to that of Experiment 1. Each ‘‘patient’’ consisted of four symptoms and had one of two types of symptom sets: two symptoms from one disease (one relevant-use symptom and one irrelevant-use symptom) and two nonpredictive symptoms, or three symptoms from one disease (one relevant-use symptom and two irrelevant-use symptoms) and one symptom from the other disease.
Twenty patients were constructed in this way, with 10 from each disease category and five from each treatment used for that disease. The symptoms for a patient were typed onto an index card, and four cards were made for each patient (counterbalancing the order of symptoms). Each study block consisted of one presentation of each of the 20 patients. To ensure that any difference was not due to the particular symptoms, the relevant-use symptoms and two of the irrelevant-use symptoms for the disease were switched and another set of cards was made for half the subjects. (All analyses include only the four symptoms that were counterbalanced in this way, not the fifth predictive symptom.) The test materials consisted of single symptom tests and double symptom tests as in Experiment 1. The disease rankings and treatment tests were not included.
Procedure. The procedure was as in Experiment 1, but it was stressed that ‘‘there is no single symptom that you can use for determining a disease. Just like real diseases, these diseases may have somewhat different symptoms for each patient and the same symptom may occur with different diseases.’’ All subjects participated for eight study blocks. Each block consisted of one presentation of each of the 20 patients, presented in a random order. The first four study blocks had different orders of symptoms for each patient. The last four study blocks used the same cards as the first four.
Design. The main manipulation, of relevant-use symptoms versus irrelevant-use symptoms, was within subject. The only between subject manipulation was the counterbalancing variable (the two sets of cards).
Results and Discussion
As expected, learning was more difficult than in Experiment 1. Although there were 160 study trials (as opposed to 96 in Experiment 1), by the final study block the proportions correct were only .77 for diseases and .82 for treatments. As predicted, the tests generally show an advantage for the relevant-use symptoms, but the effects are somewhat smaller than in Experiment 1, most likely because of incomplete learning.
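The two summary statistics that recur in these analyses can be illustrated with a short Python sketch. The accuracy-confidence score follows the definition given in Experiment 1 (accuracy coded 1 or 0, multiplied by the 1 to 7 confidence rating, then averaged). The planned-comparison function is an assumption about how such contrasts are typically computed when they are tested against the error term of the full analysis, as the Experiment 1 results describe; it is not taken from the author's analysis code, and the function names and example ratings are hypothetical.

```python
from statistics import mean


def accuracy_confidence(trials):
    """Mean of accuracy (1 or 0) times confidence (1-7) over test trials.

    `trials` is a sequence of (correct, confidence) pairs.
    """
    return mean([(1 if correct else 0) * confidence
                 for correct, confidence in trials])


def planned_contrast_f(means, weights, n_subjects, ms_error):
    """F ratio (1 numerator df) for a within-subject planned contrast,
    tested against the mean-square error of the omnibus analysis.

    Assumed formula: F = n * L**2 / (MSe * sum(w**2)), with L = sum(w * mean).
    """
    if abs(sum(weights)) > 1e-9:
        raise ValueError("contrast weights must sum to zero")
    contrast = sum(w * m for w, m in zip(weights, means))
    return n_subjects * contrast ** 2 / (ms_error * sum(w * w for w in weights))


# Hypothetical ratings for one subject's four tests in one condition.
example_score = accuracy_confidence([(True, 7), (True, 5), (False, 6), (True, 6)])

# Rounded Experiment 1 means for 0, 1, and 2 relevant-use symptoms.
exp1_means = (0.78, 0.93, 0.98)
f_zero_vs_rest = planned_contrast_f(exp1_means, (-1.0, 0.5, 0.5), 20, 0.041)
f_one_vs_two = planned_contrast_f(exp1_means, (0.0, -1.0, 1.0), 20, 0.041)
```

Applied to the rounded Experiment 1 proportions, this formula gives roughly 9.96 and 0.61, close to the reported F(1,38) values of 9.90 and .61; a small gap of that size is what one would expect from working with means rounded to two decimals.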
Single symptom test. When subjects had to categorize single symptoms, they were slightly more accurate for the relevant-use symptoms than for the irrelevant-use symptoms, .76 versus .69, but the difference was not significant, t(19) = 1.01, p < .30. However, for the more sensitive tests using the combined accuracy and confidence data, one does find an advantage for the relevant-use symptoms, 4.1 versus 3.3, t(19) = 2.57, p < .05.

Double symptom tests. The first contrast for double tests concerns the disease classifications made to the conflict pairs that included a relevant-use symptom for one disease and an irrelevant-use symptom for the other disease. As in Experiment 1, these items were usually (.64) classified by the relevant-use symptom disease, t(19) = 2.97, p < .05. The other double tests contrast whether there were 0, 1, or 2 relevant-use symptoms. The corresponding proportions of correct disease classifications were .73, .70, and .83. There was no overall difference, F(2,38) = 1.33, p < .30, MSe = .066, nor were there any differences in the individual contrasts. The corresponding means for the combined accuracy-confidence measure (3.8, 3.9, and 4.6) did not show any reliable differences either, F(2,38) = 1.80, p < .20, MSe = 9.26.

Overall, there is an effect, but it is attenuated from Experiment 1. If subjects are having difficulty learning the categories, then any effect of the use is likely to be smaller. However, the results make it clear that the perfect predictiveness of the materials in Experiment 1 is not necessary for finding an effect of use on classification.

EXPERIMENT 3

The second constraint of Experiment 1 to be examined is the relation between the classification and the use. In particular, diseases and treatments may have a special type of relationship. Treatments may be viewed as intimately involved in the underlying problems of a disease, because the medicine is meant to cure the disease and alleviate the symptoms.
The determination of treatments can be viewed as directly related to the classification in a way
that many uses of categories would not be. The view espoused here is that this special kind of relationship is not necessary to find that use affects classification. Rather, what is important is that the use provides some information about the features that might affect their salience, lead to noticing patterns, etc. Thus, it is important to ensure that the category use effect does not depend upon any special relationship between treatments and diseases.

To test this view, the use was made into an arbitrary prediction task. In this experiment, the relevant-use symptoms still predicted the use, but it was not one that had anything to do with treatments. More specifically, one change was made from Experiment 1. Instead of treatments, subjects were told that a hospital administrator noticed that everyone who had terrigitis had a last name beginning with the letters G or V (the same as the treatment names in Experiment 1 of galudane and veptendrine), while patients with buragamo had last names beginning with P or L (corresponding to pexlophene and lamohillin). Subjects had to learn to predict the correct letter. According to the view presented here, this should still be sufficient to find that relevant-use symptoms are responded to differently than irrelevant-use symptoms. Because the relevant-use symptoms are still predictive of the use, they should still be viewed as more central to the disease. The claim here is not that people will now have no prior knowledge or theories that they might use in learning the categories. Rather, the claim is only that the effects cannot now be due to beliefs about the special relationship between treatments and diseases.

Method

Subjects. The subjects were 20 University of Illinois undergraduates who participated for pay. The sessions lasted from 35 to 50 min.

Design and materials. The design was exactly as in Experiment 1. The study materials were as in Experiment 1, except for the use of single letters instead of treatment names.
The test materials consisted of the single
symptom and double symptom disease classification tests from Experiment 1.

Procedure. The procedure was as in Experiment 1, except that instead of treatments for each disease, subjects were told that a hospital administrator noticed that the patients who had terrigitis had a last name beginning with the letters G or V, while patients with buragamo had last names beginning with P or L. A sheet of paper listing the diseases and the letters for each was visible throughout the experiment. After classifying the patient by disease (and getting feedback), subjects chose one of the two letters for the beginning of the last name of the patient.

Results and Discussion

By the final (sixth) block of learning, subjects averaged .89 accuracy on disease classifications and .84 on treatments, slightly lower than performance in Experiment 1. The critical question is again whether the relevant-use symptoms were responded to as more predictive of the disease category than the irrelevant-use symptoms. Does the effect depend on the special relationship between treatments and diseases? The answer is no: one finds an effect similar to that found in Experiment 1.

Single symptom tests. When a single symptom was presented at test, the relevant-use symptoms tended to be classified more accurately than the irrelevant-use symptoms, .84 versus .73, but the difference did not reach statistical significance, t(19) = 1.76, p < .10. This difference was significant for the more sensitive combined accuracy-confidence scores, 4.6 versus 3.6, t(19) = 2.19, p < .05.

Double symptom tests. For the eight double tests that had a conflict, pitting a relevant-use symptom of one disease against an irrelevant-use symptom of the other disease, subjects chose the relevant-use symptom disease .69 of the time, t(19) = 3.57, p < .01. The other double tests had two symptoms from the same disease category and consisted of two irrelevant-use symptoms (zero relevant-use symptoms), one relevant-use symptom and one irrelevant-use symptom, or two relevant-use symptoms. There was no evidence of a difference in accuracy, with proportions correct of .83, .86, and .83, respectively, F(1,38) = .14, MSe = .066. The combined accuracy-confidence scores did show a difference, with zero relevant-use symptoms leading to lower performance. The corresponding means were 3.9, 5.0, and 5.0, F(2,38) = 3.53, p < .05, MSe = 2.21. As can be seen in the means, all the difference is due to the zero versus the other two conditions, F(1,38) = 7.07, p < .05.

Thus, the treatment-disease relationship is not essential for obtaining an effect of category use upon later classifications. Relevant-use symptoms were responded to as more predictive of the disease category in both the accuracy-confidence measure of single symptom tests and in the double symptom conflict tests. Even if the category use is rather arbitrary, it can influence classifications when the relevant-use symptoms are also predictive of the classification. As elaborated in the General Discussion, category uses often help learners to better understand the category and why the category members “go together” in the category. However, this deeper understanding is not an essential part of the category use effect. Rather, the use of the category may affect the category representation in many ways, from simple weighting of the features to more complex effects on category understanding.

Experiments 2 and 3 tested two particular aspects of the initial design and showed that neither of them is crucial for getting the category use effect found in Experiment 1. Thus, it appears that the effect is robust. The remaining experiments examine some extensions that are important for understanding the implications of this category use effect.

EXPERIMENT 4

In the next two experiments, the focus is on what is being learned from this use of categories.
The earlier experiments show that the use of the disease category does lead to better classification of the relevant-use symptoms, but this classification examines only part of
the category knowledge that might be affected. One important distinction is whether the treatments focus the learners on symptom-to-disease connections, which are especially important for classification, or whether the treatments may be making the relevant-use symptoms more accessible for other tasks as well. To examine this contrast, the next experiments use measures other than classification. Experiment 4 investigates a very different type of category task, and Experiment 5 examines whether category use might influence a task that does not require the use of these categories, frequency judgments. These experiments are crucial for understanding the importance of category use.

In this experiment, subjects learned the material as in Experiment 1, but then they were given a disease and asked to generate symptoms that a person with this disease was likely to have. Thus, rather than making use of the symptom-to-disease connection from learning, this experiment examined whether the relevant-use symptoms were also more likely to be generated from the disease. This feature generation task is an unusual one for experimentally learned materials, but it does tap an important aspect of category knowledge (related to category validity measures) that is different from that used for classification.

Method

Subjects. The subjects were 33 University of Illinois undergraduates who participated for pay. The sessions lasted from 30 to 45 min.

Design and materials. The design and study materials were exactly as in Experiment 1. The test materials from the earlier experiments were not used; they were replaced by a generation task described below.

Procedure. The study procedure was exactly as in Experiment 1. After study, subjects were given diseases one at a time (the order was counterbalanced) and asked to list symptoms that a person with this disease would be likely to have. No mention was made of how many symptoms they should list.
When they completed generating symptoms for a disease, they were asked to go back and, for each symptom, to give a probability to indicate how likely a person with this disease would be to have this symptom.

Results and Discussion

By the final (sixth) block of learning, subjects averaged .91 accuracy on disease classifications and .85 on treatments. The critical question is whether the relevant-use symptoms were more available for the disease than were the irrelevant-use symptoms. This question was addressed by several measures, with all of them showing an advantage for relevant-use symptoms.

The simplest measure is the proportion of (correct) relevant-use symptoms generated versus irrelevant-use symptoms, .75 versus .53, respectively, t(32) = 3.43, p < .01. A second measure took into account the order of the symptoms. Because no subject generated more than six symptoms for any disease, a simple scale was used in which 6 “points” were given for the first symptom generated, 5 for the second, etc. Given the large difference in proportions generated, the relevant-use symptoms will have a large advantage even if there is no difference in the order generated. Thus, rather than comparing the total number of points, the average numbers of points were compared (i.e., the total number of points divided by the number of symptoms of that type that were generated). Even so, the average for relevant-use symptoms was greater than that for irrelevant-use symptoms, 5.2 versus 4.1, t(32) = 3.99, p < .001. Thus, relevant-use symptoms tended to be generated earlier than irrelevant-use symptoms. The final measure used the probability judgments that subjects made on each recalled symptom. The relevant-use symptoms showed a small advantage over the irrelevant-use symptoms, 64.4 versus 55.5, but it was not statistically significant, t(32) = 1.96, p < .06.

Experiment 4 shows that the treatment task is making the relevant-use symptoms more accessible from the category than the irrelevant-use symptoms. Thus, the use of categories affects not only classification of later items, but also the generation of category features.

EXPERIMENT 5

The question of interest in this experiment is whether the relevant-use symptoms are more available only within the categories or whether they might be encoded in such a way as to make them more available for other types of tasks as well that do not even require the use of the categories. Thus, this experiment continues the examination of nonclassification measures, but extends it beyond category-related knowledge.

To test this idea, judgments of frequency were obtained for each symptom. After the study blocks, subjects were shown each symptom and asked to give a judgment as to how often the symptom had appeared during study. The question of interest is whether relevant-use symptoms will be judged as having occurred more frequently than irrelevant-use symptoms. Although both relevant-use and irrelevant-use symptoms occurred only with one disease, subjects were never told this (and in debriefings, revealed that they were often not aware that it was true for all those symptoms). The issue is whether the differences found in the earlier experiments are restricted to tasks that tap the relation between the disease and symptoms or whether the category use may lead to those symptoms being available even when the task does not refer to the categories. Frequency judgments were chosen because differences in subjective frequency are generally considered to be related to representation differences that would affect a variety of learning and performance tasks.

Method

Subjects. The subjects were 18 University of Illinois undergraduates who participated for pay. The sessions lasted from 30 to 45 min.

Design and materials. The design and study materials were exactly as in Experiment 1. The test materials were the single symptom test cards from Experiment 1.

Procedure. The study procedure was as in Experiment 1. At test, the subjects were given
the single symptom test cards in a random order and for each symptom were asked how often it occurred during learning. They were told that all symptoms occurred 30 or fewer times. (All the relevant-use and irrelevant-use symptoms occurred 24 times during study.)

Results and Discussion

By the final (sixth) block of learning, subjects averaged .92 accuracy on disease classifications and .80 on treatments. The medians for each subject’s judgments of relevant-use and irrelevant-use symptoms were obtained² and these were averaged across subjects. The relevant-use symptoms were judged as having occurred more frequently than the irrelevant-use symptoms, 18.6 times compared to 16.2, t(17) = 2.27, p < .05. Although this difference is not large, it does represent a 15% increase over the irrelevant-use response of 16.2, and the relevant-use symptoms were given higher medians for 13 of 17 subjects, with 1 tie (p < .05).

Thus, the category use affects classification, instance generation, and judgments of frequency of occurrence. The category use leads to a focus on the relevant-use symptoms, but they are more available for other measures both related to the category (Experiment 4) and not related to these particular categories (Experiment 5). One possible explanation, to be examined in the General Discussion, is that these features have become more strongly weighted. It is important to remember that these effects all occur within an experimental context, so there is no evidence that this availability of relevant-use symptoms would hold for tasks outside of this context (see also Medin & Ross, 1989, for why such generalizations might not occur).

² Medians are often used for judgments of frequency, because they are not affected by a small number of extreme scores. In fact, the means showed the same pattern, 18.6 vs 16.9, but because of two subjects with extreme scores, the variability was high, t(17) = 1.54. If these two subjects are omitted (which does not change the means much, since one showed the difference and the other showed an opposite effect), then t(15) = 2.34, p < .05.
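The two summary figures reported for Experiment 5 (the 15% increase and the count of 13 of 17 untied subjects) can be checked with a few lines of arithmetic. The sketch below assumes that the reported p value for the subject count comes from a two-sided exact sign test; the text does not name the test, so that choice and the function name are assumptions.

```python
from math import comb


def sign_test_p(successes, n, two_sided=True):
    """Exact binomial sign test against chance (p = .5); ties must be removed first."""
    k = max(successes, n - successes)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail) if two_sided else tail


# 18 subjects with 1 tie leaves 17 informative subjects, 13 favoring relevant-use symptoms.
p_two_sided = sign_test_p(13, 17)

# Relative increase of the relevant-use judgment (18.6) over the irrelevant-use one (16.2).
relative_increase = (18.6 - 16.2) / 16.2

print(round(p_two_sided, 3))        # 0.049
print(round(relative_increase, 3))  # 0.148
```

Under that assumption the count of 13 of 17 falls just below the .05 level, and the relative increase rounds to the 15% given in the text.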
EXPERIMENT 6

The results of the first five experiments suggest that when people use a category to make predictions, the category representation, including the knowledge relevant to classification, is affected. Before accepting this general idea, it is important to examine the category use effect in a different paradigm. Experiments 6 and 7 test the effects of a very different type of use, simple problem solving, on later classification.

In order to better understand the conditions under which the category use effect occurs, it is useful to consider two possible limitations to the generality of the category use effect that arise from interleaving the disease and treatment learning. First, the interleaving may lead learners to search for features that will allow them to both classify the patient according to their disease and simultaneously decide on the treatment (e.g., “If the patient has a cough, then the disease is terrigitis and the treatment is galudane”). Of course, if the relevant-use symptoms do get more weight for both tasks, then such a representation may eventually be formed, but the criticism is that the interleaving promotes the search for such a representation and that it would not occur if the learning was not interleaved. Second, a related limitation is that the prediction about treatments could be viewed as another classification. That is, deciding which treatment to provide might be viewed as assigning patients to the (sub)category of people to be given each treatment. Thus, perhaps these experiments have not examined classification and then use, but rather what happens when learning two classifications.

Although the results would be of interest even if both these limitations were true, they nonetheless raise empirical questions to which empirical answers may be sought. There is some evidence against the first limitation, and the second one is addressed in Experiments 6 and 7.
First, in other work (Ross, in progress), I do not use an interleaving learning paradigm, but rather the disease classification learning precedes any learning of treatments. The results are not reported here, because the issues addressed by this work are rather different from the main points here and because the work is still in progress. The category use effect still can be found when the classification learning precedes the use learning, although the circumstances under which the effect occurs are still being examined. For instance, in an experiment exactly like Experiment 1 except that the diseases are learned before the treatments, the relevant-use symptoms are still classified better than the irrelevant-use symptoms, .90 versus .76. Thus, the interleaving does not appear to be crucial for obtaining the category use effect.

Second, does the effect depend upon the use also being another type of classification? Even if category use effects were restricted to such situations, the results would be of interest, because it is likely that many feature predictions can be viewed as additional classifications (e.g., Anderson, 1991). However, the next two experiments examine cases in which the category use is not a classification. In Experiments 6 and 7, an instance is classified and then, depending on the classification, different problem solving procedures are applied. The question is whether this problem solving use leads to a change in the category representation that might affect classifications of later instances. In these experiments, the second limitation is directly addressed by having a use that is not a classification. In addition, as will be seen, the features involved in the initial classifications are different from the features involved in the use of the category, so it would not be possible to have a rule incorporating both the classification and the use, as suggested by the first criticism.

The materials in Experiment 6 were equations, taken from Ross (1996), which showed that interacting with instances following classification can affect the category representation.
In that paper, subjects who had to solve equations after classifying them ended up including aspects of the solution method in their category representations more than subjects who simply classified the equations (the equations could also be classified by surface characteristics, such as the parentheses and letters). In addition, if the equations were complex, including both x and y variables, then which variable was solved for led to very different category representations between subjects (Experiments 2a and 2b). In Experiment 6, this result is extended to investigate the effects of using categories to solve the equations. The classification determined how to use the instance (i.e., which variable is to be solved for), and the different uses are predicted to lead to different category representations.

More specifically, each equation had both an x and y. The subjects classified the equations as being Type 1 or Type 2, and then solved the equation. The main manipulation was that one group solved Type 1 equations for x and Type 2 equations for y (Group XY), while the other group solved Type 1 for y and Type 2 for x (Group YX). The equations could be classified by either surface characteristics or the solution method used for the category. If the category use is affecting classification, then the prediction is that the particular use for each category will lead to subjects learning a different solution method, which will in turn be incorporated into the category representation and affect later category judgments such as classification. The critical difference from Ross (1996) is that in Experiment 6 the classification determines how the category is used (i.e., how to solve the equation), whereas in the earlier work the solutions of the equations did not depend upon the classification.

Method

Subjects. The subjects were 20 University of Illinois undergraduates who participated for pay. The sessions lasted from 25 to 45 min.

Materials. The materials were taken from Experiment 2b of Ross (1996), with minor changes to accommodate the change in design. Sixteen equations were constructed, all of which included both an x and a y, with examples given in Table 2. The equations all had the same solution method structure.
When the equation was solved for x, the solution method
was SMD, for subtract-multiply-divide. For example, to solve
$$\left(\frac{3x}{g}\right) + aq = \left(\frac{2 + cy}{n}\right)$$
for x, one would subtract aq, multiply by g, and divide by 3 (the last two operations could be done in the opposite order, but that will not matter here). However, when the equation was solved for y, the solution method was MSD, for multiply-subtract-divide. To solve this same equation for y, one would first multiply by n, then subtract 2, then divide by c. Although the equations had these same solution methods, they differed in surface characteristics. Half of the equations, labeled Type 1, had two parentheses and had more letters from earlier in the alphabet (a–g) than later (m–t). The other half of the equations, Type 2, had one parenthesis and more letters from later in the alphabet. The proportion of letters was either three of five or four of six. There were 24 test items, illustrated in Table 3. Each test item had the variable z and could be solved by either SMD or MSD. There were three types of tests. Eight of the test items had surface characteristics (parentheses and letters) consistent with Type 1 and eight had surface characteristics consistent with Type 2. Of these 16 items, eight required the solution method SMD and eight required MSD. This design leads to half the items having surface characteristics consistent with the solution method used for that type (e.g., for Group XY, the Type 1 surface characteristics would be with SMD) and half the items having inconsistent pairings of surface characteristics and solution method. If subjects were relying on only the surface characteristics, then both groups of subjects would classify these test items in the same way. If, however, the solution methods had been incorporated in the category representation and influenced classification, then the Group XY subjects would classify the SMD as Type 1 and the MSD as Type 2, whereas the Group YX subjects would classify in the opposite way. The other eight
test items, unrelated, had surface characteristics that had not been presented with either type (no parentheses and letters h–k). Half of the unrelated test equations could be solved by SMD and half by MSD. These unrelated eight items examine whether the solution method might be used when no other means of classification are available. All study and test equations were handwritten on a 5 in. × 7 in. (12.7 cm × 17.8 cm) index card.

TABLE 2
SAMPLE STUDY MATERIALS FOR EXPERIMENT 6

                                                                                    Solution method^a
Category   Sample equation                                                          Group XY             Group YX
Type 1     $\left(\frac{3x}{g}\right) + aq = \left(\frac{2 + cy}{n}\right)$         SMD (solve for x)    MSD (solve for y)
Type 1     $\frac{(dy + es)}{a} = \frac{mx}{6} + 8$                                 SMD (solve for x)    MSD (solve for y)
Type 2     $\frac{rx}{2} + sp = \frac{g + 9y}{b}$                                   MSD (solve for y)    SMD (solve for x)
Type 2     $\frac{8y + 7e}{n} = \frac{sx}{d} + q$                                   MSD (solve for y)    SMD (solve for x)

a SMD refers to the solution method subtract-multiply-divide and MSD refers to the solution method multiply-subtract-divide.
Note. Not all parentheses of the original display are reproduced here; as described in the text, Type 1 equations had two parentheses and Type 2 equations had one.

TABLE 3
SAMPLE TEST MATERIALS FOR EXPERIMENT 6

                                                                   Condition
Sample equation                           Solution method^a    Group XY         Group YX
$\frac{9z}{a} + s = \frac{(g + t)}{c}$    SMD                  Consistent^b     Inconsistent^c
$\frac{nz}{a} + 2 = g + pt$               SMD                  Inconsistent     Consistent
$h + \frac{4z}{k} = i + 3j$               SMD                  Unrelated^d      Unrelated

a To illustrate the materials, only the SMD test equations are given, but half the test equations had the solution method MSD.
b Consistent means that the solution method is consistent with the surface characteristics used with this solution method at study. As can be seen in Table 2, Group XY had the solution method SMD with two parentheses and letters from early in the alphabet.
c Inconsistent means that the solution method is inconsistent with the surface characteristics used with this solution method at study.
d Unrelated means that the surface characteristics (no parentheses and letters between h and k) were not presented at study, so the solution method is the only available means of classification.
Note. Not all parentheses of the original display are reproduced here; the first equation had two parentheses, the second had one, and the third had none.

Design. All subjects received the same materials at study and test and the study classification was the same. The only manipulation was which variable was solved for following classification. Group XY solved Type 1 equations for x and Type 2 equations for y (therefore using solution methods SMD and MSD, respectively), whereas Group YX solved Type 1 equations for y and Type 2 equations for x (MSD and SMD, respectively).

Procedure. All subjects were told that the experiment examined how people learn new categories and that the equations were of two types. In addition, following each classification, subjects were to solve the equation for either x or y, depending on the category-variable pairing indicated on the sheet in front of them (which remained visible throughout the study phase of the experiment). Subjects were told that after learning the types of equations, they would be given new equations to classify.

On each study trial, the subjects received a card with an equation and classified it as a Type 1 or Type 2. They were given feedback on this classification and then required to solve the equation for the appropriate variable. The solutions were written on a sheet of paper. The study phase consisted of two study blocks with eight cards per block. The eight equations in each study block were randomly ordered, with four of each type. For the test phase, the 24 test equations were randomly ordered and handed to the subjects one at a time. For each test item, the subjects responded with the type they thought it was likely to be.

Results and Discussion

Learning was very difficult, with proportions correct of just .43 and .50 for the two blocks (with a chance level of .50). However, by the last four trials of the second block, the proportion correct was .61.
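The two solution methods and the group-specific predictions described in the Materials and Design can be made concrete with a small sketch. It uses the sample equation 3x/g + aq = (2 + cy)/n from the Materials; the coefficient values, function names, and the lookup table are illustrative assumptions, not part of the experimental materials.

```python
from fractions import Fraction

# Which solution method each group practiced with each equation type (from the Design).
METHOD_BY_GROUP = {
    "XY": {"Type 1": "SMD", "Type 2": "MSD"},
    "YX": {"Type 1": "MSD", "Type 2": "SMD"},
}


def solve_for_x(y, a, c, g, n, q):
    """SMD: subtract aq, multiply by g, divide by 3."""
    right_side = (2 + c * y) / n
    return (right_side - a * q) * g / 3


def solve_for_y(x, a, c, g, n, q):
    """MSD: multiply by n, subtract 2, divide by c."""
    left_side = 3 * x / g + a * q
    return (left_side * n - 2) / c


def type_predicted_by_method(group, solution_method):
    """Type a subject would choose if classifying purely by solution method."""
    for equation_type, method in METHOD_BY_GROUP[group].items():
        if method == solution_method:
            return equation_type
    raise ValueError(f"unknown solution method: {solution_method}")


# Hypothetical coefficient values, chosen only to exercise the two procedures.
coefficients = dict(a=Fraction(1), c=Fraction(2), g=Fraction(3), n=Fraction(4), q=Fraction(5))
x = solve_for_x(Fraction(7), **coefficients)
y = solve_for_y(x, **coefficients)

print(x, y)                                   # -1 7
print(type_predicted_by_method("XY", "SMD"))  # Type 1
print(type_predicted_by_method("YX", "SMD"))  # Type 2
```

Solving for x and then for y recovers the starting value of y, which confirms that the two operation sequences are inverses on the same equation. The last two lines show the key prediction: an SMD test item should be called Type 1 by Group XY and Type 2 by Group YX if the solution method has entered the category representation.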
For the consistent tests, in which the solution method and surface characteristics both indicated the same type, subjects were correct on .63 of the items, t(19) = 2.33, p < .05. The other tests showed that a significant proportion of the responses were made on the basis of the solution method, not the surface characteristics. For the unrelated tests, which used surface characteristics that had not been presented at study, .62 of the responses were consistent with the solution method, t(19) = 2.30, p < .05. Most interesting, even for the inconsistent tests, in which the solution method and surface characteristics suggested different categories, subjects responded on the basis of the solution method .64 of the time, t(19) = 2.77, p < .05.

Thus, even though the learning was poor, the tests indicate that subjects had incorporated the solution method into the category representation. It was used in classification not only when there were no surface characteristics available (the unrelated tests), but also when the surface characteristics were consistent with the other equation type (the inconsistent tests).

Experiment 6 shows that even when the use of the category does not involve a prediction or an additional classification, the use may affect later classifications. Although the two groups classified the equations in exactly the same way, the classifications led them to solve the equations for different variables, which led to very different category representations. The category use effect occurred even with the poor learning, which might be expected to attenuate the results. Experiment 7 examines the same issue in a very different paradigm.

EXPERIMENT 7

In Experiment 7, an easier classification task was used to improve learning performance, and a new paradigm was introduced to increase generality. Subjects were told that they were clerks in an intelligence-gathering operation. Spies would send them coded messages consisting of letters and numbers.
Their job was to determine which spy sent the message (classification) based on the letters and then, depending on the spy, use the appropriate decoding formula on the numbers. The
decoding of the messages led them to make use of different relations among the numbers (products and quotients) and the question of interest was whether these relations were incorporated into the category representation and affected later classification.

Two aspects of this design are particularly important. First, the classification and use were based on different parts of the message and subjects were explicitly told this information. Second, as will be explained in more detail in the method section, the critical relations among the numbers were true of all the coded messages. Thus, if different relations were incorporated into the category representations of different spies, it had to be a function of the use.

Method

Subjects. The subjects were 20 University of Illinois undergraduates who participated for pay. The sessions lasted from 30 to 45 min.

Materials. The study and test materials were coded messages from fictitious spies. The study codes were each two letters followed by six numbers. There were two spies, Spy A and Spy B. Spy A messages all began with SF or FS, whereas Spy B messages all began with PD or DP. (Another experiment is mentioned later that used a more difficult spy classification scheme but led to the same results.) Depending on the spy, subjects then had to apply one of the decoding formulas given in Table 4. For example, if the coded message was FS342526 for Spy B, then the decoding formula (sixth + second/fifth + fourth + third) would lead to (6 + 4/2 + 5 + 2 = 15). The six-number sequences for all messages had the following two constraints. First, the sixth number times the third number was always equal to 12. Second, the quotient of the second and fifth numbers was always equal to 2. Table 4 gives some examples.

There were eight study items for each spy, divided into two sets of four. The study phase consisted of two blocks of eight codes. There were 24 test items, which were divided into three types: consistent, conflict, and no-letters.
The consistent tests were exactly
as the study items. The other test items incorporated only one of the number-relation constraints (i.e., the product was 12 or the quotient was 2). In the conflict items, the code included one letter that was predictive of each spy (e.g., PF). In the no-letter tests, dashed lines were included instead of letters (subjects were told that the letters had been lost due to garbled transmission). Examples of these tests are given in Table 4. Each study and test code was handwritten on a 5 in. × 7 in. (12.7 cm × 17.8 cm) index card.
Design. All subjects received the same coded messages and made the same classifications. As a counterbalancing variable, half the subjects used each decoding formula for Spy A and the other for Spy B. If the relations among the numbers had been learned from using the decoding formulas and incorporated into the category representations, then classification test responses should be a function of the constraint among the numbers (i.e., whether the product is 12 or the quotient is 2). All the study messages had both these constraints, so the prediction is a claim about how the use leads subjects to learn some aspect of the item relative to other aspects.
Procedure. All subjects were told that they should think of themselves as being clerks in an intelligence-gathering operation and that their job was to determine which spy had sent the coded message, and then apply the appropriate decoding formula so that the resulting number could be passed on to their supervisor for further decoding. They were told that the letters in the message could be used to determine which spy had sent it. The decoding formula for each spy was typed on a piece of paper that remained in front of the subject during the study phase. Subjects were told that the letters would indicate which spy had sent the message and that the numbers were to be used for decoding the message. For each study trial, the subject was handed a code and classified it as to which spy sent it.
Feedback was given and then subjects applied the appropriate decoding formula, writing the parts and answer down on a piece of paper. When they had finished, the next item was presented. For each test trial, the subject was handed a code and had to determine which spy had sent it. They were told that some codes might have missing or incorrect letters due to garbled transmission.

TABLE 4
SAMPLE MATERIALS FOR EXPERIMENT 7

Decoding formula (these refer to the position of the number in the message)
  Spy A: 2nd + (6th × 3rd) + 1st + 5th
  Spy B: 6th + (2nd/5th) + 4th + 3rd

Sample study coded messages     6th × 3rd     2nd/5th
  Spy A: PD286742               12            2
  Spy B: SF266632               12            2

Test condition    Sample message    6th × 3rd     2nd/5th
  Consistent(a)   PD383144          12            2
  Conflict(b)     FD642536          12            1.33
  No-letter(c)    ––384233          12            2.67

a Consistent means that the coded message is just as in study, so that the number relations are consistent with both spies, though the letters are consistent with just one spy (in this example, Spy A).
b Conflict means that the letters are each predictive of different spies. Only one of the number relations from study is present (in this example, the product of the 6th and 3rd numbers equaling 12, as predicted to be noticed for Spy A from study).
c No-letter means that the letters were not presented. Only one of the number relations from study is present (in this example, the product of the 6th and 3rd numbers equaling 12, as predicted to be noticed for Spy A from study).

Results and Discussion
The classification learning performance in this experiment was much higher than in Experiment 6, with proportions correct of .77 and .88 on the two blocks. The consistent tests, in which the letters and numbers were exactly as in the study phase, led to correct classification .80 of the time, t(19) = 6.10, p < .001. The main question of interest is how subjects classified the codes when the letters were not informative. In both types of tests, the relations among the numbers had a large influence on the classification. For the conflict trials, .79 of the classifications were what would be predicted if subjects were using the number relations (i.e., the product of 12 or quotient of 2), t(19) = 5.99, p < .001. For the no-letter tests, .76 of the classifications were ones based on the number relations, t(19) = 4.56, p < .001. Note that this effect of use occurred despite the fact that the numbers were separate from the predictive classification letter cues and that subjects were told at the beginning of the experiment that the letters would indicate which spy had sent the message.
This experiment used two letters that were each perfectly predictive of the category to make the classification learning easy. However, the results do not depend on this simple classification. In another experiment, which is not presented in detail to save space, the codes consisted of three letters (two of which were predictive of the spy and one of which was not) and eight numbers. Although learning performance was much lower (.69 on the second block), the two test types with uninformative letters showed reliable effects of the number relations on classification (.73 for each, t(19) = 4.22, p < .001 for the conflict tests and t(19) = 3.89, p < .001 for the no-letter tests).
Experiment 7 replicates the effect found in Experiment 6 with a different paradigm and without the interpretation problems caused by poor learning. Even when the use of the category is not a prediction (or another classification), the use may affect later classifications.

GENERAL DISCUSSION
The results are briefly summarized, and then I discuss work related to this investigation. The remaining discussion focuses on the implications of these findings for theories of classification, the advantages of category use influences, and a consideration of problem solving and classification.

Summary of the Results
The seven experiments presented here investigate the effect of category use during learning on classification. Experiment 1 demonstrated the basic category use effect: Symptoms that were predictive of the treatments came to be viewed as more predictive of the disease category than were symptoms not predictive of the treatments. This effect occurred even though both types of symptoms were equally predictive of the disease category and were presented equally often. The next two experiments explored whether specific characteristics of the design might be crucial for getting this effect, and the effect was obtained even when each of these characteristics was changed. In Experiment 2, the symptoms were not perfectly predictive, but the relevant-use symptoms still had an advantage. In Experiment 3, a test was made as to whether the special disease-treatment relation was crucial, by changing the treatment to an arbitrary category use. The category use effect still was found. Thus these first three experiments show that the effect can be obtained across a number of changes in the design. The next two experiments changed the dependent measure to test whether what was being learned from the category use was specific to the classification of the symptoms or was such that the relevant-use symptoms might be more available for other tasks as well. In Experiment 4, subjects generated as many symptoms as they could that were likely to occur with each disease.
Relevant-use symptoms were more likely to be generated than irrelevant-use symptoms and were generated earlier
in the list. In Experiment 5, subjects estimated the frequency with which the symptoms occurred throughout learning, a task that did not require the determination of what category the symptoms were associated to. Even so, relevant-use symptoms were given higher frequency judgments. These experiments suggest that the category use effect was not restricted to classification or the particular use. The final two experiments extended this work to new paradigms in which the use required simple problem solving. In Experiment 6, the solution of equations led the subjects to learn about the particular solution method used for each category of equations, and this knowledge was incorporated into the category representation such that it affected later classifications. In Experiment 7, the decoding of messages led the subjects to learn about the particular relation among the numbers in the message that were important for decoding each spy’s message (even though the same relations were present for the other spy as well). These experiments provide an important preliminary investigation of how category use might affect later classification. The basic category use effect does not depend upon very specific aspects of the design, and the features that are important for the use become more available for other tasks as well. In addition, the effect occurs when the use is a feature prediction (or classification) or when it is more of a problem solving use. Related Work A discussion of relevant work was postponed until after the experiments had been presented so that the reader might better understand the investigation. The current experiments show that when people learn to classify and use a category, the use may affect classification. I know of no other work that directly examines such a situation, but it is instructive to consider how it differs from some related research. 
I briefly discuss work that examines how categories are used and the means by which classification can be affected by classification-related processing, and then focus on
research investigating how nonclassification experience may affect classification.
Category use. A variety of recent work has examined how categories are used to make predictions. Osherson et al. (1990) focused on how information about some categories may be combined to make inductions about other categories. Gelman and Markman (1986, 1987) investigated the way children use knowledge and perceptual similarities in making inferences from a known category to a new category member, as well as how different properties may be used in making different inferences (Kalish & Gelman, 1992). Other research has examined cases in which people need to make feature predictions from uncertain classifications (e.g., Anderson, 1991; Anderson & Fincham, 1996; Heit, 1992; Malt, Ross, & Murphy, 1995; Murphy & Ross, 1994; Ross & Murphy, 1996). Although these research projects do focus on how categories are used to make predictions, they differ from the present investigation in that they neither examine learning nor investigate how making predictions may affect later classifications.
Classification research. Some category learning research has investigated how classification knowledge is affected by the goals or knowledge of the learner, and by classification performance. Although this work has not examined the effects of using categories, it does provide some ideas of how classification knowledge can be affected by information about how the categories might be used, as well as by classification. The goals and strategies of the learner affect the similarities among the instances and can also affect classification (e.g., Brooks, 1987; Jacoby & Brooks, 1984; Kemler Nelson, 1988; Lamberts, 1994; Medin & Smith, 1981; Ward & Becker, 1992; Wattenmaker, 1991). Different learning methods affect what knowledge is acquired about categories (e.g., Elio & Anderson, 1984; Markman et al., 1997; Medin & Smith, 1981).
Waldmann and Holyoak (1992) show that even whether the subject views the classification task as prediction or diagnosis can affect what is learned. Background knowledge that is brought to bear during classification can also greatly affect performance (e.g., Murphy & Allopenna, 1994; Spalding & Murphy, 1996; Wattenmaker, Dewey, Murphy, & Medin, 1986; Wisniewski, 1995). There is also much evidence that classification performance can lead to major changes in the representation of the category. For example, attention weights can change with experience in classification, affecting the graded structure and even which category an item is classified as being a member of (e.g., Kruschke, 1992; Medin & Schaffer, 1978; Nosofsky, 1986). In addition, the particular comparisons made during the classification can affect later classifications (Medin & Edelson, 1988; Ross, Perkins, & Tenpenny, 1990; Spalding & Ross, 1994). The results in classification show that classification performance can be affected both by top-down knowledge about the domain or task and by information gained from earlier classifications. The experiments presented here extend these findings to show that the use of a category during learning may also affect later classification.
The effects of nonclassification performance. Several research projects have investigated how experience with categories may affect classification performance. Included under this idea are the work on unsupervised learning and expertise effects, as well as work examining the influence of goals and interactions with instances. In unsupervised learning tasks, the categories are not provided and the learner has to either construct them or make inferences about the items (e.g., Anderson, 1991; Billman & Heit, 1988; Billman & Knutson, 1996; Clapper & Bower, 1994; Fried & Holyoak, 1984; Heit, 1992; Lassaline & Murphy, 1996; also see the chapters in Fisher, Pazzani & Langley, 1991). The motivation for this work is the claim that much of our natural learning of categories occurs without explicit labelling and feedback.
Although this work does show that experience with the items in categories can affect classification and prediction, it is
rather different from the use of categories studied here. In the unsupervised learning situations, the learner has no classification feedback that can be used to help understand the categories and, therefore, must rely solely upon the knowledge gained from the interactions. Learners cannot make use of the categories, and there is no opportunity in this work to examine the effect on classification when it is learned in the context of a larger task. The work on expertise provides a number of cases in which experience within a domain may affect classification, but little is known about the learning of the classification or the role of the various nonclassification tasks in the domain. Tanaka and Taylor (1991) found that subjects who had extensive experience with a domain (e.g., dogs) used the subordinate level (e.g., collie) for naming, and their categorization reaction times were equally fast for subordinate and basic level categories (see also Dougherty, 1978). In problem solving domains, experts often have learned to use deeper, less surface-based features to classify problems (e.g., Chi, Feltovich, & Glaser, 1981). Presumably, extensive experience with solving problems of these types has allowed the experts to understand more clearly what features and relations are crucial to the type. The effect of experience is not only to add further distinctions to the classifications. Experts can sometimes see the categories as less distinctive, as their experiences lead them to note a number of underlying commonalities that may not be apparent to less experienced people (Murphy & Wright, 1984). Related work in cognitive anthropology (e.g., Boster & Johnson, 1989; Hunn, 1985) also suggests that when people make use of categories it may influence their category representations (also see the review by Malt, 1995).
Although this expertise work does suggest that experience with using categories may affect subsequent classifications, it is not possible to isolate what has led to the expert-novice differences in classification. Experts certainly make use of categories, but they also make many classifications within the domain (as well as have much other domain knowledge
learning experience). Thus, these findings do not allow an unambiguous interpretation that it is the category use that is affecting subsequent classification. To make such an interpretation, the learning needs to be under experimental control, such as in the experiments reported here. Barsalou (1983, 1985, 1991) has examined how the uses one has in mind may affect the classification. Goal-derived categories are ones in which the members could all be used to accomplish the same goal (such as ‘‘foods to eat on a diet’’). This work is among the most direct in examining the influence of uses on classification, but has not investigated how the use of a category may affect its representation or how uses may affect the learning of a category.
Two projects are particularly related to the current project in that they examine the effects of nonclassification tasks on subsequent classification. First, Markman et al. (1997; see also Yamauchi & Markman, 1996) showed that a feature prediction task led to greater learning about a particular aspect of the category structure, the feature relations, than did classification learning. Second, as mentioned earlier, Ross (1996) found that using items during classification learning can affect later classifications. Subjects learned to classify simple equations (e.g., a + [bx/c] = p), received feedback, and then solved each equation or did not. There were a number of properties that could be used to classify the items perfectly, such as the number of parentheses, letters, and mathematical structure. The subjects who had to solve each equation were much more likely to classify later equations on the basis of the mathematical structure (though they also gained information about the parentheses and letters). In addition, different uses of the equations led to different classifications (Experiments 2a and 2b), so the effect was not simply due to increased processing.
These experiments show that the interactions with items can affect classification, but the solution procedure did not require knowledge of category membership (all the equations could be solved by preexperimental
knowledge). Thus, this research did not examine a case in which the category was used to accomplish some additional goal. That was the purpose of the present experiments. As can be seen in this brief review, much work examines how categories are used, the effects of classification learning, and the effects of nonclassification performance on classification. However, the present investigation is, I believe, the first learning experiment to examine how the use of categories affects the category representation, including knowledge used to make classifications. Implications for Classification Theories Because this was a preliminary investigation of the effects of category use, the design and manipulations were kept simple. In the first five experiments, each use depended on a single symptom, and the use did not require the subjects to manipulate the presented information. Because of this simplicity, it seems likely that all of the results for these five experiments can be explained by an increased weighting of the relevant-use symptoms relative to the irrelevant-use symptoms, with some increased connection to the disease category as well. This increased weighting would allow relevant-use symptoms to be better classified, generated more readily, and given higher frequency judgments. A later section considers other possible ways in which category use might affect category-related judgments, such as in Experiments 6 and 7, but how might even the simple effect of weighted features be accommodated by current theories? Most current theories of classification have a means by which some features can become more weighted (attended to, influential) than other features (e.g., Gluck & Bower, 1988; Kruschke, 1992; Medin & Schaffer, 1978; Nosofsky, 1986). Because these theories focus on classification tasks in which classification is the goal, the mechanisms used rely upon feedback about the classification (or on prior knowledge or salience). 
The experiments reported in this paper suggest that classification weights can also be affected by the use made of the category. Thus, the theories need some
way to make use of the nonclassification feedback to affect the weights of the features for classification. However, these theories are constructed to account for classification, not category use, and so have no simple way to incorporate such nonclassification effects. To try to make the problem clearer, I will discuss a simple prototype view and why it would have problems accounting for these results. I then consider two exemplar views, ALCOVE and MINERVA, with extensions that might be able to deal with category use effects.
Prototype theories. Consider a simple prototype theory in which the features that occur most often in a category are weighted most in the summary representation (e.g., Reed, 1972; Rosch, 1978; or see proposals in Smith & Medin, 1981). For relevant-use symptoms to be more accurately classified than irrelevant-use symptoms, the summary representation needs to have the relevant-use symptoms more heavily weighted. However, because relevant-use symptoms and irrelevant-use symptoms occur equally often with the category (and have equal cue validity and category validity), there is no means for predicting differential classification. The difficulty with getting this differential weighting is that the weights depend on the classification decisions and the two types of symptoms are exactly the same with respect to the classification decisions. One possibility might be to consider treatments as another feature of the disease categories and to note that the relevant-use symptoms and treatments are perfectly correlated. Thus, prototype theories that retain some feature co-occurrence information (e.g., Hanson & Bauer, 1989) or feature frequency theories (e.g., Neumann, 1974) might be able to make use of these pairings. The problem is that while each of the relevant-use symptoms always occurs with a particular treatment, each of the irrelevant-use symptoms occurs half as often for each disease treatment but with twice as many treatments.
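The counting argument can be illustrated with a small sketch. The design below is a hypothetical reduction to one disease with two relevant-use symptoms (R1, R2), two irrelevant-use symptoms (I1, I2), and two treatments (T1, T2); the symptom labels, the four-patient layout, and the square-root transform standing in for a negatively accelerated frequency function are all assumptions made for illustration, not the materials or model of any experiment.

```python
# Illustrative sketch only: a hypothetical one-disease design.
import math
from collections import Counter
from itertools import product

# Each relevant-use symptom determines one treatment (assumed mapping).
TREATMENT_FOR = {"R1": "T1", "R2": "T2"}

# Every patient has one relevant-use and one irrelevant-use symptom,
# with the two symptom types fully crossed.
patients = [(r, i, TREATMENT_FOR[r])
            for r, i in product(["R1", "R2"], ["I1", "I2"])]

disease_count = Counter()      # symptom -> co-occurrences with the disease
treatment_count = Counter()    # (symptom, treatment) -> co-occurrences
for relevant, irrelevant, treatment in patients:
    for symptom in (relevant, irrelevant):
        disease_count[symptom] += 1
        treatment_count[(symptom, treatment)] += 1


def summed_subjective_frequency(symptom, transform=math.sqrt):
    """Transform each per-treatment count, then add across treatments."""
    return sum(transform(n)
               for (s, _), n in treatment_count.items() if s == symptom)


if __name__ == "__main__":
    print(dict(disease_count))
    print(dict(treatment_count))
    for s in ("R1", "I1"):
        print(s, summed_subjective_frequency(s))
```

In this toy design every symptom co-occurs with the disease equally often, and a compressive transform applied per treatment and then summed favors the irrelevant-use symptoms rather than the relevant-use ones.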
That is, the two kinds of symptoms still occur equally often. Thus, even with this extension, one also needs some nonlinearity in going from frequency to weight in the representation. This
nonlinearity could be incorporated, but it would be the opposite of the usual finding that subjective frequency is a negatively accelerated function of presented frequency (e.g., Hintzman, 1976). (That is, if the subjective frequencies were computed separately for each treatment and then added, the irrelevant-use symptoms would be judged as having occurred more frequently.) What is needed is some way to make the relevant-use symptoms’ relevance to the treatment decisions be reflected in the summary representation for the classification. Currently, prototype views have no proposal for how the use can affect the weightings for the summary representation.
ALCOVE. Exemplar theories often provide better accounts of the many classification results than do prototype views (see Medin & Smith, 1984; Nosofsky, 1992; Ross & Makin, in press, for reviews), plus they have been more formally developed. Thus, we consider here two well-known exemplar models, ALCOVE (Kruschke, 1992) and MINERVA 2 (Hintzman, 1986, 1988). ALCOVE is closely related to the Context Model (Medin & Schaffer, 1978) and the Generalized Context Model (Nosofsky, 1986, 1988). Those earlier models allowed for different weightings among the features, but ALCOVE extended them to provide a connectionist learning scheme for how the weights may be changed as a function of classification feedback. In this model, the features each connect to hidden units that are exemplar-like representations, which in turn connect to category nodes. Classification feedback drives the learning by changing the weights on these two types of connections. The present model does not have a mechanism for modifying feature weights except through classification decisions. If this model is extended to include other category uses besides classification, then the classification will have to be modifiable by feedback about some of these other uses. This extension could be made in a number of ways.
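As a rough illustration of the architecture just described, the sketch below computes a forward pass in the style of ALCOVE: attention-weighted city-block distance from the stimulus to each exemplar node, exponential decay to give hidden activations, and a weighted sum into output nodes. It is a simplification under stated assumptions (distance and similarity exponents both set to 1, hand-set rather than learned weights, no learning rule), and the second output layer for treatments is purely hypothetical, included only to show how additional output nodes could read the same hidden units.

```python
# Illustrative sketch only: simplified ALCOVE-style forward pass, no learning.
import math


def hidden_activations(stimulus, exemplars, attention, c=1.0):
    """Exemplar-node activations: exp(-c * attention-weighted city-block distance)."""
    acts = []
    for exemplar in exemplars:
        dist = sum(a * abs(h - x)
                   for a, h, x in zip(attention, exemplar, stimulus))
        acts.append(math.exp(-c * dist))
    return acts


def output_activations(hidden, weights):
    """Each output node sums the hidden activations through its association weights."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in weights]


# Hypothetical two-dimensional stimuli; dimension 0 determines the category.
EXEMPLARS = [(0, 0), (0, 1), (1, 0), (1, 1)]
CATEGORY_WEIGHTS = [[1, 1, 0, 0],    # category node 1
                    [0, 0, 1, 1]]    # category node 2
# Hypothetical extra output layer (e.g., treatments) sharing the hidden units.
TREATMENT_WEIGHTS = [[1, 0, 1, 0],
                     [0, 1, 0, 1]]

if __name__ == "__main__":
    stimulus = (0, 0)
    for attention in ([1.0, 1.0], [2.0, 0.0]):
        hidden = hidden_activations(stimulus, EXEMPLARS, attention)
        print(attention,
              output_activations(hidden, CATEGORY_WEIGHTS),
              output_activations(hidden, TREATMENT_WEIGHTS))
```

Because both output layers read the same hidden activations, any change to the attention weights alters both sets of outputs at once; how feedback on a second task should drive that change is exactly the open question raised in the text.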
For example, one possible modification would be to include separate treatment nodes (with activation from their appropriate disease) that are also connected to these same hidden units. Then treatment
choices would influence the symptom-to-hidden unit connections, which would influence classification. This scheme might deal with the simple case examined here, but, even if it does, it is not clear how best to incorporate more complex use effects, such as the ones in Experiments 6 and 7.
MINERVA 2. Hintzman (1986, 1988) has proposed an exemplar theory, MINERVA 2, which accounts for a variety of classification and memory results. Each experienced event is represented as a vector of features. Long-term memory consists of a large unordered collection of these vectors. A memory probe consists of another vector that activates all memory traces in parallel, with traces more similar to the probe being activated more. The sum of these traces, weighted by their activation, is returned as a single vector, called the echo. This echo can be viewed as the response to the probe and consists of both a content (the vector values) as well as an intensity (reflecting the overall activation of memory to this probe). Note that because the activation is a nonlinear (cubed) function of the similarity between the probe and each trace, the echo is heavily weighted by the traces most similar to the probe. This simple model accounts for a variety of classification results (see Hintzman, 1986), but it does not predict the current findings. Because all features of an event, such as the symptoms of a patient, are encoded in the same trace (except for some independent encoding probability), there is no basis for getting different echo intensities as a function of relevant-use or irrelevant-use symptoms. That is, one relevant-use symptom might activate some traces from a disease and the other relevant-use symptom for that disease would activate the other traces from the disease, but these same traces would be equally activated by the two irrelevant-use symptoms for that disease.
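The point about echo intensity can be checked with a minimal sketch of the basic model. It assumes a hypothetical one-disease design (disease D, relevant-use symptoms R1 and R2, irrelevant-use symptoms I1 and I2, treatments T1 and T2), perfect encoding, and a simplified coding in which a present feature is +1 and an absent feature is 0; similarity is the probe-trace dot product divided by the number of features that are nonzero in either vector, and activation is similarity cubed.

```python
# Illustrative sketch only: basic MINERVA 2 echo intensity on a toy design.
FEATURES = ["D", "R1", "R2", "I1", "I2", "T1", "T2"]


def make_vector(present):
    """Code present features as +1 and absent features as 0 (an assumption)."""
    return [1 if f in present else 0 for f in FEATURES]


# One trace per patient: disease, one symptom of each type, and the treatment.
TRACES = [make_vector(p) for p in (
    {"D", "R1", "I1", "T1"},
    {"D", "R1", "I2", "T1"},
    {"D", "R2", "I1", "T2"},
    {"D", "R2", "I2", "T2"},
)]


def similarity(probe, trace):
    """Dot product normalized by the count of features nonzero in either vector."""
    relevant = sum(1 for p, t in zip(probe, trace) if p != 0 or t != 0)
    if relevant == 0:
        return 0.0
    return sum(p * t for p, t in zip(probe, trace)) / relevant


def activation(probe, trace):
    """Activation is the cube of similarity."""
    return similarity(probe, trace) ** 3


def echo_intensity(probe, traces):
    """Echo intensity is the summed activation over all traces."""
    return sum(activation(probe, t) for t in traces)


if __name__ == "__main__":
    for symptom in ("R1", "I1"):
        print(symptom, echo_intensity(make_vector({symptom}), TRACES))
```

Under these assumptions a relevant-use probe and an irrelevant-use probe each match two of the four traces to the same degree, so their echo intensities are identical.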
The echo intensity is a function of how similar the probe is to the traces and the frequency of similar traces, but these factors do not differ for relevant-use and irrelevant-use symptoms. Because relevant-use and irrelevant-use symptoms were orthogonal in the
design with a relevant-use and irrelevant-use symptom for each patient, the simple model cannot predict the results. However, Hintzman (1986, 1988) proposed an elaboration of MINERVA 2 to account for similarity effects in classification and memory, which he calls intertrace resonance. In this extension, the activation of a trace is not solely a function of the overlap with the probe, but rather each trace gets further activation from other traces. This secondary activation that a trace receives is a function of both overlap between the traces and how much the other trace is activated. This resonance leads to greater activation when the traces activated by the probe are also similar to one another (also see Heit, 1992). This extension can account for most, though not all, of the findings from the first five experiments. A relevant-use symptom would activate traces which not only had this symptom and the disease, but also had the same treatment (while the irrelevant-use symptoms would activate traces that have half same and half different treatments). Thus, the same treatment would mean that the activated traces are more similar for the relevant-use symptoms, so the echo intensity would be greater than for the irrelevant-use symptoms. This increased echo intensity would lead to more accurate classification and higher frequency judgments. However, it is not clear that this intertrace resonance would allow MINERVA 2 to predict the generation results (Experiment 4). Here, the probe consists of the disease, so the echo content would consist of the different symptoms that occur in traces with the disease. Because the relevant-use and irrelevant-use symptoms are represented in the same set of traces for a disease, any highly activated trace with a relevant-use symptom will also have an irrelevant-use symptom. Even the intertrace resonance will not help here, because all the traces for a disease are being activated, not just ones for a particular treatment.
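The generation case can be sketched in the same simplified way. The code below computes echo content in the basic model (no intertrace resonance, which is not implemented here) for a hypothetical one-disease design with disease D, relevant-use symptoms R1 and R2, irrelevant-use symptoms I1 and I2, and treatments T1 and T2. The +1/0 feature coding and perfect encoding are assumptions, and adding a treatment to the probe is used only as a simple stand-in for activating one treatment's traces more strongly.

```python
# Illustrative sketch only: basic MINERVA 2 echo content on a toy design.
FEATURES = ["D", "R1", "R2", "I1", "I2", "T1", "T2"]


def make_vector(present):
    """Code present features as +1 and absent features as 0 (an assumption)."""
    return [1 if f in present else 0 for f in FEATURES]


TRACES = [make_vector(p) for p in (
    {"D", "R1", "I1", "T1"},
    {"D", "R1", "I2", "T1"},
    {"D", "R2", "I1", "T2"},
    {"D", "R2", "I2", "T2"},
)]


def activation(probe, trace):
    """Cubed similarity; similarity is normalized over features nonzero in either vector."""
    relevant = sum(1 for p, t in zip(probe, trace) if p != 0 or t != 0)
    if relevant == 0:
        return 0.0
    return (sum(p * t for p, t in zip(probe, trace)) / relevant) ** 3


def echo_content(probe, traces):
    """Activation-weighted sum of the traces, returned as a feature -> value mapping."""
    content = [0.0] * len(FEATURES)
    for trace in traces:
        a = activation(probe, trace)
        for j, value in enumerate(trace):
            content[j] += a * value
    return dict(zip(FEATURES, content))


if __name__ == "__main__":
    print(echo_content(make_vector({"D"}), TRACES))        # disease-only probe
    print(echo_content(make_vector({"D", "T1"}), TRACES))  # disease plus one treatment
```

With the disease alone as the probe, all four symptoms are equally represented in the echo. When the probe also contains one treatment, the relevant-use symptom tied to that treatment is strongest, both irrelevant-use symptoms are intermediate, and the other relevant-use symptom is weakest.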
If the traces of one particular treatment are activated more, then intertrace resonance might lead to the relevant-use symptom being most fully represented in the echo, but the two irrelevant-
use symptoms would be more represented than the other relevant-use symptom. (This is true since they each occur half the time with the strong relevant-use symptom, while the other relevant-use symptom never occurs with the strong relevant-use symptom.) Thus, MINERVA 2 has already been extended in a way that accounts for many, but not all, of the current findings. The problem here is different from that of the other theories considered. In those cases, the difficulty was that there was no current means by which nonclassification uses could affect the classification. Because MINERVA 2 keeps all classification and nonclassification information together in each trace and calculates the classification at the time of test, it does allow nonclassification knowledge to affect classification. However, it does not currently have a means by which particular features of an item could be differentially weighted. Adding such a mechanism that would also be consistent with the exemplar view might be an interesting avenue for further research. For example, Spalding and Ross (1994) showed that as people are learning categories, they compare instances to similar earlier instances, and features common to those instances are given more weight in the representation. (See also Medin & Edelson, 1988; Ross et al., 1990.) These comparisons lead to more weighting being given to features that are more likely to be influencing the classification. The treatment can be viewed as a second classification and, with more experience, the relevant-use symptoms would be more heavily weighted than the irrelevant-use symptoms. MINERVA 2 could use a similar mechanism (see Hintzman, 1986, for a brief consideration) by having the stored trace be not just a function of what is presented, but also be influenced by what information it retrieved (i.e., the echo). This idea is similar to that of exemplar-guided encoding (e.g., Jacoby & Brooks, 1981; Medin & Bettger, 1994; Medin & Florian, 1992). 
In summary, classification theories need to be extended to allow other category-related processes to affect the knowledge used to
make classifications. Instead of viewing classification as a separable process, we need to understand how classification interacts with the other functions of categories.

Advantages of Category Use Influences

One view of the category use effect is that it is non-normative.3 That is, the relevant-use and irrelevant-use symptoms are equally predictive of the category and equally frequent, so it is not adaptive to have the relevant-use symptoms be viewed as more central to the disease. Although this may be true for classification, I want to point out two advantages of the category use effect for performance and learning.

First, for performance, the question is whether one considers the classification performance alone or the combination of classification and use. The relevant-use symptoms are no more central to the classification, but they are much more important for the use than the irrelevant-use symptoms. Thus, if one is trying to perform well in both tasks, focusing on the relevant-use symptoms may cost only a little for classification while gaining much for use.

Second, although it might seem that such effects lead to a more complicated theory of classification learning, that is not necessarily true. It may be that considering the use of categories helps to provide constraints on classification learning as well, as suggested in the introduction. For example, for very complex stimuli, the number of features that could be important for classification is very large. However, if the features involved in the use of the category are more likely than other features to also be useful for classification, then the use may help learners to focus more quickly on relevant features. In this way, the category use may provide a heuristic for focusing on use-relevant features.

This section is not meant to provide a definitive adaptiveness argument for why category use may influence classification, but rather to suggest that this influence may have some advantages.

3 I thank Edward Smith and Jerome Busemeyer for bringing up these issues.

Problem Solving and Classification

To understand the importance of category use, it may be useful to broaden the usual discussion of categories to consider other domains in which categories are crucial. Problem solving is very different from the usual experimental classification tasks, but it provides a clear case in which categories are used to accomplish some nonclassification goal (e.g., Lewis & Anderson, 1985). In many problem solving domains, problem classification is crucial. In many learning situations in these domains, determining the type of problem is the main obstacle to solving it, and great differences in problem classification are found between novices and experts. For example, experts are generally assumed to have problem schemas that allow them to identify the problem type and that contain associated procedures for solving problems of that type (e.g., Chi et al., 1981). Novices are not yet able to identify the problem types, and often use superficial features for classifying problems rather than using the correct, deeper features.

The learning of problem categories differs from learning in the usual classification paradigm in three important ways. First, the goal is not to learn to classify the problems, but rather to learn to solve them. The problem classifications are learned so that, rather than solve each problem from scratch, solvers can classify new problems and then use knowledge about that problem type to solve them. Second, feedback about the classification is not always immediate or even direct. The solvers may get the incorrect solution and only then find out they had the wrong problem category. In some cases, finding out they had the incorrect solution may not allow the solvers to be sure whether their classification was correct or not.
Third, the use of the problem category is not usually simple, but often involves an extensive instantiation of procedures with information from the particular problem. In applying the procedures to the current problem, there is much opportunity to
see what features and relations are important in the solution and why. Thus, the use of the problem category may lead one to re-weight the importance of features, see relations among features, or even re-interpret what some of the features are. This last effect of category use may lead experts to rely upon different (and deeper) features for classification than do novices.

The final two experiments, which involved simple problem solving tasks, provide a beginning for examining more complex effects. In Experiment 6, the subjects incorporated aspects of the solution method into their category representations as a function of the use of the categories (i.e., whether to solve for x or y). In Experiment 7, the decoding of the message led subjects to incorporate the relations of numbers that were relevant to the particular decoding formula of each spy. The goal of including such studies and discussing classification and category use in problem solving is to suggest that classification research might profitably be applied to more complex domains and that such applications might help to extend ideas in classification.

Conclusions

Category learning consists of learning to classify and learning to make use of these classifications. The use of categories is an important issue in a variety of domains, but it has not received much examination. The experiments in this article included category uses of feature prediction and problem solving and showed that the use of the category may affect the category representation, including knowledge used to make later classifications.

REFERENCES

Anderson, J. R. (1991). The adaptive nature of human categorization. Psychological Review, 98, 409–429.
Anderson, J. R., Corbett, A. T., Koedinger, K. R., & Pelletier, R. (1995). Cognitive tutors: Lessons learned. The Journal of the Learning Sciences, 4, 167–208.
Anderson, J. R., & Fincham, J. M. (1996). Categorization and sensitivity to correlations. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 259–277.
Barsalou, L. W. (1983). Ad hoc categories. Memory & Cognition, 11, 211–227.
Barsalou, L. W. (1985). Ideals, central tendency, and frequency of instantiation as determinants of graded structure in categories. Journal of Experimental Psychology: Learning, Memory, and Cognition, 11, 629–654.
Barsalou, L. W. (1991). Deriving categories to achieve goals. In G. H. Bower (Ed.), The psychology of learning and motivation, vol. 27. New York: Academic Press.
Billman, D., & Heit, E. (1988). Observational learning from internal feedback: A simulation of an adaptive learning method. Cognitive Science, 12, 587–625.
Billman, D., & Knutson, J. (1996). Unsupervised concept learning and value systematicity. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 458–475.
Bock, K. (1995). Sentence production: From mind to mouth. In J. L. Miller & P. D. Eimas (Eds.), Handbook of perception and cognition. Vol. 11: Speech, language, and communication (pp. 181–216). Orlando, FL: Academic Press.
Boster, J. S., & Johnson, J. C. (1989). Form or function: A comparison of expert and novice judgments of similarity among fish. American Anthropologist, 91, 866–889.
Brooks, L. (1987). Decentralized control of categorization: The role of prior processing episodes. In U. Neisser (Ed.), Concepts and conceptual development: Ecological and intellectual factors in categorization (pp. 141–174). New York: Cambridge Univ. Press.
Chi, M. T. H., Feltovich, P. J., & Glaser, R. (1981). Categorization and representation of physics problems by experts and novices. Cognitive Science, 5, 121–152.
Clapper, J. P., & Bower, G. H. (1994). Category invention in unsupervised learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 443–460.
Dell, G. S. (1986). A spreading activation theory of retrieval in sentence production. Psychological Review, 93, 283–321.
Dougherty, J. W. D. (1978). Salience and relativity in classification. American Ethnologist, 15, 66–80.
Elio, R., & Anderson, J. R. (1984). The effects of information order and learning mode on schema abstraction. Memory & Cognition, 12, 20–30.
Estes, W. K. (1994). Classification and cognition. New York: Oxford Univ. Press.
Fisher, D. H. Jr., Pazzani, M. J., & Langley, P. (Eds.). Concept formation: Knowledge and experience in unsupervised learning. San Mateo, CA: Morgan Kaufmann.
Fried, L. S., & Holyoak, K. J. (1984). Induction of category distributions: A framework for classification learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 239–257.
Gelman, S. A., & Markman, E. M. (1986). Categories and induction in children. Cognition, 23, 183–208.
Gelman, S. A., & Markman, E. M. (1987). Young children’s inductions from natural kinds: The role of categories and appearance. Child Development, 58, 1532–1541.
Gluck, M. A., & Bower, G. H. (1988). Evaluating an adaptive network model of human learning. Journal of Memory and Language, 27, 166–195.
Hanson, S. J., & Bauer, M. (1989). Conceptual clustering, categorization, and polymorphy. Machine Learning, 3, 342–372.
Heit, E. (1992). Categorization using chains of examples. Cognitive Psychology, 24, 341–380.
Heit, E. (1994). Models of the effects of prior knowledge on category learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 1264–1282.
Hintzman, D. L. (1976). Repetition and memory. In G. H. Bower (Ed.), The psychology of learning and motivation, vol. 10 (pp. 47–91). New York: Academic Press.
Hintzman, D. L. (1986). “Schema abstraction” in a multiple-trace model. Psychological Review, 93, 411–428.
Hintzman, D. L. (1988). Judgements of frequency and recognition memory in a multiple-trace memory model. Psychological Review, 95, 528–551.
Hunn, E. (1985). The utilitarian factor in folk biological classification. In J. W. D. Dougherty (Ed.), Directions in cognitive anthropology. Urbana, IL: University of Illinois Press.
Jacoby, L. L., & Brooks, L. R. (1984). Nonanalytic cognition: Memory, perception, and concept learning. In G. H. Bower (Ed.), The psychology of learning and motivation, vol. 20. New York: Academic Press.
Kalish, C. W., & Gelman, S. A. (1992). On wooden pillows: Multiple classification and children’s category-based inductions. Child Development, 63, 1536–1557.
Kemler Nelson, D. G. (1988). The effect of intention on what concepts are acquired. Journal of Verbal Learning and Verbal Behavior, 23, 734–759.
Kruschke, J. K. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99, 22–44.
Lamberts, K. (1994). Flexible tuning of similarity in exemplar-based categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 1003–1021.
Lassaline, M. E., & Murphy, G. L. (1996). Induction and category coherence. Psychonomic Bulletin & Review, 3, 95–99.
Lewis, M. W., & Anderson, J. R. (1985). Discrimination of operator schemata in problem solving: Learning from examples. Cognitive Psychology, 17, 26–65.
Malt, B. C. (1995). Category coherence in cross-cultural perspective. Cognitive Psychology, 29, 85–148.
Malt, B. C., Ross, B. H., & Murphy, G. L. (1995). Making predictions using uncertain natural categories. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 646–661.
Markman, A. B., Yamauchi, T., & Makin, V. S. (1997). The creation of new concepts: A multifaceted approach to category learning. In T. B. Ward, S. M. Smith, & J. Vaid (Eds.), Conceptual structures and processes: Emergence, discovery, and change. Washington, DC: American Psychological Association.
Medin, D. L., & Bettger, J. G. (1994). Presentation order and recognition of categorically related examples. Psychonomic Bulletin & Review, 1, 250–254.
Medin, D. L., & Edelson, S. (1988). Problem structure and the use of base rate information from experience. Journal of Experimental Psychology: General, 117, 68–85.
Medin, D. L., & Florian, J. E. (1992). Abstraction and selective coding in exemplar-based models of categorization. In A. F. Healy, S. M. Kosslyn, & R. M. Shiffrin (Eds.), From learning processes to cognitive processes: Essays in honor of William K. Estes, volume II (pp. 207–234). Hillsdale, NJ: Erlbaum.
Medin, D. L., & Ross, B. H. (1989). The specific character of abstract thought: Categorization, problem-solving, and induction. In R. J. Sternberg (Ed.), Advances in the psychology of human intelligence: Vol. 5 (pp. 189–223). Hillsdale, NJ: Erlbaum.
Medin, D. L., & Schaffer, M. M. (1978). Context theory of classification learning. Psychological Review, 85, 207–238.
Medin, D. L., & Smith, E. E. (1981). Strategies and classification learning. Journal of Experimental Psychology: Human Learning and Memory, 7, 241–253.
Medin, D. L., & Smith, E. E. (1984). Concepts and concept formation. Annual Review of Psychology, 35, 113–138.
Murphy, G. L., & Allopenna, P. D. (1994). The locus of knowledge effects in concept learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 904–919.
Murphy, G. L., & Ross, B. H. (1994). Predictions from uncertain categorizations. Cognitive Psychology, 27, 148–193.
Murphy, G. L., & Wright, J. C. (1984). Changes in conceptual structure with expertise: Differences between real-world experts and novices. Journal of Experimental Psychology: Learning, Memory, and Cognition, 1, 144–155.
Neumann, P. G. (1974). An attribute frequency model for the abstraction of prototypes. Memory & Cognition, 2, 241–248.
Nosofsky, R. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General, 115, 39–57.
Nosofsky, R. (1988). Similarity, frequency, and category representations. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 54–65.
Nosofsky, R. M. (1992). Exemplars, prototypes, and similarity rules. In A. F. Healy, S. M. Kosslyn, & R. M. Shiffrin (Eds.), From learning theory to connectionist theory: Essays in honor of William K. Estes (Vol. 1, pp. 149–167). Hillsdale, NJ: Erlbaum.
Nosofsky, R. M., Palmeri, T. J., & McKinley, S. C. (1994). Rule-plus-exception model of classification learning. Psychological Review, 101, 53–79.
Osherson, D. N., Smith, E. E., Wilkie, O., Lopez, A., & Shafir, E. (1990). Category-based induction. Psychological Review, 97, 185–200.
Reed, S. K. (1972). Pattern recognition and categorization. Cognitive Psychology, 3, 382–407.
Rosch, E. (1978). Principles of categorization. In E. Rosch & B. Lloyd (Eds.), Cognition and categorization (pp. 27–48). Hillsdale, NJ: Erlbaum.
Ross, B. H. (1989). Remindings in learning and instruction. In S. Vosniadou & A. Ortony (Eds.), Similarity and analogical reasoning. Cambridge: Cambridge Univ. Press.
Ross, B. H. (1996). Category representations and the effects of interacting with instances. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 1249–1265.
Ross, B. H., & Makin, V. S. (in press). Prototype versus exemplar models. In R. J. Sternberg (Ed.), The nature of cognition. Cambridge, MA: MIT Books.
Ross, B. H., & Murphy, G. L. (1996). Category-based predictions: The influence of uncertainty and feature associations. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 736–753.
Ross, B. H., Perkins, S. J., & Tenpenny, P. L. (1990). Reminding-based category learning. Cognitive Psychology, 22, 460–492.
Ross, B. H., & Spalding, T. L. (1994). Concepts and categories. In R. Sternberg (Ed.), Handbook of perception and cognition, vol. 12. Thinking and problem solving (pp. 119–148). San Diego, CA: Academic Press, Inc.
Smith, E. E., & Medin, D. L. (1981). Categories and concepts. Cambridge, MA: Harvard Univ. Press.
Spalding, T. L., & Murphy, G. L. (1996). Effects of background knowledge on category construction. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 525–538.
Spalding, T. L., & Ross, B. H. (1994). Comparison-based learning: Effects of comparing instances during category learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 1251–1263.
Tanaka, J. W., & Taylor, M. E. (1991). Categorization and expertise: Is the basic level in the eye of the beholder? Cognitive Psychology, 23, 457–482.
Waldmann, M. R., & Holyoak, K. J. (1992). Predictive and diagnostic learning within causal models: Asymmetries in cue competition. Journal of Experimental Psychology: General, 121, 222–236.
Ward, T. B., & Becker, A. H. (1992). Learning categories with and without trying: Does it make a difference? In B. Burns (Ed.), Percepts, concepts and categories (pp. 451–491). Amsterdam: Elsevier Science.
Wattenmaker, W. D. (1991). Learning modes, feature correlations, and memory-based categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 17, 908–923.
Wattenmaker, W. D., Dewey, G. I., Murphy, T. D., & Medin, D. L. (1986). Linear separability and concept learning: Context, relational properties, and concept naturalness. Cognitive Psychology, 18, 158–194.
Wisniewski, E. J. (1995). Prior knowledge and functionally relevant features in concept learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 449–468.
Yamauchi, T., & Markman, A. B. (1996). Category learning by inference and classification. Manuscript under review.

(Received August 1, 1996)
(Revision received December 2, 1996)