Assessing moral development




Vol. 10, pp. 347-476, 1986. Printed in Great Britain. All rights reserved.

0883-0355/86 $0.00+.50 Copyright © 1986 Pergamon Journals Ltd.

Int. J. Educ. Res.

ASSESSING MORAL DEVELOPMENT

R. MURRAY THOMAS

University of California, Santa Barbara, CA 93106, U.S.A.

CONTENTS

CHAPTER 1
AN APPROACH TO SURVEYING ASSESSMENT METHODS
A Theory of Moral Development
The Organization of Chapter Contents
Conclusion

CHAPTER 2
IDENTIFYING THE DOMAIN OF MORAL
Bases for Defining the Domain of Moral
Implications of Definitions for Selecting Assessment Techniques
Implications of Definitions for Interpreting Results
Conclusion

CHAPTER 3
FACTS, CONCEPTS, AND PRESCRIPTIVE CAUSAL RELATIONS
Defining Types of Mental Contents
Connections Between Concepts and Prescriptive Causal Relations
Examples of Assessment Techniques
Conclusion

CHAPTER 4
THE DETERMINANTS OF MORAL DEVELOPMENT
An Orientation to Some Issues of Causality
Assessing for the True Causes in Moral Development
Correlational Analyses
Assessing Causal Attributions
Conclusion

CHAPTER 5
AFFECT IN MORAL DEVELOPMENT
Projective Techniques
Semantic-Differential Tests
Biometric Approaches
Conclusion

CHAPTER 6
THE DIRECTION AND STRENGTH OF MORAL VALUES
Interviews
Values Tests and Inventories
Academic Achievement Tests
Pictorial Materials
Conclusion

CHAPTER 7
MENTAL PROCESSES AND STRATEGIES
Psychoanalytic Inquiry
Experimental Studies
Critical-Life-Incident Inquiry
Conclusion

CHAPTER 8
REAL-LIFE AND SIMULATED ENVIRONMENTS: AS INFLUENCES ON DEVELOPMENT AND AS ASSESSMENT STIMULI
Assessing the Effects of Different Environments
Assessment Techniques as Environments: Degrees of Realism and Levels of Participation
Conclusion

CHAPTER 9
MORAL CONSISTENCY
The Relationship Between Thought and Action
Consistency Across Time
Conclusion

CHAPTER 10
SOME CONCLUSIONS AND PREDICTIONS
Future Directions in the Assessment of Moral Development
Additional Persistent Methodological Problems
Conclusion

REFERENCES

APPENDIX

SUBJECT INDEX

CHAPTER 1

AN APPROACH TO SURVEYING ASSESSMENT METHODS

Whereas the title of this monograph is Assessing Moral Development, the title could well be expanded to include Assessing the Outcomes of Moral Education, since the techniques reviewed in these pages can, in many cases, serve both of these closely interlinked purposes. To make this point clear, we can at the outset distinguish between moral development and moral education. Although the meanings of these two terms overlap, they are not identical, even though the methods for assessing development and educational outcomes are often the same.

The term moral development refers to changes in a person's moral reasoning and moral behavior with the passing of time. I assume that these changes can be caused by a combination of hereditary and environmental influences. An example of hereditary influence is the internal maturation of the child's nervous system as determined by a genetically established time schedule. As a result of such maturation, the growing child's capacity to understand moral matters increases with the passing years. An example of environmental influence is the model of moral behavior that the child observes his or her parents display in the daily conduct of their lives. A further example is the series of lessons about moral and ethical behavior which a student follows in school.

Moral education, as intended here, is a subcategory under moral development. Moral education is that part of environmental influences which is intentionally designed to alter people's ways of thinking and acting in moral-decision situations. Such education can be offered in the formal school program. Or it can be provided in the non-formal setting of a club for boys or girls. Or it may be of an informal type, such as the learning opportunities available through library books, magazine articles, or radio and television programs.

All of the assessment techniques described in this monograph yield information about moral development.
And theoretically, at least, all techniques could be used to assess the results of moral education. However, in the actual conduct of research and of education, the ways of judging moral-development and moral-education outcomes are usually not identical. This is because the assessment methods needed to answer most research questions require more time, facilities, and types of expertise than teachers can muster during the routine conduct of schooling. And the evaluation methods that are practical for everyday use in the classroom are usually not sufficiently sophisticated to produce the kinds of data needed for answering the complex questions that researchers often study. The nature of this distinction between assessment techniques appropriate for research and those appropriate for everyday use in moral-instruction programs will become more apparent as readers inspect the examples of techniques provided in subsequent chapters.

Although some writers differentiate between assessment and evaluation, throughout this monograph the two terms are used as synonyms. They can mean both (1) the process of judging how well something meets a particular standard and (2) a final conclusion about how well the standard has been reached. Thus, with the process in mind we may say, "The assessment of people's unconscious moral values is indeed a difficult task". When we intend assessment to mean a conclusion, we may say, "Our assessment is that people's moral values are not inborn but, rather, result from the kinds of consequences individuals experience following their encounters with moral-decision incidents in their daily lives".

A great host of evaluation questions can be asked about moral development and moral education. The diversity of questions is suggested by the following examples:

Which definition of the field of moral development, as contrasted to other sorts of development, is the best?
Which experiences in life qualify as moral education?
What contribution is made to moral development by genetic factors as compared to environmental factors?
What is the most desirable source of goals for a moral-education program?
What techniques for measuring moral development are the most accurate?
Which of two moral-education programs has been the more successful?
What criteria should be used in selecting textbooks for a moral-education program?

And the listing of such examples can go on and on. Because the potential number of assessment questions is far too large to be treated in a single document, it is necessary at the beginning to limit the scope of the evaluation issues that will be analyzed in this monograph.
The choice of questions I have selected to inspect has been guided by the sort of work my colleagues and I have been pursuing in recent years. Our work has consisted of devising an integrated theory of moral development and of creating educational techniques founded on the components of the theory. Our efforts during this time have focused on both (1) reviewing evaluation techniques that others have used for assessing moral development and moral education and (2) creating evaluation methods to fit our own models of development and education. Thus, the evaluation questions on which this monograph centers are ones relating to: (a) the most popular evaluation methods for moral development and education found in the professional literature of recent times; and (b) the evaluation methods we have used in the pursuit of our own theory of development and its resulting educational procedures.

A Theory of Moral Development

Because the assessment techniques described throughout the monograph are organized into chapters whose titles derive from our integrated theory of moral development, it is appropriate at this point to sketch the main features of that model. However, preceding the issues derived from the theory is the more basic concern of what moral or morality means. In other words, what sorts of thought and overt behavior in life are properly regarded as being moral matters? Our concern with this question is occasioned by the fact that people interested in this field continue to debate about which aspects of life are suitably labeled moral aspects. Chapter 2 reviews diverse ways this question has been answered and proposes assessment implications that derive from various conceptions of what moral means. The subsequent six chapters, 3 through 8, are organized according to elements of the integrated theory. Chapter 9 concerns issues of the consistency between moral thought and action and consistency over time. Chapter 10 closes the monograph with a summary and with suggestions about likely future directions in the field of assessing moral development.

Now to the integrated theory. It is termed integrated in order to indicate that it has been constructed by a process of amalgamating components of a variety of theoretical models of mind and behavior prominent in 20th-century Western psychological thought. These models include: (1) information-processing theory; (2) radical behaviorism; (3) a social-learning paradigm; (4) Piaget's structural theory; (5) Kohlberg's extension of Piaget's proposal; (6) psychoanalytic beliefs about levels of consciousness; (7) Lewinian decision-making theory; and (8) some of my own notions.

The model is founded on a philosophical assumption of realism, on the conviction that there is a 'real world' of objects, people, and events outside of the individual. The facts, concepts, and processes that the person has 'in mind' represent that individual's estimate of what the outside world is like. Therefore, from the perspective of this theory, a key indicator of moral development is the individual's becoming ever more realistic. In other words, with the passing years a person's mental map approximates with increasing accuracy the conditions of the 'real world'. Consequently, we can expect that the more mature people become regarding moral matters, (1) the less often they will be surprised by moral events they witness (because they are equipped to explain such events) and (2) the more accurately they will predict the outcomes of moral conflicts.

A further inference following from this proposed link between the real world and a person's mental map is that people's moral behavior is significantly influenced by their estimate of what the real world is like.
In effect, people act on the basis of what they think the world is like, rather than on the basis of what the world is really like. As a consequence, in order to estimate the level of a person's moral development, an important assessment task is that of determining the present configuration of the individual's cognitive map.

One way of conceiving this task is suggested by the diagram of the model in Figure 1. The two major elements of the model are the person (mind or personality or memory) and the physical and social environments. The person component consists of: (1) contents of memory; (2) foundational influences that significantly determine what the contents will be; and (3) the person's sensory system, which is the route through which mind becomes aware of the environments, with the senses of most significance for moral development being sight, hearing, and touch.

Let us now define the moral contents of memory. Facts are memories of specific events, people, and objects that the person has seen or heard or touched. They are the building blocks with which the mind's other contents are constructed. A concept is a characteristic shared by, or abstracted from, two or more facts or other concepts. Usually a label is applied to such abstractions so they are easy to store in memory and to communicate to other people. Typical moral concepts are altruism, exploitation, fair play, sin, goodness, and adultery. A variety of ways that researchers and educators have assessed people's moral concepts and facts are described in Chapter 3.

Another type of mental content of crucial importance in moral development is labeled causal relations. A causal relation is an if/then belief that a person holds. In the review of assessment techniques in Chapters 3 and 4, I have distinguished between two types of causal-relation beliefs. The first is called prescriptive, for it represents a relationship that a person believes should obtain in life.
It is a statement of the individual's ideals or moral principles. For example, a prescriptive causal relation proposed by the Chinese philosopher Confucius is: "If there are ways that you do not wish to be treated, then do not treat others in those ways". Or, as Jesus Christ voiced the same notion, "Do unto others as you would have them do unto you". Prescriptive causal relations, then, are convictions about the way people should act in moral situations. Methods of evaluating for the prescriptions or principles a person has in mind are considered in Chapter 3.

Figure 1. An integrated model of moral development.
[Diagram. The person (mind): moral contents of memory (facts; concepts; causal relations, both descriptive/explanatory and prescriptive/principles; mental processes; affect; values), reached through the senses of vision, hearing, and touch, and resting on foundational influences on mental processes and contents (level of cognitive maturation; level of consciousness, conscious or unconscious). Environments: 1 Natural (1.1 Participant, 1.2 Observer); 2 Arranged (2.1 Stories, 2.2 Films/Videotapes, 2.3 Sociodramas, 2.4 Puppetry).]

In contrast to the prescriptive causal relations are those labeled descriptive in Figure 1. A descriptive or explanatory relation is an if/then conviction about what consequences actually will result in real-life situations if certain conditions obtain. For instance, "If the police catch that girl stealing jewelry from the store, then she'll end up in jail". Or, "If I give money to feed hungry people and then if sometime later I need help, some kind person will also probably help me". In our integrated theory, such descriptive causal relations are very important, because they represent the reality foundation on which a person bases decisions about how to act in moral situations. Techniques for assessing people's descriptive/explanatory beliefs are presented in Chapter 4.

A crucial problem involving such beliefs is that the beliefs may not be true. In other words, an individual's mental map may not accurately represent reality. So researchers have been interested not only in discovering what people believe causes the outcomes of moral situations, but also in discovering how faithfully such beliefs match the 'true' causal relations that obtain in the 'real world'. So Chapter 4 also includes assessment techniques that have been used in attempts to determine both the 'true' causal relationships in moral events and people's attributions of cause.

The next sort of moral content of memory postulated in Figure 1 is called mental processes. The term refers to patterns of thinking people employ when confronted with moral events and decisions.
For instance, an estimate of one such process is pictured in Figure 2. It is represented as a circuit or a loop of steps that begin with the person's current mental contents interacting with a moral event in the environment. The thought process consists of the following steps:

(1) By means of the senses of sight and hearing, a schoolgirl's mind encounters an event in the environment that she interprets as involving morality. (A boy has copied answers from the girl's test paper, but when the teacher accuses him of cheating, the boy denies that he copied.)
(2) The girl's mind generates alternative actions she might take in this situation.
(3) From the causal relations that she has in mind, the girl predicts what consequences will likely result from the various alternative actions that she could take.
(4) Weighing the advantages and disadvantages of the various expected consequences, the girl decides how to act in order to maximize the advantages and to minimize the disadvantages.
(5) She acts on her decision.
(6) Following her action, real-life consequences occur.
(7) The girl receives feedback from the consequences about how accurate her prediction at Step 3 has proven to be in terms of the actual consequences.
(8) This feedback then influences the moral contents of her mind. If her prediction was verified by the consequences, then the feedback strengthens the structure of the original contents (a process called assimilation in Piagetian theory). But if the prediction has proven to be in error, then the feedback alters the original contents by generating changes in the mind's pattern of facts and causal relations (what Piagetian theory would label accommodation).

Figure 2. A hypothetical decision-making process.
[Diagram. The person (mind), with the moral contents of memory (facts; concepts; causal relations, both descriptive/explanatory and prescriptive/principles; mental processes; affects; values), linked to the environment through vision, hearing, and touch. Steps in the loop: 1. Mind encounters an environment. 2. Mind generates response alternatives. 3. Mind predicts consequences of alternatives. 4. Mind weighs advantages and disadvantages of response alternatives. 5. Person applies the selected response act. 6. Real-life consequences occur. 7. Person receives feedback of consequences of act. 8. Feedback affects contents of mind.]
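For readers who find a procedural rendering helpful, the eight-step loop can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the theory itself: the function name `decision_cycle`, the representation of a causal-relation belief as an action paired with an expected outcome and a strength count, the numeric values attached to outcomes, and the girl's two alternatives are all assumptions introduced here for illustration.

```python
def decision_cycle(beliefs, values, environment):
    """Run one pass through steps 2-8 of the hypothetical loop.

    beliefs:     dict mapping each action to {"outcome": str, "strength": int},
                 an if/then belief "if I do <action>, then <outcome>" (assumed form).
    values:      dict mapping each expected outcome to a number
                 (higher means more desirable; an assumed simplification).
    environment: function that takes an action and returns the outcome
                 that actually occurs.
    """
    # Step 2: the alternatives are the actions the person holds beliefs about.
    alternatives = list(beliefs)
    # Step 3: predict the consequence of each alternative from the beliefs.
    predicted = {action: beliefs[action]["outcome"] for action in alternatives}
    # Step 4: weigh the expected consequences and select the most desirable.
    chosen = max(alternatives, key=lambda action: values[predicted[action]])
    # Steps 5-6: act, and let real-life consequences occur.
    actual = environment(chosen)
    # Step 7: feedback on how accurate the prediction proved to be.
    confirmed = actual == predicted[chosen]
    # Step 8: feedback affects the contents of memory (copy, do not mutate).
    updated = {action: dict(belief) for action, belief in beliefs.items()}
    if confirmed:
        updated[chosen]["strength"] += 1  # original belief is strengthened
    else:
        updated[chosen] = {"outcome": actual, "strength": 1}  # belief is revised
    return chosen, confirmed, updated


# Hypothetical contents of the schoolgirl's memory (invented for illustration).
beliefs = {
    "tell the teacher": {"outcome": "teacher believes me", "strength": 1},
    "stay silent": {"outcome": "boy escapes blame", "strength": 1},
}
values = {"teacher believes me": 2, "boy escapes blame": -1}

# A case in which the girl's prediction proves to be in error.
chosen, confirmed, updated = decision_cycle(
    beliefs, values, lambda action: "teacher doubts me"
)
```

In this sketch a confirmed prediction only raises the strength count of the belief that was acted on, while a disconfirmed prediction replaces the expected outcome with the observed one. That is a deliberately crude stand-in for the strengthening and alteration of contents described in Step 8.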


Theorists have proposed a variety of such processes or strategies that may be involved in moral thought and action. Assessment techniques used for analyzing the postulated processes are inspected in Chapter 7.

The next type of memory content in Figure 1 is affects, meaning the feelings or emotions involved in moral thought and action. I am assuming that every moral encounter with the environment, and every fact, concept, causal relation, and process that is linked to morality, is accompanied by some sort of emotion, such as joy, sorrow, pride, shame, elation, anger, or the like. I am assuming, as well, that the emotion can be either weak or strong. In our integrated theory, emotions exert influence over moral thought and behavior, stimulating a person to act one way rather than another and affecting the accuracy of the individual's mental map of the real world. Methods of assessing emotional aspects of moral development are the concern of Chapter 5.

Just as emotions are associated with mental contents, so also are values, with values meaning judgments about the goodness or badness, or about the propriety or impropriety, of facts, concepts, causal relations, and processes. Actually, all of the chapters of this monograph necessarily concern values, at least implicitly. However, Chapter 6 differs from the others by concentrating on two features of values, their direction (positive or negative) and their strength.

Chapter 8 concerns two aspects of environments. The first is the way environmental influences on moral development have been investigated in people's efforts to understand the 'real world' of moral development. The second is the way assessment techniques can be perceived as environmental stimuli intended to reveal the moral characteristics of mind or personality. That environments can be of many varieties is suggested by the terms listed on the left side of Figure 1.
Environments can be ones which a person encounters in everyday life, with the individual being either a participant in the moral event or only an observer. Or environments can be fictitious, that is, arranged situations that involve moral issues, with such 'environments' presented in the form of stories, motion pictures, television programs, newspaper articles, and the like.

In summary, then, the model of moral development proposed in Figure 1 has provided the organizing scheme for Chapters 3 through 8. However, there remains in the model one section not yet discussed: the 'foundational influences on mental processes and contents'. The two components of this section are: (1) level of cognitive maturation; and (2) levels of consciousness. There are no separate chapters in the monograph dedicated to these foundational components. Rather, both components receive attention at various points throughout the document. It may be useful, however, to describe the two at this juncture as a way of alerting readers to their nature.

The phrase level of cognitive maturation refers to the style of thinking that a person employs at a given stage of development. The most widely recognized illustrations of progressive stages of cognitive maturation in the realm of morality have perhaps been those furnished by such investigators as Piaget (1932/1948) and Kohlberg (1984). Throughout this monograph, occasional examples are offered of stages of moral thinking in childhood and adolescence, with the stages reflecting different degrees of cognitive maturation.

The second postulated foundational influence, entitled levels of consciousness, derives from the assumption that people are more aware of certain contents of their minds than they are of other contents. In other words, they can recognize and report some of their facts, concepts, causal relations, and mental processes but cannot recognize or report others.
This distinction between conscious and unconscious contents of the mind is particularly important from the viewpoint of assessment, because different evaluation techniques may be required to assess conscious contents than to assess unconscious ones. Thus, assessment issues related to the distinction between conscious and unconscious are accorded attention throughout the following chapters.

The organization for the monograph, as described above, may seem to imply that researchers can devise assessment techniques that separate out and measure quite distinctly the individual components of mind suggested in Figure 1. But such is not the case. The hypothetical components are so intimately enmeshed that when a researcher devises a method for investigating one component, other components are involved as well. Thus, in the following chapters, an assessment technique described as a measure of a given component will at the same time be influenced to some degree by the operation of other components. As a result, all of the assessment techniques furnish approximate rather than pure measures of separate components.

So much for the integrated model as a scheme for organizing the subject matter of Chapters 3 through 8. We turn now to the final pair of chapters. In their research on moral development, investigators over the years have focused attention on two sorts of consistency: (1) the relationship of moral thought to moral action; and (2) consistency of thought and action over time (from childhood to adolescence to adulthood). Ways of assessing these varieties of consistency form the subject matter of Chapter 9. To close the monograph, Chapter 10 reviews conclusions from the preceding chapters and speculates about likely future trends in assessing moral development and the outcomes of moral education.

The Organization of Chapter Contents

Within each chapter, 3 through 9, the contents are organized by types of evaluation methods and materials. For most of the types, information is usually offered about: (1) the purpose which that type is expected to achieve or the learning goals which that type is expected to assess; (2) the characteristics of the evaluation method and its materials; (3) the sorts of people with whom the method has been used; (4) advantages and disadvantages of the method; and, in many instances, (5) some illustrative results derived from the use of the method.

If readers are to understand clearly what to expect at the fourth of these steps, they need some explanation of the sorts of criteria that will be used for identifying advantages and disadvantages of the assessment techniques described in these pages. The purpose of the following paragraphs is to identify five such criteria: foundational assumptions, reliability, validity, suitability for particular populations, and feasibility.

In the survey of assessment techniques in the following chapters, not all of the criteria are applied to the appraisal of every technique. In other words, certain of the criteria will be mentioned in the appraisal of one assessment approach, whereas others will be mentioned in the appraisal of another. There are three reasons that I have not systematically applied all criteria in appraising all techniques.

First, empirical evidence in the form of statistical data is not available for all techniques. For example, correlations reflecting reliability and validity levels have not been reported for all of the assessment methods that are reviewed. One reason for this is that a level of reliability or validity can be computed for a specific assessment instrument, such as Kohlberg's (1984) Moral Judgment Interview or Rest's (Rest et al., 1974) Defining Issues Test, as these instruments have been used with a particular sample of subjects. However, reliability and validity statistics cannot appropriately be reported for a general type of assessment technique, such as achievement tests in general or moral-judgment rating scales in general.

Second, in critiques of assessment approaches that appear in the professional literature, not all of the criteria have been considered equally important, so that some critics have stressed one criterion over others when appraising a given assessment method.

Third, a systematic, detailed application of each criterion for appraising each method would take up so much space that, in order to keep the monograph within the publisher's allocated page limits, it would have been necessary to eliminate many of the assessment methods now found in the following chapters.

As a consequence, in describing strengths and weaknesses of a given assessment method, I have selected those criteria (1) which I believe are particularly important for judging the worth of that particular technique and (2) for which readily comprehended statistical evidence is available. Now to the five criteria.

Foundational Assumptions

The term foundational assumption, as intended in these pages, refers to convictions on which an assessment method has been based. Some assumptions are widely accepted as reasonable beliefs by nearly everyone working in the field of moral development. Other assumptions, however, are problematic, accepted by some people but questioned or rejected by others.

To illustrate, one common assumption underlying published tests for assessing college students' values is that students will all assign essentially the same meaning to the wording of test items, so it is then reasonable for a researcher to draw an accurate conclusion about the entire group by summarizing the individual responses. This assumption, while clearly accepted as reasonable by some researchers, has been seriously questioned by others (Damon, 1977, pp. 18-19, 52-53).

Even more problematic is an assumption underlying a common psychoanalytic approach to assessing people's moral convictions. When a psychoanalytic investigator uses a client's dreams as a source of information about the person's moral values, the investigator assumes that the manifest content of the dreams is only a symbolic representation, a kind of mask, of the values that are hidden in the client's unconscious mind. If the analyst is faithful to classical Freudian tradition, he will further assume that projectile-like objects in the client's dream are symbolic of the male sex organ and that container-like objects are symbolic of the female. The dream content is then interpreted in keeping with these assumptions about symbolic meanings. But the Freudian assumption about the meaning of projectiles and containers is far less widely accepted than is the assumption about how well college students agree on the meaning of items on tests of values.

Thus, I am proposing as one criterion for judging the advantages and disadvantages of evaluation techniques reviewed in Chapters 3 through 9: "The fewer the questionable assumptions on which an assessment technique is founded, the better".


Reliability

In the fields of testing and of behavior rating, the term reliability traditionally has meant consistency, with the consistency usually being one of four types: test-retest, split-half, equivalent-form, and interrater.

Test-retest reliability is determined by administering the same assessment technique to a group of people on two occasions. Then the degree of correlation is computed between the individuals' success on the first occasion and their success on the second, with the resulting number often referred to as a coefficient of stability. The higher the correlation, the more stable or consistent over time the assessment technique is judged to be. Obviously, conclusions about the consistency of a test's characteristics are most trustworthy when very little time has elapsed between the first and second occasions. In effect, the shorter the time between the two administrations of the assessment technique, the more likely that correlations between the two events will reflect a level of consistency due to characteristics of the assessment instrument itself. Likewise, the longer the time between the two administrations, the greater the chance that the correlation will be reduced, not by the test's instability, but rather by changes that occurred between the first and second testing in the skills or attitudes of the people involved.

Split-half reliability refers to the internal consistency of a measuring instrument. In the case of a test, such reliability is usually determined by administering the instrument to a group of people on a single occasion, then computing the degree of correlation between one half of the test and the other half. Sometimes such reliability is computed by comparing the first half of the test with the second half. In other cases, it is determined by computing the correlation between the odd-numbered items and the even-numbered ones.

Equivalent-form reliability refers to how comparable two versions of the same assessment technique are. Does Form 1 of a test measure the same thing as Form 2? Such comparability is determined by administering both forms of the device to the same group of people, then computing the degree of correlation between the people's success on Form 1 and their success on Form 2. The resulting number has been called a coefficient of equivalence. If the correlation is very high, then the two forms are considered to be equivalent measures of the same knowledge, skills, or attitudes. If the correlation is only moderate or low, then the two are judged not to be interchangeable measures of the same phenomenon.

Among the factors that influence the reliability of a measuring device, such as an attitude scale, are the number of items or statements comprising the device and the homogeneity or heterogeneity of the target population being studied.

Interrater reliability refers to how well two or more judges agree on how to score a set of test answers or how to rate behavior they have observed. If several judges achieve a high level of agreement, then the objectivity of the assessment method can be trusted more than if they achieve only a low level of agreement.

From the foregoing considerations we can identify the following criterion to use in judging assessment methods throughout the monograph: "The more convincing the evidence about the method's reliability, the better."
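As a rough illustration of the correlational coefficients described above, the following Python sketch computes a test-retest coefficient of stability and an odd-even split-half correlation. All scores are invented for the example, and the helper names `pearson` and `odd_even_halves` are hypothetical; the monograph does not prescribe a particular formula, so the ordinary Pearson product-moment correlation is assumed here.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

def odd_even_halves(item_rows):
    """Split each respondent's item scores into odd- and even-numbered item totals."""
    odd = [sum(row[0::2]) for row in item_rows]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_rows]  # items 2, 4, 6, ...
    return odd, even

# Hypothetical total scores for six respondents on two occasions (invented data).
first_occasion = [12, 15, 9, 20, 17, 11]
second_occasion = [13, 14, 10, 19, 18, 10]
coefficient_of_stability = pearson(first_occasion, second_occasion)

# Hypothetical item-level scores (1 = keyed response) from a single administration.
item_rows = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 0],
]
odd_half, even_half = odd_even_halves(item_rows)
split_half = pearson(odd_half, even_half)

print(f"test-retest: {coefficient_of_stability:.2f}")
print(f"split-half (odd vs. even): {split_half:.2f}")
```

The same `pearson` helper would serve for an equivalent-form comparison by passing scores on Form 1 and Form 2 in place of the two occasions.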

Validity

The term validity has traditionally been used by evaluators to mean the extent to which an assessment technique truly evaluates what it is supposed to evaluate. A thorough discussion of the complex issues involved in validity would require far more space than we can dedicate to the matter here. Thus, our discussion is limited to identifying several key concepts that will prove useful in our appraisal of assessment methods examined in subsequent chapters of the monograph.

The crucial initial step in estimating the validity of an assessment method is that of defining what it is the method is supposed to be measuring. Simply saying that an evaluation technique is designed to measure moral development or a person's moral status is far too vague. As will be illustrated in Chapter 2, the term moral means different things to different people, so it is necessary to identify which of these meanings is intended in the case at hand. Furthermore, as suggested by components of our integrated theory of moral development described above, there are various aspects of moral development to be evaluated, so it becomes necessary to define which of these aspects is being assessed by a given appraisal technique. For example, is the technique supposed to be assessing for people's moral concepts, their causal relations, their emotional reactions, their thought processes, or what?

Once the object of the assessment has been precisely defined, the next step is that of deciding which sort of validity best suits one's needs. Of the different forms of validity that are popularly used, the five identified by the adjectives face, content, construct, concurrent, and predictive seem most appropriate to the assessment methods reviewed in the monograph.

Face validity refers to whether, when we inspect an evaluation technique, it seems logical to conclude that the technique does indeed measure what it purports to measure.
If a researcher is seeking to test children's conceptions of the terms lying, cheating, and stealing, do the researcher's test items appear suitable for measuring such conceptions?

Content validity is a refined, more technical form of face validity. Whereas face validity results from a rather informal judgment of an assessment device, estimating content validity requires a precise definition of the content universe involved and a detailed description of how the content universe is to be sampled in order to develop the evaluation questions that compose the assessment technique. For instance, assume that we wish to create a test for discovering what conditions influence people's judgments about which situations in life involve morality. In other words, we want to know whether a definition of moral should include consideration of such conditions as the age of the people in an incident, their knowledge of right from wrong, their ethnic status, their marital status, and others. Our task in establishing content validity for the test, then, consists of our first defining specifically the entire set of conditions that we believe can influence the definition of moral. In doing so, we are obligated to explain the rationale underlying the way we have identified the conditions. Finally, we are obliged to tell what system we are using to sample the domain of conditions so that all aspects of the domain are represented in the test we are preparing.

Construct validity is the same as content validity when the contents of the assessment method have been created to reflect the structure of a theoretical proposal. The process of establishing the construct validity of an evaluation technique begins with a researcher defining a theoretical construct. An example of a theoretical construct is the concept prescriptive causal relations as proposed in our integrated theory of moral development.
First we define this construct, then create an assessment approach, such as test questions or an interview schedule, intended to reveal people's status in terms of the construct. Our test might consist of brief stories for children to analyze, with our stories intended to reveal the sorts of prescriptive causal relations that children have in mind. If critics wish to estimate our test's construct validity, they can do so by examining each test item to determine whether it logically represents a true measure of children's causal relations.

Although such logical analysis is a proper first step in estimating the construct validity of an assessment technique, it is desirable next to furnish empirical evidence that the estimate is sound. This second step is in keeping with Messick's (1975) definition of construct validation as "the process of marshalling evidence in the form of theoretically relevant empirical relations to support the inference that 'a test score or other evaluation result' has a particular meaning". The purpose of marshalling empirical evidence is to avoid what Tate (1985, p. 1803) has identified as threats to validity:

"Threats to construct validity would consist of any reasons for an (assessment) operation either underrepresenting all of the aspects of a construct or overrepresenting a construct by including irrelevant aspects. For example, since it can be argued that a construct is seldom precisely represented by a single operation, a 'mono-operation bias' may result unless multiple operations are used to define (the construct that is being assessed)."

In other words, in order to test the conviction that a construct in the realm of moral development is accurately reflected in the assessment approach being used, it is desirable to employ more than one assessment technique, each furnishing an alternative view of the same theoretical phenomenon. Then the degree of correlation among the several measures is computed. A high level of correlation among the results derived by the several operations strengthens the belief that the construct is being accurately assessed.
A high degree of correlation also strengthens the conviction that any one of the several assessment operations is a valid measure of the construct. In short, the logical analysis of how well an assessment technique represents the components of a theoretical construct is profitably bolstered by the statistical analysis of data derived from assessing the construct from more than one perspective.

Throughout the monograph, it will become apparent that content and construct validity are the most common forms of validity on which confidence in the various assessment methods must depend. There are, however, two further forms of validity that some researchers cite in support of their assessment approaches: concurrent and predictive validity.

The concurrent validity of an assessment technique is determined by comparing (1) the results obtained by the technique with (2) the results of some other method of evaluation that has been applied at nearly the same time. For instance, assume that we create several stories involving moral decision situations, and we tell the stories to pupils who are then asked how they would resolve the moral dilemmas contained in the stories. At about this same time we also place the pupils in real-life moral-decision situations in the schoolroom and on the playground, and we observe the decisions they actually make. Finally, we compare the pupils' answers to the story dilemmas with their decisions in the real-life situations to determine how similar their responses have been between the fictional and real-life conditions. In this case, we are using the real-life events as the criterion measure, that is, as the evidence of the children's 'true' or 'real' convictions about moral decisions. The closer the agreement between pupils' story responses and their behavior in the real-life situations, the greater faith we have that the stories are valid indicators of children's real-life moral behavior.
Oftentimes people who develop an evaluation method, such as a written test or interview system, seek to establish concurrent validity for their instrument by calculating the correlation between scores on their instrument and scores obtained by another assessment approach whose construct validity they assume has already been established. For example, as noted in Chapter 3, Gibbs and coworkers (1982) created a group-administered test of moral reasoning and then gave both their test and Kohlberg's interview schedule to 55 people. When the researchers compared the developmental stage achieved by each respondent on the test with the stage achieved on the interview, they learned that the stage was the same in 75 percent of the cases. The remaining 25 percent showed a discrepancy of only one stage-level between the test and interview. Thus, the researchers concluded that their test measured substantially the same theoretical construct as did Kohlberg's interview system.

As this example illustrates, concurrent validity that depends on a comparison between two modes of assessment actually yields the coefficient of equivalence described above in our discussion of reliability. In both instances, what the investigator is doing is estimating the degree to which two modes of assessment yield the same results.

Predictive validity is similar to concurrent validity, except for a time lapse between the administration of the assessment device and the collection of information from the criterion measure. The technique for estimating predictive validity is essentially the same as for determining concurrent validity. For instance, we may create a test intended to measure primary-school children's potentiality for engaging in heavy use of illicit drugs by the time they have reached mid-adolescence. To determine the test's predictive validity, we administer it to a group of eight-year-olds, then wait eight years, at which time we observe the present extent of drug use among the same individuals who took the test eight years earlier.
This report on the extent of drug use is the criterion measure we will use for determining the validity of the test. In effect, by comparing the individuals' answers on the test at age 8 with their behavior at age 16, we determine how accurately the test predicts adolescents' drug use.

In judging either concurrent or predictive validity, researchers typically express their findings in terms of some generally understood correlational statistic reflecting the relationship between the assessment method and the criterion measure.

From the foregoing discussion we derive a further criterion to apply in estimating the advantages and disadvantages of assessment techniques: "The more pertinent and convincing the evidence about how validly a technique assesses for the characteristics it is intended to measure, the better the technique."
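The instrument-to-instrument comparison described above for concurrent validity (the stage a respondent reaches on a written test versus the stage reached on an interview) can be sketched in Python as follows. This is only an illustration under stated assumptions: the stage assignments are invented, the function name `stage_agreement` is hypothetical, and the calculation is a simple proportion of agreement rather than the procedure any cited study actually used.

```python
def stage_agreement(test_stages, interview_stages):
    """Return the proportions of respondents whose two stage assignments
    (a) match exactly and (b) differ by no more than one stage."""
    pairs = list(zip(test_stages, interview_stages))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    within_one = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)
    return exact, within_one

# Hypothetical stage assignments for eight respondents (invented data).
test_stages      = [2, 3, 3, 4, 2, 3, 4, 3]
interview_stages = [2, 3, 4, 4, 2, 3, 3, 3]

exact, within_one = stage_agreement(test_stages, interview_stages)
print(f"same stage on both instruments: {exact:.0%}")
print(f"same or adjacent stage:         {within_one:.0%}")
```

For predictive validity the comparison would be set up the same way, except that the second list would hold the criterion information gathered after the time lapse.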

Suitability for Various Populations

It seems obvious that a particular assessment technique will not necessarily be suited equally well to all types of people whose moral development is being studied. Young children who are being interviewed about their moral decisions typically cannot understand complex issues nor explain their ideas as well as can adolescents and adults. Likewise, incidents that are thought to involve moral issues in one culture may not be considered moral matters in another. Thus, a further consideration in appraising the worth of assessment techniques is reflected in the question: How suitable is this assessment method for use with people other than those included in the investigator's group? In other words, "Is the technique equally appropriate for people of other age levels, of other cultures, of different levels of ability, or the like?"

Therefore, comments about how suitable a given assessment technique will be for different sorts of people are included in certain of my observations about the advantages and disadvantages of the evaluation methods reviewed in the following chapters.

Feasibility

Feasibility, in the present context, means how easily an assessment technique can be applied in practical situations. Consequently, feasibility is determined by such considerations as the equipment or instruments that the assessment method requires, the skills needed by the evaluator, the skills required of the people being assessed, the time needed for administering the method, and the complexity of the process of analyzing the results. Stated as a criterion, this appraisal dimension becomes: "Other things being equal, an assessment technique is better if it requires simple equipment, simple skills on the part of both the evaluators and the subjects, and a simple process of interpreting the results."

Other Criteria

A variety of other criteria, in addition to the foregoing types, could be applied as well in the appraisal of assessment methods. For example, Damon (1977) has proposed that an adequate method for discovering the principles on which a person bases moral decisions is one which reveals the degree of stability, consistency, and spontaneity of those principles. By stability he means that "organizing principles are relatively more resistant to external influence than are other aspects of knowledge, such as specific beliefs or opinions". Consistency means "that an organizing principle permeates a range of ... both verbal and active manifestations of knowledge, both in theoretical and real-life contexts". Spontaneity refers to organizing principles that are "self-constructed, deeply held, and need no prodding or external stimulation to encourage their application" (Damon, 1977, pp. 31-32).

I believe that in relation to the model of moral development proposed earlier in this chapter, these three criteria could properly be applied in appraising techniques for assessing an individual's concepts (principles, causal relations), mental processes, affect, or values.
However, I have not used the three extensively throughout the monograph because rarely have the researchers reported information about the extent to which their investigative techniques have met such standards. Thus, I have referred to characteristics of stability, consistency, and spontaneity only in a few instances in which evidence of their presence or absence has been particularly apparent.

Conclusion

The purpose of Chapter 1 has been to explain the system used in the following chapters for displaying the diverse assessment methods reviewed throughout the monograph. Hence, I have described the way the assessment methods are organized in terms of an "integrated theory of moral development". Although the theory represents only one of many ways of displaying the assessment techniques, this particular model was chosen because its structure contains components that can be represented as chapters which conveniently accommodate a multiplicity of assessment approaches found in published studies of moral development and moral education.

In closing, I wish to note once more that the intent of the monograph is to illustrate an array of methods of assessing different facets of moral development, chiefly as such methods have been described in the professional literature. The purpose is to introduce potential investigators in the fields of moral development and moral education to the nature of commonly used assessment techniques. Included are comments about advantages and shortcomings of the techniques in terms of the evaluation criteria identified in this chapter. However, the monograph does not analyze the appropriateness, strengths, and weaknesses of different statistical methods that can be used in studies of moral development. Readers interested in descriptions of statistical methodology and research designs are referred to books by Hays (1981), Johnson & Wichern (1982), and Marascuilo & Levin (1983).

CHAPTER 2

IDENTIFYING THE DOMAIN OF MORAL

A perennial problem in the field of moral development is that not everyone defines the domain of moral the same way. For example, Piaget (1932/1948, p. 1) wrote that "All morality consists in a system of rules, and the essence of all morality is to be sought for in the respect which the individual acquires for these rules".

Kohlberg (1976) proposed that the moral domain is one concerning issues of justice. Kohlberg has contended that at different age levels, children's and adolescents' moral judgments are founded on different conceptions of justice, with the highest and most worthy level of justice based on principles of universal equal rights among all people and the overriding value of human life.

Siegel (1982) also sees a sense of justice, which he terms a sense of fairness, as the culmination of children's general moral development. This sense, he suggests, "involves an ability to consider consistently and without contradiction the interests and intentions of others: to act bearing these in mind and without the guidance of a superior authority and to generalize fully this behavior in all relevant situations" (Siegel, 1982, p. 2). In Siegel's opinion, a sense of fairness is not simply an intuitive or emotional sense of right but, rather, is a conscious, intentional rationale for action. "Fairness is a rational attribute not reducible to mechanistic processes. Actions taken without cognition cannot be considered moral" (Siegel, 1982, p. 5).

While many agree with Kohlberg's view that morality is essentially concerned with justice, some critics of Kohlberg's methods of assessing moral development claim that he has included issues which other writers regard as beyond matters of justice, such as questions of authority and social convention. For example, in Damon's (1980, p. 44) opinion, questions of justice are moral matters but questions of authority are mainly pragmatic concerns, "since the crux of the authority relation is deciding how (and whether) obedience best serves the needs of the self. Beyond enlightened self-interest, there is no social or moral basis for subservience to another (although there may occasionally be justice problems that arise in an authority relation, for example, when the rights of an employee may conflict with unreasonable demands made by an employer)".

In a similar fashion, Turiel has contended that such investigations as Piaget's and Kohlberg's studies have improperly mixed moral concepts with social-convention concepts. To Turiel (1980, pp. 71-72), moral issues are limited to ones that concern justice, meaning that the consequences of acts can cause physical or psychological harm to others, violate people's rights, or affect the general welfare. Social conventions, on the other hand, "constitute shared knowledge of uniformities in social interactions and are determined by the social system in which they are formed", with examples being conventional modes of dress, forms of greeting, and rules of games. In Turiel's view, social conventions can be changed by consent of the members of the group without seriously affecting people's welfare, whereas moral precepts cannot.

In contrast, Maccoby (1968, p. 229), from a social-learning viewpoint, has differed with Damon and Turiel by broadly describing moral values as beliefs "shared in a social group about what is good or right". Likewise, she has described moral action as "behavior a group defines as good or right and for which the social group administers social sanctions". As a consequence, under Maccoby's definition, both Damon's pragmatic interests and Turiel's social conventions could qualify along with justice as matters of morality, so long as the members of the group agreed.

The viewpoint espoused by Maccoby has been called cultural relativism or social relativism in contrast to the universality standard represented by Kohlberg's proposal that his definition of morality as equal-handed justice is valid for all social groups, no matter what the personal notions of morality may be among the group members. Another relativistic position, other than that of cultural relativism, is individualistic relativism, which accords each person the right to define the moral domain in whatever way suits his or her preferences.

A still different view of morality is expressed by Gilligan (1982), who has argued that men and women base their conceptions of moral matters on different assumptions deriving from the major separate roles males and females have traditionally been assigned in their societies.
She proposes that in virtually all cultures women's central activity has been in child care, whereas men have been charged with administering the society and its productive facilities. These assignments have left females in a dependent, subservient role to men. As a consequence, according to Gilligan (1982, p. 19), a woman's conception of morality that arises from caring for others "centers around the understanding of responsibility and relationships, just as the [male's] conception of morality as fairness ties moral development to the understanding of rights and rules".

Some theorists equate the moral with the ethical, whereas others, such as Erikson (1964, p. 222), distinguish one from the other:

"I would propose that we consider moral rules of conduct to be based on a fear of threats to be forestalled. These may be outer threats of abandonment, punishment and public exposure, or a threatening inner sense of guilt, or shame or of isolation. In either case, the rationale for obeying a rule may not be too clear. It is the threat that counts. In contrast, I would consider ethical rules to be based on ideals to be striven for with a high degree of rational assent and with a ready consent to a formulated good, a definition of perfection, and some promise of self-realization."

As the foregoing examples illustrate, the realm of moral matters is not defined in the same way by everyone who investigates moral development. From the standpoint of assessing such development, there are at least two ways that issues of definition are significant. First, if we are to evaluate the worth of various definitions, we need to know on what evidence or line of logic the advocates of different definitions have based their convictions.
Second, when we accept a given definition, we need to recognize what implications the definition holds for: (1) the sorts of assessment techniques that can properly be used in evaluating moral development and moral-education outcomes; and (2) the viewpoint we adopt when interpreting what the results of the assessment mean. The following discussion centers on these matters.

Bases for Defining the Domain of Moral

Two key questions about the proper way to identify the boundaries of the moral domain are: "Who is best qualified to define the domain?" and "What is the most appropriate means for producing and supporting the definition?" For most studies of moral development and for most programs of moral education, the answer to the first question has been "Those who are designing the program are best qualified to define the domain or at least to decide which available definition is the most suitable". For answering the second question, the three most common means have been: (1) by logical analysis founded on observations of society; (2) by adopting a definition proposed by a respected authority; and (3) by historical example, which consists of following religious, legal, or philosophical traditions from the past, such as adopting the Ten Commandments from the Bible or deriving precepts of behavior from the Analects of Confucius. Sometimes such means are combined, with a scholar supporting his definition by both a line of logic and an appeal to tradition and authorities. For instance, Piaget (1932/1948, p. 1) defended his 'system-of-rules' conception of morality by contending that his definition was congruent with "the reflective analysis of Kant, the sociology of Durkheim, or the individualistic psychology of Bovet".

When the foregoing ways of supporting a definition are used, how can we assess which definition is best? Apparently it is by how reasonable the author's argument appears to be and by how thoroughly we trust the cited authorities and traditions. For example, do we trust religious doctrine that is reported as divine revelation more than we trust notions derived from empirical research? In an effort to furnish some historical insight into this issue, Kurtines and Gewirtz (1984, pp. 4-6) have proposed that the rise of modern science is responsible for more and more thinkers in the Western world rejecting an objectivistic view of morality in favor of a relativistic one.

"We argue that from its very beginnings in classical antiquity through the Middle Ages, Western intellectual history was dominated by an objectivistic orientation (i.e. a belief in the possibility of obtaining 'absolute', 'objective', or 'certain' knowledge), in both an epistemological and a moral sense. We further argue that during the modern age this objectivistic orientation has been overshadowed increasingly by a relativistic orientation ...; scientific knowledge explicitly rejects the possibility of obtaining absolute, objective, or certain knowledge [so that] contemporary moral thinking tends no longer to be so much concerned with the justification of a particular set of universal or objective moral standards as with the question of whether such standards can be justified at all ...."

Even though relativism is the trend of the day, Kurtines and Gewirtz observe that there continue to be modern investigators who subscribe to an absolute or objectivistic notion of morality, with Kohlberg and his conception of the primacy of human life as an example.

In keeping with the efforts of present-day investigators to cast their definitions of morality in convincing scientific form, some of them buttress their rationales with empirical evidence. For example, Turiel refers to studies that he and Nucci conducted in which subjects who ranged from around age 4 to age 19 were asked about acts and rules, some of which the researchers considered moral and others they considered social-conventional. When preschoolers were questioned about social-convention events, 81 percent said the act would be all right if there were no school rule about it, whereas in the case of moral events, 86 percent said the act would not be right even if no school rule bore on it. Similar results were reported for judgments of rules by older children and adolescents, causing Turiel (1980, pp. 79-80) to conclude that: "Taken together, the findings from these studies on social rules demonstrate that the distinction between morality and convention is made across developmental levels. It appears, therefore, that concepts of social convention and moral concepts develop in parallel fashion". In effect, Turiel has proposed that the moral and social-convention domains are separate, but the correlation in the development of the two has caused other researchers erroneously to assume that both sorts of concepts are properly contained within the domain of moral. In such a manner, then, Turiel (1978) and Nucci (Nucci & Nucci, 1982) have used statistical analyses of empirical studies to support such a conception of morality. (A description of assessment methods used in Turiel's and Nucci's investigations is found in Chapter 8 of this monograph.)
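To make concrete how percentages of the kind just reported are tallied, the short Python sketch below counts, for each type of event, the share of respondents who judged an act all right in the absence of a rule. The response records are invented and the function name `percent_all_right` is hypothetical; the sketch is not a reconstruction of the researchers' actual analysis.

```python
from collections import Counter

def percent_all_right(responses):
    """For each event type, return the percent of responses judging the act
    'all right' if no rule existed. `responses` holds (event_type, bool) pairs."""
    totals = Counter()
    all_right = Counter()
    for event_type, says_all_right in responses:
        totals[event_type] += 1
        if says_all_right:
            all_right[event_type] += 1
    return {t: 100.0 * all_right[t] / totals[t] for t in totals}

# Hypothetical interview records (invented data): True means the respondent
# said the act would be all right if there were no rule about it.
responses = [
    ("social-convention", True), ("social-convention", True),
    ("social-convention", True), ("social-convention", False),
    ("moral", False), ("moral", False),
    ("moral", False), ("moral", True),
]

for event_type, pct in percent_all_right(responses).items():
    print(f"{event_type}: {pct:.0f}% judged all right without a rule")
```

A large gap between the two percentages is the pattern that would be read as evidence that respondents separate the two kinds of events.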
As in the case of Turiel's and Nucci's definition of morality, Kohlberg has cited results of empirical studies in support of his contention that children's conceptions of moral evolve: (1) from beliefs based on immediate self interest (Kohlberg's preconventional stages 1 and 2); (2) through convictions stressing obedience to authority (such as parents or the law of the land, representing the conventional stages 3 and 4); and (3) to beliefs founded on principles that transcend both the individual's self interests and society's rules by considering universal rights and the value of life itself to be most important (principled stages 5 and 6). (See Chapter 3 for a more detailed account of the stages.) Kohlberg's studies suggest that when people of different age levels are presented moral dilemmas in story form, the moral reasoning in which they engage reflects an age progression that generally parallels the foregoing ascending levels of his conceptual system. He has concluded that such results validate his proposal that the highest stage in his system (that of evenhanded justice transcending immediate self interest and society's laws) is naturally or inherently the ultimate, truest conception of morality.

However, such critics as Henry have charged that, from the viewpoint of logical analysis, one cannot so easily pass from is (types of reasoning displayed by average people at different age levels) to ought (the contention that the conception of morality that the majority of older groups adopt is the best one which everyone necessarily should adopt). In pointing out serious logical difficulties in Kohlberg's interpretation of his data, Henry (1983, pp. 1-19) has suggested that Kohlberg's stages are not differentiated by one stage representing a higher, more comprehensive form of morality than another. Instead, the stages differ principally in the source of authority to which they appeal, and one of these sources is inherently no better than another.
Which authority is judged better is not a matter of fact to be established by empirical evidence of age changes in moral reasoning but, rather, it is a matter of: (1) the value system to which an individual chooses to subscribe; and (2) the resulting source of authority the individual chooses to honor. For example, the source of moral authority at the conventional-level stages 3 and 4 is ascribed to legitimate authorities (Henry, 1983, p. 18):

"The individual's family, group or nation are the source of moral prescriptions, and reasons given for the rightness or wrongness of solutions to Kohlberg's moral dilemmas are based on the extent to which those solutions conform to and maintain the personal expectations of others -- in particular the expectations of the individual's family group (stage 3), and the social order or roles, rules, duties and regulations of institutions (stage 4)."

In effect, there continues to be serious debate about the sorts of evidence that are appropriate for determining which definition of moral is the most acceptable. And different definitions carry different implications regarding what techniques will be considered suitable for assessing moral development. This matter of implications is the issue to which we next turn.

Implications of Definitions for Selecting Assessment Techniques

Even though a particular definition of moral can hold implications for both the content and form of an assessment method, definitions typically imply more about the content than form. Consider, for example, Piaget's (1932/1948, p. 1) stipulating that "All morality consists in a system of rules, and the essence of all morality is to be sought for in the respect which the individual acquires for these rules". The content for assessing people's moral-development status from Piaget's perspective will necessarily concern only people's conceptions of, and level of respect for, rules. It will not include such matters as compassion or empathy, as would Gilligan's (1982) conception of morality among women as emphasizing nurturing or caring. In the case of Turiel (1980, pp. 71-72), for whom morality means inherent justice but not social conventions (rules of games and social etiquette), the sorts of issues that can properly represent the content of an assessment method will be far more restricted than the issues that Piaget's definition would allow. In sum, if we were to create an interview schedule for investigating children's and youths' moral development, the content of our interview questions would differ significantly if we adopted Piaget's conception of the domain of moral rather than Turiel's or Gilligan's. This point is illustrated in the diverse contents of the sample evaluation techniques included in the following chapters.

In contrast to content, the form that the assessment assumes (interview, written test, observation of behavior) is usually not clearly determined by the definition of the moral domain. For example, Kohlberg's notion of moral development (a person's progressing toward a principled code of justice) has been assessed in the form of both individual interviews and group-administered tests (Chapter 3).
Likewise, more than one sort of assessment technique can be used for investigating development from a psychoanalytic view of morality, which focuses on the development of a conscience (Freud's superego) that includes both conscious and unconscious components. From a psychoanalytic perspective, not only can assessment take the form of a psychoanalytic interview, but such projective techniques as the Rorschach Test or Thematic Apperception Test can be used as well.

For most definitions of moral, the task of devising assessment techniques consists of two steps: (1) describing the behaviors or thought processes encompassed within the defined domain; and (2) creating assessment items or questions that sample the domain. For instance, for Piaget's definition, the task involves identifying the realms of life that are governed by rules, then creating assessment questions or situations which sample the application of rules in those realms. For Siegel's definition of morality as "fairness governed by conscious intention", the task requires that the aspects of life commonly affected by decisions of fairness be identified, and then assessment items created that focus on those aspects.

However, when a cultural-relativist definition of moral, such as Maccoby's, is adopted, then before the content of the assessment techniques can be specified a preparatory data-gathering step is required to delineate the realm of situations that will qualify as moral. That is, a cultural-relativist definition proposes that moral values consist of beliefs "shared in a social group about what is good or right". From this perspective, we cannot know what events compose the moral domain for a given social group without studying that group's beliefs. In effect, the task of specifically defining the domain involves conducting a survey of what people think moral means. Two evaluation problems faced by researchers who adopt this approach are those of deciding: (1) how to select the respondents whose opinions will be sought; and (2) how to design the method for collecting their opinions. The nature of these problems and of related ones can be illustrated by our inspecting an approach for producing an operational definition of morality within a particular society.

A Method for Studying the People's Definition of Moral

As Chapter 3 will illustrate, studies of moral development have typically involved an investigator asking respondents to make judgments about moral decisions that are depicted in anecdotes describing moral situations. An investigator's selection of anecdotes is based on the researcher's opinion about what sorts of incidents in life reflect moral values. If we adopt a cultural-relevance conception of moral, then this selection should closely conform to the ideas of moral that are widely held among the people being studied.

One method for operationally defining moral from the perspective of members of a particular cultural group is the approach devised in a study of conceptions of moral among subgroups of American and Haitian youths (Thomas et al., 1984). Two key assumptions underlie this assessment technique. First is the belief that a more precise description of people's definition of moral can be gained if they are asked to judge specific life-like incidents than if they are simply asked to state their definition of moral. In effect, when the incidents respondents judge are planned in a way that will reveal their patterns of belief, the concepts underlying their conceptions of moral can be inferred from the pattern of answers to the series of incidents.

The second assumption is that, with few exceptions, people's moral principles in practice are neither simple nor absolute, such as "Always tell the truth". Instead, their principles are influenced by conditions of the situation in which the moral judgment is applied. For example, the principle in practice may be "Always tell the truth, unless telling the truth would do greater harm to worthy people than could result from the truth being withheld". This assumption implies that understanding a person's definition of moral requires understanding the conditions under which his or her moral principles apply.
In keeping with these assumptions, an evaluation instrument can be created to consist of a series of anecdotes designed to show: (1) the types of acts or situations that people might regard as involving morality (killing, inflicting injury, taking property, engaging in sexual behavior, destroying property, lying, and others); and (2) conditions under which such acts constitute moral issues as contrasted to conditions under which they are not regarded as moral matters. An inspection of laws, judicial cases, mores, and the teaching of ethics in various societies enables a researcher to identify such acts and conditions.

In the study of American and Haitian youths' conceptions of moral, the investigators identified four categories of conditions that might influence a person's definition of a situation as involving morality (Thomas et al., 1984, pp. 169-170). In the following list, the actor is the principal person whose actions are described in an anecdote, the recipient is the individual or group directly affected by the actor's behavior, and consequences are results experienced by both actors and recipients.

(1) Actor's Characteristics: Age, knowledge of right and wrong, intention, degree of self-control, awareness of consequences, sex, ethnic status, religious affiliation, position in an authority hierarchy, socioeconomic status, marital status.
(2) Recipient's Characteristics: Species (human, animal, or supernatural, such as a god), age, awareness of consequences, sex, ethnic status, religious affiliation, position in an authority hierarchy, socioeconomic status, marital status.
(3) Characteristics of Consequences: Type of consequences (physical, psychological, social, monetary, and others), degree of consequences (major or minor), probability of consequences occurring.
(4) Legal Status of the Act.

Although the above list of conditions fails to exhaust those affecting people's judgments, it does illustrate a number of the conditions commonly found in a variety of cultural settings.

The foregoing sorts of acts and conditions were built into the assessment instrument used for collecting people's operational definitions of moral. This was done by the investigators creating pairs of anecdotes which differed from each other only in a single relevant condition. For example, in the following pair, the age of the actor is the altered condition:

(1) When the two-year-old girl's mother was out of the room, the child carried her mother's purse outside and offered it to a friend who was walking by.
(2) The high-school girl reached into the bag in which her mother kept money and took out five dollars, which she later offered to her boyfriend.

A series of 25 pairs of anecdotes representing various acts and conditions made up a questionnaire that respondents completed.
Respondents told whether an anecdote: (1) involved a moral matter (yes); (2) did not involve a moral matter (no); or (3) was indeterminate (the respondent could not decide). The pattern of a respondent's answers for each pair of incidents was interpreted as implying something about how the respondent defined moral. For example, if a person answered yes to either one or both of the incidents, then the respondent considered the act (giving away another's property) a moral matter, at least under certain conditions. If the respondent answered yes to one of the pair and no to the other, then apparently age (in the example above) was an important condition for that respondent. If the answer was no for both incidents, then giving away property (at least giving away relatively small amounts of a parent's funds) was not a matter of morality. (In the above example, age is likely a surrogate variable, representing the respondent's estimate of the actor's 'knowledge of right and wrong' or 'intention'.)

There are several ways such an approach to collecting people's operational definitions of morality can be used. First, the results collected from a sample of a population can determine what sorts of acts and conditions compose the definition of moral for most people in the population. In other words, the results produce a culturally relevant definition of moral issues, thereby identifying what types of moral-decision anecdotes will be suitable to use in studying moral issues in that population. For instance, in a comparison of the answers of a sample of American and Haitian high-school students on a 32-item instrument of the above variety, a majority (75 percent or more) of students in both groups agreed that the following sorts of acts were moral matters: engaging in premarital and extramarital sexual relations, lying to one's relatives, falsifying data on homework assignments, taking property without permission, and maliciously damaging harmless animals and plants. However, while the great majority of Haitians (86 percent) believed it was a moral matter for someone to continue using tobacco when he knew it could damage his health, less than 50 percent of the American students considered it a moral issue. And while 73 percent of American students believed that separating children in school on the basis of their language background would be a moral matter, only 46 percent of Haitians judged it so. Thus, if a particular percentage of respondents is chosen to distinguish between moral and non-moral issues in a society (such as 75 percent or 60 percent or 50 percent), the incidents falling above that cutoff can form the operational definition of morality for the social group represented by the sample of respondents.

A further use of the paired-anecdotes instrument is for determining a group's internal consistency in defining which acts and conditions constitute moral matters. The statistic here is the average intragroup agreement. It is determined by computing for each item on the instrument the percentage of students who answered "Yes, this incident reflects a moral matter" and then determining how far this figure deviates from 50 percent. The resulting deviation figures for the total number of items are then summed and a mean computed to represent the average intragroup deviation from 50 percent. So the closer the judgments of a group come to a 50/50 split over the question of whether an incident involves morality, the greater the disagreement within the group. The farther the split of opinion deviates from 50/50, the greater the intragroup agreement. When this average intragroup agreement was computed for the Haitian and American students, the Americans (with a mean deviation from 50 percent of 26.3) displayed slightly greater intragroup agreement than did the Haitians (22.9).

A degree of certainty can also be determined from respondents' answers.
This is accomplished by comparing the proportion of 'not sure' answers to the proportion of definite 'yes' and 'no' answers among the group's responses. For example, 27 percent of the Americans gave 'not sure' answers compared to 18 percent of the Haitians, suggesting that the Haitians were more certain of their judgments than were the American students.

The paired-anecdote approach can be applied not only in comparisons between groups but also as a means of evaluating change within a group. The instrument can serve as a pretest prior to an instructional unit on moral issues, then again as a posttest following the instruction so as to determine what effect the instruction apparently exerted on the group or on selected individuals within the group. The paired-anecdote instrument can also be used with individuals to aid them in understanding their own operational conceptions of moral or else to compare one person's pattern of beliefs with another's as a stimulant for a discussion of the reasons one person may subscribe to a different definition than does another.

However, the use of the foregoing approach to defining the domain of moral is accompanied by several problems. One is a sampling question: What is the best source of anecdotes to use with a given cultural group to determine most accurately the group's conceptions of moral matters? A second problem is caused by the large number of possible permutations of acts and conditions, which poses a further question about sampling: Which of the many alternatives should be selected to produce an instrument that is short enough to be answered in a reasonable length of time by respondents? A further problem is faced in making clear to respondents the exact nature of the task they are to perform. That is, they are not to judge whether the actors in the incident did the right or wrong thing. Rather, they are only to decide whether the anecdote depicts a situation in which questions of moral right and wrong are at stake.

It is also important to recognize that cross-cultural comparisons always entail risk of misinterpretation. One obviously important variable is the language in which the respondents' opinions are collected. Translating phrases from one language into another can result in distortions of the original concepts. Furthermore, the meaning of the data-gathering situation may differ from one society to another, since people in some societies are more accustomed to taking tests and being interviewed than are people in other societies. People whose opinions are often sought by researchers may react differently than do people who rarely have such an experience. Likewise, cultures differ in the attitudes that people are expected to display in authority relationships: relationships between adult and child, expert and novice, tester and testee. In some societies, youths are expected to express only opinions that will please the authority who is collecting the information. In others, youths are more often expected to express their true beliefs, whether or not these views are identical with those held by the person gathering their data. As a consequence, when evaluators interpret cross-cultural data, it is important that they consider the possible influence that cultural factors may have exerted on their results.
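
The tallies described in this section (reading a respondent's answers to a pair of anecdotes, applying a percentage cutoff to form an operational definition, computing the average intragroup agreement, and gauging the degree of certainty) can be sketched in a few lines of Python. This is a minimal illustration only, not the scoring procedure of Thomas et al. (1984). The function names, the answer codes "yes", "no", and "unsure", the invented sample responses, the reading of a pair that contains an undecided answer, and the choice to compute percentages over all respondents (including the undecided) are assumptions made for the example.

```python
from statistics import mean

# Hypothetical answer codes; the text describes yes, no, and indeterminate answers.
YES, NO, UNSURE = "yes", "no", "unsure"


def interpret_pair(first, second):
    """Read one respondent's answers to two anecdotes that differ in one condition."""
    answers = {first, second}
    if UNSURE in answers:
        # Assumption: the text does not say how to read a pair with an undecided answer.
        return "indeterminate"
    if answers == {YES, NO}:
        return "moral matter under some conditions; the altered condition matters"
    if answers == {YES}:
        return "moral matter under both conditions"
    return "not a moral matter"


def percent_yes(item_answers):
    """Percentage of respondents who judged the item to be a moral matter."""
    return 100.0 * item_answers.count(YES) / len(item_answers)


def operational_definition(responses, cutoff=75.0):
    """Items that at least `cutoff` percent of respondents judged to be moral matters."""
    return [item for item, answers in responses.items()
            if percent_yes(answers) >= cutoff]


def intragroup_agreement(responses):
    """Mean absolute deviation of each item's percent-yes figure from 50 percent."""
    return mean(abs(percent_yes(answers) - 50.0) for answers in responses.values())


def percent_unsure(responses):
    """Share of undecided answers among all answers; lower means greater certainty."""
    all_answers = [a for answers in responses.values() for a in answers]
    return 100.0 * all_answers.count(UNSURE) / len(all_answers)


# Invented answers from four respondents to three items (illustration only).
SAMPLE = {
    "taking property without permission": [YES, YES, YES, YES],
    "continuing to use tobacco": [YES, NO, UNSURE, NO],
    "separating pupils by language": [YES, YES, NO, UNSURE],
}

if __name__ == "__main__":
    print(interpret_pair(NO, YES))
    print(operational_definition(SAMPLE))
    print(intragroup_agreement(SAMPLE))
    print(round(percent_unsure(SAMPLE), 1))
```

With the invented data, the percent-yes figures for the three items are 100, 25, and 50, so their deviations from 50 percent are 50, 25, and 0, and the mean deviation is 25. A larger mean indicates greater agreement within the group, as in the comparison of 26.3 and 22.9 reported above.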

Implications of Definitions for Interpreting Results

In regard to the effect of a definition of moral development on the interpretation of assessment results, perhaps the most obvious implication is that conclusions about a person's moral development should be limited to those aspects of life contained within the specific definition. For instance, when someone falls short of the ideal performance on measures of moral development, a question can be asked about who is to blame. Is that individual really inadequate, or is there something about the original definition of morality that necessarily but unjustly identifies the person as less satisfactory than other people or than some ideal standard? The issue here can be illustrated with the charge leveled by Gilligan that Kohlberg's proposal of a hierarchy of moral-development stages unjustly pictures females as less developed than males in moral reasoning. Whereas Kohlberg's stages have been founded on a conviction that moral issues are ones concerning equal rights, Gilligan contends that women generally conceive of morality as involving issues of caring and responsibility. "This different construction of the moral problem by women may be seen as the critical reason for their failure to develop within the constraints of Kohlberg's system", a system originally constructed from studies of males' patterns of reasoning about moral-decision incidents (Gilligan, 1982, p. 19).

Conclusion

The purpose of Chapter 2 has been to illustrate a variety of ways researchers have defined the domain of moral and to note: (1) how different investigators rationalize or defend their definitions; and (2) implications different definitions hold for the tasks of creating methods of assessing moral development and of interpreting the results of assessments. Further illustrations of such implications are found in subsequent chapters.

CHAPTER 3

FACTS, CONCEPTS, AND PRESCRIPTIVE CAUSAL RELATIONS

In Chapter 1 the diagram of the estimated moral characteristics of the mind included the following contents: facts, concepts, causal relations, values, affects, and mental processes. In life's moral-decision situations, these components are obviously intimately entwined. None operates alone. They are simply different facets of a unified mental act that is involved in every incident of moral thought. Therefore, if we try to isolate any one of them so as to analyze it separately, we do a measure of violence to the role of the others and to the interaction of the entire complex of components. However, despite the risk we run of over-simplification, for our present purposes it is still convenient to separate the components for analysis, since some assessment techniques are more suitable for evaluating one component than for evaluating another. Hence, in Chapters 3 through 7, the several components are considered individually so that evaluation methods suited to each of them can be readily examined. And as a preparation for thus displaying them individually, we begin Chapter 3 by defining each and by implying something about how they might function together to produce moral thought and behavior.

Defining Types of Mental Contents

Facts are defined as specific people, places, objects, or events related to moral situations. A specific person could be 'my mother' or 'Carl Johnson', a place might be 'the third-floor corridor in school' or 'cell 9 in the county jail', an object could be 'my brown purse' or 'the picture of Cousin George', and an event might be 'the occasion of Ann's confirmation at the church' or 'the 100-meter dash at last Saturday's track meet'.

A concept is a characteristic common to, or abstracted from, a variety of facts or other concepts, with a label attached to the concept so that it can be concisely communicated in speech and writing and can be readily manipulated during thought. The label is typically a single word or a short phrase. Illustrative concepts are animals, justice, fair play, transportation, antisocial behavior, weekend, altruism, obedience, language, and exploitation.

Generalizations are statements that may either define a concept or describe relationships among concepts. A generalization defining the term animals could be "Any living organism typically capable of moving about but not capable of making its own food by photosynthesis". As a second example, a generalization defining justice could be "Ensuring that everyone is treated equally under the law, with no special favor accorded anyone on the basis of his or her ethnic, age, sex, national-origin, educational, political, or socioeconomic status". A generalization describing antisocial behavior might be "Actions that endanger the welfare of the group". Examples of generalizations describing relationships among the concepts could be: "The air is thinner at high altitudes than at low altitudes" and "The incidence of sexual assault is greater in economically depressed urban areas than in economically prosperous ones".

A value is a person's belief about the goodness/badness, desirability/undesirability, or propriety/impropriety of a fact, concept, or generalization. In effect, a value conveys either a positive or negative judgment. Consider the generalization "Take from the rich and give to the poor". Robin Hood assigned a positive value to this statement, whereas Scrooge, in the early pages of Dickens' Christmas Carol, assigned a negative value to it. Values also vary in degree. For instance, a person can either express vehement support for a generalization or merely accord it weak endorsement. Likewise, a person's opposition to a generalization can be strong, moderate, or weak.

Throughout these pages, I am assuming that any fact, concept, or generalization that does not have a positive or negative value assigned to it does not qualify as a moral matter because, as noted in Chapter 2, the essence of morality is the goodness or badness of people or events. Hence, when we apply this criterion to the concepts listed above, we can likely agree that those qualifying as moral concepts include justice, antisocial behavior, fair play, altruism, and exploitation.
We also might agree that the non-moral words in the list are animals, transportation, weekend, and language. The remaining word, obedience, though perhaps debatable, would likely be considered a moral term by most people. It is also the case that for some people, but not for others, the labels attached to certain concepts imply both an objective denotation and a value connotation. For some people, the word kill is not value laden but murder is. Likewise, for some the term sexual intercourse would not imply an underlying moral value (that is, a guide to behavior), whereas adultery and rape would carry such an implication. As for judging which of the above-listed generalizations are moral matters, we could perhaps agree that the definition of animals and the statement about altitude's relation to air density are amoral or non-moral. In contrast, we would likely concur that the generalizations about justice, antisocial behavior, and sexual assault are matters of morality.

A particular kind of generalization of central concern in assessments of moral development is the causal relation. Every causal relation is an if/then belief or statement. For our purpose of reviewing methods of assessment, we shall distinguish two subcategories of causal relations. The first is labeled descriptive or explanatory. This type reflects a person's answer to such questions as "Why does she so often sacrifice her own welfare on the behalf of others?" or "What causes him to break the law all the time?" In other words, the explanatory variety represents a person's beliefs about the 'true' causes of moral development. Methods of assessing the nature and validity of people's explanatory causal convictions are treated in Chapter 4.

The second variety of causal relation is called prescriptive. We shall also refer to this type as a moral principle.
For our purposes, a moral principle is defined as "a generalization describing an if/then relationship to which a value judgment is attached so that the generalization serves as a guide for moral decision-making and behavior". Such a generalization consists of both a description of conditions (the if portion) and an attached value which enables one to prescribe (the then portion) what sort of behavior should or should not result under given conditions. Examples of such principles include "Murderers should be put in prison for life" and "She deserves a lot of praise for being so honest". Beliefs of this sort do not necessarily reflect an individual's conception of reality (of what life is really like) but, instead, they reflect the person's values, ideals, and moral standards. Such convictions are important because they serve as the basis for our learning: (1) what consequences an individual may seek to impose on others for their behavior; and (2) whether the individual believes justice has been done in the consequences resulting from his own or others' acts. The value portion may also be less strong than the verb should implies: it may indicate only what behavior is permissible under the specified conditions. The earlier example of Robin Hood's belief is a prescriptive causal relation or moral principle. From Robin Hood's viewpoint, if people are rich, he should (or at least is permitted to) rob them. If people are poor, he should (or at least may) give them goods. The assessment of prescriptive causal-relation beliefs is treated later in the present chapter.

Among the moral contents of the mind, the term affect refers to emotions or feelings that are attached to moral facts, concepts, and causal relations. Typical words used for identifying such feelings include guilt, pride, fear, contentment, shame, and hatred.

The final component of mind is labeled mental processes. It refers to the steps of reasoning or the pattern of thinking a person employs in achieving moral development or in making moral decisions.

The foregoing terms, then, identify the topics to be considered in Chapters 3 through 7.
The present chapter is dedicated chiefly to methods of assessing people's moral concepts and prescriptive causal relations (principles). Ways of assessing people's knowledge of facts concerning moral matters are considered briefly near the close of the chapter. Chapter 4 concerns descriptive or explanatory causal relations. Chapter 5 focuses on affective aspects of moral development, Chapter 6 on values, and Chapter 7 on mental processes. We turn now to the main topic of Chapter 3: ways of evaluating people's moral concepts, principles, and facts.

Connections Between Concepts and Prescriptive Causal Relations

The reason moral concepts and prescriptive generalizations are considered together in the following pages is that the methods used for assessing one of them are so often used to assess the other as well. This point can be illustrated by the typical assessment interview used with children in studies of moral development.

In one version of this method, an interviewer describes to a child a moral-decision incident in the form of a story, but stops the narrative at the crisis point when a character in the incident must decide which course of action to take. The interviewer then asks the child "What should that person do?" The story, then, has presented the conditions (the if portion), and the child is expected to prescribe the appropriate behavior (the then portion). Baier (1965, pp. 40-50) has called the process involved here deliberative reasoning. That is, when facing a moral decision, a person first deliberates about the moral values involved and, on the basis of this deliberation, then offers a decision. When this unfinished-story approach is used, the interviewer usually is not satisfied only to hear what the child recommends (the if/then prescription) but also asks why the child believes the recommendation is a good one (the underlying moral concepts as reasons for the decision). So the interviewer asks a follow-up question: "Why should he do that?" or "Why do you think that would be a good thing to do?" The key element of the child's answer is the nature of the moral concepts that undergird his or her prescription. For example, has the child grounded his or her prescription on a moral concept of retributory justice (an eye for an eye) or on the concept of hedonism (get pleasure, avoid pain) or on some other concept?

In another common form of the assessment interview, a moral-decision incident is described to the child in its entirety. Not only is the conflict situation depicted, but the narrator also tells what decision the characters in the anecdote reached as they sought to resolve the conflict. In this case, the interviewer asks the child whether the character did the right or wrong thing. After hearing the child's answer, the interviewer asks why the child took such a position. The child's subsequent defense of his answer is what Baier (1965, pp. 40-50) has called justificatory reasoning. The line of logic the child offers should reveal the moral concepts he believes justify his appraisal of the story character's behavior. In other words, after a moral incident has occurred, the individual cites reasons he believes would be proper and sufficient cause for the behavior of the character in the incident. "The girl was right in not telling the teacher the truth, because if she told the teacher that her friend had copied the homework, then the girl would not have been a faithful friend." Or if the child had disapproved of the behavior in the story, his supporting reasons reveal which of his moral values he thinks were violated by the story's central character. Justificatory reasons may reflect either an individual's actual moral values or, in some cases, a rationalization.
By rationalization I mean an attempt to defend an act by citing a reason that is not the person's true reason but, instead, is a proposed cause that the individual believes is socially more acceptable than his real reason.

In summary, then, prescriptive and justificatory reasons are reciprocals, with the justificatory variety simply being an ex-post-facto prescriptive judgment. Both types reveal people's concepts and principles that determine what they believe should be done in moral-decision situations.

Now, with this discussion of relations between moral concepts and prescriptive (and justificatory) causal relations as a backdrop, we turn to methods of assessing the development of such components of mind. The following presentation is organized according to types of evaluation methods, with each type illustrated by examples from one or more research studies. My purpose in the following pages is not to offer a definitive survey of all studies but, rather, to present only enough examples to represent the different forms that assessment methods have assumed. Whenever a given method has been used by many researchers, I have given preference to examples drawn from the work of those researchers who either were pioneers in the development of the approach or else whose work has been widely known or has been particularly controversial. For instance, because such researchers as Jean Piaget and Lawrence Kohlberg nicely fit these criteria, their contributions figure prominently in our examples.

Examples of Assessment Techniques

The assessment methods reviewed in the remainder of this chapter appear under two major divisions: (1) interviews requiring oral responses; and (2) techniques requiring written responses. The interview methods include: (1.1) asking interviewees for definitions; (1.2) posing unfinished anecdotes in the form of moral dilemmas; (1.3) posing single finished anecdotes; (1.4) posing comparative sets of finished anecdotes; (1.5) offering interviewees a choice among alternative solutions to the problem of morality; and (1.6) asking interviewees to demonstrate and explain some phenomenon. The techniques requiring written responses include: (2.1) recognition or evaluation tests; (2.2) production tests; and (2.3) academic-achievement tests.

Individual Interviews

1.1 Asking for Definitions

The purpose of this technique is to learn what meanings interviewees attach to such common moral terms as lying and stealing, with the interviewees expected to offer the meanings in their own words rather than to select meanings from among alternatives presented by the researcher. The aim, in effect, is to discover the nature of people's moral concepts, and particularly to learn how such concepts change with advancing age.

Research by the Swiss child psychologist, Jean Piaget, illustrates this technique. In the late 1920s Piaget included lying among the moral phenomena he studied. His objectives were to learn: (1) how children define a lie, especially whether children understand that telling a lie is an attempt to avoid or hide the truth; and (2) what responsibility children assume as a function of the lie's content and its material consequences. His evaluation technique was a form of the clinical method he had devised as the investigative approach in nearly all of his studies. The method involves orally posing a standard introductory question to the child, then following up with further questions intended to clarify the meanings behind the child's initial response. In other words, the sorts of questions and comments offered by the investigator following the introductory standard question are spontaneously created by the investigator in reaction to the child's answer. Piaget defended this deviation from a single, standard set of questions by contending that all children do not interpret a given query in the same way. As a consequence, it is desirable for the interviewer to adjust follow-up, probing questions to the particular child's apparent style of thought. In the study of children's definitions of a lie, Piaget's interviewers began with the standard question "What is a lie?" or "Do you know what a lie is?"
The researcher's subsequent remarks were then adjusted to the child's answer to this first question, as illustrated in the following interview with a seven-year-old (Piaget, 1932/1948, p. 137): Do you know what a lie is? - - It's w h e n y o u lie. - - What is lying? - - It's s a y i n g n a u g h t y words. - - When do people tell lies? - - W h e n y o u say s o m e t h i n g that isn't true. - - Is lie the same thing as a naughty word? - - N o , it's n o t the s a m e thing. - - Why not? - - T h e y ' r e n o t like each other. - - Why did you tell me that a lie is a naughty word? - - I t h o u g h t it was the s a m e thing.

An advantage of such an inquiry approach is that it enables the researcher to accommodate the questioning to the child's apparent background and style of thought. A disadvantage is that when the probing questions asked of one child differ from those asked of others, there is doubt about whether the researcher can validly compare one child's answers with another's or properly summarize answers for a group of children. Conclusions about the validity of this Piagetian approach have depended chiefly upon how accurately we believe the child has understood the interviewer's questions and on how truthfully we believe the child's answers reflect the child's actual beliefs. In other words, faith in the methodology has depended mainly on the face validity or content validity of the approach.

After using this method to gather Swiss children's definitions of lying, Piaget concluded that before age seven or so, children equate lying with saying something naughty, that is, with saying something forbidden by adults. He also found that children ages five to seven could distinguish between an intentional act and an unintentional error, but they still applied the word 'lie' to each of these sorts of acts. Not until age 10 or 11 did most children accurately define lying as intentionally giving a false statement (Piaget, 1932/1948, pp. 139-144).

1.2 Posing Unfinished Anecdotes

An unfinished anecdote is typically a very brief story told up to the point at which a character in the anecdote must make a moral decision. The narrator stops the story at this juncture and asks the interviewee a question about how the character's dilemma should be resolved: "What should the girl do now?" or "What would you do in this situation?". After the interviewee responds, the researcher usually seeks to learn the moral concepts or rationale underlying the initial answer by asking: "Why should she do that?" or "Why would that be the best solution?". Subsequently the researcher may ask further questions either to make sure that he has correctly grasped the respondent's meaning or else to urge the respondent to resolve apparent inconsistencies in her answers.

As an assessment device, the unfinished anecdote has several advantages over simply asking a person to define a moral concept or principle. First, the anecdote focuses the respondent's attention on a life-like event that includes specified conditions which might affect decisions about the moral issue involved in the case.
Such conditions could be the age, education, or economic status of the characters in the story. In contrast to using anecdotes, the technique of asking a person only for an abstract definition of a concept may fail to reveal how such conditions might influence his application of the concept in life-like situations. Furthermore, younger children who are unable to formulate an abstract definition of a moral concept may, when they hear a story, be able to suggest how to resolve the moral dilemma posed in the story and, in doing so, may reveal the moral principle on which their solution has been based. The following examples illustrate unfinished anecdotes used by Kohlberg and by Piaget.

Kohlberg's moral dilemmas. An American psychologist, Lawrence Kohlberg, in 1958 wrote his doctoral dissertation at the University of Chicago on "The Development of Modes of Thinking and Choices in Years 10 to 16". The document reported the results obtained from interviewing young adolescents about what solutions they would suggest for dilemmas represented in nine hypothetical moral-decision incidents. Over the three decades since Kohlberg launched this initial study, the same set of incidents has also been used with other samples of respondents for investigating a variety of moral-development issues. Kohlberg's unfinished anecdotes concern (Kohlberg, 1984, pp. 640-651):

(1) A husband (Heinz) who cannot afford an expensive drug to treat his wife's cancer, so he must decide whether to steal the drug from the pharmacy.
(2) A police officer who has seen Heinz running from the pharmacy, and later the officer learns that the drug had been stolen, so he must decide whether to report Heinz to the authorities.
(3) A son (Joe) who has earned money to go to camp, but his father asks for the money so the father can go on a fishing trip; thus Joe must decide whether to give his father the money.
(4) A woman in intense pain from terminal cancer who urges her physician (Dr Jefferson) to give her a fatal dose of morphine, so the physician must decide whether to fulfill this request.
(5) Dr Jefferson does administer the morphine and the woman dies. Another physician (Dr Rogers) passing by at the time witnesses the incident and must decide whether to report Dr Jefferson to the authorities.
(6) The mother of a teenage girl (Judy) who has told her daughter to buy clothes with money the girl had earned babysitting, but the girl uses part of the money to attend a rock concert and then lies to her mother about where she had been. Now Judy's sister must decide whether to tell their mother what Judy did.
(7) A military unit during a war. If the unit is to retreat safely, one man must remain behind to blow up a bridge, but in doing so he will risk certain death. Thus, the officer in charge of the unit must decide whether to order one of his men to blow up the bridge or to do it himself.
(8) A poor man (Valjean), who stole food and medicine for his sister and brother, was arrested and imprisoned, then escaped and, under another name, became a very successful, law-abiding businessman. Years later an early acquaintance recognizes him as Valjean, so the acquaintance must decide whether to report Valjean to the police.
(9) Two brothers who want to get $1,000 each before they sneak out of town. One brother gets the money by stealing it, and the other by feigning illness and begging an elderly man to 'lend' him $1,000 for an operation. Which of the brothers did the worse thing?

(It is apparent that all but the last incident are unfinished anecdotes. The ninth is of the comparative-finished-anecdotes variety discussed under 1.4 below.)

In their recent use, the nine incidents have been arranged in three clusters representing parallel forms of Kohlberg's inquiry technique (Form A = 1, 2, 3; Form B = 4, 5, 6; Form C = 7, 8, 9). In the suggested mode for conducting the inquiry, interviewers are not simply left to their own devices to formulate follow-up questions. As an inquiry aid, Kohlberg has furnished a specific questioning sequence for each of the anecdotes. Under each incident except the ninth, the first question is about what the main character's proper choice should be: "Should Heinz steal the drug?" or "Should Dr Rogers report Dr Jefferson?". The second question, "Why or why not?", is intended to reveal the moral concept underlying the interviewee's decision. Further questions are then proposed to clarify in more detail particular aspects of the individual's moral reasoning. For instance, "Suppose the person dying is not his wife but a stranger. Should Heinz steal the drug for the stranger? Why or why not?".

The principal way Kohlberg has used the results of his interviews has been for identifying levels of development in moral reasoning. When people have been asked to tell why they have offered their particular solution for a moral dilemma, their answers have been interpreted by Kohlberg as reflecting a particular underlying conception of morality that can be assigned to an identifiable level on a moral-reasoning hierarchy. Kohlberg's hierarchy originally was conceived as consisting of six stages, but in recent years he has stated that the sixth, uppermost stage has not been identified in empirical studies (Colby et al., 1983).


Thus, in terms of people's moral reasoning in the practical world, as reflected in responses to the moral-dilemma anecdotes collected in various societies, there appear to be five stages. Kohlberg has proposed that children's development advances from the lowest toward the highest levels. However, not everyone reaches the highest levels. In Kohlberg's view, a person may become fixated at a particular level as a result of his cognitive limitations and the moral principles displayed by others in his social environment. The five stages, plus the sixth original one, can be summarized as follows, with the six stages subsumed under three major levels (adapted from Kohlberg, 1976, p. 171):

I. Preconventional level (premoral level). A person follows society's rules of right and wrong but in terms of the physical or hedonistic consequences (punishment, reward, exchange of favors) and in view of the power and authority of the one who imposes the rules.
Stage 1: Punishment and obedience orientation. Whether an action is good or bad depends on whether it results in punishment or reward.
Stage 2: Naive instrumental orientation. Proper action satisfies one's own needs and occasionally others' needs. Fairness involves "If you need me, I'll help you", but not out of loyalty, gratitude, or justice.

II. Conventional level. A person conforms to the expectations of family, group, or nation and supports the existing social order.
Stage 3: Good-boy, nice-girl orientation. A person acts in ways that help or please others and are approved by them. One's intentions become important ("she meant well").
Stage 4: Law-and-order orientation. A person is doing right when he does his duty, obeys the law, and respects authority.

III. Postconventional, principled, or autonomous level. A person tries to identify valid universal moral values, regardless of what authority or group subscribes to those values.
Stage 5: Social-contract orientation. Moral behavior is defined in terms of general individual rights to which the whole society has given its consent. Personal values are recognized as being relative, so there are procedures for reaching consensus and changing laws for social-utility reasons.
Stage 6: Universal-ethical-principle orientation. A person's moral judgments are based on universal principles of justice, equality of human rights, and respect for humans as individuals. Right is defined by one's conscience in accord with carefully chosen ethical convictions.

An important assessment problem faced by Kohlberg and his colleagues has been that of devising a reliable scoring system for assigning a respondent's answers to one of the foregoing stages. In the early years of the use of the methodology, critics found fault with the amount of subjective judgment required for scoring answers through use of what Kohlberg called "clinical issue rating" (Kohlberg, 1984, p. 219). Thus, in an effort to increase the objectivity of scoring, in recent times Kohlberg and his colleagues have refined the method for determining that a respondent's answer properly reflects one of the five stages or 'viewpoints' represented in his hierarchy of moral development (Colby & Kohlberg, 1984). The refined Standard-Issue Scoring system for the Moral Judgment Interview, when applied to moral judgment data from Kohlberg's 20-year longitudinal study, yielded high test-retest, parallel-form, and interrater reliability (Colby et al., 1983). Yet the scoring system still requires a degree of personal judgment about what the respondent apparently meant, and in this sense the system falls within the European tradition of hermeneutic interpretation.

The purpose of Kohlberg's interview technique, which includes probing queries that follow an initial standard question, has been the same as that of Piaget's clinical method: to grasp clearly the mode of thought and point of view from which a respondent makes a moral judgment. The investigator tries to look at the world through the subject's eyes. So in scoring responses, the researcher estimates the sorts of underlying moral concepts or assumptions that must be held by the respondent in order to make his responses sensible or logical from his perspective. Kohlberg (1984, p. 220), in explaining his approach to scoring, has accepted Habermas' (1983, p. 259) description of the position of the hermeneutic interpreter:

Only to the extent that the interpreter grasps the reasons that allow the author's utterance (e.g. a moral judgment) to appear as rational does he understand what the author could have meant. An interpreter can elucidate the meaning of an opaque expression only by explaining how this opacity arises, that is, by explaining why the reasons which the author might have been able to give in his context are no longer acceptable to us. In some sense all interpretations are rational interpretations, an interpreter cannot but appeal to standards of rationality which he himself has adopted as binding for all parties including the author and his contemporaries. Such an appeal to presumably universal standards of rationality is, of course, no proof of the soundness of that presupposition. But it should be sufficient reason to at least look at theories or metahermeneutical analyses which focus on the conditions for validity of normative (or moral) expressions.
Because Kohlberg's intention has not been to identify 'correct' individual answers but, rather, to identify the thought structure or pattern underlying a series of responses, he states that the test scorer must have a good knowledge of the theory and "must function in a quasiclinical manner, going from a response to postulated underlying structures and then testing the inference to particular structures by looking at additional response material" (Kohlberg, 1984, p. 402). Kohlberg (1984, pp. 412-415) has reported high test-retest reliability (above r = 0.90 for 58 subjects, with all subjects' social-maturity scores falling within one-third stage of each other on the two testings) and high alternate-form reliability (r = 0.95 for social-maturity scores between forms A and B). Kohlberg's original data were collected in 1956; then most of the same subjects were tested again every four years until 1977. Analyses of the resulting longitudinal data have been interpreted by the author as supporting his proposal that his set of moral-reasoning stages is indeed hierarchical and that the sequence of movement from one stage to another is invariant, that is, "no stage will be omitted as development proceeds" (Kohlberg, 1984, p. 422).
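The reliability coefficients reported above can be made concrete with a small computation. The following is only an illustrative sketch, assuming that r denotes a Pearson product-moment correlation between the scores from two testings. The score lists, the function name pearson_r, and the scale on which 100 points equal one stage are hypothetical and are not taken from Kohlberg's data.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between two equally long lists of scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    n = len(first)
    mean_x = sum(first) / n
    mean_y = sum(second) / n
    # Sum of cross-products and sums of squares around the two means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(first, second))
    ss_x = sum((x - mean_x) ** 2 for x in first)
    ss_y = sum((y - mean_y) ** 2 for y in second)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cross / sqrt(ss_x * ss_y)


# Invented scores for six respondents on two testings (assumption:
# 100 points on this made-up scale correspond to one stage).
first_testing = [210, 250, 300, 320, 380, 400]
second_testing = [220, 245, 310, 330, 370, 410]

test_retest_r = pearson_r(first_testing, second_testing)

# Largest change between testings, expressed in stage units.
POINTS_PER_STAGE = 100
largest_shift = max(
    abs(a - b) for a, b in zip(first_testing, second_testing)
) / POINTS_PER_STAGE
within_one_third_stage = largest_shift <= 1 / 3

print(f"test-retest r = {test_retest_r:.2f}")
print(f"all within one-third stage: {within_one_third_stage}")
```

A coefficient near 1.0 indicates that respondents kept nearly the same relative standing across the two testings, while the second check corresponds to the separate claim that no individual's score shifted by more than one-third of a stage.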

Piaget's studies of tale-telling. Informing or tattling typically consists of passing information to those in authority regarding someone's censurable behavior. In child society, it usually involves a child telling an adult (such as a teacher, parent, club leader, or police officer) about another child's violating a rule or custom. As Piaget (1932/1948, pp. 288-289) saw it, the ethical question from the viewpoint of the child becomes: "Is it right to break the solidarity that holds between children in favour of adult authority?". Or, when we cast the issue in a form not limited to child/adult relationships, it becomes: "Under what conditions should one person inform on another?". I have placed tattling under principles rather than simple concepts because it involves a conflict between principles, such as between 'be loyal to peers' and 'obey law and authority'. The studies reviewed below illustrate a variety of assessment techniques used for investigating developmental changes in the principles people hold regarding such matters.

To investigate developmental changes in children's convictions about tattling on their peers, Piaget told children a story about a father who had two sons, both 'good boys', but one of them was consistently obedient whereas the other often did silly things. So when the father went off on a trip, he assigned the obedient son to watch his brother and, when the father returned, to report any silliness that the brother had displayed. At this point the story ended and the experimenter asked the children what the obedient boy should do upon the father's return: tell on his brother or not. The assessment technique, in effect, consisted of carrying the story up to its crisis, the decision point, and asking the children what the proper decision should be and why. The investigator then followed up each child's answer with the types of probing questions that marked Piaget's general clinical method, questions that took their cues from each child's particular response. Typical follow-up questions were: "Was it fair to tell on his brother?", "Would you tell on your brother?", and "If the boy did not tell his father, would that be a lie?".

The advantages and disadvantages of this approach are the same as those discussed earlier in relation to Piaget's interview technique in the study of children's beliefs about lying and stealing. As for developmental trends, Piaget found that nearly 90 percent of his Swiss children ages six and seven believed that the obedient boy should inform on his brother, whereas the majority of those beyond age eight believed he should not, and some of them would even lie to protect the brother. Thus, with the advance of age, the principle of obedience to authority gave way in most cases to the principle of loyalty to peers.

1.3 Posing Comparative Sets of Anecdotes

This approach consists of the interviewer describing two or more moral incidents, then asking the interviewee to make a comparative judgment about the incidents on the basis of a criterion the researcher suggests. The researcher's intention is to learn the nature of the person's moral concepts and process of reasoning. The following example is from the work of Piaget.

Piaget's studies of lying. To investigate children's notions of the content and consequences of lies, Piaget told pairs of very brief stories to a child, two stories at a time. In each pair, one story contained an obvious departure from fact, but with no evil intention on the part of the person who made the statement. The other story was an actual lie, an untruth told with the intention to deceive. To make sure the child understood the pair of incidents, the investigator had the child repeat their essence. In particular, the researcher wished to learn whether the child had grasped the intentions involved. This was done by asking, "Why did the boy say that?". Then the adult asked the child to compare the two tales in terms of which boy was the naughtier and why.

In testing children between ages 6 and 12, Piaget observed that young children (average age of seven) did not base their judgments of naughtiness on whether a person intended to deceive but, rather, on how greatly the statement departed from the facts. For instance, a boy who said he saw a dog as big as a cow was judged to be more of a liar than one who falsely told his mother that his teacher said he was very good. Piaget labeled such reasoning objective responsibility. In contrast, older children (average age of 10) more often considered the child who sought to deceive to be the more serious liar, evidencing a form of reasoning Piaget called subjective responsibility (Piaget, 1932/1948, p. 152). Piaget used the same approach in studying children's opinions of stealing and arrived at similar conclusions about children's developmental stages: prior to age eight or nine, children base their judgment of the naughtiness of an act on the objective outcomes of an event rather than on the subjective intentions of the actors in the incident.

However, more recent research has suggested that Piaget's original methodology was flawed. For example, his stories varied information on both consequences and intentions simultaneously. But recent experiments have shown that:

when amount of damage is held constant, and only intentionality is varied, even young children judge intentional damage to be more culpable than unintentional damage. ... Still other researchers have noted that Piaget's stories invariably presented the intention information before the consequence information. ... [But] at least four different experiments have found evidence that this does interfere with the young child's use of intention information. ... Children as young as five or six years of age do use intention information when it is presented at the end of the story (Shultz, 1980, p. 135).

In conclusion, refinements of Piaget's original techniques in recent years have led to reinterpretations of the age levels at which objective responsibility yields to the importance of intentions in children's opinions about how to place the blame for undesirable outcomes.

1.4 Moral Problems with Alternative Solutions

Instead of asking people to suggest their own solutions for moral-decision anecdotes, some investigators provide alternative solutions from which an interviewee is to select the option that is judged to be the best. After the choice has been made, the investigator usually seeks to discover on what moral concepts and line of logic the interviewee based the choice.
So the researcher asks, "Why is that the best solution?". Multiple-choice assessment items have several advantages over the types of open-ended questions illustrated under 1.1, 1.2, and 1.3 above. First, multiple-choice options permit the investigator to focus respondents' attention on the facets of moral decisions of particular interest to the investigator. Furthermore, multiple-choice responses are easy to compile, analyze, and treat statistically. However, one serious disadvantage is that none of the choices provided by the researcher may reflect the solution the respondent would propose on her own initiative. In other words, by being restricted to two or three options, the respondent may have no way of describing what she would really do in that moral-decision situation. As a consequence, conclusions later drawn from the collected data may not be an accurate reflection of the respondents' moral concepts and principles.

The following example of multiple-choice forms of assessment is taken from Piaget's research. Among his investigations of ideas of justice was a study of the types of punishment children of different ages believed were most suitable to impose for various kinds of transgression. His assessment method consisted of describing to the child a brief incident of unacceptable behavior, then asking the child to select from three punishment options the one that would be the fairest. The child was also to defend her choice, that is, to explain why her selected punishment was the best. The kinds of undesirable behavior represented in the stories included neglecting to carry out an assigned responsibility, lying to a parent, disobeying a parent's warning not to play ball (with the disobedience resulting in a broken window or broken flower pot), failing to heed a parent's warning about not soiling the pages of a picture book, and a disgruntled robber's turning a fellow thief over to the police (Piaget, 1932/1948, pp. 200-202).

In the case of each story, Piaget formulated the three punishment options in a way that might reveal to him the principle underlying the child's punishment preferences. Specifically, the researchers wished to learn whether the child favored expiatory punishments over punishments by reciprocity or vice versa. Piaget defined expiatory punishments as those consequences that have no intrinsic connection with the transgression so that the punishment is not a 'natural' consequence of the act but, instead, is simply intended to be a sufficiently painful experience to make the miscreant realize his guilt and avoid committing such unacceptable acts in the future. An example of expiatory punishment is a father's either spanking his son or preventing the boy from attending a movie because the boy has carelessly soiled a picture book.

In contrast to expiatory punishments, Piaget defined punishments by reciprocity as those consequences of an act that are 'natural' results of a person's breaking a bond of mutual trust. The notion of reciprocity is seated in the conviction that proper, enduring social systems are founded on mutual trust among members of the system, whether the system be a pair of friends, a family, a community, a factory, or a nation. When someone transgresses (breaks the contract of trust by lying, cheating, stealing, tattling, or the like), consequences result that are 'natural' for breaking the trust. In Piaget's story of the soiled picture book, the punishment option regarded as natural reciprocity was the father's refusing to lend the child the book again.
When considering the strengths and weaknesses of this mode of assessment, Piaget identified the following two shortcomings: (1) the story incidents he used were quite naive and thereby failed to incorporate the complexities found in more typical real-life punishment situations; and (2) the punishments suggested as the three options failed to exhaust the alternatives that would be found in real life. However, Piaget defended the incidents and his multiple-choice form of answers by contending that his methodology still achieved his goal, that of discovering whether the child preferred expiation over reciprocity as the better principle on which to base punishment (Piaget, 1932/1948, p. 202). Piaget's analysis of the answers given by around 100 Swiss children, ages 6 through 12, suggested that on the average the younger children favored punishment based on expiation rather than reciprocity, whereas older ones tended to favor reciprocity. However, there was no clearcut division by ages, only a discernible trend. He concluded that expiatory punishment, practiced as it is in "certain types of family life and social relationships, survives at all ages and is even to be found in many adults" (Piaget, 1932/1948, p. 199).


1.5 Asking for a Demonstration and an Explanation

As a means of learning the nature of people's moral concepts, researchers sometimes have asked an interviewee to demonstrate some act in the realm of morality, then explain the characteristics or meaning of the act. This technique has been used to discover how closely people's explanations of what they know correspond to what an outside observer can infer about what they know on the basis of their behavior in the required demonstration. It is a common observation among people who work with children that a child will often evidence her understanding of a moral concept or causal relation before she is able to explain the concept or principle in words. The following example of demonstration and explanation concerns rules.

As noted earlier, in his pioneering book on moral development Piaget stated that "All morality consists in a system of rules, and the essence of all morality is to be sought for in the respect which the individual acquires for these rules" (1932/1948, p. 1). Rules can be regarded as prescriptive causal relations, since they prescribe "when this condition exists, then that is the right thing to do". Piaget's defining morality as respect for rules led him to study how children grasp the rules of games they play. He used the games of marbles among boys and hide-and-seek among girls as his exemplary phenomena to investigate. His dual purpose was to understand both: (1) how children adapt to the rules of the game at each age and each level of mental development; and (2) how conscious children are of rules at different age levels, in the sense of what obligation they believe rules place on them (Piaget, 1932/1948, p. 13).

The evaluation technique he devised was a variation of his general clinical method. For instance, in the case of the boys, the method consisted of an experimenter playing a game of marbles with a child and, during the game, questioning the child about the rules. The activity was introduced by the researcher placing marbles on a broad baize-cloth-covered table and explaining that "When I was little I used to play a lot, but now I've quite forgotten how to. I'd like to play again. Let's play together. You teach me the rules and I'll play you" (Piaget, 1932/1948, p. 13).

The subjects used in the study were Swiss children ranging in age from around 4 to 14. Although the same basic assessment technique was employed at each age level, in a few details the approach was different for younger subjects than for older ones. Whereas older boys could readily explain all the rules, the younger ones were observed to follow more rules than they could explain. As a result, among younger children the experimenter would play the game with one boy alone, asking the boy to explain all the rules he knew.
Next, the experimenter repeated the procedure with another younger boy. Finally he brought the two boys together to play a game with each other, and he observed the rules implied by the way they played. By means of this 'controlled experiment' the researcher was able to cross-check the boys' implied conceptions of the rules against their explanations of rules (Piaget, 1932/1948, p. 14).

In addition to asking children to describe the rules they used, Piaget tested their conceptions of the characteristics of rules by posing further questions, such as: "Could you make up a rule yourself, a rule that no one else knows?", "Would it be all right to use your rule in playing with your pals?", "When you are older, suppose you tell your rule to others, so they play by your rule and forget the old one. Then which rule would be fairer: yours that everyone knows, or the old one that everyone has forgotten?", and "Where do rules come from: do children make them up, or do they come from parents and other grown-ups?" (Piaget, 1932/1948, p. 15). The purpose of these additional queries was to learn whether the child believed in the mystical virtues and finality of rules or, in contrast, believed that rules are conventions agreed upon by people, conventions that can be changed by people's consent.

An advantage of Piaget's clinical method is that it encourages children to describe things in their own manner, to explain things the way they see them without undue structuring on the part of the experimenter. As mentioned earlier, the method is not limited to a strict set of questions but permits the investigator to rephrase queries in whatever way seems desirable to elicit an accurate picture of the child's perception of morality.
However, Piaget warned that unless interviewers are careful, the way they phrase their questions and their remarks can elicit answers that reflect more the way investigators expect children to perceive morality than the way children actually do perceive it.


The following observations illustrate types of conclusions Piaget drew from his study of children's ideas of rules in playing games. First, he identified four stages in rule conception from around age two through adolescence. The first stage (about age one to two) he labeled motor or individual, when the child plays alone with objects (marbles), manipulating them in varied ways that do not involve collective rules. At the second stage (around age two to five or six), called egocentric, children begin to recognize rules imposed by others but do not abide strictly by the rules, whether playing alone or with others. In effect, the child more or less imitates rules described by others but does so in a purely individual, unabashedly self-serving fashion. Stage 3 (age 7 through 10) is a period of incipient cooperation, when players begin to concern themselves with mutual control and unification of the rules, but the rules are not yet stable nor is children's understanding of the rules entirely clear and consistent. Stage 4 (ages 11 to 15 and beyond) marks a period of rule codification, when every detail of the game is fixed and the code of rules is known throughout the society of players. In this progression of stages, Piaget (1932/1948, p. 76) identified a general trend in children's perceptions of rules:

First, a mystical respect for the law, which is conceived as untouchable and of transcendental origin, then a cooperation that liberates the individuals from their practical egocentrism and introduces a new and more immanent conception of rules.

Written Tests

Although the individual interview has served as the most popular method for studying children's moral development, a number of researchers have been dissatisfied with the great investment of time, money, and energy that such an approach requires. As a consequence, a variety of efforts have been made to devise techniques that permit investigators to gather data from many people simultaneously. The most common technique of this kind has been the written test. Written tests have been of two main types, those posing recognition or evaluation tasks and those involving production tasks.

2.1 Recognition Tests

A variety of tests reported in the literature require neither intensive training of interviewers nor individual administration (Enright et al., 1980; Hogan, 1970; Hogan & Dickstein, 1972; Maitland & Goldman, 1974; Page & Bode, 1980; Rest et al., 1974). All of them offer the convenience of being in printed form, so they can be administered to a group and scored by computer. In addition, they all assess a person's ability to recognize statements that indicate a particular viewpoint toward moral issues, or else they require the person to evaluate such statements. Perhaps the best known instrument of this sort is the Defining Issues Test (Rest et al., 1974; Rest, 1979).

The Defining Issues Test (DIT) is a derivative of the Kohlberg interview approach, utilizing both the Kohlberg moral-dilemma incidents and the system of stages of moral development proposed by Kohlberg. But in contrast to the interview method, the DIT is in written form so that it can be administered in a group, and it requires subjects to make judgments about other people's moral reasoning rather than to produce their own rationales for how to behave in situations involving morality. The DIT describes six moral dilemmas, each followed by brief statements of 12 issues or
considerations bearing on the moral incident. Examples of issues related to the Heinz dilemma are: (1) whether or not the community's laws are being upheld; (2) "Isn't it only natural for a loving husband to care so much for his wife that he'd steal?"; and (3) "What values are going to be the basis for governing human interactions" (Rest et al., 1974, pp. 493-494). Each statement is designed to exemplify a distinct characteristic of a stage. People taking the test are to rate each statement according to how important that issue should be in making a judgment about the dilemma, with the rating appearing on a five-step Likert scale ranging from 'most important' to 'of no importance'. Subjects also rank their top four choices of the most important issues.

From their studies of the DIT with students ranging from ninth-graders to advanced university political-science and philosophy majors, the authors concluded that the DIT "demonstrates good test-retest reliability [r = 0.81], produces comparable information with each testing, minimizes variance due to differences in verbal expressivity, and is scored objectively, thus minimizing scoring bias and an enormous amount of time" (Rest et al., 1974, p. 500). The proposal that the DIT reflects developmental differences was supported by the fact that the percentage of subjects who chose higher moral-reasoning statements increased with age. A correlation of r = 0.68 between DIT scores and Kohlberg interview ratings suggests that while there is a substantial relationship between what is assessed by the DIT and by the interview approach, the two do not measure identical moral-reasoning characteristics: the DIT taps the recognition or evaluation of moral reasons, whereas the interview taps the creation of such reasons.
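
One conventional way to read a correlation such as the r = 0.68 reported above is to square it, which gives the proportion of variance the two measures share. The sketch below is an illustration added here, not a computation taken from Rest et al.; the function name `shared_variance` is hypothetical, and the squared-correlation reading assumes a roughly linear relationship between the two sets of scores.

```python
def shared_variance(r: float) -> float:
    """Return r squared, the proportion of variance two measures share.

    Assumes r is a Pearson correlation and the relationship is roughly linear.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    return r * r


# Figures quoted in the text (Rest et al., 1974).
dit_vs_interview = shared_variance(0.68)  # DIT scores vs. Kohlberg interview ratings
test_retest = shared_variance(0.81)       # DIT test-retest reliability

print(f"DIT vs. interview: {dit_vs_interview:.2f} of variance shared")
print(f"DIT test-retest:   {test_retest:.2f} of variance shared")
```

Read this way, a little under half of the variance in interview ratings is shared with DIT scores, which fits the point that the two instruments overlap without being interchangeable.
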
2.2 Production Tests

As noted with the DIT, recognition tests have not been entirely comparable with the Piaget and Kohlberg interview techniques, since recognition tests fail to reveal the answers people produce in response to open-ended questions of the sort asked during a typical interview. To provide a production-task instrument that could be administered to a group, Gibbs and coworkers (1980, 1982) devised the Sociomoral Reflection Measure, which enables researchers to train themselves in its use and permits relatively rapid protocol assessment.

The Sociomoral Reflection Measure (SRM) is a variant of Kohlberg's Moral Judgment Interview (Colby & Kohlberg, 1984), using the same first six standard dilemmas and intended to be interpreted in terms of Kohlberg's six-stage typology of moral reasoning, though in practice the SRM yields results scored only for the first four levels of the Kohlberg system. With the SRM, each dilemma is presented in printed form to be read silently by the subjects, with the dilemma followed by pairs or trios of questions. The first in each pair or trio is in multiple-choice form, and the last is open-ended, requiring the subjects to reveal the value foundation of their multiple-choice responses by telling why they have offered such a response. For example, the queries related to the Heinz dilemma include the following set of three questions (Gibbs et al., 1982, p. 899):

(2) What if the person dying isn't Heinz's wife but instead is a friend (and the friend can get no one else to help)? Should Heinz: steal/not steal/can't decide (circle one)?
(2a) How important is it to do everything you can, even break the law, to save the life of a friend? very important/important/not important (circle one)?
(2b) WHY is that very important/important/not important (the one you circled)?

The administration of the SRM to a group is reported to take around 45 minutes, or over an hour when it is read aloud to younger subjects (fourth- and fifth-grade pupils). Scoring the answers requires what the authors describe as "negligible classificatory work," and "justificatory responses to the b components of the questions are scored by reference to corresponding norm sections of the SRM manual" (Gibbs et al., 1982, pp. 899-900). However, the fact that the data have been collected in a group setting means that the investigators lack the opportunity provided by Kohlberg's original interview technique to ask probing questions of individuals in order to clarify ambiguous aspects of subjects' responses.

The SRM yields two primary ratings: (1) a modal stage, meaning the Kohlberg stage (one through four) most frequently found in a subject's judgments of the six dilemmas; and (2) a Sociomoral Reflection Maturity Score, which is a more differentiated rating similar to the Moral Judgment Interview's MMS. Psychometric information, based on protocols from over 300 subjects (fourth-graders through adults) as scored by highly trained and self-trained raters, was interpreted by the instrument's authors to suggest 'acceptable' levels of interrater reliability, 'good' internal consistency of the SRM, and 'good' concurrent validity with the Moral Judgment Interview, meaning that "developmentally valid sociomoral reasoning data can be collected through group-administerable procedures" (Gibbs et al., 1982, p. 908). However, the authors' hope that the form of responses on the SRM would significantly reduce the time typically needed with the Moral Judgment Interview for determining a subject's developmental stage was not fulfilled. The average SRM required about one-half hour to score for developmental stage, which is only slightly shorter than the time required to score a typical Kohlberg interview protocol.
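
The modal-stage rating described above amounts to finding the most frequent value among a subject's stage ratings. What follows is a minimal sketch, assuming each scored response has already been assigned a whole-number stage from one through four. The function name `modal_stages`, the sample ratings, and the decision to report every tied stage are illustrative assumptions, not rules taken from the SRM manual.

```python
from collections import Counter
from typing import List, Sequence


def modal_stages(stage_ratings: Sequence[int]) -> List[int]:
    """Return the most frequent stage(s) among a subject's scored responses.

    More than one value in the result signals a tie. How the SRM manual
    resolves ties is not described in the text, so none is resolved here.
    """
    if not stage_ratings:
        raise ValueError("at least one scored response is required")
    if any(stage not in (1, 2, 3, 4) for stage in stage_ratings):
        raise ValueError("SRM responses are scored for stages 1 through 4 only")
    counts = Counter(stage_ratings)
    highest_count = max(counts.values())
    return sorted(stage for stage, n in counts.items() if n == highest_count)


# Hypothetical ratings for one subject's scored responses.
ratings = [2, 3, 3, 2, 3, 4]
print(modal_stages(ratings))  # prints [3]
```

Returning a list rather than a single number keeps ties visible, so a rater can decide how to handle them instead of having the code pick silently.
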
While the SRM could not be a complete substitute for the Moral Judgment Interview, since it does not permit the researcher to ask clarifying probe questions of a respondent, the SRM does appear generally to approximate the results of the Moral Judgment Interview and has the convenience of being suitable for group administration and self-training of raters.

2.3 Academic Achievement Tests

The evaluation techniques reviewed so far in this chapter have been used chiefly in the conduct of research on moral development. They are not the evaluation devices routinely used in educational programs for determining how well learners have mastered the concepts of the moral instruction they have received. There are two main reasons that teachers do not routinely employ the foregoing techniques. First, teachers usually do not have time to conduct individual interviews with pupils nor to score complex protocols or carry out sophisticated statistical analyses of the results. In addition, to save time, classroom teachers usually must administer their assessment techniques to an entire group rather than to individuals, one at a time. As a result, paper-pencil tests and other written products -- such as compositions and homework papers -- are widely used as classroom assessment devices.

A second reason that the assessment methods routinely used by teachers frequently differ from the methods of researchers is that teachers and researchers often seek to answer different questions. The questions that usually interest teachers are: "How well have the students mastered the learning objectives?" And if they have not achieved the objectives, "What might be done to improve or help them?" Researchers, on the other hand, are usually investigating the basic process of development in order to propose or test
generalizations such as those offered by Kohlberg and Piaget. So the assessment techniques appropriate for the teacher's needs are often not well suited to the purposes of the researcher.

The intent of this final section of the chapter is to illustrate briefly types of test items commonly used by teachers to assess how well students have mastered facts, concepts, and causal relations that have been taught in a moral-education program. Six common varieties of test items are illustrated below. The first three -- true-false, multiple-choice, and matching questions -- are sometimes labeled recognition items, for they require the student only to recognize which of several alternative answers is correct. The last three -- completion, short-answer, and essay -- are recall items, for they require the student to recollect or generate a correct answer without the aid of the alternative possible answers that appear in the test item. Recall questions usually are the more demanding type, for they assess not only the respondent's ability to generate an answer but also the ability to communicate the answer accurately in written form.

So far in this chapter the examples of assessment techniques have all demonstrated ways of evaluating respondents' concepts and prescriptive causal relations. None of the examples has shown ways of assessing people's knowledge of facts related to moral matters. However, educators who operate moral-education programs are usually interested in discovering how well facts about specific people and events have been mastered by the learners. It is important to realize that the term facts, as intended here, means not only information that everyone would agree has been objectively 'proven'. In addition, facts refers to convictions which have not been demonstrated to everyone's satisfaction but which are judged as being true by those conducting the moral-education program.
Such a flexible definition of facts thus accommodates the sorts of subject-matter included in the moral instruction advocated by groups representing particular religious, ethical, or political persuasions. To show how test items can be designed to assess for different types of knowledge, in each of the following sets of examples the first item is intended to test students' knowledge of a fact, the second to test for a concept, and the third to test for a prescriptive causal-relation. The six sets of items are followed by a brief note about several advantages and disadvantages of these types.

True-False Items: Directions. In the blank at the left of each statement, write a plus sign (+) if you think the statement is true and a zero (0) if you think it is false.

__ (1) In the Roman Catholic church, the Pope is the authoritative representative of God.
__ (2) The word ethical means the same as moral.
__ (3) The death penalty is never a proper kind of punishment, even for people who have murdered someone.

Multiple-Choice Items: Directions. In the blank at the left of each item, write the letter of the answer that best finishes the sentence.

__ (1) A medical doctor famous for his missionary work in Africa was:
   (A) Cecil Rhodes
   (B) Thomas A. Edison
   (C) Albert Schweitzer
   (D) Louis Pasteur

__ (2) Altruism means:
   (A) receiving pay for helping other people.
   (B) helping someone else at a sacrifice to yourself.
   (C) giving 10 percent of your wages to the church.
   (D) praying that God will help people who are in trouble.

__ (3) After Harry found the bag of money that had been lost by the bank robbers, what should he do with the money?
   (A) Keep it for himself, because finders are keepers.
   (B) Share it with his friends, because friends should help each other.
   (D) Give it to the Red Cross to help feed starving people in other countries, because every possible life should be saved.

Matching Items: Directions. In the blank at the left of each item in Column I, write the letter of the best answer from Column II. Some answers in Column II may be right more than once. Some may not be right at all.

Item 1:
Column I
__ Slave trade outlawed in British colonies.
__ Magna Carta signed.
__ United Nations Organization formed.
__ League of Nations formed.
Column II
(A) 1215
(B) 1811
(C) 1861
(D) 1920
(E) 1945

Item 2:
Column I
__ A very serious kind of crime.
__ Releasing a prisoner before the end of his sentence on his promise not to break the law.
__ A minor kind of crime.
__ Stealing someone's property.
__ Releasing a prisoner from punishment.
Column II
(A) parole
(B) pardon
(C) larceny
(D) misdemeanor
(E) tort
(F) felony

Item 3: In Column I, write the letters from Column II to show how you rank the four ways to solve the world's population-growth problem.
Column I
__ Best solution
__ Second best solution
__ Third best solution
__ Worst solution
Column II
(A) Have government pay for abortions.
(B) Pay people not to have children.
(C) Ask people to have small families.
(D) Charge people fines if they have more than two children.
Now tell why your best solution is the best. And tell why your worst solution is the worst.

Fill-in or Completion Items: Directions. In the blank space in each of the following sentences, write the word or words that best complete the sentence.

(1) The name of the nation that awards the Nobel Peace Prize is ______.
(2) The practice of putting black and white children in separate schools is called school ______.
(3) The most important human right is the right to ______.

Short-Answer Items: Directions. Write a short answer to each of the following questions.

(1) Name three countries to which United Nations security forces have been sent to keep the peace. And in each of these countries, what groups were fighting each other before the UN troops arrived?
(2) In Turiel's opinion, what is the difference between morality and social convention?
(3) When is it proper to tell a lie? And why would it be proper at that time?

Essay Items: Directions. Write answers to the following questions.

(1) Name the three main divisions of the national government, and explain how they are supposed to check and balance each other.
(2) You recall that in the French Revolution the commoners claimed the aristocrats were immoral. What actions did the commoners say were immoral, and why did they consider such acts immoral?
(3) One group of Christians believes that children are born immoral, so that children naturally do bad things. Another group believes that children have no inborn tendency to be either good or bad. Your task is to: (A) tell what reasons each of these groups uses to support its opinion and (B) tell which group you agree with and why.

It is apparent from the above examples that the three sorts of recall items -- fill-in, short-answer, and essay -- are not distinct types but, rather, they differ only in the degree of complexity in the answers they require. A long fill-in item merges into the short-answer category, and a lengthy short-answer item merges into the essay type.

Compared to short-answer and essay questions, an advantage of true-false, multiple-choice, and fill-in items is that they permit the test maker to sample a larger variety of moral-education topics within the confines of a single test. On the other hand, short-answer and essay questions are more effective for revealing students' thought patterns and their reasons for their beliefs.

One important distinction among the six types of items is found in their ability to discriminate between students who have truly achieved the learning objectives and those who have not. That is, some types offer students a greater chance simply to guess the answer without actually knowing the material. The chance of guessing correctly can be represented as a fraction, with the numerator equal to 1 and the denominator equal to the number of possible -- and seemingly reasonable -- answers. In the case of true-false items, there are only two possible answers (true and false), so the fraction is 1/2, meaning that there is a 50-percent chance a person will get the answer correct simply by guessing, without having met the learning objective at all. With a three-option multiple-choice item, the chance of guessing correctly is reduced to 1/3 and with a four-choice item to 1/4. Matching
questions are a form of multiple-choice test, so the chance of guessing correctly is 1 over the number of possible choices in the right-hand column. (But this is true only when the directions inform the students that some of the choices in the right-hand column may be correct more than once. If, however, a choice in the right column is eliminated from further consideration after the student has selected it once, then the chance of guessing correctly increases with each additional correct selection the student makes.) The chance of guessing the correct answer for fill-in, short-answer, and essay items is extremely slight, since the array of possible answers does not appear on the test but must be generated out of the student's own mind. Thus, the most discriminating of the item types is the essay question, which not only requires correct recall but also requires that students produce a well organized version of what they have recalled.

Because teachers expect older pupils to make finer distinctions among facts, concepts, and causal relations than do younger pupils, multiple-choice and matching items for older students usually contain a larger number of possible choices than do such items for younger children. Furthermore, the choices provided for older students typically differ from each other in more subtle ways than do choices for the young. It should also be clear that essay questions are most appropriate for older students, because of their greater skill in communicating their ideas in writing.

Additional varieties of achievement-test items, along with detailed guidelines for constructing them, can be found in the following sources: Gronlund (1982), Lippey (1974), Marshall and Hales (1973), Nitko (1983).
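
The guessing fractions discussed in this section can be worked through in a few lines. This is a sketch under stated assumptions: every listed option is taken to look equally plausible to the guesser, the function names `guess_chance` and `matching_chances_without_reuse` are hypothetical, and the second function assumes that each earlier match the student made was correct.

```python
from fractions import Fraction
from typing import List


def guess_chance(num_options: int) -> Fraction:
    """Chance of guessing one item correctly: 1 over the number of plausible answers."""
    if num_options < 1:
        raise ValueError("an item needs at least one possible answer")
    return Fraction(1, num_options)


def matching_chances_without_reuse(num_choices: int, num_items: int) -> List[Fraction]:
    """Per-item guessing chance when a right-column choice is dropped once used.

    Assumes every earlier selection was correct, so one wrong option
    disappears from the pool with each item answered.
    """
    if num_items > num_choices:
        raise ValueError("cannot match more items than there are choices")
    return [Fraction(1, num_choices - used) for used in range(num_items)]


print(guess_chance(2))  # true-false item: 1/2
print(guess_chance(3))  # three-option multiple-choice item: 1/3
print(guess_chance(4))  # four-option multiple-choice item: 1/4

# Matching Item 1 above: four events in Column I, five dates in Column II.
print([str(c) for c in matching_chances_without_reuse(5, 4)])
# prints ['1/5', '1/4', '1/3', '1/2']
```

When choices may be reused, as the sample directions state, each matching item keeps the same chance of `guess_chance(5)`, or 1/5, however many items the student has already answered.
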

Conclusion

The intent of Chapter 3 has been to illustrate some of the principal methods used by researchers and educators to determine people's beliefs about facts, concepts, and prescriptive causal relations related to moral matters. However, the sorts of assessment methods inspected in this chapter do not exhaust the types available for such purposes. Psychoanalytic interviewing and pictorial projective techniques have also been used. They will be found in Chapters 5 and 7 where they appear because of their special usefulness in assessing the affective aspects of moral facts, concepts, and principles and in assessing thought processes.

CHAPTER 4

THE DETERMINANTS OF MORAL DEVELOPMENT

As noted in Chapter 3, I am using the term causal relations or causal relationships to mean an if/then connection between two or more facts, concepts, or generalizations in a person's mind. Furthermore, in Chapter 3 a distinction was drawn between two sorts of causal relations, the explanatory and the prescriptive. The explanatory variety represents what a person believes to be 'the real' or 'the true' cause-and-effect connections in moral behavior. Within statements about causality, an if portion is assumed to identify the cause associated with a then segment that is assumed to be the result or effect. Whereas explanatory statements reflect people's convictions about 'real' or 'true' causes of moral events and moral development, prescriptive statements represent a person's values in the form of what should result in moral situations. Assessment methods for prescriptive beliefs were reviewed in Chapter 3. Methods of assessing for explanatory convictions are surveyed in the present chapter and in Chapter 8.

For people who are concerned with influencing the moral attitudes and behavior of individuals and of society in general, the most important question to answer is: "What are the true causal factors that determine people's moral development?" This question identifies the issue at the core of moral-education programs, of attempts to rehabilitate law-breakers, and of parents' child-rearing practices. One of the two major purposes of the present chapter is to review evaluation techniques used in efforts to answer this question. The second major purpose is to review assessment methods used for identifying how people's perceptions of causal factors can vary according to their age and social environment. As a preparation for inspecting these matters, we begin with a discussion of some key concepts underlying these concerns.

An Orientation to Some Issues of Causality

Although the two fundamental issues treated in this chapter are closely interlinked, for purposes of analysis it is useful to consider them separately. The first question focuses on attempts to solve the mystery of what factors determine people's moral development and of how these factors operate in the process of development. The hope is that answering this question about 'true causes' can enhance our ability to guide moral development into constructive channels. The second question concerns what people at different age levels or
with different experiential backgrounds believe about causes of moral development. We seek to learn what reasons people at different age levels and in different cultural settings give when asked "Why did that happen?" or "What really caused them to act that way?" In other words, to what causal factors do people attribute promoral and immoral behavior? Answering this second question can enhance our ability to understand other people's viewpoints and their reasons for treating moral issues the way they do.

The following discussion first centers on the matter of 'true causes', then on the nature of people's attributions of cause. However, before considering these two matters, I believe it will be useful to describe several assumptions that underlie the contents of this chapter regarding characteristics of explanatory causal relations and their importance.

Some Assumptions about Causality

To illustrate several assumed characteristics of causation in the realm of morality, consider the following three examples of explanatory causal relations.

If the temperature of water at sea level is reduced to 0 degrees centigrade or lower, then the water will become ice.

If the girl were only four years old instead of fourteen, the authorities probably wouldn't hold her responsible for taking the money.

If you explain that you didn't mean to run over the cat, and if you freely admit you did it and, in addition, you apologize to them and offer to pay the veterinary bill, then there's a good chance they'll forgive you and they'll still regard you as a friend -- and maybe they won't even make you pay the veterinary bill.

The examples illustrate two aspects of causal relations. First, some relations involve moral issues, whereas others do not. Among the three examples, only the last two could qualify as moral matters under most definitions of moral. The first example -- water turning into ice -- is obviously not within the moral domain but, instead, is a causal statement about a physical rather than a social phenomenon. Another characteristic is that the if and the then portions of a causal relationship can be either simple or complex. The second example consists of a single if condition and a single consequence. In contrast, the third example includes multiple if conditions and multiple consequences.

In daily conversations involving situations, statements about causal relations typically cast the if/then elements in reverse order when people are trying to explain an effect that has already occurred. For example, the effect that is observed may be a girl's acts of altruism. The cause which produced her altruistic acts is sought as the answer to the question why? For instance, "Why does she spend so much time helping others?"
However, when people are seeking to predict the outcome of an event that is yet to occur, they are more apt to cast the assumed causal connection in its original if/then sequence. "If the drug dealers pay him enough money, he'll probably agree to sell drugs to high-school students." Such a prediction of future outcomes will usually also involve an assumed connection between the predicted effect and a present or past causal condition. For instance, we might ask, "Why would he sell to high-school students?" The answer will then reveal a connection between the act of selling drugs and some assumed trait or attitude or past behavior of the individual. "He'll sell them because he needs money to feed his own expensive drug habit" or "He has no social conscience -- he cares nothing about others' welfare."

From the standpoint of moral judgments and decisions, the causal relations that a person
has in mind are important for two reasons. First, they represent the consequences the individual believes will result from certain acts -- the rewards to be enjoyed and the punishments to be suffered. ("If I give some of my allowance money to the Red Cross for the drought victims, my father will be very proud of me." "If Mother catches me driving her car without her permission, she'll never let me use the car again.") Thus, I am suggesting that people's convictions about if/then connections serve as primary intellectual guides in their moral decision-making, since people's convictions suggest the consequences to be expected from the various behavior options that they recognize are available in the present moral-choice situation. The assumption underlying this proposal is that when people face moral decisions, they will select the behavior option they believe will yield the greatest rewards and minimize punishment.

We can now use this proposition as the basis for setting a criterion in judging the developmental level of an individual's moral-choice ability: "The more accurate a person's prediction about which consequences will result from a given behavior, the higher the individual's level of moral-choice ability". We can also suggest that the goal of most moral-education and values-clarification programs is to influence the causal-relation convictions that people hold. But more important for our immediate concerns, this line of reasoning enables us to identify a significant evaluation task, that of learning how valid or accurate are the causal-relation beliefs the person holds.

Seeking the True Causes of Moral Development

In their search for the 'real' or 'true' forces that determine moral beliefs and behavior, theorists over the centuries have proposed a variety of sources of cause, including those of heredity, environment, an individual's developmental history, willpower, the 'nature of things', and supernatural intervention. A brief description of these kinds of variables can serve as a foundation for suggesting the sorts of evidence and assessment methods used in efforts to determine the extent to which each source is actually involved in effecting moral development.

Heredity

The notion that human heredity -- people's inborn nature -- affects their moral thoughts and actions has a long history. In Christian Protestantism, the traducian tradition holds that ever since Adam and Eve disobeyed the Lord's dictum against tasting the forbidden fruit in the Garden of Eden, each newborn human has carried the taint of Adam and Eve's original sin of disobedience. The taint assumes the form of a natural tendency to do evil. All children are believed to be born immoral (Thomas, 1985, pp. 716-717). In contrast, Confucian, Shinto, and Islamic traditions propose that humans are born promoral, tending toward goodness unless diverted from the path of righteousness by an unfavorable environment (Lin, 1985, p. 970; Thomas & Niikura, 1985, p. 4668; Obeid, 1985, p. 2718). From a non-religious perspective, other theorists, from Jean Jacques Rousseau to Abraham Maslow, have likewise invested the neonate with a promoral bent (Rousseau, 1773; Maslow, 1970). However, in none of these proposals is there an explanation of a genetic process that has determined the newborn's ostensible moral inclination.

While the creators of the foregoing theories have proposed an inherited moral inclination for the entire human race, there also exist folk traditions in which inborn moral inclinations can differ from one person to the next. Such a tradition claims that some people are born with 'bad blood' from an ignoble lineage, whereas others are born with 'good blood' from a noble line. This notion, in a moderated form, continues to be expressed by some writers in the field. For example, Abrahamsen (1960, p. 31) has contended that "When a person is basically equipped with poor traits, such as emotional instability, overpowerful drives, and an abundance of asocial feelings, environmental influences may easily elicit criminal behavior."

This idea of individuals' inborn tendencies has been entertained as well by modern-day sociobiologists who have suggested the likelihood that certain genetic structures can predispose a person more to one form of social behavior than to another. For example, Wilson (1978) has argued that the basis of altruism lies in the degree of similarity in genetic structure between two people, so that acts of self-sacrifice are more likely to be performed for a near relative than for someone not closely related. And a survey by Mednick (1985, p. 60) of studies about the frequency of criminal behavior among identical twins as compared with fraternal twins led him to conclude that "These studies all suggest that we would take seriously the idea that some biological characteristics that can be genetically transmitted may be involved in causing a person to become involved in criminal activity." But in recognition that such biological tendencies can also be influenced by environmental forces, Mednick added that "Regardless of genetic background, improved social conditions seem to reduce criminality".

In addition to a belief in the direct influence of heredity on moral behavior, there are also proposals about indirect effects of inheritance.
Some genetically determined characteristic, such as a mental or physical handicap or an ugly visage, may be said to cause other people to react negatively to the affected individual, which in turn influences that individual to respond in antisocial ways. Or the birth of children into an economically impoverished family or into a crime-ridden neighborhood may increase the chances that they will develop antisocial behavior. In other words, such proposals suggest an indirect influence of heredity that affects the interaction between heredity and environment in ways which bear on moral thought and behavior.

Environment Because the influence of environments on moral character will be inspected at some length in Chapter 8, the nature of hypotheses regarding environmental influences will receive little attention here. It may be sufficient at this point to note that most present-day scientific investigations in the realm of morality appear to assume that the child is born amoral, with no tendency to either promoral or immoral behavior, and that the dominant - if not exclusive - influence on moral belief and action is exerted by environmental forces.

The Individual's Developmental History Some theorists have cited the past process of development itself as a significant cause of future development. By the time the growing child or youth has arrived at his present stage of development, he already has well-established characteristics-- a particular appearance, level of knowledge, style of interacting with other people, self concept, and the like. These

characteristics are a result of the individual's developmental history, the consequence of past interactions between genetic endowment and environmental forces. The present characteristics now establish a set of potentials and limitations for further development that are unique to the individual and that will influence all future interactions with the environment. Schneirla (1957, p. 86) has proposed that "the individual seems to be interactive with itself throughout development, as the processes of each stage open the way for further stimulus-reaction relationships depending on the scope of the intrinsic and extrinsic conditions then prevalent". And the radical behaviorist B. F. Skinner (1983, p. 25) reflected a belief in this causal force (along with a denial of either a free will or supernatural intervention) when, in a retrospective view of his own present life condition, he explained that, "So far as I know, my behavior at any given moment has been nothing more than the conduct of my genetic endowment, my personal history, and the current setting". This notion that one's developmental history is a significant factor in determining future development has always been an essential element of Soviet psychology. It is a conviction founded on the assumption that "all fundamental human cognitive activities take shape in a matrix of social history and form the products of sociohistorical development . . . . It follows that practical thinking will predominate in societies that are characterized by practical manipulations of objects, and more 'abstract' forms of 'theoretical' activity in technological societies will induce more abstract, theoretical thinking" (Luria, 1976, pp. v, xiv). From this viewpoint, the history of the society in which children are reared, and the children's own unique developmental histories of experience in that society, are both extremely important in fashioning the ways they will be able to think.

Willpower The question of whether humans have a 'free will' which is under their own control has been long debated and remains unsettled. Within the tradition of Western science, theorists and empiricists alike appear to regard human development as being determined solely by the interaction of two sources of cause - genetic inheritance and environmental forces. However, in people's everyday interactions and in society's laws and court procedures there is a clear assumption that individuals actually do make choices on their own volition - that people are not merely pawns moved about by their genetic structure and environmental influences. "You can change if you really want to" is a common expression reflecting the assumption that people are capable of exerting willpower in moral-decision situations. In modern-day discussions of moral behavior, this concept of varied degrees of self-determination may not be called willpower but, rather, may be labeled ego strength (Rothman, 1980, p. 114). Typical moral-education programs appear to be founded on the conviction that individuals do have a measure of freedom in choosing what they will believe and how they will act.

The Nature of Things As studies of children's ideas about the inevitability of social justice demonstrate, people sometimes believe it is in the nature of the social universe for people automatically to get their just deserts for their behavior, just as it is the order of the universe for the sun to rise in the morning and for water to run down hill. It is simply the nature of things that immoral acts inevitably lead to punishment and promoral acts to reward. And while this conception

of causality does not directly account for why people act as they do, it does account for why they can expect to experience the inevitable consequences that their behavior warrants. And when people know that deserved consequences will result, this knowledge can influence their actions.

Supernatural Intervention Apparently in all societies there are people who seek to affect their own or others' moral thoughts and acts by appealing for help from superhuman powers - a supreme being, selected gods, or spirits. Such aid is solicited by means of prayers, rituals and rites, offerings, sacrifices, and the like. Seldom does supernatural intervention figure among the causes considered in scientific investigations of moral development, even though in times of personal anguish the scientists themselves may privately appeal to some unseen supreme power for aid. But moral-education programs based on a religious persuasion typically include this factor of supernatural intervention as an important element in the educator's conceptions of causes determining moral belief and action.

Interacting Causes Even though in the foregoing discussion I have separated sources of cause so they might be inspected individually, it should be apparent that rarely if ever does any thoughtful person believe that only one of the five sources has been the cause of a moral belief or action. Nearly everyone conceives of causality as involving an interaction among two or more sources of influence. As a consequence, the difference between one person's view and another's is typically found in the emphasis they place on one source of cause in comparison to another. Of two theorists who believe an immoral act is the consequence of both heredity and environment, one may credit genetic factors with far greater power than does the other. For the purposes of this monograph, the significance of the above review of theories about cause lies in the implications such proposals hold for assessing the validity of the proposals. Our forthcoming review of assessment methods should make this clear. When people are asked to explain why someone displays certain beliefs or actions in moral situations, not everyone gives the same answer. That is, not everyone attributes a given belief or act to the same causes. And the matter of why they disagree with each other has been the subject of considerable research. Of particular interest have been the changes in causal attributions that occur in children with increased age. Of similar interest have been the differences in causal attributions offered by people from different social environments. Thus, we will be concerned briefly in the closing section of the chapter with one method used for investigating such differences.

Assessing for the True Causes in Moral Development A detailed analysis of the issues involved in evaluating for causes would be far too complex for consideration here. Thus, I shall offer an admittedly over-simplified view of issues regarding the appraisal of cause, yet a view which should be sufficient for identifying the principal techniques that have been used in the search for the true causes of moral development. The methods we shall consider are those of reasoned arguments, appeal to authority, correlational analyses, controlled experiments, and survey summaries of studies.

Reasoned Arguments Founded on Daily Experience Although any attempt to account for cause involves adducing a line of logic or reasoned argument, what I intend here by reasoned argument is the variety that rarely draws on statistical data or experiments. Instead, the author builds an explanation of cause on commonsense observations from everyday life, assuming that his own experiences on which he bases his case are much the same as the experiences of his audience. The audience is thus expected to accept the author's observations as both typical and sufficient to support his argument. In sum, assessment of this variety consists of a theorist first alluding to everyday observations in the sphere of morality, then linking these observations together in a chain of reasoning intended to be so compelling that it convinces the audience that the account of cause is correct. Frequently the line of logic is used not only to support the author's own case but is employed as well to reveal the inadequacies of competing proposals. Examples of this form of assessing descriptions of cause are boundless, but the following instance should suffice to represent how the technique has typically been used. Piaget (1932/1948, p. 231) adopted such an approach when writing about the causes behind the evolution of children's ideas concerning justice and punishment. He contended that children's earliest notions of punishment are founded on instinctive reactions which are subsequently altered by adult intervention.

It cannot be denied that the idea of punishment has psycho-biological roots. Blow calls for blow and gentleness moves us to gentleness. The instinctive reactions of defence and sympathy thus bring about a sort of elementary reciprocity which is the soil that retribution demands for its growth . . . . Things change with the intervention of the adult . . . the child is watched over continuously, everything he does and says is controlled, gives rise to encouragement or reproof, and the vast majority of adults still look upon punishment, corporal or otherwise, as perfectly legitimate. It is obviously these . . . adult reactions . . . [that] are the psychological starting-point of the idea of expiatory punishment. (Piaget, 1932/1948, p. 321.)

Appeal to Authority The method that people most commonly use to support the validity of a statement about cause is to appeal to an authority. The principal difference among people in their use of this technique lies in the sort of authority in which they place their faith. When Jews, Christians, or Muslims state that a cause of immorality is the operation of an invisible supernatural evil force (Satan), the authority on which they found their conviction is a written document - the Torah, the Bible, the Koran. They accept this document as authoritative on the say-so of some human authority - a rabbi, pastor, imam - or because there is widespread belief in the validity of the document among members of the individual's society. In contrast, people who subscribe to methods of 'modern science' as their authority for verification do not place their faith in a particular document or supernatural being that

employs mysterious means for effecting moral outcomes. Instead, they invest their faith in rules of 'scientific inquiry' that include canons governing how to formulate inquiry questions, to gather information, to organize and summarize the information, and to interpret the results. Not only do such people rely on the canons of scientific procedure, but they also trust that most other investigators who subscribe to a scientific approach to verification have been both technically competent and honest in carrying out their research and in reporting the results. When advocates of scientific methods seek to explain their line of logic regarding moral development, they commonly bolster their argument with references to reports of other investigators' scientific inquiries (as is the case of this monograph with its bibliographical citations). In the role of critic, the scientist also often points out ways in which others' research has either violated or overlooked important rules of scientific inquiry. In summary, then, differences among people in the way they attempt to verify an estimate of the causes of moral development often lie in differences in the source of authority they believe is most appropriate for verification.

Correlational Analyses The term correlational analysis is used here to mean speculating about cause through assessing the degree of relationship between two variables. The simplest form of such analysis involves only two individual variables (the bivariate situation). The goal of computing the degree of correlation between the two is to learn the extent to which one of the variables, such as religious-education programs, has determined or influenced the other variable, such as the level of a child's moral-reasoning skill. However, two perennial difficulties are encountered in such attempts. One is the problem of establishing that there is a true causal connection between the variables. The other derives from the common observation that moral behavior is not the result of a single cause but, rather, is determined by a combination of causal factors. Let us inspect each of these problems.

Causal Connections It is apparent that demonstrating some degree of correlation between two variables says nothing about whether one of them has caused the other. Among children ages 5 to 14, height is positively correlated with scores on mathematics tests, but this clearly should not imply that height causes mathematics achievement or vice versa. From this viewpoint of cause, such a relationship is spurious or merely casual. On the other hand, it could be argued that a positive correlation between religious-education programs and children's scores on an honesty test does indeed reflect a causal relation - that the methods and materials of the program do indeed influence children's moral behavior. Thus, to support the proposal that an obtained correlation reflects a causal connection, researchers must adduce a convincing line of logic to support their proposal. But regardless of the persuasiveness of their supporting argument, their line of logic still cannot 'prove' that the relationship is causal. Their argument only strengthens the likelihood that cause has been demonstrated. To furnish more secure evidence that the relationship is causal, the researchers can conduct an experiment in which variables other than religious education are controlled.
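The height-and-mathematics example can be made concrete with a small sketch. The figures below are invented for illustration and are not taken from any study. They are constructed so that both height and test score rise with age while, among children of the same age, the two are unrelated. The function name `pearson` and the variable names are assumptions of this sketch.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical children aged 5 to 14, four per age.
# Height and score both depend on age; the within-age offsets are
# arranged so that they are unrelated to each other.
offsets = [(-2, -1), (-2, 1), (2, -1), (2, 1)]  # (height, score) deviations
children = [
    (age, 100 + 6 * age + dh, 10 * age + ds)
    for age in range(5, 15)
    for dh, ds in offsets
]

heights = [h for _, h, _ in children]
scores = [s for _, _, s in children]
overall_r = pearson(heights, scores)  # pooled over all ages

def within_age_r(age):
    """Correlation of height and score among children of one age only."""
    same_age = [(h, s) for a, h, s in children if a == age]
    return pearson([h for h, _ in same_age], [s for _, s in same_age])

within_r = [within_age_r(age) for age in range(5, 15)]

print("pooled correlation:", round(overall_r, 3))
print("within-age correlations:", [round(r, 3) for r in within_r])
```

In this constructed case the pooled correlation is high although the correlation within every single age group is zero, which is the pattern one would expect when a third variable (here, age) drives both measures.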

Multiple Causes Because no moral thought or moral act is the consequence of a single cause, computing a correlation between a single hypothesized cause and an hypothesized effect leaves questions about cause still unanswered. Even if the obtained correlation is positive, it still will be insufficient to account for the entire effect. And the researcher remains puzzled about what undiscovered factors have combined with the single hypothesized cause to bring about the complete effect. To deal with this problem, researchers have employed multiple-correlation, partial-correlation, and path-analysis statistical treatments. Multiple-correlation techniques, like simple bivariate correlation methods, are used for determining the relationship between two variables. However, in the case of multiple-correlation, the number representing the proposed causal factor is a composite of two or more numbers derived from separate variables. For example, we may wish to learn the relationship between: (1) a combination of four separate factors in children's lives - incidence of crime in a neighborhood, parents' level of education, a child's church attendance, level of family income; and (2) children's acts of self-sacrifice or altruism. Our first step is to measure the status of each child on the first four variables, then combine these four scores to compose a single composite environmental-influence score. We next need a measure of each child's altruism. Finally, we compute the correlation between children's environmental-influence scores and their altruism scores. If, indeed, each of our four hypothesized environmental factors has influenced children's levels of altruism, then the combination of four will account for more of the altruism score than would be true of any one of the environmental factors alone.
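A minimal sketch of the multiple-correlation idea follows. For brevity it uses only two of the four factors named above, and it relies on the standard textbook formula for the multiple correlation of one outcome with two predictors, which corresponds to the best-weighted composite rather than a simple sum of scores. The scores for the eight children are invented for illustration, and the names `pearson` and `multiple_r` are assumptions of this sketch.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def multiple_r(y, x1, x2):
    """Multiple correlation of y with the best linear composite of x1 and x2."""
    r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return sqrt(r_squared)

# Hypothetical scores for eight children (not from any real study).
parent_education = [8, 10, 12, 12, 14, 16, 16, 18]  # years of schooling
church_attendance = [1, 4, 2, 0, 3, 1, 4, 2]         # visits per month
altruism = [3, 6, 5, 4, 7, 6, 9, 8]                   # counted helping acts

r_education = pearson(altruism, parent_education)
r_church = pearson(altruism, church_attendance)
r_composite = multiple_r(altruism, parent_education, church_attendance)

print("education alone:  ", round(r_education, 3))
print("attendance alone: ", round(r_church, 3))
print("two-factor composite:", round(r_composite, 3))
```

With an optimally weighted composite, the multiple correlation can never fall below the larger of the two single-factor correlations, which mirrors the point that the combination accounts for more of the altruism score than any one factor alone.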
While multiple-correlation techniques can enable us to rationalize cause better than does simple bivariate correlation, we still do not know what proportion of the outcome (altruism) has been associated with each of the four factors that comprise the environmental-influence score. But partial-correlation techniques can aid us in estimating the relative influence of each factor. One way to accomplish this is to decompose our composite score step by step, removing one factor at a time and computing the resulting correlation between the remaining composite factors and the effect (altruism). The magnitude of the change in the correlation coefficient that results from removing a factor tells us how much that variable contributed to the original four-factor composite-score correlation. Or the series of partial correlation coefficients can be derived by the opposite procedure - adding one by one, rather than subtracting, the variables that build up the final composite score (Thorndike, 1985, pp. 1021-1023). And while partial correlation can suggest the relative powers of different factors that we assume have caused an effect, it does not reveal the route by which each factor expresses its influence. To estimate the route patterns, researchers may employ path analysis. Path analysis, as defined by Nie and coworkers (1975, p. 389) "is a method for tracing out the implications of a set of causal assumptions which the researcher is willing to impose upon a system of relations". To illustrate this process, we first estimate, on the basis of a line of logic, what factors have combined to yield a given effect. Then, by comparing the correlations among measures of these factors, we are able to propose how much of the final effect has been caused by each variable and by what route the influence has been exerted.
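Before turning to the path-analysis example, the partial-correlation idea can be sketched in a few lines. The sketch uses the standard first-order partial-correlation formula, which is a related textbook technique and not the step-by-step composite decomposition described above. The data are constructed for illustration only: family income is assumed to influence both parents' education and children's altruism. All names here are assumptions of the sketch.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def partial_r(x, y, z):
    """Correlation of x and y with the linear influence of z removed from both."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical data: three income levels, four children per level.
offsets = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
rows = [
    (z, 2 * z + e1, 3 * z + e1 + e2)
    for z in (1, 2, 3)
    for e1, e2 in offsets
]
income = [z for z, _, _ in rows]
education = [x for _, x, _ in rows]
altruism = [y for _, _, y in rows]

raw_r = pearson(education, altruism)
partial = partial_r(education, altruism, income)

print("education vs. altruism, raw:", round(raw_r, 3))
print("education vs. altruism, income held constant:", round(partial, 3))
```

In this constructed case the relationship between education and altruism shrinks once income is held constant, which suggests that part of the raw correlation reflected the shared influence of income.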
For example, we might hypothesize that the moral values expressed by an adolescent boy's parents, siblings, and close friends combine with the boy's favorite television programs to determine the boy's own values. To test out this hypothesis, we obtain measures of the five

variables for a sample of adolescents. Then we prepare a diagram to show the paths along which we think the influence of the first four variables will likely pass to determine the fifth variable. Our diagram consists of boxes representing the five variables, with certain of the boxes connected by paths (Figure 3). Imagine that we propose that paths from the parents' box flow: (1) directly to the adolescent; (2) to the siblings box; (3) to close companions; and (4) to favorite television programs. In other words, we believe parental values influence the typical boy, his siblings, his choice of companions, and his television viewing habits. We next propose that paths also lead to the typical boy from siblings, close companions, and television programs, because we hypothesize that each of these factors has some influence on his moral values. Finally, by applying the appropriate statistical procedures (Nie et al., 1975, pp. 383-397), we discover the apparent magnitude (in terms of path correlation coefficients) of the contribution that each of the first four variables makes to the boy's moral values. We also discover the amount of apparent influence of parents' values on siblings, on close companions, and on television-program choices as these factors bear on the typical boy's values.

[Figure 3. Path analysis of hypothesized causal relations. Legible box labels: parents' values, siblings' values, television programs, other influences.]

Despite the usefulness of path analysis for determining the degree of likely influence of different factors on such an outcome as a person's moral behavior, path analysis requires two assumptions that are often hard to defend. First, the technique assumes a closed system. This means accepting the notion that the proposed causal factors (the independent variables) are the only ones that influence the outcome (the dependent variable). Thus, in Figure 3, we are assuming that the boy's moral values result entirely from the values expressed by parents, siblings, companions, and television programs. But it is unlikely that this is true. The boy's values are probably affected by other factors as well. A second limitation is the assumption that the paths we display in our diagram are the actual routes of causal influence in real life. But this assumption may also be in error. Thus, path analysis cannot furnish proof of causal relationships. However, it can serve as a helpful exploratory tool for evaluating how much each factor within a network of hypothesized causes contributes to a final effect.
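The tracing of direct and indirect routes described above can be sketched for a reduced version of the diagram. The sketch keeps only three of the five boxes (parents, siblings, and the boy) and treats path coefficients as standardized regression weights computed from correlations, which is a common convention but an assumption here, not a procedure quoted from the source. The three correlations are invented for illustration, and the function name `path_coefficients` is hypothetical.

```python
def path_coefficients(r_ps, r_pb, r_sb):
    """Standardized path coefficients for the three-box model
    parents -> siblings, parents -> boy, siblings -> boy.

    r_ps: correlation of parents' and siblings' values
    r_pb: correlation of parents' and boy's values
    r_sb: correlation of siblings' and boy's values
    """
    denom = 1 - r_ps ** 2
    p_sib_from_parents = r_ps                       # single cause, so path = r
    p_boy_from_parents = (r_pb - r_sb * r_ps) / denom
    p_boy_from_siblings = (r_sb - r_pb * r_ps) / denom
    return p_sib_from_parents, p_boy_from_parents, p_boy_from_siblings

# Hypothetical correlations among measured values (not from any study).
r_ps, r_pb, r_sb = 0.50, 0.60, 0.55

p_sp, p_bp, p_bs = path_coefficients(r_ps, r_pb, r_sb)

direct = p_bp            # parents -> boy
indirect = p_sp * p_bs   # parents -> siblings -> boy

print("direct parental path:", round(direct, 3))
print("indirect path via siblings:", round(indirect, 3))
print("sum of both routes:", round(direct + indirect, 3))
```

Within this closed three-variable system the direct and indirect routes add up to the observed correlation between parents' and boy's values. That tidy result depends entirely on the two assumptions criticized above: that no other factors matter and that the drawn paths are the real ones.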

Illustrative Correlational Research: Twin Studies There are many types of correlational studies used for investigating the comparative contributions of hereditary and environmental factors to development. The following discussion focuses on one of these types, that of twin studies. A number of others are provided in Chapter 8, which focuses particularly on methods of estimating the contribution of various kinds of environmental forces on moral development. Studies of different sorts of twins compared to non-twins have been a favorite assessment method for estimating the relative contribution to development of heredity versus environment. Twin studies have assumed various forms, with each form intended to yield a particular sort of information about the interaction of genetic and environmental factors. One form utilizes only monozygotic twins, that is, two people whose genetic endowment is identical because the pair have been born from a single ovum that had been fertilized by a single sperm. If the twins in the study were separated from each other early in life, so that each was raised by a different family, then differences in their development can be attributed to environmental rather than hereditary factors. When investigators thus focus attention only on the development of the two separated twins, environment can be said to account for differences between them but we cannot say that similarities between the pair are due solely to the genes unless we provide additional conditions in the research design. That is, the twins, though separated, may still have very similar family situations that might account for likenesses between them. Therefore, investigators may also study the other children living in the same home as the twin in order to see how much the others are like the twin, on the assumption that children within the same family usually share certain similar - though never identical - environmental influences.
If twins, raised in very different environments, are more like each other in some aspect of development than they are like the non-related children of their adopted family, then it is assumed that such an aspect is determined more by genetic than environmental factors. On the other hand, if a twin is more like the unrelated children with whom she is raised than she is like her identical-twin sister, then environmental factors are credited with greater influence than is genetic endowment. This illustration of twin studies does not exhaust the forms that such assessment methods assume. Other variations have been devised as well, as illustrated in the following example. Twin studies have been used to investigate various facets of development - physical, cognitive, emotional, social. The example offered below bears on matters that would qualify as moral thought or action according to any of the definitions of moral described in Chapter 2. In an investigation employing 573 pairs of twins between ages 19 and 70 in Canada and Great Britain, Rushton and his colleagues sought to determine the relative effect of genetic versus environmental factors on subjects' opinions relating to altruism, empathy, and nurturance (Zuckerman, 1985, p. 80). A person's altruistic attitudes were measured by how often the individual engaged in 20 acts, such as donating blood and giving directions to strangers. Empathy and nurturance were estimated by a subject's yes/no or true/false answers to series of statements bearing on these traits. The overall sample of 573 twin pairs was divided into five subgroups: 206 identical female pairs, 90 identical male pairs, 133 fraternal female pairs, 46 fraternal male pairs, and 98 opposite-sex pairs. Within each subgroup, correlations were computed between the attitude scores (altruism, empathy, nurturance) achieved by one twin and the scores

achieved by the other in the pair. The identical twins were found to be far more similar to each other than were fraternal twins, causing the investigators to conclude that a person's genetic endowment exerts a significant influence on the three moral traits measured in the study. In regard to altruism, the researchers estimated that about half of the altruism scores could be accounted for by genetic endowment and half by influences of an individual's particular social environment. Examples of correlational-analysis approaches other than twin studies appear in Chapter 8.
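The text does not say how the investigators turned twin correlations into a proportion. One widely used rule of thumb, offered here only as an illustrative assumption, is Falconer's formula, which doubles the difference between the identical-twin and fraternal-twin correlations. The two correlations below are invented for the sketch and are not the values reported in the study.

```python
def falconer_heritability(r_identical, r_fraternal):
    """Rough heritability estimate: twice the gap between the
    identical-twin and fraternal-twin correlations."""
    return 2 * (r_identical - r_fraternal)

# Hypothetical within-pair correlations on an altruism scale.
r_identical = 0.50   # identical (monozygotic) pairs
r_fraternal = 0.25   # fraternal (dizygotic) pairs

genetic_share = falconer_heritability(r_identical, r_fraternal)
remaining_share = 1 - genetic_share  # everything not attributed to genes

print("estimated genetic share:", genetic_share)
print("remaining share:", remaining_share)
```

The estimate rests on strong assumptions, for example that identical and fraternal pairs experience equally similar environments, so it is better read as a rough indication than as a measurement.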

Controlled Experiments The term controlled experiment in this context means a study in which: (1) a selected characteristic of moral development is first measured in a group of people; (2) the people are then given some sort of experience; and (3) thereafter the original characteristic of moral development is again measured to determine whether the characteristic changed as a result of the experience. This approach to assessment is often used in efforts to determine how much a moral-development lesson or program has influenced participants. A typical application of such methodology is found in a study by Damon and Killen (1982) on the influence of peer interaction on children's moral reasoning. Specifically, the researchers wished to learn whether a child's discussing of a moral issue with peers would advance the child's mode of reasoning to a higher step on a moral-reasoning scale. To investigate this matter, Damon and Killen drew a sample of 147 children from kindergarten, first, second, and third grades of an urban American school system, with 56 percent of children coming from middle-class homes and 44 percent from lower-class families. Each child was assigned either to the experimental group or to one of two control groups. Then all were given a standard version of Damon's positive-justice interview, which poses hypothetical problems in distributive justice and asks the interviewees to reason about issues of fairness in the incidents. Two months later, the 78 children in the experimental group took part in a positive-justice peer debate that required the three children who composed each discussion group to reach a consensus about a fair way to divide 10 candy bars among themselves and a fourth child who was not present. (See the more detailed description of Damon's procedure in Chapter 8.) The resulting 26 peer debates were videotaped.
In parallel to these sessions, the 44 children assigned to Control Group 1 met individually with an adult to discuss a hypothetical justice problem. The 22 children in Control Group 2 engaged in neither a peer debate nor a discussion of justice with an adult. Subsequently, members of all three groups were posttested on a version of Damon's positive-justice interview. The researchers then compared children's pretest with posttest scores, and they analyzed the videotapes to learn the social-interactional characteristics of those children who advanced to a higher level of moral reasoning during the peer encounter. The investigators concluded that the children who participated in the peer discussions "were more likely to advance in their moral reasoning than were either children who discussed a similar justice problem with an adult or children who were merely exposed to the pre- and posttests" (Damon & Killen, 1982, p. 347). The obvious advantage of such methodology is that it provides the experimenters greater control over variables that may affect moral development than is generally true when observations are made in spontaneous everyday situations. Furthermore, by conducting

pre- and posttesting and by manipulating the treatment conditions for experimental and control groups, the researchers are typically better able to identify the effect of causal factors than is true in correlational studies. However, a disadvantage of controlled experiments is that they usually fail to reflect the real-life interaction among the multiplicity of variables that apparently contribute to moral decisions and behavior.
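The pretest-posttest comparison across groups can be sketched as follows. The reasoning levels listed are invented for illustration and do not reproduce the data of the study described above. The group labels and the function name `advancement_rate` are assumptions of this sketch.

```python
def advancement_rate(pairs):
    """Share of children whose posttest level exceeds their pretest level.

    pairs: list of (pretest_level, posttest_level) tuples.
    """
    advanced = sum(1 for pre, post in pairs if post > pre)
    return advanced / len(pairs)

# Hypothetical (pretest, posttest) reasoning levels for a few children
# in each condition.
groups = {
    "peer debate": [(1, 2), (2, 2), (1, 2), (2, 3)],
    "adult discussion": [(1, 1), (2, 3), (2, 2), (1, 1)],
    "tests only": [(1, 1), (2, 2), (1, 2), (2, 2)],
}

rates = {name: advancement_rate(pairs) for name, pairs in groups.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.0%} advanced")
```

Because the "tests only" group receives the same pretest and posttest without any treatment, its rate serves as a baseline for change that would have occurred anyway, for instance through maturation or familiarity with the interview.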

Survey Summaries of Studies Frequently investigators do not depend solely on a single experiment or correlational analysis as a means of assessing causal factors underlying moral thought or behavior. Instead, they survey the professional literature on the particular topic in order to report the apparent causes as found under a variety of conditions and with different sorts of people. Such reports are often referred to as reviews. An example of such a review is Cookson's (1982) paper in which he examined the apparent impact of American secondary boarding schools on students' moral development by summarizing key findings from a broad range of studies picturing life in private boarding schools and describing characteristics of boarding-school students and graduates. Cookson (1982, p. 95) concludes that: Through a specific process of assimilation, boarding school students undergo a moral career that is designed to shape their characters and their general social perspective. . . . By teaching students to have the "character" to "shoulder the obligation of being the best", boarding schools instill in their students the idea that service is both an obligation of and a reward to those enjoying the privileges of leadership status. Thus, there is an important relationship between the communal life in boarding schools and their role within the larger society.

Summary The foregoing section has briefly illustrated five approaches that investigators have used for assessing what the 'real' or 'true' causes of moral thought and action are. These approaches have included reasoned arguments, appeals to authority, correlational analyses, controlled experiments, and survey summaries of studies.

Assessing Causal Attributions In the introductory pages of this chapter, I suggested that people sometimes cite 'the nature of things' as the cause behind an event in the realm of morality. Children's ideas of immanent justice are an example of this phenomenon. As Piaget (1932/1948, p. 250) explained, a belief in immanent justice is the conviction that automatic punishments emanate from things themselves. Not only do people, such as parents and teachers, dole out punishment for immoral or 'naughty' behavior, but the physical universe is organized in such a way that people necessarily will suffer consequences for violating the rules of conduct, whether or not any authority is present to witness the transgressions. In effect, misbehavior inevitably results in punishment. Piaget studied immanent justice to learn at what stage of development children believe

in it and at what point they may abandon such a notion and why. His assessment technique, as used with Swiss children from ages 6 through 12, consisted of an interviewer telling a child a story about a misfortune that befell a boy who had misbehaved. Then the interviewer asked if the misfortune would still have occurred if the wrongdoer had not misbehaved. One story involved a boy who, while running from an orchard with stolen apples, crossed a rotten bridge that broke and cast him into the river. The question, then, was whether the boy could have crossed the bridge safely if he had not stolen the apples. In a second story the teacher forbade her pupils to sharpen their pencils alone, but one boy tried it anyway and in the process his knife slipped and cut his finger. So the question is: Would the knife have slipped and cut the finger if the teacher had given him permission to sharpen the pencil? In a third story a boy used his mother's scissors against her orders, then managed to return them without her knowing. The following day when the lad was crossing a stream on a plank, the plank broke and he fell into the water. Would he have fallen in if he had not disobeyed? Why or why not? Other similar stories were also used. In terms of developmental trends, Piaget and his colleagues found that the incidence of belief in immanent justice decreased regularly with advance in age. In a study of 167 pupils, 86 percent of 6-year-olds but only 34 percent of 11- and 12-year-olds believed in immanent justice. Since the time of Piaget's original study, numbers of other investigations of children's attributions have been conducted, most of them through use of stories of a similar variety.

Conclusion

The purpose of this chapter has been to illustrate typical ways of assessing the descriptive causal relationships that people have in mind. The discussion has focused on real and attributed relationships. The term real has been used to identify those relationships that investigators interpret to be the true causes of moral thought and action. The term attributed has referred to factors which children cite as causes but which informed adults would dub misconceptions. It should be recognized, however, that our distinguishing here between real and attributed causes has been merely for convenience of discussion. In a strict sense, every descriptive causal relation is only an attribution, that is, an estimate of cause which may well need to be corrected in the future when better data become available. Because scientists are still at a somewhat primitive stage in their ability to assess the true causes of moral reasoning and behavior, we can expect that a great amount of research effort will be directed toward studying such matters in the future. This effort can also be expected to include further attention to the how and the why of people's moral attributions.

CHAPTER 5

AFFECT IN MORAL DEVELOPMENT

The task of assessing affect or feelings, as the emotional facet of moral development, is fraught with challenging difficulties. An initial problem is that of identifying the nature and place of emotions in the structure of human personality so that emotions might be sufficiently separated from other personality components to be appraised. As an inspection of introductory psychology books will show, in authors' discussions of personality structure, cognition or 'thinking' is usually separated from affect or feeling. Cognition is typically conceived to consist of such intellectual processes as remembering, comparing, contrasting, analyzing, synthesizing, evaluating, and the like (Bloom, 1956). Affect, on the other hand, can be viewed as consisting both of processes (receiving, responding, valuing; Krathwohl et al., 1964) and of conditions or states (joy, depression, pride, shame, anger, guilt, and others). We can ask, then, what is the relationship between these two aspects, the cognitive processes and emotional states? Does each cognitive process have attached to it affective concomitants? If so, what is the nature of this connection? And in moral-decision situations, what influence does an affective state, such as guilt, have over the operation of a cognitive process, such as remembering or evaluating, and vice versa? In assessing the role of affect in moral development, these are crucial and puzzling questions to deal with.

A further evaluation problem is that of determining what medium of communication to use for achieving a valid assessment of affect. In other words, how can one person's emotional state be accurately understood by others? Or, casting the question in quantitative terms, how can emotions be accurately measured? Assessment techniques employed in an effort to answer such questions are reviewed in the present chapter under the headings: (1) projective devices; (2) semantic-differential tests; and (3) biometric approaches.

Projective Techniques

The principal assumption underlying the use of projective techniques is that there is no inherent or necessary 'meaning' in any stimuli in the world. Rather, a person assigns meaning to stimuli in terms of her own needs, hopes, fears, and memories of past experiences that were similar in some way to the present stimuli. So the assumption is that people project meanings, both cognitive and emotional, onto stimuli they encounter in the world. This assumption has been the basis for the creation of a variety of appraisal devices called projective approaches or techniques for estimating the affective characteristics of personality. Perhaps the best known of these devices is the Rorschach Test, which consists of 10 standard ink blots first organized as a measuring device by a Swiss psychiatrist, Hermann Rorschach, shortly before 1920. The test is typically administered individually by a psychologist who shows a person each card separately and asks, "What do you see here?" However, such ink-blot tests, because they fail to focus a subject's attention on matters bearing particularly on morality, have been less useful in revealing emotional components of moral development than have picture-story tests.

Picture-Story Tests

The best known picture-story projective device is the Thematic Apperception Test (TAT) (Murray, 1943), which consists of a series of drawings, most of them showing people carrying out some kind of activity, the nature of which is usually not very clear. The subject is asked to tell or write a story of what seems to be happening in each picture. The psychologist who interprets the responses seeks to identify in the subject's series of answers a pattern reflecting the themes dominating that person's perception of the world. In other words, the interpreter tries to determine the primary interests, hopes, fears, frustrations, pleasures, and motives in the subject's life.

Whereas the original TAT devised by Murray (Morgan & Murray, 1935) is best suited for use with adults, other versions have subsequently been created for use with children. For example, Rachel Henry (1981) developed a Children's Apperception Test intended to measure aggression-anxiety, which she used in studying moral development among five-year-old sons of 'liberal' and of 'authoritarian' mothers. She interpreted her results as supporting "indications in the literature that in addition to fostering high levels of conformity and dissatisfaction, parental restrictiveness combined with punitive, rejecting attitudes fosters displaced aggressiveness tendencies in the child" (Henry, 1983, p. 95). In another version of the picture-story technique, Engel (Engel et al., 1964) created five school-oriented pictures showing school-age children in various ambiguous situations. Subjects' responses to the pictures are scored under eight categories, four of which I would assume could relate to emotional reactions in moral situations (rebellion, constriction, anxiety, punishment), but the other four might have no relation to morality (platitude, magic solution, problem solving, fanciful).

A key concern in scoring subjects' responses to projective materials is that the interrater reliability be high. As implied in the above-noted examples, the system for scoring people's responses consists of assigning the responses to categories that reflect the theoretical construct on which the test has been founded. If the scoring process is to achieve an acceptable level of interrater agreement, the categories into which responses are scored must be precisely defined and differentiated, with the raters trained to readily distinguish among the categories. In the case of the School TAT, Engel and coworkers (1964) reported interrater correlations exceeding r = 0.81 for 126 children's story responses, which is a level the investigators regarded as satisfactory.
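As a rough illustration of what an interrater correlation expresses, the sketch below computes a Pearson r between two raters' numeric scores for the same set of story responses. The scores, the variable names, and the choice of the Pearson coefficient are assumptions made for illustration only; the source does not state which coefficient Engel and coworkers used, and the data are not theirs.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numeric scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and of squared deviations from each rater's mean.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a rater gives constant scores")
    return sxy / sqrt(sxx * syy)


# Hypothetical category scores that two trained raters gave to eight responses.
rater_a = [3, 1, 4, 2, 5, 3, 2, 4]
rater_b = [3, 2, 4, 2, 5, 2, 2, 5]

r = pearson_r(rater_a, rater_b)
print(f"interrater r = {r:.2f}")
```

A value near 1 means the two raters ordered and spaced the responses in nearly the same way; it does not by itself show that either rater placed the responses in the theoretically correct categories.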


Sentence-Completion Tests

Another popular projective approach depends on unfinished sentences or incomplete stories for revealing an individual's mode of interpreting the world. An incomplete-sentence device, aimed at evoking emotional aspects of moral situations, can consist of a series of such unfinished phrases as the following:

(1) I feel guilty if ...
(2) The happiest times in my life are when ...
(3) I am afraid that ...
(4) In rape cases, the person most at fault is ...
(5) When I hear of somebody getting away with not paying income tax, I want to ...
(6) I feel that the police ...

As these examples illustrate, there are two main ways of forming the sentence stem. One way, shown in items 1 through 3, is to identify an emotion (guilt, happiness, fear), then have the respondent offer a situation in which she feels that emotion. The other way, indicated by items 4 through 6, is for the sentence stem to identify a situation or a type of person, then leave the respondent free to express feelings she associates with that situation or person.

Whereas the foregoing six sample items have been designed particularly to assess emotional reactions, sentence-completion items can also be designed to elicit people's definitions of moral concepts (as in offering a meaning for such terms as sin, morality, good behavior, and exploitation) and for determining the nature of people's values (assigning judgments of good or bad to people or to acts identified in the sentence stem). Consequently, sentence-completion items could have been included among the concept-assessment methods in Chapters 3 and 6 as well as in the present chapter.

Apparently none of the formally published sentence-completion tests has been designed specifically to assess concepts, emotions, or values within the domain of morality. Some of the tests, however, include certain sentence stems expected to reveal a respondent's emotional reaction to situations that could be interpreted as moral. For instance, Loevinger and Wessler's (1970) 36-item Washington University Sentence Completion Test in its several forms (women, men, girls, boys) has a few items that may well reflect an individual's moral values, such as: "Usually she felt that sex ..." or "My conscience bothers me if ..." or "Crime and delinquency could be halted if ...". Most of the other items, however, would not necessarily relate to moral matters, such as "When I was younger ..." and "When I am nervous, I am ...". As a further example, Rabin's (1959; Rabin & Thelen, 1964) adaptation of an earlier sentence-completion test contains items scored for fears and guilt, but also includes items intended to reflect children's perceptions of friends and family, their own abilities, and the future. Consequently, investigators wishing to study moral matters exclusively will not likely find ready-made sentence-completion tests that closely match their purposes. They can better achieve their ends by constructing their own sentence-completion items focusing precisely on the categories that compose the theoretical perspective being assessed.

It should be apparent that accurate scoring of sentence-completion protocols requires that the scorers be well versed in the theoretical rationale on which the particular set of sentence stems was founded. And the categories into which respondents' answers are to be placed need to be precisely defined. Loevinger and Wessler (1970, p. 53) found that mastering the scoring system for their test required between one and two months, with the result that a rater could then score a typical protocol in 20 minutes, with interrater reliability exceeding r = 0.88 "regardless of the amount of previous training of the raters above a certain self-taught minimum".

When sentence-completion tests are administered in printed form, with the respondents completing the sentences in writing, a large quantity of responses can be collected at the same time. However, the worth of responses gathered by such a group-administration approach depends on both the reading and writing abilities of the subjects. If the subjects' communication skills are limited (as is true with young children, the mentally retarded, and people from a foreign-language background), then the sentences are better administered orally in an individual-interview setting, since the interview permits the investigator to recognize whether the subject actually comprehends the sentence stem well enough to complete the statement in a fashion that faithfully conveys the subject's opinion.

Finished and Unfinished Stories

Two sorts of stories or anecdotes, finished and unfinished ones, can be used to elicit respondents' descriptions of their feelings about moral incidents. In the case of finished stories, such as ones illustrated in Chapter 3, the investigator presents the story in oral or written form, then asks questions intended to reveal respondents' emotional reactions to the moral issues involved. Such questions as the following can be posed:

How do you think the boy felt after he tattled on his friend? And why?
How would you feel if they accused you of cheating on a test? Would you show your feelings or try to hide them? Why would you do that?
You say that John was angry at being punished? Would you have been angry if you had been John?
Are you happy with the way the story turned out? Why, or why not?

Unfinished stories, as compared to incomplete sentences, offer a respondent more complex situations in which to express emotions. In that sense, stories are perhaps more lifelike than sentences alone. On the other hand, separate sentences enable the evaluator to tap a greater variety of situations and emotions in a shorter time than is possible with stories.

The Generality/Specificity Dimension

In selecting a suitable projective device for assessing the emotional aspects of someone's moral beliefs, evaluators can profitably consider the generality/specificity dimension of projective techniques. This dimension concerns the degree to which a projective device directs the respondent's attention to specific moral concepts and events or specific emotions. Of the four types of devices described above, ink blots are the most general, for their indefinite form places little or no limitation on what a respondent can reasonably say the blots represent. TAT-type pictures offer more direction to the respondent than do ink blots, but they still provide considerable latitude for interpretation because of the intentionally designed vagueness of what is occurring in the scene. Incomplete sentences move a step closer to defining a situation or emotion. And unfinished stories are the most specific in directing a respondent's attention to the variables in a moral-decision event. Consequently, if the evaluator wishes to elicit a subject's general attitude or feeling tone toward life, ink blots and pictures representing imprecisely defined events and emotions are appropriate. However, if the evaluator wishes to direct a respondent's attention at a specific moral concept (fair play, injustice, race prejudice) or emotion (joy, shame, pride, guilt), then incomplete sentences or stories are more suitable.

Semantic-Differential Tests

The semantic-differential technique, originally created by Osgood (1952) for determining the connotations that people attach to words, can be adapted to studying affect associated with moral matters. In its most common form, a semantic-differential test consists of a series of horizontal lines with opposite adjectives at their two ends. For assessing affect, the adjectives relate to feelings that can accompany moral concepts and events. Each line is marked off to form five or seven spaced intervals that represent a graduated scale of meanings between the contrasting adjectives at the two ends, such as:

proud   |____|____|____|____|____|  ashamed
elated  |____|____|____|____|____|  depressed
crying  |____|____|____|____|____|  laughing

With a series of such scales before her, the respondent is asked to consider a particular concept or event in the moral domain (cocaine, self-sacrifice, giving money to the church, copying another student's homework), then to mark on each scale the point between the opposing ends that best represents how she feels about the concept or event. A mark in the exact center would suggest that her feeling is neutral. The closer the mark is to one end of the scale, the stronger the individual's emotion regarding that concept or event. Thus, the instrument is intended to identify where a person's opinion lies between (differential) the meanings (semantic) of the opposing words on the line.

Although the original form of the semantic-differential device featured adjectives as the choices, verbs are also useful in assessing expressed feelings about moral matters, as the following examples suggest: like/dislike, love/hate, smile/frown, avoid/embrace. The scales used for appraising feelings about a series of moral-related concepts can be identical for each concept or they can be recast for each concept. The following examples from a booklet teaching adolescents about the operation of the criminal-law system illustrate semantic-differential scales adapted to the particular concepts that are being inspected (Thomas & Murray, 1982, pp. 138-139). It should be apparent that both emotional and evaluative (desirable/undesirable, proper/improper) aspects are involved in each of the opposing pairs of scale terms.

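Marijuana

smart      |____|____|____|____|____|  stupid
good       |____|____|____|____|____|  bad
sick       |____|____|____|____|____|  healthy
great      |____|____|____|____|____|  awful

Search and Seizure Laws

fair       |____|____|____|____|____|  unfair
confusing  |____|____|____|____|____|  easy to understand
harmful    |____|____|____|____|____|  helpful

Juvenile Detention Hall

scary      |____|____|____|____|____|  fun
unfair     |____|____|____|____|____|  fair
friendly   |____|____|____|____|____|  unfriendly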
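The reading rule described for such scales (a center mark is neutral, and a mark nearer one end signals a stronger feeling) can be sketched in a few lines. This is only a minimal illustration: the function name, the numbering of intervals from the left, and the conversion of strength to a 0-to-1 fraction are my assumptions for the sketch, not part of the instruments cited.

```python
def score_mark(position, intervals=7, left="proud", right="ashamed"):
    """Interpret one mark on a semantic-differential line.

    position  -- interval marked, counted from 1 at the left end (assumed convention)
    intervals -- odd number of intervals on the line (5 or 7 in the common form)
    Returns (leaning, strength): the adjective the mark leans toward, or
    'neutral', and a strength from 0.0 (center) to 1.0 (extreme end).
    """
    if intervals < 3 or intervals % 2 == 0:
        raise ValueError("intervals must be an odd number of at least 3")
    if not 1 <= position <= intervals:
        raise ValueError("position must lie between 1 and the number of intervals")
    center = (intervals + 1) // 2
    offset = position - center
    if offset == 0:
        leaning = "neutral"
    elif offset < 0:
        leaning = left
    else:
        leaning = right
    # Distance from center as a fraction of the largest possible distance.
    strength = abs(offset) / (center - 1)
    return leaning, strength


# Hypothetical marks by one respondent for the concept 'Marijuana' on 5-interval lines.
print(score_mark(5, intervals=5, left="smart", right="stupid"))
print(score_mark(3, intervals=5, left="good", right="bad"))
```

Treating the intervals as equally spaced numbers is itself an assumption; the marks may be better regarded as ordered categories when results are analyzed.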

Biometric Approaches

In the field of social psychology, attempts to measure people's emotional reactions to environmental stimuli have included the use of such physiological indicators as pulse rate, skin resistance, blood pressure, and constriction of the pupil of the eye. The most publicized of these measures in relation to moral matters has been the lie detector. Although some time in the future, physiological measures may prove to be suitable for assessing affect in moral situations, in their present state of development they are of little practical value. For one thing, they measure the strength of emotional reaction to environmental stimuli but not the direction of reaction, since the same degree of change in pulse rate or galvanic skin response can be accompanied by joy, by fear, or by anger. A second problem is that there is great variation in the way people's physiologies respond to the same stimulus. One person's reaction to a real-life moral dilemma may result in increased heart rate, another's reaction may bring increased blood pressure, and a third person's may be accompanied by pupil contraction. A third problem is that the equipment typically used for recording such physiological reactions can alter the way a person normally might respond in real-life moral situations. Such difficulties as these have caused investigators generally to avoid biometric approaches for assessing affective aspects of moral development (Worchel & Cooper, 1979, p. 69).

Conclusion

Researchers and educators have been far less successful in assessing the affective correlates of moral thought and behavior than they have been in assessing the cognitive aspects. Some progress has been made with such approaches as picture-story tests, sentence-completion tests, stories (both finished and unfinished), and semantic-differential tests. However, because the type and strength of emotions are not readily transmitted in words, these techniques reveal far less about emotions than investigators would like to know. Biometric measures have not proven to be very useful in understanding moral development, partly because they may reflect the strength of emotion but cannot show the direction (positive or negative). It seems apparent that much yet remains to be done before the emotional factors related to moral development can be assessed as satisfactorily as the cognitive components are now.

CHAPTER 6

THE DIRECTION AND STRENGTH OF MORAL VALUES

The term moral values, as intended throughout this monograph, means judgments of good/bad or proper/improper that are assigned to ideas or events in the realm of morality. Such judgments are not emotion-free, logical appraisals but, instead, are accompanied by such emotional states as distaste, pleasure, fear, pride, shame, and others. For present purposes, it is useful to consider two characteristics of moral values: their direction and their strength. Direction refers to whether a value is positive or negative. Acts judged as good or proper or promoral are in a positive direction. Those judged as bad or improper or immoral are in a negative direction. Acts or ideas judged to be neutral (neither good nor bad, neither proper nor improper) fail to qualify as matters of morality. They are amoral. Strength means the degree of goodness or badness that an idea or act is conceived to represent. In one person's opinion, an act of adultery may be a mortal sin, its strength extremely negative. In another's view the act may be a mere peccadillo, its strength only slightly negative. The purpose of Chapter 6 is to review methods used by researchers and educators to learn the value-direction and value-strength that people at different stages of development assign to various moral concepts and acts.

As we consider ways of assessing values, we can profitably recall the central issue of Chapter 2: how best to define the domain of moral. Whereas all value judgments consist of drawing conclusions about the worth or goodness or propriety of something, not all value judgments qualify as moral judgments. Moral values are only a subcategory of the general realm of values. Although not everyone defines the domain of moral in exactly the same way, probably no one would claim that the statement "It's really a badly shaped vase" is founded on moral values. Rather, such a judgment is founded on aesthetic criteria. Aesthetic values, when applied to a flower garden, a poem, or a dance performance, are reflected in such phrases as "pleasing to the eye" or "nicely turned metaphor" or "delightfully innovative". Another value subcategory is economic, as in "Investing in electronics is better (financially more profitable) than investing in steel plants". There are also technical values, meaning judgments about how efficiently a machine operates or how well the divisions of a business organization are coordinated to accomplish a goal. Thus, an obvious first step in analyzing techniques for assessing people's moral values consists of identifying the realm of moral as distinct from other value realms. For the purposes of the present chapter, I am considering moral values to be those values reflected in events or behavior encompassed by any of the typical definitions of moral identified in Chapter 2.

It should be apparent that, by the very nature of this monograph, the assessment techniques in nearly all chapters necessarily illustrate ways to evaluate moral values. So what distinguishes the present chapter from the others is merely the focus of interest. Whereas Chapters 3 and 4 centered on ways to determine the kinds of concepts, principles, and causal relations people considered moral, the present chapter centers only on ways of determining the direction (positive or negative) and strength of such beliefs. It is also clear that Chapter 6 is closely linked to the methods of studying affect in Chapter 5, since cognitively positive values are typically accompanied by positive feelings and vice versa. Thus, the affect of Chapter 5 and values of Chapter 6 have been separated only for convenience of analysis, not because we assume that they are distinct entities in moral development.

This chapter is organized around types of assessment techniques, including: (1) interviews; (2) values tests and inventories; (3) achievement tests; and (4) pictorial materials.

Interviews

Because interview techniques have been reviewed in some detail in Chapter 3, it is not necessary to discuss them in detail here. It should suffice only to describe types of interview questions suitable for discovering the direction and strength of a person's values. As shown earlier, a typical moral-development interview consists of the researcher asking a respondent to express an opinion about a moral incident which the person has directly witnessed, has seen dramatized, has read, or has heard the interviewer narrate. In some cases the incident is a completed moral event. Hence, to learn the direction of a respondent's values, the interviewer asks, "Did Carl do the right thing? If so, why? If not, what should he have done and why?". In other cases the incident is left incomplete, with the respondent asked to tell what the characters in the story should do and why.

Whereas the above sorts of questions can reveal the direction of an individual's values, they usually fail to reveal the strength. To estimate the strength of a value, the interviewer can pose a sequence of questions reflecting increasingly undesirable consequences that could result from the moral dilemma, with the respondent asked what should be done under the conditions reflected in each question of the series. This suggested pattern of questioning is founded on the assumption that:

The strength of a value to which a person subscribes may be reflected in the degree to which the person will act on that value in the face of increasingly unfavorable consequences.

The following sequence of questions illustrates such an approach as used with an older adolescent. The illustrative dilemma concerns an extreme sort of value conflict: the choice between someone else's life and one's own welfare. The interviewer not only asks each question in turn but, after the respondent offers an initial answer to a question, the interviewer asks "Why?" or "Why not?" in order to hear directly the value-rationale the respondent uses to support his or her answer.

(1) Imagine that you have driven your mother to the store to buy groceries. As you park the car, a man in another car yells insults at you because you took the parking space that he wanted for himself. Also imagine that under the front seat of your car there is a loaded pistol your father put there for protection. Would you now use the pistol to shoot the man who yelled at you?
(2) Imagine next that the man not only yelled at you, but he also smashed his car into yours. Would you now use the pistol to shoot him?
(3) Let's say the man not only smashed your car, but he jumped out carrying a long knife, opened the door of your car, grabbed your mother's arm, and was about to stab her. Would you now shoot him?
(4) Let's say that instead of threatening your mother, the man went after you with the knife. Would you shoot him?

In using such an approach, we are assuming that the farther down the sequence of questions the respondent declines to shoot the attacker, the more strongly the respondent is adhering to the commandment "Thou shalt not kill". An obvious shortcoming of this technique is that it yields only a person's speculation about hypothetical behavior, rather than revealing the true strength of a given value under real-life conditions. Thus, the results of the interview assessment are best interpreted as being no more than the individual's "publicly expressed value strength".
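The logic of the escalating-question sequence can be summarized in a short sketch. Here the interviewer's record is assumed to be a list of yes/no answers, one per question in order, where True means the respondent said he or she would set the value aside (in the example, would shoot). The function name and the conversion to a 0-to-1 fraction are hypothetical conveniences of mine, not a published scoring rule.

```python
def expressed_value_strength(would_violate):
    """Estimate publicly expressed value strength from an escalating sequence.

    would_violate -- list of booleans, one per question in order of increasing
                     provocation; True means the respondent would abandon the value.
    Returns (steps_held, fraction): how many questions the respondent got through
    before first abandoning the value, and that count as a fraction of the series.
    """
    if not would_violate:
        raise ValueError("at least one answer is required")
    steps_held = 0
    for answer in would_violate:
        if answer:
            break  # first point at which the value is set aside
        steps_held += 1
    return steps_held, steps_held / len(would_violate)


# Hypothetical respondent: would not shoot for insults or a damaged car,
# but would shoot once the mother is threatened with a knife.
print(expressed_value_strength([False, False, True, True]))
```

The result inherits the shortcoming noted above: it records what a person says about hypothetical behavior, and the respondent's stated reasons ("Why?" or "Why not?") are not captured by the count at all.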

Values Tests and Inventories

For more than a half century, psychologists have been devising personality paper-pencil tests and inventories intended to reveal people's personality traits, interests, and values. Scores of such instruments have been published, and an untold number of others have been created for use in specific situations and never published commercially. The items on such tests usually require subjects to respond in one of two ways: (1) to indicate their approval or disapproval of a statement (the agree-disagree or true-false form); or (2) to show along a scale their degree of approval or disapproval of a statement (the scale form). The following examples illustrate how these sorts of items may appear on tests of moral values.

Type 1: Agree/Disagree

Directions. At the left of each of the following statements are the letters A and D. If you agree with the statement, circle the letter A. If you disagree, circle the letter D.

A  D  (1) When facing a moral decision, it is best to stop and ask God what to do.
A  D  (2) A child who steals money from his or her parents should be given understanding help and forgiveness.

Type 2: Scale

Directions. In the blank at the left of each statement, write the code letter that best expresses your opinion about the statement.

Code  Meaning
A  =  Very much agree
B  =  Agree somewhat
C  =  Don't know or not sure
D  =  Disagree somewhat
E  =  Very much disagree

___  (1) When you obey the law, you are always supporting true justice.
___  (2) Official law is the enemy of freedom.

Although each of these types of items can serve for estimating both the direction and strength of a value commitment, the second can be expected to result in a finer determination of strength. One way to use the first type (agree/disagree) to reflect strength as well as direction is to include on the test two or more items that focus on the same value. For example, if we construct eight items that tap a value labeled strict obedience to the law, then the more items a respondent marks 'approve', the stronger that person's commitment to the value is considered to be, with the degree of strength perhaps reported as a percentage (the number of approved items divided by eight). Or, the individual items on the test can be awarded strengths by means of such scaling procedures as those introduced by Guttman (1944), Likert (1932), and Thurstone (1959; Thurstone & Chave, 1929). The second item type (scaled judgments) can be employed in a similar fashion, with each of the five levels of approval/disapproval assigned a numerical weight ranging from 1 (very much agree) to 5 (very much disagree). Again, a number of items are included that focus on the same value. When a respondent's test is scored, the weighted scores from each item are totaled and divided by the number of items. The resulting average is interpreted as representing the strength of the respondent's commitment to the moral value that has been sampled by the items. We next consider the availability of published tests for measuring the direction and strength of moral values. An inspection of the assessment instruments listed in the Buros lnstitute's periodic reviews of tests indicates that even though published assessment devices focusing on values are available, none of them is concerned exclusively or even primarily with moral matters (Buros, 1974, 1978; Mitchell, 1983). 
For example, the Study of Values inventory, which has been in use for more than four decades, yields separate scores for six sorts of values (theoretical, economic, aesthetic, social, political, religious), but only two of them (social and religious) appear to touch on moral values, and even then not very directly (Allport et al., 1951-1960). As a further example, the Life Adjustment Inventory for high-school students contains 14 categories of personal adjustment, only one (religion-moral-ethics) bearing on moral values. The other 13 focus on such diverse areas as study skills, use of leisure time, and art appreciation (Wrightstone & Doll, 1951). In a similar way, such instruments as the Personal Values Inventory (Schlesser et al., 1941-1969) and Personality Rating Scale (Amatora, 1944-1962) include very few items that touch at all on moral matters. The Rokeach Value Survey (1967-1973) is composed of two lists, one list of 12 instrumental values and the other of 12 terminal values. Respondents are asked to rank the items in each list in order of preference. Instrumental values are such personal traits as clean, forgiving, responsible, and broadminded, whereas terminal values are such statements of life goals as salvation, equality, freedom, a comfortable life, a world at peace, wisdom, and national security. The order in which a respondent organizes the values is expected to reveal his or her relative regard for the 12 terms. However, since the terms are at a high level of abstraction and few of them bear even obliquely on typical notions of moral values, the Rokeach instrument would appear to be of no help for assessing the strength and direction of specific moral values a person holds. These characteristics of published values inventories suggest that such devices are of little or no use for assessing people's moral values. Of slightly
greater use are attitude scales produced some decades past to indicate how highly a person regarded such sources of moral authority as the law and religion (Shaw & Wright, 1967, pp. 249-279, 329-357). However, these instruments also treat a very limited range of issues that typically fall within the realm of morality. Thus, I would conclude that published tests of values are of little aid for assessing the direction and strength of the kinds of moral values intended within the definitions of moral reviewed in Chapter 2. However, this does not mean that paper-pencil attitude tests are necessarily useless. On the contrary, I believe that such instruments can be valuable, so long as their contents accurately tap the domain of moral on which an investigator is focusing. In other words, rather than researchers and educators depending on published tests, they may more appropriately create their own. As an aid with this task, they can consult such guides to attitude-scale construction as the books by Henderson and coworkers (1978), Oppenheim (1966), and Shaw and Wright (1967).
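
As a hedged illustration of the scoring arithmetic described earlier in this section for researcher-constructed scales, the following Python sketch computes the two strength indices: the percentage of same-value items a respondent approves, and the mean of the 1-to-5 weights on scaled-judgment items. The function names, the response labels, and the sample data are assumptions introduced here for illustration; they do not come from any published instrument.

```python
# Minimal sketch of the two scoring procedures described in the text.
# All names and sample responses are hypothetical.

def agreement_percentage(responses):
    """Percentage of items marked 'approve' among items tapping one value."""
    if not responses:
        raise ValueError("at least one item is required")
    approved = sum(1 for r in responses if r == "approve")
    return 100.0 * approved / len(responses)


def scaled_average(weights, low=1, high=5):
    """Mean weight across scaled-judgment items tapping one value.

    Weights follow the text: 1 = very much agree ... 5 = very much disagree.
    """
    if not weights:
        raise ValueError("at least one item is required")
    for w in weights:
        if not low <= w <= high:
            raise ValueError(f"weight {w} is outside {low}..{high}")
    return sum(weights) / len(weights)


# Eight agree/disagree items on one value, six of them approved.
agree_items = ["approve"] * 6 + ["disapprove"] * 2
print(agreement_percentage(agree_items))  # 75.0

# Five scaled-judgment items on the same value.
scaled_items = [1, 2, 1, 3, 2]
print(scaled_average(scaled_items))  # 1.8
```

Under the weighting given in the text (1 = very much agree, 5 = very much disagree), a lower mean indicates stronger approval, so an investigator may wish to reverse the weights before reporting the result as a strength score.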

Academic Achievement Tests

As mentioned in Chapter 3, academic achievement tests are tests commonly given in school following a segment of instruction to discover how well students have mastered the learning objectives. Such tests are often used to determine whether the learners have understood facts and concepts. But items can also be designed to evaluate the extent to which students recognize and accept the values (in terms of strength and direction) advocated in the curriculum. If test items are to measure value direction or strength, they cannot simply ask students to identify facts or definitions of concepts (tell what is) but must require students to differentiate between good and bad actions (tell what should be or what is right or is wrong). (Such was true of the items focusing on prescriptive causal relations in Chapter 3.) The distinction between value-free and value-laden items is illustrated in the following examples of five popular types of test questions. Under each type, the first item (a) is intended to be value-free and the second (b) value-laden.

True-False Items

(1a) A felony is a very serious crime, such as murder or robbery with a gun.

(1b) The search and seizure laws in our country are unfair.

Multiple-Choice Items

(2a) What is a narcotic?
__ any drug that makes you sleepy.
__ a pill that makes you excited.
__ a medicine that helps you see and hear better.

(2b) People who use marijuana should:
__ be free to smoke it without being bothered by the police.
__ be warned not to use it.
__ be put in jail for a long time.

Fill-in or Completion Items

(3a) The nation that first used rocket buzz-bombs in World War II was ________.

(3b) Which side carried out the greatest atrocities during the Vietnam war?

Short-Answer or Essay Items

(4a) In the story about the immigrants, who finally gave them shelter and why?

(4b) In the story about the immigrants, which of the people they met did the right thing? Why do you think it was right?

Of the assessment methods reviewed in this monograph, achievement tests are by far the type most frequently used to assess the outcomes of moral-education programs aimed at value-indoctrination or value-clarification. The intention of value-indoctrination programs is to convince learners that a particular set of values is the proper or best one to adopt. In contrast, the three-fold intention of value-clarification programs is to: (1) acquaint learners with alternative systems of values; (2) clarify the advantages and disadvantages of each system; and (3) urge learners to make a free and reasoned choice of the system to which they will subscribe (Raths et al., 1966). The great majority of moral educational programs, as directed by religious or cultural organizations or by such political bodies as national governments, aim at value-indoctrination rather than unbiased value-clarification. Consequently, it is common to find value-laden items included on the achievement tests used in such programs.

An obvious shortcoming of value-related test items is that while they may reveal a student's understanding of the values advocated in the curriculum, they do not necessarily reveal the values to which that student actually subscribes in his own social relations. Clearly, knowing the "right thing to do" is not the same as "doing the right thing". Thus, scores students earn on such achievement-test items can tell educators the extent to which learners are well informed about the values taught in the educational program but they cannot tell whether the learners will choose to, or even be able to, act on such information.

Pictorial Materials

Both researchers and teachers have used pictorial materials for studying the direction and strength of people's values. One reason for using drawings, photographs, slides, or videotapes is to help ensure that subjects understand the moral issue being posed. This is most frequently the case with individuals whose receptive communication skills are limited, as is true with very young children, illiterate youths and adults, people who speak a foreign language, or ones who display hearing deficiencies. For example, Kusche and Greenberg (1983) used the Rhine et al. (1967) series of 48 pictures of good and bad behavior to compare conceptions of good and of bad among 30 deaf children and 30 hearing children between ages 3 and 10. As each child looked at a drawing,
the incident depicted in the drawing was described briefly by the researcher, such as "This little boy is hugging his mommy" or "This little boy is tearing up his book". The description was given orally to the hearing children and in hand language to the deaf. Each picture was a simple line drawing on a 6-by-9-inch card, with the drawings organized into 12 sets, four pictures to a set. Each set contained three pictures intended by the researchers to represent neutral moral values and one picture intended to be either good or bad. As a child inspected the children in the four drawings, the tester said, "Show me the one who is doing something very good or very bad" (Rhine et al., 1967, p. 1037). The good pictures included drawings of a child doing such things as cleaning up her room, helping mother feed the baby, or going to the bathroom in the toilet and not in her pants. The bad drawings included a child kicking a dog, scribbling on the wall, or refusing to go to bed. Neutral drawings showed a child looking out the window, sitting in a chair, or eating a banana (Rhine et al., 1967, p. 1036). By selecting such incidents to compose their test, the authors seemed to assess children's knowledge of social conventions within a specific culture rather than what might be commonly regarded as universal moral standards. And the technique could measure the direction but not the strength of such values. Kusche and Greenberg (1983, p. 145), on the basis of deaf and hearing children's responses to the pictures, concluded that: . . . deaf children evidence a developmental delay in the understanding of both good and bad. By age 4, hearing children know bad quite well, but similar knowledge is not displayed by deaf children till age 6 . . . . Hearing children know good by age 6, but deaf children not until age 10.

Conclusion

The focus of Chapter 6 has been on methods of determining the direction and strength of people's moral values. The concepts of direction and strength are useful in the analysis of moral dilemmas. That is, a moral dilemma can be pictured as a dialectical condition which involves a conflict between two or more values. The dilemma is resolved when the strength of one value (or combination) is sufficiently greater than the strength of another value (or combination) to permit the individual to make a decision about how to act. Thus, if we can assess with some degree of accuracy the direction and strength of an individual's values, we should be able to improve our predictions about how that individual will behave when confronted by moral dilemmas which involve such values.

CHAPTER 7

MENTAL PROCESSES AND STRATEGIES

The term mental processes refers to the complex pattern of operations of the mind that are involved in thinking about moral matters, with particular attention given to those operations leading to moral decisions and action. Whereas the information-processing model of moral development pictured in Figure 1 in the first chapter is intended to depict the overall operation of the mind as it interacts with the environment, most of the processes described in the present chapter represent only segments of that overall operation. The term strategies, as used in these pages, refers to a subtype of mental process, a goal-directed mental function that the person can consciously recognize and report. In contrast to strategies are those operations conducted automatically and unconsciously, as illustrated by people's remembering, forgetting, or evaluating an event without being able to report how or why. Methods of assessing mental processes are reviewed in this chapter under three headings: (1) psychoanalytic inquiry; (2) experiments; and (3) critical-life-incident inquiry.

Psychoanalytic Inquiry

In the psychoanalytic tradition as initiated by Sigmund Freud, the human mind or personality is pictured as consisting of three main functionaries that interact to produce thought and overt behavior. The functionaries are the id, ego, and superego. The id is conceived to be the source of human drives, seeking fulfillment of needs without regard for the reality of the outside world. The ego is conceived to be the arbiter between the social id and the outside world, with the ego's main task being that of devising ways of fulfilling id demands in light of the restrictions and available opportunities provided by the 'real world'. The superego is the person's internalized task-master, representing the values the individual has accepted from the social environment, particularly from parents. Hence, the superego is conceived to be the container of one's conscience and ideals. Psychoanalysts have devised assessment techniques for revealing the processes of interaction among these three components of the mind and the outside world. Psychoanalytic theory also posits varied levels of consciousness or self-awareness. The id theoretically operates only at the unconscious level, so that the individual is unable to report either the contents of the id or how it performs. In contrast, the ego and superego are pictured as
operating at both conscious and unconscious levels, so that a person is at least partially aware of the ego's and superego's conduct. It is clear, then, that the hypothetical anatomy of the mind proposed in Figure 1 of the opening chapter of this monograph differs in important ways from the psychoanalytic conception. Whereas both theories include varied levels of consciousness, they differ in their types of mental functionaries. Consequently, in our inspection of psychoanalytic assessment techniques, it is appropriate to alter the model in Figure 1 (Chapter 1) in favor of a Freudian model of the mind.

A key assumption of Freudian psychology is that when a psychological conflict becomes too painful for a person to retain in the conscious mind, the conflict is repressed into the unconscious. However, such repression does not solve the problem, for the conflict continues to fester at the unconscious level, expressing itself only in such masked forms as feelings of anxiety, odd behavior, or a physical ailment. Psychological conflicts may be of several varieties, but often the conflict is between: (1) sexual drives arising from the id; and (2) the rules of proper conduct in the person's social environment, with the rules typically represented by parental standards of behavior. Such conflicts qualify as moral dilemmas under most definitions of the moral domain. The goal of psychoanalytic therapy is to uncover the unconscious conflicts and to describe how they originated in the client's developmental history. When the therapist brings the sources of the conflicts to the client's conscious attention so they can rationally be 'lived through', the conflicts are ostensibly dispelled, along with the disturbing symptoms they have caused. The function of assessment techniques in psychoanalytic theory is to reveal the contents and processes of the unconscious mind.
Two of the most basic traditional methods for accomplishing this have been free association and dream interpretation. The term free association refers to a person's assuming a relaxed condition, such as lying on a couch or sitting in a comfortable chair, and describing aloud the stream of thoughts that drift through consciousness. The psychoanalyst sits nearby, listening to the narrative in an effort to identify themes and symbols that suggest the sorts of disturbing concerns (of distorted concepts and values) hidden within the client's unconscious. In psychoanalytic theory, dreams are considered to be symbolic representations of unconscious problems which press for solution but, because they are unduly painful in their bare state, are normally closed out of the person's awareness. But conflicts can escape detection and express themselves if disguised in symbols when the guardian of conscious thought (the ego functioning as censor) is less vigilant, as during sleep or mental relaxation. Thus, the psychoanalytic evaluator seeks to fathom the symbolic meanings underlying dreams the client relates. A further technique that Freud originally employed for plumbing the depths of the mind was hypnosis, but he later abandoned it in favor of free association. In recent times, some evaluators operating from a dynamic-psychology viewpoint have continued to use hypnosis. Or they have adopted a drug-induced form of hypnosis referred to as narcosynthesis, which involves the administration of such substances as sodium pentothal or sodium amytal to blunt the mind's censoring function and thereby allow a client to reveal information that has been hidden in the unconscious. Psychoanalytic assessment methods are not limited to free association and dream interpretation, but include appraisals of events in everyday life.
For example, the phrase "Freudian slip of the tongue" that has entered popular parlance refers to the analysis of verbal mistakes or of words with double meanings as symbolic indicators of unconscious
motives and meanings. A psychologist administering a standard intelligence test intended to ask the client "Who wrote the Iliad?" but asked instead, "Who wrote the idiot?". From a psychoanalytic viewpoint, this slip could be interpreted as representing the psychologist's unconscious opinion of the client. Or a slip can be in written form, as when a university psychology student began a paper about therapists' roles with the phrase "The rapists' roles are often . . .". However, psychoanalytic assessment of events in everyday life is not limited to interpreting slips of the tongue and the pen. The pattern of a client's social relations with different sorts of people is inspected by the evaluator to reveal unconscious meanings the patterns may reflect in the realm of moral reasoning and behavior. Projective tests, such as those described in Chapter 5, have also been used to investigate mental processes from the viewpoint of depth psychology. Tice (1980, p. 161) has summarized the psychoanalytic approach to such assessment by proposing that the results of one's social interactions from the time of infancy are carried in the mind where: They remain unconscious. Gaining greater insight into these unconscious processes requires extraordinarily disciplined attention. This is what psychoanalytically oriented therapy and clinically attuned observations of external behavior have been designed to achieve - sharper, deeper ways of seeing. The psychoanalytic perspective is broadly interactionist, taking into account interpersonal, familial, and other sociocultural influences on moral development, but it concentrates, as other methods do not, on intrapsychic structures. Thus, one advantage of psychoanalytic assessment techniques is their recognition of unconscious processes, with the assumption that moral values have unconsciously developed as if by osmosis.
On the presumption that "all is not what it seems to be" in moral matters, a person's actions and opinions are not accepted at face value but are interpreted as being frequently symbolic. But accompanying this advantage is the risk that the interpretation will be faulty. There is, at least up to the present time, no objective instrument for effecting symbol interpretation. The interpreter continues to be another human whose view of the world, we must assume, is filtered through his or her own lens of unconscious motives and conflicts. So how do we know that the interpretations the evaluator places on the words and acts of a client are not simply projections of the evaluator's own unconscious conflicts rather than accurate insights into the recesses of the client's mind? In effect, as yet the much debated issue of how to validate psychoanalytic interpretations has not been settled in a way that psychologists generally accept as satisfactory.

With the foregoing sketch of a psychoanalytic perspective as a background, we turn now to the central concern of the present chapter: the assessment of mental processes and strategies. In the Freudian model of personality, the functionaries most concerned with morality are the ego and superego. As mentioned above, the ego is the personality's problem-solver, responsible for recognizing the id's demands (human needs or drives) and, at the same time, for recognizing what opportunities there are in the outside world for fulfilling such demands with the least pain to the individual. It is then the ego's task to devise strategies for simultaneously satisfying the requirements of the id and those of the outside world. The resulting operations and strategies are referred to as ego-defense or ego-adjustment mechanisms. Psychoanalytic assessment techniques are used for analyzing a variety of processes related to the ego's strategies (Freud, 1938/1973).
One ego process of particular importance for moral development is that of identification, which Siegel (1982, pp. 20-21) defines as "a motivation to want to be like others". According to this conception, a child sees his parents as powerful people on whom the child depends for satisfying both emotional and worldly needs. Hence, the child is motivated to imitate the parent, with the dual goal of strengthening the bond with the parent on whom the child depends and of acquiring the parent's admired traits. Identification, then, is believed to be a crucial process in the child's adopting the moral concepts, principles, and values of the parent. But it is not only identification with the parent that contributes to moral development. Identification with peers, movie heroes and heroines, teachers, and others affects development as well. Thus, the analysis of how the identification process works has been a key assessment task pursued by psychoanalytic investigators. However, the task has been rendered particularly difficult by the recognition that identification apparently is not simply a conscious, rational operation. Instead, it appears to function largely, or perhaps entirely (at least for younger children), at an unconscious level and to bear a heavy load of positive affect. These qualities of being unconscious and emotion-laden have made identification a poor candidate for assessment by means of such evaluation techniques as those depicted in Chapter 3 which depend on people's logical analyses of moral dilemmas. Consequently, psychoanalytic theory and assessment techniques have not been accorded much attention in recent times in the literature on assessing moral development. Siegel (1982) has speculated that "Freud's identification [might] have to undergo changes (i.e., become more verbal, conscious, and hence more testable) before a pale form of psychoanalytic theory can come in from languishing alongside the periphery of experimental psychology and enter its mainstream". Typically, psychoanalytic forms of assessment are intended to reveal two sorts of psychological processes.
First is the process by which a client initially adopted such an ego-defense mechanism as a psychosomatic ailment, compulsive lying or stealing, escaping from difficult moral decisions, phobias, or the like. To discover the nature of the initiating process, the evaluator guides the client to recall traumatic incidents in childhood or particular child-rearing practices that initiated the adoption of the ego-defense habit that is in question. Second is the process by which the client currently maintains or rationalizes the continuing use of these ego-adjustment patterns. In sum, the analyst seeks to answer the questions "Through what process did the defense technique first develop, and by what process is it now sustained?". From a psychoanalytic perspective, answering these questions equips the therapist to guide the client in understanding the sources of her psychic troubles and in devising ego strategies that more satisfactorily fulfill both the id's demands and the requirements of the social environment.

Psychoanalytic theory assumes that people are not born with moral values, that is, with knowing right from wrong. Instead, they are born with a capacity to gradually adopt moral values from their environment, incorporating notions of right and wrong into their own personalities through the process of identification. According to Freudian theory, the most prominent period for the initial incorporation of values occurs around ages three to six or seven. Whereas prior to this time, directives about what is proper behavior have all come from other people, once the superego has developed, moral directives, prohibitions, and criticisms come also from inside the personality. The way children have been raised up to this point is assumed to determine the contents of their consciences and ideals. And because not all parents teach the same sorts of moral standards to their children, the contents of one child's superego differ from the contents of another's.
As a consequence, the kinds of thoughts and acts that cause one child guilt, fear, or shame may not be the ones that cause such feelings in another child. Thus, one goal of psychoanalytic assessment is to discover the process by which the particular individual's superego developed. Through learning what experiences during childhood were apparently responsible for the present nature of the client's superego, the therapist tries to guide the client in reconstructing the contents of the superego so the person is a responsible member of society yet is not unduly self-punitive. Not only do psychoanalytic-inquiry techniques serve for revealing the processes by which one's ego and superego have developed, but the techniques serve as well for estimating the processes of interaction among id, superego, ego, and the outside world in order to account for the way an individual arrives at moral decisions. Although throughout this discussion we have centered attention on the use of psychoanalytic assessment for processes and strategies, it should be apparent that such assessment reveals as well the concepts, causal relations, emotions, and values that are components of the moral aspects of personality. As a result, this discussion of psychoanalytic evaluation methods is also relevant to the concerns of Chapters 3 through 6.

Experimental Studies

In contrast to psychoanalytic methods, mental processes have also been studied in experimental settings from other perspectives, including those of: (1) general social-learning and information-processing theories; and (2) particular theories of moral judgment and action, such as that of Kohlberg. The following examples focus first on experiments deriving from general theories, and second on ones deriving from specifically moral theories.

Social-Learning and Information-Processing Studies

An example of a social-learning-theory approach is the work of Bandura on imitation or modeling. Children were shown a film that pictured a model displaying four different types of aggression and accompanying each type with distinctive verbalizations. In one condition of the experiment, the model was severely punished; in a second, the model was generously rewarded with approval and food reinforcers; while the third condition presented no response-consequences to the model. During the acquisition period the children neither performed any overt responses nor received any direct reinforcement and, therefore, any learning that occurred was purely on an observational or vicarious basis (Bandura & Walters, 1963, p. 57). Later the experimenters tested the three groups of children to discover whether the different amounts of observed reinforcement produced different levels of imitative behavior in the children. The results showed that children who had observed the punished-model film performed significantly fewer imitative acts than children in either the rewarded-model group or the no-consequences group. Likewise, boys displayed more imitative behavior than girls, particularly in the punished-model group. Subsequently the investigators offered children in all three groups attractive rewards if they would reproduce the model's actions. Motivated by such incentives, children in all three groups reproduced more of the model's behavior than they had earlier. Furthermore, the amount of imitative behavior was now nearly the same across the three conditions: reward, punishment, and no consequences. And the earlier differences between girls and boys were greatly reduced. These
results were interpreted as supporting a proposal that the influence of models on children's moral behavior involves a process of children's: (1) observing a model's acts; and (2) learning these acts (storing the acts as memories); but (3) exhibiting the acts in behavior only under certain circumstances, such as when the behavior promises to yield desired consequences. From these results, we could infer that the likelihood that children in moral-decision situations will display behavior they observed earlier is influenced not so much by the kinds of consequences (reward, punishment, non-reward) experienced by people in the earlier observed incidents but, rather, is influenced by the consequences they themselves expect to experience in the present moral-decision situation.

Let us next consider experiments founded on information-processing theories. The interest of information-processing researchers focuses mainly on the way a person's trafficking with the environment influences: (1) what the sense organs (eyes, ears, taste buds, skin receptors, and others) take in; (2) how these sensations are interpreted in the central nervous system; (3) how the interpretations are stored in memory; and (4) how memories are later retrieved for interpreting new sensations or for taking action in the world. The rapidly growing body of research bearing on these processes does not fall strictly within the realm of morality. Instead, it ranges across various realms of human endeavor. But, because the processes that the research treats are all intimately involved in moral development as well as in other domains, the methods of assessing the processes warrant attention here. The following two examples illustrate sorts of mental processes which are not limited to moral situations but which, nevertheless, are important in moral development.

Example 1: Perceptual Attention

It is commonly assumed that what people learn and are thus able to recall and use in decision-making is dependent on what they have paid attention to in the learning situation. This is true both in unguided learning that occurs during the routine of daily living and in the guided instruction offered by parents and teachers. Experiments conducted with children at different age levels have suggested that the process of intentionally focusing attention operates differently for younger than for older children. To illustrate, Hagen and Hale (1973) presented children (ages 5 through 15) a series of pictures that contained two items, with children told which of the items (central item) was to be remembered. At the end of the presentation, the child was asked to recall both the central and peripheral items. While both the younger and older children recall substantial amounts of both types of information, the younger ones seem to ignore the instructions to focus on only central objects, so that they attend to both central and peripheral ones. Recall of central items improves markedly with increasing age, while recall for peripheral ones eventually decreases. Collins and Hagen (1979, p. 90) interpret these findings to mean that: younger children, due to their dispersed mode of perceptual processing, distribute attention between the two sets of pictures, thereby enhancing memory for both. Older children, on the other hand, may be better at directing attention because of the degree to which much of their perceptual processing has become automatic, thus requiring less focal effort. It may be expected that the more processing is automized and performed without focal involvement, the less it will be accessible for effortful recall. Such studies of the attention process appear to yield implications for the assessment of moral development.
In the use of such assessment devices as stories and pictures or films, younger children's attention may be so dispersed among peripheral elements of the
stimulus material that the experimenter fails to gain a clear picture of the children's understanding of the central elements of the depicted moral incident. Thus, for certain types of studies, the stimulus material used with younger subjects may need to be cleared of distracting peripheral elements.

Example 2: Mnemonic Strategies

Earlier we defined strategies as goal-directed mental operations that the person can consciously recognize and report. In moral-education programs, it is important not only that the pupils understand the facts and principles being taught but also that they be able to recall such knowledge when it is needed to arrive at moral decisions. In recent years a rapidly increasing number of studies have been conducted to determine how and when people use consciously applied processes for remembering what they have been taught. For example, Kreutzer and coworkers (1975, p. 29) asked children, ranging from kindergarten through grade 5, the following question to learn how they would seek to remember something they should do in the future: What if you were invited to a birthday party for a friend. How could you make sure you remembered the party? Can you think of anything else you could do? How many different ways can you think of? Significant developmental trends were displayed in the children's responses, with the older children showing a considerably wider range of resources than the younger ones. Perhaps the most straightforward way to remember the party would be to use a note - a strategy suggested by only half of the kindergarteners, but by nearly all of the older children . . . . A majority of the older children remarked that the note should be put in some conspicuous location so that it would be seen often before the day of the party; none of the kindergarteners and a distinct minority of the first-graders mentioned this (Kail, 1979, p. 26). Because the investigators were seeking information about children's conscious strategies rather than subconscious mental processes, the interview technique seemed to be a suitable assessment method, so long as the children's expressive language skills were sufficient for them to cast into words the mnemonic they employed. Implications of such research for moral development are perhaps quite obvious.
In moral education programs which seek to teach facts, concepts, and such traits as responsibility, people's mnemonic strategies enable them to retrieve pertinent material from memory when it is needed for acting in moral situations. In conclusion, hundreds of studies have been conducted in recent years to identify how human information-processing develops. And because people's general information-processing functions also apply in the realm of moral development, the methods of assessment used for understanding general functions are useful as well for understanding processes involved in the moral domain.

Studies Deriving from Moral-Development Theories The following paragraphs offer two examples of approaches used for assessing mental processes theorists have proposed specifically in the realm of moral development.

Example 1: Determinants of Principled Reasoning We may recall from the description in Chapter 3 of Kohlberg's six-stage hierarchy of moral reasoning that at the highest level of reasoning an individual bases moral decisions on universal principles of equal-handed justice rather than on simply a desire to obey the law or to avoid unpleasant consequences. A question can then be asked about what cognitive or personality components are involved in the process of a person's achieving this level of principled reasoning. Kohlberg (1971) proposed that two of the important determinants were the person's: (1) intellectual maturity in terms of Piaget's stages of cognitive development; and (2) ability to adopt roles, that is, to see life from the viewpoints of other people. Both the intellectual maturity and role-taking ability were assumed to improve with advancing age and experience. If we accept this proposal, then what other personality components may also participate in producing principled reasoning? To examine this question, Lapsley et al. (1984) studied the moral reasoning of 96 American youths (ages 14 to the early 20s) to learn whether such personality characteristics as conservative/liberal political attitudes and attitudes toward authorities influenced moral reasoning. To measure the youths' Kohlbergian stages of reasoning, the researchers used the Defining Issues Test (Rest, 1979). The political conservative/liberal dimension was assessed by means of the Wilson and Patterson (1968) Conservatism Scale, a paper-pencil instrument composed of brief labels or catch phrases representing familiar social issues. Two additional paper-pencil devices were administered to assess attitudes toward authority - a semantic-differential test and Ray's (1971) Attitude to Authority Scale. The semantic-differential test offered four concepts for the youths to evaluate - police and government as public, impersonal authorities and mother and father as personal authorities.
Each concept was judged on 10 bipolar five-point adjective scales (good-bad, optimistic-pessimistic, hostile-friendly, altruistic-egoistic, honest-dishonest, kind-cruel, unfair-fair, important-unimportant, worthless-valuable, successful-unsuccessful). Correlations were then computed between the youths' moral-reasoning levels from the Defining Issues Test and their scores on the Conservatism Scale, semantic-differential test, and Attitude to Authority Scale. The results suggested that while an adolescent's conservative/liberal political attitudes do indeed contribute to the process of moral reasoning, attitudes toward authority do not. That is, youths who displayed higher-level principled reasoning were more 'liberal' than ones whose reasoning placed them at lower levels of the Kohlberg hierarchy. Principled reasoners did not favor "religious dogmatism, right-wing politics, intolerance, ethnocentrism, conformity, punitiveness, and authoritarianism" (Lapsley et al., 1984, p. 538). The investigators concluded that in addition to Piagetian cognitive maturity and role-taking ability, attitudes along the conservative/liberal political dimension contributed to the process of moral reasoning. They also concluded that "personality dimensions other than conservatism may be more substantially related to moral thought, but such relationships have yet to be demonstrated using Kohlbergian assessments" (Lapsley et al., 1984, p. 438).
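
The scoring of a semantic-differential instrument of the kind just described can be sketched briefly. The Python fragment below is a minimal illustration and not the procedure Lapsley et al. report: the adjective pairs are those listed in the text, but the 1-to-5 response coding, the choice of which pole counts as favorable, the function name score_concept, and the sample ratings are all assumptions made for the example. It shows why the scales printed with the unfavorable pole first (hostile-friendly, unfair-fair, worthless-valuable) would need to be reverse-keyed before ratings are averaged.

```python
# The ten bipolar adjective pairs from the text, in the order given.
# Assumed response format: 1 = nearest the left-hand adjective,
# 5 = nearest the right-hand adjective.
SCALES = [
    ("good", "bad"),
    ("optimistic", "pessimistic"),
    ("hostile", "friendly"),
    ("altruistic", "egoistic"),
    ("honest", "dishonest"),
    ("kind", "cruel"),
    ("unfair", "fair"),
    ("important", "unimportant"),
    ("worthless", "valuable"),
    ("successful", "unsuccessful"),
]

# Assumption for this sketch: the pole treated as the favorable one in each pair.
FAVORABLE = {
    "good", "optimistic", "friendly", "altruistic", "honest",
    "kind", "fair", "important", "valuable", "successful",
}


def score_concept(ratings):
    """Return the mean favorability (1.0 to 5.0) of one concept.

    `ratings` holds one integer from 1 to 5 per scale, in SCALES order.
    Ratings are flipped where the favorable adjective is on the left,
    so that a higher score always means a more favorable judgment.
    """
    if len(ratings) != len(SCALES):
        raise ValueError("expected one rating per scale")
    total = 0
    for (left, _right), rating in zip(SCALES, ratings):
        if not 1 <= rating <= 5:
            raise ValueError("ratings must lie between 1 and 5")
        total += (6 - rating) if left in FAVORABLE else rating
    return total / len(SCALES)


# Hypothetical ratings by one respondent for the concept "police".
police_ratings = [2, 3, 4, 3, 2, 3, 4, 1, 5, 2]
print("police favorability:", score_concept(police_ratings))
```

A score per concept computed in this way (police, government, mother, father) is the sort of value that could then be correlated with a respondent's moral-reasoning level.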

Example 2: From Thought to Action The second example of an assessment of hypothesized moral-development processes extends past the act of moral reasoning. A continuing source of debate in the field of moral development has been the matter of inconsistencies between people's moral beliefs (moral knowledge and moral reasoning) and their moral behavior. Kohlberg and Candee (in Kohlberg, 1984, pp. 499-581) have proposed a cognitive/affective process which they believe accounts for apparent inconsistencies between people's moral principles and their actions in moral-decision situations. The Kohlberg and Candee model of the relationship between moral judgment and moral action involves a sequence of four steps, or functions, that lead to behavior. First is the person's interpretation of the moral situation in terms of Kohlberg's stages of moral reasoning. Second, the person makes a choice of what action is right or proper in this situation, as determined by her stage of moral reasoning. Third, she judges the extent to which she is responsible for acting in keeping with the moral reasoning. Fourth, whether the judgment of responsibility or obligation is carried into action will finally depend on: the non-moral skills needed for follow-through. These include such cognitive skills as intelligence (i.e. figuring out a plan to achieve the moral result), attention (i.e. persevering in one's chosen plan). (Kohlberg & Candee, in Kohlberg, 1984, p. 536.) To determine the validity of this hypothesized four-step process, and to gather information about how different individuals might carry out the steps, we need methods of assessing: (1) the components of each step; and (2) the way each step in the process connects with each subsequent step. When Kohlberg and Candee argued in support of their theory, they drew on assessment evidence from a variety of empirical studies, numbers of which had been conducted by other people for somewhat different purposes.
Although in no single study were data collected to reflect all of the factors in the authors' model of thought-to-action, Kohlberg and Candee did gather assessments of enough segments of the model from different studies so that the accumulation of empirical evidence could be interpreted as support for the process they postulated. For instance, they compared students' levels of moral reasoning (based on the standard Kohlberg dilemmas in Chapter 3) with students' cheating on tests (of the Hartshorne-and-May and the Grinder varieties in Chapter 9) to establish that there was a tendency for people at higher moral-reasoning levels to cheat less. But the relationship was far from perfect, a result in keeping with the authors' belief that variables other than one's moral-reasoning stage also affect behavior in moral situations. As one way to learn whether the other steps in their model might represent the remaining variables, the authors analyzed transcripts from interrogations of participants from the My Lai massacre of civilians by soldiers in the Vietnam War as well as transcripts from the Watergate incident that precipitated the resignation of United States President Richard Nixon. Analysis of the transcripts enabled the researchers to: (1) estimate participants' levels of moral reasoning; (2) estimate the degree to which the participants considered it their responsibility to do what they saw as the "right thing"; and (3) compare these estimates with the participants' ultimate behavior. This evidence, combined with results of additional studies, was interpreted as verifying the first-through-third steps of the Kohlberg and Candee four-step model, but it did not furnish information about such hypothesized ego controls as intelligence and ability to focus attention on a task. To investigate the ego-control aspect, the authors turned again to results of experiments in which students could cheat to obtain higher scores on tests or in games.
Factor analyses of the results suggested that intelligence and focusing-attention were indeed components that apparently helped determine whether or not a person cheated. The evidence also suggested that the interaction among the several components of the model was complex, so that such a variable as high intelligence would not by itself help much in predicting a person's tendency to cheat. Rather, intelligence had to be combined with the other factors in the model to produce a pattern or profile for the student, and this pattern would serve as a better predictor of cheating or non-cheating than any component individually. In summary, then, Kohlberg and Candee relied on a variety of assessment techniques from various individual studies of moral behavior in order to support their conception of the process that links moral reasoning with moral behavior.

Critical-Life-Incident Inquiry As noted in Chapter 3, one danger of having people judge moral incidents in story form is that such hypothetical situations may fail to reveal how people think when facing real-life moral decisions. In an effort to avoid this danger, some investigators have studied the patterns of thought exhibited by people currently experiencing a moral dilemma in their own lives. A principal mode of this approach to assessing mental processes entails interviewing people who are facing a critical moral incident in their lives, with critical meaning that the individuals' resolution of the dilemma will likely produce serious consequences for their own or others' lives. Critical incidents typically include such events as serious illness or accident to the individual or to others, the death of a loved one, loss of employment, the fracturing of a deep emotional relationship (divorce, separation), moving away from familiar living quarters, and the like. The use of critical incidents is illustrated in Gilligan's (1982, pp. 71-105) study of 29 pregnant women who were considering abortion. She first interviewed them early in their pregnancies, then talked with 21 of them again at the end of the year following their decision. Gilligan used her interviews as a device for exposing the thought processes women employed to resolve the question of whether or not to undergo abortion. In interviewing each woman on two occasions, Gilligan was seeking evidence in support of her proposal: (1) that women define morality in terms of personal caring and responsibility while men define it in terms of justice and fair play; and (2) that the different ways women may reason about morality form a developmental sequence, with progress through this sequence fostered by such critical life incidents as the prospect of abortion. Thus, by looking directly at women's lives over time, it becomes possible to test, in a preliminary way, whether the changes predicted by theory fit the reality of what in fact takes place. In comparing interviews conducted at the time of the abortion decision with those that occurred at the end of the following year, I use the magnification of crisis to reveal the process of developmental transition and to delineate the pattern of change. (Gilligan, 1982, pp. 107-108).

Conclusion The term processes has been used throughout this chapter in reference to the patterns of interaction among components of the mind that account for moral thought and action. I have assumed that such processes operate at both the unconscious and conscious levels of awareness, with the conscious or intentional ones labeled strategies. The samples of processes reviewed in the chapter have been those composing a psychoanalytic conception of moral development, experimental studies from social-learning and information-processing perspectives, and critical-life-incident inquiry. Efforts towards solving the problems of assessing processes are still in their infancy. However, researchers' high level of interest in analyzing psychological processes in recent years suggests that more rapid advances in this field can be expected in the near future.

CHAPTER 8

REAL-LIFE AND SIMULATED ENVIRONMENTS: AS INFLUENCES ON DEVELOPMENT AND AS ASSESSMENT STIMULI

In the model of moral development introduced in Chapter 1, the term environment referred to a set of conditions that exist outside the person, conditions with which the person interacts. Underlying the model is the assumption that different environments influence people's development in different ways. For example, one family's style of child-rearing can produce a different pattern of moral development in a child than does another family's. And one group of peers may influence an adolescent's moral development differently than does another group. In effect, an environment is labeled moral if it can affect development according to whatever definition of moral that one chooses to adopt (Chapter 2). A great quantity of both speculation and empirical research has been directed toward discovering how various types of environments influence development. Typical questions investigators have sought to answer include: How do parents' methods of reasoning with children about morality affect a child's development? What influence do different kinds of peer models have on moral development during childhood, adolescence, and the early adult years? Which sorts of educational experiences are most effective for promoting desirable moral development? How do people's internal maturation and their experiential history combine to affect their interpretation of various kinds of moral environments at successive stages of growth? Thus, one objective in assessing the relationship of environments to moral development is to answer such questions as these, as illustrated briefly in Chapter 4. As an expansion of the discussion in Chapter 4 of the 'true' causes of development, the first part of the present chapter considers assessment problems related to this objective. But the term environments also has a further specialized meaning in relation to assessment. Environments serve as the stimuli for eliciting people's responses to moral situations.
In regard to this second meaning, we can recognize that the great bulk of nearly every chapter of this monograph is about environments, with environments in such instances referring to the stories, interviews, questionnaires, tests, and simulated life situations used by investigators to learn about individuals' moral development. But there is a particular dimension of these assessment environments which receives no direct attention in other chapters - the dimension of realism. The second section of the present chapter focuses on this dimension by treating the questions: How realistic (meaning 'how true to life') are the assessment techniques that are used for eliciting information about people's moral development? And how may the degree of realism of an assessment environment influence conclusions one draws about an individual's moral development? We now turn to the first of the chapter's topics - ways of evaluating the influence of different environments on moral development.

Assessing the Effects of Different Environments As noted in Chapter 4, all modern-day theories of human development are interactionist models, with all theorists recognizing at least two interacting sources of influence on development: (1) heredity, in the sense of genetic endowment; and (2) environment, in the sense of the physical and social settings in which a person grows up. The sorts of environments that typically have been accorded the greatest attention are those of the home and family, the school, the individual's circle of companions, the church, and such mass-communication media as motion pictures and magazines. However, it is not these agencies per se that exert the influence on moral thought and behavior. Rather, as Michel (1985) has pointed out, it is the nature of the growing child's interaction with such environments that affects moral attitudes and acts. The phrase nature of interaction in this case refers to such factors as: (1) the frequency of the child's experiencing a given environment; (2) at what point in the child's life the environment was experienced; (3) the types of human models inhabiting the environment; and (4) the types of consequences resulting from the moral acts that took place in that setting. Although theorists usually agree that both genetic and environmental forces affect moral development, they disagree markedly about which aspects of the environment are the most powerful influences on development and about the manner in which such environmental factors combine with a person's genetic characteristics to produce the observed development. For example, in one of his early proposals about causes underlying moral development, Kohlberg (1971) proposed that four principal factors interact to determine how high in his six-stage moral-reasoning hierarchy an individual will progress.
The first factor, and the one with the greatest genetic component, is the individual's level of logical reasoning as identified in Piaget's cognitive-growth system. The second, a personal factor that probably has both genetic and environmental elements, is the person's desire or motivation, sometimes referred to as the person's needs. The remaining two factors are entirely environmental: (1) the individual's opportunities to learn social roles; and (2) the form of justice in the social institutions with which the child is familiar. Thus, to assess the apparent influence of these two environmental factors, a researcher could first be expected to identify: (1) the kinds of social roles a given individual observes and tries out; (2) the frequency with which the person sees or attempts these roles; and (3) the results the individual experiences as a consequence of assuming the roles. Next, the investigator would need a scheme for categorizing forms of justice in societies and then would require a method of determining which form - or combination of forms - was dominant in the society within which an individual grows up. Somewhat at variance with Kohlberg's proposal is Siegal's (1982, p. 5) notion of four chief causes underlying the development of a sense of fairness in children. First is the child's level of cognitive growth, as illustrated by Piaget's cognitive-development stages, a level depending heavily on internal-maturation factors. But the other three factors are all environmental: (2) the structure of the problems of fairness which the child encounters; (3) adult instruction, including the child's identifying herself with selected adults and imitating them; and (4) peer-group influence, particularly as the child grows older. Siegal's basic guide to how he might assess the influence of environment on moral development is thus provided by his three-factor conception of environmental effects. In other words, he first requires a system for analyzing "problems of fairness" so he can determine the particular structure or pattern of such problems an individual child encounters. Second, he needs a system for categorizing adult instruction as well as a way to determine the manner and degree to which the child identifies with various sorts of adults. Third, he needs a system for categorizing and measuring peer influence. In summary, an initial notion or hypothesis about what aspects of the environment are likely most powerful in affecting moral development serves as the foundational guide to which variables outside the person are to be evaluated. The theorist's next problem, then, becomes one of devising assessment methods that will accurately evaluate each of the defined aspects. A great many aspects, and ways of assessing them, have appeared in both armchair speculation and empirical studies over the past decades. It is far beyond the capacity of this monograph to review even a major portion of them. The best we can hope to accomplish in the limited space of this chapter is merely to illustrate with four cases something of the diversity of ways researchers have performed the two steps of: (1) positing kinds of variables in the environment that significantly affect moral development; and (2) developing approaches which furnish accurate assessments of these variables.

Case 1: Selected Societal and Personal Variables

Over the decades, the environmental variables most frequently studied as likely influences on moral development have been such factors as a person's family condition, socioeconomic status, ethnic background, religion, peers, and access to mass-communication media (television, motion pictures, reading matter). The typical mode of inquiry into these matters has consisted of investigators: (1) assessing people's moral-development status (using such techniques as those described in Chapters 3, 5, 6, and 7); and (2) determining the individuals' status on the selected environmental factor, such as family condition or religion. Then the degree of correlation between moral-development status and the selected environmental factor is computed. However, the exercise cannot stop at this point, since it is possible that an obtained correlation may be casual - that is, by chance rather than causal. Thus, the final step consists of adducing a convincing line of logic intended to demonstrate that the environmental variable has indeed contributed to the person's moral-development status. This approach is illustrated in the studies of deceit conducted 60 years ago by Hartshorne and May (1928), who measured the extent of cheating on a variety of tests and in games among 11,000 American school children and adolescents. In an effort to estimate the probable influence of selected environmental conditions on deceitful behavior, the researchers computed correlations between: (1) the measures of cheating; and (2) children's family condition, cultural background, ethnic or national background, religious affiliation, peer associations, attendance at motion pictures, school setting (teaching methods, school morale, grade in school), and others. Information about pupils' status on these variables was collected through questionnaires filled out by pupils and teachers, sentence-completion and multiple-choice tests completed by pupils, ratings of home and family conditions founded on visits to a sample of 150 homes, observations of classroom teaching, and case histories. In addressing their task of seeking to identify which environmental factors influence deception behavior, the authors noted that "We shall do what we can to pick out the invariable and necessary antecedents of the tendency to deceive, but these are so intermingled in a matrix of irrelevant data that they cannot for the most part be completely isolated from one another . . . . [The apparent causes] are in most cases a mixture of social and biological ingredients which we make only a slight attempt to disentangle" (Hartshorne & May, 1928, p. 167). Their attempt was most diligent, as illustrated by the 250 pages they dedicated to explaining their analyses which resulted in their concluding that: one form of deceit or another is definitely associated with such facts as . . . socioeconomic handicaps, cultural limitations, certain national, racial, and religious groupings [and] . . . frequency of attendance at motion pictures . . . . Where relations between teachers and pupils are characterized by an atmosphere of cooperation and good will, there is less deception, and to this effect the general morale of the school and classroom also contributes. On the other hand, attendance at Sunday school or membership in at least two organizations which aim to teach honesty does not seem to change behavior in this regard, and in some instances there is evidence that it makes children less rather than more honest (Hartshorne & May, 1928, pp. 14-15).
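
The correlational step in this mode of inquiry, together with the caution that an obtained correlation may have arisen by chance, can be illustrated with a short Python sketch. It is offered only as an illustration under stated assumptions: the data are invented, the function names pearson_r and permutation_p_value are hypothetical, and the shuffling (permutation) check is a present-day technique substituted here for whatever computations Hartshorne and May actually performed.

```python
import math
import random


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, trials=2000, seed=0):
    """Estimate how often shuffled data yield a correlation at least as
    strong (in absolute value) as the observed one."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties.
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (trials + 1)


# Invented data for eight pupils: a cheating score and the number of
# motion pictures attended per month (both hypothetical).
cheating = [2, 5, 1, 7, 4, 6, 3, 8]
films = [1, 4, 2, 6, 3, 6, 2, 7]

print("r =", round(pearson_r(cheating, films), 3))
print("approximate p =", round(permutation_p_value(cheating, films), 4))
```

A small p value from such a check speaks only to the question of chance. It leaves untouched the further step the text describes, namely constructing a convincing argument that the environmental variable actually contributed to the behavior.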

Case 2: Likelihood of Detection Environmental influences can be divided into two kinds. One kind builds into an individual's personality (or mental map) a rather lasting characteristic that influences the person's perception of all future moral incidents. An example of this type would be parents' child-rearing habits. The other kind is entirely situational. The nature of the environment during a specific moral encounter - such as a teacher being close by while a pupil is taking a test - influences the person's action on this particular occasion. That is, the nearness of the teacher influences the pupil's estimate of how likely it is that dishonesty will be detected. The method commonly used for assessing this second kind of influence has been that of placing subjects in several situations which can allow cheating, as on tests in school or in games, with the apparent probability of being caught cheating varying from one situation to another. For example, Krebs and Kohlberg (Kohlberg, 1984, pp. 549-550) tested 123 sixth-grade pupils on four activities - writing numbers inside small circles with their eyes closed, manipulating blocks, a model house game, and hitting targets with a ray gun. Whereas numbers of subjects cheated on the circles and blocks activities in which they felt their cheating could not be noticed, few cheated on the other two activities "because of the elaborate gadgetry of the ray gun and house games" which the subjects suspected were electronically rigged to expose anyone who cheated. The authors proposed that this likelihood-of-detection variable probably contributed to the low correlations Hartshorne and May found in children's cheating behavior between one variety of test or game and another variety.

Case 3: Authoritarian Versus Liberal Child-Rearing Sometimes a researcher begins by adopting a conception of environmental influences that has been studied in some detail by others but has yielded inconsistent empirical results in reports from the professional literature. The investigator's next step becomes one of refining both the definitions and the assessment techniques related to these conceptions. Only then is the researcher prepared to compute correlations between the environmental variables and people's morality in order to estimate the extent to which the variables affect morality. An example of this process of refining the definition of environmental variables is found in the work of an Australian psychologist, Rachel Henry (1983). A considerable body of research literature concerning the effects of authoritarian and liberal child-rearing practices on moral development has evolved in recent decades (Hoffman, 1970). One problem encountered in summarizing the results of such studies is that not all investigators have used the same terms to differentiate their contrasting approaches to child raising. (Words other than authoritarian versus liberal that have been used include conventional versus humanistic and authoritarian versus authoritative.) Moreover, the operational meaning of a particular term has varied from one study to another. However, there are still some general attitudes that theorists have proposed as characteristics distinguishing between these contrasting forms of child treatment. It has been claimed that authoritarian or conventional parents expect children to obey absolute standards without question, display a 'cold' (love withdrawal) attitude in their interaction with children, and are prone to use harsh physical or psychological punishment. Liberal or humanistic parents have been said to explain to children the reasons behind the parents' actions, display 'warmth' in parent-child interaction, encourage (or at least permit) children's "verbal give and take", and recognize "the child's individual rights and special ways" (Henry, 1983, pp. 31-32). To investigate the validity of such an authoritarian/liberal dichotomy, Henry (1983, pp. 45-47) interviewed 70 mothers of five-year-old boys and, on the basis of the interviews, assessed the mothers on 34 child-rearing variables on seven-point scales designed to measure the level of demands made of the child in 14 areas of behavior (such as toilet training, modesty, sex play, watching TV), various modes of discipline, warmth, aspects of reciprocity (such as obedience enforcement, intolerance of questioning authority), ritualism of punishment, and following through with punishment. Interrater reliabilities ranged from r = 0.58 to r = 0.89. The ratings of the child-rearing variables were factor analyzed and then subjected to cluster analysis. The analysis revealed seven separate profiles of mothers' child-rearing practices. For example, the analysis distinguished two types of liberal mothers: (1) "genuine" liberals, because they "combined their concern for justice and capacity for relating unconditionally towards their children as individuals with a tolerance for the children's impulse expression"; and (2) doctrinaire liberals, because their ostensible humanistic sentiment "is belied by the conditional nature of the affection accompanying it and their active repression of aggression in their children" (Henry, 1983, p. 55). In like manner, different subtypes of authoritarian mothers were also identified. With these seven child-rearing profiles in hand, Henry was then prepared to compare the types of mothers with the morality of children raised by such women (a study currently underway) in order to determine the extent to which each style of child-treatment influenced children's moral beliefs and actions.

Case 4: Moral Atmosphere

Higgins et al. (1984, p. 75) hypothesized that "moral atmosphere" significantly influences people's judgments of moral responsibility. As a test of their hypothesis, they conducted a study of students in two types of high schools - traditionally governed schools as contrasted with alternative high schools which featured "participatory democracy" management systems that involved students in administrative decision-making. The environmental variable of "moral atmosphere" was conceived to be: "a collectively shared definition of the situation" in which the issue of moral responsibility was faced and the resulting decision regarding "what should be done about it". Moral action usually takes place in a social or group context, and that context usually has a profound influence on the moral decision making of individuals. Individual moral decisions in real life are almost always made in the context of group norms or group decision-making processes. Moreover, individual moral action is a function of these norms or processes . . . . First, participatory democracy puts sociomoral decisions in the hands of the students, giving them a greater sense of responsibility for school-related actions. Second, participatory democracy, we believed, helps create a sense of the school as a caring community . . . . Students' practical judgments of responsibility, we believed, would derive from their perception of the moral atmosphere of the school, that is, from their perception of the school's norms and the school's sense of itself as a community (Higgins et al., 1984, p. 75). In summary, investigators searching for the types of environmental conditions that are the most potent influences on moral development employ a process of: (1) using their own observations or prior studies to hypothesize which environmental variables seem to affect moral development and how these variables operate; and (2) devising assessment techniques to test the validity of their hypotheses. This use of assessment techniques to discover the 'real' environmental causes of development will obviously continue to be one of the most demanding and important of the functions of evaluation methods in the realm of moral development.

Assessment Techniques as Environments: Degrees of Realism and Levels of Participation

As mentioned above, since methods used for assessing moral development represent stimuli outside of the individual person (stories, questions, tests, and the like), assessment methods are types of environments, and the characteristics of such environments can be expected to influence the conclusions one draws about development as the result of using such methods. For this reason, it is useful to inspect the characteristics of different assessment environments in order to estimate what effect various characteristics may exert on the results they yield. Such is the concern of the remainder of this chapter.

To begin, we can recognize that moral environments can be either real-life or simulated. A real-life environment is one encountered during the process of daily living. It has not been devised in an altered form by someone in order to simulate real life (as is true of a story or film) for the purpose either of appraising or of intentionally influencing a person's moral development.

In studies of moral development, the great majority of investigators have employed simulated rather than real-life environments. Their reason for preferring such arranged incidents over real-life situations is the far greater convenience to the researcher of anecdotes, stories, and prepared dramas as assessment devices. By depending on contrived moral incidents, researchers are able to maintain greater control over variables than is true with spontaneous true-life events. For example, Kohlberg's dilemma about the husband's stealing medicine for his wife has been presented in precisely the same way to thousands of respondents, thus providing a common basis for comparing the moral principles of one group of people with those of many other groups.

But an oft-noted disadvantage of arranged environments is that people's judgments of such moral incidents may deviate from the judgments that these same people make in real life. In other words, the assessment of people's moral reasoning in contrived situations may be an inaccurate measure of how they will reason or behave under true-life conditions if the contrived events lack elements of realism that significantly influence people's judgments and actions. Therefore, an important dimension in people's responses to arranged environments is the degree of realism represented by such contrived stimuli. That is, it is important that a method of assessment have ecological validity, which Bronfenbrenner (1977, p. 516) has defined as "the extent to which the environment experienced by the subjects in a scientific investigation has the properties it is supposed or assumed to have by the investigator".

The following pages are dedicated to illustrating methods that have been used to assess moral development both under realistic conditions (assessing people's reactions in real-life situations) and under simulated-environment conditions (stories, dramas, role-playing situations, and the like).

Enmeshed in the issue of real-life versus contrived assessment conditions is the matter of degree of participation in a moral incident. By degree of participation I mean the extent to which an individual is actively, in contrast to passively, involved in the incident. For instance, in a real-life incident of a person being accused of lying, I am assuming that the assessment of how a person will think and act in that situation can differ if the individual is only an observer of the event rather than one of the central actors (the accuser or the one accused).
Even in arranged assessment environments, such as in a sociodrama involving a moral issue, I am assuming that the assessment of a person's moral development can yield different results if the individual is a member of the audience rather than playing a key role in the drama. What I am suggesting is that the closer the person is to assuming an active role in either a real-life or a simulated moral encounter with the environment, the more useful the assessment of the individual's thought or behavior will be for predicting how that person will act in a similar real-life situation in the future.

This hypothesized relationship between level-of-participation and degree-of-realism in assessment situations is pictured in Figure 4. As the arrow in the figure indicates, I propose that the accuracy of an assessment environment for eliciting information about a person's moral behavior increases as: (1) the conditions of the assessment situation become more life-like; and (2) the individual being evaluated assumes a more active role in the moral encounter. This hypothesized relationship between participation and realistic assessment-environments is reflected throughout the following review of assessment techniques. The review begins with real-life assessment situations, then continues with simulated situations.

Real-Life Situations: Spontaneous and Arranged

True-life assessment situations can be either spontaneous or arranged. A spontaneous situation is one that is not preplanned in any way but simply occurs in the normal process of everyday living. An arranged situation is one in which the general setting or incident is preplanned, but the participants see it as a real-life event. Oftentimes they are unaware that the event is being used for assessment purposes. Thus, they are free to act naturally. They are not expected to play-act, as they would be in a sociodrama.

Figure 4. Degree of realism and of personal engagement in assessment environments. [Figure labels. Degree of realism: oral or written story or anecdote, puppetry, motion picture, sociodrama, arranged real life, spontaneous real life. Degree of personal engagement (extent to which the people being assessed take direct action and experience consequences of the moral decisions): observer, direct participant as group member, direct individual participant. Arrow: accuracy of assessment.]

The assessment methods reviewed below illustrate both spontaneous and arranged real-life situations. They involve: (1) inferences from responses; (2) post-participation interviews; and (3) interviews with observers.

Inferences from Responses

One way to use real-life incidents for assessing the moral contents of the mind involves an observer recording social encounters in natural settings, then using the responses of the people engaged in the encounters to infer something about their mental systems. This method can be illustrated by Nucci and Nucci's (1982) study of teacher-pupil interactions in second-, fifth-, and seventh-grade classrooms. During the normal routine of the school day, a trained observer seated at the rear of a classroom noted social transgressions to which the teacher overtly responded, then coded teacher or pupil responses according to 13 predetermined response categories intended to represent something about how the responder had interpreted the event. Examples of such categories are (Nucci & Nucci, 1982, p. 412):

Evaluation of act as unfair or hurtful. Evaluation which indicates that the behavior causes injustice or harm to others.
Focus on motives. Request for transgressor to consider or state the motives for his/her action.
Sanction. Imposition of punishment or indication that punishment will follow if the behavior continued.


To establish the level of interrater reliability of the coding of responses, a second observer rated events at the beginning and at the midpoint of the study. At the beginning, in judgments of 76 events, the percentage agreement between the two observers for the 13 response categories ranged from 83 to 100. At the midpoint, percentage agreement for 120 events ranged from 78 to 100. As suggested by these reliability figures, rather high levels of agreement can be obtained by such a method.

Two of the disadvantages of the foregoing assessment approach are that the researcher must wait until a moral incident occurs before collecting data, and the researcher has no control over what sort of incident may arise. As a consequence, it is difficult to collect information about how different people will act in relation to a particular moral matter under the same general environmental conditions.

In order to ensure that the subjects who are being studied are responding to the same environmental setting, investigators have often sought to arrange life-like situations in which they are able to control the type of social encounter that occurs, yet still maintain a true-life quality in the way the people in the incident interact. Such an approach was used by Hartshorne and May (1928) in their studies of deceit in children. The investigators wanted their assessment experiences to be as true to life as possible and, at the same time, to represent activities that were educationally constructive, or at least entertaining. Thus, in selecting their assessment techniques they included such criteria as the following (Hartshorne & May, 1928, Book 1, pp. 47-48):

The test situation should: (1) be as natural as possible, even though controlled; (2) not subject the child to any moral strain beyond that normally met in life situations; (3) not arouse the suspicions of the subject that deceit was being studied; and (4) have real values for the subject.
The investigators also met such practicality criteria as: the tests to be used in statistical studies should be easy to administer to groups, mechanically scored, and be short enough to be completed during a single class period in school. After choosing their assessment methods, the authors concluded that their criteria were "quite rigid" and that no technique had yet been devised that would meet all of them (Hartshorne & May, 1928, p. 48).

Hartshorne and May's (1928, pp. 47-97) work resulted in the creation or adoption of 22 test situations using ordinary classroom activities, four athletic contests, two parties, and one work-at-home. Examples of classroom activities are ones involving copying from a classmate, changing one's original test answers during the process of scoring one's own test paper, or cheating while working a puzzle or a maze. The athletic contests involved a pupil being left alone to test himself or herself and then record the score for grip strength, lung capacity, pull-up, and long jump. The party games included ones that enabled a child to peek when he was not supposed to, as in pinning the tail on the donkey and sticking an arrow in a target; or children could surreptitiously steal attractive objects that were used in a guessing game.

From their studies of deceit among 11,000 children, ages 8 to 16, Hartshorne and May concluded that the test-retest reliability of their classroom tests was around r = 0.82 to r = 0.86, but that there was a good deal of inconsistency in pupils' scores from one test to another. In other words, the test results suggested that:

neither deceit nor its opposite, "honesty", are unified character traits, but rather specific functions of life situations. Most children will deceive in certain situations and not in others. Lying, cheating, and stealing as measured by the test situations used in these studies are only very loosely related . . . . Whether a child will practice deceit in any given situation depends in part on his intelligence, age, home background, and the like and in part on the nature of the situation itself and his particular relation to it (Hartshorne & May, 1928, pp. 411-412).

However, they did not entirely abandon the notion that there is a general trait of degree-of-honesty that influences how a child will behave in various situations that provide opportunities for deceit. They observed that (Hartshorne & May, 1928, p. 135):

we cannot predict from what a pupil does on one test what he will do on another. If we use ten tests of classroom deception, however, we can safely predict what a subject will do on the average whenever ten similar situations are presented . . . . Ten tests of the IER [Institute of Educational Research] type would yield a reliability coefficient or self-correlation of 0.90.

Thus, Hartshorne and May appeared to believe that while certain conditions specific to a given testing situation could alter the expression of a child's general tendency to deceive or not in that situation, there was still such a general trait of honesty underlying all such situations.

Damon (1977, pp. 52-53) has suggested that even though Hartshorne and May sought to make their assessment settings as life-like as possible:

the experimenters were surprised by the unpredictability of dishonest behavior from one situation to the next, without ever finding out how the children in their study interpreted the situations in question. In fact, it is probable that cheating and lying in Hartshorne and May's situations bear little implication for most childhood codes of honor.

As a corrective to this ostensible shortcoming of Hartshorne and May's approach, Damon advocates interviewing the participants to learn the moral principles on which they founded their decisions in such test situations.
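The projected reliability of 0.90 for a battery of ten tests, quoted above from Hartshorne and May, can be related to the consistency of a single test with a short calculation. The sketch below assumes the Spearman-Brown prophecy formula, the usual way of projecting the reliability of a lengthened test; the source does not state how the 0.90 figure was derived, and the function names are illustrative only.

```python
def spearman_brown(r_single: float, k: float) -> float:
    """Projected reliability of a composite of k parallel tests.

    Assumption: the tests are parallel forms, so the Spearman-Brown
    prophecy formula applies. The source does not confirm this.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return k * r_single / (1.0 + (k - 1.0) * r_single)


def implied_single_reliability(r_composite: float, k: float) -> float:
    """Invert Spearman-Brown: single-test reliability implied by a k-test composite."""
    if k <= 0:
        raise ValueError("k must be positive")
    return r_composite / (k - (k - 1.0) * r_composite)


if __name__ == "__main__":
    # Ten tests with a composite reliability of 0.90, as in the quoted passage.
    r1 = implied_single_reliability(0.90, 10)
    print(f"implied single-test reliability: {r1:.2f}")
    print(f"projected back to ten tests: {spearman_brown(r1, 10):.2f}")
```

Under this assumption, a ten-test composite of 0.90 corresponds to a single-test value of roughly 0.47, which fits the authors' point that one test predicts another poorly while the average over ten similar situations is fairly stable.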
Since the time of Hartshorne and May's early studies, a variety of technological innovations, such as one-way-vision mirrors and videotaping facilities, have expanded experimenters' repertoire of controlled real-life situations. Consider, for example, the technique employed by Johnson and McGillicuddy-Delisi (1983) in studying the ways parents' modes of interacting with their children might relate to preschoolers' knowledge of social rules and conventions. To estimate the typical way parents interacted with their preschool children, the investigators placed the mother or father alone with the child in a room, and asked the parent to read the child a story. This activity was chosen because the investigators assumed that story-reading was a typical event in the lives of the families being studied, and such an event would yield behaviors suitable for rating on the Parent Child Interaction Observation Schedule (Sigel et al., 1977). As the parents read the story to their preschool child, the researchers videotaped the incident through a one-way-vision mirror. The videotape was later used for rating the quality of parent-child interaction. The investigators were mainly interested in the manner in which children responded to parents' ways of expressing expectations about how the child should behave in that situation. Analysis of the ratings led Johnson and McGillicuddy-Delisi (1983, p. 225) to conclude that three- and four-year-olds " . . . seem to benefit more from parents' negative reactions to their behavior than from reiterations of the rule itself or reasons for complying with conventions".

Although such observation techniques as the above can help maintain the real-life quality of social interactions, the amount of information about people's mental contents that can be inferred with confidence from their overt responses is limited. Thus, a technique intended to yield additional information is that of the post-participation interview.


Post-Participation Interviews

This method involves an investigator interviewing a person who has been an active party in a social encounter. One sort of post-participation interview consists of: (1) placing several people in an arranged situation intended to approximate real-life conditions; (2) requiring them to decide a moral issue precipitated by the situation; (3) interviewing each participant separately to learn what that individual's decision will be and why; and finally (4) gathering the participants together again in the presence of the experimenter to arrive at a group decision.

Damon (1977, pp. 62-63) used this technique in studying concepts of distributive justice among middle-class American children ages 4 through 10. Four children at a time (three about the same age and the fourth considerably younger) were placed in a room with beads and string to make bracelets for the two adults who were there to aid them with their task. When enough time had passed so that one child had made more bracelets than the rest, the children were thanked and taken from the room. The youngest child (who inevitably had made the fewest bracelets) returned home, while the other three reentered the room. The one who had made the most was congratulated, and his or her name was written on the chalkboard beside the words "prettiest and most". The name of the larger remaining child was written beside the words "biggest boy" or "biggest girl". The name of the third child was written beside the word "nice". The three were again taken from the room, then returned one at a time to be told that for doing such a good job the children would be rewarded with 10 candy bars. But they themselves had to decide how to divide the 10 among them. The child was asked for a decision about how this division should be made and was then interviewed about his or her reasons for that decision. Finally, the three were brought back together and asked to reach a group decision both in the presence and in the absence of an adult.

By means of such a procedure, Damon and his staff sought to learn children's notions of distributive justice as reflected both in their responses to the individual interview and in their verbal negotiations with the other members of the group when they finally had to arrive at a group judgment about who should get how many candy bars.

It is apparent that Damon's approach did not depend solely on the participants' spontaneous perceptions of the situation. Instead, the children's thoughts were intentionally influenced by the considerations that the experimenter wrote on the chalkboard (prettiest and most bracelets, biggest child, nice child) and by the dismissal of the younger, less adept child from the group during the decision-making process. In such studies, the pattern of reasoning the children display is also influenced by the particular sequence of questions the interviewer asks.

Interviews with Observers

Not only have interviews been conducted with people who have directly participated in social encounters, but investigators have also interviewed bystanders who have witnessed such events. The bystanders are interviewed as a means of learning how they have perceived or interpreted the events. Two variations of this approach are illustrated in the study by Nucci and Nucci (1982) mentioned above. The investigators' purpose was to learn what sorts of encounters children judged to be moral and what sorts socio-conventional (following Turiel's definitions of morality and social convention cited in Chapter 2).

The procedure involved two adults, an interviewer and a coding-observer, who watched for any social encounters that might naturally occur during the course of classroom or playground activities. This method was a variant of the approach used by Nucci and Turiel (1972) for studying nursery-school children's conceptions of moral issues and later revised by Weston and Turiel (1980). As soon as an apparent transgression occurred, the adult observer coded the incident according to 13 response categories. Then a pupil who had witnessed the event was taken aside and asked by the interviewer (Nucci & Nucci, 1982, p. 405):

(1) "Did you see what happened? Can you tell me what happened?"
(2) "Was it right to do (the observed act)? Why/why not?"
(3) "Is there a rule about that at your school?"
(4) "Is that a rule that should always be followed or are there some times when it would be alright to do so (the observed act)? When would those times be?"
(5) "Is the rule about (the observed act) a rule that the teacher or principal could change?"
(6) "Would it be right for the teacher or principal to remove the rule about (the observed act)? Why/why not?"
(7) "What if there weren't a rule in the school about (observed act), would it be right to do it then?"

Judges coded pupil responses as social conventional if the child said the witnessed act would be right if there were no rule or consensual norm bearing on it. Responses were coded as moral if the child said the act would not be right if there were no rule or consensual norm concerning it. One conclusion the researchers drew from the results was that there was a high level of agreement (88 percent) between the children's interpretation of events as moral or social conventional and the judge's classification of the event descriptions, supporting the thesis that "social convention and morality constitute distinct conceptual and developmental domains" (Nucci & Nucci, 1982, p. 409).
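The coding rule and the agreement figure described above can be sketched in a few lines. This is a minimal illustration, not the researchers' procedure: the function names are hypothetical, the rule is reduced to the child's answer to the no-rule question, and the sample data are invented.

```python
from typing import Sequence


def code_response(right_without_rule: bool) -> str:
    """Code a witness's answer to the no-rule question (question 7).

    If the child says the act would be right in the absence of a rule,
    the response is coded as social conventional; otherwise as moral.
    """
    return "social conventional" if right_without_rule else "moral"


def percent_agreement(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Percentage of events on which two sets of classifications match."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * matches / len(labels_a)


if __name__ == "__main__":
    # Invented data: four witnessed events, the children's answers to
    # question 7, and a judge's independent classification of each event.
    child_answers = [False, False, True, False]
    child_codes = [code_response(answer) for answer in child_answers]
    judge_codes = ["moral", "social conventional", "social conventional", "moral"]
    print(percent_agreement(child_codes, judge_codes))
```

The same `percent_agreement` helper would also serve for the interrater checks reported earlier, where two observers' category assignments for the same events are compared.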

Simulated Moral Incidents

As mentioned earlier, by far the largest amount of research on moral development has involved the use of arranged rather than real-life social encounters, with the most popular form traditionally being that of a brief story or anecdote narrated orally to, or read silently by, the subjects whose moral development is being judged. However, as photographic and videotape equipment has become increasingly affordable and convenient to use, more researchers have employed both still pictures and motion pictures as means of presenting arranged environments for revealing information about people's moral development. A review of alternative versions of such simulated environments is offered in the following pages, along with comments about advantages and disadvantages of each variety. The review begins with oral and written stories, then continues with drawings, photographic presentations, and live dramas.

Oral and Written Stories

Because so many examples of moral stories and anecdotes have been offered in Chapters 3 and 4, it is unnecessary to illustrate the most common varieties here. We will focus attention, instead, on less common variants that have been devised in attempts to correct shortcomings of the most popular types used in the past. The following discussion is organized according to types of shortcomings of stories that researchers have sought to avoid or correct.


Unusual moral incidents. Some investigators have sought to arrange for the incidents they submit for judgment to represent as closely as possible the sorts of issues most commonly met in the everyday lives of the people they are studying. Some critics have charged that Kohlberg's moral dilemmas (Chapter 3) fail to meet this standard of being typical of everyday experiences of children, so that the dilemmas may fail to elicit the kinds of solutions people normally apply in their own lives. In this sense, then, such dilemmas as that faced by Heinz, whose wife was dying of cancer, could be considered at a low level on a scale of realism.

Insufficient story content. A principal difference between real-life and arranged environments is that elements found in real-life settings have been left out of arranged settings. And if the omitted elements are ones that would significantly influence a person's moral judgments of a moral situation, then such an arranged environment will be inappropriate for accurately eliciting information about the moral aspects of a person's mind. In effect, the greater the number of significant stimuli that are omitted from stories, the less suitable the stories will be as devices for the assessment of moral development. For instance, Gilligan (1982, p. 100) has charged that Kohlberg's dilemmas lack important information about the complexity of the characters' life conditions:

Hypothetical dilemmas, in the abstraction of their presentation, divest moral actors from the history and psychology of their individual lives . . . . Only when substance is given to the skeletal lives of hypothetical people is it possible to consider the social injustice that their moral problems may reflect and to imagine the individual suffering their occurrence [involves ] . . . .
The following paragraphs offer: (1) examples of research that has been criticized on this score; and (2) steps recommended as correctives for the shortcomings of ostensibly inappropriate types of stories.

In appraising Piaget's (1932) and others' studies of how children's moral judgments are affected by intention and consequence information in moral incidents, Grueneich (1982) has offered an analytical scheme intended to identify features of intentions and consequences that can profitably be included in stories. Grueneich argues that these features can influence people's judgments of: (1) whether a moral issue is indeed involved in a described incident; (2) the degree to which anyone in the incident should be held at fault; and (3) what sanctions would be justified in the case.

Significant features related to intentions include whether an actor in the story produced an effect purposely or accidentally, whether she acted voluntarily or involuntarily (as under duress or under the influence of drugs), whether the actor was careful or careless, and whether the actor's motive was positive (intended to be helpful), negative (intended to do harm), or neutral. Significant information regarding consequences includes whether the consequences were physical (a broken cup) or social (hurt feelings), whether a physical consequence was suffered by an object or a person, the reactions of the actor following the act (apologized versus blamed someone else), the reactions of the victim or beneficiary of the act (forgave rather than retaliated against the actor), and whether there was an authority-figure present (Grueneich, 1982, pp. 30-33).

When Grueneich used his analytical scheme to evaluate the adequacy of Piaget's and others' stories for assessing moral development, he found them wanting. He contends that omitting significant information about intentions and consequences from stories has led to inaccurate conclusions about the way children and adults make moral judgments. As a remedy for such shortcomings in stories, Grueneich has proposed that researchers design their stories according to a system that includes the features which significantly affect people's perceptions and judgments about moral incidents. In addition to presenting his own system for identifying intention and consequence factors, he has recommended the use of such story-grammar analytical schemes as those developed by Mandler and Johnson (1977), Rumelhart (1975), and Stein and Glenn (1979).

The form that such grammars may assume can be illustrated with Stein and Glenn's version that identifies a sequence of six major categories of information about a behavioral episode, plus the relations which connect the categories. The sequence begins with the setting, with its description of the characters, the physical surroundings, and the time of the incident. The second is the initiating event, telling how the incident started. This event then elicits an internal response in the recipient of the initial behavior, such as a feeling, thought, or goal that motivates the individual to attempt an overt action to achieve the goal. The attempt results in a direct consequence which, in turn, evokes a reaction indicating how one or more characters in the incident think, feel, or act in response to the consequence. A researcher guided by this grammar would design stories containing each of the six kinds of information.
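The six categories of Stein and Glenn's grammar, as summarized above, lend themselves to a simple checklist for story design. The sketch below is only an illustration under stated assumptions: the class name, field names, and example story are hypothetical, and the relations connecting the categories are not modeled.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class StoryEpisode:
    """One behavioral episode, with a slot for each of the six categories."""
    setting: Optional[str] = None             # characters, surroundings, time
    initiating_event: Optional[str] = None    # how the incident started
    internal_response: Optional[str] = None   # feeling, thought, or goal
    attempt: Optional[str] = None             # overt action toward the goal
    direct_consequence: Optional[str] = None  # what the attempt produced
    reaction: Optional[str] = None            # how characters respond


def missing_categories(episode: StoryEpisode) -> List[str]:
    """Return the names of the categories a draft story leaves empty."""
    return [f.name for f in fields(episode) if not getattr(episode, f.name)]


if __name__ == "__main__":
    # An invented draft story that omits the character's inner state
    # and the reactions that follow the consequence.
    draft = StoryEpisode(
        setting="Two children in a kitchen after school",
        initiating_event="One child asks the other to fetch a cup",
        attempt="The second child reaches for the cup on a high shelf",
        direct_consequence="The cup falls and breaks",
    )
    print(missing_categories(draft))
```

A draft that returns an empty list contains all six kinds of information; a non-empty list flags the omissions that, on Grueneich's argument, could distort subjects' judgments.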

Subjects' memory limitations. When stories are used as the stimuli in assessments of the mind's moral contents, a further consideration is that of how accurately the subjects remember the narrative. A variety of studies of children's memory of story content indicate that younger children remember fewer significant elements of stories than do older children (Grueneich, 1982; Mandler & Johnson, 1977; Stein & Glenn, 1979). One way researchers have sought to correct for memory deficiencies has been to have children identify the main elements of the story, and then to remind them of elements they have omitted before their moral judgments are assessed.

Two main approaches have been used to check on the accuracy of subjects' memory. The first is free recall, in which children are asked to relate as much of the story as they can (Harris, 1977; Miller & McCann, 1979). The second involves probe questions. The difficulty with free recall is that it takes more time, causes problems for the interviewer in matching the child's narrative against significant story elements, and involves children relating irrelevant details. The probe approach, on the other hand, has the disadvantage of the interviewer's questions sometimes influencing children to draw inferences which they would not have drawn spontaneously, and the questions may not touch on all the story elements that have affected the children's judgments of the incident.

Some investigators have accompanied their story narrations with pictures illustrating key elements of the tale, thereby making it unnecessary for the child to retain these elements in memory when expressing a judgment related to the story incident (Rybash et al., 1979). Although pictures can serve as a mnemonic aid, not all of the key information, such as a character's intentions or feelings, can be portrayed adequately in pictures.

Subjects' comprehension and communication limitations. Not only are children's responses affected by their ability to recall the contents of a moral incident, but responses are influenced as well by how well children comprehend the complexities of the described situation. In effect, children may have an opinion to express about a moral decision, but they will not be able to express it adequately until they clearly understand the various elements in the incident that bear on that decision. A further problem is children's limitations in their ability to express themselves verbally. It is apparent that these problems of comprehension and communication are most often met in studying moral development among younger children and among older subjects who have limited language skills.

One way investigators have sought to surmount such difficulties has been that of accompanying the verbal description with pictures of the moral issues being posed and then asking the subjects to select which of several pictures best represents their solution to the moral issue at hand. For example, Enright and coworkers (1980) devised a distributive-justice-reasoning scale built around moral dilemmas patterned after those Damon (1977) created to study children's concepts of fair ways to parcel out goods and privileges. The following is an example of such a dilemma (Enright et al., 1980, p. 195), which was described while the children looked at a line drawing depicting the characters and objects (children's paintings) involved in the incident.

These boys and girls all go to the same camp. This is Betty, she's the oldest one at the camp; this is Jennifer, whose family does not have money; this is David, who made the most paintings; and this is Matthew. One morning they all thought it would be a good idea if they got out their paints and painted pictures of what they saw around the camp. When they were done, Betty made two paintings, Jennifer made two, David made four paintings, and Matthew made two. After they did this, they asked the person who runs the camp if he would like some of the paintings. He bought all the paintings and gave some nickels to the children. The children, then, had to decide how to split up the nickels. What is the best way to split up the nickels?

In conclusion, investigators have adopted a variety of approaches to render the presentation of traditional moral dilemmas more life-like and comprehensible to the subjects who are making judgments about how to act in such situations. Among these approaches has been the use of motion pictures.

Motion Pictures and Videotapes

Compared to oral and written stories, films and videotapes have rarely been used in the assessment of moral development. And when they have been used, it has more often been in moral-education programs than in scientific investigations. In moral-education settings, films typically serve both instructional and evaluation purposes. As an instructional tool, the motion picture presents to students a dramatized moral-decision incident, which is subsequently discussed by the class members so they will understand concepts, principles, and thought processes involved in such decision-making situations. A motion picture serves as an assessment device when students are asked to answer questions about the dramatized incident (Raths et al., 1966, pp. 117-119).

Obvious advantages of motion pictures over oral or written stories are that the depicted action can advance at the same pace as do events in real life, the emotions and non-verbal communication (body language) of the characters are displayed as in real life, and the environment has not been 'cleaned' of ostensibly irrelevant elements that may actually be significant influences on people's perception of the moral situation. Chandler and coworkers (1973) found that when moral situations were presented on film, children made more effective use of information about the actors' intentions than they did when the moral situations were presented in hypothetical stories. Because of the filmed incidents' realistic dramatic quality, the episodes can more frequently evoke significant emotions in the observers than do briefly described moral dilemmas.

The chief reason that motion pictures have been so rarely used in the assessment of moral development is likely the cost and bother typically required to produce and use them. In terms of time, money, effort, and complexity, oral or written stories are far more convenient. While motion pictures can go a long way toward simulating real-life incidents, the person who is being assessed is still in an observer's rather than a participant's role. Role-playing techniques have been devised to surmount this shortcoming.

Role Playing or Sociodrama

Role playing or sociodrama consists of subjects acting out hypothetical social-interaction episodes, including moral situations. The typical role-playing incident in a school setting consists of the experimenter or the teacher introducing a social situation by identifying the characters involved (including their attitudes, interests, and relationships), the physical setting, and the problem situation in which they find themselves, such as a situation involving a moral conflict. Then students are assigned to play the parts of the characters in the incident. The actors are usually not provided scripts to read but, rather, are expected to act spontaneously as they assume such characters might behave in such a situation. The teacher may stop the drama at any point and invite members of the audience to analyze or to respond to the events they have witnessed. When members of the audience suggest that a particular character might have acted in a different manner to resolve the moral conflict, the teacher can invite those people to start the drama over again and act out their version.

Role playing has been used as an assessment approach in both scientific investigations and moral-education programs. As for its use in scientific inquiry, Mixon (1974, pp. 78-79) has observed that:

Much of social behavior, no matter how well observed and reported, is not well understood. Suppose an investigator is simply puzzled by a social scene or episode and wonders why people in particular roles in a particular context do what they do.
A scenario can be drawn up, and actors repeat the puzzling scene as many times and with as many modifications as are necessary to understand the scene. The scenario method is a way of tinkering with what might be called a working model of a social episode.

An application of role playing to the process of moral decision-making is Mixon's (1974) role-playing reproduction of conditions implied in Milgram's (1963) classic study in which subjects were directed by an experimenter to administer increasingly severe electric shocks to a learner (who was actually a stooge) who was in an adjacent room ostensibly engaged in a nonsense-syllable experiment. Milgram was interested in learning the degree to which subjects would abide by the experimenter's (authority figure's) directions rather than abiding by personal principles of justice or compassion. The results of the experiment showed that most subjects followed the experimenter's directions even when the electric shocks seemed to become quite severe. But Mixon (1974, p. 74) observed that the subjects' reactions of "almost unrivalled compliance" could be interpreted in at least two ways. Should the results be viewed "as an experiment where expected safeguards had broken down, or as an experiment where expected safeguards only appeared to have broken down"? To test these rival interpretations, Mixon devised a role-playing simulation of the Milgram study in which it became perfectly clear to the subjects that "the experimenter believed the 'victim' was being seriously harmed" (Mixon, 1974, p. 80). Under these conditions all actors in the role-playing situation defied the experimenter's commands, an outcome that was quite the opposite of the original Milgram results. This evidence was viewed as supporting Mixon's contention that the process of deciding whether to resist an authority figure's orders in a moral-decision situation is governed in part by an individual's perception of the severity of the consequences to people other than oneself. Mixon concluded that in the original Milgram experiment, the subjects likely continued because they assumed that the investigator would not have urged them on if he had believed the electric shocks were doing serious harm.

In scientific inquiry, role-playing has been recommended as particularly useful at an early investigative stage to reveal possible avenues of research that can subsequently be pursued by means of more controlled laboratory or field studies. In this exploratory stage, role-play can be used to:

(i) represent the whole range of feasible, or acceptable behaviors within a given social context (taxonomic function); (ii) the most frequently occurring, or most typical acts [in a type of] social episode . . . (stereotyping function); and finally (iii) the sequential rules governing the flow of events within a given episode can be subjected to study using the role-play method (time-sequencing function). (Forgas, 1979, p. 41.)

In moral-education programs, sociodramas can be used for assessing how well students can apply in life-like social interaction the principles they have studied in the program. The advantage of role playing over such assessment techniques as interviews about moral dilemmas is that the sociodrama requires immediate, spontaneous responses to other people's behavior, thus making the situation more life-like than is an interview about a moral decision contained in a story. Furthermore, role playing generates and is affected by emotions aroused during the interaction among the characters in the drama, which again is more life-like than interviews about events in a moral anecdote.
And, while people who observe sociodrama from the audience likely become more emotionally involved with the events than they would in simply reading or hearing a moral story, observers do not experience as much life-like involvement as do the people acting in the drama. Thus, because participation as an actor involves more life-like elements than does observation as a member of the audience, we would expect the actors' behavior to be a more accurate indication of how they would respond in real life than the observers' comments are of how the observers would behave. A disadvantage of sociodrama simulations is that the people playing the roles recognize they are in an imaginary situation which does not require them to bear the real-life consequences of their decisions. This difference can significantly alter the decision-making process displayed in the two situations. In addition, subjects "(i) may not know with any certainty how they would actually behave in a real situation; [and] (ii) their responses can be affected by a variety of unintended distortions, such as social desirability, demand characteristics etc." (Forgas, 1979, p. 42). Hence, role playing has advantages over story analysis as an assessment approach but falls short of completely representing true-life conditions.

Conclusion

The purpose of the present chapter has been to present two ways of viewing the relationship between environments and assessment techniques. First, we illustrated approaches used in efforts to assess which elements of environments serve as potent influences on individuals' moral development. Second, we analyzed assessment techniques in terms of how life-like they are as 'environments' for eliciting accurate information about people's moral development. Analyzing assessment techniques for their life-like quality reveals that traditional moral dilemmas in story form are in many ways distortions of true-life conditions, but that alternative methods such as motion pictures and role playing have rarely been used in assessing moral development, probably because of the greater inconvenience entailed in producing and using motion pictures and sociodramas.

CHAPTER 9

MORAL CONSISTENCY

Over the years continuing debates have revolved around issues of moral consistency. One debate concerns the consistency between moral reasoning and moral behavior. When people are in real-life moral-decision situations, will they act on the basis of the moral principles they have expressed in hypothetical moral situations, such as in situations described in story form? A second debate focuses on consistency over time. To what extent are a person's moral thought and moral behavior the same from one day to the next or from one period of life to another? Methods of assessment for answering these sorts of questions are the concern of Chapter 9.

The Relationship Between Thought and Action

Piaget recognized the problem of consistency between thought and behavior when he devised his questioning technique for investigating children's moral development. He wrote that, "The only good method in the study of moral facts [moral behavior] is surely to observe as closely as possible the greatest possible number of individuals [in moral-decision situations] . . . . While pure observation is the only sure method, it allows for the acquisition of no more than a small number of fragmentary facts. And we therefore consider that it must be completed by questioning children at school" (Piaget, 1932/1948, p. 107). As a consequence, he came to rely chiefly on his clinical method in preference to observations in real-life situations.

The purpose of the first portion of this chapter is to inspect techniques used to study the relationship between moral thought and moral behavior. The approaches we will review can be categorized under the headings: (1) anecdotal confirmation; (2) constructing a line of logic; (3) ex-post-facto interviews; (4) observations in life situations; and (5) experimental studies.

Anecdotal Confirmation

The anecdotal method for establishing a relationship between moral reasoning and moral behavior is a rather commonsense, layman's approach, for it consists of an evaluator citing instances from his or her own experience to support a generalization about the connection between expressed moral convictions and moral action. Piaget displayed this sort of confirmation when he spoke of how impressed he was with the 'psychological penetration' of children's answers to questions about the varied reactions of different children to punishment. To support his contention that such answers reflected real-life responses of children and were not simply what he termed a "sugary form of morality" created on this story-telling occasion, Piaget drew his readers' attention to what he assumed were experiences in their own past: "How often, indeed, one sees children stoically bearing their punishment because they decided beforehand to endure it rather than give in to the superior will? It is our belief, therefore, that the answers examined above correspond, up to a point, with the experiences of real life" (Piaget, 1932/1948, pp. 123-124).

Perhaps the most obvious shortcoming of the anecdotal approach lies in the danger of applying to a large number of phenomena a generalization established on a few casually selected instances. In apparent recognition of this, Piaget added the caveat "up to a point". However, what that point is and under what conditions children's opinions about punishment are fanciful rather than true to life is left unclear. Consequently, anecdotal observations would appear to be a proper source of hypotheses about the relationship between expressed thought and action, but science requires something better for convincing confirmation.

Constructing a Line of Logic

A second way of evaluating the connection between thought and action consists of building a chain of reasoning that convinces listeners of the logical necessity of such a relationship. A typical line of argument to support the conviction that our moral behavior does indeed consistently follow our moral thought holds that: (1) moral behavior occurs in situations in which we have a choice of alternative ways to act; (2) in making a decision about which way to act, we have in mind certain alternative behaviors (with one person's perception of alternatives different from another's, depending on their mental ability and past experiences); (3) we next predict the likely consequences that will result from each alternative (with one person's ability to predict consequences accurately different from another's); and then (4) we choose to act in the way that will likely yield the consequences we will find most satisfying. This form of assessment or validation depends, then, not on empirical studies but, instead, on the reasonableness of the argument. And the argument is typically supported, or at least illustrated, with anecdotal examples.
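By way of illustration only, the four-step line of argument can be sketched as a small decision procedure. This is a minimal sketch under stated assumptions, not a model proposed in the literature reviewed here: the `Alternative` class, the example alternatives, the probabilities, and the satisfaction scores are all hypothetical, and weighting each predicted consequence by its probability is one assumed way of reading "likely consequences".

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Alternative:
    """One way of acting that the person has in mind (step 2)."""
    name: str
    # Hypothetical predicted consequences (step 3), each given as
    # (estimated probability, how satisfying the person expects it to be).
    outcomes: list[tuple[float, float]]


def expected_satisfaction(alt: Alternative) -> float:
    """Weight each predicted consequence by its estimated probability."""
    return sum(prob * satisfaction for prob, satisfaction in alt.outcomes)


def choose(alternatives: list[Alternative]) -> Alternative:
    """Step 4: pick the alternative expected to be most satisfying."""
    return max(alternatives, key=expected_satisfaction)


# Step 1: a situation offering a choice among ways to act (invented values).
alternatives = [
    Alternative("return the lost wallet", [(0.9, 5.0), (0.1, 1.0)]),
    Alternative("keep the lost wallet", [(0.5, 6.0), (0.5, -8.0)]),
]

chosen = choose(alternatives)
print(chosen.name)
```

Two people given the same situation could reach different choices simply by holding different lists of alternatives or different predictions, which is the point of the parenthetical qualifications in steps (2) and (3).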

The Ex-Post-Facto Interview

A line of logic can also serve as the foundation for assessment techniques used in settling practical cases of morality. For example, we can use a line of argument as the basis for evaluating people's culpability for what has appeared to be immoral behavior. Following people's seemingly improper actions, we can question them in order to learn why (with why meaning "by what kind of thinking") they behaved in such a fashion. This ex-post-facto interview is typically used in court cases as attorneys seek to reveal the pattern of thought behind an act. If the jury becomes convinced that the defendant had a 'good' reason (promoral intention) for an ostensibly criminal act, the judgment about whether to punish the defendant, or to provide only light rather than severe punishment, is different than if the defendant had a 'bad' reason (immoral intention). Obviously the ex-post-facto interview is also used by parents in the home and teachers in the schoolroom for determining the sorts of consequences to mete out for a child's apparently unacceptable behavior.

Even if the logic underlying the ex-post-facto interview is accepted as valid, the interview technique has serious shortcomings as a trustworthy method for establishing the connection between conscious thought and action in individual cases of behavior. The method depends first on the individual telling the truth when asked to explain her motives and pattern of thought. In law-makers' attempts to reduce the temptation to lie in court, witnesses are required to swear to tell the truth (traditionally with their hand on the Bible, implying that the Supreme Power will punish them if they lie), and they are threatened with earthly punishment for perjury. Yet it is clear that purposely 'twisting the truth' is common when either children or adults are questioned about their unacceptable acts. A second disadvantage is that communication between the questioner and the accused may go awry. The accused may not understand the questioner's terminology or may not be facile in formulating an answer about what she was thinking at the time of the act. Young children and people who normally speak a foreign language are particularly prone to communication errors. Furthermore, the longer the time between the act and the interview, the greater the chance that the person's memory of what she was thinking when committing the act will be distorted.
Finally, the ex-post-facto interview is founded on the assumption that all acts result from consciously reasoned decisions. If, however, we accept the proposal represented in the model of moral development in Chapter 1 that much of one's mental life is unconscious, then we might expect at least part of one's decision-making to take place unconsciously, that is, at a non-reportable level of awareness. Such unconscious 'thought' would therefore not be revealed under normal questioning, even if the accused sought to tell the truth. The term normal questioning has been used here to mean a direct interview technique in which the accused is conscious and "in her right mind". Furthermore, normal questioning assumes that the accused's answers are interpreted literally, not symbolically. However, as the psychoanalytic tradition has demonstrated, beyond normal questioning, evaluation techniques have been devised for revealing unconscious thought processes that the interviewer presumes led to the moral act that is under inspection.

One well-known application of the ex-post-facto interview is represented in the study by Haan and coworkers (1968) of the moral reasoning of students at the University of California in Berkeley who in 1964 defied university authorities' order not to engage in a free-speech sit-in demonstration that would disrupt the normal operation of the university. Two months following the sit-in demonstration, a research team conducted Kohlberg's moral-dilemma interviews with 129 students who had been arrested for participating in the demonstration and with 210 other students drawn randomly from the campus population. Analysis of the Berkeley data by Kohlberg and his colleagues, using the revised scoring system for the Kohlberg dilemmas, revealed that the higher a student's Kohlbergian stage of moral reasoning, the more likely the student had participated in the free-speech demonstration.
That is, of the total sample of 339 students interviewed, only 10 percent of those whose reasoning was at stage 3 had been arrested for demonstrating. Of students scoring at the transitional 3/4 stage, 31 percent had sat in. At stage 4, the figure was 44 percent; at transitional stage 4/5, 73 percent (Kohlberg & Candee, 1984, p. 66). Kohlberg interpreted such results as evidence supporting his hypothesis that moral reasoning (subscribing to a principle that establishes what is 'right' or 'just') is a necessary and strong factor in determining one's moral action. In his opinion, two other factors that participate with the principle in determining behavior are a person's: (1) acceptance of the responsibility to act on the principle; and (2) ego controls, such as intelligence, attention, and delay of gratification, that comprise 'moral willpower' (Kohlberg & Candee, 1984, pp. 71-72).
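The pattern in these percentages can be checked with a few lines of arithmetic. The sketch below uses only the four figures cited above from Kohlberg and Candee (1984, p. 66); the variable names are my own, and the check shows only that the reported rates rise from stage to stage, not why they do.

```python
# Percent of students at each stage who took part in the sit-in, as cited
# in the text, ordered from the lowest stage to the highest.
sit_in_rate_by_stage = [("3", 10), ("3/4", 31), ("4", 44), ("4/5", 73)]

rates = [rate for _, rate in sit_in_rate_by_stage]

# Does every stage show a higher participation rate than the one below it?
is_monotonic = all(lower < higher for lower, higher in zip(rates, rates[1:]))

# Size of the increase, in percentage points, between adjacent stages.
step_increases = [higher - lower for lower, higher in zip(rates, rates[1:])]

print(is_monotonic)     # True
print(step_increases)   # [21, 13, 29]
```

A steadily rising rate is consistent with Kohlberg's interpretation, but it would also be consistent with any other factor that happens to vary with stage of reasoning.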

Observations in Life Situations

The apparent way that people in everyday life judge thought/behavior consistency is by observing how well what a person says about moral matters matches how the person acts in moral situations. However, there are several serious difficulties with such an approach to assessment. First, the observer must be present when the individual who is being judged expresses a moral belief. And waiting for this opportunity to occur spontaneously may well prove futile. But if the observer uses an interview in order to hasten the process of learning the person's thoughts, he risks the errors of interviewing described above, that is, the problems of inaccurate communication and of the subject's not telling the truth. As Piaget noted, still more difficult is the task of observing a person's behavior in a sufficient variety of moral-decision situations to draw trustworthy conclusions about the correlation between moral beliefs and moral acts. And the difficulty of gathering valid data about a single individual obviously increases manyfold when investigators seek to gather such information about large numbers of subjects in order to draw conclusions about people in general. For these reasons, researchers have devised experiments intended to simulate real-life conditions while minimizing the foregoing problems.

Experimental Studies

In the typical experimental study, a person's moral beliefs are first solicited or else assumed. Then the individual is placed in an arranged 'real-life' moral situation, ostensibly unaware that observations are being collected about how well his behavior matches his stated beliefs. Or the 'real-life' moral incident may take place first and the person's opinions solicited later. Such was the procedure in Hartshorne and May's studies. They first gave children various sorts of tasks to complete, including puzzles, party games, and arithmetic problems on which the children could surreptitiously cheat when ostensibly their dishonesty could not be detected. However, the tasks were so arranged that the investigators would learn who had cheated and who had not. A week or so later a member of the research team returned with a paper labeled Personal Data Sheet containing a set of general questions of a more or less personal nature. Typical questions were (Hartshorne & May, 1928, p. 95):

Did you ever cheat on any sort of test? On some of these tests you had a key to correct your paper by -- did you copy any answer from the keys? Do you think to do so is really cheating? On any of these same tests did you copy answers from other pupils' papers? Have you answered all questions honestly and truthfully?

The researchers then compared the answers to these questions with the behavior students had displayed in the tests to determine how consistently the answers matched the behavior.

Another version of the real-life-situation experiment featured a marksmanship contest devised by Grinder (1962), with a variation of this approach also attempted by Lehrer (1967). Grinder created a toy ray gun which was preprogrammed to produce a target-shooting score slightly beneath the score required to earn a marksmanship prize. The situation was arranged so that those who participated in the contest could cheat a bit in reporting their score and thereby earn the prize. The experimenter's intention was to produce a situation in which children's immediate impulse to win the prize was pitted against their conscience, that is, their trait of honesty. Grinder reported that 80 percent of the 12-year-olds in his experiment did indeed cheat. In Lehrer's later version of the experiment, the ray gun was a far more elaborate electronic gadget than the one used by Grinder. Lehrer's device was programmed to yield a pattern of random scores that would bring the child to the brink of success, then permit him to report an incorrect score in order to win. Lehrer discovered cheating among only 15 percent of the junior-high-school students she tested. From comparing the results of the two ray-gun studies, Kohlberg (1984, p. 502) speculated that Lehrer's students guessed that her elaborate gun contained a computer which kept its own score, while pupils who had used the Grinder gun had not believed it was a self-scoring instrument. Consequently, Kohlberg concluded that "moral behavior is essentially defined by situational factors, including expediency".

In summary, the consistency between moral thought and moral behavior has been assessed in a variety of ways, but with only partial success. Before an adequate plan for evaluating the relationship between moral reasoning and moral behavior can be devised, we need a better understanding of the mental processes that link the two.
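The kind of comparison Hartshorne and May made, matching each child's later answers against the behavior the investigators had already recorded, can be sketched as a simple cross-tabulation. The sketch below is illustrative only: the six records are invented, the function names are hypothetical, and the two summary figures are merely one plausible way to express "how consistently the answers matched the behavior", not the statistics the original investigators reported.

```python
from collections import Counter

# Hypothetical records, one per child:
# (observed to have cheated on the task, later admitted cheating)
records = [
    (True, True),
    (True, False),
    (False, False),
    (True, False),
    (False, False),
    (False, True),
]


def cross_tabulate(records):
    """Count how many children fall into each (observed, admitted) cell."""
    return Counter(records)


def agreement_rate(records):
    """Proportion of children whose self-report matches their behavior."""
    matches = sum(1 for observed, admitted in records if observed == admitted)
    return matches / len(records)


def denial_rate_among_cheaters(records):
    """Proportion of observed cheaters who later denied having cheated."""
    admissions = [admitted for observed, admitted in records if observed]
    denials = sum(1 for admitted in admissions if not admitted)
    return denials / len(admissions)


table = cross_tabulate(records)
print(table[(True, False)])          # observed cheaters who denied it
print(agreement_rate(records))
print(denial_rate_among_cheaters(records))
```

The design depends on the behavioral record being trustworthy in its own right, which is why the tasks had to be arranged so the investigators would know who had cheated.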

Consistency Across Time

The consistency of moral thought and behavior from one time of life to another has been studied for both individuals and groups. The obvious method for answering questions about the consistency of an individual is to assess the same variety of thought or action at different junctures in that person's life. The most common way of conducting such an assessment is an ex-post-facto approach, as frequently illustrated in attempts to understand criminal behavior. That is, after a person is charged with breaking the law, we can search back through his personal history to estimate how consistent this incident of lawbreaking is with his behavior at prior times in his life. Despite the popularity of this method, its potential disadvantages render it often untrustworthy. Evidence from the past is frequently in the form of someone's vague recall rather than a written or photographic record made at that earlier time. Furthermore, the information from the past may be only illustrative of a single incident rather than representative of an individual's typical behavior.

In contrast to the ex-post-facto approach, researchers can conduct a longitudinal study by intentionally collecting the same type of evidence about moral reasoning or behavior on several occasions. An example of this approach is found in the series of six studies conducted by Kohlberg and his colleagues, using the sample of respondents whom Kohlberg had tested in his initial study of moral reasoning in 1955-56 (Colby et al., 1983). Following the initial testing of 84 adolescent boys on moral dilemmas (see Chapter 3), as many of the same boys as could be located were retested at three-to-four-year intervals until 1976-77. The main difficulty with conducting such longitudinal research is that of ensuring that a sufficient number of the members in the original sample continue to participate in subsequent testings. Of the original 84 boys in Kohlberg's study, only 10 took part in the complete sequence of six testing sessions. However, 58 participated in at least two of the periodic testings, while 47 engaged in four or more sessions. Hence, the authors used data from the 58 subjects for analyzing the consistency of moral reasoning over time (Colby et al., 1983, pp. 17-18).

A longitudinal approach of this type can be used to assess consistency for both individuals and groups. That is, not only was each individual's consistency in responding to the moral dilemmas traced, but the patterns displayed from one age group to another were also compared. The comparison of age groups indicated that, on the average, the reasons the subjects gave to support their answers to the moral-dilemma question changed systematically from one age group to another, a result interpreted by the Kohlberg group as verifying the authors' proposal that people tend to advance from one stage of moral reasoning to another with the passing of time. But, as the investigators pointed out: "Curves of mean stage usage for a group do not necessarily represent curves of growth for any of the individuals in the group" (Colby et al., 1983, p. 47). Therefore, the authors also analyzed the consistency of moral reasoning over time for individuals. In their report they not only illustrated with particular cases the variations in consistency that can be obtained between one individual and another, but they also identified age periods during which moral reasoning typically is more stable or consistent than during other periods.
Our results support the assumption that relative maturity of moral judgment is a fairly enduring characteristic of individuals throughout their development and becomes even more stable in adulthood. That is, if an adolescent is advanced in his moral judgment relative to his age peers, he is likely to be advanced in adulthood as well. However, the correlations are not as high as to preclude some changes in relative position for many subjects. (Colby et al., 1983, pp. 69-70.)

Despite the value of longitudinal designs in furnishing evidence about both individual and group patterns of moral development, such designs are rarely used. Apparently the two main reasons researchers have avoided longitudinal models are that: (1) a longitudinal approach requires that the investigator wait months or years to complete the study; and (2), as noted above, from one testing to the next, some of the original subjects may be lost so the number of people included in the entire series of assessments may become far smaller than the number that took part in the original testing.

To avoid such problems, most investigators have preferred cross-sectional designs. In such studies, the beliefs or behavior of people in different age groups are studied at the same time, with conclusions then drawn about the extent of consistency from one age level to another. For example, Pratt and his colleagues (1983) developed a story-pair task to learn whether people based their judgments of proper behavior in the pair of stories on a philosophical orientation of fairness as contrasted to utilitarian motives. After using the stories with young, mature, and older adults, the authors concluded that the findings supported their hypothesis that adults become increasingly reflective philosophically as the years pass.
In other words, between one age group and another there was a type of change or inconsistency in responses that caused the researchers to identify the change as representing a shift in philosophical orientation toward moral incidents.
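The loss of subjects that discourages longitudinal work can be made concrete with the participation counts reported earlier in this section for the Kohlberg sample (Colby et al., 1983, pp. 17-18). The following is a minimal sketch of the arithmetic; the counts come from the text, while the function and variable names are illustrative.

```python
ORIGINAL_SAMPLE = 84  # adolescent boys tested in 1955-56

# Number of the original boys reaching each level of participation,
# as reported in the text.
participation = {
    "all six sessions": 10,
    "four or more sessions": 47,
    "at least two sessions": 58,
}


def retention_rate(count, original=ORIGINAL_SAMPLE):
    """Share of the original sample that reached a participation level."""
    return count / original


for label, count in participation.items():
    rate = retention_rate(count)
    print(f"{label}: {count}/{ORIGINAL_SAMPLE} = {rate:.1%}")
```

On these figures, about 12 percent of the original sample completed every session, whereas about 69 percent supplied the two or more testings needed for any comparison over time.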


Clearly, drawing such conclusions from cross-sectional data requires our accepting the assumption that there are no causal differences among the age groups except the effect of passing time. We assume that when the youthful group becomes old, the individuals within it will have grown to hold the same opinions as the views presently expressed by the elderly subjects. But frequently it is difficult to provide convincing evidence that such an assumption is sound, for there may be differences between the groups other than changes that come with increase in age.

Conclusion

In the field of moral development, one of the popular issues attracting the attention of researchers, educators, and lay people alike is the question of the degree of consistency between people's moral reasoning and their moral behavior. In this chapter we have reviewed five methods used in attempts to answer this question: (1) anecdotal confirmation; (2) constructing a line of logic; (3) ex-post-facto interviews; (4) observations in life situations; and (5) experimental studies. In reviewing ways of answering questions about the consistency of people's moral thought and action over time, we have discussed three approaches: (1) ex-post-facto case studies; (2) longitudinal designs; and (3) cross-sectional designs. Although longitudinal designs provide the strongest method for assessing trends in consistency for both individuals and groups, such designs are rarely used because they are difficult to implement.

CHAPTER 10

SOME CONCLUSIONS AND PREDICTIONS

The two-fold purpose of this final chapter is: (1) to summarize main points from preceding chapters and to propose directions that evaluation might profitably take in the near future; and (2) to consider three additional methodological problems not yet adequately solved.

Future Directions in the Assessment of Moral Development

In this section, I propose a series of generalizations that seem supported by the material reviewed in earlier chapters. And in relation to each set of generalizations, I suggest needs and directions for the future. In speculating about what lies ahead, I believe it is useful to focus on both trends and needs. Although these two concepts are related, they are not identical. The term trend, as used here, refers to the question: What can likely be expected in the future? The term need concerns the question: What inadequacies of presently available assessment methods should be remedied so that moral development can be satisfactorily understood? While trends are significantly influenced by perceived needs, trends are also influenced by such additional factors as feasibility and popularity. So, before offering generalizations about the past and speculations about the future, I believe it will be helpful to explain in slightly more detail what I am assuming about trends and needs.

Aspects of Needs and of Trends

In the assessment of moral development, the most basic determinant of what a need will be is the structure of the theory of development employed by the investigator. For example, the organizational framework of this monograph derives from the model of moral development proposed in Chapter 1. Therefore, the task of assessing development becomes one of gathering information about each of the components of the model and about the components' interactions. A different model of development would identify different components about which information is required. For example, a psychoanalytic model of the mind would require information about such functions as the id, ego, and superego at various levels of consciousness. In short, different theoretical structures identify different categories of need.

Needs are not only differentiated by their theoretical categories, but also by their degree. When the methods of gathering information about a category seem rather satisfactory, there is little need for better assessment techniques for that category. But for categories in which the information-gathering methods are poor, there is great need for creating more adequate assessment approaches. By way of illustration, I would suggest that the ways of assessing people's conscious command of moral facts, concepts, and causal attributions as described in Chapter 3 are a good deal more satisfactory than the ways of assessing emotional aspects as defined in Chapter 5. Hence, if future activity in the field of assessing moral development were to be predicted solely on the basis of need, we would expect to find far more investigators focusing on methods for evaluating affective aspects than on ways of evaluating cognitive components.

However, researchers' agendas are not determined only, or even primarily, by the degree of the need. Two other determinants are feasibility and popularity.

Feasibility here refers to how conveniently the research goal can be achieved. For instance, I am convinced that the great quantity of research on moral development conducted on people's judgments of brief verbal moral dilemmas (Chapter 3) is in part a result of the ease with which such methodology can be used. It has been easier for investigators to spend their time refining this simple assessment technique than for them to grapple with the far less manageable problems encountered in attempts to measure affect or to discover unconscious mental processes.

The field of developmental psychology, like other disciplines, is faddish. A topic or theory or assessment approach may be more popular at one time than another.
Editors of books and scholarly journals are affected by such phases of popularity. There are several reasons that investigators may limit themselves to popular assessment methods rather than seeking to adapt or create a different approach. One reason is that they derive ideas about methodology from the current professional literature. The Freuds and Piagets, who veer off the beaten path, are relatively rare. Most workers in the field are less creative and more heavily influenced by what is 'normally' done. So investigators may continue working with popular assessment techniques because they lack the creativity required to do otherwise. A second reason is that researchers who pursue currently 'hot' topics are more likely to be awarded research grants and get their work into print than are those who pursue methods that are in less favor. By deviating from established practice, an investigator risks having his work rejected by such guardians of established practice as editors and members of professional societies.

In conclusion, then, the following speculation is founded on my estimate of what might be expected in the future for the assessment of moral development when we consider: (1) the categories of need defined by the model proposed in Chapter 1; (2) the apparent degree of need within a category; (3) feasibility; and (4) popularity.

Now to the summary of past results and the prediction of future directions. The presentation is organized according to: (1) definitions of moral development; (2) the nature of people's moral facts, concepts, prescriptive causal relations, and attributions of cause; (3) emotional components of moral development; (4) the direction and strength of moral values; (5) processes of moral development; (6) explanations of the 'true' causes of moral development; (7) assessment techniques as environmental stimuli; and (8) consistency between moral thought and behavior as well as consistency over time.


Definitions of Moral Development

Researchers, educators, theologians, parents, social workers, law-enforcement personnel, and others often disagree about the proper way to define the domain of moral behavior. Therefore, to understand moral-development studies and moral-education programs, it is important to recognize which definition a particular author has adopted. And if we are to estimate the extent to which the authors' reported results can be trusted, it is useful as well to compare their definition with their assessment techniques to see how closely the two match. The comparison is intended to reveal the extent of construct or content validity of their evaluation approach. Because authors often fail to explain precisely the definition of moral on which their work is based, their definition may need to be inferred from an analysis of the assessment techniques used in their study. In effect, their operational definition of moral is reflected in what their assessment techniques actually measure.

As for the future, I think it unlikely that the variety of conceptions about what constitutes the moral domain will decrease in number. I do believe, however, that people in the field will become more precise in their definitions so that we understand more clearly what they mean by moral development and how their conceptions compare with others' definitions. In other words, I think that the trend toward greater precision that has been observed in the recent past will continue in the years ahead.

Facts, Concepts, Prescriptive Causal Relations, Attributions

A great number of the investigations of moral development have focused on the nature of the facts, concepts, prescriptive causal relations, and attributions that people hold. The assessment techniques most frequently used to discover people's beliefs about such matters have been individual interviews (principally in the form of people's responses to moral dilemmas) and written tests. Such methods have appeared to be effective, but only for revealing those varieties of moral belief which a respondent can consciously recognize and report.

Some investigators, in an effort to explore not only conscious beliefs but unconscious material as well, have employed such 'depth' assessment methods as Freud's psychoanalytic or Jung's analytic interview approaches. For the same purpose, others have used such projective devices as picture-story tests and incomplete-sentence tests. Typically, these 'depth' methods are used on the assumption that they will yield unconscious content in symbolic form. Subsequently a person's responses must be deciphered to reveal the meaning behind the symbols. But there has been great question in the field of assessment about the validity of deciphered interpretations when they are necessarily produced by other humans whose own unconscious biases can inadvertently distort the meanings they propose.

Despite such limitations of available assessment methodology, the greatest success so far in evaluating moral development has been in discovering people's understanding of moral facts, concepts, prescriptive causal relations, and attributions of cause. Researchers have produced a large body of information about ways of conducting such assessment and about the conditions that influence assessment results and their interpretation.

In speculating about the future, I imagine that the largest amount of published research will continue to involve assessments of people's conscious command of facts, concepts, principles, and causal relations. The reason for this proposed trend is not that the greatest need for information lies in such matters but, rather, that such things are the most feasible to study with the popular assessment techniques of the past. It seems likely that more effort will be invested in refining those techniques than in devising new ways of assessing components of development that have proven more difficult to evaluate.

In an attempt to imbue moral dilemmas with a greater real-life quality, researchers and educators may be expected to develop improved assessment methods that draw on the advantages of electronic technology, such as videotape equipment and films. A videotaped moral drama can contain more variables that may affect people's real-life moral decisions than does the typical brief verbal anecdote that has been so popular for assessing moral reasoning in the past.

Emotional Aspects of Moral Development

Attempts to evaluate the affective components of moral development have met with far less success than attempts to evaluate such cognitive aspects as facts, concepts, and causal relations. Emotions are a dimension of human experience that apparently cannot adequately be represented either in verbal form or by such physiological measures as heart rate, skin conductivity, or pupillary reaction. By adequately I mean that such words as hate and joy or the report of a pulse rate of 135 fail to convey the phenomenological experience of feeling such emotions. In contrast, cognitive elements (facts, concepts, attributions of cause) are more readily conveyed in verbal form, since the intellectual behavior they represent is essentially the manipulation of verbal symbols. People think by manipulating facts, concepts, and their relationships.

Not only do emotions inadequately translate into words, but they seem to involve a heavy measure of the unconscious. Even if people could find the words to communicate the aspects of which they are clearly aware, they apparently still could not express the segment of emotions of which they are not aware.

When the affective aspects of moral thought and action cannot adequately be assessed, we are unable to make much progress in answering such questions as: In what appear to be nearly identical moral encounters, how do the emotions of one person compare with the emotions of another? What influence does such an emotion as fear or anger exert on a person's thought and action in moral situations? And how will thought and action be affected by different degrees or strengths of such emotions? How does the expectation that a given emotion (joy, guilt, depression) will arise during a moral encounter influence the way an individual engages in (or avoids) the encounter? What emotions are aroused by threats of punishment or promises of reward, and how do these emotions and their degree of strength either deter people from, or motivate them toward, engaging in the acts that ostensibly will produce the predicted punishment or reward?

In brief, much work remains to be done if better means of assessing the affective components of moral development are to be produced. And while the need is great, the technical problems to be solved appear so formidable that I doubt much research attention will be given to improving methods for evaluating this aspect of moral development in the near future.


Moral Values -- Their Direction and Strength

As mentioned in Chapter 6, moral values seem comprised of both cognitive and affective aspects. Investigators have been far more successful in assessing the cognitive than the emotional component. Available assessment approaches have enabled people to report their opinions about value direction (whether they approve or disapprove of a policy or behavior) and value strength (whether their approval or disapproval is strong or weak). However, the way emotions combine with these reported opinions so as to fashion the behavior a person will display in moral encounters has not been adequately appraised. Nor has the distinction or relationship between conscious and unconscious values been satisfactorily clarified. Such matters call for greater attention in theory-building and empirical study in the years ahead.

Processes of Moral Development

As suggested in Chapter 7, mental processes can be defined as the complex sequences of interactions among components of the mental apparatus that produce moral development. In recent years there has been a great surge of theoretical proposals and empirical studies appearing under the information-processing rubric. In effect, analyzing and assessing mental processes has become a very popular activity. Furthermore, those people working on such projects have displayed considerable ingenuity in devising methods for the analysis of mental processes. So it is that researchers recognize the need for understanding cognitive processes, new methods (including computer simulation of mental operations) appear to render investigations increasingly feasible, and studying cognitive processes has become a respected popular activity. As a consequence, the prospect seems bright that the near future will witness marked advances in the assessment of processes involved in moral development.

Explanations of the 'True' Causes of Moral Development

A great amount of theorizing and a large number of empirical studies have been conducted to reveal the factors that determine why people develop their particular moral beliefs, thought processes, and behavior. And from the viewpoints of child rearing and constructive social action, questions about the 'real' causes of moral development are the most important ones to answer. These questions also appear to be the most difficult, since conditions of causation in the realm of morality can be extremely complex. Because of both their practical importance and the theoretical challenges they pose, questions of true causes will probably continue to attract a great amount of attention in the future in terms of theory, of improvements in research methodology, and of empirical studies conducted on an increased variety of populations under more diverse circumstances.

Assessment Techniques as Environmental Stimuli

In Chapter 8, assessment techniques were analyzed in terms of how nearly they approximated the conditions that obtain in real-life moral encounters. The two assumptions regarding realism on which the analysis was founded were: (1) The closer an assessment-setting matches the significant characteristics of a real-life environment, the more accurately the results of that assessment will predict a person's moral reasoning and behavior in real-life moral encounters. (2) The more actively a person is involved in resolving a moral dilemma represented in the assessment situation, the more accurately the person's decisions and actions in that situation will predict that individual's moral reasoning and behavior in real-life moral encounters.

When these two assumptions were applied as criteria for judging the quality of realism displayed by the various modes of assessment surveyed in Chapter 8, I concluded that many of the most frequently used assessment methods were found wanting. Much of the information that people appear to derive from the environment in real-life moral encounters to help guide their moral decisions is missing from many of the 'arranged environments' represented by the most popular evaluation techniques. A brief anecdote, like the one about Heinz stealing a drug to treat his wife's illness, lacks many of the environmental components that apparently influence people's real-life moral decisions. In effect, many researchers have appeared to sacrifice the quality of realism for the sake of convenience or feasibility.

I would speculate that the usefulness of the popular sorts of arranged environments as assessment media for predicting people's real-life moral thought and action is very questionable. The inconsistent and usually low relationship found between the results of such assessments and people's moral behavior would seem to support this observation.

As for the future, I believe that both researchers and educators will seek to incorporate more life-like variables into assessments of moral development.
One method for doing so, as mentioned above, is the use of videotaped dramatized moral encounters for students to analyze. Another is the use of moral incidents that naturally occur during students' daily life in school or on the playground, with the students then attempting ex-post-facto analyses of the intentions and emotions involved in the encounters.

Matters of Consistency

A variety of aspects of consistency in moral development have attracted the interest of researchers and educators alike. Two such aspects considered in Chapter 9 were those of consistency between moral reasoning and moral behavior and consistency of moral thought and behavior over time.

The amount of attention given to assessing the relationship between moral thought and action has not yielded answers that are generally considered satisfactory. And while there is no lack of hypotheses about the factors that contribute to the thought/action relationship, techniques for adequately assessing these factors and their connections are still lacking. Therefore, what appears needed in the future are better theories about the components connecting moral thought to behavior and suitable methods of assessing these components.

Methods for assessing the consistency of either thought or behavior over time are rather well understood, with long-term longitudinal studies yielding the most convincing answers about consistency for both individuals and groups. In the years ahead it would seem desirable to have more longitudinal investigations. However, the likelihood that many such studies will be forthcoming, I believe, is not great, because of the lengthy time-span they necessarily require and the practical difficulties encountered in finding and reassessing the people who participated in the first stage of an investigation. It seems likely, however, that increasing numbers of studies may consist of successive short-term (one or two years) longitudinal assessments of each of the age cohorts that are included in a cross-sectional design, thereby drawing on the advantages of both longitudinal and cross-sectional approaches. For example, subjects within four age groups (ages 4, 7, 10, 13) might be assessed for their moral reasoning or behavior at six-month intervals over a period of two years, thereby furnishing data both about changes in individuals over the two-year period and about group trends extending from age four to age 15.
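The arithmetic of the combined design in this example can be laid out explicitly. The following is only an illustrative sketch: the function name `cohort_sequential_schedule` and its parameters are hypothetical and are not drawn from any study cited in this monograph. It lists the age of each cohort at every assessment wave and confirms that four cohorts followed for two years together span ages 4 to 15.

```python
from fractions import Fraction


def cohort_sequential_schedule(start_ages, interval_years, duration_years):
    """Return {starting age: [age at each assessment wave]}.

    Hypothetical helper for illustration only: every cohort is assessed
    at its starting age and then at fixed intervals until the follow-up
    period ends.
    """
    interval = Fraction(interval_years)
    n_waves = int(Fraction(duration_years) / interval) + 1
    return {
        age: [age + k * interval for k in range(n_waves)]
        for age in start_ages
    }


# The example from the text: four age groups, six-month intervals, two years.
schedule = cohort_sequential_schedule([4, 7, 10, 13], Fraction(1, 2), 2)

for start, ages in schedule.items():
    wave_ages = ", ".join(str(float(a)) for a in ages)
    print(f"cohort starting at age {start}: assessed at ages {wave_ages}")

youngest = min(min(ages) for ages in schedule.values())
oldest = max(max(ages) for ages in schedule.values())
print(f"age span covered by the whole design: {youngest} to {oldest}")
```

Under these assumptions each cohort is assessed five times, and the cohorts cover ages 4 to 6, 7 to 9, 10 to 12, and 13 to 15. Change between, say, age 6 and age 7 is therefore never observed within the same individuals; it can only be inferred by comparing adjacent cohorts, which is where the cross-sectional assumption discussed in Chapter 9 re-enters the design.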

Additional Persistent Methodological Problems

In addition to the array of difficulties with assessment methods mentioned throughout the foregoing chapters, three further methodological problems that continue to challenge researchers' ingenuity are those concerning: (1) techniques suitable for the analysis of both individuals and groups; (2) circularity in theorizing and assessing; and (3) surface structure versus deep structure.

Individuals and Groups

One of the continuing knotty problems is that of devising evaluation techniques that will both: (1) accurately represent the uniqueness of an individual's perception of moral matters; and (2) produce data which permit valid comparisons between individuals and across groups. Piaget recognized this problem when he implied that he had reservations about how accurately each child's view of moral issues could be evaluated by means of a researcher telling the same moral incident to all children and then questioning them all in the same way. In effect, he judged his moral stories to be of necessity in "the crude schematic form that is indispensable to an interrogatory addressed to all" (Piaget, 1932/1948, p. 275).

What seems needed is an assessment methodology which: (1) ensures that each person clearly comprehends the moral issues or dilemmas being faced; (2) samples all the varieties of belief and behavior that fall within the moral domain; (3) includes all of the variables that may significantly influence people's decisions in real-life moral encounters; (4) presents moral dilemmas which are accepted by the subjects as representing true-to-life moral encounters; and (5) equips investigators to explain both the dynamics of individuals' moral development and the relationship between such individual development and the belief and behavior patterns of groups.

Circularity in Theorizing and Assessing

A further persistent problem is reflected in the question: When empirical assessment results fail to confirm a theoretical proposal about development adequately, where is the source of error: in the assessment methodology or in the structure of the theory? Is something wrong with the theory or does the problem lie with the assessment techniques and the investigators' manner of analyzing and interpreting the data?

Kohlberg (1984, pp. 410-425) alluded to this problem in relation to empirical results which cast doubt on his proposals that: (1) his stages of moral development represent an invariant sequence; and (2) all six of his stages are not simply theoretical constructs but, rather, each of the stages is represented in the normal reasoning expressed by actual people in a society. Instead of abandoning either his theoretical proposals or his assessment methodology, Kohlberg adopted a variety of Loevinger's (1979) "saving circularity" to rationalize his readjusting the relationships between the theory and the assessment approach.

Kohlberg's definition of an invariant sequence of moral-growth stages included the notion that once a person advanced to a higher stage in the sequence, he would never return to a lower stage of moral reasoning. However, data gathered over the years from the same group of subjects he originally studied brought his notion into question, because in subsequent years some of the individuals displayed lower-stage reasoning than they had when originally interviewed. To rationalize this inconsistency, Kohlberg revised his method of scoring interview responses in relation to the stages they were thought to represent. In other words, he readjusted his method of interpreting the results so they would fit his theoretical construct rather than altering the construct to fit the obtained empirical data. This mode of saving his stage concept in the face of apparently non-confirming data is what he referred to as "saving circularity". He identified the process with Pierce's (Kohlberg, 1984, p. 425) comparing scientific theory to a:

leaky boat you patch in one place and then stand on in another place while you patch or revise elsewhere ... I'd be happy to stop patching up Piagetian assumptions if I could see another boat on the horizon which handled my problems and data better than the stage concept.

However, in the case of his proposition that his Stage 6 was truly represented in people's belief systems, Kohlberg admitted that saving circularity does not "always save all cherished assumptions". When his empirical studies failed to reveal anyone expressing Stage 6 moral reasoning, he concluded that "Stage 6 remains as a theoretical postulate but not an operational empirical entity" (1984, p. 425).

In summary, then, researchers continue to struggle with problems of how to convincingly rationalize discrepancies between assessment results and theoretical conceptions regarding moral development.

Surface Structure Versus Deep Structure

A related persistent problem concerns the relationship between the apparent surface characteristics evaluated by an assessment technique and the deeper personality structures that the investigator actually hopes to evaluate. Kohlberg has suggested that difficulties with this matter of the connection between surface and subterranean structures helped account for the above-mentioned discrepancy between his theory and his empirical results. "Our stage definitions had confused content or surface structure with the deeper structure we meant to describe with our stages" (1984, p. 411).

So it is that researchers often hypothesize that traits or processes embedded within the deep structure of personality are key components of moral development. It then becomes the investigators' responsibility to establish construct validity for their assessment methods by convincingly demonstrating that results of their assessment techniques are accurate reflections, at least on the level of logical analysis, of the hypothesized traits and processes. This demanding assignment has yet to be performed to everyone's satisfaction.

Conclusion

As the contents of this monograph have shown, over the past half century the matter of assessing moral development has attracted the efforts of a large number of researchers and educators, with interest in the field growing markedly in recent years. Such activity has resulted in both the creation of new assessment techniques and the refinement of existing methods. However, there remain many unsolved problems both in the realm of theory that defines what factors should be assessed and in the technical development of assessment devices for accurately evaluating the components suggested by theory. In short, there is no lack of challenges in the years ahead for people who wish to contribute to progress in this field.

REFERENCES

Abrahamsen, D. (1960). The psychology of crime. New York: Columbia University Press. Allport, G. W., Vernon, P. E., & Lindsey, G. (1900). Study of values (3rd ed.). Chicago: Riverside. Amatora, S. M. (1944-1962). Personality rating scale. Cincinatti, Ohio: Educators-Employers' Tests and Services Associates. Baier, K. (1965). The moralpoint of view. New York: Random House. Bandura, A., & Waiters, R. H. (1963). Social learning and personality development. New York: Holt, Rinehart, and Winston. Bloom, B. S. (1956). Taxonomy of educational objectives: Handbook I, cognitive domain. New York: David McKay. Bronfenbrenner, U. (1977). Toward an experimental ecology of human development. American Psychologist, 32,514-532. Buros, O. K. (1974). Tests inprint 11. Highland Park, New Jersey: Gryphen. Buros, O. K. (1978). The 8th mental measurements yearbook. (Vol. 1). Highland Park, New Jersey: Gryphen. Chandler, M. J., Greenspan, S., & Barenboim, C. (1973). Judgments of intentionality in response to videotaped and verbally presented moral dilemmas: The medium is the message. Child Development, 44, 315-320. Colby, A., & Kohlberg, L. (1984). The measurement of moral judgment. Cambridge, Mass.: Harvard University Press. Colby, A., Kohlberg, L., Gibbs, J., & Lieberman, M. (1983). A longitudinal study of moral development. (Monographs of the Society for Research in Child Development), Serial No. 200, 48. Collins, J. T., & Hagen, J. W. (1979). A constructivist account of the development of perception, attention, and memory. In G. A. Hale & M. Lewis (Eds.), Attention and cognitive development. New York: Plenum. Cookson, P. W. (1982). Boarding schools and the moral community. Journal of Educational Thought, 16(2), 89-97. Damon, W. (1977). The social world of the child. San Francisco: Jossey-Bass. Damon, W. (1980). Structural-development theory and the study of moral development. In M. Windmiller, N. Lambert, & E. Turiel (Eds.), Moral development and socialization (pp. 35-68). 
Boston: AIlyn & Bacon. Damon, W., & Killen, M. (1982). Peer interaction and the process of change in children's moral reasoning. Merrill-Palmer Quarterly, 28(3), 347-367. Engel, M., Cohen, R. B., & Sanfilippo, J. (1964). Children tell stories about school - - an exploration of the differences in elementary grades. Paper read at the American Psychological Association Convention, Los Angeles, 1964. Enright, R. D., Franklin, C. C., & Manheim, L. A. (1980) Children's distributive justice reasoning: a standardized and objective scale. Developmental Psychology, 16, 193-202. Erikson, E. H. (1964). Insight and responsibility. New York: W. W. Norton. Forgas, J. P. (1979). Social episodes. London: Academic. Freud, S. (1938/1973). (1973 edition translated into English from the 1938 German edition) An outline of psychoanalysis. London: Hogarth. Gibbs, J. C., Widaman, K. F., & Colby, A. (1982). Construction and validation of a simplified, groupadministerable equivalent to the moral judgment interview. Child Development, 53,895-910. Gibbs, J. C., Widaman, K. F., & Colby, A. (1980). The sociomoral reflection measure. In L. Kuhmerker, M. 471

472

R. MURRAY THOMAS

Mentkowski, & V. L. Erickson (Eds.), Evaluating moral development. Schenectdy, NY: Character Research Press. Gilligan, C. (1982). In a different voice. Cambridge, Mass.: Harvard University Press. Grinder, R. E. (1962). Parental child-rearing practices, conscience and resistance to temptation of sixth grade children. Child Development, 35,802~$20. Gronlund, N. E. (1982). Constructing achievement tests (3rd ed.). Englewood Cliffs, N.J. : Prentice-Hall. Grueneich, R. (1982). Issues in the developmental study of how children use intention and consequence information to make moral evaluations. Child Development, 53, 2943. Guttman, L. (1944). A basis for scaling qualitative data. American Sociological Review, 9, 139-150. Haan, H., Smith, M. B., & Block, J. (1968). Political, family, and personality correlates of adolescent moral j udgment. Journal of Personality and Social Psychology, 10, 183-201. Habermas, J. (1983). Interpretive social science vs. hermeneuticism. In N. Haan, R. Ballah, P. Rabinow, & W. Sullivan (Eds.), Social science as moral inquiry. New York: Columbia University Press. Hagen, J. W., & Hale, G. A. (1973). The development of attention in children. In A. D. Pick (Ed.), Minnesota symposia on child psychology (Vol. 5). Minneapolis: University of Minnesota Press. Harris, B. (1977). Developmental differences in the attribution of responsibility. Developmental Psychology, 13, 257365. Hartshorne, H., & May, M. A. (1928). Studies in deceit: book one. New York: Macmillan. Hays, W. L. (1981). Statistics (3rd. ed.). New York: Holt, Rinehart, & Winston. Henderson, M., Morris, L. L., & Fitz-Gibbon, C. T. (1978) How to measure attitudes. Beverley Hills, Cal. : Sage. Henry, R. M. (1981). Validation of a projective measure of aggression-anxiety for five-year-old boys. Journal of Personality Assessment, 45,359-369. Henry, R. M. (1983). The psychodynamic foundations of morality. Basel: Karger. Higgins, A., Power, C., & Kohlberg, L. (1984). 
The relationship of moral atmosphere to judgments of responsibility. In W. M. Kurtines & J. L. Gewirtz (Eds.), Morality, moral behavior, and moral development. New York: Wiley.
Hoffman, M. L. (1970). Moral development. In P. Mussen (Ed.), Carmichael's manual of child psychology (3rd ed.). New York: Wiley.
Hogan, R. (1970). A dimension of moral judgment. Journal of Consulting and Clinical Psychology, 35, 205-212.
Hogan, R., & Dickstein, E. (1972). A measure of moral values. Journal of Consulting and Clinical Psychology, 39, 210-214.
Johnson, J. E., & McGillicuddy-Delisi, A. (1983). Family environment factors and children's knowledge of rules and conventions. Child Development, 54, 218-226.
Johnson, R. A., & Wichern, D. W. (1982). Applied multivariate statistical analysis. Englewood Cliffs, NJ: Prentice-Hall.
Kail, R. (1979). The development of memory in children. San Francisco: W. H. Freeman.
Kohlberg, L. (1967). Moral and religious education in the public schools: a developmental view. In T. R. Sizer (Ed.), Religion and public education. Boston: Houghton Mifflin.
Kohlberg, L. (1971). From is to ought. In T. Mischel (Ed.), Cognitive development and epistemology. New York: Academic.
Kohlberg, L. (1976). Moral stages and moralization: the cognitive-developmental approach. In T. Lickona (Ed.), Moral development and behavior. New York: Holt, Rinehart, & Winston.
Kohlberg, L. (1984). The psychology of moral development. San Francisco: Harper & Row.
Kohlberg, L., & Candee, D. (1984). The relationship of moral judgment to moral action. In W. M. Kurtines & J. L. Gewirtz (Eds.), Morality, moral behavior, and moral development (pp. 52-73). New York: Wiley.
Krathwohl, D. R., Bloom, B. S., & Masia, B. B. (1964). Taxonomy of educational objectives: Handbook II, affective domain. New York: David McKay.
Kreutzer, M. A., Leonard, C., & Flavell, J. H. (1975). An interview study of children's knowledge about memory.
Monographs of the Society for Research in Child Development, 40 (1, Serial No. 159), 1-58.
Kurtines, W. M., & Gewirtz, J. L. (Eds.). (1984). Morality, moral behavior, and moral development. New York: Wiley.
Kusche, C. A., & Greenberg, M. T. (1983). Evaluative understanding and role-taking ability: A comparison of deaf and hearing children. Child Development, 54, 141-147.
Lapsley, D. K., Harwell, M. R., Olson, L. M., Flannery, D., & Quintana, S. M. (1984). Moral judgment, personality, and attitude to authority in early and late adolescence. Journal of Youth and Adolescence, 13(6), 527-542.
Lehrer, R. L. (1967). Sex differences in moral behavior and attitudes. Unpublished Ph.D. dissertation, University of Chicago.
Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 140, 1-55.
Lin, H. Y. (1985). A Confucian theory of human development. In T. Husén & T. N. Postlethwaite (Eds.), International encyclopedia of education (Vol. 3, pp. 969-972). Oxford: Pergamon.
Lippey, G. (Ed.). (1974). Computer-assisted test construction. Englewood Cliffs, NJ: Educational Technology
Publications.
Loevinger, J. (1979). Scientific ways in the study of ego development. Heinz Werner Lectures Series (Vol. 12). Worcester, Mass.: Clark University Press.
Loevinger, J., & Wessler, R. (1970). Measuring ego development (Vol. 1). San Francisco: Jossey-Bass.
Luria, A. R. (1976). Cognitive development: Its cultural and social foundations. Cambridge, Mass.: Harvard University Press.
Maccoby, E. (1968). The development of moral values and behavior in childhood. In J. A. Clausen (Ed.), Socialisation and society. Boston: Little, Brown, & Co.
Maitland, K. A., & Goldman, J. R. (1974). Moral judgement as a result of peer-group interaction. Journal of Personality and Social Psychology, 30, 699-704.
Mandler, J. M., & Johnson, N. S. (1977). Remembrance of things parsed: story structure and recall. Cognitive Psychology, 9, 111-151.
Marascuilo, L. A., & Levin, J. R. (1983). Multivariate statistics in the social sciences. Monterey, Ca.: Brooks/Cole.
Marshall, J. D., & Hales, L. W. (1973). Essentials of testing. Reading, Mass.: Addison-Wesley.
Maslow, A. H. (1970). Motivation and personality (2nd ed.). New York: Harper & Row.
Mednick, S. (1985). Crime in the family tree. Psychology Today, 19(3), 58-61.
Messick, S. (1975). The standard problem: Meaning and values in measurement and evaluation. American Psychologist, 30, 955-966.
Michel, C. (1985). A model of environmental influences on moral development. Unpublished doctoral dissertation, University of California.
Milgram, S. (1963). Behavioral study of obedience. Journal of Abnormal and Social Psychology, 67, 373-378.
Miller, D. T., & McCann, C. D. (1979). Children's reactions to the perpetrators and victims of injustices. Child Development, 50, 861-868.
Mitchell, J. V. (1983). Tests in print III. Lincoln: University of Nebraska Press.
Mixon, D. (1974). If you don't deceive, what can you do? In N. Armstead (Ed.), Reconstructing social psychology. Harmondsworth: Penguin.
Morgan, C. D., & Murray, H. A. (1935).
A method for investigating phantasies: The Thematic Apperception Test. Archives of Neurology and Psychiatry, 34, 289-306.
Murray, H. A. (1943). Thematic Apperception Test (3rd rev.). Cambridge, Mass.: Harvard University Press.
Nie, N. H., Hull, C. H., Jenkins, J. G., Steinbrenner, K., & Bent, D. H. (1975). SPSS. New York: McGraw-Hill.
Nitko, A. J. (1983). Educational tests and measurement: An introduction. New York: Harcourt Brace Jovanovich.
Nucci, L. P., & Nucci, M. S. (1982). Children's social interactions in the context of moral and conventional transgressions. Child Development, 53, 403-412.
Nucci, L. P., & Turiel, E. (1978). Social interactions and the development of social concepts in preschool children. Child Development, 49, 400-407.
Obeid, R. (1985). An Islamic theory of human development. In T. Husén & T. N. Postlethwaite (Eds.), International encyclopedia of education (Vol. 5). Oxford: Pergamon.
Oppenheim, A. N. (1966). Questionnaire design and attitude measurement. New York: Basic Books.
Osgood, C. E. (1952). The nature and measurement of meaning. Psychological Bulletin, 49, 197-237.
Page, R., & Bode, J. (1980). Comparison of measures of reasoning and development of a new objective measure. Educational and Psychological Measurement, 40, 317-329.
Piaget, J. (1932/1948). (1948 edition translated into English from original 1932 French edition) The moral judgment of the child. Glencoe, Ill.: Free Press.
Pratt, M. W., Golding, G., & Hunter, W. J. (1983). Aging as ripening: character and consistency of moral judgment in young, mature, and older adults. Human Development, 26(5), 277-288.
Raths, L. E., Harmin, M., & Simon, S. B. (1966). Values and teaching. Columbus, Ohio: Charles E. Merrill.
Rest, J. R. (1979). Development in judging moral issues. Minneapolis: University of Minnesota Press.
Rest, J. R., Cooper, D., Coder, R., Masanz, J., & Anderson, D. (1974). Judging the important issues in moral dilemmas - an objective test of development.
Developmental Psychology, 10, 491-501.
Rhine, R. J., Hill, S. J., & Wandruff, S. E. (1957). Evaluative responses of preschool children. Child Development, 38, 1035-1042.
Rokeach, M. (1968). Beliefs, attitudes, and values. San Francisco: Jossey-Bass.
Rokeach, M. (1967-1973). Rokeach value survey. Sunnyvale, Cal.: Halgren Tests.
Rothman, G. R. (1980). The relationship between moral judgment and moral behavior. In M. Windmiller, N. Lambert, & E. Turiel (Eds.), Moral development and socialization. Boston: Allyn & Bacon.
Rousseau, J. J. (1773). Emilius; or, A treatise of education. Edinburgh: W. Coke.
Rumelhart, D. E. (1975). Notes on a schema for stories. In D. G. Bobrow & A. Collins (Eds.), Representation and understanding: studies in cognitive sciences. New York: Academic Press.
Rybash, J. M., Roodin, P. A., & Hallim, K. (1979). The role of affect in children's attribution of intentionality
and dispensation of punishment. Child Development, 50, 1227-1230.
Schlesser, G. E., Finger, J. A., & Lynch, T. (1941-1969). Personal values inventory. Hamilton, NY: Colgate University Testing Service.
Schneirla, T. C. (1957). The concept of development. Minneapolis: University of Minnesota Press.
Shaw, M. E., & Wright, J. M. (1967). Scales for the measurement of attitudes. New York: McGraw-Hill.
Shultz, T. R. (1980). Development of the concept of intention. In W. A. Collins (Ed.), Development of cognition, affect, and social relations (pp. 131-164). Hillsdale, New Jersey: Lawrence Erlbaum.
Siegal, M. (1982). Fairness in children. London: Academic Press.
Sigel, I. E., Flaugher, J., Johnson, J. E., & Schauer, A. (1977). Parental-child interaction coding manual. Princeton, NJ: Educational Testing Service.
Skinner, B. F. (1983). Origins of a behaviorist. Psychology Today, 17(9).
Stein, N. L., & Glenn, C. G. (1979). An analysis of story comprehension in elementary school children. In R. O. Freedle (Ed.), New directions in discourse processing (Vol. 2). Norwood, NJ: Ablex.
Tate, R. (1985). Experimental design. In T. Husén & T. N. Postlethwaite (Eds.), International encyclopedia of education (Vol. 3). Oxford: Pergamon.
Thomas, R. M. (1985). A Christian theory of human development. In T. Husén & T. N. Postlethwaite (Eds.), International encyclopedia of education (Vol. 2). Oxford: Pergamon.
Thomas, R. M., Clermont, C., & Mainbolwa-Sinyangwe, I. (1984). Assessing the meaning of moral. Studies in Educational Evaluation, 10, 167-177.
Thomas, R. M., & Murray, P. V. (1982). CASES: A resource guide for teaching about the law. Glenview, Ill.: Scott, Foresman.
Thomas, R. M., & Niikura, R. (1985). A Shinto theory of human development. In T. Husén & T. N. Postlethwaite (Eds.), International encyclopedia of education. Oxford: Pergamon.
Thorndike, R. M. (1985). Correlation procedures. In T. Husén & T. N. Postlethwaite (Eds.), International encyclopedia of education. Oxford: Pergamon.
Thurstone, L. L. (1959). The measurement of values. Chicago: University of Chicago Press.
Thurstone, L. L., & Chave, E. J. (1929). The measurement of attitude. Chicago: University of Chicago Press.
Tice, T. N. (1980). A psychoanalytic perspective. In M. Windmiller, N. Lambert, & E. Turiel (Eds.), Moral development and socialization (pp. 161-199). Boston: Allyn & Bacon.
Turiel, E. (1978). The development of concepts of social structure: social convention. In J. Glick & A. Clarke-Stewart (Eds.), Personality and social development (Vol. 1). New York: Gardner Press.
Turiel, E. (1980). The development of social-conventional and moral concepts. In M. Windmiller, N. Lambert, & E. Turiel (Eds.), Moral development and socialization (pp. 69-106). Boston: Allyn & Bacon.
Weston, D., & Turiel, E. (1980). Act-rule relations: children's concepts of social rules. Developmental Psychology, 16, 417-424.
Wilson, E. O. (1978). On human nature. Cambridge, Mass.: Harvard University Press.
Wilson, G., & Patterson, J. (1968). A new measure of conservatism. British Journal of Social and Clinical Psychology, 8, 264-269.
Worchel, S., & Cooper, J. (1979). Understanding social psychology. Homewood, Ill.: Dorsey.
Wrightstone, J. W., & Doll, R. C. (1951). Life adjustment inventory. Munster, Indiana: Psychometric Affiliates.
Zuckerman, D. (1985). Can genes help helping? Psychology Today, 19(3), 80.

APPENDIX

SUBJECT INDEX FOR "ASSESSING MORAL DEVELOPMENT" BY R. MURRAY THOMAS

Affect 354, 407-413
Anecdotal confirmation 453-454
Assessment defined 350
Assumptions, foundational 355-356
Attributions 405-406

Biometric measure 412

Caring 364
Causal relations 351-352, 374
  descriptive/explanatory 351-352, 374, 393-406
  prescriptive/principles 351-352, 374-376
Children's Apperception Test 408
Clinical method 377-378, 385
Concepts 373, 375-376
Conscious 352, 423-427
Consistency, moral 355, 361, 453-459, 466-467
  across time 457-459
  of thought and action 453-459
Correlation 400-404
  multiple 401
  partial 401
Critical-life-incident inquiry 432
Cross-cultural studies 368-371
Cultural relativism 364, 368-371

Defining Issues Test 356, 386-387
Deliberative reasoning 375-376
Developmental history 396-397

Emotions 354, 407-413
Environmental influences 396, 435-452
Equivalence, coefficient of 357
Ethics 364
Evaluation defined 350
Experiments 404-405, 456-457

Facts 373
Fairness 363
Feasibility 361

Generalizations 373-376

Heredity 395-396

Individualistic relativism 364
Information-processing theories 351-353, 427-429
Interviews 37%386, 416-417, 445-446, 454-456

Justice 363
Justificatory reasoning 376

Life Adjustment Inventory 418

Mental processes 351-353, 423-433
Moral atmosphere 439-440
Moral development
  defined 349, 363-371
  theory 350-355, 429-432
Moral dilemmas 378-386
Moral education defined 349
Moral Judgment Interview 356, 387
Motion pictures 449-450

Observation 456

Path analysis 401-403
Personality Rating Scale 418
Personal Values Inventory 418
Pictorial materials 420-421
Projective techniques 407-411
Psychoanalytic inquiry 423-427
Punishment 383-384

Reliability 355
Rokeach Value Survey 418
Role playing 450-451
Rorschach Test 408
Rules 363, 364, 384-386

Semantic-differential measures 411-412
Simulated moral incidents 446-451
Social convention 363
Social-learning theory 351-353, 427-428
Social relativism 364, 368-371
Sociodrama 450-451
Sociomoral Reflection Measure 387-388
Spontaneity 361
Stability, coefficient of 357, 361
Stages of development 366, 380-386
Stories, finished and unfinished 410, 446-449
Strategies defined 423
Study of Values 418
Supernatural intervention 398

Tests
  academic achievement 389-392, 419-420
  completion 390-391, 420
  essay 391, 420
  matching 390
  multiple-choice 389-390, 419
  production 387-388
  recognition 386-387
  sentence-completion 409-410
  short-answer 391, 420
  true-false 389, 419
  written 38(~-392
Thematic Apperception Test 408
Twin studies 403-404

Unconscious 352, 354-355, 423-427

Validity, types of 355, 357-360
Values
  defined 374, 415-416
  direction and strength 415-421
  tests and inventories 417-419
Videotapes 444, 449-450

Washington University Sentence Completion Test 409
Willpower 397