Evaluation and research for technology: not just playing around


Evaluation and Program Planning 26 (2003) 169–176 www.elsevier.com/locate/evalprogplan

Eva L. Baker (a,*), Harold F. O'Neil Jr. (a,b)

(a) University of California, National Center for Research on Evaluation, Standards and Student Testing (CRESST), 300 Charles E Young Drive North, Room 301, Los Angeles, CA 90095-1522, USA
(b) Rossier School of Education, 600 Waite Phillips Hall, University of Southern California, University Park, Los Angeles, CA 90089-0031, USA

Received in revised form 1 December 2001

* Corresponding author. Tel.: +1-310-206-1530; fax: +1-310-267-0152. E-mail address: [email protected] (E.L. Baker).

Abstract

In virtually every part of American life, the development of usable skills and knowledge is essential. Documentation of performance is needed in order to demonstrate competency in school, to acquire and maintain employment, for career advances, and especially in areas where the cost of failure is unacceptable, such as in medicine, protective services, and the military. Technology has emerged as both a context for and a solution to educating the great numbers needing to learn and to expand their repertoires. In this paper, we will discuss some of the challenges of technology-based training and education, the role of quality verification and evaluation, and strategies to integrate evaluation into the everyday design of technology-based systems for education and training. © 2003 UCLA. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Evaluating technology; Instructional technology; Military applications; Technology in education; Test development

1. Introduction

Although there have been many definitions of instructional or training technology, the best is encompassed in Lumsdaine's notion of 'a vehicle which generates an essentially reproducible sequence of instructional events and accepts responsibility for efficiently accomplishing a specified change' (Lumsdaine, 1964, p. 385). In other words, instructional technology is instruction that yields predictable outcomes. Although this broad definition can encompass procedures that operate without the benefit of hardware systems (that is, Hatha Yoga or workable recipes for Hollandaise sauce could be conceived of as technologies under this definition), most common-language interpretations of education and training technology focus on the hardware and involve some form of multimedia, interactivity, display technology, and sophisticated background software support. To determine whether Lumsdaine's definition holds, one would first have to investigate the extent to which instruction is replicable (including the range of action that learners could be called on to exhibit during the experience). The second feature of Lumsdaine's definition of technology, 'accepts responsibility' for results, puts the paraphernalia of technology far in the background

(that is, the wires, wireless networks, active matrices and the like) and pushes into the foreground instead the requirement of effectiveness. Given known conditions, or specifying a range of contexts, instructional interaction with the system will yield reproducible results—the technology produces known and predictable outcomes—somewhat like a telephone. In this paper, our focus is simple: do educational technologies meet Lumsdaine's second criterion? Are they capable of accepting responsibility and delivering results? Can we find out, using acceptable, scientific methods, including credible evaluation approaches? More importantly, how can we predict outcomes of particular technologically supported experiences with increasing accuracy? Strong prediction is the gold standard; it will allow us to create robust, scientifically based guidelines to assist those charged with the design and development of technological interventions as well as to set expectations for the procurement of software interventions marketed with effectiveness claims.

2. Challenges of technology

We have taken as contexts for discussion two sectors of technology use: education of the civilian population, usually



in schools, and the training of military personnel. In the first case, educational technology is often more broadly conceived. It includes not only those implementations or actions intended to improve instruction, but also approaches that might be used to train teachers and to communicate better with families and constituents. Educational technology stands out since most of education in practice uses so little technology. In the military, the situation is somewhat reversed. Military readiness increasingly depends on technology of all sorts included in everyday work, whether weapons, monitoring devices, or vehicles. The military is high tech, and in a reversal of the civil education system, much of its training has not been embedded in the technology readily available to its members. One goal of our approach is to inform those researchers who have focused on only one of these areas, either school or military education, about the issues resident in the other. Second, we believe that there are ways in which these two sectors can learn from one another. If we are to create as rapidly as possible the best new interventions to meet training and education needs, then we need to draw upon reliable knowledge, whatever its source.

2.1. Technology use in education

The role of technology in education over the last 50 years can be summarized in one word: 'poised'. It is always poised to make an impact, but almost never does. Educational technology remains promising, with many as yet unfulfilled promises. From Encyclopaedia Britannica films and lantern slides to wireless, content-rich environments, whenever technology has been applied to educational and training settings, success is usually just around the next corner. What are the constraints in regular schools that account for the cycle of great hope and little success? There seem to be a number of reasonable candidates. Schools and education systems have diffuse goals despite their increasing reliance on educational content standards (or explicit goals), such as the expectations incorporated in the recent reauthorization of the Elementary and Secondary Education Act, No Child Left Behind (NCLB) (No Child Left Behind Act of 2001, 2002). Even the tightest educational goal statements are open to interpretation, and this flexibility results in mismatches of instructional strategy and measured outcomes. In this milieu, it is easy to imagine that technology will provide a remedy, but it is subject to the same potential mismatch of its capacity and the available goals and outcome measures. Furthermore, because the education system is decentralized constitutionally and at every unit of organization (state, district, school, and classroom), flexibility and confusion are maximized. Instruction, for example, is built on the premise of 'teacher knows best', independent of teacher quality. But even if quality of teacher preparation and competence were held constant, a set of unclear goals would be rendered through very different approaches to teaching. Enter a technological

intervention, with the intended purpose of improving learning in this complex setting. For the most part, despite their operational flexibility, teachers have very little authority to select instructional interventions. So the technology is not sought by the end users (the teachers, let’s say) but rather is provided by someone outside the classroom as a ‘solution’ to be incorporated into teachers’ practice. Not surprisingly, teachers do not automatically know how to incorporate off-the-shelf interventions or even tools into their daily activities. Consequently, many forms of technology become marginalized and are used ‘after the real work has been done’, either as rewards or in remedial contexts. As a result, technology in the form of computer support is installed in numerous schools, but rarely has it found its way into an integrated curriculum intended to lead to measurable learning outcomes. Education also lags in embracing technology as a regular part of daily life. Only with cell phone technology in the last few years have teachers even had the ready use of communication devices that are automatically a part of every other work environment. The educational budgeting process still treats technology as a capital expenditure to be amortized rather than as a consumable to be replenished. As a result, there is a persistence of low-end technology, and training on obsolete equipment. Educational bureaucracies are both financially stressed and timid. They need to maintain their legitimacy now simply by meeting explicit goals or targets and by pleasing and placating their various constituencies, including parents, policymakers, administrators, teachers, and, of course, the media. Technology in classrooms may be seen as a marker of educational equity and fairness among populations rather than a coherent set of tools with procedures designed to improve learning outcomes. As one would expect, recent data suggest that computers are found in schools, in greater numbers, with increasing bandwidth, and in distributions that presage the closing of the gap between poor and advantaged students (National Center for Education Statistics, 2002). Yet, the lagging implementation of technology options in schools continues to be explained by the lack of understanding of teachers and the prohibitive costs required for professional development to help individuals use off-the-shelf technology options to improve achievement or to enrich the curriculum. Increased connectivity of education systems and the availability of the Internet as a content resource occupy a niche in schooling, along with the use of word processing, spreadsheets, and, to a lesser degree, databases; but for the most part, technology use is not central in the teaching and learning of most students. Not surprisingly, repeated attempts to measure the impact of technology (Baker & Herman, 2003; Baker, Herman, & Gearhart, 1996b; Baker, Niemi, Novak, & Herl, 1992) have struggled with the outcomes to which technology is supposed to contribute, the actual implementation of any target intervention or system, and the challenge to disentangle the contributions of technology from the vast


differences in teaching, teacher commitment, and other context influences.

Summary. Technology implementation in the public sector of education has enormous challenges to address. Diffuse in its goals and authority, and at the financial margin, public education and the legitimacy of educational functions and personnel are frequently challenged. Educators are, at best, consumers of technology created by others to address problems thought to be relevant to learning. Because access is through commercial markets, the designer creates systems intended for broad implementation, resulting in poor fits with individual teacher or even school needs. The primary gatekeeper, the teacher, typically does not see technology as an integral part of his or her work, as it is noticeably absent in the system at large. There is no culture of providing the best environments in schools, whether for facilities, music education, or up-to-date technology. If technology doesn't matter to the oversight system, it is likely to be irrelevant to the on-the-ground potential user.

2.2. Instructional technology in military applications

Compare the way technology is used in US military training and readiness environments. In this kind of setting, technology is both a key requirement of job knowledge and an approach that is an integral part of training and education for a number of reasons. First, effort is put forth to make the outcomes of training and education consistent and effective. There is buy-in on agreed outcomes. Consequences of poor training are observable and have obvious high-stakes outcomes. While educators may imagine that military training goals rely exclusively on procedural learning (e.g. put widget x in slot y and rotate), the military has always had as education and training goals what they have termed 'soft skills', such as leadership. In addition, recent changes have resulted in a commitment to education and training in far more complex forms of information utilization and problem solving. Specifically, current redesigns of the Army (Future Combat System) and the Navy (Task Force EXCEL) training systems emphasize goals demanding complex problem solving, rapid assimilation of information, and close coordination of independent units. The factors leading to this decision to increase goal difficulty include a greater commitment to the decentralization of decision making, more rapid deployment, engagement in areas where little previous knowledge has been accumulated, and the expanded requirements of complex technology. As a result, training tasks that would shake the confidence of most civilian educators are planned for those just out of high school, notwithstanding the current commitment to challenging educational goals in schools. These very complex tasks are to be accomplished by the very students often less successful in the civilian educational system. So at precisely the time that civilian policy, through NCLB, is addressing challenging educational standards, the military is raising


the bar and implementing complex problem solving with trainees who may not have been academically talented. Whereas the civilian education system is bureaucratic, decoupled, and staffed by individuals with a wide range of competence, the military sector is making a multipart investment focused on applying training knowledge to emerging technical systems, adapting the training system to predictable fluctuations in the capability of the available pool, and documenting performance of individuals and units. Because high standards have been set internally and timelines are compressed, the military has a history of designing its own training albeit with the assistance of contractors. The military can create and manage ‘doctrine’ rather than the policy that each instructor knows best, although there is considerable variation among instructors. By supporting research and development on the attainment of high-quality learning outcomes, the US military research establishment has made major contributions to both the theory and the practice underlying the systematic design and implementation of instruction. Because of its unique mission, the military early on recognized the need for high-stakes instruction—that is, instruction where the consequences of success or failure have unmistakable and identifiable consequences for the individual, for the unit, and, potentially, for the larger society. Therefore, military versions of instructional programs have been far closer to Lumsdaine’s (1965) original features than instruction typically produced by or for the civilian sector. The rigor exercised by the military R&D community in creating systematic training and managing it on a large scale has resulted in significant impact on instructional models adopted for use by the education and business communities. While the advances that can be credited to the military are numerous and diverse, it is possible to collapse these contributions into four major and somewhat overlapping stages: (1) advances in testing; (2) the application of research-based knowledge; (3) the systematizing of instructional design and development; and, (4) advances in cognitive science and technology. The first stage, creation of the intellectual technologies underlying the use of standardized tests to identify key abilities, has been summarized in a variety of volumes (Cronbach & Suppes, 1969; Mayberry, 1987; Wigdor & Green, 1991). The second stage, the support of empirical studies of training, has a long and well-known tradition in military settings. Applied research studies conducted during WW II are a case in point. Theoretically oriented studies were conducted in applied training settings, where competing psychological models of social psychology, perception, and learning were experimentally examined. The impact of treatment variations interacting with individual differences was assessed on procedural, conceptual, and attitudinal learning, as measured by immediate and longer-term retention and transfer tasks. For a coherent methodology and sets of examples, see Hovland, Lumsdaine, and Sheffield (1949). That these studies were undertaken during


the crisis of war illustrates the early commitment of senior leadership to the continual improvement of training effectiveness. The third stage, instructional systems design (ISD), involved the creation and routinization of models of instructional development. In the 1960s and 1970s, the best thinking from psychologists, evaluators, and management experts was synthesized into ISD models based on several principles: a focus on goals, the reduction of tasks into manageable steps, and the inclusion of strategies such as prompting, feedback, and practice (Baker, 1973; Briggs, 1977; Gagné & Briggs, 1974; Glaser, 1965a,b, 1976; Merrill & Tennyson, 1977; O'Neil, 1979a,b). Unfortunately, these models became less powerful as they were expected to be used by relatively unsophisticated training personnel, who had strong subject matter expertise (e.g. how to troubleshoot a particular subsystem) but little knowledge about teaching and learning. The models became highly institutionalized, bureaucratized, rationalized, and far too rigid to support the development of training for highly variable contexts, a relatively small number of users, and extraordinarily complex skills. As a counterweight to the rigid formalization of ISD, the research community refocused attention on specialized domains, rather than overall system development. In the 1970s, 1980s, and 1990s, explorations and extrapolations from cognitive psychology and constructivism, on the one hand, and computer science, on the other, led to a synthesis of the several strands of work (Alessi & Trollip, 2001; Gagné & Medsker, 1996; Jonassen, Peck, & Wilson, 1999; van Merriënboer, 1997) that collectively represent the most recent fourth stage. Computer-oriented researchers attended to applications of technological advances in newly merged fields involving simulations, computational linguistics, speech recognition, graphical interfaces, and networking. Considerable investment was also made in tutoring systems based on artificial intelligence (AI) with some notable successes (Anderson, Corbett, Koedinger, & Pelletier, 1995; Gott, Lesgold, & Kane, 1996; Koedinger, Anderson, Hadley, & Mark, 1995; Means & Gott, 1988; Shute & Psotka, 1996) despite difficulties imposed by the constraints of domain design, software tools, and available platforms. The AI tutoring investment refocused attention on three important dimensions: the details of the content and task domains to be mastered (e.g. what counts as expertise); the individual differences (student models) of trainees; and the importance of mapping both domain-dependent and domain-independent aspects of instruction to variable instructional paths.

Summary. During the last decades of the 20th century, military R&D has supported advances in adaptive testing, simulation development, the tactical use of technology tools and embedded training in equipment, and, with less success, the development of authoring tools to assist the design of high-quality instruction. Now, rather than being adopted wholesale, these more focused efforts are playing out, albeit in a more circumscribed way, in both public and private civilian sectors. Once again, the military's investment in fundamental research and exploratory development has paid off.

3. Evaluation and improvement of instructional technology

We now approach the fifth stage of attention to instructional design, one enabled by the explosion of technology use in everyday life. As recently as 3 or 4 years ago, the idea of mass-scale, high-quality distance learning was unevenly accepted. But the existence of the World Wide Web in schools, homes, cafés, and libraries has changed expectations and fueled desire to use technology to support goals of information access and personal development. The technology itself has created a set of learners who expect to interact with technology and to learn from technology-enhanced settings. It is the challenge of the military in this context not to see the new stage of instructional opportunity as disconnected from previous models. Rather, to be successful and efficient, military and civilian R&D must build on existing theoretical and empirical foundations. The fifth stage of instructional design must selectively incorporate the most successful or promising attributes of prior approaches. We therefore have proposed a model for the fifth generation of instructional design built on a new synthesis and expansion of applications derived from decades of accumulated knowledge (Baker & Bewley, 2001). This research-based model is characterized by the inclusion of, and reliance on, testing and evaluation as an integral component, rather than as an add-on verification. The model:

• incorporates testing and assessment information as a key driver in decision making;
• involves sophisticated modeling and graphical displays to enhance the fidelity of data analysis and reporting;
• acknowledges, supports, and builds on the entering skills, preferences, and propensities of the learners;
• creates submodels, objects, and designs that are effective and generalizable to a variety of training domains and situations.

Attempts are underway to create tests of this model. For instance, the Office of Naval Research is supporting a project on a new design of distance learning that incorporates the features above (Baker & Bewley, 2001). To make the testing and evaluation components of the model a reality, a series of conceptual, technical and practical problems need to be solved.

3.1. Test development

Not the least of the challenges of using data to drive instructional decisions is determining what to measure.


There is a choice between two main options. The first requires the design and validation of outcome measures that match the kind of experience given by the technology intervention—a strategy that requires additional time and cost, but one that is likely to show program impact, if there is any. The second approach to measurement involves using available (perhaps archival) measures, such as standardized tests, or available test formats (fill-in-the-blanks, multiple-choice templates of the sort found on the Web). Such measures will have far less relevance to the intervention, be less likely to show impact, but will cost far less and will be rapidly implemented. Some writers have focused on the design of new outcome measures, presumably more sensitive to what technology has to offer uniquely. Baker and O'Neil (2003) have described a model that measures technology fluency, including content knowledge, propensity, and skills. Beyond the usual conceptions of computer literacy, measures of these types assume that technology will be an integral part of a systematic learning system, a vision not yet realized in the public schools. At the UCLA Center for Research on Evaluation, Standards, and Student Testing (CRESST), tools are under development to provide authoring support for potential evaluators of technology, particularly those committed to using online support, such as required in distance learning programs. These tools are generated from the conception of model-based assessment (Baker, 1997; Baker, Abedi, Linn, & Niemi, 1996a; Baker, Freeman, & Clayton, 1991). The fundamental idea is to treat test design like other systems design efforts and to start with domain-independent components based on families of cognitive demands. The starting point has been our analysis of cognitive demands into five clusters: problem solving, metacognition, communication, content knowledge, and teamwork and collaboration. These sets of demands, independently or in concert, provide common specifications, task structures, and scoring strategies that are intended to make assessment design more replicable, less expensive, and more consistent. Authoring tools have been developed (Chung, O'Neil, & Herl, 1999; O'Neil, Wang, Chung, & Herl, 2000) to help guide the instructor or test developer in embedding the models in particular subject matter domains, much as production rules were embedded in content as part of the engineering of expert systems. As part of the Knowledge, Models and Tools to Improve the Effectiveness of Naval Distance Learning work (Baker & Bewley, 2001) and other planned projects, the engine for such development will be designed with the ambitious intention of creating a menu of objects that can be combined and used to guide the development of assessments. Part of the discipline of this practice will be to make the objects reusable and to have the software meet the standards promulgated by the Advanced Distributed Learning Co-Lab (Alexandria, Virginia) and the ADL Shareable Content Object Reference Model (SCORM; http://www.adlnet.org/Scorm/scorm.cfm). The other key element is to create the authoring system so that it incorporates the best knowledge about how cognitive demands like problem solving can best be measured, and to include this knowledge as default conditions in the software. A second major challenge for authoring systems intended to support test design and development is the ideal or, better yet, validated ratio of domain independence to domain dependence. One feature of a recently published report by the National Research Council (Pellegrino, Chudowsky, & Glaser, 2001) focused almost exclusively on domain-dependent conceptions of testing. It included a compendium of examples of testing approaches based on conceptions of learning. However, each of the systems cited focused on learning in content domains, a bottom-up approach to test development, and one that trades off precision for economies of scale and for transfer.

3.2. Understanding technical soundness of tests and assessments

The know-how and how-to of authoring systems, built on the first principles of cognitive demands, potentially will help both military and civilian testing communities to move beyond their persistent focus on format (e.g. multiple-choice, open-ended) in their conceptions of tests for distance learning. Unfortunately, help in design and development does not fully address the necessity to use assessments and tests that have validity for the purposes to which they are put. For example, in distance learning environments, test data are intended to be valid for the following purposes: evaluation, both formative and summative; diagnostic testing; and certification of performance. Validity implies that the information given by tests leads to appropriate inferences about all of these purposes. Typically, however, the validity loop is left out of most technology design and evaluation. More usual is some mapping approach to the goals or objectives of testing, with the goal being to match the objectives to the test items, usually in terms of content. This approach is wholly insufficient for validity, a point brought home in the recent revision of the Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, and National Council on Measurement in Education, 1999). A major step forward would be taken if instructional technology used good, validated measures to drive instructional decisions (presumably based on a scientific understanding of powerful learning variables). The dependent variable problem would be solved. However, the ability to evaluate existing implementations and to improve our knowledge supporting future instruction requires a new way to think about evaluation.
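To make the notion of reusable, domain-independent assessment objects discussed in Section 3.1 more concrete, the sketch below shows one way such templates might be represented in software. It is a minimal illustration under our own assumptions: the class names, fields, and example domains are invented for exposition and do not correspond to CRESST's actual authoring tools or to the SCORM specification.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative names only; not CRESST's authoring tools or the SCORM API.

COGNITIVE_DEMANDS = [
    "problem solving",
    "metacognition",
    "communication",
    "content knowledge",
    "teamwork and collaboration",
]

@dataclass
class AssessmentTemplate:
    """Domain-independent shell for one family of cognitive demands."""
    demand: str            # one of COGNITIVE_DEMANDS
    task_structure: str    # e.g. "scenario + knowledge map + written explanation"
    scoring_strategy: str  # e.g. "expert-referenced rubric"

    def __post_init__(self) -> None:
        if self.demand not in COGNITIVE_DEMANDS:
            raise ValueError(f"unknown cognitive demand: {self.demand}")

@dataclass
class AssessmentInstance:
    """The same shell bound by an author to a particular subject-matter domain."""
    template: AssessmentTemplate
    domain: str
    stimulus_materials: List[str] = field(default_factory=list)

    def summary(self) -> str:
        return (f"{self.template.demand} task in '{self.domain}', "
                f"scored with {self.template.scoring_strategy}")

# One problem-solving shell reused across two very different training domains.
shell = AssessmentTemplate(
    demand="problem solving",
    task_structure="scenario + knowledge map + written explanation",
    scoring_strategy="expert-referenced rubric",
)
for domain in ("shipboard damage control", "high school chemistry"):
    print(AssessmentInstance(shell, domain).summary())
```

The point of the sketch is the separation of concerns: the common specification, task structure, and scoring strategy live in the domain-independent template, while the author supplies only the domain binding and stimulus materials, which is what makes assessment design more replicable and less expensive across settings.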


4. Towards a new paradigm for research in, and evaluation of, technology

Evaluation of technological interventions is obviously a complex undertaking. Were it not, there would be more high-quality examples of it. The present system of post hoc studies of the outcomes of interventions is rarely responsive to the realities of cycle time, the difficulties of test development, the limits of instructional knowledge, and the inability of systems to change once they are fielded. One important path to improving technology-based systems is to develop strategies and mechanisms for the continued improvement of our theoretical and practical knowledge related to the design, development, use and impact of distance learning. Much of the relevant literature is theoretical and weak methodologically. There are numerous 'we did it' studies, describing execution of a distance learning course or computer-based intervention on a particular topic, for a particular set of learners, using a particular mix of approaches (e.g. text plus video plus synchronous chat options). Comparative analyses are almost exclusively limited to contrasts of different versions of a course (for instance, a distance version and a 'traditional' classroom version), with outcomes for the most part attending to student preferences. This type of research has well-known limitations (Clark, 1994; Leifer, 1976; Lumsdaine, 1965) based on the likely within-method variation, lack of replicability, lack of generalization, and so on. Rather than research, the studies represent weakly designed evaluations. Furthermore, research intended to improve our knowledge of how these systems work is usually the initiation of a lone researcher or two who need to find an opportunity to raise key questions. If evaluation takes place too late (tacked on at the end) and research is a happenstance event, yielding scattered rather than cumulative knowledge, are there any options? One that we believe needs consideration is the creation of testbeds for the evaluation of instructional programs that use technology. A testbed would remove a key limitation of much research and evaluation: access to stable technology infrastructure. To investigate the contribution of theoretically based variables, one would need, at minimum, a range of courses and topics in which to embed studies, agreement of the institution to conduct multiple versions of a course or module as well as to permit random assignment of students, and, of course, consent of students. In addition, regular access for the experimenters would be required, as well as a strategy for instrumenting and collecting student responses prior to, during, and following the instructional intervention, preferably using technological data-capture methods. The testbed would be available for the conduct of formative evaluation studies and relevant validity studies of online assessments. To be successful, the testbed would be a unique structure that allows the acquisition of sets of information

during design, development, and administration to support the three types of activities above. To accomplish this important goal, structural, administrative, and data-capture approaches will need to be put in place very early in the process. More challenging will be the construction of a culture among the course designers, subject matter experts, and researchers that emphasizes collaboration, shared access, and a deep commitment to quality. The testbed does not need to be located in one place, but could be linked on a network. Rules for deciding on access and a cost structure would be desirable. Emphasis at the outset could be on both research and formative evaluation. Envisioning a testbed to verify the quality of instruction is possible, of course, but such a function would require teams of attorneys responding to potential conflicts with the proprietary rights of developers. Testbeds of the sort described earlier, for formative evaluation and for research on technology, could be set up in a university environment. The effort to create one or more of these structures could well lead to economies in research costs, reducing duplicative efforts and easing access for the fledgling researcher. The technology testbed would be updated to keep current with bandwidth and other infrastructure environments in order to be useful for the formative evaluation and research functions. It is likely that different testbeds would be needed for military and civilian sectors, although efforts to share findings should be made. For summative evaluation purposes, Baker and Herman (2003) have proposed a distributed evaluation model, where users of new interventions would be providers of data streams from the system, as well as deliberate local evaluators of the intervention used in their particular locale. Using a system such as the Quality School Portfolio, individual sites could have access to instruments and tools that would allow them to modify the context in which technological interventions were being used. These instruments could address conditions of use, needed ancillary support, and motivational effects (or lack thereof) and provide immediate guidance to local users about modifications that they could make to improve the utility of the instruction. Far more immediate than any distal, third-party evaluation, the data could then be exported to a 'program evaluation site', such as the testbed, where data from a number of sites could be integrated and reported to program managers or funding agencies. A distributed, interactive system for evaluation would also provide information on a schedule far more congruent with technology development cycle times.
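As a rough illustration of the distributed evaluation model just described, the sketch below shows the kind of record a local site might export to a central 'program evaluation site', together with a trivial aggregation step. The record fields, site identifiers, and module names are hypothetical assumptions of ours; this is not the Quality School Portfolio's actual data format.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List

# Hypothetical record shape; not the Quality School Portfolio's actual format.

@dataclass
class SiteRecord:
    """One local site's export to a central program-evaluation site."""
    site_id: str
    intervention: str
    conditions_of_use: Dict[str, str]  # e.g. {"bandwidth": "DSL", "lab_access": "weekly"}
    motivation_rating: float           # local survey, 1-5 scale
    outcome_score: float               # locally administered assessment, 0-100

def integrate(records: List[SiteRecord]) -> Dict[str, dict]:
    """Pool exported site records by intervention for program-level reporting."""
    by_intervention: Dict[str, List[SiteRecord]] = {}
    for record in records:
        by_intervention.setdefault(record.intervention, []).append(record)
    return {
        name: {
            "n_sites": len(recs),
            "mean_outcome": mean(r.outcome_score for r in recs),
            "mean_motivation": mean(r.motivation_rating for r in recs),
        }
        for name, recs in by_intervention.items()
    }

# Two sites running the same distance-learning module report upward.
exports = [
    SiteRecord("school_07", "dl_module_A", {"bandwidth": "DSL"}, 3.8, 72.0),
    SiteRecord("school_12", "dl_module_A", {"bandwidth": "T1"}, 4.2, 81.5),
]
print(integrate(exports))
```

The design choice being illustrated is simply that local sites retain their own records for immediate, formative use, while the same data stream, once pooled centrally, supports summative reporting to program managers on a schedule matched to technology development cycles.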

5. Summary

In this article we have considered important differences between civilian and military use of technology and identified some key difficulties shared by both


constituencies in conducting high-quality research and evaluation. The need to focus on cognitively sensitive, economically developed measures was described. The regular provision of effectiveness data for instructional interventions would meet Lumsdaine's second criterion. Validity evidence for purposes of assessment would also need to be accumulated. Two approaches were sketched for the conduct of more productive efforts in the future: testbeds for technology research and formative evaluation, and summative evaluation conducted distributively with appropriate software support. These are but two of a number of potential solutions that need rapid consideration if the promise of technology is to be met firmly by evidence of effectiveness.

Acknowledgements

The work reported herein was supported in part under Office of Naval Research Award Number N00014-02-10179, as administered by the Office of Naval Research. The findings and opinions expressed in this report do not reflect the positions or policies of the Office of Naval Research. The work reported herein was also supported in part under the Educational Research and Development Centers Program, PR/Award Number R305B960002-01, as administered by the Office of Educational Research and Improvement, US Department of Education. The findings and opinions expressed in this report do not reflect the positions or policies of the National Institute on Student Achievement, Curriculum, and Assessment, the Office of Educational Research and Improvement, or the US Department of Education.

References

Alessi, S. M., & Trollip, S. R. (2001). Multimedia for learning (3rd ed.). Boston: Allyn and Bacon.
American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Anderson, J. R., Corbett, A. T., Koedinger, K. R., & Pelletier, R. (1995). Cognitive tutors: Lessons learned. The Journal of the Learning Sciences, 4, 167–207.
Baker, E. L. (1973). The technology of instructional development. In R. M. W. Travers (Ed.), Second handbook of research on teaching: A project of the American Educational Research Association (pp. 245–285). Chicago: Rand McNally.
Baker, E. L. (1997). Model-based performance assessment. Theory Into Practice, 36, 247–254.
Baker, E. L., Abedi, J., Linn, R. L., & Niemi, D. (1996a). Dimensionality and generalizability of domain-independent performance assessments. Journal of Educational Research, 89(March/April), 197–205.
Baker, E. L., & Bewley, W. L. (2001). Knowledge, models and tools to improve the effectiveness of naval distance learning (Proposal to the Office of Naval Research). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing (CRESST).
Baker, E. L., Freeman, M., & Clayton, S. (1991). Cognitive assessment of history for large-scale testing. In M. C. Wittrock, & E. L. Baker (Eds.), Testing and cognition (pp. 131–153). Englewood Cliffs, NJ: Prentice-Hall.
Baker, E. L., & Herman, J. L. (2003). Technology and evaluation. In E. Haertel, & B. Means (Eds.), Evaluating educational technology: Effective research designs for improving learning. New York: Teachers College Press, in press.
Baker, E. L., Herman, J. L., & Gearhart, M. (1996b). Does technology work in schools? Why evaluation cannot tell the full story. In C. Fisher, D. C. Dwyer, & K. Yocam (Eds.), Education and technology: Reflections on computing in classrooms (pp. 185–202). San Francisco: Jossey-Bass.
Baker, E. L., Niemi, D., Novak, J., & Herl, H. (1992). Hypertext as a strategy for teaching and assessing knowledge representation. In S. Dijkstra, H. P. M. Krammer, & J. J. G. van Merriënboer (Eds.), Instructional models in computer-based learning environments (pp. 365–384). Berlin: Springer.
Baker, E. L., & O'Neil, H. F., Jr. (2003). Technological fluency: Needed skills for the future. In H. F. O'Neil, Jr., & R. Perez (Eds.), Technology applications in education: A learning view (pp. 245–265). Mahwah, NJ: Lawrence Erlbaum Associates.
Briggs, L. J. (Ed.). (1977). Instructional design: Principles and applications. Englewood Cliffs, NJ: Educational Technology Publications.
Chung, G. K. W. K., O'Neil, H. F., & Herl, H. E. (1999). The use of computer-based collaborative knowledge mapping to measure team processes and team outcomes. Computers in Human Behavior, 15, 463–494.
Clark, R. E. (1994). Assessment of distance learning technology. In E. L. Baker, & H. F. O'Neil, Jr. (Eds.), Technology assessment in education and training. Hillsdale, NJ: Lawrence Erlbaum Associates.
Cronbach, L. J., & Suppes, P. (Eds.). (1969). Research for tomorrow's schools: Disciplined inquiry for education. Stanford, CA/New York: National Academy of Education, Committee on Educational Research/Macmillan.
Gagné, R. M., & Briggs, L. J. (1974). Principles of instructional design (2nd ed.). New York: Holt, Rinehart and Winston.
Gagné, R. M., & Medsker, K. L. (1996). The conditions of learning: Training applications. Fort Worth, TX: Harcourt Brace College Publishers.
Glaser, R. (1965a). The new pedagogy. Pittsburgh, PA: University of Pittsburgh, Learning Research and Development Center.
Glaser, R. (Ed.). (1965b). Teaching machines and programmed learning II: Data and directions. Washington, DC: National Education Association.
Glaser, R. (1976). Components of a psychology of instruction: Toward a science of design. Review of Educational Research, 46(1), 1–24.
Gott, S. P., Lesgold, A., & Kane, R. S. (1996). Tutoring for transfer of technical competence. In B. G. Wilson (Ed.), Constructivist learning environments: Case studies in instructional design. Englewood Cliffs, NJ: Educational Technology Publications.
Hovland, C. I., Lumsdaine, A. A., & Sheffield, F. D. (1949). Experiments on mass communication. Princeton: Princeton University Press.
Jonassen, D. A., Peck, K. L., & Wilson, B. G. (1999). Learning with technology: A constructivist perspective. Upper Saddle River, NJ: Prentice Hall.
Koedinger, K. R., Anderson, J. R., Hadley, W. H., & Mark, M. A. (1995). Intelligent tutoring goes to school in the big city. In J. Greer (Ed.), Proceedings of the Seventh World Conference on Artificial Intelligence in Education (pp. 421–428). Charlottesville, VA: Association for the Advancement of Computing in Education.
Leifer, A. D. (1976). Teaching with television and film. In N. L. Gage (Ed.), Psychology of teaching methods: NSSE Yearbook (pp. 302–334). Chicago: University of Chicago Press.
Lumsdaine, A. A. (1964). Educational technology, programmed learning, and instructional science. In E. R. Hilgard (Ed.), Theories of learning and instruction: Sixty-third yearbook of the National Society for the Study of Education, Part 1 (pp. 371–401). Chicago: University of Chicago Press.
Lumsdaine, A. A. (1965). Assessing the effectiveness of instructional programs. In A. A. Lumsdaine, & R. Glaser (Eds.), Teaching machines and programmed learning II: Data and directions (pp. 267–320). Washington, DC: National Education Association.
Mayberry, P. (1987). Developing a competency scale for hands-on measures of job proficiency (Research Contribution 570). Alexandria, VA: Center for Naval Analysis.
Means, B., & Gott, S. P. (1988). Cognitive task analysis as a basis for tutor development: Articulating abstract knowledge representations. In J. Psotka, L. D. Massey, & S. A. Mutter (Eds.), Intelligent tutoring systems: Lessons learned (pp. 35–57). Hillsdale, NJ: Lawrence Erlbaum Associates.
van Merriënboer, J. J. G. (1997). Training complex cognitive skills: A four-component instructional design model for technical training. Englewood Cliffs, NJ: Educational Technology Publications.
Merrill, M. D., & Tennyson, R. D. (1977). Teaching concepts: An instructional design guide. Englewood Cliffs, NJ: Educational Technology Publications.
National Center for Education Statistics (2002). Beyond school-level Internet access: Support for instructional use of technology (Issue Brief, NCES 2002-029). Washington, DC: US Department of Education.
No Child Left Behind Act of 2001 (2002). Pub. L. No. 107–110, 115 Stat. 1425.
O'Neil, H. F., Jr. (Ed.). (1979a). Issues in instructional systems development. New York: Academic Press.
O'Neil, H. F., Jr. (Ed.). (1979b). Procedures for instructional systems development. New York: Academic Press.
O'Neil, H. F., Jr., Wang, S.-L., Chung, G. K. W. K., & Herl, H. E. (2000). Assessment of teamwork skills using computer-based teamwork simulations. In H. F. O'Neil, Jr., & D. H. Andrews (Eds.), Aircrew training and assessment (pp. 245–276). Mahwah, NJ: Lawrence Erlbaum Associates.
Pellegrino, J., Chudowsky, N., & Glaser, R. (Eds.). (2001). Knowing what students know: The science and design of educational assessments (Committee on the Foundations of Assessment; Board on Testing and Assessment, Center for Education, Division of Behavioral and Social Sciences and Education). Washington, DC: National Academy Press.
Shute, V. J., & Psotka, J. (1996). Intelligent tutoring systems: Past, present, and future. In D. H. Jonassen (Ed.), Handbook of research for educational communications and technology (pp. 570–600). New York: Simon and Schuster/Macmillan.
Wigdor, A. K., & Green, B. F. (Eds.). (1991). Performance assessment in the workplace: Technical issues (Vols. 1 and 2). Washington, DC: National Academy Press.