System 26 (1998) 485–513
Design and evaluation of a computer-based TOEFL tutorial

Joan Jamieson a,*, Carol Taylor b, Irwin Kirsch b, Dan Eignor b

a Department of English, Northern Arizona University, Flagstaff, AZ, USA
b Language Learning and Assessment Group, Educational Testing Service, Princeton, NJ, USA

* Corresponding author. E-mail: [email protected]
Abstract

In order to train examinees whose native language is not English to take a computerized Test of English as a Foreign Language (TOEFL), a special set of tutorials was designed and developed. The TOEFL tutorials were trialed as part of a computer familiarity study in 1996. This article describes the development of the tutorials. Also, the experiences of the 1169 individuals who participated in the computer familiarity study are characterized in terms of timing and performance data, as well as self-reported attitudes. These analyses took into account computer familiarity and English ability, which both proved to be important in explaining some differences in time to complete the tutorials and perception of the tutorials' usefulness. Most examinees were successful in completing the practice items in the tutorials and thought that the tutorials were helpful. Some changes were subsequently made before operational implementation of the computerized TOEFL test in order to reduce the time needed to complete the TOEFL tutorials. © 1998 Elsevier Science Ltd. All rights reserved.
1. Introduction

A task force of the American Council on Education (1995) produced one of the first American sets of guidelines for the development and use of computer-adaptive tests. While seen as supplementing and not replacing existing standards for
educational testing,1 these guidelines "were developed to improve test practice by articulating critical dimensions of computerized-adaptive testing" (p. x). One guideline deals with examinee training in the use of computer testing systems and specifies that "examinees should be given enough practice or training to enable them to interact adequately with both the computer and the test system" (p. 7). It further stipulates that:

The pre-administration training of examinees should detail all of the important features of the test administration so that lack of familiarity with the computerized delivery system will not be a concern when interpreting examinees' scores. At a minimum, the training should inform students about how to indicate their answers to questions, how to scroll through the text, and how to manage time (if there are time restrictions). It is strongly recommended that the examinees be asked to demonstrate their proficiency in using the computerized delivery system through practice items or their equivalent. (p. 7)

Between July 1995 and June 1997, Educational Testing Service (1995) (ETS) administered more than 793,000 computer-based tests worldwide. For these tests, ETS attempted to address the issue of training by providing a non-optional computer tutorial as part of each test administered. It developed a set of standard tutorials for use with the computerized versions of the Graduate Record Examinations (GRE) and Praxis I: Academic Skills Assessments tests. More recently ETS released POWERPREP: Preparing for the GRE General Test (1995), a software program that offers test-taking tips, GRE practice questions with explanations, timed tests, and the standard tutorials. The Test of English as a Foreign Language (TOEFL), another ETS program, plans to introduce a computer-based TOEFL test in selected North American and international test centers in 1998.

As more testing programs make the transition to computer-based testing and begin to introduce new item and response types, the use of the standard ETS tutorials needs to be re-examined in terms of their appropriateness for new tests and more diverse testing populations. For example, the computerized TOEFL test will include a section on listening comprehension, which introduces a technology that is not a part of the current set of standard ETS tutorials. Another reason for considering changing the standard tutorial is due to the TOEFL program's unique population; testing non-native speakers of English over different levels of English ability might necessitate reducing the amount of reading in the tutorials' instructions, perhaps by adding more graphics and animation.
1 Existing recognized standards include the American Educational Research Association, American Psychological Association, and National Council on Measurement in Education's Standards for Educational and Psychological Testing (in revision). Washington, DC: American Psychological Association, 1985.
Because TOEFL examinees are diverse not only in terms of their language backgrounds but also in their familiarity with computers, ETS recognized the potentially confounding effect of computer familiarity on computerized test performance. ETS and the TOEFL program funded a two-phase study to survey TOEFL examinees regarding their computer familiarity and to examine the effects of computer familiarity on performance on a set of computerized language tasks after taking a new tutorial designed especially for TOEFL examinees. Three separate reports (Kirsch et al., 1998; Eignor et al., 1998; Taylor et al., 1998) provide details of that study.

This article describes the design and evaluation of a tutorial to be used in conjunction with the computerized TOEFL test. The article is divided into three main sections. First, "Developing the Tutorials" documents the development process for the new computerized TOEFL tutorials. The second section, "Implementing the TOEFL Tutorials in the Computer Familiarity Study," describes and characterizes the experiences of a sample of high- and low-computer-familiar TOEFL examinees who worked through those tutorials.2 Finally, "Going Operational" relates the examinees' experiences to the design features, and describes subsequent changes that were made in the tutorials.

2. Developing the tutorials

This section describes the process that helped the designers formulate their approach to the content and its delivery, which then led to development of a set of six new TOEFL tutorials. Elements of instructional design were reviewed (e.g., Dick and Carey, 1990; Dick, 1996; Smith and Ragan, 1993; Alessi and Trollip, 1991) with an interest in taking advantage of new software that had not been available when the standard tutorials were created.
2 Computer-familiar and computer-unfamiliar examinees were identified based on their responses to eleven items which were part of a 23-item questionnaire that focused on individuals' access to, attitude toward, and experience with computers as well as related technologies. The questionnaire was administered to all TOEFL examinees in April, 1996 and to examinees in China in May, 1996. The eleven items that were selected from the questionnaire all loaded heavily on the first factor of a two-factor common factor solution. Based on a composite computer familiarity score created from the 11 items, examinees were classified into one of three computer familiarity groups (low, moderate, and high). Two separate reports (Kirsch et al., 1998; Eignor et al., 1998) provide detailed discussions of the procedures used to develop the questionnaire and the computer familiarity scale as well as the profile of TOEFL examinees with respect to their level of computer familiarity.
2.1. Goal and needs assessment

The goal of the tutorial was clear from the outset: Provide sufficient computer training so that an examinee's TOEFL test score would not be affected by lack of previous computer experience. In order to achieve this goal, we began with an analysis of a group of English as a second language (ESL) students using the standard ETS tutorials.

2.1.1. Review of existing tutorials

Prior to the development of the new tutorials, TOEFL program and test development staff used the existing standard ETS computer tutorials as part of a trial of the new item and response types being considered for the computerized TOEFL test. It was decided to use this planned trial to obtain feedback on the perceived strengths and weaknesses of features in the standard tutorials for the TOEFL population. These tutorials were composed of four sections (Powers and O'Neill, 1993): "How to Use a Mouse," "How to Select an Answer," "How to Use the Testing Tools" (e.g., the help and clock functions), and "How to Scroll."

The trial consisted of the standard ETS tutorial and experimental TOEFL test items in reading comprehension, structure, and listening comprehension. Twenty international students, ten female and ten male, attending local American colleges and universities close to ETS, participated in two trials in November, 1995 (ten students in each group). Thirteen of the twenty students were graduate students. They ranged from 20 to 35 years of age and represented a variety of academic disciplines, native languages, and ability levels as measured by TOEFL paper-and-pencil test scores.

Following a brief orientation session, each student was paired with two ETS staff members (one observer and one interviewer) and worked through each section of the materials twice. The first time through, the student worked independently with staff observing. During this session, the student's answers and timing data were collected on-line. The second time through, the staff interviewer took control of the mouse, proceeded through each screen and exercise, and asked the student questions from a protocol instrument3; the staff observer recorded all comments. At the end of the day all students and staff met together for a general discussion and debriefing session.

Most of the students who participated in the trial had prior computer experience and were able to perform all of the required computer functions. However, the observations and comments elicited from the protocol instrument revealed that many students did not easily understand the standard tutorial material presented by textual explanations; rather, they experimented, trying, for instance, to click and see what would happen. From the student comments, it was clear that they thought
3 The protocol instrument is available from the first author upon request.
there was too much text with difficult vocabulary; many could not distinguish important from unimportant information. Moreover, they all expressed concern that fellow international students, especially those with little or no computer experience, would not understand the explanations provided in the standard tutorials sufficiently to take a computer-based TOEFL test without having their test performance affected by their computer familiarity. They recommended using labels, highlighting, color, and pictures to simplify text and convey important information. These recommendations were taken into account when designing screen displays, described below.

2.1.2. Context, content, and constraints

The context of where the examinees would use the skills they were learning in the tutorials was straightforward – the skills would be needed to take the computerized TOEFL test. The tutorials would become a preliminary part of each TOEFL test administration. In accord with a task analysis done earlier at ETS, the content was identified as moving a mouse, scrolling, navigating, and answering test questions; the analysis of the content will be explained in more detail in the next section.

Time presented two very different types of constraints. First, learning the computer functions for the computerized TOEFL test would have to be done in a relatively short period of time in order to avoid fatigue on the part of the examinee and in order to reduce the overall cost of administration of the computerized test. Based on timing data on the standard tutorials as used in the GRE program, it was thought that the tutorial should take examinees between twenty and forty minutes to complete. Secondly, the tutorial was part of the computer familiarity study referred to in the introduction. That study was approved in January, 1996 and data collection was scheduled to begin in June, 1996. This timetable left a little less than five months for design, development, programming, and formative assessment of the tutorial.
Fig. 1. Components of TOEFL tutorial.
2.1.3. Instructional analysis

The content of the TOEFL tutorials can be divided into two main types (see Fig. 1): information on how to use a computer (i.e., information to help develop computer skills) and information on how to answer questions on the three sections of the TOEFL test (i.e., TOEFL skills). Each of these content areas will be briefly described.

Based on earlier work at ETS, the content regarding computer skills was divided into three tutorials: "How to Use a Mouse," "How to Scroll," and "How to Use Testing Tools." (See Fig. 2 for more detail.) All of the tutorials began with a title page which served as an introduction. All ended with a summary page which allowed students to review a segment of the tutorial or go on; this served as a conclusion.

"How to Use a Mouse" was subdivided into three subskills. These were (1) how to move the mouse, (2) how to point the arrow (cursor) at an object by moving the mouse, and (3) how to click the mouse once the arrow is on target.

"How to Scroll" was also broken down into three areas. However, whereas the three subskills of the mouse tutorial were treated as equally important, the three subskills of the scrolling tutorial were not considered equally important. The designers decided that the most important skill examinees needed to know was how to scroll, and scrolling one line at a time required fewer steps to learn than other ways to scroll such as a page at a time and a block of text at a time. These two other ways to scroll were also included, but not treated in as much detail. Another less important concept that was presented in the scrolling tutorial was that of what
Fig. 2. Structure of computer skills components.
an icon is and how to use the "go on" or "see more" icons to proceed through the rest of the tutorials.

The third computer use tutorial, "How to Use Testing Tools," was also broken down into subskills. How to let the computer know that an answer had been selected and the examinee was ready for the next question was considered the most important skill in this tutorial. In order to do this, examinees had to click on two icons in succession, "next" and "confirm answer." Three other areas were presented in the tools tutorial. One was an orientation to the screen layout of the test itself, showing where the title bar, the test questions, and navigation icons would appear. Students were also shown how to toggle the "time" icon to either display or not display the amount of time left for the section of the test. Finally, examinees were shown how to access more instructions by clicking on the "help" icon.

The next set of tutorials dealt primarily with the item types for each of the three sections of the test: Listening Comprehension, Structure, and Reading Comprehension (see Fig. 3). Whereas the standard ETS tutorial presented a single "How to Answer" tutorial, the TOEFL tutorial presented three separate "How to Answer" tutorials. These would be presented prior to each test section. This modification was considered important because the computerized TOEFL test would introduce a variety of new item stimuli and response types, and examinees would need an opportunity to practice items before taking the corresponding test sections. As with
Fig. 3. Structure of TOEFL skills components.
the first set of tutorials, these three tutorials began with an introductory title page and concluded with a summary/review page.

"How to Answer Listening Comprehension" began with instructions to put on a headset and an explanation on how to adjust the volume by clicking on the "volume" icon. Three item types were presented and practiced: (1) single selection multiple choice questions answered by clicking on an oval, a letter in a box, or a picture; (2) multiple choice questions with more than one answer; and (3) matching/ordering questions answered by clicking on a source and a destination. Following this practice, examinees were given a "simulated" test lecture, followed by several items to which they were to respond. In this final section, no responses were judged for correctness.

"How to Answer Structure" presented and provided practice for two item types. One was single selection multiple choice in which an examinee would click on one of four choices to correctly fill in a blank in a sentence. The other item type was to click on an underlined word or phrase in a sentence to highlight an error. After the practice, examinees were given two sample structure items, which were not judged.

"How to Answer Reading Comprehension" began by giving an examinee the option to review the "How to Scroll" tutorial. Three item types were included in this tutorial. Again, there was single selection multiple choice. In addition, there was clicking on a word, phrase, or sentence within a reading passage. Finally, clicking on one of several black boxes would insert a sentence into a reading passage. Visual cues such as arrows, bold text, and highlighted text were also presented and explained. This tutorial ended with a sample reading passage and items, which were not judged.

2.2. Developing an instructional strategy

Having decided upon what to teach, the next step was to decide on how to teach it. Traditionally speaking, two approaches seemed plausible: discovery or expository (Shafritz et al., 1988). In the discovery approach "students are presented with ambiguous materials and are given the opportunity to organize them conceptually or draw their own conclusions..." (p. 154). Often termed inductive learning, it is contrasted with expository, or deductive, learning, an "approach that starts with the presentation of a concept and then allows students to apply the concept to specific situations" (p. 188).

Although both the discovery and expository approaches have proven effective in various situations (Andrews, 1984; Alessi and Trollip, 1991), the expository approach seemed more suitable for several reasons. Research has reported that success of discovery methods has often been limited to high ability students (Corno and Snow, 1986), yet the audience for the TOEFL tutorial would include individuals with a range of abilities in both English and computer use. Expository methods seemed more suitable because the material to be learned was essentially a set of low-level skills that could be segmented into a series of steps that students would have to apply
when subsequently taking the TOEFL test; Rosenshine and Stevens (1986) reported that the expository model is more suitable for this type of material. Expository instruction has also been associated with student achievement for young second language learners when learning basic skills (August and Christian, 1997; Snow and Leinhardt, 1997). Another factor favoring expository instruction was that learning the computer functions for the computerized TOEFL test would have to be done in a relatively short period of time; pacing of the material could be managed more easily in an expository approach than a discovery approach.

Discovery and expository approaches have been incorporated in the computer-based methodologies of simulations and tutorials (Newby et al., 1996; Alessi and Trollip, 1991; Soulier, 1988). Simulations, for example, provide learners with simplified representations of a real life situation in which they are encouraged to explore. Tutorials, on the other hand, present information to be learned and then provide guided practice. As computerized methods have developed and become more sophisticated, many researchers have experimented with a combination of the two approaches using guided exploratory learning and intelligent tutoring systems (Gibbons and Rogers, 1991; Cox, 1992; Dijkstra et al., 1992; Fox, 1993). These more complicated computer-based methodologies, however appealing, seemed more sophisticated than our project required.

Having chosen an expository approach, the designers decided to use a tutorial methodology. Tutorials are characterized by the presentation of information, question and response, analysis of the response, appropriate feedback, remediation, and determination of how to proceed (Soulier, 1988; Alessi and Trollip, 1991; Newby et al., 1996). These elements can be grouped into two stages: presenting information and guiding the student. Information such as rules, concepts, or skills is presented either verbally or pictorially, by example or in animated form. The second stage, guiding the student, is interactive as it involves both the student and the instructor. As Steinberg (1990, 1991) pointed out, interactions serve two main functions: mechanical (navigating through computerized material) or learning. Interestingly, the purpose of the TOEFL tutorial was to foster learning of the mechanical aspects of the computerized TOEFL test. In this interactive stage the student is asked a question and then inputs a response. Based on the response, feedback is given. If incorrect, suggestions or hints for how to respond are given to the student. Remediation might include repeating information or providing new information with more detail; the student who answers a question incorrectly might also be branched to a less difficult item, focusing on a simpler aspect of the question.

Remediation provides guided practice which needs to be understood as distinct from the type of practice found in the "drill" methodology. In drills, there is no presentation of new information; their purpose is to increase fluency or mastery by repeated practice. It is often recommended that items of equal difficulty be grouped together in drills and that students master easy items before being given more difficult ones (Alessi and Trollip, 1991). Although feedback is given, remediation is not part of the interaction cycle in drills as it is in tutorials.
Informed by the characteristics of computer-based tutorials, a "cycle" for each skill or concept which the examinee needed to learn was created. In the six TOEFL tutorials, there was a total of twenty-four important skills/concepts. Each of the twenty-four cycles consisted of explanation, demonstration, and practice. First, the skill or concept was explained by simple printed text. (Although spoken text was considered as a desirable accompaniment to the written presentations of information, it was not used, except in Listening Comprehension, due to the prohibitive amount of space the sound files would require.) Next, animation was used to model the skill. After the animation, the examinee was directed to practice the skill. This practice was designed to be at the same level of complexity and difficulty needed to take the TOEFL test. Hints were given if an examinee answered incorrectly or did not respond at all. For example, in the cycle on scrolling one line at a time, a small screen was displayed on the left-hand side of the examinee's screen; running from top to bottom were the phrases "line 1 of text," "line 2 of text," etc. A scroll bar formed a column along the right-hand edge of the box. Directions were given in the practice to "Scroll down to line 23." After thirty seconds, a line pointed to the down arrow and the directions were replaced with a message that stated, "Click on the down arrow to scroll to line 23." (Throughout the tutorials, a "correct" or "incorrect" response to a question always involved the mechanical aspect of how to manipulate the computer; a response should not be confused here with a response to a language question, as would be found on the computerized TOEFL test.)

As was mentioned earlier, some skills were considered more important than others. So, although a total of twenty-four skills and concepts were presented using the cycles described above, seven of them were not considered "critical." For example, in the scrolling tutorial knowing how to move a page at a time was not considered critical, whereas knowing how to put the cursor on the up or down arrow on the scroll bar in order to move text was considered critically important. The seventeen skills and concepts that were considered critical followed the same cycle described above but instead of just one practice item these had guided or "leveled" practice which included remediation for incorrect responses (see Fig. 4). The decision to include remediation represented a point of contrast with the standard ETS tutorials, in which only one practice item was given, without feedback or help for an incorrect response. In the TOEFL tutorials, if an examinee answered the practice item (considered Level 1 practice) incorrectly, or if thirty seconds elapsed without a correct response, he or she would be given hints. If thirty more seconds elapsed without a correct response, the answer would be coded as incorrect and the examinee would "drop down a level" to Level 2 guided practice (see Fig. 5). The explanation would be repeated, though in simpler text, and the skill would again be animated. Then, the examinee would be given an easier practice item. To continue with the scrolling example, in the Level 1 practice, the examinee was directed to click down five lines; in the Level 2 practice, he or she was directed to click down one line. If the examinee answered correctly, he or she would be given a second Level 1 item; if the examinee answered incorrectly and again timed out, the answer was coded as incorrect and the examinee was dropped down to Level 3 guided practice. In the instance of scrolling, all text was removed from the screen, and the examinee was directed to just click on the down arrow in the scroll bar (see Fig. 6). After this guided practice, he or she would then work back up to another Level 2 practice and finally to another Level 1 practice. Note that within the guided practice, as an examinee encountered a Level 1 or Level 2 item for the first time, different paths were taken based on the correctness of the response. On the other hand, when an examinee got to a Level 3 or encountered a Level 1 or 2 item the second time, the same path was taken regardless of the response.

Fig. 4. Structure of instructional cycles with guided practice.

Fig. 5. Screen display from TOEFL tutorial – level 2 scrolling practice.

Fig. 6. Screen display from TOEFL tutorial – level 3 scrolling practice.

As the tutorial was designed, we planned to capture data on examinee performance and time on these leveled practices as part of the computer familiarity study. Performance data would help us subsequently determine at what point we might advise an examinee that his or her computer skills were inadequate to take the computerized TOEFL test. Timing data would allow adjustment of the delay times for help messages as well as adjustment of the time allowed before judging a response as incorrect before using the tutorial in an operational setting.
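The branching just described can be summarized in a short sketch. This is an illustrative reconstruction only, not the original Authorware code; the function and item names are hypothetical, and the 30-second hint and timeout behavior is abstracted into the answer_correctly callback.

```python
def run_level1_exercise(items, answer_correctly):
    """Run the guided practice for one critical skill.

    items: dict mapping level (1, 2, 3) to a practice item of decreasing difficulty.
    answer_correctly(item): True if the examinee completes the item in time
    (hints appear after 30 s; after 30 more s the attempt is coded incorrect).
    """
    if answer_correctly(items[1]):
        return "passed on 1st attempt"
    # Drop to Level 2: simpler explanation, the animation repeated, an easier item.
    if not answer_correctly(items[2]):
        # Drop to Level 3: a minimal task (e.g. just click the down arrow),
        # then work back up through Level 2, regardless of the responses.
        answer_correctly(items[3])
        answer_correctly(items[2])
    # A second Level 1 item is always given; the outcome is recorded but the
    # examinee proceeds either way.
    return "passed on 2nd attempt" if answer_correctly(items[1]) else "not passed"

# Example: an examinee who misses the first Level 1 item but recovers at Level 2.
responses = iter([False, True, True])
scrolling = {1: "scroll down 5 lines", 2: "scroll down 1 line", 3: "click the down arrow"}
print(run_level1_exercise(scrolling, lambda item: next(responses)))  # passed on 2nd attempt
```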
2.3. Developing a strategy for screen display

While designing and developing these tutorials, consistency became a key word. We believed that internally consistent screen displays across the six tutorials would help all examinees to learn the critical tutorial content. Internal consistency included not only introductory title pages and summary/review conclusions, but also the layout and the body each tutorial would need in order to follow the same format (Newby et al., 1996; Alessi and Trollip, 1991; Steinberg, 1991).

The layout divided the screen into four main areas which can be seen in Fig. 7 (presented here in black and white). A small blue banner with white text was always placed in the upper left corner of the screen. Its purpose was to orient the examinee; in a word or phrase, the main point of that section of the tutorial was stated. The row across the bottom of each screen was reserved for navigational icons, such as "go on," which were located at the right-hand end of the bottom row. This row was colored dark gray; the icons were colored a lighter gray with black letters and symbols. The remainder of the screen was colored light gray and was divided into two areas of unequal size. The left-hand side of the screen consisted of about two-thirds of the screen. It was used to illustrate whatever skill was being taught; graphics and animation occurred in this space. The right-hand third of the screen formed a column where directions and explanations were presented to the examinee via text. During explanations, this column would be colored pink from top to bottom in order to cue the student to watch, not act. (Color was used sparingly in the tutorial on the advice of ETS staff who reviewed the tutorial with sensitivity to examinees with visual disabilities.)
Fig. 7. Screen display from TOEFL tutorial – answer matching/ordering questions.
Contrast Fig. 7 with Fig. 8. Both show an explanation on how to respond to matching/ordering questions. Fig. 8, however, is from the standard ETS tutorials. Note the amount and relative complexity of the language that was used in the standard tutorial compared with the language in the TOEFL tutorial. What could not be demonstrated here but was also a point of contrast between the two was that the TOEFL screen was first presented with the "To Answer" segment, and after clicking on "See More," the "To Change an Answer" section was added; the standard ETS screen was presented as one static display. Another difference between the two tutorials was that the behavior of clicking on a source and then on its destination was animated in the TOEFL tutorial, but was only textually described in the standard ETS tutorial. A final point of contrast was the area on the screen where the skill was practiced: the area was boxed in the standard ETS tutorial, whereas a simulation of the screen the examinee would see while taking the TOEFL test was used in the TOEFL tutorial.

Fig. 8. Screen display from standard ETS tutorial – answer matching/ordering questions.

2.4. Summary of design features

These instructional and software features, together with information gathered at the trialling of the standard tutorial, helped structure the design of the TOEFL tutorial:
1. Present the skills needed to take the computerized TOEFL test;
2. Provide interaction of those skills through guided practice;
3. Use graphics and animation to reduce the amount of text;
4. Use common words, and short sentences and phrases to simplify the English; and
5. Specify a set of variables on which to capture data in order to assess examinee performance.

These features were incorporated in the tutorial in order to achieve our goal of providing sufficient computer training in a relatively short period of time so that an examinee's TOEFL test score would not be affected by lack of previous computer experience. In order to include graphics, animation, pictures, sound, and interaction with answer judging, we decided to program the tutorials in Authorware (1995) by Macromedia. Another advantage of Authorware was that it had built-in algorithms which captured data in the background (i.e., the user was unaware that data was being collected) on a variety of variables.

2.5. Formative assessment

Formative assessment consisted of internal reviews and small-scale piloting. The tutorials were internally reviewed at ETS for both content and cultural sensitivity. Three groups reviewed the tutorials as they were being developed. One small group of reviewers consisted of about ten ETS staff members from test development and research. This group was shown prototype material for each tutorial; based on their comments, numerous cycles of revisions were made. Another larger group of reviewers consisted of other staff members from test development and research, the TOEFL program, other ETS programs such as GRE, and various ETS computer groups who would ultimately have to implement the tutorial and integrate it with ETS' computer delivery system. This group was shown each tutorial after revisions had been made; based on their comments, more revisions were made. Finally, the tutorials were reviewed by ETS editors, again followed by revisions.

The tutorials were piloted twice with a small number of adults (8–10 individuals in each group) who had learned English as a second language (ESL). The first time, ETS employees volunteered; the second time, ESL students at a local intensive English program participated. These ESL employee and student volunteers worked one-on-one with an ETS staff member as observer. As the volunteer worked through each tutorial, the ETS observer filled in a protocol document with notes on behavior and timing. At the end of each tutorial, questions were asked regarding any difficulties or any opinions about the tutorial the ESL volunteer wanted to share.

An example of the kind of information that was derived from these pilot sessions follows. In the first pilot, one individual had difficulty understanding when to watch the animation and when to actually do the exercise. She was clicking the mouse during the animation and thought that she had performed the action; when the exercise appeared, she was frustrated that she had to "do the exercise again." After this pilot session, two changes were made in the presentation: the word "EXAMPLE" was
placed in the middle of the screen and then after 2 seconds it was moved via animation into the banner that is illustrated in Fig. 7. Once the banner was in place, the "Read and Watch" column was colored pink from top to bottom. In the second pilot, these changes proved to be effective in distinguishing between the demonstration and practice stages of a cycle. Finally, the tutorials went through a one-month period of debugging before going out to the field for administration.4

3. Implementing the TOEFL tutorials in the computer familiarity study

In this section, the administration of the tutorials as part of the computer familiarity study will be described along with the examinees who participated in the study. In order to evaluate the experiences provided by the TOEFL tutorial, three types of data generated during that study will be presented: time to complete the tutorials, examinees' performance on the tutorials, and examinees' attitudes toward the tutorials.

3.1. Tutorial administration

The tutorial field test was conducted between June and August, 1996 as part of the data collection for the two-phase computer familiarity study referred to earlier (Eignor et al., 1998; Kirsch et al., 1998; Taylor et al., 1998). The study included 12 sites: Bangkok, Cairo, Frankfurt, Karachi, Mexico City, Oakland, Paris, Seoul, Taipei, Tokyo, Toronto, and Washington, DC. These sites were selected because they represented the geographic areas (i.e., Africa/Near East, Asia, Europe, Latin America, and North America) in proportion to the annual TOEFL testing volumes. Certified computer-based test administrators administered the tutorial and TOEFL test-like items on laptop computers under standardized testing conditions and procedures. Each computer was equipped with a 486 or Pentium processor, hard drive, soundcard, 256-color display, headset, standard English keyboard, external mouse, mousepad, and Microsoft Windows 3.1. Test administrators copied all data to disks and returned the disks and all study materials to the researchers at ETS.

3.2. Examinees

A total of 1169 usable data records were obtained (35 files were corrupted due to various malfunctions) from the examinees who participated in the field test; 613 had a high level of computer familiarity and 556 had a low level of computer familiarity.
4 The TOEFL tutorials were designed, developed, and programmed by ETS staff Debbie Pisacreta, Thomas Florek, Mary Lou Lennon, Louis Mang, Janet Stumper, Michael Ecker, and Holly Knott, and by Joan Jamieson while on sabbatical at ETS from Northern Arizona University.
(See Table 1.) (See Footnote 2 or Eignor et al., 1998 for a more detailed description of how the computer familiarity classification was determined.) All of these examinees had taken a paper-and-pencil TOEFL test in either April or May, 1996. Overall, the group represented a range of language ability: 340 examinees had TOEFL scores below 500, 413 had scores between 500 and 549, and 416 had scores at or above 550. The score intervals of less than 500, 500–549, and 550 or greater were used because they reflected common cutoffs used for admission decisions at North American colleges and universities. At many of these institutions, applicants are required to have a TOEFL score of at least 500 to be considered for undergraduate admission and to have a score of at least 550 to be considered for graduate admission. The examinees in the field test included 611 males, 544 females, and 14 who did not indicate their gender; 333 examinees reported taking the TOEFL test for undergraduate admissions, 595 for graduate admissions, and 241 for reasons other than admissions. (See Taylor et al., 1998, for the breakdown of study participants by background characteristics.) Also, as can be inferred from the range of data collection sites, the computer-familiar and computer-unfamiliar examinees represented a variety of native languages.

3.3. Reporting on timing, performance, and self reports

In order to characterize the experiences examinees had while they were taking the tutorial as well as to assess the effectiveness of certain design features, three types of data were examined in relation to the examinees' computer familiarity and English ability. First, timing data were analyzed. Second, performance data on the tutorial were examined. Third, examinees' opinions of the helpfulness of the tutorials and their attitudes toward using computers and taking computerized TOEFL tests were analyzed. The first two types of data were captured while students were working on the tutorial. The self-report data was collected via an on-line questionnaire administered to the examinees after the last section of the test-like items contained in the computer familiarity study.
Table 1
Number of examinees by computer familiarity and English ability

Background characteristic   Computer familiar   Computer unfamiliar   Total
Total                       613                 556                   1169
TOEFL scores
  Below 500                 146                 194                   340
  Between 500 and 549       205                 208                   413
  550 and above             262                 154                   416
3.3.1. Analyses of time

The developers of the tutorial had thought that both computer familiarity and English ability would be related to the amount of time an examinee spent working through the tutorial. Observing the average time it took groups to complete the tutorials (see Table 2), the computer-familiar examinees took just over the predicted 40 min; they averaged 14.8 min on the "How to Use a Computer" tutorials and 26.4 min on the "How to Answer TOEFL Items" tutorials. The average time for the computer-unfamiliar group of examinees was about 54 min; they averaged 22.6 min on the "How to Use a Computer" tutorials and 31.3 min on the "How to Answer TOEFL Items" tutorials. To estimate the time needed for operational test administrations, ETS programs often look at the time it takes 85% of a group of examinees to complete a tutorial, rather than the mean time for completion. Using the 85% criterion, the computer-familiar group took 54 min and the computer-unfamiliar group took about 73 min. Clearly, the TOEFL tutorial took longer to complete than planned.

On the surface, the fact that computer-familiar examinees took less time to complete the tutorials than their unfamiliar counterparts is what had been expected. However, closer inspection of the two different content areas in the tutorials revealed an interesting pattern. The difference on average between computer-familiar examinees and computer-unfamiliar examinees on the "How to Use a Computer" tutorials was 14.8 vs. 22.6 min. The difference between these two groups on the "How to Answer TOEFL Items" tutorials was much smaller: 26.4 vs. 31.3 min. Analyses which included examinees' English language ability were conducted to investigate further these differences between the time to complete the two types of tutorials.

Table 2
Time to complete the tutorial (in minutes) by computer familiarity

                              Computer-familiar        Computer-unfamiliar
Tutorial section              Mean        85%          Mean        85%
Total                         41.2        54.1         53.9        72.9
How to use a computer
  Mouse                        4.6         7.0          7.8        12.9
  Scrolling                    4.9         6.7          7.7        10.9
  Testing tools                5.3         7.0          7.1         9.3
  Subtotal                    14.8        20.7         22.6        33.1
How to answer TOEFL items
  Listening                   12.2        14.4         14.4        17.6
  Structure                    3.0         3.8          3.4         4.3
  Reading                     11.2        15.2         13.3        17.9
  Subtotal                    26.4        33.4         31.3        39.8
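The "85% criterion" used in Table 2 is simply the 85th percentile of the distribution of completion times, reported alongside the mean. A minimal sketch follows, using synthetic data rather than the study's records; the function name and simulated times are ours, not part of the original analysis.

```python
import math
import random

def completion_time_summary(minutes):
    """Mean completion time and the time within which 85% of a group finished."""
    ordered = sorted(minutes)
    mean = sum(ordered) / len(ordered)
    idx = math.ceil(0.85 * len(ordered)) - 1   # smallest time covering 85% of examinees
    return mean, ordered[idx]

# Synthetic illustration only: 600 simulated examinees averaging about 41 minutes.
random.seed(0)
times = [random.gauss(41, 10) for _ in range(600)]
mean, p85 = completion_time_summary(times)
print(f"mean = {mean:.1f} min, 85% finished within {p85:.1f} min")
```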
In the first three tutorials on using the computer, the examinees who were computer-familiar worked faster on average than those examinees who were computer-unfamiliar, regardless of their English ability (see Table 3). (The following analyses of timing data were computed using seconds rather than minutes to allow for finer discrimination.) Recall that the content of the last three tutorials dealt with how to answer the test questions (listening, structure, and reading) for the computerized TOEFL test. From the descriptive statistics for these last three tutorials, it is not clear that computer familiarity was having the same effect on the time it took to complete these tutorials as was the case for the first three tutorials.

Table 3
Average time (in seconds) to complete the "How to Use a Computer" and the "How to Answer TOEFL Test Questions" tutorials

                       How to Use a Computer tutorials             How to Answer TOEFL Test Questions tutorials
TOEFL test score       Computer-familiar   Computer-unfamiliar     Computer-familiar   Computer-unfamiliar
Total group            883 (357)           1351 (679)              1583 (404)          1878 (549)
Less than 500          1034 (431)          1460 (772)              1831 (469)          1991 (655)
500–549                907 (338)           1377 (679)              1635 (375)          1919 (492)
550 and above          780 (285)           1177 (493)              1403 (283)          1679 (401)

Note: Numbers in parentheses are standard deviations.

In order to investigate this relationship in more detail, two-way ANOVAs were run in which time to complete the "How to Use a Computer" tutorials was the dependent variable and computer familiarity and English ability were the independent variables. Because of the large sample size, many of our findings were statistically significant. Practical significance was included to help better interpret the meaning of statistical significance since, often in studies with a large number of subjects, small differences between groups might reach statistical significance but not be of any real practical difference. Cohen (1988) recommended interpreting 20% of a standard deviation as a small but practical difference between means. We have adopted his recommendation for this study and use it to determine the extent to which a statistical difference reaches practical significance. Cohen also notes that a large practical effect is reached when the difference approaches 75% of a standard deviation.

Table 4
Significance of effect sizes in "How to Use a Computer" tutorials

Effect (predictor)        Statistical significance   Practical significance (SD = 583)
Familiarity               Yes, p ≤ 0.000             Yes, 74% of SD
Ability                   Yes, p ≤ 0.000             Yes, 49% of SD
Familiarity × Ability     No, p > 0.05               No

The results presented in Table 4 illustrate the effect sizes of familiarity, ability, and their interaction, as well as both their statistical and practical significance. The average difference between the computer-familiar and computer-unfamiliar groups represents 74% of the standard deviation, indicating a large practical difference between the two groups in time needed to complete the set of tutorials. The other main effect on time to complete the tutorials that was of interest was English language ability. The difference, representing 49% of the standard deviation, is both statistically and practically significant. The interaction between computer familiarity and English ability was not statistically significant. So, while both computer familiarity and English-language ability each affected how long it took examinees to complete this set of tutorials, these two variables did not interact in a significant way.
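The practical-significance criterion amounts to expressing a mean difference as a fraction of the standard deviation and comparing it with the thresholds adopted above. The sketch below is illustrative only: the function name is ours, and because the adjusted means underlying Table 4 are not reported, it uses the raw total-group means from Table 3, which give a slightly different figure (about 80%) than the 74% reported in Table 4.

```python
def practical_effect(mean_a, mean_b, sd):
    """Express a group difference as a fraction of the SD and label it
    using the thresholds adopted in the text (Cohen, 1988)."""
    effect = abs(mean_a - mean_b) / sd
    if effect >= 0.75:
        label = "large practical difference"
    elif effect >= 0.20:
        label = "practical difference"
    else:
        label = "no practical difference"
    return effect, label

# Total-group means (seconds) for the "How to Use a Computer" tutorials
# from Table 3, with the SD of 583 s reported alongside Table 4.
effect, label = practical_effect(883, 1351, 583)
print(f"{effect:.0%} of SD -> {label}")
```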
Table 5 displays results of the two-way ANOVA on the other set of tutorials, the "How to Answer" TOEFL test questions. Again both computer familiarity and English ability contributed to meaningful performance differences, and again there was no interaction between them. Note, however, that the magnitudes of the main effects in the two types of tutorials, as seen in Tables 4 and 5, are reversed. In other words, while familiarity had a large practical effect in the "How to Use a Computer" tutorials (74% of SD), English ability had a large practical effect in the tutorials on "How to Answer" TOEFL test questions (81% of SD). This finding will be explored in more detail in the next section.

Table 5
Significance of effect sizes on "How to Answer" tutorials

Effect (predictor)        Statistical significance   Practical significance (SD = 500)
Familiarity               Yes, p ≤ 0.000             Yes, 48% of SD
Ability                   Yes, p ≤ 0.000             Yes, 81% of SD
Familiarity × Ability     No, p > 0.05               No

3.3.2. Analyses of performance

The second area of the tutorials to be examined was the performance of the examinees. As described earlier, the tutorials were designed to present feedback and remediation by means of guided practice for each of seventeen of the main points to be learned. The developers had assumed that at first the computer-unfamiliar or low English ability examinees might not correctly answer a Level 1 exercise (the first level presented, which was at the same level of complexity as would be found on the test); however, after remediation, it was hoped that a Level 1 exercise would be answered correctly on a subsequent try. In order to examine the effectiveness of the guided practice, the status on the 17 Level 1 exercises was broken out by level of computer familiarity. (See Table 6.)
Table 6
Percentage of computer-familiar and computer-unfamiliar examinees passing Level 1 exercises

                                    Passed on 1st attempt (%)
Tutorial Level 1 exercise           Computer-familiar   Computer-unfamiliar
Mouse 1 – Move                      100                 98
Mouse 2 – Point                     100                 99
Mouse 3 – Click                     98                  92
Scroll 1 – Down                     100                 99
Scroll 2 – Up                       100                 99
Tools 1 – Next/Confirm              100                 98
Listening 1 – Single answer         99                  97
Listening 2 – Click picture         100                 100
Listening 3 – Click letter          100                 100
Listening 4 – Two answers           100                 98
Listening 5 – Match/ordering        99                  93
Structure 1 – Fill in blank         100                 100
Structure 2 – Click word/phrase     100                 100
Reading 1 – Single answer           98                  97
Reading 2 – Click word/phrase       98                  97
Reading 3 – Click paragraph         100                 99
Reading 4 – Add sentence            99                  97

The remaining examinees either passed on the 2nd attempt or did not pass; these cells contained only small values (computer-familiar: 2, 1, 1, 2, 2, 1 and three cells below 1%; computer-unfamiliar, passed 2nd: 1, 1, 2, 3, 2, 7 and two cells below 1%; not passed: 2, 1, 8, 3, 3, 1, 3 and two cells below 1%).

Note: * indicates that a cell had examinees, but not enough to equal at least 1%.
In this table each of the seventeen critical skills is listed down the vertical axis. Across the top, for both computer-familiar and computer-unfamiliar examinees, percentages are given of those who passed a Level 1 exercise the first time, those who passed it on the second attempt, and those who never passed it, but were allowed to proceed. As Table 6 shows, over 97% of the computer-familiar examinees and (with two exceptions) over 96% of the computer-unfamiliar examinees got all seventeen main points correct on the first try, needing no remediation. When remediation was received, the success rate was often 100%. Although not shown in the table, analysis by level of English ability revealed a similar pattern of over 95% correct on the first try regardless of level.

Table 7
The average total score on tutorial exercises by computer familiarity and TOEFL ability

TOEFL test score range   Total           Computer-familiar   Computer-unfamiliar
Total group              16.87 (0.44)    16.94 (0.27)        16.79 (0.56)
Less than 500            16.82 (0.49)    16.90 (0.35)        16.76 (0.57)
500–549                  16.83 (0.52)    16.91 (0.32)        16.76 (0.65)
550 and above            16.94 (0.26)    16.98 (0.15)        16.88 (0.37)

Note: Numbers in parentheses are standard deviations.

Table 7 indicates the total number of the seventeen Level 1 exercises that were correctly answered either on the first try or on the second try. This descriptive
analysis seemed to indicate that there was no effect of either computer familiarity or English ability on performance, as, on average, over 16.7 of the 17 Level 1 exercises were answered correctly regardless of computer familiarity or ability. ANOVAs were not run to examine these results in more detail as there was virtually no variance to account for. Most examinees correctly completed the leveled exercises; hence, the actual value of the type of remediation included in the TOEFL tutorials could not be adequately assessed because so few examinees were unable to perform the required skills.

3.3.3. Analyses of examinees' self-report data

The third area of the tutorials to be examined involved the opinions of the examinees. Three topics were examined regarding examinees' self-reports: opinions of the "helpfulness" of each of the six tutorials, examinees' comfort using computers, and their comfort taking a TOEFL test on computer.

First, examinees were asked to rate each of the six tutorials in terms of helpfulness in preparing to answer the TOEFL test questions. Response options ranged from "very helpful" to "not at all helpful." Numeric values were assigned so that "very helpful" equaled 4, "helpful" equaled 3, "somewhat helpful" was given a 2, and "not at all helpful" was given a 1. Thus, the total scale ranged from a low of 6 to a high of 24. As seen in Table 8, the majority of examinees thought that the tutorials were helpful, although it appeared that the computer-unfamiliar examinees were slightly less positive than their computer-familiar counterparts.

Table 8
The average total score on questions asking about the utility of the six tutorials by computer familiarity and ability

TOEFL test score range   Total        Computer-familiar   Computer-unfamiliar
Total group              19.6 (3.7)   20.2 (3.6)          18.9 (3.8)
Less than 500            19.4 (3.7)   20.3 (3.5)          18.7 (3.6)
500–549                  19.4 (3.8)   20.2 (3.4)          18.7 (3.9)
550 and above            20.0 (3.7)   20.3 (3.7)          19.6 (3.7)

Note: Numbers in parentheses are standard deviations.

The effects of computer familiarity and English ability were again investigated through the use of an ANOVA. Table 9 illustrates that there was no interaction between ability and familiarity. Ability had no significant effect on attitude toward helpfulness of the tutorials, but computer familiarity did have a significant effect. Perhaps the novelty of the computer experience for the computer-unfamiliar examinees precluded a more enthusiastic response.

Next, change in perspectives toward comfort with computers and taking a TOEFL test on computers was examined. Because examinees had answered single questions on these two topics in both April, 1996 and summer, 1996, it was decided to compare the responses of examinees across time. The questions were "How
Table 9
Significance of effect size of computer familiarity and TOEFL ability on the response to a set of questions about the utility of the tutorials

Effect (predictor)        Statistical significance   Practical significance (SD = 3.74)
Familiarity               Yes, p ≤ 0.000             Yes, 33% of SD
Ability                   No, p > 0.05               No
Familiarity × Ability     No, p > 0.05               No
comfortable are you with using a computer?" and "How comfortable are you with taking a TOEFL test on a computer?" Response options for both questions were "very comfortable" (coded as 4), "comfortable" (coded as 3), "somewhat comfortable" (2), and "not at all comfortable" (1). In April, 1996 the examinees responded to a questionnaire by pencil after they had taken a paper-and-pencil TOEFL test; in summer 1996, examinees answered the same questions after they had completed the computer-based tutorial and the TOEFL-like items by clicking on their responses on-line.

To explore whether there were changes in these self-assessments over time, responses were compared and placed in one of three categories: (1) the response in summer was the same as that given in April, in which case it was categorized as "no change"; (2) the response in summer was more positive than it had been in April by one or more response options (e.g., changing to "very comfortable" from "comfortable"), in which case it was categorized as "positive change"; or (3) the response in summer was more negative by one or more response options than it had been in April (e.g., changing to "not at all comfortable" from "somewhat comfortable"), in which case it was classified as "negative change."

First, the question of whether examinees had become more comfortable with using computers was addressed in terms of their English ability. Overall, 33% of the total group had become more comfortable and about 15% had become less comfortable, as shown in Table 10. One crude way to interpret these data is to regard the negative change as baseline data on the instability of the instrument; the fact that the positive change is double that of the negative change gives an indication that the group has become more comfortable with computers. English ability did not seem to have a major effect on change of comfort level toward computers. At all three ability levels, about half of the examinees answered the same both times. The significant chi-square does, however, indicate a slight tendency for a positive change in attitude.

Table 10
Status of examinees' attitude toward comfort in using a computer by English ability

TOEFL test score range   Positive change   No change    Negative change
Total group              383 (33%)         604 (52%)    168 (15%)
Less than 500            129 (39%)         151 (45%)    53 (16%)
500–549                  131 (32%)         212 (52%)    64 (16%)
550 and above            123 (30%)         241 (58%)    51 (12%)

Note: χ² = 12.79, 4 df, p = 0.01.
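The categorization of paired April/summer responses and the chi-square test reported with Table 10 can be illustrated as follows. This is a sketch in our own notation (the dictionary, function name, and use of scipy are ours, not part of the original analysis); the observed counts are taken from Table 10.

```python
from scipy.stats import chi2_contingency

CODES = {"not at all comfortable": 1, "somewhat comfortable": 2,
         "comfortable": 3, "very comfortable": 4}

def change_category(april_response, summer_response):
    """Classify a paired response as a positive change, no change, or negative change."""
    diff = CODES[summer_response] - CODES[april_response]
    return "positive" if diff > 0 else "negative" if diff < 0 else "no change"

# Observed counts from Table 10 (rows: <500, 500-549, 550 and above;
# columns: positive change, no change, negative change).
observed = [[129, 151, 53],
            [131, 212, 64],
            [123, 241, 51]]
chi2, p, dof, _ = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.3f}")  # close to the 12.79, 4 df reported
```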
Table 11 illustrates the change in comfort level in using computers in terms of examinees' computer familiarity. For the computer-familiar candidates, roughly two-thirds did not change their view toward their own comfort level. The percentage distribution of those who did change their responses toward comfort was nearly the same in both positive and negative directions; these changes could be due to the instability of the instrument, as stated above. The pattern of changes was quite different for the computer-unfamiliar examinees, as supported by the large chi-square statistic. After participating in the study, only about 10% reported feeling less comfortable; about 40% reported the same level of comfort; and about 50% reported that they were more comfortable using a computer. The data suggest a change of opinion toward individual comfort with computers which tended to be more positive than negative, particularly for examinees who were unfamiliar with computers or who had low English ability.

Table 11
Status of examinees' attitude toward comfort in using a computer by computer familiarity

Computer familiarity   Positive change   No change    Negative change
Total group            384 (33%)         605 (52%)    170 (15%)
Familiar               102 (17%)         390 (64%)    120 (20%)
Unfamiliar             282 (52%)         215 (39%)    50 (9%)

Note: χ² = 160.68, 2 df, p < 0.01.

The final question examined the change in comfort level of examinees in regard to their taking a computerized TOEFL test. It was first examined in terms of examinees' English ability. Table 12 illustrates that 46% of the examinees became more positive about taking a computerized TOEFL test after they had participated in the computer familiarity study, versus 17% who had become less positive after the experience. The relatively small, but significant, chi-square statistic indicated an interaction between change in attitude and English ability.
Table 12. Status of examinees' attitudes toward taking the TOEFL test on computer by English ability

TOEFL test score range    Positive change    No change    Negative change
Total group               518 (46%)          422 (37%)    187 (17%)
Less than 500             166 (52%)           94 (29%)     61 (19%)
500–549                   185 (47%)          150 (38%)     62 (16%)
550 and above             167 (41%)          178 (44%)     64 (16%)

Note: χ² = 16.01, 4 df, p < 0.01.

In terms of how computer familiarity was related to a change in attitude about taking a computerized TOEFL test (see Table 13), less than half of the computer-familiar candidates maintained the same opinion about taking a computerized test, with 30% becoming more positive and about 24% becoming more negative. As for the unfamiliar candidates, 9% became more negative, 27% maintained the same view, and 64% became more positive after completing the computer familiarity study.

Table 13. Status of examinees' attitudes toward taking the TOEFL test on computer by computer familiarity

Computer familiarity    Positive change    No change    Negative change
Total group             519 (46%)          424 (38%)    188 (17%)
Familiar                177 (30%)          281 (47%)    141 (24%)
Unfamiliar              342 (64%)          143 (27%)     47 (9%)

Note: χ² = 140.90, 2 df, p < 0.01.
Although there was a positive shift in opinion on the part of the computer-unfamiliar examinees, it must be interpreted with caution. In April, about 80% of the computer-unfamiliar candidates had responded with either ``not at all comfortable'' or ``somewhat comfortable''; in summer, nearly 50% of these examinees were still not comfortable, or only somewhat comfortable, with the prospect of taking a computerized TOEFL test. Thus, while there was a strong shift among examinees toward a more positive attitude about taking a TOEFL test on computer, a large percentage of the overall examinee population still reported some discomfort. It is not clear, however, whether this discomfort was with a computerized TOEFL test or with a TOEFL test in general, whether computer-based or paper-and-pencil.

4. Going operational

Overall, the data collected and analyzed as part of the computer familiarity study indicated that the design of the TOEFL tutorial was successful. First, the decision to include data collection routines in the tutorial was important, as it allowed assessment not only of the examinees' performance but also of the tutorial itself. Second, because of the expected range of English abilities and levels of computer familiarity, the tutorials used simple English and illustrated skills and concepts through graphics and animation in order to reduce the reading load. The analyses of the timing data lead us to believe that this approach was successful. Recall that computer familiarity had a larger effect than English ability on time to complete the three ``How to Use a Computer'' tutorials, but that English ability had the larger effect in the tutorials reflecting the content of the TOEFL test. We cautiously suggest that this has implications for the design of computerized language tutorials.
We think that when designing computer tutorials for second language learners, it is important to take both language ability and computer familiarity into account, as both have roles to play, though perhaps at different levels of content. In other words, by consciously planning to use simple text, graphics, and animation wherever possible, the role of language was reduced; however, when the tutorials themselves were about language, and the language could not be simplified and still reflect the actual test tasks to be completed, the role of language resumed its importance.

Another important tutorial design feature was the use of ``cycles'' in which a presentation of each skill needed to take the computerized test was followed by interaction. Recall that each presentation consisted of a textual explanation followed by an animation of the skill, and each interaction consisted of a question, a response, feedback, and remediation. The performance data indicated that, regardless of computer familiarity or English ability, over 95% of the examinees successfully completed the exercises; the data also revealed that the vast majority of examinees needed no remediation. As a design feature, the inclusion of guided practice in the form of levelled exercises was thought to be an important addition that had not been included in the standard ETS tutorials, but this belief did not find support in the data. Finally, the questionnaire data indicated that examinees on average found the tutorials helpful, which leads us to infer that they generally had a more positive than negative experience in taking the tutorials.

Needs and content assessment, as well as formative assessment, were conducted to make the best tutorial that we could. However, no claims can be made regarding the relative or absolute effectiveness of the TOEFL tutorial. Project constraints prohibited the development of multiple tutorials or the inclusion of a group of examinees who received no tutorial (see Taylor et al., 1998 for details). Thus, the design of the study precluded a comparative assessment of the TOEFL tutorials against the standard ETS tutorials. Also, because the guidelines cited in the introduction and ETS policy for computerized tests require examinee training with computers before administration of computerized tests, the question of how the TOEFL tutorials compared with no training at all could not be addressed.

One place where the tutorials did not meet expectations was the time it took examinees to complete them: the tutorials took much longer than projected. Taking this into consideration, along with the apparent success of the examinees in responding to the tutorial exercises, the TOEFL program decided to implement the new TOEFL tutorials as part of its operational computerized test administrations, after some changes were made to reduce redundancy, minimize explanation of non-essential procedures, and reduce time. Some of the changes that were subsequently made are briefly described below.

In all of the tutorials, animation was speeded up by 2 seconds, as it was generally felt that the animations could run slightly faster without affecting performance.
In the ``How to Use a Computer'' tutorials, the explanation of ``other ways to scroll'' was simplified to a single screen display and its practice was deleted; how to scroll one line at a time remained intact, with its explanation and levelled exercises, because this basic scrolling method was considered essential for moving through the tutorials and the test, while the other methods were not.

In the testing tools tutorial, the text of a sample passage on a ``help'' screen was replaced with lines in order to reduce time spent reading; the important skill was becoming familiar with the interface, and examinees could in any case read the content of the ``help'' screen during the practice items in the tutorials. Mandatory practice for the ``time'' icon was also deleted, as this was a simple interface element that was explained and animated.

The ``How to Answer'' tutorials were streamlined in two ways. First, redundancy existed in these tutorials in that each one explained how to answer single-selection multiple-choice items, and single-selection multiple choice had been explained separately for clicking on an oval, a picture, and a letter. These explanations and exercises were eliminated and replaced with a section on general response types in the ``How to Answer Listening Comprehension'' tutorial. Examinees were presented with the explanation, demonstration, and levelled exercises, and were then shown, through explanation and examples, that an item type may look different but require the same response skill; in the structure and reading tutorials, examinees were reminded that they had already practiced these response types and were shown a brief animation of answering each type of question with structure and reading content. Second, in the last two tutorials, the animation was removed from the explanation of how to change an answer (it was kept in the listening tutorial).

Two features were added to the ``How to Answer Reading Comprehension'' tutorial. Due to changes in the test design, two new icon interfaces were added, ``next'' and ``previous'', so that examinees could go back to review items and change their answers if they desired to do so. As the layout seemed to work well, no other changes were made, except that on each new screen the cursor now appeared on the ``go on'' or ``see more'' icon, something that had initially been overlooked.

Although the performance data indicated that the vast majority of examinees made no use of the levelled exercises, these remediation cycles were considered an important characteristic of the tutorial design for computer-unfamiliar examinees. Given the concern for examinees who are unfamiliar with computers, the TOEFL program has decided to keep the guided practice as a feature of the TOEFL tutorials. The basic design of the instructional cycles was not changed, except that the time delay before a help message appeared was reduced from 30 to 20 s in many of the exercises, and the time delay following an incorrect response or a non-response was reduced from 60 to 40 s.
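The revised instructional-cycle timing can be pictured with a small schematic model. The original tutorials were authored in Authorware, so the Python below, its function names, and its simulated responses are our own illustrative sketch of the behaviour described above (help after 20 s without a response; feedback and remediation after an incorrect response or after 40 s without one), not the actual implementation.

    # Schematic model of one revised practice-item cycle (illustrative only).
    from dataclasses import dataclass

    @dataclass
    class CycleTiming:
        help_delay_s: int = 20         # revised from 30 s
        remediation_delay_s: int = 40  # revised from 60 s

    def run_exercise(events, correct_answer, timing=CycleTiming()):
        """Simulate one item; `events` yields (answer_or_None, seconds_elapsed)."""
        for answer, elapsed in events:
            if answer is None and elapsed >= timing.help_delay_s:
                print("show help message")             # nudge a silent examinee
            if answer == correct_answer:
                print("feedback: correct - continue")  # cycle ends, next skill
                return True
            if answer is not None or elapsed >= timing.remediation_delay_s:
                print("feedback and remediation")      # re-explain, re-present item
        return False

    # Example: no response by 20 s (help shown), then a correct click at 35 s.
    run_exercise([(None, 20), ("oval B", 35)], correct_answer="oval B")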
These and other changes are projected to result in a savings of a little over 15 min: the revised TOEFL tutorials are expected to average about 31 min to complete, with 85% of examinees expected to take no more than 42 min.

As mentioned in the introduction to this article, the development of the TOEFL tutorials was part of a broader set of questions investigating the effects of computer familiarity on performance on a set of computerized language tasks. The reader may wish to refer to the article by Taylor et al. (1998), which reported that ``after administering the CBT tutorial and controlling for language ability, there were no meaningful differences in performance between candidates with low and high levels of computer familiarity'' (p. 27). In other words, once language ability had been taken into account, and after the TOEFL tutorial training package had been administered, lack of prior computer familiarity had no meaningful effect on performance on computerized TOEFL-like items.

The revised TOEFL tutorials are currently being used as part of large-scale field trials and will continue to be administered as part of the computerized TOEFL test when the test goes operational in 1998. The TOEFL tutorials described in this article will thus be used to train examinees before administration of the operational test. The TOEFL program has also made the computerized tutorials available on CD-ROM and at the TOEFL program website (www.toefl.org).

References

Alessi, S., Trollip, S., 1991. Computer-Based Instruction. Prentice Hall, Englewood Cliffs, NJ.
American Council on Education, 1995. Guidelines for Computerized-Adaptive Test Development and Use in Education. Author, Washington, DC.
Andrews, J., 1984. Discovery and expository learning compared: their effects on independent and dependent students. Journal of Educational Research 78, 80–89.
August, D., Christian, D., 1997. Studies of school and classroom effectiveness. In: August, D., Hakuta, K. (Eds.), Improving Schooling for Language-Minority Children. National Academy Press, Washington, DC, pp. 163–249.
Authorware 3.0 [computer software], 1995. Macromedia, San Francisco, CA.
Cohen, J., 1988. Statistical Power Analysis for the Behavioral Sciences. Lawrence Erlbaum Associates, Hillsdale, NJ.
Corno, L., Snow, R., 1986. Adapting teaching to individual differences among learners. In: Wittrock, M. (Ed.), Handbook of Research on Teaching, 3rd ed. Macmillan, New York, pp. 605–629.
Cox, R., 1992. Exploratory learning from computer-based systems. In: Dijkstra, S., Krammer, H., van Merrienboer, J. (Eds.), Instructional Models in Computer-Based Learning Environments. Springer, Berlin, pp. 405–420.
Dick, W., 1996. The Dick and Carey model: will it survive the decade? Educational Technology Research and Development 44, 55–63.
Dick, W., Carey, L., 1990. The Systematic Design of Instruction. HarperCollins, New York.
Dijkstra, S., Krammer, H., van Merrienboer, J. (Eds.), 1992. Instructional Models in Computer-Based Learning Environments. Springer, Berlin.
Educational Testing Service, 1995. POWERPREP: Preparing for the GRE General Test [computer software]. Author, Princeton, NJ.
Eignor, D., Taylor, C., Jamieson, J., Kirsch, I., 1998. Development of a Scale for Assessing the Level of Computer Familiarity of TOEFL Examinees. TOEFL Research Report No. 60. Educational Testing Service, Princeton, NJ.
Fox, B., 1993. The Human Tutorial Dialogue Project. Lawrence Erlbaum Associates, Hillsdale, NJ.
Gibbons, A., Rogers, D., 1991. The practical concept of an evaluator and its use in the design of training systems. Educational Technology 31, 7–15.
Kirsch, I., Jamieson, J., Taylor, C., Eignor, D., 1998. Computer Familiarity among TOEFL Examinees. TOEFL Research Report No. 59. Educational Testing Service, Princeton, NJ.
Newby, T., Stepich, D., Lehman, J., Russell, J., 1996. Instructional Technology for Teaching and Learning. Merrill, Englewood Cliffs, NJ.
Powers, D., O'Neill, K., 1993. Inexperienced and anxious computer users: coping with a computer-administered test of academic skills. Educational Assessment 1, 153–173.
Rosenshine, B., Stevens, R., 1986. Teaching functions. In: Wittrock, M. (Ed.), Handbook of Research on Teaching, 3rd ed. Macmillan, New York, pp. 376–391.
Shafritz, J., Koeppe, R., Soper, E., 1988. Dictionary of Education. Facts on File, New York.
Smith, P., Ragan, T., 1993. Instructional Design. Merrill, New York.
Snow, C., Leinhardt, G., 1997. Cognitive aspects of school learning: literacy development and content learning. In: August, D., Hakuta, K. (Eds.), Improving Schooling for Language-Minority Children. National Academy Press, Washington, DC, pp. 53–83.
Soulier, J.S., 1988. The Design and Development of Computer Based Instruction. Allyn and Bacon, Boston.
Steinberg, E., 1990. Computer-Assisted Instruction: A Synthesis of Theory, Practice and Technology. Lawrence Erlbaum Associates, Hillsdale, NJ.
Steinberg, E., 1991. Teaching Computers to Teach, 2nd ed. Lawrence Erlbaum Associates, Hillsdale, NJ.
Taylor, C., Jamieson, J., Eignor, D., Kirsch, I., 1998. The Relationship Between Computer Familiarity and Performance on Computer-based TOEFL Test Tasks. TOEFL Research Report No. 61. Educational Testing Service, Princeton, NJ.