Assessing Writing 12 (2007) 44–59
Keeping assessment local: The case for accountability through formative assessment

Libby Barlow, Steven P. Liparulo, Dudley W. Reynolds
University of Houston, Texas, United States

Available online 22 June 2007
Abstract

This paper discusses the method, results, and implications of a study of undergraduate student writing at a large, diverse research university, and makes the case for grounding substantive writing assessment in local stakeholders and formative design. Data were collected from multiple sources: writing samples obtained from courses across disciplines, faculty descriptions of the assignments behind the samples, survey data reporting what faculty expect and observe in student writing, and survey data representing the writing-related attitudes and behaviors of students. Analyses included scoring samples against a rubric, exploratory factor analysis, and regression analysis. Discussion focuses on the design, stakeholder and philosophical issues that determine the utility and impact of the results of a writing assessment, particularly as accountability pressure increases.

Keywords: Accountability; Embedded assessment; Impact; Multiple data sources; Stakeholders; Undergraduate writing research; Writing assessment
David Russell, in his historical account of Writing in the Academic Disciplines, argues that,

Efforts to meaningfully assess student performance in writing (and learning) across the curriculum are still not widespread. . . . Yet collaborative assessment of writing done in courses can provide more authentic measures and at the same time bring stakeholders together to develop and apply standards. This can lead to a different—and often more useful—kind of objectivity, where mutual understanding and consensus building is the key. (2002, p. 323)
In spring 2004 when the University of Houston’s Undergraduate Council asked the Executive Director of Institutional Research to undertake an investigation of problems with student writing, they were not aware of Russell’s work, but now as we—the team that conducted the assessment—look back on the significance of our work, we find that Russell’s characterization summarizes with uncanny accuracy the motivation, process, and benefits of the study that emerged. The study did not report the results of an objective test; instead, it presented a formative assessment grounded in “mutual understanding and consensus.”

The following describes a collaborative type of writing assessment that, as Russell notes, is not yet widespread in U.S. postsecondary education, in part because it is not driven by any of the traditional engines of higher education writing assessment: diagnostic placement, competency certification, or program evaluation (White, 1994b). Rather, it is an exploratory, formative assessment designed to inform university-wide curriculum planning efforts. In this regard it can be seen as aligned with recent changes to accrediting standards for U.S. postsecondary institutions, as well as reform initiatives emanating from the Department of Education (e.g., U.S. Department of Education, 2006) that emphasize accountability for student learning (as opposed to program quality). The request for the study, however, as well as the design decisions made by the assessment team and its intended applications, were entirely local. We would argue nevertheless that collaborative assessment which elevates local institutional values and processes over nationally standardized criteria as a basis for understanding student learning ultimately has the best chance of achieving the curricular and pedagogical transparency and accountability sought by this latest round of postsecondary reform (for similar perspectives see Huot, 2002; Moss, 1994; Rutz & Lauer-Glebov, 2005). This article thus presents a writing assessment study in which a large number of small, locally taken steps add up to a giant leap for an institution and its ability to meet the demands of both a diverse student body and a national agenda for accountability assessment. For us, as it may evolve for others, the key to balancing national issues and local imperatives was to think in terms of making our assessment useful (Leonhardy & Condon, 2001).

1. Project beginnings

The impetus for our study was a request by a university-level general-education curriculum committee to the university’s head institutional research officer for data that described problem areas in student writing, identified students in need of assistance, and suggested how resources could be targeted to address identified needs effectively. The request stemmed from committee discussions regarding concerns among faculty about the quality of student writing. As noted by a number of authors (e.g., Daniels, 1983; Miller, 1989; Russell, 2002), such concerns have been a recurring theme in U.S. postsecondary institutions for well over a century and frequently have emerged from “the assumption that writing was an elementary transcription skill” (Russell, 2002, p. 7).
This assumption in turn “led educators to look for a single solution to a specific educational problem when they actually needed a whole new conception of the role of writing in learning, one that would take into account the modern organization of knowledge through written communication.”

In response to the committee’s request the institutional research officer assembled an assessment team comprising representatives from the originating committee, the Writing Center, and the English department. Team members held institutional roles as university, college, and departmental administrators, faculty, and staff, and could claim expertise in educational measurement and assessment, writing pedagogy, second language writing, and writing in the disciplines. The committee’s membership represented a conscious attempt to address from the outset some of the
pragmatic and political issues for writing assessment created by what E.M. White referred to in the premier issue of this journal as assessment’s “diverse and often conflicting stakeholders” (1994a, p. 12).

2. Initial discussions of final impact

The assessment team’s initial discussions focused on the mission and guiding principles for the study. Often these discussions, which prepared the ground for our choices about data collection and analysis techniques, dealt with what we might find if we did x and how it might be received by the various stakeholders. Thus, one of the first discussions dealt with the sources for faculty concerns about writing. We recognized that as language, teaching, and assessment specialists our assumptions probably differed from those of the more-vocal faculty we had been hearing, but we also realized that the most useful way ahead was to explore those concerns as part of our institutional values. As Moss writes: “assessment systems provide more than indicators of students’ achievements: they provide potent and value-laden models of the purposes and processes of school, of the appropriate roles for teachers, students, and other stakeholders in the discourse of teaching and learning, and of the means through which educational reform is best fostered” (1994, p. 124). We could thus see a formative function for our assessment.

We also came to the conclusion that if faculty already believed that students were not learning to write “properly,” then the views of external “experts” about what constitutes good writing would be unlikely to challenge significantly their conceptions or convince them otherwise. Instead, it would be important to establish locally identified characteristics against which to judge student performance; we would need to tap into the vocabulary and traits that faculty used when discussing “good” and “bad” writing. Hamp-Lyons (1991) describes a similar conclusion as the impetus behind an extensive consultation with faculty during the development of a multi-trait placement rubric at the University of Michigan in the 1980s. While such a process should clearly be a part of the construct validation for any assessment (Messick, 1989), our point here is that it also makes good political sense if you want your findings to lead to change. Pragmatically, we grew committed to an ethnographic element to our study, to locate and organize faculty perceptions about student writing.

Since we hoped that our findings would serve an educational function, we next needed to conceptualize the “learning objectives” for the study. Our discussions rendered us willing to sponsor a certain level of controversy about the complexities of writing and learning to write. The University of Houston is “the most ethnically diverse research university in the nation” (“UH at a Glance”, 2006). Additionally, we are firmly committed through a series of partnerships between the Writing Center and academic units across the university to discipline-specific writing instruction. We wanted the study to be open to and explicitly acknowledge the potential differences among our students, the ways they learn, and the ways they write.

Finally, we had to temper our discussions with considerations of what we could feasibly achieve. Viewing writing as an “elementary transcription skill” often leads to the mistaken perception that it is an easy problem to solve. We felt it was important to portray our study as more exploratory than definitive, as laying groundwork rather than being comprehensive.
We also knew that we needed to balance the time necessary for a detailed and credible study with the window of opportunity created by the committee’s specific request. If our study was to start a process of reconsidering academic writing, then we needed to report back before the committee’s membership rotated off and while they still remembered the discussions that prompted the original request. This concern
with practicality led to a number of decisions regarding sampling but also resulted in a fairly strict timeline for generating our final report to the committee.1

1 Plans were developed in the fall of 2004, data were collected spring 2005, and analyses were conducted summer and fall of 2005. Preliminary results were shared in December 2005 with faculty and department chairs representing the courses from which data were collected and the final written report to the committee was delivered in spring 2006.

These initial discussions made utility and the impact of our results the principal criteria for decisions about data collection, analysis, and reporting. Our priorities are in keeping with current socially based understandings of test validity (Messick, 1989) and test usefulness (Bachman & Palmer, 1996), but they also reflect a more organic understanding of assessment as a form of exploratory research designed to yield formative results (Huot, 2002). The following sections provide more detail about the decisions we made, our findings, and the lessons we learned.

3. Study design and implementation—taking on the troll beneath the bridge

The design decisions that guided our actual study crystallized around our collective and evolving responses to three problematic questions:

1. What is academic writing?
2. How should we measure writing?
3. Where should we look for influences on student success?

These deceptively simple questions nonetheless pinpoint the issues that have turned writing into something of a troll beneath the bridge in higher education—the monster created by fears of its elusive and complex nature, the shadowy beast that hides out of sight waiting for someone else to take it on using methodologies the academy fails to fully own. Following Rose (1985), Russell (2002) refers to this problem as the “transparency of rhetoric” within academic disciplines. Rose argues that each generation of academics “sees the problem [of student writing] anew, laments it in the terms of the era, and optimistically notes its impermanence. No one seems to say that this scenario has gone on for so long that it might not be temporary” (p. 541). Russell in turn argues that the scenario is an unavoidable result of gradual and transparent practices of learning to write in specific disciplines—initiatory practices that are mostly disowned as writing instruction at all, and therefore invisible unless conscious attention is called to them.

3.1. “What is academic writing?”

Our first challenge was defining the construct, deciding what our study would examine to represent academic writing. Many of the university stakeholders tend to imagine a writing “test,” based on their experiences with writing placement instruments that typically rely on holistically rated, timed, impromptu responses to a standardized prompt. Our own institution had used such an exam for a number of years to “certify” the writing proficiency of graduates, and even some of the committee members initially assumed that this would be our only option. As Weigle (2002) relates, however, whether such tests provide authentic indicators of university writing is questionable: they do not require the use of source materials; their timed format is typically not a requirement of academic writing; they disrupt normal relationships between writer and audience; and their rating rubrics frequently overemphasize organization and language control at the expense
of more rhetorical traits such as topic development and content. Fortunately for our efforts, the university’s experience with its proficiency exam supported Weigle’s critique. Faculty members had been required to help grade the exam, and many attribute its eventual demise to their concerns about the authenticity of timed responses to a generic prompt as well as the mismatch between the evaluation rubric and their own procedures for grading. Many also felt its results were not particularly useful, since there were no real consequences for student performance on the rating.

The context was ripe therefore for the assessment team’s proposal to request copies of student papers for a required writing assignment in a junior-level course in every academic department as our primary data source. This embedded assessment would provide us an authentic and descriptive look at writing in the disciplines, at a level where we could expect a higher level of student achievement and faculty expectations than in general education courses. We solicited the papers by contacting chairs in each department and asking them to find a willing instructor who assigned writing as part of his or her course. Ultimately we received 769 student papers from 24 departments housed in seven of the university’s eight colleges that offer undergraduate majors. In the interest of practicality we randomly selected 20 papers (or as many as were provided if fewer than 20) from each course to actually assess, for a total of 419.

Decisions about authenticity and standardization are often framed as a choice between validity and reliability (Huot, 2002; Moss, 1994; Weigle, 2002; White, 1994a; Yancey, 1999). If, however, we adopt a design framework such as Bachman and Palmer’s (1996) that emphasizes test usefulness as a holistic construct responding to the assessment’s purpose and context, then our decision to sample course writing makes sense—we wanted our study to explore the variety of writing in which our students engage. Thus, this is the first of what we will be referring to throughout the rest of this article as “small steps,” edging up cautiously on the troll under the bridge. By involving as many departments as we could in our data collection and asking them to identify where writing occurred within their curriculum, we were starting a feedback loop. Our definition of academic writing turned out to be simple, after all, once we reconceived our objectives as balancing summative and formative aims: academic writing in our study would be the writing going on in courses taught at the university. We would collect, and let the dimensions of our yield point out the next steps.

3.2. How should we measure academic writing?

The assessment team’s next task involved identifying constructs for measurement that would be locally meaningful to our faculty but also able to accommodate the diversity of the writing assignments collected. A request was sent out over the university faculty listserv introducing the study and asking the recipients to complete a brief on-line questionnaire (see Appendix A). The questionnaire asked for scaled responses regarding areas that faculty respond to in student writing and then open responses about the faculty member’s general perceptions of student writing. We received 187 responses (21% of ranked faculty), with proportional representation from all colleges within the university.
Our analysis of the faculty’s responses identified a primary emphasis on what we termed “conceptual objectives” (e.g., synthesize, analyze, argue, etc.), and secondarily, composition-oriented constructs such as clarity and organization. In order to turn these general priorities into a usable rubric we also consulted the Writing Program Administrators’ (WPA) Outcomes Statements for First-Year Composition Programs (2000) and the Texas Higher Education Coordinating Board’s Exemplary Educational Objectives for Composition. By going back and forth between these different sources of information for how we could talk about (and measure) writing we began to see
a pattern where criteria descriptive of the writer in rhetorical situations (purpose, audience, level of formality, etc.) combined with expectations related to genre and correctness (format, style, documentation, mechanics, etc.). This led to a five-trait rubric measuring: Clarity of Purpose, Evidence-Based Reasoning, Management of Flow, Audience Awareness, and Language Control.

Given the diversity of writing assignments included in the sample, we chose to measure the traits very broadly using three performance levels: (1) criterion not present; (2) criterion inconsistently present; and (3) criterion consistently present. Graduate students with experience teaching composition served as raters. Following standard practice (e.g., White, 1994b), a member of the assessment team discussed the evaluation criteria with the raters, reviewed benchmark essays chosen by members of the assessment team, and then evaluated a set of practice papers with the raters. If the raters differed by more than one level for any of the traits, then the team leader provided a third rating. Table 1 reports the findings of this rating process.

Table 1
Mean performance levels (possible range 1.00–3.00) in sample of course-embedded writing assignments (N = 419)

Trait                          Mean    S.D.
Purpose                        2.54    0.54
Evidence-based reasoning       2.41    0.53
Flow management                2.35    0.56
Audience awareness             2.59    0.50
Language control               2.07    0.46

We interpreted the scores as indicating that students consistently meet expectations when it comes to establishing a purpose in response to the assignment and to addressing an appropriate audience. They were also generally successful in terms of putting the writing together, both conceptually and formally, but not to the extent that they were with purpose and audience. In contrast, the students were less consistent in controlling the style and mechanics of their written language.

These findings constitute what many of the stakeholders expected from our study—a diagnostic assessment of where students are strong and where they are weak. Moreover, the findings can be seen as confirming a number of different stakeholders’ expectations. The faculty who predicted we would find problems were right: our students were not always editing their work effectively. On the other hand, students tended to perform better on the conceptual objectives that the faculty questionnaires indicated are valued most and which faculty perhaps emphasize in their teaching. We shared these findings with each chair and faculty member who had helped with the data collection process in a “writing report card” as well as with the university community at large in our final report (http://www.uh.edu/writecen).2 Being able to provide information that stakeholders expected was another small step—we were validating our community’s ability to assess student writing, which is essential if they are to take ownership of it.

2 In the reports to the individual departments the means for the sample they had contributed were included along with the university-wide results. The final report is available from the authors by request, or at http://www.uh.edu/writecen/.

3.3. “Where should we look for influences on student success?”

The next part of our analysis, correlations between student writing performance and potentially contributing factors, was probably less expected by the stakeholders. It grew out of our initial commitment to acknowledge as part of the study differences among our students, the ways they learn, and the tasks that we ask them to accomplish. Too often accountability studies stop with
the “report card,” with a rating but no exploration of what may account for it. Alternatively, searching for the smoking gun, such studies consider potential influences on student performance in isolation without allowing for interacting variables. As part of our final report to the university community, however, we chose to regress a wide range of potential predictor variables against a sum of the individual’s writing trait scores.3 The variables we chose were:

• past academic performance as measured by the student’s cumulative grade point average (GPA),
• the age at which they started learning English,
• whether they were transfer students,
• the strength of their individual attitudes and beliefs about writing, and
• the explicitness of the guidance for writing in the assignment they were given.
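As an illustration only (not the study’s actual code or variable names), a regression of this form can be specified in a few lines once the ratings, records data, survey factors, and assignment codings have been merged into one table with a row per rated paper; the file and column names below are placeholders, and how each predictor was actually measured is described in the paragraphs that follow.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical analysis table: one row per rated paper, with the writing
# score as the outcome and the five kinds of predictors listed above.
# Column names are placeholders chosen for this sketch.
df = pd.read_csv("writing_study.csv")

model = smf.ols(
    "writing_trait_score ~ cumulative_gpa + english_start_age_group"
    " + transfer_student + writing_confidence + writing_difficulty"
    " + audience_awareness + assignment_explicitness",
    data=df,
).fit()

print(model.summary())   # coefficients, standard errors, and R-squared
```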
The information about GPA and transfer status simply came from the university’s records. Measures of student attitudes towards writing as well as the age at which they began learning English came from a survey completed by students in the courses that supplied the writing samples. The survey was written by the assessment team using items that had been validated in previous research by the Writing Center. It included 23 attitudinal items for which students provided Likert-like responses as well as four demographic questions. Responses to the attitudinal statements were analyzed using an exploratory factor analysis, which indicated 16 of the 23 items loaded into three groupings: items related to the student’s confidence in their writing, their perception of writing as a difficult task, and whether they considered an audience when writing. Statistical data for the factor analysis and resulting scales are provided in Appendix B.

The analysis of the assignments that the students were responding to in their writing also followed a somewhat exploratory path. After initially analyzing the assignments in terms of their expectations for genre, we decided instead to focus on the explicitness of the guidance they provided students for completing the assignment. Through discussions among our team we identified seven types of instructions that could be included in an assignment as a way of making it more explicit: state a purpose; demonstrate evidence-based reasoning; consider an audience; control conventions for format, structure, and/or flow; control conventions for voice, tone, and/or formality; follow x process for completing the assignment; and take into account the specific evaluation criteria.4 Three team members rated the assignment materials provided by the course instructors (N = 18) for the presence of each type of instruction. The final analysis of a set of materials was based on majority rule. If two of the three raters agreed an instruction was present, that was the consensus judgment.

We found that most of the assignments identified the purpose of the assignment, the expected format or structure, and the need for evidence-based reasoning or critical thinking. Grading criteria were specified more often than not, but not as frequently as these first three categories. Consideration of audience, voice, and writing process were less-often specified. A summary score, which we used for the regression analysis as an indicator of degree of assignment specificity, was arrived at by summing the number of categories for which explicit information was provided.

3 See Kuhlemeier and Bergh’s (1997) study of Dutch secondary students’ writing for a similar position regarding multivariate analysis.

4 Note that the first five areas match the five performance criteria rated in the student samples. It should be noted, also, that faculty were not asked to provide the assignment materials according to a standardized format. Thus, the codings were based on a consensus reached by three of the study team members.
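The consensus-coding step just described (majority rule across three raters over the seven instruction types, followed by a summary explicitness score) can be sketched as follows. The data structures and rater labels are our own illustration, and scoring each category as 2 when explicitly addressed and 1 when not is an assumption made to match the 7–14 range reported for this variable in Appendix C, not a documented detail of the study.

```python
INSTRUCTION_TYPES = [
    "purpose", "evidence_based_reasoning", "audience",
    "format_structure_flow", "voice_tone_formality",
    "process", "evaluation_criteria",
]

def consensus_coding(ratings):
    """Majority rule: an instruction type counts as present in an assignment
    if at least two of the three raters marked it present.

    `ratings` maps rater label -> {instruction_type: bool}.
    """
    consensus = {}
    for itype in INSTRUCTION_TYPES:
        votes = sum(1 for coded in ratings.values() if coded.get(itype, False))
        consensus[itype] = votes >= 2
    return consensus

def explicitness_score(consensus):
    """Summary score used as the regression predictor: each of the seven
    categories contributes 2 if explicitly addressed, 1 if not (range 7-14)."""
    return sum(2 if present else 1 for present in consensus.values())

# Example with three hypothetical raters coding one assignment.
ratings = {
    "rater_a": {"purpose": True, "evidence_based_reasoning": True, "format_structure_flow": True},
    "rater_b": {"purpose": True, "format_structure_flow": True, "evaluation_criteria": True},
    "rater_c": {"purpose": True, "evidence_based_reasoning": True, "evaluation_criteria": True},
}
consensus = consensus_coding(ratings)
print(consensus["purpose"], explicitness_score(consensus))  # True 11
```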
In a reform environment keyed to transparency and accountability, this decision to focus on the degree to which instructor assignments take responsibility for student learning and performance deserves special attention. While it is a commonplace to note that you must teach what you want the students to learn, this is especially crucial in the politically charged environment of the nation’s most ethnically diverse research university. Highly explicit instructions can significantly level the playing field for a diverse student body. Lisa Delpit’s (1997) “The Silenced Dialogue” makes the case for explicitness rather well, suggesting “that students must be taught the codes needed to participate fully in the mainstream of American life” (p. 585, emphasis in original). We chose to assess the relationship implied in Delpit’s statement—that the degree of explicit teaching of the codes of academic writing would have some effect on the quality of student writing performance.

In selecting all of these variables for the regression analysis, we also took another small step: consciously modeling the students’ current writing as potentially connected to their academic history (GPA, length of exposure to English, transfer status), their individual orientations towards writing, and the parameters of the immediate task. We felt that the model made intuitive sense and tapped into some of the concerns we had heard faculty listing, such as whether students completed their general education requirements before transferring into the university, and the large number of English as a second language writers on campus. At the same time it resisted the temptation to examine these issues in isolation from the student’s overall academic experience or faculty input. In short, the model communicated that responsibility for student learning is distributed and complex.

We chose to report the regression analysis simply in our report. The statistical information, which we included in footnotes, gave the report credibility with an academic audience, but what we wanted the community to remember was that each of our three dimensions—academic history, individual orientation, and immediate task—turned out to be important predictors of perceptions about quality. (We include the statistical data here in Appendix C.) Guiding our small steps, these findings allowed us to argue that writing ability was a product of cumulative learning (not initial instruction), that conceptions must be developed of discipline-specific audiences, and most importantly that students benefit from explicit instruction about writing. In a sense, these findings allowed us to bring the writing troll out into the daylight, revealing that we can establish some basic facts about academic writing.

4. Lessons learned

For our more remote audience—that is, our audience with no stake in the local investigation, results, or implementation of changes stemming from the results—it is possible to explore the conceptual subtext of the study. While we knew when we began that there were stakeholder biases to which we should wisely attend, we had perhaps underestimated the extent to which the fundamental question is more one of ownership than identification of writing problems. Perhaps because of the prior experience on our campus of a timed writing exam, or perhaps because the design committee represented so many constituent groups, we knew that simply collecting diagnostic information would be only a partial step.
The diagnostic information would need to be translated into steps toward change, and responsibility for that change was a key piece that our study would need to provoke. Without implicating ownership or responsibility, even the best data would make no impact.

The cross-disciplinary nature of writing fosters the sense among faculty that someone else is responsible for addressing problems with student writing. Russell’s notion that writing stood for
so long as an elementary transcription skill that required no instruction plays into this sense, as does the faculty habit of operating in discipline-specific groups. Since no single group is responsible for writing, it is all too easy to sidestep serious consideration of the issue.

This deep-seated assumption about writing is a created reality, or construction, which can be shaken apart in what Guba and Lincoln (1989) term a hermeneutic dialectic. The dialectic begins when the construction encounters dissonances which are, in effect, the precondition for the process of reaching a new view. In this process, a connection between divergent views is formed through a negotiation of comparisons, so that a new, higher level synthesis results (Guba & Lincoln, 1989, p. 149). If this new synthesis is successfully reached, the benefits extend beyond the reconceptualization itself to the corollary of ownership on the part of those participating in the reconceptualization. Ownership cultivated in the negotiation process translates to a greatly increased likelihood of action in response to the reconceptualized view.

Such abstractions are of little use in a report of results, and provide no guidance for design of a locally relevant writing assessment. The same principles can be expressed, however, in terms of three lessons that do provide some guidance. While there were indicators in the early stages of our study design phase, it became clear as we finalized the results and began to reflect on our work that when undertaking an inquiry into as broad a domain as undergraduate student writing: stakeholders must be included, design must emerge (rather than being pre-defined), and the study must be formative rather than summative.

4.1. Stakeholders must be included

Although the findings of the present study comprise only the first few brush strokes on the canvas, their clear connection to multiple stakeholders magnifies their significance and potential impact. Stakeholders included the Undergraduate Council, who called for the study, seeking evidence-based description of student writing so that resources could be targeted effectively. They had no stake in a particular result; they simply wanted an actionable result. Institutional Research staff were invested in a study that would yield results faculty found interesting, which could in turn become leverage for growing faculty engagement in assessment activity in general. Writing Center staff were eager to develop wider understanding of their principles for student writing instruction. And finally, the English department felt compelled to participate in order to be able to defend against the likely outcome that all fingers would point to them to fix the problem by addressing it in core composition courses.

The participation of all of these groups both constituted a hermeneutic dialectic and averted a paralyzed process. At every step, assumptions were negotiated and renegotiated as design and implementation decisions were made, so that all stakeholder-participants are now able to carry back to their respective groups full confidence in the process and results, and each stakeholder-participant has taken steps to ensure the faculty move forward with results and their implications. Had a group conducted a study of writing in isolation and delivered a final report to the Undergraduate Council, faculty would most likely have heard something vaguely interesting but not very compelling, and follow-through of any kind would be unlikely.

4.2. Design must emerge from the process

Local assessment is an uncomfortable notion for higher education faculty, who live by the standards of publishable, peer-reviewed academic and scientific research. For them objectivity, or
decontextualization, is typically a necessary condition for validity. Furthermore, when assessment takes place within a discipline, it is inevitably anathema to the conventions of the discipline. All disciplines have deeply entrenched conventions governing the process of discovery, and assessment is generally not constructed to fit within those conventions. So there is no academic reference point for assessment at either the discipline-specific or broad academic level. Faculty are asked to buy into assessment when it does not dovetail with the standards and values of their profession as they know it. And so the assessment of any student learning outcome, especially one that is not discipline-specific, is hobbled at the outset.

The challenge for accountability assessment is to produce valid, reliable and useful results. The results must be context-specific enough to be meaningful, but “objective” enough to be trustworthy. This is a daunting task, but we found that an emergent and iterative research process made it possible. Multiple methods for examining multiple data sources were important for bringing in the stakeholders and examining the relevant information, but what moved the results into their highest level of significance were the discoveries that emerged unexpectedly as the process unfolded, even in spite of our original plan. We did not anticipate, for example, the emphasis we found on assignment specification. Indeed, our initial design included assignment data as a point of information for the readers rather than a factor having an impact on the student writing product. Only an emergent process that embraces mid-course adjustments allows for results that are meaningful in this way.

Further, an emergent process takes advantage of multiple methods, which alone can be a significant contributor to faculty buy-in. Multiple data sources allow multiple analyses and multiple layers of results, so that stakeholders with divergent interests are more likely to be satisfied. Faculty entrenched in the experimental sciences may be satisfied that acceptable quantitative methodology has been employed, social science faculty may find meaning in the data representing student attitudes, and policy-makers may find action steps rooted in multiple sources of information a more palatable and useful basis for decisions than single-source assessments that weakly implicate broad policy changes.

Like any good research, assessment of writing must take place within a theoretical context (Huot, 2002). There has been a tendency for the validity and reliability of writing assessment to be focused on narrowly conceived elements of the process, with the result being an inadequate recognition of—or accounting for—a more complex understanding of the writing enterprise (cf. Huot, pp. 37–43). Because writing is multidisciplinary, and because undergraduate student writing has roots that extend to the most deeply held assumptions of higher education (Russell, 1991), an assessment of student writing that is helpful must necessarily be theoretically situated. If an inquiry is truly to examine writing, it must be conducted and understood as research rather than test development. It must be informed by theory, it must recognize all the potential influences, and it must attempt to uncover them. To do so requires mustering the strongest tools of research, including quantitative and qualitative methodology, and engaging in a dialectical process whereby each new step is responsibly negotiated.
In local terms the emergent process began with our awareness that we were proceeding on an uncharted course. We adopted methodological values that have been well documented in the literature on qualitative research (Denzin & Lincoln, 2000). Specifically, we set out to capture complexity through an iterative process of analyzing multiple sources and inching toward the emergent themes or conclusions. The process is best illustrated in the finding about audience awareness. Our core data were samples of student writing for which we knew it would be difficult to design a general rubric capable of handling the conventions of so many disciplines. To help assess the writing samples appropriately, we decided to ask professors providing the samples to
give us information about the assignment to which their students had responded. After reading through the assignment information, however, we began to see it as a potentially rich source of data rather than simply background information for the raters. Likewise when we wrote the survey items, we worked from a general list of topics. The realization that awareness of audience was an important component of students’ beliefs about writing emerged only through an exploratory factor analysis. This particular result has generated discussion on campus, and we would not have been able to offer it had we committed to a rigid plan at the outset.

The dialectic process was at play as the work of the committee progressed. Results were then phrased in our internal report with recognition that we needed to seed a similar dialectical process among the faculty. Too much information and explanation in our report would terminate the discussion; a small amount framed for the relevant stakeholders in terms of concrete but small steps would stand a better chance of precipitating change.

4.3. The study must be formative

When the subcommittee asked for a study of student writing, they were asking for a summative report on what the problems were so that resources and interventions could be targeted appropriately. They wanted to “fix the problem,” and they felt that their ability to do so was tied to having a focused and conclusive description of the problem. They wanted results they could use. Useful results are relevant and truthful; they are actionable; and they can be absorbed by those who would be responsible for action. These criteria tie back to the stakeholder and iterative process issues already discussed, and introduce a third criterion, that an inquiry into student writing must necessarily be formative. It must provide information about what to do next and what to examine next as much as or more than it provides conclusive information. Although we were not very adept at articulating it, the design committee knew that providing the requested conclusive analysis in one study was simply not realistic. Our private goal, then, was to learn enough that direction for further inquiry would become clear.

The challenges facing a formative study of writing in higher education are significant. Russell (2002) describes the fragmented state of the academic discourse community, which in and of itself can make any truthful implication about student writing overwhelming. In any institution of higher education, the prevailing assumption that writing is a skill students should have before they are admitted, that it is someone else’s job to teach writing, or that faculty should not be asked to articulate or teach the communication conventions of their disciplines renders systemic solutions to student writing extraordinarily difficult. Moreover, ours is an urban, commuter, research university, which means this problematic assumption is compounded by a critical mass of nontraditional students. Our mission is to serve the local population, which ranks among the most populous and diverse in the country. Fewer than half of our students are white, more than half are financially independent of their parents, nearly half are the first generation in their families to attend college, half speak a language other than English at home, and the average age of undergraduates is 24.
They reflect the increasing numbers of college students who do not arrive at the doorstep of academia having been initiated into the rhetorical conventions of higher education, let alone the micro-conventions within a discipline. Bringing them into the language of academe, as part of the process for reforming transparency and accountability in higher education, will require a sea change of faculty culture and the business of instruction. Given the magnitude of any attempt to explore and change student writing, perhaps the most compelling case for formative design is that stakeholders, particularly faculty, must be willing to own the results, which means results must come in small “steps,” with space for the hermeneutic
dialectic in between. Our study was designed in large part in response to this subtext. Some members of the committee assumed the answer was going to be that faculty in all disciplines will need to take on writing in their own classrooms; some were braced for the result to be increased pressure on the English department; some harbored the uncomfortable awareness that faculty attribute poor student writing to university admissions requirements that, in their view, must be too low. While we think our findings point to the larger conceptual issues, the results presented in our local report address all of the stakeholder assumptions by specifying concrete action steps rather than preaching conceptual realignment of any kind. The recommendation to help students understand more clearly what is expected of them on writing assignments, for example, is translated in our report into the small steps of offering students workshops about deciphering assignment expectations, developing resources for faculty on assignment structure, and referring students to work with Writing Center consultants.

5. Coaxing the troll from beneath the bridge

At this stage of our process, we can report that an inquiry has been designed and conducted, findings have been reported, and recommendations identified. To our own surprise, the inquiry process yielded a fairly rich understanding of the beast that is student writing. In so far as we were tasked with diagnosis, we were far more successful than we anticipated we would be. The fact that much of what we learned appears in a journal article rather than the local report communicates eloquently the status of the prescription following from the diagnosis. Our job is to recommend next steps—not all steps from here to the new world, and not next steps based on abstractions about student writing. The next steps we recommended to the Undergraduate Council are rooted in the larger conceptual frame presented here, but specified in accordance with the local context. They represent actions we believe are possible in our institution at this time. We are aware that in a different time and place, the recommendations might be framed differently. Our job, then, is to point the university in the right direction. Having identified at least a preliminary sense of what that direction is, our job is to invite others to come along.

The early indications suggest they will. We issued our report six months ago. As of the time this article was submitted for publication, tangible outgrowths include selecting writing in the disciplines for the short list of topics under consideration for the university’s Quality Enhancement Plan, the substantive plan for improvement to student learning now required by our institutional accrediting body. Additionally, a number of colleges have signed on with our Writing Center for collaborative writing courses and programs partly as a result of contacts made during discussions of the final report. Finally, after years of paralysis on the identification and assessment of core competencies, that discussion is now underway once again. The Undergraduate Council is now meeting about the possibility of assessing critical thinking. While accreditation pressure also plays into this discussion, the relationship to the success of the writing assessment is made clear by the Council’s statement that because we succeeded in assessing writing, assessing critical thinking is now also realistic.
Nothing really begins or ends in the academic world, but currents do shift and energy rises and falls, and so an assessment can provide meaningful results without presenting a complete picture or comprehensive solution. No one of the results reported here is truly significant alone, but together they add to the momentum necessary to achieve some of the substantive curricular and pedagogical change called for by today’s diverse population of postsecondary students and by the proponents of public accountability. Hence our claim of a giant leap brought about by keeping assessment local.
Acknowledgement

The authors would like to thank Bill Condon for his helpful discussion in an early stage of this article.
Appendix A. Faculty questionnaire
Appendix B. Results of exploratory factor analysis of student survey
Factor and item                                                                      Loading

Writing confidence (α = 0.92)
  I am confident in my writing ability.                                              0.831
  I am able to express my knowledge clearly through writing.                         0.827
  I am able to communicate ideas effectively in writing.                             0.811
  I can write persuasively.                                                          0.762
  I am able to organize my ideas well when I write.                                  0.744
  I am able to write papers that professors like.                                    0.743
  I know how to evaluate and revise my papers.                                       0.698
  I am comfortable with the kind of language used in college writing.                0.673
  I am confident in my knowledge of English grammar.                                 0.614
  I am able to identify a clear purpose when I write a paper.                        0.615
  My prior education has prepared me for the written work required in my courses.    0.543
  I am able to collect and organize information for my writing.                      0.511

Writing difficulty (α = 0.65)
  I find it difficult to understand what writing assignments are asking for.         0.834
  I have a hard time figuring out how to approach a writing assignment.              0.703

Audience awareness (α = 0.66)
  When I write, I think about who is going to read it.                               0.766
  I think about how my papers will sound to someone else.                            0.774
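For readers who wish to reproduce scale statistics like the alpha values above from raw survey responses, the following is a minimal sketch of a Cronbach’s alpha calculation. The file and item column names are placeholders for this illustration, not the study’s variable names.

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a set of items (rows = respondents, cols = items).

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    items = items.dropna()
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical response matrix: Likert-type responses to the survey items.
survey = pd.read_csv("student_survey.csv")

scales = {
    "writing_confidence": [f"conf_{i}" for i in range(1, 13)],   # 12 items
    "writing_difficulty": ["diff_1", "diff_2"],                  # 2 items
    "audience_awareness": ["aud_1", "aud_2"],                    # 2 items
}
for name, cols in scales.items():
    print(name, round(cronbach_alpha(survey[cols]), 2))
```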
Appendix C. Summary of simultaneous regression analysis for variables predicting Mean Writing Trait Score
Descriptive statistics (N = 272)

Variable                                        Possible range    Mean     S.D.
Mean Writing Trait Score                        1–3               2.39     0.40
Cumulative GPA                                  0–4               2.95     0.55
Explicitness of writing goals in assignment     7–14              10.88    1.29
Writing confidence factor                       1–5               3.87     0.60
Writing difficulty factor                       1–5               2.53     0.92
Audience awareness factor                       1–5               4.04     0.72
Age group when began speaking English           1–5               1.46     1.07
Transfer student                                1–2               1.44     0.50

Coefficients

Predictor                                       B (unstandardized)    Std. error    Beta (standardized)
(Constant)                                      0.81                  0.31
Cumulative GPA                                  0.23                  0.04          0.32*
Explicitness of writing goals in assignment     0.04                  0.02          0.13*
Writing confidence                              −0.02                 0.05          −0.03
Writing difficulty                              0.02                  0.03          0.04
Audience awareness                              0.11                  0.03          0.20*
Age group when began speaking English           −0.04                 0.02          −0.11
Transfer student                                0.04                  0.05          0.06

Note: R2 = .20; adjusted R2 = .18.
* p < .05.
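As a quick arithmetic check that ties the two tables together, the standardized coefficients (Beta) can be recovered from the unstandardized B values and the standard deviations in the descriptive statistics via beta = B * (S.D. of predictor / S.D. of outcome). A short sketch using the published figures:

```python
# Reproduce the standardized coefficients in Appendix C from the
# unstandardized B values and the standard deviations reported above.
SD_OUTCOME = 0.40  # S.D. of Mean Writing Trait Score

predictors = {
    # name: (B, S.D. of predictor)
    "Cumulative GPA": (0.23, 0.55),
    "Explicitness of writing goals in assignment": (0.04, 1.29),
    "Writing confidence": (-0.02, 0.60),
    "Writing difficulty": (0.02, 0.92),
    "Audience awareness": (0.11, 0.72),
    "Age group when began speaking English": (-0.04, 1.07),
    "Transfer student": (0.04, 0.50),
}

for name, (b, sd_x) in predictors.items():
    beta = b * sd_x / SD_OUTCOME
    print(f"{name}: beta = {beta:.2f}")
# Cumulative GPA comes out at 0.32 and audience awareness at 0.20, matching
# the table; small discrepancies elsewhere reflect rounding of the published B values.
```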
References

Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice. Oxford: Oxford University Press.
Council of Writing Program Administrators. (2000). WPA outcomes statement for first-year composition. Retrieved from http://www.english.ilstu.edu/Hesse/outcomes.html.
Daniels, H. (1983). Famous last words: The American language crisis reconsidered. Carbondale, IL: Southern Illinois University Press.
Delpit, L. (1997). The silenced dialogue: Power and pedagogy in educating other people’s children. In V. Villanueva Jr. (Ed.), Cross-talk in comp theory (pp. 565–588). Urbana, IL: National Council of Teachers of English. (Reprinted from Harvard Educational Review, 58 (1988), 280–298.)
Denzin, N. K., & Lincoln, Y. S. (Eds.). (2000). The handbook of qualitative research. Thousand Oaks, CA: Sage Publications.
Guba, E. G., & Lincoln, Y. S. (1989). Fourth generation evaluation. Newbury Park, CA: Sage Publications.
Hamp-Lyons, L. (1991). Scoring procedures for ESL contexts. In L. Hamp-Lyons (Ed.), Assessing second language writing in academic contexts (2nd ed., pp. 241–276). Norwood, NJ: Ablex.
Huot, B. (2002). (Re)articulating writing assessment for teaching and learning. Logan, UT: Utah State University Press.
Kuhlemeier, H., & Bergh, H. V. D. (1997). Effects of writing instruction and assessment on functional composition performance. Assessing Writing, 4(2), 203–223.
Leonhardy, G., & Condon, W. (2001). Exploring the difficult cases: In the cracks of writing assessment. In R. H. Haswell (Ed.), Beyond outcomes: Assessment and instruction within a university writing program (pp. 65–79). Westport, CT: Ablex Publishing.
Messick, S. (1989). Meaning and values in test validation: The science and ethics of assessment. Educational Researcher, 18(2), 5–11.
Miller, S. (1989). Rescuing the subject: A critical introduction to rhetoric and the writer. Carbondale, IL: Southern Illinois University Press.
Moss, P. A. (1994). Validity in high stakes writing assessment: Problems and possibilities. Assessing Writing, 1(1), 109–128.
Rose, M. (1985). The language of exclusion: Writing instruction at the university. College English, 47, 341–359.
Russell, D. R. (2002). Writing in the academic disciplines: A curricular history (2nd ed.). Carbondale, IL: Southern Illinois University Press.
Rutz, C., & Lauer-Glebov, J. (2005). Assessment and innovation: One darn thing leads to another. Assessing Writing, 10(2), 80–99.
UH at a Glance. (2006). Retrieved July 1, 2006, from http://www.uh.edu/uh glance/index.php?page=info.
U.S. Department of Education. (2006). A test of leadership: Charting the future of U.S. higher education. Washington, DC: Author.
Weigle, S. C. (2002). Assessing writing. Cambridge: Cambridge University Press.
White, E. M. (1994a). Issues and problems in writing assessment. Assessing Writing, 1(1), 11–27.
White, E. M. (1994b). Teaching and assessing writing: Recent advances in understanding, evaluating and improving student performance (2nd ed.). San Francisco: Jossey-Bass.
Yancey, K. B. (1999). Looking back as we look forward: Historicizing writing assessment. College Composition and Communication, 50(3), 483–503.