IDENTIFYING LOW-PERFORMANCE PUBLIC SCHOOLS

Salwa Ammar*, Robert Bifulco**, William Duncombe**, and Ronald Wright*

*Department of Business Administration, Le Moyne College, Syracuse, USA
**Department of Public Administration, Syracuse University, USA

Studies in Educational Evaluation 26 (2000) 259-287

The growing emphasis on performance in public organizations has reached public schools. In contrast to past attempts to reform process and government, present efforts to reform American schools have focused more directly on performance. Key features of these performance-based reforms in schools as well as other public organizations include: establishing clear, measurable performance standards; granting local actors the autonomy to find the best means of achieving these standards; providing rewards for local actors that achieve performance goals; and developing remedies for cases when goals are not met (King & Mathers, 1997). A majority of states now prepare report cards on individual schools which include test scores and other outcome measures, and a growing number of states use this information to rank, reward or sanction schools and districts (Jerald, 2000). While receiving less media attention, an important element in performance-based reforms is the identification and improvement of organizations with the lowest performance. In the case of education, at least 19 states have established procedures for identifying low-performance schools and districts (Jerald & Boser, 1999). The 1994 reauthorization of the federal Title I program encouraged more programs of this sort by requiring states and districts that receive Title I money to establish systems, based primarily on student achievement on state-wide assessments, for identifying and improving low-performance schools (Advisory Council to New York State Board of Regents, 1994). Programs to rank schools on student performance and identify those with low performance face two controversial and difficult sets of issues. The first set of issues concerns how to identify which schools should be classified as having low performance. Although not without critics, the idea that low-performance schools should be identified primarily on the basis of educational outcomes, such as achievement test scores, is generally accepted by states with such programs (King & Mathers, 1997). However, several important issues concerning how to develop reliable identification systems remain unresolved. The second challenge facing these programs is to evaluate the causes of low performance and determine what actions should be taken to improve the organizations identified.

In this paper, we will address the first set of issues in the context of public education: How do state governments identify schools with the lowest student performance? At first glance, the task of identifying low-performance schools would appear quite straightforward: simply find the schools with the worst test scores. However, there are several major issues involved in identifying low-performance schools that need to be addressed. Given the potentially serious consequences for the school and its staff, accurate identification methods are crucially important. At a minimum, schools classified as low performance are required to develop an improvement plan. Eventually, assuming no improvement, the school staff may be replaced, the state may take over management, or the school may be closed.

In the first section, we outline and provide a general discussion of these issues. While states have moved aggressively to implement a set of standards, much less attention has been paid to how these test results should be used in evaluating schools and students. In the second section, we use test data for New York City elementary schools to demonstrate the extent to which choices about how to aggregate individual student test results can affect which schools are identified as low-performance schools. In the third section, we introduce a promising new methodology, fuzzy logic, and demonstrate its potential usefulness in identifying low-performance schools. We conclude by discussing the implications that our findings have for the design of low-performance school programs and directions for future research in performance accountability.

Issues Involved in Identifying Low-Performance Schools

Once the decision is made to evaluate school performance based on the outcomes of students in the school, a number of issues still need to be resolved. Specifically, officials must answer four questions:

1. What student outcome measures should be used?
2. Should identification be based on absolute levels of student achievement or a school's relative contribution to student learning?
3. How should student outcome measures be aggregated into school-wide outcome measures?
4. What role should "expert judgment" play in the identification process?

What Outcome Measures Should be Used?

The choice of outcome measures is of paramount importance in any performance accountability system. Low-performance school programs involve ranking of schools on some student performance measure, and designation as a low-performance school often leads to public or state pressure for improvement. Improvement efforts are likely to be focused on those outcomes that are used to identify low-performance schools. If these outcomes are not important ones, then valuable resources will be misspent. Thus, it is crucial that the student outcome measures used to evaluate schools reflect important school improvement goals (King & Mathers, 1997). Norm-referenced and statewide criterion-referenced test scores, particularly in reading and math, are commonly used to identify low-performing schools (King & Mathers, 1997). While the reliability of many norm- and criterion-referenced tests is well established, the use of statewide tests to evaluate school performance has been criticized on other grounds. Standardized tests are not always aligned with curricular goals. Subjects such as science, social studies and the arts are often not tested, and even in the tested subjects higher order thinking and problem solving skills are often not assessed. There are also concerns that the development of higher order thinking skills is undermined when emphasis on improving test results leads teachers to "teach-to-the-test" (Darling-Hammond, 1991). In surveys conducted by Schumacker and Brookshire (1992), superintendents in the State of Texas rated state mandated test results less important than nine other school quality indicators. This provides evidence for the claim that "educators are unconvinced that group-administered tests provide an accurate indication of what a student knows or has learned" (Sanders & Horn, 1995). The move to student portfolios and other "authentic" assessment devices are attempts to correct the perceived limitations of standardized tests. Unfortunately, more comprehensive assessment instruments often do not provide measures of performance that are comparable across schools.

Besides standardized tests, student attendance and dropout rates are also commonly used in assessing school performance (King & Mathers, 1997). Other outcome measures that are used include grade promotions, suspension/incident rates, college placements and job placements (Schumacker & Brookshire, 1992). Multiple outcome measures provide more information about student performance, but also complicate the identification of low-performance schools. The focus of this paper is not on evaluating the pros and cons of different performance measures, but instead on how the information from multiple measures can be pooled in the ranking of schools.

Should Identification be Based on Absolute Outcome Levels or the "Value-Added" Levels?

The answer to this question depends crucially on the objectives of the program, and, ultimately, on the unit of analysis. If the program is intended to identify students in the most need of help, then absolute performance measures should be used. Since success in college, and to a growing extent in the job market, depends on the acquisition of certain cognitive skills in high school, identifying schools with a large portion of the students failing to develop the skills needed to succeed is important. Regardless of the reasons why students are not developing the required skills, action needs to be taken to
ensure that these students are given the opportunity to succeed in high school, college and the job market. Such a program is more accurately called a low-performance school program since the performance of students is the focus of the program. Most states with low performance school programs use absolute test scores as the principal measure of performance (Jerald & Boser, 1999). If, on the other hand, the objective of the program is to help schools improve their efficiency (maximizing increases in student achievement given some budget constraint), then value-added measures rather than absolute outcome levels are the appropriate measure of performance. The commonly used term low-performing school program more appropriately describes a program with an efficiency focus.

If the program is primarily intended to provide incentives to poor performing school officials to change their behavior, then school performance measures should reflect factors under the control of school officials (Meyer, 1996). A program that holds school officials accountable for student performance levels that are unlikely to be achieved in the short run, regardless of the school's effectiveness, will not provide much incentive for those officials to change their behavior. Such a program might only serve to demoralize school members or encourage fraudulent reporting practices (Darling-Hammond, 1991; King & Mathers, 1997). Because a school has more control over how much a student's achievement improves while she is at the school than over the absolute level of performance, a measure of student gains is preferred for assessing school performance. Using student gains on test scores, rather than absolute levels of performance, only partially addresses this issue. If disadvantaged students are likely to show smaller gains in achievement from year to year as well as lower absolute levels of achievement, then using a measure of student gains to assess school performance will still bias identification toward schools with high percentages of disadvantaged students (Clotfelter & Ladd, 1996).

How Should Outcome Measures be Aggregated into an Organization-Wide Outcome Measure?

Whenever individual outcomes are aggregated into a measure of organization performance, information is lost. In the case of schools, three aggregation issues are of particular importance. First, what part of the student performance distribution should be the focus of the measure? Of particular concern is the fact that mean test scores can mask low achievement by groups of students within the school. The scores of high-achieving students can skew results and create the impression that all students in the school are performing effectively (Darling-Hammond, 1991; King & Mathers, 1997). One way to avoid this problem is to use the percentage of students in a school achieving a specified standard. New York State, for instance, identifies schools that show the largest percentages of students below a state specified standard of minimum competency. This helps to focus attention on the number of students not receiving the level of service they need to prepare them for future endeavors. However, this measure also masks important information. It provides no information on whether the students in a school are developing higher order thinking skills. In addition, it provides little information for
schools with small numbers of disadvantaged students, which are, in any case, unlikely to show significant numbers of students falling below minimum competency. Some states have moved beyond single indicators by providing performance measures for different parts of the distribution and/or different socio-economic groups. The school accountability program in Texas, for example, uses school ratings that depend on the overall percentage of students meeting an achievement standard on state tests, and the average year-to-year gains on those tests made by four student groups: African-American, Hispanic, White and Economically Disadvantaged. In Kentucky students are categorized into one of four achievement levels: distinguished, proficient, apprentice and novice. Each school that fails to make a specified amount of progress towards the goal of having students performing on average at a proficient level is required to participate in school improvement activities (King & Mathers, 1997).

A second aggregation issue may arise if there is performance variation across different cohorts of students. In general, we would expect the performance of a given cohort of students in a school to be similar to the cohorts of students who entered school one and two years earlier, i.e., no cohort effect. However, if there are noticeable performance differences across student cohorts, then a school's performance ranking may be partially determined by the cohort of students included in the evaluation. Under circumstances of significant cohort effects, some method for combining results across several cohorts may need to be developed.

The third aggregation issue is how different types of tests can be combined into one school-wide index. New York State, for instance, uses four different student outcome measures to identify low-performance elementary schools. If a school's results on any one of these four measures meet the identification criteria, the school can be identified, regardless of how well its students do on the other outcome measures. Thus, New York State uses an or combination rule. A different combination rule is an and rule, where schools need to be below the identification criteria on all outcome measures to be identified as low-performance. Alternately, an average rule could be employed, where the average of the different outcome measures is used to rank school performance.

An argument can be made for New York State's approach. If students are to succeed in high school and beyond they need to develop a certain level of competency in several areas, particularly in math and reading. If a large number of students in a school are not developing adequate math skills, this is a problem that needs to be addressed regardless of how well the school's students are reading. Using an aggregate measure that combines math and reading results can mask problems in one of these areas. However, if a larger number of school performance indicators are used (as many recommend), then New York State's approach becomes more problematic because fewer and fewer schools are identified by each test (for a fixed total number of schools to be identified). Given the potentially serious consequences for the school and the state, state governments have been hesitant to identify more than a small percentage of schools as low performance.

What is the Role of Expert Judgement?

One of the most controversial aspects of programs to identify low-performance organizations is whether expert judgement should be used to modify school rankings. On the one hand, the constructive use of expert advice can improve the validity of the selection process by allowing for the introduction of more subtle criteria that may be missed with simple combinations of test scores. Expert judgement can be particularly helpful in determining the relative importance of various outcome measures. For example, should reading and math test scores be treated equally, or is competence in reading a higher priority? Should a higher weight be given to the number of students performing above minimum proficiency in a subject, or to the number performing at grade level? The weights assigned to different outcome measures should match the priorities of the program, and experts can play an important role in establishing program priorities and the appropriate weights. For example, in Dallas a set of externally generated weights was used to combine different exams into one school-wide index (Webster, Mendro, & Almaguer, 1994). However, the use of expert judgement may open up the process to political intervention to protect or punish particular schools. In New York, for example, the Commissioner of the New York City Board of Education typically negotiates with the State Commissioner of Education over the specific schools that are going to be identified as low performing (White, 1998). To avoid the identification process becoming a political ranking exercise, the criteria used to identify a particular school should be applied to all schools. If an expert feels that a particular school should not be on the list because it has demonstrated substantial improvement in performance over the last three years, then the criterion of school improvement should be applied in the ranking of all schools.

The Importance of Aggregation Rules

Issues about how to aggregate individual student outcomes into a measure of the overall level of performance in a school are easy to overlook or dismiss as "technical" details. If decisions about these issues make little difference for which schools end up being identified as having low performance, then they do not warrant much attention from policy makers. If, on the other hand, the choice of aggregation method markedly changes the ranks of schools and the list of the low performers, then the choice of aggregation methods has serious policy consequences. How much difference these issues make for which schools are identified is, therefore, an important question.

To examine this issue, we used data provided by the New York City Board of Education. The data are drawn from files used by the Board to produce citywide school report cards. We limited our study to elementary schools. For each elementary school, the files report the percentage of students who scored at or above the state reference point (SRP) on reading and math Pupil Evaluation Program (PEP) tests administered in Grade 3. The SRP is an objective assessment criterion that indicates minimum proficiency and is used to identify students in need of remedial assistance. The data files also report
the percentage of students who scored above grade level on the citywide math and reading tests administered in grades 3 and higher. The citywide testing program uses different assessment instruments than the statewide PEP program, and the grade level measure is a norm-referenced standard that indicates a level of performance substantially higher than minimum proficiency. After eliminating schools that do not participate in one or the other of the two testing programs, as well as schools with missing observations, our final dataset included 605 schools.

Our question is whether decisions about how to combine various outcome measures, about what student cohort to examine, and about the relative importance of various performance criteria make a difference for which schools end up being identified as having the lowest performance. To answer this question, we constructed several different methods for identifying low-performance schools and compared the lists of schools generated by these methods. Each method assumes that decisions have been made to base identification on the absolute level of achievement in math and reading. Because each of the methods applied shares these features, we are able to isolate the effect of decisions about how to aggregate student test results.

Figure 1 provides a summary of the different methods that we applied. In all, we applied 18 different methods. These methods differ from each other along one or more of three dimensions. The first dimension is defined by the rule used to combine the math and the reading results.

Figure 1: Alternative Methods for Identifying Low-Performing Schools

                                                 Student Performance Criteria Used
Combination Rule    Cohort Used                  SRP on Gr. 3 PEP Tests    SRP on PEP Tests and Grade
                                                                           Level on Citywide Tests
Or Rule             1995                          1                        10
                    1996                          2                        11
                    Average of 1995 & 1996        3                        12
And Rule            1995                          4                        13
                    1996                          5                        14
                    Average of 1995 & 1996        6                        15
Averaging Rule      1995                          7                        16
                    1996                          8                        17
                    Average of 1995 & 1996        9                        18
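The 18 methods are simply the cross-product of these three dimensions. A minimal sketch of that enumeration is given below; the labels are illustrative shorthand, not the coding used in the study.

```python
# Enumerating the 18 identification methods in Figure 1 as the cross-product
# of the three design dimensions (labels are illustrative shorthand).
from itertools import product

combination_rules = ["or", "and", "averaging"]
cohorts = ["1995", "1996", "average of 1995 and 1996"]
criteria = ["SRP only", "SRP and grade level"]

methods = list(product(criteria, combination_rules, cohorts))
print(len(methods))   # 18
print(methods[0])     # ('SRP only', 'or', '1995')
```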

Three different combination rules were used: an or rule, an and rule, and an averaging rule. To apply the or and and rules we first ranked each school according to the percent of students in the school who scored at or above a specified performance standard (either the SRP or grade level) in math and according to the percent of students who scored at or above the same criterion on the reading test. The or rule, then, identifies any school that ranks among the lowest either on the math test or on the reading test. The and rule identifies any school that ranks among the lowest both on the math test and on the reading test. To apply the averaging rule we first calculated an average of the percent below a specified criterion on the math test and the percent below the same criterion on the reading test. We then ranked the schools according to this average percentage and identified the schools that rank lowest on this measure.

The year or years of test results used defines the second dimension along which the methods we applied differ from one another. Each of the three combination rules was applied using three different sets of test results. These are the 1995 test results, the 1996 test results, and an average of the 1995 and 1996 test results. Using each of these three sets of results to apply each of the three combination rules creates nine different identification methods.

The final dimension of variation concerns the relative importance placed on alternative student performance criteria. Each of the nine different methods just described was applied using the percentage of students below the SRP on statewide tests, and alternatively, using both the percentage of students below the SRP and the percentage below grade level on the citywide tests. Thus, half of the methods place an absolute priority on the minimum proficiency criteria and the other half place an equal weight on the grade level criteria. Comparing the lists of schools generated by these two sets of methods will provide some indication of the importance of individual judgement about the relative importance of various student outcome measures.

Table 1: Comparison of Alternative Identification Methods

No. of Methods    No. of Schools Identified    Pct of Total No. of    Cumulative Pct of
                  by this No. of Methods       Schools Identified     Schools Identified
18                 5                            6.10%                   6.10%
16 to 17           5                            6.10%                  12.20%
13 to 15           6                            7.32%                  19.51%
10 to 12           6                            7.32%                  26.83%
7 to 9            10                           12.20%                  39.02%
4 to 6            17                           20.73%                  59.76%
3                  7                            8.54%                  68.29%
2                 10                           12.20%                  80.49%
1                 16                           19.51%                 100.00%
At Least 1        82
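A minimal sketch of how the three combination rules might be applied is given below, using invented scores for a handful of hypothetical schools rather than the New York City data; the overlap counts at the end mirror the kind of comparison reported in Table 1.

```python
# A minimal sketch of the "or", "and", and "averaging" combination rules,
# applied to hypothetical percent-at-or-above-standard scores in math and
# reading (not the actual New York City data).

def bottom_n(scores, n):
    """Schools among the n lowest on a single measure (ties broken arbitrarily)."""
    return set(sorted(scores, key=scores.get)[:n])

def identify(schools, n):
    math = {s: v["math"] for s, v in schools.items()}
    reading = {s: v["reading"] for s, v in schools.items()}
    average = {s: (v["math"] + v["reading"]) / 2 for s, v in schools.items()}
    low_math, low_reading = bottom_n(math, n), bottom_n(reading, n)
    return {
        "or": low_math | low_reading,        # low on either test
        "and": low_math & low_reading,       # low on both tests
        "averaging": bottom_n(average, n),   # low on the average of the two
    }

schools = {
    "A": {"math": 22, "reading": 25}, "B": {"math": 18, "reading": 60},
    "C": {"math": 35, "reading": 20}, "D": {"math": 28, "reading": 30},
    "E": {"math": 55, "reading": 58},
}
lists = identify(schools, n=2)
on_all_three = lists["or"] & lists["and"] & lists["averaging"]
on_only_one = {s for s in lists["or"] | lists["averaging"]
               if sum(s in flagged for flagged in lists.values()) == 1}
print(lists)
print("identified by all three rules:", on_all_three)
print("identified by only one rule:", on_only_one)
```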

Each of the 18 methods we applied generated between 30 and 32 schools, roughly 5% of the sample. A total of 82 schools were identified by at least one method. Although this indicates a significant amount of overlap among the 18 lists generated, it also indicates that there are significant differences among the lists. As Table 1 shows, only five schools were identified by all 18 methods (row 1, column 2) and only 27% of the schools identified were on more than half of the lists (row 4, column 4). Nearly a fifth of the schools identified were identified by only one of the methods (row 9, column 3), and nearly 60% of the 82 schools identified were identified by less than one-third of the methods.

Does the Rule Used to Combine Math and Reading Results Make a Difference?

Table 2 indicates the sensitivity of identification methods to the type of rule used to combine multiple outcome measures. The third column in Table 2 counts, for each student cohort and set of student performance criteria used, the number of schools identified either using the or rule, the and rule or the averaging rule. The fourth column represents the percentage of schools identified by at least one of these three methods that are in fact identified by all three. The last column indicates the percentage of identified schools that are identified by only one of the three methods. This last group represents those schools that would be identified by one of the methods but not by the others. For these schools, the decision about what part of the student distribution to focus on determines whether or not they are identified.

Table 2: Effect of Decision About What Combination Rule to Use (or rule vs. and rule vs. averaging rule)a

Student Performance        Student Cohort             # of Schools Identified    % of Schools Identified    % of Schools Identified
Criteria                                              by at Least One of the     by All Three of the        by Only One of the
                                                      Three Methods              Methods                    Three Methods
SRP on Grade 3 PEPs        1995                       40                         50.0%                      25.0%
                           1996
                           Average of 1995 & 1996     39                         56.4%                      23.1%
SRP on PEPs and Grade      1995                       45                         31.1%                      31.1%
Level on Citywide Tests    1996                       48                         35.4%                      43.8%
                           Average of 1995 & 1996     44                         34.1%                      38.6%

a One method identifies a school if it ranks among the lowest on math or reading tests, the second identifies schools if it ranks among the lowest on math and reading, and the third identifies a school if it ranks among the lowest on an average of math and reading. For each row in the table decisions about what student cohort and student performance criteria to use are held constant.

The first row in Table 2 indicates that when the or, and and averaging rules are each applied using the third grade cohort in 1995 and only the minimum proficiency criterion, then 40 schools are identified by at least one of the three methods (Column 3). Since each of these three methods identifies a list of 30 schools, this indicates a substantial amount of overlap among the three lists. Half of the schools identified by at least one of the three methods are identified by all three (Column 4). A quarter of the schools identified are only identified by one of the combination rules (Column 5). There was slightly less overlap among the three methods when the 1996 results were used, and a similar amount of overlap when the average of the 1995 and 1996 results was used.

These findings suggest that if we choose to focus solely on the percentage of students in a school failing to achieve very low standards, then the identification of low performance is not too sensitive to the way in which we combine the results from math and reading tests. However, in these cases, the combination rules are being used to aggregate across only two sets of outcomes, math and reading. Many argue that a larger number of outcome measures is needed to gain a complete picture of how well students in a school are doing. As more dimensions of student well-being are incorporated into the assessment of student and school performance, the choice of combination rule might become more important.

This suspicion is confirmed by the bottom half of Table 2. In cases where the percentage of students performing at or above grade level on citywide tests as well as the percentage at or above the SRP on the statewide tests are considered, the number of student outcome measures used in the identification process is increased from two to four. In these cases, the overlap between the lists of schools identified using the or, and and averaging rules is smaller. In each case where the grade level performance criterion is also used, less than 40% of the schools identified by at least one of the combination rules are identified by all three, and more than a third are only identified by one of the three. In each case, 17 or fewer schools are identified by all three methods. This provides evidence that the choice of combination rule can make a significant difference for which schools are identified, particularly when a large number of outcome measures need to be combined.

Does the Choice of Student Cohort Matter?

Table 3 provides some indication of the sensitivity of the identification methods to cohort effects. By this we are referring to differences in student performance across different cohorts of students. If there are no cohort effects, we would expect third graders in one year to perform the same, on average, as third graders in the same school for the previous year (assuming no major changes in the school and staff). The first method listed in Table 3 applies the or combination rule using the percentage of students at or above the SRP on the statewide PEP tests. The figures in the first row of the table compare the lists generated by this particular method using students in the third grade during 1995, students in the third grade during 1996 and a combination of these two cohorts. Only 12 of the 50 schools that were identified using at least one of these sets of students (24.0%) were identified regardless of which cohort was used (Column 4). Over
40% of the schools identified using at least one of these three cohorts were identified on only one of the three lists (Column 5). These figures are similar for the other combination rules and whether the percent below the minimum proficiency criterion only or the minimum proficiency and grade level criteria both are used. This suggests that relying on the test results for a single cohort or from a single year might provide a misleading measure of the typical level of performance at a school, and thus that relying on such data might lead to inappropriate identification of the lowest performing schools.

Table 3: Effect of Decision About What Cohort to Use (1995, 1996 or Average of Both Years)a

Student Performance        Combination Rule     # of Schools Identified    % of Schools Identified    % of Schools Identified
Criteria                                        by at Least One of the     by All Three of the        by Only One of the
                                                Three Methods              Methods                    Three Methods
SRP on Grade 3 PEPs        Or Rule              50                         24.0%                      42.0%
                           And Rule             48                         27.1%                      39.6%
                           Averaging Rule       49                         26.5%                      38.8%
SRP on PEPs and Grade      Or Rule              45                         37.8%                      37.8%
Level on Citywide Tests    And Rule             45                         33.3%                      33.3%
                           Averaging Rule

a One method identifies schools based on student test results in 1995, the second identifies schools based on student test results in 1996, and the third identifies schools based on the average of student test results in 1995 and 1996. For each row in the table decisions about what combination rule and student performance criteria to use are held constant.

Prima facie, it seems that using the results of students from several different grades would provide the most representative indication of the level of performance in the school. The problem is that in many practical settings standardized tests are not administered to more than one cohort. For instance, New York State administers statewide tests only in Grades 3 and 6. Thus, for many elementary schools outside New York City that do not have a sixth grade, state officials can only use test results for third graders. In such cases, some information on the performance of other cohorts might be culled from past years' test results. However, interpreting results from past years can be complicated by the presence of trends in the level of student performance. In any case, the evidence presented in Table 3 suggests that it is important for the purposes of identifying low performance to collect and use information on more than one student cohort.

Do Judgments About How to Weight Various Outcomes Make a Difference?

Table 4 provides some indication of the sensitivity of school identification to judgments about the relative importance of various student outcome measures. The first method listed in Table 4 identifies a school if the percentage of students in the school during 1995 who scored at or above specified performance criteria in either math or reading ranked among the lowest. Whether the method gives absolute priority to the SRP or gives equal weight to both the SRP and grade level criteria, in this case, makes a difference for 18 schools. That is, 18 schools would be identified under one alternative but not the other. This means only about half (53.8%) of the total number of schools identified under either alternative are identified under both alternatives (Column 4). These figures are similar in cases where the and and averaging rules are used and where 1996 test results are used.

Table 4: Effect of Judgments About Outcome Weightings (Giving Absolute Priority to the SRP on Grade 3 PEP Tests vs. Giving Equal Weight to SRP on PEPs and Grade Level on Citywide Tests)a

Combination Rule     Student Cohort             # of Schools Identified by at    % of Schools Identified
                                                Least One of the Two Methods     by Both Methods
Or Rule              1995                       39                               53.8%
                     1996                       39                               59.0%
                     Average of 1995 & 1996     36                               69.4%
And Rule             1995                       39                               53.8%
                     1996                       39                               53.8%
                     Average of 1995 & 1996     36                               66.7%
Averaging Rule       1995                       34                               76.5%
                     1996                       38                               57.9%
                     Average of 1995 & 1996     37                               62.2%

a One method identifies a school using the percent of students at or above the SRP on the statewide PEP tests. The other method identifies a school using the percent at or above the SRP on statewide tests and the percent at or above grade level on the citywide tests. For each row in the table decisions about what combination rule and what student cohort to use are held constant.

Thus, we find significant differences between the two sets of lists, despite the fact that the percent above the SRP is given a 50% or larger weighting in both sets of methods. If we wished to incorporate a wider range of student performance criteria, we
might expect judgements about the relative importance of various criteria to become even more important. This suggests that judgements about how to weight various student performance criteria can make a substantial difference for which schools end up being identified as lowest performing.

In sum, it appears that decisions about which combination rule, student cohort and student performance criteria to use can play an important role in determining which schools end up being identified as low performance. Nearly any school that is placed on a low performance list would make its way off the list if enough of the decisions about how to aggregate individual test results were changed. This is true even when decisions about what tests to use in the identification process and about whether or not to rely solely on absolute levels of performance are held constant. In most cases, more than half the schools identified as low-performing by a particular method would not be identified if the decision about only one of the aggregation issues were changed. Thus, even though these issues are easy to dismiss as "technical" issues, it is important that they be given due consideration to ensure that the way in which individual student outcomes are aggregated accords with existing policy priorities. In the next section we introduce a potentially useful framework for addressing these issues.

Using Fuzzy Logic to Develop Aggregate School Outcome Measures

The results of the analysis of the New York City elementary schools indicate that whenever there are several potential measures for student performance, different aggregation methods for these measures can significantly change the ranks of schools and the list of schools that are identified as low performance. In general, including more information would seem preferable to using a single measure since multiple measures provide the potential for finer distinctions in performance across schools and students. However, when additional information is used in a ranking process, the potential variation in schools identified as low-performance increases. In addition, the more outcome measures used, the greater the potential for anomalous results using traditional aggregation approaches. For example, under an or rule, a school that performs poorly on one test but well above minimal standards on all the others will be identified, while schools that just miss the cut off for every test will not be identified. States have tried to deal with these anomalies by selectively using judgment to change ranks on individual schools. However, inconsistent use of expert judgment opens up the identification process to political manipulation.

The ideal identification method would be able to incorporate multiple measures in a manner similar to the way in which an expert's judgments are made. Human judgement tends to seek additional information when it is needed and ignore it when it is irrelevant. Rule based expert systems provide a mechanism for modeling just such judgement. In particular, fuzzy rule based systems have the ability to model expert judgement that is not black or white but contains shades of gray. Fuzzy models have been shown to be especially useful in performance evaluation of service organizations (Ammar & Wright,
1997). This section describes a simple fuzzy rule based system designed to identify low performance schools based on standardized test scores in the City of New York. The aim is to investigate the potential of the fuzzy approach and compare it to traditional identification methods.

How Fuzzy Logic Works

It is beyond the scope of this paper to provide a detailed description of fuzzy logic (see Ammar & Wright, 1997; Bezdek, 1993; Durkin, 1994; Treadwell, 1995). By way of introduction, however, we will highlight the key steps in using this methodology to develop aggregate performance measures. After describing the key decisions that need to be made at each step, we provide details concerning the specific choices that we made for this particular application.

Choosing Fuzzy Sets

The aggregation methods presented previously are based on classic set theory in which a particular outcome places a school either in or out of a set. Partial membership in several sets is not allowed. For example, schools cannot be partly a member of the low-performance group and partly a member of the group of schools with average student performance. Fuzzy logic is an alternative form of logic that has been developed to allow partial membership in several sets. Fuzzy set theory represents a generalization of classical set theory. A fuzzy set is defined by a membership function that ranges between 0 and 1, which assigns the degree of membership to each element in a set. Intuitively, the degree of membership represents the extent to which an expert's judgement places an element in a set. An element can belong to more than one set with different degrees of membership. For example, the same test result could be classified as 0.7 for a low-performance set and 0.4 for a marginal performance set, indicating partial membership in each. The rationale is to allow for gradual rather than abrupt transition among adjacent sets. One advantage of using fuzzy measures is the robustness of the categorization. Any crisp definition of the categories, where the memberships are either zero or one, will be sensitive to small changes at the boundaries. With the fuzzy representation, small changes in the definition of sets will only shift the memberships slightly and will not cause abrupt changes in the categories.

In the New York City example, we need to develop fuzzy sets for the four outcome measures used to evaluate schools: the percentages of students scoring at or above the SRP on the math and reading PEP tests, and the percentages scoring at or above grade level on the citywide math and reading exams. For each measure, three fuzzy performance sets were identified: low, marginal or adequate. Including a high performance category was not necessary for identifying the low performance schools. The ranges of the scores defining each of the sets were determined using 1st, 10th and 20th percentile or 1st, 15th, and 25th percentile cut-off points.

Figure 2: Membership Functions for the Four Performance Measures

For example, a percentage at or above the SRP on the math test in the lowest 10th percentile is considered low to varying degrees, a result in the 1st to 20th percentile range is considered marginal to varying degrees, and a result above the 10th percentile is considered adequate to varying degrees. A score in the lowest 1st percentile is fully low, with the maximum membership in that category of 1. Also, a score above the 20th percentile is fully adequate, with a membership in the adequate category of 1. Figure 2 presents the membership functions for the four performance measures we use. The horizontal axis represents the test score and the vertical axis represents the value of the membership function.
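The precise membership functions appear only graphically in Figure 2. The following piecewise-linear sketch, built around the 1st, 10th and 20th percentile cut-offs described above, is one plausible construction rather than the authors' specification.

```python
# A piecewise-linear sketch of the three fuzzy performance sets for one
# outcome measure, built around the 1st, 10th, and 20th percentile cut-offs
# described in the text. The exact shapes in Figure 2 are not reproduced here.

def ramp(x, start, end):
    """Membership rising linearly from 0 at `start` to 1 at `end`
    (falling instead if start > end), clamped to [0, 1]."""
    t = (x - start) / (end - start)
    return max(0.0, min(1.0, t))

def memberships(percentile, cuts=(1.0, 10.0, 20.0)):
    c1, c2, c3 = cuts
    return {
        "low": ramp(percentile, c2, c1),            # 1 below the 1st, 0 above the 10th
        "marginal": min(ramp(percentile, c1, c2),   # rises from the 1st to the 10th,
                        ramp(percentile, c3, c2)),  # falls from the 10th to the 20th
        "adequate": ramp(percentile, c2, c3),       # 0 below the 10th, 1 above the 20th
    }

print(memberships(5.0))   # partial membership in both "low" and "marginal"
```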

Choosing Combination Rules

One of the major advantages of fuzzy logic is that it allows the utilization of expert judgment in a systematic way. Experts can be asked both to help develop the fuzzy sets and to prioritize the different performance measures. These priorities are typically reflected in several over-arching rules which state the conditions under which a school will be placed in a particular performance category. The first step in the process of building rules is to get experts to agree on macro rules. These macro rules involve the use of if-then statements containing conditions and a conclusion. For example, assume that our hypothetical expert on low-performance schools places a higher priority on reading than math and places a higher priority on minimum proficiency over grade-level performance. Specifically, grade-level performance should be considered primarily when minimum proficiency performance on math and reading are contradictory. The following might be a set of macro rules developed by this expert for the low performance set. The first two rules establish the priority of minimum proficiency. The second two define how grade level performance can be used to resolve contradictions in minimum proficiency, giving priority to the importance of reading.

1. If the percent of students in a school scoring at or above the SRP in reading is low, and the percent of students scoring at or above the SRP in math is low, then the school is classified as low (regardless of how it does on the grade-level exams).
2. If the percentage of students scoring at or above the SRP in reading or math is low, and the percentage of students scoring at or above the SRP in the other subject is only marginal, then the school is classified as low (very high results on the grade-level tests could, however, influence this judgement).
3. If the percentage of students in a school performing at or above the SRP in reading is low, and the percentage of students scoring at or above the SRP in math is adequate, but the percent of students scoring at or above grade-level in math is low, then the school is classified as low.
4. If the percentage of students in a school performing at or above the SRP in reading is adequate, but the percentage of students scoring at or above the SRP in math is low, and the percent of students scoring at or above grade-level in math and reading is low, then the school is classified as low.
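Appendix A of the article is not reproduced here, but a sketch of how macro rules of this kind might be encoded and fired is given below. Each rule's firing strength is taken as the minimum of its condition memberships (the conventional fuzzy "and", assumed here), the rule containing an "or" is split into two conjunctive rules, and the measure names and membership values are hypothetical.

```python
# Hypothetical encoding of the macro rules for the "low" output set.
# Each rule is a list of (measure, fuzzy set) conditions; its firing strength
# is the minimum of the condition memberships (fuzzy "and"). Rule 2, which
# contains an "or", is written as two conjunctive rules (2a and 2b).

RULES_LOW = [
    [("srp_reading", "low"), ("srp_math", "low")],                     # rule 1
    [("srp_reading", "low"), ("srp_math", "marginal")],                # rule 2a
    [("srp_math", "low"), ("srp_reading", "marginal")],                # rule 2b
    [("srp_reading", "low"), ("srp_math", "adequate"),
     ("gl_math", "low")],                                              # rule 3
    [("srp_reading", "adequate"), ("srp_math", "low"),
     ("gl_math", "low"), ("gl_reading", "low")],                       # rule 4
]

def firing_strength(rule, m):
    """m maps each measure to its degrees of membership in the three sets."""
    return min(m[measure][fuzzy_set] for measure, fuzzy_set in rule)

# Hypothetical memberships for one school, e.g. produced by functions like
# the ones sketched in the previous section.
m = {
    "srp_reading": {"low": 0.7, "marginal": 0.3, "adequate": 0.0},
    "srp_math":    {"low": 0.2, "marginal": 0.8, "adequate": 0.0},
    "gl_math":     {"low": 0.5, "marginal": 0.5, "adequate": 0.0},
    "gl_reading":  {"low": 0.9, "marginal": 0.1, "adequate": 0.0},
}
strengths = [firing_strength(rule, m) for rule in RULES_LOW]
print(strengths)                                   # one strength per rule
print("membership in the 'low' output set:", max(strengths))
```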

Figure 3: Fuzzy Rule Matrix (Statewide Math and Statewide Reading axes, each divided into low, marginal and adequate)

Once the macro rules are developed for all of the fuzzy sets, they can be used as guidelines to define a comprehensive set of rules. Since there are three performance sets (low, marginal, and adequate) and four outcome measures (four tests), there are 81 possible combinations of aggregate outcomes. These combinations can be represented in the form of a matrix as shown in Figure 3. Each cell in the matrix corresponds to a rule. The aggregate outcome combination that describes the position of the cell is the condition (if-statement) of the rule. The value of the cell denotes the conclusion of the rule (then-statement), where l means low, m means marginal and a means adequate. For example, the cell in the northwest corner in Figure 3 can be expressed as the following rule: If percent at or above minimum proficiency on math is low AND percent at or above minimum proficiency on reading is low AND percent at or above grade level in math is low AND percent at or above grade level in reading is low THEN the school is identified as 'l' (low).

The 81 rules in Figure 3 are further reduced by combining adjacent rules with the same conclusion into a single rule. The reduction in the number of rules makes the system computationally efficient by eliminating redundancy. In this example, the 81 rules can be reduced down to 39 rules. In order to define the complete matrix it is usually necessary to refine the original judgements by adding rules. For example, the parenthetical remark in the second macro rule suggests further refinement is necessary. These additional rules not only assure completeness but also provide consistency and gradual transition in rule conclusions. After evaluating the results from a fuzzy rule based system, expert judgment may be used to further modify the rules expressed in this matrix.

Using Fuzzy Set Theory to Develop School Performance Rankings

Using the membership functions and rules identified above, output sets are generated using properties of fuzzy set theory, and ultimately schools are ranked on performance (Dubois & Prade, 1988). The aggregation process involves three principal steps. First, the test results and the rules identified in Figure 3 are used to determine each school's membership level in a particular output set. Appendix A provides an example of how the rules associated with one part of the matrix in Figure 3 are used to determine output membership levels for one school in New York City. The result of this process will produce for each school a series of membership levels in each output set. For example, two rules in Appendix A result in a school having membership in the low output set with levels of 0.5 and 0.3. The second step in the process is to combine these membership levels into one membership value in each output set. To assign one membership level to the low output set, we take the maximum of these two levels, 0.5 in this case. The results of this process will be different membership levels in each output set for each school, as represented by Figure 4. School A has a
membership value of 0.5 in low, 0.4 in marginal, and 0 in adequate, compared to values of 0.6, 0.3 and 0.2, respectively for school B. One of the strengths of the fuzzy logic approach is that it allows a school to have membership in more than one output set.

Figure 4: Aggregate Degree of Membership in Performance Sets for Two Sample Schools

The final step in this process is developing one number for each school so they can be ranked. For some applications, the visual image presented by Figure 4 is enough. However, in the case of identifying low performing schools, we need to be able to rank the schools so that the bottom 30 schools, for example, can be identified as low performance. The process of coming up with one number or category for a fuzzy set is called defuzzification. While there are several different defuzzification methods, the most popular is the center of gravity method (Dubois & Prade, 1988). In Figure 4, both schools could be candidates for low-performance identification; however, the center of gravity of School A is lower than that of School B. This relative measure allows us to rank order any list of schools. A brief discussion of this defuzzification method is given in Appendix B.
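Appendix B is likewise not reproduced, so the sketch below uses a common simplification of the center of gravity idea: each output set is represented by an assumed anchor score (0 for low, 0.5 for marginal, 1 for adequate) and a school's overall score is the membership-weighted average of those anchors. Applied to the two schools in Figure 4, it reproduces the ordering described in the text.

```python
# A simplified center-of-gravity ranking for the two schools in Figure 4.
# Each output set is represented by an assumed anchor score; a school's
# overall score is the membership-weighted average of the anchors, and
# lower scores rank as lower performance.

ANCHORS = {"low": 0.0, "marginal": 0.5, "adequate": 1.0}

def center_of_gravity(memberships):
    total = sum(memberships.values())
    return sum(ANCHORS[s] * mu for s, mu in memberships.items()) / total

schools = {
    "School A": {"low": 0.5, "marginal": 0.4, "adequate": 0.0},
    "School B": {"low": 0.6, "marginal": 0.3, "adequate": 0.2},
}
for name in sorted(schools, key=lambda s: center_of_gravity(schools[s])):
    print(name, round(center_of_gravity(schools[name]), 3))
# School A 0.222, School B 0.318: A has the lower center of gravity,
# so it ranks as the lower performing of the two.
```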

Comparison of Fuzzy Model to Other Methods

The fuzzy rules, as represented in Figure 3, model an illustration of an expert's judgement. That judgement shows a slight priority to reading over math. It also first looks at the percent above minimum competency and then, if needed, at the percent above grade level. Finally, it allows an adequate percent above minimum proficiency in reading to compensate for a poor percent above minimum proficiency in math (and vice versa) provided that there is not contradictory information in the grade level tests.

To test the effectiveness of the fuzzy methodology we compare the schools it identifies to those selected by three other methods. Each method is expected to identify 30 schools. The three comparative methods correspond to methods 11, 14 and 17 in
Figure 1. All of these methods use 1996 test results and consider both the percent of students at or above the SRP on statewide exams and the percent at or above grade level on citywide tests. The three methods differ in the way they combine the four outcome measures: the first uses the or rule, the second uses the and rule and the third uses the averaging rule. Table 5 shows the level of agreement between each of these methods and the fuzzy model. Most methods will agree on the worst 15 to 20 schools. It is in fact at the margins that the differences occur, and they can be substantial. One way of judging the effectiveness of each method is to look at these schools at the margin and ascertain which an expert would more likely judge to be a low performance school. When making this comparison, it is easier to use the percentile rank for the school on each test rather than the percent of students achieving a cut off grade. Since we are identifying about 30 schools out of 605, schools below the 5th percentile (5%) on a particular test are certainly candidates for identification if based on that test alone. Schools ranking in a percentile above 20% have a performance that qualifies as adequate and would not be considered candidates, at least based on that particular test. There are a total of 44 schools that were identified by one method but not the fuzzy model or vice versa. Each of these schools can be individually reviewed and a judgement made as to which method is more appropriate. We will look at six schools that are representative of the differences between the fuzzy model and the three comparison methods. Table 6 contains the percentile for each school on each of the four measures as well as an indication of which method identifies that school.

Comparison to the "Or" Rule

The or rule method produces results which disagree the most extensively with the fuzzy model, both in the number of schools (see Table 5) and the differences in specific cases (see Table 6). For example, School A, at the third and fourth percentile on the statewide math and reading tests, is a school that our hypothetical expert would easily identify as a low performance school, particularly since there is no contradictory information in the citywide tests. The fuzzy model replicates this judgement. In order to identify 30 schools out of 605, the or rule can only select a school which has a score below the second percentile on at least one test. Hence it fails to identify School A even though it has low scores on every test.

Table 5: Comparison of Schools Identified by the Fuzzy Method to Three Other Methods

Comparison Method      # of Schools Identified    # of Schools in Common    % of Schools in Common
                                                  with Fuzzy Method         with Fuzzy Model
Or Rule                30                         19                        63.3%
And Rule               30                         23                        76.7%
Averaging Rule         30                         26                        86.7%

When considering School B our expert will first look at the statewide results and observe a very high reading score (76th percentile) and a very low math score (second percentile). The citywide tests can be used to determine which measure better characterizes this school. The high citywide scores (69th and 65th percentile) enable our fuzzy model to conclude that this is not a low performance school. The or rule, however, must identify School B because it falls below the rigid second percentile on one test. Any informed judgement would conclude that School A belongs on a low performance list long before School B.

Comparison to the "And" Rule

The and rule comes closer to reflecting our expert's judgement because it includes all four measures in each conclusion. However, the aggregation remains naive in comparison. For the and rule to identify 30 schools it must require test results below the 15th percentile on all four measures. Consequently it exhibits the opposite of the problem faced by the or rule. It cannot identify a school that has very low scores on only two or three measures even if they include the two tests our expert has judged to be more critical. For example, the fuzzy model identifies School C (see Table 6) because it has extremely poor results (third percentile) on both statewide tests. The single higher percentile (23rd) on the citywide math prevents School C from being identified by the and rule. On the other hand, since we are seeking to identify schools that are among the worst 5%, the fuzzy model will not identify schools that have results well above the fifth percentile on both statewide tests. For example, School D with statewide percentiles of 12 and 14 is not selected by the fuzzy model although it does meet the criteria used by the and rule.

Table 6: Examples of Schools at the Margin of Fuzzy Model or Comparison Model

                  Percentile for Test Results                                               Identified by
School    Statewide    Statewide    Citywide    Citywide    Comparison    Fuzzy       Comparison
          Math         Reading      Math        Reading     Method        Model       Model
A          3%           4%           1%          8%         Or            Yes         No
B          2%          76%          69%         65%         Or            No          Yes
C          3%           3%          23%          7%         And           Yes         No
D         12%          14%           9%          4%         And           No          Yes
E          5%           3%          26%          1%         Average       Yes         No
F          6%          12%           0%          4%         Average       No          Yes

Comparison to the "Average" Rule

The average rule is not influenced by situations in which a single measure just misses a predefined cutoff, as is the case with both the and and or rules. However, averaging still allows extreme values on one measure to unduly affect outcomes. For example, the fuzzy model selects School E (Table 6) based on the low results on the
statewide tests (fifth and third percentile). As indicated by our expert, the single higher percentile (26th) in citywide reading is not relevant given the consistently low statewide scores. This one high score is enough to enable School E to avoid being identified by the average rule. School F provides a further illustration of cases in which averaging allows scores on the secondary citywide tests to overshadow the results of the primary statewide tests. In this case low citywide scores result in identification by the average rule while the fuzzy model avoids identification based on the statewide results.

An attempt could be made to use weighted averages to reflect the relative importance of various measures. For example, if the statewide tests are weighted at twice the value of the citywide tests, we get disagreement with the fuzzy model on six schools rather than the eight from simple averaging (Table 5). A careful selection of weights might even better approximate the judgement the fuzzy model is able to duplicate. However, regardless of the weights selected, a single high score on one measure can prevent a school from being identified even if the overall picture clearly describes low performance. Furthermore, selecting the weights that produce the best results is likely to be more difficult for our expert than differentiating between scenarios as allowed by the fuzzy rule based system.

Overall Comparison

In each of the three comparisons the fuzzy model is able to consistently identify schools that, at least by the criteria ascribed to our expert, should be on the list of poorest performing schools. In fact, this is a guaranteed outcome. If the model results in an incorrect identification according to the experts, they would be expected to reconsider the particular rules that produced the misidentification. The new set of rules would then be consistently applied to all schools. Each of the comparison methods, bound by a rigid cutoff or formula, identifies schools that a reasonable person could conclude should not be on the list and fails to identify schools that should be included.

An ideal identification system might well take into account more than four measures. The difficulties described above for the three methods will only become more pronounced. For example, an and rule based on eight measures might well require all measures to be in the equivalent of, say, the 25th percentile to identify the worst 5%. Any averaging method would have even more opportunities for one or two results to excessively affect the outcome. A fuzzy model of expert judgement can take into account as many measures as necessary to arrive at a fair and consistent evaluation for all schools. Furthermore, just as expert judgement will often combine many forms of input, a fuzzy model can incorporate qualitative measures as well as quantitative data. Any of the possible outcome measures discussed in the early sections of this article could be included in a well-structured model.


Conclusion

The successful implementation of performance accountability systems in public organizations depends crucially on the accurate identification of organizational performance. In the area of education, many states are moving aggressively to implement school accountability programs, particularly for identifying and improving schools with the lowest student performance. The focus of this article is on the choices facing the designers of the identification method for low-performance schools. Four key questions confront state education officials in the identification process:

1. What student outcome measures should be used?
2. Should identification be based on absolute levels of student achievement or a school's relative contribution to student learning?
3. How should student outcome measures be aggregated into school-wide outcome measures?
4. What role should "expert judgment" play in the identification process?

If the goal of the program is to direct resources to schools with students with the lowest performance levels, then outcome measures should focus on the percent of students above minimum competency in basic skills and on dropout rates. Achievement levels do not have to be adjusted for student characteristics, since the principal goal is not to identify the most inefficient schools. Because the consequences of identifying a school as low performance can be serious, it is not surprising that "expert" judgment is often employed in developing the final list of schools. While advice from experts is certainly appropriate, the non-systematic use of expert judgment can open up the process to political manipulation.

To illustrate the sensitivity of the identification process to different decisions, we used New York City elementary schools as an example. We compared the different lists generated when different decisions are made about the type of test considered, the cohort of students examined, and the combination rule used. Only 6% of schools are identified by all 18 methods we evaluated, and 40% of the schools are identified by three or fewer methods. Identification is particularly sensitive to the cohort of students examined and the tests used in the identification method.

The sensitivity of the identification method and the potentially important consequences from identification suggest that experts are likely to intervene in the final selection process. To avoid having the process degenerate into an exercise of political ranking, methods for the systematic use of expert judgment are important. We have introduced in the second part of this article a tool that is explicitly designed to use expert judgment systematically, namely, fuzzy logic. A fuzzy rule-based identification system offers several key advantages over the simple identification methods commonly employed. First, the use of fuzzy sets, which allow partial membership, makes the fuzzy logic approach less sensitive to changes in set boundaries or in the list of outcome measures included. Second, the construction of the fuzzy rule matrix (Figure 3) allows for a more complex and subtle set of rules to guide identification. Finally, the rule matrix provides a forum for experts to change the ranking process in a systematic way: rule modifications proposed to better match program objectives are applied to all schools.

In the growing movement towards performance accountability systems in public schools, fuzzy logic methods offer the potential for more robust and flexible methods for combining a wide variety of information into performance indicators. Most importantly, fuzzy rule-based methods provide an avenue for the inevitable political intervention in the ranking process to be channeled into systematic rules applied to all organizations.

Acknowledgement

This paper has benefited significantly from the comments of Hampton Lankford and John Yinger, and the assistance of Henry Solomon with the data on New York City. However, we are solely responsible for any errors or omissions.

References

Advisory Council to the New York State Board of Regents Subcommittee on Low Performing Schools. (1994). Perform or perish: Recommendations of the Advisory Council to the New York State Board of Regents Subcommittee on Low Performing Schools. Albany, NY: New York State Board of Regents.

Ammar, S., & Wright, R. (1997). Fuzzy logic: A case study in performance measurement. In B. Ayyub & M. Gupta (Eds.), Uncertainty analysis in engineering and sciences: Fuzzy logic, statistics, and neural network approach (pp. 233-245). Boston: Kluwer.

Bezdek, J.C. (1993). Fuzzy models - what are they, and why? IEEE Transactions on Fuzzy Systems, 1, 1-5.

Clotfelter, C., & Ladd, H. (1996). Recognizing and rewarding success in public schools. In H.F. Ladd (Ed.), Holding schools accountable: Performance-based reform in education (pp. 23-63). Washington, DC: The Brookings Institution.

Darling-Hammond, L. (1991). Accountability mechanisms in big city school systems. ERIC/CUE Digest, No. 71.

Dubois, D., & Prade, H. (1988). Possibility theory. New York: Plenum Press.

Durkin, J. (1994). Expert systems: Design and development. New York: Macmillan.

Jerald, C. (2000). The state of the states. Education Week, 19, 62-74.

Jerald, C., & Boser, U. (1999). Taking stock. Education Week, 18, 81-105.

King, R., & Mathers, J. (1997). Improving schools through performance-based accountability and financial rewards. Journal of Education Finance, 23, 147-176.

Meyer, R.H. (1996). Value-added indicators of school performance. In E. Hanushek & D. Jorgenson (Eds.), Improving America's schools: The role of incentives (pp. 197-223). Washington, DC: National Academy Press.

Saade, J.J. (1995). A unifying approach to defuzzification and comparison of outputs of fuzzy controllers. IEEE Transactions on Fuzzy Systems, 4, 227-237.

Sanders, W.L., & Horn, S.P. (1995). Educational assessment reassessed: The usefulness of standardized and alternative measures of student achievement as indicators for the assessment of educational outcomes. Education Policy Analysis Archives, 3.

Schumaker, R.E., & Brookshire, W.K. (1992). Defining quality indicators for secondary schools. Educational Research Quarterly, 15, 5-10.

Treadwell, W. (1995). Fuzzy set theory movement in the social sciences. Public Administration Review, 55, 91-97.

Webster, W., Mendro, R.L., & Almaguer, T.O. (1994). Effectiveness indices: A "value added" approach to measuring school effect. Studies in Educational Evaluation, 20, 113-145.

White, K. (1998, November 25). State adds to list of N.Y.C.'s low-performing schools. Education Week.

The Authors

SALWA H. AMMAR is a Professor of Business Administration at Le Moyne College. Her research has involved the development of performance evaluation systems for both the private and public sector.

ROBERT BIFULCO, JR. is a Research Associate in the Center for Policy Research at the Maxwell School, Syracuse University, and is completing a Ph.D. in Public Administration. During the years 1994 to 1996, he worked with the New York State Education Department, where he helped draft revisions to the regulations that govern the New York State Education Department's Registration Review process. He is involved in research on school efficiency, comprehensive school reform, and school accountability.

WILLIAM D. DUNCOMBE is Associate Professor of Public Administration and Senior Research Associate in the Center for Policy Research at Syracuse University. His research specialties include state and local government education finance and the evaluation of demand and costs of services.

RONALD H. WRIGHT is a Professor of Business Administration at Le Moyne College. For over twenty years he has worked as a research and development consultant to major corporations.


Appendix A: Rule Application

To demonstrate the fuzzy rule application, consider a school with the following raw scores, representing the percent of students performing above proficiency on the statewide tests and the percent of students performing above grade level on the city tests. The relative position of this school to others is represented for each measure in the percentile column.

Table A-1: Test Scores and Membership Levels for a Particular School

                     School A                       Membership
                      Score     Percentile    low    marginal    adequate
Statewide Math          67           2         .7       .3          0
Statewide Reading       29           3         .5       .5          0
City Math               47          22          0       .3         .7
City Reading            27          19          0       .6         .4

The table above also contains the corresponding membership for each of the scores in each of the performance fuzzy sets given in Figure 2. To determine an aggregate performance set for this school, we need to determine the set of applicable rules. For the purpose of this illustration, we will consider just the northwest sub-matrix of the rule matrix in Figure 3. These nine cells represent combinations of low performance on both statewide tests with all three performance sets on the city tests. For efficiency and consistency, the cells are combined to produce the three logical rules given below.

Rule 1. IF Statewide Math is low AND Statewide Reading is low AND City Math is ANY AND City Reading is NOT adequate THEN the school is identified as 'l' (low)

Rule 2. IF Statewide Math is low AND Statewide Reading is low AND City Math is NOT adequate AND City Reading is ANY THEN the school is identified as 'l' (low)

Rule 3. IF Statewide Math is low AND Statewide Reading is low AND City Math is adequate AND City Reading is adequate THEN the school is identified as 'm' (marginal)

The membership function for a particular school is then applied to the components of each rule.


As represented below, School A has membership levels in the low set for statewide math and reading of 0.7 and 0.5. Note that ANY means that the score is not relevant to this rule; hence city math is non-binding and is assigned the highest membership of 1. For city reading the NOT function is applied, which is defined as the complement, or 1 minus the membership in the adequate set (1 - 0.4 = 0.6).

                       Statewide Math   Statewide Reading   City Math      City Reading   Conclusion
Rule 1                      low               low              ANY         NOT adequate       'l'
School A membership         0.7               0.5               1              0.6            0.5
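In code form, the evaluation of Rule 1 for School A can be sketched as follows (a minimal illustration of our own; the variable names are ours, not the article's):

```python
# Rule 1 applied to School A (illustrative sketch; names are ours).
statewide_math_low        = 0.7
statewide_reading_low     = 0.5
city_math_any             = 1.0          # ANY: the measure is non-binding
city_reading_not_adequate = 1.0 - 0.4    # NOT adequate = complement of 'adequate'

# The rule's conclusion is the intersection (minimum) of its antecedents.
rule1_low = min(statewide_math_low, statewide_reading_low,
                city_math_any, city_reading_not_adequate)
print(rule1_low)   # 0.5 = School A's membership in 'l' (low) according to Rule 1
```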

The next step in the process is to combine these input membership levels into one output membership level. In fuzzy set theory this is defined by the intersection of the inputs, i.e., the minimum membership value, which in this case is 0.5. Similarly for Rule 2, the membership levels for each test can be represented in the following table, and they are combined by taking the minimum to produce a value of 0.3 for the 'l' (low) output set.

                       Statewide Math   Statewide Reading   City Math      City Reading   Conclusion
Rule 2                      low               low           NOT adequate       ANY            'l'
School A membership         0.7               0.5               0.3             1             0.3

Rules 1 and 2 both produce membership levels in the low output set. To assign one membership value to the low set, we must combine these results. In fuzzy set theory this is accomplished by using the union of the outputs, i.e., the maximum membership value. Thus, School A is identified as 'l' (low) with a level of 0.5, the maximum of the two rules with the same conclusion. School A is also identified using Rule 3 as 'm' (marginal) with a level of 0.4, the minimum of the membership levels for the four tests.

                       Statewide Math   Statewide Reading   City Math      City Reading   Conclusion
Rule 3                      low               low             adequate       adequate        'm'
School A membership         0.7               0.5               0.7            0.4           0.4

After the complete rule application, each school will have a membership in each of the conclusion categories. For this rule subset, School A is identified as:

                         low    marginal    adequate
with a membership of     0.5      0.4         0.0

A similar process is carried out for all of the rules implied by the matrix in Figure 3, with the final product being a membership level for school A in each of the three output sets.
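The complete calculation for the three rules of this sub-matrix can be sketched as follows (a minimal illustration in our own notation; the membership values come from Table A-1 and the rules from the list above):

```python
# Sketch (ours) of the Appendix A rule application for School A.

# Membership levels from Table A-1: test -> {performance set: membership}
school_a = {
    "statewide_math":    {"low": 0.7, "marginal": 0.3, "adequate": 0.0},
    "statewide_reading": {"low": 0.5, "marginal": 0.5, "adequate": 0.0},
    "city_math":         {"low": 0.0, "marginal": 0.3, "adequate": 0.7},
    "city_reading":      {"low": 0.0, "marginal": 0.6, "adequate": 0.4},
}

def membership(school, test, condition):
    """Membership of one test score in the set named by a rule condition."""
    if condition == "ANY":                # non-binding: highest membership
        return 1.0
    if condition.startswith("NOT "):      # complement of the named set
        return 1.0 - school[test][condition[4:]]
    return school[test][condition]

# The three rules of the northwest sub-matrix: (conditions, conclusion)
rules = [
    ({"statewide_math": "low", "statewide_reading": "low",
      "city_math": "ANY", "city_reading": "NOT adequate"}, "low"),
    ({"statewide_math": "low", "statewide_reading": "low",
      "city_math": "NOT adequate", "city_reading": "ANY"}, "low"),
    ({"statewide_math": "low", "statewide_reading": "low",
      "city_math": "adequate", "city_reading": "adequate"}, "marginal"),
]

conclusions = {"low": 0.0, "marginal": 0.0, "adequate": 0.0}
for conditions, conclusion in rules:
    # Intersection of the rule's antecedents: the minimum membership.
    strength = min(membership(school_a, t, c) for t, c in conditions.items())
    # Union of rules with the same conclusion: the maximum membership.
    conclusions[conclusion] = max(conclusions[conclusion], strength)

print(conclusions)   # {'low': 0.5, 'marginal': 0.4, 'adequate': 0.0}
```

Extending the rules list to cover the full matrix in Figure 3 would produce the final membership levels used for ranking.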


Appendix B: Ranking

Turning the membership levels in each of the output sets into one number that can be used for ranking schools requires some method for combining the information from the three sets, commonly called a defuzzification method. The topic of defuzzification and ranking fuzzy conclusions has received much attention in the literature on fuzzy logic (Saade, 1995). As in many heuristic approaches, appropriate methods can be chosen for specific applications. For the purpose of this illustration, we will describe a common method of defuzzification based on calculating the "center of gravity."

As with the inputs, there is a need to first construct fuzzy output sets for each of the output categories. The definition of the fuzzy sets reflects an expert judgement on the scale of performance. In Figure B-1 below, we have constructed three fuzzy output sets, which range from 0 to 20 in value. The low set goes from 0 to 10, the marginal set from 5 to 15, and the adequate set from 10 to 20. Using the membership values for School A from Appendix A, the membership function is represented in Figure B-1 by the dark line. The ranking of schools is determined by calculating the center of gravity of their fuzzy conclusions:

\[ \text{Center of Gravity} = \frac{\int x \, u(x) \, dx}{\int u(x) \, dx}, \]

where x is the output level and u(x) is the membership function. This approach finds the center of gravity of the fuzzy output sets for each school. For School A, the center of gravity is 6.63.

Figure B-1: Fuzzy Output Sets for School A
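As a numerical check, the center-of-gravity calculation can be sketched as follows. The shapes of the three output sets below are our assumptions based on the description of Figure B-1 (a low set falling from 1 at 0 to 0 at 10, a marginal triangle peaking at 10, and an adequate set rising from 0 at 10 to 1 at 20); with these assumed shapes the approximation reproduces the 6.63 reported for School A.

```python
# Sketch (ours) of center-of-gravity defuzzification for School A.
# The output set shapes below are assumptions, not values reported in the article.

def low(x):       return (10 - x) / 10 if x <= 10 else 0.0
def marginal(x):  return max(0.0, 1 - abs(x - 10) / 5)
def adequate(x):  return (x - 10) / 10 if x >= 10 else 0.0

def clipped_union(x, memberships):
    """Membership function after clipping each output set at the school's
    membership level and taking the union (maximum)."""
    return max(min(low(x),      memberships["low"]),
               min(marginal(x), memberships["marginal"]),
               min(adequate(x), memberships["adequate"]))

def center_of_gravity(memberships, steps=20000):
    # Numerical approximation of integral(x * u(x)) / integral(u(x)) on [0, 20].
    dx = 20.0 / steps
    num = den = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx
        u = clipped_union(x, memberships)
        num += x * u * dx
        den += u * dx
    return num / den

school_a = {"low": 0.5, "marginal": 0.4, "adequate": 0.0}
print(round(center_of_gravity(school_a), 2))   # approximately 6.63 under these assumptions
```

Applying the same function to the second school described below yields a larger value, consistent with the ranking reported in the text, although the exact figures depend on the assumed set shapes.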

For comparison purposes, we have included a second school. The output membership levels for School B are listed below and are illustrated in Figure B-2. School B is identified as:

                         low    marginal    adequate
with a membership of     0.6      0.3         0.2


The center of gravity is calculated as 7.60. Using the center of gravity approach, school A ranks below school B.

Figure B-2: Fuzzy Output Sets for School B