EVALUATION OF ACCOUNTING STUDENTS

Journal of Accounting Education, Vol. 12, No. 3, pp. 193-204, 1994
Copyright © 1994 Elsevier Science Ltd. Printed in the USA. All rights reserved.
0748-5751/94 $6.00 + .00

Gerald P. Weinstein
JOHN CARROLL UNIVERSITY

Abstract: This paper points out two disturbing weaknesses in the commonly used method for assigning final grades to students. It demonstrates an alternative method for ranking students, based on standardized scores, that can produce individual rankings different from those based on weighted total points. Further, an argument is advanced that professors should allow their professional, subjective judgments to have a major influence on the grades they give.

PROBLEM STATEMENT

Among the most important tasks facing accounting instructors is the final evaluation of a student ("the grade"). Academic freedom and individual differences virtually assure that there are as many ways of determining grades as there are instructors. Such a proliferation of methods can be viewed as both a blessing and a curse. The freedom to develop personal standards for assigning grades makes what could otherwise be a nerve-wracking experience only moderately stressful.

From the perspective of the instructor, the manner of grade determination is usually a matter of individual taste. Most instructors state in their syllabi what items make up the grade and the relative weights of those items. Performance can be measured by various means, although most probably use common types of written assignments (e.g., tests, quizzes, homework) for a major portion of the grade assigned. Many accounting instructors go beyond the writings of students and give (or deduct) credit for oral types of performance, such as class participation and presentations. Frequently, evaluation of oral components is given relatively little weight in comparison to that of written components. In fact, the weights assigned to the various grade components represent one of the grade determination differences among instructors. For instance, even if all use written exams, different relative weights may be assigned.

Regardless of the items (or their relative weights) included in the list of constructs employed in the evaluation of the student, the technique employed at the end of the term usually proceeds along the following lines: points are totaled from the various sources, the percentage of total points earned is determined, and the students' grades are assigned by fitting these percentages to a predetermined scale (e.g., 92-100 = A, 84-91 = B, etc.).

The purpose of this paper is to point out two disturbing weaknesses in


this evaluation method. First, the use of total points in determining the final grade ignores the variances of each component of the total points scored. As will be shown, this can lead to a situation in which the rankings of students may differ depending on whether total points or standardized performance measures are employed in the evaluation. Second, the grade assignment scale (such as, 92 and above is an A) varies with instructors, and therefore can lead to different evaluations across teachers. In fact, grading scales are inherently subjective and therefore of questionable validity.

THE PURPOSES OF GRADES

The purposes of grading are somewhat different for different teachers at different grade levels. Curwin, Fuhrmann, and DeMarte (1988) have written an insightful monograph on grading, which is primarily aimed at teachers below the college level, but their comments and suggestions are relevant for this discussion. Among the purposes of grades, they list the following (p. 153):

(1) reporting information to future teachers;
(2) reporting information to parents;
(3) supplying screening information for jobs, colleges, and other future schooling;
(4) motivating students to work harder and better;
(5) evaluating students' learning for improvement.

Interestingly, the first three items in this list have in common a use of grades that is independent of the student. In each of these three cases, information is being passed up the chain of command to third parties. These purposes of grades are primarily external, being directed toward others. For accounting professors, the third item in this triad is probably the most significant. The recruiting process clearly reveals that prospective employers use grades as one of their selection criteria. Moreover, scholarship awards may be based in large part or entirely on grade attainment, and students aspiring to graduate studies will likewise discover the importance of attaining high grades.
The last two purposes of grades cited above direct information toward the student. Motivational grading is evidenced by grades on individual examinations or midterm grade reports; motivation occurs when a student takes action as a result of receiving the grade. The final grade provides the professor's summary evaluation and can be seen as an example of the final purpose of grading. The final grade can be interpreted as the teacher's recommendation to the student concerning the competency level the student has achieved. In beginning classes, it can determine whether a student advances to upper division work. In upper level classes, it can be interpreted as a measure of the student's grasp of complex technical material and/or the level of skill attained in preparation to enter the profession.

Competition for grades has intensified as students have become more job-centered in their collegiate pursuits. Students know that some prospective employers will not interview candidates with a grade point average below a certain level. Keenly aware of this, they have a special incentive to attain a high level of performance. Under these conditions, the value of grades is magnified. Some have argued that the use of grades should be de-emphasized, on the grounds that grades provide extrinsic, artificial, and hence undesirable stimuli and rewards (Ebel & Frisbie, 1986). Nonetheless, grades are firmly entrenched as educational-evaluation measures. They are clearly viewed as important by a majority of students, faculty, parents, and employers.

THE PROBLEMS WITH GRADES

It is generally conceded that there are major shortcomings associated with grades. These shortcomings center on two salient attributes of grades: validity and reliability. Validity is the extent to which measurements are useful in making decisions relevant to a given purpose (Sax, 1989). Validity is threatened when teacher biases enter; systematic or constant errors affect validity. For example, some instructors use the grade as a behavioral reward or punishment unrelated to the student's attainment of instructional objectives. This error can be observed in instances in which teachers are influenced by their prior experiences, positive or negative, with students. This is widely known as the "halo" effect.

Reliability describes the extent to which measurements provide consistent, unambiguous information. In contrast to validity, the type of error that affects reliability is random. Measurements are said to be reliable if they reflect "true" rather than chance aspects of the traits being measured.
Chance factors include conditions within the examinee (fatigue, boredom, lack of motivation, carelessness), characteristics of the test (ambiguous items, trick questions, poorly worded directions), and conditions of scoring (carelessness, disregard or lack of clear standards for scoring, and counting and computational errors) (Sax, 1989). As an example, anyone can construct a test which all students fail. The scores would be systematically lower and hence the test would not be valid, but it might be reliable if it nonetheless differentiated relative degrees of understanding. This reveals an interesting relationship between validity and reliability: a valid test is always reliable, but a reliable test is not necessarily valid. A reliable test can consistently measure the wrong thing and hence be invalid (Gay, 1981).

Raths, Wojtaszek-Healy, and Della-Piana (1987) found that "problems arising from giving grades are rooted . . . in communication failures between instructors and students" (p. 133). They recommend that: (1) students should be taught the standards teachers use in assigning grades; (2) instructors should reserve the right to use subjective evaluation techniques; and (3) grades should be viewed with some skepticism, particularly given the shortcomings associated with them.

Most accounting instructors reveal the grade components in their course syllabus and attempt to portray the process as being as objective as possible. A common means of doing this is the use of an ordinal numbering system in which points are assigned to the performance level demonstrated on specific assignments. Typically, these points are weighted, totaled, and a final grade determined. This method is not without its weaknesses, however. Among the disadvantages of point systems which Madgic (1988) cites are the following:

1. Misplaced emphasis on the objectives of learning. Students know the importance of grades, especially to parents and potential future employers. Armed with this insight, they seek to accomplish the desired end, but the means they employ may be at odds with the true goal: the acquisition of knowledge. The game becomes one of discriminating what will be tested from what will not, and then forgetting information which will not be (re)tested in the future. This evaluation method conveys the message that learning is equivalent to the accumulation of points, rather than the pursuit of skill and knowledge. Of course, in a sense, this is a disadvantage of any evaluation system.

2. Illusion of objectivity. Grading is actually a much more subjective process than point totals imply. This is so because the school generally leaves it to the teacher to decide what items to test and how the grade will be determined and assigned. Where one accounting teacher might test the sum-of-the-years'-digits method of depreciating an asset, another may not. Some teachers will give tests which emphasize understanding concepts, while others will focus on techniques. The same 10-point problem may be assigned different points if graded by different instructors. If the variances of the two distributions differ, this same problem will be given a different weight within the environment of each examination. In this manner, individual biases prevent the method from being truly objective. This lack of objectivity in grading is discussed further in a later section of this paper.

3. Reduction of teacher judgment and responsibility. If one accepts the notion that "objective" grading is not truly an objective process, it follows that the teacher's interpretation of the scores for the purpose of assigning the final grade is subject to his or her judgments concerning the importance of each component. The assigning of grades must allow a teacher to make professional judgments.
In the process of trying to teach accounting students the importance of developing a basis for exercising professional judgment, care should be taken not to shirk our duty to do the same. For example, if one believes that a test was an imperfect measuring device, one can compensate for the defect when, using professional judgment, one deems it necessary. A point system can minimize the importance of this professional judgment by implying that grades can be derived directly from the numbers.

4. Learning versus fine discriminations. The point system conveys the message that learning can be precisely discriminated. A student can get the impression that a single point means the difference between a B and a C. While it is easy to distinguish the letter grade given to someone with a 63% score from someone with a 78% score, it is not so easy to explain why different grades should be assigned to two people when one has earned 79% and one has earned 80%. It is this dilemma which best illustrates the need to allow the subjective judgment of the professor to intercede.

5. Cumulative point totals and cumulative errors. If point totals are merely accumulated, the teacher is unable, at the time of grading, to analyze all the factors that went into each point sum leading to the final total. Measurement error exists in each component of the grade, and the final total represents the sum of all these errors. Weighting points in order to count certain assignments more heavily compounds the error, since the actual weight depends on the standard deviation of the grade components, not on the mean or the number of times an individual score is weighted in determining the composite grade. A demonstration of this issue is presented later.

6. Fallacies of standard percentage categories. A glaring deficiency of a standard percentage approach (92-100 = A, etc.) is the assumption that a certain percentage represents a valid rating of a performance level. Scales such as these are established and relied upon based, for the most part, on tradition and social mores. It rests with the teacher to establish the criteria which distinguish one grade from another (the cut score).
Most instructors probably develop their methods of determining grades from experience: they grade the way they were graded in college and adapt that method based on their present experiences. Grading scales such as these are inherently subjective. The consequences of the decision to set a cut score must be weighed in setting it. Even the most conscientious of instructors may misgrade students, if for no other reason than that students can demonstrate performance at a level which is above or below their true abilities. This is one area where statistics will not help. College students do not come from the general population; consequently, the assumption that a section should have an equal number of As and Fs is irresponsible. In addition, drop policies may ensure that the lower tail leaves the distribution. Further, the criteria cannot be established without knowing what the criteria imply, and there is no good answer to that question. Grades may mean different things to different students. If the desire is to assure minimal proficiency, the instructor should establish a low cut score. If a mastery level is required, a high cut score should be used. Knowing our clientele helps in making this decision. Some students will become Big Six partners while others do not aspire to such heights. In the end, we must recognize that all attempts to establish point criteria for grades are arbitrary.

SUBJECTIVITY OF GRADING

The notion that all grading is inherently subjective may seem foreign to many instructors. Graders usually rely heavily on points accumulated on various testing instruments and assign grades using the objectively determined totals. While a point system may create the appearance of objectivity, it is actually a biased method if it fails to account for the variances within and among the different assignments. The belief that grading is an objective process cannot be supported prima facie. Aside from the misconception that points provide objectivity, other indicators of subjectivity are set forth below.

First, different instructors can use different instruments to evaluate students. While almost all use written examinations to provide the largest component of the grade, different examinations (and different types of examinations) are employed across instructors and/or across institutions. Few if any instructors have had training in exam construction; most of us have probably given poor tests, testifying to the difficulty of the task. Objective-type items can be combined with problem/essay items in various ratios. Furthermore, as discussed above, other components besides written examinations are used in various combinations to support the derivation of the composite course grade. Testing modes can affect the performance of the student. In the field of accounting, two studies provide empirical evidence to support this contention. Edmonds (1984) found that including review questions on examinations resulted in improved student performance. Results of a limited study by Frakes and Lathan (1983) indicate that class rank (and hence the course grade) can differ based on the types of questions employed.

Second, instructors score items differently. This is particularly problematic in scoring essay and problem assignments on tests.
Paraphrasing an old saw, if you give 10 accounting professors a test problem to score, you will receive in return 11 different scores, and each will be valid. Much as we each have differing abilities to teach, so do we each have differing abilities to evaluate. Expectations of what constitutes good performance may differ based on personal biases and experiences. The problem of consistency in grading essay tests has been rigorously investigated in the education field. As a result of a series of experiments on scoring essays in the early 1900s, Starch and Elliott (1913) concluded that "[t]he variability of marks is not a function of the subject but a function of the examiner and the method of examination" (p. 680; emphasis added). This finding has been repeatedly reaffirmed. While no empirical evidence of this in accounting has been forthcoming, assuming it is true appears reasonable until evidence to the contrary is available.

Third, the relative weights assigned to various components of the final grade (e.g., tests, quizzes, homework) will generally differ among instructors. Further, the weights may differ with respect to the elements within a particular grade component (e.g., what proportion of a given test is problem, essay, or objective). Accordingly, a student may perform better under one instructor's criteria than another's.

Fourth, the student may respond differently to different instructional methods. Some prefer a lecture mode of delivery while others tend toward a more interactive classroom experience. It is well known that most terminally degreed college instructors, while quite learned in their fields, have little if any formal pedagogical training; training usually occurs on the job. The effect this may have on a student's learning and subsequent accomplishments has not been empirically investigated. The response of the student to these different techniques may affect performance and, therefore, the grade.

Fifth, the instructor may have a personal bias toward a student. This may result from a past experience with the student, comments from colleagues, an investigation of the student's personal file, or even inherent prejudices against an individual's gender, race, creed, or color. Try as we might to ignore this issue, it can occur, even if on a subconscious level. This problem becomes more significant as more subjectivity is allowed to enter the evaluation process. This issue is discussed further in the Conclusions section.
Sixth, pressures from peers or the institution itself can influence the grade. Some schools have rigid policies concerning the percentage of individuals who can receive certain grades. Instructors may also be influenced by the grading practices of colleagues so as not to be viewed as being overly easy or harsh.

AN EXAMPLE

An example will demonstrate a typical problem concerning the potential lack of reliability of grades. This example shows that using the "typical" grading model can lead to a different ranking than when standardized performance scores are used. While this example assumes that the items used in the grade determination are two in-class exams and a final exam, it does not depend upon the items used for evaluation. In other words, its findings are relevant regardless of the criteria used by the instructor to determine the final grade. For example, the other criteria could include homework and/or a research paper.

Panel A of Table 1 presents hypothetical data concerning the performance of a class of 33 students on the three class assignments. Panel B gives the means and standard deviations of each assignment for the class as a whole. The scores have been manipulated for demonstration purposes. The statistics for the three inputs show them to have been inconsistent measurements: the second test had the lowest mean and highest standard deviation, while the third had the highest mean and lowest standard deviation. Each test has been weighted equally, except for the third exam, which is counted twice as much as the other examinations.

The data have been used to rank each student by both weighted point totals and weighted standardized scores (z scores). z scores are calculated by taking the individual score, subtracting the mean of all scores on that assignment, and dividing the difference by the standard deviation of that assignment. z scores, with a mean of zero and a standard deviation of 1.0, give equal weight to each component of the composite grade. The conversion of raw scores to z scores does not affect the shape of the distribution; if it is highly skewed before conversion, it will be just as highly skewed after (Sax, 1989). Weights can be assigned by multiplying each z score by the relative weight factor.

In this simulation, the overall ranking for the entire class is not greatly affected, as demonstrated by a high (0.91) Spearman's rho correlation coefficient. However, several individual student rankings are significantly affected. Only nine students (27%) do not have their overall rank changed. Students number 1, 5, 7, 17, 24, and 29 each have a z-score-ranked position that is at least five notches different from the student's ranking based on weighted total points.
Most notable is student number 24, who performed very poorly on the doubly weighted exam after doing exceptionally well on the other final grade components. His performance is ranked 8th best by total weighted points, but only 21st by the ranking of weighted z scores. Clearly, if grade rankings form an important part of an instructor's analysis, using standardized scores can lead to markedly different results.

A SPREADSHEET SOLUTION

Unfortunately, ignoring the variance of individual components is probably the rule rather than the exception when final grades are calculated. Computerized grading programs seem to offer no relief: the ones reviewed by the author create the grade based only on the total points accumulated. These computerized grading systems should go beyond simple point accumulation. Calculating variances for each grade component, factoring them into the grade calculation, and producing a class ranking are easy tasks for computers. Fortunately, an electronic spreadsheet which serves the purpose can be

Table 1. A hypothetical grade distribution

Panel A

                   Tests         Wtd. total    Wtd.       Wtd. rankings
ID number      1     2     3       points     z score    Pts.  z score   Dif.
     1        67    52    95        309        0.225      20      13       7
     2        93    69    98        358        1.249       4       3       1
     3        87    86    92        357        0.911       5       5       0
     4        60    38    79        256       -1.517      33      33       0
     5        55    48    94        291       -0.150      25      18       7
     6        79    67    94        334        0.614      11       9       2
     7        73    69    95        332        0.623      13       7       6
     8        91    64    96        347        0.956       7       4       3
     9        73    51    84        292       -0.628      24      28      -4
    10        70    72    83        308       -0.413      21      23      -2
    11        85    65    84        318       -0.170      17      19      -2
    12        86    87    96        365        1.254       2       2       0
    13        81    68    84        317       -0.192      18      20      -2
    14        62    51    84        281       -0.830      30      30       0
    15        70    54    80        284       -0.976      28      31      -3
    16        96    90    96        376        1.486       1       1       0
    17        40    55    92        279       -0.479      32      24       8
    18        90    73    87        337        0.316       9      11      -2
    19        75    73    88        324        0.126      14      14       0
    20        83    65    94        336        0.653      10       6       4
    21        87    61    86        320       -0.029      16      15       1
    22        92    93    85        355        0.520       6      10      -4
    23        70    47    86        289       -0.579      26      26       0
    24        97    91    75        338       -0.282       8      21     -13
    25        69    70    88        315       -0.035      19      16       3
    26        53    49    89        280       -0.600      31      27       4
    27        78    56    85        304       -0.365      22      22       0
    28        61    66    78        283       -1.110      29      32      -3
    29        93    98    85        361        0.623       3       8      -5
    30        64    59    85        293       -0.572      23      25      -2
    31        66    55    83        287       -0.775      27      29      -2
    32        77    78    84        323       -0.096      15      17      -2
    33        83    76    87        333        0.236      12      12       0

Panel B

                           Tests
                       1        2        3
Mean                 75.94    66.55    87.61
Standard deviation   13.60    14.74     5.82
Range                40-97    38-98    75-98

Note. The weighted points and z scores are calculated by assigning a weight of 1 to each of the first two tests and a weight of 2 to the third test.
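The ranking comparison in Table 1 can be reproduced with a short script. The Python sketch below uses the Panel A scores and weights of 1, 1, and 2, and divides each weighted z-score sum by the total weight, as described in the spreadsheet discussion; population standard deviations are assumed:

```python
# Reproducing Table 1's weighted point and z-score rankings (Panel A data).
from statistics import mean, pstdev

test1 = [67, 93, 87, 60, 55, 79, 73, 91, 73, 70, 85, 86, 81, 62, 70, 96, 40,
         90, 75, 83, 87, 92, 70, 97, 69, 53, 78, 61, 93, 64, 66, 77, 83]
test2 = [52, 69, 86, 38, 48, 67, 69, 64, 51, 72, 65, 87, 68, 51, 54, 90, 55,
         73, 73, 65, 61, 93, 47, 91, 70, 49, 56, 66, 98, 59, 55, 78, 76]
test3 = [95, 98, 92, 79, 94, 94, 95, 96, 84, 83, 84, 96, 84, 84, 80, 96, 92,
         87, 88, 94, 86, 85, 86, 75, 88, 89, 85, 78, 85, 85, 83, 84, 87]
weights = [1, 1, 2]
n = len(test1)

def z_scores(scores):
    # z = (score - mean) / standard deviation, per assignment
    m, s = mean(scores), pstdev(scores)
    return [(x - m) / s for x in scores]

wtd_points = [sum(w * t[i] for w, t in zip(weights, (test1, test2, test3)))
              for i in range(n)]
zs = [z_scores(t) for t in (test1, test2, test3)]
wtd_z = [sum(w * z[i] for w, z in zip(weights, zs)) / sum(weights)
         for i in range(n)]

def ranks(values):
    # Rank 1 = best (highest value)
    order = sorted(range(n), key=lambda i: -values[i])
    r = [0] * n
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

pts_rank, z_rank = ranks(wtd_points), ranks(wtd_z)
print(pts_rank[23], z_rank[23])  # student 24: 8th by points, 21st by z score
```

The same loop generalizes to any number of grade components; only the score lists and the weight vector change.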


created with minimum difficulty. Envision a typical gradebook with names on the left and columns going across the page representing the teacher-assigned scores on tests and other assignments. (The method described does not require conversion to percentages of the maximum points available, which most instructors probably perform when calculating the grade.) The appropriate z score for each individual test score can be reproduced in a parallel array using a single command. Assume the test scores of one exam are contained in column F, rows 10-30. The z score for the student whose score is in cell F20 would be written: (F20 - @AVG(F10..F30)) / @STD(F10..F30). This command can then be replicated for each student score and placed into a grid of z scores analogous to the number scores. Weights would be assigned to each z score in the same fashion as one would apply weights to any of the tests/assignments. The weighting scheme is left to the discretion of the instructor (clearly stated in the syllabus, of course). These weighted z scores can then be averaged by dividing by the sum of the weights.

CONCLUSIONS

The time has come for accounting professors to use more sophisticated evaluation techniques. Because of the importance of grades to the student and other constituencies, we owe it to the profession to utilize the best techniques at our disposal. To that end, two specific recommendations are advanced.

Recommendation 1

Accounting instructors should take more responsibility for grades. They should accept the fact that the grading process is inherently subjective. Accordingly, inclusion of the instructor's subjective evaluation as a grade component is entirely appropriate. The purpose of this element of the grade is to compensate for the imperfections of written tests and for inaccurate scoring of the written grade components by the instructor. The relative weight of this item can be decided upon by the professor, but in most cases it should probably be no more than 15% of the final grade.

Under any grading system, professors are responsible for ensuring that the system is unbiased. Students have a right to be evaluated without personal prejudices entering into the process, and they should be treated equally regardless of how poorly or well they perform. Whether the grading system is objective or subjective, it must still be fair. If it is perceived to be fair, students will be less likely to complain, and any inherent weaknesses in the method may be overcome.
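As a rough illustration of this recommendation, the subjective evaluation can be treated as one more weighted component of the composite grade, capped here at 15%. The component names, weights, and scores below are hypothetical; a minimal sketch:

```python
# Composite grade with an instructor-judgment component capped at 15%.
# All component names, weights, and scores are hypothetical illustrations.
components = {
    "exam1":    (0.25, 82),   # (weight, score out of 100)
    "exam2":    (0.25, 74),
    "final":    (0.35, 88),
    "judgment": (0.15, 90),   # subjective component, at most 15% of the grade
}

# Weights should sum to 1.0
assert abs(sum(w for w, _ in components.values()) - 1.0) < 1e-9

grade = sum(w * s for w, s in components.values())
print(round(grade, 1))  # -> 83.3
```

Keeping the subjective component explicit in the weighting scheme, rather than adjusting scores informally, makes the judgment visible and defensible if a grade is challenged.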


A more radical grading approach would be to state that the final grade is given by the instructor based entirely on subjective judgment. Graded written assignments could still provide evidence to be used in setting the grade, but the instructor would have the freedom to weigh each component as he or she sees fit. A final grade markedly different from a student's written work would have to be supported by other data, but such a grade could be assigned in the presence of persuasive evidence.

Naturally, operationalizing this technique would not be easy. Using a subjective element to determine all or a portion of the grade does leave professors exposed to complaints that a student was unfairly evaluated. In a grading system which purports to be objective, students may not believe they have a chance to affect the grade they receive. When a subjective element is employed, one who feels he was wrongly graded may try to challenge the evaluation administratively (or judicially) by alleging personal bias. The professor must therefore be prepared to defend a grade which differs from that determined by the student's performance on written work. Nonetheless, if professional judgment dictates a grade which is not directly derived "by the numbers," the professor should have the freedom to employ that judgment and issue the final grade accordingly.

Recommendation 2

Evaluations of point totals for student grading purposes should take into account the variances of the individual grade components. The importance of using variances in student evaluations was demonstrated previously. Failure to employ variances in the final grade determination can lead to potentially anomalous results. If nothing else, instructors should employ z-score analysis as a source of additional evidence in determining the grade the student deserves.
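One way to operationalize this recommendation is to reduce the z-score analysis to a simple screen: flag any student whose standardized rank departs sharply from the point-based rank, and review those cases with professional judgment before issuing grades. A minimal Python sketch; the student names, ranks, and five-place threshold are all hypothetical:

```python
# Flag students whose z-score rank differs from their point rank by a
# chosen threshold, as candidates for a closer, judgment-based review.
# Names, ranks, and the threshold are hypothetical illustrations.
point_rank = {"Adams": 8, "Baker": 3, "Chen": 21, "Diaz": 14}
z_rank     = {"Adams": 21, "Baker": 4, "Chen": 20, "Diaz": 9}

THRESHOLD = 5  # minimum rank difference worth a second look

flagged = {s: (point_rank[s], z_rank[s])
           for s in point_rank
           if abs(point_rank[s] - z_rank[s]) >= THRESHOLD}

for student, (p, z) in sorted(flagged.items()):
    print(f"{student}: point rank {p}, z-score rank {z} -- review before grading")
```

Students such as number 24 in Table 1 (8th by points, 21st by z score) would be caught by exactly this kind of screen.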

REFERENCES

Curwin, R. L., Fuhrmann, B. S., & DeMarte, P. (1988). Making evaluation meaningful. New York: Irvington Publishers.

Ebel, R. L., & Frisbie, D. A. (1986). Essentials of educational measurement. Englewood Cliffs, NJ: Prentice-Hall.

Edmonds, T. P. (1984, October). On the benefits of cumulative exams: An experimental study. Accounting Review, 59(4), 660-668.

Frakes, A. H., & Lathan, W. C. (1983, Spring). A comparison of multiple-choice and problem examinations in introductory financial accounting. Journal of Accounting Education, 81-89.

Gay, L. R. (1981). Educational research: Competencies for analysis and application (2nd ed.). Columbus, OH: Charles E. Merrill.

Madgic, R. F. (1988, April). The point system of grading: A critical appraisal. NASSP Bulletin, 29-34.

Raths, J., Wojtaszek-Healy, M., & Della-Piana, C. K. (1987, January/February). Grading problems: A matter of communication. Journal of Educational Research, 133-136.

Sax, G. (1989). Principles of educational and psychological measurement and evaluation (3rd ed.). Belmont, CA: Wadsworth.

Starch, D., & Elliott, E. C. (1913). Reliability of grading work in history. School Review, 21, 676-681.