PASS: An automated system for program assessment


Computers & Education, Vol. 29, No. 4, pp. 195-206, 1997
PII: S0360-1315(97)00021-3
© 1997 Elsevier Science Ltd. All rights reserved. Printed in Great Britain. 0360-1315/97 $17.00 + 0.00

GARETH THORBURN and GLENN ROWE
Applied Computer Studies Division, University of Dundee, Dundee DD1 4HN, Scotland
Author to whom all correspondence should be addressed. Fax: 01382-345509; e-mail: [email protected].

(Received 11 July 1996; accepted 10 June 1997)

Abstract--This paper describes PASS (Program Assessment using Specified Solutions), a software system used to assess C programs produced by students on an introductory programming course. In programming, it is possible to solve problems in many ways; some good, some bad. The majority of program assessment systems to date assess programs either by script-based methods, which verify the correct output, or by program metrics, such as cyclomatic complexity. Neither of these methods takes into account the way in which a problem has been solved; each will award equal marks to good and bad programs, provided they produce the correct output. PASS takes into account the way in which a problem is solved by comparing the submitted program with a solution plan, which is provided by the course tutor. A mark and feedback, based on how well the submitted program corresponds to the tutor's valid solution, are then returned by the system. © 1997 Elsevier Science Ltd. All rights reserved

INTRODUCTION

Programming courses typically require students to solve a number of programming problems throughout the duration of the course. The traditional method of marking programs is by hand. With the increasing numbers of students on courses, this is becoming an ever-greater load on the course tutor. The burden may be shared between two (or more) tutors, but this introduces the possibility of inconsistency and discrepancies between the tutors' marks. It has previously been found that even individual tutors are capable of inconsistency over the course of marking a number of assignments [1]. A tutor can be prone to modifying their marking scheme as they become familiar with students' solutions, without reassessing earlier assignments. The process of marking solutions is also tedious and repetitive, with the result that tutors may become 'program-blind' and mark assignments on whether they appear to be correct [2]. It is clear that the traditional system of assessment would greatly benefit from some form of automated assessment system which performs the marking of programs. This would introduce a consistent form of marking and also relieve the tutor of the time-consuming burden of assessment.

The traditional system of marking can be augmented by electronic submission of students' assignments. The tutor is then able to compile and run each program, with test values, to verify that it works correctly. A short step from this is to use batch or script files to run students' programs with test data and store the resulting output in a file. This relieves the tutor of the process of entering repetitive sets of test data into a student's program. Instead, the tutor inspects the result file and verifies that the correct output was produced. These batch files can then be expanded to include the process of compilation and to store any errors or warnings in the result file. The actual process of verifying the output produced by a student's program is a matter of comparing the student's output with the intended output. Again, this is a process which can be performed by a computer, and it has led to the development of script-based marking systems such as Ceilidh [3] and CAAPE [1]. The whole process is now in a position to be fully automated from the point of electronic submission (I shall refer to this method of automated assessment as script-based assessment).

There are two main drawbacks with this method of assessment. The first is the process of comparing the student's output with the intended output. If the comparison is performed by a simple character match, the student's result file must be identical in format to that of the tutor. This requires a very precise specification of the output to be included in the problem description. Use of parsing tools [2], such as Lex [4] and Yacc [4], allows for a more intelligent form of comparison but still relies on the student's output following the intended output format of the tutor. This can cause situations where a student produces a correct program but it is not marked correctly because of a mistake in the output of the program. Such errors are errors of 'formatting' which are not intentionally part of the programming exercise, yet students are penalised for them.
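As an illustration of why simple character matching is so brittle, the following minimal C sketch (not taken from any of the systems cited above; the file names are purely illustrative) compares a student's captured output with the tutor's expected output byte by byte, so that even a harmless formatting difference fails the comparison.

/* Minimal sketch of a character-by-character output comparison. */
#include <stdio.h>

/* Return 1 if the two files are byte-for-byte identical, 0 otherwise. */
static int outputs_match(const char *expected_path, const char *actual_path)
{
    FILE *e = fopen(expected_path, "r");
    FILE *a = fopen(actual_path, "r");
    int ce, ca, same = (e != NULL && a != NULL);

    while (same) {
        ce = fgetc(e);
        ca = fgetc(a);
        if (ce != ca)
            same = 0;       /* e.g. "Answer: 8" vs "answer = 8" fails here */
        else if (ce == EOF)
            break;          /* both files ended together: outputs match    */
    }
    if (e) fclose(e);
    if (a) fclose(a);
    return same;
}

int main(void)
{
    /* "expected.txt" and "student.txt" are illustrative file names. */
    printf("%s\n", outputs_match("expected.txt", "student.txt") ? "correct" : "incorrect");
    return 0;
}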


When such assessment systems are used directly by students, they can cause confusion, as students trust the marking scheme and try to redesign their solution when all that is needed is to adjust an output statement. This can be very frustrating and a source of much annoyance when it is pointed out that the code was correct in the first place.

The second drawback is in the method of assessment itself. The process of compiling and analysing the output of an executed program was intended to augment the traditional system of looking at the code, not to replace it. No tutor would mark programs by running them without looking at the program code. An important factor in program solutions is how the problem has been solved, not just whether it has been solved. Indeed, Howatt [5] ranks program design as the most important factor in grading students' programs. This is usually reflected in programming courses by the emphasis placed on the importance of designing a solution before it is implemented in code. However, marking schemes based on the 'script' method are oblivious to the design of a program, determining marks from the end result of the program. Program code can be implemented in many ways to achieve the same end result. For example, the problem shown in Fig. 1 can be solved in many ways, but not all of these are equal. Some implementations will show good programming practice; others will be inefficient or use ill-structured code. As long as the correct results are produced, each of the solutions, good or bad, will be marked identically by script-based assessment systems.

Write a program which declares a global array of 10 integers. Write functions to perform the following.
• Initialise all array locations to be zero. (good programming practice)
• Calculate the cube of a number (i.e. return the cube of the number passed to it).
• Print out array contents to the screen.
Using these and other appropriate functions of your choice, write a program which prints the cubes of numbers from 0-9 to the screen. This program should first initialise array locations to be zero (a good practice to get into when using arrays), then store the cubic values from 0-9 in the array. Use the print function to then print the array values to the screen.

Fig. 1. Sample programming problem.

A second approach to the automated assessment of programs is to use complexity metrics [3,6]. Metrics such as McCabe's cyclomatic complexity [7], lines of code, lines of code per function and average identifier length are used as the basis for calculating the overall mark for a program. Metrics have the advantage that they are easy to calculate and fully objective in their analysis, and therefore totally consistent. The disadvantage is that they are not influenced in any way by the problem under analysis. Metrics can be used to give a view of how well a program is written but not how well it answers a given question. Metrics should therefore only be used in conjunction with other marking schemes (a minimal illustrative sketch of one such metric is given below).

A survey into assessment methods used in introductory programming courses by British universities was carried out at the beginning of 1997 and produced the results shown in Fig. 2 (Footnote 2). Although manual assessment has drawbacks, it remains the most popular method of assessment, being used by over 50% of courses. The automated assessment systems in use were, without exception, script-based methods, and were used as an aid to the marker rather than as a replacement: the output from the assessment system formed part of the overall mark, with the tutor producing the remainder. Although some of the replies were opposed to the use of computer-based assessment, others took the opposite view and would like to see more computer-based assessment systems become available. The main motivation towards the use of computer-based assessment comes from situations where many scripts have to be marked in a short period of time. As 39% of the introductory courses surveyed have over 150 students, and given the tendency towards increasing class sizes, this is likely to become a more common situation. Even with these high numbers of students already present in classes, automated assessment is used in only 11% of courses (Footnote 3).

Footnote 2: A total of 45 replies were received, of which 35 were for introductory programming courses.
Footnote 3: It is of note that 3 out of the 4 courses using automated assessment have class sizes of over two hundred students.
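As an illustration of the metrics approach mentioned above (a minimal sketch only, not any published metric tool), the following C program estimates McCabe's cyclomatic complexity by counting decision keywords in a source fragment and also reports the line count. The source fragment and keyword list are purely illustrative, and the count ignores comments and string literals, so it is only an approximation.

/* Crude metric sketch: line count and an approximate cyclomatic complexity. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *keywords[] = { "if", "for", "while", "case", "&&", "||" };
    const char *source =
        "for (index = 0; index < 10; index++)\n"
        "    if (global_array[index] > 0)\n"
        "        count++;\n";

    int lines = 0, decisions = 0;

    for (const char *p = source; *p; p++)
        if (*p == '\n')
            lines++;

    for (size_t k = 0; k < sizeof keywords / sizeof keywords[0]; k++)
        for (const char *p = strstr(source, keywords[k]); p; p = strstr(p + 1, keywords[k]))
            decisions++;

    /* McCabe's metric is (decision points + 1) for a single function. */
    printf("lines: %d, approximate cyclomatic complexity: %d\n", lines, decisions + 1);
    return 0;
}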

[Figure: pie chart of assessment methods -- Manual 54%, Compile 23%, Automated 11%, Active 6%, Manual and Active 6%.]

Terminology
Manual: Manual assessment.
Compile: The program is compiled and run during assessment.
Automated: An automated assessment system is used during assessment.
Active: The code is assessed during a lab, in the presence of the student.
Manual and Active: The program is demonstrated by the student in the lab and the code is marked by manual assessment.

Fig. 2. Assessment methods of introductory programming courses.
Some replies commented that they had experimented with automated assessment systems but found that these often produced results greatly different from their own; they have therefore stuck with their original methods of assessment.

PASS has been designed as an automated assessment system which analyses programs using a different methodology from existing systems. It is not intended to act as a replacement for a marker, as there are some areas of program assessment, such as comments and the user interface, which are not appropriate for automation. Instead, it is intended as an environment to aid the assessment of programs. It analyses programs and quickly provides information about the design of a program to the marker. The marker is then able to use this information as a basis for awarding the overall marks. The overall intention is to leave the marker in charge of how marks are awarded, but to speed up the assessment process and improve its accuracy by using automation where appropriate.

PROGRAM DESIGN

If an assessment system is to analyse the program design of a solution it must surely be influenced by the program design process itself. The most commonly taught means of designing solutions to programming problems (for procedural languages) is top-down design. This breaks down large tasks into smaller tasks until each task can be solved individually. Generally, this process is performed using pseudocode, an informal language which is used to express solutions to a problem at a higher level than the programming language. The pseudocode solution is translated into the programming language, with each task being implemented as a function. As the pseudocode is at a higher level than the programming language, the pseudocode solution is sometimes referred to as the plan of the program. A pseudocode solution to the problem of Fig. 1 is shown in Fig. 3.

Main
(1) Initialise array to be all zeros.
(2) Set array values to be cubes from 0-9.
(3) Display array.

(1) for all array locations
        array location = 0
(2) for all array locations
        array location = cube(location number)
(3) for all array locations
        print array location

cube: function that when passed number n returns (n * n * n)

Fig. 3. Pseudocode solution to sample problem.


This solution follows good programming practice and is how the tutor would like students to solve the problem. As students do not yet possess the knowledge of designing solutions or of the programming language, many solutions will not follow this plan and may not solve the problem requirements fully. A correct program is one which follows a valid plan (Footnote 4) to solve the problem requirements. Incorrect programs are those which do not solve the problem requirements and/or use invalid plans. Even though a program may appear to work correctly, it may be incorrect because it uses an invalid plan.

Another difficulty facing novice programmers is that the design environment is very poorly suited to them. Students design plans on paper in an informal language (pseudocode). There is no means of obtaining feedback on this paper plan, other than by asking a tutor. The plan is then translated into code, using the computer, where feedback is obtained from the compiler and also by running the program. Novice programmers tend to spend the majority of development time in the latter environment, where they can use the feedback provided to aid the design process [8]. The two main sources of error in novice programs are a poorly designed initial pseudocode solution and the translation of the pseudocode into the programming language. Where poor initial plans cause program errors, the plan should first be redesigned and then translated into a new program. Novice programmers tend not to look at the pseudocode plan once it has been translated into code; instead, they try to fix programs by altering the code. In the cases where poor initial plans are used, this amounts to little more than 'hacking' the code to make the program appear to work.

The intention in designing PASS is to provide an assessment system which is capable of determining whether a solution is correct and, more importantly, whether it follows a valid plan. PASS provides both a mark, based on how well the program follows a valid plan, and information detailing where the program loses and gains marks. This information is provided by analysing the program code. PASS does not enforce a programming environment [9] or design language [10] upon the student. Only syntactically correct programs can be analysed by PASS, leaving syntax errors to be analysed by compilers or other specific applications [11].

SOLUTION PLANS

The tutor provides PASS with a solution plan for each problem PASS is to analyse. The solution plan is a specification of a valid plan for a specific problem. It is represented by combining a hierarchical view of how the functions of a program are called with an abstract description of each function. The abstract description of a function describes the task the function performs but does not specify how it is implemented. The solution plan for the sample problem is shown in Fig. 4. During analysis of submitted programs, PASS compares the solution plan of the submitted program with that of the tutor. The solution plan of a submitted program is extracted by identifying equivalent functions to those specified in the tutor's solution plan.

Top-level:  Main Function
Level 1:    Initialise_Array   Store_Cubes   Print_Array
Level 2:    Cube

Function            Abstract Description
Main Function       Print cubes of numbers from 0-9 to the screen.
Initialise_Array    Initialise array to be all zeros.
Store_Cubes         Store cubes from 0-9 in array.
Print_Array         Print out array contents.
Cube                Return cube of number passed as argument.

Fig. 4. Solution plan for sample problem.

Footnote 4: A valid plan is a solution strategy which represents 'good' programming practice.
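As an illustration only (the internal representation used by PASS is not described in the paper), a solution plan of the kind shown in Fig. 4 could be held as a small tree in which each node carries an abstract description, its level in the call hierarchy and the sub-tasks it calls. The structure and field names below are assumptions made for this sketch.

/* Sketch of one possible representation of a solution plan (illustrative only). */
#include <stdio.h>

#define MAX_CHILDREN 8

struct plan_node {
    const char *description;                  /* abstract description of the task */
    int level;                                /* depth in the call hierarchy      */
    struct plan_node *children[MAX_CHILDREN]; /* tasks called by this one         */
    int n_children;
};

int main(void)
{
    /* Hand-built plan for the sample problem of Fig. 4. */
    struct plan_node cube  = { "Return cube of number passed as argument", 2, {0}, 0 };
    struct plan_node init  = { "Initialise array to be all zeros",         1, {0}, 0 };
    struct plan_node store = { "Store cubes from 0-9 in array",            1, {&cube}, 1 };
    struct plan_node print = { "Print out array contents",                 1, {0}, 0 };
    struct plan_node top   = { "Print cubes of numbers from 0-9 to the screen", 0,
                               {&init, &store, &print}, 3 };

    printf("Top-level task: %s (%d sub-tasks)\n", top.description, top.n_children);
    return 0;
}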


Equivalent functions are functions which perform the same task, regardless of how they are implemented; they are therefore functions which correspond to the same abstract description (Footnote 5). Feedback is provided on where the extracted solution plan differs from that of the tutor. The submitted program receives a mark, calculated from the number of functions identified at each level of the hierarchy.

THE ASSESSMENT AND FEEDBACK SYSTEM

If a submitted program follows the tutor's solution plan, the extracted solution plan will be identical to that of the tutor. Such a solution does not differ from the intended plan and receives full marks, together with feedback indicating that the program is correct and uses a valid plan. A more likely scenario occurs when the submitted program does not fully match the tutor's solution plan. For example, take the situation where a program differs from the solution plan of Fig. 4 by calculating the cube of a number directly rather than using a cube function. This program and its extracted solution plan are shown in Figs 5 and 6. When the extracted solution plan is compared with the tutor's solution plan, the submitted program has no equivalent to the tutor's Cube function at Level 2. However, the program still contains a function which stores the cubes from 0-9 in an array. The student's function store_cubes will therefore be identified as an equivalent function to Store_Cubes (Footnote 6). PASS is able to determine that the program is a correct solution, as it satisfies the problem requirements, but that it does not follow a valid solution plan. In these situations, PASS is able to highlight that an improvement can be made to the program design, in this case by using a cube function to calculate the cubic values from the store_cubes function. This is an example of how the hierarchical nature of the marking scheme allows PASS to 'intelligently' assess programs which use unspecified (or novel) solution plans to solve a problem.

The hierarchical nature of the marking scheme provides a very flexible method of analysing programs. If a program is submitted which uses an entirely unspecified solution plan to solve a problem, it can still be determined whether the problem is solved correctly. This is because equivalent functions will only be found at the top level of the hierarchy (the main functions) and not elsewhere. PASS will inform the student that the program produces the correct results but does not do so in a correct manner. Programs which do not fully solve a problem can also benefit from this system of analysis. If part of a program uses a valid plan but an error elsewhere prevents the program from performing correctly, this is recognised by PASS.

void initialise_array();
void store_cubes();
void print_array();

int global_array[10];

void main()
{
    initialise_array();
    store_cubes();
    print_array();
}

void initialise_array()
{
    int index;
    for (index = 0; index < 10; index++)
        global_array[index] = 0;
}

void store_cubes()
{
    int index;
    for (index = 0; index < 10; index++)
        global_array[index] = index * index * index;
}

void print_array()
{
    int index;
    for (index = 0; index < 10; index++)
        printf("%d \n", global_array[index]);
}

Fig. 5. Submitted program code.

Footnote 5: The method of identifying equivalent functions is discussed later in the paper.
Footnote 6: In order to distinguish between functions, the tutor's solution functions use capital letters and the student's solution functions use lower case. This is not a requirement of the system but is simply for the purpose of clarifying the discussion.


Top-level:  Main
Level 1:    Initialise_Array   Store_Cubes   Print_Array

Function            Abstract Description
Main                Print cubes of numbers from 0-9 to the screen.
Initialise_Array    Initialise array to be all zeros.
Store_Cubes         Store cubes from 0-9 in array.
Print_Array         Print out array contents.

Fig. 6. Extracted solution plan.

The program shown in Fig. 7 has a bug in the store_cubes function: instead of storing the cubes of the loop variable, it stores the cubes of the value 1. Analysis and running of the program will both show that the program does not solve the problem requirement of printing the cubes from 0 to 9 to the screen and does not contain an equivalent to the tutor's Store_Cubes function. As the tutor's solution does not contain a function which performs the task of storing the value 1 in an array, the student's store_cubes function will be unidentified. Comparison of the solution plan, shown in Fig. 8, shows that the program is partially correct. Two equivalent functions are found at Level 1 of the hierarchy (initialise_array and print_array) and an equivalent cube function is found at Level 2. This program will still receive marks for having part of a valid plan and the student will be informed which functions are valid. PASS will also be able to point out that the most likely source of error in the program is within the store_cubes function, as this is the lowest level function which has not been identified.

void initialise_array();
void store_cubes();
void print_array();
int cube(int num);

int global_array[10];

void main()
{
    initialise_array();
    store_cubes();
    print_array();
}

void initialise_array()
{
    int index;
    for (index = 0; index < 10; index++)
        global_array[index] = 0;
}

void store_cubes()
{
    int index;
    for (index = 0; index < 10; index++)
        global_array[index] = cube(1);
}

void print_array()
{
    int index;
    for (index = 0; index < 10; index++)
        printf("%d \n", global_array[index]);
}

int cube(int num)
{
    return (num * num * num);
}

Fig. 7. Buggy solution.


Top-level:  Main
Level 1:    Initialise_Array   (Unidentified)   Print_Array
Level 2:    Cube

Function            Abstract Description
Main                Prints ten 1's to the screen.
Initialise_Array    Initialise array to be all zeros.
Store_Cubes         Store 1's in array locations.
Print_Array         Print out array contents.
Cube                Return cube of number passed as argument.

Fig. 8. Extracted solution plan from buggy code.

The most likely cause of error is identified through a form of error localisation [12]. The analysis system used by PASS has two main benefits: it can recognise plans which solve tasks correctly, whether they use a specified or an unspecified approach, and it can recognise valid plans within programs, whether or not they produce the correct overall result.
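As an illustration of this localisation heuristic (a minimal sketch only; the data structure and names are assumptions, not the PASS implementation), the lowest level unidentified function can be picked out of the extracted plan as follows.

/* Sketch of the error-localisation heuristic: report the lowest level
 * (deepest) function that was not identified against the tutor's plan. */
#include <stdio.h>

struct extracted_fn {
    const char *name;
    int level;        /* depth in the call hierarchy (larger = lower level) */
    int identified;   /* 1 if an equivalent tutor function was found        */
};

int main(void)
{
    /* Illustrative result of analysing the buggy program of Fig. 7:
     * main is unidentified because its output is wrong, and so is store_cubes. */
    struct extracted_fn fns[] = {
        { "main",             0, 0 },
        { "initialise_array", 1, 1 },
        { "store_cubes",      1, 0 },
        { "print_array",      1, 1 },
        { "cube",             2, 1 },
    };
    int best = -1;

    for (int i = 0; i < 5; i++)
        if (!fns[i].identified && (best < 0 || fns[i].level > fns[best].level))
            best = i;

    if (best >= 0)
        printf("Most likely source of error: %s\n", fns[best].name);
    return 0;
}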

EQUIVALENT FUNCTIONS

The ability to recognise the functions of a student's program rests on the system of identifying equivalent functions. Equivalent functions are defined as functions which, given the same input, produce the same output, over the set of all possible input data. A process for testing two functions for equivalence is shown in Fig. 9. In practice, it is impractical to use the set of all possible inputs, as the data source is often very large; on a 16-bit system, for example, the set of integers consists of over 64,000 values. This can be overcome by using a randomly generated subset of this data, the subset containing enough values to reduce the probability of a coincidental match to minimal levels. This is similar to the method used by the mathematical software CALM [13] to mark questions requiring algebraic expressions as answers. Algebraic functions, like programming functions, can be expressed in many ways to perform the same task. Rather than formally prove the algebraic equivalence of the submitted expression and the expected expression, CALM uses the previously described method of evaluating both expressions with randomly generated values. If both expressions produce the same results then they are taken to be the same.
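The idea can be sketched in C as follows. This is a minimal illustration only, not the PASS implementation: it assumes two int -> int functions and a fixed number of random trials, whereas the real system must also cope with arrays, several parameter types and externally referenced variables.

/* Sketch of equivalence testing by random inputs, in the spirit of Fig. 9. */
#include <stdio.h>
#include <stdlib.h>

static int cube_a(int n) { return n * n * n; }    /* tutor's version (illustrative)   */
static int cube_b(int n) { return n * (n * n); }  /* student's version (illustrative) */

/* Return 1 if f and g agree on 'trials' random inputs drawn from [lo, hi]. */
static int equivalent(int (*f)(int), int (*g)(int), int lo, int hi, int trials)
{
    for (int i = 0; i < trials; i++) {
        int x = lo + rand() % (hi - lo + 1);
        if (f(x) != g(x))
            return 0;        /* counter-example found: not equivalent */
    }
    return 1;                /* probably equivalent                   */
}

int main(void)
{
    srand(42);
    printf("equivalent: %d\n", equivalent(cube_a, cube_b, -20, 20, 100));
    return 0;
}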

[Figure: the same input data set, drawn from the set of all input data, is supplied to Function 1 and Function 2, and output data set 1 is compared with output data set 2.]

Fig. 9. Equivalence of functions.


[Figure: a function's inputs are its parameters and the external variables it references; its outputs are its return value and the variables it can modify.]

Fig. 10. Function inputs and outputs.

The method of determining equivalence for C functions is more complex than its mathematical analogy due to the presence of variable types, structures and the different ways in which variables can be referenced from a C function (Footnote 7). The ways in which variables make up a function's possible inputs and outputs are shown in Fig. 10. Inputs to a function are variables which are supplied from outside the function and are referenced from within it. The outputs of a function consist of the return value of the function and the variables which can be modified by the execution of the function. The variables which make up a function's inputs and outputs are known as the 'signature' of the function. Only functions which have the same signature may be equivalent functions.

Functions in the submitted program code are identified by comparison with the tutor's functions of the same signature. If a function in the submitted code has a signature which matches none of the tutor's functions, then it will go unidentified. A student's function whose signature matches one or more of the tutor's functions is compared with each of these functions, using the process previously described. If the student's function has an equivalent tutor's function then the student's function has been identified; if there is no equivalent tutor's function then the student's function is unidentified. An important feature of identifying functions by their signature is that functions are identified regardless of the names given to them. This means that the tutor does not need to specify the names of the functions which should appear in the code. If this were the case, the tutor would be influencing the design of the students' solutions by supplying the names of required functions. The approach used by PASS does not constrain the students' programs in any way, leaving them free to use their own names for variables and functions.

A potential problem with this system of identification arises where arbitrary orderings of parameters can be used to perform a task. For example, the functions shown in Fig. 11 can both be used to provide a subtract function, but require different orderings of arguments to work as intended. If the solution plan requires a subtraction function and the tutor provides the definition of subtract2_a, then only subtraction functions which return argument1 - argument2 will be identified. If a student has used a function which returns argument2 - argument1, such as subtract2_b, then this should still be marked correct (Footnote 8). PASS overcomes this problem by automatically generating and testing permutations of function arguments. Using the previous example, PASS will identify subtract2_a and subtract2_b as functions which perform the same action. Therefore, functions which may be implemented using different orderings of parameters will still be identified by PASS, regardless of which ordering is used.

/* Returns the result of a - b */
int subtract2_a(int a, int b)
{
    return (a - b);
}

/* Returns the result of b - a */
int subtract2_b(int a, int b)
{
    return (b - a);
}

Fig. 11. Arbitrary ordering of parameters.

Footnote 7: PASS is currently limited to analysing a subset of the C language. This consists of the four basic data types (int, float, char, void), arrays of these types and all language operators. Advanced features, such as pointers and user defined types, are not supported but work is in progress to include these.
Footnote 8: Unless a tutor has specified the precise ordering of variables for each function. This is not always possible as it would require the tutor to provide the functions which make up the solution, thus influencing the students' solving of the problem.
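As an illustration of the permutation testing described above (a minimal sketch only; the functions and trial counts are assumptions, not the PASS implementation), a student's two-argument function can be compared with the tutor's definition using both argument orders.

/* Sketch of testing both orderings of a two-argument function, in the spirit of Fig. 11. */
#include <stdio.h>
#include <stdlib.h>

static int tutor_subtract(int a, int b)   { return a - b; }
static int student_subtract(int a, int b) { return b - a; }   /* swapped argument order */

int main(void)
{
    int match_direct = 1, match_swapped = 1;

    srand(1);
    for (int i = 0; i < 100; i++) {
        int a = rand() % 200 - 100, b = rand() % 200 - 100;
        if (student_subtract(a, b) != tutor_subtract(a, b)) match_direct  = 0;
        if (student_subtract(b, a) != tutor_subtract(a, b)) match_swapped = 0;
    }
    if (match_direct || match_swapped)
        printf("identified as a subtraction function (%s argument order)\n",
               match_direct ? "same" : "swapped");
    return 0;
}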


USING PASS

PASS has been developed as a Windows application and runs on IBM-PC compatibles. It also requires access to a compiler, in order to compile and run tests on the submitted programs. The tutor provides PASS with the program code of a valid solution in a standard text file. The tutor's solution program is analysed by the system, using the previously described method, and an internal representation of the program is produced. This has the advantage that no special language need be learnt before PASS can be used. During this process, the tutor provides information about each function. This information consists of the range of values for each 'input' variable used by the function and a text description of the function. The 'input' and 'output' sets are then produced by running each function a number of times, using randomly chosen values from the variable range. Once the creation of the solution is complete, each function will have a number of 'input-output' sets of data which can be used to identify equivalent functions. Producing this data during the creation of the solution is more efficient than performing the computation each time a solution is assessed.

To assess students' programs, the tutor selects a solution plan and then the program to be assessed. The student's program is analysed and compared with the solution plan, with PASS identifying the functions which have equivalents in the tutor's solution plan. The tutor is presented with a list of the student's functions which were identified, together with the function code and a text description of each function. PASS also provides the tutor with similar information showing the functions of the solution program which were not present in the student's solution and, conversely, the functions in the student's program which did not have equivalents in the tutor's solution program. PASS also provides a mark, based upon how well the program compared with the tutor's. On a standard Pentium PC, this information is provided in around 30 s. This gives the tutor a very quick picture of how the student's program works and removes much of the need for manual verification of functions.

RESULTS

The PASS system has been evaluated by comparing its performance with that of two existing methods of assessment. The first method, manual assessment, is the most common method of assessment used by British universities: the marker receives paper printouts of students' code and performs the assessment without the aid of a computer. The second method uses a computer and a compiler as tools to assist the assessment process: the marker receives students' programs in an electronically readable format and uses the computer to compile and run the students' programs. The PASS system was evaluated in two ways. The first involved using it as an aid to the assessment process, where the marker used the information produced by PASS to assist in forming an assessment of the program. The second involved using the marks which were produced directly by the PASS system. Each marker was given the same scripts to mark and had to fill in a grading sheet for each script. The grading sheet contains a formal marking scheme, which indicates how marks are to be awarded.
The marking scheme is composed of six criteria on which a program is to be judged, each criterion being weighted according to its importance. In the case where the marks are taken directly from the PASS system, only the overall mark is used. The grading sheet is also used to record the time taken to mark each script. The evaluation is based upon the speed with which programs are assessed and the accuracy of the assessment. The programs used in this exercise were those produced by a first year class for their first lab submission. The problem asked them to produce a program which defined and tested five functions, following the specifications shown in Fig. 12. The evaluation was performed using 17 scripts. The overall marks awarded by each of the assessment methods correlate very strongly, as shown in Fig. 13. This would indicate that using the PASS method of assessment is at least as accurate as the alternative methods. The PASS method itself was much quicker than the other methods, being purely computer based. The PASS assisted method was only marginally quicker than the other methods; the times and statistics are shown in Fig. 13. Taken at face value, the results would indicate that there is not really much wrong with the manual method of assessment, as it is fairly quick and appears to be just as accurate as the others. A closer examination of the scripts revealed that this may not be the case.


Average function            Takes three floats and returns the average as a float
Add function                Takes two integers and returns the sum as an integer
Area function               Takes the radius of a circle as a float and returns the area as a float
Temp conversion function    Takes the Celsius temperature as a float and returns the Fahrenheit equivalent as a float
Deposit function            Takes a float value for an initial deposit, an integer value for the number of years and returns the final deposit value, for a fixed rate of interest

Fig. 12. Function specifications for assessment exercise.

One of the manual scripts took much longer than the others to assess. This script took 6 min, which is much higher than the average of under 3 min. The reason for this was that the program implemented the calculation for the final deposit using an unusual formula, which had to be verified by hand by the manual marker, and this took time. The other markers did not have this additional overhead, as they were in a position to have the function verified by using the computer. This is reflected in their times taken to assess this script, which do not differ significantly from the average. The compile marker would see that the function produces incorrect output, and the PASS marker would be informed that the function was not identified as one of the tutor's functions. This indicates that the time taken to mark scripts by manual assessment is in some way dependent on the scripts themselves. This does not appear to be the case with the other methods of assessment.

It was also noticed that a number of errors were made by each of the markers (see Fig. 13). The manual marker made three errors: not recognising that a function was incorrect, not noticing that a function was not called from main, and not noticing that a function was called with the arguments in the wrong order. These are all mistakes that would have been apparent had the marker been able to run the program. Despite having this ability, the compile and the PASS assisted markers still made two mistakes each. The compile marker did not recognise an incorrect function, but this was only in the case where the function was not called: running the program did not produce any information about this function and the marker was therefore reduced to manually assessing it. The other error made by this marker was simply poor marking. The PASS assisted marker was not affected by the script where the function was not called, as the PASS system identifies functions individually. However, the PASS assisted marker did have to manually assess a program which would not compile (the PASS system will only assess programs which are syntactically correct), and this is where an error was made in not noticing an incorrect function. The other mistake made by the PASS assisted marker was in not noticing a script where the main function did not call the defined functions. This was an unusual script in which the program defined correct functions but did not call them from the main function; instead, the main function manually calculated the values and displayed them to the user.

Correlation
                   Manual    Compile   PASS (assisted)
Compile            0.896
PASS (assisted)    0.915     0.927
PASS               0.885     0.918     0.874

Time (min)
                   Manual    Compile   PASS (assisted)   PASS
Average            2.686     6.000     2.625             0.5
Std Dev            1.302     1.506     0.885             0

Errors
                   Manual    Compile   PASS (assisted)   PASS
Number             3         2         2                 1

Fig. 13. Assessment results and statistics.


This error was not apparent to the PASS system because the correct function definitions were provided. Not surprisingly, this was also the error made by the PASS-only assessment. Interestingly, this error would not have been spotted by a 'script' based assessment system even if the correct function definitions had been omitted, as the program would still produce the correct output.

CONCLUSIONS

This paper proposes an assessment system which assesses programs according to whether they use a valid plan. The system is currently implemented for the C language but is applicable to most procedural languages. The initial evaluation shows that the system, when used as the sole method of assessment, produces an acceptable accuracy of assessment, as shown in the correlation figures. It is also a much quicker method of assessment than the others. The use of PASS as an aid to assessment provides a quicker and more accurate method of assessment than manual assessment. Although the figures show only a marginal speed gain over manual assessment, it can be speculated that this is largely because the scripts were of a relatively high standard. If the scripts had been of a lesser standard there may have been more instances of higher assessment times, such as the anomalous six minute assessment. Similarly, although each method of assessment appears to produce roughly the same number of mistakes, it should be noted that some of the mistakes made by the compile and PASS assisted markers were made when purely manual assessment was required.

The system is currently limited in its assessment scope to programs which do not use structures or pointers. By the nature of this limitation, only relatively simple programs have been assessed using the system. In this context, the system has performed well and has been able to identify programming mistakes which would not have been identified by 'script' based assessment methods. The system has also been shown to offer speed and accuracy benefits over manual and compile marking, both in assistive and stand-alone use. Development to extend the system to cope with all aspects of the C language is in progress. With this capability, more advanced and complex programs can be analysed by the system and a similar evaluation can be carried out. One of the perceived problems in analysing the programs produced as solutions to complex problems is that there may be some valid solutions which do not follow the same functional layout as the tutor's solution. This is an expected situation and is to be resolved by allowing for multiple solutions for each problem. The analysis system will compare the submitted student's program with each of the solutions and base its assessment on the solution which most closely matches the program. If a tutor encounters a script which is not handled well by the current solutions, they will be presented with the option of adding the script as a solution to the problem. A further extension of this principle is to allow for alternative implementations to exist at the functional level; a fully alternative solution is, after all, an alternative implementation of the 'main' function. A further possibility is the development of the PASS system into a tool which can be used by students for self assessment.

REFERENCES

1. Edmunds, G., Experiences using CAAPE: Computer assisted assessment of programming exercises. Computers & Education, 1990, 15(1-3), 45-48.
2. Jackson, D., Using software tools to automate the assessment of student programs. Computers & Education, 1991, 17(2), 133-141.
3. Benford, S., Burke, E., Foxley, E., Gutteridge, N. and Zin, A. M., Experience using the Ceilidh system. In Proceedings of the 1st All-Ireland Conference on the Teaching of Computing, 1993, pp. 32-35.
4. Levine, J. R., Mason, T. and Brown, D., Lex and Yacc. O'Reilly and Associates, 1992.
5. Howatt, J. W., On criteria for grading student programs. SIGCSE Bulletin, 1994, 26(3).
6. Hung, S., Kwok, L. and Chan, R., Automatic programming assessment. Computers & Education, 1993, 20(2), 183-190.
7. Beizer, B., Software Testing Techniques. Van Nostrand Reinhold, 1990.
8. Davies, M., Boyle, T. and Gray, J., Braque: A multimedia CAL system for program design. In Proceedings of the 1st All-Ireland Conference on the Teaching of Computing, 1993, pp. 48-51.
9. Bonar, J. and Cunningham, R., Bridge: An intelligent tutor for thinking about programming. In Artificial Intelligence and Human Learning, ed. J. Self. 1988, pp. 391-409.
10. Ellis, G. P. and Lund, G. R., G2 - A design language to help novice C programmers. In Proceedings of the 2nd All-Ireland Conference on the Teaching of Computing, 1994, pp. 91-96.
11. Elsom-Cook, M. and Du Boulay, B., A Pascal program checker. In Artificial Intelligence and Human Learning, ed. J. Self. 1988, pp. 361-373.


12. Fritzson, P., Gyimóthy, T., Kamkar, M. and Shahmehri, N., Generalised algorithmic debugging and testing. In Proceedings of the ACM SIGPLAN '91 Conference, 1991, pp. 317-326.
13. Beevers, C. E., Cherry, B. S. G., Foster, M. M. G. and McGuire, G. R. M., Software Tools for Computer Aided Learning in Mathematics. Avebury Technical, 1991.
14. Johnson, W. L. and Soloway, E., PROUST: Knowledge-based program understanding. IEEE Transactions on Software Engineering, 1985, SE-11(3), 267-275.