Symbiosis of Human and Artifact Y. Anzai, K. Ogawa and H. Mori (Editors) © 1995 Elsevier Science B.V. All rights reserved.
Factors influencing the classification of object-oriented code: Supporting program reuse and comprehension.
Simon P. Davies¹, David J. Gilmore² and Thomas R. G. Green³
¹Department of Psychology, University of Hull, HU6 7RX, UK
²Department of Psychology, University of Nottingham, NG7 2RD, UK
³MRC Applied Psychology Unit, 15 Chaucer Road, Cambridge, CB2 2EF, UK
This paper reports a study of the use of card sorts in the categorisation of fragments of object-oriented programs. We are interested in the way in which programmers think about code so that we might attempt to provide support for browsing and reuse within object-oriented environments. Hence, we have been exploring the use of knowledge acquisition techniques in order to elicit programmers' knowledge about code. Our results showed that experts tended to focus upon the functional relationships between the code fragments, and that the novice group were much more concerned with objects and inheritance. We discuss these results in terms of claims that have been made about the naturalness of conceiving the world in terms of objects and their relationships.
1. INTRODUCTION
The study reported here required expert and novice programmers to sort through a number of cards each containing a fragment of code. In the case of the expert group, half of the subjects were familiar with the code whereas half were unfamiliar. The subjects sorted the cards according to any criteria they felt were appropriate. Twenty-four subjects were recruited from a population of undergraduate and graduate students, researchers and lecturers drawn from two large computer science departments. Of the 12 subjects in the expert group, all but one had more than three years' experience of C++, and half had some familiarity with the code fragments used. The novice group consisted of 12 undergraduate students, all with around one year's experience of C++. For the purpose of this study, 41 code fragments were derived from a large C++ library, known as InterViews, designed for graphics applications. These fragments included 36 method and function definitions and the remaining five
were class definitions. We included examples of containment and subclass/super-class relationships to try to encourage an object-based decomposition and classification. For the purpose of the experiment itself, subjects were asked to study each of the fragments for a short period and then to sort the fragments into any number of piles corresponding to what they felt was a meaningful categorisation. After this first sort, subjects were asked to state their reasons for classifying the fragments in the way they chose and to differentiate between each of the piles they had constructed. The subjects were then asked to repeat this procedure by splitting any pile that they thought might be usefully subdivided. Most of the subjects provided only one sort, though two subjects produced three separate sorts. Once again, following this task the subjects were asked to state their reasons for the particular subdivision they had chosen.
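The sorting procedure described above can be pictured with a small data-structure sketch. This is only an illustration and not the encoding used in the study: the `Sort` type, the helper names `is_valid_sort` and `split_pile`, and the assignment of fragment numbers to piles are all assumptions made for the example. Only the total of 41 fragments comes from the study.

```python
from typing import Dict, Set

N_FRAGMENTS = 41  # the study used 41 code fragments

# Hypothetical encoding: pile label -> set of fragment numbers (1..41).
Sort = Dict[str, Set[int]]


def is_valid_sort(sort: Sort, n_fragments: int = N_FRAGMENTS) -> bool:
    """True if the piles are non-empty, disjoint and cover fragments 1..n."""
    seen: Set[int] = set()
    for pile in sort.values():
        if not pile or seen & pile:
            return False
        seen |= pile
    return seen == set(range(1, n_fragments + 1))


def split_pile(sort: Sort, label: str, parts: Sort) -> Sort:
    """Return a new sort in which pile `label` is replaced by its sub-piles."""
    if set().union(*parts.values()) != sort[label]:
        raise ValueError("sub-piles must cover exactly the original pile")
    refined = {k: set(v) for k, v in sort.items() if k != label}
    for sub_label, pile in parts.items():
        refined[f"{label}/{sub_label}"] = set(pile)
    return refined


# Made-up first sort: two labelled piles over the 41 fragments.
first = {
    "constructors": set(range(1, 11)),
    "event handling": set(range(11, 42)),
}

# Made-up second sort: the subject subdivides one pile and relabels the parts.
second = split_pile(
    first,
    "event handling",
    {"dialog": set(range(11, 25)), "other": set(range(25, 42))},
)
```

Keeping each sort as a labelled partition means the subject's stated reason for a pile (its label) stays attached to the fragments it covers, which is what the later analysis of labels relies on.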
2. RESULTS
2.1 How do subjects classify code?
The sorts produced by subjects were analysed in terms of the percentage of the total classification concerned with objects, classes and inheritance vs classification into functional subsets and other non-object-based data structures. This analysis was based upon the labels subjects provided during each sort and upon the canonical sorts that were developed prior to the experiment. The majority of the descriptions provided by subjects were straightforwardly classifiable into either the object/class/inheritance category or the function category, with a maximum of 8% of statements being 'mixed' or 'unclassifiable'. This categorisation was undertaken by two independent analysts and a high level of agreement was obtained, with only 3.5% of the categorisations differing. There was a high level of agreement (76%) between the classifications that were developed by subjects and the canonical decomposition. Our results showed that the expert subjects derived classifications of code fragments based mainly upon their functional relationships, with significantly less emphasis placed upon object-based classifications. Conversely, the novice subjects showed a marked tendency to found their classification on an object-based derivation as opposed to a functional derivation. This difference was analysed using two chi-squared tests: the first comparing experts' derivations of functional vs object classifications (χ² = 10.02, p < 0.01), and the second comparing novices' derivations of functional vs object classifications (χ² = 11.62, p < 0.01).
2.2 Are subjects' classifications consistent?
These data were further analysed in order to examine the degree of consistency between subjects within each group and between the expert and novice groups. Figure 1 shows a scattergram constructed from the scores of individual subjects. These data were analysed using single link cluster analysis
employing a K-means splitting method. This analysis indicated a high degree of consistency between individual subjects' sorts. That is, different subjects tended to sort the same cards into common classifications. As far as the expert group were concerned, agreement (in classification and labelling) tended to be based upon syntactically derivable entities such as constructors, directories, inlines, dialog code, event handlers, and application-specific functional entities. Often the experts' sorts were based upon similarity in the surface structure of the code. Where class-based relationships were mentioned these were often subsidiary to a functional description. These classifications suggest a fairly high-level functional view of the code driven primarily by surface similarity where the functional structure of the code shared many common elements. Cluster analysis suggested that the functional groupings also tended to share many common elements across subjects' classifications. Using an overlapping clustering method, functional groupings accounted for 58.2% of the variance in the clusters formed by experts, with only 12.3% of the variance accounted for by object-based clusters. In the case of the novice group, subjects clustered around an object-based classification. Here, analysis suggested a high level of agreement between subjects in terms of the cards that they felt should be grouped together. Additional analysis was undertaken in order to assess the extent to which classifications might be based upon syntactic or semantic similarities between different fragments of code. This task presents some ambiguity since it is not entirely clear how such similarities might be defined. For instance, lexical and orthographic features might be used to form the basis for a syntactic classification. In many cases, the experts' functional groupings were based upon lexical features.
[Figure 1: scatter plot of Object Classification (vertical axis, 0 to 40) against Functional Classification (horizontal axis, 0 to 50); legend: Familiar, Unfamiliar.]
Figure 1. A scattergram showing individual subjects' scores in this classification space. Notice that experts' classifications tend to emphasise functional relationships. The novice data also suggest a fairly consistent classification primarily based upon objects.
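The consistency analysis in Section 2.2 asks whether different subjects tend to place the same cards together. As a rough illustration only, the sketch below counts how often each pair of cards shares a pile and then applies plain single-link clustering to those counts. It does not reproduce the analysis used in the study, whose exact procedure is not described in detail here. The card names, the three toy sorts, and the distance definition (one minus the fraction of subjects who grouped a pair) are all assumptions made for the example.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Piles = List[Set[str]]  # one subject's sort: a list of piles of card names


def co_occurrence(sorts: List[Piles]) -> Dict[FrozenSet[str], int]:
    """Count, for each pair of cards, how many subjects put them in one pile."""
    counts: Dict[FrozenSet[str], int] = {}
    for piles in sorts:
        for pile in piles:
            for a, b in combinations(sorted(pile), 2):
                key = frozenset((a, b))
                counts[key] = counts.get(key, 0) + 1
    return counts


def single_link_clusters(sorts: List[Piles], max_distance: float) -> List[Set[str]]:
    """Cut a single-link dendrogram at `max_distance`.

    Distance between two cards = 1 - (fraction of subjects who grouped them).
    Cutting single linkage at a threshold is equivalent to taking connected
    components of the graph that links every pair within that distance.
    """
    cards = sorted({c for piles in sorts for pile in piles for c in pile})
    parent = {c: c for c in cards}

    def find(c: str) -> str:
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    n = len(sorts)
    for pair, count in co_occurrence(sorts).items():
        if 1 - count / n <= max_distance:
            a, b = pair
            parent[find(a)] = find(b)

    clusters: Dict[str, Set[str]] = {}
    for c in cards:
        clusters.setdefault(find(c), set()).add(c)
    return sorted(clusters.values(), key=lambda s: sorted(s))


# Toy data (invented): three subjects sorting five cards.
sorts = [
    [{"ctor_a", "ctor_b"}, {"handler_a", "handler_b", "inline_a"}],
    [{"ctor_a", "ctor_b", "inline_a"}, {"handler_a", "handler_b"}],
    [{"ctor_a", "ctor_b"}, {"handler_a", "handler_b"}, {"inline_a"}],
]

tight = single_link_clusters(sorts, max_distance=0.5)
loose = single_link_clusters(sorts, max_distance=0.7)
```

With a strict threshold only pairs that most subjects grouped together end up in one cluster; relaxing the threshold lets a single weak link merge whole groups, which is the well-known chaining behaviour of single linkage.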
[Figure 2: scatter plot of Object Classification (vertical axis, 0 to 40) against Functional Classification (horizontal axis, 0 to 50); legend: Familiar, Unfamiliar.]
Figure 2. A scattergram showing the proportion of code classified in terms of objects or functional properties for subjects familiar and unfamiliar with the code.
Such lexically based groupings included, for example, matching fragments which shared keywords such as constructors and inlines. However, there are other shared features which are more structurally based. For example, access functions pass arguments to access a class, and while locating these functions clearly demands some understanding of the code, they can nevertheless be straightforwardly identified by locating elements of structural similarity. At other levels, semantic analysis is called for. For example, an object-based decomposition cannot be simply derived from a syntactic analysis since the object hierarchy is not explicitly represented in the code, but rather must be inferred by tracing the calling and access paths implicit in the program as a whole. Of course, it is possible that the subjects could extract object-based information fairly straightforwardly from the type definitions, but none reported or appeared to do so. In order to minimise the potential effects of such ambiguity, definitions of syntactic and semantic classifications were adopted which were based upon a primary distinction between the two and which assessed each form of classification using a single metric. Here the raters were asked to classify the labels produced by each subject into those representing a syntactic relationship between fragments and those representing a semantic relationship. This involved identifying syntactic classifications as those which explicitly mentioned keywords which were shared between fragments and formed the basis for the classification. This definition of semantic classification is probably somewhat conservative in the sense that it ignores those classifications based upon surface structural
similarities (such as access functions) between fragments which might otherwise be regarded as syntactically based. Note that even with this conservative estimation, experts produced significantly more syntactic classifications than novices. Once again two chi-squared tests were employed to analyse these data. In the case of the expert subjects, there were a greater number of syntactic classifications as opposed to semantic classifications (χ² = 6.21, p < 0.05), whereas in the case of novices this situation was reversed, with a greater number of semantic classifications in comparison to syntactic classifications (χ² = 8.15, p < 0.02). Moreover, there was a high level of inter-rater agreement (74.3%) in terms of the categorisation of labels which were assigned to either the syntactic or the semantic categories. Another question that we were able to address in this study relates to the extent to which domain familiarity might facilitate object decomposition. In this context we compared the classifications of the expert subjects. As was mentioned before, half of this group had specific experience of this code, in some cases using it extensively in their work, while the remaining half had no experience. Comparing the classifications of those familiar and unfamiliar with the code, 29.8% of fragments were classified as functional by those who were familiar with the code compared with 17.7% for those unfamiliar. In the case of the object-based classification, only 8.2% of code fragments were grouped according to this criterion by those familiar with the code compared to 19.8% classified by the unfamiliar group (see Figure 2). These results were again analysed using two chi-squared tests, one concerned with the familiar data and the other with the unfamiliar data.
Here, unfamiliar subjects tended to produce a greater number of object-based classifications (χ² = 7.27, p < 0.01), whereas the familiar subjects tended to focus upon functional classifications (χ² = 7.19, p < 0.02).
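The chi-squared statistics reported in this section can be sanity-checked with a short sketch. It rests on two assumptions that are not stated in the text: that each test has one degree of freedom (a comparison of two categories), and, for the second helper, that the null hypothesis is an equal split between the two categories. The function names are hypothetical, and the counts passed to `chi2_two_categories` in the usage line are invented. Only the statistics and thresholds in `REPORTED` are taken from the paper.

```python
import math


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability of a chi-squared variate with 1 degree of freedom.

    For df = 1, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


def chi2_two_categories(a: int, b: int) -> float:
    """Goodness-of-fit statistic against an equal split (assumed null)."""
    expected = (a + b) / 2.0
    return (a - expected) ** 2 / expected + (b - expected) ** 2 / expected


# (comparison, reported statistic, reported significance threshold)
REPORTED = [
    ("experts: functional vs object", 10.02, 0.01),
    ("novices: functional vs object", 11.62, 0.01),
    ("experts: syntactic vs semantic", 6.21, 0.05),
    ("novices: syntactic vs semantic", 8.15, 0.02),
    ("unfamiliar experts: object vs functional", 7.27, 0.01),
    ("familiar experts: functional vs object", 7.19, 0.02),
]

for label, stat, threshold in REPORTED:
    p = chi2_sf_df1(stat)  # valid only under the df = 1 assumption
    print(f"{label}: chi2 = {stat:.2f}, p = {p:.4f} (reported p < {threshold})")

# Invented counts, purely to show how such a statistic arises.
example_stat = chi2_two_categories(30, 10)
```

Under the one-degree-of-freedom assumption, each computed p-value falls below the threshold reported alongside it, so the reported significance levels are at least internally consistent.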
3. IMPLICATIONS
These results have serious implications for claims made about the naturalness of object-oriented programming (Booch, 1986). In particular, our contention here would be that if naturalness claims are correct then an open-ended classification task ought to emphasise those entities and relationships which are supposed to be psychologically meaningful to a particular group of programmers. However, both the expert group and that subset of this group who were familiar with the code tended to emphasise other entities and relationships which were based upon a functional rather than an object-oriented view of the code. One implication of this may be that while an object-oriented view might be of significance at the early stages of code generation, comprehension and design, once some familiarity has been established then other sources of knowledge tend to take precedence. This may suggest understanding mechanisms that are driven by different knowledge sources which may be much more application than
language specific, at least during the beginning stages of code comprehension or generation. Hence, while claims about naturalness might be appropriate for some stages of the programming activity, it seems that we should not rule out other perspectives on the code which may be equally important at different stages of this activity. We believe that this is not an unreasonable position to hold and we would call for tools and environments which are able to support a range of multiple perspectives rather than being bound by the conventions and prescriptions of a methodology centred around a strict object-oriented view. This clearly raises other issues concerning the extent to which the provision of different representations of the code (object hierarchies, etc.) might influence a programmer's classification of code fragments. Indeed, a number of subjects who produced a functional decomposition suggested that other views of the code might be appropriate, but that a different representation of the program would tend to facilitate a different classification. It seems fairly clear that the provision of different tools or representations would change the classification space. Indeed, in other studies we have observed that programmers who use textual tools tend to describe their code in a textual fashion whereas diagrammatic representations and descriptions are provided by those who typically use tools which emphasise a graphical view of code. We suggest that different tasks and different programming tools and environments will tend to emphasise the role of different forms of knowledge. The claim that object-oriented programming and design support naturally occurring psychological phenomena seems to artificially constrain the space of possible representations by effectively ruling out other views on the code.
Such a position seems unreasonable since it is clear that different tasks require different perspectives on the code, not all of which will be object-based. The study presented here should be considered to be a preliminary analysis of the knowledge programmers bring to bear when attempting to classify code. We believe that this study constitutes one of the few attempts to empirically evaluate claims about the naturalness of object-oriented programming. However, while this study may throw some light on the cogency of these claims, more research will be needed to further elucidate the role of different knowledge structures in code classification.
REFERENCES
Booch, G. (1986). Object-oriented development. IEEE Transactions on Software Engineering, SE-12(2), 211-221.
Green, T. R. G., Gilmore, D. J., Blumenthal, B. B., Davies, S. P. and Winder, R. (1992). Towards a Cognitive Browser for OOPS. International Journal of Human-Computer Interaction, 4(1), 1-34.