Available online at www.sciencedirect.com
ScienceDirect
www.elsevier.com/locate/procedia
Procedia Computer Science 124 (2017) 311–318
4th Information Systems International Conference 2017, ISICO 2017, 6-8 November 2017, Bali, Indonesia
Multiview Similarity Assessment Technique of UML Diagrams

Alhassan Adamu a,b, Wan Mohd Nazmee Wan Zainon b,*

a Kano University of Science and Technology, Wudil, Nigeria
b Universiti Sains Malaysia, Penang, 11800 Malaysia
Abstract

Software system development has been beset by consistency problems since its infancy. These problems are greater during the software design stage, and greater still in UML diagrams, principally because of the existence of multiple views among UML diagrams. The main issue that arises when assessing the similarity of UML diagrams from multiple views is how consistency can be maintained across the different views during the comparison. Previous works have not addressed this issue: they compare one type of diagram in one stage and another type of diagram in another stage. This manner of similarity assessment leads to incorrect similarity scores, thereby creating inconsistencies in the information conveyed by the UML artifacts during software reuse. This paper presents a Multiview similarity assessment technique, in which the similarity of UML diagrams is computed from the structural and behavioral views of software systems.

© 2018 The Authors. Published by Elsevier B.V. Peer-review under responsibility of the scientific committee of the 4th Information Systems International Conference 2017.

Keywords: Multiview; Similarity Assessment; UML Matching Metrics
* Corresponding author. Tel.: +604-653463; fax: +604-6573335. E-mail address: [email protected]
ISSN 1877-0509; DOI 10.1016/j.procs.2017.12.160

1. Introduction

Unified Modelling Language (UML) is a general-purpose modelling language that graphically represents system requirements [1] and was accepted by the International Organization for Standardization (ISO) as a standard specification. UML provides diagrams for visualizing, specifying and documenting software systems [2]. UML comprises a set of diagrams that can be used to model a software system. The diagrams are grouped into two categories: structural diagrams, which document the static structure of system objects, and behavioral diagrams, which show the behavior of system objects [3]. Each category represents a particular aspect of the software system to be developed; collectively they describe the complete software system. The UML taxonomy of diagrams considers only structure and behavior diagrams, without any category for functional diagrams. However, according to Ahmed [4], use case and sequence diagrams can be interpreted as a means of specifying the functionality of a system. Each use case diagram can be represented by one or more sequence diagrams, which depict how objects interact and work together to provide a service. Hence, sequence diagrams can be considered representative of the functional perspective of software systems. Fig. 1 shows the taxonomy of UML diagrams.
Fig. 1: Taxonomy of UML Diagrams [5]
UML is a de facto modelling language used by software developers during the initial stages of software development. Matching these models is challenging for several reasons, such as the multi-dimensional nature of the modelling process, the variety of models to be designed, and the multiple views of software systems that must be modelled [6, 7]. For example, a structural perspective may describe the static relationships between various software elements, while a behavioural perspective may describe the behaviour of the software system. The difficulty in matching software design artefacts modelled using UML diagrams lies in the need to take into account the collective information contained in the multiple views of software systems. In other words, the similarity of software systems should be evaluated in a consistent manner by simultaneously considering the different views of the software system, rather than simply aggregating similarity values obtained from independent perspectives. Much of the research investigating model matching has focused on a single view [8], thus creating inconsistency between the UML models. These inconsistencies among the different models of a system may be a source of numerous errors in the software to be developed. To the best of our knowledge, there is no clear evidence of well-defined similarity metrics for UML model matching from multiple views. Therefore, this paper presents a general approach for assessing the similarity between different views of software systems.

2. Motivation

The use of the Unified Modelling Language in the architecture and design phases of software engineering projects has been increasing since its inception [9]. When using UML, software systems are described by constructing a set of diagrams. Usually, these diagrams are created independently and typically contain overlapping information. For example, the dependencies contained in class diagrams and the messages in message sequence diagrams are related [10]; this relationship between the two diagrams induces a consistency requirement.
Without appropriate means for assessing the similarity between diagrams, inconsistencies are likely to arise or, even worse, to remain undetected when they arise. Inconsistency between UML diagrams increases the chance of errors and of potentially wrong similarity values between software design projects. Another concern is that, because UML does not impose an ordering relationship between the diagrams, software engineers are inclined to favour one type of diagram over another [11]. For example, engineers consider class diagrams to prevail over other diagrams [12].

3. Similarity Metric

The fundamental aspect of Multiview model matching is a similarity metric that considers the different views of the models to be matched, so that the overlaps between the models' information are best quantified. Towards this aim, we use three types of views of a software system: the structural view, the functional view, and the behavioral view. Class diagrams are arguably the most used of the structure diagrams [12] for capturing the static structure of a software system. Other diagrams are mostly used during the architectural and design phases to express artifacts at different design levels [4]. Therefore, the structural similarity presented in this paper relies on the information contained in class diagrams. Use case diagrams are usually used to capture the functionalities of a software system. They describe the typical interactions between users and the software system. During the requirement phase, use cases can be taken to the next level by providing a more formal refinement in the form of sequence diagrams. Each use case diagram is realized by one or more sequence diagrams, which depict how objects interact and work together to provide services [4]. Therefore, our functional similarity relies on the information contained in sequence diagrams. The behavioral views of a software system are mostly captured using state machine diagrams.
The diagram represents the system at two different levels: the system level and the object level. At the system level, the state machine diagram is used to show the system behavior in response to user actions, while at the object level it shows the dynamic behavior of objects. Only the state machine diagram is used in the requirement phase to show the flow of events within or between objects; other behavioral diagrams are mostly used during the architectural and design phases to express artifacts at different design levels [4]. Therefore, the behavioral similarity presented here is based on the information contained in state machine diagrams.

3.1. Fundamental Assumption and Similarity Measure

Let S1 and S2 be two software systems, and suppose that each contains multiple views V1, V2, V3, …, Vn, where view Vj of software Si is denoted Vj(Si). The similarity of the two software systems considering multiple views can then be computed in the following steps:

• The similarity of each independent view is computed.
• The similarity score is then scaled as a weighted sum of the independent views.
• An inconsistency penalty (C) is applied, which takes into account the conflicting mapping of entities in different views [13].

Fig. 2 shows the hierarchy of the Multiview similarity assessment method. Next, we discuss how the similarity of each independent view is computed.

3.2. Structural Similarity Computation

The structural similarity assessment technique is based on class diagrams, and two forms of similarity assessment are presented: lexical similarity and relationship similarity. The lexical similarity measures the concept similarity between the compared elements (such as class names, attribute names, and method names) and is referred to as concept name similarity, CSim. The relationship similarity, on the other hand, measures the similarity of the compared elements' structural relationships (e.g. associations, aggregations).
The former is based on a substring matching technique in which the similarity between two strings is computed using the edit distance, which is the minimal cost of the operations that must be applied to one string in order to obtain the other [14]. The latter is computed using the information presented in Table 1, which is adapted from [15]. The entries of Table 1 were obtained by administering questionnaires to experts, and they indicate the cost required to transform one type of relationship into another (e.g. the cost of transforming Association to Dependency is 0.45). The structural similarity is computed as a composition of the concept similarity and the relationship similarity, as shown in (1).
Fig. 2: Multiview similarity assessment approach

Table 1. Lookup table for similarities between relationships in class diagram

REL   AS    AG    CO    DE    GE    RE    IN    NO
AS    0.00  0.11  0.11  0.45  0.45  0.66  0.77  1.00
AG    0.11  0.00  0.11  0.45  0.45  0.66  0.77  1.00
CO    0.11  0.11  0.00  0.45  0.45  0.66  0.77  1.00
DE    0.49  0.49  0.49  0.00  0.28  0.21  0.32  1.00
GE    0.49  0.49  0.49  0.28  0.00  0.49  0.60  1.00
RE    0.83  0.83  0.83  0.34  0.62  0.00  0.11  1.00
IN    1.00  1.00  1.00  0.51  0.79  0.17  0.00  1.00
NO    1.00  1.00  1.00  1.00  1.00  1.00  1.00  0.00

AS: Association, AG: Aggregation, CO: Composition, DE: Dependency, GE: Generalization, RE: Realization, IN: Interface Realization, NO: No Relation

$$\mathrm{Struc\_Sim}(C_1, C_2) = \alpha_1 \cdot \mathrm{CSim}(C_1, C_2) + (1 - \alpha_1) \cdot \mathrm{RSim}(C_1, C_2) \qquad (1)$$
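A minimal Python sketch of how formula (1) might be evaluated for one pair of classes is given here for illustration. It assumes that CSim is the Levenshtein edit distance normalized to [0, 1] by the length of the longer name, that RSim is one minus the Table 1 cost (row relationship transformed into column relationship), and that each class carries a single relationship. These normalizations, the function names, and the example class names are assumptions made for this sketch, not details taken from the method itself.

```python
# Illustrative sketch only: the normalizations and names below are assumptions.

RELS = ["AS", "AG", "CO", "DE", "GE", "RE", "IN", "NO"]

# Table 1: cost of transforming the row relationship into the column relationship.
COST = [
    [0.00, 0.11, 0.11, 0.45, 0.45, 0.66, 0.77, 1.00],
    [0.11, 0.00, 0.11, 0.45, 0.45, 0.66, 0.77, 1.00],
    [0.11, 0.11, 0.00, 0.45, 0.45, 0.66, 0.77, 1.00],
    [0.49, 0.49, 0.49, 0.00, 0.28, 0.21, 0.32, 1.00],
    [0.49, 0.49, 0.49, 0.28, 0.00, 0.49, 0.60, 1.00],
    [0.83, 0.83, 0.83, 0.34, 0.62, 0.00, 0.11, 1.00],
    [1.00, 1.00, 1.00, 0.51, 0.79, 0.17, 0.00, 1.00],
    [1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 0.00],
]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs for insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            substitution = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + substitution))
        prev = curr
    return prev[-1]


def csim(name1: str, name2: str) -> float:
    """Concept name similarity: 1 minus the length-normalized edit distance (assumed)."""
    longest = max(len(name1), len(name2))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(name1.lower(), name2.lower()) / longest


def rsim(rel1: str, rel2: str) -> float:
    """Relationship similarity: 1 minus the Table 1 transformation cost (assumed)."""
    return 1.0 - COST[RELS.index(rel1)][RELS.index(rel2)]


def struc_sim(name1: str, rel1: str, name2: str, rel2: str, alpha: float = 0.5) -> float:
    """Formula (1): alpha * CSim + (1 - alpha) * RSim."""
    return alpha * csim(name1, name2) + (1.0 - alpha) * rsim(rel1, rel2)


# Hypothetical example: two similarly named classes with different relationships.
score = struc_sim("Account", "AS", "Accounts", "DE", alpha=0.5)
```

A real class diagram has several relationships per class, so RSim would have to be aggregated over the matched relationship pairs; the single-relationship form above is only meant to make the weighting in (1) concrete.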
As shown in formula (1), α1 ∈ [0, 1] is a weight factor that determines the relative importance of the concept name similarity and the relationship similarity.

3.3. Functional Similarity Computation

The functional similarity computation is inspired by [16, 17], in which sequence diagrams are converted to an equivalent graph representation called a Message-Object-Order-Graph (MOOG). A MOOG is a directed graph with a node at each location where an event occurs (i.e. message sending or receiving). The edges of the graph denote the flow of event messages between objects or the flow of time inside each object, and they are represented as message edges or time edges, respectively. The similarity between two sequence diagrams is computed by comparing the adjacency matrices of their MOOGs. The matrix entries indicate the types of messages in the sequence diagrams.

3.4. Behavioral Similarity Computation

A state machine diagram can be converted to a labelled directed graph in which every state (except final states) in the state machine is a node in the graph and the transitions between states are the edges of the graph. All final states are represented by a single node. The graph contains four types of edges: initial edges (IE) represent a transition from the initial state, transition edges (TE) represent a transition between states in the state machine, null edges (NE) represent a transition from a composite state to its sub-states, and final edges (FE) represent a transition from any state to the final state. The transition matrices of the state machine diagrams form the basis of the comparison between two state machine diagrams, using the similarity measure in (2).

$$\mathrm{Behv\_Sim}(M_s, M_t) = \mathrm{min\_total\_sim} = \frac{\sum_{i=1}^{s} \sum_{j=1}^{t} SF\big(Tm_s(i,j),\, Tm_t(i,j)\big)}{np} + \partial \cdot \frac{um(t) + um(s)}{s} \qquad (2)$$
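The graph encoding described in Section 3.4 can be sketched in Python as follows. The sketch builds a labelled transition matrix from a list of transitions, merges all final states into one node, and tags each edge as IE, TE, NE, or FE. The input format, the reserved node names `initial` and `final`, the `composite_children` argument, and the example state names are assumptions made for illustration; the comparison function SF of (2) is not reproduced here.

```python
from typing import Dict, Iterable, List, Set, Tuple

FINAL = "final"  # assumed name of the single node that stands for all final states


def transition_matrix(
    transitions: Iterable[Tuple[str, str]],
    final_states: Set[str],
    composite_children: Dict[str, Set[str]],
    initial: str = "initial",
) -> Tuple[List[str], List[List[str]]]:
    """Return (nodes, matrix), where matrix[i][j] is "", "IE", "TE", "NE", or "FE"."""

    def node(state: str) -> str:
        # Every final state collapses into one shared node.
        return FINAL if state in final_states else state

    edges = [(node(src), node(dst)) for src, dst in transitions]

    # Nodes are ordered by first appearance so the matrix layout is deterministic.
    nodes: List[str] = []
    for src, dst in edges:
        for n in (src, dst):
            if n not in nodes:
                nodes.append(n)
    index = {n: k for k, n in enumerate(nodes)}

    matrix = [["" for _ in nodes] for _ in nodes]
    for src, dst in edges:
        if src == initial:
            label = "IE"  # transition out of the initial state
        elif dst == FINAL:
            label = "FE"  # transition into the (merged) final state
        elif dst in composite_children.get(src, set()):
            label = "NE"  # composite state to one of its sub-states
        else:
            label = "TE"  # ordinary transition between states
        matrix[index[src]][index[dst]] = label
    return nodes, matrix


# Hypothetical state machine with one composite state and two final states.
nodes, tm = transition_matrix(
    transitions=[
        ("initial", "Idle"),
        ("Idle", "Active"),
        ("Active", "Verifying"),
        ("Verifying", "Done"),
        ("Idle", "Cancelled"),
    ],
    final_states={"Done", "Cancelled"},
    composite_children={"Active": {"Verifying"}},
)
```

Two such matrices, one per state machine, are the inputs that a measure of the form (2) would compare entry by entry once a mapping between their states has been chosen.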
Ms and Mt are the two state machine diagrams to be compared, Tm_s and Tm_t are the transition matrices of s and t respectively, and ∂ is a weight that determines how the unmapped states affect the degree of similarity.

3.5. Multiview Similarity Computation

Rather than performing the similarity assessment in a single view, the Multiview technique computes the similarity of software systems by aggregating the individual software views into one single similarity measure. However, a technique which merely computes the Multiview similarity as a weighted sum of single-view similarity values would produce a wrong aggregate similarity value [8]. Consequently, we propose the simultaneous Multiview similarity assessment technique. This method computes the similarity of UML models by mapping all the UML model entities simultaneously rather than independently. The overall similarity is scaled by the inconsistency penalty factor C. The Multiview similarity measure is given in (3).

$$\mathrm{Multiview\_Sim}(S_i, S_j) = \mathrm{Simultaneous}\Big(\sum \big((w_1 \cdot \mathrm{Struc\_Sim}),\ (w_2 \cdot \mathrm{Func\_Sim}),\ (w_3 \cdot \mathrm{Behv\_Sim})\big)\Big) \cdot (1 + C) \qquad (3)$$

Struc_Sim, Func_Sim, and Behv_Sim are the structural, functional, and behavioral similarity respectively, and they represent the individual views of the software systems. w1, w2, and w3 are constants that determine the relative importance of the structural, functional, and behavioral similarity scores respectively, with wi ∈ [0, 1] such that w1 + w2 + w3 = 1. The inconsistency penalty factor C is calculated as in (4):

$$C = \frac{\text{number of unmatched objects in Struc\_sim and Func\_sim or Behv\_sim}}{\mathrm{size}(\mathrm{Struc\_sim})} \qquad (4)$$
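A minimal Python sketch of (3) and (4) follows, assuming the three per-view scores have already been computed under one shared entity mapping (the simultaneous mapping step itself is not reproduced here). The function names, the argument layout, and the example numbers are hypothetical.

```python
from typing import Tuple


def inconsistency_penalty(unmatched_objects: int, struct_size: int) -> float:
    """C of (4): unmatched objects across views divided by the size of the structural view."""
    if struct_size <= 0:
        raise ValueError("struct_size must be positive")
    return unmatched_objects / struct_size


def multiview_sim(
    struc: float,
    func: float,
    behv: float,
    weights: Tuple[float, float, float],
    penalty: float,
) -> float:
    """Weighted sum of the three view scores, scaled by (1 + C) as in (3)."""
    w1, w2, w3 = weights
    if any(not 0.0 <= w <= 1.0 for w in weights) or abs(w1 + w2 + w3 - 1.0) > 1e-9:
        raise ValueError("weights must lie in [0, 1] and sum to 1")
    weighted = w1 * struc + w2 * func + w3 * behv
    return weighted * (1.0 + penalty)


# Hypothetical example: 10 classes, 2 of which never appear in a sequence diagram.
c = inconsistency_penalty(unmatched_objects=2, struct_size=10)
score = multiview_sim(0.8, 0.6, 0.7, weights=(0.5, 0.25, 0.25), penalty=c)
```

With these example inputs the weighted sum is 0.725 and C is 0.2, so the scaled value is about 0.87. The sketch only shows the arithmetic of the aggregation; the benefit described in the text comes from computing the view scores under a single consistent mapping.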
The unmatched objects in Struc_sim and Func_sim are the classes that do not appear in sequence diagrams.

4. Experiment

This section presents an experiment to evaluate the Multiview similarity assessment technique and to compare single-view assessment with the Multiview technique. To this end, a retrieval system was implemented in the Matlab® computing language. A reuser presents a query (a new requirement) to the reuse system in XMI format; the system computes the similarity between the query and the repository projects and returns a ranked list of repository projects to the reuser. We determined how well the proposed method maps the UML diagrams (class, sequence, and state machine diagrams) in query Q to the UML diagrams in repository R.
4.1. Experimental Data

Table 2 describes the queries used, each containing class, sequence, and state machine diagrams. The query data were obtained from undergraduate students' projects. A repository of six projects containing class diagrams, sequence diagrams, and state machine diagrams was created from the queries in Table 2. The projects were created by randomly adding, changing, and/or deleting class diagram relationships, sequence diagram messages, and state machine transitions. For example, a generalization relationship can be changed to a composition relationship.

Table 2: Description of query used

Query  Description                 #class diagrams  #sequence diagrams  #state machine diagrams
Q1     ATM                         1                5                   1
Q2     Booking system              1                3                   3
Q3     Advertisement               1                17                  4
Q4     Course registration system  1                15                  2
The number of messages in the sequence diagrams ranges from 1 to 15, the number of transitions in the state machine diagrams ranges from 5 to 41, the number of classifiers in the class diagrams ranges from 3 to 22, and the number of relationships between classifiers ranges from 3 to 38.

4.2. Result and Discussions

The retrieval quality was measured using Mean Average Precision (MAP), a measure commonly used to evaluate information retrieval systems. The average precision (AP) for a given query is obtained from the precision values calculated at each point at which a new relevant project is retrieved (precision = 0 for each relevant project that is not retrieved). The Mean Average Precision for a set of queries is the mean of the AP scores of the queries, also referred to as mean precision at seen relevant projects [18]. The formula is given in (5).

$$MAP = \frac{1}{N} \sum_{j=1}^{N} \frac{1}{Q_j} \sum_{i=1}^{Q_j} P(rel = i) \qquad (5)$$

N is the number of queries, Qj is the number of relevant documents for query j, and P(rel = i) is the precision at the i-th relevant document.
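Formula (5) can be sketched in Python as follows. The sketch assumes that each query comes with a ranked list of retrieved project identifiers (without duplicates) and the set of projects that are relevant to it; relevant projects that are never retrieved contribute a precision of 0, as described above. The function names and the sample identifiers are illustrative.

```python
from typing import Hashable, List, Sequence, Set, Tuple


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """AP for one query: mean over relevant items of the precision at each retrieved one."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant item
    # Dividing by all relevant items gives unretrieved ones a precision of 0.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: List[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """MAP of (5): mean of the per-query AP values."""
    if not runs:
        return 0.0
    return sum(average_precision(ranked, relevant) for ranked, relevant in runs) / len(runs)


# Hypothetical rankings for two queries.
runs = [
    (["P1", "P3", "P2"], {"P1", "P2"}),  # AP = (1/1 + 2/3) / 2
    (["P4", "P5"], {"P5", "P6"}),        # AP = (1/2 + 0) / 2, P6 is never retrieved
]
map_score = mean_average_precision(runs)
```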
Fig. 3. (a) Mean of MAP of single view and Multiview similarity assessment method; (b) Mean of MAP of 2-views and 3-views
α of (1) is set to 0.05, ∂ of (2) is set to 0.05, while w1, w2, and w3 of (3) were set to 0.05. The MAP values for the individual similarity assessment methods and for the Multiview similarity are shown in Fig. 3(a). The MAP for the assessment methods with 2 views and 3 views is shown in Fig. 3(b). It can be observed from Fig. 3(a) that the Multiview similarity using all the methods (struc_sim+funct_sim+behv_sim) results in a better MAP than the individual single-view similarity assessment methods. The Multiview similarity assessment was able to return the most relevant project at the top rank even when the query and the relevant repository projects differ significantly. Fig. 3(b) shows that the 3-view combination (struc_sim+funct_sim+behv_sim) outperforms any combination of 2 views, with a MAP of 92%. It can also be seen that the combination of the structural and functional similarity assessment methods produces the worst MAP of all the 2-view combinations.

5. Conclusion

This paper focused on the matching of UML diagrams from multiple views. The paper described an effective way of computing the similarity between UML diagrams from the structural, functional, and behavioral views. The Multiview similarity is an aggregation of the individual views, which is then scaled using a factor called the inconsistency penalty that handles the conflicting mappings among the different views. Experimental results show that the Multiview similarity outperformed the single-view similarity. The goal of the proposed approach is to help software engineers retrieve existing software systems stored in a repository and reuse them to develop new software applications. Currently, our Multiview similarity assessment technique relies only on class, sequence, and state machine diagrams; in the future we intend to include other diagrams in order to improve the matching accuracy.
Acknowledgements

This work was supported by the Ministry of Higher Education of Malaysia, under the Fundamental Research Grant Scheme (FRGS: 203/PKOMP/6711533).

References

[1] A. Adamu and W. M. N. W. Zainon, "A Framework for Enhancing the Retrieval of UML Diagrams," in Software Reuse: Bridging with Social-Awareness, G. Kapitsaki and E. Santana de Almeida (Eds.), ICSR 2016, LNCS 9679, 2016, pp. 384-390.
[2] A. Torres, R. Galante, and M. S. Pimenta, "A synergistic model-driven approach for persistence modeling with UML," Journal of Systems and Software, vol. 84, pp. 942-957, 2011.
[3] H. O. Salami and M. A. Ahmed, "UML artifacts reuse: state of the art," arXiv preprint arXiv:1402.0157, 2014.
[4] M. Ahmed, "Towards the development of integrated reuse environments for UML artifacts," in ICSEA 2011, The Sixth International Conference on Software Engineering Advances, 2011, pp. 426-431.
[5] OMG, "Unified Modeling Language (OMG UML), Superstructure," version 2.4.1, Tech. rep., Object Management Group, 2011.
[6] F. J. Lucas, F. Molina, and A. Toval, "A systematic review of UML model consistency management," Information and Software Technology, vol. 51, pp. 1631-1645, 2009.
[7] S. Paydar and M. Kahani, "A semi-automated approach to adapt activity diagrams for new use cases," Information and Software Technology, vol. 57, pp. 543-570, 2015.
[8] H. O. Salami and M. Ahmed, "A framework for reuse of multi-view UML artifacts," arXiv preprint arXiv:1402.0160, 2014.
[9] A. Adamu and W. M. N. W. Zainon, "Matching and Retrieval of State Machine Diagrams from Software Repositories Using Cuckoo Search Algorithm," in International Conference on Information Technology, Al Zaytoonah University of Jordan, Amman, Jordan, 2017.
[10] J. Muskens, R. J. Bril, and M. R. Chaudron, "Generalizing consistency checking between software views," in Software Architecture, 2005. WICSA 2005. 5th Working IEEE/IFIP Conference on, 2005, pp. 169-180.
[11] C. F. Lange and M. R. Chaudron, "Experimentally investigating effects of defects in UML models," CS-Report, 2005.
[12] H. O. Salami and M. Ahmed, "Class Diagram Retrieval Using Genetic Algorithm," in Machine Learning and Applications (ICMLA), 2013 12th International Conference on, 2013, pp. 96-101.
[13] R. A. Rufai, "New structural similarity metrics for UML models," Citeseer, 2003.
[14] V. I. Levenshtein, "Binary codes capable of correcting deletions, insertions, and reversals," in Soviet Physics Doklady, 1966, pp. 707-710.
[15] K. Robles, A. Fraga, J. Morato, and J. Llorens, "Towards an ontology-based retrieval of UML Class Diagrams," Information and Software Technology, vol. 54, pp. 72-86, 2012.
[16] H. O. Salami and M. Ahmed, "Retrieving sequence diagrams using genetic algorithm," in Computer Science and Software Engineering (JCSSE), 2014 11th International Joint Conference on, 2014, pp. 324-330.
[17] W.-J. Park and D.-H. Bae, "A two-stage framework for UML specification matching," Information and Software Technology, vol. 53, pp. 230-244, 2011.
[18] S. Teufel, "An overview of evaluation methods in TREC ad hoc information retrieval and TREC question answering," in Evaluation of Text and Speech Systems, Springer, 2007, pp. 163-186.