Pattern Recognition Letters 24 (2003) 875–886 www.elsevier.com/locate/patrec
A rough–fuzzy approach for retrieval of candidate components for software reuse D. Vijay Rao, V.V.S. Sarma
*
Department of Computer Science and Automation, Indian Institute of Science, Bangalore 560 012, India
Abstract

Software reuse means reusing the inputs, the processes, and the outputs of previous software development efforts. Software reuse is a means toward an end: improving software development productivity and software product quality. Effective management of a large set of reusable components requires well-defined structures and processes. Without these, the reuse repository effectively becomes a write-only storage medium. The repository of reusable components is the link between development for reuse, where the components are produced, and development with reuse, where the components are reused. In this paper, we present a novel approach for selecting candidate components used in various previous projects for reuse in the current application. The methodology is based on rough–fuzzy sets, and a decision support tool based on this methodology is implemented. This method reduces the search domain and hence retrieves components more efficiently than existing methods. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Software reuse; Software component repository; Component retrieval; Rough–fuzzy hybridization; Case-based reasoning; Decision support tool
* Corresponding author. E-mail address: [email protected] (V.V.S. Sarma).
0167-8655/03/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0167-8655(02)00199-X

1. Introduction

The term "software crisis", which came into vogue in the 1960s and 1970s, was based on the observation that most software development projects end up with massive cost overruns and schedule delays. The focus of research in software engineering at that time was on planning and control of software projects (Basili and Musa, 1991). The growing complexity of software projects led to the emergence of phase-based lifecycle models such as the waterfall model and the spiral model (Boehm, 1988). In the 1980s, with the decline of hardware costs and the widespread use of software in industry and business, the importance of productivity in software development increased significantly. In the 1990s, customer focus shifted to high quality, a consequence of modern society’s ever-increasing dependence on software.

Software reuse is the process of creating new software systems from existing software entities rather than building them from scratch. Reuse has been proposed as a key method for overcoming the software crisis and improving software quality and productivity. Reusability is the degree to which a software artifact can be reused (Poulin, 1997;
Jacobson et al., 1997). To achieve significant payoffs, a reuse program must be systematic (Frakes and Isoda, 1994). Software reuse can apply to any lifecycle product and is not limited to the reuse of code (Reifer, 1997; Karlsson, 1995). Developers may reuse requirements documents, system specifications, design structures, and other development aspects (Barnes and Bollinger, 1991). In addition to these lifecycle products, processes (such as the waterfall model of software development and the SEI capability maturity model) and knowledge are also potentially reusable. Krueger’s (1992) taxonomy characterizes each reuse approach in terms of its reusable artifacts and the way these artifacts are abstracted, selected, specialized, and integrated. Weide et al. (1991) propose a framework to describe the general model of software structure with abstract and concrete components. The proposed model leads to a components industry, and considers a component-based software engineering (CBSE) that parallels that of traditional engineering disciplines (Ning, 1996).

A critical step in increasing software reuse is to recognize that a new division of labour is required, one in which component developers create reusable components and product developers compose products from these components. Cleaveland et al. (1996) introduce a new term "multiuse" to imply design for multiple uses. Multiuse has three major principles:
1. Components are designed for a range of products, rather than a single product.
2. Products are designed with reuse in mind, by favoring a compositional, rather than a decompositional, approach to design.
3. A distinction is made between the common parts of a component, which are not touched and are salvaged, and variable parts, which can be customized for a range of products.

Product development with reuse pays off in the course of time in terms of reduced development times and costs (Vijay Rao, 2001).
This benefit is to be traded off against the initial investment of additional time and resources to design reusable components and the increased testing time for components that are reused.
2. Component-based reuse: a new approach

Table 1 presents a software component’s description as stored in a database. Its general structure is shown in Table 2. A large database of several software components is under development. A case-database containing the history of reusable components has been maintained at the Defence Research and Development Organisation (DRDO) laboratory, Center for Aeronautical Systems Studies and Analyses (CASSA), for several software projects on war-gaming and simulation built over the years.

Existing approaches to software component retrieval cover a wide spectrum of component encoding methods and search or matching algorithms. These approaches strike different balances between complexity and cost on one hand and retrieval quality on the other, and are classified in increasing order of complexity:
• text-based "encoding" and retrieval,
• lexical descriptor-based encoding and retrieval,
• specification-based encoding and retrieval (Mili et al., 1995).

Examples

(a) Text-based retrieval: "Sort", "Sort + Missions", "Sort + Mission Planning" may be used as key words to retrieve relevant components from the database. Each of the searches with different key words yields a different set of components that are subsets of the actual set to be retrieved. For example, the first query yielded eight components, the second retrieved six components, whereas the third resulted in four components.

(b) Lexical descriptor-based retrieval: Each component Q is assigned a key phrase K that says what the component is about, e.g., IsAbout(K, Q), as in IsAbout(Sorting, Q1), IsAbout(Mission_Planning, Q2). Other possible key phrases are OperatesOn(Mission_Plans, Q3) and Uses(Array, Q4). OperatesOn and Uses are descriptors of the components stored as keywords in the database.

(c) Specification-based retrieval: Associated with every component Q is a signature (static), Qsig,
Table 1. Information of a component stored in a case-base

| No. | Attribute | Data for component Mission_Plan |
|---|---|---|
| 1 | Component name | MPlan_Sort |
| 2 | Alternative name | Sort_Mission_Plans |
| 3 | Size (KLOC) | 2.5 |
| 4 | Language used | Visual Basic |
| 5 | Platform developed on | MS-Windows |
| 6 | Interface(s) with arguments | Mplan_Sort(int MNo, char Mtype, int M_ETD); Sort_Mission_Plans(int MNo, char MType, int M_ETD); Insert_Mission(int MNo, char MType); Delete_Mission(int MNo, char MType); Modify_Mission(int MNo, char MType); View_Mission(int MNo, char MType); Print_Mission(int MNo, char MType) |
| 7 | Sample usage | MPlan_Sort(528, Strike, 0800) |
| 8 | Complexity (McCabe’s measure) | 7 |
| 9 | Functional usage profile | High |
| 10 | Criticality in application | High |
| 11 | Domain | Product-line (PL) |
| 12 | Designed for reuse | Yes |
| 13 | Specifications | Component sorts the mission plans in war-game applications |
| 14 | Certification | Yes |
| 15 | Test plan/report | File link to TPlan_MPlan_Sort |
| 16 | Known bugs | Fails for negative mission numbers |
| 17 | Limitations | Component designed for 800 missions |
| 18 | Quality | High |
| 19 | ⟨Pre⟩ Specification ⟨Post⟩ conditions | ⟨Unsorted list of missions⟩ MPlan_Sort ⟨Sorted list of missions⟩ |
| 20 | History of component’s reuse | File link to History_MPlan_Sort |
| 21 | Remarks | Component is an array-based implementation |
| 22 | Effort | 12 person-weeks |
| 23 | Developed by | DVR, GKM, CHS, SRD in May 1991 |
| 24 | Distribution/availability | DRDO laboratories; and others on request |
| 25 | Price | Developed in-house |
| 26 | Performance | Performs well up to 680 missions |
| 27 | Algorithmic complexity | n log n |
| 28 | Resource utilization | High |
| 29 | Test coverage | File link to TPlan_MPlan_Sort |
| 30 | Inheritance/dynamic usage | None |
| 31 | Data structures | Static [1..800] |
| 32 | Flexibility and interoperability | High |
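and a specification of its behavior (dynamic), Qspec.

\[ Q_{sig}:\quad \mathrm{Match}(Q, Q') = \mathrm{Match}(\text{return type and parameters of } Q \text{ and } Q') \]

\[ Q_{spec}:\quad \mathrm{Match}(Q, Q') = \langle \text{Unsorted array} \rangle \ \mathrm{Mplan\_Sort} \ \langle \text{Sorted array} \rangle \]

int array2[] = Mplan_Sort(int array1[], char, int)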
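The three retrieval styles in examples (a) to (c) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' tool: the `Component` record, its field names, the three search functions, and the second repository entry (including its description and `void` return type) are hypothetical. Only the MPlan_Sort description, descriptors, and signature follow the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical repository record; the field names are assumptions."""
    name: str
    text: str                                        # free text used for keyword search
    descriptors: set = field(default_factory=set)    # (relation, key phrase) pairs
    signature: tuple = ()                            # (return type, parameter types)


def text_search(components, keywords):
    """(a) Text-based: keep components whose text contains every keyword."""
    kws = [k.lower() for k in keywords]
    return [c.name for c in components if all(k in c.text.lower() for k in kws)]


def descriptor_search(components, relation, phrase):
    """(b) Lexical descriptor-based: exact match on a (relation, phrase) pair."""
    return [c.name for c in components if (relation, phrase) in c.descriptors]


def signature_search(components, return_type, params):
    """(c) Specification-based, signature part only: match return type and parameters."""
    wanted = (return_type, tuple(params))
    return [c.name for c in components if c.signature == wanted]


repo = [
    Component(
        "MPlan_Sort",
        "Component sorts the mission plans in war-game applications",
        {("IsAbout", "Sorting"), ("OperatesOn", "Mission_Plans"), ("Uses", "Array")},
        ("int[]", ("int[]", "char", "int")),
    ),
    # Hypothetical second entry, included only so that the queries discriminate.
    Component(
        "Print_Mission",
        "Prints a single mission record",
        {("IsAbout", "Printing")},
        ("void", ("int", "char")),
    ),
]

by_text = text_search(repo, ["sort", "mission"])
by_descriptor = descriptor_search(repo, "Uses", "Array")
by_signature = signature_search(repo, "int[]", ["int[]", "char", "int"])
```

Matching on the behavioral specification (the pre- and post-conditions) would need a formal specification language and is not attempted in this sketch.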
With text-based encoding and retrieval, the textual representation of a component is used as an implicit functional descriptor. Arbitrarily complex string search expressions supplied by the reuser are matched against the textual representation. With lexical descriptor-based encoding, each component is assigned a set of key phrases that tell what the component is about. Specification-based encoding and retrieval comes closest to achieving full equivalence between what a component is and does and how it is encoded. With text and lexical descriptor-based methods, retrieval algorithms treat queries and codes as mere symbols, and any meaning assigned to queries, component codes, or the extent of match between them is external to the encoding language. Further, being natural language-based, the codes are inherently ambiguous and imprecise. By contrast, specification languages have their own semantics within which the fitness of a component to a query can be formally interpreted. Mili et al. (1997) describe a method for organizing and retrieving software components that uses relational specifications of programs and a refinement ordering between them. Damiani et al. (1999) describe a hierarchy-aware classification schema for object-oriented code,
where software components are classified according to their behavioral characteristics, such as provided services, employed algorithms, and needed data.

In most situations, the designer faces two major issues: how to deal with unknown problem features, and how to make decisions in the presence of these unknowns. Our methodology views retrieving and selecting components from a case-based repository as a decision problem. If the case (component) retrieved is very similar to the problem being solved (selecting a component needed for reuse), adaptation and testing will be easy. case retrieval process. In the case study of Section 4, Table 4 shows different applications in which component C was reused. Observe that for the same attributes in different applications, such as C2 and C6, the judgment regarding the component reuse is different, and this gives rise to roughness in the decision. The attributes themselves are fuzzy linguistic variables (such as low, medium, high), and hence rough–fuzzy sets provide an appropriate modeling framework. Rough–fuzzy sets (Dubois and Prade, 1990, 1992) are used as a basis for decision-making in this context.

In this paper, a case-based reasoning (CBR) system is proposed as a repository of reusable software components. Every reusable component with its design attributes and specifications is used as a case in a CBR system. The retrieval of components is based on their match in behavior/specification/reuse history. The component match in several applications is classified using rough–fuzzy sets, based on the similarity of the attributes. A decision based on rough–fuzzy sets is applied to the history of a component’s use in different applications to determine its potential reuse in the current application.

Table 2. Information of a component stored in a case-base

| No. | Attribute | Search method |
|---|---|---|
| 1 | Component name | [String]: [Text; Lexical] |
| 2 | Alternative name | [String]: [Text; Lexical] |
| 3 | Size (KLOC) | [Num]: [Lexical; Specification] |
| 4 | Language used | [String]: [Lexical; Specification] |
| 5 | Platform developed on | [String]: [Lexical; Specification] |
| 6 | Interface(s) with arguments | [String]: [Lexical; Specification] |
| 7 | Sample usage | [String]: [Lexical; Specification] |
| 8 | Complexity | [Num]: [Lexical; Specification] |
| 9 | Functional usage profile | [String]: [Lexical; Specification] |
| 10 | Criticality in application | [String]: [Lexical; Specification] |
| 11 | Domain | [String]: [Lexical; Specification] |
| 12 | Designed for reuse | [String]: [Lexical; Specification] |
| 13 | Specifications | [String]: [Lexical; Specification] |
| 14 | Certification | [String]: [Lexical; Specification] |
| 15 | Test plan/report | [Text]: [Lexical; Specification] |
| 16 | Known bugs | [Text]: [Lexical; Specification] |
| 17 | Limitations | [String]: [Lexical; Specification] |
| 18 | Quality | [String]: [Lexical; Specification] |
| 19 | ⟨Pre⟩ Specification ⟨Post⟩ conditions | [String]: [Lexical; Specification] |
| 20 | History of component’s reuse | [Text]: [Lexical; Specification] |
| 21 | Remarks | [Text]: [Lexical; Specification] |
| 22 | Effort (pm) | [Num]: [Lexical; Specification] |
| 23 | Developed by | [Text]: [Lexical; Specification] |
| 24 | Distribution/availability | [Text]: [Lexical; Specification] |
| 25 | Price | [Num]: [Lexical; Specification] |
| 26 | Performance | [String]: [Lexical; Specification] |
| 27 | Algorithmic complexity | [Num; String]: [Lexical; Specification] |
| 28 | Resource utilization | [String]: [Lexical; Specification] |
| 29 | Test coverage (statement coverage, branch coverage, definition-use) | [String]: [Lexical; Specification] |
| 30 | Inheritance/dynamic usage | [String]: [Lexical; Specification] |
| 31 | Data structures: static and bounds/limits; dynamic and bounds/limits | [Num; String]: [Lexical; Specification] |
| 32 | Flexibility and interoperability | [Text]: [Lexical; Specification] |
3. A rough–fuzzy set methodology to the component retrieval problem

In any classification task the aim is to form various classes where each class contains objects that are not noticeably different. These indiscernible or non-distinguishable objects can be viewed as basic building blocks (concepts) used to build up a knowledge base about the real world. For example, if the objects are classified according to color (red, black) and shape (triangle, square, circle), then the classes are red triangle, red square, red circle, black triangle, black square, black circle. Thus, these two attributes make a ‘partition’ in the set of objects and the universe becomes coarse. But if two red triangles of different sizes (small, big) belong to different classes, then it is not possible to correctly classify these two red triangles based on the two attributes of color and shape. This kind of uncertainty is referred to as rough uncertainty (Pawlak, 1982; Pawlak et al., 1995). The rough uncertainty is formulated in terms of rough sets.

Fuzzy sets, a generalization of classical sets, are a mathematical tool to model the vagueness present in the human classification mechanism. In classical set theory, an element of the universe either belongs to or does not belong to a set (i.e. crisp belongingness). In fuzzy sets, the belongingness of the element can be anything in between ‘yes’ and ‘no’ (Klir and Folger, 1988; Zadeh, 1965, 1978). In fuzzy sets, each granule of knowledge can have only one membership value to a particular class. Rough sets assert that each granule may have different membership values to the same class. Fuzzy sets deal with overlapping classes and fine concepts, whereas rough sets deal with non-overlapping classes and coarse concepts.

In a classification task, the indiscernibility relation partitions the input pattern set to form several equivalence classes. These equivalence classes try to approximate the given output class. When this approximation is not exact, roughness is generated. However, the output classes (concepts) may have ill-defined boundaries. Thus, both roughness and fuzziness appear here, due to the indiscernibility relation in the input pattern set (attributes) and the vagueness in the output class (decisions) respectively.
To model such situations where both vagueness and approximation are present, rough–fuzzy sets are proposed (Dubois and Prade, 1992).
3.1. Rough–fuzzy sets

Let $X$ be a set, $R$ be an equivalence relation defined on $X$, and the output class $D_c \subseteq X$ be a fuzzy set. A rough–fuzzy set (Dubois and Prade, 1992; Sarkar, 1998) is a tuple

\[ R(D_c) = \langle \underline{R}(D_c), \overline{R}(D_c) \rangle, \tag{1} \]

where the lower approximation $\underline{R}(D_c)$ and the upper approximation $\overline{R}(D_c)$ of $D_c$ are fuzzy sets of $X/R$, with their corresponding membership functions defined by

\[ \mu_{\underline{R}(D_c)}([X]_R) = \inf \{ \mu_{D_c}(x) \mid x \in [X]_R \} \quad \forall x \in X, \tag{2} \]

and

\[ \mu_{\overline{R}(D_c)}([X]_R) = \sup \{ \mu_{D_c}(x) \mid x \in [X]_R \} \quad \forall x \in X. \tag{3} \]

The rough–fuzzy membership function of a component $x \in X$ for the fuzzy output class $D_c \subseteq X$ is defined by

\[ \mu_{D_c}(x) = \frac{|F \cap D_c|}{|F|}, \tag{4} \]

where $F = [x]_R$ and $|D_c|$ means the cardinality of the fuzzy set $D_c$ and is defined as

\[ |D_c| = \sum_{x \in X} \mu_{D_c}(x). \tag{5} \]

The intersection operation on $A$ and $B$ is defined as

\[ \mu_{A \cap B}(x) = \min \{ \mu_A(x), \mu_B(x) \}, \quad \forall x \in X. \tag{6} \]
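As a worked step that the text leaves implicit, Eq. (4) can be simplified when the equivalence class $F = [x]_R$ is a crisp set. The sketch below assumes that $\mu_F(y) = 1$ for $y \in F$ and $0$ otherwise, and that all fuzzy memberships lie in $[0, 1]$; the left-hand side is the rough–fuzzy membership, while $\mu_{D_c}(y)$ on the right is the fuzzy membership of $y$ in the output class.

```latex
\[
\mu_{D_c}(x)
  = \frac{|F \cap D_c|}{|F|}
  = \frac{\sum_{y \in X} \min\{\mu_F(y), \mu_{D_c}(y)\}}{\sum_{y \in X} \mu_F(y)}
  = \frac{1}{|F|} \sum_{y \in F} \mu_{D_c}(y),
  \qquad F = [x]_R .
\]
\[
\mu_{\underline{R}(D_c)}(F) \;\le\; \mu_{D_c}(x) \;\le\; \mu_{\overline{R}(D_c)}(F).
\]
```

Under these assumptions the rough–fuzzy membership is the average of the fuzzy memberships over the equivalence class, so it always lies between the lower and upper approximation values given by Eqs. (2) and (3).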
4. A case study

Of the 32 attributes in the information table (Table 2), we selected 8 attributes of the past applications of component C (specifications, adaptability for reuse, domain, quality, functional usage, criticality, resource utilization, and certified) (Table 3). The selection of an optimal subset of features is a non-trivial problem in pattern recognition. The choice at the moment is purely subjective. Each of these features is searched, matched, and retrieved by the CBR (Fig. 1). The applications of a component C labeled C1 to Cn that match the attributes of the current application
Table 3. Attributes considered for rough–fuzzy sets and their descriptions

| No. | Attribute | Linguistic (match) variable | Description |
|---|---|---|---|
| 1 | Specification | Low | Less than 30% of functional requirements are met |
| | | Medium | 30–70% of functional requirements are met |
| | | High | More than 70% of functional requirements are met |
| 2 | Quality | Low | Defect density >10 defects/KLOC |
| | | Medium | Defect density 5–10 defects/KLOC |
| | | High | Defect density <5 defects/KLOC |
| 3 | Functional usage | Low | Called <15 times |
| | | Medium | Called 15–45 times |
| | | High | Called >45 times |
| 4 | Criticality | Low | Negligible effect on system functioning |
| | | Medium | Degraded performance (settings/probability values) |
| | | High | Halts the system from working, catastrophic failures |
| 5 | Domain | PL | Product-line |
| | | BC | Business/commercial |
| | | AF | Allied field |
| | | MD | Miscellaneous domain |
| 6 | Adaptable for reuse | Low | >10 person-weeks to adapt |
| | | Medium | 4–10 person-weeks to adapt |
| | | High | <4 person-weeks to adapt |
| 7 | Resource utilization | Low, Medium, High | |
| 8 | Certified | Yes | Component is certified |
| | | No | If not certified |
| | Reuse decision | As-is (D1) | Reused with no additional effort to adapt |
| | | Adapt-comp (D2) | Reused with change in code |
| | | Adapt-specs (D3) | Reused by changing the requirement specification |
| | | New (D4) | Develop afresh |
Fig. 1. Case (component) retrieval as a decision process based on rough–fuzzy sets.
are retrieved. The decision that was taken for reusing the component C in an application is shown in the "reuse/adapt decision" column denoting the level of reuse. Consider a search query for the component’s application history that resulted in 10 matches, C1–C10. The 10 candidate applications returned by this search are displayed by the CBR. The developer who wants to reuse the component in the present application has to decide on the past application(s) that best match the present application. These candidate applications (obtained by the CBR search) are stored in the information table with decisions on the level of reuse (Table 4) (Fig. 2) as
(a) reused as is (as-is)
(b) adapting the component (adapt-comp) (with minor code modification)
Table 4. Information table for 10 applications (from CBR search results)

| Application | Specs match | Adapt for reuse | Domain | Quality | Functional usage | Criticality | Resource utilization | Certified | Reuse/adapt in application |
|---|---|---|---|---|---|---|---|---|---|
| C1 | High | High | PL | High | High | High | Low | Yes | As-is |
| C2 | High | High | AF | High | High | Medium | Low | Yes | Adapt-comp |
| C3 | Medium | High | PL | Medium | High | Medium | Low | Yes | As-is |
| C4 | Medium | High | PL | Medium | High | Medium | Low | Yes | Adapt-comp |
| C5 | Low | Low | BC | Medium | Medium | Medium | Low | Yes | New |
| C6 | High | High | AF | High | High | Medium | Low | Yes | As-is |
| C7 | Low | High | MD | High | High | Medium | High | Yes | Adapt-specs |
| C8 | High | Low | AF | High | High | Medium | Low | Yes | Adapt-comp |
| C9 | High | Low | AF | High | High | Medium | Low | Yes | Adapt-comp |
| C10 | High | Low | AF | High | High | Medium | Low | Yes | Adapt-comp |

Fig. 2. Software components retrieval process.
(c) adapting the specification (adapt-specs) (with design modification)
(d) develop afresh (new).

In the information table, roughness occurs because of different decisions on the level of reuse taken for two similar applications, whereas fuzziness in the decision (on the level of reuse) occurs due to overlapping, ill-defined boundaries. Table 5 shows the judgment of a software developer regarding the level of reuse of component C in 10 applications. If it is a crisp decision that the component should be reused ‘as-is’, then D1 = 1, D2 = 0, D3 = 0, D4 = 0. If the decision is ambiguous, the developer would favor each possible decision with a fuzzy membership value such as D1 = 0.9, D2 = 0.3, D3 = 0.2, D4 = 0.1, which reflects the ambiguity in the judgment about the reuse of the component in the database. In general, such decisions are usually seen to be fuzzy. With a reuse history of a component in 10 past applications in the case-base, using rough–fuzzy sets we compute the lower and upper membership values of similar applications. Equivalence classes for the 10 applications that have the same attributes are

{{C1}, {C2, C6}, {C3, C4}, {C5}, {C7}, {C8, C9, C10}}.
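The equivalence classes listed above follow from grouping applications that share the same eight attribute values. A minimal Python sketch of that grouping is given below, using the rows of Table 4. The names `TABLE_4` and `equivalence_classes` are illustrative assumptions and are not taken from RuFTool.

```python
from collections import defaultdict

# Rows of Table 4: (specs match, adapt for reuse, domain, quality,
# functional usage, criticality, resource utilization, certified).
TABLE_4 = {
    "C1": ("High", "High", "PL", "High", "High", "High", "Low", "Yes"),
    "C2": ("High", "High", "AF", "High", "High", "Medium", "Low", "Yes"),
    "C3": ("Medium", "High", "PL", "Medium", "High", "Medium", "Low", "Yes"),
    "C4": ("Medium", "High", "PL", "Medium", "High", "Medium", "Low", "Yes"),
    "C5": ("Low", "Low", "BC", "Medium", "Medium", "Medium", "Low", "Yes"),
    "C6": ("High", "High", "AF", "High", "High", "Medium", "Low", "Yes"),
    "C7": ("Low", "High", "MD", "High", "High", "Medium", "High", "Yes"),
    "C8": ("High", "Low", "AF", "High", "High", "Medium", "Low", "Yes"),
    "C9": ("High", "Low", "AF", "High", "High", "Medium", "Low", "Yes"),
    "C10": ("High", "Low", "AF", "High", "High", "Medium", "Low", "Yes"),
}


def equivalence_classes(table):
    """Group applications whose attribute tuples are identical (indiscernible)."""
    groups = defaultdict(list)
    for app, attrs in table.items():
        groups[attrs].append(app)
    # Classes appear in order of their first member because dicts keep insertion order.
    return list(groups.values())


classes = equivalence_classes(TABLE_4)
```

The reuse decision column is deliberately left out of the grouping key: indiscernibility is defined on the condition attributes only, which is what allows C2 and C6 to share a class while carrying different decisions.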
For decision D1 (as-is reuse) the membership values of the equivalence classes are calculated using Eqs. (2) and (3) and Table 5. For the lower approximation,

\[
\begin{aligned}
\mu_{\underline{R}(D_1)}(\{C_1\}) &= \inf(0.9) = 0.9, \\
\mu_{\underline{R}(D_1)}(\{C_2, C_6\}) &= \inf(0.3, 0.8) = 0.3, \\
\mu_{\underline{R}(D_1)}(\{C_3, C_4\}) &= \inf(0.9, 0.6) = 0.6, \\
\mu_{\underline{R}(D_1)}(\{C_5\}) &= 0.1, \\
\mu_{\underline{R}(D_1)}(\{C_7\}) &= 0, \\
\mu_{\underline{R}(D_1)}(\{C_8, C_9, C_{10}\}) &= \inf(0.3, 0.3, 0.3) = 0.3.
\end{aligned}
\tag{7}
\]

Therefore, $\mu_{\underline{R}(D_1)}([X]_R) = (0.9, 0.3, 0.6, 0.1, 0, 0.3)$. For the upper approximation,

\[
\begin{aligned}
\mu_{\overline{R}(D_1)}(\{C_1\}) &= \sup(0.9) = 0.9, \\
\mu_{\overline{R}(D_1)}(\{C_2, C_6\}) &= \sup(0.3, 0.8) = 0.8, \\
\mu_{\overline{R}(D_1)}(\{C_3, C_4\}) &= \sup(0.9, 0.6) = 0.9, \\
\mu_{\overline{R}(D_1)}(\{C_5\}) &= 0.1, \\
\mu_{\overline{R}(D_1)}(\{C_7\}) &= 0, \\
\mu_{\overline{R}(D_1)}(\{C_8, C_9, C_{10}\}) &= \sup(0.3, 0.3, 0.3) = 0.3.
\end{aligned}
\tag{8}
\]

Therefore, $\mu_{\overline{R}(D_1)}([X]_R) = (0.9, 0.8, 0.9, 0.1, 0, 0.3)$. Similarly, we calculate for decisions D2, D3, and D4 to get

\[
\begin{aligned}
\mu_{\underline{R}(D_1)}([X]_R) &= (0.9, 0.3, 0.6, 0.1, 0, 0.3), & \mu_{\overline{R}(D_1)}([X]_R) &= (0.9, 0.8, 0.9, 0.1, 0, 0.3), \\
\mu_{\underline{R}(D_2)}([X]_R) &= (0.3, 0.3, 0.4, 0.2, 0.2, 0.9), & \mu_{\overline{R}(D_2)}([X]_R) &= (0.3, 0.9, 0.6, 0.2, 0.2, 0.9), \\
\mu_{\underline{R}(D_3)}([X]_R) &= (0.2, 0.2, 0.3, 0.3, 0.9, 0.3), & \mu_{\overline{R}(D_3)}([X]_R) &= (0.2, 0.3, 0.3, 0.3, 0.9, 0.3), \\
\mu_{\underline{R}(D_4)}([X]_R) &= (0.1, 0.1, 0.1, 0.9, 0.3, 0.1), & \mu_{\overline{R}(D_4)}([X]_R) &= (0.1, 0.1, 0.1, 0.9, 0.3, 0.1).
\end{aligned}
\]

Table 5. Judgment of a software developer regarding reuse of a component (10 applications)

| Application | As-is (D1) | Adapt-comp (D2) | Adapt-specs (D3) | New (D4) | Actual decision taken |
|---|---|---|---|---|---|
| C1 | 0.9 | 0.3 | 0.2 | 0.1 | D1 |
| C2 | 0.3 | 0.9 | 0.3 | 0.1 | D2 |
| C3 | 0.9 | 0.4 | 0.3 | 0.1 | D1 |
| C4 | 0.6 | 0.6 | 0.3 | 0.1 | D2 |
| C5 | 0.1 | 0.2 | 0.3 | 0.9 | D4 |
| C6 | 0.8 | 0.3 | 0.2 | 0.1 | D1 |
| C7 | 0 | 0.2 | 0.9 | 0.3 | D3 |
| C8 | 0.3 | 0.9 | 0.3 | 0.1 | D2 |
| C9 | 0.3 | 0.9 | 0.3 | 0.1 | D2 |
| C10 | 0.3 | 0.9 | 0.3 | 0.1 | D2 |

We need to determine those components that match the application’s attributes and hence
compute the lower and upper approximation of the rough–fuzzy set (information table) for the decisions denoting the level of reuse. The elements of the decisions (concepts) are

D1 = {C1, C3, C6}
D2 = {C2, C4, C8, C9, C10}
D3 = {C7}
D4 = {C5}.

This is needed to determine the level of reuse of the component in a new application. Consider four new applications N1, N2, N3, N4 with the eight attributes enumerated as

N1: High, High, AF, High, High, Medium, Low, Yes.
N2: High, Low, MD, Low, Low, High, High, Yes.
N3: Medium, High, PL, Medium, High, Medium, Low, Yes.
N4: High, Low, AF, High, High, Medium, Low, Yes.

The rough–fuzzy membership function (Eq. (4)) (Sarkar, 1998) is used to compute the membership value of a component’s applications for decisions D1, D2, D3, D4. Using Table 5, column 1 data, we have

\[ C_{D_1} = \{(C_1, 0.9), (C_2, 0.3), (C_3, 0.9), (C_4, 0.6), (C_5, 0.1), (C_6, 0.8), (C_7, 0), (C_8, 0.3), (C_9, 0.3), (C_{10}, 0.3)\}. \]

\[
\begin{aligned}
\mu_{D_1}(C_1) &= \frac{|\{C_1\} \cap C_{D_1}|}{|F|} = \frac{\sum_{x \in X} \min\{\mu_{C_1}(x), \mu_{C_{D_1}}(x)\}}{1}
\end{aligned}
\tag{9}
\]

\[
\begin{aligned}
&= \min\{\mu_{C_1}(C_1), \mu_{C_{D_1}}(C_1)\} + \cdots + \min\{\mu_{C_1}(C_{10}), \mu_{C_{D_1}}(C_{10})\} \\
&= \min\{1, 0.9\} + \min\{0, 0.3\} + \min\{0, 0.9\} + \min\{0, 0.6\} + \min\{0, 0.1\} \\
&\quad + \min\{0, 0.8\} + \min\{0, 0\} + \min\{0, 0.3\} + \min\{0, 0.3\} + \min\{0, 0.3\} \\
&= 0.9 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 = 0.9.
\end{aligned}
\tag{10}
\]

In a similar way, the other values are calculated. These are summarized in Table 6.

Table 6. Rough–fuzzy membership values for decisions (10 applications)

| No. | Equivalence classes | μ_D1 | μ_D2 | μ_D3 | μ_D4 |
|---|---|---|---|---|---|
| 1 | C1 | 0.9 | 0.3 | 0.2 | 0.1 |
| 2 | C2, C6 | 0.55 | 0.6 | 0.25 | 0.1 |
| 3 | C3, C4 | 0.75 | 0.5 | 0.3 | 0.1 |
| 4 | C5 | 0.1 | 0.2 | 0.3 | 0.9 |
| 5 | C7 | 0 | 0.2 | 0.9 | 0.3 |
| 6 | C8, C9, C10 | 0.3 | 0.9 | 0.3 | 0.1 |
| N1 | C2, C6 | 0.55 | 0.6 | 0.25 | 0.1 |
| N2 | – | – | – | – | – |
| N3 | C3, C4 | 0.75 | 0.5 | 0.3 | 0.1 |
| N4 | C8, C9, C10 | 0.3 | 0.9 | 0.3 | 0.1 |

The attributes of the new application N1 are matched with the past applications. The attributes of N1 match exactly with the equivalence class {C2, C6}. For this new application, we obtain the membership values for the decisions D1, D2, D3, D4 to determine the level of reuse of the component C’s applications {C2, C6} from Table 6. This indicates that the new application N1’s requirements are similar to applications C2 and C6, and the component can be reused as-is (decision D1) with membership value 0.55; adapt-comp (decision D2) with membership value 0.6; adapt-specs (decision D3) with membership value 0.25; and new (decision D4) with membership value 0.1. Application N2 does not match with any of the existing applications and hence it is decided to develop new
(decision D4). Similarly for applications N3 and N4 the values are calculated. The results are summarized in Table 6.

An important decision that arises at this point is whether the components arising from D2, D3, and D4 decisions are introduced into the components case-database, and whether they would replace or coexist with the existing (versions of) components. This question is being addressed in a separate forthcoming paper. In this paper, for simplicity, we assume that they are all added to coexist with the existing components for future reuse.

As the number of applications of component C increases, the level of reuse of components in new applications also changes. Consider the case when 20 applications of the same component C are available in the case-base. The information table with 20 applications is shown in Table 7. The membership values for decisions are shown in Table 8. Equivalence classes for the 20 applications that have the same attributes are

{{C1}, {C2, C6}, {C3, C4, C19}, {C5, C11}, {C7, C17}, {C8, C9, C10, C20}, {C12, C18}, {C13}, {C14, C16}, {C15}}.

The decision sets are

D1 = {C1, C3, C6, C16, C18, C19, C20}
D2 = {C2, C4, C8, C9, C10, C17}
D3 = {C7, C11}
D4 = {C5, C12, C13, C14, C15}.

Proceeding similarly as above, the lower and upper membership values are calculated and summarized:

\[
\begin{aligned}
\mu_{\underline{R}(D_1)}([X]_R) &= (0.9, 0.3, 0.6, 0.1, 0, 0.3, 0, 0, 0, 0), \\
\mu_{\overline{R}(D_1)}([X]_R) &= (0.9, 0.8, 0.9, 0.1, 0.3, 0.9, 0.9, 0, 0.9, 0), \\
\mu_{\underline{R}(D_2)}([X]_R) &= (0.3, 0.3, 0.4, 0.2, 0.2, 0.4, 0.1, 0.1, 0.2, 0.1), \\
\mu_{\overline{R}(D_2)}([X]_R) &= (0.3, 0.9, 0.6, 0.2, 0.9, 0.9, 0.6, 0.1, 0.3, 0.1), \\
\mu_{\underline{R}(D_3)}([X]_R) &= (0.2, 0.2, 0.3, 0.3, 0.3, 0.2, 0.2, 0.4, 0.2, 0.2), \\
\mu_{\overline{R}(D_3)}([X]_R) &= (0.2, 0.3, 0.3, 0.9, 0.9, 0.3, 0.3, 0.4, 0.4, 0.2), \\
\mu_{\underline{R}(D_4)}([X]_R) &= (0.1, 0.1, 0, 0.3, 0.1, 0, 0, 0.9, 0, 0.9), \\
\mu_{\overline{R}(D_4)}([X]_R) &= (0.1, 0.1, 0.1, 0.9, 0.3, 0.1, 0.9, 0.9, 0.9, 0.9).
\end{aligned}
\]
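The lower and upper membership values for the 20-application case can be recomputed directly from Eqs. (2) and (3). The Python sketch below assumes the membership values of Table 8 and the ten equivalence classes given in the text; on finite classes, inf and sup reduce to `min` and `max`. The names `MU`, `CLASSES`, and `lower_upper` are illustrative only.

```python
# Fuzzy membership values from Table 8, listed in order for C1..C20.
MU = {
    "D1": [0.9, 0.3, 0.9, 0.6, 0.1, 0.8, 0, 0.3, 0.3, 0.3,
           0.1, 0, 0, 0, 0, 0.9, 0.3, 0.9, 0.9, 0.9],
    "D2": [0.3, 0.9, 0.4, 0.6, 0.2, 0.3, 0.2, 0.9, 0.9, 0.9,
           0.2, 0.1, 0.1, 0.2, 0.1, 0.3, 0.9, 0.6, 0.6, 0.4],
    "D3": [0.2, 0.3, 0.3, 0.3, 0.3, 0.2, 0.9, 0.3, 0.3, 0.3,
           0.9, 0.2, 0.4, 0.4, 0.2, 0.2, 0.3, 0.3, 0.3, 0.2],
    "D4": [0.1, 0.1, 0.1, 0.1, 0.9, 0.1, 0.3, 0.1, 0.1, 0.1,
           0.3, 0.9, 0.9, 0.9, 0.9, 0, 0.1, 0, 0, 0],
}

# Equivalence classes of the 20 applications, as 1-based application numbers.
CLASSES = [[1], [2, 6], [3, 4, 19], [5, 11], [7, 17],
           [8, 9, 10, 20], [12, 18], [13], [14, 16], [15]]


def lower_upper(mu, classes):
    """Eqs. (2) and (3): per-class infimum and supremum of the memberships."""
    lower = [min(mu[i - 1] for i in cls) for cls in classes]
    upper = [max(mu[i - 1] for i in cls) for cls in classes]
    return lower, upper


approx = {decision: lower_upper(mu, CLASSES) for decision, mu in MU.items()}
d1_lower, d1_upper = approx["D1"]
```

A class whose lower and upper values differ, such as {C2, C6} for D1 (0.3 versus 0.8), is one where past judgments on indiscernible applications disagree, which is the roughness the method is meant to capture.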
Table 7. Information table for 20 applications (from CBR search results)

| Application | Specs match | Adapt for reuse | Domain | Quality | Functional usage | Criticality | Resource utilization | Certified | Reuse/adapt in application |
|---|---|---|---|---|---|---|---|---|---|
| C1 | High | High | PL | High | High | High | Low | Yes | As-is |
| C2 | High | High | AF | High | High | Medium | Low | Yes | Adapt-comp |
| C3 | Medium | High | PL | Medium | High | Medium | Low | Yes | As-is |
| C4 | Medium | High | PL | Medium | High | Medium | Low | Yes | Adapt-comp |
| C5 | Low | Low | BC | Medium | Medium | Medium | Low | Yes | New |
| C6 | High | High | AF | High | High | Medium | Low | Yes | As-is |
| C7 | Low | High | MD | High | High | Medium | High | Yes | Adapt-specs |
| C8 | High | Low | AF | High | High | Medium | Low | Yes | Adapt-comp |
| C9 | High | Low | AF | High | High | Medium | Low | Yes | Adapt-comp |
| C10 | High | Low | AF | High | High | Medium | Low | Yes | Adapt-comp |
| C11 | Low | Low | BC | Medium | Medium | Medium | Low | Yes | Adapt-specs |
| C12 | High | Low | MD | Low | Low | High | High | Yes | New |
| C13 | Low | Low | BC | High | High | High | High | No | New |
| C14 | Medium | Low | PL | Low | Low | Low | High | No | New |
| C15 | Medium | Low | MD | Low | Medium | Medium | High | No | New |
| C16 | Medium | Low | PL | Low | Low | Low | High | No | As-is |
| C17 | Low | High | MD | High | High | Medium | High | Yes | Adapt-comp |
| C18 | High | Low | MD | Low | Low | High | High | Yes | As-is |
| C19 | Medium | High | PL | Medium | High | Medium | Low | Yes | As-is |
| C20 | High | Low | AF | High | High | Medium | Low | Yes | As-is |
Table 8. Judgment of a software developer regarding reuse of a component (20 applications)

| Application | As-is (D1) | Adapt-comp (D2) | Adapt-specs (D3) | New (D4) | Actual decision taken |
|---|---|---|---|---|---|
| C1 | 0.9 | 0.3 | 0.2 | 0.1 | D1 |
| C2 | 0.3 | 0.9 | 0.3 | 0.1 | D2 |
| C3 | 0.9 | 0.4 | 0.3 | 0.1 | D1 |
| C4 | 0.6 | 0.6 | 0.3 | 0.1 | D2 |
| C5 | 0.1 | 0.2 | 0.3 | 0.9 | D4 |
| C6 | 0.8 | 0.3 | 0.2 | 0.1 | D1 |
| C7 | 0 | 0.2 | 0.9 | 0.3 | D3 |
| C8 | 0.3 | 0.9 | 0.3 | 0.1 | D2 |
| C9 | 0.3 | 0.9 | 0.3 | 0.1 | D2 |
| C10 | 0.3 | 0.9 | 0.3 | 0.1 | D2 |
| C11 | 0.1 | 0.2 | 0.9 | 0.3 | D3 |
| C12 | 0 | 0.1 | 0.2 | 0.9 | D4 |
| C13 | 0 | 0.1 | 0.4 | 0.9 | D4 |
| C14 | 0 | 0.2 | 0.4 | 0.9 | D4 |
| C15 | 0 | 0.1 | 0.2 | 0.9 | D4 |
| C16 | 0.9 | 0.3 | 0.2 | 0 | D1 |
| C17 | 0.3 | 0.9 | 0.3 | 0.1 | D2 |
| C18 | 0.9 | 0.6 | 0.3 | 0 | D1 |
| C19 | 0.9 | 0.6 | 0.3 | 0 | D1 |
| C20 | 0.9 | 0.4 | 0.2 | 0 | D1 |
The decision values for the new applications N1, N2, N3, N4 are summarized in Table 9. For application N1’s requirements, the attributes are the same as those in equivalence class {C2, C6}, and hence on calculating the rough–fuzzy membership function for this class, the support for decisions D1 and D2 is 0.55 and 0.6 respectively. These values do not change even after the case-database is increased to 20 components. Application N2’s attributes do not match any of the equivalence classes of the 10-component database, and hence the existing case-base does not give any guidance on selecting the component. It is decided to develop it afresh (new) and add it to the existing case-database. Application N3 changes its membership values for the decisions: the support for as-is reuse (decision D1) increases from 0.75 to 0.8; adapt-comp (decision D2) increases from 0.5 to 0.54; adapt-specs (decision D3) stays at 0.3 (no change); and new (decision D4) decreases from 0.1 to 0.066. Application N4 also changes its membership values for the decisions: the support for as-is reuse increases from 0.3 to 0.45; adapt-comp (decision D2) decreases from 0.9 to 0.78; adapt-specs (decision D3) decreases from 0.3 to 0.28; and new (decision D4) decreases from 0.1 to 0.075.

Table 9. Rough–fuzzy membership values for decisions (20 applications)

| No. | Equivalence classes | μ_D1 | μ_D2 | μ_D3 | μ_D4 |
|---|---|---|---|---|---|
| 1 | C1 | 0.9 | 0.3 | 0.2 | 0.1 |
| 2 | C2, C6 | 0.55 | 0.6 | 0.25 | 0.1 |
| 3 | C3, C4, C19 | 0.8 | 0.54 | 0.3 | 0.066 |
| 4 | C5, C11 | 0.1 | 0.2 | 0.6 | 0.6 |
| 5 | C7, C17 | 0.15 | 0.55 | 0.6 | 0.2 |
| 6 | C8, C9, C10, C20 | 0.45 | 0.78 | 0.28 | 0.075 |
| 7 | C12, C18 | 0.45 | 0.35 | 0.25 | 0.45 |
| 8 | C13 | 0 | 0.1 | 0.4 | 0.9 |
| 9 | C14, C16 | 0.45 | 0.25 | 0.3 | 0.45 |
| N1 | C2, C6 | 0.55 | 0.6 | 0.25 | 0.1 |
| N2 | – | – | – | – | – |
| N3 | C3, C4, C19 | 0.8 | 0.54 | 0.3 | 0.066 |
| N4 | C8, C9, C10, C20 | 0.45 | 0.78 | 0.28 | 0.075 |

For new applications, as the case-database increases, the level of reuse improves (from new to as-is). As the case-database size increases, there are more as-is reuses than new development. This enables higher cost savings for new applications that reuse components from the case repository. The quantitative cost-benefit analyses for different levels of reuse of components from the components repository are being worked out as a separate paper.

A software tool called RuFTool is developed as a decision support tool for reuse on the Windows platform, with MS-ACCESS as the back-end and Visual BASIC as the front-end. The components of past projects have been stored in this database as cases, and RuFTool is used on this case-database to retrieve the candidate components for reuse, with membership values to support the decision on reusing a component.
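The retrieval step described in this section, matching a new application against the equivalence classes and reading off the rough–fuzzy support for each decision, can be sketched in Python as follows. This is an illustration for the 10-application case (Tables 4 to 6), not the RuFTool implementation. The names `CLASSES`, `MU`, `rough_fuzzy_membership`, and `support` are assumptions, and values are rounded to three decimals.

```python
# Equivalence classes of the 10 past applications (1-based application numbers),
# keyed by the attribute tuple their members share (Table 4).
CLASSES = {
    ("High", "High", "PL", "High", "High", "High", "Low", "Yes"): [1],
    ("High", "High", "AF", "High", "High", "Medium", "Low", "Yes"): [2, 6],
    ("Medium", "High", "PL", "Medium", "High", "Medium", "Low", "Yes"): [3, 4],
    ("Low", "Low", "BC", "Medium", "Medium", "Medium", "Low", "Yes"): [5],
    ("Low", "High", "MD", "High", "High", "Medium", "High", "Yes"): [7],
    ("High", "Low", "AF", "High", "High", "Medium", "Low", "Yes"): [8, 9, 10],
}

# Fuzzy membership values from Table 5, listed in order for C1..C10.
MU = {
    "D1": [0.9, 0.3, 0.9, 0.6, 0.1, 0.8, 0, 0.3, 0.3, 0.3],
    "D2": [0.3, 0.9, 0.4, 0.6, 0.2, 0.3, 0.2, 0.9, 0.9, 0.9],
    "D3": [0.2, 0.3, 0.3, 0.3, 0.3, 0.2, 0.9, 0.3, 0.3, 0.3],
    "D4": [0.1, 0.1, 0.1, 0.1, 0.9, 0.1, 0.3, 0.1, 0.1, 0.1],
}


def rough_fuzzy_membership(members, mu):
    """Eq. (4) for a crisp class F: |F intersect D| / |F|, with min as intersection."""
    # Members of F have membership 1 in F, so min{1, mu} contributes mu itself;
    # applications outside F contribute min{0, mu} = 0 and are skipped.
    return sum(min(1.0, mu[i - 1]) for i in members) / len(members)


def support(new_attrs):
    """Return the decision support for a new application, or None if no class matches."""
    members = CLASSES.get(tuple(new_attrs))
    if members is None:
        return None  # no guidance from the case-base: develop afresh (D4)
    return {d: round(rough_fuzzy_membership(members, mu), 3) for d, mu in MU.items()}


N1 = ("High", "High", "AF", "High", "High", "Medium", "Low", "Yes")
N2 = ("High", "Low", "MD", "Low", "Low", "High", "High", "Yes")
N3 = ("Medium", "High", "PL", "Medium", "High", "Medium", "Low", "Yes")
N4 = ("High", "Low", "AF", "High", "High", "Medium", "Low", "Yes")

results = {name: support(attrs) for name, attrs in
           {"N1": N1, "N2": N2, "N3": N3, "N4": N4}.items()}
```

The lookup uses exact attribute matching, as in the case study; a graded similarity measure over attributes would be a different design and is not described in the text.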
5. Conclusions
The integration of CBR and decision theory has shown its usefulness in the retrieval and selection of reusable software components from a software components repository. Software components are denoted by cases with a set of features, attributes, and relations of a given situation and its associated
outcomes. These are taken as inputs to a decision support tool that classifies the components as adaptable to the given situation, with membership values for the decisions. This classification is based on rough–fuzzy set theory, and the methodology is explained with illustrations. In this novel approach, CBR and DSS (based on rough–fuzzy sets) have been applied successfully to the software engineering domain to address the problem of retrieving suitable components for reuse from the case-data repository. A software tool called RuFTool has been developed for this purpose as a decision support tool for component retrieval for reuse on the Windows platform, with MS-ACCESS as the back-end and Visual BASIC as the front-end. The use of rough–fuzzy sets increases the likelihood of finding suitable components for reuse when exact matches are not available or are very few in number.
Acknowledgements
The authors thank Prof. N.K. Srinivasan for his encouragement and keen interest in the work, and Prof. B. Yegnanarayana for giving several useful suggestions on rough–fuzzy sets and on applying them in the software engineering domain. The first author wishes to thank Mr. T. Kunjachan and Mr. Ch. Srinivas Rao for their suggestions and help in implementing RuFTool on the Windows platform, and Mr. P. Venkatesh Rao for his timely help in preparing the LaTeX form of the document.
References
Barnes, B., Bollinger, T., 1991. Making software reuse cost effective. IEEE Software 1, 13–24.
Basili, V.R., Musa, J.D., 1991. The future engineering of software: a management perspective. IEEE Comp. 24 (5), 90–96.
Boehm, B., 1988. A spiral model for software development and enhancement. IEEE Comp. 21 (5), 61–72.
Cleaveland, J.C., Fertig, J.A., Newsome, G.W., 1996. Dividing the software pie. AT&T Tech. J. 75 (2), 8–18.
Damiani, E., Fugini, M.G., Bellettini, C., 1999. A hierarchy-aware approach to faceted classification of object-oriented components. ACM Trans. Software Eng. Meth. 8 (3), 215–262.
Dubois, D., Prade, H., 1990. Rough–fuzzy sets and fuzzy–rough sets. Int. J. Gen. Syst. 17 (2–3), 191–209.
Dubois, D., Prade, H., 1992. Putting rough sets and fuzzy sets together. In: Slowinski, R. (Ed.), Intelligent Decision Support. Handbook of Applications and Advances of the Rough Set Theory. Kluwer Academic Publishers, Dordrecht.
Frakes, W., Isoda, S., 1994. Success factors of systematic reuse. IEEE Software 11 (5), 14–19.
Jacobson, I., Griss, M., Jonsson, P., 1997. Software Reuse: Architecture, Process and Organization for Business Success. ACM Press, New York.
Karlsson, E.A. (Ed.), 1995. Software Reuse: A Holistic Approach. John Wiley and Sons, England.
Klir, G.J., Folger, T.A., 1988. Fuzzy Sets, Uncertainty, and Information. Prentice-Hall Inc., Englewood Cliffs, NJ.
Krueger, C.W., 1992. Software reuse. ACM Comput. Surveys 24 (2), 131–183.
Mili, H., Mili, F., Mili, A., 1995. Reusing software: issues and research directions. IEEE Trans. Software Eng. 21 (6), 528–562.
Mili, A., Mili, R., Mittermeir, R.T., 1997. Storing and retrieving software components: a refinement-based system. IEEE Trans. Software Eng. 23 (7), 445–460.
Ning, J.Q., 1996. A component based software development model. In: Proceedings of the COMPSAC '96, Seoul, Korea. IEEE Computer Society Press, Los Alamitos, CA, pp. 389–394.
Pawlak, Z., 1982. Rough sets. Int. J. Comput. Inform. Sci. 11, 341–356.
Pawlak, Z., Busse, J.G., Slowinski, R., Ziarko, W., 1995. Rough sets. Commun. ACM 38 (11), 89–95.
Poulin, J., 1997. Measuring Software Reuse: Principles, Practices and Economic Models. Addison Wesley Longman, Reading, MA, USA.
Reifer, D.J., 1997. Practical Software Reuse. John Wiley and Sons, USA.
Sarkar, M., 1998. Uncertainty-based pattern classification by modular neural networks. Ph.D. Thesis. Department of Computer Science and Engineering, Indian Institute of Technology, Chennai, India.
Vijay Rao, D., 2001. A unified approach to quantitative software lifecycle modeling. Ph.D. Thesis. Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India.
Weide, B.W., Ogden, W.F., Stuart, H., 1991. Reusable software components. Adv. Comp. 33, 1–65.
Zadeh, L.A., 1965. Fuzzy sets. Inform. Contr., 338–353.
Zadeh, L.A., 1978. Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets Syst. 1, 3–28.