Information and Software Technology 45 (2003) 305–314 www.elsevier.com/locate/infsof
A technique to analyze information-flow in object-oriented programs
Bixin Li a,b,*,1
a Department of Computer and Information Science, NTNU, NO-7491, Trondheim, Norway
b Department of Computer Science and Technology, USTC, 210052, Hefei, People's Republic of China
Received 5 August 2002; revised 9 December 2002; accepted 16 December 2002
Abstract
Traditional information-flow analysis is mainly based on data-flow and control-flow equations. In object-oriented programs, mechanisms such as encapsulation, inheritance, and polymorphism make information-flow analysis considerably more complex, so traditional techniques alone are not sufficient. New techniques are required to analyze efficiently the information-flow between basic components (such as statements, methods, classes, or packages) in object-oriented programs. Based on object-oriented program slicing techniques, we discuss how to compute the amount of information-flow, the width of information-flow, the correlation coefficient between basic components, and the degree of coupling of a basic component. We also discuss some applications of the information-flow.
© 2003 Elsevier Science B.V. All rights reserved.
Keywords: Information-flow; Program slicing; Object-orientation
1. Introduction
Object-oriented programming brings new challenges in software analysis. Since object-oriented programming can be more complex than imperative programming, some well-established information-flow analysis techniques may not be directly applicable to object-oriented software. For example, traditional data-flow and control-flow should be redefined to fit object-oriented programs, and new analysis techniques should be introduced to analyze the data-flows and control-flows in an object-oriented program. Program slicing is a widely used program analysis technique, which was introduced by Weiser [18,19]. For imperative programs, program slicing has been applied in many software engineering areas, such as program comprehension, reverse engineering, software maintenance, software testing, debugging, etc. [4,7,11,13,17,19]. In object-oriented programs, information-flow exists not only between variables, statements, or procedures, but also between methods, object classes, and packages. These information-flows are caused not only by data-flow and control-flow, but also by object-flow, message-flow, etc. In this paper, we apply program slicing techniques to analyze information-flow in object-oriented programs.
The rest of the paper is organized as follows. Section 2 discusses the state of the art of information-flow analysis techniques and the proposed solution. Section 3 introduces how to capture information-flow by object-oriented program slicing. Section 4 presents a new approach for analyzing information-flow between components in Java programs. Section 5 provides an application analysis of such information-flow. Concluding remarks are given in Section 6.

* Address: Lingshuiyuan South 7# 101, Meiling Road, Hefei City, Anhui Province 230052, People's Republic of China. Tel.: +86-13675602539. E-mail address: [email protected] (B. Li).
1 Dr Bixin Li is a senior research scientist at the University of Science and Technology of China, and is now an ERCIM post-doc research fellow at the Norwegian University of Science and Technology.
2. Background
In traditional structured programs, we analyze information-flow based on the control flow graph (CFG). Control-flow analysis (CFA) and data-flow analysis (DFA) are two important kinds of information-flow analysis techniques over CFGs. In this section, we first introduce some basic concepts and discuss traditional information-flow
0950-5849/03/$ - see front matter © 2003 Elsevier Science B.V. All rights reserved. doi:10.1016/S0950-5849(02)00223-9
analysis techniques, and then discuss the limitations of these traditional techniques when applying them to object-oriented programs. Finally, we present a technique to analyze information-flow in object-oriented programs.

2.1. Basic concepts and methods
In this subsection, we introduce some basic concepts according to the descriptions in Ref. [16]. A basic block is a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end without halt or possibility of branching except at the end. A basic block is either executed in its entirety or not at all. A CFG is a directed graph whose nodes are basic blocks, with a unique entry node START and a unique exit node STOP. There is a directed edge from node A to node B if control may flow from block A directly to block B. This is the case if the last statement in A is a branch to B, or when B is on the fall-through path from A. We assume that for any node N in the graph there exists a path from START to N and a path from N to STOP. If an edge is labeled T (or F), then the target node of the edge will be executed if the predicate at the origin of the edge evaluates to TRUE (or FALSE). A CFG basic block (node) has out-edges that lead to successor nodes and in-edges that come from predecessor nodes. pred(B) is the set of all predecessors of block B, and succ(B) is the set of all successors of block B. The definition set of variable v contains all definitions that define v. The Gen set of block B contains all definitions that are generated by B. The Kill set of block B contains all definitions that are killed by B. The In set of block B contains all definitions that reach B. The Out set of block B contains all definitions that leave B.
Traditional CFA proceeds in two steps: (1) determine the basic blocks; and (2) build a flow graph by taking branches into account. It analyzes a program by computing properties of its control flow on the control-flow graph. For example, we may determine the dominator relations between two nodes, recognize loops, and detect dead code.
DFA provides information about how the execution of a program may manipulate its data. For example: (1) Which assignments to variable v may influence a use of v at a certain program position? (2) Is a variable v used on any path from a program position p to the exit node? (3) The values of which expressions are available at program position p? Data-flow problems fall into two classes. The first are those which, given a point in the program, ask what can happen before control reaches that point, that is, what (past) definitions can affect computations at that point. We call these forward flow problems. The second are those which, given a point in the program, ask what can happen after control leaves that point. We call these backward flow problems.
Data-flow problems are usually stated as data-flow equations on a control-flow graph. Data-flow equations for forward flow problems are:
(1) $\mathrm{Out}(B) = \mathrm{Gen}(B) \cup (\mathrm{In}(B) - \mathrm{Kill}(B))$
(2) $\mathrm{In}(B) = \bigcup_{h \in \mathrm{pred}(B)} \mathrm{Out}(h)$
Typical data-flow equations for backward flow problems are:
(1) $\mathrm{In}(B) = \mathrm{Gen}(B) \cup (\mathrm{Out}(B) - \mathrm{Kill}(B))$
(2) $\mathrm{Out}(B) = \bigcup_{h \in \mathrm{succ}(B)} \mathrm{In}(h)$
DFA can solve problems such as reaching definitions, live variables, etc.
Example 1. Data flow equations for statement sequences. (This example is borrowed from Ref. [16].) When two statements are executed sequentially, their effects can be combined. Fig. 1 shows how the effects of the statements S1 and S2 can be combined to give the effects of the sequence S. As Fig. 1 shows, the compound statement S generates everything that is generated by S2. Furthermore, all definitions that are generated by S1 and are not killed by S2 are generated by the compound statement S. Likewise, the compound statement S kills everything that is killed by S2. Furthermore, all definitions that are killed by S1 and are not generated by S2 are killed by the compound statement S.
Fig. 1. Data flow equations for a sequence of two statements.
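The forward equations and the composition rule of Example 1 can be illustrated with a small sketch. It is not taken from Ref. [16]; the function names (`compose_sequence`, `solve_forward`), the block labels, and the definition labels `d1`, `d2` are hypothetical, and the solver is a plain round-robin fixed-point iteration chosen for brevity.

```python
from typing import Dict, List, Set, Tuple

Defs = Set[str]  # a set of definition labels such as "d1"


def compose_sequence(gen1: Defs, kill1: Defs,
                     gen2: Defs, kill2: Defs) -> Tuple[Defs, Defs]:
    """Gen/Kill of the sequence S = S1; S2, following the prose of Example 1."""
    gen = gen2 | (gen1 - kill2)    # S2's definitions, plus S1's that survive S2
    kill = kill2 | (kill1 - gen2)  # S2's kills, plus S1's kills not regenerated by S2
    return gen, kill


def solve_forward(blocks: List[str],
                  pred: Dict[str, List[str]],
                  gen: Dict[str, Defs],
                  kill: Dict[str, Defs]) -> Tuple[Dict[str, Defs], Dict[str, Defs]]:
    """Iterate In(B) = union of Out(h) over pred(B) and
    Out(B) = Gen(B) | (In(B) - Kill(B)) until nothing changes."""
    in_sets: Dict[str, Defs] = {b: set() for b in blocks}
    out_sets: Dict[str, Defs] = {b: set(gen[b]) for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            new_in: Defs = set().union(*(out_sets[h] for h in pred[b]))
            new_out = gen[b] | (new_in - kill[b])
            if new_in != in_sets[b] or new_out != out_sets[b]:
                in_sets[b], out_sets[b] = new_in, new_out
                changed = True
    return in_sets, out_sets


# Hypothetical CFG: B1 defines x (d1); B2 redefines x (d2); B3 uses x.
# Edges: B1 -> B2, B1 -> B3, B2 -> B3.
blocks = ["B1", "B2", "B3"]
pred = {"B1": [], "B2": ["B1"], "B3": ["B1", "B2"]}
gen = {"B1": {"d1"}, "B2": {"d2"}, "B3": set()}
kill = {"B1": {"d2"}, "B2": {"d1"}, "B3": set()}
in_sets, out_sets = solve_forward(blocks, pred, gen, kill)
print(sorted(in_sets["B3"]))  # both definitions of x reach B3
```

In this toy graph both definitions reach B3 because B3 can be entered either directly from B1 or through B2; a backward problem such as live variables would use the same loop with `succ` in place of `pred` and the roles of In and Out exchanged.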
2.2. The problems
In object-oriented programs, on the one hand, some special features that do not occur in procedural programs make information-flow problems more complicated. Information-flow problems include not only data-flow and control-flow problems, but also other flow problems such as object-flow, message-flow, etc. Traditional information-flow analysis techniques cannot solve some object-oriented information-flow analysis problems. For example: (1) How to compute information-flow that is caused by object-flow or message-flow? (2) How to compute information-flows that are caused by inheritance, polymorphic calls, and dynamic binding? (3) How to compute the information-flow from one object to another object? (4) How to determine the width of the information-flow? (5) How to detect a dead component in an object-oriented program? (6) How to assess the coupling degree of a component in an object-oriented program?
On the other hand, information-flow may occur at different levels in an object-oriented program, which leads to four levels of information-flow in the program: statement-level, method-level, class-level, and package-level. The statement-level information-flow occurs among program statements, control predicates, and different kinds of variables, such as local variables, instance variables, and class variables. This is the most complicated level. The method-level information-flow refers to flows occurring among the methods and attributes defined in a class. This level corresponds to the procedures or functions in an imperative program. Several kinds of relations can cause information-flow at this level, such as the used-by relation between a method and the member variables being accessed, and the procedure-call relation between client and server methods. All of these cause information-flow between methods.
The class-level information-flows refer to those occurring between different objects or classes in an object-oriented program (e.g. a Java program). At this level, many relationships may cause information-flow between classes, or between classes and other parts of the program P. If c1 and c2 denote two classes, then relationships such as the following may hold between them: class c1 is a subclass of c2; class c1 has an attribute of class c2; class c1 has a template attribute with a parameter of class c2; class c1 has a method with an input argument of class c2; class c1 has a method with an output argument of class c2; class c1 knows of a global variable of class c2; and class c1 knows of a method containing a local variable of class c2. Since Java programs are organized as sets of packages, information-flow also exists between different packages. Thus, it is difficult to analyze the information-flows existing among components at these different levels of object-oriented programs by using only traditional techniques.

2.3. Proposed solution
In this paper, object-oriented program slicing techniques are proposed to analyze object-oriented information-flow problems. Of course, the precision and correctness of the analysis result depend on the object-oriented program slicing techniques used [1,2,14,17]. The steps for analyzing information-flow using object-oriented program slicing are as follows: (1) Compute the backward and forward slices of different basic components in a given program. (2) Use these backward and forward slices to compute the information-flow between two different basic components. (3) Compute the width of the information-flow, the correlation coefficient of the two basic components, and the coupling degree of a basic component. (4) Track the propagation of errors, and debug and test the modified program. (5) Detect isolated components based on coupling degree.
3. Information-flow capturing by program slicing

3.1. Development of program slicing techniques
Program slicing is a widely used program analysis technique, introduced by Weiser [18,19]. Weiser defines a slice with respect to a slicing criterion that consists of a program point s and a subset of program variables V. His slices are executable programs that are constructed by removing zero or more statements from the original program. His algorithm uses data-flow analysis on control-flow graphs to compute intraprocedural and interprocedural slices. Ottenstein and Ottenstein [14] define a slicing criterion that consists of a program point s and a variable v that is defined or used at s. They use a graph reachability algorithm on a program dependence graph to compute a slice that consists of all statements and predicates of the program that may affect the value of v at s. Horwitz et al. [5] also use dependence graphs to compute slices. They developed an interprocedural program representation, called the system dependence graph, and a two-pass graph reachability slicing algorithm that uses the system dependence graph. Their two-pass algorithm computes more precise interprocedural slices than previous algorithms because it uses summary information at call sites to account for the calling context of called procedures. Researchers have also considered ways to represent object-oriented programs. Larsen and Harrold [6] represented the dependencies of object-oriented programs by using class dependence graphs. Their approach can slice an object-oriented program with inheritance and polymorphism. Malloy [7] extended the program dependence graph to represent object-oriented programs. Zhao [20,22] presented an approach for slicing concurrent Java programs with a multithreaded dependence graph. Li [7,8] proposed a hierarchical approach for slicing object-oriented programs. Tip [17] presented a survey of program slicing techniques. In the next sections, we first
analyze several dependencies in object-oriented programs (especially Java programs), and then define the information-flow based on these dependencies.

3.2. Information-flow capturing
In this paper, we propose to capture all kinds of information-flows in an object-oriented program by program slicing. In general, we can compute program slices of an object-oriented program using the following steps: (1) Identify all basic components and analyze all kinds of dependencies between these components in the program. (2) Construct the object-oriented program dependence graph (OPDG) of the program. (3) Slice the OPDG using graph-reachability slicing algorithms. (4) Obtain the sliced program from the sliced OPDG. For example, consider a Java program. First, we identify the basic components such as statements, methods, classes, objects, and packages. Second, we analyze all possible dependencies between any two components, as described in Sections 3.2.1–3.2.8. Third, we construct the OPDG based on these dependencies. Fourth, we compute slices using graph-reachability slicing algorithms. More details can be found in Refs. [6,7,10,12,16,20]. In order to define the information-flow in object-oriented programs, we present here a detailed description of all kinds of dependencies.

3.2.1. Inheritance
In Java programs, there are two types of inheritance dependencies: class-inheritance-dependence (CID) and interface-inheritance-dependence (IID) [21]. A class c1 is class-inheritance dependent on class c2 if c1 is an extending class of c2. An interface I1 is interface-inheritance dependent on interface I2 if I1 is an extending interface of I2. CID captures the inheritance information between Java classes. IID captures the inheritance information between Java interfaces. In the OPDG, we use class inheritance edges and interface inheritance edges to represent them.

3.2.2. Memberships
There are three types of membership dependencies in Java programs: class-membership-dependence (CMD), interface-membership-dependence (IMD), and package-membership-dependence (PMD) [21]. CMD captures the membership relationships between a class and its member methods. IMD captures the membership relationships between an interface and its member method declarations. PMD captures the membership relationships between a package and its member classes, interfaces, and subpackages. In the OPDG, we use class member edges to represent CMD, interface member edges to represent IMD, and package member edges to represent PMD.

3.2.3. Call and parameter passing
There are three types of primary program dependencies between a call and a called method in Java classes:
method-call-dependence (MCD), parameter-in-dependence (PID), and parameter-out-dependence (POD) [21]. MCD captures the call relationship between a call and a called method. PID captures the parameter passing information between actual parameters and formal input parameters (only if the parameter is actually used by the called procedure). POD captures the parameter passing information between formal output parameters and actual parameters (only if the formal parameter is actually defined by the called procedure). In the program dependence graph, we use method-call edges to represent MCD, parameter-in edges to represent PID, and parameter-out edges to represent POD.

3.2.4. Data-flow and control-flow
Control-dependence (CD) captures all control dependencies between program components in a Java program. Data-dependence (DD) captures the data-flow between the statements in a single method or between different program components.

3.2.5. Polymorphism and dynamic bindings
Polymorphism-dependence (PD) captures information about polymorphic method calls at runtime. In the OPDG, we use a polymorphic choice vertex to represent the dynamic choice among the possible destinations [6]. Static analysis must consider all possibilities. In Ref. [9] we presented a method, called the virtual-function-call-graph (VFCG), that can be used to analyze polymorphic calls statically.

3.2.6. Object-flow and message-flow
Object-flow and message-flow are very difficult to handle. In our technique, we handle objects and messages according to the method proposed in Ref. [6], which covers parameter objects, polymorphic objects, and object slicing.
Parameter objects: when an object is used as a parameter, we call it a parameter object. In the program dependence graph, we use a tree to represent a parameter object: the root of the tree represents the object itself, the children of the root represent the object's data members, and the edges of the tree represent data dependencies between the object and its data members. If a data member of the object is itself an object, we can further expand the data member into a subtree [6].
Polymorphic objects: a polymorphic object in the PDG is also represented as a tree: the root of the tree represents the polymorphic object and the children of the root represent objects of the possible types. When the polymorphic object is used as a parameter, the children are further expanded into trees; when the polymorphic object receives a message, the children are further expanded into call sites.
Object slicing: in an object-oriented program, an object (variable) is not only a simple data aggregation, but also a behavior owner. If an object is in an unexpected (incorrect) state after receiving a sequence of messages, this does not necessarily mean that it received a wrong sequence of messages; it might instead have performed an inappropriate behavior declared in its defining class. That is,
the unexpected state could be caused by the received messages, by the class definition, or by both. So the object slice should be composed of two slices: a state slice and a behavior slice. A state slice for an object is a set of messages and control statements that might affect the state of the object, while a behavior slice is a set of attributes and methods defined in related classes that might affect the behavior of the object.

3.2.7. Multithreaded programs
There are two types of dependencies between two threads in a Java program: synchronization-dependence (SD) and communication-dependence (COD) [3,20]. A statement S1 in one thread is synchronization dependent on a statement S2 in another thread if the start or termination of the execution of S1 directly determines the start or termination of the execution of S2 through an inter-thread synchronization. We use SD to capture dependencies between different threads due to inter-thread synchronization. A statement S1 in one thread is directly communication dependent on a statement S2 in another thread if the value of a variable computed at S1 has a direct influence on the value of a variable computed at S2 through an inter-thread communication.

3.2.8. Recursion
Recursive-dependence (RD) captures all recursive dependencies of a recursive function in an object-oriented program (such as a Java program). In the OPDG, we use a recursive dependence edge to represent RD; it appears as a self-loop on a node for a direct recursive call, and as a cycle for an indirect recursive call.
In traditional procedural programs, information-flow analysis only includes CFA and DFA. In object-oriented programs, information-flow analysis should include DFA, CFA, object-flow analysis, message-flow analysis, and the analysis of all the kinds of dependencies above. Following the basic method of object-oriented program slicing, we can capture all these information-flows by computing program slices.
Definition 1. Let com1 and com2 denote two basic components in an object-oriented program. We define the information-flow between com1 and com2 as the union of all information-flows caused by all the above dependencies.
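The slicing steps of Section 3.2 (build a dependence graph, then slice it by graph reachability) can be sketched as follows. This is only an illustration under simplifying assumptions, not the OPDG construction of the cited references: the class `DependenceGraph`, its method names, and the statement labels are hypothetical, every dependence kind (DD, CD, MCD, and so on) is treated as a plain directed edge, and the traversal ignores calling context, so it is less precise than the two-pass algorithm of Horwitz et al. [5].

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple


class DependenceGraph:
    """Toy dependence graph: an edge source -> target means that
    target depends on source (information flows from source to target)."""

    def __init__(self) -> None:
        self._succ: Dict[str, Set[str]] = defaultdict(set)
        self._pred: Dict[str, Set[str]] = defaultdict(set)
        # Edge labels such as "DD" or "CD"; kept for inspection only.
        self.edge_kinds: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def add_dependence(self, source: str, target: str, kind: str) -> None:
        self._succ[source].add(target)
        self._pred[target].add(source)
        self.edge_kinds[(source, target)].add(kind)

    @staticmethod
    def _reach(start: Iterable[str], adjacency: Dict[str, Set[str]]) -> Set[str]:
        """Breadth-first reachability; the start nodes are part of the result."""
        seen = set(start)
        queue = deque(seen)
        while queue:
            node = queue.popleft()
            for nxt in adjacency.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def backward_slice(self, criterion: Iterable[str]) -> Set[str]:
        """All nodes that may affect the criterion nodes."""
        return self._reach(criterion, self._pred)

    def forward_slice(self, criterion: Iterable[str]) -> Set[str]:
        """All nodes that may be affected by the criterion nodes."""
        return self._reach(criterion, self._succ)


# Hypothetical statements: s1, s2 belong to class A; s3, s4, s5 to class B.
g = DependenceGraph()
g.add_dependence("s1", "s2", "DD")  # s2 uses a value defined at s1
g.add_dependence("s2", "s3", "DD")  # predicate s3 uses a value defined at s2
g.add_dependence("s3", "s4", "CD")  # s4 executes only if predicate s3 holds
print(sorted(g.backward_slice({"s4"})))
print(sorted(g.forward_slice({"s5"})))  # s5 has no dependences at all
```

Treating a slice as a set of node labels makes the later set operations (intersection, difference, restriction to a class) direct to express.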
3.3. Basic definitions
In this subsection, program slices are defined in the following steps.
Definition 2. A statement-level slicing criterion is a triple $\langle P, s_i, v_i \rangle$, where $s_i$ is a position of interest (such as a statement) in program P, and $v_i$ is a variable defined or used at $s_i$. A backward slice is the set of all statements and control predicates that may affect the slicing criterion, denoted $BS(\langle P, s_i, v_i \rangle)$, whereas a forward slice is the set of all statements and control predicates that are affected by the slicing criterion, denoted $FS(\langle P, s_i, v_i \rangle)$. Affecting the slicing criterion means affecting the value of variable $v_i$ at position $s_i$; being affected by the slicing criterion means being affected by the value of variable $v_i$ at position $s_i$.
Definition 3. The slicing criterion is a triple $\langle P, S, V \rangle$, where $S = \{s_1, \ldots, s_k\}$ is a set of positions of interest in program P, $V = \{v_1, \ldots, v_k\}$ is a set of variables defined or used in S, and the position corresponding to variable $v_i$ is $s_i$, $i = 1, \ldots, k$. A backward slice with respect to the slicing criterion $\langle P, S, V \rangle$ can be computed as
$$BS(\langle P, S, V \rangle) = \bigcup_{i=1}^{k} BS(\langle P, s_i, v_i \rangle),$$
whereas a forward slice with respect to the slicing criterion $\langle P, S, V \rangle$ can be computed as
$$FS(\langle P, S, V \rangle) = \bigcup_{i=1}^{k} FS(\langle P, s_i, v_i \rangle).$$
Definition 4. The slicing criterion is a triple $\langle P, c, V(c) \rangle$, where c is a class in program P, and $V(c)$ is the set of all variables defined or used in class c. A backward slice with respect to the slicing criterion $\langle P, c, V(c) \rangle$ can be calculated as
$$BS(\langle P, c, V(c) \rangle) = \bigcup_{i=1}^{m} BS(\langle P, cs_i, cv_i \rangle).$$
Similarly, a forward slice with respect to the slicing criterion $\langle P, c, V(c) \rangle$ can be computed as
$$FS(\langle P, c, V(c) \rangle) = \bigcup_{i=1}^{m} FS(\langle P, cs_i, cv_i \rangle),$$
where $cs_1, \ldots, cs_m$ are the positions of the corresponding variables $cv_1, \ldots, cv_m$, respectively, in class c. According to the above definitions, all slices include the statements at position $s_i$ of variable $v_i$, where $i = 1, 2, \ldots, m$.

4. Analyzing information-flow in Java program
In the following equations, the set cardinality and set restriction operators are used. The cardinality of a finite set is the number of elements in the set. We use the notation #(A) to refer to the cardinality of a set A, that is, #(A) = number of elements in A. As the empty set has no elements, we have #(∅) = 0. For A = {a, b, c}, we have #(A) = 3. The set restriction operator (written A|B) is defined as: A|B = {x | x ∈ A and x satisfies B}.

4.1. Basic equations
In this section, let $BS(\langle P, c_1, V(c_1) \rangle)$ and $BS(\langle P, c_2, V(c_2) \rangle)$ represent the backward slices with respect to slicing criteria $\langle P, c_1, V(c_1) \rangle$ and $\langle P, c_2, V(c_2) \rangle$, respectively; $FS(\langle P, c_1, V(c_1) \rangle)$ and $FS(\langle P, c_2, V(c_2) \rangle)$ represent the forward slices with respect to slicing criteria $\langle P, c_1, V(c_1) \rangle$ and $\langle P, c_2, V(c_2) \rangle$, respectively. To create the following basic equations, we let $S(\langle P, c_1, V(c_1) \rangle)$ and $S(\langle P, c_2, V(c_2) \rangle)$
represent $BS(\langle P, c_1, V(c_1) \rangle)$ and $BS(\langle P, c_2, V(c_2) \rangle)$, or $FS(\langle P, c_1, V(c_1) \rangle)$ and $FS(\langle P, c_2, V(c_2) \rangle)$. We first define the following three equations based on object-oriented program slicing, and then use combinations of the three equations to analyze information-flows in object-oriented programs:
$$IFlow(c_1, c_2) = S(\langle P, c_1, V(c_1) \rangle) \cap S(\langle P, c_2, V(c_2) \rangle) \tag{1}$$
In Eq. (1), $IFlow(c_1, c_2)$ is the intersection of two slices and denotes a kind of information-flow relationship between two classes. Such information-flow is very useful for wave analysis, propagation analysis of errors, and regression testing.
$$IFlow(P, c) = \bigcup_{v \in V(c)} S(\langle P, c, V(c) \rangle) - c \tag{2}$$
In Eq. (2), $IFlow(P, c)$ is the set of all statements and control predicates that affect the value of a variable in $V(c)$, excluding the statements and control predicates in class c itself. $IFlow(P, c)$ is used to capture the information-flow from outside class c to class c itself.
$$IFlow(c_1, c_2)_{Res} = \bigcup_{v \in V(c_2)} S(\langle P, c_2, V(c_2) \rangle) \,|\, c_1 \tag{3}$$
In Eq. (3), $IFlow(c_1, c_2)_{Res}$ is the set of all statements and control predicates in class $c_1$ that affect the value of a variable in $V(c_2)$. $IFlow(c_1, c_2)_{Res}$ is used to analyze all kinds of data-flows, control-flows, message-flows, and object-flows from class $c_1$ to class $c_2$ that can be captured by object-oriented program slicing.

4.2. Information-flow analysis
In Eq. (1), replacing $S(\langle P, c_1, V(c_1) \rangle)$ with $BS(\langle P, c_1, V(c_1) \rangle)$ and $S(\langle P, c_2, V(c_2) \rangle)$ with $BS(\langle P, c_2, V(c_2) \rangle)$, we have:
$$IFlow(c_1, c_2) = BS(\langle P, c_1, V(c_1) \rangle) \cap BS(\langle P, c_2, V(c_2) \rangle) \tag{4}$$
• If $IFlow(c_1, c_2) = \emptyset$, there are no common statements or control predicates that may affect both classes $c_1$ and $c_2$, and classes $c_1$ and $c_2$ do not affect each other. The two classes are therefore independent, and no information-flow exists between them.
• If $IFlow(c_1, c_2) \neq \emptyset$, there exist some common statements and control predicates in program P that may affect the values of variables $v_i \in V(c_1)$, i = 1, 2, …, and the values of variables $v_j \in V(c_2)$, j = 1, 2, …
In Eq. (1), replacing $S(\langle P, c_1, V(c_1) \rangle)$ with the forward slice $FS(\langle P, c_1, V(c_1) \rangle)$ and $S(\langle P, c_2, V(c_2) \rangle)$ with the forward slice $FS(\langle P, c_2, V(c_2) \rangle)$, we have:
$$IFlow(c_1, c_2) = FS(\langle P, c_1, V(c_1) \rangle) \cap FS(\langle P, c_2, V(c_2) \rangle) \tag{5}$$
Here $IFlow(c_1, c_2)$ is the intersection of the two sets of all statements and control predicates that are affected by the value of a variable in $V(c_1)$ or $V(c_2)$.
• If $IFlow(c_1, c_2) = \emptyset$, the two classes have no common affect scope in program P, i.e. there is no part of program P that can be affected by classes $c_1$ and $c_2$ simultaneously.
• If $IFlow(c_1, c_2) \neq \emptyset$, the two classes have a common affect scope in program P, i.e. there exists a part of program P that can be affected by classes $c_1$ and $c_2$ simultaneously.
In Eq. (1), replacing $S(\langle P, c_1, V(c_1) \rangle)$ with the backward slice $BS(\langle P, c_1, V(c_1) \rangle)$ and $S(\langle P, c_2, V(c_2) \rangle)$ with the forward slice $FS(\langle P, c_2, V(c_2) \rangle)$, we have:
$$IFlow(c_1, c_2) = BS(\langle P, c_1, V(c_1) \rangle) \cap FS(\langle P, c_2, V(c_2) \rangle) \tag{6}$$
• If $IFlow(c_1, c_2) = \emptyset$, no information flows from $c_2$ to $c_1$; any modification of class $c_2$ will therefore not affect the values of variables $v_i \in V(c_1)$, i = 1, 2, …
• If $IFlow(c_1, c_2) \neq \emptyset$, there exists information-flow from $c_2$ to $c_1$; some modification of $c_2$ may affect the value of some variables $v_i \in V(c_1)$, i = 1, 2, …, in program P.
In Eq. (1), replacing $S(\langle P, c_1, V(c_1) \rangle)$ with the forward slice $FS(\langle P, c_1, V(c_1) \rangle)$ and $S(\langle P, c_2, V(c_2) \rangle)$ with the backward slice $BS(\langle P, c_2, V(c_2) \rangle)$, we have:
$$IFlow(c_1, c_2) = FS(\langle P, c_1, V(c_1) \rangle) \cap BS(\langle P, c_2, V(c_2) \rangle) \tag{7}$$
• If $IFlow(c_1, c_2) = \emptyset$, no information flows from $c_1$ to $c_2$; any modification of class $c_1$ will therefore not affect the values of the variables in $V(c_2)$ in program P.
• If $IFlow(c_1, c_2) \neq \emptyset$, some modification of class $c_1$ may affect the value of some variables $v_j \in V(c_2)$, j = 1, 2, …, in program P. This reading of Eqs. (6) and (7) agrees with the unidirection propagation condition of Section 5.4.
In Eq. (2), replacing $S(\langle P, c, V(c) \rangle)$ with $BS(\langle P, c, V(c) \rangle)$, we have:
$$IFlow(P, c) = \bigcup_{v \in V(c)} BS(\langle P, c, V(c) \rangle) - c \tag{8}$$
• If $IFlow(P, c) = \emptyset$, there is no information-flow from other parts of program P to class c.
• If $IFlow(P, c) \neq \emptyset$, there exists some external information-flow to class c. This information-flow may affect the result of the principal variables in class c.
In Eq. (2), replacing $S(\langle P, c, V(c) \rangle)$ with $FS(\langle P, c, V(c) \rangle)$, we have:
$$IFlow(P, c) = \bigcup_{v \in V(c)} FS(\langle P, c, V(c) \rangle) - c \tag{9}$$
• If $IFlow(P, c) = \emptyset$, class c has no effect on other parts of program P.
† If IFlowðP; cÞ – f; then class c may affect other parts of program P. IFlowðP; cÞ is the parts possibly affected by class c. In Eq. (3), replace SðkP; c2 ; Vðc2 ÞlÞ BSðkP; c2 ; Vðc2 ÞlÞ; we have: [ IFlowðc1 ; c2 ÞRes ¼ BSðkP; c2 ; Vðc2 ÞlÞlc1
with ð10Þ
311
(3) Compute the width of information-flow from class c1 to c2: 0 1 [ #@ ðBSðkP;c2 ; Vðc2 ÞlÞlc1 ÞA v[Vðc2 Þ
WIFðc1 ; c2 Þ ¼
ð12Þ
Nðc1 Þ
v[Vðc2 Þ
† If IFlowðc1 ; c2 Þ ¼ f; then there is no information-flow from class c1 to c2. † If IFlowðc1 ; c2 Þ – f; then there exist some informationflow from class c1 to c2. † If we exchange the position of c1 and c2 in Eq. (3), then we can analyze the information-flow from c2 to c1. In Eq. (3), replace SðkP; c2 ; Vðc2 ÞlÞ FSðkP; c2 ; Vðc2 ÞlÞ; we have: [ IFlowðc1 ; c2 ÞRes ¼ ðFSðkP; c2 ; Vðc2 ÞlÞlc1
(4) Compute the width of information-flow from class c2 to c1, by exchanging the position of c2 to c1 in item (3) 0 1 [ #@ ðBSðkP;c1 ;Vðc1 ÞlÞlc2 ÞA v[Vðc1 Þ
WIFðc2 ;c1 Þ ¼
:
Nðc2 Þ
ð13Þ
with ð11Þ
$v \in V(c_2)$

• If $IFlow(c_1, c_2) = \emptyset$, then class $c_2$ cannot affect class $c_1$.
• If $IFlow(c_1, c_2) \neq \emptyset$, then class $c_2$ can affect $c_1$.
• If we exchange the positions of $c_1$ and $c_2$ in Eq. (3), then we can analyze whether $c_1$ can affect $c_2$ or not.

Note that if we replace classes with methods or packages in a Java program, we can analyze and compute all kinds of information-flow among methods or packages in a similar way. Due to space limitations, we do not discuss them here.

5. Application analysis

5.1. Computation of the width of information-flow

We can compute the absolute amount of information-flow from one component to another component, and the width of information-flow, based on Eqs. (1)–(11).

(1) Compute the absolute amount of information-flow from class $c_1$ to $c_2$:

$$IFlow(c_1, c_2)_{Res} = \bigcup_{v \in V(c_2)} BS(\langle P, c_2, V(c_2) \rangle)|_{c_1}$$

(2) Compute the absolute amount of information-flow from class $c_2$ to $c_1$:

$$IFlow(c_2, c_1)_{Res} = \bigcup_{v \in V(c_1)} BS(\langle P, c_1, V(c_1) \rangle)|_{c_2}$$

5.2. Computation of contribution coefficient

The contribution coefficient is defined as follows. Let $c_1, c_2, \ldots, c_n, c$ be $n + 1$ components of program $P$, let $Total$ be the total information-flow from $c_1, c_2, \ldots, c_n$ to component $c$, and let $Part_1$ be the information-flow from $c_1$ to $c$. Then the contribution coefficient of $c_1$ to $c$ is $coe(c_1, c) = Part_1 / Total$. The following equation shows the contribution coefficient of class $c_1$ to class $c_2$:

$$coe(c_1, c_2) = \frac{\#\left(\bigcup_{v \in V(c_2)} BS(\langle P, c_2, V(c_2) \rangle)|_{c_1}\right)}{\#\left(\bigcup_{v \in V(c_2)} BS(\langle P, c_2, V(c_2) \rangle) - N(c_2)\right)} \qquad (14)$$

5.3. Computation of coupling degree

We can use such information-flow to define and compute the correlativity between two basic components in a Java program. The correlativity coefficient between two basic components can be computed from the above information-flow, and it forms the basis for computing the coupling degree of a component. For example, the correlation coefficient (CC) between two classes can be computed as follows:

$$CC(c_1, c_2) = \frac{WIF(c_1, c_2) N(c_1) + WIF(c_2, c_1) N(c_2)}{N(c_1) + N(c_2)} \qquad (15)$$
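As a rough illustration of Eqs. (14) and (15), the following Python sketch treats a slice as a set of statement numbers and a class as the set of statement numbers it spans. The helper names (`restrict`, `wif`, `coe`, `cc`), the set-based representation, and the two-class toy program are assumptions made for illustration only; the width is taken in the form #(S|c)/N(c) used in the worked example of Section 5.7, and the slices themselves are assumed to come from a separate slicing tool.

```python
from fractions import Fraction


def restrict(slice_stmts, cls_stmts):
    """Statements of the slice that lie inside one class (the S|c operation)."""
    return slice_stmts & cls_stmts


def wif(slice_stmts, src_cls):
    """Width of information-flow from src_cls into the sliced class."""
    return Fraction(len(restrict(slice_stmts, src_cls)), len(src_cls))


def coe(slice_stmts, src_cls, dst_cls):
    """Contribution coefficient of src_cls to dst_cls, in the spirit of Eq. (14)."""
    total = len(slice_stmts - dst_cls)  # flow into dst_cls from the rest of P
    if total == 0:
        return Fraction(0)
    return Fraction(len(restrict(slice_stmts, src_cls)), total)


def cc(wif_12, n1, wif_21, n2):
    """Correlation coefficient of Eq. (15): size-weighted mean of both widths."""
    return (wif_12 * n1 + wif_21 * n2) / (n1 + n2)


# Hypothetical program: class X spans statements 1-4, class Y spans 5-8.
X = frozenset(range(1, 5))
Y = frozenset(range(5, 9))
bs_y = frozenset({1, 2, 5, 6, 7})  # assumed backward slice w.r.t. Y's variables
bs_x = frozenset({1, 2, 3})        # assumed backward slice w.r.t. X's variables

w_xy = wif(bs_y, X)  # 2 of X's 4 statements reach Y
w_yx = wif(bs_x, Y)  # no statement of Y reaches X
print(w_xy, coe(bs_y, X, Y), cc(w_xy, len(X), w_yx, len(Y)))
```

Exact fractions are used instead of floats so that the results can be compared directly with hand-computed ratios.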
In order to compute the coupling degree of a basic component caused by the above information-flow, we assume, without loss of generality, a Java program $P$ that includes $n + 1$ classes: $c$ and $c_1, \ldots, c_n$. Then the coupling degree of class $c$ can be computed according to the following steps:

(1) Compute the correlativity coefficient between class $c$ and each $c_i$, $i = 1, \ldots, n$:

$$CC(c, c_i) = \frac{WIF(c, c_i) N(c) + WIF(c_i, c) N(c_i)}{N(c) + N(c_i)} \qquad (16)$$

(2) Compute the coupling degree (CD) of class $c$, which is the weighted average of these correlativity coefficients:

$$CD(c) = \frac{\sum_{i=1}^{n} CC(c, c_i) N(c_i)}{\sum_{i=1}^{n} N(c_i)} \qquad (17)$$

Regarding coupling measurement, Harman et al. [4] discussed similar issues, but focused on procedural programs rather than object-oriented programs. Longworth [11] discussed slice-based program metrics, but did not discuss coupling measurement further. We will discuss this issue further in another paper.

5.4. Analyzing the propagation of error

Let $affect(c)$ represent the program scope that may affect $c$, and $affected(c)$ represent the scope possibly affected by $c$.

Definition 5. (affect and affected) The $affect(c)$ of class $c$ is defined as its backward slice with respect to the slicing criterion $\langle P, c, V(c) \rangle$. The $affected(c)$ of $c$ is defined as its forward slice with respect to the slicing criterion $\langle P, c, V(c) \rangle$.

Now, let us analyze the following conditions:

• Import Propagation: If $affect(c_1) \cap affect(c_2) \neq \emptyset$, some common components may simultaneously affect components $c_1$ and $c_2$. Any modification of the common components may cause a change in both $c_1$ and $c_2$.
• Export Propagation: If $affected(c_1) \cap affected(c_2) \neq \emptyset$, $c_1$ and $c_2$ have a common affect scope. Some errors in the common affect scope may be caused by $c_1$ or $c_2$.
• I/E Propagation: If $affect(c_1) \cap affect(c_2) \neq \emptyset$ and $affected(c_1) \cap affected(c_2) \neq \emptyset$, there may exist import propagation to $c_1$ and $c_2$, and export propagation from $c_1$ and $c_2$.
• Unidirection Propagation: If $affected(c_1) \cap affect(c_2) \neq \emptyset$, some modification of $c_1$ may affect $c_2$. In other words, an error originating from $c_1$ may propagate to $c_2$. If $affect(c_1) \cap affected(c_2) \neq \emptyset$, an error may similarly propagate from $c_2$ to $c_1$.
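Once $affect(c)$ and $affected(c)$ are available as sets of statement numbers, the four conditions above reduce to set-intersection tests. The following Python sketch shows one way to express them; the function name `propagation_kinds`, the string labels, and the sample sets are illustrative assumptions, and the backward and forward slices are assumed to be produced by an external slicer.

```python
def propagation_kinds(affect1, affected1, affect2, affected2):
    """Return the Section 5.4 propagation conditions that hold for c1 and c2.

    Each argument is a set of statement numbers: affect* is a backward
    slice, affected* is a forward slice.
    """
    kinds = []
    has_import = bool(affect1 & affect2)        # common components affect both
    has_export = bool(affected1 & affected2)    # both affect a common scope
    if has_import:
        kinds.append("import")
    if has_export:
        kinds.append("export")
    if has_import and has_export:
        kinds.append("I/E")
    if affected1 & affect2:                     # c1's effects reach c2's inputs
        kinds.append("c1->c2")
    if affect1 & affected2:                     # c2's effects reach c1's inputs
        kinds.append("c2->c1")
    return kinds


# Hypothetical slices for two classes c1 and c2.
result = propagation_kinds(
    affect1={1, 2, 3}, affected1={3, 4, 7},
    affect2={2, 6, 7}, affected2={7, 8},
)
print(result)
```

With these sample sets, statement 2 is shared by the two backward slices and statement 7 by the two forward slices, so both import and export propagation are reported, together with a possible error path from c1 to c2.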
5.5. Criteria for recognizing isolated components

In an object-oriented system, if a component has no coupling relation with any other component, it is an isolated component and should be deleted from the system, i.e. if

$$CD(c) = \frac{\sum_{i=1}^{n} CC(c, c_i) N(c_i)}{\sum_{i=1}^{n} N(c_i)} = 0$$

then $c$ is an isolated class in program $P$.

5.6. Regression test selection

Regression testing is an expensive testing procedure that attempts to verify modified software and ensure that new errors are not introduced into previously tested code. One method for reducing the cost of regression testing is to save relevant test suites and reuse them to re-verify the product after its modification. A retest-all approach reruns all such test cases, but may consume excessive time and resources. Regression test selection techniques, in contrast, attempt to reduce the cost of regression testing by selecting a subset of the test suite. Rothermel and Harrold [15] discussed similar issues, using a control-flow graph (CFG). Their technique first creates the CFG of the original program. Then, it uses this CFG to insert probes into the program so that it reports those edges that were actually executed. When this instrumented version of the program is executed with all test cases in the test suite, the technique creates a test history that associates with each edge those test cases that executed it. When the program is modified, the technique creates the CFG for the modified program. It then walks the CFGs of the original and modified programs to find each edge whose 'sink' has different associated text in the modified program, and selects all test cases that, according to the test history, are associated with that edge. We find that the information-flow computed according to our approach is very useful for regression test selection; owing to space limitations, we leave further discussion of this issue to future work.

5.7. A simple example

Table 1 shows a simple Java program $P$. We now analyze the information-flow between different classes in $P$ using the above approach. Suppose we use the slicing criterion $\langle P, E, \{a_1, d_1\} \rangle$; the backward slice with respect to this criterion is shown in Table 2, i.e.

$$BS(\langle P, E, \{a_1, d_1\} \rangle) = BS(\langle P, E, a_1 \rangle) \cup BS(\langle P, E, d_1 \rangle) = \{1, 2, 5, 6, 7, 8, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24\},$$

$$FS(\langle P, E, \{a_1, d_1\} \rangle) = FS(\langle P, E, a_1 \rangle) \cup FS(\langle P, E, d_1 \rangle) = \emptyset.$$

Table 1. A simple Java program

Table 2. The slice with respect to the slicing criterion $\langle P, E, \{a_1, d_1\} \rangle$ of the Java program in Table 1

Let us look at the information-flow from other parts of program $P$ to class $E$ regarding the principal variable set $\{a_1, d_1\}$.

Information-flow from program $P$ to class $E$:

$$IFlow(P, E) = BS(\langle P, E, \{a_1, d_1\} \rangle) - E = \{1, 2, 5, 6, 7, 8, 15, 16, 17, 18\}$$

Information-flow from class $A$ to class $E$:

$$IFlow(A, E) = BS(\langle P, E, \{a_1, d_1\} \rangle)|_A = \{1, 2\}$$

Information-flow from class $B$ to class $E$:

$$IFlow(B, E) = BS(\langle P, E, \{a_1, d_1\} \rangle)|_B = \{5, 6, 7, 8\}$$

Information-flow from class $C$ to class $E$:

$$IFlow(C, E) = BS(\langle P, E, \{a_1, d_1\} \rangle)|_C = \emptyset$$

Information-flow from class $D$ to class $E$:

$$IFlow(D, E) = BS(\langle P, E, \{a_1, d_1\} \rangle)|_D = \{15, 16, 17, 18\}$$

Now, let us look at the information-flow from class $E$ to other parts of program $P$ regarding the principal variable set $\{a_1, d_1\}$. The forward slice with respect to the slicing criterion $\langle P, E, \{a_1, d_1\} \rangle$ is $FS(\langle P, E, \{a_1, d_1\} \rangle) = \{21, 22, 23\}$. We further have:

$$IFlow(E, P) = FS(\langle P, E, \{a_1, d_1\} \rangle) - E = \emptyset$$

$$IFlow(E, A)_{Res} = FS(\langle P, E, \{a_1, d_1\} \rangle)|_A = \emptyset$$

$$IFlow(E, B)_{Res} = FS(\langle P, E, \{a_1, d_1\} \rangle)|_B = \emptyset$$

$$IFlow(E, C)_{Res} = FS(\langle P, E, \{a_1, d_1\} \rangle)|_C = \emptyset$$

$$IFlow(E, D)_{Res} = FS(\langle P, E, \{a_1, d_1\} \rangle)|_D = \emptyset$$

Then, the width of information-flow from other classes to class $E$ can be calculated as follows:

$$WIF(A, E) = \frac{\#(S(\langle P, E, \{a_1, d_1\} \rangle)|_A)}{N(A)} = \frac{1}{2}$$

$$WIF(B, E) = \frac{\#(S(\langle P, E, \{a_1, d_1\} \rangle)|_B)}{N(B)} = \frac{2}{3}$$

$$WIF(C, E) = \frac{\#(S(\langle P, E, \{a_1, d_1\} \rangle)|_C)}{N(C)} = 0$$

$$WIF(D, E) = \frac{\#(S(\langle P, E, \{a_1, d_1\} \rangle)|_D)}{N(D)} = 1$$

The width of information-flow from class $E$ to every other class is 0. The contribution coefficient of the other classes to class $E$ can be calculated as follows:

$$coe(A, E) = \frac{\#(S(\langle P, E, \{a_1, d_1\} \rangle)|_A)}{\#(S(\langle P, E, \{a_1, d_1\} \rangle) - E)} = \frac{1}{5}$$

$$coe(B, E) = \frac{\#(S(\langle P, E, \{a_1, d_1\} \rangle)|_B)}{\#(S(\langle P, E, \{a_1, d_1\} \rangle) - E)} = \frac{2}{5}$$

$$coe(C, E) = \frac{\#(S(\langle P, E, \{a_1, d_1\} \rangle)|_C)}{\#(S(\langle P, E, \{a_1, d_1\} \rangle) - E)} = 0$$

$$coe(D, E) = \frac{\#(S(\langle P, E, \{a_1, d_1\} \rangle)|_D)}{\#(S(\langle P, E, \{a_1, d_1\} \rangle) - E)} = \frac{2}{5}$$

The contribution coefficient of class $E$ to every other class is 0. The correlativity coefficient between class $E$ and each other class can be calculated as follows:

$$CC(A, E) = \frac{WIF(A, E) N(A) + WIF(E, A) N(E)}{N(A) + N(E)} = \frac{2}{9}$$

$$CC(B, E) = \frac{WIF(B, E) N(B) + WIF(E, B) N(E)}{N(B) + N(E)} = \frac{4}{11}$$

$$CC(C, E) = \frac{WIF(C, E) N(C) + WIF(E, C) N(E)}{N(C) + N(E)} = 0$$

$$CC(D, E) = \frac{WIF(D, E) N(D) + WIF(E, D) N(E)}{N(D) + N(E)} = \frac{4}{9}$$

Finally, the coupling degree of class $E$ in program $P$ can be calculated. Let $Z = \{A, B, C, D\}$; we now have:

$$CD(E) = \frac{\sum_{x \in Z} CC(x, E) N(x)}{\sum_{x \in Z} N(x)} = \frac{80}{297}$$
Note that: (1) $CD(E) = 0$ if and only if $CC(x, E) = 0$ for all $x \in Z$; (2) $CD(E) = 1$ if and only if $CC(x, E) = 1$ for all $x \in Z$.
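The reported value $CD(E) = 80/297$ can be re-derived mechanically from the correlativity coefficients listed in the example. The Python sketch below does so with exact fractions. The class sizes $N(A) = 4$, $N(B) = 6$, $N(C) = 4$, $N(D) = 4$ are an assumption: they are not stated in the running text and were chosen only because they are consistent with the reported widths and coefficients. The function name `coupling_degree` is likewise hypothetical.

```python
from fractions import Fraction

# Correlativity coefficients CC(x, E) as reported in the example.
cc_with_E = {
    "A": Fraction(2, 9),
    "B": Fraction(4, 11),
    "C": Fraction(0),
    "D": Fraction(4, 9),
}

# ASSUMPTION: class sizes N(x) are inferred, not quoted from the paper.
size = {"A": 4, "B": 6, "C": 4, "D": 4}


def coupling_degree(cc, n):
    """Weighted average of CC values with weights N(x), as in Eq. (17)."""
    return sum(cc[x] * n[x] for x in cc) / sum(n[x] for x in cc)


cd_E = coupling_degree(cc_with_E, size)
print(cd_E)

# A class whose CC with every other class is 0 has CD = 0 (isolated class).
cd_isolated = coupling_degree({x: Fraction(0) for x in size}, size)
print(cd_isolated == 0)
```

Under the assumed sizes, the weighted sum is 160/33 and the total weight is 18, which gives 80/297 and matches the value stated in the example.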
6. Concluding remarks

In this paper, we proposed a technique for analyzing information-flow in object-oriented programs. The technique has the following advantages. First, it computes information-flow rather than only traditional data-flow and control-flow. Second, it can compute more precisely the absolute and relative amounts of information-flow from one component to another, as well as the width of information-flow. Third, it can compute and analyze the correlativity between any two components and the coupling degree of any component in a Java program. In addition, the information-flow computed with the technique has several applications; for example, it can be used to track the propagation of errors and to select regression test cases. However, the capabilities of the technique depend on the development of object-oriented program slicing techniques, and its precision and correctness depend on the precision and correctness of those slicing techniques. These problems will be investigated further in our future work.
Acknowledgements

This work is partially supported by the ERCIM Fellow Programme and the Norwegian University of Science and Technology. The authors thank Prof. Conradi Reidar, Dr Marco Torchiano and Dr Jianjun Zhao for checking the whole paper.
References

[1] J.L. Chen, J.F. Wang, Y.L. Chen, Slicing object-oriented programs, Proceedings of the Fourth Asia-Pacific Software Engineering Conference (APSEC'97), Hong Kong, China, December (1997) 395–404.
[2] Z. Chen, B. Xu, Slicing object-oriented Java programs, SIGPLAN Notices 36 (4) (2001) 33–40.
[3] Z. Chen, B. Xu, Slicing concurrent Java programs, SIGPLAN Notices 36 (4) (2001) 41–47.
[4] M. Harman, M. Okunlawon, B. Sivagurunathan, S. Danicic, Slice-based measurement of coupling, IEEE/ACM ICSE Workshop on Process Modeling and Empirical Studies of Software Evolution (PMESSE'97), Boston, Massachusetts, May (1997) 26–32.
[5] S. Horwitz, T. Reps, D. Binkley, Interprocedural slicing using dependence graphs, ACM Transactions on Programming Languages and Systems 12 (1) (1990) 26–60.
[6] L.D. Larsen, M.J. Harrold, Slicing object-oriented software, Proceedings of the 18th International Conference on Software Engineering (1996) 495–505.
[7] B. Li, Program slicing techniques and its application in object-oriented software metrics and software test, PhD thesis, Nanjing University, People's Republic of China, December 2000.
[8] B. Li, X. Fan, JATO: slicing Java program hierarchically, TUCS Technical Reports No. 416, Turku Centre for Computer Science, Turku, Finland, July 2001.
[9] B. Li, J. Ni, Analyzing C++ virtual method based on CHG, The 2002 International Conference on Software Engineering Research and Practice, Monte Carlo Resort and Casino, Las Vegas, Nevada, USA, June 23–27, 2002.
[10] D. Liang, M.J. Harrold, Slicing objects using system dependence graph, International Conference on Software Maintenance, November (1998) 358–367.
[11] H.D. Longworth, Slice-based program metrics, Master's thesis, MTU, Department of Computer Science, 1985.
[12] B.A. Malloy, J.D. McGregor, A. Krishnaswamy, M. Medikonda, An extensible program representation for object-oriented software, ACM SIGPLAN Notices 29 (12) (1994) 38–47.
[13] L.M. Ott, Using slice profiles and metrics during software maintenance, Proceedings of the 10th Annual Software Reliability Symposium, June (1992) 16–23.
[14] K.J. Ottenstein, L.M. Ottenstein, The program dependence graph in a software development environment, Proceedings of the ACM SIGSOFT/SIGPLAN Software Engineering Symposium on Practical Software Development Environments (1984) 177–184.
[15] G. Rothermel, M.J. Harrold, Selecting regression tests for object-oriented software, Second ACM Conference on Foundations of Software Engineering, December (1994) 11–20.
[16] C. Steindl, Program slicing for object-oriented programming languages, PhD thesis, Department of Computer Science, Johannes Kepler University Linz, Austria, April 1999.
[17] F. Tip, A survey of program slicing techniques, Journal of Programming Languages 3 (3) (1995) 121–189.
[18] M. Weiser, Program slicing, Fifth International Conference on Software Engineering, IEEE Press, New York, 1982, pp. 439–449.
[19] M. Weiser, Program slicing, IEEE Transactions on Software Engineering 10 (4) (1984) 352–357.
[20] J. Zhao, Multithreaded dependence graphs for concurrent Java programs, Proceedings of the 1999 International Symposium on Software Engineering for Parallel and Distributed Systems, IEEE Computer Society Press, Silver Spring, MD, 1999, pp. 13–23.
[21] J. Zhao, Applying program dependence analysis to Java software, Proceedings of the 1998 International Computer Conference, Tainan, Taiwan, December (1998).
[22] J. Zhao, Slicing concurrent Java programs, Proceedings of the Seventh IEEE International Workshop on Program Comprehension, May (1999) 126–133.