A case study of TTCN-3 test scripts clone analysis in an industrial telecommunication setting


ARTICLE IN PRESS

JID: INFSOF

[m5G;February 14, 2017;13:41]

Information and Software Technology 0 0 0 (2017) 1–14

Contents lists available at ScienceDirect

Information and Software Technology journal homepage: www.elsevier.com/locate/infsof

Thierry Lavoie a,c, Mathieu Mérineau a, Ettore Merlo a,∗, Pascal Potvin b

a Department of Computer and Software Engineering, Ecole Polytechnique de Montréal, C.P. 6079, succ. Centre-ville Montréal, Québec, H3C 3A7, Canada
b Ericsson, 8400 Decarie, Montréal, Québec, H4P 2N2, Canada
c Software Integrity Group, Synopsys Canada, 800 6th Ave SW, Suite 410, Calgary, T2P 3G3, Canada

Article info

Article history: Received 28 October 2015; Revised 7 December 2016; Accepted 20 January 2017; Available online xxx

Keywords: Clone detection; Telecommunications software; Test

Abstract

Context: This paper presents a novel experiment focused on detecting and analyzing clones in test suites written in TTCN-3, a standard telecommunication test script language, for different industrial projects.

Objective: This paper investigates frequencies, types, and similarity distributions of TTCN-3 clones in test scripts from three industrial projects in telecommunication. We also compare the distribution of clones in TTCN-3 test scripts with the distribution of clones in C/C++ and Java projects from the telecommunication domain. We then perform a statistical analysis to validate the significance of differences between these distributions.

Method: Similarity is computed using CLAN, which compares metrics syntactically derived from script fragments. Metrics are computed from the Abstract Syntax Trees produced by a TTCN-3 parser called Titan developed by Ericsson as an Eclipse plugin. Finally, clone classification of similar script pairs is computed using the Longest Common Subsequence algorithm on token types and token images.

Results: This paper presents figures and diagrams reporting TTCN-3 clone frequencies, types, and similarity distributions. We show that the differences between the distribution of clones in test scripts and the distribution of clones in applications are statistically significant. We also present and discuss some lessons that can be learned about the transferability of technology from this study.

Conclusion: About 24% of fragments in the test suites are cloned, which is a very high proportion of clones compared to what is generally found in source code. The difference in proportion of Type-1 and Type-2 clones is statistically significant and remarkably higher in TTCN-3 than in source code. Type-1 and Type-2 clones represent 82.9% and 15.3% of clone fragments for a total of 98.2%. Within the projects this study investigated, this represents more and easier potential re-factoring opportunities for test scripts than for code.
© 2017 Elsevier B.V. All rights reserved.

∗ Corresponding author.

http://dx.doi.org/10.1016/j.infsof.2017.01.008
0950-5849/© 2017 Elsevier B.V. All rights reserved.

Please cite this article as: T. Lavoie et al., A case study of TTCN-3 test scripts clone analysis in an industrial telecommunication setting, Information and Software Technology (2017), http://dx.doi.org/10.1016/j.infsof.2017.01.008

1. Introduction

Clone analysis involves finding similar code fragments in source code as well as interpreting and using the results to tackle design, testing, and other software engineering issues [1–3]. There are four main types of clones defined in the literature [4] as follows:

• Type-1: identical code fragments, except for changes in whitespace, layout, and comments (changes affecting layout, with no impact on lexical or syntactic information). They are often referred to as “identical” clones.
• Type-2: syntactically identical fragments except for variations in names of identifiers and types, and in values of literals. They possibly have the same characteristics of Type-1 clones with regards to changes in whitespace, layout, and comments. Some Type-2 clones are “parametric” clones [4], because they can be normalized by: replacing identifiers with a single parametric identifier (they represent instances of the “consistent renaming” pattern [5]); or replacing constants with a single parameter; or replacing types with a “generic” type or with a “template”. Also, they can be easily transformed using parametric changes, parametric templates, or parametric transformations. Type-2 clones that are not parametric may represent cases of an “inconsistent renaming” pattern.


• Type-3: fragments with modifications such as changed, added, or removed statements, and possibly with some of the characteristics of Type-1 and Type-2 clones (changes affecting syntactic information). They are referred to as “similar” (only) clones, “gapped” clones, or “near-miss” clones in the literature.
• Type-4: code fragments performing the same computation but implemented by different syntactic variants. In the literature, they are referred to as “semantic” clones.

Type-1, Type-2, and Type-3 clones are based on textual similarity, while Type-4 clones are based on semantic similarity. A finer classification of clone differences has been proposed in the context of object-oriented refactoring [1] and a set of mutation-based clone differences has been investigated in the context of clone-detection tool evaluation [5].

There are extensive publications on experiments and techniques for analyzing clones in source code. Clone detection in tests has been investigated using the Clone Miner and Clone Analyzer tools [6] on tests written in Java for Android [7]. In this paper, we analyze clones in proprietary industrial test scripts written in the Testing and Test Control Notation Version 3 (TTCN-3) language [8]. To the best of our knowledge, this is the first study of TTCN-3 clones in test scripts.

TTCN-3 differs from application programming languages in many aspects. It is a language designed to write tests for telecommunication applications. It is standardized and maintained by the European Telecommunications Standards Institute (ETSI) and is also adopted by the International Telecommunication Union (ITU-T). As stated in its introduction [9]: TTCN-3 provides all the constructs and features necessary for black box testing.
It embodies: a rich typing system and powerful matching mechanisms, support for both message-based and procedure-based communication, timer handling, dynamic test configuration including concurrent test behaviour, the concept of verdicts and verdict resolution, and much more.

TTCN-3 also supports standard imperative constructs, as in C++ and Java, such as assignments, conditional statements (if-else), loops (for, while, do), break, continue, and functions [8,9]. The TTCN-3 language is also promoted by the 3rd Generation Partnership Project (3GPP). As part of ISO 9646, the use of this language is widespread throughout the telecommunication industry. Thus, not only is it natural for a company like Ericsson to use TTCN-3, but it is also mandatory.

As a proponent of TTCN-3, Ericsson has a small team dedicated to developing tools to support users inside their company. That team has created a complete programming environment as an Eclipse plugin called Titan. That plugin includes an API to access a TTCN-3 parser and some of its components, notably the Abstract Syntax Tree (AST). A practical problem was to make our clone detection technology, CLAN (CLone ANalyzer) [10], able to extract data from Titan in order to perform clone detection in TTCN-3. Practical issues are discussed in Section 6.

Some Ericsson developers suspected that cloning is one of the main idioms for test script reuse in the testing process for some systems. Management believes that the practice of cloning in test environments increases the cost of maintenance and also leads to possible inconsistencies between tests and evolving software versions. In the long run, Ericsson hopes to reduce the effort in maintenance, design, and comprehension of their TTCN-3 test scripts, in order to improve the quality of the maintenance process of test suites.

As Ericsson had previous experience with clone detection technologies [11] and they believe reduction in cloning could benefit some of the aspects they seek to improve, they elected to do clone analysis on TTCN-3. Before exploring ways to apply solutions, a rigorous verification by means of quantitative figures was required, in order to measure the extent of the existence of clones in TTCN-3.

In this paper, we make an industrial case study of the clones in test scripts written in the TTCN-3 language and compare them to clones in C/C++ and Java applications. This study has the following goals:

• To verify the existence of clones in TTCN-3 scripts;
• To quantify to what extent clones exist in TTCN-3 scripts;
• To categorize the clones according to their type;
• To compute the distribution of clone types in TTCN-3 scripts;
• To compare the distribution of the clones in TTCN-3 scripts with that of clones in C/C++ and Java systems.

To achieve these goals, we report distributions and statistics of the clones in 500 kLOC (Lines of Code) of TTCN-3 scripts. Moreover, we complement our analysis by a statistical comparison of the distribution of the clones in TTCN-3 with that of the clones previously identified in Ericsson’s C/C++ and Java code, as reported in [11]. This previous work gave a detailed description of the distribution of the clones in C/C++ and Java code in the same industrial setting. The numbers used for comparison are drawn directly from that published work.

The paper is organized as follows: Section 2 describes the Ericsson testing environment and process; Section 3 presents experiments and results; Section 4 discusses the results, lessons learned, and threats to validity; Section 5 presents related research; Section 6 describes practical issues and the adaptation of our clone detection technology to TTCN-3; Section 7 discusses further research objectives, while Section 8 concludes the paper.

2. Industrial testing context

We analyzed scripts from a legacy testing environment, which was only very slightly automated. Although other more sophisticated testing environments are used within Ericsson, scripts in this testing environment are written and modified by testers by hand. In a sense, the analyzed test scripts are legacy test code maintained by hand by developers.

In this context, a test suite is a subset of tests. Each test targets some aspect of one specific feature in a product. A test may be composed of several TTCN-3 files that are kept in a versioning system. Files represent the latest modifications of functions and scripts belonging to a test and they correspond to the latest version used in a test suite. The latest modification of a file may not belong to the current test suite corresponding to the latest version of the product being tested. This occurs when a test was neither required nor updated for the current test suite.
There is a fundamental difference with respect to application development and code. In applications, components that can potentially be executed must be kept up to date. Components that cannot be executed correspond to “dead” code, because they cannot be called from any reachable statement. Dead code is usually not maintained and not kept up to date. In contrast, test scripts that are not executed in the current test suite can be selected and executed in future versions. They are not “dead” code and they should, in principle, be kept up to date, because of the possibility of selection in the future. Tests are only updated on an “on-demand” basis, because they may or may not be selected in the current test suite (in fact they may never be selected again), and, when selected, updates may or may not be necessary. It is possible that a test and its files are not

updated for several versions of a product. In this case, the necessary updates are harder and more complex to implement. It is not essential to keep tests up to date, but this constitutes some technical debt. This debt is composed of the time and effort needed for eventual on-demand test modifications. Test script maintainability of non-updated tests may be seriously and negatively affected after a number of releases. At a certain point in time, the cost of the effort required to adapt a test may become so high that the testers will not have the necessary time and resources to adapt it and it will be explicitly discarded or no longer used, wasting the original effort to create it.

In the Ericsson context, tests are organized around the features of the products being tested. Suppose that a test is designed to test the “make a call” feature in a product. A new version of the product may include a new feature “display number called” and a new test is required. The new test shares the code to make a call with the previous test. Practices that are often used to create a new test from an existing one are: copying the existing test, keeping or perhaps modifying the shared parts, and adding new code as required. In our case, the code to make a call would be physically duplicated in the two tests and it would become a clone.

Suppose that, in a future version of the product, the code to make a call changes. All tests selected for this new version of the product that have to make a call have to be updated. The update effort is multiplied by the number of copies of the cloned code in the selected tests. The tests that make a call, but that were not selected for the current test suite, do not immediately need to be updated. They may never need to be updated, if they are never selected for any future test suite. The technical debt of not updating cloned parts of tests will be reclaimed only if these tests are selected for future test suites.
If the code to make a call is changed again in a subsequent version of the product, some of the tests that were already updated may be selected and updated again, whereas others will not. Therefore, clones in tests may be at different levels of updating with respect to the current test suite. After several releases of a product, consistent updating of older tests for the current version becomes more and more costly, complex, and hard to implement. Ericsson management wanted to reduce the time, effort, and cost of updating and evolving test suites from one version of a product being tested to another. They believed that two factors affect the effort of updating tests: the sheer number of clones to be updated, and the gap to be covered between the last version of a clone and the current version of a product. Therefore, management wanted to assess the amount of cloning in a test pool for a product. They also wanted to determine the clone type distribution, to better evaluate opportunities of test clone management.

Table 1. Sizes of test script sets.

Project    TTCN-3 (LOC)
A          186,349
B          181,518
C          199,836
Total      567,703

Table 2. All systems clone clustering and Dynamic Programming (DP) computation time.

Language    Clusters (secs)    DP (mins)
TTCN-3      0.548              58.670

3. Experiments

3.1. Hardware, software environment, and dataset

The system architecture for the experiment is based on a virtual machine with Ubuntu Server 11.10, 1 CPU, 8 GB of RAM and 200 GB of hard drive disk space and relies on Apache/MySQL/PHP, Java 7, and CLAN. Three sets of test scripts for different projects have been analyzed. Table 1 shows the size of each set. They are all of moderate size. Processing systems of such a size is achieved quickly, since the CLAN clustering algorithm execution is very fast compared to other tools [10]. The exact execution time for all the systems together is reported in Table 2.

3.2. Clone detection process and configuration

The CLAN tool used in this experiment computes code fragment similarity by considering syntactically based software metrics and it can be used for identifying code duplication in large software systems. The process used to study clone detection is outlined in Fig. 1 and consists of the following sequential steps:

1. TTCN-3 lexical and syntactic analysis;
2. Extraction of the metrics;
3. Clone cluster identification;
4. Dynamic Programming analysis and visualization.

The following is a quick overview of the process as shown in Fig. 1. The first box at the far left shows projects with test script files. All the files in the different projects are first analyzed lexically and syntactically. Similarity analysis using software metrics requires a particular fragment granularity and a set of metrics to characterize the fragments. Functions and methods are the chosen granularity for the fragments in this setup. The lexical and syntactic analysis identifies all functions and methods in TTCN-3 projects and lists them for further reference in the process.

Fig. 1. Clone detection process with CLAN.

Fig. 2. Example of TTCN-3 clone visualization.

The extraction of metrics is then performed on all the functions and all the methods. The following set of metrics were extracted from the AST and were used to compute clone clusters:

• fragment size (number of AST nodes);
• number of passed parameters;
• number of conditional statements (if-then, if-then-else, switch);
• number of loop statements (while, for, do);
• number of local variables.
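As an illustration only, the grouping that these metrics feed can be sketched in a few lines of Python. In this experiment, fragments share a cluster when every metric except size is equal, and sizes may differ by up to 100 units. This is a minimal sketch, not CLAN's implementation: the `Fragment` record, the function `cluster_fragments`, and the choice to measure the size tolerance against the smallest fragment of a cluster are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass

SIZE_TOLERANCE = 100  # allowed size difference, in AST nodes (value taken from the text)


@dataclass(frozen=True)
class Fragment:
    """Hypothetical record holding the five metrics of one function or method."""
    name: str
    size: int          # number of AST nodes
    params: int        # number of passed parameters
    conditionals: int  # number of conditional statements
    loops: int         # number of loop statements
    local_vars: int    # number of local variables


def cluster_fragments(fragments, tolerance=SIZE_TOLERANCE):
    """Group fragments whose non-size metrics are equal and whose sizes are close.

    Assumption: closeness is measured against the smallest fragment in the
    cluster; the paper does not state how the tolerance is anchored.
    """
    buckets = defaultdict(list)
    for frag in fragments:
        key = (frag.params, frag.conditionals, frag.loops, frag.local_vars)
        buckets[key].append(frag)

    clusters = []
    for group in buckets.values():
        group.sort(key=lambda frag: frag.size)
        current = [group[0]]
        for frag in group[1:]:
            if frag.size - current[0].size <= tolerance:
                current.append(frag)
            else:
                if len(current) > 1:  # a cluster needs at least two fragments
                    clusters.append(current)
                current = [frag]
        if len(current) > 1:
            clusters.append(current)
    return clusters


if __name__ == "__main__":
    sample = [
        Fragment("f1", 120, 2, 1, 0, 3),
        Fragment("f2", 200, 2, 1, 0, 3),
        Fragment("f3", 400, 2, 1, 0, 3),  # same metrics, but size too far from f1
        Fragment("f4", 120, 3, 1, 0, 3),  # different parameter count
    ]
    for cluster in cluster_fragments(sample):
        print([frag.name for frag in cluster])
```

Because the grouping relies only on equality of a small metric tuple plus a size check, it needs no pairwise comparison of fragment contents, which is consistent with the short clustering time reported in Table 2.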

The fragments are then clustered by CLAN. In our experiment, fragments are clustered together if all their metrics are equal, apart from the size of fragments, which is allowed to vary by 100 units.

The last step is Dynamic Programming (DP) matching between pairs of fragments in the same cluster. It is executed using the algorithm described by Kontogiannis et al. [12], although extended to compute the Longest Common Subsequence (LCS) [13] as well. The Jaccard coefficient [14,15] is computed using the LCS results between the sequences of token types of two fragments in the same cluster. If it is greater than or equal to 0.7, the fragments are considered for token image similarity. In this case, a second LCS computation is performed on the sequences of token images and a second Jaccard coefficient is then computed on the LCS results of token images. An example of a visualized clone pair based on token image similarity is shown in Fig. 2. The two Jaccard coefficients are used in the clone type identification strategy described in the following.

Recalling the clone type definition reported in Section 1, CLAN is able to detect clones of Type-1, Type-2, Type-3, and some structurally similar clones of Type-4 that are aggregated with Type-3 clones. Type-1 clones are formally identified as those code fragments that have identical sequences of tokens (that is, they have respectively the same token image and the same token type sequences). Type-2 clones are identified as those having the same length, the same token type sequence, but not the same token image sequence, allowing for variations in identifier names, types, and literal values such as constants or strings. Type-3 clones are identified as those having different sequences of token types, allowing for added, deleted, or modified statements. They can be of different lengths too. CLAN does not report Type-4 clones other than those that share some structural similarity.
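The two-stage similarity computation and the type identification rules described above can be illustrated with a short Python sketch. It is a hedged reading of the text, not the CLAN code: the function names are hypothetical, and the Jaccard coefficient is assumed here to be |LCS| / (|a| + |b| − |LCS|), a formula the paper does not spell out.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_jaccard(a, b):
    """Jaccard-style similarity derived from the LCS (assumed formula)."""
    common = lcs_length(a, b)
    union = len(a) + len(b) - common
    return common / union if union else 1.0


def classify_pair(types_a, images_a, types_b, images_b, threshold=0.7):
    """Return 'Type-1', 'Type-2', 'Type-3', or None for a pair of fragments.

    types_*  : sequences of token types (e.g. 'ID', 'NUM')
    images_* : sequences of token images (the actual token text)
    """
    if lcs_jaccard(types_a, types_b) < threshold:
        return None  # not similar enough to be examined further
    if list(types_a) == list(types_b):
        # Same token types: identical images give Type-1, otherwise Type-2.
        return "Type-1" if list(images_a) == list(images_b) else "Type-2"
    return "Type-3"


if __name__ == "__main__":
    types = ["ID", "ASSIGN", "NUM", "SEMI"]
    print(classify_pair(types, ["x", ":=", "1", ";"], types, ["y", ":=", "2", ";"]))
    print(lcs_jaccard(["x", ":=", "1", ";"], ["y", ":=", "2", ";"]))
```

In the example, the two fragments share their token type sequence but differ in an identifier and a literal, so the pair is reported as Type-2 while the token image similarity stays well below 1.0.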
Detection of Type-4 clones is very difficult and subject to the undecidable problem of determining the equivalence of arbitrary programs.

3.3. TTCN-3 results

Table 3. TTCN-3 detailed clone clustering.

                           N         MAX SIZE    AVG SIZE
Clusters                   2,248     29          2.39
Fragments                  16,318    661         24.26
Cloned fragments           4,025     661         24.32
Type-1 cloned fragments    3,336     364         23.89
Type-2 cloned fragments    616       661         26.72
Type-3 cloned fragments    73        203         23.72

Fig. 3. Cluster size distribution (x-axis: cluster size; y-axis: frequency).

Fig. 4. Fragment size distribution (x-axis: fragment size; y-axis: frequency).

Fig. 5. Token type similarity (x-axis: similarity; y-axis: frequency).

Detailed statistics from the clone detection step are presented in Table 3, where columns “MAX SIZE” and “AVG SIZE” represent the maximum and the average sizes of reported clusters or fragments. The maximum cluster size is 29. As depicted in Fig. 3, the most frequent cluster size is 2 and it indicates that most clones occur in pairs. Consequently, as reported in Table 3, the average cluster size is 2.39. A total of 4,025 fragments were identified as clones from a total of 16,318 analyzed fragments, resulting in a 24.7% cloning ratio over all the fragments that were considered. This translates to approximately 100 kLOC of cloned code. Reported ratios in the literature [16] for open source C/C++ and Java projects vary on average from 2% to 10%, with outliers around 20%. Our previous experiment [11] showed a cloning ratio around 10% in the same industrial setting. This suggests that cloning practices when writing test scripts may differ significantly from common practices used when writing application software, within the same industry and in open source projects. In Section 3.4, we show that this difference is indeed statistically significant, within our industrial context.

The predominant variety of clones in this experiment is Type-1. 3,336 cloned fragments out of 4,025 (82.9%) have the same token types and token images. Type-2 clones have identical token types, but different token images. We counted 616 fragments (15.3%) of this kind. The remaining 73 marginal fragments are clones of Type-3 or higher, with different token types and token images.

We observed that 98.2% of the clones were either Type-1 or Type-2. This is partly explained by the environment in which the tests are created. Many testers consider copy-and-paste an acceptable practice and do not make use of design patterns or language features to reduce the number of clones. Clearly, in this context, it is not at all surprising to find clones. What is important to note is that, in general, the copy-and-paste culture seems to lead to very small modifications in all the tests. Tests are indeed expected to share common behaviours and to be parametric by nature. Since Type-3 clones are almost non-existent, new tests are not likely to be developed by making deep modifications to existing tests. We hypothesize that families of similar tests are created. Clones are not shared between families, rather they are shared between members of each family. The clones that are most often shared between members of one family are of Type-2 and this is deliberately used by developers as an implementation pattern.

Fig. 3 shows the distribution of cluster sizes with respect to cluster size ranking, starting from the smallest ranking. Clusters represent the sets of mutually similar fragments. Both axes have a logarithmic scale. The figure displays a frequency of clusters of size 2 (corresponding to pairs of clones) higher than that of clusters of larger size. Fig. 4 indicates the size distribution of clones. Sizes are measured in Lines of Code (LOCs). Fragments are ranked in order of increasing size. Both figures use a logarithmic scale on the x-axis.

Similarity distributions are reported in Figs. 5 and 7. The scale of the y-axis is logarithmic for both these figures. Fig. 5 shows the distribution of token type similarity between fragment pairs in clusters. Fig. 6 shows the cloned fragments ranked by their highest token type similarity.
This figure clearly shows that Type-3 clones are marginal and could almost be considered statistical aberrations. Fig. 7 reports the distribution of token image similarity of Type-1 and Type-2 clones. These clones have a token type similarity equal to 1.0. Fig. 8 gives more details about token image similarity of Type-1 and Type-2 clones. Fragments are ranked by their highest token image similarity. Image similarity varies from 1.0 down to 0.57. The most interesting feature of Fig. 8 is the high number of Type-1 and Type-2 clones sharing more than 80% of their image.

Figs. 5–8 suggest that most of the tests that are replicated are identical, since the Jaccard coefficient of their longest common image sub-sequence is 1.0. This means that if a test is cloned, it is unlikely ever to be modified. Moreover, as shown in Fig. 5, for almost all the non-identically replicated tests, the type similarity is above 0.7, with the majority having a type similarity above 0.85. On top of that, as shown in Fig. 8, the textual similarity is very high for almost all Type-1 and Type-2 clones. In Fig. 8, Type-2 clones ranked at positions roughly between 3300 and 3800 have an image similarity higher than 0.8, but lower than 1.0. These 500 fragments have only a few token image changes. Thus, if a modification ever occurs in a Type-2 cloned test, it is likely to be limited to a small number of changes in identifiers and literals. Manual inspection of clones confirmed that deep structural changes almost never occur.

Fig. 6. Token type similarity of ranked clones (x-axis: parametric pair rank).

3.4. Statistical analysis

In this section, we apply two non-parametric tests, the Mann–Whitney U test and Fisher’s exact test, to investigate the statistical significance of the differences between the TTCN-3 clone similarity distribution presented in Fig. 6 and the results obtained previously for C/C++ and Java [11]. The first one, the Mann–Whitney U test, is a qualitative test to verify if the TTCN-3 distribution differs from the distribution in C/C++ and Java. Fisher’s exact test is a quantitative test to verify if the proportion of Type-1 and Type-2 clones

in TTCN-3 is higher than in the other two languages. The detailed steps are explained in the following sections.

Fig. 7. Type-1 and Type-2 token image similarity (x-axis: similarity; y-axis: frequency).

Fig. 8. Token image similarity of ranked Type-1 and Type-2 clones (x-axis: parametric pair rank; y-axis: image similarity).

Table 4. Example of a contingency table.

                        Variable 2
Variable 1    Value 1    Value 2    Total
Value 1       A          B          A+B
Value 2       C          D          C+D
Total         A+C        B+D        N

3.4.1. Mann–Whitney U test

The Mann–Whitney U test is a qualitative test used to determine if two samples are drawn from the same population. It is usually used when the samples are of significantly different sizes. It is a non-parametric test and is also called the Wilcoxon rank-sum test. Two distinct variants exist for this test. We used the variant with the ranked means. The other variant would require the median of the distribution, which is unknown in our case. For the same reason, this test is used as an alternative to a Student’s t-test, as the dataset is not normally distributed.

Here, we wanted to verify if samples from the TTCN-3 dataset are generated from a different process than the datasets from C/C++ and Java. The null hypothesis is that, pairwise, the TTCN-3 dataset and the C/C++ or Java dataset are drawn from the same population. We reject the null hypothesis if the test indicates that, pairwise, the datasets are from different populations.

The Mann–Whitney U test uses the ranks of all the observations of a sample (see Fig. 7 for this distribution for TTCN-3). It compares the ranks of the observations of a sample to those of another sample. Computing the p-value leads to rejection, or not, of the null hypothesis. The rank in this test is the number of fragments in a language, reported as a clone with a certain value of similarity. For example, if two clones have a type similarity of 0.7342, then the rank for that value of similarity is 2. A rank exists for each value of similarity for which there was at least one clone; a rank of zero is implicitly assumed for all other similarity values. It is important to note that while Fig. 7 is a connected histogram with classes of similarity values, the U test does not group similarity values into classes.

In the following equations, the p-values for the tests between TTCN-3 and C/C++, and between TTCN-3 and Java are presented:

$$p_{\text{TTCN-C/C++}} < 5.1604 \times 10^{-234} \qquad (1)$$

$$p_{\text{TTCN-Java}} < 2.8804 \times 10^{-291} \qquad (2)$$

We computed p-values using the R Project for Statistical Computing [17]. The returned values are extremely small but not mathematically equal to zero. Since the values are very small, we double checked the results with an independent implementation using Matlab. The values were confirmed. Hence, we reject the null hypothesis and conclude that it is likely that TTCN-3 samples are generated by a distinct process other than the process underlying C/C++ and Java. We did not elect to choose a specific significance threshold as the p-value tends towards zero and we could have rejected the null hypothesis for all common thresholds, especially p < 0.05.
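To make the rank-based comparison of Section 3.4.1 concrete, the following Python sketch computes the U statistic for two small, made-up samples of similarity values. It is an illustration under stated assumptions rather than the analysis run for this study (which used R and Matlab): the function names are hypothetical, tied values receive average ranks, and the p-value uses the normal approximation without tie correction.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p-value by normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    std_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / std_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_value


if __name__ == "__main__":
    # Hypothetical similarity values, NOT data from the study.
    test_script_similarities = [1.0, 1.0, 0.98, 1.0, 0.95, 1.0]
    application_similarities = [0.72, 0.81, 0.9, 1.0, 0.76, 0.85, 0.79]
    print(mann_whitney_u(test_script_similarities, application_similarities))
```

For samples as large as those in the study, a statistical package is the more practical choice; the sketch only shows what the test compares.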

3.4.2. Fisher’s exact test

Tests of statistical significance are of interest for a quantitative comparison of two samples of software clones. Among this category of tests, Fisher’s exact test can be used to evaluate the associations between two binary variables for which data can be expressed by a contingency table [18]. An example of a contingency table is reported in Table 4. Fisher’s exact test evaluates the p-value of obtaining the observed distribution of the set partition represented in such a table. The null hypothesis is that the underlying probabilities of getting the set partition follow a hypergeometric law, as follows:

$$p = \frac{\binom{A+B}{A}\,\binom{C+D}{C}}{\binom{N}{A+C}} \qquad (3)$$

$$N = A + B + C + D \qquad (4)$$

Applied in the same pairwise pattern as the Mann–Whitney U-test, we compared TTCN-3 and C/C++, and TTCN-3 and Java, by classifying the clones as either Type-1 and Type-2, or Type-3. Type-1 and Type-2 clones are those with a token type similarity coefficient of 1.0, and Type-3 clones are the remainder. Table 5 presents the comparison between TTCN-3 and C/C++, and Table 6 corresponds to the comparison between TTCN-3 and Java. We chose Fisher’s exact test over the Chi-Squared test, as the latter has some distortion when the number of elements in a set is either low, or is significantly lower than the others sets, which is the case in our experiment [19]. As the number of Type-1 and Type-2 clones is significantly higher than the number of Type-3 clones in all systems, the Chi-Squared test is, de facto, discarded.


Table 5
Contingency table for TTCN-3 and C/C++.

Clones \ Systems    TTCN-3      C/C++      Total
Types 1-2            3 952    254 414    258 366
Type 3                  73     86 151     86 224
Total                4 025    340 565    344 590

Table 6
Contingency table for TTCN-3 and Java.

Clones \ Systems    TTCN-3       Java      Total
Types 1-2            3 952     13 976     17 928
Type 3                  73      6 135      6 208
Total                4 025     20 111     24 136

Also, while Barnard’s test would have applied to our experiment, it could not be computed because of the large numbers involved. Thus, Fisher’s test was the most appropriate. Fisher’s exact test requires that the two sets of categories are independent. Datasets are dependent when they are associated in any way, for example if they are extracted from the same population. In our case study, the systems are developed in different programming languages: TTCN-3, Java, or C++. As a system can only be in a single dataset, this does not violate the prerequisite of Fisher’s test. Indeed, no clone pairs are found across these systems, which implies that the datasets are independent. The p-values of the pairwise tests between TTCN-3 and C/C++ and between TTCN-3 and Java are the following:

$$ p_{\text{TTCN-C/C++}} < 2.2 \times 10^{-16} \tag{5} $$

$$ p_{\text{TTCN-Java}} < 2.2 \times 10^{-16} \tag{6} $$
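As a cross-check of these bounds, the sketch below evaluates Eq. (3) in log space on the counts of Tables 5 and 6, using only the Python standard library. It is an illustration under stated assumptions, not the R script used in the study: the cell layout (A and B on the Types 1-2 row, C and D on the Type 3 row) is assumed because Table 4 is not reproduced here, the two-sided p-value is taken as the sum of the probabilities of all tables with the same margins that are no more likely than the observed one, and the function names are hypothetical.

```python
from math import exp, lgamma, log


def log_comb(n, k):
    """Natural logarithm of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def log_table_prob(a, b, c, d):
    """Log of Eq. (3): C(a+b, a) * C(c+d, c) / C(n, a+c)."""
    n = a + b + c + d
    return log_comb(a + b, a) + log_comb(c + d, c) - log_comb(n, a + c)


def fisher_two_sided_log10_p(a, b, c, d):
    """log10 of the two-sided p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    log_obs = log_table_prob(a, b, c, d)
    terms = []
    # Enumerate every table that keeps the observed row and column totals.
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        lp = log_table_prob(x, row1 - x, col1 - x, row2 - (col1 - x))
        if lp <= log_obs + 1e-7:  # small tolerance for rounding
            terms.append(lp)
    peak = max(terms)
    # Log-sum-exp keeps very small probabilities from underflowing to zero.
    return (peak + log(sum(exp(t - peak) for t in terms))) / log(10)


# Counts from Table 5 (TTCN-3 vs C/C++) and Table 6 (TTCN-3 vs Java).
print(fisher_two_sided_log10_p(3952, 254414, 73, 86151))
print(fisher_two_sided_log10_p(3952, 13976, 73, 6135))
```

Working with logarithms is a design choice: the probabilities involved are far below the smallest positive double, so the result is reported as log10(p) rather than p itself.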

Therefore, under a reasonable significance threshold, and making the same considerations and double verifications of Section 3.4.1 about the small values yielded by R scripts, we reject the null hypothesis in both cases. Hence, we conclude that the proportion of Type-1 and Type-2 clones is not only very high in TTCN-3, but this proportion is also significantly different from the proportion in C/C++ and Java.

4. Discussion and lessons learned

This experiment differs from previous work because it analyzes TTCN-3, a language that is specific to test scripts. Observed distributions of TTCN-3 clones are similar to those observed in test clones written in Java [7]. Furthermore, this paper compares the similarity distribution of test clones to that of application clones. The distributions are significantly different, in contrast with a previous study that investigated and compared application clones in scripting languages and in imperative programming languages. Published results [20] reported that observed distributions were not significantly different. The sponsors of the project suspected beforehand that a significant amount of code was probably duplicated, because of the standard way testers write scripts. Clones appear in scripts used to test similar products or different versions of the same product. Most of the time, only parameters change between two cloned tests. Type-1 clones correspond to common parts and Type-2 clones correspond to parametric variants of test scripts. Experimental results reported in Section 3 show that 82.9% of cloned test script fragments are of Type-1, 15.3% of fragments are of Type-2, and the remaining 1.8% are of Type-3.


Therefore, 82.9% of test script clones are identical and have no differences. For an additional 15.3% of clones, the differences could be caused by consistent or inconsistent renaming and changes in identifier names, or changes in literal values such as constants, strings, and characters. For the remaining Type-3 clones, the differences require other clone classification schemes [1,5]. A precise count and classification of parametric or inconsistent renaming of Type-2 clones and of precise differences in Type-3 clones was not in the scope of this paper and is left for further research. Nevertheless, the differences and the changes between TTCN-3 clone fragments that were informally observed seem to be consistent with the published literature [1,5]. However, the near absence of Type-3 clones in TTCN-3 scripts and the prevalence of Type-1 and Type-2 clones give rise to a higher applicability of automated refactoring and variability management to test clones than to application clones, as discussed in Section 5. One of the goals of this study was to informally describe the opportunities for applying automated solutions to reduce test evolution and migration cost. Possible actions based on test clone information are clone tracking, removal, refactoring, and variability management. The reported figures and distributions, indicating 82.9% of Type-1 and 15.3% of Type-2 clones, clearly show that refactoring and variability management opportunities exist for the TTCN-3 tests that were analyzed, in agreement with published findings for Java tests [7]. There are other scenarios where identifying clones in test scripts could be useful. It could help testers to easily fix additional bugs in test scripts once a bug is identified in a cloned test. Cloned copies of the buggy script could be investigated, as they could contain an identical mistake. Automated detection could prevent further system failures ahead of time, which is highly valuable.
Also, since the Type-2 clones are possibly parametric, it could be beneficial to automatically suggest templates and examples from the existing tests. This could help testers to consistently repair buggy scripts. In Section 5, several approaches for clone tracking, clone removal, clone refactoring, and variability management in tests are briefly discussed. The CLAN configuration was similar to that used for code in C/C++ and Java. Thus, at an equivalent level of similarity, TTCN-3 contains a significantly smaller number of higher-type clones and has a preponderance of Type-1 and Type-2 clones. We learned that a tool that allows many configurations and parameters is not what customers want to deal with. Customization usually comes at the cost of first learning the impact of the parameters on a system and then learning what the best values for a specific condition are. In most cases, only good enough parameter values are required and will get the job done. From that perspective, and knowing that developers have very little time to learn new tools, it was of utmost importance that the interface for the clone detection tool expose the user to the least amount of customization. A small performance loss is acceptable if it shortens the learning curve for many users. Based on that, the tool presented to the developers for clone detection had to be as simple as possible to use for clone information extraction and had to take up little of their time. It was also very important that our tool was adaptable to the Titan environment. Compared to previous research [11], visualization, although appreciated by the managerial level, was not a key factor in adopting the technology. This may be partly explained by the almost exclusive involvement of the managerial level in the project, without much feedback from developers.
From what we have learned, managers are usually more interested in gathering facts, interpreting them, and finding solutions rather than taking a direct look at fine-grained results.


Testers are also not interested in software clones, because they generally program using a cloning paradigm and, while managers perceive it as a problem, testers think that it is an acceptable practice. There is a resistance to migrate from the current paradigm to the use of more evolved abstract constructions, and this poses a major challenge to mitigating the current problem: the prohibitive task of keeping tests properly aligned between releases of two versions of a product. Visualization or other forms of summarized results did not change the view of testers on that matter. Thus, even with a working technology, working solutions, and evidence supporting the use of those solutions, the cultural setting of the testing groups was enough to severely hinder migration towards a different programming paradigm. Also, where software researchers in general argue that awareness of clones is always at least a step in the right direction, awareness was already there in the case of TTCN-3 scripts. However, even if the problem was acknowledged beforehand and the results suggest many courses of action, it still may not be enough to overcome cultural resistance. As previously discussed, the prevalence of Type-1 and Type-2 clones, totalling 98.2%, offers good opportunities for automated refactoring [21] and variability management [7,22,23]. However, despite the presence of refactoring tools available to developers within their IDE of choice (Eclipse is the main one used within Ericsson), the reported results did not trigger actions from the developers to reduce the number of clones. During informal discussion, developers reported that in a large company, making changes affecting the company structure takes some time. The awareness of software clones in TTCN-3 scripts did generate discussion among managers, but in the end, social and organizational changes are required before the company can grasp and take advantage of technological opportunities.

4.1. Threats to validity

Threats to validity can represent weaknesses in the design [24] or limitations to the interpretation [25] of our study. The reader should carefully understand the industrial scope in which we conducted this case study. Hence, in this section, we discuss threats to the internal, external, and construct validity, and threats to the reliability of our study.

4.1.1. Internal threats

Threats to the internal validity result from the design of this study [24]. Threats could come from the choice of CLAN to identify the clones in the systems under study. This clone detection tool was used previously when comparing application samples, but it would be interesting to compare it with other clone detection approaches in test suite projects. Metric-based clone detection has false positives. However, the rate of false positives is known to be low for Type-1 and Type-2 clones [4], which comprise the majority of the clones identified in this study, and it is expected to have a limited impact on this study. Furthermore, DP matching helps to reduce the false positives caused by permutation of similar syntactic structures in Type-1 and Type-2 clones. We did not measure the exact rate of false positives for TTCN-3 and assumed that the rate of false positives does not particularly depend on the programming language. Since the same clone detection method was used for TTCN-3, C++, and Java, the possible bias due to false positives is the same across all samples in the experiments presented, which somewhat mitigates its impact on the conclusions. Also, since TTCN-3 is syntactically similar to Java and since false positives are influenced by the grammar of the language, we think our assumptions are reasonable.

Furthermore, the use of any TTCN-3 parser is expected to yield consistent syntax trees. Although we used the Titan parser from Ericsson, we assumed the parser would correctly recognize any construct in the language, as would the TTCN-3 interpreter used to execute the test scripts.

4.1.2. External threats

Threats to the external validity come from the generalization of our results outside the context of our study [24]. Hence, conducting this case study inside Ericsson could be an external threat, as it may not be possible to generalize the results to other companies or to open source projects. Furthermore, the systems that were analyzed are all chosen from the telecommunications industry and the results of this study might be specific to this single domain. However, we expect the tests under study to be representative of TTCN-3 scripts, because they also come from different development teams inside Ericsson. The specific choice of projects, selected by interest from Ericsson development teams, is considered to be a selection bias and is not precisely a random sampling [26]. The choice of C++ and Java for the applications and of TTCN-3 for the test scripts represents an external threat, too. The clone distributions may be different if other languages were chosen for applications and for test suites. Hence, for all these reasons, our case study should be replicated in different contexts [26]. In this study, we examined clones “naturally” appearing in applications and in test suites written by human developers. Clones were produced by developers using a copy/paste pattern. In test suites, they appeared from one test suite version to another, because testers left large parts of the suites unchanged, while modifying some scripts based on their knowledge of the Software Under Test (SUT) and of the changes to be tested.
Finally, our study did not look at software automatically produced by development frameworks, nor at test scripts produced by automated testing environments. In these latter cases, the similarity distribution may be different from that of “naturally” occurring clones.

4.1.3. Construct threats

Construct validity concerns whether a test measures the intended construct. We wanted to measure whether the process that generated the TTCN-3 clones is significantly different from the one that generated the application clones, based on two non-parametric statistical tests. The Mann–Whitney U test is a qualitative test used to determine if two samples are drawn from the same population, and it is usually used when the samples are of significantly different sizes. This test was used as an alternative to Student’s t-test, as the dataset is not normally distributed. Fisher’s exact test evaluates the p-value of obtaining the observed distribution of the set partition represented in a contingency table. The Chi-Squared test was discarded because the number of Type-1 and Type-2 clones is significantly higher than the number of Type-3 clones in all systems. Although applicable, Barnard’s test was impossible to compute because of the large numbers involved in our experiments. Under the construct validity, other statistical tests could have yielded different results.

4.1.4. Reliability

The reliability of our study depends on its reproducibility [24]. Although the dataset from our industrial partner, Ericsson, is not publicly available, other clone results obtained by CLAN on C++ and Java are available from Bellon’s dataset [10]. CLAN could also be


used to analyze clones on other TTCN-3 datasets, if desired, to produce publicly available results. Other clone detectors, mentioned in Section 5, could also be used to analyze clones in TTCN-3, C++, and Java to reproduce the comparisons presented in this paper.

5. Related work

Clones are extensively investigated in the literature, with hundreds of papers dedicated to their detection and application, as reported in surveys of the field [4,27]. In the following, we relate our work to previously published papers and explain the differences and similarities between this current study and the previous work. With respect to the currently published work analyzing test clones [7], we are the first to investigate industrial test clones written in TTCN-3 and compare these to application clones. Other works are relevant in industrial settings and this section gives a quick overview of them.

5.1. Applications in large-scale industrial context

Large-scale clone detection studies in an industrial setting can be traced back to Dang et al. [28]. Large-scale, inter-project clone detection was investigated by Koschke et al. [29] and Keivanloo et al. [30]. With a growing interest from the industry and the technology achieving scalability, other industrial clone experiments have been published. Notably, authors from Microsoft [28,31] did an experiment with code clones and investigated developer feedback. Further developments in industrial applications were investigated in experiments on the application of clone technology and management in an industrial context [32,33]. Our research shares the industrial context and the large scale of the projects with these other studies. In an industrial context with a focus on different products branching from a common ancestor, inter-system clone detection is a requirement to detect the common parts in the original cloned sub-systems.
Therefore, clone detectors able to operate in an inter-system context may be better suited for industrial applications. Other authors have explored the context of large-scale inter-system clone detection [34,35]. Our industrial results suggest that we should proceed with this approach. Observed refactoring opportunities in TTCN-3 were manifold. Papers investigating refactoring opportunities based on clones are published in the literature [36,37]. The recent work of Tsantalis [38] focuses on refactoring clone pairs and small clone clusters localized in small regions of the code base. It does not investigate larger cloning phenomena such as the replication and modification of entire sub-systems, nor their broader refactoring opportunities. Moreover, our industrial experiment and feedback from developers suggest that localized clones are readily refactored before being committed to a central repository and thus do not live long enough to become an issue. In the end, our work suggests changing the focus of refactoring activities from localized clones to the management of clones at a higher level of abstraction (such as modules). This is an important distinction between our findings and the current state of the art in clone analysis.

5.2. Summary of different clone detection techniques

Despite the numerous tools already available today, many new techniques and tools are created and published every year in various conferences and journals. The following will briefly cover the most recent techniques published as well as the established ones still mentioned in the literature. In recent years, new techniques have tried to solve precision and recall issues with Type-3 and Type-4 clones. With regard to Type-3 cloning, techniques based on n-grams have gained some


popularity. Kamiya [39] used n-grams to detect Type-3 and Type-4 clones in Java bytecode. Yuan [40] used frequency vectors of 1-grams and the cosine distance to detect Type-3 clones. Lavoie [41] used frequencies of generalized n-grams with the normalized Manhattan distance and space partitioning to detect Type-3 clones. Sajnani [42] also used similarity on n-grams, but within a parallel framework. Without relying on n-grams for Type-3 clone detection, other tools use distance on token strings or image strings. Murakami [43] uses a weighted Levenshtein distance, also called Smith-Waterman, to detect Type-3 clones. Kamiya also uses token strings [44]. Some tools use tokens for clone matching, but add syntactic information to help locate the clones within functional boundaries. This is the case of NiCAD [45], which performs syntactic analysis, program transformation, and normalization, and then computes the Longest Common Subsequence (LCS, related to the Levenshtein distance). In this paper, we also use the LCS as the last step of the clone detection process. Hashing and clustering are also used to detect Type-3 clones. Part of the technique we presented in this paper falls into this category. The recent SimCAD [46] also has a hashing and clustering step in its algorithm. ConQAT [47] is another tool with hashing schemes. Locality-sensitive hashing (LSH) has been exploited by Jiang et al. [48]. Clone detection by suffix tree matching was introduced by Koschke et al. [49]. Although naive suffix tree matching is usually best suited to Type-1 and Type-2 clone detection, the authors proposed algorithms and heuristics to close gaps between matched segments in order to achieve competitive Type-3 clone detection. The technique was evaluated [50] and the tool iClone dates back to the incremental version of the suffix tree approach [51]. Clone origin and genealogies have also been investigated [52–54]. Basit uses structural clones for higher level detection [6]. Older techniques used AST-based matching for Type-3 clone detection [55]. Although they produced good results, these techniques are used less today because they lack scalability. Detecting code clones is also possible using by-products of the code instead of the code itself. For example, Program Dependency Graphs (PDG) have been used [56,57]. Analysis of the memory behaviour is another technique that can be used [58]. Semantics-related techniques have emerged in recent years. A prime example of this is the work of McMillan et al. [59], which uses Latent Semantic Indexing (LSI). Although most of the tools mentioned in this section are incremental, incremental clone detection can be done on source code repositories [60], which might be relevant for large-scale industrial settings.
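To make the n-gram family of techniques mentioned above more concrete, the following Python sketch builds n-gram frequency vectors from token sequences and compares them with a cosine distance and a normalized Manhattan distance. It is only an illustration of the general idea and does not reproduce any of the cited tools: the tokenization, the choice of dividing the Manhattan distance by the total n-gram count of both fragments, and all function names are assumptions.

```python
from collections import Counter
from math import sqrt


def ngram_counts(tokens, n):
    """Frequency vector of the n-grams of a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine_distance(u, v):
    """1 - cosine similarity of two sparse frequency vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return 1.0 - dot / norm if norm else 1.0


def normalized_manhattan(u, v):
    """L1 distance divided by the total count of both vectors (assumed normalization)."""
    total = sum(u.values()) + sum(v.values())
    diff = sum(abs(u[k] - v[k]) for k in u.keys() | v.keys())
    return diff / total if total else 0.0


# Hypothetical token-type sequences for two similar fragments.
frag_a = ["ID", ":=", "INT", ";", "ID", "(", "ID", ")", ";"]
frag_b = ["ID", ":=", "INT", ";", "ID", "(", "ID", ",", "ID", ")", ";"]
u, v = ngram_counts(frag_a, 2), ngram_counts(frag_b, 2)
print(cosine_distance(u, v), normalized_manhattan(u, v))
```

Both distances are 0 for identical vectors and 1 for vectors with no n-gram in common, so a single threshold can be applied to either one when selecting Type-3 candidates.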
Older techniques used AST-based matching for Type-3 clone detection [55]. Although they produced good results, these techniques are used less today, because they lack scalability. Detecting code clones is also possible using by-products of the code instead of the code itself. For example, Program Dependency Graphs (PDG) have been used [56,57]. Analysis of the memory behaviour is another technique that can be used [58]. Semantic related techniques have emerged in recent years. A prime example of this is the work of McMillan et. al. [59], which uses Latent Semantic Indexing (LSI). Although most of the tools mentioned in this section are incremental, incremental clone detection can be done on source code repositories [60], which might be relevant for large scale industrial settings.

5.3. Clone removal

Cloning indicates implicit reuse that fulfills some non-explicit design objectives. In test suites, cloning serves the purpose of reusing tests that cover corresponding parts of different systems. Code cloning is a software development practice whose impact is controversial. Several studies have investigated its impact in software development. Some studies have presented evidence of the negative impact of clones on software maintenance and evolution [61–68], while others have described positive aspects of cloning in terms of reduced design complexity, improved ease of code modification, and higher code stability [69–73]. The issue of keeping or removing clones requires careful consideration [71,74,75].


5.4. Clone refactoring

Research has targeted refactoring clones in applications. In applications, code clone refactoring and redesign are possible ways of re-organizing systems based on clone information [1,76]. Deciding which, if any, clones should be refactored is a difficult task. Clone-based refactoring may be quite invasive and not worth the effort, cost, and risks involved. An approach using a tree-based decision classifier that shows a precision of around 80% was presented in the literature [37]. Prioritizing clones for refactoring based on resulting design and quality impact was addressed in the literature [77], proposing an effort model and a Constraint-Programming solution for the optimal scheduling of refactoring tasks. Optimization of clone refactoring was proposed and investigated using genetic algorithms [78]. Candidates for refactoring can also be ranked and chosen based on their tendency to change together during software evolution [74,79]. Important refactoring candidates selected by co-change frequency represent roughly 7% of clones, and around 93% of clones can be considered lower priority for refactoring decisions. An approach using open-spectrum refactoring techniques while preserving strong safety properties was proposed to overcome the limitations of other clone refactoring tools in terms of limited generality and extensibility [80]. Whether clones can be safely and easily parametrized for refactoring without causing side-effects and without changing a program’s behavior was investigated in the literature [21]. One of the reported conclusions is that Type-1 clones tend to be more refactorable than the other clone types. Since Type-1 clones are very frequent in the test suites investigated in this paper, test script clones seem to offer excellent opportunities for refactoring. An approach for clone unification and refactoring of subtle and difficult clone differences has been presented in the literature [81].

5.5. Software product lines

A Software Product Line (SPL) [82,83] is a set of different variants of software-intensive systems that share common software components and manage variability among its systems. Variants among systems are implemented by defining product-specific components that are related to different product features. Features must be explicitly managed in SPLs. When properly applied, SPLs can be advantageous in software development by increasing productivity, increasing software quality, and reducing time-to-market. Although criticized in the literature, some companies still use code cloning as a development practice to implement product lines [84]. Refactoring and management can improve product line maintenance [85–87]. Clone detection in application code can be used to support the evolution of product lines [22] by measuring similarity between pairs of functions and then identifying similar functions that will become part of shared components. In this approach, textual similarity among these functions is measured by the Levenshtein distance. This is similar to our DP analysis of clone clusters, as described in Section 3.2. In this way, variants can be consolidated and re-organized as SPLs using clone information and reflexion [23]. Common parts of software systems may also include requirements, code, and test artifacts such as test plans, test cases, test procedures, and test data. Testing SPLs has been described in the literature [88–94], surveys on SPL testing have been published [95,96], and generic adaptable tests for SPL testing have been proposed [97].

5.6. Variability management in test scripts

Test scripts are often reused or cloned and modified to test slightly different products. They share many commonalities dictated by the commonalities in the tested systems and present variants that reflect the specific differences in the products. The similarity of software products appears to produce similarity in test scripts related to similar features. In test suites for legacy systems, test scripts were often created by using a “copy and paste” practice across versions of the same system, or across test suites for similar products. As measured by the experiments presented in this paper, the high prevalence of Type-1 and Type-2 clones suggests that commonality is high and variants are relatively few in these test suites. Similar conclusions about the distribution of types in test clones have been reported by Asaithambi [7]. Software variants based on clone analysis can be consolidated into product lines [22,23]. Similarly, test clone information could be used to organize test script variants into product lines. Commonalities that are identified between test scripts may lead to the construction of refactored common test components. The explicit management of the identified variability using appropriate test features may guide the construction of product-specific test suites. Generic adaptable test cases for product line testing can be designed by exploiting similarity among test cases based on typical patterns of repetitions [97]. A study on using clone information [6] to extract generic re-usable tests was presented [7], using the XML-based Variant Configuration Language (XVCL) to capture and manage test variability. Specific tests can be developed and maintained in a reuse-based and cost-effective way. Asaithambi performed clone detection using the Clone Miner and Clone Analyzer tools [6] on tests written in Java for Android. We used CLAN [10] on proprietary industrial tests written in TTCN-3. Asaithambi found high rates of redundancies that offer opportunities to boost testing productivity, if reuse-based approaches to build test case libraries are used. Asaithambi reported that at least 53% of test files and at least 79% of test methods contain some form of redundancy, and that most clones are either Type-1 or Type-2. A more detailed quantitative analysis of the distribution of similarity degrees and types was not reported; only the frequency of clone types was published. In our study, we report in detail the distribution of similarity in cloned test scripts and the total frequencies. Asaithambi found 90.0% of cloned fragments. This represents a higher density than our 20.4%. This could be explained by the different test script selection strategy. Their tests belong to libraries for one open-source system, while ours are test suites from different industrial projects. Asaithambi published explicit figures about Type-1 (called “Identical Test Clones”) and Type-2 (called “Parametric Test Clones”) clones, while they reported Type-3 (called “Gapped Test Clones”) only at file level. They detected Type-3 clones in 2.3% of files, while we detected 1.8% of Type-3 clones over the total number of cloned test fragments. Type-3 test clones seem to be marginal in both studies. Nevertheless, when we compare the proportion of similarity, the figures are very similar. They observed a ratio of 87.4% of Type-1 and 12.6% of Type-2 test clones, if marginal Type-3 clones are neglected. Similarly, we observed 84.4% of Type-1 and 15.6% of Type-2 clones, if Type-3 clones are ignored. Asaithambi identified repetitive patterns in Android platform framework test cases. These patterns can be represented in generic form by applying variability management techniques. We did not


address test clone refactoring or variability management in test clones in this paper. Despite the differences, both studies indicate that Type-1 and Type-2 clones in test scripts are the majority of cases. Findings from both studies support the relevance and the opportunity of applying variability management techniques to test scripts for test organization, maintenance, and reuse.

6. Practical considerations for a successful integration in industry

Technological transfer for large-scale deployment has to conform to hardware and other practical restrictions that often exist in industrial environments. A technology that is not flexible enough to cope with them will simply never be adopted. For large-scale deployment, the available standard hardware configuration is very often the only option, unless an appropriate large-scale investment and organizational effort is envisaged. Although a discussion about practical restrictions is not a key research point for this paper, we think it is important to address practical issues such as memory limitations, especially in the context of an industrial collaboration. Even though the TTCN-3 parser within Titan was accessible via API calls, its extensive use of the Eclipse framework prevents the parser from being executed outside of the Eclipse environment. Since the current clone detection architecture deployed at Ericsson runs on a virtual machine on a distant server, this proved to be a major obstacle to overcome. The most important problem was the large memory overhead of Eclipse, which was incompatible with the limited resources of the standard virtual machine configuration available. To reduce resource consumption, we designed a headless plugin meant to start only the bare essentials of Eclipse required to run the Titan environment. This allowed us to interface the parsing architecture of CLAN with the parser of Titan.
However, in the end we still had to increase the total amount of main memory on the virtual machine, from 3GB to 8GB. The average size of a TTCN-3 project is much smaller than a standard telecommunication application. Since the headless plugin with the increased size of the main memory is able to successfully process all application projects, it is also expected to process TTCN-3 projects, without problems. In the end, making Titan and CLAN work together made it possible to integrate clone detection in TTCN-3 scripts in a transparent fashion, without the detection process for C/C++ and Java interfering with the new component and vice versa. The AST produced by Titan was used to compute and collect the metrics corresponding to TTCN-3 functions. At this point, the CLAN process of clustering TTCN-3 fragments based on vectors of metrics could be applied. High flexibility in CLAN design was key to achieving that integration. For Ericsson, that flexibility was important, as well as keeping the integrity of Titan. It was imperative that we make no requests to change any part of Titan in order to deploy our software. Once information was extracted from Titan, the current clone detection environment was able to work as it is currently installed. 7. Further research As mentioned before, clones belonging to a test that was not selected for the current test suite, do not need to be updated immediately. Therefore, it would be interesting to study the “late propagation” phenomenon [98] in test suites. It would be also interesting to track test clones, clone genealogies, and clone modifications in test suite evolution, in comparison with application clone evolution.

Further research is required to better understand the tradeoff between the cost and effort required for preventive test clone management through clone removal, refactoring, or test variability management, and the cost and effort required to update clones based on the probability of their being re-selected and in need of modifications. Tracking clones in test suite evolution may help testers understand the modifications that need to be applied. Updating test clones from older versions could represent an “on-demand” form of late propagation [98] that could be studied.

It would be interesting to study how cloned parts of tests evolve over several versions of a product. It would also be interesting to know how often late propagation occurs in test clones. Another, more ambitious objective could be to investigate whether “late propagation” could be automated in the context of test suite evolution. It would be interesting to study whether some clone transformation rules could be inferred from test suite evolution and to evaluate the automated application of these transformation rules to older tests.

Although knowledge about clones is generally perceived as beneficial, which may hold for test clones as well, any actions taken with regard to these clones should be carefully evaluated. Nevertheless, the prevalence of Type-1 (82.9%) and Type-2 (15.3%) clones suggests that refactoring may be relevant and worth investigating because of its ease, effectiveness, and benefits. In applications, Type-1 and Type-2 clones are more easily refactorable than Type-3 clones [21]. Factoring out identical functions is fairly easy, and an immediate reduction of updating effort is possible. Unfortunately, the design of test scripts may become slightly more complex. Investigation is required to assess the advantages and limitations of clone removal in test clones.
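To illustrate why identical (Type-1) functions are the easiest refactoring candidates, the following minimal sketch groups function bodies that become textually identical once comments and layout are ignored. It is not the CLAN implementation and does not use Titan; the names `normalize` and `type1_groups` and the sample bodies are hypothetical, and the regular expressions assume C-style comments and ignore the corner case of comment markers inside string literals. A real tool would compare token streams rather than raw text.

```python
import hashlib
import re
from collections import defaultdict
from typing import Dict, List


def normalize(body: str) -> str:
    """Remove C-style comments and collapse whitespace runs.

    Simplification (assumption): comment markers inside string
    literals are not handled.
    """
    body = re.sub(r"/\*.*?\*/", " ", body, flags=re.DOTALL)  # block comments
    body = re.sub(r"//[^\n]*", " ", body)                     # line comments
    return " ".join(body.split())


def type1_groups(fragments: Dict[str, str]) -> List[List[str]]:
    """Return groups of fragment names whose normalized bodies are identical."""
    buckets = defaultdict(list)
    for name, body in fragments.items():
        digest = hashlib.sha256(normalize(body).encode("utf-8")).hexdigest()
        buckets[digest].append(name)
    # Only groups with at least two members are clone candidates.
    return sorted(sorted(names) for names in buckets.values() if len(names) > 1)
```

Each returned group is a set of fragments that could, in principle, be replaced by a single shared function. Whether doing so is worthwhile still depends on the design and maintenance considerations discussed above.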
Refactoring Type-2 clones is a little more complicated and should be inspired by published techniques for refactoring application clones. Please refer to Section 5 for references about application clone refactoring. Addressing other types of clones is even more difficult and less significant in our test sample, because only 1.8% of analyzed test clones are of Type-3. Investigation is required to assess the advantages and limitations of test clone refactoring in terms of time, cost, and effort.

Systems examined in clone detection literature were applications written in imperative languages and in scripting languages. Previous research [20] showed that the distribution of clones in systems written using application languages does not seem significantly different from that of systems written in scripting languages. The systems we investigated in this paper are of a different nature, namely test scripts rather than applications. Test scripts were written in TTCN-3, which is a standard language specifically designed for implementing test suites.

The study presented in this paper is a case study, which shares some perspectives with descriptive studies and compares the distributions of clones in systems of a different nature in the same organization. We compared the distribution of clones in applications with that of clones in test scripts. Reported differences may be due to several factors, including the difference in the programming language and the difference in the nature of the software under investigation. We did not aim to identify the reasons for the reported differences, but simply to measure them and to assess the statistical significance of the different distributions. We did not investigate clones in tests written using application languages such as Java [7], but we observed findings similar to those reported for tests in Java, as discussed in Section 5.6. We did not investigate clones in application software written in TTCN-3 either. These comparisons are beyond the scope of this paper and are left to further research. This paper concentrated on the investigation of differences in clone distributions between test suites and applications, in contrast with previous studies that investigated scripting languages used in application software [20].

Automatic classification of clone differences could help refactoring tools. Approaches presented in the literature [5,99,100] could be advantageously enhanced by targeted user interactions. Such interactions could better focus and guide some complex refactoring activities and give more flexibility to the users.

Investigating whether detected test clones and their possible refactoring might help to reduce the total testing effort is another research opportunity. While this was one of the original motivations for this project, helping testers to actually take advantage of the gathered clone information is a completely new project in itself.

Studying the differences between code written by developers and code written by testers is also a possibility. It is generally assumed that code writing is done with the same skill set at every step of the development process, but this study suggests otherwise. In the software development process, understanding why code writing and code idioms differ between activities could be an important step towards a better integration of new technologies like clone detection.

Refinement of the clone detection technique with an emphasis on TTCN-3 could improve the global performance of the tool. Because of the widespread clone stereotypes, some steps of the clone detection tool could be optimized to achieve faster results with a smaller memory footprint.

Research in the area of software management and organizational structures to support software development could be carried out to accommodate clone detection and benefit from this information.
That could include organizational restructuring to take advantage of clone analysis, introduction of new technologies into an organization, and defining organizational structures, groups, responsibilities, and organizational processes.

8. Conclusion

We reported results of an experiment on clone detection within industrial test suites in TTCN-3. Figures shown in Section 3 present different characteristics of the clone population in TTCN-3. We found that around 24% of fragments in the test suites are cloned, which is a very high proportion of clones compared to what is generally found in source code. The distribution of clone types shows a high percentage of Type-1 clones (82.9%), a smaller percentage of Type-2 clones (15.3%), and few Type-3 clones (1.8%). Similar conclusions about the distribution of types for test clones written in Java have been reported by Asaithambi [7].

In our study, the difference in the proportion of Type-1 or Type-2 clones between tests and applications is statistically significant. The proportion of Type-1 and Type-2 clones is significantly higher in TTCN-3 scripts than in application source code. In applications, Type-1 and Type-2 clones are more easily refactorable than Type-3 clones [21]. In the investigated TTCN-3 test scripts, potential opportunities to promote test organization, maintenance, and reuse using refactoring and test variability management [7] seem high.

The similarity between the distributions of Java test clones and TTCN-3 test clones may suggest that duplication in tests reflects the nature of tests rather than the nature of the programming language used to write tests. This is especially interesting, since we found that the distributions of clones in applications are significantly different from the distributions of clones in tests.
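As an illustration of how a difference between two proportions of this kind can be tested, the sketch below computes a two-sided Fisher exact p-value for a 2x2 contingency table using only the Python standard library. It is a generic example and not necessarily the exact procedure used in this study; the function name `fisher_exact_two_sided` is hypothetical, and the counts in the usage note are invented for illustration and are not the study's data.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows could be, e.g., test clones and application clones; columns
    could be "Type-1 or Type-2" and "Type-3".
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of every table that is no more likely
    # than the observed one (small tolerance for rounding).
    total = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)
```

For example, with invented counts of 982 Type-1 or Type-2 clones out of 1000 test clones (mirroring the 98.2% reported above) against a purely hypothetical 600 out of 1000 application clones, `fisher_exact_two_sided(982, 18, 600, 400)` yields a p-value far below any conventional significance threshold.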

We want to emphasize the benefits of cooperation between academia and industry. These two groups have very different perspectives on software development. However, these viewpoints are frequently complementary. Time is needed to establish a good relationship built upon mutual understanding, but once such an understanding is achieved, each group will benefit from the best of what the other brings to the table. Anyone following the path of cooperation we exemplified should not expect immediate understanding or benefits from the exercise. Yet, with time, the experience may be positive for both parties.

The development of this project allowed us to learn different lessons about deploying a technology within a group of testers instead of developers. Despite potential benefits, a lot of work is still required for such a technology to become a standard tool in software testing.

Acknowledgments

This research was funded by Ericsson. We wish to thank Renaud Lepage, Mario Bonja, and Fanny Lalonde Lévesque for their contributions to this paper and the related discussions. We also wish to thank Kristof Szabados for his technical assistance and insights into the Titan environment. We also wish to thank the reviewers for their helpful comments on this paper.

References

[1] M. Balazinska, E. Merlo, M. Dagenais, B. Laguë, K. Kontogiannis, Advanced clone-analysis as a basis for object-oriented system refactoring, in: Proc. Working Conference on Reverse Engineering (WCRE), IEEE Computer Society Press, 2000, pp. 98–107. [2] F. Deissenboeck, B. Hummel, E. Juergens, B. Schaetz, S. Wagner, S. Teuchert, J.F. Girard, Clone detection in automotive model-based development, in: Proceedings of the International Conference on Software Engineering, IEEE Computer Society Press, 2008. [3] J. Guo, Y. Zou, Detecting clones in business applications, in: Proceedings of the Working Conference on Reverse Engineering, 2008. [4] C.K. Roy, J.R. Cordy, R.
Koschke, Comparison and evaluation of code clone detection techniques and tools: a qualitative approach, Sci. Comput. Programm. 74 (7) (2009) 470–495. [5] C.K. Roy, J.R. Cordy, A mutation/injection-based automatic framework for evaluating code clone detection tools, in: Proceedings of the IEEE International Conference on Software Testing, Verification, and Validation Workshops, in: ICST ’09, IEEE Computer Society, 2009, pp. 157–166. [6] H.A. Basit, S. Jarzabek, A data mining approach for detecting higher-level clones in software, IEEE Trans. Softw. Eng. (TSE) 35 (4) (2009) 497–514. [7] S.P.R. Asaithambi, S. Jarzabek, Towards Test Case Reuse: A Study of Redundancies in Android Platform Test Libraries, Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 49–64. [8] 2014. http://www.ttcn-3.org. [9] 2014. http://www.ttcn-3.org/index.php/about/introduction. [10] S. Bellon, R. Koschke, G. Antoniol, J. Krinke, E. Merlo, Comparison and evaluation of clone detection tools, IEEE Trans. Softw. Eng. - IEEE Comput. Soc. Press 33 (9) (2007) 577–591. [11] E. Merlo, T. Lavoie, P. Potvin, P. Busnel, Large scale multi-language clone analysis in a telecommunication industrial setting, in: Software Clones (IWSC), 2013 7th International Workshop on, IEEE, 2013, pp. 69–75. [12] K. Kontogiannis, R. De Mori, R. Bernstein, M. Galler, E. Merlo, Pattern matching for clone and concept detection, J. Autom. Softw. Eng. 3 (1996) 77–108. [13] T.H. Cormen, C.E. Leiserson, R.L. Rivest, C. Stein, Introduction to algorithms, MIT Press,second edition. [14] S. Cha, Comprehensive survey on distance/similarity measures between probability density functions, Int. J. Math. Models Methods Appl. Sci. 1 (4) (2007) 300–307. [15] P. Jaccard, Nouvelles recherches sur la distribution florale, in: Bulletin de la Société Vaudoise des Sciences Naturelles, 1908, pp. 223–270. [16] C. Roy, J. Cordy, An empirical study of function clones in open source software, in: Reverse Engineering, 2008. WCRE ’08. 
15th Working Conference on, 2008, pp. 81–90. [17] 2016. https://www.r-project.org. [18] É.D. Taillard, P. Waelti, J. Zuber, Few statistical tests for proportions comparison, Eur. J. Oper. Res. 185 (3) (2008) 1336–1350. [19] M. Hackerott, A. Urquhart, An hypothesis test technique for determining a difference in sampled parts defective utilizing Fisher’s exact test in IC production, IEEE Trans. Semicond. Manuf. 3 (4) (1990) 247–248. [20] C.K. Roy, J.R. Cordy, Are scripting languages really different? in: Proceedings of the 4th International Workshop on Software Clones, in: IWSC ’10, ACM, New York, NY, USA, 2010, pp. 17–24.

[21] N. Tsantalis, D. Mazinanian, G.P. Krishnan, Assessing the refactorability of software clones, IEEE Trans. Softw. Eng. 41 (11) (2015) 1055–1090. [22] T. Mende, F. Beckwermert, R. Koschke, G. Meier, Supporting the grow-and-prune model in software product lines evolution using clone detection, in: 12th European Conference on Software Maintenance and Reengineering, CSMR 2008, April 1-4, 2008, Athens, Greece, 2008, pp. 163–172. [23] R. Koschke, P. Frenzel, A.P.J. Breu, K. Angstmann, Extending the reflexion method for consolidating software variants into product lines, Softw. Qual. J. 17 (4) (2009) 331–366. [24] S. Easterbrook, J. Singer, M.-A. Storey, D. Damian, Guide to Advanced Empirical Software Engineering, Springer London, London, pp. 285–311. [25] D.E. Perry, A.A. Porter, L.G. Votta, Empirical studies of software engineering: a roadmap, in: Proceedings of the conference on The future of Software engineering, ACM, 2000, pp. 345–355. [26] D.I. Sjøberg, J.E. Hannay, O. Hansen, V.B. Kampenes, A. Karahasanovic, N.-K. Liborg, A.C. Rekdal, A survey of controlled experiments in software engineering, Softw. Eng., IEEE Trans. 31 (9) (2005) 733–753. [27] D. Rattan, R. Bhatia, M. Singh, Software clone detection: a systematic review, Inf. Softw. Technol. 55 (7) (2013) 1165–1199. [28] Y. Dang, S. Ge, R. Huang, D. Zhang, Code clone detection experience at Microsoft, in: Proceedings of the 5th International Workshop on Software Clones, in: IWSC ’11, ACM, New York, NY, USA, 2011, pp. 63–64. [29] R. Koschke, Large-scale inter-system clone detection using suffix trees and hashing, J. Softw. (2013). Accepted for publication. [30] I. Keivanloo, Leveraging clone detection for internet-scale source code search, in: Program Comprehension (ICPC), 2012 IEEE 20th International Conference on, 2012, pp. 277–280. [31] Y. Dang, D. Zhang, S. Ge, C. Chu, Y. Qiu, T.
Xie, Xiao: tuning code clones at hands of engineers in practice, in: Proceedings of the 28th Annual Computer Security Applications Conference, in: ACSAC ’12, ACM, New York, NY, USA, 2012, pp. 369–378. [32] Y. Yamanaka, E. Choi, N. Yoshida, K. Inoue, T. Sano, Industrial application of clone change management system, in: Software Clones (IWSC), 2012 6th International Workshop on, 2012, pp. 67–71. [33] E. Tuzun, E. Er, A case study on applying clone technology to an industrial application framework, in: Software Clones (IWSC), 2012 6th International Workshop on, 2012, pp. 57–61. [34] R. Koschke, Large-scale inter-system clone detection using suffix trees, in: CSMR, 2012, pp. 309–318. [35] J.R. Cordy, C.K. Roy, DebCheck: Efficient checking for open source code clones in software systems, in: ICPC, 2011, pp. 217–218. [36] W. Wang, M.W. Godfrey, Investigating intentional clone refactoring, ECEASST 63 (2014a). [37] W. Wang, M.W. Godfrey, Recommending clones for refactoring using design, context, and history, in: 30th IEEE International Conference on Software Maintenance and Evolution, Victoria, BC, Canada, September 29 - October 3, 2014, 2014b, pp. 331–340. [38] G. Krishnan, N. Tsantalis, Unification and refactoring of clones, in: Software Maintenance, Reengineering and Reverse Engineering (CSMR-WCRE), 2014 Software Evolution Week - IEEE Conference on, 2014, pp. 104–113. [39] T. Kamiya, Agec: An execution-semantic clone detection tool, in: Proc. ICPC, IEEE, 2013, pp. 227–229. [40] Y. Yuan, Y. Guo, Boreas: an accurate and scalable token-based approach to code clone detection, in: Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering, in: ASE 2012, ACM, New York, NY, USA, 2012, pp. 286–289. [41] T. Lavoie, E. Merlo, An accurate estimation of the levenshtein distance using metric trees and manhattan distance, in: IWSC, 2012, pp. 1–7. [42] H. Sajnani, C. Lopes, A parallel and efficient approach to large scale clone detection, IWSC, 2013. 
[43] H. Murakami, K. Hotta, Y. Higo, H. Igaki, S. Kusumoto, Gapped code clone detection with lightweight source code analysis, in: Proc. ICPC, IEEE, 2013, pp. 93–102. [44] T. Kamiya, S. Kusumoto, K. Inoue, CCFinder: a multilinguistic token-based code clone detection system for large scale source code, IEEE Trans. Softw. Eng. 28 (7) (2002) 654–670. [45] J.R. Cordy, C.K. Roy, The NiCad Clone Detector, in: Proceedings of the 2011 IEEE 19th International Conference on Program Comprehension, in: ICPC ’11, IEEE Computer Society, Washington, DC, USA, 2011, pp. 219–220. [46] M.S. Uddin, C.K. Roy, K.A. Schneider, Simcad: an extensible and faster clone detection tool for large scale software systems, in: ICPC, 2013, pp. 236–238. [47] B. Hummel, E. Juergens, L. Heinemann, M. Conradt, Index-based code clone detection: incremental, distributed, scalable, in: Software Maintenance (ICSM), 2010 IEEE International Conference on, 2010, pp. 1–9. [48] L. Jiang, G. Misherghi, Z. Su, S. Glondu, Deckard: scalable and accurate tree-based detection of code clones, in: Software Engineering, 2007. ICSE 2007. 29th International Conference on, 2007, pp. 96–105. [49] R. Koschke, R. Falke, P. Frenzel, Clone detection using abstract syntax suffix trees, in: Working Conference on Reverse Engineering, IEEE Computer Society Press, 2006, pp. 253–262. [50] R. Falke, P. Frenzel, R. Koschke, Empirical evaluation of clone detection using syntax suffix trees, Empirical Software Engineering 13 (6) (2008) 601–643. [51] N. Gode, R. Koschke, Incremental clone detection, in: Software Maintenance and Reengineering, 2009. CSMR ’09. 13th European Conference on, 2009, pp. 219–228.

[52] M.W. Godfrey, D.M. German, J. Davies, A. Hindle, Determining the provenance of software artifacts, in: Proceeding of the 5th ICSE International Workshop on Software Clones, IWSC 2011, Waikiki, Honolulu, HI, USA, May 23, 2011, 2011, pp. 65–66. [53] M. Kim, D. Notkin, Using a clone genealogy extractor for understanding and supporting evolution of code clones, ACM SIGSOFT Softw. Eng. Notes 30 (4) (2005) 1–5. [54] M. Kim, V. Sazawal, D. Notkin, G.C. Murphy, An empirical study of code clone genealogies, in: Proceedings of the 10th European Software Engineering Conference held jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2005, Lisbon, Portugal, September 5-9, 2005, 2005, pp. 187–196. [55] I. Baxter, A. Yahin, L. Moura, M. Sant’Anna, L. Bier, Clone detection using abstract syntax trees, in: Software Maintenance, 1998. Proceedings., International Conference on, 1998, pp. 368–377. [56] Y. Higo, U. Yasushi, M. Nishino, S. Kusumoto, Incremental code clone detection: A pdg-based approach, in: Reverse Engineering (WCRE), 2011 18th Working Conference on, 2011, pp. 3–12. [57] J. Krinke, Identifying similar code with program dependence graphs, in: Reverse Engineering, 2001. Proceedings. Eighth Working Conference on, 2001, pp. 301–309. [58] H. Kim, Y. Jung, S. Kim, K. Yi, Mecc: memory comparison-based clone detector, in: Proceedings of the 33rd International Conference on Software Engineering, in: ICSE ’11, 2011, pp. 301–310. [59] C. McMillan, M. Grechanik, D. Poshyvanyk, Detecting similar software applications, in: Proceedings of the 2012 International Conference on Software Engineering, in: ICSE 2012, 2012, pp. 364–374. [60] L. Barbour, H. Yuan, Y. Zou, A technique for just-in-time clone detection in large scale systems, in: ICPC, 2010, pp. 76–79. [61] L. Barbour, F. Khomh, Y.
Zou, Late propagation in software clones, in: Proceedings of the 2011 27th IEEE International Conference on Software Maintenance, in: ICSM ’11, IEEE Computer Society, Washington, DC, USA, 2011, pp. 273–282. [62] L. Jiang, Z. Su, E. Chiu, Context-based detection of clone-related bugs, in: Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering, in: ESEC-FSE ’07, ACM, New York, NY, USA, 2007, pp. 55–64. [63] J. Li, M.D. Ernst, CBCD: cloned buggy code detector, in: Proceedings of the 34th International Conference on Software Engineering, in: ICSE ’12, IEEE Press, 2012, pp. 310–320. [64] A. Lozano, M. Wermelinger, Assessing the effect of clones on changeability, in: Software Maintenance, 2008. ICSM 2008. IEEE International Conference on, IEEE, 2008, pp. 227–236. [65] M. Mondal, C.K. Roy, M.S. Rahman, R.K. Saha, J. Krinke, K.A. Schneider, Comparative stability of cloned and non-cloned code: An empirical study, in: Proceedings of the 27th Annual ACM Symposium on Applied Computing, ACM, 2012, pp. 1227–1234. [66] N. Göde, R. Koschke, Frequency and risks of changes to clones, in: Proceedings of the 33rd International Conference on Software Engineering, in: ICSE ’11, ACM, New York, NY, USA, 2011, pp. 311–320. [67] D. Steidl, N. Göde, Feature-based detection of bugs in clones, in: Proceedings of the 7th International Workshop on Software Clones, in: IWSC ’13, IEEE Press, 2013, pp. 76–82. [68] M. Mondal, C.K. Roy, K.A. Schneider, A comparative study on the bug-proneness of different types of code clones, in: 2015 IEEE International Conference on Software Maintenance and Evolution, ICSME 2015, Bremen, Germany, September 29 - October 1, 2015, 2015, pp. 91–100. [69] S. Thummalapenta, L. Cerulo, L. Aversano, M. Di Penta, An empirical study on the maintenance of source code clones, Empirical Softw. Eng. 15 (1) (2010) 1–34. [70] K. Hotta, Y. Sano, Y. Higo, S.
Kusumoto, Is duplicate code more frequently modified than non-duplicate code in software evolution?: An empirical study on open source software, in: Proceedings of the Joint ERCIM Workshop on Software Evolution (EVOL) and International Workshop on Principles of Software Evolution (IWPSE), in: IWPSE-EVOL ’10, ACM, New York, NY, USA, 2010, pp. 73–82. [71] C.J. Kapser, M.W. Godfrey, “Cloning considered harmful” considered harmful: patterns of cloning in software, Empirical Softw. Eng. 13 (6) (2008) 645–692. [72] J. Krinke, Is cloned code older than non-cloned code? in: Proceedings of the 5th International Workshop on Software Clones, in: IWSC ’11, ACM, New York, NY, USA, 2011, pp. 28–33. [73] N. Gode, J. Harder, Clone stability, in: Proceedings of the 2011 15th European Conference on Software Maintenance and Reengineering, in: CSMR ’11, IEEE Computer Society, Washington, DC, USA, 2011, pp. 65–74. [74] M. Mondal, C.K. Roy, K.A. Schneider, Automatic ranking of clones for refactoring through mining association rules, in: 2014 Software Evolution Week - IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering, CSMR-WCRE 2014, Antwerp, Belgium, February 3-6, 2014, 2014a, pp. 114–123. [75] M. Mondal, C.K. Roy, K.A. Schneider, Automatic identification of important clones for refactoring and tracking, in: 14th IEEE International Working Conference on Source Code Analysis and Manipulation, SCAM 2014, Victoria, BC, Canada, September 28-29, 2014, 2014b, pp. 11–20. [76] M. Balazinska, E. Merlo, M. Dagenais, B. Lagüe, K. Kontogiannis, Partial redesign of java software systems based on clone analysis, in: Proc. Working Conference on Reverse Engineering (WCRE), IEEE Computer Society Press, 1999, pp. 326–336.

[77] M.F. Zibran, C.K. Roy, Conflict-aware optimal scheduling of prioritised code clone refactoring, IET Softw. 7 (3) (2013) 167–186. [78] S. Bouktif, G. Antoniol, E. Merlo, M. Neteler, A novel approach to optimize clone refactoring activity, in: Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, in: GECCO ’06, ACM, New York, NY, USA, 2006, pp. 1885–1892. [79] M. Mondal, C.K. Roy, K.A. Schneider, SPCP-Miner: a tool for mining code clones that are important for refactoring or tracking, in: 22nd IEEE International Conference on Software Analysis, Evolution, and Reengineering, SANER 2015, Montreal, QC, Canada, March 2-6, 2015, 2015, pp. 484–488. [80] N. Volanschi, Safe clone-based refactoring through stereotype identification and iso-generation, in: Proceedings of the 6th International Workshop on Software Clones, in: IWSC ’12, IEEE Press, Piscataway, NJ, USA, 2012, pp. 50–56. [81] G.P. Krishnan, N. Tsantalis, Unification and refactoring of clones, in: Proceedings of the IEEE Conference on Software Maintenance, Reengineering and Reverse Engineering, 2014, pp. 104–113. [82] P. Clements, L. Northrop, Software product lines: practices and patterns (2002). [83] Software engineering institute, software product lines, http://www.sei.cmu.edu/productlines. [84] Y. Dubinsky, J. Rubin, T. Berger, S. Duszynski, M. Becker, K. Czarnecki, An exploratory study of cloning in industrial software product lines, in: Software Maintenance and Reengineering (CSMR), 2013 17th European Conference on, IEEE, 2013, pp. 25–34. [85] R. Kolb, D. Muthig, T. Patzke, K. Yamauchi, Refactoring a legacy component for reuse in a software product line: a case study, J. Softw. Maint. Evol. 18 (2) (2006) 109–132. [86] J. Rubin, K. Czarnecki, M. Chechik, Managing cloned variants: a framework and experience, in: Proceedings of the 17th International Software Product Line Conference, in: SPLC ’13, ACM, New York, NY, USA, 2013, pp. 101–110. [87] J. Rubin, K. Czarnecki, M.
Chechik, Cloned product variants: from ad-hoc to managed software product lines, Int. J. Softw. Tools Technol. Transf. 17 (5) (2015) 627–646. [88] K. Pohl, G. Böckle, F.J. van Der Linden, Software product line engineering: foundations, principles and techniques, Springer Science & Business Media, 2005.

[89] I. Carmo Machado, J.D. McGregor, E. Santana de Almeida, Strategies for testing products in software product lines, ACM SIGSOFT Softw. Eng. Notes 37 (6) (2012) 1–8. [90] J. McGregor, Testing a Software Product Line, Technical Report, Software Engineering Institute, Carnegie Mellon University, 2001. CMU/SEI-2001-TR-022 [91] J.D. McGregor, Testing a software product line, in: Testing Techniques in Software Engineering, Springer, 2010, pp. 104–140. [92] E. Engström, P. Runeson, Software product line testing - a systematic mapping study, Inf. Softw. Technol. 53 (1) (2011) 2–13. [93] P.A.D.M.S. Neto, I. do Carmo Machado, J.D. McGregor, E.S. De Almeida, S.R. de Lemos Meira, A systematic mapping study of software product lines testing, Inf. Softw. Technol. 53 (5) (2011) 407–423. [94] M. Al-Hajjaji, T. Thüm, J. Meinicke, M. Lochau, G. Saake, Similarity-based prioritization in software product-line testing, in: Proceedings of the 18th International Software Product Line Conference - Volume 1, in: SPLC ’14, ACM, New York, NY, USA, 2014, pp. 197–206. [95] J. Lee, S. Kang, D. Lee, A survey on software product line testing, in: Proceedings of the 16th International Software Product Line Conference - Volume 1, in: SPLC ’12, ACM, New York, NY, USA, 2012, pp. 31–40. [96] I. Carmo Machado, J.D. Mcgregor, Y.a.C. Cavalcanti, E.S. De Almeida, On strategies for testing software product lines: a systematic literature review, Inf. Softw. Technol. 56 (10) (2014) 1183–1199. [97] S.P.R. Asaithambi, S. Jarzabek, Generic adaptable test cases for software product line testing, in: Proceedings of the 3rd Annual Conference on Systems, Programming, and Applications: Software for Humanity, in: SPLASH ’12, ACM, New York, NY, USA, 2012, pp. 33–36. [98] L. Barbour, F. Khomh, Y. Zou, An empirical study of faults in late propagation clone genealogies, J. Softw. 25 (11) (2013) 1139–1165. [99] M. Balazinska, E. Merlo, M. Dagenais, B. Lagu, K. 
Kontogiannis, Measuring clone based reengineering opportunities, in: Proc. International Software Metrics Symposium, IEEE Computer Society Press, 1999a, pp. 292–303. [100] M. Balazinska, E. Merlo, M. Dagenais, B. Laguë, K. Kontogiannis, Partial redesign of java software systems based on clone analysis, in: WCRE, 1999b, pp. 326–336.

Please cite this article as: T. Lavoie et al., A case study of TTCN-3 test scripts clone analysis in an industrial telecommunication setting, Information and Software Technology (2017), http://dx.doi.org/10.1016/j.infsof.2017.01.008