Computer Communications 53 (2014) 37–51
Computer Communications journal homepage: www.elsevier.com/locate/comcom
Configuration analysis and recommendation: Case studies in IPv6 networks
Fuliang Li a,b, Jiahai Yang a,b,⇑, Jianping Wu a,b, Zhiyan Zheng a,b, Huijing Zhang c, Xingwei Wang d
a Institute for Network Sciences and Cyberspace, Tsinghua University, 100084 Beijing, China
b Tsinghua National Laboratory for Information Science and Technology (TNList), Beijing, China
c Beijing University of Posts and Telecommunications, Beijing, China
d College of Information Science and Engineering, Northeastern University, Shenyang, China
Article info
Article history: Received 17 January 2014; received in revised form 23 July 2014; accepted 26 July 2014; available online 4 August 2014
Keywords: Network management; IPv6; Configuration analysis; Configuration recommendation
Abstract
Unraveling the characteristics of configurations can offer deep insights into networks. There are many analyses of IPv4 configurations, but few works focus on IPv6 configurations. In this paper, we conduct a first-ever study of IPv6 configurations based on configuration snapshots of a pure IPv6 network, CERNET2, and a dual-stack network, Internet2. We find that IPv6 configuration commands are slightly more complicated than their IPv4 counterparts because of the complexity of IPv6 addresses. IPv6 configuration command lines are fewer than IPv4 ones, which we attribute to the smaller network scale of IPv6. IPv6 configurations are less complicated than IPv4 configurations in terms of referential dependence, but they grow at a higher rate, which is caused by the fast development of IPv6. More importantly, we propose a framework for network configuration recommendation (FNCR) for the studied networks based on our analysis methods and results. Overall, although IPv6 is currently smaller in scale and less mature than IPv4, it is developing quickly as the basis of next-generation networks. Hence, understanding configuration characteristics and enhancing configuration management are essential for IPv6 networks.
© 2014 Elsevier B.V. All rights reserved.
1. Introduction
With the exhaustion of IPv4 addresses, the transition to IPv6 (Internet Protocol version 6) [1–3] is imminent and challenging. To understand the deployment and usage of IPv6 networks, various kinds of data have been collected and analyzed, including Regional Internet Registry (RIR) logs, BGP dumps, DNS, web and FTP server logs, traceroute data, NetFlow records, packet-level datasets and web access data [4–13]. However, there are few data concerning IPv6 configurations. Configuration files contain the sets of configuration commands currently executing on the routers. Mining information from configuration files can offer abundant and beneficial insights into networks. Firstly, such a study can shed direct light on how network functions are composed [14,15]. Which functionalities are most widely configured? What changes most with the network
⇑ Corresponding author at: Room 3-210, FIT Building, Tsinghua University, Beijing 10084, China. Tel.: +86 10 62603217; fax: +86 10 62785983. E-mail addresses: lfl
[email protected] (F. Li),
[email protected] (J. Yang),
[email protected] (J. Wu),
[email protected] (Z. Zheng),
[email protected] (H. Zhang),
[email protected] (X. Wang). http://dx.doi.org/10.1016/j.comcom.2014.07.011 0140-3664/© 2014 Elsevier B.V. All rights reserved.
evolution? Secondly, extracting policies from configuration files is helpful for configuration validation [16–18]. Configuration is an ad hoc and error-prone job, and configuration mistakes are one of the main sources of network interruptions and anomalies [19–22]. Understanding the structure and design patterns of configuration files can show us how the policies execute and collaborate, as well as where the error-prone points lie. Thirdly, deep analysis of configuration files can uncover the complexity of network management and the challenges of network-based services [23,24]. We will know which parts of the configurations are time-consuming and how to decrease configuration complexity, either by optimizing configuration languages or by adjusting the structure of the network. Finally, such a study also makes automated configuration provisioning possible [25–29]. Based on knowledge of network configuration in terms of functions, policies and languages, we can design a better management system to configure devices automatically. This can not only reduce the burden on operators, but also reduce the probability of misconfigurations.
In this paper, we present a first-ever analysis of IPv6 configurations. We retrieve configuration files of 25 production routers from a pure IPv6 network CERNET2 – the next generation China
F. Li et al. / Computer Communications 53 (2014) 37–51
Education and Research Network, as well as 9 production routers from a dual-stack network Internet2. We unveil the beneficial information hidden in the configuration files which, we believe, can fill gaps in our understanding of IPv6. In addition, based on our analysis results and their potential value in configuration management, we put forward a framework for configuration recommendation. Our analysis addresses three topics.
Topic One – Configuration commands: How many lines are there in the configuration files? What are the main differences in configuration commands between IPv6 and IPv4? Do IPv6 configurations require much more effort from operators than IPv4 configurations?
Topic Two – Configuration composition: Which function types account for most command lines? Does each function type of IPv4 contain more command lines than that of IPv6? Are there relationships among the function types?
Topic Three – Configuration complexity: Is IPv6 more complicated than IPv4? Which grows faster in configuration complexity, IPv6 or IPv4? What accounts for the differences in configuration complexity between IPv6 and IPv4?
The major challenges in this paper are twofold. First, our analysis of IPv6 configurations is based on Juniper routers, whose configurations are quite different from those of Cisco routers. Drawing on our experience in the daily operation and management of an IPv6 backbone network, we first describe the Juniper syntax through a configuration snippet in Section 3.1 and then present our analysis methodologies based on parsing Juniper syntax in Section 3.3. Second, we analyze configuration complexity by extracting configuration templates and dependences from the configuration files. However, both template construction and dependence creation require us to fully understand the structure of the configurations and to correctly identify the keywords in the configuration files.
Details of the corresponding methods are presented in Section 6.
Our main findings are as follows. First of all, we find that IPv6 configuration commands are slightly more complicated than IPv4 ones, and the complexity is mainly caused by IPv6 addresses. Although the growth rate of IPv6 configurations is larger than that of IPv4, the IPv6-related command lines in a dual-stack network are far fewer than the IPv4-related ones. In addition, IPv6 configurations are less complicated than IPv4 configurations in terms of referential dependence and configuration consistency. IPv6-related configurations contribute only 2% of the changes that took place in the dual-stack network. All of this can be attributed to the fact that IPv6 still lags far behind IPv4, which is easy to explain: the two protocols have different histories and are at different stages of development. More importantly, based on our analysis methods and results, we put forward a framework for network configuration recommendation (FNCR), which can enhance configuration management for the networks in our case studies.
The remainder of the paper is organized as follows. Related work is presented in Section 2. Section 3 describes the background and methodologies, including an overview of a configuration file, the datasets used in this study and the methodologies adopted to analyze the characteristics of IPv6 configurations. Differences between IPv4 and IPv6 in configuration commands are investigated in Section 4. The configuration composition of IPv6 and its comparison with that of IPv4 are presented in Section 5. Quantitative analyses of the configuration complexity of IPv6 are introduced in Section 6. Section 7 presents a framework for configuration recommendation. Section 8 presents a preliminary analysis of the configuration evolution of Internet2. We describe the implications of our analysis results and discuss the limitations of our study in Section 9. Finally, conclusions and remarks on future work are presented in Section 10.
2. Related work
Related work is presented from three aspects. We begin with a discussion of existing work on data analysis aimed at understanding the development and usage of IPv6 networks. Then, although there is no work studying IPv6 configurations, many studies have been conducted to characterize and diagnose IPv4 configurations. Finally, automated configuration provisioning has been pursued through studies that extend configuration analysis.
2.1. Studies on IPv6 network
Colitti et al. [4] measured IPv6 adoption from the perspective of a website operator. Malone [5] analyzed transition technologies based on data collected from FTP and DNS servers. Berger et al. [36] characterized the inter-relation of IPv4 and IPv6 among Internet DNS resolvers by deploying both active and passive measurement techniques. Czyz et al. [37] collected and analyzed unclaimed traffic on the IPv6 Internet by announcing five large /12 covering prefixes. Karpilovsky et al. [6] found that although allocations of IPv6 addresses are growing, many prefixes are either used long after allocation or not used at all. Hicks [7] found that the IPv6 AS-core map comprised 715 AS nodes in 2010, compared with 515 nodes in 2009. Investigations of NetFlow records and packet-level datasets indicate that IPv6 development in China is progressing rapidly [8,9]. Analysis of IPv6 traffic was performed to reveal what happened after World IPv6 Day [10,11]. Nikkah et al. [12] found that if AS-level paths are the same, IPv6 performance is similar to IPv4 performance. Dhamdhere et al. [13] discovered that IPv6 deployment is strong in the core but lagging at the edge. Luckie, Beverly et al. [38,39] worked on an alias resolution technique for IPv6, which represents a step toward understanding the Internet's IPv6 router-level topology. Lutu et al.
[40] found that the fraction of limited-visibility address space (i.e., address space not present in all the global routing tables of the operational networks) is similar in the IPv4 and IPv6 Internet. Alzoubi et al. [41] found no evidence of a performance penalty for unilaterally enabling IPv6, which gives sites a basis for planning their IPv6 migration strategy. Many aspects of IPv6 networks have been investigated across a wide range of datasets, but there are no data on configuration analysis of IPv6. In this paper, we perform a comprehensive analysis of the configurations of two backbone networks (i.e., a pure IPv6 network and a dual-stack network), which contributes to a fuller understanding of IPv6 networks.
2.2. Configuration analysis
Many analyses of IPv4 configurations have been performed to characterize configuration properties and to diagnose misconfigurations. Kim et al. [14] analyzed five years of router, switch and firewall configuration data from two large campus networks, presenting a long-term longitudinal study of the evolution of network configurations. Sung et al. [15] investigated the configuration dynamics of five enterprises' VPN customers and presented the portions that change most often. Feamster et al. [16] conducted a statistical analysis of router configurations to identify misconfigurations in BGP. Le et al. [17] adopted data mining techniques to extract policies from router configurations, which could then be used as test cases to detect configuration errors. Based on a snapshot of the configurations, Xie et al. [18] made a statistical analysis of network reachability, which can be used to troubleshoot reachability problems and perform what-if analysis. Related analyses have shown that misconfiguration is the main cause of network outages and anomalies [19–22]. Czyz et al. [37] found that
misconfiguration is prevalent in IPv6 networks. Benson et al. proposed two models to quantify the complexity of configurations [23,24]. They not only captured the difficulty of configuring control- and data-plane behaviors on routers, but also unraveled the complexity of configuring network-based ISP services. Many studies have been performed on IPv4 configurations. However, there are few data analyzing IPv6 configurations. In this paper, we take a first step towards understanding the configurations of IPv6 networks.
2.3. Automated configuration provisioning
Extensive studies on network configurations have been performed with the goal of automated configuration provisioning. Caldwell et al. [25] developed a tool to identify configuration policies and templates, which can drive the migration of a network toward automated provisioning. Gottlieb et al. [26] extended the template-driven approach so that ISPs can automatically configure connections to new BGP-speaking customers. Enck et al. [27] proposed a system called PRESTO, which constructs device-native configurations from compositions of function templates. They also presented their experience of configuring VPN and VoIP services. Chen et al. [28,29] put forward a mechanism for configuration management that uses a database to abstract specific configurations and a declarative language to describe dependencies and restrictions among network components. Our analyses of configurations in IPv6 networks can likewise help identify and construct configuration templates and configuration dependences, based on which a framework for network configuration recommendation (FNCR) is put forward for the studied networks. FNCR is a recommendation framework, which differs from automated configuration provisioning, but it can also lessen the workload on operators and reduce the risk of configuration errors.
Although previous research has performed many kinds of configuration analysis of IPv4 networks and has revealed the development and usage of IPv6 networks based on various datasets, our work is quite different. Its main contributions are summarized as follows. First of all, our work is a first-ever, large-scale case study of IPv6 configurations, which enriches our understanding of IPv6 from the perspective of network configurations. Secondly, our datasets are gathered from the routers of a pure production IPv6 backbone network, CERNET2, and a large dual-stack backbone network, Internet2, so they are well suited to exploring the characteristics of configurations in IPv6 networks. Thirdly, we are the first to conduct a comparison study of network configuration between IPv6 and IPv4 in terms of specific configuration commands, configuration composition and configuration complexity, which we believe is meaningful for further study of IPv6. Finally, our analysis methods and results drive the migration of configuration management toward configuration recommendation for backbone networks.
3. Background and methodologies
In this section, we first take a look at a configuration snippet. Then we describe the datasets and the studied networks. Finally, we present the methodologies for analyzing IPv6 configurations.
3.1. Overview of a configuration snippet
Before our analysis, we describe the configuration file of a backbone router and provide an overview of the device-level configuration commands. The configuration file of a Juniper router consists of several types of modules, while the
configuration file of a Cisco device consists of several types of stanzas. A stanza is defined as a set of command lines representing a specific functionality, while a module is defined as a type of function that comprises many stanzas with the same functionality. Configurations of Juniper devices are well organized and obey a strict hierarchical structure, the first level of which is the module. We separate stanzas from configuration files by identifying the special symbols ("{" and "}"). We extract a simple configuration snippet from a Juniper router of CERNET2. As depicted in Fig. 1, this snippet is made up of two types of function modules, i.e., the interfaces module (lines 1–16) and the firewall module (lines 17–36). The interfaces module consists of all the interfaces that will be enabled on a router. The firewall module comprises filters that will be used to control the input and output of the interfaces. In this configuration snippet, the interface ge-1/1/0 applies two filters (lines 7–9). One is used to control the input (sava-ctc2bw) and the other is used to control the output (ipv6-sample). The filter sava-ctc2bw is defined in the firewall module (lines 19–34). We omit the description of ipv6-sample for space reasons.
3.2. Datasets
We retrieve configuration files of 34 backbone routers: 25 are gathered from CERNET2 and the other 9 from Internet2 [32]. We gather the configuration snapshots on April 8, 2013 for CERNET2 and from March 29, 2013 to April 29, 2013 for Internet2. We have also obtained configuration snapshots of CERNET2 and Internet2 from 2010. Certain sections of the configurations of Internet2 routers, such as the filters, SNMP and user login portions, have been removed for security purposes. With these data, we can conduct an analysis of the configurations of IPv6 networks.
In addition, benefiting from the dual-stack architecture of Internet2, we can also compare the configurations of IPv4 and IPv6. Note that all analyses of configuration snapshots in this paper are based on Juniper syntax.
We briefly introduce the topologies of the studied networks. CERNET2 [30] (the next generation China Education and Research Network) is a nationwide pure IPv6 backbone network. It focuses on solving technical challenges during IPv6 deployment, overcoming difficulties in the operation and management of IPv6 networks, developing innovative IPv6 applications and building nationwide commercialized IPv6-enabled backbones. Fig. 2(a) shows the backbone of CERNET2. CERNET2 has 25 PoPs connected with each other via 2.5 Gbps or 10 Gbps links, providing IPv6 services for more than 200 campus networks at 1 Gbps, 2.5 Gbps or 10 Gbps. Internet2 [31] was introduced as an advanced not-for-profit United States networking consortium in 1996. Internet2 is a dual-stack network, supporting both IPv4 and IPv6. In 2009, Internet2 covered over 200 higher education institutions, over 40 members from industry, over 30 research and education networks and connector organizations, and over 50 affiliate members. Fig. 2(b) shows the backbone of Internet2, which has 9 PoPs connected with each other via 10 Gbps or 100 Gbps links.
3.3. Methodologies
Our methodologies for investigating IPv6 network configurations contain three parts. First, we describe the configuration commands. Second, we analyze the configuration composition. Third, we unveil the complexity hidden in configuration files. Two key points should be clarified first. For one thing, our analysis is performed on Juniper devices, so the analysis methodologies do not apply to Cisco devices. However, the motivations of the paper and the results gained by the methodologies are of universal
Fig. 1. A configuration snippet of a Juniper device.
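The brace-based separation described in Section 3.1 can be illustrated with a minimal Python sketch. It assumes a Junos-style hierarchy in which every block opens with "{" and closes with "}". The sample text is a hypothetical fragment that only borrows the interface and filter names mentioned in the text (ge-1/1/0, sava-ctc2bw, ipv6-sample); it is not the snippet of Fig. 1, and the helper name split_modules is an illustrative assumption rather than the authors' tool.

```python
from typing import Dict, List, Optional

# Hypothetical Junos-style text, for illustration only (not the paper's Fig. 1).
SAMPLE = """\
interfaces {
    ge-1/1/0 {
        unit 0 {
            family inet6 {
                filter {
                    input sava-ctc2bw;
                    output ipv6-sample;
                }
            }
        }
    }
}
firewall {
    family inet6 {
        filter sava-ctc2bw {
            term default {
                then accept;
            }
        }
    }
}
"""


def split_modules(text: str) -> Dict[str, List[str]]:
    """Group command lines by first-level module, tracking brace depth.

    Naive on purpose: braces inside quoted strings or comments are not handled.
    """
    modules: Dict[str, List[str]] = {}
    current: Optional[str] = None
    depth = 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        opens, closes = line.count("{"), line.count("}")
        if depth == 0 and opens:
            # A block opened at depth 0 starts a new first-level module.
            current = line.split("{", 1)[0].strip()
            modules.setdefault(current, [])
        elif current is not None and depth + opens - closes > 0:
            # Keep every line inside the module except its final closing brace.
            modules[current].append(line)
        depth += opens - closes
    return modules


if __name__ == "__main__":
    for name, body in split_modules(SAMPLE).items():
        print(name, len(body))
```

Counting the lines collected per module gives the kind of per-module size used later in the composition analysis; a real parser would also need to handle comments, quoted strings and inactive statements.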
Fig. 2. Backbones of CERNET2 and Internet2.
guiding significance. For another, different human approaches to configuring devices may produce configurations with different characteristics. We therefore analyze the configurations of the studied networks one by one and do not compare two different networks (e.g., CERNET2 and Internet2) managed by different operators.
3.3.1. Describing configuration commands
We first describe two basic properties of configuration commands. On the one hand, we investigate the characteristics of specific IPv6 configuration commands. To compare the configurations of IPv6 and IPv4, we set aside all parts of the Internet2 configurations related to layer 2 settings, for which it cannot be determined whether they serve IPv6 or IPv4. We divide the remaining part of the configurations into two groups: Internet2-IPv4 and Internet2-IPv6. The division is based on the differences between IPv4 commands and IPv6 commands (shown in Section 4.1). All the comparison analyses between IPv6 and IPv4 in this paper are based on this partition. On the other hand, we analyze the configuration file size, which can reflect the size of the studied network to some extent. The configuration file size is defined as the number of command lines.
3.3.2. Abstracting high-level functionalities
We abstract high-level functionalities from device-level configurations to reveal the composition of IPv6 configuration snapshots. Configurations of Juniper devices consist of many modules,
each of which comprises groups of command stanzas with similar functionalities. Unlike Kim et al. [14], we consider each module in the configuration snapshot as a unique function type, which represents a set of similar functionalities. Table 1 shows the function module map and the meaning of each module in our analysis. Note that the firewall module and the snmp module are removed for security reasons. The groups module defines many common interface media parameters that can be inherited by the interfaces module, which defines interface-specific addressing information. The routing-options module defines protocol-independent routing properties. The protocols module defines routing protocols for route redistribution and exchange. The policy-options module defines many routing control policies that are referenced by the protocols module. The class-of-service module allows operators to divide traffic into classes and offer various levels of throughput and packet loss when congestion occurs. The
Table 1
Function module map.
Type                    Meaning
Groups (gr)             Common interface media parameters
Interfaces (in)         Interface-specific addressing information
Routing-options (ro)    Protocol-independent routing properties
Protocols (pr)          Routing protocols
Policy-options (po)     Routing control policies
Class-of-service (cs)   Various levels of throughput and packet loss
Firewall (fi)           Packet filtering schemes
Others                  SNMP, chassis, security, services, etc.
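The module map of Table 1 can be written down as a small lookup, which also makes the later composition percentages easy to reproduce. The sketch below is an assumption about how such a tally might be organized, not the authors' implementation: the function names and the sample line counts are hypothetical, and only the module names and abbreviations come from Table 1.

```python
from collections import Counter
from typing import Dict

# Module names and abbreviations as listed in Table 1.
MODULE_MAP = {
    "groups": "gr",
    "interfaces": "in",
    "routing-options": "ro",
    "protocols": "pr",
    "policy-options": "po",
    "class-of-service": "cs",
    "firewall": "fi",
}


def function_type(module: str) -> str:
    """Map a first-level module name to its function type; unmapped ones are 'others'."""
    return MODULE_MAP.get(module, "others")


def composition(lines_per_module: Dict[str, int]) -> Dict[str, float]:
    """Return the share of command lines contributed by each function type."""
    totals: Counter = Counter()
    for module, count in lines_per_module.items():
        totals[function_type(module)] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {ftype: count / grand_total for ftype, count in totals.items()}


if __name__ == "__main__":
    # Hypothetical per-module line counts for a single router.
    example = {"policy-options": 60, "interfaces": 25, "snmp": 10, "chassis": 5}
    for ftype, share in composition(example).items():
        print(f"{ftype}: {share:.1%}")
```

Modules outside the seven named rows (for example snmp or chassis) fall into the "others" row, mirroring the last line of the table.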
firewall module defines many packet filtering schemes that are referenced by the interfaces module.
3.3.3. Unveiling configuration complexity
Our third analysis task is unveiling the complexity arising from device-level configuration commands. We conduct this analysis based on two complexity models, which have been shown to be valid for unveiling configuration complexity. One is the referential graph. Configuration files are constructed from a set of stanzas. When configuring a stanza, operators are often required to consider which stanzas may be related to it. To unravel these dependences, we create referential graphs within each router and among different routers. The nodes of a referential graph represent stanzas and the edges between nodes stand for the references between stanzas. The more referential links that must be created to refer to a stanza, the more difficult it is to create and maintain that stanza. The other model is templates. Each template represents a specific functionality or an explicit configuration task, which helps achieve the goal of automated configuration provisioning [25,26]. A template is constructed from one or more stanzas, extracted from one or more routers. Thus, the template model can help track the uniformity of configurations. The more templates that are found, the more complicated network management and configuration tasks will be.
4. Configuration commands
Configuration files consist of many blocks of commands, which carry out high-level configuration tasks. We investigate configuration commands to answer the questions of Topic One.
4.1. Characteristics of commands
Previous work on configuration analysis mainly concerns IPv4 with Cisco syntax [14,15]. Our work is a first-ever configuration analysis of IPv6 with Juniper syntax. We now show the differences between IPv4 and IPv6 in specific configuration commands.
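As a side illustration of the referential graph model introduced in Section 3.3.3, the following sketch builds a graph whose nodes are stanzas and whose edges are references between them. It assumes, purely for illustration, that a reference can be detected when one stanza's name appears as a token in another stanza's body; the stanza bodies are hypothetical and the function names are not taken from the paper.

```python
from typing import Dict, List, Set


def build_referential_graph(stanzas: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each stanza name to the set of other stanza names its body mentions."""
    names = set(stanzas)
    graph: Dict[str, Set[str]] = {name: set() for name in stanzas}
    for name, body in stanzas.items():
        for line in body:
            # Token-level match: a stanza name appearing in the body is a reference.
            for token in line.replace(";", " ").split():
                if token in names and token != name:
                    graph[name].add(token)
    return graph


def referenced_by_count(graph: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count how many stanzas refer to each stanza (its in-degree)."""
    counts = {name: 0 for name in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical stanza bodies reusing names mentioned in Section 3.1.
    stanzas = {
        "ge-1/1/0": ["filter input sava-ctc2bw;", "filter output ipv6-sample;"],
        "sava-ctc2bw": ["term default then accept;"],
        "ipv6-sample": ["term default then sample;"],
    }
    graph = build_referential_graph(stanzas)
    print(referenced_by_count(graph))
```

The in-degree returned by referenced_by_count is one simple proxy for the number of referential links pointing at a stanza; the complexity metrics actually used are those of Benson et al. [23,24].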
To meet the requirements of IPv6, some traditional protocols have been updated to support it. For example, OSPFv2 and RIP serve IPv4, while OSPFv3 and RIPng serve IPv6 exclusively. Other protocols add an inet6 family to support IPv6, e.g., BGP has added inet6 (IPv6 NLRI parameters), inet6-mvpn (IPv6 MVPN NLRI parameters) and inet6-vpn (IPv6 Layer 3 VPN NLRI parameters). To distinguish IPv6 configurations from IPv4 ones, operators often name variables with 6 or V6, e.g., group CONNECTOR6, policy-statement ISP-V6-IN, etc. Based on our observations, although configuration commands for IPv4 and IPv6 differ slightly, they are protocol independent in most cases [3]. There are not many differences between IPv4 and IPv6 configurations. The most obvious difference is the IP address. An IPv6 address is 128 bits long, compared with 32 bits for an IPv4 address. The longer address provides more available addresses and, together with reasonable address allocation, can improve routing convergence. However, it also brings more complexity to operators when configuring IPv6 addresses. From face-to-face interviews with the operators of CERNET2, we conclude that IPv6 addresses are troublesome for operators. For one thing, they are hard to remember, so operators often glance at the address spreadsheet repeatedly when configuring them. For another, even when operators configure IPv6 addresses carefully, they still make misconfigurations. IPv6 configuration commands do not require much more time from operators than IPv4 commands, but they do require more attention.
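The command differences listed in this subsection are what the partition into Internet2-IPv4 and Internet2-IPv6 relies on. The paper does not spell out the exact rules, so the following is only a hedged sketch of one possible heuristic: a line is labeled by the IP address literals it contains (parsed with Python's standard ipaddress module) and by the keywords named above (inet6, ospf3, ripng versus inet, ospf, rip). The regular expressions and the function name classify are assumptions, and naming conventions such as "V6" are deliberately ignored because they are operator specific.

```python
import ipaddress
import re

# Keywords taken from the text; the exact matching rules are an assumption.
V6_KEYWORDS = re.compile(r"\b(?:inet6(?:-m?vpn)?|ospf3|ripng)\b")
V4_KEYWORDS = re.compile(r"\b(?:inet(?:-m?vpn)?|ospf|rip)\b")


def classify(line: str) -> str:
    """Label one command line as 'ipv6', 'ipv4' or 'unknown'."""
    versions = set()
    for token in line.replace(";", " ").split():
        try:
            # Accepts both bare addresses and address/prefix-length forms.
            versions.add(ipaddress.ip_interface(token).version)
        except ValueError:
            continue  # not an address literal
    if 6 in versions or V6_KEYWORDS.search(line):
        return "ipv6"
    if 4 in versions or V4_KEYWORDS.search(line):
        return "ipv4"
    return "unknown"


if __name__ == "__main__":
    for sample in (
        "address 2001:db8::1/64;",
        "address 192.0.2.1/30;",
        "family inet6 {",
        "family inet {",
        "description backbone-link;",
    ):
        print(f"{sample!r} -> {classify(sample)}")
```

Lines labeled "unknown" would correspond to the shared or layer 2 parts that cannot be attributed to either protocol. The addresses in the usage example are documentation prefixes, not addresses from the studied networks.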
4.2. Configuration file size
Configuration file size is not a good indicator of the complexity of network configuration, but it can reflect the development of the network to some extent. Fig. 3(a) and (b) shows the distribution of configuration file size for Internet2 and CERNET2. CERNET2 consists of relatively small files, with 80% of its files being under 600 lines in 2010 and under 1000 lines in 2013. Internet2, in contrast, is composed of relatively large files, with 60% of its files being over 6000 lines in 2010 and over 8000 lines in 2013. Configuration file sizes of both CERNET2 and Internet2 change noticeably over time. As our statistics show, CERNET2 has 15,811 command lines in total in 2010, while in 2013 the number reaches 23,177, an increase of 47%. In CERNET2, the largest configuration file has 2245 command lines in 2010 and 3014 in 2013, a 34.3% growth. This router connects directly to the main border router of CNGI-6IX (which provides high-speed connection service to domestic CNGI backbones and international research networks), so it is easy to explain why it has the largest configuration file. For Internet2, total command lines increase by 18.93% from 2010 to 2013. The smallest file in Internet2 contains about 3667 command lines in 2010, while the smallest file in 2013 contains 5622 command lines, a 53% growth. Further analysis of the increase, especially the relationship between the increase and the function modules, is given in Section 5.
Based on the division into Internet2-IPv4 and Internet2-IPv6, we compare configuration file size between IPv4 and IPv6. The results are depicted in Fig. 3(c) and (d). In the dual-stack network Internet2, IPv4 has more command lines than IPv6: about 3.15 times as many in 2010 and 2.8 times as many in 2013.
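The growth figures quoted in this subsection follow from the usual relative-change formula, (new - old) / old. As a quick check, the sketch below recomputes the three percentages for which both line counts are given in the text; the function name growth_rate is our own.

```python
def growth_rate(old: float, new: float) -> float:
    """Relative growth from old to new, in percent."""
    return (new - old) / old * 100.0


# Line counts for 2010 and 2013 as reported in the text.
cernet2_total = growth_rate(15811, 23177)      # all CERNET2 command lines
cernet2_largest = growth_rate(2245, 3014)      # largest CERNET2 file
internet2_smallest = growth_rate(3667, 5622)   # smallest Internet2 file

if __name__ == "__main__":
    print(f"CERNET2 total:      {cernet2_total:.1f}%")
    print(f"CERNET2 largest:    {cernet2_largest:.1f}%")
    print(f"Internet2 smallest: {internet2_smallest:.1f}%")
```

The recomputed values round to the reported 47%, 34.3% and 53%.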
Although the results imply growth in the IPv6 command lines of Internet2, the configuration file size of Internet2-IPv6 is still smaller than that of Internet2-IPv4, which illustrates that operators are more concerned with providing IPv4 connections and services than IPv6 ones in Internet2. Comparing Fig. 3(c) with Fig. 3(d), we find that the growth rate of IPv6 (22.46%) is larger than that of IPv4 (9.1%). This is a positive indication for IPv6 development. Note that the configuration file size of CERNET2 is smaller than that of Internet2-IPv6. As a dual-stack network, Internet2 contains many configurations shared by IPv6 and IPv4 that we cannot easily distinguish, so a comparison of IPv6 development between a dual-stack network and a pure IPv6 network would be better based on passive measurements of traffic [8–11] or active measurements of performance [12,13].
5. Configuration composition
For Juniper devices, configuration files are composed of many function modules, each of which contains groups of commands with different properties and functionalities. We study configuration composition to answer the questions of Topic Two.
5.1. Configuration modules
There are 24 function modules defined in the configurations of Juniper devices, but a specific router configuration uses only some of them. Based on the module map shown in Table 1, we classify the configuration files of CERNET2 and Internet2 into module-scale pieces. As shown in Fig. 4(a), Internet2 defines the most command lines in the policy-options module, which makes up 62.82% of total command lines in 2010 and 57.64% in 2013. The policy-options module also accounts for 33.7% of the growth of the command lines
Fig. 3. Cumulative distribution of configuration file size.
Fig. 4. Analysis of configuration composition.
during the past three years. As depicted in Fig. 4(b), in CERNET2 the most command lines are configured in the protocols module, which occupies 43.8% of total command lines in 2010 and 32.1% in 2013. For CERNET2, no configuration commands related to the class-of-service module exist in 2010, but such commands appear in 2013 and contribute 34.4% of the growth of command lines. In addition, the interfaces module (28.9%) and the policy-options module (20%) also contribute greatly to the growth of command lines in CERNET2.
We conduct a comparison analysis of configuration composition between IPv6 and IPv4. Fig. 4(c) and (d) shows the configuration composition of Internet2-IPv4 and Internet2-IPv6, respectively. Internet2-IPv4 contains more command lines than Internet2-IPv6 across most function modules. We can readily explain that although
IPv6 expanded quickly in the past three years, its scale is still smaller than that of IPv4 in the dual-stack network Internet2. The module in which Internet2-IPv4 most clearly contains more command lines than Internet2-IPv6 is policy-options, which accounts for over 83% of the difference in the number of command lines. The policy-options module defines many policies to control route redistribution and exchange, so the results illustrate that routing control configurations are more complicated for IPv4 than for IPv6. This is caused by the larger network scale of IPv4, which places more requirements on routing control and management. Internet2-IPv4 also has more command lines in the interfaces module than Internet2-IPv6. This can be explained by the fact that more interfaces or subinterfaces are enabled with IPv4 addresses than with IPv6 addresses.
The interfaces module enables many interfaces to create intra-domain or inter-domain connections. The more interfaces are enabled, the larger the network scale is likely to be. Once again, the results show that IPv4 is larger than IPv6 in Internet2. In addition, as depicted in Fig. 4(c) and (d), the average number of lines of each module grows over time. The growth rates of the interfaces module and the policy-options module for Internet2-IPv6 are 17.58% and 22.34% respectively, while for Internet2-IPv4 they are 31.52% and 6.44%. IPv4 has entered the commercial phase, so more customers intend to connect to Internet2-IPv4 for its abundant services and applications, resulting in a greater growth rate of the interfaces module than for Internet2-IPv6. Once a backbone network reaches a mature phase, its routing control policies also stabilize, leading to a smaller increment over the past three years. IPv6 is in its fast development and deployment phase, so its routing control policies need to be enriched and adjusted somewhat more frequently.

5.2. Relationships among modules

Modules in a router and among routers are not isolated, but depend on each other in actual operation. These relationships fall into two categories. One is the interior-module relationship. For example, stanzas in the interfaces module can inherit the stanzas defined in the groups module. This is similar to defining a parent class in a high-level programming language, all attributes of which are inherited by its subclasses. In this way, device configurations are concise and convenient to extend. The other is the exterior-module relationship. For example, the protocols module not only interacts with modules (interfaces module, policy-options module, etc.) in the same configuration file, but also needs to cooperate with modules (interfaces module, protocols module, etc.) from other configuration files, for instance when creating BGP connections.

6. Configuration complexity

According to the complexity models and metrics [23,24], we quantify the complexity of IPv6 configurations to answer the questions of Topic three. Investigating the complexity of IPv6 configurations allows us to abstract templates from the configuration files and to reveal the dependence among configuration commands.
6.1. Templates model

We use the templates model to quantify the consistency of configurations across the network. For this discussion, we focus on the representative policy-options module, which defines many routing control policies for route redistribution and exchange. The templates are extracted from the policy-options module in the following four steps.

Step-1: We accumulate templates in the policy-options module based on the name of each stanza. As depicted in Fig. 5, a stanza about a policy-statement called ebgp-v6-edge-im-policy (line 20 to line 28) is regarded as a basic template and is stored in the set Tbasic. Each template in Tbasic represents a basic control behavior. As shown in Fig. 6, assume that router one has four basic templates, simply represented as R1 = {T1, T2, T3, T4}. Similarly, R2 = {T1, T3, T5, T6} and R3 = {T4, T7, T8}. Then Tbasic = {T1, T1, T2, T3, T3, T4, T4, T5, T6, T7, T8}.

Step-2: We merge the templates with the same name across the whole network, i.e., if more than one template has the same name, only one is kept in the set Tmerge. As depicted in Fig. 6, Tmerge = {T1, T2, T3, T4, T5, T6, T7, T8}.

Step-3: For each template in Tmerge, we find out which routers (to be precise, which router configurations) contain it. A template Ti contained in at least two routers, e.g., router one and router two, is put into the set TRmulti in the form Ti (R1, R2), where i denotes the ith template. Templates that appear in only one router are put into the set TRsingle. For the example of Fig. 6, TRmulti = {T1 (R1, R2), T3 (R1, R2), T4 (R1, R3)} and TRsingle = {T2, T5, T6, T7, T8} (T2 appears only in R1).

Step-4: We merge the templates in TRmulti that share the same router set (two or more routers) into a bigger template. For example, two templates Ti (R1, R2) and Tj (R1, R2) are merged into one template named Tij (R1, R2) and stored in the set TRmerge. Meanwhile, Ti (R1, R2) and Tj (R1, R2) are removed from TRmulti, which guarantees that each template remaining in TRmulti holds its router set exclusively. For the example of Fig. 6, TRmerge = {T13 (R1, R2)}, TRmulti = {T4 (R1, R3)} and TRsingle = {T2, T5, T6, T7, T8}.

Through the above four steps, we get the sets Tbasic, TRsingle, TRmulti and TRmerge. In addition, the templates in TRsingle, TRmulti and TRmerge are all put into the set TRtotal, each element of which represents either a basic control behavior, or a basic control behavior shared by at least two routers, or at least two basic control behaviors shared by at least two routers. In order to make the following
Fig. 5. Protocols module and policy-options module.
Fig. 6. Example of template construction.
analysis convenient and clear, we define the cardinality of each set and give its interpretation. |Tbasic| is the number of basic control behaviors of all the routers before the merging process. |TRsingle| is the number of single control behaviors that appear in only one router. |TRmulti| is the number of single control behaviors found in at least two routers. |TRmerge| is the number of multiple control behaviors shared by at least two routers. Correspondingly, |TRtotal| = |TRsingle| + |TRmulti| + |TRmerge|. We adopt two complexity metrics arising from the templates model. One is the value of |TRtotal|. If |TRtotal| equals |TRsingle|, no template can be shared across different configuration files. The more control behaviors operators define, the more work they have to do to guarantee that the behaviors are all defined and configured correctly and consistently [23]. The other complexity metric is the compression ratio, defined as |TRtotal|/|Tbasic|. The number of control behaviors may grow naturally with network scale, so on its own it may not accurately reflect the complexity of maintaining configuration consistency. The compression ratio, however, captures the complexity of consistency as the network scale increases. The smaller the ratio, the easier it is for operators to configure basic or merged control behaviors and maintain consistency across all routers. As shown in Table 2, we extract 55 templates in total from CERNET2 in 2010 and 75 templates in 2013, a 36.36% increase. For Internet2, we extract 495 templates in 2010 and 568 templates in 2013, a 14.75% growth. We also find that Internet2-IPv4 has more templates than Internet2-IPv6 both in 2010 and in 2013, which shows that configurations of Internet2-IPv4 are more complicated than those of Internet2-IPv6 in terms of the number of templates. However, the growth rate in the number of templates is 28.46% for Internet2-IPv6, compared with 4.74% for Internet2-IPv4. That is to say, IPv6 has entered a rapid development and deployment phase, so its routing control policies need to be enriched frequently, while IPv4 is in its mature phase and its routing control policies remain at a relatively steady level.
The number of templates alone may not convincingly explain the complexity of maintaining configuration consistency, because network scale differs between years and between IP protocols. We therefore turn to the second complexity metric. The compression ratio of CERNET2 in 2013 is 26.04%, which is lower than the ratio of 27.64% in 2010. The results illustrate that the configuration consistency of CERNET2 has not decreased but has slightly improved over time. The compression ratio of Internet2-IPv6 in 2013 is 21.51%, slightly higher than the ratio of 21.03% in 2010, so the configuration consistency of Internet2-IPv6 decreases as the number of templates increases. The compression ratio of Internet2-IPv4 is 24.96% in 2010 but 19.98% in 2013, a decrease of about five percentage points, which is a positive indication for the configuration consistency of Internet2-IPv4. Interestingly, the compression ratio of Internet2-IPv6 is lower than that of Internet2-IPv4 in 2010, whereas the opposite holds in 2013. The 2013 results may be caused by the fast development and deployment of IPv6 losing sight of configuration consistency, while for IPv4 close attention is paid to maintaining it.
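The four steps and the two metrics of this subsection can be sketched in a few lines of Python. This is only an illustrative sketch under simplifying assumptions, not the authors' tool: each router is reduced to a list of stanza names, the function name build_template_sets and the "T1+T3" naming of merged templates are hypothetical, and the input is the three-router example of Fig. 6. Under the definitions above, T2, which appears only in R1, lands in TRsingle.

```python
from collections import defaultdict


def build_template_sets(routers):
    """Apply Step-1 to Step-4 to a mapping {router name: [template names]}."""
    # Step-1: collect every template occurrence, duplicates included.
    t_basic = [t for names in routers.values() for t in names]

    # Step-2: keep one copy of each template name.
    t_merge = sorted(set(t_basic))

    # Step-3: find the routers that contain each merged template.
    owners = defaultdict(set)
    for router, names in routers.items():
        for t in names:
            owners[t].add(router)
    tr_single = [t for t in t_merge if len(owners[t]) == 1]

    # Step-4: group shared templates by their exact router set; a group with
    # one member stays in TRmulti, a larger group becomes one merged template.
    groups = defaultdict(list)
    for t in t_merge:
        if len(owners[t]) >= 2:
            groups[tuple(sorted(owners[t]))].append(t)
    tr_multi = {ts[0]: rs for rs, ts in groups.items() if len(ts) == 1}
    tr_merge = {"+".join(ts): rs for rs, ts in groups.items() if len(ts) > 1}

    tr_total = len(tr_single) + len(tr_multi) + len(tr_merge)
    return {
        "Tbasic": t_basic, "Tmerge": t_merge, "TRsingle": tr_single,
        "TRmulti": tr_multi, "TRmerge": tr_merge, "TRtotal": tr_total,
        "ratio": tr_total / len(t_basic),  # compression ratio |TRtotal|/|Tbasic|
    }


# The three-router example of Fig. 6.
example = {
    "R1": ["T1", "T2", "T3", "T4"],
    "R2": ["T1", "T3", "T5", "T6"],
    "R3": ["T4", "T7", "T8"],
}
sets = build_template_sets(example)
print(sets["TRsingle"], sets["TRmulti"], sets["TRmerge"])
print(sets["TRtotal"], round(sets["ratio"], 3))
```

For this toy input |Tbasic| = 11 and |TRtotal| = 7, so the compression ratio is 7/11; the real ratios in Table 2 are much lower because many more stanzas are shared across routers.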
6.2. Referential graph model

Modules in a router and among routers are not isolated, but depend on each other when they work in the operational network. We use the referential graph model [23,24] to quantify the dependence within and among the configuration files. For this discussion, we focus on the interfaces module and the protocols module. Firstly, the interfaces module is closely related to policy control in the data plane. Stanzas in the interfaces module mainly refer to the firewall module, which defines many filters. Each filter enables a router to classify the incoming or outgoing packet stream based on the properties of individual packets or packet streams. An interface, defined in the interfaces module, can refer to a filter defined in the firewall module through the key words input (line 8) or output (line 9), as depicted in Fig. 1. Secondly, the protocols module is closely related to the redistribution and routing policies operated in the control plane. Similar to stanzas in the interfaces module, stanzas in the protocols module primarily refer to the policy-options module, which defines many policy-statements. Each policy-statement controls the redistribution or exchange of routes, and finally determines the path that packets take from their sources to their destinations. The key words import and export are used to refer to the policy-statements. As shown in Fig. 5, line 4 refers to a policy-statement defined from line 20 to line 28, and line 9 refers to a policy-statement defined from line 29 to line 36. The typical dependence is extracted from the interfaces module and the protocols module in the following two steps. Step-1: We manually mark the key words for each typical reference. For example, dependence of the data plane and the control plane is explored within the configuration files based on the key words input, output, import, export, etc. As shown in Fig. 7(a), assume
Table 2. The number of the templates extracted from policy-options module.

Type                     |Tbasic|  |TRsingle|  |TRmulti|  |TRmerge|  |TRtotal|  Ratio (%)
CERNET2 (2010)              199        49          4          2         55       27.64
CERNET2 (2013)              288        68          4          3         75       26.04
Internet2 (2010)           2247       394         37         64        495       22.03
Internet2-IPv6 (2010)      1170       179         25         42        246       21.03
Internet2-IPv4 (2010)      1859       306         37         59        402       24.96
Internet2 (2013)           2924       442         52         74        568       19.43
Internet2-IPv6 (2013)      1469       221         48         47        316       21.51
Internet2-IPv4 (2013)      2433       364         52         70        486       19.98
Fig. 7. Example of dependence creation.
that there are three key words of input (I1, I2, I3) and three key words of output (O1, O2, O3) in the interfaces module. In addition, dependence among the configuration files can be found through the key words neighbor (line 11 and line 15 shown in Fig. 5), peer, etc. As shown in Fig. 7(b), assume that R1 has two neighbor key words, simply represented as N1 and N2. R2 and R3 each have one neighbor key word (N1). Step-2: According to the key words extracted in Step-1, we identify and verify the referential links among the stanzas. The referential links within the same configuration file are stored in Din, while the referential links among different configuration files are stored in Dout. As depicted in Fig. 7(a), four filters (F1, F2, F3, F4) are defined in the firewall module. These filters are referred to by I1, I2, I3, O1, O2 and O3. Once we verify the dependence between any pair of I–F or O–F, we use a directed link to represent the referential relationship. For example, F1 is referred to by I1, and this relationship is recorded as Di1 = (I1 → F1). Similarly, we get Di2, Di3, Di4, Di5 and Di6. As depicted in Fig. 7(b), we can also get the referential links among configuration files. For example, the local-address (IP1) in R2 is referred to by N2 in R1, and this relationship is recorded as Do2 = {(N2, R1) → (IP1, R2)}. Since we cannot gather configuration files from remote ASs, referential links among configuration files in different ASs are verified by consulting emails and daily operation documents. Such a referential relationship is marked by the AS number. For example, Do1 is recorded as Do1 = {(N1, R1) → (IP1, AS2)}. Configurations of firewall modules are removed for security reasons. Therefore, although we can count the referential links through the key words input and output, the referential dependence between the interfaces module and the firewall module cannot be verified from the configuration files alone.
We interviewed the operators of CERNET2, and they confirmed the correctness of our analysis results on the referential dependence between the interfaces module and the firewall module. In addition, we define the cardinality of Din and Dout to make the following analysis convenient and clear. |Din| is the number of referential links identified and verified within configuration files, while |Dout| is the number of referential links identified and verified among configuration files. The complexity due to referential dependence on the data plane is evaluated first. As shown in Table 3, CERNET2 has more referential links on average in 2013 than in 2010, an 85.52% increase. The high growth rate of referential dependence is attributed partly to the fast development of CERNET2 and partly to the enforcement of more security policies. The average number of referential links of Internet2-IPv4 in 2013 is larger than that of Internet2-IPv6, which is similar to the situation in 2010. This shows that the data plane complexity of IPv4 is higher than that of IPv6 in Internet2. However, the referential links of Internet2-IPv6 grow by about 92% from 2010 to 2013, while the referential links of Internet2-IPv4 present only a 32% increase. The results illustrate that the data plane complexity of IPv6 increases faster than that of IPv4. In addition, comparing the results of Internet2 in 2013 with those in 2010, we find that the number of referential links on the data plane of Internet2 grows from 22.67 to 49.33 on average, a 117.6% increase. Decomposing this increase, most of it is caused by L2 settings (i.e., Layer 2, which carries services like L2 VPN and L2 circuit), while Internet2-IPv6 and Internet2-IPv4 account for only 18.75% and 17.08% of the growth respectively. The pre-existing large scale of Internet2-IPv4 results in higher complexity in referential dependence than Internet2-IPv6, while the faster development and deployment of Internet2-IPv6 lead to a higher growth rate of referential dependence than Internet2-IPv4. The complexity due to referential dependence on the control plane is shown in Table 4. The average number of referential links of Internet2-IPv4 is much larger than that of Internet2-IPv6 in both 2010 and 2013, which illustrates that IPv4 is also more complicated than IPv6 on the control plane. Since IPv4 has already matured for commercial use, more operations are performed on IPv4 than on IPv6, which generates more complexity for IPv4. The growth rates of referential dependence over time are as follows: CERNET2 (40%), Internet2 (20%), Internet2-IPv6 (14%) and Internet2-IPv4 (23%). The higher growth rate of CERNET2 reflects its small base number of referential links on the control plane, so each routing control policy added to the configurations is easily observed. We also find that the growth rate of Internet2-IPv6 is lower than that of Internet2-IPv4, which indicates that, in order to control route redistribution and exchange, more referential links are created for IPv4 than for IPv6 over time. Comparing the complexity on the data plane with that on the control plane, we find that more complexity is introduced by the control plane. This is confirmed by the operators we interviewed. Routing policy and redistribution mechanisms on the control plane are more complicated to configure, whereas packet filtering schemes on the data plane depend on the number of valid interfaces, which is limited by the hardware. We also identify and verify more referential links for the interfaces module and the protocols module, including not only typical references within the same configuration file but also typical references among different configuration files, such as those that create BGP connections. The results are shown in Table 5, which refers back to the fact that IPv4 is more complicated than IPv6 and that the complexity of both IPv4 and IPv6 grows over time, but the growth rate of IPv6 is higher than that of IPv4.
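The two-step extraction of Din and Dout described in this subsection can be illustrated with a small Python sketch. It is a simplified illustration rather than the authors' implementation: each configuration file is assumed to be already reduced to a dictionary of defined stanza names, (key word, target) references, a local address and a list of neighbor addresses. The function name, the dictionary layout, the stanza names F1, F2, P1, P9 and the addresses (taken from the IPv6 documentation prefix 2001:db8::/32) are all hypothetical.

```python
INTERIOR_KEYWORDS = {"input", "output", "import", "export"}


def build_referential_links(routers):
    """Return (Din, Dout) for a dict {router name: simplified configuration}."""
    # Map each local address to the router that owns it, for neighbor lookups.
    address_owner = {cfg["local_address"]: name for name, cfg in routers.items()}
    d_in, d_out = [], []
    for name, cfg in routers.items():
        # Interior link: a key word whose target is defined in the same file.
        for keyword, target in cfg["refs"]:
            if keyword in INTERIOR_KEYWORDS and target in cfg["defined"]:
                d_in.append((name, keyword, target))
        # Exterior link: a neighbor address that is another router's local address.
        for address in cfg["neighbors"]:
            owner = address_owner.get(address)
            if owner is not None and owner != name:
                d_out.append(((name, address), owner))
    return d_in, d_out


routers = {
    "R1": {
        "defined": {"F1", "F2", "P1"},
        # The export reference to P9 has no definition, so it is not verified.
        "refs": [("input", "F1"), ("output", "F2"), ("import", "P1"), ("export", "P9")],
        "local_address": "2001:db8::1",
        # The second neighbor lies outside the collected files (e.g. a remote AS).
        "neighbors": ["2001:db8::2", "2001:db8:ffff::1"],
    },
    "R2": {
        "defined": {"P1"},
        "refs": [("import", "P1")],
        "local_address": "2001:db8::2",
        "neighbors": ["2001:db8::1"],
    },
}
d_in, d_out = build_referential_links(routers)
print(len(d_in), len(d_out))
```

In this sketch a link counts as verified only when its target can be resolved in the collected files. References that cannot be resolved this way, such as neighbors in remote ASs, are simply skipped here, whereas the paper verifies them through emails and operation documents.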
7. A framework for network configuration recommendation

Misconfigurations have been shown to be a main cause of network failures [19–22]. In order to lessen the workload on operators and reduce the risk of configuration errors, many automated configuration provisioning systems have been proposed [25–29]. However, in actual network operation and management, automated configuration provisioning systems have two shortcomings. For one thing, automated configuration is regarded as unreliable. Most operators input
Table 3. Referential dependence on data plane.

Type                     Median   Maximum    Mean
CERNET2 (2010)              4        15       5.8
CERNET2 (2013)              8        21      10.76
Internet2 (2010)           16        71      22.67
Internet2-IPv6 (2010)       2        24       5.44
Internet2-IPv4 (2010)       9        43      14.11
Internet2 (2013)           43        96      49.33
Internet2-IPv6 (2013)       7        31      10.44
Internet2-IPv4 (2013)      14        45      18.67
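The growth rates quoted in the text for the data plane follow from the Mean column of Table 3 as (new − old) / old. The short Python check below recomputes them; the variable and function names are illustrative only.

```python
# Mean referential links per router on the data plane (Table 3): (2010, 2013).
means = {
    "CERNET2": (5.8, 10.76),
    "Internet2": (22.67, 49.33),
    "Internet2-IPv6": (5.44, 10.44),
    "Internet2-IPv4": (14.11, 18.67),
}


def growth_rate(old, new):
    """Relative growth from old to new, in percent."""
    return (new - old) / old * 100


rates = {name: round(growth_rate(old, new), 2) for name, (old, new) in means.items()}
for name, rate in rates.items():
    print(f"{name}: {rate}%")
```

The results agree with the 85.52% (CERNET2), 117.6% (Internet2), about 92% (Internet2-IPv6) and 32% (Internet2-IPv4) increases reported in Section 6.2.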
Table 4. Referential dependence on control plane.

Type                     Median   Maximum    Mean
CERNET2 (2010)              5        47       7.6
CERNET2 (2013)              8        53      10.64
Internet2 (2010)          150       250     148.89
Internet2-IPv6 (2010)      52        97      53.11
Internet2-IPv4 (2010)     100       154      95.78
Internet2 (2013)          174       301     178.11
Internet2-IPv6 (2013)      61        86      60.78
Internet2-IPv4 (2013)     118       198     117.67
Table 5. Referential dependence in interfaces module and protocols module.

Type                     |Din|   |Dout|
CERNET2 (2010)            632      64
CERNET2 (2013)            757      92
Internet2 (2010)         2286    1603
Internet2-IPv6 (2010)     658     372
Internet2-IPv4 (2010)    1221     709
Internet2 (2013)         3178    2102
Internet2-IPv6 (2013)     761     435
Internet2-IPv4 (2013)    1492     867
the configuration commands manually rather than automatically. They still believe that manual operation is reliable and preferable, especially when the network environment is complicated. For another, automated configuration is difficult to realize. Devices of the network may be produced by different manufacturers, whose configuration languages differ. In addition, configuration tasks change along with the dynamics of the network. In summary, network configuration is diverse, so it is difficult to provide automated configuration for all kinds of devices and tasks.

Because of these weaknesses of automated configuration provisioning systems in actual network operation and management, we propose a framework for network configuration recommendation (FNCR), which contains two main functionalities. (1) Configuration template recommendation: FNCR retrieves the key words of the configuration task and then finds the same configuration tasks that have already been configured across the whole network. Operators can refer to the results when configuring new tasks, which enhances configuration efficiency. (2) Configuration dependence recommendation: FNCR retrieves the dependent configurations of configuration commands before they are submitted. Operators can take note of these dependences, which decreases the possibility of misconfigurations. FNCR is realized mainly based on the template model shown in Section 6.1 and the referential graph model depicted in Section 6.2. The template model helps us understand the complexity of network configuration by identifying groups of stanzas with similar configurations across all the routers, and it shows how to construct templates. The referential graph model reveals the configuration complexity by mining the dependence within and among the configuration files, and it shows how to create referential links between stanzas.

The architecture of FNCR is depicted in Fig. 8. It is mainly comprised of three layers: the configuration task-processing layer, the configuration-recommending layer and the configuration info-managing layer. We summarize the functionalities and methodologies of these layers as follows.

7.1. Configuration info-managing layer

The configuration info-managing layer is responsible for configuration acquisition and storage. The configuration information acquiring function can connect to all the devices
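Fig. 8. Architecture of FNCR. [Layers and functions shown: configuration task-processing layer (configuration command operating, configuration task parsing); configuration-recommending layer (configuration template recommendation, configuration dependence recommendation); configuration info-managing layer (configuration information acquiring, configuration files storing).]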
of the network through scripts, which can download configuration files from the devices as well as upload configuration files. On the one hand, the configuration files support the configuration-recommending layer; on the other, they are stored periodically by the configuration files storing function. This function can also back up the differences between two consecutive versions of a configuration file. Once a misconfiguration happens in a configuration file, the function can not only recover the file from the latest valid version, but also help to identify the cause of the misconfiguration.

7.2. Configuration-recommending layer

The configuration-recommending layer consists of two functions: configuration template recommendation and configuration dependence recommendation.

7.2.1. Configuration template recommendation

According to the template model stated in Section 6.1, FNCR can construct configuration templates and recommend them to operators. This is the responsibility of the configuration template recommendation function, which mainly includes two parts: constructing the template library and retrieving the template library. The process of constructing the template library (i.e., TRsingle, TRmulti and TRmerge) has been described in Section 6.1. We now describe the process of retrieving the template library in detail. Before configuring a task, operators provide key information about the task (i.e., name of the task, the module it belongs to, etc.) to FNCR. FNCR retrieves configuration templates from the template library based on this key information. We assume that the key information of the task has been entered by operators through the configuration task-processing layer. The retrieving process follows the sequence shown in Fig. 9(a) until a satisfying template is found. Step-1: Retrieving TRmerge. TRmerge stores the templates that are composed of at least two functional stanzas and shared by at least
Fig. 9. Template and dependence retrieving sequences.
two routers. Templates in TRmerge are more global in character than the other templates in the template library. Therefore, when configuring a task, retrieving TRmerge with the highest priority enhances the retrieval speed. If FNCR finds the template in TRmerge, it is recommended to operators as a reference and the process terminates. Otherwise, continue with Step-2. Step-2: Retrieving TRmulti. TRmulti stores the templates that contain only one functional stanza, which is shared by at least two routers. Compared with the templates in TRsingle, each of which includes only one functional stanza and appears in only one router, templates in TRmulti are more commonly used. Therefore, FNCR retrieves TRmulti with the second highest priority. If FNCR finds the template in TRmulti, it is recommended to operators as a reference and the process terminates. Otherwise, continue with Step-3. Step-3: Retrieving TRsingle. Because the templates in TRsingle are the least universal, FNCR retrieves TRsingle with the lowest priority. If FNCR finds the template in TRsingle, it is recommended to operators as a reference and the process terminates. Otherwise, continue with Step-4. Step-4: Fuzzy query. If FNCR cannot find the template in TRmerge, TRmulti or TRsingle, it adopts a fuzzy query, which is realized in two ways. One is fuzzy matching. Operators provide simplified or approximate key information to FNCR again through the configuration task-processing layer. FNCR retrieves TRmerge, TRmulti and TRsingle one by one. The retrieval results may not be unique, in which case operators choose one optimal template as a reference. Then the process terminates. If FNCR still cannot find an appropriate template, it employs closest matching, i.e., it directly recommends a list of templates that have been frequently configured recently.

7.2.2. Evaluation on template recommendation

The function of retrieving the template library is easy to realize for two reasons. For one thing, the retrieving process only needs to search the template library by the name of the template and the module the template belongs to. For another, we prioritize the steps of retrieving the template library, which guarantees the query efficiency. However, the function of constructing the template library needs to decompose the structure of the configuration files and extract as many templates as possible. Overall, constructing the template library is more difficult to accomplish than retrieving it. Therefore, our previous analysis in Section 6.1 can partly evaluate the correctness of configuration template recommendation. We choose the policy-options module as the evaluation object and use the number of templates that can be identified as the evaluation indicator. As depicted in Table 2, the values of |TRtotal| are the numbers of templates that can be identified by FNCR. The results not only reveal that FNCR can capture the changes of configurations over time, but also show that the complexity of the studied networks increases. In addition, in the template library of the policy-options module, the templates in TRmerge and TRmulti are far fewer than the templates in TRsingle. However, the templates in TRmerge and TRmulti are commonly used across the whole network. Hence, following the retrieval methodology of Step-1 to Step-4 above, retrieving TRmerge and TRmulti with higher priorities enhances the efficiency of the retrieving process.

7.2.3. Configuration dependence recommendation

According to the referential graph model stated in Section 6.2, FNCR can create configuration dependence and recommend it to operators. This is the responsibility of the configuration dependence recommendation function, which mainly includes two parts: constructing the dependence library and retrieving the dependence library. The process of constructing the dependence library (i.e., Din and Dout) has been described in Section 6.2. We now describe the process of retrieving the dependence library in detail. When a task is being configured, FNCR can parse configuration commands and extract key words from the commands in real time. We assume that the key word of the task has been identified through the configuration task-processing layer. Once a key word is identified, FNCR captures the string that immediately follows the key word. However, if the key word is at the end of a command line, FNCR captures the string in front of the key word. The string (name of a routing policy, name of a filtering policy, an address, etc.) is used to retrieve configuration dependence from the dependence library. The retrieving process follows the sequence depicted in Fig. 9(b) until a satisfying dependence is found. Step-1: Retrieving Din. Din stores the dependence within configuration files. According to our management experience in the real network, interior dependence is more numerous than exterior dependence, so retrieving Din with the highest priority enhances the retrieval speed. The evaluation results support this design principle. If FNCR finds one item or a list of dependences in Din, they are recommended to operators as a notice and the process terminates. Otherwise, continue with Step-2. Step-2: Retrieving Dout. Dout stores the dependence among configuration files. If FNCR finds one item or a list of dependences in Dout, they are recommended to operators as a notice and the process terminates. Otherwise, continue with step-3.
Step-3: Fuzzy recommendation. If FNCR cannot find the dependence among the existing dependences that have been created, it directly recommends a sorted list of configuration stanzas that have been configured recently.

7.2.4. Evaluation on dependence recommendation

The function of retrieving the dependence library is easy to realize for two reasons. For one thing, the retrieving process only needs to search the dependence library by the string immediately after or in front of the key word. For another, we prioritize the steps of retrieving the dependence library, which guarantees the query efficiency. However, the function of creating the dependence library needs to mark the key words in all the configuration files, and to identify and verify the referential dependence among the stanzas. Overall, creating the dependence library is more difficult to accomplish than retrieving it. Therefore, our previous analysis in Section 6.2 can partly evaluate the correctness of configuration dependence recommendation. We choose the interfaces module and the protocols module as the evaluation objects and use the number of referential links that can be identified and verified as the evaluation indicator. The evaluations on the data plane (interfaces module ↔ firewall module) and the control plane (protocols module ↔ policy-options module) are shown in Tables 3 and 4. The results illustrate that FNCR can not only capture the changes of configurations over time, but also reveal the increase of configuration complexity. Therefore, in order to reduce the possibility of misconfigurations, configuration dependence recommendation is necessary for operators in daily network management. In addition, we extract typical referential dependence (including both interior dependence and exterior dependence) in the interfaces module and the protocols module. As depicted in Table 5, the referential links within the configuration files clearly outnumber the referential links among the configuration files, i.e., interior dependence is more dominant than exterior dependence. Therefore, retrieving the interior dependence (i.e., Din) with the highest priority enhances the query efficiency. The results validate our design of the process of retrieving the dependence library.

7.3. Configuration task-processing layer

The configuration task-processing layer is composed of two functions: configuration command operating and configuration task parsing. The configuration command operating function provides a web interface through which operators can perform configuration operations. The interface is bidirectional, i.e., configuration information can also be shown and checked through it. The configuration task parsing function is closely related to the configuration-recommending layer. It has two primary functionalities. (1) Recording key information for configuration template recommendation. Before configuring a task, operators provide key information about the task (i.e., name of the task, the configuration module it belongs to, etc.) to FNCR. FNCR retrieves configuration templates from the template library based on this key information. (2) Extracting key words for configuration dependence recommendation. When a task is being configured, FNCR can parse configuration commands and extract key words from the commands in real time.

Note that FNCR is a basic idea derived from our analysis methods and results in Sections 5 and 6. It applies not only to IPv6 networks, but also to IPv4 networks. We have not validated the correctness and efficiency of the whole FNCR in practice. FNCR can construct the template library and the dependence library according to the results shown in Tables 2–5. However, these results only partly evaluate the correctness of FNCR.
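The key word capture rule of Section 7.2.3 and the two retrieving sequences of Fig. 9 can be sketched as follows. This is a minimal sketch under stated assumptions, not a description of a deployed FNCR: the libraries are plain dictionaries keyed by name, the fuzzy matching of Step-4 is approximated with Python's difflib.get_close_matches (a stand-in chosen here, since the paper does not specify a matching algorithm), and every function name, library entry and command line except the policy name ebgp-v6-edge-im-policy is hypothetical.

```python
import difflib

KEYWORDS = ("input", "output", "import", "export", "neighbor")


def capture_string(command_line, keywords=KEYWORDS):
    """Return (key word, string after it), or the string before it when the
    key word ends the command line; None if no key word is present."""
    tokens = command_line.replace(";", " ").split()
    for i, token in enumerate(tokens):
        if token in keywords:
            if i + 1 < len(tokens):
                return token, tokens[i + 1]
            if i > 0:
                return token, tokens[i - 1]
    return None


def retrieve_template(key, tr_merge, tr_multi, tr_single, recent):
    """Fig. 9(a): TRmerge, then TRmulti, then TRsingle, then fuzzy query."""
    ordered = [("TRmerge", tr_merge), ("TRmulti", tr_multi), ("TRsingle", tr_single)]
    for name, library in ordered:  # Step-1 to Step-3: exact name lookup
        if key in library:
            return name, [key]
    for name, library in ordered:  # Step-4, fuzzy matching (difflib stand-in)
        close = difflib.get_close_matches(key, list(library), n=3, cutoff=0.8)
        if close:
            return name + " (fuzzy)", close
    return "recent", recent        # Step-4, closest matching: recent templates


def retrieve_dependence(string, d_in, d_out, recent):
    """Fig. 9(b): Din first, then Dout, then recently configured stanzas."""
    for name, library in (("Din", d_in), ("Dout", d_out)):
        if string in library:
            return name, library[string]
    return "recent", recent


# Hypothetical libraries for illustration.
tr_merge = {"ospf3-redistribute": "..."}
tr_multi = {"ebgp-v6-edge-im-policy": "..."}
tr_single = {"static-to-bgp": "..."}
d_in = {"ebgp-v6-edge-im-policy": ["policy-statement ebgp-v6-edge-im-policy"]}
d_out = {"2001:db8::2": ["local-address on R2"]}

keyword, name = capture_string("import ebgp-v6-edge-im-policy;")
print(retrieve_dependence(name, d_in, d_out, recent=[]))
print(retrieve_template("ebgp-v6-edge-policy", tr_merge, tr_multi, tr_single, recent=[]))
```

Both sequences share one design choice: the libraries are searched in a fixed priority order and the search stops at the first hit, so the most widely shared templates and the more numerous interior dependences are tried first.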
A comprehensive evaluation of FNCR is needed after FNCR is deployed and running in a real network environment.

8. A preliminary analysis of configuration evolution

Capturing the configuration evolution sheds direct light on which tasks are frequently performed on routers, so in this section we examine how configurations evolve over time. Limited by our dataset, which was gathered from March 29, 2013 to April 29, 2013, we can only conduct a preliminary analysis of the configuration evolution of Internet2. We adopt the common methodology of extracting changes by comparing two consecutive revisions with the help of a software versioning and revision control system called Subversion, often abbreviated as SVN [33]. We capture changes at two levels. One is the basic operations of operators: from the Subversion logs, we extract the additions, modifications and deletions made to network configurations. The other is high-level configuration tasks: the changes that can be tracked are attributed to the specific function modules shown in Table 1.

8.1. Basic operations of operators

As depicted in Fig. 10(a) and (b), most changes across Internet2 are additions and deletions, accounting for 43.38% and 52.65% respectively, while most changes of Internet2-IPv6 are modifications (67.24%). As shown in Table 6, changes of Internet2-IPv6 make up only 2% of the changes that took place in Internet2, which illustrates that most of the operators' work is unrelated to IPv6. This reflects the actual situation of network management. The IPv4 network has entered the commercial stage, so operators face many management tasks for IPv4, such as adding or deleting a customer, adding or removing a link, and modifying information about prefixes or interfaces. Changes of IPv4 are therefore diverse. IPv6, in contrast, is still in its initial phase, and the primary task is to operate the network with a low packet loss rate, high transmission bandwidth and few network failures.
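As an illustration of how additions, deletions and modifications can be counted between two consecutive revisions, the following Python sketch uses the standard difflib module rather than Subversion logs. Treating a replaced line as one modification, and the function name classify_changes, are assumptions made for this example; the mapping from Subversion output to the three operation types is not specified here.

```python
import difflib
from typing import Dict, List


def classify_changes(old: List[str], new: List[str]) -> Dict[str, int]:
    """Count added, deleted and modified lines between two revisions of a file."""
    counts = {"additions": 0, "deletions": 0, "modifications": 0}
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            counts["additions"] += j2 - j1
        elif tag == "delete":
            counts["deletions"] += i2 - i1
        elif tag == "replace":
            # Assumption: lines replaced one-for-one are modifications;
            # any surplus on either side counts as deletion or addition.
            paired = min(i2 - i1, j2 - j1)
            counts["modifications"] += paired
            counts["deletions"] += (i2 - i1) - paired
            counts["additions"] += (j2 - j1) - paired
    return counts


# Two hypothetical consecutive revisions of one configuration file.
revision_a = [
    "set interfaces xe-0/0/0 unit 0 family inet6 address 2001:DB8::1/64",
    "set protocols ospf3 area 0.0.0.0 interface xe-0/0/0.0",
]
revision_b = [
    "set interfaces xe-0/0/0 unit 0 family inet6 address 2001:db8::1/64",
    "set protocols ospf3 area 0.0.0.0 interface xe-0/0/0.0",
    "set policy-options prefix-list CUSTOMERS-V6 2001:db8:1::/48",
]
summary = classify_changes(revision_a, revision_b)
```

Summing such counts over every pair of consecutive snapshots would give per-operation totals of the kind reported in Table 6.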
Operators therefore try to operate and manage the IPv6 network steadily and easily. Deep analysis reveals that all the modifications of Internet2-IPv6 are related to IPv6 addresses. To manage IPv6 addresses easily, operators have unified the IPv6 address format, i.e., converted the capital letters in IPv6 addresses to lowercase letters.

Fig. 10. A preliminary analysis of configuration evolution of Internet2.

Table 6 Changes distribution across basic operations.
Type            Additions  Deletions  Modifications  Total
Internet2       6247       7582       573            14,402
Internet2-IPv6  38         57         195            290

8.2. High-level configuration tasks

We also correlate the changes with the main function modules. As shown in Table 7, most changes of Internet2 are related to the interfaces module (34.68%), protocols module (34.4%) and policy-options module (23.65%). Other modules account for only 7.27% of the total changes of Internet2. Most changes are IPv6 irrelevant, i.e., they are related to IPv4 or to layer 2 settings. The results illustrate that operators face few IPv6 configuration tasks in daily work, which is caused by the smaller scale of IPv6 and the smaller number of experimental trials on IPv6. Fig. 10(c) shows the changes associated with specific operations on the modules of Internet2. Considering the whole network of Internet2, most changes on the interfaces module, protocols module and policy-options module are additions and deletions. As a leading advanced networking consortium, Internet2 undertakes the development, deployment and use of revolutionary Internet technologies. Operators may create many subinterfaces for experiments or for newly joining customers, and disable subinterfaces when experiments terminate or existing customers leave. The changes of configurations in the protocols module are mainly related to L2. Operators frequently validate L2-related techniques, leading to obvious changes of addition and modification in the protocols module. We are not concerned with high-level configuration tasks related to L2 in this paper, so the details of the corresponding analysis are omitted. The policy-options module defines many control behaviors that are mainly referenced by the protocols module, so changes in the policy-options module are mainly caused by changes in the protocols module. Fig. 10(d) shows the configuration changes of Internet2-IPv6 distributed across the primary modules. Changes of Internet2-IPv6 are concentrated in the policy-options module, which differs from the situation of the whole Internet2. Deep analysis reveals that most changes of Internet2-IPv6 are caused by modifications of IPv6 addresses, i.e., the capital letters in the IPv6 addresses are converted into lowercase letters for convenient address management.

Table 7 Changes distribution across function modules.
Type            Interfaces  Protocols  Policy-options  Others
Internet2       4995        4955       3406            1046
Internet2-IPv6  61          29         196             4
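A modification that only changes the letter case of an IPv6 address can be recognized mechanically. The sketch below is an illustration, not the procedure used in this study: it validates tokens with Python's standard ipaddress module and reports whether two configuration lines differ only in the case of IPv6 address tokens. The function names and the sample addresses (taken from the 2001:db8::/32 documentation prefix) are assumptions.

```python
import ipaddress


def _is_ipv6(token: str) -> bool:
    """Return True if token is an IPv6 address, optionally with a /prefix length."""
    try:
        ipaddress.IPv6Interface(token)
    except ValueError:
        return False
    return True


def is_case_only_change(old_line: str, new_line: str) -> bool:
    """True if the lines differ only in the letter case of IPv6 address tokens."""
    old_tokens, new_tokens = old_line.split(), new_line.split()
    if len(old_tokens) != len(new_tokens):
        return False
    changed = False
    for old_tok, new_tok in zip(old_tokens, new_tokens):
        if old_tok == new_tok:
            continue
        if old_tok.lower() != new_tok.lower():
            return False            # a real content change, not just case
        if not _is_ipv6(old_tok.rstrip(";")):
            return False            # case changed in a non-address token
        changed = True
    return changed


def lowercase_ipv6(line: str) -> str:
    """Lowercase IPv6 address tokens only; other tokens keep their case.

    Runs of whitespace are collapsed to single spaces in this sketch.
    """
    tokens = []
    for token in line.split():
        tokens.append(token.lower() if _is_ipv6(token.rstrip(";")) else token)
    return " ".join(tokens)
```

Applied to each modified line pair, a check of this kind separates pure format unification from modifications that alter the configured address.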
9. Discussion

We now discuss key implications of our analysis results of the studied networks. We then discuss some limitations of our work.

9.1. Configuration management

As stated in Section 4.1, configuration commands of IPv6 are a bit more complicated than those of IPv4. The complexity is mainly caused by IPv6 addresses. In Section 8.1, our analysis also reveals that most changes of Internet2-IPv6 are related to IPv6 addresses. IPv6 configuration commands do not take operators much more time than IPv4 commands, but they demand more attention when IPv6 addresses are configured, which introduces more complexity to network management than configuring IPv4 addresses. We should enhance IPv6 address management to reduce the possibility of misconfigurations. As shown in Table 2, there are more packet filtering schemes for IPv4 configurations than for IPv6. Operators are more concerned with providing connectivity to the IPv6 network than with security. To protect the IPv6 network from hostile attacks, operators should consider security policies on the data plane when configuring routers, such as ingress filtering, egress filtering, rate-limiting, and black hole. We find that IPv6-related changes account for only 2% of the total changes across the month-scale measurement, which is not a positive indication of IPv6 development. We have summarized some points that may promote the development of IPv6 in our previous work [34]. Moreover, we capture the parts of configurations that change frequently in the IPv6 network, such as modifying an IP address in the interfaces module and adding a prefix to a prefix-list in the policy-options module. These changes are always accompanied by changes to other stanzas, so such configurations are more complicated to handle than basic configurations, e.g., adding a new interface or adding a new link. Nevertheless, our analyses provide a basis for constructing configuration templates and configuration dependence for the changes that we have captured in the studied network. If necessary, we can add a dynamic recommendation function to FNCR. Templates and dependence can then be sent to operators to help them perform configuration operations and attend to the parts of the configuration related to the changes.
9.2. Limitations of our work

In this paper, we focus entirely on the routers of two backbone networks; our study is limited in that it does not observe edge routers of customer networks. We also note that configurations of dual-stack routers are more complex than those of single-stack routers, but further investigation of dual-stack routers, especially of the differences between IPv6 and IPv4 in routing policies, still needs to be conducted. In addition, a more comprehensive study on the evolution of IPv6 configurations requires gathering configuration snapshots of the studied networks over a longer period. Finally, enhancing configuration management for IPv6 networks is essential and significant, so putting FNCR into actual use is imperative.

Table 8 Major differences between IPv6 configurations and IPv4 configurations.
Type                       Differences
Configuration commands     (1) Some protocols (such as OSPFv3 and RIPng) exclusively serve IPv6
                           (2) Some protocols add family inet6 to support IPv6
                           (3) Variables are named with 6, V6, etc. for IPv6 to distinguish them from IPv4
                           (4) The length of an IPv6 address is 128 bits, while an IPv4 address is 32 bits
                           (5) Configuration file sizes of IPv6 present a higher growth rate but are still smaller than those of IPv4
Configuration composition  (1) Configurations of IPv4 are more abundant than those of IPv6 across most function modules
                           (2) The policy-options module accounts for over 83% of the differences in command lines
                           (3) The growth rate of IPv6 on the policy-options module is higher than that of IPv4
Configuration complexity   (1) Configurations of IPv4 are more complicated than IPv6 in view of the number of configuration templates
                           (2) The compression ratio of configuration templates of IPv4 is around the same level as that of IPv6
                           (3) Configurations of IPv4 are more complicated than IPv6 in view of the amount of configuration dependence

10. Conclusions and future works

In this paper, we first present the device-level configuration commands of IPv6 and show the differences between IPv6 and IPv4 in configuration composition. Configuration commands of IPv6 are a bit more complicated than those of IPv4. The complexity is mainly caused by IPv6 addresses, which are complex in nature. Second, we analyze the function modules that constitute IPv4 and IPv6 configurations. Command lines of IPv6 are less abundant than those of IPv4 across most function modules, which is attributed to the smaller scale of IPv6 compared with IPv4. Third, we compare the complexity of IPv6 and IPv4 configurations. IPv6 is less complicated than IPv4 in terms of the number of referential links and the number of templates. Fourth, based on the methods of template construction and dependence creation adopted in the investigation of configuration complexity, we propose a framework for network configuration recommendation. Finally, we conduct a preliminary analysis of the configuration evolution of Internet2. We present a first-ever study on IPv6 configurations and make a comparison between IPv6 and IPv4 in this paper. The major differences between IPv6 configurations and IPv4 configurations are summarized in Table 8.
We believe that unveiling the characteristics of IPv6 configurations can help deepen our understanding of the IPv6 world, as it is essential for enriching knowledge about the IPv6 protocol, renewing consideration of network management, capturing the configuration complexity, tracking the frequent configuration tasks, providing a basis for configuration template abstraction and configuration dependence creation, and ultimately achieving the goal of configuration recommendation for IPv6 networks. We will conduct a more comprehensive analysis of IPv6 network configurations that also covers edge routers of customer networks. We will deploy tools (such as RANCID [35]) that record the changes of configuration commands in detail. We will also develop and launch a concrete configuration recommendation system based on the framework (FNCR) proposed in this paper.

Acknowledgments

The authors would like to thank Fanpeng Kong, Lujing Sun and Zejia Chen for their aid in gathering data and revising this paper. We also thank the anonymous reviewers for their constructive comments. This work is supported by the National Basic Research Program of China under Grant No. 2012CB315806, the National
Natural Science Foundation of China under Grant Nos. 61170211, 61202356, and 61161140454, the Specialized Research Fund for the Doctoral Program of Higher Education under Grant Nos. 20110002110056 and 20130002110058, the Tsinghua University Initiative Scientific Research Program under Grant No. 2012Z02151, the Joint Research Fund of MOE-China Mobile under Grant No. MCM20123041, and the National Science Foundation for Distinguished Young Scholars of China under Grant Nos. 61225012 and 71325002.

References

[1] Internet Society, Internet Society Statement on IPv4 Depletion.
[2] APNIC, Key Turning Point in Asia Pacific IPv4 Exhaustion.
[3] S. Deering, R. Hinden, Internet Protocol, Version 6 (IPv6) Specification, RFC 2460, 1998.
[4] L. Colitti, S.H. Gunderson, E. Kline, T. Refice, Evaluating IPv6 adoption in the Internet, in: Proceedings of the Passive and Active Measurement (PAM), 2010, pp. 141–150.
[5] D. Malone, Observations of IPv6 addresses, in: Proceedings of the Passive and Active Network Measurement (PAM), 2008, pp. 21–30.
[6] E. Karpilovsky, A. Gerber, D. Pei, J. Rexford, A. Shaikh, Quantifying the extent of IPv6 deployment, in: Proceedings of the Passive and Active Measurement (PAM), 2009, pp. 13–22.
[7] P. Hicks, Fraction of IPv6 Traffic (In Packets and Bytes) For Monthly Passive Traces, May 2011.
[8] L. Gao, J. Yang, H. Zhang, D. Qin, B. Zhang, What’s Going On in Chinese IPv6 World, in: Proceedings of the Network Operations and Management Symposium (NOMS), 2012, pp. 534–537.
[9] F. Li, C. An, J. Yang, P. Wu, Z. Chen, Unravel the characteristics and development of current IPv6 network, in: Proceedings of the Local Computer Networks (LCN), 2012, pp. 316–319.
[10] Internet Society World IPv6 Day, 2011.
[11] N. Sarrar, G. Maier, B. Ager, R. Sommer, S. Uhlig, Investigating IPv6 Traffic – What Happened at the World IPv6 Day? in: Proceedings of the Passive and Active Measurement (PAM), 2012, pp. 11–20.
[12] M. Nikkhah, R. Guérin, Y. Lee, R. Woundy, Assessing IPv6 through web access a measurement study and its findings, in: Proceedings of the Conference on Emerging Networking EXperiments and Technologies (CoNEXT), 2011, pp. 26:1–26:12.
[13] A. Dhamdhere, M. Luckie, B. Huffaker, K. Claffy, A. Elmokashfi, E. Aben, Measuring the deployment of IPv6: topology, routing and performance, in: Proceedings of the Internet Measurement Conference (IMC), 2012, pp. 537–550.
[14] H. Kim, T. Benson, A. Akella, N. Feamster, The evolution of network configuration: a tale of two campuses, in: Proceedings of the Internet Measurement Conference (IMC), 2011, pp. 499–514.
[15] Y. Sung, S. Rao, S. Sen, S. Leggett, Extracting network-wide correlated changes from longitudinal configuration data, in: Proceedings of the PAM, Seoul, South Korea, April 2009.
[16] N. Feamster, H. Balakrishnan, Detecting BGP configuration faults with static analysis, in: Proceedings of the USENIX NSDI, Boston, MA, May 2005.
[17] F. Le, S. Lee, T. Wong, H. Kim, D. Newcomb, Minerals: using data mining to detect router misconfigurations, in: Proceedings of the MineNets’06, Pisa, Italy, September 2006, pp. 293–298.
[18] G. Xie, J. Zhan, D. Maltz, H. Zhang, A. Greenberg, G. Hjalmtysson, J. Rexford, On static reachability analysis of IP networks, in: Proceedings of the IEEE INFOCOM, vol. 3, 2005, pp. 2170–2183.
[19] R. Mahajan, D. Wetherall, T. Anderson, Understanding BGP misconfiguration, in: Proceedings of ACM SIGCOMM, Pittsburgh, PA, August 2002, pp. 3–17.
[20] Z. Kerravala, Configuration Management Delivers Business Resiliency, The Yankee Group, November 2002.
[21] A. Markopoulou, G. Iannaccone, S. Bhattacharyya, C.-N. Chuah, C. Diot, Characterization of failures in an IP backbone, in: Proceedings of the IEEE INFOCOM, Hong Kong, 2004, pp. 2307–2317.
[22] D. Oppenheimer, A. Ganapathi, D. Patterson, Why do Internet services fail, and what can be done about it, in: Proceedings of the 4th USENIX Symposium on Internet Technologies and Systems (USITS), Seattle, WA, USA, 2003, pp. 1–15.
[23] T. Benson, A. Akella, D. Maltz, Unraveling complexity in network management, in: Proceedings of USENIX NSDI, Boston, MA, April 2009.
[24] T. Benson, A. Akella, A. Shaikh, Demystifying configuration challenges and trade-offs in network-based ISP services, in: Proceedings of SIGCOMM, 2011, pp. 302–313.
[25] D. Caldwell, A. Gilbert, J. Gottlieb, A. Greenberg, G. Hjalmtysson, J. Rexford, The cutting edge of IP router configuration, in: Hotnets-II, Cambridge, MA, November 2003.
[26] J. Gottlieb, A. Greenberg, J. Rexford, J. Wang, Automated Provisioning of BGP Customers, IEEE Network, 2003.
[27] W. Enck, T. Moyer, P. McDaniel, S. Sen, P. Sebos, S. Spoerel, A. Greenberg, Y. Sung, S. Rao, W. Aiello, Configuration management at massive scale: system design and experience, IEEE J. Select. Areas Commun. 27 (3) (2009) 323–335.
[28] X. Chen, Y. Mao, Z.M. Mao, J. Van der Merwe, Declarative configuration management for complex and dynamic networks, in: Proceedings of the Conference on Emerging Networking Experiments and Technologies (CoNext), 2010.
[29] X. Chen, Y. Mao, Z.M. Mao, J. Van der Merwe, DECOR: DEClarative network management and OpeRation, in: ACM SIGCOMM Computer Communication Review, vol. 40, 2010, pp. 61–66.
[30] J. Wu, J.H. Wang, J. Yang, CNGI-CERNET2: an IPv6 Deployment in China, ACM SIGCOMM Comp. Commun. Rev. 41 (2) (2011) 48–52.
[31] Internet2.
[32] Internet2: Router configuration.
[33] Subversion.
[34] F. Li, C. An, J. Yang, J. Wu, H. Zhang, A study of traffic from the perspective of a large pure ISP, Comp. Commun. 37 (2014) 40–52.
[35] Really Awesome New Cisco ConfIg Differ (RANCID), 2004.
[36] A. Berger, N. Weaver, R. Beverly, L. Campbell, Internet nameserver IPv4 and IPv6 address relationships, in: Proceedings of the Internet Measurement Conference (IMC), Barcelona, Spain, 2013, pp. 91–104.
[37] J. Czyz, K. Lady, S.G. Miller, M. Bailey, M. Kallitsis, M. Karir, Understanding IPv6 internet background radiation, in: Proceedings of the Internet Measurement Conference (IMC), Barcelona, Spain, 2013, pp. 105–118.
[38] M. Luckie, R. Beverly, W. Brinkmeyer, Speedtrap: internet-scale IPv6 alias resolution, in: Proceedings of the Internet Measurement Conference (IMC), Barcelona, Spain, 2013, pp. 119–126.
[39] R. Beverly, W. Brinkmeyer, M. Luckie, J.P. Rohrer, IPv6 alias resolution via induced fragmentation, in: Proceedings of the Passive and Active Measurement (PAM), 2013, pp. 155–165.
[40] A. Lutu, M. Bagnulo, P. Cristel, M. Olaf, Understanding the reachability of IPv6 limited visibility prefixes, in: Proceedings of the Passive and Active Measurement (PAM), 2014, pp. 163–172.
[41] H.A. Alzoubi, M. Rabinovich, O. Spatscheck, Performance implications of unilateral enabling of IPv6, in: Proceedings of the Passive and Active Measurement (PAM), 2013, pp. 115–124.