Available online at www.sciencedirect.com
ScienceDirect Procedia Computer Science 95 (2016) 345 – 352
Complex Adaptive Systems, Publication 6 Cihan H. Dagli, Editor in Chief Conference Organized by Missouri University of Science and Technology 2016 - Los Angeles, CA
Towards Assuring Correct Coordination of Multilayer Recovery Using Integrated Ontologies Joseph Kroculicka* a
Winifred Associates, Jim Thorpe PA 18229
Abstract Configuring automated recovery in a multilayer virtualized network presents significant architectural issues. The rules that define the interworking strategy between recovery mechanisms must be translated into technology-specific provisioning commands sent to network devices. This provisioning data is typically fragmented across many computing and networking resources. There is also a set of possible device actions that can be performed by a resource, such as forwarding traffic to a protection route, discovering a route to send traffic, adjusting security characteristics of a link, and matching sender and receiver characteristics. Consequently, multiple information models, each of which is specific to a device type and technology layer must be understood, aligned and shared. As a result, an approach is needed to verify and validate that the resulting recovery operations from the input provisioning data achieve end-to-end service requirements. This paper presents an ontological approach to assure the correct coordination of recovery actions using an integrated knowledge model of multilayer recovery. With an ontological approach, an assurance case that proves correctness of provisioning and resulting network operations can be automated. Such an ontological model identifies the key domain elements and their relationships. Associated statements can then be retrieved from a knowledgebase so that the constraints associated with the network representation and service requirements can be tested. © Published by by Elsevier B.V.B.V. This is an open access article under the CC BY-NC-ND license © 2016 2016The TheAuthors. Authors. Published Elsevier (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of scientific committee of Missouri University of Science and Technology. Peer-review under responsibility of scientific committee of Missouri University of Science and Technology Keywords: Multilayer Recovery; Provisioning
* Corresponding author. Tel.: +1-215-429-5635; fax: +0-000-000-0000 . E-mail address:
[email protected]
1877-0509 © 2016 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of scientific committee of Missouri University of Science and Technology doi:10.1016/j.procs.2016.09.344
346
Joseph Kroculick / Procedia Computer Science 95 (2016) 345 – 352
1. Introduction to Multilayer Network Provisioning As networks become virtualized and cloud-aware, it becomes increasingly important to determine whether the resilience requirements of users can be met. If the dimensions of the coordination of multilayer recovery can be aligned, it may be possible to select a particular solution approach that is optimized for a particular situation. An example of matching a solution technique to a resilience requirement is when to use restoration or protection. Restoration that efficiently discovers a new route is useful for unknown traffic demands, while protection is useful for predictable capacity demands since spare network resources are pre-allocated to working resources. For a resilience requirement of 50 ms or less, as an example, protection would be the logical choice [3, 4]. A network architecture for multilayer recovery should provide flexible technology choices to implement multilayer recovery scenarios (e.g., GMPLS RSVP-TE over DWDM with optical layer signaling or ATM recovery over SONET/SDH protection switching). The networking technology continues to evolve and similar functionality that leverages existing operations and maintenance techniques is provided by multiple layers. Provisioning multiple distributed components to meet end- to-end resilience requirements is a challenging problem. As protocol functionality increasingly runs on network device interfaces, a framework for modeling conflicts will become necessary in order to understand what can prevent a network infrastructure service from operating correctly at one or more technology-specific layers. To achieve high availability, multiple components need to be provisioned so as to manage local behaviors to achieve the desired global behavior. The underlying constraints and assumptions that define this global behavior must be specified to prove correctness of recovery. They may include providing connectivity or continuity of service and associated temporal deadlines. The contribution of this paper is to demonstrate how assurance-driven design can be applied to verify the correct coordination of multilayer recovery that results from device provisioning. A secondary objective is to show how this process can be automated. The paper then explores what types of information needs to be gathered, collected and transformed to manage available recovery mechanisms that are optimized for specific requirements and situations. An example situation is very fast recovery in the sub-50 ms range. A model of coordination is created from observing how the network infrastructure behaviors during a failure. This model is then related to provisioning commands which are transformed to changes on the model. This paper is organized into two parts: (1) creation of the integrated model that includes network infrastructure, service requirements and dependability requirements and (2) development of an accompanying assurance case that links service requirements to network infrastructure actions. 2. Integration Requirements for Coordination of Multilayer Recovery Coordination of multilayer recovery presents new challenges towards integrating information from different recovery perspectives. Among these are business processes, service delivery, security, management and control. The knowledge representation of specific device properties, relationships between system elements must be aligned so as to achieve end-to-end service requirements. Achieving alignment between such concepts is difficult because the interpretation of semantics may differ at different levels within and between components. Also, recovery of failed connections has been implemented in technology-dependent ways at different layers. There is no universally accepted representation to represent device semantics. Such a representation is a data model with concepts that are interpreted and related to other concepts. A key requirement for interoperable systems is openness between component interfaces so that the implementation can differ. 2.1. Information Sharing Requirements for Coordination of Multilayer Recovery Establishing interoperability between communication devices and components participating in automated recovery requires a shared understanding of the effect of global policy on infrastructure and protocols. The effect of policy on resources defines what happens when an event occurs. Essential information needed to manage the coordination of recovery can be exposed to a centralized controller and an inference engine. Effective reasoning over network management information depends on the availability of information that is exposed across system boundaries. As an example, in multilayer networks based on an overlay model of topologies, the recovery
Joseph Kroculick / Procedia Computer Science 95 (2016) 345 – 352
mechanism at the client layer is not aware of the topology of the server layer [1]. However, such information on network resources needs to be related and integrated in order to prove that the combined multilayer recovery process successfully restores traffic impacted by a failure. To achieve “assured reconfiguration” of a protected network service, it is necessary to identify the information sharing requirements between applications to be coordinated [2], as depicted in Figure 1. The availability of network
Figure 1. Control plane coordination integrates the recovery procedures in two or more layers.
resources for recovery needs to be known [3]. Signaling, which places resources into service, may communicate properties of a fault such as when and where it occurs [4]. The network control plane may need the user’s identity and other context data to maintain a certain quality of service [10, 9]. Once the information associated with resources in the network infrastructure domain is known, a particular recovery mechanism can then be matched to specific service requirements. 3. Integrated Knowledge Model Applied to the Coordination of Multilayer Recovery To verify the correctness of the coordination of multilayer recovery, the different perspectives from which it can be addressed should be represented in an integrated knowledge model. This type of approach integrates recovery mechanisms through control of the provisioning data that affect how they operate. This integrated knowledge model should align architectural concerns and abstract technology-specific implementation details [5]. Sollins [6] argues for such a model to be presented in a “knowledge plane”. Two key components of the knowledge plane include an “information plane” and a “reasoning framework”1†. To demonstrate that the coordination of multilayer recovery is
† 1 Sollins
[6] defines reasoning as “the capability of reasoning over the information to understand, hypothesize, infer, and act on knowledge rather than basic information”.
347
348
Joseph Kroculick / Procedia Computer Science 95 (2016) 345 – 352
correctly provisioned, the knowledge plane needs to access the information available to the devices that implement the functionality to be integrated. The information then needs to be organized into an information model. Configuration changes which would be realized from provisioning commands can be applied to this information model. Then the properties of a service can be related to the properties of available network functionality by updating the knowledge layer. An integrated knowledge-based architectural model can be used to demonstrate that key relationships between configured resources exist after recovery operations are performed. To connect the information to provide confidence that recovery operations on the model provide the desired service requirements, the model needs to represent the domain area (i.e., recovery theme) associated with the functionality under study, its configuration that results from user provisioning, and describe the results of recovery operations if they were performed during a failure event. An informative notification of the future state can be delivered to a network administrator. This information could be used to provide assurance that coordination of multilayer recovery is correct. 4. Verifying Correct Coordination Assurance cases are usually conceptual and organize argumentation evidence through text-based narratives to demonstrate a goal [7]. These narratives can be improved with automated tools. Several studies ( [8], [9]) provide examples of system configurations that are verified automatically using a formal representation of the argumentation an assurance case. A system configuration, since it encodes knowledge in a machine-processable format, can automatically be checked to ensure a system and its components align to a desired requirement. Hall and Rapanotti [9] in their paper “Assurance-driven design” apply “problem-oriented-engineering” to build a system solution in tandem with an assurance case that ensures it meets defined requirements. Problem-oriented engineering (POE) provides a set of transformations of a problem to achieve a solution. The justification at each step helps elicit system properties which are realized through component functionality. It is beneficial to assess at configuration time whether provisioned resources can be trusted to interact in a way that they collectively deliver a provisioned network service. The dependability of the control plane to operate correctly can be evaluated using the assurance case mechanism. Norros et al [10] in their paper "A Dependability Case Approach to the Assessment of IP Networks" suggest that an assurance-based design methodology using dependability cases can examine whether the network delivers a requested service. The "controllability of protocols” is identified in [10] as one of six factors that influence the dependability of a network system. How resources are configured affects the controllability of the network protocols that these resources participate in. Assuring the correctness reconfiguration depends on a formal structure to demonstrate the selected requirement. Processes that develop an assurance case are usually semi-formal and represented using a graphical organization of argument components such as claims (i.e., goals) and evidence [11, 12]]. Figure 2 presents an example of goal decomposition in an assurance case. 4.1. Representation of Assurance Case Elements Using an Integrated Ontology The structural relationships derived from the results of actions of recovery can be made formal using multiple techniques. A key objective of this research is the alignment between service-level requirements, infrastructure behavior and provisioning. Consequently, these elements were arranged into an assurance case. The OMG Structured Assurance Case Metamodel (SACM) was selected to document the assurance case and its elements. SACM [13] provides an XML-based metamodel that represents the argumentation-related elements of an assurance case. Facts about the network represent the network and were documented using OMG Software Assurance Evidence
Joseph Kroculick / Procedia Computer Science 95 (2016) 345 – 352
Figure 2. Automation of an assurance case that proves service continuity from device provisioning. Service requirements are aligned to network infrastructure properties in an integrated ontology. Verification chains statements to demonstrate the overarching goal of service continuity. The screenshot depicts the CertWare tool, which chains facts and verify that they are consistent. Facts are expressed using the SACM 1.0 metamodel, which is based on XML.
349
350
Joseph Kroculick / Procedia Computer Science 95 (2016) 345 – 352
Figure 3. Representation of SACM Metamodel Elements in an OWL Ontology. OWL provides a data model that supports inference of properties.
Model (SAEM) [14]. The XML tags for SACM 1.0 models include an argumentElement tag which provides an xsi:type attribute, which specifies a specific component of the argument model. Version 1.1 of the two specifications and below do not yet provide a mapping from XML to OWL. During this research, we transitioned an assurance case that proves service continuity from the D-Case tool [15] to CertWare [16] since CertWare provided an implementation of SACM and offered a validity tool that checks an argumentation claim from declared evidence [16]. An example verification of the service continuity goal in CertWare is depicted in Figure 2. To automate the argumentation process of an assurance case, the elements of the assurance case are mapped to ontological categories and their relationships, as depicted in Figure 3. The metamodel that describes the components of an assurance case can be implemented by an integrated domain and requirements ontology. In the ontological representation of an assurance case, the integrated ontology replaces the claim with a service goal that connected to a measurable dependability property such as strong connectivity. The dependability goal is then expressed in terms of the evidence, which asserts statements about the subject area that is modeled. The claim is connected to a selected quantitative dependability property, and the system resources and their relationships represent the facts or evidence. The argumentation is inferred over declared OWL properties to determine whether the assertion of a dependability property in the merged ontology is true. The evidence of an assurance case refers to documented facts or assumptions. These facts are typically categorized and linked. As example, the ontology development method of [17] creates linkages between a set of systems and system elements and a set of trustworthiness properties. An evidence category either represents a subclaim of the quantitative trustworthiness property or a property of some physical component that has to hold collectively along with other properties in other physical components [17]. The evidence results from the design decisions that are performed to provide service continuity.
Joseph Kroculick / Procedia Computer Science 95 (2016) 345 – 352
Once the elements of an assurance case are encoded in an OWL ontology [18, 19], we can then verify correctness of provisioning recovery using a SPARQL (Simple Protocol and RDF Query Language) [20] query of the inferred data model. An example query that checks mutual reachability of two client site access ports and the underlying data is depicted is depicted in Figure 4. Formal logic and reasoning provide a greater degree of trust then a simple graphic organization of concepts in an assurance case since a single interpretation of semantics is provided through the model axioms. A graphical organizer such as Goal Structuring Notation [21] is a useful aid in defining the categories. However, it does not have a rich variety of relationships. Ontologies constrain the ways terms such as goals, subgoals and resources in a fact model can be related to each other. They also provide a set of operators such as allowing inverse properties, cardinality restrictions and existential qualification of a concept.
Figure 4. SPARQL Query of a Connected relationship between two access-side Ports. The chaining of RDF triples allows overall end-to-end connectivity to be verified. An entailed triple a1
a2 is present in the inferred ontology generated by an OWL Reasoner.
5. Summary Coordination between recovery processes operating at the same time is needed to ensure correct and consistent interworking of the packet transport network with other existing/legacy packet networks [22] This process needs to be planned since interworking between recovery mechanisms may involve contention for the same resources, which results from the containment property of layering (e.g., “pipes carried inside of pipes”) [4]. Resources such as available links need to be available to carry data for a connection to be restored. This paper illustrated how the system assurance process can verify the correct coordination of recovery performed by multiple distributed agents in a network. The coordination of multilayer recovery is affected by resilience requirements, control plane recovery procedures and how network infrastructure resources are provisioned. Validation of recovery methods depends on understanding the information that is shared between layers. Verification of a survivability strategy can then be performed by demonstrating that these requirements are met by the specific recovery solution. An evaluation of the correctness of the recovery signaling can be performed from an inferred model derived from the asserted structural relationships of the network infrastructure ontology.
351
352
Joseph Kroculick / Procedia Computer Science 95 (2016) 345 – 352
With an ontology that represents the facts from selected recovery perspectives, this assurance process can also be automated. References
[1] R. G. Prinz, A. Autenrieth and D. Schupke, “Dual failure protection in multilayer networks based on overlay or augmented model,” in 5th International Workshop on Design of Reliable Communications Networks (DRCN 2005), 2005. [2] E. A. Strunk and J. C. Knight, “Dependability through Assured Reconfiguration in Embedded System Software,” IEEE Transactions on Dependable and Secure Computing, vol. 3, no. 3, 2006. [3] P. Demeester, M. Gryseels, K. van Doorsalaere, A. Autenrieth, C. Brianza, G. Signorelli, R. Clemente, M. Ravara, A. Jajszcyk, A. Geyssens and Y. Harada, “Network Resilience Strategies in SDH/WDM Networks,” in 24th European Conference on Optical Communication, 1998. [4] P. Choáda and . A. Jajszczyk, “Recovery and its Quallity in Multilayer Networks,” Journal of Lightwave Technology, vol. 28, no. 4, February 2010. 2010. [5] P. Donzelli, D. Hirschbach and . V. Basili, “Using Visualization to Understand Dependability: A Tool Support for Requirements Analysis,” in 29th Annual IEEE/NASA Software Engineering Workshop, 2005. [6] K. R. Sollins, “An Architecture for Network Management,” in ReArch09, 2009. [7] N. Mansourov and D. Campara, System Assurance, Morgan Kaufmann, 2010. [8] E. Fong., M. Kass, T. Rhodes and F. Boland, “Structured Assurance Case Methodology for Assessing Software Trustworthiness,” in Fourth IEEE International Conference on Secure Software Intregration and Reliability Improvement Companion, 2010. [9] . J. G. Hall and . L. Rapanotti, “Assurance-driven design,” in The Third International Conference on Software Engineering Advances, 2008. [10] I. Norros, P. Kuusela and P. Savola, “A Dependability Case Approach to the Assessment of IP Networks,” in The Second International Conference on Emerging Security Information, Systems and Technologies, 2008. [11] J. Hall, . D. Mannering and L. Rapanotti, “Arguing safety with problem oriented software engineering,” in 10th IEEE High Assurance Systems Engineering Symposium, 2007. [12] C. B. Weinstock, J. B. Goodenough and J. J. Hudak, “Dependability Cases,” Software Engineering Institute, 2004. [13] Object Management Group, “Argumentation Metamodel,” 2010. [14] Object Management Group, “Software Assurance Evidence Metamodel,” Object Management Group, 2010. [15] V. Patu and . S. Yamamoto, “Identifying and Implementing Security Patterns for a Dependable Security Case – From Security Patterns to DCase,” in Computational Science and Engineering (CSE), 2013. [16] M. R. Barry, “CertWare: a workbench for safety case production and analysis,” in IEEE Aerospace Conference, 2011. [17] R. Paul, I.-L. Yen, F. Bastani, J. Dong, K. Tavi, K. Kavi, A. Ghafoor and J. Srivastava, “An Ontology-Based Integrated Assessment Method Framework for High-Assurance Systems,” in IEEE International Conference on Semantic Computing, Santa Clara, 2008. [18] D. McGuinness and F. van Harmelen, Eds.“OWL Web Ontology Language Overview Recommendation [W3C Std.],” W3C, February 2004. [19] “OWL 2 Web Ontology Language Primer”. [20] E. Prud’hommeaux and A. Seaborne, “SPARQL query language for RDF,” W3C Recommendation, 2008.. [21] E. A. Strunk and J. C. Knight, “The Essential Synthesis of Problem Frames and Assurance Cases,” Expert Systems, vol. 25, no. 1, 2006b. [22] M. Sprecher and . S. Ueno, “Requirements for an MPLS Transport Profile, IETF Std.,” 2009. [23] M. Pickavet, P. Demeester, D. Colle, D. Staessens, B. Puype, L. Depre and I. Lievens, “Recovery in Multilayer Optical Networks,” Journal of Lightwave Technology, vol. 24, no. 1, p. 122 – 134, January 2006. [24] M. E. Orwat, T. E. Levin and C. E. Irvine, “An Ontological Approach to Secure MANET Management,” in The Third International Conference on Availability, Reliability and Security, 2008.