4th IFAC Symposium on Telematics Applications
November 6-9, 2016. UFRGS, Porto Alegre, RS, Brazil

Available online at www.sciencedirect.com
ScienceDirect
IFAC-PapersOnLine 49-30 (2016) 278–283

Influence of network parameters on the recovery time of a ring topology PROFINET network

Fábio Alves Fernandes*, Guilherme Serpa Sestito*, André Luís Dias*, Dennis Brandão*, Paolo Ferrari**

* Electrical Engineering Department, University of São Paulo, São Carlos, Brazil ([email protected], [email protected], [email protected], [email protected])
** Department of Information Engineering, University of Brescia, Brescia, Italy ([email protected])

Abstract: This paper proposes a study on the use of ring topology to increase the availability of PROFINET networks. It discusses the influence of the Media Redundancy Protocol (MRP), the watchdog time, and the computational power of the PROFINET Controller on the communication recovery time after a ring fracture. The paper proposes to consider two main contributions to the overall recovery time: the first is related to the low-level (Ethernet layer 2) redundancy management, while the second is related to the software configuration of the entire network. Several different network configurations have been created in the laboratory and many others have been analyzed in real industry plants. The collected results show how the watchdog could influence the recovery time, since its violation could trigger higher-level routines whose duration is basically set by the computational capability of the PROFINET controller. On balance, relevant suggestions to increase the availability of PROFINET networks are pointed out.

© 2016, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

Keywords: PROFINET; Media Redundancy Protocol; Real time Ethernet; High Availability; Ring Topology; Industrial Automation.

2405-8963 © 2016, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved. Peer review under responsibility of International Federation of Automatic Control. 10.1016/j.ifacol.2016.11.141

1. INTRODUCTION

An important requirement in the world of industrial automation is the control system availability, which is directly related to the communication network reliability, as reported in standards IEC 61784 and IEC 62439.

Currently, there are several communication network technologies for industrial applications; Real Time Ethernet Networks (RTE) are expanding and their use is on the rise, Sauter et al. (2006). These latest technologies offer specific characteristics (such as communication in deterministic time, synchronization between field devices, and efficient and frequent exchange of small data, Felser (2005)) that are extremely useful in industrial applications, Duerkop et al. (2012), Akerberg et al. (2009) and Ferrari et al. (2010). Last, these RTE networks also provide users with strategies for redundancy that can increase the availability of the automation system.

In 2015, with more than 10 million installed nodes, PROFINET is one of the leading solutions among the different RTE technologies. The rapid spread of PROFINET installations requires a careful analysis of the performance that can be obtained, and the creation of design guidelines. This paper focuses on the redundancy features offered by PROFINET, and its main aim is to give some directions to engineers dealing with high availability systems built over PROFINET.

This paper is organized as follows: Section 2 addresses issues related to the PROFINET protocol and the Media Redundancy Protocol (MRP). Section 3 presents the case study: a PROFINET network is implemented using ring topology in the laboratory, aiming to verify the network recovery time after a failure. In parallel, some real networks have been analyzed in real industry plants. In Section 4 the results and the analysis are presented: the influence of parameters such as the watchdog time, the number of switches composing the ring that works with MRP, and the computational power of the PROFINET controller is analyzed. Finally, Section 5 brings relevant conclusions.

2. PROFINET AND MEDIA REDUNDANCY PROTOCOL

The PROFINET protocol is a Real Time Ethernet network (RTE) supported by Profibus International (PI), designed for use in industrial communication networks. It is characterized by a central station that communicates with field devices spread across the network, as described in Profibus International (2012). It supports three different types of devices:

• IO-Controller is the central station of intelligence, responsible for the management and control throughout the data transfer process;
• IO-Device represents field devices such as sensors, actuators and IO modules, which exchange information cyclically with the IO-Controller;
• IO-Supervisor represents the engineering station. Its purpose is to configure and perform diagnostics across the network.

PROFINET communication can be synchronized or unsynchronized, as discussed in Ferrari et al. (2007) and Fontanelli et al. (2014).
Communication between the IO-Controller and the IO-Devices is done by establishing an Application Relationship (AR). Each AR is a logical connection required for the exchange of data between two devices. The data to be exchanged are defined in different Communication Relationships (CRs). There are different types of CRs: cyclic process data flow over the IO Data CR, configuration data and other acyclic data flow over the Record Data CR, and real-time alarm data flow over the Alarm CR. In PROFINET, the cyclic data exchange between the IO-Controller and an IO-Device can start only after all the CRs between them have been configured and parameterized, as described in Profibus International (2012).
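As a minimal sketch of the relationship described above, the following Python fragment models an AR as a container of CRs and allows the cyclic exchange to start only once every CR is configured and parameterized. The class and field names (`ApplicationRelationship`, `CommunicationRelationship`, `configured`, `parameterized`) are illustrative assumptions and do not correspond to the API of any PROFINET stack.

```python
from dataclasses import dataclass, field
from enum import Enum


class CRType(Enum):
    """CR types named in the text."""
    IO_DATA = "IO Data CR"          # cyclic process data
    RECORD_DATA = "Record Data CR"  # configuration and other acyclic data
    ALARM = "Alarm CR"              # real-time alarm data


@dataclass
class CommunicationRelationship:
    # Hypothetical flags; a real stack tracks much more state.
    cr_type: CRType
    configured: bool = False
    parameterized: bool = False

    def ready(self) -> bool:
        return self.configured and self.parameterized


@dataclass
class ApplicationRelationship:
    controller: str
    device: str
    crs: list = field(default_factory=list)

    def can_start_cyclic_exchange(self) -> bool:
        # Cyclic exchange may start only after all CRs are ready.
        return bool(self.crs) and all(cr.ready() for cr in self.crs)


ar = ApplicationRelationship("IO-Controller", "IO-Device",
                             [CommunicationRelationship(t) for t in CRType])
print(ar.can_start_cyclic_exchange())  # False: nothing configured yet
for cr in ar.crs:
    cr.configured = cr.parameterized = True
print(ar.can_start_cyclic_exchange())  # True
```

The all-or-nothing check mirrors the statement that every CR must be set up before cyclic data can flow, which is also why re-establishing an aborted AR takes a non-negligible time.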
Two important time parameters in PROFINET technology are the cycle time and the watchdog time. Both are set in the design phase and their values are very important for network performance. In detail:

• Cycle time refers to the refresh rate at which the cyclic IO Data are sent by one device to another (i.e. from the IO-Device to the IO-Controller for input data and vice versa for output data); it should be noted that PROFINET exploits the full-duplex capability of Ethernet, hence the IO Data are sent independently in the two directions.
• The watchdog value is the time used to monitor the correct receipt of data. Hence, the watchdog supervises the ARs, meaning that if the AR consumer does not receive any IO Data during the predefined watchdog time interval, the AR is aborted. In other words, the watchdog time is the time for which the automation application can still be considered to be working properly even if it is not receiving feedback from the field.

According to Profibus International (2012), the watchdog time of a device is related to the cycle time, and it can be calculated by Equation 1:

Watchdog_Time = Watchdog_Factor × Cycle_Time    (1)

where the Watchdog_Factor is defined as the number of consecutive messages not received by a device.

In conclusion, when a problem with the provider of the IO Data occurs, or the network does not guarantee the delivery of frames to the consumer of the IO Data (and this is exactly the problem this paper deals with), the watchdog time expires and the AR is cancelled. In order to re-establish the AR, all the configuration and parametrization of the involved device (or devices) must be done again, leading to a downtime (of variable duration) of the automation system.

2.1 Redundancy for PROFINET

There are several possibilities to increase the reliability and availability of a communication network for industrial automation. A simple approach is the introduction of a network redundancy strategy. General protocols derived from the IT world, like RSTP (Rapid Spanning Tree Protocol), may be used, but they lack the timeliness required by industry applications. Therefore, the IEC 62439 standard proposes four different protocols as industry-grade redundancy solutions:

• Media Redundancy Protocol (MRP)
• Parallel Redundancy Protocol (PRP)
• Cross-network Redundancy Protocol (CRP)
• Beacon Redundancy Protocol (BRP)

The PROFINET technology supports many network topologies, but for redundancy architectures it exploits the ring topology combined with the MRP. In this manner, automation systems with increased availability can easily be built, Profibus International (2015), and Felser (2008).

Fig. 1 shows a ring network with a single MRP domain. The PROFINET technology may support many domains, provided that they do not overlap. A network domain must contain a device that performs the role of Media Redundancy Manager (MRM) and one or more devices with the role of Media Redundancy Client (MRC). The MRM logically opens the ring, preventing multicast and broadcast packets from circulating forever in the loop, a condition that may saturate the network with traffic. Each device in the ring must provide two ports connected to the ring, which are called ring ports.

Fig. 1. Ring topology (closed ring)

There are some limitations in the use of MRP, as follows:

• the ring supports up to 50 devices;
• the ring must consist of devices that support the MRP protocol;
• ring devices must be interconnected via their ring ports and must be members of the same redundancy domain.

The ports of the MRM connected to the ring may have three different statuses, which are:

• Disabled: the port blocks all data traffic;
• Blocked: the port blocks data traffic, except for MRP control frames and LLDP (Link Layer Discovery Protocol) frames;
• Forwarding: released port, allows data traffic of all types of frames.

Generally, when there are no failures in the ring ("Closed Ring" condition), the MRM has one of its two ring ports configured as Forwarding and the other configured as Blocked. Hence, the physical ring is treated as a logical segment that begins at the MRM and ends at the MRC connected to the Blocked port of the MRM (MRC_3 in Fig. 1). In the event that a connection along the ring fails (no matter what the cause is), the MRM is solely responsible for changing the port status and dealing with the failure. The MRM port changes from Blocked to Forwarding, opening another path for data flow. After a failure appears in the ring, the network is called "Open Ring" (see Belie et al. (2013)).
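The port behavior described above can be sketched as a small state model. This is only an illustration under stated assumptions: the class `MrmRingPorts` and its method names are hypothetical, and the sketch ignores timing, test frames and Topology Change signalling.

```python
from enum import Enum


class PortState(Enum):
    DISABLED = "Disabled"      # blocks all data traffic
    BLOCKED = "Blocked"        # passes only MRP control and LLDP frames
    FORWARDING = "Forwarding"  # passes all frames


class MrmRingPorts:
    """Hypothetical model of the two ring ports of an MRM."""

    def __init__(self) -> None:
        # Closed ring: one port forwards, the other is blocked.
        self.primary = PortState.FORWARDING
        self.secondary = PortState.BLOCKED

    @property
    def ring_status(self) -> str:
        both_forwarding = (self.primary is PortState.FORWARDING
                           and self.secondary is PortState.FORWARDING)
        return "Open Ring" if both_forwarding else "Closed Ring"

    def on_ring_failure(self) -> None:
        # A link in the ring failed: open the second path.
        self.secondary = PortState.FORWARDING

    def on_ring_repaired(self) -> None:
        # The ring is physically closed again: break the loop logically.
        self.secondary = PortState.BLOCKED


mrm = MrmRingPorts()
print(mrm.ring_status)   # Closed Ring
mrm.on_ring_failure()
print(mrm.ring_status)   # Open Ring
mrm.on_ring_repaired()
print(mrm.ring_status)   # Closed Ring
```

Keeping the decision in a single object reflects the point made in the text that only the MRM changes port status; the MRCs merely react to its notifications.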
In order to detect faults, the MRM sends test frames at a preset interval called TSTdefault on both the Forwarding and the Blocked ports, as shown in Fig. 1. Usually, the test frames travel along the ring in the two directions (the ring is full duplex) and reach the opposite port after passing through all the MRCs in the ring. The time within which the MRM must get back the test frames is defined as TestTimer. If the test frames do not return before TestTimer expires, the MRM increments an internal counter (Test Counter). If the counter reaches a predetermined value, defined as TSTNRmax (typically 3 or 5), the MRM declares a ring failure. If the frames return to the MRM, the Test Counter is reset.

According to IEC 62439, the time to ring failure detection (Tdetection) is given by Equation 2:

Tdetection = TSTdefault × TSTNRmax    (2)

After failure detection, the MRM must inform all the MRCs that the ring is open and that they have to take the proper countermeasures. In detail, the MRM sends Topology Change frames on both its ring ports (which now, as stated before, are in the Forwarding state); the MRCs, on receiving the Topology Change frames, clear all the entries of their MAC address table (i.e. the Ethernet Layer 2 address table used to forward frames) related to the two ring ports.

When the failure is repaired, the MRM can receive the test frames again. After receiving just one test frame, the MRM reacts immediately, blocking one of its ring ports and sending Topology Change frames. Thus the detection of the "Closed Ring" is faster than the detection of the "Open Ring".

The recovery time of an MRP ring topology could be defined as the sum of:

• the time needed by the MRM to detect the failure;
• the time that the MRM uses to send Topology Change frames;
• the time all MRCs take to adapt and learn the forwarding path again.

According to IEC 62439, the time limit for the recovery of a network that uses the MRP is 200 milliseconds. As is clear from the above, the MRP only takes care of the Ethernet Layer 2 network, and it does not take into account the effect that ring reconfiguration has on the application protocols transported over the ring. In particular, MRP does not consider that a ring reconfiguration may cause the disruption of the data exchange between the IO-Controller and the IO-Device.

2.2 Switch cascading in PROFINET

It may be interesting to evaluate the influence on the recovery time of the number of switches installed between the IO-Controller and the IO-Device, called "line depth". This parameter is important because every switch introduces a delay that should be considered when planning the ring topology; it should be remembered that the physical ring topology is maintained by the MRM as a logical cascade of switches (see previous section). Following the PROFINET installation guidelines published by Profibus International (2014), the limit of switches in a row depends on the update time of every device and on the switch forwarding technique ("store and forward" or "cut through"). The limit is related to the forwarding time required by the switch, which is smaller for "cut through" switches than for "store and forward" switches. The suggested limits are reported in Table 1.

Analyzing the numbers in the table, it is clear that only a configuration with a short cycle time and "store and forward" switches may lead to a visible impact on the ring topology recovery time. On the other hand, the switches used both in the laboratory experiments and in real industrial applications are "cut through", because they generally offer better performance. Consequently, at the moment, it is very hard to experimentally evaluate the influence of the number of switches on the network recovery time, and it will be considered in future work.

Table 1. Number of switches in a row depending on type

Cycle time (ms)    Number of "store and forward" switches    Number of "cut through" switches
1                  7                                         64
2                  14                                        100
4                  28                                        100
8                  58                                        100

3. CASE STUDIES

In this paper, two classes of case studies have been considered: laboratory-based cases and real application cases.

A. Laboratory experimental setup

The case study aims to identify the influence of the watchdog time on the network recovery time in a ring topology. For this reason, a PROFINET network has been implemented using the equipment shown in Table 2. Fig. 2 shows the connections between the equipment used to obtain a ring topology. The MRM of the system is the switch connected to the IO-Controller, which is marked with a red point in Fig. 2. It is important to highlight that in this work the ring is composed only of managed switches. The AR and the CRs are created between the IO-Controller and the IO-Device for the proposed communication.

A measuring system for capturing the network traffic was used for network recovery time verification. This measurement system consists of an Industrial Ethernet TAP, model EDS2100 made by Kunbus: such a device must be connected in series between two Ethernet stations and it is able to copy all the Ethernet traffic passing through its monitored ports. All the traffic is sent to a monitor station using a third Ethernet port. The TAP is almost transparent to the network, inserting delays of less than 1 nanosecond. In addition, the TAP inserts a timestamp in network packets with a resolution of 10 ns. In the considered experimental setup, the collected packets are sent to a computer station with Wireshark software and PNT (PROFINET Network Analysis Tool) for offline analysis.
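For offline analysis of a capture such as the one described above, frames first have to be sorted by protocol. The sketch below is not part of the toolchain used in the paper (which relies on Wireshark and PNT); it only illustrates, in plain Python, how raw Ethernet frames could be classified by EtherType. The function name `classify_frame` is hypothetical, and the constants are the registered EtherType values for PROFINET (0x8892), MRP (0x88E3), LLDP (0x88CC) and the IEEE 802.1Q VLAN tag (0x8100).

```python
import struct

ETHERTYPE_VLAN = 0x8100      # IEEE 802.1Q tag
ETHERTYPE_PROFINET = 0x8892  # PROFINET real-time frames
ETHERTYPE_MRP = 0x88E3       # Media Redundancy Protocol
ETHERTYPE_LLDP = 0x88CC      # Link Layer Discovery Protocol

NAMES = {
    ETHERTYPE_PROFINET: "PROFINET",
    ETHERTYPE_MRP: "MRP",
    ETHERTYPE_LLDP: "LLDP",
}


def classify_frame(frame: bytes) -> str:
    """Return a protocol label for a raw Ethernet frame (no preamble/FCS)."""
    if len(frame) < 14:
        return "too short"
    # Bytes 0-11 are destination and source MAC; bytes 12-13 the EtherType.
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype == ETHERTYPE_VLAN:
        if len(frame) < 18:
            return "too short"
        # Skip the 4-byte VLAN tag to reach the encapsulated EtherType.
        (ethertype,) = struct.unpack_from("!H", frame, 16)
    return NAMES.get(ethertype, "other")


# Synthetic frames: zeroed MAC addresses followed by the type field.
macs = bytes(12)
print(classify_frame(macs + b"\x88\x92" + bytes(40)))                  # PROFINET
print(classify_frame(macs + b"\x81\x00\xc0\x00\x88\x92" + bytes(40)))  # PROFINET
print(classify_frame(macs + b"\x88\xe3" + bytes(40)))                  # MRP
```

Separating PROFINET frames from MRP test and Topology Change frames makes it possible to relate the Layer 2 reconfiguration to the interruption seen by the application.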
Controller, but several IO-Devices. For this reason, in these real cases, the TAP in inserted series with the link of the IOController, and the recovery time is the time interval that the IO Controller takes to return in exchange data with all the IO Devices in the system. A sample network topology for the case of measurements taken in real systems is shown in Fig. 3.
The analysis methodology consists in evaluation the behavior of the cyclic data exchange between the IO Controller and the IO Device in order to determine the effect of ring opening and closing. For this reason, the TAP is installed between the IO Device and a switch.
Table 3. Network parameters in real cases. Network L1 L2 L3 L4
Table 2. Equipment in the implemented network. Quantity 1 1 3 1 1
Description CPU S7 1200 ET200-S Scalance X208 TAP EDS 2100 PC + Wireshark + PNT
281
Function IO-Controller IO-Device Switch Meas. System Meas. System
IO-Devices 76 100 142 225
Switches in the ring 8 19 13 11
The fault simulation procedure for real networks is the following: •
•
•
initially all the ring traffic passes through the cable A, while the port of cable C is disabled and no traffic is flowing while the network is running properly, cable A is removed manually performing a break (failure) in the ring topology. From this moment the network status is called "Open Ring". after a few seconds, cable A is reconnected to its original position, and the network returns to its former status, called "Closed Ring".
A C
Fig. 2. Network topology for the study The fault simulation procedure is the following: •
while the network is running properly, cable B is removed manually performing a break (failure) in the ring topology. From this moment the network status is called "Open Ring".
•
after a few seconds, cable B is reconnected to its original position, and the network returns to its former status, called "Closed Ring".
The suggested procedure provides scenarios to verify the recovery time, i.e. measuring the time interval that IO Controller takes to return in data exchange with the IO Device through the test system. These scenarios are played for different values of watchdog parameter. B Real industry plant In order to evaluate more complex scenario and also to show how industry is dealing with the high availability of networks, real application cases as used in industry have been evaluated. Measurement campaigns have been carried out in 4 different configurations used in real plant during normal production. The Table 3 shows the parameters of the different real PROFINET networks. All the networks have a single IO-
Fig. 3. Network topology for the case of real networks. The measurement point is the link of the controller.
4. RESULTS
The first experimental observation is that, during laboratory test, the communication between IO Controller and IO Device may be interrupted twice during the fault simulation
procedure: when performing the break, removing cable B, from "Closed Ring" to "Open Ring" status, and again when reconnecting cable B, from "Open Ring" to "Closed Ring" status. Fig. 4 - Graph 1 shows the interruption of the traffic toward the IO-Device (as seen by the TAP installed according to Fig. 2) when the ring is opened and when the ring is closed again.
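As an illustration of how such an interruption could be extracted from a capture, the following minimal sketch scans a list of frame timestamps and reports every gap that is much longer than the PROFINET cycle time. The function name, the `factor` threshold and the sample timestamps are hypothetical and are not taken from the measurement system used in this work; only the 2 ms cycle time comes from the experiments.

```python
from typing import List, Tuple


def find_interruptions(timestamps_ms: List[float],
                       cycle_ms: float = 2.0,
                       factor: float = 3.0) -> List[Tuple[float, float]]:
    """Return (last_frame_time, gap_duration) for every gap between
    consecutive frames that is longer than factor * cycle_ms."""
    threshold = factor * cycle_ms  # hypothetical tolerance, not from the paper
    gaps = []
    for prev, curr in zip(timestamps_ms, timestamps_ms[1:]):
        delta = curr - prev
        if delta > threshold:
            gaps.append((prev, delta))
    return gaps


# Synthetic timestamps (illustrative only, not measured data): cyclic frames
# every 2 ms with a single 56 ms hole standing in for a ring reconfiguration.
frames = [0.0, 2.0, 4.0, 6.0, 62.0, 64.0, 66.0]
print(find_interruptions(frames))
```

In this sketch the duration of the reported gap plays the role of the interruption seen at the TAP; a real capture would first need to be filtered for the cyclic frames of the AR under test.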
experiments were done using different watchdog values in the IO Device (ET200S): 60, 40 and 6 milliseconds. Each experiment is repeated seven times. The results for the average RT and RRT are shown in Table 4. The values of TSTdefault and TSTNRmax are 20 ms and 3, respectively. From the results of experiment A it is clear that the detection of the “Open Ring” is slower than the detection of the “Closed Ring”, since in the latter case just one test frame is needed. Note that the “Closed Ring” condition is very dangerous if the MRM does not quickly block one of its ports.
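The difference between the two detection times can be approximated with a short calculation. The sketch below assumes that the MRM declares the ring open after TSTNRmax consecutive test frames, sent every TSTdefault, are lost, and that a single received test frame is enough to detect the closed ring; the function names are illustrative, and propagation and processing delays are ignored, so the results are rough estimates rather than guaranteed limits.

```python
def open_ring_detection_estimate_ms(tst_default_ms: float, tst_nr_max: int) -> float:
    """Estimated time to detect an open ring: tst_nr_max test frames,
    one every tst_default_ms, must be missed (assumption)."""
    return tst_default_ms * tst_nr_max


def closed_ring_detection_estimate_ms(tst_default_ms: float) -> float:
    """Estimated time to detect a closed ring: one test frame interval."""
    return tst_default_ms


# Values used in the laboratory experiments: TSTdefault = 20 ms, TSTNRmax = 3.
print(open_ring_detection_estimate_ms(20, 3))   # 60
print(closed_ring_detection_estimate_ms(20))    # 20
```

These estimates are of the same order as the RT (48 to 60 ms) and RRT (18 to 28 ms) measured in experiment A.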
This behavior is easily explained, since it depends on the MRM behavior. In detail:
• At the beginning of the experiments the MRM has link A in forwarding and link C blocked.
• Manually disconnecting link B forces the MRM to put link C in forwarding as well.
• Reconnecting link B forces the MRM to block link C again. The MRM is solely responsible for loop control, and this is the only option it has to logically open the ring.
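The port handling described above can be summarized in a small sketch. It models only the two MRM ring ports (links A and C) as a function of the ring status; the names `PortState` and `mrm_ring_ports` are hypothetical, and the model deliberately ignores the transient phase during which the MRM is still detecting the change.

```python
from enum import Enum


class PortState(Enum):
    FORWARDING = "forwarding"
    BLOCKED = "blocked"


def mrm_ring_ports(ring_closed: bool) -> dict:
    """Steady-state port states of the MRM on links A and C."""
    return {
        "A": PortState.FORWARDING,
        # Link C is blocked only while the ring is closed, to avoid a loop.
        "C": PortState.BLOCKED if ring_closed else PortState.FORWARDING,
    }


# Sequence followed in the laboratory procedure: closed, open, closed again.
for status, closed in [("Closed Ring", True), ("Open Ring", False), ("Closed Ring", True)]:
    states = mrm_ring_ports(closed)
    print(status, {link: state.value for link, state in states.items()})
```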
In order to confirm this behavior, the TAP has been temporarily moved to link C: Fig. 4 - Graph 2 shows that the data exchange between IO Controller and IO Device on cable C is only present when link B is disconnected.
Table 4. Average values for RT and RRT in experiments
Experiment   Watchdog [ms]   Value     RT [ms]   RRT [ms]
A            60              Minimum   48        18
A            60              Average   55        23
A            60              Maximum   60        28
B            40              Minimum   3506      18
B            40              Average   3520      21
B            40              Maximum   3534      28
C            6               Minimum   3460      3466
C            6               Average   3465      3472
C            6               Maximum   3470      3480
Again from experiment A, it can be noted that the watchdog value is greater than both RT and RRT. Frames are lost and do not arrive at their destinations (either IO-Controller or IO-Device), but the watchdog does not expire and the AR between IO Controller and IO Device is not aborted. From the application point of view, the redundancy switchover controlled by MRP at Ethernet Layer 2 is seamless (i.e. transparent).
In experiment B, the watchdog is set to 40 ms. In this case, the watchdog expires before the MRP process of ring reconfiguration is completed, and the AR between IO Controller and IO Device is cancelled. As a consequence, the IO-Controller takes RT = 3.52 s to re-establish the data exchange, because a new configuration procedure must take place. In the same experiment, the MRP ring reconfiguration after reconnection is faster than the watchdog, so the RRT remains the same as in experiment A.
Fig. 4. Data traffic behavior during the experiment.
If (in experiment C) the watchdog time is decreased to 6 ms (a value lower than any MRP reconfiguration time), the AR is cancelled both when disconnecting and when reconnecting cable B. RT and RRT are almost identical, with a value greater than 3.4 s.
In Fig. 4, the PROFINET cycle time is 2 ms and the number of packets per period is shown, with a period of 100 ms.
Thus, it is worth defining two kinds of time interval before carrying out the experiments:
• the Recovery Time (RT): the time interval needed to re-establish the communication between the IO Controller and the IO Device when a failure happens in the ring topology;
• the Reconnection Recovery Time (RRT): the time interval needed to re-establish the communication between the IO Controller and the IO Device when the ring is closed again.
From these experiments, it is clear that the watchdog is the real threshold between a seamless transition at application level and a loss of communication. In more detail, if the MRP reconfiguration time is lower than the watchdog, the application does not halt, while if it is higher, the application stops.
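The role of the watchdog as a threshold can be checked against the laboratory results with a minimal sketch. It assumes that the average RT and RRT of experiment A (55 ms and 23 ms), where no AR was aborted, can be taken as the MRP reconfiguration times for ring opening and ring closing; the function name `ar_survives` is illustrative.

```python
def ar_survives(reconfiguration_ms: float, watchdog_ms: float) -> bool:
    """The AR is kept only if the ring reconfiguration completes
    before the watchdog expires."""
    return reconfiguration_ms < watchdog_ms


# Assumed MRP reconfiguration times: averages from experiment A.
MRP_OPEN_MS = 55    # ring opening (RT in experiment A)
MRP_CLOSE_MS = 23   # ring closing (RRT in experiment A)

for experiment, watchdog_ms in [("A", 60), ("B", 40), ("C", 6)]:
    on_open = ar_survives(MRP_OPEN_MS, watchdog_ms)
    on_close = ar_survives(MRP_CLOSE_MS, watchdog_ms)
    print(f"Experiment {experiment}: AR kept on opening={on_open}, on closing={on_close}")
```

Under these assumptions the sketch reproduces the pattern of the laboratory experiments: no abort in experiment A, an abort only on ring opening in experiment B, and an abort in both transitions in experiment C.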
4.2 Recovery time in real systems
In order to prevent the application from stopping when the ring is opened or closed, the first approach is to increase the watchdog time to a value greater than the MRP reconfiguration time. However, increasing the watchdog time results in
4.1 Watchdog influence in the laboratory tests
For watchdog influence analysis on the RT and RRT, three
increasing the time the application is assumed to work properly without feedback. In real applications, this time cannot be increased without limits, because safety issues come into play (e.g. motors running without feedback, chemical reactions without control, etc.).
increase of traffic; perform a benchmark on the selected IO-Controller to assess the time needed to configure and parametrize the whole network.
ACKNOWLEDGMENT
The authors acknowledge the academic support and research structure from the Laboratory of Industrial Automation of the Engineering School of São Carlos, at the University of São Paulo.
REFERENCES
As shown in Ferrari et al. (2011), real PROFINET networks have tens of devices, many switches in the ring, and are managed by an IO-Controller with finite computational capability. All these constraints contribute to increasing the RT perceived by the application, but only when the RT is greater than the watchdog. Table 5 shows the recovery times of different real PROFINET networks using ring topology.
Akerberg, J., Bjorkman, M. (2009). Introducing Security Modules in PROFINET IO. In: Proc. of IEEE Emerging Technologies and Factory Automation 2009, Mallorca. IEEE, p. 1-8.
Belie, F., Martinovic, G. (2013). Model of Influence of MRP on Network Performance. In: Proc. of IEEE Symposium on Computers and Communications, Split, Croatia, July 07-10.
Duerkop, L. et al. (2012). Towards Auto configuration of Industrial Automation System: A Case Study Using PROFINET IO. In: Proc. of IEEE Emerging Technologies and Factory Automation 2012. New York: IEEE.
Felser, M. (2005). Real-time Ethernet - Industry Prospective. Proceedings of IEEE, v. 93, n. 6, p. 1118-1129, New York.
Felser, M. (2008). Media Redundancy for PROFINET IO. In: Proc. of IEEE International Workshop on Factory Communication Systems 2008, Dresden, May 21-23.
Ferrari, P., Flammini, A., Marioli, D., Taroni, A., Venturini, F. (2007). Evaluation of timing characteristics of a prototype system based on PROFINET IO RT Class 3. In: Proc. of IEEE Emerging Technologies and Factory Automation 2007, p. 1254-1261.
Ferrari, P., Flammini, A., Rinaldi, S., Sisinni, E. (2010). On the Seamless Interconnection of IEEE 1588 – Based Devices Using a PROFINET IO Infrastructure. IEEE Transactions on Industrial Informatics, Vol. 6, No. 3, pp. 381-392.
Ferrari, P., Flammini, A., Venturini, F., Augelli, A. (2011). Large PROFINET IO RT networks for factory automation: a case study. In: Proc. of Emerging Technologies and Factory Automation 2011, pp. 1-4.
Fontanelli, D., Macii, D., Rinaldi, S., Ferrari, P., Flammini, A. (2014). A servo-clock model for chains of transparent clocks affected by synchronization period jitter. IEEE Transactions on Instrumentation and Measurement, Vol. 63, No. 5, pp. 1085-1095.
Profibus International. (2012). Application Layer protocol for decentralized periphery and distributed automation, Technical Specification for PROFINET IO, v. 2.3. Available at http://www.profibus.com
Profibus International. (2014). PROFINET, Design Guideline. v. 1.14, December 2014. Available at http://www.profibus.com
Profibus International. (2015). PROFINET IO System Redundancy. v. 1.10. Available online at http://www.profibus.com
Sauter, T., Soucek, S., Kastner, W., Dietrich, D. (2011). The evolution of factory and building automation. IEEE Industrial Electronics Magazine, Vol. 5, Issue 3, pp. 35-48.
The analysis of the experimental data, obtained in networks with the same IO-Controller model, shows that in networks L1, L2 and L3 the RT is comparable with the RT measured in the laboratory, despite the higher number of IO-Devices in these real cases. On the contrary, in network L4, when the number of devices that abort the AR with the IO-Controller during the redundancy switchover increases too much, the RT jumps to a very high value. The explanation is that the IO-Controller cannot configure all these devices at the same time, but needs several runs to put smaller groups of devices back in data exchange. The result is a longer RT when the computational capability of the IO-Controller is exceeded.
Table 5. Recovery time (RT) in real cases.
Network   IO-Devices   Switches in the ring   Aborted ARs during switchover   RT (s)
L1        76           8                      7                               3.67
L2        100          19                     15                              3.71
L3        142          13                     23                              4.11
L4        225          11                     43                              12.71
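To make the comparison with the laboratory more explicit, the sketch below relates each RT of Table 5 to the average RT of laboratory experiment B (3.52 s), in which the AR was also aborted. The choice of this baseline and the variable names are assumptions made here for illustration; the ratio is only a descriptive figure, not a model of the IO-Controller behavior.

```python
LAB_RT_S = 3.52  # average RT of laboratory experiment B (AR aborted)

# Table 5: network -> (IO-Devices, aborted ARs during switchover, RT in s)
real_networks = {
    "L1": (76, 7, 3.67),
    "L2": (100, 15, 3.71),
    "L3": (142, 23, 4.11),
    "L4": (225, 43, 12.71),
}

# Ratio between the RT in the plant and the RT in the laboratory.
ratios = {name: rt / LAB_RT_S for name, (_, _, rt) in real_networks.items()}

for name, (devices, aborted, rt) in real_networks.items():
    print(f"{name}: {devices} IO-Devices, {aborted} aborted ARs, "
          f"RT = {rt:.2f} s, {ratios[name]:.2f} x laboratory RT")
```

With this baseline, L1 to L3 stay within about 20% of the laboratory value, while L4 is more than three times higher.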
5. CONCLUSIONS
Undoubtedly, the MRP protocol is an important technique to increase the availability of an industrial automation network, as well as being a simple strategy that is easy to apply. This work developed a simple methodology to measure with high resolution the recovery time in a PROFINET network in ring topology using the MRP protocol. The experiments show that if the MRP reconfiguration time is lower than the watchdog time, the automation application does not halt, while if it is higher the application stops. In addition, when the application stops, the recovery time increases considerably due to the new configuration and parametrization of all the ARs between the IO-Devices and the IO-Controller. This last contribution depends on the number of devices in the network, on their complexity (more parameters), and on the computational capability of the IO-Controller.
As a result, it is suggested to: configure the watchdog time as high as possible for a safe operation of the automation application (taking into account all the application dynamics); minimize the MRP reconfiguration time by shortening the TSTdefault compatibly with the resulting