G Model
ARTICLE IN PRESS
SUSCOM-324; No. of Pages 12
Sustainable Computing: Informatics and Systems xxx (2019) xxx–xxx
Contents lists available at ScienceDirect
Sustainable Computing: Informatics and Systems journal homepage: www.elsevier.com/locate/suscom
Queuing theory guided performance evaluation and energy optimization for a reconfigurable high speed device interconnected bus
Weiwen Chen a, Keni Qiu a,∗, Shaonan Zhang b, Jiqin Zhou a, Weigong Zhang a
a Capital Normal University, Beijing, China
b Cambricon Technologies, China
Article info
Article history: Received 27 November 2017. Received in revised form 30 December 2018. Accepted 25 February 2019. Available online xxx.
Abstract
UM-BUS, a reconfigurable high-speed device interconnected bus, has been proposed to enable lightweight sensor system deployment in the IoT. Performance prediction is a key step in estimating the worst and best cases before real-world deployment of UM-BUS-based systems. This paper proposes a queuing theory guided analytic model which yields an approximation for the average packet delay as well as upper and lower bounds. Furthermore, a Revive Interval (RI)-based technique is proposed to establish a dynamic logical cycle frame that involves only active nodes, so as to improve the bus efficiency. A set of experiments based on MATLAB simulation is conducted to evaluate the bus performance. The proposed dynamic cycle frame scheme is validated to be effective for a high-performance and low-power design.
© 2019 Elsevier Inc. All rights reserved.
∗ Corresponding author. E-mail address: [email protected] (K. Qiu).
1. Introduction
Distributed embedded systems generally comprise multiple spatially distributed nodes that gather and process information from different kinds of sensors or actuators. These nodes typically exchange messages among themselves over an appropriate network to fulfill their tasks. In this paper we focus on the scenario where the nodes communicate over a novel high-speed serial bus named UM-BUS [1]. UM-BUS exploits multiple lanes to transmit information concurrently, reaching a bandwidth of up to 6.4 Gbps, while the lanes also serve as redundant backups for each other to tolerate lane faults. Fig. 1 presents a design of an automotive collision avoidance system built on UM-BUS, which can offer a transmission speed of up to 3.2 Gbps with 16 lanes. Most of the distributed nodes, referred to as slave nodes, have no processors or ECUs (Electronic Control Units) [2]; correspondingly, they are all controlled by the Processor Module (the light blue block), which is referred to as the master node.
In comparison to existing protocols popularly used in distributed embedded systems, such as CAN [3,4], PCIe [5,6], 1553B [7], RapidIO [8] and FlexRay [9,10], UM-BUS not only offers higher bandwidth and reliability, but also simplifies the processing model of the embedded system by minimizing the ECUs or distributed processors needed in the system [11,12].
UM-BUS can be configured to transmit various signals such as text, audio and video in different scenarios of the Internet of Things (IoT) [13,14]. For this reason, the transmission features of a UM-BUS link differ across application scenarios. Basically, there are two kinds of methods to obtain the transmission features: field measurement and model analysis. Since field measurement needs a real system deployment, it is not suited to a fast and economical evaluation in the early stage of deployment. This paper focuses on the latter method and presents a performance evaluation guided by an analytic model [15,16]. The experimental results hint at the design factors and offer guidance for real-world deployment based on UM-BUS. In summary, this paper makes the following contributions.
• First, the proposed analytic model for UM-BUS performance evaluation is shown to follow M/M/1 queuing theory. Directed by the model, the lower and upper bounds for the average packet delay, as well as an approximation for the average packet delay, are discussed.
• Second, a Revive Interval (RI)-based technique is proposed to improve UM-BUS efficiency by dynamically maintaining only active nodes in the logical cycle frame. In this way, a high-performance and energy-efficient bus system can be obtained by
https://doi.org/10.1016/j.suscom.2019.02.004 2210-5379/© 2019 Elsevier Inc. All rights reserved.
Please cite this article in press as: W. Chen, et al., Queuing theory guided performance evaluation and energy optimization for a reconfigurable high speed device interconnected bus, Sustain. Comput.: Inform. Syst. (2019), https://doi.org/10.1016/j.suscom.2019.02.004
Fig. 1. A design of automotive collision avoidance system built on UM-BUS.
reducing both the response time of active nodes and the energy consumed during bus transmission in a cycle frame.
• Third, a set of experiments is conducted to evaluate the performance bounds. In particular, the Energy-Performance Product (EPP) is proposed to evaluate the efficiency of the RI-based dynamic cycle frame scheme. The results validate that the dynamic scheme outperforms the fixed scheme in terms of energy efficiency.
The organization of this paper is as follows. First we introduce the structure and protocols of UM-BUS and the queuing theory in Section 2. Then we derive the analytic model based on queuing theory and the dynamic cycle frame scheme in Section 3 and Section 4 respectively. The evaluation methodology and experimental results are presented in Section 5. Related work is described in Section 6. Finally, conclusions are given in Section 7.
2. Background information
In this section, we first introduce the structure and protocols of UM-BUS. Then the preliminaries of queuing theory are described.
2.1. The structure and protocol model of UM-BUS
UM-BUS has three layers, namely the transaction layer, the data link layer and the physical layer, as shown in Fig. 2.
The transaction layer is the main controller of UM-BUS. It is responsible for the management of the entire bus communication system and provides feedback to external devices or the upper layer. In the transaction layer, we have designed and implemented the address mapping and management protocol, which allows the host to access the internal memory of a slave node directly.
The data link layer consists of a transport sub-layer and a MAC sub-layer. The transport sub-layer is responsible for data grouping and transporting. Under the transport sub-layer protocol, communication data are allocated to all the normal lanes according to the fault table of bus lanes. Hence, faulty lanes are masked automatically, which means that faults in the bus are tolerated dynamically. The MAC sub-layer is responsible for managing lane status, including lane detection and fault table maintenance.
The physical layer consists of a logical sub-module and an electrical sub-module. The logical sub-module is responsible for data coding (8b/10b), error checking/correcting and clock synchronization. The electrical sub-module stipulates the physical connection characteristics of UM-BUS.
UM-BUS adopts a bus topology in which the distributed nodes of an embedded system can be interconnected directly. The nodes of UM-BUS are divided into one master node and multiple slave nodes, among which only the master can initiate a communication process. Accordingly, the slave nodes can only respond to the commands of the master and complete the corresponding information acquisition and actuation under its control.
UM-BUS is based on MLVDS and uses multiple lanes to transmit information concurrently, while these lanes also serve as redundant backups for each other [17]. Normally, communication data is allocated to all lanes by the bus controller. However, if one or a few lanes fail, the bus controller can detect the fault in time and allocate the data to the remaining healthy lanes. Thus, multiple failures in lanes or node circuits can be dynamically tolerated [1].
All bus accesses are transformed into protocol packets at the transaction layer. The protocol packet format can also be used directly for common communication, without the address mapping and management protocol of the transaction layer. UM-BUS uses a master-slave Command-Response communication protocol to transmit data. First, the master sends a command packet to the slave. Then the slave receives the packet and interprets the command. Finally, the slave sends a response packet, including communication status and data, to the master. The command packet and the response packet at the transaction layer have the same format, as shown in Table 1.
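As a rough illustration of the shared command/response format, the sketch below packs and unpacks a 16-byte command header using the field sizes listed in Table 1. The field order is taken from the order of the table rows; the big-endian byte order, the field names, and the helper names `pack_header` and `unpack_header` are assumptions made for this example only, and the CRC is treated as an opaque byte because the CRC algorithm is not specified here.

```python
import struct

# Field sizes follow Table 1: 1 + 1 + 1 + 6 + 4 + 1 + 1 + 1 = 16 bytes.
# Field order and big-endian layout are assumptions for illustration.
HEADER_FORMAT = ">BBB6s4sBBB"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def pack_header(dest, src, frame, offset, short_data, ack_cmd, ack_status, crc):
    """Build a 16-byte command header from its eight fields."""
    return struct.pack(
        HEADER_FORMAT,
        dest,                       # destination node (1B)
        src,                        # source node (1B)
        frame,                      # command frame, i.e. packet type (1B)
        offset.to_bytes(6, "big"),  # address offset (6B)
        short_data,                 # short-packet data (4B)
        ack_cmd,                    # acknowledge command (1B)
        ack_status,                 # acknowledge status (1B)
        crc,                        # command CRC (1B), opaque in this sketch
    )


def unpack_header(raw):
    """Split a 16-byte command header back into named fields."""
    dest, src, frame, offset, short_data, ack_cmd, ack_status, crc = struct.unpack(
        HEADER_FORMAT, raw
    )
    return {
        "dest": dest,
        "src": src,
        "frame": frame,
        "offset": int.from_bytes(offset, "big"),
        "short_data": short_data,
        "ack_cmd": ack_cmd,
        "ack_status": ack_status,
        "crc": crc,
    }
```

A long packet would append the 1025-byte data part (1024 bytes of data plus a 1-byte data CRC) to this header.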
In the process of bus communication, all bus accesses take the form of protocol packets exchanged between the master and the slave [18]. The protocol packets of UM-BUS are divided into two classes, short packets and long packets, as shown in Table 1. The Command Header (referred to as Short Packet) is used to transmit control or status information of 16 bytes. The Header+Data Packet (referred to as Long Packet) is used to transmit a large chunk of information of 1041 bytes. The command frame specifies the packet type. There are currently eight kinds of packets in the UM-BUS protocol: I/O space read command, I/O space write command, memory space read command, memory space write command, configuration space read command, configuration space write command, response packet with data and response packet without data.
To sum up, the UM-BUS protocol exhibits the following three features:
(1) UM-BUS is organized in a general bus topology, so all the nodes can be interconnected directly.
(2) UM-BUS has two kinds of nodes, one master and multiple slaves, among which only the master can initiate the communication process.
(3) The dynamic lane grouping mode of UM-BUS makes it possible to gain a high transmission rate by configuring multiple lanes.
Fig. 2. The protocol model of UM-BUS.
Table 1 The protocol of data packet of UM-BUS.
Part                        Field                  Size
Command header (16 bytes)   Destination node       1B
                            Source node            1B
                            Command frame          1B
                            Address offset         6B
                            Short-packet data      4B
                            Acknowledge command    1B
                            Acknowledge status     1B
                            Command CRC            1B
Data (1025 bytes)           Long-packet data       1024B
                            Data CRC               1B
2.2. Preliminaries of queuing theory
Queuing theory is the mathematical study of waiting queues. In queuing theory a model is constructed so that queue lengths and waiting times can be predicted [19,20]. Queuing theory is generally considered a branch of operations research because the results are often used when making business decisions about the resources needed to provide a service.
An M/M/1 queue represents the queue length in a system having a single server, where arrivals are determined by a Poisson process and job service times have an exponential distribution. The model is the most elementary of queueing models and an attractive object of study, as closed-form expressions can be obtained for many metrics of interest. An M/M/1 queue is a stochastic process whose state space is the set {0, 1, 2, 3, ...}, where the value corresponds to the number of customers in the system, including any currently in service.
• Arrivals occur at rate λ according to a Poisson process and move the process from state i to state i + 1.
• Service times have an exponential distribution with rate parameter μ, where 1/μ is the mean service time.
• A single server serves customers one at a time from the front of the queue, according to a first-come, first-served discipline. When the service is complete the customer leaves the queue and the number of customers in the system reduces by one.
• The buffer is of infinite size, so there is no limit on the number of customers it can contain.
In the M/M/1 queue, the mean waiting time Tw and the mean queuing time Tq can be obtained as below, where Ts is the mean service time and ρ = λTs is the utilization.
$$T_w = \frac{\lambda T_s^{2}}{2(1-\rho)} \tag{1}$$
Fig. 3. Cycle frame.
Fig. 4. Queuing model of UM-BUS.
$$T_q = T_w + T_s = \frac{\lambda T_s^{2}}{2(1-\rho)} + T_s = \frac{T_s(2-\rho)}{2(1-\rho)} \tag{2}$$
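The relation between Eq. (1) and Eq. (2) can be checked numerically. The following is a minimal sketch that evaluates both expressions as they are stated in the text, using ρ = λTs so that λTs² = ρTs; the function names are chosen for this example and do not come from the paper.

```python
def mean_waiting_time(ts, rho):
    """Mean waiting time Tw of Eq. (1), written with rho = lambda * Ts."""
    if not 0 <= rho < 1:
        raise ValueError("utilization rho must satisfy 0 <= rho < 1")
    return rho * ts / (2 * (1 - rho))


def mean_queuing_time(ts, rho):
    """Mean queuing time Tq of Eq. (2): waiting time plus service time."""
    return mean_waiting_time(ts, rho) + ts


def mean_queuing_time_closed_form(ts, rho):
    """Right-hand side of Eq. (2): Ts * (2 - rho) / (2 * (1 - rho))."""
    if not 0 <= rho < 1:
        raise ValueError("utilization rho must satisfy 0 <= rho < 1")
    return ts * (2 - rho) / (2 * (1 - rho))
```

Both forms of Tq agree for any valid utilization, and both grow without bound as ρ approaches 1, which is the behavior examined later in the utilization study.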
In queueing theory, a discipline within the mathematical theory of probability, Little's Law is a remarkable result. It can be expressed algebraically as below:
$$L = \lambda W \tag{3}$$
In Eq. (3), L, λ and W denote the long-term average number of customers in a stable system, the long-term average effective arrival rate, and the average time a customer spends in the system, respectively. The relationship is not influenced by the arrival process distribution, the service distribution, the service order, or practically anything else.
This paper exploits queuing theory to build the performance model and evaluate the performance bounds before a real-world deployment of UM-BUS.
3. Queuing theory guided evaluation model
In this section, the performance model guided by queuing theory is first built. Then the lower and upper bounds for the average packet delay are derived accordingly.
3.1. Performance model
As UM-BUS adopts a command-response mechanism to schedule the packet transmission [1], we propose the concept of a cycle frame for further explanation, as depicted in Fig. 3. A cycle frame covers a complete responding process from each slave node upon a command from the master. Since the data volume transmitted in different cycles varies, the lengths of cycle frames vary as well. Following this concept, two assertions can be made. First, data packet arrivals at UM-BUS follow a Poisson process. The underlying reason is that we cannot tell how many long or short data packets will be sent during a cycle frame, so the length of a cycle frame cannot be predicted. Second, the transmission time of a data packet follows a negative exponential distribution. The queuing model of UM-BUS is described in Fig. 4. The above
properties indicate that the M/M/1 queuing model can well depict the data packet transmission behavior of UM-BUS. Four possible cases are considered in this paper, as shown in Fig. 5.
(1) Case 1: All the nodes transmit only short data packets on the UM-BUS link.
(2) Case 2: Only one node transmits long data packets on the UM-BUS link.
(3) Case 3: All the nodes transmit only long data packets on the UM-BUS link.
(4) Case 4: Some nodes transmit long packets while others transmit short ones on the UM-BUS link.
Fig. 5. Four cases of data packet transmission on UM-BUS link.
3.2. Lower and upper bounds for average packet delay
Guided by the M/M/1 queuing analytic model, the performance of UM-BUS can be evaluated before deployment. A latency bound study is of great value in guiding real-world UM-BUS applications. This paper discusses the latency in the following four cases. The relevant parameters are described in Table 2.
Table 2 The parameters of UM-BUS' queuing model.
Parameter   Description
ρ           The bus utilization
B           Bandwidth of UM-BUS link
P           The number of nodes
t           The average busy time of UM-BUS link
ml          The length of data part of a long packet (1025 bytes)
ms          The length of a short packet (16 bytes)
n           The number of short packets
Tq          The queuing time in UM-BUS link, including waiting time and data transmission time
(1) Case 1: Lower bound 1 for the average packet delay: The lower bound 1, $Delay^{low1}$, for the packet delay occurs when all the nodes transmit only short data packets (Command Header). In this case, the average transmission time $\bar{t}$ is equal to the length of a short packet ms divided by the bandwidth B of the UM-BUS link:
$$\bar{t} = \frac{m_s}{B} \tag{4}$$
The average service time of transmitting a short packet, $T_s^{low1}$, is therefore fixed and can be expressed as below.
$$T_s^{low1} = \bar{t} = \frac{m_s}{B} \tag{5}$$
Adopting the M/M/1 queuing model, the lower bound for the mean queuing time Tq can be expressed as below.
$$Delay^{low1} = T_q^{low1} = \frac{T_s^{low1}(2-\rho)}{2(1-\rho)} \tag{6}$$
(2) Case 2: Lower bound 2 for the average packet delay: The lower bound 2, $Delay^{low2}$, for the packet delay occurs when only one node transmits long packets (Command Header+Data) while the others transmit only short data packets (Command Header). In this case, the average transmission times of the data part of a long packet and of a short packet can be expressed as below. For the data part of a long packet, the average transmission time $\bar{t}_1$ is equal to the length of the data part of a long packet ml divided by the bandwidth B of the UM-BUS link:
$$\bar{t}_1 = \frac{m_l}{B} \tag{7}$$
For a short packet, the average transmission time $\bar{t}_2$ is equal to the length of a short packet ms divided by the bandwidth B of the UM-BUS link:
$$\bar{t}_2 = \frac{m_s}{B} \tag{8}$$
Supposing the number of slave nodes is P, the average service time $T_s^{low2}$ can be expressed as below.
$$T_s^{low2} = \frac{\bar{t}_1 + P\,\bar{t}_2}{P} = \frac{\frac{m_l+m_s}{B} + \frac{(P-1)m_s}{B}}{P} = \frac{1}{PB}\left(m_l + P m_s\right) \tag{9}$$
Adopting the M/M/1 queuing model, the lower bound for the mean queuing time Tq can be expressed as below.
$$Delay^{low2} = T_q^{low2} = \frac{T_s^{low2}(2-\rho)}{2(1-\rho)} \tag{10}$$
(3) Case 3: Upper bound for the average packet delay: The upper bound for the average packet delay occurs when all the nodes transmit only long packets (Command Header+Data) on the UM-BUS link. In this case, the average transmission time $\bar{t}$ is equal to the length of a short packet ms plus the length of the data part of a long packet ml, divided by the bandwidth B of the UM-BUS link:
$$\bar{t} = \frac{m_l + m_s}{B} \tag{11}$$
The average service time of transmitting a long packet, $T_s^{upp}$, is therefore fixed and can be expressed as below.
$$T_s^{upp} = \bar{t} = \frac{m_l + m_s}{B} \tag{12}$$
Adopting the M/M/1 queuing model, the upper bound for the mean queuing time Tq can be expressed as below.
$$Delay^{upp} = T_q^{upp} = \frac{T_s^{upp}(2-\rho)}{2(1-\rho)} \tag{13}$$
(4) Case 4: The general average packet delay: The general average packet delay occurs when some nodes transmit long packets (Command Header+Data) while others transmit short data packets (Command Header) on the UM-BUS link. In this case, the average transmission times of the data part of a long packet and of a short packet can be expressed as below. For the data part of a long packet, the average transmission time $\bar{t}_1$ is equal to the length of the data part of a long packet ml divided by the bandwidth B of the UM-BUS link:
$$\bar{t}_1 = \frac{m_l}{B} \tag{14}$$
For a short packet, the average transmission time $\bar{t}_2$ is equal to the length of a short packet ms divided by the bandwidth B of the UM-BUS link:
$$\bar{t}_2 = \frac{m_s}{B} \tag{15}$$
Let P and n denote the number of slave nodes and the number of short packets, respectively. The average service time can be expressed as below, where $\bar{m}_l = (P-n)m_l/P$ is the mean data-part length per packet.
$$T_s^{ave} = \frac{\bar{m}_l + m_s}{B} = \frac{\frac{(P-n)(m_l+m_s)}{B} + \frac{n\,m_s}{B}}{P} = \frac{1}{PB}\left((P-n)(m_l+m_s) + n\,m_s\right) \tag{16}$$
Adopting the M/M/1 queuing model, the average delay for the mean queuing time Tq can be expressed as below.
$$Delay^{ave} = T_q^{ave} = \frac{T_s^{ave}(2-\rho)}{2(1-\rho)} \tag{17}$$
4. Optimization with dynamic logical cycle frame
Since UM-BUS adopts the Command-Response protocol, only the master node can initiate communication. During a default complete communication cycle, the master node needs to query all the slave nodes, regardless of whether a slave node has useful information or just feeds back a dummy response. This is inefficient because the inactive nodes waste bus slots. To address this problem, this paper proposes a Revive Interval (RI)-based method to dynamically add active nodes to the bus link and delete inactive nodes from it. In other words, slave nodes can be dynamically added to and deleted from the cycle frames. In this way, the bus transmits useful information for active nodes only and becomes more effective. The detailed solution consists of the following operations.
4.1. Tag
First, a table recording each slave node's current active/inactive status and past inactive count is built at the master node side. As Table 3 shows, the 1-bit current status indicates whether the corresponding node is active or inactive, and the 7-bit past record indicates the number of continuous inactive responses within the past 128 Command-Response cycles. For each slave node, we use the Most Significant Bit (MSB) of its Acknowledge Status (1B) to indicate whether the current transmission is useful (referred to as active) or dummy (referred to as inactive).
Table 3 Active/inactive record.
Slave node   Current status (1-bit)    Past record (7-bit)
1            Active/inactive (1/0)     Continuous inactive times out of 128
2            Active/inactive (1/0)     Continuous inactive times out of 128
...          Active/inactive (1/0)     Continuous inactive times out of 128
n            Active/inactive (1/0)     Continuous inactive times out of 128
4.2. Delete
When the past record of a slave node is larger than a Threshold (TH), the node is regarded as inactive. In other words, if a slave node fails TH times to transmit useful information, it is excluded from the active list of subsequent cycle frames. The value of the Threshold can be chosen in the range of 1 to 128, depending on the deployment scenario. If the logical cycle is relatively long, TH is set to a small value; otherwise, it is often set to a large value. As indicated in Fig. 6, when the past record of Node 2 reaches the threshold of 64, it is not included in the active list in the next cycle frame.
4.3. Add
At the end of each cycle frame there is an interval, called the Revive Interval (RI) in our optimization scheme. The RI can be exploited by inactive slave nodes that are outside the logical cycle frame and want to be inserted into it. In each cycle frame, the interval can only be used by the master node to query one inactive node. For example, in Fig. 7, Node 2 can use the RI to be included in the active list of the logical cycle frame when its Past Record reaches 64. Note that, because of the Command-Response protocol, an inactive node that wants to be inserted cannot issue a request itself; it must wait for the master node to query it during the RI. Therefore, the longest time for an inactive node to wait for its revival is n − 1 cycle frames.
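The Tag, Delete and Add operations can be sketched together as a small master-side scheduler. This is an illustrative sketch under several assumptions that are not stated in the text: a node is deleted once its past record reaches the threshold, inactive nodes are probed in round-robin order during the RI, and `has_data` is a hypothetical callback standing in for the MSB of the Acknowledge Status returned by a slave. The class and method names are invented for the example.

```python
from collections import deque


class CycleFrameScheduler:
    """Master-side sketch of the dynamic logical cycle frame."""

    def __init__(self, node_ids, threshold):
        self.threshold = threshold                    # TH of the Delete operation
        self.active = list(node_ids)                  # nodes in the logical cycle frame
        self.inactive = deque()                       # nodes waiting to be revived
        self.past_record = {n: 0 for n in node_ids}   # 7-bit counters (Tag)

    def run_cycle(self, has_data):
        """Run one cycle frame and return the nodes queried, in order.

        has_data(node) -> bool is a hypothetical callback: True means the
        response carries useful information (the node is active).
        """
        # Choose the RI candidate before this cycle's deletions, so that a
        # node deleted now is not probed again in the same cycle frame.
        candidate = self.inactive.popleft() if self.inactive else None

        queried = []
        for node in list(self.active):
            queried.append(node)
            if has_data(node):
                self.past_record[node] = 0
            else:
                # Saturate at 127, the largest value a 7-bit record can hold.
                self.past_record[node] = min(self.past_record[node] + 1, 127)
                if self.past_record[node] >= self.threshold:
                    self.active.remove(node)          # Delete
                    self.inactive.append(node)

        # Revive Interval: the master probes at most one inactive node.
        if candidate is not None:
            queried.append(candidate)
            if has_data(candidate):
                self.past_record[candidate] = 0
                self.active.append(candidate)         # Add
            else:
                self.inactive.append(candidate)
        return queried
```

With round-robin probing, an inactive node waits at most as many cycle frames as there are inactive nodes ahead of it in the queue, which matches the intuition behind the worst-case revival time discussed above.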
Fig. 6. The Delete Operation.
Fig. 7. The add operation.
4.4. Benefits
With the concept of a dynamic logical cycle frame, only the active nodes are involved in the bus service, which leads to a high-performance and energy-efficient bus system. First, the response time of the active nodes is reduced because the logical cycle frame is shortened. Second, the energy of the bus system is reduced since the dummy responses from inactive nodes are largely eliminated. Together, these effects yield a high energy efficiency.
5. Experimental evaluation
Using the lower bounds, the upper bound and the general average of the packet delay derived above, we evaluate how the bus performance (Tq) is affected by bus utilization, bus transmission speed, the number of nodes issuing long or short packets, and combinations of these factors. Furthermore, we compare the RI-based dynamic cycle frame scheme with the fixed cycle frame scheme and validate the efficiency of the proposed optimization technique. MATLAB is employed to simulate the queuing model and obtain the simulation results.
5.1. Impact study from utilization (ρ)
Fig. 8 presents how the utilization ρ impacts the average queuing time Tq. The transmission speed of the UM-BUS link is set to 100 Mbps, 200 Mbps, 300 Mbps and 400 Mbps respectively. In Case 4, n is set to P/2. It can be seen that data packet delay grows as the utilization increases. The following observations can be made by analyzing the relationship between Tq and ρ. In Case 1, where all the slave nodes transmit short packets, packet delay increases dramatically when the bus utilization is larger than 0.97, so the bus performance degrades rapidly. A similar trend can be observed for the other cases. For example, given the transmission speed of 400 Mbps, packet delay increases markedly from ρ = 0.95, 0.85 and 0.92 for Cases 2, 3 and 4 respectively. The explanation is that a higher utilization of the bus link implies a larger volume of transmitted data and hence a longer queuing time.
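The four delay expressions of Section 3.2 can be evaluated directly for a given utilization and bandwidth. The sketch below is written in Python rather than MATLAB and is not the authors' simulation code; it assumes that packet lengths are given in bytes and the bandwidth in bits per second (hence the factor of 8), and the function names and default values of P and n are chosen only for illustration.

```python
ML_BYTES = 1025  # data part of a long packet (Table 2)
MS_BYTES = 16    # short packet, i.e. command header (Table 2)


def service_time(case, bandwidth_bps, p=20, n=10):
    """Average service time Ts in seconds for Cases 1 to 4.

    p is the number of slave nodes and n the number of short packets
    (n is only used in Case 4). Bytes are converted to bits, assuming
    the bandwidth is expressed in bits per second.
    """
    if case == 1:    # all short packets, Eq. (5)
        mean_bytes = MS_BYTES
    elif case == 2:  # one long packet, the rest short, Eq. (9)
        mean_bytes = (ML_BYTES + p * MS_BYTES) / p
    elif case == 3:  # all long packets, Eq. (12)
        mean_bytes = ML_BYTES + MS_BYTES
    elif case == 4:  # mixed traffic, Eq. (16)
        mean_bytes = ((p - n) * (ML_BYTES + MS_BYTES) + n * MS_BYTES) / p
    else:
        raise ValueError("case must be 1, 2, 3 or 4")
    return mean_bytes * 8 / bandwidth_bps


def average_delay(case, rho, bandwidth_bps, p=20, n=10):
    """Mean queuing time Tq = Ts * (2 - rho) / (2 * (1 - rho))."""
    if not 0 <= rho < 1:
        raise ValueError("utilization rho must satisfy 0 <= rho < 1")
    ts = service_time(case, bandwidth_bps, p, n)
    return ts * (2 - rho) / (2 * (1 - rho))
```

Sweeping `rho` from 0 towards 1 for each case reproduces the qualitative shape discussed here: the delay is nearly flat at low utilization and rises steeply as ρ approaches 1, with Case 1 lowest and Case 3 highest.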
Furthermore, comparing the four subfigures shows that the gradient of Tq differs as the bandwidth of UM-BUS varies. This is because a higher bandwidth implies a shorter bus service time.
5.2. Impact study from transmission speed (B)
Fig. 9 shows the relationship between the average queuing time Tq and the transmission speed (B) with a fixed number of 20 slave nodes. The utilization (ρ) of the UM-BUS link is set to 0.5, 0.6, 0.8 and 0.9 respectively. The best performance is obtained when only short packets are transmitted through UM-BUS, and performance degrades as the number of long packets increases. The results show that the average packet delays are 0.01 ms, 0.02 ms and 0.15 ms when transferring all short packets, only one long packet and all long packets respectively, given that the bus utilization (ρ) is 0.9 and the transmission speed is 400 Mbps. Interestingly, the bus performance remains steady when the transmission speed is larger than 400 Mbps, regardless of the case. This indicates that 400 Mbps is a sufficiently high bandwidth to provide fast data transmission through the UM-BUS link. Furthermore, Tq becomes larger as the bus utilization (ρ) grows, because heavy bus traffic implies a long queuing time.
5.3. Impact study from the number of nodes which issue long packets
Fig. 10 presents the impact of the node number (P) on the average queuing time Tq. The parameter n denotes the number of nodes transmitting short packets and the utilization (ρ) is fixed at 0.9. The following three characteristics can be observed.
(1) As mentioned before, only one slave node transmits long packets in Case 2. When the total number of nodes (P) changes from 0 to 10, data packet delay goes down rapidly, so the bus performance improves rapidly. When the node number is larger than 10, data packet delay decreases slowly.
Fig. 8. Tq vs. bus utilization (ρ).
(2) In Case 4, some slave nodes transmit long packets while others transmit only short packets. When the total number of nodes (P) changes from 0 to 40, data packet delay increases.
(3) Tq decreases as the number of short packets grows.
5.4. Impact study with mixed factors
Fig. 11(a) shows how the queuing delay changes with bus bandwidth under different node numbers in Case 2, where long packets are transmitted by only one node. As validated in the previous experiments, Tq decreases as the bandwidth increases for all node numbers, although this trend is not noticeable when the transmission speed is larger than 400 Mbps. Furthermore, Tq decreases as the node number increases. Fig. 11(b) shows how the queuing delay changes with bus bandwidth under different node numbers in Case 4. The number of nodes transmitting short packets (n) is indicated following the total node number (P). The ratio of n to P is mostly around 0.7–0.8. Again, Tq decreases as the bandwidth increases. Interestingly, Tq changes very little as the bus bandwidth increases once the transmission speed is larger than 200 Mbps.
5.5. Efficiency study on the dynamic logical cycle frame
Suppose UM-BUS is applied in a scenario with ten slave nodes in total, of which five on average are active under the Delete and Add mechanism. Two of the five nodes send long packets while the other three send short ones. The delete TH is set to 20. Fig. 12 shows the comparison between the fixed cycle frame scheme and the dynamic cycle frame scheme in terms of Tq vs. bus utilization (ρ) and Tq vs. bus bandwidth (B). The results validate that the proposed dynamic scheme achieves much better performance than the fixed scheme. This is because the dynamic scheme maintains only the active nodes, so far fewer slots are wasted on inactive nodes. Furthermore, we evaluate the average energy efficiency of the bus system per cycle frame, using the energy-performance product (EPP) as the evaluation metric. It is calculated as in Eq. (18).
$$EPP = Ene_{cyc} \times Ttrans_{cyc} = Data_{cyc} \times E_0 \times Ttrans_{cyc} \tag{18}$$
The parameters $Ene_{cyc}$, $Data_{cyc}$ and $Ttrans_{cyc}$ denote the energy consumption, the volume of data transmitted and the time length of a cycle frame, respectively. $E_0$ denotes the energy consumed to transmit one byte of data. The EPP metric depicts the energy efficiency well because only a small data volume combined with a short cycle frame yields a low (good) EPP. Fig. 13 presents the normalized EPP comparison between the fixed cycle frame scheme and the dynamic cycle frame scheme, given that the total number of slave nodes is ten and only five nodes are in the active list. The dynamic scheme outperforms the fixed scheme by 12.5% on average. This is because both the response time of a node and the energy of a cycle frame are reduced by eliminating inactive nodes from the logical cycle.
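To make the EPP metric concrete, the sketch below computes Eq. (18) for a fixed and a dynamic cycle frame in the ten-node scenario. It relies on simplifying assumptions that are not taken from the paper: the cycle time is taken to be the pure transmission time of the slave responses, inactive nodes in the fixed scheme return one short dummy packet each, the RI probe and the master's command packets are ignored, and the value of E0 is a placeholder. The resulting ratio is therefore only illustrative and is not meant to reproduce Fig. 13.

```python
ML_BYTES = 1025  # data part of a long packet
MS_BYTES = 16    # short packet (command header)


def cycle_data_and_time(n_long, n_short, bandwidth_bps):
    """Return (bytes transmitted, transmission time in seconds) for one
    cycle frame with n_long long responses and n_short short responses."""
    data_bytes = n_long * (ML_BYTES + MS_BYTES) + n_short * MS_BYTES
    return data_bytes, data_bytes * 8 / bandwidth_bps


def epp(data_bytes, e0_per_byte, cycle_time_s):
    """Energy-performance product of Eq. (18): Data_cyc * E0 * Ttrans_cyc."""
    return data_bytes * e0_per_byte * cycle_time_s


def normalized_epp(bandwidth_bps, e0_per_byte=1.0):
    """EPP of the dynamic frame divided by EPP of the fixed frame.

    Scenario from the text: ten slaves, five active (two long, three short).
    Assumption: in the fixed frame the five inactive slaves each return a
    short dummy response.
    """
    fixed = epp(*_with_e0(cycle_data_and_time(2, 3 + 5, bandwidth_bps), e0_per_byte))
    dynamic = epp(*_with_e0(cycle_data_and_time(2, 3, bandwidth_bps), e0_per_byte))
    return dynamic / fixed


def _with_e0(data_and_time, e0_per_byte):
    """Reorder (data, time) into the (data, E0, time) argument order of epp."""
    data_bytes, cycle_time_s = data_and_time
    return data_bytes, e0_per_byte, cycle_time_s
```

Because both the data volume and the cycle time shrink when dummy responses are removed, the normalized EPP under these assumptions is below 1 and is independent of the bandwidth and of E0.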
Fig. 9. Tq vs. bus transmission speed (B).
5.6. Design hints
The following design hints can be drawn for future real-world implementation of UM-BUS-based systems.
(1) The bus performance decreases as the bus utilization increases. That is, the bus delay gets longer as ρ grows.
(2) The bus performance improves as the transmission speed increases, but once the bandwidth reaches a certain value, further increases bring little change. Packet delay is dramatically reduced as the transmission speed changes from 50 Mbps to 200 Mbps. If the speed is larger than 400 Mbps, the change is small.
(3) The performance improves as the ratio of nodes that transmit long packets decreases. The transmission delay becomes large when the node number is large, given that the number of long packets is fixed.
(4) The performance is affected much more by the nodes transmitting long packets than by those transmitting short packets.
(5) The RI-based dynamic logical cycle frame scheme is more energy efficient than the fixed scheme because it maintains only active nodes in the cycle frames.
6. Related work
The work [16] mainly focuses on the requirements of a multimedia workstation and proposes a scheduling and management policy
Fig. 10. Tq vs. node number.
that supports a mix of periodic and random jobs, and shows how it can be used for a multimedia bus. The bus considered there is a packet-switched I/O bus. It is assumed that a multimedia workstation is a collection of specialized processing units that cooperate and are linked together by a packet bus. The proposed scheme is efficient in
Fig. 11. Tq vs. bus transmission speed (B) under different node number.
Fig. 12. Performance comparison between fixed cycle frame scheme and dynamic cycle scheme.
Fig. 13. Normalized EPP comparison between fixed cycle frame scheme and dynamic cycle scheme.
the sense that it allows the bus to support both continuous media transfers and regular random transactions. In this way, continuous streams can meet their real-time constraints independently of random traffic, and random traffic is not delayed significantly by continuous traffic except when the traffic load is very high. Meyerowitz et al. [21] present a tool suite for building, simulating, and analyzing the results of hierarchical descriptions of the scheduling policy for modules sharing a bus in real-time applications. These schedules can be based on a variety of factors, including characteristics of messages and time slicing, and are represented in a hierarchical tree-like structure that specifies multiple levels of
arbitration. Moreover, the proposed approach is easy to extend to other real-time scheduling problems. The work [18] presents six messaging patterns found when carrying out architectural evaluations in the Finnish machine industry. The six design patterns are related to messaging in work machine control system architectures. In addition, the work shows a pattern language graph to demonstrate how these patterns are related to the rest of the pattern language. The work [22] investigates the scheduling problem of reducing the deviation from the expected completion time of messages by taking advantage of the characteristics of UM-BUS. The effectiveness of the proposed algorithm is evaluated by configuring different lanes to change the bus utilization.

PCI Express (Peripheral Component Interconnect Express) is a high-speed serial computer expansion bus standard designed to replace the older PCI, PCI-X, and AGP bus standards. PCIe has numerous improvements over the older standards, including higher maximum system bus throughput, lower I/O pin count, smaller physical footprint, and better performance scaling for bus devices. Nowadays, the use of accelerators such as GPUs is increasing, and many applications are being accelerated on GPUs. Using GPUs for general-purpose scientific computing has allowed researchers to handle larger and finer-grained datasets [5]. However, many researchers have concluded that it is difficult to obtain performance improvement, because efficient use of GPUs requires making good design choices. Such design choices include the type of memory allocation and the overlapping of data transfer with parallel computation. Recently, the work [6] proposes novel design guidelines for GPU-accelerated cluster systems to optimize the data transfer from host (CPU) to device (GPU) over the PCIe bus. It observes that a speedup of 2.6× can be obtained just by making good design choices.

Table 4. The comparison between UM-BUS and the other buses.

| | UM-BUS | PCIe [26] | CAN [25] | 1553B [7] | RapidIO [8] |
|---|---|---|---|---|---|
| Max. transmission rate | 6.4Gbps | 160Gbps | 1Mbps | 1Mbps | 12.5Gbps |
| Communication direction | Half duplex | Full duplex | Half duplex | Half duplex | Full duplex |
| Communication mode | Command/response, parallel/serial | Parallel | Serial | Serial | Parallel/serial |

A Controller Area Network (CAN bus) is a robust vehicle bus standard designed to allow microcontrollers and devices to communicate with each other in applications without a host computer. It is a message-based protocol, designed originally for multiplex electrical wiring within automobiles to save on copper, but it is also used in many other contexts. Recently, the work [23] analyses an example of a joint design, namely the combination of integrity with encryption under the constraints of a typical CAN network and real-time traffic. The analysis considers different attacker models, packet fragmentation issues and the residual probability of error of the combined scheme. The work [24] presents a secure protocol for authenticating ECUs on the CAN bus and establishing session keys between them. These keys are used to introduce message source authentication for the CAN network. The method is based on elliptic curve cryptography (ECC), which is more suitable for embedded systems where computational power and memory are very limited. The performance evaluation of the proposed authentication method shows better results than the existing methods in terms of bus load. The work [25] presents a CAN bus measuring system for electrical properties that is based on physical waveforms. The physical-layer waveforms transmitted on the bus network can be collected directly and stored in real time.

The 1553 bus standard defines the mechanical, electrical, and functional characteristics of a serial data bus. The 1553B bus is a digital time-division command/response multiplex data bus. Due to its high reliability and real-time performance, it is widely used in aerospace, industry and other fields.
The primary goal of the 1553B was to provide flexibility without creating new designs for each new user. This was accomplished by specifying the electrical interfaces explicitly so that electrical compatibility between designs by different manufacturers could be assured. Recently, the work [7] describes the design of a 1553B bus test unit based on a multi-core DSP and an FPGA, which is used to carry out 1553B bus protocol tests and electrical performance tests. It mainly describes the design of the 1553B bus protocol analysis module for protocol testing. The resulting test unit is reported to be convenient, highly reliable and scalable.

The RapidIO architecture is a high-performance, packet-switched interconnect technology used as a chip-to-chip, board-to-board, and chassis-to-chassis interconnect. It supports messaging, read/write and cache coherency semantics, guaranteeing in-order packet delivery and enabling power- and area-efficient protocol implementation in hardware. RapidIO has its roots in energy-efficient, high-performance computing and later extended its application to many other areas, such as communications infrastructure, industrial automation and aerospace, that are constrained by at least one of size, weight, and power [8].

Over the years, high-speed serial buses have been widely adopted in embedded systems because they have many advantages such as high transmission rate, small pin count, low cost and high reliability. The comparison of UM-BUS and several widely used buses is presented in Table 4.
7. Conclusions

UM-BUS is a novel serial bus architecture designed for embedded systems, with multi-lane concurrent transmission to enhance the bandwidth. To gain knowledge of the bus performance before real-world deployment of UM-BUS, this paper proposes a queuing theory-guided evaluation methodology to derive the lower and upper bounds, as well as an approximation of the average, of the packet delay on a UM-BUS link. The impact of bus utilization, bus transmission speed and the number of nodes issuing long packets on bus performance is discussed. Then, an RI-based dynamic cycle frame scheme is proposed to improve the bus efficiency. Finally, design hints are presented based on the simulation results. The analytical model, along with the simulation results, is expected to offer a valuable reference for real-world deployment.
Acknowledgment

This work is supported by grants from the Beijing Innovation Center for Future Chip, the Beijing Advanced Innovation Center for Imaging Technology, the National Natural Science Foundation of China [Project Nos. 61502321, 61872251] and the Project of Beijing Municipal Education Commission [Project No. KM201710028016].
References

[1] Jiqin Zhou, Weigong Zhang, Keni Qiu, Xiaoyan Zhu, UM-BUS: an online fault-tolerant bus for embedded systems, 2016 17th International Symposium on Quality Electronic Design (ISQED) (2016) 198–204.
[2] P. imonk, T. Mrovc, J. Tak, Principles and techniques for analysis of automotive communication lines and buses, 2014 International Conference on ELEKTRO (2014) 500–503.
[3] Shi Shengbing, Song Chunyan, Wu Yanlin, CAN bus performance test technology, 2015 12th IEEE International Conference on Electronic Measurement Instruments (ICEMI) (2015) 1395–1399.
[4] A. Anta, P. Tabuada, On the benefits of relaxing the periodicity assumption for networked control systems over CAN, 2009 30th IEEE Real-Time Systems Symposium (RTSS) (2009) 3–12.
[5] N.K. Govindaraju, S. Larsen, J. Gray, D. Manocha, A memory model for scientific algorithms on graphics processors, SC 2006 Conference, Proceedings of the ACM/IEEE (2006) 6.
[6] J. Bhimani, M. Leeser, N. Mi, Design space exploration of GPU accelerated cluster systems for optimal data transfer using PCIe bus, 2016 IEEE High Performance Extreme Computing Conference (HPEC) (2016) 1–7.
[7] Q. Jia, Z. Liu, The design of a kind of 1553 bus test unit based on multi-core DSP and FPGA, 2017 2nd IEEE International Conference on Integrated Circuits and Microsystems (ICICM) (2017) 205–208.
[8] RapidIO Trade Association, RapidIO Interconnect Specification Part 1: Input/Output Logical Specification, Rev. 2.1.
[9] K. Schmidt, E.G. Schmidt, Optimal message scheduling for the static segment of FlexRay, 2010 IEEE 72nd Vehicular Technology Conference (VTC) (2010) 1–5.
[10] Martin Lukasiewycz, Michael Glass, Jurgen Teich, Paul Milbredt, FlexRay schedule optimization of the static segment, Proceedings of the 7th IEEE/ACM International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) (2009) 363–372.
[11] R. Schneider, U. Bordoloi, D. Goswami, S. Chakraborty, Optimized schedule synthesis under real-time constraints for the dynamic segment of FlexRay, 2010 IEEE/IFIP International Conference on Embedded and Ubiquitous Computing (EUC) (2010) 31–38.
[12] D. Goswami, R. Schneider, A. Masrur, M. Lukasiewycz, S. Chakraborty, H. Voit, A. Annaswamy, Challenges in automotive cyber-physical systems design, 2012 International Conference on Embedded Computer Systems (SAMOS) (2012) 346–354.
[13] Y. Cui, R.M. Voyles, R.A. Nawrocki, G. Jiang, Morphing bus: a new paradigm in peripheral interconnect bus, IEEE Trans. Components Packag. Manuf. Technol. 4 (2) (2014) 341–351.
[14] Zhu Xiaoyan, W. Zhang, J. Wang, Q. Duan, S. Liu, The design of high reliable serial system bus, 2010 International Conference on Computer Design and Applications (ICCDA), vol. 4 (2010).
[15] Daniel A. Menasce, Leonardo Lellis, P. Leite, Performance evaluation of isolated and interconnected token bus local area networks, Proceedings of the 1984 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems (SIGMETRICS) (1984) 167–175.
[16] Saied Hosseini Khayat, Andreas D. Bovopoulos, A simple and efficient bus management scheme that supports continuous streams, ACM Trans. Comput. Syst. 13 (2) (1995) 122–140.
[17] K. Song, C. Li, L. Ye, B. Chen, A. Hereid, Signal integrity optimization of MLVDS based multi-master instrument bus, 2014 IEEE International Symposium on Electromagnetic Compatibility (EMC) (2014) 433–437.
[18] Veli Pekka Eloranta, Johannes Koskinen, Messaging patterns for distributed machine control systems, Proceedings of the 16th European Conference on Pattern Languages of Programs (EuroPLoP) (2012) 12:1–12:18.
[19] T. Ichrak, E. Nourredine, Basic control theory applied to the active queue management model, 2012 International Conference on Multimedia Computing and Systems (ICMCS) (2012) 635–640.
[20] M. Guizani, A. Rayes, B. Khan, A. Al-Fuqaha, Queuing Theory (2010) 197–233.
[21] T. Meyerowitz, C. Pinello, A. Sangiovanni-Vincentelli, A tool for describing and evaluating hierarchical real-time bus scheduling policies, Proceedings of the 40th Annual Design Automation Conference (DAC) (2003) 312–317.
[22] J. Zhou, W. Zhang, K. Qiu, R. Bai, Y. Wang, X. Zhu, Expected completion time aware message scheduling for UM-BUS interconnected system, 2017 IEEE 20th International Symposium on Real-Time Distributed Computing (ISORC) (2017) 60–66.
[23] L. Dariz, M. Selvatici, M. Ruggeri, G. Costantino, F. Martinelli, Trade-off analysis of safety and security in CAN bus communication, 2017 5th IEEE International Conference on Models and Technologies for Intelligent Transportation Systems (MT-ITS) (2017) 226–231.
[24] S. Fassak, Y. El Hajjaji El Idrissi, N. Zahid, M. Jedra, A secure protocol for session keys establishment between ECUs in the CAN bus, 2017 International Conference on Wireless Networks and Mobile Communications (WINCOM) (2017) 1–6.
[25] H. Wu, C. Chang, L. Li, Research on CAN bus measuring technique based on collecting physical waveforms, 2017 IEEE 3rd Information Technology and Mechatronics Engineering Conference (ITOEC) (2017) 887–890.
[26] PCI Express 3.0 Base Specification Revision 3.0.