Non-minimal, turn-model based NoC routing


Microprocessors and Microsystems 37 (2013) 899–914

Contents lists available at SciVerse ScienceDirect

Microprocessors and Microsystems journal homepage: www.elsevier.com/locate/micpro

Wen-Chung Tsai a,⇑, Kuo-Chih Chu b, Yu-Hen Hu c, Sao-Jie Chen d

a Information and Communications Research Laboratories, Industrial Technology Research Institute, Hsinchu 310, Taiwan, ROC
b Department of Electronic Engineering, Lunghwa University of Science and Technology, Taoyuan 333, Taiwan, ROC
c Department of Electrical and Computer Engineering, University of Wisconsin–Madison, Madison, WI 53706-1691, USA
d Department of Electrical Engineering, Graduate Institute of Electronics Engineering, National Taiwan University, Taipei 106, Taiwan, ROC

Article info

Article history: Available online 23 August 2012

Keywords: Network-on-chip; Turn-model; Non-minimal; Routing algorithm; Fault tolerance

Abstract

In this study, it is shown that any deadlock-free, turn-model based minimal routing algorithm can be extended to a non-minimal routing algorithm. Specifically, three novel non-minimal NoC routing algorithms are proposed based on the Odd–Even, West-First, and Negative-First turn models, respectively. These algorithms are not only deadlock free and livelock free, but can also leverage non-minimal routing paths to avoid traffic congestion and improve fault tolerance. Moreover, these algorithms are backward compatible with existing minimal routing schemes. As a result, they represent an ideal routing solution to NoC-based interconnections designed for both existing and emerging embedded multicore systems. © 2012 Elsevier B.V. All rights reserved.

⇑ Corresponding author. Tel.: +886 3 5915536; fax: +886 3 5829733.
0141-9331/$ - see front matter © 2012 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.micpro.2012.08.002

1. Introduction

The Network-on-Chip (NoC) paradigm has emerged as a long-term on-chip communication solution for Multi-Processor System on Chip (MPSoC) [1] and Chip Multi-Processor (CMP) [2] microarchitectures. The most common NoC architecture is a mesh network consisting of a two-dimensional array of nodes (or tiles) located on mesh-connected grid points. Processors, memory devices and other modules are placed throughout the network and interface with local routers via a Network Interface (NI) unit. Each local router is connected to four neighboring routers in the north, east, south, and west directions, respectively. Moreover, within each router, buffers are provided for each input channel, and a cross-bar switch is used to route the incoming packets to an appropriate output port.

In NoC-based communication systems, the data packets are generally broken into a contiguous sequence of flow units known as flits. Transmitting a packet from a source to its destination requires a consecutive transmission of multiple flits over the same path. The path is chosen in a distributed manner by applying the same routing algorithm at each router encountered by the packets en route to their destinations. An efficient routing mechanism is essential in optimizing the performance of NoC-based communication systems [3,4].

Generally speaking, the routing algorithms used in NoC applications can be categorized as either minimal routing algorithms or non-minimal routing algorithms. In the former case, the algorithm selects a minimal path between the source and the destination along the way (i.e., detours are not permitted). On the other hand, in the latter case, the algorithm may consider alternative non-minimal paths.

In NoC-based systems implemented using a packet switching approach, the routing resources (i.e., the channels and buffers) are reserved for a particular data packet until all of the flits belonging to that packet have been transferred. If the input port buffer of a neighboring router en route is full, the packet must wait at the present router until that input buffer is vacated. In other words, the channel occupied by the current packet is dependent on the input channel of the neighboring router. If the channel dependence of a set of paths leads to a circular dependence relation, none of the packets are able to proceed, and the so-called deadlock condition occurs.

A key strategy when developing deadlock-free routing algorithms is to ensure that cyclic channel dependence cannot occur. Toward this goal, several turn-model based routing algorithms have been proposed [5,6]. In general, such models prohibit the packets from making specific types of turns so as to prevent the formation of cycles among them. As such, Glass and Ni [5] claimed that "routing algorithms that employ the remaining turns are deadlock free, livelock free, minimal or non-minimal, and highly adaptive for the network". However, despite the claim of [5] that turn models are applicable to both minimal and non-minimal routing paths, existing turn-model based algorithms, e.g., those based on the West-First, North-Last, and Negative-First turn models [5] or the Odd–Even turn model [6], are all constrained to the use of minimal routing


paths. Moreover, Chiu [6] commented that "non-minimal paths are possible with the Odd–Even turn model. However, non-minimal paths are not guaranteed to exist". Such observations reflect the realization that, while not impossible, developing turn-model based non-minimal routing algorithms is a non-trivial endeavor.

Nevertheless, as pointed out by Glass and Ni [5], "Non-minimal routing, however, is more adaptive and fault tolerant". Inherently, non-minimal routing paths provide greater flexibility in avoiding local routing hot spots or searching for detours when encountering a defective channel. Hence, it is beneficial to develop non-minimal routing algorithms under the turn-model constraints.

In this paper, three novel non-minimal routing algorithms based on the Odd–Even [6], West-First, and Negative-First [5] turn models are proposed. In particular, new heuristic routing rules are developed to permit the exploration of non-minimal routing paths without violating the turn-prohibition rules of the corresponding turn model. It is shown that the proposed non-minimal routing algorithms are deadlock free, livelock free, and backward compatible with existing minimal routing schemes. It is also shown experimentally that the proposed non-minimal routing algorithms achieve a significant performance improvement over existing minimal routing schemes for certain on-chip data traffic patterns.

The remainder of this paper is organized as follows. Section 2 presents the background on the NoC routing problem and describes the challenges involved in developing non-minimal routing algorithms under turn-model constraints. Section 3 introduces the proposed turn-model based non-minimal routing algorithms. Section 4 presents the performance evaluation results. Finally, Section 5 provides some brief concluding remarks.

2. Background

In this section, fundamental characteristics of NoC routing schemes including existing turn-model based non-minimal routing algorithms will be reviewed, and the advantages and disadvantages of non-minimal routing approaches will be discussed.

2.1. Characterization of NoC routing

NoC routing algorithms utilize various strategies in selecting suitable paths. For example, they may always choose a pre-determined path between each particular pair of source and destination nodes (deterministic), or may dynamically choose different paths depending on the traffic conditions (adaptive), or may choose among a set of alternative paths which obey certain routing rules (partially adaptive). Furthermore, NoC routing algorithms may be constrained to the use of only shortest paths (minimal), or may be permitted to select detour routes (non-minimal) for congestion avoidance or fault tolerance purposes.

In minimal path routing, the path length is equal to the two-dimensional city block (mesh) distance between the source and the destination. The constraint of using only minimal paths has the advantage of guaranteeing livelock freedom and a minimal hop count for each packet. It simplifies the design of a deadlock-free routing algorithm, but may lead to performance degradation and function loss. For example, as shown in Fig. 1a, the minimal routing path may carry a heavy traffic load, causing an excessive delay (latency) for any flit which must traverse those links. By contrast, non-minimal routing paths offer alternative light-traffic routes which provide additional link bandwidth and therefore reduce the overall latency by avoiding contention (see Fig. 1b). Furthermore, as shown in Fig. 1c and d, non-minimal routing provides a significantly improved fault-tolerance performance.

2.2. Deadlock and livelock

Deadlock is an anomalous network state in which a circular hold-and-wait dependency relation is formed among the network resources, causing the routing of the packets to be indefinitely postponed (see Fig. 2). With livelock, a packet travels continuously around the network without ever reaching its destination, since the requested channels are constantly occupied by other packets. Livelock occurs only in adaptive and non-minimal routing. In contrast to deadlock, livelock is relatively easy to resolve by imposing certain rules on the non-minimal path selection process [7].

2.3. Deadlock-free routing policies

Existing NoC routing algorithms utilize one of the following two strategies to handle the deadlock condition, namely deadlock avoidance or deadlock recovery (see Fig. 3). Deadlock avoidance schemes impose additional constraints on the routing algorithm to ensure that deadlock can never occur. Deadlock recovery approaches, on the other hand, detect and resolve the deadlock situation as it occurs and then continue with normal routing operations.

Deadlock avoidance is realized using either a turn-restriction based scheme (e.g., Up/Down [8] and Segment-based Routing [9]) or a turn-model based routing algorithm (e.g., XY and Odd–Even [6]). Turn-restriction based schemes place turn restrictions depending on the network topologies. In contrast, turn-model based schemes rely on the regularity of a network topology (e.g., a mesh) to prohibit the use of certain turns at fixed positions. Therefore, turn-restriction based schemes can be topology agnostic and fault tolerant, compared to turn-model based ones. However, turn-restriction based schemes [8,9] are deterministic routing methodologies and cannot provide timely, local adaptive routing decisions for congestion control. In a mesh network, the adaptive routing capability of turn models [5,6] can significantly benefit the routing performance as demonstrated in [10].

Table 1 summarizes the major turn-model based minimal length routing algorithms currently available for deadlock and livelock avoidance. As shown, in each algorithm, the packets are routed to their destinations without using certain turns. Among the various turn models proposed, the Odd–Even turn model [6] is one of the most elaborate, and its associated minimal routing algorithm, ROUTE, has been extensively applied in NoCs (e.g., NoP [10], DyAD [11], BiNoC [12], Schafer et al. [13], Lin et al. [14], and Wu [15]). In general, these turn-model based routing algorithms have a lower implementation complexity and a more flexible routing performance than the other deadlock-free approaches such as Virtual Channel (VC) methods [16,17] and deflection routing algorithms [18,19], or deadlock recovery approaches [20,21].

2.4. Odd–Even turn model and associated routing algorithm (ROUTE)

NoC routing algorithms utilize the rules specified within the adopted turn model to route the packets toward their destination in such a way that prohibited turns are avoided and the packets do not become stalled. This section reviews the turn rules in the Odd–Even turn model [6] and describes the routing criteria applied in ROUTE, the corresponding minimal routing algorithm. The Odd–Even turn model is governed by the following turn rules:

Turn Rule 1: No packet is allowed to make an EN turn at any router located in an even column, or an NW turn at any router located in an odd column.

Turn Rule 2: No packet is allowed to make an ES turn at any router located in an even column, or an SW turn at any router located in an odd column.


Fig. 1. Local congestion scenarios in (a) minimal routing and (b) non-minimal routing; and faulty link cases in (c) minimal routing and (d) non-minimal routing.

Fig. 2. (a) Simple circular hold-and-wait dependency in deadlock conditions and (b) eight possible turn types in a two-dimensional mesh network.

Fig. 3. Taxonomy of deadlock-free routing schemes.
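The Odd–Even turn rules of Section 2.4, together with the 180° prohibition, can be sketched as a small predicate. This is only an illustrative sketch: the function name `odd_even_turn_allowed` is hypothetical, a turn is written as the current heading followed by the new heading (e.g., "EN" for east to north), and columns are assumed to be numbered from 0 at the west edge of the mesh.

```python
OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}


def odd_even_turn_allowed(heading, next_heading, column):
    """Return True if a packet travelling `heading` may leave in `next_heading`.

    `column` is the column index of the router where the turn is made
    (assumption: the west-most column has index 0 and is therefore even).
    """
    if next_heading == heading:
        return True  # going straight is not a turn
    if next_heading == OPPOSITE[heading]:
        return False  # 180-degree turns are never allowed
    turn = heading + next_heading
    if column % 2 == 0:
        return turn not in ("EN", "ES")  # even column: no EN or ES turn
    return turn not in ("NW", "SW")      # odd column: no NW or SW turn
```

For instance, `odd_even_turn_allowed("E", "N", 2)` is false because column 2 is even, whereas the same turn in column 3 is permitted.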

Table 1
Routing algorithms, turn models, and prohibited turns.

Routing algorithm   Turn model name   Prohibited turn (b)
XY                  NA (a)            NW, SE, NE, SW
West-First          West-First        NW, SW
North-Last          North-Last        NW, NE
Negative-First      Negative-First    NW, ES
ROUTE               Odd–Even          NW, SW in odd column; EN, ES in even column

(a) No particular turn model name.
(b) N, E, S, W represent north, east, south, west, respectively.

Turn Rule 3: (Derived from Theorem 1 of [6]). No packet is allowed to make a 180° turn at any router.

In designing any Odd–Even turn-model based routing algorithm, the routing criteria must be consistent with the rules specified by the Odd–Even turn model. For example, the Minimal Routing (MinR) Criteria in ROUTE [6] are as follows:

MinR Criterion 1: No packet may move in a direction away from its destination (i.e., path selection is constrained to minimal routes).

MinR Criterion 2: If the destination of a packet is to the west of its source, the packet may not move north or south at any intermediate routers residing in an odd column unless the destination is located in the same column (see Fig. 4a).

MinR Criterion 3: If the destination of a packet is to the east of its source and is located in an even column, the packet must finish routing in the north or south direction before it reaches the column in which the destination is located (see Fig. 4b).

The Odd–Even turn model is regarded as the current state-of-the-art turn model, since it does not prohibit certain turns at all positions, and therefore has a higher degree of routing adaptivity than other turn models [6]. As a result, the ROUTE algorithm (see


Fig. 2 of [6]) is more elaborate than other routing algorithms such as West-First, North-Last, and Negative-First [5].

2.5. Odd–Even based routing algorithms

Numerous NoC routing algorithms (e.g., NoP [10], DyAD [11], and BiNoC [12]) utilize the Odd–Even turn model to prevent deadlock. However, these algorithms are all limited to the use of minimal routing paths. The non-minimal routing potential of the Odd–Even turn model has been only partially discussed in the literature. In the Odd–Even based routing algorithms proposed in [13–15], deadlock freedom is guaranteed by using a set of pre-defined non-minimal routing paths to route around faulty nodes or rectangular blocks in the network. However, the use of these non-minimal routing paths is still subject to a minimal total length constraint. That is, for each source–destination pair, the path selection is still limited to the path (or paths) with the shortest length among all possible minimal and non-minimal routing paths. Moreover, the traffic loads in a congested minimal path cannot be redistributed to alternative non-minimal paths to enhance the network performance. Thus far, no routing algorithm has exploited the full potential of the Odd–Even turn model.

2.6. Existing non-minimal routing schemes

Minimal routing path schemes guarantee the shortest path from a source to its destination. However, it is imprudent to ignore the potential performance benefits afforded by the use of non-minimal paths. For example, if the output channels in all the minimal routing paths are full or faulty, mis-routing the packets along a non-minimal path may be a viable alternative [22,23]. Thus, as described in the following, various non-turn-model based, non-minimal routing schemes have been proposed.

The Truly Fully Adaptive Routing (TFAR) algorithms remove any restriction on the routing path selection, and therefore provide a better routing performance and fault tolerance than minimal routing schemes. However, TFAR algorithms are prone to suffer from deadlock, and thus deadlock recovery techniques are required [20,21]. Owing to the inherently complicated nature of such deadlock detection and recovery mechanisms [24], and the unpredictable period from a deadlock being detected to the deadlock being released [25], deadlock-recovery techniques are relatively uncommon in NoCs [26].

Deflection routing [18,19] is another form of non-minimal routing algorithm. In deflection routing, no buffer is required and deadlock is avoided by transferring the packets continuously between the routers. Under light-traffic loads, deflection routing achieves a good performance [19]. However, in congested networks, the packets are likely to be mis-routed away from their destinations and the throughput saturation point is reached at a lower packet injection rate than in networks using buffered routing algorithms [19]. Moreover, bufferless routing schemes yield a poor flow control implementation [27].

2.7. Turn-model based non-minimal routing schemes

Compared with other deadlock-free schemes, turn-model based routing algorithms are extensively used in NoCs due to their relatively low cost and high performance. However, most existing turn-model based schemes take only the minimal routing paths into consideration. That is, the inherent non-minimal properties of existing turn models are not fully exploited despite potential performance benefits. In this paper, three novel non-minimal routing algorithms based on the Odd–Even [6], West-First, and Negative-First [5] turn models, respectively, are proposed.

3. Design methodologies

In this section, details of the proposed design methodologies are discussed. These methodologies include the definition of a direction-aware path selection mechanism; a description of the proposed turn-model based non-minimal routing algorithms; a formal proof of the deadlock free, livelock free, and minimal routing compatibility of the three schemes; and finally, a comparison of the routing adaptivity of the three algorithms with that of existing schemes.

3.1. Prioritization of available direction sets

At any intermediate router, a turn-model based adaptive routing algorithm must determine the set of directions toward which the packet will be forwarded in the next hop. In this study, this set of directions is referred to as the Available Direction Set (ADS). A priority class is assigned to the ADS at any intermediate router as follows:

1. ADS-Priority_0 (ADS-P0), if ADS contains the same set of routing directions as those of the minimal routing path.
2. ADS-Priority_1 (ADS-P1), if ADS contains the set of routing directions that are 90° off those of the minimal routing path.
3. ADS-Priority_2 (ADS-P2), if ADS contains the set of routing directions that are 180° off those of the minimal routing path.

As shown in Fig. 5a, when a routing direction is selected from an ADS-P0, the packets are forwarded along a minimal path. On the

Fig. 4. (a) Minimal Routing (MinR) Criterion 2 and (b) Minimal Routing (MinR) Criterion 3.


other hand, when a routing direction is selected from an ADS-P1 or an ADS-P2, the packets are forwarded along a non-minimal path.

3.2. Non-minimal routing algorithm based on Odd–Even turn model

The non-minimal routing algorithm based on the Odd–Even turn model (designated as NMinR-OE), as proposed in this work, is based on ROUTE with an important improvement: the minimal routing limitation (MinR Criterion 1 in Section 2.4) in the Odd–Even turn-model based ROUTE algorithm (designated henceforth as ROUTE-OE) is replaced by a new set of Non-Minimal Routing (NMinR) Criteria developed for both the minimal and non-minimal routing paths. In particular, all the minimal routing paths of ROUTE-OE are included as a subset of those in NMinR-OE.

NMinR Criterion 1: If the destination of a packet is on the west side of its source, the packet is prohibited from moving east.

When applying the Odd–Even turn model, as described by Chiu [6], "the key to deadlock freedom is that the rightmost column segment of a circular waiting path, which is essential for a deadlock state, can never be formed". That is, in the absence of the rightmost (or east-most) column segment of a circular path, once a packet has been routed to the east, it is impossible for the packet to be subsequently routed toward the west. Thus, in NMinR-OE, packets are permitted to move in the eastward direction only when the destination is located on the east side of the current router, as shown in Fig. 5b. In addition, in accordance with MinR Criterion 2 of ROUTE-OE, packets are prohibited from moving north or south at any intermediate routers residing in an odd column.

NMinR Criterion 2-1: If the destination of a packet is to the north of its source, the packet is prohibited from moving east.

This criterion is required, since if a packet takes a mis-route to the east, then, according to NMinR Criterion 1, it cannot reach a destination currently located to its west (see Fig. 6a).

NMinR Criterion 2-2: If the destination of a packet is to the north of its source, the packet is prohibited from moving south at any intermediate router residing in an odd column or in the west-most even column (i.e., col. 0).

If a packet whose destination lies to the north is mis-routed to the south, then, according to Turn Rule 2 of the Odd–Even turn model, it cannot subsequently turn west. Moreover, in accordance with NMinR Criterion 1, it cannot turn east. Finally, according to Turn Rule 3 of the Odd–Even turn model, it cannot return in the northward direction. In other words, the packet can only move south and will therefore ultimately stall at the southernmost boundary of the network. However, as shown in Fig. 6b, NMinR Criterion 2-2 prevents this anomalous state from occurring, since the packet is prohibited from moving south in advance.

NMinR Criterion 3-1: If the destination of a packet is to the south of its source, the packet is prohibited from moving east.

The motivation for this criterion is similar to that for NMinR Criterion 2-1 (see Fig. 6c).

NMinR Criterion 3-2: If the destination of a packet is to the south of its source, the packet is prohibited from moving north at any intermediate router residing in an odd column or in the west-most even column (i.e., col. 0).

The motivation for this criterion is similar to that for NMinR Criterion 2-2 (see Fig. 6d).

NMinR Criterion 4-1: If the destination of a packet is to the east of its source and is located in the next even column, the packet is prohibited from moving north or south at any router residing in an odd column (see Fig. 7a).

Since in the Odd–Even turn model, NW, SW, and 180° turns are not allowed at any router in the odd columns, if the packet is to be routed in the north or south direction next, the packet can only take either an NE or an SE turn. Furthermore, since an ES or EN turn is not permitted in the next even column, the packet can only move continuously in the east direction. That is, for the reason described above for NMinR Criterion 1, the packet cannot reverse its direction of travel and move in the westward direction.

NMinR Criterion 4-2: If the destination of a packet is to the northeast of its source and is located in the next even column, the packet is prohibited from moving east or south at any router residing in an odd column (see Fig. 7b).

Note that a move in the east direction is prohibited in accordance with MinR Criterion 3, while a move in the south direction is prohibited in accordance with NMinR Criterion 4-1.

Fig. 5. (a) Priorities of Available Direction Sets (ADSs) and (b) Non-Minimal Routing (NMinR) Criterion 1 for source nodes in odd or even columns.
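The ADS priority classes of Section 3.1 (Fig. 5a) can be sketched as follows. This is one possible reading of the definitions, not the authors' implementation: the function name `classify_directions` is hypothetical, coordinates are assumed to be (x, y) with x growing eastward and y growing northward, and for a diagonal destination the two productive directions are assumed to form ADS-P0 while their opposites form ADS-P2. Turn-model restrictions are deliberately not applied here.

```python
OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}
ALL_DIRECTIONS = {"N", "E", "S", "W"}


def classify_directions(current, destination):
    """Group the four directions into ADS priority classes for one router.

    Assumes `current != destination`; both are (x, y) tuples.
    """
    cx, cy = current
    tx, ty = destination
    productive = set()  # directions that reduce the distance to the destination
    if tx > cx:
        productive.add("E")
    elif tx < cx:
        productive.add("W")
    if ty > cy:
        productive.add("N")
    elif ty < cy:
        productive.add("S")
    p0 = productive                            # on a minimal path
    p2 = {OPPOSITE[d] for d in productive}     # 180 degrees off the minimal path
    p1 = ALL_DIRECTIONS - p0 - p2              # 90 degrees off the minimal path
    return {"P0": p0, "P1": p1, "P2": p2}
```

With a destination due east of the current router, the sketch yields P0 = {E}, P1 = {N, S}, and P2 = {W}; a routing function would then remove any direction forbidden by the adopted turn model and the NMinR criteria before choosing among the classes in priority order.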


Fig. 6. Non-Minimal Routing (NMinR) criteria (a) Criterion 2-1, (b) Criterion 2-2, (c) Criterion 3-1, and (d) Criterion 3-2.

Fig. 7. Non-Minimal Routing (NMinR) criteria (a) Criterion 4-1, (b) Criterion 4-2, and (c) Criterion 4-3.

NMinR Criterion 4-3: If the destination of a packet is to the southeast of its source and is located in the next even column, the packet is prohibited from moving east or north at any router residing in an odd column (see Fig. 7c).

The justification for this criterion is similar to that for NMinR Criterion 4-2. The listing of the NMinR-OE algorithm can be found in Appendix A of this paper.

3.3. Non-minimal routing algorithms based on West-First and Negative-First turn models

As with the Odd–Even turn model, the West-First and Negative-First turn models proposed in [5] also possess an inherent non-minimal routing capability. In this section, two novel non-minimal routing algorithms based on the West-First and Negative-First turn models, respectively designated as NMinR-WF and NMinR-NF, are presented.

First, in the West-First turn model, all turns to the west (NW, SW) are prohibited. That is, a packet that needs to travel west must do so at the start of its route. Therefore, the proposed NMinR-WF routing algorithm has a non-adaptive routing property when the destination of a packet is west of the source. However, if the destination is located on the east side of the source, full routing adaptivity is retained.

Second, the Negative-First turn model prohibits the turns at the northeast corner (NW, ES) of possible clockwise or counter-clockwise circular paths to prevent deadlock. In other words, a packet can go through the other turns on a mis-routing path to the southwest of its destination, as described in the NMinR-NF routing algorithm proposed in this study. Full listings of the NMinR-WF and NMinR-NF routing algorithms can be found in Appendix A at the end of this paper.

3.4. Proofs of deadlock free, livelock free and minimal routing compatibility properties

In this section, it is shown that any deadlock-free, turn-model based minimal routing algorithm can be extended to a non-minimal routing algorithm which is guaranteed to be both deadlock free and livelock free in a closed mesh network. (Note that the non-minimal routing algorithm retains all the minimal routing paths of the original algorithm.) Utilizing the same format as that used for the Odd–Even turn model in [6], the proofs are given as follows:

Definition 1. A non-minimal routing algorithm is a routing algorithm in which the path selections are not limited to the minimal routes between each source-destination pair.

Definition 2. A closed network is a network in which the topology, size, and turn prohibition positions are static during normal operations.

Theorem 1. A non-minimal routing algorithm which follows all the rules of the adopted turn model is deadlock free in a closed mesh network provided that 180° turns are prohibited.

Proof. The theorem is proved by contradiction. Assume that, among mis-routed packets, there exists a set of deadlocked packets pa, pb, ..., pl (with destinations in many positions). Assume also that these packets form a circular waiting path. Since 180° turns are prohibited, the circular path must include a minimum of four different clockwise 90° turns, four counter-clockwise 90° turns, or three clockwise plus three counter-clockwise 90° turns (a path shaped like a figure eight). However, the adopted turn model removes a turn that is essential to forming such a circular path. That is, one of the clockwise 90° turns and one of the counter-clockwise 90° turns are prohibited in the adopted turn model. Thus, a contradiction arises. Hence, the theorem is proved. □

Theorem 2. A non-minimal routing algorithm which follows all the rules of the adopted turn model is livelock free in a closed mesh network.

Proof. The theorem is again proved by contradiction. Assume that there exists a packet pl which is livelocked. That is, pl travels continuously around the network without ever reaching its destination. Consequently, in a closed network (e.g., an on-chip network), pl must visit a router at least twice as part of the same path. However, in accordance with Theorem 1, a circular path cannot exist within the network. In other words, a contradiction arises, and thus the theorem is proved. □
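As an illustrative check of Theorem 1 (a sketch that complements, but is not part of, the proof), the channel dependency graph of a small mesh can be built while allowing every turn that the adopted turn model does not prohibit, including turns that lead away from the destination, and then tested for cycles. All function names below are hypothetical; columns are assumed to be numbered from 0 at the west edge, and the prohibited turns are taken from Table 1.

```python
from collections import deque

# Unit steps for the four headings (x grows eastward, y grows northward).
STEP = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0)}
OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}


def odd_even_prohibited(turn, column):
    # EN/ES are barred in even columns, NW/SW in odd columns.
    return turn in ("EN", "ES") if column % 2 == 0 else turn in ("NW", "SW")


def west_first_prohibited(turn, column):
    return turn in ("NW", "SW")


def negative_first_prohibited(turn, column):
    return turn in ("NW", "ES")


def unrestricted(turn, column):
    return False  # no 90-degree turn is prohibited


def dependency_graph_is_acyclic(width, height, prohibited):
    """Return True if the channel dependency graph of the mesh has no cycle."""
    # A channel is identified by the router it leaves and its heading.
    channels = [
        (x, y, d)
        for x in range(width)
        for y in range(height)
        for d, (dx, dy) in STEP.items()
        if 0 <= x + dx < width and 0 <= y + dy < height
    ]
    successors = {c: [] for c in channels}
    indegree = {c: 0 for c in channels}
    for (x, y, d_in) in channels:
        nx, ny = x + STEP[d_in][0], y + STEP[d_in][1]  # router where the turn is made
        for d_out in STEP:
            if d_out == OPPOSITE[d_in]:
                continue  # 180-degree turns are prohibited
            if d_out != d_in and prohibited(d_in + d_out, nx):
                continue  # turn barred by the turn model
            nxt = (nx, ny, d_out)
            if nxt in successors:  # skip channels that would leave the mesh
                successors[(x, y, d_in)].append(nxt)
                indegree[nxt] += 1
    # Kahn's algorithm: every channel can be removed only if no cycle exists.
    queue = deque(c for c in channels if indegree[c] == 0)
    removed = 0
    while queue:
        c = queue.popleft()
        removed += 1
        for nxt in successors[c]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return removed == len(channels)
```

Calling `dependency_graph_is_acyclic(4, 4, odd_even_prohibited)` reports an acyclic graph, whereas passing `unrestricted` exposes a cycle, which is the circular waiting path that the turn model is designed to rule out.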

Theorem 3. A turn-model based non-minimal routing algorithm includes all the minimal routing paths generated by the corresponding minimal routing algorithm provided that the same turn model is used in both cases.

Proof. The theorem is once again proved by contradiction. Assume that, following the turn rules of the adopted turn model, a minimal routing path (a, b, c) from router A to router C through router B is invalid in the non-minimal routing algorithm, but is valid in the corresponding minimal routing algorithm. That is, in the non-minimal routing algorithm, the path (a, b, c) can form a circular routing path (..., a, b, c, ..., a, b, c, ...) with other valid paths with legal (un-prohibited) turns provided by the non-minimal routing algorithm. In accordance with Theorem 1, the circular path includes an illegal (prohibited) turn. Accordingly, there is an illegal (prohibited) turn within the path (a, b, c). However, the path (a, b, c) is valid in the minimal routing algorithm. This means that the possible turn in router B is legal (un-prohibited) or not existent. Thus, a contradiction arises. Hence, the theorem is proved. □

3.5. Routing adaptivity comparison

Routing adaptivity is one of the most important metrics of network performance [28] and fault tolerance [7]. In [5,6], the adaptivity of the minimal routing algorithm was quantified in terms of the number of minimal paths, $P_{\mathrm{algorithm}}$, allowed by the algorithm between source $(x_s, y_s)$ and destination $(x_d, y_d)$. For example, defining $\Delta x = x_d - x_s$, $\Delta y = y_d - y_s$, $d_x = |\Delta x|$, and $d_y = |\Delta y|$, the number of minimal paths in a mesh network for a Fully Adaptive minimal Routing (FAR) algorithm can be expressed as

$$P_{\mathrm{FAR}} = \frac{(d_x + d_y)!}{d_x!\, d_y!} \qquad (1)$$

Clearly, similar equations do not calculate the number of the possible non-minimal routing paths. For non-minimal routing algorithms, the routing adaptivity can be defined in terms of the sum of the available directions (i.e., the degree of output port selections) for each destination of all the source nodes in an $m \times n$ mesh network. The number of Available Directions (ADir) for a Truly Fully Adaptive Routing (TFAR) algorithm is

$$ADir_{\mathrm{TFAR}} = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \text{Available Directions of Node}(i, j)_{\mathrm{TFAR}} \qquad (2)$$

Thus, the Adaptivity Degree (ADeg) of a given routing algorithm can be defined as

$$ADeg_{\mathrm{algorithm}} = \frac{ADir_{\mathrm{algorithm}}}{ADir_{\mathrm{TFAR}}} \qquad (3)$$
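Eqs. (1)–(3) can be exercised with a short sketch. The interpretation used here is an assumption: for every ordered source–destination pair, ADir counts the output directions available at the source router. Under that reading the sketch reproduces the TFAR, XY, and MinR-WF entries of Table 2 for an 8 × 8 mesh. The function names are hypothetical, and the West-First count assumes that two minimal directions are available only when the destination lies strictly to the east and in a different row.

```python
from math import comb


def minimal_path_count(src, dst):
    """Eq. (1): number of minimal paths between two mesh nodes."""
    dx = abs(dst[0] - src[0])
    dy = abs(dst[1] - src[1])
    return comb(dx + dy, dx)


def tfar_directions(src, dst, m, n):
    # TFAR may use every output port that leads to a neighbouring router.
    x, y = src
    return (x > 0) + (x < m - 1) + (y > 0) + (y < n - 1)


def xy_directions(src, dst, m, n):
    return 1  # deterministic routing: exactly one direction per pair


def west_first_minimal_directions(src, dst, m, n):
    # Assumed reading: adaptive only for destinations strictly to the east
    # that also require a north or south move.
    return 2 if dst[0] > src[0] and dst[1] != src[1] else 1


def adir(m, n, directions_at_source):
    """Eq. (2)-style sum over all ordered source-destination pairs."""
    nodes = [(x, y) for x in range(m) for y in range(n)]
    return sum(
        directions_at_source(s, d, m, n)
        for s in nodes
        for d in nodes
        if s != d
    )


def adeg(m, n, directions_at_source):
    """Eq. (3): adaptivity degree normalized to TFAR."""
    return adir(m, n, directions_at_source) / adir(m, n, tfar_directions)
```

For an 8 × 8 mesh, `adir(8, 8, tfar_directions)` evaluates to 14112 and `adir(8, 8, xy_directions)` to 4032, so the adaptivity degree of XY is 4032/14112, or about 0.29.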

Fig. 8 shows the sum of the available directions for the TFAR, XY, ROUTE-OE, and NMinR-OE routing algorithms. Meanwhile, Table 2 displays the adaptivity degree of each routing algorithm normalized to that of TFAR. The results show that XY has the lowest adaptivity degree (0.29), while NMinR-OE has the highest (0.71). From inspection, each non-minimal routing algorithm achieves a significant improvement compared with the corresponding minimal routing algorithm. For example, the adaptivity degree of NMinR-OE is 65.12% (from 0.43 to 0.71) higher than that of ROUTE-OE, and this routing adaptivity enhancement would greatly benefit both network performance and fault tolerance.

3.6. Fault tolerance improvement

Non-minimal routing algorithms provide multiple routing paths between a source–destination pair. In the presence of faulty links, these paths offer additional flexibility to transfer packets via alternative routes. This kind of fault-tolerance [7] against link failure is a desirable feature that is not shared with the more rigid minimal routing algorithms. Such examples are depicted in

Fig. 8. Sum of available directions in 8 × 8 mesh network for TFAR, XY, ROUTE-OE, and NMinR-OE routing algorithms.


W.-C. Tsai et al. / Microprocessors and Microsystems 37 (2013) 899–914

Table 2. Comparisons of Available Directions (ADir) and Adaptivity Degree (ADeg) among different routing algorithms in 8 × 8 mesh network.

Alg.    TFAR    XY      MinR-WF   NMinR-WF   MinR-NF   NMinR-NF   ROUTE-OE   NMinR-OE
ADir    14112   4032    5600      8904       5852      8946       6104       10038
ADeg    1.00    0.29    0.40      0.63       0.41      0.63       0.43       0.71

Fig. 9. Traditional turn-model based adaptive routing algorithms support path selection among the available minimal routing paths, and thus provide a certain level of fault-tolerance capability. For example, in the right panel of Fig. 9a, due to the faulty link between node #4 and node #7, nodes #0, #2, #3, and #5 fail to reach the destination node #7 using XY routing. However, these four nodes can still reach the destination with ROUTE-OE, as shown in the right panel of Fig. 9b. Non-minimal routing algorithms such as NMinR-OE provide even more alternative routing paths for packets to route around the faulty region, as shown in Fig. 9c. For example, in the right panels of Fig. 9a and b, node #1 and node #4 fail with both XY and ROUTE-OE. In Fig. 9d and e, however, the non-minimal routing paths provided by NMinR-OE take over the function of the faulty minimal path by mis-routing packets to the destination node #7.

4. Experimental results

In this section, the performance and implementation cost of the proposed Non-Minimal Routing (NMinR) algorithms are compared against those of the Minimal Routing (MinR) schemes.

4.1. Path selection and switch arbitration

Assume that a direction-aware path selection mechanism classifies the available output directions at each router in the priority

Fig. 10. NoC router architecture.

sequence ADS-P0, ADS-P1, and ADS-P2 (see Section 3.1). When a header flit arrives at the router, the router retrieves the destination

Fig. 9. Routing examples without a faulty link and with a faulty link between node #4 and node #7 under (a) XY, (b) ROUTE-OE, and (c) NMinR-OE routing algorithms. NMinR-OE provides non-minimal routing paths in (d) between node #4 and node #7 and in (e) between node #1 and node #7. Therefore, for the destination node #7, no failure is caused by the faulty link, as shown in the right panel of (c).
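The XY failures described for Fig. 9a can be reproduced with a small sketch. It assumes a 3 × 3 mesh whose nodes are numbered row by row (node = 3y + x), which is an assumption about the figure rather than something stated in the text, and the helper names are hypothetical. The second function counts the minimal paths that avoid the faulty link while ignoring turn-model prohibitions, so it only gives an upper bound on what a minimal adaptive algorithm such as ROUTE-OE can use.

```python
WIDTH = 3  # assumed 3 x 3 mesh, nodes numbered row by row: node = 3*y + x


def coords(node):
    return node % WIDTH, node // WIDTH


def node_id(x, y):
    return y * WIDTH + x


def step_toward(value, target):
    return value + (1 if target > value else -1)


def xy_path(src, dst):
    """Deterministic XY route: correct the x coordinate first, then y."""
    x, y = coords(src)
    dx, dy = coords(dst)
    path = [src]
    while x != dx:
        x = step_toward(x, dx)
        path.append(node_id(x, y))
    while y != dy:
        y = step_toward(y, dy)
        path.append(node_id(x, y))
    return path


def blocked_under_xy(dst, faulty_link):
    """Sources whose XY route to dst crosses the faulty (bidirectional) link."""
    bad = frozenset(faulty_link)
    blocked = []
    for src in range(WIDTH * WIDTH):
        if src == dst:
            continue
        path = xy_path(src, dst)
        if any(frozenset(hop) == bad for hop in zip(path, path[1:])):
            blocked.append(src)
    return blocked


def minimal_paths_avoiding(src, dst, faulty_link):
    """Count minimal paths that avoid the faulty link (turn rules ignored)."""
    bad = frozenset(faulty_link)
    if src == dst:
        return 1
    x, y = coords(src)
    dx, dy = coords(dst)
    candidates = []
    if x != dx:
        candidates.append(node_id(step_toward(x, dx), y))
    if y != dy:
        candidates.append(node_id(x, step_toward(y, dy)))
    return sum(minimal_paths_avoiding(nxt, dst, faulty_link)
               for nxt in candidates if frozenset((src, nxt)) != bad)


FAULT = (4, 7)
print(blocked_under_xy(7, FAULT))  # [0, 1, 2, 3, 4, 5]
print([s for s in range(9)
       if s != 7 and minimal_paths_avoiding(s, 7, FAULT) == 0])  # [1, 4]
```

Under the assumed numbering, XY blocks nodes #0 to #5, while only nodes #1 and #4 have no fault-free minimal path. This is consistent with the failures reported for the right panels of Fig. 9a and b, and these two nodes are the ones that need the non-minimal paths of Fig. 9d and e.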


Fig. 11. Performance of various routing algorithms in terms of (top) average latency, (middle) maximal throughput, and (bottom) mis-routing rate under (a) uniform and (b) hotspot traffic conditions.

address, reads the routing table, and then checks the buffer status in all the neighboring routers. In making a routing decision, the router selects a direction within the highest priority ADS (ADS-P0) unless the buffer of the router in the preferential direction is full, in which case a direction within the ADS with the second highest priority (ADS-P1) is selected. If buffer-full conditions are found for all three ADSs, the router waits until a neighboring router becomes available. In addition, if a header flit can be routed in two directions with an equal priority, the router transfers the header along the y dimension first (i.e., as in ROUTE [6]). Finally,

a simple round-robin policy is used in the router cross-bar arbitrations to prevent packet starvation.

4.2. Evaluation platform and simulation setting

4.2.1. NoC platform architecture

The proposed routing algorithms, NMinR-OE, NMinR-WF, and NMinR-NF, are implemented on an 8 × 8 mesh NoC whose router block diagram is shown in Fig. 10. This NoC router consists of a routing table, which can be configured by the local processing element


Fig. 12. Performance of various routing algorithms in terms of (top) average latency and (bottom) maximal throughput under (a) regional-burst traffic and (b) uniform traffic with faulty links.

attached to the network interface or by a configuration packet from another router. It also contains five router ports: the first four are located in the E, W, N, and S directions, respectively, and the fifth is interfaced to the local processing element. Each router port consists of a pair of receiving (RX) and transmission (TX) modules and an input buffer. In all simulation runs, the buffer size was set to 1024 bits. This buffer size is the same as that used in the Intel (80-core) TeraFLOPS Processor [29]. To investigate the impact of different buffer sizes [30], some simulations were repeated with half of the buffer size (512 bits). The link bandwidth was assumed to be one flit (32 bits) per cycle. Accordingly, four cycles were required to switch the header flits to an output port in a pipelined fashion. To reduce the overall packet latency, the wormhole switching technique [31] was applied such that the flits of a packet were forwarded to the next router as soon as any flit unit buffer space was available. Each processing element (node) generated packets to be transmitted to other processing elements (nodes) at a packet injection rate (packets/cycle/node) specified by the simulation software. Typically, the injection rate starts at an extremely low value and is gradually increased until the traffic is completely jammed. The packet size is assumed to be a random number uniformly distributed between 4 and 32 flits.

4.2.2. Traffic patterns

Four types of synthesized traffic patterns were generated: uniform, hotspot, regional-burst, and transpose.
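As a rough illustration, destination selection for three of these patterns (uniform, hotspot, and transpose, each specified in the remainder of this subsection) might be sketched as follows. The function names and the use of a seeded pseudo-random generator are assumptions made for illustration; the hotspot nodes and the 20% fraction are the values given in the text. The regional-burst pattern is omitted because its intra-sub-mesh rate is not quantified.

```python
import random

MESH = 8                                      # 8 x 8 mesh
HOTSPOTS = [(7, 2), (7, 3), (7, 4), (7, 5)]   # hotspot nodes named in the text
HOTSPOT_FRACTION = 0.20                       # share of packets forced to hotspots


def uniform_dest(src, rng):
    """Pick any node other than the source with equal probability."""
    while True:
        dst = (rng.randrange(MESH), rng.randrange(MESH))
        if dst != src:
            return dst


def hotspot_dest(src, rng):
    """Uniform traffic, except that 20% of packets target a hotspot node."""
    if rng.random() < HOTSPOT_FRACTION:
        return rng.choice([h for h in HOTSPOTS if h != src])
    return uniform_dest(src, rng)


def transpose_dest(src):
    """Matrix-transpose traffic: node (i, j) sends to node (j, i)."""
    i, j = src
    return (j, i)


rng = random.Random(0)  # seeded only to make the sketch repeatable
print(uniform_dest((0, 0), rng), hotspot_dest((0, 0), rng), transpose_dest((2, 5)))
```

Note that in this sketch the uniform branch can also select a hotspot node by chance, so slightly more than 20% of all packets end up at the four hotspots.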

With the uniform traffic pattern, each NoC node transmits packets according to a specified packet injection rate to any other node in the network with equal probability. To evaluate the fault-tolerant capability of the proposed NMinR algorithms, the uniform traffic pattern is also applied in a simulation where a couple of links of node (2, 2) are assumed to be broken.

With the hotspot traffic pattern, all traffic is still generated using the uniform traffic pattern, except that 20% of the generated packets are destined for four selected nodes [(7, 2), (7, 3), (7, 4), (7, 5)] with equal probability.

With the regional-burst traffic pattern, five 2 × 2 sub-meshes [(1, 1), (1, 2), (2, 1), (2, 2)], [(1, 4), (1, 5), (2, 4), (2, 5)], [(3, 3), (3, 4), (4, 3), (4, 4)], [(5, 1), (5, 2), (6, 1), (6, 2)], and [(5, 4), (5, 5), (6, 4), (6, 5)] are programmed to operate in a burst mode. That is, nodes within the same sub-mesh transmit to each other at a rate higher than the packet injection rate specified for nodes outside these sub-meshes. To ensure that the burst traffic is confined within each sub-mesh, a control bit designated as Mis-routing Disable (MD) is set in the header flits of packets generated by nodes within each sub-mesh to inform the routers to route the corresponding packets only through minimal routing paths.

With the transpose (i.e., Matrix-Transpose) traffic pattern [5,6], all packets generated by node (i, j) in the NoC mesh have node (j, i) as their destination. As such, large amounts of global traffic are generated.

Besides the four synthesized NoC traffic patterns, a real-world telecom traffic pattern drawn from the E3S benchmark of the


Embedded Microprocessor Benchmark Consortium (EEMBC) [32] is also applied in the simulation. This telecom traffic pattern includes 30 tasks (nodes) and is the largest benchmark in E3S. Thus, the result obtained from this benchmark can be considered representative. Because this traffic pattern is smaller, a 6 × 6 mesh NoC architecture was used for its simulation. Other parameters were set identical to those used for the synthesized traffic patterns.

4.2.3. Performance metrics

The performance of each algorithm was evaluated using Cadence’s NC-Verilog. For synthesized traffic patterns, performance metrics were averaged over 60,000 packets after an initial warm-up session of 30,000 arrived packets. Three performance metrics were used: latency, throughput, and mis-routing rate. Latency is defined as the average number of cycles needed to transmit a packet from its source to its destination. Throughput is the average number of packets arriving at their destinations per cycle per node. Finally, the mis-routing rate is the fraction (percentage) of packets that are routed via non-minimal routing paths. These metrics are correlated. In general, a higher mis-routing rate implies higher latency, and higher latency implies lower throughput.
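The three metrics defined above can be computed from per-packet records as sketched below. The record fields, the function name, and the convention that a packet counts as mis-routed when it takes more hops than the minimal hop count between its source and destination are illustrative assumptions, not details taken from the simulation platform. The sample records are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PacketRecord:
    """Hypothetical per-packet log entry."""
    injected_cycle: int
    arrived_cycle: int
    hops: int           # hops actually taken
    minimal_hops: int   # hop count of a minimal path


def summarize(records, measured_cycles, num_nodes, warmup_packets=0):
    """Return (latency, throughput, mis-routing rate) after discarding warm-up."""
    measured = records[warmup_packets:]
    count = len(measured)
    latency = sum(r.arrived_cycle - r.injected_cycle for r in measured) / count
    throughput = count / measured_cycles / num_nodes   # packets/cycle/node
    misrouting_rate = sum(r.hops > r.minimal_hops for r in measured) / count
    return latency, throughput, misrouting_rate


records = [
    PacketRecord(0, 40, 4, 4),    # warm-up packet, discarded
    PacketRecord(5, 25, 3, 3),
    PacketRecord(8, 48, 6, 4),    # took a non-minimal path
    PacketRecord(12, 42, 5, 5),
]
latency, throughput, misrouting_rate = summarize(
    records, measured_cycles=100, num_nodes=4, warmup_packets=1)
print(latency, throughput, misrouting_rate)
```

In the setting described above, `warmup_packets` would correspond to the 30,000 warm-up packets and the measured window to the following 60,000 packets.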

Fig. 14. Performance enhancement provided by 1024-bit buffers over 512-bit buffers.
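The routing table format of Table 3, in which each destination address holds four prioritized 2-bit output-port codes, can be sketched as follows. The direction codes follow the footnote of Table 3; the bit layout inside the byte (1st priority in the two least significant bits), the example row, and all function names are assumptions made for illustration.

```python
# Direction encoding from the footnote of Table 3.
DIRECTION_CODE = {"North": 0b00, "East": 0b01, "South": 0b10, "West": 0b11}
CODE_DIRECTION = {code: name for name, code in DIRECTION_CODE.items()}


def pack_entry(priorities):
    """Pack four direction names (1st priority first) into one byte.

    Assumed layout: 1st priority in bits 1..0, 4th priority in bits 7..6.
    """
    if len(priorities) != 4:
        raise ValueError("expected four prioritized directions")
    byte = 0
    for slot, name in enumerate(priorities):
        byte |= DIRECTION_CODE[name] << (2 * slot)
    return byte


def unpack_entry(byte):
    """Recover the four direction names, 1st priority first."""
    return [CODE_DIRECTION[(byte >> (2 * slot)) & 0b11] for slot in range(4)]


def table_bytes(num_destinations, directions_per_entry):
    """Routing table size in bytes at 2 bits per stored direction."""
    return num_destinations * directions_per_entry * 2 // 8


entry = pack_entry(["North", "East", "South", "West"])  # example row
print(hex(entry), unpack_entry(entry))
print(table_bytes(64, 4), table_bytes(64, 2))  # 64 32
```

With four 2-bit entries per destination, an 8 × 8 mesh needs 64 bytes per router, and 32 bytes suffice when only two directions are stored, which matches the table sizes discussed in Section 4.4.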

Table 3. Routing table example.

Address/output-port   4th (a)     3rd (a)     2nd (a)     1st (a)
0                     West (b)    South (b)   East (b)    North (b)
1                     South (b)   South (b)   West (b)    North (b)
2                     West (b)    West (b)    West (b)    North (b)
...                   ...         ...         ...         ...

(a) Four priorities of available output port selections.
(b) Output port (direction) encoding: North: 00, East: 01, South: 10, West: 11.

4.3. Simulation results

4.3.1. Uniform and hotspot traffic patterns

The results are summarized in Fig. 11a and b, respectively. As expected, the hotspot traffic pattern is more difficult to handle and

Fig. 13. Throughput performance of various routing algorithms under (a) transpose traffic and (b) telecom traffic with (top) 1024-bit and (bottom) 512-bit buffers.


Table 4. Implementation overhead analyses.

Items/algorithms            MinR        NMinR       NMinR–MinR
Routing size ratio (a)      2.57%       4.72%       2.15% (b)
Routing path timing         0.74 ns     0.77 ns     0.03 ns (c)
Routing power calculation   91.46 mW    95.15 mW    3.69 mW

(a) Routing size ratio = routing design size divided by the size of the designed router.
(b) The 2.15% overhead is the size ratio of the additional design to the whole router.
(c) The 0.03 ns extra delay is not located in the critical path of the designed router.

normal operation breaks down at a lower packet injection rate (0.008) than that of the uniform traffic pattern (0.01). Specifically, for the uniform traffic pattern, the NMinR-OE routing method sustained normal operation until the packet injection rate reached 0.012 packets/cycle/node, which is the highest among all six routing methods. However, at this break point, the behavior of NMinR-OE changed abruptly: the mis-routing rate jumped from 6% to 22%. This, in turn, caused the latency to increase 7-fold and the throughput to drop by more than 30%. In practice, however, the NoC will not operate in the break-down region, and hence these performance degradation figures will have little impact. From Fig. 11a, for the uniform traffic pattern, the maximal throughput of NMinR-OE was approximately 6.78% higher than that of ROUTE-OE, while the average latency was around 11.38% lower before the break-down point.

For the hotspot traffic pattern, NMinR-OE and NMinR-WF achieved similar performance; both sustained a packet injection rate of 0.009 packets/cycle/node before normal operation broke down. In these simulations, the hotspot nodes were placed near the east side boundary of the NoC mesh. As a result, a heavy traffic jam was formed in the eastern region of the network. Since NMinR-WF was programmed to detour packets westward when detecting congestion, it eased the eastern-border traffic jam and therefore improved the global network performance. By inspection, NMinR-WF outperformed MinR-WF by about 10.10% in terms of the maximal throughput and 24.28% in terms of the average latency at traffic loads equal to or lower than that associated with the maximal throughput. It is seen in Fig. 11b that NMinR-OE also performed well in distributing the traffic load under hotspot traffic conditions. For example, the maximal throughput of NMinR-OE was approximately 6.65% higher than that of ROUTE-OE, while the average latency was around 22.98% lower at traffic loads equal to or lower than that associated with the maximal throughput.

4.3.2. Regional-burst and uniform-faulty traffic patterns

In both the regional-burst traffic pattern and the uniform traffic pattern with faulty link scenarios, packets en route through congested areas or faulty regions need to be detoured to other paths. Hence the flexibility of non-minimal routing should be beneficial in these cases. As shown in Fig. 12a, NMinR-OE yielded the lowest latency and highest throughput among all six routing methods. Specifically, compared to the next best algorithm, ROUTE-OE, NMinR-OE achieved 5.32% lower latency and 10.02% higher throughput. To investigate the fault tolerance [7] characteristics of the non-minimal routing method, only NMinR-OE and ROUTE-OE were compared under three faulty scenarios: (a) 0-fault, (b) 1-fault (the north input link of node (2, 2) is broken), and (c) 2-faults (both the north and south input links of node (2, 2) are broken). The results are summarized in Fig. 12b. The proposed NMinR-OE algorithm resulted in maximal throughput performance degradations of just 4.66% and 9.89% in the ‘‘1 fault’’ and ‘‘2 faults’’ scenarios, respectively, compared to that with no faulty link (i.e., ‘‘0 fault’’).

4.3.3. Transpose traffic pattern

The transpose traffic pattern implies global packet movement from the SW corner to the NE corner of the meshed NoC topology. The traffic volume is heaviest along the anti-diagonal direction. Within this heavy-traffic band, there are no light-traffic paths which can be exploited by the non-minimal routing algorithms. As a result, mis-routing packets is not only infeasible, but may inadvertently cause congestion in the network. This phenomenon was also reported earlier for the non-minimal routing scheme in DISHA [20]. Nevertheless, since the non-minimal routing algorithms proposed in this study are fully backward compatible with the routing operations of their primitive counterparts, the NoC-based communication system can simply be triggered to disable the use of non-minimal routing paths for such specific applications (e.g., transpose traffic) and to use the minimal routing paths identified by the original algorithm in their place. Besides, comparing the two panels in Fig. 13a, it is seen that the throughput performances of the minimal and non-minimal routing algorithms were insensitive to the buffer size due to the extremely regular data flows of the transpose traffic.

4.3.4. Telecom traffic pattern

As shown in the upper panel of Fig. 13b, there was no significant difference between the throughput of the non-minimal routing schemes and the minimal routing schemes. We conjecture that this is due to the large 1024-bit input buffers. In the lower panel of Fig. 13b, where the results using a smaller buffer size (i.e., 512 bits) are shown, it is noted that the throughput of ROUTE-OE was impacted more severely (11% lower throughput) by the smaller buffer size than that of NMinR-OE.

4.3.5. Buffer size effect

To evaluate the effect of increasing buffer size on the performance of the proposed non-minimal routing algorithms, a performance gain is defined as:

$$\mathrm{Gain} = \frac{\text{Max. throughput in 1024-bit buffers}}{\text{Max. throughput in 512-bit buffers}} - 1 \tag{4}$$
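Eq. (4) is a plain relative improvement, as the following minimal sketch shows. The function name and the two throughput values are hypothetical and serve only to illustrate the arithmetic; they are not measurements from this study.

```python
def buffer_gain(max_throughput_1024, max_throughput_512):
    """Relative throughput gain of 1024-bit over 512-bit buffers, as in Eq. (4)."""
    return max_throughput_1024 / max_throughput_512 - 1.0


# Hypothetical maximal throughputs in packets/cycle/node, for illustration only.
gain = buffer_gain(0.0117, 0.0100)
print(f"{gain:.1%}")
```

A gain of zero means that the larger buffers bring no throughput benefit, while a positive value gives the fractional improvement.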

As shown in Fig. 14, the average performance gain due to the larger buffer size for all the minimal and non-minimal routing algorithms was about 17%. In addition, it is shown that larger buffers resulted in a greater performance gain for random traffic flows (e.g., 28.44% in uniform traffic) than for regular traffic flows (e.g., 4.87% in transpose traffic).

4.4. Implementation overheads

NoC routing algorithms can be implemented using either an algorithmic approach or a routing table. The choice of an implementation method depends largely on the manufacturing technologies. In this study, the routing algorithms were implemented using a main routing table together with some additional combinational logic to recognize the available output ports and prohibited turns in different turn models. A potential benefit of a table-based implementation is its flexibility to change the adopted routing algorithm in accordance with the applied traffic type or to improve the fault tolerance capability. All of the minimal and non-minimal routing algorithms were implemented using the routing table shown in Table 3. Note that the four entries corresponding to table Address 0 respectively list


the four possible output directions for a packet en route to the destination node (0, 0) in a priority order, while those corresponding to table Address 1 list the prioritized output directions for the destination node (0, 1), and so on. Each entry uses 2 bits to identify the corresponding direction of the four possible output ports (i.e., directions of north, east, south, and west). As a result, 8 bits were required for each destination node and each router within the 8 × 8 mesh network required a 64-byte routing table. (Note that for the minimal routing algorithms, no more than two output


directions can be selected when forwarding a packet; and thus a 32-byte routing table is sufficient.) In our study, the routing design was synthesized using Synopsys Design Compiler and power analyses were performed using Synopsys Power Compiler in UMC 90 nm technology under typical operating conditions. The results presented in Table 4 show that compared to the resources (size) required by the Minimal Routing (MinR) algorithms, the Non-Minimal Routing (NMinR) algorithms incurred an overhead of routing size ratio, routing path timing,

Fig. A1. Non-minimal routing algorithm based on Odd–Even turn model (NMinR-OE).


and routing power consumption by 2.15%, 0.03 ns, and 3.69 mW, respectively. The low implementation overhead is due to the following two major reasons. First, the additional chip area (including the routing table and logic) is relatively small in contrast to the large-size buffers; the buffer size issue is discussed in the next section. Second, although the time complexity of the proposed routing algorithms (as listed in Appendix A) for an n × n mesh network is Θ(n²), all available output ports for a packet toward its destination were calculated off-line and pre-recorded in the routing table (see Table 3). In other words, the chip implementation overheads come only from accessing a larger routing table (64 bytes instead of 32 bytes) with an identical time complexity of Θ(1). As a result, it is cost-effective to add a non-minimal routing capability to the turn-model based, minimal routing schemes.

5. Discussion and conclusion

In this study, the performance and cost of the proposed non-minimal routing algorithms have been analyzed with a buffer size configuration referring to the TeraFLOPS Processor (Intel 80-core) [29]. Although the area of a router design is dominated by the buffer space (as in the implementation of [33]), the router buffers occupy only a small fraction of the system-wide SRAM memory in most architectures (e.g., CMPs with multi-megabyte caches) [27]. Therefore, future design efforts should concentrate primarily on the NoC function and performance improvements, allowing the cost issue to be solved via chip manufacturing advances in accordance with Moore’s Law.

Of the three non-minimal routing algorithms proposed in this study, the one based on the Odd–Even turn model (NMinR-OE) yielded the best network performance (i.e., the greatest maximal throughput and the lowest average latency) in most experimental results. In contrast to the conventional turn-model based minimal routing algorithms, the operational flexibility provided by NMinR-OE resulted in not only a better routing performance, but also an improved fault tolerance. Furthermore, the non-minimal routing schemes proposed in this study are backward compatible with their primitive minimal routing counterparts, and the mis-routing path selection is optional for each packet in each router. Since the experimental results have shown that not all mis-routed packets are beneficial to the network performance, a future study will investigate the feasibility of improving the performance of the proposed algorithms by applying an intelligent strategy to the mis-routing decision.

Acknowledgement

This work was partially supported by the National Science Council, under Grants 99-2220-E-002-041 and 100-2220-E-002012.

Appendix A

See Figs. A1–A3.

Fig. A2. Non-minimal routing algorithm based on West-First turn model (NMinR-WF).


Fig. A3. Non-minimal routing algorithm based on Negative-First turn model (NMinR-NF).

References

[1] L. Benini, G. De Micheli, Networks on chips: a New SoC paradigm, IEEE Computer 35 (1) (2002) 70–78.
[2] J.D. Owens, W.J. Dally, R. Ho, D.N. Jayasimha, S.W. Keckler, L.S. Peh, Research challenges for on chip interconnection networks, IEEE Micro 27 (2007) 96–108.
[3] P.P. Pande, C. Grecu, M. Jones, A. Ivanov, R. Saleh, Performance evaluation and design trade-offs for network-on-chip interconnect architectures, IEEE Transactions on Computers 54 (2005) 1025–1040.
[4] T. Bjerregaard, S. Mahadevan, A survey of research and practices of network-on-chip, ACM Computing Surveys 38 (2006) 1–51.
[5] C.J. Glass, L.M. Ni, The turn model for adaptive routing, Journal of the ACM 41 (1994) 874–902.
[6] G.M. Chiu, The odd–even turn model for adaptive routing, IEEE Transactions on Parallel and Distributed Systems 11 (2000) 729–738.
[7] J. Duato, S. Yalamanchili, L. Ni, Interconnection Networks: An Engineering Approach, Morgan Kaufmann Publishers, 2002.
[8] M.D. Schroeder, A.D. Birrell, M. Burrows, H. Murray, R.M. Needham, T.L. Rodeheffer, E.H. Satterthwaite, C.P. Thacker, Autonet: a high-speed, self-configuring local area network using point-to-point links, IEEE Journal on Select Areas of Communication 9 (1991) 1318–1335.
[9] A. Mejia, J. Flich, J. Duato, S.A. Reinemo, T. Skeie, Segment-based routing: an efficient fault-tolerant routing algorithm for meshes and tori, in: Proceedings of the 40th International Parallel and Distributed Processing Symposium, 2006.
[10] G. Ascia, V. Catania, M. Palesi, D. Patti, Implementation and analysis of a new selection strategy for adaptive routing in networks-on-chip, IEEE Transactions on Computers 57 (2008) 809–820.
[11] J. Hu, R. Marculescu, DyAD-smart routing for networks-on-chip, in: Proceedings of the 41st Design Automation Conference, 2004, pp. 260–263.
[12] Y.C. Lan, H.A. Lin, S.H. Lo, Y.H. Hu, S.J. Chen, A bi-directional NoC (BiNoC) architecture with dynamic self-reconfigurable channel, IEEE Transactions on Computer-Aided Design of Integrated Circuits and System 30 (2011) 427–440.
[13] M.K.F. Schafer, T. Hollstein, H. Zimmer, M. Glesner, Deadlock-free routing and component placement for irregular mesh-based networks-on-chip, in: Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2005, pp. 238–245.
[14] S.Y. Lin, C.H. Huang, C.H. Chao, K.H. Huang, A.Y. Wu, Traffic-balanced routing algorithm for irregular mesh-based on-chip networks, IEEE Transactions on Computers 57 (2008) 1156–1168.
[15] J. Wu, A fault-tolerant and deadlock-free routing protocol in 2D meshes based on odd–even turn model, IEEE Transactions on Computers 52 (2003) 1154–1169.
[16] C.J. Glass, L.M. Ni, Maximally fully adaptive routing in 2D meshes, in: Proceedings of the 21st International Conference on Parallel Processing, 1992, pp. 101–104.
[17] L. Schwiebert, D.N. Jayasimha, Optimal Fully adaptive wormhole routing for meshes, in: Proceedings of the 7th ACM/IEEE Conference on Supercomputing, 1993, pp. 782–791.
[18] P. Baran, On distributed communications networks, IEEE Transactions on Communications Systems 12 (1964) 1–9.
[19] T. Moscibroda, O. Mutlu, A case for bufferless routing in on-chip networks, in: Proceedings of the 36th Annual International Symposium on Computer Architecture, 2009, pp. 196–207.
[20] K.V. Anjan, T.M. Pinkston, An efficient, fully adaptive deadlock recovery scheme: DISHA, in: Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995, pp. 201–210.
[21] Y.H. Song, T.M. Pinkston, Distributed resolution of network congestion and potential deadlock using reservation-based scheduling, IEEE Transactions on Parallel and Distributed Systems 16 (2005) 686–701.
[22] E. Nilsson, M. Millberg, J. Oberg, A. Jantsch, Load distribution with the proximity congestion awareness in a network on chip, in: Proceedings of the 6th Design, Automation and Test in Europe Conference and Exhibition, 2003, pp. 1126–1127.
[23] T.T. Ye, L. Benini, G. De Micheli, Packetization and routing analysis of on-chip multiprocessor networks, Journal of Systems Architecture 50 (2004) 81–104.
[24] D. Park, C. Nicopoulos, L. Kin, N. Vijaykrishnan, C.R. Das, A distributed multipoint network interface for low-latency, deadlock-free on-chip interconnects, in: Proceedings of the 1st International Conference on Nano-Networks, 2006, pp. 1–6.
[25] C. Neeb, N. Wehn, Designing efficient irregular networks for heterogeneous systems-on-chip, Journal of Systems Architecture 54 (2008) 384–396.
[26] A. Hansson, K. Goossens, A. Radulescu, A unified approach to mapping and routing on a network-on-chip for both best-effort and guaranteed service traffic, VLSI Design (2007) 1–17.
[27] G. Michelogiannakis, D. Sanchez, W.J. Dally, C. Kozyrakis, Evaluating bufferless flow control for on-chip networks, in: Proceedings of the 4th ACM/IEEE International Symposium on Networks-on-Chip, 2010, pp. 9–16.
[28] E. Baydal, P. Lopez, J. Duato, Increasing the adaptivity of routing algorithms for k-ary n-cubes, in: Proceedings of the 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing, 2002, pp. 455–462.
[29] S.R. Vangal et al., An 80-tile sub-100-W TeraFLOPS processor in 65-nm CMOS, IEEE Transactions on Solid-State Circuits 43 (2008) 29–41.
[30] J. Hu, U.Y. Ogras, R. Marculescu, System-level buffer allocation for application-specific networks-on-chip router design, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 25 (2006) 2919–2933.
[31] W.J. Dally, C.L. Seitz, The torus routing chip, Distributed Computing 1 (1986) 187–196.
[32] R. Dick, Embedded System Synthesis Benchmark Suites (E3S).
[33] W.J. Dally, B. Towles, Route packets, not wires: on-chip interconnection networks, in: Proceedings of the 38th Design Automation Conference, 2001, pp. 684–689.

Wen-Chung Tsai received the B.S. degree in Computer Science and Information Engineering from Tamkang University in 1996. He received the M.S. degree in Electrical Engineering from National Cheng Kung University in 1998. He received the Ph.D. degree in Electronics Engineering from National Taiwan University in 2011. He is currently a researcher of Information and Communications Research Laboratories, Industrial Technology Research Institute, Hsinchu, Taiwan. His research interests include system-on-chip, network-on-chip, computer network, and mobile telecom.

Kuo-Chih Chu received the B.S. degree in Information Engineering and Computer Science from Feng-Chia University, Taichung, Taiwan, R.O.C., in 1996 and received M.S. and Ph.D. degrees in Electrical Engineering from National Cheng Kung University, Tainan, Taiwan, R.O.C., in 1998 and 2005, respectively. He is currently an Associate Professor with the Department of Electronic Engineering, Lunghwa University of Science and Technology, Taoyuan, Taiwan, R.O.C.

Yu-Hen Hu received the B.S.E.E. degree from the National Taiwan University, Taipei, Taiwan, R.O.C., in 1976, and the M.S.E.E. and Ph.D. degrees from the University of Southern California, Los Angeles, in 1980 and 1982, respectively. From 1983 to 1987, he was an Assistant Professor with the Electrical Engineering Department, Southern Methodist University, Dallas, TX. Since 1987, he has been in the Department of Electrical and Computer Engineering, University of Wisconsin–Madison, Madison, where he is currently a Professor. He has served as an Associate Editor for the European Journal of Applied Signal Processing and Journal of VLSI Signal Processing. He has broad research interests ranging from design and implementation of signal processing algorithms, computer-aided design and physical design of VLSI, pattern classification and machine learning algorithms, and image and signal processing in general. He has published more than 200 technical papers and edited several books in these areas. Dr. Hu has served as an Associate Editor for the IEEE Transactions of Acoustic, Speech, and Signal Processing, IEEE Signal Processing Letters, and IEEE Multimedia Magazine. He has served as the Secretary and an executive committee member of the IEEE Signal Processing Society, a board of governors of the IEEE Neural Network Council representing the Signal Processing Society, the Chair of the Signal Processing Society Neural Network for Signal Processing Technical Committee, and the Chair of the IEEE Signal Processing Society Multimedia Signal Processing Technical Committee (2004–2005). He is also a steering committee member of the International Conference of Multimedia and Expo, IEEE Transactions on Multimedia on behalf of the IEEE Signal Processing Society. Professor Hu is a fellow of IEEE.

Sao-Jie Chen received the B.S. and M.S. degrees in electrical engineering from National Taiwan University, Taipei, Taiwan, ROC, in 1977 and 1982, respectively, and the Ph.D. degree in electrical engineering from Southern Methodist University, Dallas, USA, in 1988. Since 1982, he has been a member of the faculty in the Department of Electrical Engineering, National Taiwan University, where he is currently a full professor. During the fall of 1999, he was a visiting professor in the Department of Computer Science and Engineering, University of California, San Diego, USA. During the fall of 2003, he held an academic visitor position in the Department of System Level Design, IBM Thomas J. Watson Research Center, Yorktown Heights, New York, USA. He obtained the ‘‘Outstanding Electrical Engineering Professor Award’’ from the Chinese Institute of Electrical Engineering in December 2003 in recognition of his excellent contributions to EE education. During the falls of 2004–2009, he was a visiting professor in the Department of Electrical and Computer Engineering, University of Wisconsin, Madison, USA. His current research interests include VLSI physical design, SOC hardware/software codesign, and Wireless LAN and Bluetooth IC design. Dr. Chen is a member of the Chinese Institute of Engineers, the Chinese Institute of Electrical Engineering, the Institute of Taiwanese IC Design, the Association for Computing Machinery, and a senior member of the IEEE Circuits and Systems and IEEE Computer Societies.