Available online at www.sciencedirect.com
ScienceDirect Procedia Computer Science 34 (2014) 538 – 543
2014 International Workshop on the Design and Performance of Networks on Chip (DPNoC 2014)
Power-Aware Mapping for 3D-NoC Designs using Genetic Algorithms Haytham Elmiligia,∗, Fayez Gebalib , M. Watheq El-Kharashic a Computing
Science Department, Thompson Rivers University, Kamloops, BC, Canada of Electrical and Computer Engineering, University of Victoria, Victoria, BC, Canada c Department of Computer and Systems Engineering, Ain Shams University, Cairo 11517, Egypt
b Department
Abstract Scalable 3D Networks-on-Chip (NoC) designs are needed to match the ever-increasing communication and low-power demands of large-scale multi-core applications. However, chip designers do not have the necessary tools to implement their applications efficiently at different layers of the design hierarchy. A design methodology for low power 3D-NoCs applications is needed to achieve the best performance. To address this problem, we use Genetic Algorithms to find the best 3D-NoC mesh network mapping that achieves minimum power consumption for a given application. As a proof of concept, a case study of a multicore application that has 32 symmetric microprocessors is presented. We used Genetic Algorithms to calculate the fitness function and solve the optimization problem in less than four minutes, whereas it took over three days using exhaustive search and yet to find the minimum power consumption. c 2014 Elsevier © B.V.Published This is anby open access article under the CC BY-NC-ND license The Authors. Elsevier B.V. (http://creativecommons.org/licenses/by-nc-nd/3.0/). Selection and peer-review under responsibility of Elhadi M. Shakshuki. Selection and peer-review under responsibility of Conference Program Chairs Keywords: 3D-NoCs, genetic algorithms, IP mapping, power modeling, power optimization.
1. Introduction Current research focuses on designing efficient Networks-on-Chip (NoC) interconnect without considering the target application. This results in a mismatch between the application requirements and efficient system resource utilization. This mismatch leads to poor system performance and/or excessive use of silicon area. Poor performance arises when the data traffic for a certain application exceeds design processing capabilities, whereas an excessive use of the silicon area takes place if the design resources are over estimated. An alternative is to design an NoC that takes into consideration the intended target application, which is referred to as Application Specific NoC (ASNoC). Before we get into further details, we need to distinguish between 3D-ICs and 3D-NoCs. A 3D-IC chip has more than one physical plane and all Intellectual Property (IP) cores in the chip could be integrated either on a single physical plane or on several physical planes. If an NoC is utilized in a 3D-IC, it could be implemented as a 2D∗
Corresponding author. Tel.: +0-250-828-5230 ; fax: +0-250-828-5086. E-mail address:
[email protected]
1877-0509 © 2014 Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/). Selection and peer-review under responsibility of Conference Program Chairs doi:10.1016/j.procs.2014.07.065
Haytham Elmiligi et al. / Procedia Computer Science 34 (2014) 538 – 543
539
NoC (i.e., routers are implemented on a single physical layer) or as a 3D-NoC (i.e., routers are implemented in multiphysical plans). In 3D-NoC, a router in one plane could be connected to routers located on the same or adjacent physical planes. In other words, having a 3D-NoC used to connect IP cores inside a 3D-IC does not mean that all 3D-ICs utilize 3D-NoCs. This paper is organized as follow. Section 2 presents a summary of related works. Section 3 introduces a new power model for 3D-NoCs. Section 4 formulates an optimization problem to minimize the power consumption of 3D-NoC-based applications. Section 5 shows a case study that represents the average traffic between 32 symmetric microprocessors as a proof of concept. Finally, we draw conclusions and suggest ideas for future work in Section 6. 2. Background Power optimization in 3D-NoC-based designs has become more critical with the use of high speed and complex ICs in mobile and portable applications 1 . Managing power in such cases does not only target power reduction, but also ensures that all IPs receive proper and efficient amount of power to keep the application stable and reliable. Power constraints are among the major bottlenecks that limit functionality and performance of complex NoC-based designs 2 . One of the most effective approaches to address the high power dissipation problem is to design a network architecture that achieves the lowest possible power consumption 3,4,5 . Although this might not resolve the problem completely, it has a significant impact on the chip power consumption. At early design phases, the exact physical layout structure is not yet defined and designers do not have enough layout information to choose the most power-efficient network topology for their target applications. Researchers tried to solve this limitation by developing analytical models that could be used to evaluate the power consumption at early design phases 6,7 . However, the proposed models cannot be used to identify the power consumption of a single physical link or a single path. Moreover, they do not take into consideration the traffic distribution of the target application and its impact on the power calculations. The work presented in this paper focuses on 3D-NoCs as extension in the third dimension to the works presented in 8,9,10 , which addressed only 2D interconnection network. The work in 8 introduced a new methodology to reduce the total power consumption of the global router-to-router links by selecting the optimal 2D network topology. An extended version with more analysis and case studies is presented in 9 and a joint consideration of power, delay, and reliability is presented in 10 for 2D-NoCs. In this paper, we extend the graph-theoretic concepts introduced in 8,9,10 for 2D-NoCs to 3D-NoCs, introduce a new model for the power consumption in 3D-mesh networks, and use GA to minimize the power consumption of 3D-NoC based applications. 3. Power Modeling for 3D-Mesh NoCs 3.1. Graph-Theoretic Concepts Mesh-based topologies are becoming more popular for NoC implementations because they have many advantageous over other topologies in terms of fabrication and performance properties. Mesh networks have fixed-size rectangular tiles, which make them easier to implement and reuse in different applications. They also have a high reliability against edge failures. Unlike tree or ring based networks, there is always an alternative path if one edge fails. Therefore, we focus on the mesh topology structure in this paper 11 . The topological structure of any mesh network can be represented by a graph G(V,E,W,Λ), where each node vk ∈ V represents an IP1 . E is a set of edges that represent the logical communication channels (i.e., paths) between IPs. wi j ∈ W represents the weight of the corresponding physical link ei j and λi j ∈ Λ is the traffic, in number of packets, from vi to v j . The number of router nodes (n) in any 3D-mesh network equals n1 × n2 × n3 , where n1 , n2 , n3 are the network dimensions. On the same context, an adjacency matrix (A) could be used to represent the physical network architecture 12 . The adjacency matrix represents the direct (one-hop) paths between all pairs of network nodes. Optimizing the power consumption of NoC applications can be achieved based on graph-theoretic analysis. To connect a large number of IPs through a 3D-mesh NoC, the allocation of the IPs over the mesh nodes must be 1
In standard mesh networks, each single IP is connected to a single router.
540
Haytham Elmiligi et al. / Procedia Computer Science 34 (2014) 538 – 543
optimally chosen. To achieve this goal, we use Dijkstra’s algorithm2 to find the least cost path in a graph that can be used to exchange data between all IPs in the network 14,15 . The cost function here is defined as the power consumption. 3.2. Modeling the Cost Function The cost function is designed to represent the power consumed by links and routers in 3D-mesh networks. The power model for the links (Plink ) is developed based on the link power presented in 16,17 as follows 18 . Plink = P switching + P short + P static
(1)
The router’s power (Prouter ) is analyzed using a set of VHDL synthesized routers. We carried out 48 experiments to calculate the power consumptioned using Synopsys Design Compiler, VHDLSIM, and Power Compiler tools based on switching activity 18 . Based on these experiments, the total power consumption to transmit a single packet from node vi to node v j can be calculated as follows Pi j = Plinki j + Prouteri j
(2)
To calculate the network total power consumption, we create a new matrix (W) to represent the power consumption of all network logical paths. To construct the weight matrix (W), we have to complete the following steps: 1. Power Matrix: A non-symmetric power matrix P = [Pi j ] is created, where Pi j is calculated from (2). The power matrix represents the total power consumption to transmit a single packet from node vi to node v j . This means, each entry in the adjacency matrix (A) will have a corresponding entry in the power matrix (P) to represent the power consumption of all physical paths in the network. 2. Using Dijkstra’s Algorithm: Dijkstra’s algorithm is used to generate a new n × n square matrix (W) from the power matrix (P). W = [wi j ] is a weight matrix, where each entry wi j represents the minimum power needed to transmit a single packet from node vi to node v j , assuming shortest path routing 19 . The wights in (W) are calculated assuming shortest path routing.
Fig. 1. (a) An example of a 3 × 2 × 2 mesh network. (b) An example of a GA Chromosome that represents the IP mapping in (a), where A is the adjacency matrix of the network topology in (a).
The main difference between the power matrix (P) and the weight matrix (W) is that P represents only the power consumption to transmit a packet over a physical link (i.e., over a real physical link that connects two routers), whereas W is generated by Dijkstra’s algorithm and represents the minimum power needed to transmit a packet over each logical path. The significance of W is that it could be used to find the shortest path routing protocol and, at the same time, it could be used to derive an accurate power model for 3D-NoCs. 2 Dijkstra’s algorithm is an efficient algorithm that finds a path with lowest cost. Readers can refer to 13 for more information about Dijkstra’s algorithm.
541
Haytham Elmiligi et al. / Procedia Computer Science 34 (2014) 538 – 543
For any application that is represented by a traffic distribution matrix (Λ), the total power consumption can be evaluated using 18 : Ptotal =
n n i=1
wi, j · λi, j
(3)
j=1
This new model for the total power consumption for 3D-mesh NoC takes into consideration the differences between vertical and horizontal links as well as the impact of the routing algorithm. The proposed model allows designers to specify the power consumed when a packet is transmitted over each single physical link in a network, which is not supported by other models in the literature. Moreover, the power model in (3) utilizes the target application traffic matrix. This feature allows designers to evaluate the power consumption at different modes of operation for the same application at early design phases. 4. Problem Formulation Figure 1(a) shows an example of a 3 × 2 × 2 mesh network and Figure 1(b) shows an example of a GA Chromosome that represents the IP mapping in (a). The chromosome consists of two parts. The first part is the network architecture and the second part is the IP mapping. The network architecture is represented by the adjacency matrix A and the network dimensions, whereas the IP mapping is represented by a vector of size n. The IP mapping vector is used to arrive at the best allocation of IPs to the nodes for any NoC system. The optimization problem could be formulated as follows. For a given application represented by a Traffic Distribution Graph (TDG) and a traffic distribution matrix (Λ) and a network architecture whose logical-path-power is presented by a weight matrix (W), it is required to find the optimum 3D network topology/mapping pair by minimizing the following objective function 18 . Minimize Ptotal =
n n i=1
wi, j · λi, j
(4)
j=1
subject to: ai j = a ji n
= n1 × n2 × n3 (5)
where A = [ai j ] represents the adjacency matrix of the target topology and n is the number of IPs in the network. 5. Experimental Work In this section, we explain how a framework is developed to find the best 3D-mesh architecture for an ASNoC design. As an example, we study the 3D-NoC implementation of a 32-core symmetric multi-processor system. A symmetric multi-processor system consists of homogeneous microprocessors running independently, i.e., each microprocessor executes an independent function or procedure. Microprocessors use different data but have the capability of sharing common internal system resources, such as sharing access to a common memory unit or I/O interfaces. We setup an experiment to simulate a scenario in which we 32 microprocessors exchange data through a common memory unit. This experiment is prepared to represent large number of applications that use symmetric microprocessors, such as video processing and data accusation systems. We present this experiment to prove that the same methodology that we use in this case study can be used for symmetric multi-processor applications. For a practical implementation, we consider the following constraints/assumptions for the target 3D-NoC structure: 1) The target NoC is implemented in a 3D-IC structure, 2) The locations of TSVs between layers are pre-defined due to fabrication requirements/specifications, 3) Routers are laid out on a rectilinear grid in each layer and the dimensions of the grids are identical in each layer, 4) The maximum number of external links for each router does not exceed four and the
542
Haytham Elmiligi et al. / Procedia Computer Science 34 (2014) 538 – 543
Fig. 2. Results for optimizing the power consumption of the symmetric multicore application using GA.
Fig. 3. (a) The weight matrix (W) for the 3D-mesh NoC. (b) Mapping 32 multiprocessors to 3D-mesh NoC.
minimum number is one, 5) The number of grid points in each layer is defined based on estimated layout information at early design phases, 6) The number of routers equals the number of IPs and each router is associated with one IP. We start off by generating a traffic distribution matrix to represent the average traffic between 32 symmetric microprocessors. The traffic matrix is generated in Matlab to represent a uniform traffic distribution between the microprocessors through a shared memory. Then, we use Matlab GA toolbox to calculate the fitness function in (4) and find the best 3D-NoC mesh network that achieves a minimum power consumption. Figure 2 shows the GA results for finding the best mapping that achieves the lowest power consumption for a symmetric multicore application. As shown in the figure, our method could be used to find the global minimum point
Haytham Elmiligi et al. / Procedia Computer Science 34 (2014) 538 – 543
543
that achieves the lowest power consumption. The curves show a smooth convergence towards the minimum point using GA. The weight matrix was generated as shown in Figure 3(a) and the final mapping of the 32 multiprocessors is generated, as shown in Figure 3(b). The lowest power calculated based on this mapping is 2.2409e+006 mw and Matlab took 339 seconds to generate the GA results. The GA solves the optimization problem in less than four minutes whereas it took over three days using exhaustive search and yet to find the minimum power consumption. 6. Conclusion and future work In this paper, we explained how to calculate the power consumption of any 3D-NoC-based design through a graphtheoretic approach. Graph-theoretic approach proved that it is extremely powerful in modeling NoC performance metrics. We developed a new model that can be used at early design phases to calculate the power consumption based on the application traffic pattern. Our model utilizes the application’s TDG, which improves the accuracy of the power calculation, compared to other generic models. We also considered the routing protocol in our power calculations for 3D-mesh NoC-based application. Dijkstra’s algorithm was used to find the shortest path routing. The proposed model was then used to find the optimum mapping for any given application. We used GA to find the network mapping that achieves minimum power consumption. We are planning to extend this work in two directions. The first direction is to use the same methodology to find the optimum network topology and mapping when we have different design constraints; such as fixed IP mapping or limited number of TSVs. The second direction is to consider other network topologies such as ring and tree networks. References 1. B. H. Meyer, J. J. Pieper, J. M. Paul, J. E. Nelson, S. M. Pieper, A. G. Rowe, Power-performance simulation and design strategies for single-chip heterogeneous multiprocessors, IEEE Transactions on Computers 54 (6) (2005) 684–697. 2. N. Banerjee, P. Vellanki, K. S. Chatha, A power and performance model for network-on-chip architectures, in: Proceedings of the Design, Automation and Test in Europe (DATE’04), Paris, France, 2004, pp. 21250–21256. 3. V. Pavlidis, E. Friedman, 3-D topologies for networks-on-chip, IEEE Transactions on Very Large Scale Integration (VLSI) Systems 15 (10) (2007) 1081–1090. doi:10.1109/TVLSI.2007.893649. 4. B. Feero, P. Pande, Networks-on-chip in a three-dimensional environment: A performance evaluation, IEEE Transactions on Computers 58 (1) (2009) 32–45. doi:10.1109/TC.2008.142. 5. A. Morgan, H. Elmiligi, M. El-Kharashi, F. Gebali, Unified multi-objective mapping and architecture customisation of networks-on-chip, IET Computers Digital Techniques 7 (6) (2013) 282–293. doi:10.1049/iet-cdt.2013.0017. 6. A. Sheibanyrad, F. Ptrot, A. Jantsch, 3D Integration for NoC-based SoC Architectures, Springer-Verlag, 2011. 7. G. Reehal, M. Ismail, A systematic design methodology for low-power nocs (2014). doi:10.1109/TVLSI.2013.2296742. 8. H. Elmiligi, A. A. Morgan, M. W. El-Kharashi, F. Gebali, Power-aware topology optimization for networks-on-chips, in: Proceedings of the 2008 International Symposium on Circuits and Systems (ISCAS 2008), Seattle, WA, 2008, pp. 360–363. 9. H. Elmiligi, A. A. Morgan, M. W. El-Kharashi, F. Gebali, Power optimization for application-specific networks-on-chips: A topology-based approach, Microprocessors and Microsystems 33 (5) (2009) 343–355. 10. H. Elmiligi, A. A. Morgan, M. W. El-Kharashi, F. Gebali, Networks-on-chip topology optimization subject to power, delay, and reliability constraints, in: Proceedings of 2010 IEEE International Symposium on Circuits and Systems (ISCAS), 2010, pp. 2354–2357. doi:10.1109/ISCAS.2010.5537194. 11. R. Holsmark, S. Kumar, Design issues and performance evaluation of mesh NoC with regions, in: 23rd NORCHIP Conference, 2005, pp. 40 – 43. doi:10.1109/NORCHP.2005.1596984. 12. J. L. Gross, J. Yellen, Graph theory and its applications, 2nd Edition, CRC Press, Boca Raton, FL, USA, 2005. 13. M. T. Goodrich, R. Tamassia, Algorithm Design: Foundations, Analysis, and Internet Examples, John Wiley and Sons, 2002. 14. M. Barbehenn, A note on the complexity of Dijkstra’s algorithm for graphs with weighted vertices, IEEE Transactions on Computers 47 (2) (1998) 263. doi:10.1109/12.663776. 15. P. Zhou, P.-H. Yuh, S. S. Sapatnekar, Application-specific 3d network-on-chip design using simulated allocation, in: Proceedings of the 2010 Asia and South Pacific Design Automation Conference, ASPDAC ’10, IEEE Press, Piscataway, NJ, USA, 2010, pp. 517–522. URL http://dl.acm.org/citation.cfm?id=1899721.1899844 16. S. Bhat, Energy models for networks-on-chip components, Master’s thesis, Technische University Eindhoven, Eindhoven, Netherlands (Dec. 2005). 17. C. Jueping, J. Peng, Y. Lei, H. Yue, L. Zan, Through-silicon via (TSV) capacitance modeling for 3D NoC energy consumption estimation, in: 2010 10th IEEE International Conference on Solid-State and Integrated Circuit Technology (ICSICT), 2010, pp. 815–817. doi:10.1109/ICSICT.2010.5667434. 18. H. Elmiligi, M. W. El-Kharashi, F. Gebali, Power consumption of 3D networks-on-chips: Modeling and optimization, Microprocessors and Microsystems 37 (67) (2013) 530–543. 19. D. Lee, Proof of a modified Dijkstra’s algorithm for computing shortest bundle delay in networks with deterministically time-varying links, IEEE Communications Letters 10 (10) (2006) 734–736. doi:10.1109/LCOMM.2006.051982.