With exascale systems on the horizon, power and energy consumption have become the primary concerns for scalable computing. It is pivotal to consider revolutionary approaches to hardware and software co-design, so that extreme-scale applications can extract peak performance from systems under severe power constraints, which also imply decreased reliability of individual components. The two papers in this special issue address energy-efficiency aspects that are important to the HPC community and that have not previously been covered by work in the data-center or cloud-computing communities. Emphasis is given to the application-level view of significant energy-efficiency improvements and to the required hardware/software stack, which must include the necessary power and performance measurement and analysis harnesses.

In the paper “TracSim: Simulating and scheduling trapped power capacity to maximize machine room throughput”, Zhang et al. observe that the power supplied to machine rooms tends to be over-provisioned, because in practice it is specified not by workload demands but rather by high-power LINPACK runs or nameplate power estimates. TracSim enables users to specify the system topology, hardware configuration, power cap, and task workload, and to develop resource-configuration and task-scheduling policies that exploit CPU throttling to maximize machine-room throughput while keeping power consumption under the cap. The authors use real measurements from the LANL cluster to set TracSim’s configuration parameters, and they leverage TracSim to implement and evaluate four resource-scheduling policies. Simulation results quantify the performance of these policies and the amount of trapped power capacity that can effectively be reclaimed.

In the paper “Analyzing GPU-controlled communication with dynamic parallelism in terms of performance and energy”, Oden, Klenk, and Fröning note that Graphics Processing Units (GPUs) are widely used in high-performance computing due to their high computational power and high performance per watt. One of the main bottlenecks of GPU-accelerated cluster computing, however, is the data transfer between distributed GPUs, which affects not only performance but also power consumption. The most common way to utilize a GPU cluster is a hybrid model, in which the GPU accelerates the computation while the CPU is responsible for the communication. This approach always requires a dedicated CPU thread, which consumes additional CPU cycles and therefore increases the power consumption of the complete application. In prior work, the authors showed that the GPU is able to control the communication independently of the CPU. GPU-controlled communication raises several problems, however. The main one is intra-GPU synchronization: because GPU thread blocks are non-preemptive, blocking on communication requests from within a kernel can easily result in deadlock. In this work, the authors show how dynamic parallelism solves this problem: GPU-controlled communication combined with dynamic parallelism keeps the control flow of multi-GPU applications on the GPU and bypasses the CPU completely. Their proposed approaches and implementations improve performance per watt by up to 10% while still using commodity hardware.
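The control-flow pattern the authors describe can be illustrated with a minimal CUDA dynamic-parallelism sketch. The communication primitives below (gpu_put, gpu_wait_notify) are hypothetical stand-ins for a GPU-controlled RDMA interface, not the authors’ actual API, and the sketch uses the classic dynamic-parallelism idiom (device-side launches synchronized with device-side cudaDeviceSynchronize(), which requires compute capability 3.5+ and relocatable device code, and is deprecated in recent CUDA releases):

```cuda
// Minimal sketch of GPU-side control flow with CUDA dynamic parallelism.
// Compile with: nvcc -arch=sm_35 -rdc=true sketch.cu
// NOTE: gpu_put() and gpu_wait_notify() are hypothetical placeholders for a
// GPU-controlled communication interface; they are not a real library API.

__device__ void gpu_put(float* /*remote*/, const float* /*local*/, int /*n*/) {
    // Would post an RDMA put of n floats to a remote GPU's memory.
}

__device__ void gpu_wait_notify() {
    // Would poll a completion flag written by the network adapter.
}

__global__ void compute(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;                  // placeholder computation
}

__global__ void comm_step(float* remote, float* local, int n) {
    // A tiny dedicated grid issues the transfer and waits for completion.
    // Because no non-preemptive compute block ever blocks on a
    // communication request, the deadlock described above cannot occur.
    gpu_put(remote, local, n);
    gpu_wait_notify();
}

__global__ void master(float* remote, float* local, int n, int iters) {
    // Parent kernel: keeps the whole control flow of the application on
    // the GPU by launching compute and communication kernels from the
    // device, bypassing the CPU completely.
    for (int it = 0; it < iters; ++it) {
        compute<<<(n + 255) / 256, 256>>>(local, n);
        cudaDeviceSynchronize();                 // device-side sync (CDP)
        comm_step<<<1, 1>>>(remote, local, n);
        cudaDeviceSynchronize();
    }
}
```

The host would launch master<<<1, 1>>>(...) once; from then on the GPU orchestrates the compute/communicate loop itself, so no dedicated CPU thread consumes cycles driving the communication.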
Abhinav Vishnu, Andres Marquez, Dimitris Nikolopoulos
http://dx.doi.org/10.1016/j.parco.2016.08.002