Editorial: Special Issue on Extreme Scale Parallel Architectures and Systems


High-performance computing (HPC) has entered the Petascale era, and efforts are already under way to further increase the peak performance of supercomputing systems. The advancement of computing to the Exascale era is of paramount importance to computational science, including its domain-specific branches, for responding promptly to a variety of open problems in medicine, the environment, society, engineering and global sustainability. In parallel, lines of business across many sectors are becoming increasingly dependent on extracting valuable insights from large volumes of data. Current trends reveal an exponential increase in the volume of data that is both available and worth processing, driven, for example, by the penetration of smart mobile devices into many facets of life and by the attachment of sensors, actuators and connectivity modules to a wide diversity of electronic end-points that generate continuous streams of data. While supercomputing has traditionally focused on serving computational-science problems, it is now clear that the technical requirements posed by large-scale data-science applications should also be embraced as a target throughout the design of Exascale computers.

As is probably inherent in any advancement of the state of the art by multiple orders of magnitude, the move to the Exascale era is confronted with various technical challenges. To aggravate matters further, many of the roadblocks on the road to Exascale are unique to this advancement (they were not faced during the advancement to Petascale, for instance), and in some cases they cannot be addressed by merely evolving or scaling the state of the art, because existing approaches run into fundamental limitations. A non-exhaustive list of these challenges includes exposing extreme-scale parallelism, power efficiency, sufficient resilience and efficient data movement at extreme scale. In addition, strong focus must also be placed on evolving, or taking clean-slate approaches to, the research methods and tools needed to evaluate solutions in all aspects of Exascale research and development.

In alignment with these frontier problems in supercomputing, the papers included in the present Future Generation Computer Systems Special Issue on Extreme Scale Parallel Architectures and Systems present the latest approaches to the top-priority items on the Exascale agenda, specifically resilience, data movement, efficient extreme-scale system simulation and system co-design. Original results are presented, together with tools and guidelines on how to have this new knowledge and toolset directly impact the design and implementation of a future Exascale machine.

The first paper in this Special Issue, authored by Richard Barrett et al. and titled "Exascale Design Space Exploration and Codesign", focuses on the use of "mini-applications" as a valuable means of system co-design. A mini-application is essentially a condensed implementation of a real parallel application, focusing on evaluating one or more key performance aspects of that application on a target architecture. A mini-application is carefully architected to be quickly comprehensible by non-domain experts (e.g. system designers) and amenable to refactoring by application-domain experts and developers, yet not abstracted to the point where it stops being representative of the scientific problem solved by its parent application. In this context, a mini-application at the right level of abstraction holds great promise as a common language between application-domain experts, developers and algorithm designers on the one hand, and system software and hardware researchers, architects and developers on the other. Beyond a conceptual presentation, the paper goes deeper, showing results obtained by executing mini-applications and their respective real application codes both on microarchitectures and on full systems. These results confirm the high fidelity with which mini-applications capture key performance metrics of real applications. Another valuable contribution of the paper is in showing how mini-applications can be used as input for exploratory simulation of hypothetical extreme-scale architectures, thereby raising the fidelity of simulation results, relaxing the computational requirements of system simulation and shortening the time a simulation expert needs to develop application-domain expertise.

Christian Engelmann presents, in his paper titled "Scaling To A Million Cores And Beyond: Using Light-Weight Simulation to Understand The Challenges Ahead On The Road To Exascale", a comprehensive report on xSim, an extreme-scale system simulation tool under development at Oak Ridge National Laboratory. Among the important features of the architecture of xSim and its custom-built PDES (Parallel Discrete-Event Simulation) engine is the ability to simulate an enormous number of target-system cores with a relatively small amount of computational resources. Notably, the author reports scaling to 2^27 (over 134 million) simulated cores using 960 processing cores. Additionally, xSim is carefully architected to support real application input, specifically instrumented application traces stemming from real application executions. Trace instrumentation is key here to achieving extreme simulation scalability while keeping computing resource requirements under control. In this context, the connection to the research pursued in the first paper of this issue is obvious, and support for mini-applications within xSim is thus an interesting topic for future exploration. Two interesting use cases, and evaluation results thereof, are showcased in the paper using xSim simulation: the evaluation of collective communication algorithms on various exploratory network architectures, and the evaluation of the performance of a Monte Carlo application across varying architectural parameters.
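
The central idea behind such lightweight simulation is to advance a virtual clock over time-stamped events, so that far more ranks can be modelled than the host system actually provides. The following is a minimal, purely sequential C sketch of that idea; it is an illustration only and does not reflect xSim's actual engine, API or parallel (PDES) implementation, and the rank count and link latency are arbitrary assumptions. It computes the virtual completion time of a message forwarded around a ring of 2^20 simulated ranks within a single host process.

/*
 * Minimal sequential discrete-event sketch of lightweight simulation.
 * Illustration only; not xSim's engine or API.
 */
#include <stdio.h>
#include <stdlib.h>

#define NRANKS  (1 << 20)      /* simulated ranks (assumed value)           */
#define LATENCY 1.0e-6         /* assumed per-hop message latency, seconds  */

typedef struct event {
    double time;               /* virtual delivery time                     */
    int    dst;                /* destination simulated rank                */
    struct event *next;
} event;

static event *queue;           /* events kept sorted by ascending time      */

static void schedule(double t, int dst)
{
    event *e = malloc(sizeof *e);
    event **p = &queue;
    e->time = t;
    e->dst  = dst;
    while (*p && (*p)->time <= t)      /* insert in time order              */
        p = &(*p)->next;
    e->next = *p;
    *p = e;
}

int main(void)
{
    double vtime = 0.0;        /* virtual clock of the simulation           */
    long   hops  = 0;

    /* Simulated rank 0 injects a message that is forwarded around a ring. */
    schedule(LATENCY, 1);

    while (queue) {
        event *e = queue;
        queue = e->next;
        vtime = e->time;       /* advance virtual time to this event        */
        hops++;
        if (e->dst != 0)       /* forward to the next rank on the ring      */
            schedule(vtime + LATENCY, (e->dst + 1) % NRANKS);
        free(e);
    }

    printf("simulated %d ranks, %ld hops, virtual completion time %.6f s\n",
           NRANKS, hops, vtime);
    return 0;
}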


The third paper, titled "Accelerating Incremental Checkpointing for Extreme-Scale Computing" by Kurt Ferreira et al., addresses the important issue of application fault tolerance at extreme scale and, more precisely, the scalability of a common technique for implementing it, checkpoint/restart. Building on the observation that operating-system page-level incremental checkpointing fails to significantly reduce the disk space required for checkpointing at extreme scale, the authors present a new incremental checkpointing approach (libhashckpt) that works at a finer granularity, using hash calculations and MPI hooks to determine which locations within a dirty page have actually changed. The authors support their claims with checkpointing evaluation results for four parallel applications, showing that the hash-based approach does, in many cases, reduce the size of the data involved in checkpointing (and therefore moved and stored). In addition to efficacy, the overhead induced by hashing is evaluated across multiple hashing algorithms, showing that GPU acceleration can significantly reduce this overhead.
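
To make the sub-page, hash-based idea concrete, the following minimal C sketch illustrates the general technique only; it is not libhashckpt's actual API or code, and the block size and hash function are arbitrary assumptions. Instead of saving every dirty 4 KB page in full, each page is divided into blocks whose hashes are compared against those recorded at the previous checkpoint, and only the blocks that actually changed are written.

/*
 * Schematic sketch of hash-based, sub-page incremental checkpointing.
 * Illustration only; not the library's actual API or implementation.
 */
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE  4096
#define BLOCK_SIZE 256                       /* sub-page granularity (assumed) */
#define NBLOCKS    (PAGE_SIZE / BLOCK_SIZE)

/* FNV-1a, used here only as a stand-in for the hash functions evaluated
 * in the paper. */
static uint64_t hash_block(const uint8_t *p, size_t n)
{
    uint64_t h = 1469598103934665603ULL;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Emit only the blocks of a dirty page whose hashes changed since the
 * previous checkpoint; returns the number of bytes written. */
static size_t checkpoint_page(const uint8_t *page, uint64_t old_hash[NBLOCKS],
                              FILE *ckpt)
{
    size_t written = 0;
    for (int b = 0; b < NBLOCKS; b++) {
        uint64_t h = hash_block(page + b * BLOCK_SIZE, BLOCK_SIZE);
        if (h != old_hash[b]) {              /* block content really changed  */
            fwrite(page + b * BLOCK_SIZE, 1, BLOCK_SIZE, ckpt);
            old_hash[b] = h;
            written += BLOCK_SIZE;
        }
    }
    return written;                          /* often far less than PAGE_SIZE */
}

int main(void)
{
    uint8_t  page[PAGE_SIZE] = {0};
    uint64_t hashes[NBLOCKS];
    FILE    *ckpt = tmpfile();

    /* Record a baseline, then modify a single byte: only one block is saved. */
    for (int b = 0; b < NBLOCKS; b++)
        hashes[b] = hash_block(page + b * BLOCK_SIZE, BLOCK_SIZE);
    page[1000] = 42;
    printf("bytes checkpointed: %zu of %d\n",
           checkpoint_page(page, hashes, ckpt), PAGE_SIZE);
    fclose(ckpt);
    return 0;
}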


The last paper, by Lugones et al., titled "A Reconfigurable, Regular-topology Cluster/Datacenter Network using commodity Optical Switches", presents a novel interconnect design built from a combination of electronic and optical switches, capable of scaling to multiple tens of thousands of nodes. The interconnect employs the shuffle-exchange topology to avoid multiple costly switching stages and is backed by efficient decomposition and routing algorithms. The interconnect is evaluated via simulation for a diversity of input traffic patterns and system sizes, showing that on average it achieves competitive throughput compared to much costlier (full-bisection-bandwidth) solutions. Aligned with the power and interconnect-throughput concerns for Exascale, the presented solution is very promising, and novel in its use of optical switching.
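
For readers unfamiliar with the topology family, the classical shuffle-exchange network on N = 2^n nodes links each node to a "shuffle" neighbour (a cyclic rotation of its bit representation) and an "exchange" neighbour (its identifier with the least-significant bit flipped). The short C sketch below prints these textbook neighbour functions for a toy 16-node network; it illustrates the topology family only and does not reproduce the decomposition or routing algorithms of the paper.

/*
 * Classical shuffle-exchange connectivity for N = 2^n nodes (textbook
 * definition; not the specific design of Lugones et al.).
 */
#include <stdio.h>

#define NBITS 4                      /* n; network size N = 2^n = 16 (assumed) */
#define N     (1u << NBITS)

/* Perfect shuffle: cyclic left rotation of the n-bit node identifier. */
static unsigned shuffle(unsigned i)
{
    return ((i << 1) | (i >> (NBITS - 1))) & (N - 1);
}

/* Exchange: flip the least-significant bit of the node identifier. */
static unsigned exchange(unsigned i)
{
    return i ^ 1u;
}

int main(void)
{
    for (unsigned i = 0; i < N; i++)
        printf("node %2u -> shuffle %2u, exchange %2u\n",
               i, shuffle(i), exchange(i));
    return 0;
}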

Georgios Theodoropoulos
Institute of Advanced Research Computing, Durham University, UK

Kostas Katrinis
Rolf Riesen
Shoukat Ali
IBM Research - Ireland, IBM Dublin Technology Campus, Mulhuddart, Dublin 15, Ireland