Data through the Computational Lens


Journal of Computational Science 20 (2017) 81–84


Journal of Computational Science
journal homepage: www.elsevier.com/locate/jocs

Abstract

Today, many advanced studies in computational science are driven by data that is collected and processed in distributed systems, obtained and assimilated in real (or near-real) time, used to identify and build models, or used for advanced visualization to gain insight through visual images. Recognizing this important role of data, the 16th International Conference on Computational Science (ICCS 2016), an event that promotes leading-edge research in the area, proposed the special theme “Data through the Computational Lens” to bring together diverse ideas and recent developments from the computational science community working in this direction. This special issue contains extended versions of papers selected from the conference proceedings. © 2017 Published by Elsevier B.V.

http://dx.doi.org/10.1016/j.jocs.2017.05.003
1877-7503/© 2017 Published by Elsevier B.V.

1. Introduction

Investigation of natural phenomena has always been tied to data collection through measurements in experiments and observations. In developing the early scientific method, Sir Francis Bacon considered empirical study followed by inductive inference to be the core approach to knowledge discovery. In the same era, the opposite approach, rationalism, proposed and developed by the “continental” philosophers (Descartes, Spinoza, Leibniz), promoted deductive inference proceeding from “pure” knowledge to empirical experience. Sustained over the years by a multitude of brilliant scientific ideas, these two paradigms are still important. In the modern field of computational science [1], we can speak of data-oriented technologies, approaches, and concepts, which can be treated as inductive methods of knowledge discovery. On the other hand, the available data is mostly imperfect, and we need to apply different techniques to produce more abstract and “pure” knowledge. Within this scope, modeling and simulation techniques, designed and developed as a reflection of more abstract knowledge, serve the same goal: knowledge acquisition. Today, we can consider the inductive and deductive ways separately, but to achieve better results the combination of the two should form a framework for good research.

The contemporary level of technology provides us with the ability to collect and process huge amounts of data. The concept of Big Data appeared and gained popularity several years ago, which allows us to speak of a new paradigm in science built around data-intensive scientific discovery [2]. Nowadays, large amounts of data can be obtained to describe natural, social, informational and other systems. This leads to a growing role for data science methods, which form a toolbox for data-intensive scientific discovery.

Still, the new concept reveals many important issues in the technology and methodology of large-scale data processing: imperfection, quality and trustworthiness of data, objectivity of data inference, and ethical and cultural issues (see, e.g., [3]). Managing these issues requires the involvement of higher-level knowledge from various domains. To make a correct inference, one needs not only data-driven “inductive” inference but also knowledge-driven “deductive” reasoning to discover the true nature of the investigated phenomenon and come to a proper conclusion [4].

Computational science plays a significant role in such two-way inference, as it encompasses approaches, methods and techniques: a) to design and develop models using domain-specific knowledge; b) to perform analysis using simulation, with available data involved in model identification, verification and validation; c) to apply a multitude of data-driven technologies for data assimilation, surrogate modeling, etc. Within this scope, computational science provides a technology to glue the two directions of inference together, where models can be considered a central concept that feeds both inductive and deductive reasoning with domain knowledge and available data.

This idea has several important consequences and raises important issues. First, nowadays complex systems are often considered as objects of scientific discovery. Advanced techniques for modeling and simulation are required to constrain sources of uncertainty, limitations of models, imperfection of the data and knowledge used, etc. Second, many contemporary models require a significant amount of computation. Therefore, a complex model often requires a complex computational infrastructure that must be managed to achieve higher performance, reactivity, throughput, etc. Third, considering the significant role of data within the scientific discovery process, we need to consider modeling and simulation as an important source that enriches the data corpora available to researchers. In the worst case, uncontrolled data production may lead to a data deluge. Fourth, knowledge is the main goal and the main driving force of the scientific discovery process.
Within a complex multi-disciplinary investigation, multiple knowledge sources should be brought together from various domains (problem domain, IT, modeling and simulation) within knowledge-based technologies for building problem-solving environments (PSE). Finally, the growing role of collaboration within the global scientific community calls for a new paradigm, Science 2.0 [5]. This leads to the appearance of a multitude of tools for collaboration within research projects, sharing of scientific results, publication of early results, etc.

Fig. 1. Topics of ICCS: a) high-level topic groups; b) highly dynamic topics.

2. History and perspectives of ICCS

In addressing the issues mentioned above, computational science offers a toolbox for data- and computation-intensive scientific discovery, a way to look at and analyze natural phenomena via modeling and simulation methods, and an abundant source of knowledge for various scientific areas. Understanding the importance of this area, we are glad to present this special issue of the Journal of Computational Science with selected best papers from the latest International Conference on Computational Science (ICCS 2016)¹, which was held in San Diego, California, U.S.A. from 6 to 8 June 2016 [6]. Since its inception in 2001, ICCS has continued to attract experts in computational science from all over the world. ICCS covers a wide range of topics: scientific computing, problem-solving environments, advanced numerical algorithms, modeling and simulation of complex systems, and many others. Over its history, ICCS topics have changed to reflect the most urgent problems in the computational science area. Today the corpus of ICCS papers contains about 5695 papers with an average length of 10 pages, published in Lecture Notes in Computer Science (2001–2009) and Procedia Computer Science (2010–2016).

More interesting results can be obtained with a detailed analysis of the ICCS corpus using the topic modeling approach. Fig. 1a presents the identified high-level topics of ICCS with their proportions, showing the leading role of the Modeling and HPC areas. However, these topics have had periods of popularity and unpopularity during the history of ICCS for various reasons, such as the topicality of research problems and methods and the invention and/or advancement of technologies (see Fig. 1b). For instance, HPC-GPU topics were not popular at ICCS until 2006. Since 2006 (and especially since 2009), however, HPC-GPU topics have been booming, following the 2005 report of one of the first common scientific programs to run faster on GPUs than on CPUs and the appearance of GPGPU and hybrid computing libraries and standards (FireStream (2006), CUDA (2007), OpenCL (2009), and others).

¹ http://www.iccs-meeting.org/iccs2016/.

Further insight can be obtained by analyzing static and dynamic networks of ICCS topics (see Fig. 2). Centrality analysis of the network reveals the crucial role of the Simulation and HPC topics in linking the multiple clusters together and keeping the conference coherent. In fact, the disappearance of this area would presumably make the conference “fall apart” into separate conferences. Further details on the topic modeling of ICCS [7] will be presented at ICCS 2017 in Zürich, Switzerland, on 12–14 June 2017.

3. Special issue content

This special issue presents a series of papers intended to represent the key works of ICCS [8,9]. It contains 13 papers collected from both the Main Track of the conference and its workshops after a competitive selection. They cover various areas and topics relevant to the ICCS community.

Several papers are devoted to parallel and distributed computing on different architectures of computational systems. Abdelfattah et al. [10] consider GPU-accelerated Cholesky factorization for batch and native modes of operation, showing significant speedup in comparison to a multi-core CPU solution. Wang et al. [11] investigate the performance of energy lookup algorithms implemented on CPU and many-core architectures and introduce vectorized versions of them. The area of distributed systems is represented by Owsiak et al. [12], who focus on multiple workflows running simultaneously within the Kepler framework and executed using a Docker-based mechanism, achieving high scalability and multi-user support.

Research on agent-based modeling, which depicts collective behavior and can be used to investigate urban and social systems, is reflected in the following works. Korczynski et al. [13] discuss memetic search in classic and agent-based evolutionary algorithms and propose several original methods for enhancing the algorithms. Bi et al. [14] investigate charging station infrastructure for electric vehicles in the urban environment of Singapore by modeling drivers’ behavior through agent-based traffic simulation. The research of Visheratin et al. [15] is focused on agent-based modeling of information spreading in urgent scenarios in an urban environment.

The important area of machine learning and its applications is represented as well. Fisher et al. [16] developed a novel data-driven workflow that applies machine learning techniques to detect anomalies in earth dams and levees using sensor data. Artetxe et al. [17] investigate a heterogeneous ensemble classifier (Anticipative Hybrid Extreme Rotation Forest, AHERF) and its application to emergency service readmission risk prediction.


Fig. 2. Network of topics.

Numerical methods are investigated by Melis and Samaey [18], who introduce a variance reduction technique based on control variables for stochastic slow-fast systems containing a deterministic slow equation and a stochastic fast equation. Yeung et al. [19] present an original preconditioner algorithm based on spectral projection for solving ill-conditioned linear systems of equations.

Various applications of computational science are presented in the special issue, with papers in geosciences, material science, and finance. Fu et al. [20] present a computational analysis of the multi-grain solidification behavior of a crystal-melt nickel system at a moderate degree of undercooling via both molecular dynamics and a phase field model. Balajewicz and Toivanen [21] consider reduced order models for pricing European and American options under jump-diffusion and stochastic volatility models.

Finally, important aspects of collaboration and education in computational science are covered by Purawat et al. [22], who present a community-oriented e-learning environment in the biomedical area, which enables collaboration and knowledge sharing using software training toolboxes available through virtualization technologies.

We thank all the authors of the selected papers for their valuable contributions. We believe that the current special issue presents a comprehensive and significant collection of works in computational science, reflects the state of the art in that area, and can have an impact on the scientific community. Our hope is to encourage further development of approaches, methods and technologies for data- and computation-intensive scientific discovery as new tools for contemporary science.

Acknowledgments

We thank all committee members from the main track and the workshops for their contribution to ensuring the high standard of the papers accepted at ICCS. As always, we thank Elsevier, with whose financial and administrative support the conference is organized. The conference was organized by staff from the San Diego Supercomputer Center, the University of Amsterdam, NTU Singapore and UC San Diego. Finally, we would like to express our gratitude to the reviewers of this special issue for their contribution and valuable comments, which enabled further improvement of the selected papers. The research on topic modeling was performed within a project financially supported by The Russian Scientific Foundation, Agreement #14-21-00137.

References

[1] P. Sloot, P. Coveney, J. Dongarra, Preface, J. Comput. Sci. 1 (2010) 3–4, http://dx.doi.org/10.1016/j.jocs.2010.04.003.
[2] T. Hey, S. Tansley, K. Tolle (Eds.), The Fourth Paradigm: Data-Intensive Scientific Discovery, Microsoft Research, 2009, ISBN 978-0-9825442-0-4, https://www.microsoft.com/en-us/research/publication/fourth-paradigm-data-intensive-scientific-discovery/#.
[3] D. Boyd, K. Crawford, Critical questions for big data, Inf. Commun. Soc. 15 (2012) 662–679, http://dx.doi.org/10.1080/1369118X.2012.678878.
[4] P.M.A. Sloot, More likely we would be ritually slaughtered, in: 43 Visions for Complexity, 2017, pp. 65–66, http://dx.doi.org/10.1142/9789813206854_0033.
[5] B. Shneiderman, Science 2.0, Science 319 (2008) 1349–1350, http://dx.doi.org/10.1126/science.1153539.
[6] I. Altintas, M. Norman, M. Lees, V.V. Krzhizhanovskaya, J. Dongarra, P.M.A. Sloot, Data through the computational lens, preface for ICCS 2016, Procedia Comput. Sci. 80 (2016) 1–7, http://dx.doi.org/10.1016/j.procs.2016.05.426.
[7] T.M. Abuhay, S.V. Kovalchuk, K.O. Bochenina, G. Kampis, V.V. Krzhizhanovskaya, M.H. Lees, Analysis of computational science papers from ICCS 2001–2016 using topic modeling and graph theory, Procedia Comput. Sci. (2017), in press (preprint at arXiv:1705.02203).


[8] S. Koziel, L. Leifsson, Special issue computational science at the gates of nature, J. Comput. Sci. 9 (2015) 1–162, http://www.sciencedirect.com/science/journal/18777503/9.
[9] D. Abramson, V.V. Krzhizhanovskaya, M. Lees, Perspectives of the international conference of computational science 2014, J. Comput. Sci. 10 (2015) 247–248, http://dx.doi.org/10.1016/j.jocs.2015.08.007.
[10] A. Abdelfattah, A. Haidar, S. Tomov, J. Dongarra, Fast Cholesky factorization on GPUs for batch and native modes in MAGMA, J. Comput. Sci. 20 (2017) 85–93, http://dx.doi.org/10.1016/j.jocs.2016.12.009.
[11] Y. Wang, E. Brun, F. Malvagi, C. Calvin, Competing energy lookup algorithms in Monte Carlo neutron transport calculations and their optimization on CPU and Intel MIC architectures, J. Comput. Sci. 20 (2017) 94–102, http://dx.doi.org/10.1016/j.jocs.2017.01.006.
[12] M. Owsiak, M. Plociennik, B. Palak, T. Zok, C. Reux, L. Di Gallo, D. Kalupin, T. Johnson, M. Schneider, Running simultaneous Kepler sessions for the parallelization of parametric scans and optimization studies applied to complex workflows, J. Comput. Sci. 20 (2017) 103–111, http://dx.doi.org/10.1016/j.jocs.2016.12.005.
[13] W. Korczynski, A. Byrski, M. Kisiel-Dorohinicki, Buffered local search for efficient memetic agent-based continuous optimization, J. Comput. Sci. 20 (2017) 112–117, http://dx.doi.org/10.1016/j.jocs.2017.02.001.
[14] R. Bi, J. Xiao, V. Viswanathan, A. Knoll, Influence of charging behaviour given charging infrastructure specification: a case study of Singapore, J. Comput. Sci. 20 (2017) 118–128, http://dx.doi.org/10.1016/j.jocs.2017.03.013.
[15] A.A. Visheratin, T.B. Trofimenko, K.D. Mukhina, D. Nasonov, A.V. Boukhanovsky, A multi-layer model for diffusion of urgent information in mobile networks, J. Comput. Sci. 20 (2017) 129–142, http://dx.doi.org/10.1016/j.jocs.2017.03.013.
[16] W.D. Fisher, T.K. Camp, V.V. Krzhizhanovskaya, Anomaly detection in earth dam and levee passive seismic data using support vector machines and automatic feature selection, J. Comput. Sci. 20 (2017) 143–153, http://dx.doi.org/10.1016/j.jocs.2016.11.016.
[17] A. Artetxe, B. Ayerdi, M. Graña, S. Rios, Using anticipative hybrid extreme rotation forest to predict emergency service readmission risk, J. Comput. Sci. 20 (2017) 154–161, http://dx.doi.org/10.1016/j.jocs.2016.12.008.
[18] W. Melis, G. Samaey, Variance-reduced multiscale simulation of slow-fast stochastic differential equations, J. Comput. Sci. 20 (2017) 162–176, http://dx.doi.org/10.1016/j.jocs.2016.12.008.
[19] M.-C. Yeung, C.C. Douglas, L. Lee, A spectral projection preconditioner for solving ill conditioned linear systems, J. Comput. Sci. 20 (2017) 177–186, http://dx.doi.org/10.1016/j.jocs.2017.01.005.

[20] Y. Fu, J.G. Michopoulos, J.-H. Song, Bridging the multi-phase field model with the molecular dynamics for the solidification of nano-crystals, J. Comput. Sci. 20 (2017) 187–197, http://dx.doi.org/10.1016/j.jocs.2016.10.014.
[21] M. Balajewicz, J. Toivanen, Reduced order models for pricing European and American options under stochastic volatility and jump-diffusion models, J. Comput. Sci. 20 (2017) 198–204, http://dx.doi.org/10.1016/j.jocs.2017.01.004.
[22] S. Purawat, C. Cowart, R.E. Amaro, I. Altintas, Biomedical Big Data Training Collaborative (BBDTC): an effort to bridge the talent gap in biomedical science and research, J. Comput. Sci. 20 (2017) 205–214, http://dx.doi.org/10.1016/j.jocs.2017.03.010.

Sergey V. Kovalchuk
Tesfamariam M. Abuhay
ITMO University, Russia

Ilkay Altintas
Michael L. Norman
University of California, San Diego, USA

Michael H. Lees
University of Amsterdam, The Netherlands

Valeria V. Krzhizhanovskaya a,b
a ITMO University, Russia
b University of Amsterdam, The Netherlands

Jack Dongarra
University of Tennessee, USA

Peter M.A. Sloot a,b,c
a ITMO University, Russia
b University of Amsterdam, The Netherlands
c Nanyang Technological University, Singapore