
Agenda


Tuesday, 10 October 2017


9:00 - 9:15 Introduction
Michael Resch, HLRS, University of Stuttgart
9:15 - 9:45 Two-Year Experiences with Vector Supercomputer SX-ACE and Design Space Exploration of the Next Generation Vector System
Hiroaki Kobayashi, Cyberscience Center, Tohoku University
Abstract In this talk, I will present our two-year experience with the vector-parallel supercomputer SX-ACE. In particular, I will show operation statistics, applications developed on SX-ACE, and case studies of program tuning to exploit its potential. In addition, I will describe the future plan for supercomputing resource installation and deployment at Tohoku University and discuss the design space exploration of the future vector system.
9:45 - 10:15 SiVeGCS - The Future of German Supercomputing
Michael Resch, HLRS
Abstract Germany has launched a new round of funding for its national centers. This talk will present the project SiVeGCS, through which the national centers HLRS, JSC, and LRZ will receive funding to maintain a leading role in European and worldwide HPC.
10:15 - 10:45 Break
10:45 - 11:15 JAMSTEC Next Scalar Supercomputer System
Ken'ichi Itakura, JAMSTEC
Abstract We need a successor to the present supercomputer system and plan to procure the next-term general-purpose high-performance computer system. The new system is required to provide a common platform for numerical models developed around the world and to cooperate with the "Earth Simulator". It will also support the analysis of the global oceanic environment with big data analysis, machine learning, and artificial intelligence. The new system will enter service in February 2018. In my talk, I will show the details of this new system.
11:15 - 11:45 OCTOPUS: a new supercomputing service of Osaka University
Susumu Date, Cybermedia Center, Osaka University
Abstract The Cybermedia Center will introduce a new supercomputer system named OCTOPUS. The speaker explains the background and the administrators' motivation behind the system's procurement, gives a brief introduction to the system, and outlines the future plan.
11:45 - 12:15 The Brand-new Vector Supercomputer, Aurora
Shintaro Momose, NEC
Abstract NEC is developing the next-generation vector supercomputer Aurora, the successor to the latest model SX-ACE. This brand-new system is slated for release in 2018. To deliver both high sustained performance and high productivity, NEC is developing an innovative vector processor offering the world's highest memory bandwidth per processor. It is built into a standard PCIe card as the Vector Engine, which allows users to obtain high sustained performance for a broad spectrum of applications in the de facto standard x86/Linux environment. No special programming techniques are needed to harness its processing capabilities: the card enables programming in standard Fortran/C/C++, for which the NEC compiler automatically carries out vectorization and parallelization. Users can thus benefit from higher sustained performance with minimum effort by utilizing the Vector Engine hardware.
12:15 - 13:15 Lunch
13:15 - 13:45 A Multiple-layer Bypass Mechanism for Energy Efficient Computing
Ryusuke Egawa, Masayuki Sato, Ryoma Saito, Hiroaki Kobayashi, Cyberscience Center, Tohoku University
Abstract In modern microprocessors, the cache hierarchy consists of several cache layers that hide the memory access latency, and the capacity and energy consumption of the hierarchy increase significantly as the number of layers and their sizes grow. However, since no single cache configuration fits all applications, the cache hierarchy sometimes degrades the energy efficiency of the computing system. In this talk, we introduce a cache control mechanism that changes the structure of the cache hierarchy according to the memory access behavior of applications by bypassing multiple cache layers.
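The energy argument behind such a bypass mechanism can be made concrete with a toy model. All numbers and the cache model below are illustrative assumptions, not figures from the talk: a streaming workload with no data re-use pays lookup energy in every cache layer it traverses without gaining any hits, so skipping intermediate layers saves energy.

```python
# Toy model (assumed numbers, not from the talk) of why bypassing cache
# layers can save energy for streaming access patterns.

ENERGY = {"L1": 1.0, "L2": 5.0, "L3": 15.0, "DRAM": 80.0}  # pJ, illustrative

def access_energy(addresses, hierarchy, cache_line=64):
    """Accumulate lookup energy through `hierarchy`; a full miss goes to DRAM."""
    caches = {lvl: set() for lvl in hierarchy}
    total = 0.0
    for a in addresses:
        line = a // cache_line
        for lvl in hierarchy:
            total += ENERGY[lvl]          # pay the lookup in every level probed
            if line in caches[lvl]:
                break                     # hit: stop probing
        else:
            total += ENERGY["DRAM"]       # missed everywhere
        for lvl in hierarchy:             # simple inclusive fill
            caches[lvl].add(line)
    return total

stream = range(0, 64 * 1000, 64)          # streaming: every line is new
full = access_energy(stream, ["L1", "L2", "L3"])
bypassed = access_energy(stream, ["L1"])  # bypass L2 and L3
print(bypassed < full)                    # True: no hits, so probes are wasted
```

For workloads with re-use the comparison flips, which is why the mechanism described in the talk adapts the bypass decision to the observed memory access behavior.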
13:45 - 14:15 Coupling Strategies for Multiphysics Simulations on Hierarchical Cartesian Meshes
Matthias Meinke, Michael Schlottke, Ansgar Niemöller, Institute of Aerodynamics, RWTH Aachen University
Abstract This paper will present a fully coupled hybrid CFD/CAA method for the prediction of aeroacoustic noise on high-performance computing systems. The method combines a finite-volume solver for the Navier-Stokes equations with a discontinuous Galerkin method for the acoustic perturbation equations, both formulated on hierarchical Cartesian meshes. The two algorithms share a common base level of the Cartesian mesh, which is also used for the domain decomposition. This allows the computational load of the two solvers to be distributed such that the transfer of acoustic source terms from the CFD to the CAA solution is performed in local memory. The coupled solver thus avoids the I/O of large data volumes otherwise required when running the solvers consecutively in an uncoupled way. The paper will show scaling tests for the individual solvers, the I/O, and the fully coupled solver. The performance of the coupled solver will be demonstrated by a large-scale jet noise application.
14:15 - 14:45 Locally Linearized Euler Equations in Discontinuous Galerkin with Legendre Polynomials
H. Klimach, M. Gaida, S. Roller, Simulationstechnik & Wissenschaftliches Rechnen, Universität Siegen
Abstract We present the implementation of locally linearized Euler equations in our Discontinuous Galerkin solver Ateles, which operates on a basis of Legendre polynomials. The linearization is tied to the discretization and exploits the fact that the first mode of the Legendre expansion represents the mean in each element. This mean is used as the stationary mean flow for the local linearization in each element. With linearized equations, it becomes possible to evaluate all terms efficiently in modal space. Between elements, the full nonlinear fluxes are computed, which also opens the possibility of combining such linearized elements with others in which the nonlinear equations are solved. We compare this method with the computation of globally linearized Euler equations and nonlinear Euler equations in Ateles. It can be observed how this linearization helps to reduce the computational effort, especially for high-order discretizations. An adaptive algorithm may easily switch to linearized equations in elements wherever possible, as the interface fluxes remain unchanged in this method. The nonlinear fluxes therefore form a kind of generic interface between elements that allows the equations to be chosen independently from element to element. High-order discretizations deliver low numerical errors for smooth solutions, which we expect in regions where linearization is possible. At the same time, the linearization reduces the amount of memory required to store the solution. The described method therefore suits a high-order discretization nicely and helps to reduce computational costs drastically.
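The property the linearization relies on, that the first Legendre mode equals the element mean, can be checked numerically. This is an illustrative sketch in plain Python, independent of Ateles: since P0(x) = 1 and every higher Legendre polynomial integrates to zero over [-1, 1], the P0 coefficient of any expansion equals the mean of the function over the element.

```python
# Check: for a Legendre expansion on [-1, 1], the coefficient of P0
# equals the mean of the function over the element.

def integrate(f, a=-1.0, b=1.0, n=10000):
    """Composite trapezoidal rule."""
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return s * h

def f(x):
    return x**3 - 2.0 * x + 5.0     # arbitrary smooth test function, mean 5

# First Legendre mode: c0 = <f, P0> / <P0, P0> = (1/2) * integral of f,
# since P0 = 1 and <P0, P0> = 2 on [-1, 1].
c0 = 0.5 * integrate(f)
mean = integrate(f) / 2.0            # element mean over [-1, 1]

print(c0, mean)                      # both 5.0, up to quadrature error
```

This is exactly why the element mean is directly available in a modal Legendre basis at no extra cost: it is the first coefficient.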
14:45 - 15:15 Unveiling Insight on Fluid Systems in a Diverse Environment using CFD
Manuel Hasert, Festo AG & Co. KG
Abstract The automation company Festo offers a vast variety of fluid power devices such as valves, drives, and sensors. Simulation methods are increasingly employed during the design process. Computational Fluid Dynamics (CFD) aids in identifying optimization potential for fluid power systems and gives insight into device performance before prototypes are created. Certain standard tasks recur regularly and can be planned and processed directly. The large variety of devices and variants leads to strongly differing analysis objectives and requirements. Typical objectives include the computation of flow rates and spool forces, cavitation analysis, parameter studies and optimization, the coupling of multiple physical domains, and helping to identify reasons for failures. The time frame in which the computational analysis has to be performed is, however, often very restricted. This talk addresses current activities and requirements of CFD analysis in such a diverse environment.
15:15 - 15:45 Break
15:45 - 16:15 High-fidelity Simulation of Helicopter Phenomena: HPC Aspects in Advanced Engineering Applications
Manuel Keßler, Institute for Aero and Gasdynamics, University of Stuttgart
Abstract Rotorcraft aeromechanic simulations are amongst the most demanding engineering applications due to the complex aerodynamics, flight mechanics and structure dynamics and their respective interactions. The unique framework developed at IAG over the last two decades - together with research institutes and industry - combines an advanced CFD solver, CSD components and comprehensive analysis tools and has reached a level of maturity to actually deliver reliable results matching flight data. Efficient utilisation of HPC resources is of paramount importance, as a single simulation run can burn several million core hours. Continuous optimisations at the node level as well as on parallel scaling are essential to achieve justifiable results.
16:15 - 16:45 Highly portable CFD solutions for heterogeneous computing on unstructured meshes
A.V. Gorobets, S.A. Soukov, Keldysh Institute of Applied Mathematics of RAS, Moscow, Russia
P.B. Bogdanov, Scientific Research Institute of System Development of RAS, Moscow, Russia
X. Alvarez, F.X. Trias, Heat and Mass Transfer Technological Center of UPC, Barcelona, Spain
Abstract The worldwide exascale race poses many challenging problems related to the variety and complexity of hybrid massively parallel computing architectures and the required extreme levels of parallelism. The present work focuses on the development of highly scalable and portable parallel CFD algorithms and software implementations for time-accurate simulations of compressible and incompressible turbulent flows using unstructured hybrid meshes. A multilevel MPI+OpenMP+OpenCL parallelization for heterogeneous computing on arbitrary hybrid architectures is proposed. It includes a multilevel decomposition for distributing and balancing the workload among the computing devices of hybrid nodes, and it hides communication overhead by overlapping data transfers with computations. A finite-volume cell-centered parallel algorithm and its portable stream-processing-based implementation are described. A performance study is reported for a wide range of computing devices, including various multi-core CPUs, NVIDIA and AMD GPUs, and Intel Xeon Phi accelerators. Scalability is tested on different hybrid systems, including a fat node with 8 GPUs and a supercomputer with up to several hundred GPUs engaged. Performance in heterogeneous execution on CPUs and GPUs together is demonstrated.
Similarly, a multilevel parallel algorithm and its portable, algebra-based implementation are presented for the case of incompressible flows. The time integration core is composed of only three basic linear algebra operations: SpMV, axpy, and dot product. Its performance is studied on different architectures, including GPU-based hybrid clusters and the Mont-Blanc prototype made of low-power ARM-based systems-on-chip.
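An algebra-based time integration core of this kind can be sketched with exactly the three kernels named above. The conjugate gradient solver below is a hypothetical stand-in for the actual implementation (pure Python, simple CSR matrix format); its appeal for portability is that only `spmv`, `axpy`, and `dot` need device-specific tuning.

```python
# Solver core built solely from the three kernels: SpMV, axpy, dot product.
# Illustrative sketch, not the authors' implementation.

def spmv(val, col, ptr, x):
    """y = A @ x for a CSR matrix (val, col, ptr)."""
    return [sum(val[k] * x[col[k]] for k in range(ptr[i], ptr[i + 1]))
            for i in range(len(ptr) - 1)]

def axpy(a, x, y):
    """Element-wise y + a * x."""
    return [yi + a * xi for xi, yi in zip(x, y)]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def cg(val, col, ptr, b, tol=1e-10, maxit=100):
    """Conjugate gradient for SPD systems, expressed in the three kernels."""
    x = [0.0] * len(b)
    r = list(b)                       # r = b - A x with x = 0
    p = list(r)
    rr = dot(r, r)
    for _ in range(maxit):
        Ap = spmv(val, col, ptr, p)
        alpha = rr / dot(p, Ap)
        x = axpy(alpha, p, x)         # x += alpha p
        r = axpy(-alpha, Ap, r)       # r -= alpha A p
        rr_new = dot(r, r)
        if rr_new < tol * tol:
            break
        p = axpy(rr_new / rr, p, r)   # p = r + beta p
        rr = rr_new
    return x

# 1-D Poisson matrix tridiag(-1, 2, -1), n = 4, in CSR form
val = [2, -1, -1, 2, -1, -1, 2, -1, -1, 2]
col = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
ptr = [0, 2, 5, 8, 10]
x = cg(val, col, ptr, [1.0, 1.0, 1.0, 1.0])
print(x)  # [2.0, 3.0, 3.0, 2.0]
```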
This work has been financially supported by the Russian Science Foundation, project 15-11-30039 (heterogeneous implementations) and the Russian Foundation for Basic Research, grant 15-07-04213-a (parallel mesh processing). Supercomputers Lomonosov of MSU, MVS-10P of JSCC, K100 of KIAM, HPC4 of Kurchatov institute have been used for our calculations and performance tests. The authors thankfully acknowledge these institutions.
16:45 - 17:15 Numerical modelling of phase change processes in clouds - challenges and approaches
Martin Reitzle, Bernhard Weigand, Institute of Aerospace Thermodynamics, University of Stuttgart
Abstract Numerical simulations of phase change processes in clouds (e.g. solidification, evaporation or sublimation) are very demanding from both a physical and a numerical point of view: the complex physics must be translated into mathematical models before they can be discretised and solved numerically. Moreover, the equations are strongly coupled, and this coupling must be mimicked in the numerical schemes. Furthermore, high spatial and temporal resolutions are necessary in order to capture all physical aspects. The challenges and possible solution strategies for these kinds of problems will be presented in the framework of the in-house finite-volume code FS3D.
17:15 - 17:45 A dynamic load-balancing strategy for large scale CFD-applications
Philipp Offenhäuser, HLRS
Abstract Current supercomputers generate their computing power by employing hundreds of thousands of CPU cores. To make use of the computing power of current and future supercomputers, the computational effort must be distributed evenly across all cores. Furthermore, the communication effort of massively parallel applications has to be small compared to the computational effort in order to use the parallel hardware efficiently. Over the last decades, many parallel numerical algorithms, methods for distributing the computational effort, and efficient communication patterns have been developed and implemented. Computational fluid dynamics (CFD) methods profit strongly from this progress and have become an indispensable tool in science and research. Thanks to the available computing power, CFD applications have been extended to much more complex problems such as transient multi-scale and multi-phase-flow problems. To cover all physical phenomena, the numerical methods have been enhanced, at the cost of additional numerical work that is local and time-dependent. These additional costs are hard to predict, both regarding their precise location within the simulated domain and their exact time of occurrence. This leads to a mismatch of computational effort between MPI processes, and the performance of the whole application suffers. A dynamic load-balancing methodology to overcome the problem of over- and under-loaded MPI processes will be presented, and the further challenges arising from dynamic load-balancing methodologies will be discussed.
19:00 - 21:00 Dinner in Goldener Adler

Wednesday, 11 October 2017


9:00 - 9:30 API Extension and Resource Manager Integration for Malleable MPI Applications
Isaias Alberto Compres Urena, Institute of Informatics, Technical University of Munich
Abstract An extension to MPI that enables malleability has been developed at the Transregional Collaborative Research Center Invasive Computing. The MPICH library and the SLURM workload manager have been extended to support it. An overview of the current prototype and the improvements it can bring to system-wide efficiency metrics in HPC systems will be presented.
9:30 - 10:00 Performance and Quality Analysis of Interpolation Methods for Coupling
N. Ebrahimi-Pour, S. Roller, Simulationstechnik & Wissenschaftliches Rechnen, Universität Siegen
Abstract Simulations of the interactions between structural mechanics, fluid flows and the resulting acoustics are a challenge for modern computing systems. Solving the whole problem in a single domain is unfeasible due to the different scales that need to be covered. A strategy to overcome this problem is the subdivision of the overall domain into smaller parts that can be discretized independently and coupled at the surfaces of the individual domains. Such a separation allows solving different equations, as required by the physics in each domain, with a discretization resolving the respective scales of the relevant physical phenomena. For the communication and data exchange between the domains, we use two coupling approaches: APESmate and preCICE. The first, APESmate, is an integrated approach in our APES framework that has knowledge of the numerical scheme within the domains. It can therefore evaluate the high-order polynomials of the underlying Discontinuous Galerkin scheme for the data exchange, which results in accurate simulations. The second, preCICE, is a black-box tool that only knows the point values at the coupling surface of each domain. Hence, for the data exchange between the domains, preCICE has to interpolate the values from one domain to the other. In this work, we compare the various options for interpolation and coupling in terms of quality and performance. For the interpolation in preCICE we use a method based on radial basis functions and one that uses a projection onto a surface mesh. In APESmate we evaluate all terms of the polynomials representing the solution. Finally, we compare the computational cost of each method to find a well-suited method for coupled simulations.
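The radial-basis-function mapping mentioned for preCICE can be sketched in one dimension. This is an illustrative toy (Gaussian basis, dense direct solve), not preCICE's implementation, which adds polynomial terms and scalable solvers: values known at the source coupling points determine weights, which then evaluate the field at the target points.

```python
# Toy radial-basis-function (RBF) interpolation for black-box coupling:
# fit weights at source points, evaluate at target points.

import math

def rbf(r, shape=1.0):
    return math.exp(-(shape * r) ** 2)      # Gaussian basis function

def solve(A, b):
    """Gaussian elimination with partial pivoting (dense, for the sketch)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def rbf_interpolate(src_x, src_v, tgt_x):
    A = [[rbf(abs(xi - xj)) for xj in src_x] for xi in src_x]
    w = solve(A, src_v)                      # interpolation conditions
    return [sum(wj * rbf(abs(x - xj)) for wj, xj in zip(w, src_x))
            for x in tgt_x]

src = [0.0, 0.5, 1.0, 1.5, 2.0]
vals = [x * x for x in src]                  # sample a smooth field
out = rbf_interpolate(src, vals, src)
print(out)  # reproduces vals at the source points, up to rounding
```

By construction the interpolant is exact at the source points; the quality comparison in the talk concerns its accuracy at the non-matching target points on the other side of the coupling surface.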
10:00 - 10:30 Towards Realizing a Dynamic and MPI Application-aware Interconnect with SDN
Keichi Takahashi, Cybermedia Center, Osaka University
Abstract Current interconnects of HPC clusters are usually statically configured and over-provisioned. Therefore, interconnects tend to be underutilized when there is a mismatch between the inter-process communication pattern of an application and the performance characteristics of the interconnect. We have been developing SDN-enhanced MPI, a framework that dynamically reconfigures the interconnect using Software Defined Networking (SDN) to improve the utilization of the interconnect and accelerate inter-process communication. This talk introduces our recent achievements on SDN-enhanced MPI, including a toolset to facilitate the analysis of the performance characteristics of dynamic interconnects.
10:30 - 11:00 Break
11:00 - 11:30 FEniCS HPC: An automated predictive high-performance framework for multiphysics simulations
Niclas Jansson, Department of High Performance Computing and Visualization, School of Computer Science and Communication, KTH Royal Institute of Technology
Abstract We present a framework for coupled multiphysics in computational fluid dynamics, targeting massively parallel systems. Our strategy is based on general problem formulations in the form of partial differential equations and the finite element method, which opens the way for automation and optimization of a set of fundamental algorithms. We describe these algorithms, including finite element matrix assembly, adaptive mesh refinement and mesh smoothing, as well as multiphysics coupling methodologies such as unified continuum fluid-structure interaction and aeroacoustics by coupled acoustic analogies. The framework is implemented as FEniCS open source software components optimized for massively parallel computing. Example applications are presented, including simulation of the aeroacoustic noise generated by an airplane landing gear, simulation of the blood flow in the human heart, and simulation of the human voice organ.
11:30 - 12:00 Automated derivation and parallel execution of finite difference models on CPUs, GPUs and Intel Xeon Phi processors using code generation techniques
Christian T. Jacobs, Satya P. Jammy, David J. Lusher, Neil D. Sandham, Engineering and the Environment at the University of Southampton
Abstract We present a new Python-based numerical modelling framework named OpenSBLI [1]. The framework comprises code generation techniques that allow users to write the equations they wish to solve as high-level expressions; the model code that performs the finite difference approximations and the solution process is generated automatically. The generated code can then be targeted via the OPS library [2] towards different backends, including MPI, OpenMP, CUDA, OpenCL and OpenACC, which readily enables the execution of the same model on a variety of hardware architectures. This helps to future-proof models as new exascale-capable architectures become available. It also introduces a separation of concerns between domain specialists, numerical modellers, and HPC experts, allowing better maintainability and extensibility of the codebase.
[1] C. T. Jacobs, S. P. Jammy, N. D. Sandham (2017). OpenSBLI: A framework for the automated derivation and parallel execution of finite difference solvers on a range of computer architectures. Journal of Computational Science, 18:12-23. doi:10.1016/j.jocs.2016.11.001
[2] I. Z. Reguly, G. R. Mudalige, M. B. Giles, D. Curran, S. McIntosh-Smith (2014). The OPS Domain Specific Abstraction for Multi-Block Structured Grid Computations. In Proceedings of the 2014 Fourth International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing, IEEE Computer Society, pp. 58-67. doi:10.1109/WOLFHPC.2014.7
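The code-generation idea behind such frameworks can be illustrated in miniature: a high-level stencil description is turned into source text, which is then compiled and executed. The helper below is a hypothetical toy, not OpenSBLI's actual pipeline (which generates OPS kernels from symbolic equations).

```python
# Toy code generation: emit a finite difference kernel from a stencil spec.

def generate_stencil(name, coeffs, dx):
    """Emit a function applying a 1-D finite difference stencil.

    `coeffs` maps grid offsets to weights; e.g. the 2nd-order central
    first derivative is {-1: -0.5, 1: 0.5}, scaled by 1/dx.
    """
    terms = " + ".join(f"({w!r}) * u[i + ({o})]" for o, w in coeffs.items())
    src = (f"def {name}(u):\n"
           f"    return [({terms}) / {dx!r}\n"
           f"            for i in range({-min(coeffs)}, len(u) - {max(coeffs)})]\n")
    ns = {}
    exec(src, ns)                 # compile the generated source text
    return ns[name]

# Central first derivative applied to u = x**2 on a uniform grid
dx = 0.1
grid = [i * dx for i in range(11)]
u = [x * x for x in grid]
ddx = generate_stencil("ddx", {-1: -0.5, 1: 0.5}, dx)
d = ddx(u)
print(d)  # approx 2*x at interior points: 0.2, 0.4, ..., 1.8
```

Real generators additionally emit loop bounds, halo exchanges, and backend-specific annotations, which is precisely what OPS supplies in the work above.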
12:00 - 12:30 Performance tuning of Ateles using Xevolver
Kazuhiko Komatsu, Cyberscience Center, Tohoku University
Abstract In this presentation, we report our experiences optimizing the Ateles code on the NEC SX-ACE system. In order to achieve both performance and maintainability of the code, we employ the Xevolver code transformation framework to transform the original code into a version optimised for SX-ACE.
12:30 - 13:30 Lunch
13:30 - 14:00 vTorque - Introducing virtualization capabilities to Torque
Nico Struckmann, HLRS
Abstract The flexibility and portability commonly known from clouds provide many benefits for users, administrators and software developers. With emerging technologies addressing virtual I/O, today's major bottleneck of virtualization, the incentive grows to adopt virtualization in HPC infrastructures. The advantages are manifold: users can be served with customized environments; software developers can package applications with all their dependencies; administrators can upgrade or change their HPC infrastructure, i.e. the operating system, without impact on the applications served. vTorque is a non-intrusive approach to introducing virtualization capabilities into the batch-system resource manager Torque. It brings cloud-like features, e.g. flexibility and portability, to traditional HPC infrastructures while maintaining the ability to run jobs on bare metal.
14:00 - 14:30 Optimised scheduling mechanisms for Virtual Machine deployment in Cloud infrastructures
Michael Gienger, HLRS
Abstract Modern cloud infrastructures rely on mechanisms to share hardware resources. Although this approach increases usability, flexibility and efficiency of resources, the performance of each virtual machine may be affected by the others. Consequently, in order to maximise the performance of all instances, virtual machine profiles are required that reflect their individual behaviour, especially with respect to I/O and network saturation. This presentation takes up this particular problem and provides a concept for defining and assessing the behaviour of virtual machines. In addition, a method is presented which makes use of this information in order to optimise the overall virtual machine deployment.
14:30 - 15:00 To be defined
Christopher L. Barrett, Biocomplexity Institute, Virginia Tech
Abstract To be defined
15:00 - 15:30 Break
15:30 - 16:00 Software for agent based social simulation in the distributed HPC environments
Sergiy Gogolenko, HLRS
Abstract Agent-based modelling and simulation (ABMS) is an essential tool for exploring social phenomena via computer simulation. Despite the high importance of ABMS in the computational social sciences, only a few ABMS frameworks are designed for large-scale simulations on distributed HPC platforms; the most remarkable examples are RepastHPC, FLAME, EcoLab, and Pandora. In this talk, we review state-of-the-art ABMS frameworks for HPC, with special emphasis on RepastHPC and Pandora, highlight the technical ideas behind them, and identify their weaknesses with respect to scalability, software design, and user-friendliness. We show that the common bottleneck of existing frameworks is a naive approach to distributing the spatial environment, which results in poor load balancing. This observation stimulated us to propose an alternative graph-based approach to workload distribution and to initiate the development of a new ABMS framework. We present the design of this framework and discuss aspects of its implementation. Furthermore, we illustrate that for a broad class of ABM problems our approach can be implemented easily and efficiently with well-established, highly optimized distributed sparse matrix software (e.g., CombBLAS, GraphPad, Trilinos/Xpetra, PEGASUS) and graph-parallel analytics software (e.g., GraphLab/PowerGraph, Kineograph, Pregel).
16:00 - 16:30 A parallel solver for a linear system with a symmetric sparse matrix by one-dissection ordering
Mitsuo Yokokawa, Tomoki Nakano, Kobe University
Takeshi Fukaya, Hokkaido University
Yusaku Yamamoto, The University of Electro-Communications
Abstract A direct method for solving a linear system of equations is difficult to parallelize due to the recurrences in the computational sequence of the solver. In this talk, parallel computation is applied to a linear system with a symmetric sparse matrix that is ordered by a one-dissection method. The performance of a thread-based parallelization will be presented.
16:30 - 17:00 Vistle, a scalable visualization system for immersive virtual environments
Martin Aumüller, Uwe Wössner, HLRS
Abstract At HLRS, we develop the data-parallel visualization system Vistle as a successor to COVISE. Vistle is a scalable distributed implementation of the visualization pipeline. Modules are realized as MPI processes on a cluster; within a node, different modules communicate via shared memory, while TCP is used for communication between clusters. Vistle especially targets interactive visualization in immersive virtual environments. For low latency, a combination of parallel remote and local rendering is possible.
17:00 - 17:30 Farewell