
Agenda


Monday, 5 December 2016


9:00 - 9:15 Introduction
Michael Resch, HLRS, University of Stuttgart
9:15 - 9:45 NEC SX-ACE's Operations and Applications Development for the Future
Hiroaki Kobayashi, Cyber Science Center, Tohoku University
9:45 - 10:15 Accelerators - Future Trends
Michael Resch, HLRS, University of Stuttgart
10:15 - 10:45 break

10:45 - 11:15 Data compression strategies for exascale CFD simulations
Patrick Vogler, Ulrich Rist, Institute of Aerodynamics and Gas Dynamics, University of Stuttgart
Abstract The steady increase of available computing resources has enabled engineers and scientists to use progressively more complex models to simulate a myriad of fluid flow problems. Yet, whereas modern high-performance computing (HPC) systems have seen a steady growth in computing power, the same trend has not been mirrored by a comparable gain in data transfer rates. Current systems can produce and process large amounts of data quickly, but overall performance is often hampered by how fast a system can transfer and store the computed data. Considering that CFD researchers invariably seek to study simulations with ever higher temporal resolution on fine-grained computational grids, the imminent move to exascale performance will only exacerbate this problem.
One of the major pitfalls of storing 'raw' simulation results lies in the implicit and redundant manner in which they represent the flow physics. Transforming the large 'raw' data into compact feature- or structure-based data could therefore help overcome the I/O bottleneck. This presentation gives an overview of the compression strategies developed in the ExaFlow project, which could significantly boost the performance and efficiency of HPC systems today and in the near future.
11:15 - 11:45 Vectorization of Cellular Automaton-based labeling of connected components in 3D binary lattices
Peter Zinterhof, CHPC / Dept. of Computer Sciences, University of Salzburg
Abstract Labeling connected components in binary lattices is a basic function in image processing with applications in a range of fields, such as robotic vision, machine learning, and even CFD (percolation theory). While classical algorithms often employ recursive designs that are ill-suited for parallel execution and prone to stack overflows, the new algorithm described here is based on a cellular automaton (CA). Being an inherently parallel system in itself, the CA promises speedup and scalability on vector supercomputers as well as on current accelerators such as GPGPUs and the Xeon Phi. In this talk we describe the first parallel implementation of the iterative CA-based algorithm for 3D binary lattices. It delivers the set of connected components as an emergent property of the CA.
By substituting branching operations with GPU- and vector-friendly bitwise operations we aim for additional speedup - with somewhat counter-intuitive effects on both the GPU and SX-ACE platforms. Speedup can be obtained, just not in the way we intended (a small illustrative sketch follows the reference below).
Reference: Stamatovic, Biljana and Trobec, Roman, Cellular automata labeling of connected components in n-dimensional binary lattices, Journal of Supercomputing. 2016, 1-12. DOI:10.1007/s11227-016-1761-4
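The following is a minimal sketch of the label-propagation idea behind CA-based connected-component labeling on a 3D binary lattice with a 6-cell neighborhood, showing how a branchless, bitwise minimum can replace if/else logic; the data layout and function names are illustrative only and do not reflect the authors' actual implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using u32 = std::uint32_t;

// Branchless minimum: a bitwise select instead of an if/else branch.
inline u32 branchless_min(u32 a, u32 b) {
    u32 mask = -static_cast<u32>(a < b);   // all ones if a < b, else 0
    return (a & mask) | (b & ~mask);
}

// One CA sweep. Assumptions: labels of foreground cells are initialised to the
// cell's linear index, background cells to UINT32_MAX; fg[] is 1 for foreground,
// 0 for background; a one-cell background halo surrounds the domain.
bool ca_sweep(std::vector<u32>& label, const std::vector<u32>& fg,
              int nx, int ny, int nz) {
    bool changed = false;
    auto idx = [=](int i, int j, int k) {
        return (std::size_t(k) * ny + j) * nx + i;
    };
    for (int z = 1; z < nz - 1; ++z)
        for (int y = 1; y < ny - 1; ++y)
            for (int x = 1; x < nx - 1; ++x) {        // innermost loop vectorizes
                std::size_t c = idx(x, y, z);
                u32 m = label[c];
                m = branchless_min(m, label[idx(x - 1, y, z)]);
                m = branchless_min(m, label[idx(x + 1, y, z)]);
                m = branchless_min(m, label[idx(x, y - 1, z)]);
                m = branchless_min(m, label[idx(x, y + 1, z)]);
                m = branchless_min(m, label[idx(x, y, z - 1)]);
                m = branchless_min(m, label[idx(x, y, z + 1)]);
                u32 keep = -fg[c];                    // mask instead of a branch
                u32 next = (m & keep) | (label[c] & ~keep);
                changed |= (next != label[c]);
                label[c] = next;
            }
    return changed;   // sweeps are iterated until no label changes any more
}
```

The result set of connected components then emerges as the distinct surviving label values, mirroring the "emergent property" formulation in the abstract.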
11:45 - 12:15 Development of a massively parallel and optimized phase-field model for the sintering process
Johannes Hötzer, M. Kellner, W. Rheinheimer, M. Seiz, H. Hierl, F. Hafner, L. Promberger, B. Nestler, Karlsruhe Institute of Technology (KIT)
Abstract A tailored microstructure is crucial for high-performance materials with defined properties. In the sintering process, a loose powder of particles densifies due to heat treatment. This process of microstructure formation is driven by surface minimization. Depending on the dominant diffusion mechanism, grain growth can additionally occur, resulting in an inhomogeneous microstructure with varying properties. The phase-field method allows the microstructure formation to be studied systematically for different diffusion mechanisms, particle size distributions, particle shapes, and initial powder densities. A phase-field model based on the grand potential approach, implemented in the parallel framework PACE3D, is used to investigate the sintering process. The solver is optimized on various levels, from the model and its parameters down to the hardware. Simulating many thousands of particles, each described by its own order-parameter field, is computationally expensive. Therefore, a mechanism to locally reduce the number of order parameters is applied. With this, the required memory and calculation time become independent of the number of particles considered. Parameterizing the interactions between the different particles requires huge dense parameter matrices. Exploiting the fact that most interactions are the same, a concept based on material classes is used to reduce the size of the matrices and hence the required memory. To conduct the large-scale simulations required to study different particle size distributions, the solver is parallelized with MPI using domain decomposition. To exploit the hardware efficiently, the kernels are explicitly vectorized using compiler intrinsics. The solver scales up to 96,100 processes on the three supercomputers Hazel Hen, ForHLR I and ForHLR II and reaches a single-node performance of up to 25 %.
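The two memory-reduction ideas can be sketched as follows; this is an editor-added illustration with hypothetical names and a fixed illustrative capacity, not the actual PACE3D data layout.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Local order-parameter reduction: instead of one field per particle, each cell
// stores only the few phases that are non-zero there (inside a diffuse interface
// only a handful of phases overlap). MAX_LOCAL is an illustrative capacity.
constexpr int MAX_LOCAL = 8;

struct Cell {
    std::array<std::uint32_t, MAX_LOCAL> phase_id{};  // which particles are active here
    std::array<double, MAX_LOCAL> phi{};              // their order-parameter values
    int n_active = 0;
};

// Memory now scales with the number of cells, not with the number of particles.
std::vector<Cell> make_domain(std::size_t n_cells) {
    return std::vector<Cell>(n_cells);
}

// Material classes: instead of a dense n_particles x n_particles parameter matrix,
// map each particle to a class and store a small class-by-class table.
double interaction(const std::vector<int>& material_class,
                   const std::vector<std::vector<double>>& class_table,
                   std::uint32_t a, std::uint32_t b) {
    return class_table[material_class[a]][material_class[b]];
}
```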
12:15 - 13:15 lunch
13:15 - 13:45 Coupled Multi-Physics Simulation Framework on Octree Data Structures
Kannan Masilamani, Verena Krupp, Sabine Roller, Simulationstechnik & Wissenschaftliches Rechnen, Universität Siegen
Abstract Many real-world engineering applications involve a heterogeneous setup with multiple physical phenomena. Often, different equations need to be solved for these phenomena, which requires corresponding numerical approximations when simulating the overall system. Interactions between the individual phenomena happen either across the complete volume or at surfaces that separate distinct regions from each other. In this contribution we present a coupling mechanism for our simulation framework APES. The framework utilizes a distributed octree data structure to represent meshes in the implemented solvers. This infrastructure is used in the presented coupling tool APESmate to allow for a scalable coupling of solvers. The coupling itself is kept generic, so that variables can be translated into different quantities where required, while the neighbor identification relies on the available discretization to avoid bottlenecks in the parallel execution. Surface couplings are realized via boundary conditions, while volume information is provided by additional terms in the equation system, which need to be properly described in the solver. Both cases are represented by a generic space-time function that enables evaluation at requested points in space and time. APESmate employs a single executable that binds together all required solvers and the coupling infrastructure to ease deployment. The global MPI environment can then be used for the exchange between the coupled domains, while individual communicators are created for each domain to allow a nearly unmodified execution of those solvers. The algorithm and the scalability of APESmate on massively parallel computers will be presented.
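A schematic of the space-time function notion is given below; the interface and names are illustrative only and do not reflect the actual APESmate API (the APES framework itself is not written in C++).

```cpp
#include <array>
#include <vector>

// Schematic "space-time function": any provider of coupling data (a boundary
// condition, an analytical function, or a remote solver domain) is asked to
// evaluate a variable at a list of points and a given time.
using Point = std::array<double, 3>;

struct SpaceTimeFunction {
    virtual ~SpaceTimeFunction() = default;
    // Evaluate the coupled variable at the requested points and time.
    virtual std::vector<double> evaluate(const std::vector<Point>& points,
                                         double time) const = 0;
};

// A surface coupling would call evaluate() from a boundary condition with the
// face quadrature points of the local solver; a volume coupling would call it
// with the volume points needed for an additional source term. When the data
// lives in another domain, the implementation exchanges the point list and the
// results over MPI between the per-domain communicators.
```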
13:45 - 14:15 High-Order Geometry Representation for Discontinuous Galerkin Methods
Nikhil Anand, Harald Klimach and Sabine Roller, Simulationstechnik & Wissenschaftliches Rechnen, Universität Siegen
Abstract High-order Discontinuous Galerkin (DG) methods enable accurate simulations with few degrees of freedom. DG methods can be used for the discretization of a wide range of partial differential equations and have become increasingly popular over the last decade. Because of their high data locality, these methods are also attractive for parallel computations. Within our APES framework we develop a DG solver that relies on a dim-by-dim approach and cubical elements. To this end, a distributed octree mesh is employed, which can voxelize arbitrary geometries. One obstacle in the deployment of high-order methods is the accurate representation of boundaries with the same order. Generating meshes with high-order boundary representations by using elements with non-linear surfaces is a complex task and often not as robust as desired. An alternative approach is commonly used in spectral methods: additional penalization terms in the equations enforce boundary conditions inside the computational domain. For high-order DG methods this offers a boundary treatment that avoids the need to generate deformed elements. It also allows us to maintain the numerical properties of dim-by-dim computations and simple scalings to and from reference elements. In this talk, we are concerned with a Brinkmann penalization for compressible flows that allows for the modelling of isothermal wall boundary conditions. The main idea is to model solid obstacles as a porous medium with porosity and permeability approaching zero. We discuss the implementation of this boundary condition in our DG solver, show some validation of the approach in this kind of discretization, and take a look at the usability of the method for complex geometries. We believe high-order methods can be an important strategy in the future of high-performance simulations, and the discussed boundary treatment offers a path to a wider applicability of DG methods, which have already shown excellent scaling properties in various applications and implementations.
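The penalization idea can be sketched as follows; this is an editor-added schematic of volume (Brinkmann) penalization with simplified prefactors, not necessarily the exact terms used in this solver. Inside the obstacle, marked by a mask function, relaxation terms drive velocity and temperature towards the wall values as the penalization parameters tend to zero.

```latex
% \chi(x) = 1 inside the solid obstacle, 0 in the fluid;
% \eta, \eta_T are small penalization parameters (porosity/permeability -> 0).
\begin{aligned}
\partial_t(\rho\mathbf{u}) + \nabla\!\cdot\!\left(\rho\mathbf{u}\otimes\mathbf{u}
    + p\,\mathbf{I} - \boldsymbol{\tau}\right)
  &= -\,\frac{\chi}{\eta}\,\bigl(\mathbf{u}-\mathbf{u}_{\mathrm{wall}}\bigr),\\
\partial_t(\rho e) + \nabla\!\cdot\!\bigl((\rho e + p)\,\mathbf{u}
    - \boldsymbol{\tau}\mathbf{u} + \mathbf{q}\bigr)
  &= -\,\frac{\chi}{\eta_T}\,\bigl(T-T_{\mathrm{wall}}\bigr).
\end{aligned}
```

Because the wall is represented by these source terms on the regular cubical elements, no deformed boundary elements are needed and the dim-by-dim structure of the DG operator is preserved.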
14:15 - 14:45 break
14:45 - 15:15 Aurora: Information Disclosed at SC16, and System Updates
Shintaro Momose, NEC Tokyo
15:15 - 15:45 NEC Scheduler and MPI on the Aurora System
Kenji Kanemura, NEC Tokyo
15:45 - 16:15 break
16:15 - 16:45 Reallabor Stuttgart
Myriam Guedey, HLRS
16:45 - 17:15 The slow NVMe revolution
Erich Focht, NEC/Stuttgart
Abstract The presentation discusses the current state of NVMe development in the context of applying the technology to next-generation HPC post-processing problems. Current usage models and APIs are evaluated for their fitness and applicability to several possible system designs of the post-processing cluster.
17:15 - 17:45 Simulation of Turbulent Particulate Flow on HPC Systems
Konstantin Fröhlich, Institute of Aerodynamics, RWTH Aachen
19:00 - 21:00 Dinner at Goldener Adler
Böheimstraße 38, 70178 Stuttgart (coordinates: 48.762250, 9.164196)

Tuesday, 6 December 2016


9:00 - 9:30 Convection-permitting seasonal latitude-belt simulation using the Weather Research and Forecasting (WRF) model
Thomas Schwitalla, Institute of Physics and Meteorology, University of Hohenheim
9:30 - 10:00 Supercomputer Benchmarks - A comparison of HPL, HPCG, and HPGMG and their Utility for the TOP500
Erich Strohmaier, LBNL Berkeley
Abstract The popular biannual supercomputer ranking TOP500 is based on the HPL (High Performance Linpack) benchmark. HPL is often criticized as being outdated and not representative of today's complex architectures. In recent years, two new benchmarks have been developed with the goal of providing additional data to the TOP500 and the supercomputing community in general: the HPCG (High Performance Conjugate Gradients) benchmark and the HPGMG (High-performance Geometric Multigrid) benchmark. In this presentation we will briefly introduce these three benchmarks and compare their most important features. We will also look at their utility for augmenting the TOP500 listing.
10:00 - 10:30 Performance and Power Analysis of SX-ACE using Common Benchmark Programs
Ryusuke Egawa, Yoko Isobe, Cyber Science Center, Tohoku University; NEC Tokyo
Abstract This talk presents the performance and power analysis of SX-ACE using common benchmarks such as HPL, HPCG, and HPGMG.
10:30 - 11:00 break
11:00 - 11:30 MRI-based computational hemodynamics in patients
Andreas Ruopp, HLRS
Abstract Based on an incompressible Navier-Stokes solver, pressure gradients and flow characteristics are obtained for patients with a coarctation of the aorta (CoA). In a first step, a promising method is established and tested using a complex standard geometry with various aortic branches, combined with a fully automated calibration scheme. Based on this calibration, outlet boundary conditions and volumetric flux can be simulated with respect to the occurring fluxes as well as the different energy levels. In a second step, four-dimensional (4D) flow-sensitive phase-contrast MRI data are used to impose the correct inlet condition downstream of the aortic valve, in combination with the calibration method. The comparison of MRI with CFD data indicates a reasonable agreement. Therefore, the calibration method can be considered an adequate approach for future studies with a view to assisting diagnosis and therapy planning.
11:30 - 12:00 Spectral decomposition of nonlinear trajectories
Uwe Küster, HLRS
Abstract Structural analysis of unsteady nonlinear phenomena is difficult. Even though spectral analysis seems to be restricted to linear operators, it turns out to be applicable to nonlinear operators as well: they can be made linear by embedding them in a much larger space, where spectral analysis becomes possible. We show a numerical approach that allows the separation of unsteady data into a part that vanishes in time and a remaining part.
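One common way of making this embedding precise, given here only as an illustration (the talk's specific construction may differ), is the composition (Koopman) operator viewpoint: the nonlinear dynamics act on states, while a linear operator acts on observables, and its eigenvalues separate decaying from persistent parts of the data.

```latex
% Nonlinear dynamics on the state x, linear dynamics on observables g:
x_{k+1} = F(x_k), \qquad (\mathcal{K} g)(x) := g\bigl(F(x)\bigr).
% \mathcal{K} is linear on the (much larger) space of observables even though F is not.
% Expanding trajectory data in eigenfunctions \mathcal{K}\varphi_j = \lambda_j \varphi_j gives
g(x_k) = \sum_j \lambda_j^{\,k}\, \varphi_j(x_0)\, v_j ,
% so modes with |\lambda_j| < 1 form the part vanishing in time; the rest is the remaining part.
```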
12:00 - 13:00 lunch
13:00 - 13:30 A next-generation high performance healthcare infrastructure for a deep learning-based orthodontic diagnostic system in dentistry
Chonho Lee, Cybermedia Center, Osaka University
Abstract The talk presents an ongoing project to develop a high performance healthcare infrastructure that operates deep learning (DL) based big data analytics for orthodontic treatment in dentistry, automating medical tasks such as diagnostic imaging, morphological feature extraction, and treatment recommendation. Due to the large amount of heterogeneous data, including images (facial/oral photos, X-rays) and texts (doctors' notes), doctors face time and accuracy limitations when processing and analyzing these data with conventional machines and approaches. DL techniques supported by a high performance computing infrastructure remove those limitations and help uncover previously unseen healthcare insights. We evaluate the practical use of DL models in a clinical setting and show their effectiveness. The developed system dramatically reduces the doctors' workload and can be smoothly extended to other departments such as otolaryngology and ophthalmology.
13:30 - 14:00 Directive Generation Using a Code Translation Framework
Kazuhiko Komatsu, Cyber Science Center, Tohoku University
Abstract Even with directive-based programming, an application code becomes complex when it has to achieve high performance on various HPC systems, because a different directive set is required for each HPC system. This talk proposes a directive generation approach that generates various kinds of directive sets using the code translation framework Xevolver. Instead of several kinds of directive sets, a user writes a special placeholder that specifies a unique code pattern. The special placeholder then triggers the generation of appropriate directives for each system from a user-defined rule in the code translation framework. Because only special placeholders are inserted into the code, the proposed approach preserves the maintainability and readability of the code.
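As a minimal illustration of the underlying problem (an editor-added example; the placeholder shown is purely hypothetical and is not actual Xevolver syntax), the same loop needs different directives on different systems, and keeping every variant in the source quickly clutters the code.

```cpp
#include <cstddef>

// The same loop annotated for two different systems via preprocessor switches.
void axpy(double* y, const double* x, double a, std::size_t n) {
#if defined(USE_OPENACC)
    #pragma acc parallel loop present(x, y)
#elif defined(USE_OPENMP)
    #pragma omp parallel for simd
#endif
    for (std::size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}

// The proposed approach keeps only one generic marker in the source, e.g. a
// comment-style placeholder such as
//   // PLACEHOLDER: parallel-loop pattern   (hypothetical, not Xevolver syntax)
// and a user-defined translation rule expands it into the directive set that is
// appropriate for the target system before compilation.
```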
14:00 - 14:30 break
14:30 - 15:00 Experiences on the K computer from the large-scale eigenvalue solver project
Toshiyuki Imamura, Tetsuya Sakurai, Yasunori Futamura, Advanced Institute for Computational Science, RIKEN, Japan; University of Tsukuba, Japan
Abstract Eigenvalue computation (or diagonalization) of a matrix is considered an essential part of many scientific simulation codes. Since 2012, we have developed eigenvalue solvers on supercomputer systems as a co-design effort among computer science, applied mathematics, and simulation science. In this project, EigenExa [1] for dense matrices and z-Pares [2] for sparse matrices have been developed, and part of the project is organized as a joint work with the ESSEX project under the joint initiative of DFG with JST and ANR.
In the EigenExa and z-Pares project, we mainly utilized the K computer. To maximize the computational power of the K computer, we tried to break three technical walls: i) memory bandwidth, ii) network latency, and iii) parallelism. Recipes to remove these walls include using Level-3 BLAS, communication avoidance and hiding techniques, and combining multiple layers of programming models and frameworks such as MPI, OpenMP, and SIMD parallelism (a small illustration of the Level-3 BLAS recipe follows the reference list below). The EigenExa and z-Pares benchmarks run successfully on the full system of the K computer (82,944 nodes = 663,552 cores). This is a feasibility demonstration of capability computing for the exascale computing era. We will extend the current numerical algorithms and implementations to future HPCI systems, including the post-K system.
In the talk, we will give an overview of the EigenExa and z-Pares libraries and our experiences with them. We will also present some results of joint work with simulation codes such as RSDFT, NICAM-LETKF, and NTChem, as well as ELSES, Platypus QM/MM, and PHASE.
[1] EigenExa homepage, http://www.aics.riken.jp/labs/EigenExa_e.html
[2] z-Pares homepage, http://zpares.cs.tsukuba.ac.jp/
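The Level-3 BLAS recipe mentioned above can be illustrated as follows; this is an editor-added sketch assuming a CBLAS interface and does not show EigenExa's actual kernels. Aggregating many rank-1 updates into a single GEMM call lets the BLAS library block for caches and vector registers, which matters on bandwidth-limited systems.

```cpp
#include <cblas.h>
#include <vector>

// A = A + U * V^T for an m x n matrix A, with U (m x k) and V (n x k), row-major.

// Level-2 style: k separate rank-1 updates, each streaming all of A through memory.
void update_rank1(std::vector<double>& A, const std::vector<double>& U,
                  const std::vector<double>& V, int m, int n, int k) {
    for (int j = 0; j < k; ++j)
        cblas_dger(CblasRowMajor, m, n, 1.0,
                   U.data() + j, k,      // column j of U (stride k)
                   V.data() + j, k,      // column j of V (stride k)
                   A.data(), n);
}

// Level-3 style: one GEMM call, so A is read and written only once and the
// library can block the computation for cache and SIMD units.
void update_gemm(std::vector<double>& A, const std::vector<double>& U,
                 const std::vector<double>& V, int m, int n, int k) {
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasTrans,
                m, n, k, 1.0, U.data(), k, V.data(), k, 1.0, A.data(), n);
}
```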
15:00 - 15:30 Autotuning meets code transformations
Hiroyuki Takizawa, Graduate School of Information Sciences, Tohoku University
Abstract Practical HPC application development is often teamwork among different kinds of programmers, such as application developers and performance engineers. This talk shows a case study of using a code transformation framework named Xevolver, which allows performance engineers to easily define their own code transformation rules for transforming a legacy code into an auto-tunable version. As a result, application developers can keep maintaining the original version, and the code is transformed just before auto-tuning. If the code transformation rules are properly defined by performance engineers, the application developers can benefit from auto-tuning technologies without having to deal with the complex auto-tunable code generated by the transformation.
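A conceptual before/after sketch of what "transforming a legacy code into an auto-tunable version" can mean, assuming the tuned parameter is a loop block size; this is an editor-added illustration, not actual Xevolver output.

```cpp
#include <cstddef>

// Legacy version, maintained by the application developers.
void smooth(double* out, const double* in, std::size_t n) {
    for (std::size_t i = 1; i + 1 < n; ++i)
        out[i] = 0.25 * in[i - 1] + 0.5 * in[i] + 0.25 * in[i + 1];
}

// Auto-tunable version, as a transformation rule could generate it: the loop is
// blocked and the block size is exposed as a parameter (BLOCK >= 1, a
// hypothetical knob) that an auto-tuner can search over at build or run time.
void smooth_tunable(double* out, const double* in, std::size_t n,
                    std::size_t BLOCK) {
    for (std::size_t ib = 1; ib + 1 < n; ib += BLOCK) {
        std::size_t end = (ib + BLOCK < n - 1) ? ib + BLOCK : n - 1;
        for (std::size_t i = ib; i < end; ++i)
            out[i] = 0.25 * in[i - 1] + 0.5 * in[i] + 0.25 * in[i + 1];
    }
}
```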
15:30 - 16:00 break
16:00 - 16:30 A Proposal for On-demand Staging Leveraging a Job Management System and Software-Defined Networking
Susumu Date, Cybermedia Center, Osaka University
Abstract Through the daily administration of our large-scale computer systems at Osaka University, we often encounter the unhappy situation that scientists and researchers cannot utilize our computational resources for the analysis of their own privacy-rich and confidential data, although they are eager to do so. In this talk the speaker presents an idea for on-demand staging leveraging software-defined networking (SDN) to minimize the exposure of such data to others.
16:30 - 17:00 Environmentally Friendly Extreme Scaling - From Exploding Volcanoes to Personalized Medicine
Dieter Kranzlmüller, LMU
Abstract Science and research in many fields depend on supercomputers, which deliver the highest performance and offer large memory capacities. However, using these machines is challenging not only because of their scale; it is also costly and affects the environment. Scaling approaches are therefore needed that use these machines effectively, efficiently, and in an environmentally friendly way. We demonstrate two different examples of extreme scaling, one in which the seismic activity of a volcano is simulated, and one in which the docking of drug molecules is performed. Both examples show that the best approach is a combination of hard-, soft- and brainware, where experts from computer science support the use of these machines by computational scientists in various application domains.
17:00 - 17:30 JAMSTEC Cyber System Current Status
Ken'ichi Itakura, JAMSTEC
17:30 - 17:45 Farewell
Michael Resch, HLRS