
Agenda


Thursday, 17 December 2015


9:30 - 10:00 Registration
10:00 - 10:15 Introduction
Michael Resch, HLRS, University of Stuttgart
10:15 - 10:45 HLRS and Moore's law - HPC beyond 2015
Michael Resch, HLRS, University of Stuttgart
10:45 - 11:15 One-year experience with SX-ACE
Hiroaki Kobayashi, Cyber Science Center, Tohoku University
Abstract In my talk, I will present our experience with SX-ACE in terms of its operation, program development, and performance evaluation. In addition, I will try to outline a vision for the future of vector computing based on this experience.
11:15 - 11:45 Coffee Break
11:45 - 12:15 High resolution climate projections using the WRF model
Viktoria Mohr, Thomas Schwitalla and Kirsten Warrach-Sagi, Institute for Physics and Meteorology, University of Hohenheim
Abstract According to the projections of different climate scenarios, global mean surface temperature is expected to rise over the 21st century, accompanied by an increase in other weather extremes, due to past anthropogenic emissions of greenhouse gases. As the warming of many land areas is higher than the global average, the impact of future climate conditions needs to be estimated on a regional scale. Thus, spatially high-resolution climate projections are required, together with their uncertainties and robustness. For a selected area, these simulations are being performed within the framework of EURO-CORDEX. ReKliEs-De (Regional Climate Ensembles Germany) is a project which complements these simulations by providing ensemble projections of future climate and climate extremes for Germany. With the Weather Research and Forecasting (WRF) model and its land surface model NOAH we are performing simulations from 1950 to 2100 at 0.44° (~50 km) and 0.11° (~12 km) resolution, using input data from five global circulation models (GCMs) with spatial resolutions of ~1° to 2° (~100-200 km). The results of our simulations will be provided to end users such as hydrologists, political consultants, and researchers of climate consequences. In addition, our institute performed a latitude-belt simulation with a resolution of 0.025° (~3 km) for a two-month period; the domain covers the area between 20°N and 65°N. The results indicate an improved precipitation forecast when going to resolutions higher than those applied, e.g., in EURO-CORDEX (~12 km). Hence, to obtain more accurate climate projections in the future, simulations need to be performed at such spatial resolutions. However, these simulations are still too expensive today in terms of computational time. Another important factor is the data volume: it can easily reach 100 TB or more, which can become a limiting factor for future long-term high-resolution simulations.
12:15 - 13:15 Lunch
13:15 - 13:45 Accelerating a Risk Simulation of Heatstroke on SX-ACE
Ryusuke Egawa, Daisuke Sasaki, Takeshi Yamashita, Ayumu Nishio, Akimasa Hirata, Hiroaki Kobayashi, Cyber Science Center, Tohoku University
Abstract In this talk, I will introduce our research activities on the performance optimization of a heatstroke simulation code. Using this code, the potential of SX-ACE for executing real applications will be discussed.
13:45 - 14:15 Subdomain Local FE Solver Design for Domain Decomposition Method:
A Kind of On-cache Iterative Solver
Hiroshi Kawai, Masao Ogino and Ryuji Shioya, Tokyo University of Science-Suwa
Abstract We have been developing an open-source CAE system called ADVENTURE since 1997. This finite element code is parallelized using the Domain Decomposition Method (DDM). The ongoing project, HDDMPPS, funded by the JST CREST program in Japan, aims at restructuring this CAE package using a newly developed DDM-based parallel linear solver framework, LexADV.
Here, an on-cache iterative solver based on the DDM framework is developed. The subdomain-local FE solver in the DDM code employs preconditioned CG solvers with preconditioners such as SSOR and ICT. By adjusting the subdomain size so that the memory footprint fits within the last-level cache of the processor, this DDM code can be regarded as a kind of on-cache iterative solver. Some performance benchmark results are shown on various kinds of HPC platforms, such as Haswell, Knights Corner, and the Fujitsu PRIMEHPC FX100. The implementation will be introduced into a future version of ADVENTURE.
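As a rough illustration of the cache-fitting idea, the following C sketch estimates how many degrees of freedom a subdomain may hold before its CG working set exceeds the last-level cache. The cache size, stencil width, and per-DOF data layout are assumed example values, not the actual ADVENTURE/LexADV data structures.

/* Back-of-the-envelope sizing of a DDM subdomain so that the local FE
 * solve stays in the last-level cache.  All numbers below are assumed
 * example values, not the actual ADVENTURE/LexADV layout. */
#include <stdio.h>

int main(void) {
    const double llc_bytes   = 35.0 * 1024 * 1024;  /* e.g. 35 MB shared LLC (assumed) */
    const double nnz_per_row = 80.0;                /* typical 3D FE stencil width (assumed) */
    const double bytes_per_dof =
        nnz_per_row * (8.0 + 4.0)   /* CRS values (8 B) + column indices (4 B)   */
        + 8.0                       /* row pointer (64-bit)                      */
        + 5 * 8.0;                  /* solution, rhs, residual and two CG vectors */
    printf("max DOFs per subdomain ~ %.0f\n", llc_bytes / bytes_per_dof);
    return 0;
}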
14:15 - 14:45 PHET Adaptivity
Harald Klimach, University of Siegen
Abstract Adaptive strategies can offer large reductions in computational effort in many simulation settings. Well known and established are h/p-adaptive solvers in the field of finite element methods. In h/p-adaptive simulations, the mesh size and scheme order are locally adapted to the solution, which allows a concentration of computational effort on those parts of the domain where it is actually needed. In transient simulations, time can be considered as an additional dimension in which adaptivity may be applied by locally varying the time-step width. In some simulations it might even be possible to locally adapt the equation system to be solved and thereby reduce the required numerical complexity in parts of the domain. We talk about plans for combining these adaptations and leveraging them on distributed parallel systems for high-order discontinuous Galerkin methods. While many of the concepts are well established and in use for smaller problems, their combination and efficient implementation on distributed parallel systems is still a major challenge, which we believe is worth pursuing.
14:45 - 15:15 Coffee Break
15:15 - 15:45 Massively Parallel Multigrid Solvers for Partial Differential Equations
Andreas Vogel, Goethe-Center for Scientific Computing
Abstract For many practically important applications, the mathematical modeling of scientific and industrial questions is achieved via a formulation in terms of partial differential equations. In order to compute the solution numerically, finite element and finite volume discretizations are employed, while the geometry of the considered physical domain is resolved by unstructured grids. Such grid-based discretizations lead to large sparse matrix equations that must be solved efficiently on massively parallel systems. Multigrid methods are well known for their optimal complexity for such problem classes, i.e., the computational effort only increases linearly with the problem size. This makes them a promising algorithm when focusing on the weak-scaling properties for such matrix systems. However, the parallelization and implementation of these algorithms are challenging: while the strength of the multigrid algorithm is the problem-size reduction on coarser grid levels, this gives rise to a potential performance bottleneck on parallel architectures, since on coarser grid levels the interior-to-boundary ratio of the grid parts assigned to a process becomes unfavorable. Thus, a parallel smoother on those coarse levels suffers from the fact that mostly communication at the boundary takes place and only little computation on the interior is performed. We present an approach that resolves this issue by gathering coarser levels onto fewer processors, leaving the remaining processors idle, and which results in a well weak-scaling implementation. We comment on the MPI-based programming infrastructure of our simulation code UG4 and in particular show how vertical communication structures allow this gathering process and how the transfer operators of the multigrid algorithm are adapted. On the coarsest level a serial base solver, e.g., an LU factorization, is used, given that the grid can be reduced to a single process. We present scaling studies with up to hundreds of thousands of processes for this multigrid approach that show close to optimal scalability for different kinds of physical problems: simple scalar diffusion equations, drug diffusion through human skin, and PDE systems for the simulation of density-driven groundwater flow.
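To illustrate the multigrid algorithm underlying this work, the following self-contained C sketch implements a geometric V-cycle for the 1D Poisson problem. It is a minimal textbook example, not code from UG4, and it omits the parallel gathering of coarse levels discussed in the talk.

/* Minimal 1D Poisson geometric multigrid V-cycle (illustrative sketch only).
 * Solves -u'' = f with zero Dirichlet boundary conditions on a grid of
 * n interior points, n = 2^k - 1. */
#include <stdio.h>
#include <stdlib.h>

static void smooth(double *u, const double *f, int n, double h, int sweeps) {
    /* Damped Jacobi smoothing with weight 2/3. */
    const double omega = 2.0 / 3.0;
    double *tmp = malloc(n * sizeof *tmp);
    for (int s = 0; s < sweeps; ++s) {
        for (int i = 0; i < n; ++i) {
            double left  = (i > 0)     ? u[i - 1] : 0.0;
            double right = (i < n - 1) ? u[i + 1] : 0.0;
            tmp[i] = 0.5 * (left + right + h * h * f[i]);
        }
        for (int i = 0; i < n; ++i)
            u[i] += omega * (tmp[i] - u[i]);
    }
    free(tmp);
}

static void residual(const double *u, const double *f, double *r, int n, double h) {
    for (int i = 0; i < n; ++i) {
        double left  = (i > 0)     ? u[i - 1] : 0.0;
        double right = (i < n - 1) ? u[i + 1] : 0.0;
        r[i] = f[i] - (2.0 * u[i] - left - right) / (h * h);
    }
}

static void vcycle(double *u, const double *f, int n, double h) {
    if (n == 1) {                    /* coarsest level: direct solve */
        u[0] = 0.5 * h * h * f[0];
        return;
    }
    smooth(u, f, n, h, 3);           /* pre-smoothing */

    int nc = (n - 1) / 2;            /* coarse-grid size */
    double *r  = malloc(n  * sizeof *r);
    double *fc = calloc(nc, sizeof *fc);
    double *uc = calloc(nc, sizeof *uc);

    residual(u, f, r, n, h);
    for (int i = 0; i < nc; ++i)     /* full-weighting restriction */
        fc[i] = 0.25 * (r[2 * i] + 2.0 * r[2 * i + 1] + r[2 * i + 2]);

    vcycle(uc, fc, nc, 2.0 * h);     /* recurse on the coarse problem */

    for (int i = 0; i < nc; ++i) {   /* linear-interpolation prolongation */
        u[2 * i]     += 0.5 * uc[i];
        u[2 * i + 1] += uc[i];
        u[2 * i + 2] += 0.5 * uc[i];
    }
    smooth(u, f, n, h, 3);           /* post-smoothing */
    free(r); free(fc); free(uc);
}

int main(void) {
    int n = (1 << 10) - 1;           /* 1023 interior points */
    double h = 1.0 / (n + 1);
    double *u = calloc(n, sizeof *u), *f = malloc(n * sizeof *f);
    for (int i = 0; i < n; ++i) f[i] = 1.0;      /* constant right-hand side */
    for (int cycle = 0; cycle < 10; ++cycle)
        vcycle(u, f, n, h);
    printf("u at midpoint: %g (exact 0.125)\n", u[n / 2]);
    free(u); free(f);
    return 0;
}

Since the level sizes shrink geometrically, the work per V-cycle stays proportional to the finest-grid size, which is the linear complexity mentioned in the abstract.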
15:45 - 16:15 C++ for CFD/CAA simulations on HPC systems
Michael Schlottke-Lakemper, Matthias Meinke, Institute of Aerodynamics, RWTH Aachen University, Germany
16:15 - 16:45 Coffee Break
16:45 - 17:15 Basic Building Blocks for Sparse Iterative Solvers: Trends to Exascale
Gerhard Wellein, Institute of Informatics
Abstract Sparse iterative solvers are known for their low computational intensities and their potentially highly irregular data access patterns. The efficient implementation of such solvers faces challenges on all hardware levels of modern supercomputers, ranging from SIMD/SIMT vectorization at the core level to large-scale load balancing.
Within the ESSEX project (funded by the DFG programme SPPEXA), a basic building block library (GHOST) has been implemented, which is expected to provide efficient heterogeneous parallel building blocks for a selected number of sparse eigensolvers. This talk reports on basic concepts implemented in GHOST, which we expect to be relevant for the generations of supercomputer architectures to come. Moreover, we will briefly discuss the potential of selected iterative eigenvalue solvers which build on GHOST and have been shown to deliver high performance on large-scale (heterogeneous) supercomputers such as the CRAY XC30 system at CSCS Lugano.

The ESSEX project: http://blogs.fau.de/essex/
GHOST library: https://bitbucket.org/essex/ghost
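As an illustration of the kind of basic building block the talk refers to, the following C sketch shows a sparse matrix-vector multiplication in CRS format. It is a generic example, not the GHOST API; the indirect accesses to x show where the low computational intensity and irregular access patterns come from.

/* Illustrative sparse matrix-vector multiply (y = A*x) in CRS format --
 * a generic sketch of a basic building block, not the GHOST API. */
#include <stdio.h>

typedef struct {
    int n;              /* number of rows              */
    const int *rowptr;  /* size n+1, row start offsets */
    const int *colidx;  /* column index per nonzero    */
    const double *val;  /* value per nonzero           */
} crs_matrix;

static void spmv(const crs_matrix *A, const double *x, double *y) {
    for (int i = 0; i < A->n; ++i) {
        double sum = 0.0;
        /* Irregular, indirect access to x: the main source of the low
         * computational intensity mentioned in the abstract. */
        for (int j = A->rowptr[i]; j < A->rowptr[i + 1]; ++j)
            sum += A->val[j] * x[A->colidx[j]];
        y[i] = sum;
    }
}

int main(void) {
    /* 3x3 example: [2 -1 0; -1 2 -1; 0 -1 2] */
    const int rowptr[] = {0, 2, 5, 7};
    const int colidx[] = {0, 1, 0, 1, 2, 1, 2};
    const double val[] = {2, -1, -1, 2, -1, -1, 2};
    const crs_matrix A = {3, rowptr, colidx, val};
    const double x[] = {1, 1, 1};
    double y[3];
    spmv(&A, x, y);
    printf("%g %g %g\n", y[0], y[1], y[2]);   /* prints 1 0 1 */
    return 0;
}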
17:15 - 17:45 Xevtgen: automatic generation of code transformation rules based on before-and-after codes
Hiroyuki Takizawa, Shoichi Hirasawa, and Reiji Suda, Graduate School of Information Sciences, Tohoku University
19:00 - 21:00 Dinner at Restaurant "Vinum im Literaturhaus"
Breitscheidstr. 4, 70174 Stuttgart (coordinates: 48.779735, 9.168220)

Friday, 18 December 2015


9:00 - 9:30 New 3D histomorphometric features of cancellous bone extracted by direct numeric simulation
Ralf Schneider, HLRS, University of Stuttgart
Abstract To develop the continuum-mechanical simulation of bone-implant compounds such as intramedullary nailing systems into a real prediction tool that is beneficial in everyday clinical work, it is absolutely necessary not only to simulate the load situation right after implantation, but to predict the outcome of the surgery, from the planning phase onwards, until the stagnation of bone remodeling within the fracture zones is reached. Since an essential factor for bone remodeling is the activation of bone tissue by mechanical stimuli, and bone remodeling models on the continuum-mechanical scale estimate the development of the bone tissue based on local mechanical stimuli, that is to say the local strain field within the bone, it is necessary to use advanced material modelling to derive the local strain field as precisely as possible. As the only data available from a patient undergoing orthopedic surgery are clinical imaging data, mainly from clinical computed tomography (clinical CT), all material parameters have to be derived from these data. The latest developments in material modelling of cancellous bone based on clinical-CT data use database approaches [1]. These methods are based on the actually measured density distribution of a bone and its comparison to a database in which density fields of many bones are linked to advanced information about histomorphometric parameters and the corresponding material properties derived from microfocus computed tomography (micro-CT). The method to derive continuum-mechanical material properties of micro-structured materials via the direct mechanics approach was first presented by Hill in 1963 [2] and shown to be applicable to cancellous bone by van Rietbergen in 1995 [3]. Even though these methods are not new and the mentioned database methods show promising results, the evaluation of the micromechanical parameters and the corresponding material properties is, to the best of our knowledge, limited to only a few selected locations within the analyzed bones in all published studies dealing with this method. The question which remains unanswered is whether it is possible to derive microstructural parameters not only at a few selected locations but for complete bone structures, and whether the evaluation of the resulting high-dimensional parameter fields would deliver new information which can be used to further enhance the precision of continuum-mechanical, anisotropic material modelling of cancellous bone. The work to be presented addresses this question through the detailed micromechanical analysis of cancellous bone not only at selected landmarks but continuously over complete bone structures. By analyzing different resolutions of the continuum-mechanical scale, new three-dimensional histomorphometric features of cancellous bone are derived and their connections to the bone's elastic properties and density distribution are evaluated. As an introduction, a short review of the direct mechanics approach to the determination of continuum-mechanical material parameters of micro-structured materials is given. Along with this introduction, it is shown how this method can be utilized to extract histomorphometric features of cancellous bone from its results. After that, the implementation of the process chain which enables the calculation of three-dimensional continuum-mechanical, anisotropic material fields of arbitrary resolution on the basis of micro-CT scans for complete bone structures is explained.
Since this process chain strongly depends on the utilization of high-performance computing resources, the implications of its execution on these resources are briefly highlighted. Finally, the results of a human femoral head analyzed with the presented method at three different continuum-mechanical resolutions are discussed. In particular, the detailed analysis of the correlations between the different components of the effective material tensor, and how these correlations are influenced by the chosen resolution of the anisotropic material field, will be addressed. Furthermore, it will be demonstrated that a parameter reduction of the 21-dimensional field of anisotropic material constants can be achieved by a combination of hierarchical agglomerative cluster analysis, discriminant analysis, and principal component decomposition. Based on this result, it will be discussed to what extent the variance of the anisotropic material parameters within the analyzed femoral head can be covered by the reduced parameterization.
References
1. Hazrati Marangalou J., et al., A novel approach to estimate trabecular bone anisotropy using a database approach. Journal of Biomechanics. 2013;46(14):2356-2362. DOI: 10.1016/j.jbiomech.2013.07.042.
2. Hill R., Elastic properties of reinforced solids: some theoretical principles. Journal of the Mechanics and Physics of Solids. 1963;11:357-372.
3. van Rietbergen B., et al., A new method to determine trabecular bone elastic properties and loading using micromechanical finite-element models. Journal of Biomechanics. 1995.
9:30 - 10:00 Simulation of flows: present status and future applications @ AIA
Matthias Meinke, Lennart Schneiders, Stephan Schlimpert, Vladimir Statnikov, Institute of Aerodynamics, RWTH Aachen University, Germany
10:00 - 10:30 Coffee Break
10:30 - 11:20 NEC SX-ACE Vector Supercomputer and Its Successor Product Plan
Shintaro Momose, Toshikazu Aoyama, NEC, Tokyo
Abstract NEC gives an overview of its latest-model vector supercomputer SX-ACE together with several performance evaluations. NEC also presents the plan for a successor product of SX-ACE and will talk about the vision and concept of its future vector architecture, which is aimed not only at conventional high-performance computing but also at emerging big-data analytics applications.
11:20 - 11:50 Fast Transformations for 3D DG Polynomials
Verena Krupp, University of Siegen
Abstract High-order discontinuous Galerkin schemes often make use of a polynomial basis to represent the solution within elements. Special choices of the polynomial representation allow the efficient computation of specific operations. For an optimal computation it is therefore necessary to change the representation in the course of the simulation. Fast mechanisms to convert polynomials exist; however, they often require a higher one-dimensional degree to achieve the fast computing time. We investigate different strategies and their suitability for polynomials in 3D DG simulations, as well as the parallel implementation with OpenMP.
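For illustration, the following C sketch shows the naive O(n²) modal-to-nodal transform for a 1D Legendre basis, i.e. the baseline that fast transformation strategies aim to beat. The chosen modes and evaluation nodes are example values, not taken from the actual solver.

/* Naive modal-to-nodal transform for a 1D Legendre basis: the O(n^2)
 * baseline that fast polynomial transformations aim to beat.
 * Illustrative sketch only, not code from the solver discussed in the talk. */
#include <stdio.h>

#define NMODES 5

/* Evaluate Legendre polynomials P_0..P_{n-1} at x via the three-term recurrence. */
static void legendre(double x, int n, double *p) {
    p[0] = 1.0;
    if (n > 1) p[1] = x;
    for (int k = 1; k < n - 1; ++k)
        p[k + 1] = ((2.0 * k + 1.0) * x * p[k] - k * p[k - 1]) / (k + 1.0);
}

int main(void) {
    /* Modal coefficients of u(x) = 1 + x + x^2 in the Legendre basis:
     * x^2 = (2*P_2 + P_0)/3, hence c = {4/3, 1, 2/3, 0, 0}. */
    const double c[NMODES] = {4.0 / 3.0, 1.0, 2.0 / 3.0, 0.0, 0.0};
    const double nodes[NMODES] = {-1.0, -0.5, 0.0, 0.5, 1.0}; /* example nodes */

    for (int i = 0; i < NMODES; ++i) {
        double p[NMODES], u = 0.0;
        legendre(nodes[i], NMODES, p);
        for (int k = 0; k < NMODES; ++k)  /* dense matrix-vector product, O(n^2) overall */
            u += c[k] * p[k];
        printf("u(% .1f) = %g\n", nodes[i], u);   /* matches 1 + x + x^2 */
    }
    return 0;
}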
11:50 - 12:50 Lunch
12:50 - 13:20 Parallel Adaptive Multigrid for Processes from Science and Engineering
Gabriel Wittum, Goethe-Center for Scientific Computing
Abstract Numerical simulation has become one of the major topics in computational science. To promote modelling and simulation of complex problems, new strategies are needed which allow for the solution of large, complex model systems. Crucial issues for such strategies are reliability, efficiency, robustness, usability, and versatility. After discussing the needs of large-scale simulation, we point out basic simulation strategies such as adaptivity, parallelism, and multigrid solvers. To allow adaptive, parallel computations, the load-balancing problem for dynamically changing grids has to be solved efficiently by fast heuristics. These strategies are combined in the simulation system UG4 ("Unstructured Grids"), which is presented in the following. In the second part of the seminar we show the performance and efficiency of this strategy in various applications. In particular, large-scale parallel computations of density-driven groundwater flow and contaminant transport coupled with radioactive decay are discussed in more detail, with an emphasis on validation against specially designed experiments. Load balancing and the efficiency of parallel adaptive computations are discussed, and the benefit of combining parallelism and adaptivity is shown. We will further discuss different parallel architectures used in high-performance scientific computing and their suitability for different problems; in particular, we discuss the use of grid computing for computational science and show results of applications.
13:20 - 13:50 Numerical Investigation of Film Cooling in Laminar and Turbulent Supersonic Boundary-Layers
Michael Keller, Institute for Aero- and Gasdynamics, University of Stuttgart
Abstract The hot combustion gases of next-generation rocket engines generate heat loads that exceed the temperature limits of today's available materials. A promising approach to protect the thermally stressed nozzle liner is the blowing of a coolant into the hot main flow. In order to improve the understanding of the highly complex mixing phenomena, cooling-gas blowing into generic supersonic flat-plate boundary-layer air main flows with zero pressure gradient is investigated by means of direct numerical simulations. Argon, air, and helium are employed as cooling gases and are injected in the wall-normal direction through a single infinite spanwise slit into the laminar or turbulent boundary-layer flow. The blowing is realized either by prescribing a fixed distribution of the cooling-gas mass flux, mass fraction, and temperature at the orifice location (modeled blowing), or by including the blowing channel with a constant plenum pressure, mass fraction, and temperature at its lower end (simulated blowing). The latter allows for an interaction of the main and cooling-gas flows, but it is generally more costly due to domain coupling and higher grid-resolution requirements. An evaluation of the blowing modeling is important for the development and use of standard design tools and conventional CFD methods. In the talk, we (i) elaborate on the influence of the boundary-layer state, the blowing modeling, and the cooling-gas type on the film-cooling effectiveness, and (ii) address computational aspects with respect to the latest vector supercomputer at HLRS, the NEC SX-ACE.
13:50 - 14:20 Coffee Break
14:20 - 14:50 Migration of a Large-scale Code to an OpenACC Platform Using a Code Transformation Framework
Kazuhiko Komatsu, Cyber Science Center, Tohoku University
Abstract As the diversity of HPC systems increases, even legacy HPC applications often need to use accelerators for higher performance. To migrate large-scale legacy HPC applications to modern HPC systems with accelerators, OpenACC is a promising approach because its directive-based programming model avoids drastic code modifications. This work presents a case study of the migration of a large-scale simulation code to an OpenACC platform while keeping the maintainability of the original code. Although OpenACC enables an application to use accelerators by adding a small number of directives, it still requires modifying the original code to achieve high performance in most cases, which tends to degrade the maintainability. This work adopts a code transformation framework, Xevolver, to avoid such code modifications. Instead of directly modifying the code, custom code transformation rules and custom directives are defined using the Xevolver framework.
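As a minimal illustration of the directive-based approach, the following C sketch offloads a simple stencil loop with OpenACC. The loop is a hypothetical example, not code from the migrated application; in the Xevolver workflow, such directives would be generated by transformation rules rather than inserted by hand.

/* Illustrative OpenACC offload of a hypothetical stencil loop.
 * Without an OpenACC compiler the pragma is ignored and the code
 * still runs correctly on the CPU. */
#include <stdio.h>

#define N 1000000

int main(void) {
    static double a[N], b[N];
    for (int i = 0; i < N; ++i) a[i] = (double)i;

    /* The only change relative to the original loop is the directive;
     * this is what keeps the code maintainable. */
    #pragma acc parallel loop copyin(a[0:N]) copyout(b[0:N])
    for (int i = 1; i < N - 1; ++i)
        b[i] = 0.25 * (a[i - 1] + 2.0 * a[i] + a[i + 1]);

    printf("b[N/2] = %g\n", b[N / 2]);
    return 0;
}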
14:50 - 15:20 HPC Benchmarking
Vladimir Marjanovic, HLRS, University of Stuttgart
Abstract This work compares three benchmarks, HPL (Linpack), HPCG (Conjugate Gradient), and HPGMG (Geometric Multigrid), which aim to serve as reference benchmarks for the TOP500 list. Together, HPCG and HPL can be interpreted as bookends on the range of performance a collection of applications can experience. HPL is a heavily compute-bound kernel with a Byte/FLOP ratio close to 0, while HPCG is a heavily memory-bound kernel with a Byte/FLOP ratio larger than 4. HPGMG, on the other hand, has a Byte/FLOP ratio larger than 1 and is thus more representative of real applications. The work reviews performance results of the three benchmarks in terms of the Byte/FLOP metric and communication behavior, and also compares NEC SX-ACE and Intel Haswell nodes using HPGMG. The NEC SX-ACE node reaches 24% of peak performance, which is the best result among all tested platforms for the HPGMG reference code without any hand-tuned optimization.
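As a small worked example of the Byte/FLOP metric, the following C snippet computes the ratio for a triad-style update; the kernel and the simple traffic model are generic assumptions, not taken from HPL, HPCG, or HPGMG.

/* Worked example of the Byte/FLOP metric for a triad-style kernel
 * a[i] = b[i] + s * c[i] (generic illustration only). */
#include <stdio.h>

int main(void) {
    /* Per element: 2 FLOPs (one multiply, one add) and 24 bytes of traffic
     * (load b, load c, store a, 8 bytes each; write-allocate traffic ignored). */
    double flops_per_elem = 2.0;
    double bytes_per_elem = 24.0;
    printf("Byte/FLOP = %.1f\n", bytes_per_elem / flops_per_elem);  /* prints 12.0 */
    return 0;
}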
15:20 - 15:50 Coffee Break
15:50 - 16:20 Large scale phase-field simulation for ternary eutectic directional solidification
Johannes Hötzer, KIT
Abstract High-performance materials with defined properties are crucial for the development and optimization of existing applications. During directional solidification, ternary eutectic alloys form a wide range of patterns which are related to their mechanical properties. To study the underlying physical processes of microstructure evolution, large-scale phase-field simulations, based on a grand-potential approach, allow us to gain new insights into the underlying pattern formation processes. Building on the waLBerla framework, a highly optimized solver for finite differences on a regular grid was developed. The solver was optimized at various levels, from the model and its parameters down to the hardware. On the model side, we exploit the physical properties by employing a moving-window technique and classification techniques to calculate only the relevant terms. In addition, buffering techniques are employed to reuse values that are required multiple times. The 3D domain is parallelized using the Message Passing Interface on a block-structured grid. The kernels are explicitly vectorized using compiler intrinsics. The solver shows excellent scaling behaviour on all three German Tier-1 supercomputers, SuperMUC, JUQUEEN, and Hazel Hen, and reaches 25% of the peak FLOP rate on a SuperMUC node.
16:20 - 16:35 Farewell
Michael Resch, HLRS, University of Stuttgart