


The Energy-oriented Centre of Excellence (EoCoE-II)
This is the first newsletter of the EoCoE-II project, started in January 2019. The newsletter presents the challenges tackled in the project as well as a general description of the topics addressed in its Work Packages. EoCoE-II draws on a successful proof-of-principle phase, EoCoE-I, in which a large set of diverse computer applications in the energy domain achieved significant efficiency gains thanks to a multidisciplinary approach involving experts in applied mathematics and supercomputing. EoCoE-II will channel its efforts into five scientific Exascale challenges in energy: Meteorology, Materials, Water, Wind and Fusion. This multidisciplinary effort will harness innovations in computer science and mathematical algorithms within a tightly integrated co-design approach.








Exascale Science Challenges in Energy Research
A new approach to simulating morphologies of organic semiconductors
A new approach to modeling molecular arrangements in films has been developed by Alison Walker's group at the University of Bath [1]. These arrangements determine charge and energy transport in solar cells made from organic semiconductors and in the charge-transport layers of perovskite solar cells. This class of cells offers a wide variety of production processes, a large range of candidate organic compounds, and a multitude of possible structural configurations. Predictive computer modeling of morphology and charge-transport characteristics provides an inexpensive platform for the design and testing of new devices. The team at Bath, which is part of the Materials for Energy challenge, has developed a method for the rapid generation of structurally independent atomistic configurations of molecular and polymeric systems.

Events
ISOPHOS 2019 summer school
The leader of the Bath team, Alison Walker, is a co-organiser of the ISOPHOS 2019 summer school, aimed at postgraduate students and postdocs [2]. The summer school will be held from 2 to 6 September 2019 in the wonderful atmosphere of Castiglione della Pescaia (Tuscany, Italy), an ancient seaside town that grew around a medieval fortress. The school focuses on recent advances in the science and technology of organic and hybrid photovoltaic devices, including polymers, perovskites, dye solar cells and the use of graphene and other 2D materials for energy applications. Both experimental and theory/simulation descriptions of organic and hybrid PV will be presented, and there are hands-on experimental and modeling sessions.

Symposium “Towards Exascale Supercomputers for Nanotechnology” at NanoInnovation 2019
The team at ENEA, which is part of the Materials for Energy challenge, is organizing a symposium on the topic “Towards Exascale Supercomputers for Nanotechnology”. The symposium will be held in Rome on 12 June within the NanoInnovation 2019 conference [3], and will offer an exciting opportunity to bring together researchers in materials science and computer science to discuss new approaches and explore new collaborations in the theoretical discovery of materials. A new generation of supercomputers is on the horizon which will be capable of achieving computational capabilities in the range of ExaFLOPs. Such a technological advance will have a tremendous impact on materials science, paving the way to a new kind of interaction between theory and experiment, with the potential to accelerate materials discovery and meet the increasing demand for task-specific materials.







Programming Models
Work Package 2 (WP2) – Programming Models focuses on performance evaluation, code refactoring and optimization. To make EoCoE applications run on pre-Exascale machines, WP2 will ensure that the level of performance required to run large scientific simulation cases on forthcoming computing platforms is reached. WP2 includes High-Performance Computing (HPC) experts who will help the scientific teams achieve their optimization targets. The Erlangen Regional Computing Center is one of our key partners here. Performance evaluation workshops and hackathons will be organized with them throughout the project lifetime, in collaboration with the Performance Optimisation and Productivity Centre of Excellence (PoP). Not all the codes involved in the project are equally represented in WP2.

Alya is the flagship code of the Wind Scientific Challenge (SC), developed at the Barcelona Supercomputing Center (BSC). The goal is to perform Large Eddy Simulations (LES) for wind farm planning, including accurate rotor modeling with a rotating mesh and complex terrain. This level of complexity requires a high degree of computing performance. Optimization work includes improvement of the CPU (Central Processing Unit) implementation at the node level, extension of the GPU (Graphics Processing Unit) version and heterogeneous co-execution, a better load-balancing strategy and asynchronous behavior when overlapping computation and communication tasks. The simulations will be compared with the waLBerla actuator line code developed at IFPEN.



Image courtesy of A. Gargallo et al. – BSC, Spain


EURAD-IM is one of the flagship codes of the Meteorology Scientific Challenge. This code is developed at Forschungszentrum Jülich GmbH and is dedicated to the modeling of chemical components and aerosol transport in the atmosphere. The code will be refactored for better performance at the node level, better vectorization and better memory management. A hybrid CPU/GPU implementation using MPI + OpenMP or OpenACC will be explored to run on forthcoming GPU clusters.



Image courtesy of H. Elbern et al. – FZJ, Germany


PVnegf is one of the flagship codes of the Materials Scientific Challenge. It simulates photocarrier dynamics (generation, transport and recombination) in nanostructured regions and at complex interfaces in advanced high-efficiency solar cell devices. The code requires a complete rewrite in order to keep the algorithmic implementation separate from multiple backends and to enable parallel execution on distinct computing architectures. The successful 1D prototype will be extended and optimized to run both on CPU-based and on GPU-based machines.



Image courtesy of M. Salanne – MdlS, France


ParFlow is one of the flagship codes of the Hydrology Scientific Challenge. ParFlow simulates surface flow and 3D subsurface flow using porous media modeling. It will be used for hydropower simulations over the European continent. The Adaptive Mesh Refinement implementation started in the previous EoCoE project and will be improved and extended for production runs. The code will be optimized and a GPU version of the solvers will be added.



Image courtesy of B. Naz and S. Kollet et al. – FZJ, Germany


Gysela is the flagship code of the Fusion Scientific Challenge, developed at CEA, IRFM. It solves the Vlasov equation using 5D full-f and flux-driven gyrokinetic models, together with the Poisson equation, to simulate plasma turbulence and transport in tokamak devices. Gysela will be rewritten and modernized in a new version called GyselaX. The modernization will include a clean-up of the currently implemented modules, performance optimization and a new, more realistic discretization mesh.

Work Package 2 will work jointly with Work Package 4 on Input/Output and Work Package 5 on Ensemble Runs on the integration of the Parallel Data Interface (PDI).



Image courtesy of G. Dif-Pradalier – CEA-IRFM, France



Scalable Solvers
Solving large sparse linear systems is a core task in four out of five EoCoE-II Scientific Challenges (SCs). The notion of "large" is relative, and the required problem sizes keep growing: current EoCoE-II SCs solve systems with millions or even billions of degrees of freedom (DOFs), so the availability of exascale-enabled solvers is fundamental in preparing the SC applications for the new exascale ecosystem.

The methods of choice for efficiently solving such systems on modern high-performance computers are iterative Krylov methods, whose convergence and scalability properties are strictly related to the choice of a suitable preconditioning technique. Multigrid methods are among the most efficient numerical preconditioners for solving large systems of equations. In particular, they are optimal, in the sense that their computational cost grows linearly with the number of unknowns, when the linear systems come from the discretization of standard elliptic partial differential equations. We focus on the Algebraic MultiGrid (AMG) approach, which, unlike the geometric one, makes no explicit use of information about the problem from which the linear system comes, but exploits only the system matrix, with the goal of obtaining methods applicable to wide classes of problems. The linear complexity of AMG preconditioners generally translates into algorithmic scalability, i.e., the number of iterations of AMG-preconditioned Krylov solvers does not depend on the size of the problem. This allows efficient parallel implementations, for example through a data partitioning approach in which rows of the matrix are assigned to different computing nodes. Because of their linear complexity and algorithmic scalability, AMG preconditioners are expected to be methods of choice in the emerging exascale scenario.
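To make the idea of a multigrid-preconditioned Krylov solver and its algorithmic scalability concrete, the following is a minimal, hedged sketch (not the MLD2P4/PSBLAS implementation used in the project): a two-grid V-cycle with weighted-Jacobi smoothing serves as the preconditioner for SciPy's conjugate gradient solver on a 1D Poisson model problem, and the iteration count stays essentially flat as the number of unknowns grows.

```python
# Two-grid-preconditioned CG on a 1D Poisson problem (illustrative sketch).
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def pcg_iterations(n):
    """Solve the n-unknown 1D Poisson system with two-grid-preconditioned CG;
    return the number of CG iterations."""
    A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
    nc = (n - 1) // 2  # coarse node j sits at fine node 2j+1
    # Linear-interpolation prolongation P; full-weighting restriction R.
    rows = [i for j in range(nc) for i in (2 * j, 2 * j + 1, 2 * j + 2)]
    cols = [j for j in range(nc) for _ in range(3)]
    vals = [0.5, 1.0, 0.5] * nc
    P = sp.csr_matrix((vals, (rows, cols)), shape=(n, nc))
    R = 0.5 * P.T
    coarse_solve = spla.factorized((R @ A @ P).tocsc())  # direct coarsest-level solver
    D = A.diagonal()

    def vcycle(r):
        x = 0.8 * r / D                          # weighted-Jacobi pre-smoothing
        x += P @ coarse_solve(R @ (r - A @ x))   # Galerkin coarse-grid correction
        x += 0.8 * (r - A @ x) / D               # symmetric post-smoothing
        return x

    M = spla.LinearOperator((n, n), matvec=vcycle)
    b = np.ones(n)
    iters = 0
    def count(_): nonlocal iters; iters += 1
    x, info = spla.cg(A, b, M=M, callback=count)
    assert info == 0
    return iters

# Doubling the problem size leaves the iteration count essentially unchanged.
print(pcg_iterations(1023), pcg_iterations(4095))
```

Without the multigrid preconditioner, the CG iteration count for this matrix grows roughly with the grid size; with it, the count is small and size-independent, which is exactly the algorithmic scalability property described above.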

The diffusion of General Purpose Graphics Processing Units (GPGPUs), currently found in many of the fastest supercomputers in the Top500 list, requires exposing a high degree of parallelism, because GPUs are high-throughput manycore processors. Since AMG methods are obtained by combining different components (smoother, coarsening algorithm, coarsest-level solver, restriction and prolongation operators), full exploitation of GPU capabilities requires each component to be optimized for this type of architecture. We focus on the application phase of AMG preconditioners, and in particular on the choice and implementation of AMG smoothers and coarsest-level solvers capable of harnessing the computational power offered by a cluster of GPUs. In our work, the smoothers and local solvers are used within the AMG framework offered by the MLD2P4 package [1] of preconditioners, exploiting sparse matrix data structures and Krylov solvers from the PSBLAS library [2, 3]. We conducted weak scalability tests, keeping approximately 16 million DOFs per GPU, on the Piz Daint supercomputer, operated by the Swiss National Supercomputing Centre and available to our CoE through the PRACE Research Infrastructure. We obtained good weak scalability (see Figs.) on up to 512 GPUs and more than 8×10⁹ DOFs for linear systems arising from a groundwater modelling application developed at the Jülich Supercomputing Centre (JSC) [4]; an almost constant time per iteration, ranging from 0.1 to 0.13 s, shows very good implementation scalability of the preconditioned Krylov solver.




[1] P. D’Ambra, D. di Serafino, and S. Filippone. MLD2P4: a package of parallel algebraic multilevel domain decomposition preconditioners in Fortran 95. ACM Trans. Math. Softw., 37(3):7–23, 2010.

[2] S. Filippone and A. Buttari. Object-oriented techniques for sparse matrix computations in Fortran 2003. ACM Trans. Math. Softw., 38(4):23:1–23:20, 2012.

[3] S. Filippone, V. Cardellini, D. Barbieri and A. Fanfarillo. Sparse matrix-vector multiplication on GPGPUs. ACM Trans. Math. Softw., 43(4):30:1–30:49, 2017.

[4] J.E. Aarnes, T. Gimse, and K.A. Lie. An introduction to the numerics of flow in porous media using Matlab. In G. Hasle, K.A. Lie, and E. Quak, editors, Geometric Modelling, Numerical Simulation, and Optimization, pages 265–306. Springer, 2007.



I/O and Data Flow
I/O data handling, and the general data flow within an HPC application, can become a significant bottleneck when improving overall scalability. Leveraging new types of storage devices and optimizing the use of I/O libraries can help to keep the I/O overhead to a minimum.

As part of the EoCoE-II project, Work Package 4 targets the main I/O- and data-related bottlenecks that prevent further scaling in the various energy-oriented application codes of the Scientific Challenges:

- Improvement of I/O accessibility: Different I/O libraries support a variety of different configuration options. Depending on the situation, these options must be continually updated, or a completely new library must be adopted. We want to introduce a generic interface with the help of the Parallel Data Interface (PDI), which decouples the I/O API from the application to allow easier switching between different I/O subsystems. This interface should serve standard I/O operations but should also be useful in the context of ensemble or in-situ visualisation data movement.
- I/O performance: The time spent writing and reading data can consume a significant part of the overall application runtime and should be minimized. For this we want to leverage the optimization options of the different I/O libraries in use, as well as adopt intermediate storage elements such as flash storage devices.
- Resiliency: Running an application at large scale increases the chance of hardware or software failures as more and more computing elements are involved in the calculation. Additional I/O techniques can reduce the effort needed to restart a broken run, or even avoid a crash altogether, by storing intermediate snapshots to the storage elements. In particular, we want to focus on resiliency for ensemble calculations.
- Data size reduction: Running an application at larger scale often implies growing data sizes, which can become unmanageable and consume too many resources. Within this task we want to reduce the overall data size without losing necessary information via in-situ and in-transit processing, moving post-processing elements directly into the frame of the running application.
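The decoupling idea behind the first task can be sketched in a few lines. PDI itself is a C library configured through YAML, so the snippet below is not PDI's actual API; it is a hypothetical Python illustration of the same pattern: the application only "exposes" named data, and interchangeable backends (here a plain JSON writer and a checkpoint writer for restarting broken runs) decide whether and how each datum reaches storage, so the I/O subsystem can be swapped without touching the solver code.

```python
# Illustrative sketch of I/O decoupling (hypothetical API, not PDI's).
import json
import pickle
from pathlib import Path

class Backend:
    """Interface every I/O subsystem implements."""
    def write(self, name, step, data):
        raise NotImplementedError

class JsonBackend(Backend):
    """Writes every exposed datum as a JSON file, e.g. for post-processing."""
    def __init__(self, outdir):
        self.outdir = Path(outdir)
        self.outdir.mkdir(exist_ok=True)
    def write(self, name, step, data):
        (self.outdir / f"{name}_{step:06d}.json").write_text(json.dumps(data))

class CheckpointBackend(Backend):
    """Keeps only the latest periodic snapshot, for restarting a broken run."""
    def __init__(self, path, every=10):
        self.path, self.every = Path(path), every
    def write(self, name, step, data):
        if name == "state" and step % self.every == 0:
            self.path.write_bytes(pickle.dumps((step, data)))

class DataInterface:
    """The only I/O object the application code sees."""
    def __init__(self, backends):
        self.backends = backends
    def expose(self, name, step, data):
        for backend in self.backends:
            backend.write(name, step, data)

# The solver loop below never mentions files or formats: swapping JSON for
# HDF5, or disabling checkpoints, only changes the backend list.
io = DataInterface([JsonBackend("out"), CheckpointBackend("restart.pkl")])
state = [0.0] * 8
for step in range(20):
    state = [v + 1.0 for v in state]   # stand-in for one solver time step
    io.expose("state", step, state)
```

In the real PDI setup the mapping from exposed data to backends lives in a configuration file rather than in code, which is what allows the I/O strategy to change between runs without recompiling the application.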









