Leibniz Supercomputing Centre, Garching/Munich
Matthias Brehm, HPC Group
Leibniz Supercomputing Centre (LRZ), Bavarian Academy of Sciences and Humanities
• Computing Centre (~175 employees) for the Munich universities
  • All kinds of IT services and support
  • Capacity computing, (virtual) servers
• Regional Computing Centre for all Bavarian universities
  • Capacity computing
  • Backup and Archiving Centre (more than 7 petabytes, 5.5 billion files)
  • Competence Centre (networks, IT management)
• National Supercomputing Centre
  • Integrated into the Gauss Centre for Supercomputing (JSC + HLRS + LRZ)
    • legal entity for acting in Europe
  • High-end system (62 TFlop/s, 9,726 cores)
  • Linux cluster (45 TFlop/s, 5,000 cores)
• Grid computing
  • Active in DEISA and PRACE (1IP)
  • WP8 (WP9) leadership: Future Technologies
• Current procurement: multi-petaflop system at the end of 2011
  • Contract in 2010
  • General-purpose system (Intel- or AMD-based) of thin and fat shared-memory nodes
  • Doubling of the computer cube, cave & visualization, new office space
HPC research activities
• IT management (methods, architectures, tools)
  • Service management: impact analysis, customer service management, SLA management, monitoring, process refinement
  • Virtualization
  • Operational strategies for petaflop systems
• Grids
  • Middleware (IGE, Initiative for Globus in Europe; project leader): services, coordination, provisioning
  • Grid monitoring (D-MON; resources of gLite, Globus, UNICORE)
  • Security and intrusion detection, meta-scheduling, SLAs
• Computational science
  • Munich Computational Sciences Centre (MCSC) & Munich Centre of Advanced Computing (MAC): TU Munich, Univ. Munich, Max Planck Society Garching
  • New programming paradigms for petaflop systems
• Energy efficiency
  • (Hot-water) cooling & reuse (heating of buildings)
  • Scheduling, sleep mode of idle processors, etc. (see the cpufreq sketch after this list)
• Automatic performance analysis and system-wide performance monitoring
• Network technologies & network monitoring
• Long-term archiving
• Talks/activities with Russia
  • LSU Moscow: Cooperative Competence Network of HPC & Bavarian Graduate School of Computational Engineering: joint courses, applications in physics, climatology, quantum chemistry, drug design
  • Steklov Institute / State University of St. Petersburg: Joint Advanced Student School (JASS): modelling and simulation
  • T-Platforms: cooling technology, energy efficiency
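The energy-aware scheduling ideas above (sleep modes, clock reduction for idle processors) can be prototyped on ordinary Linux nodes. Below is a minimal C sketch, assuming the Linux cpufreq sysfs interface and root privileges; the core number and governor names are illustrative placeholders, not the mechanism used at LRZ.

    /* Hypothetical sketch: lower the clock of an idle core via the Linux
     * cpufreq sysfs interface and restore dynamic scaling when work arrives.
     * Assumes /sys/devices/system/cpu/cpuN/cpufreq/ exists and root rights. */
    #include <stdio.h>

    static int set_governor(int cpu, const char *governor)
    {
        char path[128];
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_governor", cpu);

        FILE *f = fopen(path, "w");
        if (!f) {
            perror(path);
            return -1;
        }
        fprintf(f, "%s\n", governor);
        fclose(f);
        return 0;
    }

    int main(void)
    {
        int cpu = 2;                        /* example core (assumption)      */

        set_governor(cpu, "powersave");     /* core drains and becomes idle   */
        /* ... wait until the batch scheduler dispatches new work ...         */
        set_governor(cpu, "ondemand");      /* restore dynamic frequency scaling */
        return 0;
    }

In a real setting, a batch-scheduler hook rather than a standalone program would make such calls when nodes drain or are reassigned.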
Specific research ideas for collaboration
• Programming models and runtime support
  • PGAS (partitioned global address space): Coarray Fortran (CAF) or UPC
  • Re-implement an essential infrastructure library, e.g. ARPACK, in CAF
    • sparse problems might be a good candidate for load balancing
  • Implement a microbenchmark set
    • measure quality of implementation vs. OpenMP / MPI
    • measure quality of implementation for message optimization (message aggregation etc.); an MPI baseline sketch follows this list
  • Investigate the potential for interoperability between CAF and UPC, CAF and OpenMP, CAF and MPI
    • what is feasible? what isn't?
    • the standards don't mention this anywhere (yet)
  • Develop Fortran class libraries for parallel patterns
    • Fortran is presently the only "OO" and simultaneously parallel language
  • User training
• Scalable visualisation infrastructure
  • Highly scalable visualisation service for HPC
  • Remote visualization, virtualization
  • Location-independent, instant, and cost-effective framework for the analysis of HPC simulation results
  • Resource allocation, account management, data transfer and compression, advance reservation, and quality of service
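As a starting point for the microbenchmark set, the following sketch shows a plain MPI baseline in C that compares many small messages against one aggregated message between two ranks; the message count and chunk size are arbitrary placeholders. A CAF or UPC version of the same pattern would then be timed against this baseline to judge quality of implementation and the benefit of message aggregation.

    /* Hypothetical MPI baseline for the message-aggregation microbenchmark:
     * rank 0 sends N small messages one by one, then the same data once as
     * a single aggregated buffer; both variants are timed.  Run with >= 2
     * ranks; N and CHUNK are placeholder values. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N      1000     /* number of small messages (assumption)  */
    #define CHUNK  64       /* doubles per small message (assumption) */

    int main(int argc, char **argv)
    {
        int rank;
        double *buf = malloc(N * CHUNK * sizeof(double));

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Variant 1: N individual small messages */
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < N; i++) {
            if (rank == 0)
                MPI_Send(buf + i * CHUNK, CHUNK, MPI_DOUBLE, 1, i, MPI_COMM_WORLD);
            else if (rank == 1)
                MPI_Recv(buf + i * CHUNK, CHUNK, MPI_DOUBLE, 0, i, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
        }
        double t_small = MPI_Wtime() - t0;

        /* Variant 2: one aggregated message */
        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        if (rank == 0)
            MPI_Send(buf, N * CHUNK, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(buf, N * CHUNK, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        double t_agg = MPI_Wtime() - t0;

        if (rank == 0)
            printf("individual: %.6f s   aggregated: %.6f s\n", t_small, t_agg);

        free(buf);
        MPI_Finalize();
        return 0;
    }

The interesting comparison is how close a compiler-generated coarray transfer of the same data comes to the aggregated MPI case, i.e. whether the runtime aggregates fine-grained remote accesses automatically.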
Specific research ideas for collaboration
• Energy efficiency
  • Scheduling
  • Dynamic clock adjustment of CPU (and memory)
  • Monitoring and tuning of energy fluxes
  • Cooling technologies, energy reuse
• Performance analysis tools for HPC
  • Automatic performance monitoring and analysis
  • System-wide background monitoring
    • hardware performance counters, communication behaviour, I/O (a counter-reading sketch follows this list)
  • Automatic bottleneck detection
• (System) monitoring
  • using map-reduce techniques
• Optimisation, scalability, and porting of codes
  • Scalable and dynamic mesh generation & load balancing
    • more than ParMETIS
  • Application areas: geophysics, cosmology, CFD, multi-physics
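For the background monitoring of hardware performance counters, one possible low-level building block on Linux is the perf_event_open system call. The sketch below only counts retired instructions for a dummy loop in the calling process; it illustrates the mechanism and is not the system-wide monitoring infrastructure discussed above.

    /* Hypothetical sketch: read a hardware performance counter (retired
     * instructions) via the Linux perf_event_open syscall.  Requires a
     * kernel with perf support. */
    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <linux/perf_event.h>

    static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                                int cpu, int group_fd, unsigned long flags)
    {
        return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
    }

    int main(void)
    {
        struct perf_event_attr attr;
        memset(&attr, 0, sizeof(attr));
        attr.type           = PERF_TYPE_HARDWARE;
        attr.size           = sizeof(attr);
        attr.config         = PERF_COUNT_HW_INSTRUCTIONS;
        attr.disabled       = 1;
        attr.exclude_kernel = 1;

        int fd = perf_event_open(&attr, 0 /* this process */, -1, -1, 0);
        if (fd < 0) { perror("perf_event_open"); return 1; }

        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

        volatile double x = 0.0;                 /* dummy workload */
        for (int i = 0; i < 1000000; i++) x += i * 0.5;

        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

        uint64_t count;
        read(fd, &count, sizeof(count));
        printf("instructions retired: %llu\n", (unsigned long long)count);

        close(fd);
        return 0;
    }

A system-wide monitor would instead open one counter group per core (pid = -1, cpu = n) in a background daemon and sample the values periodically alongside communication and I/O statistics.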