
High Performance Distributed Computing

Henri Bal, Vrije Universiteit Amsterdam


Presentation Transcript


  1. High Performance Distributed Computing. Henri Bal, Vrije Universiteit Amsterdam

  2. Outline
     1. Development of the field
     2. Highlights VU-HPDC group
     3. Links to data science cycle
     4. Conclusions

  3. Developments
     • Multiple types of data explosions:
       • Big data: huge processing/transportation demands
       • Complex heterogeneous data
     • LOFAR: ~15 PB/year
     • SKA: >300 PB/year, exascale processing

  4. Developments
     • Infrastructure explosion
     • High complexity: heterogeneous systems with a diversity of processors, systems, and networks

  5. VU HPDC Group
     • Bridge the gap between demanding applications and complex infrastructure
     • Distributed programming systems for:
       • Clusters, grids, clouds
       • Accelerators (GPUs)
       • Heterogeneous systems ("Jungles")
       • Clouds & mobile devices
     • Applications: multimedia, semantic web, model checking, games, astronomy, astrophysics, climate modeling, …

  6. Highlights VU-HPDC group
     • Solved Awari, 2002
     • DACH 2008 - BS
     • DACH 2008 - FT
     • AAAI-VC 2007
     • 3rd Prize: ISWC 2008
     • 1st Prize: SCALE 2008
     • 1st Prize: SCALE 2010
     • EYR 2011 Sustainability award

  7. Links to data science cycle
     [Diagram labels: Reasoning, Knowledge representation, Multimedia Retrieval, Modeling and simulation, Machine Learning, Information Retrieval, Decision Theory, Perception, Cognition, Visual Analytics, Distributed Processing, Large Scale Databases, Software Eng., System / Network Eng.]
     • Distributed reasoning
     • Jungle computing
     • MapReduce

  8. Reasoning – Semantic Web
     • Make the Web smarter by injecting meaning so that machines can "understand" it
     • Initial idea by Tim Berners-Lee in 2001
     • Has since attracted the interest of big IT companies

  9. Google Example

  10. Google Example

  11. Distributed Reasoning
     • WebPIE: web-scale distributed reasoner doing full materialization
     • QueryPIE: distributed reasoning with backward-chaining + pre-materialization of schema triples
     • DynamiTE: maintains materialization after updates (additions & removals)
     • Challenge: real-time incremental reasoning on web scale, combining new (streaming) data & existing historic data
     With: Jacopo Urbani, Alessandro Margara, Frank van Harmelen
     Commit/
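
To make "full materialization" concrete, here is a minimal single-machine sketch in Python. It is not WebPIE's implementation: it only applies two well-known RDFS entailment rules (subclass transitivity and type inheritance) to a toy triple set until no new triples appear. The `ex:` names and the `materialize` function are hypothetical, chosen for illustration.

```python
# Toy forward-chaining materializer over (subject, predicate, object) triples.
# Assumption: only two RDFS rules are applied; real reasoners handle many more
# and distribute the work across machines.
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def materialize(triples):
    """Return the closure of `triples` under the two rules (fixpoint iteration)."""
    closure = set(triples)
    while True:
        derived = set()
        for (s, p, o) in closure:
            if p != SUBCLASS:
                continue
            for (s2, p2, o2) in closure:
                # Transitivity: (s subClassOf o), (o subClassOf o2) => (s subClassOf o2)
                if p2 == SUBCLASS and s2 == o:
                    derived.add((s, SUBCLASS, o2))
                # Inheritance: (s2 type s), (s subClassOf o) => (s2 type o)
                if p2 == TYPE and o2 == s:
                    derived.add((s2, TYPE, o))
        new = derived - closure
        if not new:
            return closure  # fixpoint reached: nothing left to derive
        closure |= new


facts = {
    ("ex:Student", SUBCLASS, "ex:Person"),
    ("ex:Person", SUBCLASS, "ex:Agent"),
    ("ex:alice", TYPE, "ex:Student"),
}
inferred = materialize(facts) - facts
for triple in sorted(inferred):
    print(triple)
```

Backward-chaining, as named for QueryPIE, would instead derive only the triples needed to answer a given query, and incremental maintenance would avoid recomputing the whole closure after each update.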

  12. Glasswing: MapReduce on Accelerators
     • Use accelerators as a mainstream feature
     • Massive out-of-core data sets
     • Scale vertically & horizontally
     • Code portability using OpenCL
     • Maintain MapReduce abstraction
     With: Ismail El Helw, Rutger Hofman

  13. Glasswing Pipeline
     • Overlaps computation, communication & disk access
     • Supports multiple buffering levels
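
The idea of overlapping stages with a configurable buffering level can be sketched with threads and bounded queues. This is only an illustration of the general pattern, not Glasswing's pipeline: the stage functions, `run_pipeline`, and the `depth` parameter are all hypothetical, and Python threads overlap I/O waits rather than CPU-bound work.

```python
import queue
import threading

_DONE = object()  # end-of-stream marker passed down the pipeline


def stage(fn, inbox, outbox):
    """Apply `fn` to every item from `inbox` and forward results to `outbox`."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        outbox.put(fn(item))


def run_pipeline(chunks, stages, depth=2):
    """Push `chunks` through `stages`; `depth` bounds the buffers between stages."""
    queues = [queue.Queue(maxsize=depth) for _ in range(len(stages) + 1)]
    workers = [
        threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    results = []

    def drain():
        # Collect final results so the last bounded queue never blocks the pipeline.
        while True:
            item = queues[-1].get()
            if item is _DONE:
                return
            results.append(item)

    workers.append(threading.Thread(target=drain))
    for w in workers:
        w.start()
    for chunk in chunks:
        queues[0].put(chunk)  # blocks when the first buffer is full
    queues[0].put(_DONE)
    for w in workers:
        w.join()
    return results


# Hypothetical stand-ins for disk read, computation, and network send.
def read_chunk(i):
    return list(range(i, i + 4))


def compute(values):
    return sum(v * v for v in values)


def send(value):
    return ("sent", value)


results = run_pipeline(range(3), [read_chunk, compute, send], depth=2)
print(results)
```

With `depth=2` each link behaves like double buffering: one stage can fill a slot while the next stage works on the previous one. Raising `depth` trades memory for more slack between stages.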

  14. Evaluation of Glasswing
     • Glasswing uses CPU, memory & disk resources more efficiently than Hadoop
     • Compute-bound applications benefit dramatically from GPUs
     • Better scalability than Hadoop
     • Runs on a variety of accelerators
     • E.g. k-means clustering: 8.5× (1 node) vs. 15.5× (64 nodes) vs. 107× (GPU node)
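
For readers unfamiliar with how k-means fits the MapReduce abstraction, the following is a minimal pure-Python sketch of one common formulation. It is not the Glasswing or Hadoop code used in the evaluation, and all function names are hypothetical. The map phase assigns each point to its nearest centroid, and the reduce phase averages the points per centroid.

```python
from collections import defaultdict


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(points, centroids):
    """Emit (centroid index, point) pairs; this is the distance-heavy step."""
    for point in points:
        yield nearest(point, centroids), point


def reduce_phase(pairs, centroids):
    """Average the points assigned to each centroid; empty clusters keep theirs."""
    groups = defaultdict(list)
    for key, point in pairs:
        groups[key].append(point)
    updated = list(centroids)
    for key, members in groups.items():
        updated[key] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return updated


def kmeans(points, centroids, iterations=10):
    """Run a fixed number of map/reduce rounds starting from `centroids`."""
    for _ in range(iterations):
        centroids = reduce_phase(map_phase(points, centroids), centroids)
    return centroids


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centers = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centers)
```

The map phase performs a distance computation for every point against every centroid, which is the kind of data-parallel, compute-bound work that tends to suit accelerators.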
