The Distributed ASCI Supercomputer

The Distributed ASCI Supercomputer: The third generation
Dick Epema (TUD) (with many slides from Henri Bal)
Parallel and Distributed Group

Distributed ASCI Supercomputer
• Joint infrastructure of the ASCI research school
• Clusters integrated in a single distributed testbed
• Long history and continuity: DAS-1 (1997), DAS-2 (2002), DAS-3 (2006)

DAS is a Computer Science grid
• Motivation: CS needs its own infrastructure for
  • Systems research and experimentation
  • Application experiments
• DAS is simpler and more homogeneous than most production grids
  • Single operating system
  • “A simple grid that works”

DAS-3: overall structure
• 272 AMD Opteron nodes
• 792 cores, 1 TB memory
• More heterogeneous:
  • 2.2-2.6 GHz single/dual core nodes
• Myrinet-10G (not in Delft)
• Gigabit Ethernet
• Clusters (nodes): VU (85), TU Delft (68), UvA/MultimediaN (46), UvA/VL-e (40), Leiden (32)
• SURFnet6: 10 Gb/s lambdas
• Operational: Oct. 2006

Cluster configuration

Projects using DAS-3
• Virtual Lab for e-Science
  • Grid computing, scheduling, workflow, PSE, visualization
• MultimediaN
  • Searching, classifying multimedia data
• NWO projects, e.g., StarPlane and GUARD-G
• NCF projects (off-peak hours)
• And many more (P2P, …)

Projects using DAS: StarPlane
[Figure: diagram with elements labeled CPUs, R, and NOC]
• Key idea:
  • Applications can dynamically allocate light paths
  • Applications can change the topology of the wide-area network, possibly even at the sub-second timescale
• VU (Bal, Bos, Maassen)
• UvA (de Laat, Grosso, Xu, Velders)

Projects using DAS: GUARD-G
• How to turn grids into a predictable utility for computing (much like the telephone system)
• Problems:
  • Predictability of workloads
  • Predictability of system availability (grids are faulty!)
• Allocation of light paths very useful here
• TU Delft (Epema) + Leiden (Wolters)

Projects using DAS: KOALA, a co-allocating grid scheduler
• Main goals:
  • Processor co-allocation: (non-)fixed/flexible jobs
  • Data co-allocation: move large input files to the locations where the job components will run, prior to execution
  • Load sharing: in the absence of co-allocation
  • Run alongside local schedulers
• KOALA
  • is written in Java
  • uses Globus components (e.g., GRAM, RSL and GridFTP)
  • has been deployed on DAS-2 in September 2005
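To make the idea of processor co-allocation concrete, the following is a minimal Python sketch of placing the components of one job on several clusters at once. It is an illustration only and not KOALA's actual code or policy (KOALA itself is written in Java). The function name `place_components`, the worst-fit rule, and the idle-processor counts are all assumptions made for this example.

```python
from typing import Dict, List, Optional


def place_components(
    component_sizes: List[int],
    idle_processors: Dict[str, int],
) -> Optional[Dict[int, str]]:
    """Map component index -> cluster name, or None if the job does not fit.

    Hypothetical worst-fit placement; not taken from KOALA.
    """
    if not component_sizes:
        return {}
    if not idle_processors:
        return None
    remaining = dict(idle_processors)  # copy so the caller's view is untouched
    placement: Dict[int, str] = {}
    # Place the largest components first; they are the hardest to fit.
    order = sorted(range(len(component_sizes)),
                   key=lambda i: component_sizes[i], reverse=True)
    for i in order:
        # Worst fit: pick the cluster with the most idle processors.
        cluster = max(remaining, key=lambda name: remaining[name])
        if remaining[cluster] < component_sizes[i]:
            # All-or-nothing: co-allocation needs every component placed.
            return None
        remaining[cluster] -= component_sizes[i]
        placement[i] = cluster
    return placement


# Hypothetical idle-processor counts per cluster (illustrative values only).
idle = {"VU": 60, "Delft": 40, "Leiden": 20}
print(place_components([32, 32, 16], idle))
print(place_components([64, 64], idle))
```

The all-or-nothing return value reflects the point of co-allocation: a job runs only if all of its components can hold processors in their clusters at the same time. A real scheduler would also have to account for data location and for the local schedulers it runs alongside, which this sketch ignores.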

Status of the DAS-3 clusters
• Delft cluster: accepted, up and running
• VU, UvA-MM: acceptance this week
• UvA, Leiden: acceptance this year
