Compiler (and Runtime) Support for CyberInfrastructure Gagan Agrawal (joint work with Wei Du, Xiaogang Li, Ruoming Jin, Li Weng)
What is CyberInfrastructure ? • How computing is done is changing with advances in the internet and the emergence of the web • Access web-pages, data, and web-services over the internet • What does this mean for large-scale computing ? • Supercomputers are no longer stand-alone resources • Large data repositories are common
What is CyberInfrastructure ? • Infrastructures we are familiar with • Transportation infrastructure • Telecommunication infrastructure • Power supply/distribution infrastructure • CyberInfrastructure means large-scale computing infrastructure on the internet • Enable sharing of resources • Enable large-scale web-services • Access and process a 1 terabyte file as a web-service • Run a job on a large supercomputer using your web-browser !
CyberInfrastructure • CyberInfrastructure is also a new division within the CISE directorate of the National Science Foundation • Shows the importance of the area • New research is needed at all levels • Networking / parallel computing hardware • System software • Applications
Why is Compiler Support Needed for Cyberinfrastructure ? • Compilers have often simplified application development • Application development for Cyberinfrastructure is a hard problem !! • We need transparency across different resources • We need transparency across different dataset sources and formats • We need applications that adapt to resource availability • ….
Outline • Compiler supported Coarse-grained pipelined parallelism • Why ? • How ? • XML Based front-ends to scientific datasets • Compiler support for application self-adaptation • A SQL front-end to a grid data management system
General Motivation • Language and Compiler Support for Parallelism of many forms has been explored • Shared memory parallelism • Instruction-level parallelism • Distributed memory parallelism • Multithreaded execution • Application and technology trends are making another form of parallelism desirable and feasible • Coarse-Grained Pipelined Parallelism
Coarse-Grained Pipelined Parallelism (CGPP) • Definition • Computations associated with an application are carried out in several stages, which are executed on a pipeline of computing units • Example — K-nearest neighbor (range_query): given a 3-D range R = <(x1, y1, z1), (x2, y2, z2)> and a point p = (a, b, c), find the K nearest neighbors of p within R
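The range query above decomposes naturally into two stages — filter the points inside R, then select the K nearest — which is exactly the kind of chain a pipeline can execute. A minimal sketch in Python; the point set, function names, and the Euclidean metric are illustrative assumptions, not the deck's implementation:

```python
import heapq
import math

def in_range(pt, r):
    """Stage 1 filter: is pt inside the axis-aligned 3-D range r?"""
    (x1, y1, z1), (x2, y2, z2) = r
    x, y, z = pt
    return x1 <= x <= x2 and y1 <= y <= y2 and z1 <= z <= z2

def k_nearest(points, r, p, k):
    """Stage 2: among points inside r, return the k nearest to p."""
    candidates = (pt for pt in points if in_range(pt, r))
    return heapq.nsmallest(k, candidates, key=lambda q: math.dist(p, q))

points = [(0, 0, 0), (1, 1, 2), (5, 5, 5), (2, 2, 2.5)]
R = ((0, 0, 0), (3, 3, 3))
print(k_nearest(points, R, (1, 1, 1), 2))  # → [(1, 1, 2), (0, 0, 0)]
```

In a pipelined execution the filter stage could run at the data repository (shrinking the data early), with the selection stage downstream.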
Coarse-Grained Pipelined Parallelism is Desirable & Feasible • Application scenarios: distributed data repositories connected to compute resources and end users over the internet (figure)
Coarse-Grained Pipelined Parallelism is Desirable & Feasible • A new class of data-intensive applications • scientific data analysis • data mining • data visualization • image analysis • Two direct ways to implement such applications • Downloading all the data to the user's machine – often not feasible • Computing at the data repository – usually too slow
Coarse-Grained Pipelined Parallelism is Desirable & Feasible • Our belief • A coarse-grained pipelined execution model is a good match for processing remote data over the internet
Coarse-Grained Pipelined Parallelism needs Compiler Support • Computation needs to be decomposed into stages • Decomposition decisions depend on the execution environment • How many computing sites are available • How many computing cycles are available at each site • What communication links are available • What is the bandwidth of each link • Code for each stage follows the same processing pattern, so it can be generated by the compiler • Shared or distributed memory parallelism needs to be exploited • High-level language and compiler support are necessary
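To make the decomposition decision concrete, here is a toy cost model (not the compiler's actual algorithm; all names and the bottleneck formulation are assumptions): stages are split contiguously across sites, and the pipeline's steady-state cycle time is the slowest of the per-site compute times and per-link transfer times. The sketch picks the cut points that minimize that bottleneck by exhaustive search:

```python
from itertools import combinations

def cycle_time(work, volume, speeds, bandwidths, cuts):
    """Pipeline cycle time for one contiguous decomposition.

    work[i]      -- cost of stage i; volume[i] -- data sent from stage i to i+1
    speeds[s]    -- relative speed of site s; bandwidths[s] -- link out of site s
    cuts         -- stage indices after which the chain is split between sites
    """
    bounds = [0, *[c + 1 for c in cuts], len(work)]
    compute = max(sum(work[bounds[s]:bounds[s + 1]]) / speeds[s]
                  for s in range(len(speeds)))
    comm = max((volume[c] / bandwidths[s] for s, c in enumerate(cuts)),
               default=0.0)
    return max(compute, comm)  # throughput is limited by the slowest stage

def best_decomposition(work, volume, speeds, bandwidths):
    """Exhaustively choose the cut points minimizing the bottleneck."""
    n, k = len(work), len(speeds)
    return min(combinations(range(n - 1), k - 1),
               key=lambda cuts: cycle_time(work, volume, speeds, bandwidths, cuts))

work, volume = [4, 2, 6, 2], [10, 1, 8]   # per-stage cost, inter-stage volume
speeds, bandwidths = [1, 2], [2]          # two sites, one link between them
print(best_decomposition(work, volume, speeds, bandwidths))  # cut after stage 0
```

A real compiler would use such environment information (sites, cycles, links, bandwidths) rather than enumerating, but the objective — balance stage work against communication volume — is the same.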
An Entire Picture • Java dialect → compiler support (decomposition, code generation) → DataCutter runtime system
Language Dialect • Goal • to give the compiler information about independent collections of objects, parallel loops, reduction operations, and pipelined parallelism • Extensions of Java • Pipelined_loop • Domain & Rectdomain • Foreach loop • Reduction variables
ISO-Surface Extraction Example Code

// General pattern: a pipelined loop over packets, with foreach
// stages (and a final merge stage S) inside
RectDomain<1> PacketRange = [1:4];
Pipelined_loop (b in PacketRange) {
  0. foreach ( … ) { … }
  1. foreach ( … ) { … }
  … …
  n-1. S;   // e.g. merge
}

public class isosurface {
  public static void main(String arg[]) {
    float iso_value;
    RectDomain<1> CubeRange = [min:max];
    CUBE[1d] InputData = new CUBE[CubeRange];
    Point<1> p, b;
    RectDomain<1> PacketRange = [1:runtime_def_num_packets];
    RectDomain<1> EachRange = [1:(max-min)/runtime_def_num_packets];
    Pipelined_loop (b in PacketRange) {
      Foreach (p in EachRange) {
        InputData[p].ISO_SurfaceTriangles(iso_value, …);
      }
      … …
    }
  }
}

// Equivalent sequential loop:
for (int i = min; i < max; i++) {
  // operate on InputData[i]
}
Experimental Results • Versions • Default version • Site hosting the data only reads and transmits data, no processing at all • User's desktop only views the results, no processing at all • All the work is done by the compute nodes • Heavy workload on the computing nodes, high communication volume • Compiler-generated version • Intelligent decomposition is done by the compiler • More computations are performed on the end nodes to reduce the communication volume • Workload balanced across nodes, communication volume reduced • Manual version • Hand-written DataCutter filters with a similar decomposition to the compiler-generated version
Experimental Results: ISO-Surface Rendering (Z-Buffer Based) • Small dataset (150M): speedup grows with the width of the pipeline, from 1.92 to 3.34 • Large dataset (600M): speedup grows from 1.99 to 3.82 • About 20% improvement over the default version
Outline • Compiler supported Coarse-grained pipelined parallelism • Why ? • How ? • XML Based front-ends to scientific datasets • Compiler support for application self-adaptation • A SQL front-end to a grid data management system
Motivation • The need • Analysis of datasets is becoming crucial for scientific advances • Emergence of X-Informatics • Complex data formats complicate processing • Need for applications that are easily portable – compatibility with web/grid services • The opportunity • The emergence of XML and related technologies developed by W3C • XML is already extensively used as part of Grid/Distributed Computing • Can XML help in scientific data processing?
The Big Picture • Queries written in XQuery against an XML view of the data • Physical storage in HDF5, NetCDF, text, relational databases, …
Programming/Query Language • High-level declarative languages ease application development • Popularity of Matlab for scientific computations • New challenges in compiling them for efficient execution • XQuery is a high-level language for processing XML datasets • Derived from database, declarative, and functional languages ! • XPath (a subset of XQuery) embedded in an imperative language is another option
Approach / Contributions • Use of XML Schemas to provide high-level abstractions on complex datasets • Using XQuery with these Schemas to specify processing • Issues in Translation • High-level to low-level code • Data-centric transformations for locality in low-level codes • Issues specific to XQuery • Recognizing recursive reductions • Type inferencing and translation
System Architecture • XQuery sources → compiler → C++/C • The compiler consults an external schema and an XML mapping service, which relate the logical XML schema (seen by the query) to the physical XML schema (the actual layout)
Satellite Data Processing • Data collected by satellites is a collection of chunks, each of which captures an irregular section of the earth at some time t • The entire dataset comprises multiple pixels for each point on earth, at different times, but not for all times • Typical processing is a reduction along the time dimension – hard to write against the raw data format
Using a High-level Schema • High-level view of the dataset – a simple collection of pixels • Latitude, longitude, and time explicitly stored with each pixel • Easy to specify processing • Don’t care about locality / unnecessary scans • At least one order of magnitude overhead in storage • Suitable as a logical format only
XQuery Overview • XQuery – a language for querying and processing XML documents • Functional language • Single assignment • Strongly typed • XQuery expressions • for / let / where / return (FLWR) • unordered • path expressions

unordered (
  for $d in document("depts.xml")//deptno
  let $e := document("emps.xml")//emp[Deptno = $d]
  where count($e) >= 10
  return
    <big-dept>
      { $d,
        <count> { count($e) } </count>,
        <avg> { avg($e/salary) } </avg> }
    </big-dept>
)
Satellite – XQuery Code

unordered (
  for $i in ($minx to $maxx)
  for $j in ($miny to $maxy)
  let $p := document("sate.xml")/data/pixel[lat = $i and long = $j]
  return
    <pixel>
      <latitude> { $i } </latitude>
      <longitude> { $j } </longitude>
      <sum> { accumulate($p) } </sum>
    </pixel>
)

define function accumulate($p) as double {
  let $inp := item-at($p, 1)
  let $NVDI := (($inp/band1 - $inp/band0) div ($inp/band1 + $inp/band0) + 1) * 512
  return
    if (empty($p)) then 0
    else max($NVDI, accumulate(subsequence($p, 2)))
}
Challenges • Need to translate to low-level schema • Focus on correctness and avoiding unnecessary reads • Enhancing locality • Data-centric execution on XQuery constructs • Use information on low-level data layout • Issues specific to XQuery • Reductions expressed as recursive functions • Generating code in an imperative language • For either direct compilation or use a part of a runtime system • Requires type conversion
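The locality problem behind the data-centric transformations can be illustrated outside XQuery. Evaluated literally, the satellite query rescans the dataset once per output pixel; the transformed code makes a single pass in storage order and folds each record into its output cell, which also turns the recursive accumulate into an incremental reduction. A hypothetical Python sketch; the record layout and all names are assumptions:

```python
def nvdi(rec):
    """Per-record value from the two bands (as in the accumulate function)."""
    return ((rec["band1"] - rec["band0"]) / (rec["band1"] + rec["band0"]) + 1) * 512

def naive(records, lats, longs):
    """Query-order evaluation: one full scan of the data per output pixel."""
    return {(la, lo): max((nvdi(r) for r in records
                           if r["lat"] == la and r["long"] == lo), default=0)
            for la in lats for lo in longs}

def data_centric(records, lats, longs):
    """Data-order evaluation: a single scan, reducing into output cells."""
    out = {(la, lo): 0 for la in lats for lo in longs}
    for r in records:
        key = (r["lat"], r["long"])
        if key in out:
            out[key] = max(out[key], nvdi(r))
    return out
```

Both versions compute the same answer; the second reads each record exactly once, which is the property the data-centric transformation establishes in the generated low-level code.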
Mapping to Low-level Schema • A number of getData functions to access element(s) of required types • getData(lat x), getData(long y), getData(lat x, long y), getData(lat x, long y, time t), … • getData functions written in XQuery allow analysis and transformations • Want to insert getData functions automatically • Preserve correctness and avoid unnecessary scans
Summary – XML Based Front-ends • A case for the use of XML technologies in scientific data analysis • XQuery – a data parallel language ? • Identified and addressed compilation challenges • A compilation system has been built • Very large performance gains from data-centric transformations • Preliminary evidence that high-level abstractions and query language do not degrade performance substantially
Outline • Compiler supported Coarse-grained pipelined parallelism • Why ? • How ? • XML Based front-ends to scientific datasets • Compiler support for application self-adaptation • A SQL front-end to a grid data management system
Applications in a Grid Environment • Characteristics • long-running applications • adaptation to changing environments is desirable • constraint-based • response time • output can be varied in a given range • resolution • accuracy • precision • How can adaptation be achieved?
Proposed Language Extensions public interface Adapt_Spec { String constraints; // “RESP_TIME <= 50ms” List<String> opti_vars; // “m”, “clipwin.x” List<String> thresholds; // “m >= N”, “sampling_factor >= 1” List<Integer> opti_dir; }
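One way to picture how a runtime could act on such a specification: monitor the constrained metric and adjust an optimization variable, within its threshold, until the constraint holds. A toy controller in Python; the linear cost model and every name here are invented for illustration, not the proposed implementation:

```python
def adapt(measure, constraint_ms, var, lower, step=1, max_iters=100):
    """Shrink an output-quality parameter until response time meets the bound.

    measure(var)  -- observed response time (ms) at this parameter setting
    constraint_ms -- the RESP_TIME bound from the adaptation specification
    lower         -- threshold below which var may not drop
    """
    for _ in range(max_iters):
        if measure(var) <= constraint_ms or var - step < lower:
            return var
        var -= step  # trade output quality for response time
    return var

# Made-up cost model: each unit of the optimization variable m costs 8 ms.
assert adapt(lambda m: 8 * m, 50, 10, 1) == 6
```

In practice the `measure` callback would come from resource monitoring and performance modeling rather than a fixed formula, and the direction of adjustment would follow opti_dir.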
Implementation Issues & Strategies • Language Aspect • Compiler Implementation • Performance Modeling & Resource Monitoring • Experimental Design
Outline • Compiler supported Coarse-grained pipelined parallelism • Why ? • How ? • XML Based front-ends to scientific datasets • Compiler support for application self-adaptation • A SQL front-end to a grid data management system
Overview of the Project • A cyber-infrastructure/grid environment comprises distributed data sources • Users would like seamless access to the data • SQL is popular for accessing data from a single database • SQL for grid-based access • Data is distributed • Data is not managed by a relational database system • Need to export data layout information to the query planner
Overview (Contd.) • Use GridDB-lite as the backend • A grid data management middleware • Define and use a data description language • Parse SQL queries and the data description language, and generate a GridDB-lite application
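To make the front-end's first step concrete, here is a sketch of parsing a restricted SQL shape into a range-query description that a planner could match against layout metadata. This is a hypothetical toy, not the project's parser; the supported grammar and dataset/attribute names are assumptions:

```python
import re

# Grammar handled by this toy front-end (a tiny slice of SQL):
#   SELECT <attrs> FROM <dataset> WHERE <attr> BETWEEN <lo> AND <hi>
QUERY_RE = re.compile(
    r"SELECT\s+(?P<attrs>[\w,\s]+?)\s+FROM\s+(?P<dataset>\w+)"
    r"\s+WHERE\s+(?P<dim>\w+)\s+BETWEEN\s+(?P<lo>\d+)\s+AND\s+(?P<hi>\d+)",
    re.IGNORECASE)

def parse_query(sql):
    """Turn a restricted SQL string into a range-query description."""
    m = QUERY_RE.fullmatch(sql.strip())
    if m is None:
        raise ValueError("unsupported query shape")
    return {"attrs": [a.strip() for a in m.group("attrs").split(",")],
            "dataset": m.group("dataset"),
            "range": (m.group("dim"), int(m.group("lo")), int(m.group("hi")))}

q = parse_query("SELECT POIL, PWAT FROM bh WHERE TIME BETWEEN 0 AND 100")
```

The resulting dictionary names the dataset (looked up in the data list file), the attributes to extract (typed via the meta-data section), and the index range — the pieces a generated GridDB-lite application would need.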
Design • Dataset description file • Dataset schema • Dataset list file • Cluster configuration • Dataset storage location • Meta-data • Logical data space (number of dimensions) • Attributes for index declaration • Partitioning • Physical data storage annotation
Description file:
Group “ROOT” {
  DATASET “bh” {
    DATATYPE { IPARS }
    DATASPACE { RANK 3 }
    DATAINDEX { RID, TIME }
    PARTS { 9503, 9503, 9537, 9554, 9503, 9707, 9520, 9520 }
    DATA { DATASET SPACIAL, DATASET POIL, DATASET PWAT, …… }
  }
  Group “SUBGROUP” {
    DATASET “SPACIAL” {
      DATATYPE { }
      DATASPACE { SKIP 4 LINES LOOP PARTS { X SPACE Y SPACE Z SKIP 1 LINE } }
      DATA { PART in (0,1,2,3,4,5,6,7) .0.PART.5.init }
    }
    DATASET “POIL” {
      DATATYPE { }
      DATASPACE { LOOP TIME { SKIP 1 double LOOP PARTS { POIL } } }
      DATA { PART in (0,1,2,3,4,5,6,7) .0.PART.5.0 }
    }
    ……
  }
}

Meta-data:
[IPARS]
RID = INT2
TIME = INT4
X = FLOAT
Y = FLOAT
Z = FLOAT
POIL = FLOAT
PWAT = FLOAT
……

Data list file:
[bh]
DatasetDescription = IPARS
io = file
Dim = 17x65x65
Npart = 8
…
Osumed1 = osumed01.epn.osc.edu, osumed02.epn.osc.edu, …
0 = bh-10-1 osumed1 /scratch1/bh-10-1
1 = bh-10-2 osumed1 /scratch1/bh-10-2
……
Description file:
Group “ROOT” {
  DATASET “TitanData” {
    DATATYPE { TITAN }
    DATASPACE { RANK 3 }
    DATAINDEX { FID, OFFSET, BSIZE }
    DATA { DATASET TITAN, INDEXSET TITANINDEX }
  }
  Group “SUBGROUP” {
    DATASET “TITAN” {
      DATATYPE { struct TITAN_Record_t { unsigned int x, y, z; unsigned int s1, s2, s3, s4, s5; }; }
      DATASPACE { LOOP { struct TITAN_Record_t } }
      DATA { 0 }
    }
    INDEXSET “TITANINDEX” {
      DATATYPE { HOST hostid; struct Block3D { MBR rect; JMP jmp; FID fid; OFFSET offset; BSIZE bsize; }; }
      DATASPACE { LOOP { HOST SPACE struct Block3D } }
      DATA { IndexFile }
    }
  }
}

Meta-data:
[TITAN]
X = INT4
Y = INT4
Z = INT4
S1 = INT4
S2 = INT4
S3 = INT4
S4 = INT4
S5 = INT4

Data list file:
[TitanData]
DatasetDescription = TITAN
io = file
Dim = NULL
Npart = 1
Osumed1 = osumed01.epn.osc.edu
0 = NULL osumed1 /scratch1/weng/Titan/
Compilation Issues • Interface between Index() and Extractor() • Range queries • A chunk can be totally inside the query range, partially inside, or totally outside • How to choose a suitable size for indexed chunks • Interface between Extractor() and GridDB-lite • Explore alternative methods to get tuples/records • A smarter extractor can signal GridDB-lite to perform some filtering operations • Query transformation • Optimization? • Host allocation for stages (DP, DM, Client) • Some other potential issues • The granularity of a “tuple” • Data partitioning methods • …
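The chunk/range relationship above is just axis-aligned box intersection. A small sketch of how an Index()-style routine might classify chunks so that fully-contained ones skip per-tuple tests; the function name and box representation are assumptions for illustration:

```python
def classify(chunk, query):
    """Classify an axis-aligned chunk against a query range.

    Both are (lo, hi) pairs of coordinate tuples, inclusive on both ends.
    Returns "inside", "partial", or "outside".
    """
    (clo, chi), (qlo, qhi) = chunk, query
    inside = all(q1 <= c1 and c2 <= q2
                 for c1, c2, q1, q2 in zip(clo, chi, qlo, qhi))
    if inside:
        return "inside"   # take every tuple; no per-tuple test needed
    overlap = all(c1 <= q2 and q1 <= c2
                  for c1, c2, q1, q2 in zip(clo, chi, qlo, qhi))
    return "partial" if overlap else "outside"
```

"inside" chunks can be handed to the extractor wholesale, "partial" chunks need per-tuple filtering, and "outside" chunks are never read — which is also why chunk size matters: smaller chunks classify more precisely but inflate the index.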
Other Research Areas • Runtime support systems • Ease parallelization of data mining algorithms in a cluster environment (FREERIDE) • Grid-based processing of distributed data streams • Algorithms for Data Mining / OLAP • Parallel and scalable algorithms • Algorithms for processing distributed data streams
Group Members • Seven Ph.D students • Liang Chen • Wei Du • Anjan Goswami • Ruoming Jin • Xiaogang Li • Li Weng • Xuan Zhang • Two Masters students • Leo Glimcher • Swarup Sahoo • Part-time student • Kolagatla Reddy
Getting Involved • Talk to me • Sign up for my 888