EECE 571R: Data-intensive computing systems Matei Ripeanu matei at ece.ubc.ca
Contact Info Email: matei @ ece.ubc.ca Office: KAIS 4033 Office hours: by appointment (email me) Course page: http://www.ece.ubc.ca/~matei/EECE571/
EECE 571R: Course Goals • Primary • Gain deep understanding of fundamental issues that affect the design of: • Data-intensive systems • (more generally) Large-scale distributed systems • Survey main current research themes • Gain experience with distributed-systems research • Research on federated systems and networks • Secondary • By studying a set of outstanding papers, build knowledge of how to do & present research • Learn how to read papers & evaluate ideas
What I’ll Assume You Know • Basic Internet architecture • IP, TCP, DNS, HTTP • Basic principles of distributed computing • Asynchrony (cannot distinguish between communication failures and latency) • Incomplete & inconsistent global state knowledge (cannot know everything correctly) • Failures happen (in large systems, even rare failures of individual components aggregate to high failure rates) • If there are things that don’t make sense, ask!
Outline • Case study (and project ideas): • Volunteer computing: SETI@home /BOINC • Virtual Data System • Batch Aware Distributed File System • Administrative
SETI@home: How does it work? Characteristics: • Fixed-rate data processing task • Low bandwidth/computation ratio • Independent parallelism • Error tolerance. Architecture: master-worker.
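The master-worker structure can be made concrete with a minimal sketch (Python, with hypothetical names such as analyze; this illustrates the pattern only, not SETI@home's or BOINC's actual code): a master hands out independent work units to workers, which compute and return results.

# Minimal master-worker sketch (illustrative only, not the real SETI@home/BOINC code).
import queue
import threading

def analyze(work_unit):
    # Stand-in for the fixed-rate signal-processing task (low bandwidth/computation ratio).
    return sum(work_unit) / len(work_unit)

def master(work_units, num_workers=2):
    """Hand out independent work units and collect (work unit, result) pairs."""
    tasks, results = queue.Queue(), queue.Queue()
    for wu in work_units:
        tasks.put(wu)

    def worker():
        while True:
            try:
                wu = tasks.get_nowait()        # fetch the next work unit
            except queue.Empty:
                return                         # no more work
            results.put((wu, analyze(wu)))     # compute and report the result

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [results.get() for _ in range(results.qsize())]

print(master([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))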
SETI@home Operations (data-flow diagram). Components: data recorder, DLT tapes, splitters, work-unit (WU) storage, data server, screensavers, CGI program, result queue, accounting queue, redundancy checking, RFI elimination, repeat detection, garbage collector, science DB, master DB, user DB, web page generator, web site, tape backup, tape archive/delete.
History and Statistics • Conceived 1995, launched April 1999 • Millions of users, hosts… • No ET signals yet, but other results
Statistics (as of Wed Feb 23 07:04:51):
Users: 5,361,313 total; 4,391 in the last 24 hours
Results received: 1,779 million total; 5 million in the last 24 hours
Total CPU time: 2.2 million years total; 3,610.717 years in the last 24 hours
Average CPU time/work unit: 10 hr 58 min 14.0 sec overall; 6 hr 19 min 30.1 sec in the last 24 hours
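As a quick back-of-the-envelope check of the table (my arithmetic, not from the slide): total CPU time divided by the number of results received should roughly match the reported average CPU time per work unit.

# Rough consistency check of the SETI@home statistics above.
total_cpu_years = 2.2e6          # "2.2 million years"
results_received = 1.779e9       # "1,779 million"
hours_per_year = 365.25 * 24

avg_hours = total_cpu_years * hours_per_year / results_received
print(f"{avg_hours:.1f} hours per work unit")   # ~10.8 h, consistent (to rounding) with 10 hr 58 min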
Millions of individual contributors! (Problems) • Server scalability • Dealing with excess CPU time • Untrusted environment: Bad user behavior • Cheating • Team recruitment by spam • Sale of accounts on eBay • Malfunctions of individual components
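A standard defence against cheating and malfunctioning hosts, sketched below in Python (a simplified illustration; BOINC's actual validators are more sophisticated), is redundant computation: send each work unit to several independent hosts and accept a result only when a quorum of the returned answers agree.

# Redundancy checking: accept an answer only when enough independent hosts agree.
from collections import Counter

def validate(answers, quorum=2):
    """answers: results returned by different hosts for the same work unit.
    Returns the accepted answer, or None if no quorum was reached."""
    answer, votes = Counter(answers).most_common(1)[0]
    return answer if votes >= quorum else None

print(validate([42, 42, 17]))   # -> 42   (two hosts agree; the outlier is discarded)
print(validate([42, 17, 99]))   # -> None (no agreement; reissue the work unit)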
SETI@home: Summary • The characteristics of the problem … • Massive (“embarrassing”) parallelism • Low bandwidth/computation ratio • Fixed-rate data processing task • … make possible a solution that operates in an unfriendly environment • Wide area distribution; huge scale • High failure rates • Untrusted/malicious components • Solution: Master-worker design • Master = central point of control • Single point of failure • Performance bottleneck
Outline • Case study (and project ideas): • Volunteer computing: SETI@home /BOINC • Virtual Data System • Batch Aware Distributed File System • Administrative
Virtual Data System • Context: ’big science’ • Motivation/goals: support the science process, • i.e., track all aspects of data capture, production, transformation, and analysis • Requirements: ability to define complex workflows, and to reliably & efficiently execute workflows in heterogeneous, multi-domain environments. • Derived benefits: helps to audit, validate, reproduce, and/or rerun (with corrections) various data transformations.
BIG Science! CERN, the European Organisation for Nuclear Research, builds particle accelerators for particle-physics research
CERN Data Handling and Computation for Physics Analysis (diagram; credit: les.robertson@cern.ch). Stages shown: detector, event filter (selection & reconstruction), raw data, reconstruction, event summary data, event reprocessing, event simulation, batch physics analysis, analysis objects (extracted by physics topic), interactive physics analysis, processed data.
CMS Grid Hierarchy (diagram): ~2,500 physicists in 40 countries; tens of petabytes/yr by 2008. The experiment online system (one bunch crossing per 25 ns, ~100 triggers per second, ~1 MByte per event) feeds the CERN Computer Center (Tier 0, >20 TIPS, HPSS) at 100 MB–1.5 GB/sec; Tier 1 national centers (France, Italy, UK, USA; HPSS) connect at 10–40 Gbits/sec; Tier 2 centers at 2.5–10 Gbits/sec; Tier 3 institutes at 0.6–2.5 Gbits/sec; Tier 4 workstations and other portals draw on a physics data cache at 0.1–1 Gbits/sec.
Motivations (1)
“I’ve detected a calibration error in an instrument and want to know which derived data to recompute.”
“I’ve come across some interesting data, but I need to understand the nature of the corrections applied when it was constructed before I can trust it for my purposes.”
“I want to search an astronomical database for galaxies with certain characteristics. If a program that performs this analysis exists, I won’t have to write one from scratch.”
“I want to apply an astronomical analysis program to millions of objects. If the results already exist, I’ll save weeks of computation.”
(Virtual-data relationship diagram: data is consumed-by/generated-by a derivation; a derivation is an execution-of a transformation; data is a product-of a derivation.)
Motivations (2) • Data track-ability and result audit-ability • Repair and correction of data • Rebuild data products (cf. “make”) • Workflow management • A new, structured paradigm for organizing, locating, specifying, and requesting data products • Performance optimizations • Ability to re-create data rather than move it
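The “rebuild data products, cf. make” idea can be sketched as follows (illustrative Python with hypothetical names; not the actual VDS/Chimera implementation): the catalog records, for each derived product, the transformation and inputs that produced it, so a product can be reused if it already exists, re-derived on demand, or invalidated together with everything downstream after, say, a calibration error.

# Provenance-driven re-derivation, in the spirit of `make` (sketch only).
catalog = {}      # dataset name -> materialized value (the "virtual data catalog")
derivations = {}  # dataset name -> (transformation, [input dataset names])

def derive(name):
    """Return the named data product, re-deriving it (and its inputs) if needed."""
    if name in catalog:                       # reuse an existing product
        return catalog[name]
    transform, inputs = derivations[name]     # otherwise look up its derivation
    catalog[name] = transform(*[derive(i) for i in inputs])
    return catalog[name]

def invalidate(name):
    """E.g., after a calibration error: drop a product so its dependents get recomputed."""
    catalog.pop(name, None)
    for product, (_, inputs) in derivations.items():
        if name in inputs:
            invalidate(product)

# Hypothetical three-step example: raw -> calibrated -> analysis
catalog["raw"] = [1.0, 2.0, 3.0]
derivations["calibrated"] = (lambda raw: [x * 0.98 for x in raw], ["raw"])
derivations["analysis"] = (lambda cal: sum(cal) / len(cal), ["calibrated"])
print(derive("analysis"))     # derives "calibrated", then "analysis"
invalidate("calibrated")      # calibration error detected
print(derive("analysis"))     # recomputed from the raw data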
Requirements • Express complex multi-step “workflows” • Perhaps 100,000s of individual tasks • Operate on heterogeneous distributed data • Different formats & access protocols • Harness many computing resources • Parallel computers &/or distributed Grids • Execute workflows reliably • Despite diverse failure conditions • Enable reuse of data & workflows • Discovery & composition • Support many users, workflows, resources • Policy specification & enforcement
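To make the first two requirements concrete, here is a small sketch (plain Python, not VDL) of expressing a multi-step workflow as a task DAG and computing an execution order that respects dependencies; a real planner (e.g., Pegasus/DAGMan in the stack described below) would additionally handle site selection, data staging, and retries.

# A workflow as a DAG of tasks, ordered for execution (sketch only).
from graphlib import TopologicalSorter   # standard library, Python 3.9+

# task -> set of tasks it depends on (a toy 4-task workflow; real ones may have 100,000s of tasks)
workflow = {
    "extract":   set(),
    "transform": {"extract"},
    "simulate":  {"extract"},
    "analyze":   {"transform", "simulate"},
}

print(list(TopologicalSorter(workflow).static_order()))
# e.g. ['extract', 'transform', 'simulate', 'analyze']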
Virtual Data System architecture (diagram). Elements: workflow spec (VDL program), virtual data catalog, workflow generator, abstract workflow, a "create execution plan" step, and grid workflow execution via a statically partitioned DAG (DAGman), a DAG run by DAGman & Condor-G, a dynamically planned DAG (job planner, job cleanup), or a local planner.
VDS Software Stack • Express complex multi-step “workflows” (perhaps 100,000s of individual tasks) and operate on heterogeneous distributed data (different formats & access protocols): VDL, XDTM • Harness many computing resources (parallel computers &/or distributed resources) and execute workflows reliably & efficiently despite diverse failure conditions: Pegasus, DAGman, Globus • Enable reuse of data & workflows (discovery & composition): VDC • Support many users, workflows, resources (policy specification & enforcement): TBD
Outline • Case study (and project ideas): • Volunteer computing: SETI@home /BOINC • Virtual Data System • Batch Aware Distributed File System • Administrative
Motivating question: Are existing distributed file systems adequate for batch computing workloads? • NO. Internal decisions are inappropriate • Caching, consistency, replication • A solution: Combine scheduling knowledge with external storage control • Detailed information about the workload is known • The storage layer allows external control • The external scheduler makes informed storage decisions • Combining information and control results in • Improved performance • More robust failure handling • Simplified implementation Explicit Control in a Batch-Aware Distributed File System, John Bent, Douglas Thain, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Miron Livny (NSDI '04)
Outline • Batch computing • Systems • Workloads • Environment • Why not DFS? • Solution: BAD-FS • Design • Experimental evaluation
Batch computing (diagram: a user’s home storage connected over the Internet to remote compute resources)
Batch computing • Not interactive • Compute loop: • Users submit jobs • Job description languages • The system itself executes them • Results are copied back to the user’s system • Many existing batch systems • Condor, LSF, PBS, Sun Grid Engine
Batch computing (diagram): a scheduler at the home-storage site holds a job queue (jobs 1–4) and dispatches jobs over the Internet to compute nodes 1–4, each running a CPU manager.
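As a concrete (and hypothetical) rendering of the picture above: each job carries a small description, and the scheduler at the home site dispatches queued jobs to compute nodes with free CPUs; real systems such as Condor or PBS use their own job description languages.

# Toy job description and dispatch loop for the scheduler / CPU-manager picture above.
from dataclasses import dataclass, field
from collections import deque

@dataclass
class Job:
    name: str
    executable: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)

job_queue = deque(Job(f"job{i}", "simulate.exe", ["params.txt"], [f"out{i}.dat"])
                  for i in range(1, 5))
free_nodes = deque(["node1", "node2", "node3", "node4"])

while job_queue and free_nodes:
    job, node = job_queue.popleft(), free_nodes.popleft()
    # ship the inputs to the node, run the executable, copy the outputs back to home storage
    print(f"dispatch {job.name} -> {node}")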
Batch workloads • General properties • Large number of processes • Process and data dependencies • I/O intensive • Different types of I/O • Endpoint • Batch • Pipeline • Usage: mainly scientific workloads, but also video production, data mining, electronic design, financial services, graphic rendering Pipeline and Batch Sharing in Grid Workloads, Douglas Thain, John Bent, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, Miron Livny. HPDC 12, 2003.
Batch workloads (diagram): several pipelines of jobs read shared batch datasets, pass pipeline data from stage to stage, and read/write endpoint data at the submitting user’s site.
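A hedged sketch of the three traffic classes in such a workload (hypothetical file and stage names): batch data is read-only input shared by many pipelines, pipeline data flows only between stages of one pipeline, and endpoint data is what must actually come from, or return to, home storage. Only endpoint traffic needs to cross the wide-area link.

# Classifying a workload's I/O as batch, pipeline, or endpoint traffic (illustrative only).
workload = {
    # file              producer       consumers
    "calibration.db": ("home",       ["stage1-p1", "stage1-p2"]),  # shared input -> batch
    "tmp-p1.dat":     ("stage1-p1",  ["stage2-p1"]),               # stage-to-stage -> pipeline
    "result-p1.out":  ("stage2-p1",  ["home"]),                    # returns home -> endpoint
}

def classify(producer, consumers):
    if producer == "home" and len(consumers) > 1:
        return "batch"
    if producer == "home" or "home" in consumers:
        return "endpoint"
    return "pipeline"

for name, (producer, consumers) in workload.items():
    print(f"{name}: {classify(producer, consumers)}")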
Cluster-to-cluster (c2c) • Not quite p2p • More organized • Less hostile • More homogeneity • Each cluster is autonomous • Run and managed by different entities • An obvious bottleneck is the wide-area network • Q: How to manage the flow of data into, within, and out of these clusters?
Why not a traditional Distributed File System? • A distributed file system (DFS) would be ideal • Easy to use • Uniform name space • Designed for wide-area networks • But . . . • Not practical • Embedded decisions are wrong
Distributed file systems make ‘bad’ decisions • Caching • Must guess what and how to cache • Consistency • Output: Must guess when to commit • Input: Needs mechanism to invalidate cache • Replication • Must guess what to replicate
BAD-FS makes ‘good’ (i.e. informed) decisions • Removes the guesswork • Scheduler has detailed workload knowledge • Storage layer designed to allow external control • Scheduler makes informed storage decisions • Manages data as well as computations • Retains simplicity of distributed file systems • Practical and deployable
Outline • Introduction • Batch computing • Systems • Workloads • Environment • Why not DFS? • One solution: BAD-FS • Design • Experimental evaluation
Solution: BAD-FS is practical and deployable • User-level; requires no privilege • Packaged as a modified batch system • A new batch system which includes BAD-FS • General: will work on all batch systems (Diagram: BAD-FS instances layered over SGE clusters, connected to the home store over the Internet.)
Solution: BAD-FS Components (diagram): 1) storage managers running alongside the CPU managers on each compute node; 2) the Batch-Aware Distributed File System itself; 3) an expanded job description language; 4) the BAD-FS scheduler, which works beside the conventional scheduler and job queue (jobs 1–4) at the home storage site.
Information used • Remote cluster knowledge • Storage availability • Failure rates • Workload knowledge • Data type (batch, pipeline, or endpoint) • Data quantity • Job dependencies
Control through volumes • Guaranteed storage allocations • Containers for job I/O • Scheduler • Creates volumes to cache input data • Subsequent jobs can reuse this data • Creates volumes to buffer output data • Destroys pipeline volumes, copies endpoint data • Configures the workload to access the containers
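A minimal sketch of that volume-driven control (hypothetical API, not BAD-FS’s actual interface): the scheduler allocates one volume to cache the shared batch input, gives each pipeline a scratch volume for intermediate data, copies only endpoint output back home, and then destroys the volumes.

# How a batch-aware scheduler might drive storage through volumes (illustrative only).
class Volume:
    def __init__(self, name, size_gb):
        self.name, self.size_gb = name, size_gb
        print(f"create volume {name} ({size_gb} GB)")
    def destroy(self):
        print(f"destroy volume {self.name}")

def run_pipeline(pipeline_id, batch_cache):
    scratch = Volume(f"scratch-{pipeline_id}", size_gb=10)   # buffers pipeline data
    print(f"run pipeline {pipeline_id}: read {batch_cache.name}, read/write {scratch.name}")
    print(f"copy endpoint output of pipeline {pipeline_id} to home storage")
    scratch.destroy()             # pipeline data is never shipped home

batch_cache = Volume("batch-input-cache", size_gb=50)   # filled once from home storage
for pid in range(1, 4):
    run_pipeline(pid, batch_cache)    # later pipelines reuse the cached input
batch_cache.destroy()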
Knowledge plus control • Enhanced performance • I/O scoping • Capacity-aware scheduling • Improved failure handling • Cost-benefit replication • Simplified implementation • No cache consistency protocol
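The cost-benefit replication point reduces to a simple expected-cost comparison (a simplification of the paper’s policy, with made-up numbers): replicate data back to home storage only when the expected cost of losing and regenerating it exceeds the cost of copying it over the wide-area link.

# Cost-benefit replication: replicate only if the expected loss exceeds the replication cost.
def should_replicate(regen_cost_s, failure_prob, data_mb, wan_mbps):
    replication_cost_s = data_mb * 8 / wan_mbps      # time to push the data over the WAN
    expected_loss_s = failure_prob * regen_cost_s    # expected recomputation time if lost
    return expected_loss_s > replication_cost_s

# Cheap-to-regenerate data on a reliable node: skip replication.
print(should_replicate(regen_cost_s=600, failure_prob=0.01, data_mb=500, wan_mbps=10))    # False
# Expensive-to-regenerate data on a flaky node: replicate.
print(should_replicate(regen_cost_s=36000, failure_prob=0.2, data_mb=500, wan_mbps=10))   # True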
Experimental evaluation. Setup: 16 jobs, 16 compute nodes, emulated wide-area link. Configurations compared: remote I/O; AFS-like with /tmp; BAD-FS. Result: an order-of-magnitude improvement. Also: real workload experience.
BAD-FS Lessons • Generic solutions may be inefficient • Often designed with specific tradeoffs in mind (e.g., most common workloads) • Fix: • Redesign for the new workload • Use explicit information available at runtime to optimize the execution of lower layers
Course Organization/Syllabus/etc.
Administrivia: Course structure • Lectures • About 1/3 of all classes • Student projects • Aim high! Have fun! It’s a class project, not your PhD! • Teams of up to 3 students • Project presentations at the end of the term • Paper discussion • The other classes
Administrivia: Weekly schedule (tentative) • Introduction. Overview of current research problems, technologies, and applications. • File system semantics, data durability and availability, replication and consistency, fault-tolerance. • Data storage technologies. Storage hierarchies. Capacity management. • Scientific applications: data access patterns, workload characterization. • Integration with compute systems. Grids and Virtual Data. • Performance focus: caching, parallel access, striping. • Structured overlays. Distributed hash tables. Data systems harnessing structured overlays. • Security. • Applications I: Experience with deployed systems (NFS, AFS, Google File System). • Applications II: Data archival. Cooperative Internet proxy caches. Content distribution networks. • Applications III: Peer-to-peer file sharing (BitTorrent, FreeLoader). • Project presentations
Administrivia: Grading • Paper reviewing: 35% • Discussion leading: 15% • Project: 50%
Administrivia: Paper Reviewing (1) • Goals: • Think about what you read • Expand your knowledge beyond the papers that are assigned • Get used to writing paper reviews • Reviews due by midnight the day before the class • Be professional in your writing • Keep an eye on the writing style: • Clarity • Beware of traps: learn to use them in writing and detect them in reading • Detect (and stay away from) trivial claims. E.g., 1st sentence in the Introduction: “The tremendous/unprecedented/phenomenal growth/scale/ubiquity of the Internet…”
Administrivia: Paper Reviewing (2) Follow the form provided when relevant. • State the main contribution of the paper • Critique the main contribution: • Rate the significance of the paper on a scale of 5 (breakthrough), 4 (significant contribution), 3 (modest contribution), 2 (incremental contribution), 1 (no contribution or negative contribution). • Explain your rating in a sentence or two. • Rate how convincing the methodology is: • Do the claims and conclusions follow from the experiments? • Are the assumptions realistic? • Are the experiments well designed? • Are there different experiments that would be more convincing? • Are there other alternatives the authors should have considered? • (And, of course, is the paper free of methodological errors?)
Administrivia: Paper Reviewing (3) • What is the most important limitation of the approach? • What are the three strongest and/or most interesting ideas in the paper? • What are the three most striking weaknesses in the paper? • Name three questions that you would like to ask the authors. • Detail an interesting extension to the work not mentioned in the future work section. • Optional comments on the paper that you’d like to see discussed in class.
Administrivia: Discussion leading • Come prepared! • Prepare a discussion outline • Prepare questions: • “What if”s • Unclear aspects of the proposed solution • … • Similar ideas in different contexts • Initiate short brainstorming sessions • Leaders do NOT need to submit paper reviews • Main goals: • Keep the discussion flowing • Keep the discussion relevant • Engage everybody (I’ll have an eye on this, too)