CS 239 – Big Data Systems Fall 2019

CS 239 – Big Data Systems Fall 2019
Harry Xu, UCLA

My Research Background
• Compilers and systems
  • Static and dynamic program analysis
  • Compiler
  • Runtime/operating systems
• Big Data analytics
  • Dataflow systems
  • Graph systems
  • Machine learning systems
• Some industrial experience
  • Microsoft: created and developed an optimizing compiler for Cosmos/Scope that improved the overall performance of production jobs by up to 3X
  • IBM: created and developed a series of profiling tools for large-scale systems
[Diagram: Big Data; system support for scalable program analysis; system support for scalable analytics]

[Figure: BigDatalog; application circle and infrastructure circle]

This Course: Big Data Systems
• What it is about
  • Low-level infrastructures
  • Programming models
  • Runtimes
  • Scalability and efficiency
• What it is NOT about
  • High-level applications
  • Workloads
  • Data collection and usage
• An example
  • We are going to discuss some papers on machine learning systems
  • We are NOT going to discuss learning models and algorithms

Industrial Relevance
• Many papers came directly from industry
  • GFS, MapReduce, Bigtable, Spanner, TensorFlow (Google)
  • HDFS (Yahoo)
  • Azure, Trill, Dryad, Naiad (Microsoft)
  • Spark, Tachyon (Databricks)
• Applications vs. systems
  • Many people can develop applications
  • Few people can develop systems
  • Applications are specific to domains, while the skills required to build infrastructures are generic

Goals to Achieve
• Understand what systems are available for data analytics
• Understand fundamental challenges in system design
• Understand how to design a customized system for a certain workload
• Gain experience with system development by proposing and implementing a new idea

What This Course is Related To
• Distributed systems
• Database systems
• Computer architecture
• Networking
• Storage (memory, disk, file system, etc.)
• Graph algorithms
• Statistics
• Machine learning

Aspects of Big Data Processing
• Where to put data?
• How to process data at scale?
• How to process different types of data?
  • Structured data
  • Unstructured data
  • Streaming data
  • Graph data
  • Data for model training
• How to take advantage of technological advances?
• How to make processing efficient?

Topics Covered (I)
• Distributed storage systems
  • HDFS, GFS, Bigtable, Spanner, and Azure storage
• Dataflow engines
  • MapReduce, Dryad, AsterixDB, Spark
• Batch processing
  • Hive, Spark SQL, and SCOPE
• Resource management
  • Mesos, YARN, LATE, Borg, Sparrow

Topics Covered (II)
• Stream processing
  • Storm, Flink, Kafka, Naiad, Trill, SVE, Drizzle
• Graph processing
  • Pregel, Ligra, GraphChi, X-Stream, GridGraph
• Machine learning
  • TensorFlow, Parameter Servers, Project Adam

Why Do We Need Those Systems?
• Enablers
• Better performance
• Scalability
• Efficiency
• Energy
• Easy/flexible programmability

Course Structure
• Paper critiques
  • Due before each presentation day
• Presentation
  • 20-25 mins
• Participation in active discussion
• Project
  • 2-3 students form a group, working on an innovative idea in system development

Things about Presentations/Critiques
• Reuse slides as much as possible
• A good rule of thumb is to follow this order
  • What problems does the paper solve?
  • Why are they (serious) problems?
  • Why aren't they already solved?
  • What are the main challenges?
  • How did the authors overcome them?
  • What evidence did the authors show that the problems are solved?
  • Questions, concerns, opportunities for improvement
