Problem-solving on large-scale clusters: theory and applications Lecture 1: Introduction and Theoretical Background
Today’s Outline • Introductions • Quiz • Course Objective & Administrative Info • fold and map: Theory
Introductions • Name + trivia
Quiz Time! • Not graded; helps us calibrate how difficult to make this seminar • Okay (and encouraged!) to leave questions blank
Course Outline • Introduction to parallel programming and distributed system design • successfully decompose problems into map and reduce stages • decide whether a problem can be solved with a parallel algorithm, and evaluate its strengths and weaknesses • understand the basic tradeoffs and major issues in distributed system design • know the common pitfalls of distributed system design • This seminar is light on “facts” and “recipes”, heavy on “tradeoffs”
Course Information (1 of 2) • Lecturers: • Albert J. Wong • Hannah Tang • Lab consultant: • Alden King • Liaisons: • John Zahorjan • Christophe Bisciglia
Course Information (2 of 2) • Textbook • None; see online course readings • Webpage: http://www.cs.washington.edu/cse490h • Mailing lists: • Course discussion: cse490h@...
Warning: Theory Ahead! • Before we can talk about MapReduce, we need to talk about the concepts on which it is founded: • Programming languages: fold and map • Distributed systems: data dependencies
Digression: Function Objects (1 of 3) • A function object is a function that can be manipulated as an object • Sometimes referred to as a “functor” • In Java, this is usually implemented with a class that has an execute() (or similarly named) method
Digression: Function Objects (2 of 3)
• Example: Implementing the Comparator interface to use Collections.sort()
    import java.util.*;

    class ReverseAlphaOrder implements Comparator<String> {
        // Swapping the operands of compareTo() reverses the natural
        // (alphabetical) order, so "greater" strings sort first.
        public int compare(String s1, String s2) {
            return s2.compareTo(s1);
        }
    }

    List<String> myStrings = new ArrayList<String>();  // the strings to sort
    ReverseAlphaOrder rao = new ReverseAlphaOrder();
    Collections.sort(myStrings, rao);
• The underlying idea is to pass the “greater than” operation to sort()
Digression: Function Objects (3 of 3) • In Java, methods that take function objects are “higher-order functions” • Collections.sort() is a higher-order function • Mathematically, a “higher order function” is a function which does at least one of the following: • Take one or more functions as input • Output a function • Examples: • The derivative (from calculus): d/dx (x^3 + 2x) = 3x^2 + 2
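To make the mathematical definition concrete, here is a minimal Java sketch (assuming Java 8 or later for lambdas) of a function that both takes a function as input and outputs a function. The class name HigherOrderSketch, the method derivative(), and the step size h are hypothetical choices for illustration, and the central difference only approximates the derivative used in the example above.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical illustration, not part of the original slides.
final class HigherOrderSketch {
    // Higher-order in both senses: takes a function f as input and
    // returns a new function that approximates f's derivative.
    static DoubleUnaryOperator derivative(DoubleUnaryOperator f) {
        final double h = 1e-5;  // arbitrary step size chosen for this sketch
        return x -> (f.applyAsDouble(x + h) - f.applyAsDouble(x - h)) / (2 * h);
    }

    public static void main(String[] args) {
        DoubleUnaryOperator f = x -> x * x * x + 2 * x;  // x^3 + 2x
        DoubleUnaryOperator df = derivative(f);          // approximates 3x^2 + 2
        System.out.println(df.applyAsDouble(1.0));       // close to 3*1^2 + 2 = 5
    }
}
```

The returned lambda captures f, which is what lets derivative() output a function rather than a number.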
fold - Introduction • fold is a family of higher-order functions that process a data structure and return a single value • Commonly, fold takes a function f and a list l, and recursively applies f to “combine” the elements of l • The return value may be “complex”, e.g. a list • Example: • fold (+) [1,2,4,8] -> ??? • fold (/) [64,8,4,2] -> ???
fold - Directionality • Remember how we said fold was “a family of functions”? • foldr (/) [64,8,4,2] -> 64 / (8 / (4/2)) -> 16 • foldl (/) [64,8,4,2] -> ((64/8) / 4) / 2 -> 1 • “fold right” • recursively applies f over the right side of the list • “fold left” • recursively applies f over the left side of the list • [Figure: expression trees for the right fold and the left fold of (/) over [64,8,4,2]]
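The two groupings can be checked with a short Java sketch (Java 8 or later assumed). The names FoldDirectionSketch, foldr1 and foldl1 are hypothetical; both methods assume a non-empty list because, as in the slide's examples, no initial value is supplied.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical illustration, not part of the original slides.
final class FoldDirectionSketch {
    // Right fold: x1 op (x2 op (... op xn)). Assumes xs is non-empty.
    static <T> T foldr1(BinaryOperator<T> f, List<T> xs) {
        if (xs.size() == 1) {
            return xs.get(0);
        }
        // Fold everything to the right of the head first, then combine.
        return f.apply(xs.get(0), foldr1(f, xs.subList(1, xs.size())));
    }

    // Left fold: ((x1 op x2) op ...) op xn. Assumes xs is non-empty.
    static <T> T foldl1(BinaryOperator<T> f, List<T> xs) {
        if (xs.size() == 1) {
            return xs.get(0);
        }
        // Fold everything to the left of the last element first, then combine.
        int last = xs.size() - 1;
        return f.apply(foldl1(f, xs.subList(0, last)), xs.get(last));
    }

    public static void main(String[] args) {
        List<Double> xs = Arrays.asList(64.0, 8.0, 4.0, 2.0);
        BinaryOperator<Double> divide = (a, b) -> a / b;
        System.out.println(foldr1(divide, xs));  // 64 / (8 / (4 / 2))
        System.out.println(foldl1(divide, xs));  // ((64 / 8) / 4) / 2
    }
}
```

Division is used because it is not associative, so the two directions give different answers; with an associative operation such as addition they would agree.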
fold - Questions • Discussion questions: • What should the base case return? • foldr (+) [] -> ??? • foldr (/) [] -> ??? • Can a right fold be implemented as a loop (using tail recursion)? What about left fold? • Enrichment questions: • What happens to a right fold when given an infinite list? What about left fold?
fold - Formal Definition
• fold takes a function and a list as its inputs, but it can also take more values.
• In particular, fold maintains context / state across each invocation of f
    -- If the list is empty, return the initial value ‘z’
    foldr f z [] = z
    -- If the list is not empty, calculate the result of folding the
    -- rest, and apply f to the first element and to that result.
    -- The context from previous invocations of f is implicitly
    -- passed to the current invocation of f via foldr
    foldr f z (x:xs) = f x (foldr f z xs)
• What is the formal definition of foldl?
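For readers more at home in Java, the two foldr equations can be transcribed almost line for line. This is a sketch under assumptions: FoldrSketch and the example data are hypothetical, Java 8 or later is assumed, and subList() stands in for the (x:xs) pattern match.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical illustration, not part of the original slides.
final class FoldrSketch {
    // A is the element type; B is the type of the initial value and the result.
    static <A, B> B foldr(BiFunction<A, B, B> f, B z, List<A> xs) {
        if (xs.isEmpty()) {
            return z;                                // foldr f z [] = z
        }
        A x = xs.get(0);                             // head of (x:xs)
        List<A> rest = xs.subList(1, xs.size());     // tail of (x:xs)
        return f.apply(x, foldr(f, z, rest));        // f x (foldr f z xs)
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("map", "fold", "reduce");
        // The result type (Integer) differs from the element type (String).
        int totalLength = foldr((w, acc) -> w.length() + acc, 0, words);
        System.out.println(totalLength);             // 3 + 4 + 6
    }
}
```

The second argument of f is the "context" the slide mentions: it carries the result of folding the rest of the list into the current invocation.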
fold – An Intuition • fold “iterates” over a data structure, and maintains one unit of state • At each iteration, f is invoked with the current element and the current state • fold’s return value is the result of f’s final invocation
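The "iterate and carry one unit of state" intuition can be sketched as an ordinary loop. The class FoldIntuitionSketch and its fold() method are hypothetical names; the sketch assumes Java 8 or later and visits elements from left to right, which is only one of the possible directions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical illustration, not part of the original slides.
final class FoldIntuitionSketch {
    static <A, S> S fold(BiFunction<S, A, S> f, S initial, Iterable<A> items) {
        S state = initial;                  // the single unit of state
        for (A item : items) {
            state = f.apply(state, item);   // f sees the current state and element
        }
        return state;                       // result of f's final invocation
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("map", "fold", "reduce");
        // Here the state is a running count of the elements seen so far.
        int count = fold((seen, word) -> seen + 1, 0, words);
        System.out.println(count);
    }
}
```

If the input is empty, f is never invoked and the initial state is returned unchanged.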
map - Introduction • map is a higher-order function that “transforms” each element in a sequence of elements • Commonly, map takes a function f and a sequence s, and applies f to each element of s • Example: • map square_root [1,4,9,16] -> ???
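In modern Java the same idea is available through the Streams API. The sketch below assumes Java 8 or later; the wrapper class MapIntroSketch and its map() method are hypothetical, while stream(), Stream.map() and Collectors.toList() are standard library calls.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical illustration, not part of the original slides.
final class MapIntroSketch {
    // Applies f to each element of s and collects the results in order.
    static <A, B> List<B> map(Function<A, B> f, List<A> s) {
        return s.stream().map(f).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> squares = map(n -> n * n, Arrays.asList(1, 2, 3));
        System.out.println(squares);
    }
}
```

Each application of f depends only on its own element, not on the results for any other element.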
map’s Return Value • map returns a sequence • The new sequence s’ is not necessarily the same size as s • The elements of s’ do not necessarily have the same type as the elements of s
map’s Return Value – Example • Recall that the sum of N vectors was equal to the sum of their components: • Let components() decompose a vector into its X and Y components • [Figure: vectors a and b with their sum a+b; map components applied to a list of vectors, with the resulting list of (X, Y) pairs left as ???]
map - Questions • Enrichment questions: • For what values of f and z will fold f z l = l? How can you modify f such that fold f z l = map f l? • Bonus question: can you implement map in terms of fold? • Visit foldl.com and foldr.com :)
map – Formal definition
• map takes a function and a data structure as its inputs
    -- If the list is empty, there’s nothing to do
    map f [] = []
    -- If the list is not empty, apply f to the first element and
    -- add the result to the mapping of f on all other elements
    map f (x:xs) = f x : map f xs
• What is the complexity of map? What is its runtime?
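The two map equations translate to Java in much the same way. This is a sketch under assumptions: MapSketch is a hypothetical name, Java 8 or later is assumed, subList() stands in for the (x:xs) pattern, and the code mirrors the definition rather than aiming for performance.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration, not part of the original slides.
final class MapSketch {
    static <A, B> LinkedList<B> map(Function<A, B> f, List<A> xs) {
        if (xs.isEmpty()) {
            return new LinkedList<>();                            // map f [] = []
        }
        LinkedList<B> mapped = map(f, xs.subList(1, xs.size())); // map f xs
        mapped.addFirst(f.apply(xs.get(0)));                     // f x : ...
        return mapped;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("map", "fold", "reduce");
        // The element type changes from String to Integer.
        List<Integer> lengths = map(String::length, words);
        System.out.println(lengths);
    }
}
```

Unlike fold, nothing is threaded from one application of f to the next, so the calls could in principle run in any order.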
Exercise (1 of 2)
• Individually:
  • Determine how these operations can be solved with a fold, a map, or some combination of fold and map:
    • Given a list of vectors, add them to determine the resultant vector.
    • Ray tracing a single ray.
      • Ray tracing takes a list of rays that intersect the camera, and traces their paths back to their respective light sources, even across reflections off several surfaces.
    • Assuming you had access to a company’s monthly paystubs for all employees for an entire year, calculate how much annual income tax is owed per person.
    • Run-length encoding.
      • Run-length encoding takes a possibly repetitive string and rewrites it as a sequence of (value, frequency) pairs, e.g. “aaa b ccccc dd” -> “a3 b c5 d2”.
    • Find the smallest element in an array.
  • Come up with some challenging problems yourself!
Exercise (2 of 2) • In small groups, compare your answers to the above, and stump your team with the problems you came up with!