If Parallelism Is The New Normal, How Do We Prepare Our Students (And Ourselves)?
Joel Adams
Department of Computer Science
Calvin College
An Anecdote about CCSC:MW
This story has nothing to do with parallel computing, but it may be of interest…
Did you know that if it were not for CCSC:MW, CS Education Week would likely not exist?
CCSC:MW 2014 - 2
How CCSC:MW → CS Ed Week
At CCSC:MW in 2008:
• The ACM-CSTA’s Chris Stephenson gave the keynote, describing the decline of CS in high schools
• No Child Left Behind was killing HS CS!
• I’m pretty apolitical, but ...
How CCSC:MW → CS Ed Week
I decided to visit my Congressman, Rep. Vernon Ehlers, ranking member of the House Committee on Science & Technology (a Physics PhD and former Calvin prof).
• He was surprised to hear of the problems (esp. enrollment declines) CS was facing.
How CCSC:MW → CS Ed Week
Rep. Ehlers contacted the ACM, specifically Cameron Wilson. They worked together on CS Education Week, which the House passed 405-0 in 2009.
• CCSC:MW catalyzed CS Education Week!
What’s Happening Now?
There is a bill currently in Congress:
• H.R. 2536: The CS Education Act of 2013
• It seeks to strengthen K-12 CS education, and make CS a core subject.
• It currently has 116 co-sponsors (62R, 54D); is supported by ACM, NCWIT, Google, MS, ...
• It has been referred to the Committee on Early Childhood, Elementary, and Secondary Ed., chaired by Rep. Todd Rokita (R, IN).
Most Representatives Are Unaware
What Can You Do?
There is strength in numbers:
• Contact your Congressional representative and ask them to co-sponsor HR 2536.
  • If you are in Rep. Rokita’s district… (!)
  • More co-sponsors improve its chances.
• Tweet to Rep. Rokita (@ToddRokita)
  • Tell him you support HR 2536 (the CS Education Act of 2013) and want it to pass.
And Now, Back To Today’s Topic
Overview
• The past
  • How our computing foundation has shifted
• The present
  • Today’s hardware & software landscapes
• The future?
  • Preparing ourselves & our students
[Figure: temperature trend, actual and projected (to 2020); labeled reference points: hot plate, the sun]
The Heat Problem…
• … was not caused by Moore’s Law
• It was caused by manufacturers doubling the clock speeds every 18-24 months
• This was the “era of the free lunch” for software developers:
  • If your software was sluggish, faster hardware would fix your problem within two years!
Solving the Heat Problem…
• In 2005, manufacturers stopped doubling the clock speeds because of the heat, power consumption, electron bleeding, …
• This ended the “era of the free lunch”
  • Software will no longer speed up on its own.
Clock Speed (frequency) trend
But Moore’s Law Continued
• Every 2 years, manufacturers could still double the transistors in a given area:
  • 2006: Dual-core CPUs
  • 2008: Quad-core CPUs
  • 2010: 8-core CPUs
  • 2012: 16-core CPUs
  • …
• Each of these cores has the full functionality of a traditional CPU.
12 Years of Moore’s Law
• 2001: ohm.calvin.edu: 18 nodes, each with:
  • One 1-GHz Athlon CPU
  • 1 GB RAM / node
  • Gigabit Ethernet, USB, HDMI, …
  • Ubuntu Linux
  • ~$60,000 (funded by NSF).
• 2013: Adapteva Parallella
  • A Dual-core 1-GHz ARM A7
  • 16 core Epiphany Coprocessor
  • 1 GB RAM
  • Gigabit Ethernet, USB, HDMI, …
  • Ubuntu Linux
  • ~$99 (but free via university program!)
Multiprocessors are Inexpensive
• 2014: Nvidia Jetson TK1
  • Quad-core ARM A15
  • Kepler GPU w/ 192 CUDA cores
  • 2 GB RAM
  • Gigabit Ethernet, HDMI, USB, …
  • Ubuntu Linux
  • ~$200
Multiprocessors are Everywhere
Some Implications
• Traditional sequential programs will not run faster on today’s hardware.
  • They may well run slower because the manufacturers are decreasing clock speeds.
• The only software that will run faster is parallel software designed to scale with the number of cores.
Categorizing Parallel Hardware
[Figure: taxonomy of Parallel Systems. Categories: Shared Memory, Distributed Memory, Heterogeneous Systems. Examples shown: Multicore, Accelerators (GPUs, Coprocessors), Older Clusters, Newer Clusters, Modern Super Computers]
Hardware: A Diverse Landscape
• Shared-memory systems
• Distributed-memory systems
• Heterogeneous systems
[Figure: a shared-memory system (Core1 to Core4 sharing one Memory) and a distributed-memory system (CPU1 to CPUN, each with its own memory Mem1 to MemN, connected by a Network)]
CS Curriculum 2013
Because of this hardware revolution, the advent of cloud computing, and so on, CS2013 has added a new knowledge area: Parallel and Distributed Computing (PDC)
What is PDC?
It goes beyond traditional concurrency:
• Parallel emphasizes:
  • Throughput / performance (and timing)
  • Scalability (performance improves with # of cores)
  • New topics like speedup, Amdahl’s Law, …
• Distributed emphasizes:
  • Multiprocessing (no shared memory)
  • MPI, MapReduce/Hadoop, BOINC, …
  • Cloud computing
  • Mobile apps accessing scalable web services
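Speedup and Amdahl’s Law can be made concrete with a few lines of code. The sketch below is illustrative only and not from the talk: the function names `measuredSpeedup` and `amdahlSpeedup` are hypothetical, and it assumes the usual statement of Amdahl’s Law, speedup = 1 / ((1 - p) + p / n), where p is the parallelizable fraction of the running time and n is the number of cores.

```cpp
#include <cassert>

// Measured speedup: how many times faster the parallel run was.
double measuredSpeedup(double sequentialSeconds, double parallelSeconds) {
    assert(sequentialSeconds > 0.0 && parallelSeconds > 0.0);
    return sequentialSeconds / parallelSeconds;
}

// Amdahl's Law: predicted upper bound on speedup with `cores` cores when
// only `parallelFraction` (0.0 to 1.0) of the running time can be parallelized.
double amdahlSpeedup(double parallelFraction, int cores) {
    assert(parallelFraction >= 0.0 && parallelFraction <= 1.0);
    assert(cores >= 1);
    const double serialFraction = 1.0 - parallelFraction;
    return 1.0 / (serialFraction + parallelFraction / cores);
}
```

For example, `amdahlSpeedup(0.5, 2)` is about 1.33: if half of a program is inherently sequential, doubling the cores gives far less than a 2x gain.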
Software: Communication Options
In shared-memory systems, programs may:
• Communicate via the shared-memory
  • Languages: Java, C++11, …
  • Libraries: POSIX threads, OpenMP
• Communicate via message passing
  • Message-passing languages: Erlang, Scala, …
  • Libraries: the Message Passing Interface (MPI)
CS Curriculum 2013 (CS2013)
• The CS2013 core includes 15 hours of parallel & distr. computing (PDC) topics
  • 5 hours in core Tier 1
  • 10 hours in core Tier 2
  + related topics in System Fundamentals (SF)
• How/where do we cover these topics in the CS curriculum?
Model 1: Create a New Course
Add a new course to the CS curriculum that covers the core PDC topics:
• If someone else has to teach this new course, dealing with PDC is their problem, not mine!
• The CS curriculum is already full!
  • What do we drop to make room?
Model 2: Across the Curriculum
Sprinkle 15+ hours (3 weeks) of PDC across our core CS courses, not counting SF:
• Students see relationship of PDC to data structures, algorithms, prog. lang., …
• Easier to make room for 1 week in 1 course than jettison an entire course.
• Spreads the effort across multiple faculty
• All those faculty have to be “on board”
Calvin CS Curriculum
Year | Fall Semester | Spring Semester
1 | Intro to Computing; Calculus I | Data Structures; Calculus II
2 | Algorithms & DS; Intro. Comp. Arch.; Discrete Math I | Programming Lang.; Discrete Math II
3 | Software Engr; Adv. Elective | OS & Networking; Adv. Elective; Statistics
4 | Adv. Elective; Sr. Practicum I | Adv. Elective; Sr. Practicum II; Perspectives on Comp.
Adv. Elective: HPC
Why Introduce Parallelism in CS2?
• For students to be facile with parallelism, they need to see it early and often.
• Performance (Big-Oh) is a topic that’s first addressed in CS2.
• Data structures let us store large data sets
  • Slow sequential processing of these sets provides a natural motivation for parallelism.
Parallel Topics in CS2
• Lecture topics:
  • Single threading vs. multithreading
  • The single-program-multiple-data (SPMD), fork-join, parallel loop, and reduction patterns
  • Speedup, asymptotic performance analysis
  • Parallel algorithms: searching, sorting
  • Race conditions: non-thread-safe structures
• Lab exercise: Compare sequential vs. parallel matrix operations using OpenMP
Lab Exercise: Matrix Operations
Given a Matrix class, the students:
• Measure the time to perform sequential addition and transpose methods
• For each of three different approaches:
  • Use the approach to parallelize those methods
  • Record execution times in a spreadsheet
  • Create a chart showing time vs # of threads
Students directly experience the speedup…
Addition: m3 = m1 + m2
Single-threaded: ~36 steps
Multi-threaded (4 threads): ~9 steps
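A parallel matrix addition in the spirit of this lab might look like the sketch below. It is not the lab’s actual code: `Matrix` here is a hypothetical stand-in (a vector of rows) for the Matrix class the students are given, `addParallel` is an invented name, and the matrices are assumed to be rectangular and of equal size. Built with an OpenMP-enabled compiler (for example `g++ -fopenmp`), the rows are divided among the threads; without OpenMP the pragma is ignored and the loop runs sequentially with the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the lab's Matrix class: a vector of rows.
using Matrix = std::vector<std::vector<double>>;

// Returns m1 + m2; assumes both are rectangular with identical dimensions.
Matrix addParallel(const Matrix& m1, const Matrix& m2) {
    assert(m1.size() == m2.size());
    const int rows = static_cast<int>(m1.size());
    const std::size_t cols = rows > 0 ? m1[0].size() : 0;
    Matrix m3(m1.size(), std::vector<double>(cols));

    // Each row is independent, so different threads can fill different rows.
    #pragma omp parallel for
    for (int r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            m3[r][c] = m1[r][c] + m2[r][c];
        }
    }
    return m3;
}
```

Because no two iterations write to the same element, the loop needs no locking, which is why a 36-element addition can drop to roughly 9 steps on 4 threads.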
Transpose: m2 = m1.transpose()
Single-threaded: ~24 steps
Multi-threaded (4 threads): ~6 steps
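Transpose parallelizes the same way as addition, since every output element is written exactly once. The following is a sketch under the same assumptions as before: `Matrix` is a hypothetical vector-of-rows stand-in for the lab’s class, `transposeParallel` is an invented name, and the input is assumed rectangular.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the lab's Matrix class: a vector of rows.
using Matrix = std::vector<std::vector<double>>;

// Returns the transpose of m1; assumes m1 is rectangular.
Matrix transposeParallel(const Matrix& m1) {
    const int rows = static_cast<int>(m1.size());
    const std::size_t cols = rows > 0 ? m1[0].size() : 0;
    Matrix m2(cols, std::vector<double>(m1.size()));

    // Threads split the rows of m1; row r of m1 becomes column r of m2,
    // so no two threads ever write the same element of m2.
    #pragma omp parallel for
    for (int r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            m2[c][r] = m1[r][c];
        }
    }
    return m2;
}
```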
Programming Project
• Parallelize other Matrix operations
  • Multiplication
  • Assignment
  • Constructors
  • Equality
• Some operations (file I/O) are inherently sequential, providing a useful lesson…
Alternative Exercise/Project
• Parallelize image-processing operations:
  • Color-to-grayscale
  • Invert (negative)
  • Blur, Sharpen
  • Sepia-tinting
• Many students will find photo-processing to be more engaging than matrix ops.
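As one illustration of such an operation, a color-to-grayscale conversion could be sketched as follows. The `Pixel` struct and `toGrayscale` function are hypothetical (the talk does not show its image code), and the weights are the common ITU-R BT.601 luma coefficients, chosen here as an assumption; integer arithmetic keeps the rounding exact.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical 8-bit RGB pixel; a real project would use its image library's type.
struct Pixel {
    std::uint8_t r, g, b;
};

// Converts every pixel to gray in place using BT.601 luma weights
// (0.299 R + 0.587 G + 0.114 B), rounded to the nearest integer.
void toGrayscale(std::vector<Pixel>& image) {
    const int n = static_cast<int>(image.size());

    // Each pixel is independent, so the loop is a parallel loop with no shared writes.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        const Pixel p = image[i];
        const std::uint8_t gray = static_cast<std::uint8_t>(
            (299 * p.r + 587 * p.g + 114 * p.b + 500) / 1000);
        image[i] = Pixel{gray, gray, gray};
    }
}
```

Invert and sepia-tinting have the same per-pixel shape; blur and sharpen read neighboring pixels, so they would need a separate output image rather than an in-place update.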
Assessment
All students complete end-of-course evaluations with open-ended feedback:
• They really like the week on parallelism
  • Covering material that is not in the textbook makes CS2 seem fresh and cutting edge
  • Students really like learning how they can use all their cores instead of just one
• Having students experience speedup is key (and even better if they can see it)
More Implications
• Software developers who cannot build parallel apps will be unable to leverage the full power of today’s hardware.
  • At a competitive disadvantage?
• Designing / writing parallel apps is very different from designing / writing sequential apps.
  • Pros think in terms of parallel design patterns
Parallel Design Patterns
• … are industry-standard strategies that parallel professionals have found useful over 30+ years of practice.
• … often have direct support built into popular platforms like MPI and OpenMP.
• … are likely to remain useful, regardless of future PDC developments.
• … provide a framework for PDC concepts.
Algorithm Strategy Patterns
Example 1: Most parallel programs use one of just three parallel algorithm strategy patterns:
• Data decomposition: divide up the data and process it in parallel.
• Task decomposition: divide the algorithm into functional tasks that we perform in parallel (to the extent possible).
• Pipeline: divide the algorithm into linear stages, through which we “pump” the data.
Of these, only data decomposition scales well…
Data Decomposition (1 thread): Thread 0
Data Decomposition (2 threads): Thread 0, Thread 1
Data Decomposition (4 threads): Thread 0, Thread 1, Thread 2, Thread 3
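The way the data is carved up as the thread count grows can be sketched in code. This is an illustrative sketch, not the talk’s code: `Range`, `chunkFor`, and `sumByChunks` are hypothetical names, and the policy shown (equal contiguous chunks, with any leftover items given to the lowest-numbered threads) is only one reasonable choice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open index range [start, stop) owned by one thread.
struct Range {
    std::size_t start;
    std::size_t stop;
};

// Splits n items into numThreads contiguous chunks whose sizes differ by at most 1.
Range chunkFor(std::size_t n, std::size_t id, std::size_t numThreads) {
    assert(numThreads > 0 && id < numThreads);
    const std::size_t base = n / numThreads;   // minimum chunk size
    const std::size_t extra = n % numThreads;  // first `extra` threads get one more
    const std::size_t start = id * base + std::min(id, extra);
    const std::size_t stop = start + base + (id < extra ? 1 : 0);
    return Range{start, stop};
}

// Data decomposition: each thread sums its own chunk, then the partial sums are combined.
long sumByChunks(const std::vector<int>& data, std::size_t numThreads) {
    std::vector<long> partial(numThreads, 0);
    const int threads = static_cast<int>(numThreads);

    #pragma omp parallel for
    for (int id = 0; id < threads; ++id) {
        const Range r = chunkFor(data.size(), static_cast<std::size_t>(id), numThreads);
        for (std::size_t i = r.start; i < r.stop; ++i) {
            partial[id] += data[i];
        }
    }

    long total = 0;
    for (long p : partial) {
        total += p;  // sequential combine of the per-thread results
    }
    return total;
}
```

With 10 items and 4 threads, `chunkFor` yields the ranges [0,3), [3,6), [6,8), and [8,10). The pattern scales because adding threads simply makes each chunk smaller.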
Task Decomposition
Independent functions in a sequential computation can be “parallelized”:
int main() {
    x = f();
    y = g();
    z = h();
    w = x + y + z;
}
Thread 0: main()
Thread 1: f()
Thread 2: g()
Thread 3: h()
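One way to express this task decomposition is with OpenMP sections, sketched below. The bodies of `f`, `g`, and `h` are placeholders invented for illustration, and `computeW` is a hypothetical name. Note one difference from the slide: OpenMP decides which thread runs which section, so the fixed thread-to-function mapping shown above is not guaranteed.

```cpp
// Placeholder independent tasks (hypothetical bodies, for illustration only).
int f() { return 1; }
int g() { return 2; }
int h() { return 3; }

// Runs f, g, and h as independent tasks, then combines their results.
int computeW() {
    int x = 0, y = 0, z = 0;

    // Each section may run on a different thread; the implicit barrier at the
    // end of the construct ensures x, y, and z are all set before they are used.
    #pragma omp parallel sections
    {
        #pragma omp section
        { x = f(); }
        #pragma omp section
        { y = g(); }
        #pragma omp section
        { z = h(); }
    }
    return x + y + z;
}
```

The limit on scalability is visible in the code: there are only three tasks, so a fourth core has nothing to do.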
Pipeline
Programs with non-independent functions… can still be pipelined:
int main() {
    ...
    while (fin) {
        fin >> a;
        b = f(a);
        c = g(b);
        d = h(c);
        fout << d;
    }
    ...
}
Time step:         0   1   2   3   4   5   6
Thread 0, main():  a0  a1  a2  a3  a4  a5  a6
Thread 1, f(a):    -   b0  b1  b2  b3  b4  b5
Thread 2, g(b):    -   -   c0  c1  c2  c3  c4
Thread 3, h(c):    -   -   -   d0  d1  d2  d3
Scalability
• If a program gets faster as more threads / cores are used, its performance scales.
• For the three algorithm strategy patterns:
  • Only data decomposition scales well.
The Reduction Pattern
Programs often need to combine the local results of N parallel tasks:
• When N is large, O(N) time is too slow
• The reduction pattern does it in O(lg(N)) time:
To sum these 8 numbers: 6 8 9 1 5 7 2 4
Step 1: 14 10 12 6
Step 2: 24 18
Step 3: 42
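The pairwise steps above can be sketched as a tree reduction. This is an illustrative sketch with hypothetical names (`treeSum`, `ompSum`), not the talk’s code. `treeSum` makes the lg(N) rounds explicit: within a round, every pair can be added by a different thread. `ompSum` shows the same pattern using OpenMP’s built-in `reduction` clause, which is the form most programs would use in practice.

```cpp
#include <cstddef>
#include <vector>

// Explicit tree reduction: each round adds adjacent pairs, halving the list.
int treeSum(std::vector<int> values) {
    if (values.empty()) {
        return 0;
    }
    while (values.size() > 1) {
        const int pairs = static_cast<int>(values.size() / 2);
        std::vector<int> next((values.size() + 1) / 2);

        // All pairs in one round are independent of each other.
        #pragma omp parallel for
        for (int i = 0; i < pairs; ++i) {
            next[i] = values[2 * i] + values[2 * i + 1];
        }
        if (values.size() % 2 == 1) {
            next[pairs] = values.back();  // odd element carries over to the next round
        }
        values.swap(next);
    }
    return values[0];
}

// The same pattern via OpenMP's reduction clause: each thread keeps a private
// partial sum, and the runtime combines the partial sums at the end of the loop.
int ompSum(const std::vector<int>& values) {
    int total = 0;
    const int n = static_cast<int>(values.size());
    #pragma omp parallel for reduction(+ : total)
    for (int i = 0; i < n; ++i) {
        total += values[i];
    }
    return total;
}
```

For the slide’s input {6, 8, 9, 1, 5, 7, 2, 4}, `treeSum` passes through {14, 10, 12, 6} and {24, 18} before returning 42.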
Faculty Development Resources
• National Computational Science Institute (NCSI) offers workshops each summer:
  • www.computationalscience.org/workshops/
• The XSEDE Education Program offers workshops, bootcamps, and facilities:
  • www.xsede.org/curriculum-and-educator-programs
• The LittleFe Project offers “buildouts” at which participants can build (and take home) a free portable Beowulf cluster:
  • littlefe.net
LittleFe
• Little Fe (v4): 6 nodes
  • Dual-core Atom CPU
  • Nvidia ION2 w/ 16 CUDA cores
  • 2 GB RAM
  • Gigabit Ethernet, USB, …
  • Custom Linux distro (BCCD)
  • Pelican case
  • ~$2500 (but free at “buildouts”!)
Faculty Development Resources
• CSinParallel is an NSF-funded project to help CS educators integrate PDC topics.
  • 1-3 hour hands-on PDC “modules” in:
    • Different level courses
    • Different languages
    • Different parallel design patterns (patternlets)
  • Workshops (today, here; summer 2015 in Chicago)
  • Community of supportive people to help work through problems and issues.
  • csinparallel.org