CPRE 583 Reconfigurable Computing Lecture 20: Wed 11/2/2011 (Compute Models). Instructor: Dr. Phillip Jones (phjones@iastate.edu) Reconfigurable Computing Laboratory Iowa State University Ames, Iowa, USA. http://class.ee.iastate.edu/cpre583/. Announcements/Reminders. MP3: Due 11/4
CPRE 583 Reconfigurable Computing Lecture 20: Wed 11/2/2011 (Compute Models) Instructor: Dr. Phillip Jones (phjones@iastate.edu) Reconfigurable Computing Laboratory Iowa State University Ames, Iowa, USA http://class.ee.iastate.edu/cpre583/
Announcements/Reminders • MP3: Due 11/4 • IT should have resolved the issue that was causing problems running MP3 on some of the linux-X and research-X remote machines • Weekly Project Updates due: Fridays (midnight)
Project Grading Breakdown • 50% Final Project Demo • 30% Final Project Report • 20% of your project report grade will come from your 5-6 project updates (due Fridays at midnight) • 20% Final Project Presentation
Project Ideas: Relevant conferences • Micro • Super Computing • HPCA • IPDPS • FPL • FPT • FCCM • FPGA • DAC • ICCAD • Reconfig • RTSS • RTAS • ISCA
Projects: Target Timeline • Teams Formed and Topic: Mon 10/10 • Project idea in PowerPoint, 3-5 slides • Motivation (why is this interesting, useful) • What will be the end result • High-level picture of final product • Project team list: Name, Responsibility • High-level Plan/Proposal: Fri 10/14 • PowerPoint, 5-10 slides (presentation to class Wed 10/19) • System block diagrams • High-level algorithms (if any) • Concerns • Implementation • Conceptual • Related research papers (if any)
Projects: Target Timeline • Work on projects: 10/19 - 12/9 • Weekly update reports • More information on updates will be given • Presentations: Finals week • Present / Demo what is done at this point • 15-20 minutes (depends on number of projects) • Final write up and Software/Hardware turned in: Day of final (TBD)
Initial Project Proposal Slides (5-10 slides) • Project team list: Name, Responsibility (who is project leader) • Team size: 3-4 (5 case-by-case) • Project idea • Motivation (why is this interesting, useful) • What will be the end result • High-level picture of final product • High-level Plan • Break project into milestones • Provide initial schedule: I would initially schedule aggressively to have the project complete by Thanksgiving. Issues will pop up and cause the schedule to slip. • System block diagrams • High-level algorithms (if any) • Concerns • Implementation • Conceptual • Research papers related to your project idea
Weekly Project Updates • The current state of your project write-up • Even in the early stages of the project you should be able to write a rough draft of the Introduction and Motivation section • The current state of your Final Presentation • Your Initial Project Proposal presentation (due Wed 10/19) should make a good starting point for your Final Presentation • What things are working and not working • What roadblocks you are running into
Overview • Compute Models
What you should learn • Introduction to Compute Models
Outline • Design patterns (previous lecture) • Why are they useful? • Examples • Compute models (Abstraction) • Why are they useful? • Examples • System Architectures (Implementation) • Why are they useful? • Examples
References • Reconfigurable Computing (2008) [1] • Chapter 5: Compute Models and System Architectures • Scott Hauck, Andre DeHon • Design Patterns for Reconfigurable Computing [2] • Andre DeHon (FCCM 2004) • Type Architectures, Shared Memory, and the Corollary of Modest Potential [3] • Lawrence Snyder: Annual Review of Computer Science (1986)
Building Applications Problem -> Compute Model + Architecture -> Application • Questions to answer • How to think about composing the application? • How will the compute model lead to a naturally efficient architecture? • How does the compute model support composition? • How to conceptualize parallelism? • How to trade off area and time? • How to reason about correctness? • How to adapt to technology trends (e.g. larger/faster chips)? • How does the compute model provide determinacy? • How to avoid deadlocks? • What can be computed? • How to optimize a design, or validate application properties?
Compute Models • Compute Models [1]: High-level models of the flow of computation. • Useful for: • Capturing parallelism • Reasoning about correctness • Decomposition • Guide designs by providing constraints on what is allowed during a computation • Communication links • How synchronization is performed • How data is transferred
Two High-level Families • Data Flow: • Single-rate Synchronous Data Flow • Synchronous Data Flow • Dynamic Streaming Data Flow • Dynamic Streaming Data Flow with Peeks • Streaming Data Flow with Allocation • Sequential Control: • Finite Automata (i.e. Finite State Machine) • Sequential Controller with Allocation • Data Centric • Data Parallel
Data Flow • Graph of operators that data (tokens) flows through • Composition of functions • Captures: • Parallelism • Dependences • Communication [Figure: data flow graph with operators X, X, and +; the animation steps tokens through the graph]
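The idea of tokens flowing through a graph of operators can be sketched in a few lines of Python. This is only an illustrative model, not code from the lecture: the `Operator` class, the `run` loop, and the `multiply_add` graph (two multipliers feeding an adder, which is an assumed reading of the figure) are all hypothetical names. An operator fires only when every input link holds a token, which is what exposes both the dependences and the parallelism (the two multipliers never wait on each other).

```python
from collections import deque


class Operator:
    """A dataflow operator that fires when every input link holds a token."""

    def __init__(self, name, func, num_inputs):
        self.name = name
        self.func = func
        self.inputs = [deque() for _ in range(num_inputs)]  # one FIFO per input link
        self.outputs = []  # (consumer, input_port) pairs

    def connect(self, consumer, port):
        self.outputs.append((consumer, port))

    def ready(self):
        # An empty deque is falsy, so this is True only if all links hold a token.
        return all(self.inputs)

    def fire(self):
        args = [link.popleft() for link in self.inputs]
        result = self.func(*args)
        for consumer, port in self.outputs:
            consumer.inputs[port].append(result)
        return result


def run(operators, sink):
    """Fire ready operators until nothing can fire; return the sink's results."""
    results = []
    progress = True
    while progress:
        progress = False
        for op in operators:
            if op.ready():
                value = op.fire()
                if op is sink:
                    results.append(value)
                progress = True
    return results


def multiply_add(a, b, c, d):
    """Compute (a * b) + (c * d) by pushing one token into each graph input."""
    mul0 = Operator("mul0", lambda x, y: x * y, 2)
    mul1 = Operator("mul1", lambda x, y: x * y, 2)
    add = Operator("add", lambda x, y: x + y, 2)
    mul0.connect(add, 0)
    mul1.connect(add, 1)
    mul0.inputs[0].append(a)
    mul0.inputs[1].append(b)
    mul1.inputs[0].append(c)
    mul1.inputs[1].append(d)
    return run([mul0, mul1, add], add)
```

For example, `multiply_add(2, 3, 4, 5)` pushes four tokens in and collects a single output token from the adder.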
Single-rate Synchronous Data Flow • One token rate for the entire graph • For example, all operations take one token on a given link before producing an output token • Same power as a Finite State Machine [Figure: data flow graph with operators update, -, copy, and F; every link has a token rate of 1]
Synchronous Data Flow • Each link can have a different constant token input and output rate • Same power as the single-rate version, but easier to describe for some applications • Automated ways to detect/determine: • Deadlock • Buffer sizes [Figure: the same graph (update, -, copy, F) with token rates of 1 and 10 on its links]
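Because the rates are constants, an SDF graph can be analyzed before it runs. One standard first step (a sketch, not necessarily the method the lecture had in mind) is to solve the balance equations: for every link, the producer's firing count times its output rate must equal the consumer's firing count times its input rate. The function name `repetition_vector` and the `(producer, produce_rate, consumer, consume_rate)` link format are assumptions made for this example. If no solution exists, token counts on some link grow or shrink without bound, so no finite buffer size works.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetition_vector(actors, links):
    """Solve the SDF balance equations for a connected graph.

    links holds (producer, produce_rate, consumer, consume_rate) tuples.
    Returns the smallest positive integer firing counts per actor, or None
    if the rates are inconsistent.
    """
    rate = {actors[0]: Fraction(1)}
    pending = [actors[0]]
    while pending:
        actor = pending.pop()
        for producer, prod, consumer, cons in links:
            if producer == actor:
                other, value = consumer, rate[actor] * prod / cons
            elif consumer == actor:
                other, value = producer, rate[actor] * cons / prod
            else:
                continue
            if other not in rate:
                rate[other] = value
                pending.append(other)
            elif rate[other] != value:
                return None  # two links demand different firing ratios
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    # Scale the fractional solution to the smallest integer solution.
    scale = reduce(lambda x, y: x * y // gcd(x, y),
                   (r.denominator for r in rate.values()), 1)
    counts = {a: int(r * scale) for a, r in rate.items()}
    common = reduce(gcd, counts.values())
    return {a: c // common for a, c in counts.items()}
```

With links `[("src", 1, "filt", 10), ("filt", 10, "sink", 1)]`, the source and sink must each fire ten times for every firing of the hypothetical `filt` actor. A consistent repetition vector is necessary but not sufficient: deadlock detection additionally requires simulating one period of the schedule with the initial tokens in place.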
Dynamic Streaming Data Flow • Token rates dependent on data • Just need to add two structures • Switch • Select [Figure: Switch with data input in, control input S, and outputs out0 and out1; Select with data inputs in0 and in1, control input S, and output out]
Dynamic Streaming Data Flow • Token rates dependent on data • Just need to add two structures • Switch, Select • More powerful • Difficult to detect deadlocks • Still deterministic [Figure: tokens x and y pass through a Switch (control S) to operators F0 and F1, then through a Select]
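A rough Python sketch of the two structures, using lists as stand-ins for token streams. The names `switch`, `select`, and `conditional` are chosen for this example only. The point it illustrates is why the model stays deterministic: Select consumes from whichever input the control token names, so the output order is fixed by the control stream and not by arrival times.

```python
def switch(control, data):
    """Route each data token to output 0 or output 1 based on its control token."""
    out0, out1 = [], []
    for s, token in zip(control, data):
        (out1 if s else out0).append(token)
    return out0, out1


def select(control, in0, in1):
    """For each control token, consume one token from the input it names."""
    stream0, stream1 = iter(in0), iter(in1)
    return [next(stream1) if s else next(stream0) for s in control]


def conditional(control, data, f0, f1):
    """Apply f0 or f1 per token, as in a Switch -> (F0 | F1) -> Select graph."""
    out0, out1 = switch(control, data)
    return select(control, [f0(x) for x in out0], [f1(x) for x in out1])
```

For instance, `conditional([0, 1, 1, 0], [1, 2, 3, 4], lambda x: x + 10, lambda x: x * 2)` sends tokens 1 and 4 through the first function and tokens 2 and 3 through the second, then restores the original token order. The number of tokens on each branch depends on the control data, which is what makes static analysis harder than in SDF.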
Dynamic Streaming Data Flow with Peeks • Allow an operator to fire before all inputs have arrived • An example where this is useful is the merge operation • Now execution can be nondeterministic • Answer depends on input arrival times [Figure: Merge operator; the animation shows tokens A and B arriving at Merge and leaving on its output]
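The dependence on arrival times can be made concrete with a small model in which each token carries a timestamp. This is an assumed representation for illustration only: the `merge` function and the `(arrival_time, token)` pairs are not from the lecture. The same two tokens produce different output orders when only their arrival times change.

```python
import heapq


def merge(stream_a, stream_b):
    """Emit tokens in arrival order.

    Each stream is a list of (arrival_time, token) pairs, already sorted by
    time. Ties go to stream_a because heapq.merge is stable.
    """
    ordered = heapq.merge(stream_a, stream_b, key=lambda pair: pair[0])
    return [token for _, token in ordered]
```

`merge([(0, "A")], [(1, "B")])` yields A then B, while `merge([(2, "A")], [(1, "B")])` yields B then A, even though the token values on each input are identical in both runs.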
Streaming Data Flow with Allocation • Removes the need for static links and operators. That is, the Data Flow graph can change over time • More power: Turing Complete • More difficult to analyze • Could be useful for some applications • Telecom applications. For example, the resources needed may vary greatly depending on whether a channel carries voice or data • Can take advantage of platforms that allow runtime reconfiguration
Sequential Control • Sequence of subroutines • Programming languages (C, Java) • Hardware control logic (Finite State Machines) • Transform global data state
Finite Automata (i.e. Finite State Machine) • Finite state • Can verify state reachability in polynomial time [Figure: state diagram with states S1, S2, S3]
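Reachability is polynomial because it reduces to a graph search over the state diagram. A minimal sketch, assuming the machine is given as a dictionary that maps each state to its `{input: next_state}` transitions (the function name and this representation are choices made for the example):

```python
from collections import deque


def reachable_states(transitions, start):
    """Breadth-first search over the state graph.

    transitions maps state -> {input_symbol: next_state}. Each state is
    visited once and each transition examined once.
    """
    seen = {start}
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        for next_state in transitions.get(state, {}).values():
            if next_state not in seen:
                seen.add(next_state)
                frontier.append(next_state)
    return seen
```

With a hypothetical machine such as `{"S1": {"a": "S2"}, "S2": {"a": "S3", "b": "S1"}, "S3": {"a": "S3"}, "S4": {"a": "S1"}}`, starting from S1 the search finds S1, S2, and S3 and reports that S4 can never be entered.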
Sequential Controller with Allocation • Adds ability to allocate memory. Equivalent to adding new states • Model becomes Turing Complete [Figure: state diagram with states S1, S2, S3; allocation adds states S4 through SN]
Data Parallel • Multiple instances of an operation type acting on separate pieces of data, for example Single Instruction Multiple Data (SIMD) • Identical match test on all items in a database • Inverting the color of all pixels in an image
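The pixel-inversion example can be written so that the data-parallel structure is visible: one operation, applied independently to every element. The sketch below assumes an 8-bit grayscale image stored as a list of rows; `invert_pixel` and `invert_image` are names made up for this example. It runs sequentially in Python, but because no pixel depends on any other, every call to `invert_pixel` could execute at the same time in hardware.

```python
def invert_pixel(value):
    """The single operation applied to every data item (8-bit grayscale assumed)."""
    return 255 - value


def invert_image(image):
    """Apply the same operation to each pixel; no pixel depends on another."""
    return [list(map(invert_pixel, row)) for row in image]
```

For example, `invert_image([[0, 255], [100, 30]])` maps black to white and white to black, and flips the two mid-range values around the middle of the range.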
Data Centric • Similar to Data Flow, but the state contained in the objects of the graph is the focus, not the tokens flowing through the graph • Network flow example [Figure: Source1, Source2, and Source3 feed a Switch that forwards to Dest1 and Dest2; annotations: flow rate, buffer overflow]
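To illustrate the shift in focus, the sketch below models the network switch as an object whose internal state (buffer occupancy and dropped-packet count) is what the analysis cares about; the packets themselves are just counted. Everything here is an assumption for illustration: the `SwitchNode` class, the fixed per-step source rates, and the accept-then-forward ordering within a time step are not specified in the lecture.

```python
class SwitchNode:
    """A switch whose buffer state, not the packets, is the object of interest."""

    def __init__(self, capacity, drain_rate):
        self.capacity = capacity      # maximum packets the buffer can hold
        self.drain_rate = drain_rate  # packets forwarded per time step
        self.occupancy = 0
        self.dropped = 0

    def step(self, arrivals):
        """Accept arrivals until the buffer is full, then forward packets."""
        accepted = min(arrivals, self.capacity - self.occupancy)
        self.dropped += arrivals - accepted  # buffer overflow
        self.occupancy += accepted
        self.occupancy -= min(self.occupancy, self.drain_rate)


def simulate(source_rates, capacity, drain_rate, steps):
    """Run the switch for a number of steps; return (occupancy, dropped)."""
    node = SwitchNode(capacity, drain_rate)
    for _ in range(steps):
        node.step(sum(source_rates))
    return node.occupancy, node.dropped
```

Calling `simulate([1, 1, 1], capacity=4, drain_rate=2, steps=3)` shows the buffer filling over the first two steps and overflowing on the third, since the three sources together send more than the switch forwards.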
Multi-threaded • Multi-threaded: a compute model made up of multiple sequential controllers that have communication channels between them • Very general, but often too much power and flexibility. No guidance for: • Ensuring determinism • Dividing application into threads • Avoiding deadlock • Synchronizing threads • The models discussed can be defined in terms of a Multi-threaded compute model
Streaming Data Flow as Multithreaded • Thread: is an operator that performs transforms on data as it flows through the graph • Thread synchronization: Tokens sent between operators
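This mapping can be sketched directly with Python's standard `threading` and `queue` modules: each operator becomes a thread, each link becomes a FIFO queue, and the only synchronization is blocking on a token. The `DONE` sentinel, the `operator` function, and the linear `run_pipeline` topology are assumptions made to keep the example short.

```python
import queue
import threading

DONE = object()  # sentinel token marking the end of the stream (an assumption)


def operator(func, inbox, outbox):
    """One thread per operator: block on a token, transform it, pass it on."""
    while True:
        token = inbox.get()
        if token is DONE:
            outbox.put(DONE)
            return
        outbox.put(func(token))


def run_pipeline(funcs, tokens):
    """Chain operators with FIFO links and stream the tokens through them."""
    links = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=operator, args=(f, links[i], links[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for thread in threads:
        thread.start()
    for token in tokens:
        links[0].put(token)
    links[0].put(DONE)
    results = []
    while True:
        token = links[-1].get()
        if token is DONE:
            break
        results.append(token)
    for thread in threads:
        thread.join()
    return results
```

`run_pipeline([lambda x: x * x, lambda x: x + 1], [1, 2, 3])` squares each token and then adds one. The result order is deterministic regardless of thread scheduling because every link is a FIFO with a single producer and a single consumer, which is exactly the restriction the dataflow model imposes on the general multithreaded model.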
Data Parallel as Multithreaded • Thread: is a data item • Thread synchronization: data updated with each sequential instruction
Caution with Multithreaded Model • Use when a stricter compute model does not give enough expressiveness. • Define restrictions to limit the amount of expressive power that can be used • Define synchronization policy • How to reason about deadlocking
Other Models • “A Framework for Comparing Models of Computation” [1998] • E. Lee, A. Sangiovanni-Vincentelli • Transactions on Computer-Aided Design of Integrated Circuits and Systems • “Concurrent Models of Computation for Embedded Software” [2005] • E. Lee, S. Neuendorffer • IEEE Proceedings – Computers and Digital Techniques
Next Lecture • System Architectures
Questions/Comments/Concerns • Write down • Main point of lecture • One thing that’s still not quite clear • If everything is clear, then give an example of how to apply something from lecture OR
Lecture Notes • Add CSP/Multithread as root of a simple tree • 15+5 (late start) minutes of time left • Think of one to two in-class exercises (10 min) • Data Flow graph optimization algorithm? • Deadlock detection on a small model? • Give some examples of where a given compute model would map to a given application. • Systolic array (implement) or Dataflow compute model • String matching (FSM) (MISD) • New image for MP3, too dark of a color