
Model Based Design for DSP: Presentation to Stevens

Maryland DSPCAD Research Group (http://www.ece.umd.edu/DSPCAD/home/dspcad.htm), Department of Electrical and Computer Engineering, and Institute for Advanced Computer Studies, University of Maryland, College Park.




Presentation Transcript


  1. Maryland DSPCAD Research Group (http://www.ece.umd.edu/DSPCAD/home/dspcad.htm), Department of Electrical and Computer Engineering, and Institute for Advanced Computer Studies, University of Maryland, College Park. Model Based Design for DSP: Presentation to Stevens. Will Plishker, Chung-Ching Shen, Nimish Sane, George Zaki, Soujanya Kedilaya, Shuvra S. Bhattacharyya

  2. Outline
  • Model Based Design
  • Dataflow Interchange Format
  • Multiprocessor Scheduling
  • Preliminary Setup and Results with GPUs
  • Future Directions

  3. Introduction: The Implementation Gap
  • In modern, complex systems we would like to:
    • Create an application description independent of the target
    • Interface with a diverse set of tools and teams
    • Achieve high performance
    • Arrive at an initial prototype quickly
  • But algorithms are far removed from their final implementation:
    • Low-level programming environments
    • Diverse and changing platforms
    • Non-uniform functional verification
    • Entrenched design processes
    • Tool selection
  [Figure: the gap between an abstract representation of an algorithm (a threshold module with a 4-bit pattern comparator and 1-bit decision output) and its low-level, high-performance implementation (decision check, E and H adders, E/Gamma and fine-grain outputs, 38-bit OR, channel Et adders).]

  4. Model-Based Design for Embedded Systems
  • High-level application subsystems are specified in terms of components that interact through formal models of computation
  • C or other “platform-oriented” languages can be used to specify intra-component behavior
  • A model-specific language can be used to specify inter-component behavior
  • Object-oriented techniques can be used to maintain libraries of components
  • Popular models for embedded systems:
    • Dataflow and KPNs (Kahn process networks)
    • Continuous time, discrete event
    • FSM and related control formalisms

  5. Dataflow-based Design: Related Trends
  • Dataflow-based design (in our context) is a specific form of model-based design
  • Dataflow-based design is complementary to:
    • Object-oriented design
    • DSP C compiler technology
    • Synthesis tools for hardware description languages (e.g., Verilog and VHDL)

  6. Example: Dataflow-based Design for DSP
  [Figure: example from the Agilent ADS tool.]

  7. Example: QAM Transmitter in National Instruments LabVIEW
  [Figure: Rate Control → QAM Encoder → Transmit Filters → Passband Signal. Source: [Evans 2005]]

  8. Crossing the Implementation Gap: Design Flow Using DIF
  [Figure: design flow. DSP designs (signal processing, image/video, communication systems) are captured as dataflow models — static (SDF, HSDF, CSDF, MDSDF), dynamic (CFDF, BDF, PDF, BLDF), and meta-modeling — in a DIF specification written in The DIF Language (TDL). The DIF Package (TDP) provides front-end algorithms, an internal DIF representation, DIF-to-C code generation, and AIF/porting; exporters/importers (Ptolemy Ex/Im, DIF-AT Ex/Im, others) connect to dataflow-based DSP design tools (Ptolemy II, the Autocoding Toolset, others) and DSP libraries (VSIPL, TI, others), targeting embedded processing platforms (C, Java VM, Ada, others).]

  9. Dataflow with Software Defined Radio: DIF + GNU Radio
  [Figure: proposed flow connecting GRC and The DIF Package (TDP). Legend distinguishes existing/completed from proposed components.]
  1) Convert or generate a DIF specification (.dif) from the GRC XML flowgraph (.grc) / Python flowgraph (.py) (complete)
  2) Execute static schedules from DIF (.dif, .sched) via DIF Lite in the GNU Radio engine (Python/C++) (complete)
  3a) Perform online scheduling (proposed)
  3b) Architecture specification (.arch?): processors, memories, interconnect (proposed)
  4) Architecture-aware MP scheduling: assignment, ordering, invocation (proposed)
  Target platforms, via a platform-retargetable library: multiprocessors, GPUs, Cell, FPGA

  10. Background: Dataflow Graphs
  • Vertices (actors) represent computation
  • Edges represent FIFO buffers
  • Edges may have delays, implemented as initial tokens
  • Tokens are produced and consumed on edges
  • Different models have different rules for production (SDF = fixed, CSDF = periodic, BDF = dynamic)
  [Figure: actors X, Y, Z connected by edges e1 and e2, with production rates p1, p2, consumption rates c1, c2, and a delay of 5 tokens.]
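The firing rule described above can be made concrete with a small sketch. This is a minimal Python illustration of my own (not code from TDP or GNU Radio): an edge carries tokens, a delay appears as initial tokens, and an actor may fire only when every input edge holds enough tokens.

```python
class Edge:
    """FIFO edge of a dataflow graph: source/sink actors, per-firing
    production and consumption rates, and initial tokens (delay)."""
    def __init__(self, src, snk, prd, cns, delay=0):
        self.src, self.snk = src, snk
        self.prd, self.cns = prd, cns
        self.tokens = delay  # a delay shows up as initial tokens

def fire(actor, edges):
    """SDF firing rule: fire only if every input edge holds at least
    cns tokens; firing consumes from inputs and produces on outputs."""
    inputs = [e for e in edges if e.snk == actor]
    if any(e.tokens < e.cns for e in inputs):
        return False  # not enough tokens -- actor is not fireable
    for e in inputs:
        e.tokens -= e.cns
    for e in edges:
        if e.src == actor:
            e.tokens += e.prd
    return True

# X produces 2 tokens per firing; Y consumes 3 per firing.
edges = [Edge('X', 'Y', prd=2, cns=3)]
assert not fire('Y', edges)          # Y cannot fire yet
fire('X', edges); fire('X', edges)   # buffer now holds 4 tokens
assert fire('Y', edges)              # Y fires, leaving 1 token
assert edges[0].tokens == 1
```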

  11. Evolution of Dataflow Models of Computation for DSP: Examples
  • Computation Graphs and Marked Graphs [Karp 1966, Reiter 1968]
  • Synchronous dataflow [Lee 1987]: static multirate behavior; SPW (Cadence), National Instruments LabVIEW, and others
  • Well behaved stream flow graphs [1992]: schemas for bounded dynamics
  • Boolean/integer dataflow [Buck 1994]: Turing complete models
  • Multidimensional synchronous dataflow [Lee 1992]: image and video processing
  • Scalable synchronous dataflow [Ritz 1993]: block processing; COSSAP (Synopsys)
  • CAL [Eker 2003]: actor-based dataflow language
  • Cyclo-static dataflow [Bilsen 1996]: phased behavior; Eonic Virtuoso Synchro, Synopsys El Greco and Cocentric, Angeles System Canvas
  • Bounded dynamic data transfer [Pankert 1994]: bounded dynamic dataflow
  • The processing graph method [Stevens 1997]: reconfigurable dynamic dataflow; U.S. Naval Research Lab, MCCI Autocoding Toolset
  • Stream-based functions [Kienhuis 2001]
  • Parameterized dataflow [Bhattacharya 2001]: reconfigurable static dataflow; meta-modeling for more general dataflow graph reconfiguration
  • Reactive process networks [Geilen 2004]
  • Blocked dataflow [Ko 2005]: image and video through parameterized processing
  • Windowed synchronous dataflow [Keinert 2006]
  • Parameterized stream-based functions [Nikolov 2008]
  • Enable-invoke dataflow [Plishker 2008]
  • Variable rate dataflow [Wiggers 2008]

  12. Modeling Design Space
  [Figure: dataflow models plotted by expressive power (vertical axis) versus verification/synthesis power (horizontal axis). From most expressive to most analyzable: BDF/DDF, PCSDF, PSDF, MDSDF/WBDF, CSDF/SSDF, SDF.]

  13. Dataflow Interchange Format
  • Describes dataflow graphs in text
  • A simple DIF file:

    dif graph1_1 {
      topology {
        nodes = n1, n2, n3, n4;
        edges = e1 (n1, n2), e2 (n2, n1), e3 (n1, n3),
                e4 (n1, n3), e5 (n4, n3), e6 (n4, n4);
      }
    }
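To show how little machinery is needed to consume a topology block like the one above, here is a hedged Python sketch of my own (TDP is the real front end; this handles only the simple `nodes = ...; edges = ...;` form on this slide, not the full DIF grammar):

```python
import re

def parse_topology(dif_text):
    """Extract node and edge lists from a simple DIF topology block.
    Illustrative only -- not the TDP parser."""
    nodes = re.search(r'nodes\s*=\s*([^;]+);', dif_text).group(1)
    nodes = [n.strip() for n in nodes.split(',')]
    edge_body = re.search(r'edges\s*=\s*([^;]+);', dif_text).group(1)
    # each edge looks like: edgeID (srcNodeID, snkNodeID)
    edges = re.findall(r'(\w+)\s*\((\w+)\s*,\s*(\w+)\)', edge_body)
    return nodes, edges

dif = """
dif graph1_1 {
  topology {
    nodes = n1, n2, n3, n4;
    edges = e1 (n1, n2), e2 (n2, n1), e3 (n1, n3),
            e4 (n1, n3), e5 (n4, n3), e6 (n4, n4);
  }
}
"""
nodes, edges = parse_topology(dif)
print(nodes)       # ['n1', 'n2', 'n3', 'n4']
print(len(edges))  # 6
```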

  14. More features of DIF
  • Ports:

    interface {
      inputs = p1, p2:n2;
      outputs = p3:n3, p4:n4;
    }

  • Hierarchy:

    refinement {
      graph2 = n3;
      p1 : e3; p2 : e4;
      p3 : e5; p4 : p3;
    }

  15. More features of DIF
  • Production and consumption rates (in the accompanying figure, 4096 tokens are produced and consumed on e1; 1024 are produced and 64 consumed on e10):

    production {
      e1 = 4096; e10 = 1024; ...
    }
    consumption {
      e1 = 4096; e10 = 64; ...
    }

  • Computation keyword
  • User-defined attributes

  16. The DIF Language Syntax

    dataflowModel graphID {
      basedon { graphID; }
      topology {
        nodes = nodeID, ...;
        edges = edgeID (srcNodeID, snkNodeID), ...;
      }
      interface {
        inputs = portID [:nodeID], ...;
        outputs = portID [:nodeID], ...;
      }
      parameter {
        paramID [:dataType];
        paramID [:dataType] = value;
        paramID [:dataType] : range;
      }
      refinement {
        subgraphID = supernodeID;
        subPortID : edgeID;
        subParamID = paramID;
      }
      builtInAttr {
        [elementID] = value;
        [elementID] = id;
        [elementID] = id1, id2, ...;
      }
      attribute usrDefAttr {
        [elementID] = value;
        [elementID] = id;
        [elementID] = id1, id2, ...;
      }
      actor nodeID {
        computation = stringValue;
        attrID [:attrType] [:dataType] = value;
        attrID [:attrType] [:dataType] = id;
        attrID [:attrType] [:dataType] = id1, ...;
      }
    }

  17. Uniprocessor Scheduling for Synchronous Dataflow
  • An SDF graph G = (V, E) has a valid schedule if it is deadlock-free and sample rate consistent (i.e., it has a periodic schedule that fires each actor at least once and produces no net change in the number of tokens on each edge).
  • Balance equations: for every e ∈ E, prd(e) × q[src(e)] = cns(e) × q[snk(e)].
  • The repetitions vector q is the minimum positive integer solution of the balance equations.
  • A valid schedule is then a sequence of actor firings in which each actor v fires q[v] times (its repetition count) and the firing sequence obeys the precedence constraints imposed by the SDF graph.

  18. Example: Sample Rate Conversion
  • CD to DAT: 44.1 kHz to 48 kHz sampling rate conversion through a chain of actors A–F
  • Flat strategy: topologically sort the graph and iterate each actor v q[v] times
    • Low context switching, but large buffer requirements and latency
  • CD-to-DAT flat schedule: (147A)(147B)(98C)(56D)(40E)(160F)
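The repetition counts in the flat schedule above follow directly from the balance equations on the previous slide. As a sketch, the repetitions vector can be computed with exact rational arithmetic; the per-edge rates below are illustrative choices that realize the 160/147 overall conversion ratio, not necessarily the rates used in the original example.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def repetitions(edges, start):
    """Solve the SDF balance equations prd(e)*q[src] = cns(e)*q[snk]
    by propagating rational ratios from a seed actor, then scaling to
    the smallest positive integer vector (assumes a connected,
    consistent graph)."""
    q = {start: Fraction(1)}
    frontier = [start]
    while frontier:
        v = frontier.pop()
        for (src, snk, prd, cns) in edges:
            if src == v and snk not in q:
                q[snk] = q[v] * Fraction(prd, cns)
                frontier.append(snk)
            elif snk == v and src not in q:
                q[src] = q[v] * Fraction(cns, prd)
                frontier.append(src)
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (f.denominator for f in q.values()))
    ints = {v: int(f * scale) for v, f in q.items()}
    norm = reduce(gcd, ints.values())
    return {v: n // norm for v, n in ints.items()}

# Illustrative CD (44.1 kHz) -> DAT (48 kHz) chain, overall 160/147
edges = [('A', 'B', 1, 1), ('B', 'C', 2, 3), ('C', 'D', 4, 7),
         ('D', 'E', 5, 7), ('E', 'F', 4, 1)]
q = repetitions(edges, 'A')
print(q)  # {'A': 147, 'B': 147, 'C': 98, 'D': 56, 'E': 40, 'F': 160}
```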

  19. Scheduling Algorithms
  • Acyclic pairwise grouping of adjacent nodes (APGAN): an adaptable (to different cost functions) and low-complexity heuristic that computes a nested looped schedule of an acyclic graph such that precedence constraints (a topological sort) are preserved throughout the scheduling process.
  • Dynamic programming post optimization (DPPO): dynamic programming over a given actor ordering (any topological sort); variants include GDPPO, CDPPO, and SDPPO.
  • Recursive procedure call (RPC) based MAS: generates MASs for a given R-schedule through recursive graph decomposition; the resulting schedule is polynomially bounded in the graph size.
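A nested looped schedule of the kind APGAN produces can be represented as a small tree of (loop count, body) pairs. The sketch below is my own illustration: the schedule shown is one valid nested looped schedule for the CD-to-DAT example, chosen by hand; it is not claimed to be the actual APGAN or DPPO output.

```python
def fire_counts(schedule):
    """Total firings per actor in a nested looped schedule, given as
    (loop_count, body) where body items are actor names or nested
    (loop_count, body) schedules."""
    count, body = schedule
    totals = {}
    for item in body:
        sub = {item: 1} if isinstance(item, str) else fire_counts(item)
        for actor, n in sub.items():
            totals[actor] = totals.get(actor, 0) + n
    return {a: count * n for a, n in totals.items()}

# One valid nested looped schedule for the CD-to-DAT graph:
# (49 (3A) (3B) (2C)) (8 (7D) (5E) (20F))
sched = (1, [(49, [(3, ['A']), (3, ['B']), (2, ['C'])]),
             (8, [(7, ['D']), (5, ['E']), (20, ['F'])])])
print(fire_counts(sched))
# matches the flat schedule: 147A, 147B, 98C, 56D, 40E, 160F
```

Nested loops trade buffer memory against code size: the flat schedule inlines 648 firings, while the looped form names each actor once.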

  20. Representative Dataflow Analyses and Optimizations
  • Bounded memory and deadlock detection: consistency
  • Buffer minimization: minimize communication cost
  • Multirate loop scheduling: optimize code/data trade-off
  • Parallel scheduling and pipeline configuration
  • Heterogeneous task mapping and co-synthesis
  • Quasi-static scheduling: minimize run-time overhead
  • Probabilistic design: adapt system resources and exploit slack
  • Data partitioning: exploit parallel data memories
  • Vectorization: improve context switching, pipelining
  • Synchronization optimization: self-timed implementation
  • Clustering of actors into atomic scheduling units

  21. Multiprocessor Scheduling
  • The multiprocessor scheduling problem comprises:
    • Actor assignment (mapping)
    • Actor ordering
    • Actor invocation
  • Approaches to each of these tend to be platform specific
  • Tools can be brought under a common formal umbrella

  22. Multiprocessor Scheduling
  [Figure: an application model G(V, E, t(v), C(e)) undergoing mapping/scheduling.]

  23. Multiprocessor Mapping
  [Figure: the application model G(V, E, t(v), C(e)) mapped onto processors P1–P4.]

  24. Invocation Example: Self-Timed (ST) Scheduling
  • Assignment and ordering are performed at compile time; invocation is performed at run time (via synchronization)
  [Figure: application graph with actors A–H assigned to five processors (A on Proc 1; B, C on Proc 3; E, D, H on Proc 4; F, G on Proc 5) and the Gantt chart for the ST schedule, T_ST = 9. Execution times: A, B, F: 3; C, H: 5; D: 6; E: 4; G: 2.]
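Compile-time assignment and ordering, as in self-timed scheduling, can be sketched with a toy earliest-finish-time list scheduler. This is an illustration of the idea only, not the tool described in the deck: it ignores inter-processor communication cost, and the four-actor graph below is made up (it reuses some of the slide's execution times but is not the slide's eight-actor example).

```python
def list_schedule(dag, exec_time, num_procs):
    """Greedy list scheduling sketch: topologically order the actors
    (Kahn's algorithm), then map each actor to the processor that
    gives it the earliest finish time."""
    indeg = {v: 0 for v in exec_time}
    for src, snk in dag:
        indeg[snk] += 1
    ready = [v for v, d in indeg.items() if d == 0]
    order = []
    while ready:
        v = ready.pop(0)
        order.append(v)
        for src, snk in dag:
            if src == v:
                indeg[snk] -= 1
                if indeg[snk] == 0:
                    ready.append(snk)
    proc_free = [0] * num_procs   # time each processor becomes idle
    finish, mapping = {}, {}
    for v in order:
        # actor may start once all predecessors have finished
        data_ready = max((finish[s] for s, t in dag if t == v), default=0)
        p = min(range(num_procs),
                key=lambda i: max(proc_free[i], data_ready))
        start = max(proc_free[p], data_ready)
        finish[v] = start + exec_time[v]
        proc_free[p] = finish[v]
        mapping[v] = p
    return mapping, max(finish.values())

dag = [('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'D')]
exec_time = {'A': 3, 'B': 3, 'C': 5, 'D': 6}
mapping, makespan = list_schedule(dag, exec_time, num_procs=2)
print(mapping, makespan)
```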

  25. Multicore Schedules
  • Traditional multicore scheduling:
    • Convert the application DAG to homogeneous synchronous dataflow (HSDF)
    • Perform HSDF mapping
    • Problem: exponential graph explosion
  • Our solution:
    • Represent the single processor schedule (SPS) as a generalized schedule tree (GST)
    • Generate an equivalent multiprocessor schedule (MPS) represented as a forest of GSTs
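The graph-explosion point is easy to quantify: HSDF expansion creates q[v] copies of each SDF actor, so graph size grows with the repetitions vector (and can be exponential in the size of the SDF representation, since rates are written compactly). Plugging in the CD-to-DAT repetitions vector from earlier in the deck:

```python
# Each SDF actor v becomes q[v] vertices in the HSDF expansion.
# q is the repetitions vector of the CD-to-DAT example.
q = {'A': 147, 'B': 147, 'C': 98, 'D': 56, 'E': 40, 'F': 160}
print(len(q), 'SDF actors ->', sum(q.values()), 'HSDF vertices')
# 6 SDF actors -> 648 HSDF vertices
```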

  26. Traditional Dataflow Multiprocessor Scheduling (MPS)
  [Figure: a multirate SDF graph A → B → C and its homogeneous SDF (HSDF) expansion, in which each actor is replicated once per firing in the periodic schedule and all edge rates become 1.]

  27. GST Representation for MPS: Simple Example
  [Figure: (a) an SDF graph; (b) its SPS represented as a GST; (c) the MPS represented as a forest of GSTs, one per processor (P1, P2, P3).]

  28. Demonstration on GPUs: Start with Parallel Actors
  • Parallelism within an actor (FIR filter)
  • Limitation (IIR filter)
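The FIR/IIR contrast on this slide is about data dependence. Each FIR output depends only on input samples, so a GPU can compute one output per thread; a first-order IIR output depends on the previous output, which serializes the recurrence. A plain-Python sketch of my own (not the group's CUDA code) makes the dependence structure visible:

```python
def fir(x, taps):
    """FIR filter: every output sample depends only on input samples,
    so each output could be computed by an independent GPU thread."""
    return [sum(taps[k] * x[i - k]
                for k in range(len(taps)) if i - k >= 0)
            for i in range(len(x))]

def iir1(x, a):
    """First-order IIR filter: y[i] depends on y[i-1], a feedback
    loop that serializes the computation and limits GPU parallelism."""
    y, prev = [], 0.0
    for xi in x:
        prev = xi + a * prev
        y.append(prev)
    return y

print(fir([1, 0, 0, 0], [1, 2, 3]))  # impulse response: [1, 2, 3, 0]
print(iir1([1, 0, 0], 0.5))          # [1.0, 0.5, 0.25]
```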

  29. Individual Actor Results: CUDA FIR vs. Stock GNU Radio FIR

  30. Individual Actor Results: Turbo Block Decode

  31. Future Direction: Tackling the General MP Scheduling Problem with Dataflow Analysis
  • Many dataflow analysis techniques become available once the problem is well defined in dataflow terms
  • Maximize multicore utilization by replicating and fusing actors/blocks, considering:
    • Stateless vs. stateful actors
    • Computation-to-communication ratios
    • Firing rates/execution times relative to the number of blocks
  • Once the application is mapped to blocks/processors, apply single processor scheduling to minimize buffering

  32. Focus First on MP Scheduling for GPUs
  • Blocks
  • Threads
  • Memory

  33. Refine to a Simpler Question: When to Off-load onto a GPU?
  • Given:
    • An application graph (the figure shows actors 1–9)
    • Actor timing characteristics for communication and computation
    • A target architecture with heterogeneous multiprocessing (CPU + GPU)
  • Find the optimal implementation for:
    • Latency
    • Throughput
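The simplest form of the off-load question weighs GPU kernel time plus host-device transfer time against CPU time. The sketch below is a toy decision rule of my own to make that trade-off concrete; the function and parameter names are illustrative assumptions, not part of the DSPCAD tools.

```python
def place_actor(t_cpu, t_gpu, t_to_gpu, t_from_gpu):
    """Toy off-load rule: run an actor on the GPU only when kernel
    time plus host<->device transfer time beats the CPU time.
    (Illustrative; a real scheduler would also weigh throughput,
    overlap of transfers with compute, and neighboring placements.)"""
    gpu_total = t_to_gpu + t_gpu + t_from_gpu
    return ('GPU', gpu_total) if gpu_total < t_cpu else ('CPU', t_cpu)

# Heavy actor: transfers are amortized, so the GPU wins.
print(place_actor(t_cpu=10.0, t_gpu=2.0, t_to_gpu=1.5, t_from_gpu=1.5))
# Light actor: transfer overhead dominates, so it stays on the CPU.
print(place_actor(t_cpu=4.0, t_gpu=2.0, t_to_gpu=1.5, t_from_gpu=1.5))
```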

  34. Summary
  • Model Based Design
  • Dataflow Interchange Format
  • Multiprocessor Scheduling
  • Preliminary Setup and Results with GPUs
  • Future Directions
