
CAL Actor Language

Notes on an actor language. Jörn W. Janneck, Xilinx Inc. 13 February 2007, 7th Ptolemy Miniconference. Outline: CAL @ Ptolemy (the language, domain-dependent interpretation); CAL @ Xilinx (overview, application). CAL Actor Language: scripting actor specifications; make it easier to write atomic actors.


Presentation Transcript


  1. Notes on an actor language. Jörn W. Janneck, Xilinx Inc. 13 February 2007, 7th Ptolemy Miniconference.

  2. Outline: CAL @ Ptolemy • the language • domain-dependent interpretation; CAL @ Xilinx • overview • application
     CAL Actor Language • scripting actor specifications • make it easier to write atomic actors • experimenting with domain polymorphism • (code generation)

  3. Actors in CAL
     Actions: guarded atomic actions.
     State: encapsulated state.

  4. Simple actors

         actor Sum () Input ==> Output:
           sum := 0;
           action [a] ==> [sum]
           do
             sum := sum + a;
           end
         end

         actor SumAbs () Input ==> Output:
           sum := 0;
           action [a] ==> [sum]
           guard a >= 0
           do
             sum := sum + a;
           end
           action [a] ==> [sum]
           guard a < 0
           do
             sum := sum - a;
           end
         end

     (Figure: Sum and SumAbs blocks, each with one Input port and one Output port.)
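The SumAbs actor on this slide combines encapsulated state with guarded actions. As a rough illustration only, the following Python sketch mimics its firing behavior; the class name, the `fire` method, and the attribute `total` are choices made for this sketch and are not part of CAL or any CAL tooling.

```python
class SumAbs:
    """Illustrative Python model of the slide's SumAbs actor (not CAL tooling)."""

    def __init__(self):
        # Plays the role of the actor's state variable `sum`.
        self.total = 0

    def fire(self, a):
        """Consume one input token, update the state, return the output token."""
        # The two guards (a >= 0 and a < 0) are mutually exclusive,
        # so exactly one action is enabled for any numeric token.
        if a >= 0:
            self.total += a
        else:
            self.total -= a
        return self.total


actor = SumAbs()
outputs = [actor.fire(a) for a in [3, -4, 0, -1]]
print(outputs)  # running sum of absolute values: [3, 7, 7, 8]
```

Because the guards partition the input values, the sketch needs no scheduling choice; each token determines which action fires.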

  5. Nondeterminism

         actor NDMerge () Input1, Input2 ==> Output:
           action Input1: [x] ==> [x] end
           action Input2: [x] ==> [x] end
         end

     (Figure: NDMerge block with inputs Input1 and Input2 and output Output.)
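To make the nondeterminism of NDMerge concrete, here is a small Python sketch. It assumes a scheduler that picks arbitrarily among the actions whose input currently holds a token; the function name `nd_merge` and the use of a seeded random generator to stand in for that arbitrary choice are assumptions of this sketch.

```python
import random
from collections import deque


def nd_merge(input1, input2, rng):
    """Merge two token sequences; when both inputs hold a token, either action may fire."""
    q1, q2 = deque(input1), deque(input2)
    out = []
    while q1 or q2:
        # An action is enabled when its input port has a token available.
        enabled = [q for q in (q1, q2) if q]
        # The free choice among enabled actions is the source of nondeterminism.
        chosen = rng.choice(enabled)
        out.append(chosen.popleft())
    return out


merged = nd_merge([1, 2, 3], ["a", "b"], random.Random(0))
print(merged)
```

Different seeds may interleave the two streams differently, but the relative order of tokens from each input is always preserved.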

  6. Data-dependent token flow

         actor Select () S, A, B ==> Output:
           action S: [sel], A: [v] ==> [v]
           guard sel end
           action S: [sel], B: [v] ==> [v]
           guard not sel end
         end

     (Figure: Select block with inputs S, A, and B and output Output.)
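The Select actor consumes from A or from B depending on the value of the control token on S. The Python sketch below is one way to model that; the function name `select` and the decision to stop when the required data token is missing are assumptions of this sketch.

```python
from collections import deque


def select(s, a, b):
    """Route tokens from A or B to the output, steered by control tokens on S."""
    qs, qa, qb = deque(s), deque(a), deque(b)
    out = []
    while qs:
        sel = qs[0]  # the guard inspects the control token before consuming it
        data = qa if sel else qb
        if not data:
            break  # no action is enabled: the selected input has no token
        qs.popleft()
        out.append(data.popleft())
    return out, list(qa), list(qb)


out, left_a, left_b = select([True, False, False, True], [1, 2, 3], [10, 20])
print(out, left_a, left_b)  # [1, 10, 20, 2] [3] []
```

The number of tokens consumed from A and B depends on the data arriving on S, which is why this actor has no fixed per-port token rates.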

  7. CAL and domain polymorphism
     • two fundamental questions:
       • Can an actor be interpreted/used in a given MoC?
       • What is its interpretation?
     • domain-specific interpretation

  8. Example: SDF

         actor Add () Input1, Input2 ==> Output:
           action [a], [b] ==> [a + b] end
         end

     (Figure: Add block with rate 1 on Input1, rate 1 on Input2, and rate 1 on Output.)

         actor AddSeq () Input ==> Output:
           action [a, b] ==> [a + b] end
         end

     (Figure: AddSeq block with rate 2 on Input and rate 1 on Output.)
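The rate annotations on this slide follow from the port patterns of the single action in each actor. As a hedged illustration, the Python sketch below fires both actors with those fixed rates; the function names and the list-based queues are assumptions of this sketch.

```python
def fire_add(in1, in2):
    """One firing of Add: consume 1 token from each input, produce 1 token."""
    a, b = in1.pop(0), in2.pop(0)
    return [a + b]


def fire_add_seq(inp):
    """One firing of AddSeq: consume 2 tokens from the single input, produce 1 token."""
    a, b = inp.pop(0), inp.pop(0)
    return [a + b]


in1, in2 = [1, 2, 3], [10, 20, 30]
add_out = []
while in1 and in2:  # Add needs one token on each input per firing
    add_out += fire_add(in1, in2)

seq_in = [1, 2, 3, 4, 5, 6]
seq_out = []
while len(seq_in) >= 2:  # AddSeq needs two tokens per firing
    seq_out += fire_add_seq(seq_in)

print(add_out)  # [11, 22, 33]
print(seq_out)  # [3, 7, 11]
```

Both actors compute a sum of two tokens, but they differ in where the tokens come from, and therefore in their rate signatures: (1, 1) to 1 for Add and 2 to 1 for AddSeq.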

  9. Example: SDF (cont’d)

         actor NDMerge () Input1, Input2 ==> Output:
           action Input1: [x] ==> [x] end
           action Input2: [x] ==> [x] end
         end

     (Figure: NDMerge block with inputs Input1 and Input2 and output Output.)

         actor Merge () Input1, Input2 ==> Output:
           action [x1], [x2] ==> [x1, x2] end
         end

     (Figure: Merge block with rate 1 on Input1, rate 1 on Input2, and rate 2 on Output.)
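One simple way to contrast NDMerge and Merge is to compare the token rates of their actions. The Python sketch below uses a deliberately simplified criterion, assumed here for illustration and not taken from the talk: an actor gets fixed SDF rates if all of its actions consume and produce the same number of tokens on the same ports. Guards are ignored, and the `(consume, produce)` dictionary encoding is hypothetical.

```python
def sdf_rates(actions):
    """Return the common (consume, produce) rates, or None if the actions disagree."""
    signatures = {
        (tuple(sorted(consume.items())), tuple(sorted(produce.items())))
        for consume, produce in actions
    }
    if len(signatures) != 1:
        return None  # actions differ in their token rates
    consume, produce = next(iter(signatures))
    return dict(consume), dict(produce)


# Each action is written as (tokens consumed per port, tokens produced per port).
nd_merge = [
    ({"Input1": 1}, {"Output": 1}),
    ({"Input2": 1}, {"Output": 1}),
]
merge = [({"Input1": 1, "Input2": 1}, {"Output": 2})]

nd_merge_rates = sdf_rates(nd_merge)
merge_rates = sdf_rates(merge)
print(nd_merge_rates)  # None
print(merge_rates)     # ({'Input1': 1, 'Input2': 1}, {'Output': 2})
```

Under this criterion Merge has the rates shown in the figure, while NDMerge has none because its two actions read from different ports.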

  10. Some kind of “synchronous”...
      (Figure: rate-annotated dataflow diagrams showing Merge, F, NDMerge, and A.)

  11. Example: CSP

          actor NDMerge () Input1, Input2 ==> Output:
            action Input1: [x] ==> [x] end
            action Input2: [x] ==> [x] end
          end

      CSP interpretation:

          [ Input1 ? x -> Output ! x
         || Input2 ? x -> Output ! x ]

          actor Add () Input1, Input2 ==> Output:
            action [a], [b] ==> [a + b] end
          end

      CSP interpretations:

          Input1 ? a -> Input2 ? b -> Output ! a + b

          [ Input1 ? a -> Input2 ? b
         || Input2 ? b -> Input1 ? a ] ; Output ! a + b

  12. Example: CSP (cont’d)

          actor Select () S, A, B ==> Output:
            action S: [sel], A: [v] ==> [v]
            guard sel end
            action S: [sel], B: [v] ==> [v]
            guard not sel end
          end

      CSP interpretation:

          S ? sel;
          [ sel -> A ? v -> Output ! v
         || not sel -> B ? v -> Output ! v ]

          actor A () X, Y ==> Z:
            action X: [x1, x2] ==> [f(x1, x2)]
            guard P(x1, x2) end
            action Y: [y1, y2] ==> [f(y1, y2)]
            guard P(y1, y2) end
          end

      CSP interpretation: ?

  13. CAL and dataflow at Xilinx
      • new FPGA programming model & tools
        • hardware code generation
        • software (& mixed) code generation
      • driver application
        • MPEG4 Simple Profile Decoder
      • MPEG standardization effort
        • ISO/IEC 23001-4 (working draft): Codec Configuration Representation
        • ISO/IEC 23002-4 (working draft): Video Tool Library
      (Figure labels: actor source + network; simulation; software: class MyActor { schedule(); readPort( portNum ); writePort( portNum ); }; hardware: high-level synthesis.)

  14. FPGA Programming In Practice: Networked MPEG-4 Viewer
      (Figure labels: Remote Video Stream Server; UDP over Ethernet; XUP Board (2VP30); FPGA; Microblaze running LWIP protocol stack; Ethernet UDP; Memory Controller; Decoder Actor Network; Raster Scan Actor; VGA Display IP; Local VGA Monitor.)

  15. MPEG-4 SP Decoder: quality of compiled code
      Notes:
      1 http://www.xilinx.com/bvdocs/ipcenter/data_sheet/ds520_prod_brf.pdf
      2 BRAM-limited to 4-CIF image size.
      3 Supports HD image size. Reduces to 16 BRAMs for 4-CIF image size.

  16. Comparing decoder solutions
      (Figure: relative area efficiency, axis ticks 1, 2, 5, 10, plotted against throughput in macroblocks/sec x1000, axis ticks 10, 100, 1000, with CIF, SD, and HD marked.)
      a: TI64xx MPEG-4 (CPU + L1 cache only)
      b: ISSCC’06 H.264 capable (includes periphery)
      c: FPGA MPEG-4 using traditional HDL flow (12 MM effort)
      d: FPGA MPEG-4 using actor/dataflow synthesis (3 MM effort)

  17. Thank You. Credits: Dave B. Parlour, Ian D. Miller, Johan Eker, Edward A. Lee, and many others. CAL actor language: embedded.eecs.berkeley.edu/caltrop

  18. BACKUP

  19. Programming language adoption

      Name        TPCI     TPCI cum.  Year
      C           17.66%   17.66%     1973
      C++         11.06%   28.73%     1985
      Perl         5.48%   34.20%     1987
      Python       3.47%   37.67%     1990
      VB           9.73%   47.40%     1991
      Delphi       2.15%   49.54%     1994
      Java        21.17%   70.72%     1995
      PHP          9.86%   80.58%     1995
      JavaScript   2.20%   82.78%     1995
      C#           3.07%   85.85%     2002

      (Figure: cumulative TPCI by language creation date for the top 10 languages, 1970 to 2005.)
      Source: TIOBE Programming Community Index, TPCI, October 2006, http://www.tiobe.com/tpci.htm

  20. Smaller, Faster, Easier: Too good to be true?
      • This is what happens when design effort is constrained.
      • The key is enabling architectural exploration with rapid turn-around time.
      • New decoder architecture incorporates many improvements over original design in motion compensation, AC/DC reconstruction, parser, 2-d IDCT.
      • Approximate manpower numbers:
        • VHDL decoder: 12 months
        • Dataflow decoder: 3 months

  21. Architectural Exploration: MPEG4 Motion Compensator
      PROBLEM! Memory latency for random access reads and writes prevents real-world operation at HD rates.
      (Figure labels: video stream; feedback; video frame buffer (off-chip DRAM).)

  22. First Step: Try on-chip cache
      • Break the address and data streams, insert a cache placeholder.
      • Insert different policies, see what happens.
      policy1: Pass-through, just to make sure model is OK.
      policy2: Insert a cache actor in the read path and monitor statistics.

  23. Simulation result with policy2
      • Memory controller performance: 133 MHz clock, 32 pixel cache line fill in ~18 cycles
      • Worst case compensation is 81 reads for an 8x8 block.
      • 8.3% miss rate implies average read is ~2.4 cycles
      • Rate limit is 44 Mpixel/s
      • HD (1920p, 4:2:0, 30fps) rate target is 93.3 Mpixel/s
      • Options for improvement:
        - more expensive controller
        - much better cache policy
        - application-aware prefetch

      Monitor console:

          Frame 1 OK time: 28111ms
          Frame 2 OK time: 23834ms
          Requests: 49456, Hits: 45360
          Miss rate: 8.28%
          Frame 3 OK time: 27369ms
          Requests: 98704, Hits: 90512
          Miss rate: 8.30%
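The figures on this slide can be reproduced with a short calculation. The Python sketch below is one reading that happens to match the stated numbers; it assumes that a cache hit costs 1 cycle, that a miss costs the full 18-cycle line fill, that the worst case of 81 reads per 8x8 block applies throughout, and that the HD frame is 1920x1080 with 1.5 samples per pixel for 4:2:0. None of these assumptions is stated on the slide.

```python
clock_hz = 133e6           # memory controller clock from the slide
line_fill_cycles = 18      # cost of a miss (assumed: the full line fill)
hit_cycles = 1             # cost of a hit (assumption, not on the slide)
miss_rate = 0.083          # from the slide
reads_per_block = 81       # worst case for an 8x8 block
pixels_per_block = 8 * 8

# Expected cycles per read, weighting hits and misses.
avg_read_cycles = (1 - miss_rate) * hit_cycles + miss_rate * line_fill_cycles

# Worst-case reads needed per output pixel.
reads_per_pixel = reads_per_block / pixels_per_block

# Pixels per second the read path can sustain.
rate_limit = clock_hz / (avg_read_cycles * reads_per_pixel)

# HD target: assumed 1920x1080 at 30 fps, 4:2:0 adds half again for chroma.
hd_target = 1920 * 1080 * 30 * 1.5

# Miss rates recomputed from the monitor console counters.
miss_frame2 = 1 - 45360 / 49456
miss_frame3 = 1 - 90512 / 98704

print(f"average read: {avg_read_cycles:.2f} cycles")
print(f"rate limit:   {rate_limit / 1e6:.1f} Mpixel/s")
print(f"HD target:    {hd_target / 1e6:.1f} Mpixel/s")
print(f"miss rates:   {miss_frame2:.2%}, {miss_frame3:.2%}")
```

Under these assumptions the sustainable rate is less than half of the HD target, which is consistent with the slide's conclusion that a different approach is needed.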

  24. Step 2: Application-aware prefetch
      • replace cache with “search window”
      • compensation addresses now relative to search window
      • search window senses block type
      (Figure labels: prefetch requests to frame buffer; prefetch data.)

  25. Results of prefetch strategy
      • Better performance
        • prefetch needs to operate at 3x pixel rate
        • exploits longer burst read with application-awareness (longer cache line did not help policy2 significantly)
        • 64 pixels in 26 cycles → average read is ~0.4 cycles
        • peak theoretical performance is 111 Mpixel/s
        • exceeds HD rate target with cheap DRAM
      • Substantial change to overall model behavior, but
        • impact limited to two actors
        • no refactoring of control in other actors needed
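The prefetch numbers can be checked in the same spirit. The Python sketch below is a guess at the derivation, not something the slide spells out: it assumes the 133 MHz controller clock from the earlier simulation slide, and it assumes the peak figure comes from dividing that clock by three times the rounded 0.4 cycles per read, since the prefetch must run at 3x the pixel rate.

```python
clock_hz = 133e6        # memory controller clock (from the policy2 slide)
burst_pixels = 64       # pixels fetched per burst
burst_cycles = 26       # cycles per burst
prefetch_factor = 3     # prefetch must run at 3x the pixel rate

# Exact average cost of fetching one pixel.
avg_read_cycles = burst_cycles / burst_pixels

# The slide rounds this to ~0.4; the peak figure matches the rounded value.
rounded_read_cycles = round(avg_read_cycles, 1)
peak_rate = clock_hz / (rounded_read_cycles * prefetch_factor)

hd_target = 93.3e6      # HD rate target from the policy2 slide

print(f"average read: {avg_read_cycles:.3f} cycles")
print(f"peak rate:    {peak_rate / 1e6:.1f} Mpixel/s")
print(f"exceeds HD:   {peak_rate > hd_target}")
```

With the unrounded 0.40625 cycles the same formula gives roughly 109 Mpixel/s, so the result still exceeds the HD target either way.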

  26. The FPGA programming problem
      • Big, heterogeneous chips
      • circuit-design programming (+ C, Simulink, ...)
      • 1985:
        • 128 4-LUTs
      • 2006: [V5-LX]
        • 207360 6-LUTs
        • 10Mbit BRAM
        • 192 ALUs
