
Raw Fabrics for PCA Status and Plans

Anant Agarwal, Saman Amarasinghe


Presentation Transcript


  1. Raw Fabrics for PCA: Status and Plans • Anant Agarwal • Saman Amarasinghe

  2. Agenda • 09:00 – 10:00 Raw Fabrics Status and Plans Agarwal • 10:00 – 10:30 Streams and software systems Amarasinghe • 10:30 – 11:00 Morphware update Thies • 11:00 – 11:20 Operating system update Strumpen • 11:20 – 11:50 Lab visit and demos All • 11:50 – 12:20 Applications Crago • 12:20 – 12:35 Stream Algorithms Hoffman • 12:35 – 12:50 x86 on Raw Wentzlaff • 12:50 – 1:30 Discussion All • 12:00 Lunch

  3. Raw Architecture • A 16-tile 2-D fabric (1K tiles in 2010) • Memory is distributed • RISC-like core in each tile, with FP • Fast, programmable interconnect (r-r 3 cycles) • ~1100 off-chip data I/O [Figure: the Raw chip and a Raw tile (IMEM, DMEM, PC, REG, FPU, ALU, SMEM, PC, SWITCH), with packet stream, disk stream, video, and SDRAM I/O] Taylor et al., IEEE Micro ‘02, ISSCC ‘03

  4. Raw Handheld

  5. Well… Raw Handheld • First program, 80 MHz: Jan 03 • “Thorough testing”, 300 MHz: May 03

  6. 1024 Channel Audio Beam Forming Microphone Array • First proposed at review in Nov 02 • One PCA chip beats current 640 channel custom hardware beamformer! [Figure: block diagram with Audio A-to-D, FPGA, ADAT Optical, Audio Interface for RAW, and RAW blocks]

  7. A 1024-Node Acoustic Beamformer [Figure: 1024 microphones sampled at 16 KHz, 24 bits; A-to-D, CPLD, FPGA, and RAW blocks; link rates labeled 768 Kbits/sec, 12 Mbits/sec, and 384 Mbits/sec]
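The link rates on this slide can be cross-checked from the sampling parameters it gives (16 KHz, 24 bits). The sketch below is only an arithmetic check; it assumes raw, uncompressed samples with no framing overhead, and the grouping into 2, 32, and 1024 microphones is inferred from the card, column, and array sizes named on the neighboring slides.

```python
# Arithmetic check of the beamformer link rates (assumes raw samples, no framing overhead).
SAMPLE_RATE_HZ = 16_000   # from the slide: 16 KHz
BITS_PER_SAMPLE = 24      # from the slide: 24 bits


def stream_rate_bits_per_sec(num_microphones):
    """Aggregate bit rate for a group of microphones sampled in parallel."""
    return num_microphones * SAMPLE_RATE_HZ * BITS_PER_SAMPLE


# Group sizes are an assumption: single microphone, 2-microphone card,
# 32-microphone column, and the full 1024-microphone array.
rates = {n: stream_rate_bits_per_sec(n) for n in (1, 2, 32, 1024)}

for n, bps in rates.items():
    print(f"{n:5d} microphone(s): {bps / 1e6:.3f} Mbit/s")
```

Two microphones give 768 Kbit/s and 32 give about 12.3 Mbit/s, which line up with the slide's 768 Kbits/sec and 12 Mbits/sec labels. The full array comes to about 393 Mbit/s in decimal units; the slide's 384 Mbits/sec appears to treat one factor of 1024 as "kilo".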

  8. 2-Microphone Card

  9. 32-Microphone Column

  10. Raw Chip Specifications • IBM SA27E Process: 0.15 µm, 6-metal copper ASIC process • 16 Tile RAW Processor: 18.17mm x 18.17mm; 1657 pin CCGA package; 1152 signal pins • Clock and Power: 420MHz (actual); 10 watts (power save turned on); 18 watts typical; 35 watts if everything is used!

  11. “PowerPoint” Performance • Raw Chip (@420MHz) • ~7 GOPS/GFLOPS (SP) • ~100 GBytes/s of on-chip memory bandwidth • ~90 GBytes/s of on-chip “bisection bandwidth” • ~40 GBytes/s I/O bandwidth No bugs so far!
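The ~7 GOPS figure is consistent with a simple peak-rate estimate. The sketch below assumes each of the 16 tiles issues one operation per cycle at 420 MHz; the one-op-per-cycle figure is an assumption made for this check and is not stated on the slide.

```python
# Peak-rate estimate for the Raw chip (a sketch, not a measured number).
TILES = 16                    # from the slides: 16-tile fabric
CLOCK_HZ = 420e6              # from the slides: 420 MHz
OPS_PER_TILE_PER_CYCLE = 1    # assumption: one operation issued per tile per cycle

peak_gops = TILES * CLOCK_HZ * OPS_PER_TILE_PER_CYCLE / 1e9
print(f"Peak rate: {peak_gops:.2f} GOPS")
```

Under that assumption the peak works out to 6.72 GOPS, which rounds to the ~7 GOPS quoted above.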

  12. Progress on the Raw Chip • Complete Spec Feb ‘00 • IBM Initial Design Review Mar ‘00 • Feature complete Netlist May ‘00 • Arch. Timing optimization Feb ‘01 • Floorplanning Mar ‘01 • Prelim Placement/Timing opt Jun ‘01 • Raw H21 system board (ISI) Jun ‘01 • Raw in Emulation Jun ‘01 • Detailed Placement/Timing opt Dec ‘01 • Release to IBM for initial layout Dec ‘01 • Timing closure after layout Mar ‘02 • All backend checks pass May ‘02 • Release to IBM for production layout May ‘02 • Final function and timing validation Jul ‘02 • Final manuf. release to IBM Aug ‘02 • Chip prototypes back Oct ’02

  13. PCA Phase 2 Effort

  14. PCA Raw Fabrics, Systems, Apps • Raw Chips Oct 02 • Handheld (H) board arrives from ISI Dec 02 • H Board bringup – Small program 80 MHz Jan 03 • H Board testing, speed gasket 300 MHz May 03 • USB Interface, 500 Mbits/s xface July 03 • H Board refab (in fab now), to partners Sep 03 • Fabric-Array and Fabric-IO board design Jun 03 • Fabric-Array and Fabric-IO board fab Sep 03 • 16 and 64-chip PCA fabric bringup • Applications and experiments • PCA demonstrations • Embedded networking board • Audio beamformer system • 802.11b,g,a wireless system • Graphics system • Virtual x86

  15. Partner Support Activity • Handheld boards Sep 03 • USB xface • PCI xface • “Raw User Day” videos and documentation • Expansion interface testing and documentation (used in beamformer) • Software distribution • Simulator (useful to debug small assembly programs) • C compiler • rGDB debugger • StreamIt language and compiler • Lots of other goodies • 1024-tile (64 chip) fabric simulator (since Dec 02) • 16, 64 node Fabrics

  16. 64-Node Raw Fabric [Figure: Raw fabric with Network I/O and DRAM blocks attached]

  17. Fabric System Architecture • Design: two distinct board designs; HOW??? • replicate and connect • Board 1: Quad Raw Board • Board 2: I/O & Memory Board

  18. The Challenge • How do we use the same board designs for every position in the fabric? Fabric board is easy enough.

  19. The Challenge • How do we use the same board designs for every position in the fabric? E.g., I/O board

  20. The Saman Flip • How do we use the same board designs for every position in the fabric? • IO Board • symmetric about x-axis • compensate for board flip in firmware

  21. Quad Board • 4 RAW chips per board • 16 152-pin MICTOR connectors total (4 per side) • Power distributed over separate cables from other signals • MICTOR connectors are stacked to save space
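The board counts for the 16-chip and 64-chip fabrics follow directly from the numbers on this slide and the 16-tile chip described earlier. The sketch below is a hypothetical helper (`fabric_size` is not part of any Raw tool) that only does the bookkeeping; the connector pin total counts pin positions, not usable signals.

```python
# Bookkeeping sketch for fabric sizes built from quad boards.
CHIPS_PER_QUAD_BOARD = 4      # from the slide
TILES_PER_CHIP = 16           # from the Raw chip slides
CONNECTORS_PER_BOARD = 16     # from the slide: 16 MICTOR connectors, 4 per side
PINS_PER_CONNECTOR = 152      # from the slide


def fabric_size(num_chips):
    """Return quad-board and tile counts for a fabric of num_chips Raw chips."""
    boards, remainder = divmod(num_chips, CHIPS_PER_QUAD_BOARD)
    if remainder:
        raise ValueError("fabric must be a whole number of quad boards")
    return {
        "quad_boards": boards,
        "tiles": num_chips * TILES_PER_CHIP,
        "connector_pins_per_board": CONNECTORS_PER_BOARD * PINS_PER_CONNECTOR,
    }


small = fabric_size(16)
large = fabric_size(64)
print(small)
print(large)
```

A 64-chip fabric therefore needs 16 quad boards and contains 1024 tiles, matching the "1024-tile (64 chip)" figure quoted for the fabric simulator.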

  22. Quad Board Layout [Figure: board layout, 11” x 11”]

  23. I/O & Memory Board • 4 FPGAs • 2 64-bit PCI slots • 2 Expansion Ports (same as on Raw Handheld board) • 4 SDRAM banks • symmetric design [Figure: board outline, 11” dimension label]

  24. IO/Memory Board schematic

  25. Power Distribution • 48V distributed to all boards, then down-converted • DC-DC converters on each board • 1.8V Raw core • 1.5V Raw I/O • 3V other logic • 1.5V is also further down converted to 0.75V supply for HSTL termination • System-wide power supply can be up to 3kW At 1.8V, 64 Raw chips can draw 1280 amps!!!!!!!!!!!
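The 1280 A figure can be tied back to the per-chip power quoted earlier. The sketch below divides the stated total core current across 64 chips and converts to watts at 1.8 V; it assumes the current is shared equally and ignores converter losses and the other supply rails.

```python
# Power budget check for the 64-chip fabric (core rail only, equal sharing assumed).
CORE_VOLTAGE_V = 1.8            # from the slide
TOTAL_CORE_CURRENT_A = 1280     # from the slide
NUM_CHIPS = 64                  # from the slide
SYSTEM_SUPPLY_LIMIT_W = 3000    # from the slide: up to 3 kW

amps_per_chip = TOTAL_CORE_CURRENT_A / NUM_CHIPS
watts_per_chip = amps_per_chip * CORE_VOLTAGE_V
total_core_watts = TOTAL_CORE_CURRENT_A * CORE_VOLTAGE_V

print(f"Per chip: {amps_per_chip:.0f} A, {watts_per_chip:.0f} W")
print(f"All cores: {total_core_watts:.0f} W of {SYSTEM_SUPPLY_LIMIT_W} W available")
```

That is 20 A and 36 W per chip, close to the "35 watts if everything is used" worst case in the chip specifications, and about 2.3 kW for all cores, which fits under the 3 kW system-wide supply.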

  26. Power Distribution • Distributed over special connectors, separately from signals • External power supply feeds top and bottom rows of I/O Boards [Figure label: power supply]

  27. Clock Distribution • Signal generated and distributed from a center board over MICTOR connectors • Uses DLLs to deskew the clock at each connection • Every quad board exchanges a copy of the clock with its neighbors; DIP switches select which of the input clocks to use [Figure label: clock generator]

  28. Clock Distribution from external input • Synchronized clocks for all Raw chips in fabric • Delay-Locked Loop uses feedback to tune delay line for clock synchronization • DIP switches keep clock distribution general, so no custom firmware is needed [Figure label: DLL]
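The feedback idea behind a delay-locked loop can be illustrated with a toy behavioral model. The sketch below is a generic bang-bang loop, not the circuit used on these boards: it assumes a fixed distribution skew smaller than one clock period and steps an adjustable delay until skew plus delay equals one full period, so the delayed edge lines up with the next reference edge. The function name, step size, and skew value are all hypothetical.

```python
# Toy behavioral model of a delay-locked loop (generic technique, hypothetical values).
def lock_dll(period_ps, fixed_skew_ps, step_ps=10, max_iters=10_000):
    """Step an adjustable delay until fixed skew + delay is within one step of a period."""
    delay_ps = 0
    for _ in range(max_iters):
        # Phase detector: positive error means the delayed edge arrives early.
        error_ps = period_ps - (fixed_skew_ps + delay_ps)
        if abs(error_ps) < step_ps:
            return delay_ps  # locked to within one delay step
        # Bang-bang update: lengthen the delay line if early, shorten it if late.
        delay_ps += step_ps if error_ps > 0 else -step_ps
        delay_ps = max(delay_ps, 0)  # a delay line cannot have negative delay
    raise RuntimeError("DLL failed to lock")


PERIOD_PS = 2381   # roughly one period at 420 MHz
SKEW_PS = 730      # hypothetical board-to-board skew
tuned_delay_ps = lock_dll(PERIOD_PS, SKEW_PS)
print(f"Tuned delay: {tuned_delay_ps} ps, residual error: "
      f"{PERIOD_PS - (SKEW_PS + tuned_delay_ps)} ps")
```

Because every board runs the same loop against whichever input clock it selects, the same hardware can be used at any position in the fabric, which is the point the slide makes about avoiding custom firmware.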

  29. Reset Distribution • Signal generated by one of the I/O boards and distributed over MICTOR connectors [Figure label: reset originates here]

  30. PCA Raw Fabrics, Systems, Apps • Raw Chips Oct 02 • Handheld (H) board arrives from ISI Dec 02 • H Board bringup – Small program 80 MHz Jan 03 • H Board testing, speed gasket 300 MHz May 03 • USB Interface, 500 Mbits/s xface July 03 • H Board refab (in fab now), to partners Sep 03 • Fabric-Array and Fabric-IO board design Jun 03 • Fabric-Array and Fabric-IO board fab Sep 03 • 16 and 64-chip PCA fabric bringup • Applications and experiments • PCA demonstrations • Embedded networking board • Audio beamformer system • 802.11b,g,a wireless system • Graphics system • Virtual x86
