Raw Fabrics for PCA: Status and Plans
Anant Agarwal, Saman Amarasinghe
Agenda
• 09:00 – 10:00 Raw Fabrics Status and Plans (Agarwal)
• 10:00 – 10:30 Streams and software systems (Amarasinghe)
• 10:30 – 11:00 Morphware update (Thies)
• 11:00 – 11:20 Operating system update (Strumpen)
• 11:20 – 11:50 Lab visit and demos (All)
• 11:50 – 12:20 Applications (Crago)
• 12:20 – 12:35 Stream Algorithms (Hoffman)
• 12:35 – 12:50 x86 on Raw (Wentzlaff)
• 12:50 – 1:30 Discussion (All)
• 12:00 Lunch
Raw Architecture
• A 16-tile 2-D fabric (1K tiles in 2010)
• Memory is distributed
• RISC-like core in each tile, with FP
• Fast, programmable interconnect (r-r 3 cycles)
• ~1100 off-chip data I/O
[Figure: the Raw chip and a Raw tile. Tile labels: IMEM, DMEM, PC, REG, FPU, ALU, SMEM, PC, SWITCH. I/O labels: packet stream, disk stream, Video1, SDRAM.]
Taylor et al., IEEE Micro ’02, ISSCC ’03
Well… Raw Handheld
• First program, 80 MHz (Jan 03)
• “Thorough testing”, 300 MHz (May 03)
1024-Channel Audio Beam Forming Microphone Array
• First proposed at review in Nov 02
• One PCA chip beats current 640-channel custom hardware beamformer!
[Figure: “Audio Interface For RAW” block diagram. Labels: Audio A-to-D, ADAT Optical, FPGA, RAW.]
A 1024-Node Acoustic Beamformer
[Figure: data path from 1024 microphones (16 kHz, 24 bits) to RAW. Block labels: A-to-D, FPGA, CPLD, RAW. Fan-in labels: 32, 16. Link rates: 768 Kbits/sec, 12 Mbits/sec, 384 Mbits/sec.]
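The link rates on this slide follow from the sample format. A minimal sketch of that arithmetic is below. The sample rate, sample width, and microphone count come from the slide; the fan-in at each stage (2, then 16, then 32) is an assumption inferred from the labels and rates, and the constant and function names are illustrative only.

```python
# Figures read directly off the slide.
SAMPLE_RATE_HZ = 16_000
BITS_PER_SAMPLE = 24
TOTAL_MICROPHONES = 1024

# ASSUMPTION: fan-in at each stage, inferred from the "16" and "32" labels
# and from the three link rates; the slide does not spell this out.
MICS_PER_LINK = 2
LINKS_PER_GROUP = 16
GROUPS = 32


def link_rates_bps():
    """Return the payload-only bit rate at each stage, in bits per second."""
    per_mic = SAMPLE_RATE_HZ * BITS_PER_SAMPLE
    per_link = per_mic * MICS_PER_LINK
    per_group = per_link * LINKS_PER_GROUP
    into_raw = per_group * GROUPS
    return {
        "microphone": per_mic,
        "link": per_link,
        "group": per_group,
        "into_raw": into_raw,
    }


rates = link_rates_bps()
for stage, bps in rates.items():
    print(f"{stage:>10}: {bps / 1e6:9.3f} Mbit/s")
```

Under these assumptions the per-link figure is exactly 768 kbit/s, the per-group figure is 12.288 Mbit/s, and the total is about 393 Mbit/s. The slide's 12 and 384 Mbits/sec are consistent with rounding 12.288 down to 12 and multiplying by 32.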
Raw Chip Specifications
• IBM SA27E Process
  • 0.15 µm, 6-metal copper ASIC process
• 16 Tile RAW Processor
  • 18.17 mm x 18.17 mm
• 1657 pin CCGA package
  • 1152 signal pins
• Clock and Power
  • 420 MHz (actual)
  • 10 watts (power save turned on)
  • 18 watts typical
  • 35 watts if everything is used!
“PowerPoint” Performance
• Raw Chip (@420 MHz)
  • ~7 GOPS/GFLOPS (SP)
  • ~100 GBytes/s of on-chip memory bandwidth
  • ~90 GBytes/s of on-chip “bisection bandwidth”
  • ~40 GBytes/s I/O bandwidth
No bugs so far!
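The ~7 GOPS figure can be sanity-checked from the tile count and clock rate. The sketch below assumes one operation per tile per cycle, which the slide does not state. It also spreads the quoted on-chip memory bandwidth evenly over tiles to get a per-tile, per-cycle figure. All names are illustrative.

```python
TILES = 16                      # from the architecture slide
CLOCK_HZ = 420_000_000          # 420 MHz, from the specification slide
OPS_PER_TILE_PER_CYCLE = 1      # ASSUMPTION: one issue slot per tile
ON_CHIP_MEM_BW_BYTES_PER_S = 100e9  # "~100 GBytes/s" from the slide


def peak_ops_per_second(tiles, clock_hz, ops_per_cycle):
    """Peak operation rate if every tile issues on every cycle."""
    return tiles * clock_hz * ops_per_cycle


def bytes_per_tile_per_cycle(total_bytes_per_s, tiles, clock_hz):
    """Aggregate bandwidth divided evenly across tiles and cycles."""
    return total_bytes_per_s / (tiles * clock_hz)


peak = peak_ops_per_second(TILES, CLOCK_HZ, OPS_PER_TILE_PER_CYCLE)
per_tile = bytes_per_tile_per_cycle(ON_CHIP_MEM_BW_BYTES_PER_S, TILES, CLOCK_HZ)
print(f"peak: {peak / 1e9:.2f} GOPS")
print(f"memory bandwidth: {per_tile:.1f} bytes per tile per cycle")
```

With these inputs the peak comes out at 6.72 GOPS, which rounds to the slide's ~7.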
Progress on the Raw Chip
• Complete Spec (Feb ’00)
• IBM Initial Design Review (Mar ’00)
• Feature complete Netlist (May ’00)
• Arch. Timing optimization (Feb ’01)
• Floorplanning (Mar ’01)
• Prelim Placement/Timing opt (Jun ’01)
• Raw H21 system board (ISI) (Jun ’01)
• Raw in Emulation (Jun ’01)
• Detailed Placement/Timing opt (Dec ’01)
• Release to IBM for initial layout (Dec ’01)
• Timing closure after layout (Mar ’02)
• All backend checks pass (May ’02)
• Release to IBM for production layout (May ’02)
• Final function and timing validation (Jul ’02)
• Final manuf. release to IBM (Aug ’02)
• Chip prototypes back (Oct ’02)
PCA Raw Fabrics, Systems, Apps
• Raw Chips (Oct 02)
• Handheld (H) board arrives from ISI (Dec 02)
• H Board bringup, small program at 80 MHz (Jan 03)
• H Board testing, speed gasket, 300 MHz (May 03)
• USB Interface, 500 Mbits/s xface (July 03)
• H Board refab (in fab now), to partners (Sep 03)
• Fabric-Array and Fabric-IO board design (Jun 03)
• Fabric-Array and Fabric-IO board fab (Sep 03)
• 16 and 64-chip PCA fabric bringup
• Applications and experiments
• PCA demonstrations
• Embedded networking board
• Audio beamformer system
• 802.11b,g,a wireless system
• Graphics system
• Virtual x86
Partner Support Activity
• Handheld boards (Sep 03)
  • USB xface
  • PCI xface
• “Raw User Day” videos and documentation
• Expansion interface testing and documentation (used in beamformer)
• Software distribution
  • Simulator (useful to debug small assembly programs)
  • C compiler
  • rGDB debugger
  • StreamIt language and compiler
  • Lots of other goodies
  • 1024-tile (64 chip) fabric simulator (since Dec 02)
• 16, 64 node Fabrics
64-Node Raw Fabric
[Figure: fabric diagram with blocks labeled Network I/O and DRAM.]
Fabric System Architecture
• Design: two distinct board designs
  • replicate and connect
• Board 1: Quad Raw Board
• Board 2: I/O & Memory Board
HOW???
The Challenge
• How do we use the same board designs for every position in the fabric?
Fabric board is easy enough.
The Challenge
• How do we use the same board designs for every position in the fabric?
E.g., I/O board
The Saman Flip
• How do we use the same board designs for every position in the fabric?
• I/O Board
  • symmetric about x-axis
  • compensate for board flip in firmware
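The slide says only that firmware compensates for the flip; it does not say how. One plausible form of compensation is an index reversal applied to ports on a flipped board, sketched below. The function name, the port numbering, and the idea that the remap is a simple mirror are all assumptions made for illustration, not the project's actual firmware.

```python
def remap_for_flip(port_index, ports_per_edge, flipped):
    """Map a physical port index to a logical one.

    HYPOTHETICAL model: on a board mounted flipped about its axis of
    symmetry, port i sits where port (n - 1 - i) would sit on an
    unflipped board, so the firmware mirrors the index.
    """
    if not 0 <= port_index < ports_per_edge:
        raise ValueError("port index out of range")
    if flipped:
        return ports_per_edge - 1 - port_index
    return port_index


# The same table-free rule serves both orientations of the same board.
for physical in range(4):
    print(physical, "->", remap_for_flip(physical, 4, flipped=True))
```

A mirror map is its own inverse, so applying it twice returns the original index. The same routine can therefore translate in both directions.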
Quad Board
• 4 RAW chips per board
• 16 152-pin MICTOR connectors total (4 per side)
• Power distributed over separate cables from other signals
• MICTOR connectors are stacked to save space
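The board counts implied by these figures can be worked out directly. The sketch below uses the per-board numbers from this slide and the 16 tiles per chip from the chip slides. The function name is illustrative, and the connector pin total counts connector positions, not necessarily usable signals.

```python
TILES_PER_CHIP = 16             # from the chip slides
CHIPS_PER_QUAD_BOARD = 4        # from this slide
CONNECTORS_PER_QUAD_BOARD = 16  # 4 per side
PINS_PER_CONNECTOR = 152


def fabric_summary(chips):
    """Quad-board and tile counts for a fabric with the given chip count."""
    if chips % CHIPS_PER_QUAD_BOARD != 0:
        raise ValueError("chip count must be a multiple of 4")
    return {
        "chips": chips,
        "tiles": chips * TILES_PER_CHIP,
        "quad_boards": chips // CHIPS_PER_QUAD_BOARD,
        # Connector positions only; some pins may carry ground or be unused.
        "connector_pins_per_quad_board": CONNECTORS_PER_QUAD_BOARD * PINS_PER_CONNECTOR,
    }


for size in (16, 64):
    print(fabric_summary(size))
```

For the 64-chip fabric this gives 16 quad boards and 1024 tiles, matching the "1024-tile (64 chip)" simulator mentioned under partner support.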
Quad Board Layout
[Figure: board layout, 11” x 11”.]
I/O & Memory Board
• 4 FPGAs
• 2 64-bit PCI slots
• 2 Expansion Ports (same as on Raw Handheld board)
• 4 SDRAM banks
• symmetric design
[Figure: board layout, labeled dimension 11”.]
Power Distribution
• 48V distributed to all boards, then down-converted
• DC-DC converters on each board
  • 1.8V Raw core
  • 1.5V Raw I/O
  • 3V other logic
• 1.5V is further down-converted to a 0.75V supply for HSTL termination
• System-wide power supply can be up to 3kW
At 1.8V, 64 Raw chips can draw 1280 amps!
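The numbers on this slide can be cross-checked with Ohm's-law arithmetic. The sketch below takes the 1280 A, 1.8 V, 48 V, and 64-chip figures from the slide and derives the rest. It ignores DC-DC converter losses and the I/O and logic rails, so the bus current is a lower bound. Names are illustrative.

```python
CHIPS = 64
CORE_VOLTS = 1.8
TOTAL_CORE_AMPS = 1280.0   # from the slide
BUS_VOLTS = 48.0
SUPPLY_LIMIT_WATTS = 3000.0


def core_power_budget():
    """Per-chip and system core power, plus ideal current on the 48 V bus."""
    amps_per_chip = TOTAL_CORE_AMPS / CHIPS
    watts_per_chip = amps_per_chip * CORE_VOLTS
    total_watts = TOTAL_CORE_AMPS * CORE_VOLTS
    # SIMPLIFICATION: lossless conversion, core rail only.
    bus_amps = total_watts / BUS_VOLTS
    return amps_per_chip, watts_per_chip, total_watts, bus_amps


amps_per_chip, watts_per_chip, total_watts, bus_amps = core_power_budget()
print(f"{amps_per_chip:.0f} A and {watts_per_chip:.0f} W per chip")
print(f"{total_watts:.0f} W core total, {bus_amps:.0f} A on the 48 V bus")
```

The result is 20 A and 36 W per chip, close to the 35 W worst case on the specification slide, and about 2.3 kW in total, which fits under the 3 kW supply. Carrying that power at 48 V takes roughly 48 A instead of 1280 A, which is the point of distributing at high voltage and converting on each board.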
Power Distribution
• Distributed over special connectors, separately from signals
• External power supply feeds top and bottom rows of I/O Boards
[Figure label: power supply]
Clock Distribution
• Signal generated and distributed from a center board over MICTOR connectors
• Uses DLLs to deskew the clock at each connection
• Every quad board sends a copy of the clock to its neighbors and receives one from each; DIP switches select which input clock to use
[Figure label: clock generator]
Clock Distribution
• Synchronized clocks for all Raw chips in fabric
• Delay-Locked Loop uses feedback to tune delay line for clock synchronization
• DIP switches keep clock distribution general: no custom firmware
[Figure labels: from external input, DLL]
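The idea behind the delay-locked loop can be shown with a toy model: lengthen a delay line one tap at a time until the delayed clock edge lands on a reference edge, so that the total delay is a whole number of clock periods. This is a sequential search, not a model of the boards' actual DLL parts, which track continuously. The function names and the example numbers (period, tap size, path delay) are assumptions.

```python
def phase_error_ps(total_delay_ps, period_ps):
    """Signed offset from the nearest reference edge, in [-T/2, T/2)."""
    err = total_delay_ps % period_ps
    if err >= period_ps / 2:
        err -= period_ps
    return err


def lock_delay_line(path_delay_ps, period_ps, step_ps):
    """Add taps until the delayed edge is within half a tap of a reference edge.

    Returns (taps, residual_error_ps).
    """
    max_taps = period_ps // step_ps + 1  # enough taps to span one full period
    for taps in range(max_taps + 1):
        err = phase_error_ps(path_delay_ps + taps * step_ps, period_ps)
        if abs(err) <= step_ps / 2:
            return taps, err
    raise RuntimeError("delay line too short to lock")


# Illustrative numbers only: about one 420 MHz period, 10 ps taps.
taps, residual = lock_delay_line(path_delay_ps=730, period_ps=2380, step_ps=10)
print(f"locked with {taps} taps, residual {residual} ps")
```

The residual error after lock is bounded by half a tap, so a finer delay line gives tighter deskew at the cost of more taps.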
Reset Distribution
• Signal generated by one of the I/O boards and distributed over MICTOR connectors
[Figure label: reset originates here]