RAMP Retreat Summer 2006
Break Session Leaders & Questions
Greg Gibeling, Derek Chiou, James Hoe, John Wawrzynek & Christos Kozyrakis
6/21/2006
Breakout Topics
• RDL & Design Infrastructure
• RAMP White
• Caches, Network & IO (Uncore)
• RAMP2 Hardware (BEE3)
• OS, VM and Compiler (Software Stack)
RDL & Design Infrastructure
• Leader/Reporter: Greg Gibeling
• Topics
  • Features & Schedule
  • Proposals
  • Multi-platform migration
  • Languages
    • Which languages, priorities
    • Assignments for support
  • Debugging – Models & Requirements
  • Retargeting to ASICs (Platform Optimization)
RDL & DI Notes (1)
• Languages
  • Hardware
    • Verilog
    • BlueSpec
    • IBM uses VHDL
  • Software?
• Multi-Platform
  • Integration of hardware simulations
  • Control of multiplexing
    • Needed for efficiency!
    • Possible through channel & link parameters
• Features
  • Meta-types
  • Component (and unit) libraries
RDL & DI Notes (2)
• Debugging
  • Split target model
    • RDL Target Design
    • Exposed to a second level of RDL
    • Allows statistics aggregation
    • Modeling of noisy channels
    • Integration with unit internals
  • Event & State Extraction
  • Connection to processor debugging tools
  • People clearly want this ASAP
RDL & DI Notes (3)
• Debugging (Integrated)
  • Message tracing
    • Causality Diagrams
  • Framework to debug through units
  • Checkpoints
    • Injection
  • Single stepping
    • May not be widely used
    • But cheap to implement
  • Watch/Breakpoints
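The message tracing and causality ideas above can be sketched very simply: if every message records the IDs of the messages that caused it, a causality chain can be reconstructed after the fact. This is a minimal illustration, not RDL's actual mechanism; all names here are ours.

```python
# Toy message-tracing sketch: each message carries its causal parents,
# so a lineage (causality chain) can be walked back from any message.
import itertools

_ids = itertools.count()
TRACE = {}                        # msg id -> (unit, payload, parent ids)

def emit(unit, payload, parents=()):
    """Record a message and its causal parents; return its trace id."""
    mid = next(_ids)
    TRACE[mid] = (unit, payload, tuple(parents))
    return mid

def lineage(mid):
    """Walk parent links back to the root causes of a message."""
    unit, payload, parents = TRACE[mid]
    chain = [(unit, payload)]
    for p in parents:
        chain.extend(lineage(p))
    return chain

req = emit("cpu0", "load 0x100")
miss = emit("l1$", "miss 0x100", parents=[req])
fill = emit("mem", "fill 0x100", parents=[miss])
assert lineage(fill) == [("mem", "fill 0x100"),
                         ("l1$", "miss 0x100"),
                         ("cpu0", "load 0x100")]
```

Checkpointing and single stepping would layer on the same trace: a checkpoint is a snapshot of unit state plus the trace position, and stepping replays one recorded event at a time.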
RDL & DI Notes (4)
• Why Java?
  • Runs on various platforms
  • Recompilation is generally pretty painful
  • Decent type system in Java 1.5
  • Perfect for plugin infrastructure (e.g. OSGi)
• When to use RDL
  • Detailed timing model
  • Great at abstracting inter-chip communication
  • Perfect platform for partitioning designs
  • Concise, logical specification
  • Support for the debugging framework
  • With standard interfaces, good for sharing
RDL & DI Notes (5)
• Basic Infrastructure
  • First system bringup
  • Interfaces with workstations
  • Initial board support
  • Standard interfaces (RDL and otherwise)
  • Processor Replacements
• Board Support
  • Currently a heroic effort
  • Solutions
    • Standardized components?
    • Generators?
RDL & DI Notes (6)
• Timelines
  • Greg’s Goals
    • 10/2006 should see RCF/RDLC3
    • 11/2006 should see documentation
  • Debugging (Integrated) should be ASAP
• Manpower
  • Board support
    • First board bring-up
  • RDL & RDLC users
  • Standard interfaces
  • Features & Documentation
RAMP White
• Leader/Reporter: Derek Chiou
• Topics
  • Two-day break-out
    • First day should be pro/con
  • Overall
    • Preliminary Plan Evaluation
    • Who is doing exactly what?
  • ISA for RAMP White
    • OpenSPARC
    • 32-bit Leon
    • PowerPC 405
    • Processor agnosticism
  • Implementation
    • Reimplementation will be required
    • Test suites from companies are very useful
RAMP White Notes (1)
• Use the embedded PowerPC core first
  • Available
  • Debugged
  • Can run a full OS today
  • FPGA chip space is already committed
• PowerPC and Sparc are both candidates
  • PowerPC pros
    • The embedded processor is PowerPC
  • Sparc pros
    • 64b available today
• Wait and see on a soft core for RAMP White
RAMP White Notes (2)
• >= 256 processors
  • Can buy 64 processors today
• Reasonable speed
  • 10’s of MHz
• With 280K LUTs in Virtex-5, assume 50% of LUTs for processors, at 80% utilization for ease of place-and-route
  • ~100K LUTs for processors
  • Need 4 per FPGA (16 per board, 16 boards)
  • 25K LUTs per processor
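The LUT budget above can be checked with a few lines of arithmetic. The inputs (280K LUTs per Virtex-5, 50% share for processors, 80% place-and-route utilization, 4 CPUs per FPGA, 4 FPGAs per board, 16 boards) come from the slides; the helper name is ours, and the slide rounds the 112K/28K results down to 100K/25K.

```python
# Sanity-check the RAMP White LUT budget from the slide above.
def lut_budget(total_luts=280_000, cpu_share=0.5, pnr_utilization=0.8,
               cpus_per_fpga=4, fpgas_per_board=4, boards=16):
    """Return (LUTs usable for CPUs per FPGA, LUTs per CPU, total CPUs)."""
    usable = total_luts * cpu_share * pnr_utilization  # headroom for P&R
    per_cpu = usable / cpus_per_fpga
    total_cpus = cpus_per_fpga * fpgas_per_board * boards
    return usable, per_cpu, total_cpus

usable, per_cpu, total = lut_budget()
print(f"{usable:.0f} LUTs for CPUs per FPGA, {per_cpu:.0f} per CPU, {total} CPUs")
```

With these assumptions the budget works out to 112K LUTs for CPUs per FPGA, 28K per processor, and exactly 256 processors across 16 boards.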
RAMP White Notes (3)
• Embedded PowerPC core (it’s there, and has better performance than any soft core)
• Soft L1 data cache (no L2)
• Hard L1 instruction cache
  • Emulation????
• Ring coherence (a la IBM)
• Linux on top of the embedded PowerPC core
  • NFS mount for disk access
• Mark’s port of Peh’s and Dally’s router
• To do:
  • Ring coherence + L1 data cache + memory interface
  • RDL for modules
  • Software port
  • Timing models for memory, ring, cache, processor?
  • Integration
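To make the "ring coherence" bullet concrete, here is a deliberately minimal sketch of invalidation on a ring: a request circulates past every node in ring order, and each node snoops it. All class and function names are ours; a real protocol (IBM's included) must also handle data responses, ordering points, and races.

```python
# Toy ring-coherence sketch: requests circulate the ring; each node
# snoops and invalidates/downgrades its cached copy as needed.
class Node:
    def __init__(self, nid):
        self.nid = nid
        self.cache = {}          # addr -> "S" (shared) | "M" (modified)

    def snoop(self, msg):
        """React to a request passing by on the ring."""
        addr, requester, kind = msg
        if requester != self.nid and addr in self.cache:
            if kind == "write":
                del self.cache[addr]          # invalidate on remote write
            elif self.cache[addr] == "M":
                self.cache[addr] = "S"        # downgrade on remote read

def ring_request(nodes, requester, addr, kind):
    """Pass the request once around the ring, then install the line."""
    n = len(nodes)
    for i in range(1, n + 1):                 # visit the other nodes in ring order
        nodes[(requester + i) % n].snoop((addr, requester, kind))
    nodes[requester].cache[addr] = "M" if kind == "write" else "S"

nodes = [Node(i) for i in range(4)]
ring_request(nodes, 0, 0x100, "read")         # node 0 gets a shared copy
ring_request(nodes, 1, 0x100, "write")        # node 1's write invalidates it
```

The appeal of the ring for RAMP is that each hop is a point-to-point FPGA link, which maps naturally onto RDL channels.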
RAMP White Notes (4)
• RAMP-White Greek-letter versions
  • Beta
    • More general fabric using the same router
    • Still use ring coherence
  • Gamma
    • James Hoe’s coherence engine
  • Delta
    • Soft core integration
Caches, Networks & IO (Uncore)
• Leader/Reporter: James Hoe
• Topics
  • CPU, Cache and Memories
  • Hybrid FPGA Cosimulation
  • Network Storage
    • Especially with respect to interfaces
  • Interfaces
    • Components, not sub-frameworks
  • Phased uncore capabilities
Uncore Notes (1)
• A full system has more than just CPUs and memory
  • I/O is very important
• Getting RAMP to “work”
  • Just like the real thing (from SW and the OS’s perspective)
  • Software porting/development
  • Performance studies
• Someone has to build the “uncore”
• Co-simulation
• Direct HW support for paravirtualization / VMs
Uncore Notes (2)
• Why make RAMP White generic?
  • What is a more interesting target system?
  • What is a more relevant target system?
  • Building a system without an application in mind?
  • Would anyone care about RAMP-“vanilla”?
Uncore Notes (3)
• Why insist on directory-based cache coherence for 1000 nodes?
  • Today’s large SMPs (at 100+ ways) are actually snoopy-based
  • Plug in 8-core CMPs and that is a 1000-node snoopy system (which industry may be more interested in)
Uncore Notes (4)
• Let’s pin down a reference system architecture (including the uncore)
  • Minimum modules required?
  • Optional modules supported?
  • Fix standard interfaces between modules
  • RDL script for RAMP White??
• Need more than a block diagram for RAMP White
Uncore Notes (5)
• Requests and Ideas for RDL
  • Compensate for skewed raw performance of components (for timing measurements)
    • Large I/O bandwidth relative to CPU throughput
    • Need knobs to dial in different rates for experiments
  • Some form of HW/SW co-simulation
  • Built-in performance monitoring
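The "knobs to dial in different rates" request above amounts to charging a configurable number of model cycles per message on each channel, so a component's modeled bandwidth is decoupled from its raw host speed. This sketch shows the idea in event-queue form; the class and parameter names are ours, not RDL's.

```python
# Sketch of a rate-dialed channel: messages are delivered only after a
# configurable number of model cycles, serializing back-to-back sends.
import heapq

class RateLimitedChannel:
    def __init__(self, cycles_per_msg):
        self.cycles_per_msg = cycles_per_msg  # the experiment knob
        self.queue = []                       # (delivery_cycle, seq, payload)
        self.seq = 0
        self.next_free = 0                    # cycle the channel is next idle

    def send(self, now, payload):
        start = max(now, self.next_free)      # wait for the channel to free up
        deliver = start + self.cycles_per_msg
        self.next_free = deliver
        heapq.heappush(self.queue, (deliver, self.seq, payload))
        self.seq += 1

    def recv_ready(self, now):
        """Pop all messages whose modeled latency has elapsed."""
        out = []
        while self.queue and self.queue[0][0] <= now:
            out.append(heapq.heappop(self.queue)[2])
        return out

fast = RateLimitedChannel(cycles_per_msg=1)    # e.g. an on-chip link
slow = RateLimitedChannel(cycles_per_msg=100)  # e.g. modeled disk I/O
fast.send(0, "a"); slow.send(0, "b")
assert fast.recv_ready(1) == ["a"] and slow.recv_ready(1) == []
```

Re-running an experiment with a different `cycles_per_msg` is exactly the kind of rate sweep the slide asks for, and a deterministic queue like this also gives the repeatability the software breakout wants.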
Uncore Notes (6)
• Sanity Check
  • 1000 processing nodes: no problem
  • I/O: we can fake it somehow
  • DRAM for 1000 processing nodes
    • Not easy to cheat on this one
RAMP2 Hardware (BEE3)
• Leader/Reporter: Dan Burke & John Wawrzynek
• Topics
  • Follow-up to XUP
    • Should RAMP embrace XUP at the low end?
    • Inexpensive small systems
  • Size & scaling of new platform
    • More than 40 FPGAs?
  • Technical Questions
    • Reconsider use of SRAM
    • DRAM Capacity
    • Presence of on-board hard CPUs
    • On-board interfaces (PCI Express)
  • Project Questions
    • Timelines
      • Definitely need one
    • Packaging
    • Pricing (Especially FPGAs)
      • Design for the largest FPGA, change part at solder time?
    • Evaluation of Chen Chang’s Design
RAMP2 HW Notes (1)
• Follow-up to XUP
  • XUP has been useful to the project, particularly for early development efforts
  • Xilinx will continue to design and support new XUP boards
    • No V4 version planned
    • A V5 version will be out Q2 next year
  • For BEE3, can’t really count on V5 FX in Q2 next year
    • Perhaps use a separate (AMCC) PowerPC processor chip
RAMP2 HW Notes (2)
• Size and scaling of new platform:
  • Given the potential processor core density issue, we will need to plan on a system that can scale past 40 FPGAs
• Better compatibility with the new XUP is important:
  • e.g. DRAM standard (better sharing of memory controllers)
  • USB: use the Cypress CY7300 for compatibility with the Xilinx core
• Our design and production of BEE3 is timed to the production of V5 parts. We need to better understand the RAMP team schedule for RAMP White.
• Hope to be able to choose the package and have flexibility in part sizes and, ideally, part feature set
  • How about a daughterboard for the FPGA (DRC approach)?
RAMP2 HW Notes (3)
• Technical Questions
  • Reconsider use of SRAM: the group thought SRAM is a bad idea. It is faster and simpler to interface to, but smaller in capacity. Newer DRAM parts will make interfacing simpler anyway; speed is not a big concern for RAMP, but capacity is.
  • 8GB DDR2 DIMM modules are on the horizon
    • A target will be 1 GByte/processor
  • Presence of on-board hard CPUs
    • Are hard cores in FPGAs useful (e.g. the PPC405 in V2Pro)?
    • Would commodity chips on the PCB be useful (e.g. for management)?
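The 1 GByte/processor target is easy to sanity-check against both the 256-processor RAMP White sizing and the 1000-node goal from the uncore breakout. Only the 1 GB/processor and 8 GB/DIMM figures come from this slide; the board count comes from the RAMP White notes, and the helper name is ours.

```python
# DRAM sizing check for the 1 GByte/processor target above.
def dimms_needed(processors, gb_per_proc=1, gb_per_dimm=8, boards=16):
    """Return (total GB, DIMM count, DIMMs per board) for a system size."""
    total_gb = processors * gb_per_proc
    dimms = -(-total_gb // gb_per_dimm)   # ceiling division
    return total_gb, dimms, dimms / boards

print(dimms_needed(256))    # 256 GB total -> 32 DIMMs, 2 per board
print(dimms_needed(1000))   # 1 TB total -> 125 DIMMs, ~8 per board
```

At 1000 nodes the system needs a terabyte of DRAM, roughly eight 8 GB DIMMs per board, which is why the uncore breakout flagged DRAM as the one resource that is "not easy to cheat on."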
RAMP2 HW Notes (4)
• Enclosures:
  • Using a standard form factor will help with module packaging
  • Need to look carefully at the IBM blade center (adopted by IBM and Intel)
  • ATCA is gaining momentum
    • Power may be a problem
  • Can we accommodate custom ASIC integration (perhaps through a slight generalization of the DRAM interface)?
  • What does Google do for packaging in their data centers? Is it racks of 1U modules?
RAMP2 HW Notes (5)
• Interesting idea from Chuck Thacker: “Design the new board based on the needs of RAMP White”!
  • Previously suggested by others
  • Can we estimate the logic capacity, memory BW, network BW, etc.?
OS, VM & Compiler
• Leader/Reporter: Christos Kozyrakis
• Topics
  • Debugging HW and SW (RDL)
  • Phased approach
    • Proxy, full kernel, VMMs, Hypervisor
  • HW/SW schedule and dependencies
  • High-level applications
Software Notes (1)
• RAMP milestones
  • Pick ISA
  • Deploy basic VMM
  • Deploy OS
Software Notes (2)
• VMM approach: use a split VMM system (a la VMware/Xen)
  • Run a full VMM on an x86 host that allows access to devices
  • Run a simple VMM on RAMP that communicates with the host for device accesses through some network
    • A timing model may be used if I/O performance is important
  • Should talk with Sun & IBM about their VMM systems for Sparc and PowerPC
  • May be able to port a very basic Xen system on our own
• Questions
  • Accurate I/O timing with para-virtualization (you also need repeatability)
  • SW/system-level/IO issues for a large-scale machine may be more important than coherence
  • Related issue: Do we want global cache coherence in White?
    • Benefit vs. complexity (schedule etc.)
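The split-VMM scheme above reduces to a simple request/reply protocol: the RAMP-side VMM traps a guest device access, marshals it, and forwards it to the host side, which owns the real devices. The sketch below stands in an in-process queue for the network link and emulates the disk with a dict; every name is ours, not VMware's or Xen's.

```python
# Toy split-VMM device proxy: RAMP side forwards device accesses to a
# host-side server over a message channel (here, a queue pair).
from queue import Queue
from threading import Thread

def host_device_server(requests, replies, disk):
    """Host side: service queued device requests until told to stop."""
    while True:
        req = requests.get()
        if req is None:                       # shutdown sentinel
            break
        op, block, data = req
        if op == "write":
            disk[block] = data
            replies.put("ok")
        else:                                 # read
            replies.put(disk.get(block, b"\0" * 512))

def ramp_device_access(requests, replies, op, block, data=None):
    """RAMP side: trap a guest device access and proxy it to the host."""
    requests.put((op, block, data))
    return replies.get()                      # block until the host replies

requests, replies, disk = Queue(), Queue(), {}
server = Thread(target=host_device_server, args=(requests, replies, disk))
server.start()
ramp_device_access(requests, replies, "write", 7, b"boot block")
data = ramp_device_access(requests, replies, "read", 7)
server_stop = requests.put(None); server.join()
```

The timing-model hook mentioned in the slide would sit on the RAMP side, charging modeled cycles per request before the reply is released to the guest.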
Software Notes (3)
• Separate infrastructure from RAMP
  • Example: RDL should not be tied to RAMP White
    • Note: This is in progress with some current RDL applications
  • Same with the BEE3 design work
  • Most of our tools are applicable to others
Software Notes (4)
• Debugging support: RDL-scope
  • Arbitrary conditions on RDL-level events to trigger debugging
  • Get traces of messages
    • Track lineage of messages
    • Traceability, accountability, relate events to program constructs
  • Infinite checkpoints for instructions & data
    • Checkpoint support
    • Swappable & observable designs
  • Single step
    • Instruction, RDL, or cycle level
    • Note: not always a commonly used feature
  • Such features may attract people to RDL more than retiming
    • Note: This is already the case with current RDL applications
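"Arbitrary conditions on RDL-level events" can be modeled as predicates over observed events, checked as each event is recorded; a breakpoint is just a named predicate. This is an illustrative sketch under our own names, not the actual RDL-scope design.

```python
# Toy event-scope: breakpoints are predicates evaluated on each event.
class EventScope:
    def __init__(self):
        self.breakpoints = []     # (name, predicate) pairs
        self.hits = []            # (breakpoint name, triggering event)

    def watch(self, name, predicate):
        self.breakpoints.append((name, predicate))

    def observe(self, event):
        """Record an event and note any breakpoint it triggers."""
        for name, pred in self.breakpoints:
            if pred(event):
                self.hits.append((name, event))

scope = EventScope()
# Hypothetical condition: trigger when any unit touches address 0x100.
scope.watch("addr-0x100", lambda e: e.get("addr") == 0x100)
scope.observe({"unit": "cpu0", "op": "load", "addr": 0x200})
scope.observe({"unit": "cpu1", "op": "store", "addr": 0x100})
assert [h[1]["unit"] for h in scope.hits] == ["cpu1"]
```

In hardware the predicates would compile down to comparators on channel traffic, so watching costs little until a breakpoint actually fires.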
Software Notes (5)
• What is our schedule?
  • What can we have up and running within 1 year?
  • Does it have to be RAMP White?
• Do we need to migrate RDL maintenance from Greg?
  • Note: The work should be spread out, at least
• Do we have enough manpower for this SW work?
  • Compilers, VMMs, Applications, etc.
Software Notes (6)
• Application Domains
  • Enterprise/desktop
    • Full-featured OS on all nodes
    • Running a JVM is a big plus here
    • Should be able to run webservers, middleware and DBs
  • Embedded
    • While eventually an app may directly control a number of nodes, it is easier to start with all nodes running the OS
      • The base design should allow all nodes to run the OS
      • Easiest starting point for SW
    • Various researchers may decide to run the OS on a subset of nodes, managing the rest of them directly
      • A simple runtime with app-specific policies
      • Common in embedded systems
Software Notes (7)
• A simple kernel for embedded systems should support
  • Fast remapping of computation
  • Protection across processes
  • Emulation of an attached disk
    • iSCSI + a timing model for disks
• RAMP VMM uses:
  • (a) Attract VMM researchers (might require x86)
  • (b) Our own convenience
    • Get an OS running, access to devices, etc.
  • We may achieve (b) without (a)
    • Some researchers will want to turn cache coherence off anyway!