A Linux-based Software Platform for the Reconfigurable Scaleable Computing Project • John A. Williams*, Neil W. Bergmann*, Robert F. Hodson+
Outline • RSC Overview: concept, participants • Existing Technology: MicroBlaze, uClinux • New Developments: vision, multiprocessing, MPI, NoC • Status and Outlook: planned investigations, progress
Reconfigurable Scaleable Computing • Features • Next-generation on-board computing platform • FPGA-based reconfigurable computer • Soft CPU cores + embedded Linux operating system • Hybrid SW/HW application environment • Hierarchical, scaleable computing network • Selected for funding in 2004 H&RT call for proposals
Reconfigurable Scaleable Computing • Participants • NASA LaRC (project lead, hardware design) • UQ (operating system, message passing libraries) • ASRC (system modeling, performance analysis) • Jefferson Lab (consulting) • Star Bridge Systems (graphical design tools) • NASA Office of Logic Design • NSA
MicroBlaze • 32-bit RISC, Harvard-architecture soft processor • Targeted to Xilinx logic primitives • ~1000-1500 slices (roughly 10% of an XC4V-LX25) • Parameterisable: caches; ALU, FPU; memory/bus interfaces • Memory/bus interfaces: Local Memory Bus (LMB), On-chip Peripheral Bus (OPB), Fast Simplex Links (FSL)
MicroBlaze • Logic utilisation in RPM prototype FPGA device (16K icache & 16K dcache)
Selected Device: 4vlx25ff668-10
Number of Slices:             1504 out of 10752   13%
Number of Slice Flip Flops:   1172 out of 21504    5%
Number of 4 input LUTs:       2238 out of 21504   10%
Number of FIFO16/RAMB16s:       24 out of    72   33%
  Number used as RAMB16s:       24
Number of DSP48s:                3 out of    48    6%
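The percentages in this report follow from simple division, and the same figures give a rough ceiling on how many identical cores one device could hold. The sketch below is only an estimate: it assumes the report truncates percentages to whole numbers, and it ignores buses, peripherals and routing overhead. The names `REPORT`, `utilisation_percent` and `max_cores` are illustrative and do not come from the slides.

```python
# Resource figures copied from the utilisation report: (used, available).
REPORT = {
    "slices": (1504, 10752),
    "slice_flip_flops": (1172, 21504),
    "lut4": (2238, 21504),
    "ramb16": (24, 72),
    "dsp48": (3, 48),
}


def utilisation_percent(used, available):
    """Whole-number utilisation, truncated (assumed to match the report)."""
    return used * 100 // available


def max_cores(report):
    """Optimistic upper bound on identical cores per device.

    Assumption: nothing else shares the device. Returns the bound and the
    name of the resource that sets it.
    """
    limits = {name: avail // used for name, (used, avail) in report.items()}
    bottleneck = min(limits, key=limits.get)
    return limits[bottleneck], bottleneck


if __name__ == "__main__":
    for name, (used, avail) in REPORT.items():
        pct = utilisation_percent(used, avail)
        print(f"{name:18s} {used:5d} / {avail:5d} = {pct}%")
    print("ceiling:", max_cores(REPORT))
```

In this estimate the block RAM consumed by the 16K caches, rather than the slice count, is the tightest resource, so fitting more cores per device would imply smaller caches per core.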
MicroBlaze, Linux and RSC • Why? • Path for existing applications onto RSC • Standard platform improves design efficiency • Application development/debug • Multiprocessing/clustering • Software infrastructure • Interoperability (networking, file systems, …) • UQ research focus in rSoC • integration of custom hardware (for speed) with conventional processor/OS modules (for flexibility)
MicroBlaze, Linux and RSC • Why not? • Performance • FPGAs roughly 10x less efficient than fixed silicon • CPUs less efficient than custom hardware • A serialised abstraction of intrinsically parallel hardware • Less efficient than deeply embedded software • Abstraction incurs performance penalty • Stability/reliability • RSC is a data processing/computation platform • Not part of spacecraft survivability • MicroBlaze and Linux are only part of the solution
Vision • Heterogeneous multiprocessing • Multiple software tasks per processor • Multiple processors per chip/RPM • Hardware Co-processors • Multiple RPMs per stack • Multiple stacks per system • RSC is an exotic computing machine • How do we program it?
Vision • Linux-based multiprocessing • To SW apps, RSC is a Linux cluster • Critical computation offloaded to hardware • EITHER Co-processors to CPU nodes, • OR Peers in the computational network • Find the sweet spot • Runtime performance vs design effort • RSC is an exotic computing machine • We must make it seem straightforward
Vision • Make it look like Linux • Build on enormous library of Linux knowledge, tools, apps, documentation, training and skills • Ability to prototype realistic user apps on Linux desktop is tremendously valuable
MicroBlaze Multiprocessing • Many processors give performance and reliability: parallelism is key • MicroBlaze achieves 4-8x better MIPS/LUT than any other soft CPU architecture (in Xilinx FPGAs) • We can put about 8 CPUs in an FPGA • What are the hardware architectural issues? • How do we use them efficiently?
MicroBlaze Multiprocessing • Implicit multiprocessing • SMP, looks like one fast processor • Explicit multiprocessing • protoSMP, looks like many processors • Multi-level multiprocessing • MPI, looks like a cluster
MicroBlaze Multiprocessing • Symmetric Multiprocessing (SMP) • N CPUs as a single virtual machine • Implicit parallelism • Hidden by OS and hardware • Hardware support • Cache coherency • Memory architectures • Distributed interrupt dispatch
SMP vs ProtoSMP • SMP: 1 virtual machine
MicroBlaze Multiprocessing • ProtoSMP • N CPUs on shared bus • Private address zones within shared physical memory • Common shared memory region with IPC protocols • shared memory multicomputing
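The protoSMP memory arrangement can be pictured as a simple address map. The sketch below carves one physical memory into equal private zones, one per CPU, plus a common shared region at the top for IPC. The layout, the base address, the sizes and the names `Region` and `protosmp_memory_map` are all hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    base: int
    size: int

    @property
    def end(self):
        return self.base + self.size  # exclusive upper bound


def protosmp_memory_map(base, total_size, n_cpus, shared_size):
    """Split one physical memory into n private zones plus a shared region.

    Hypothetical layout: equal private zones first, shared IPC region last.
    """
    if n_cpus < 1 or not 0 < shared_size < total_size:
        raise ValueError("need >= 1 CPU and a shared region smaller than memory")
    private_size = (total_size - shared_size) // n_cpus
    regions = [
        Region(f"cpu{i}-private", base + i * private_size, private_size)
        for i in range(n_cpus)
    ]
    regions.append(
        Region("shared-ipc", base + total_size - shared_size, shared_size)
    )
    return regions


if __name__ == "__main__":
    MIB = 1024 * 1024
    # Assumed example: 64 MiB of memory, 4 CPUs, 1 MiB shared for IPC.
    for r in protosmp_memory_map(0x80000000, 64 * MIB, 4, 1 * MIB):
        print(f"{r.name:14s} 0x{r.base:08x} - 0x{r.end - 1:08x}")
```

Each CPU would boot its own kernel inside its private zone, so only the last region needs an agreed IPC protocol.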
SMP vs ProtoSMP • ProtoSMP: N virtual machines
SMP vs ProtoSMP • SMP • Pros: implicit parallelism and inter-CPU comms; efficient memory and cache re-use • Cons: specialised hardware support (caches, distributed interrupts); requires kernel support • ProtoSMP • Pros: simplicity; use existing HW components; no changes in kernel • Cons: explicit parallelism and inter-CPU comms; memory waste; virtual IO model (N terminals)
RSC Network • Parallel processing architectures are often limited by CPU/memory bandwidth and interprocess comms bandwidth • RSC has several potential bottlenecks: RPM memory, PCI backplane, inter-stack networks • Need to leave scope for high-speed comms, e.g. with RocketIO transceivers on FPGAs
RSC Network • Useful if applications can be initially developed without regard to partitioning and communications • Implies a uniform interprocess communications mechanism • We choose MPI
MPI on MicroBlaze-uClinux • MPI - Message Passing Interface • API for explicit message passing between processes • Multiple processes on one machine, or • Distributed across many machines
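The style of programming this implies can be sketched without an MPI library. The example below is an in-process stand-in built from Python threads and queues, not MPI itself: each "rank" computes a partial sum and ranks 1..N-1 explicitly send their result to rank 0. In a real MPI program the `send` and `recv` calls would correspond to `MPI_Send` and `MPI_Recv`, with ranks obtained from `MPI_Comm_rank`. The class `Comm` and the function `parallel_sum` are hypothetical names.

```python
import queue
import threading


class Comm:
    """Toy communicator with one inbox per rank (same style as MPI, not MPI)."""

    def __init__(self, size):
        self.size = size
        self._inboxes = [queue.Queue() for _ in range(size)]

    def send(self, obj, dest):
        self._inboxes[dest].put(obj)

    def recv(self, rank):
        return self._inboxes[rank].get()  # blocks until a message arrives


def worker(comm, rank, data, results):
    # Each rank sums its own strided share of the data.
    partial = sum(data[rank::comm.size])
    if rank == 0:
        total = partial
        for _ in range(comm.size - 1):
            total += comm.recv(0)  # collect partial sums from other ranks
        results.append(total)
    else:
        comm.send(partial, dest=0)


def parallel_sum(data, size=4):
    comm = Comm(size)
    results = []
    threads = [
        threading.Thread(target=worker, args=(comm, r, data, results))
        for r in range(size)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results[0]


if __name__ == "__main__":
    print(parallel_sum(list(range(100))))
```

The application code never states where a rank runs, which is what allows the same program to be spread over one machine or many.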
MPI on MicroBlaze-uClinux • MPICH implementation, Argonne National Laboratory • MPICH2: complete reimplementation of MPI conforming to the MPI-2 standard • Layered implementation abstracting MPI application interface from underlying physical transport • Process Management Interface
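MPI on MicroBlaze-uClinux • [Figure: MPICH2 layered architecture, from http://www.sharcnet.ca/fw2003/slides/mpich2-details.ppt] • Application • MPI • MPICH, with MPE and ROMIO • ADI3 devices: CH3, BG/L, Myrinet, ... • ADIO file systems: PVFS, GPFS, XFS, ... • CH3 channels: Sock, SHM, SSM, IB, …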
MPI on MicroBlaze-uClinux • MPICH2 on MicroBlaze • sock implementation over TCP/IP sockets • Starting point for RSC, with COTS demo MicroBlaze multiprocessing experiments • shm shared memory wrapper, great for SMP/protoSMP • Create new wrapper layer around RSC interconnect/NoC architecture once finalised • Can hardware co-processors look like MPI?
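The value of the wrapper-layer approach is that the upper layers depend only on a small transport interface. The sketch below illustrates that idea with two interchangeable transports: a queue standing in for shared memory and a length-prefixed stream socket. These are Python stand-ins, not MPICH2's internal interfaces, and the names `ShmLikeChannel`, `SockChannel`, `send_obj` and `recv_obj` are hypothetical.

```python
import pickle
import socket
import struct
from collections import deque


class ShmLikeChannel:
    """Stand-in for a shared-memory transport: a buffer both ends can reach."""

    def __init__(self):
        self._buf = deque()

    def send_bytes(self, payload):
        self._buf.append(payload)

    def recv_bytes(self):
        return self._buf.popleft()


class SockChannel:
    """Stream-socket transport with a 4-byte length prefix per message."""

    def __init__(self, sock):
        self._sock = sock

    def send_bytes(self, payload):
        self._sock.sendall(struct.pack("!I", len(payload)) + payload)

    def recv_bytes(self):
        (length,) = struct.unpack("!I", self._read_exact(4))
        return self._read_exact(length)

    def _read_exact(self, n):
        chunks = []
        while n > 0:
            chunk = self._sock.recv(n)
            if not chunk:
                raise ConnectionError("peer closed the connection")
            chunks.append(chunk)
            n -= len(chunk)
        return b"".join(chunks)


# The upper layer sees only send_bytes/recv_bytes, so transports are swappable.
def send_obj(channel, obj):
    channel.send_bytes(pickle.dumps(obj))


def recv_obj(channel):
    return pickle.loads(channel.recv_bytes())


if __name__ == "__main__":
    a, b = socket.socketpair()
    send_obj(SockChannel(a), {"rank": 1, "payload": [1, 2, 3]})
    print(recv_obj(SockChannel(b)))
    a.close()
    b.close()
```

A transport for the RSC interconnect would, under this sketch, be a third class offering the same two methods, leaving `send_obj` and `recv_obj` unchanged.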
COTS Demo Platform • Two Ethernet ports per board, up to 4 MicroBlaze CPUs per board • Four boards per demo cluster • Variety of cluster configuration experiments • 4x uniprocessor • 4x4-way protoSMP • 4x2x2-way protoSMP • …
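The configuration labels can be read as boards x groups per board x CPUs per group. That reading is an assumption, not something the slide states, and the sketch below simply tabulates what each listed layout would then contain. `describe` and `CONFIGS` are hypothetical names.

```python
MAX_CPUS_PER_BOARD = 4  # from the slide: up to 4 MicroBlaze per board


def describe(boards, groups_per_board, cpus_per_group):
    """Return CPU and shared-memory group counts for one cluster layout.

    Assumed reading: a "group" is a set of CPUs sharing memory (protoSMP),
    and groups talk to each other over the network.
    """
    cpus_per_board = groups_per_board * cpus_per_group
    if cpus_per_board > MAX_CPUS_PER_BOARD:
        raise ValueError("layout exceeds the per-board CPU limit")
    return {
        "cpus": boards * cpus_per_board,
        "shared_memory_groups": boards * groups_per_board,
    }


# (boards, groups per board, CPUs per group) for the layouts on the slide.
CONFIGS = {
    "4x uniprocessor": (4, 1, 1),
    "4x4-way protoSMP": (4, 1, 4),
    "4x2x2-way protoSMP": (4, 2, 2),
}

if __name__ == "__main__":
    for label, shape in CONFIGS.items():
        print(label, describe(*shape))
```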
Status and Outlook • Detailed SMP vs protoSMP feasibility study • Commenced Q2 2005 • MPICH2 port investigations commenced • Baseline implementation uniprocessor over TCP/IP • Work commenced Q2 2005
Conclusion • MicroBlaze and uClinux are part of the solution • The parts that run Linux look like desktop/cluster Linux • Deliberate trade-offs between design effort and runtime efficiency • Looking ahead • Linux abstractions over RSC hardware • Intra-board, inter-board, inter-stack, … • Development and debug environments • Seamless integration with custom hardware • Viva, VHDL, …