1. CARMA: A Comprehensive Management Framework for High-Performance Reconfigurable Computing
Ian A. Troxel, Aju M. Jacob, Alan D. George, Raj Subramaniyan, and Matthew A. Radlinski
High-performance Computing and Simulation (HCS) Research Laboratory
Department of Electrical and Computer Engineering
University of Florida
Gainesville, FL
2. CARMA Motivation
Key missing pieces in RC for HPC
Dynamic RC fabric discovery and management
Coherent multitasking, multi-user environment
Robust job scheduling and management
Design for fault tolerance and scalability
Heterogeneous system support
Device independent programming model
Debug and system health monitoring
System performance monitoring into the RC fabric
Increased RC device and system usability
Our proposed Comprehensive Approach to Reconfigurable Management Architecture (CARMA) attempts to unify existing technologies and fill in the missing pieces
3. CARMA Framework Overview
CARMA seeks to integrate:
Graphical user interface
Flexible programming model
COTS application mapper(s)
Handel-C, Impulse-C, Viva, System Generator, etc.
Graph-based job description
DAGMan, Condensed Graphs, etc. (a task-graph sketch follows this list)
Robust management tool
Distributed, scalable job scheduling
Checkpointing, rollback and recovery
Distributed configuration management
Multilevel monitoring service (GEMS)
Networks, hosts, and boards
Monitoring down into RC Fabric
Device independent middleware API
Multiple types of RC boards
PCI (many), network-attached, Pilchard
Multiple high-speed networks
SCI, Myrinet, GigE, InfiniBand, etc.
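The graph-based job description noted above (DAGMan, Condensed Graphs, etc.) boils down to a small task-graph data structure. The following is a minimal C sketch under that assumption; the struct layout, the task-type enum, and the ADD.exe/AddOne.bit values (borrowed from the verification slide) are illustrative, not CARMA's actual DAG format.

```c
/* Minimal sketch of a DAG node for a CARMA-style job description.
 * Field names and the task-type enum are illustrative, not CARMA's format. */
#include <stddef.h>

typedef enum { TASK_CPU, TASK_RC } carma_task_type;

typedef struct carma_task {
    const char         *name;        /* e.g., "AddOne" */
    carma_task_type     type;        /* "CPU-only" or RC task */
    const char         *image;       /* executable or bit file, e.g., "AddOne.bit" */
    struct carma_task **parents;     /* tasks that must finish first */
    size_t              num_parents;
} carma_task;

/* A two-node graph: a CPU add task feeding an RC increment task. */
static carma_task  add_task   = { "ADD",    TASK_CPU, "ADD.exe",    NULL,    0 };
static carma_task *add_dep[]  = { &add_task };
static carma_task  inc_task   = { "AddOne", TASK_RC,  "AddOne.bit", add_dep, 1 };
```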
4. Application Mapper Evaluation
Evaluating on the basis of ease of use, performance, hardware device independence, programming model, parallelization support, resource targeting, network support, stand-alone mapping, etc.
C-Based tools
Celoxica - SDK (Handel-C)
Provides access to in-house boards: ADM-XRC (x1), Tarari (x4), RC1000 (x4)
Good deal of success after lessons learned
Hardware design focused
Impulse Accelerated Technologies – Impulse-C
Provides an option for hardware independence
Built upon open source Streams-C from LANL
Supports ANSI standard C
Graphical tools
StarBridge Systems - Viva
Nallatech – Fuse / DIMEtalk
Annapolis Micro Systems - CoreFire
Xilinx - ISE compulsory
Evaluating the role of Jbits, System Generator, and XHWIF
Evaluations still ongoing
Programming model a fundamental issue to be addressed
5. CARMA Interface
Simple graphical user interface
Preliminary basis for graphical user interface via the Simple Web Interface Link Library (SWILL) from the University of Chicago*
User view for authentication and job submission/status
Administration view for system status and maintenance
Applications supported
Single or multiple tasks per job (via CARMA DAGs**)
CARMA registered (via CARMA API and DAGs) or not
Provides security, fault tolerance
Sequential and parallel (hand-coded or via MPI)
C-based application mappers supported
CARMA middleware API provides architecture independence (an illustrative call sequence is sketched after this list)
Any code that can link to the CARMA API library can be executed (Handel-C and ADM-XRC API tested to date)
Bit files must be registered with the CARMA Configuration Manager (CM)
All other mappers can use “not CARMA registered” mode
Plans for linking Streams/Impulse-C, System Generator, et al.
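To make the architecture-independence claim concrete, the sketch below shows what a registered task's use of a device-independent middleware API could look like in C. The function names (carma_request_config, carma_execute, carma_release_config) are hypothetical placeholders and do not reproduce the actual CARMA API.

```c
/* Hypothetical call sequence against a device-independent middleware API.
 * Function names are illustrative placeholders, not the real CARMA API. */

/* Assumed prototypes for the sketch; the real library would supply these. */
int carma_request_config(const char *bitfile, int *board_handle);
int carma_execute(int board_handle, const void *in, void *out, unsigned len);
int carma_release_config(int board_handle);

int run_addone(unsigned value, unsigned *result)
{
    int board;
    /* The CM locates, transports, and caches the registered bit file. */
    if (carma_request_config("AddOne.bit", &board) != 0)
        return -1;

    /* The BIM hides whether the board is an RC1000, Tarari, or ADM-XRC. */
    int rc = carma_execute(board, &value, result, sizeof(value));

    carma_release_config(board);
    return rc;
}
```

Code written against such an interface would run unchanged regardless of which registered board type the configuration is ultimately loaded onto.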
6. CARMA User Interface
7. CARMA Job Manager (JM)
Prototyping effort (CARMA interoperability)
Completed first version of CARMA JM
Task-based execution via Condor-like DAGs
Separate processes and message queues for fault tolerance (see the message-queue sketch after this list)
Checkpointing enabled; rollback support in progress
Links to all other CARMA components
Fully distributed multi-node operation with job/task migration
Links to CARMA monitor and GEMS to make scheduling decisions
Tradeoff studies and analyses underway
External extensions to COTS tools (COTS plug and play)
Expand upon preliminary work at GWU/GMU*
Striving for “plug and play” approach to JM
CARMA Monitor provides board information (via ELIM)
Working to link to CARMA CM
Tradeoff studies and analysis underway
Integration of other CARMA components in progress
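The process/message-queue decoupling noted above can be illustrated with standard POSIX message queues: a queued task survives the restart of the process that will consume it, which is the fault-tolerance property the JM design relies on. This is a generic POSIX sketch, not CARMA's implementation; the queue name and message format are assumptions.

```c
/* Generic POSIX message-queue sketch of JM process decoupling.
 * Queue name and message layout are illustrative, not CARMA's.
 * Build with -lrt on Linux. */
#include <mqueue.h>
#include <fcntl.h>
#include <string.h>
#include <stdio.h>

#define JM_QUEUE "/carma_jm_tasks"   /* hypothetical queue name */

int submit_task(const char *task_desc)
{
    struct mq_attr attr = { .mq_maxmsg = 10, .mq_msgsize = 256 };
    mqd_t q = mq_open(JM_QUEUE, O_CREAT | O_WRONLY, 0600, &attr);
    if (q == (mqd_t)-1) { perror("mq_open"); return -1; }

    /* The message persists in the kernel-held queue even if the
     * scheduler process is restarted before it consumes the task. */
    int rc = mq_send(q, task_desc, strlen(task_desc) + 1, 0);
    mq_close(q);
    return rc;
}
```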
8. CARMA CM Design
Builds upon previous design concepts*
Execution Manager (EM)
Forks tasks from JM and returns results to JM
Requests and releases configurations
Configuration Manager (CM)
Manages configuration transport and caching
Loads, unloads configurations via BIM
Board Interface Module (BIM)
Provides board independence (an illustrative interface table follows this list)
Allows for configuration temporal locality benefits
Communication Module
Handles all inter-node communication
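A common way to obtain the board independence the BIM provides is a per-board table of function pointers behind one fixed interface, so the CM and EM never call a vendor API directly. The sketch below is a generic illustration under that assumption; the bim_ops type and its fields are not taken from CARMA.

```c
/* Generic sketch of a board-abstraction layer similar in spirit to the BIM.
 * Type and function names are illustrative only. */
typedef struct {
    const char *board_name;                          /* e.g., "RC1000", "Tarari" */
    int (*load_config)(const char *bitfile);         /* program the FPGA */
    int (*unload_config)(void);                      /* free the device */
    int (*run)(const void *in, void *out, unsigned len);
} bim_ops;

/* The CM/EM layer calls through the table, never a vendor API directly. */
static int configure_board(const bim_ops *ops, const char *bitfile)
{
    return ops->load_config ? ops->load_config(bitfile) : -1;
}
```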
9. Distributed CM Management Schemes
10. CM System Recommendations
11. CARMA Monitoring Services
Monitoring service
Statistics Collector
Gathers local and remote information
Updates GEMS* and local values
Query Processor
Processes task scheduling requests from JM
Maintains local information
Round-Robin Database
Compact way to store performance logs (see the circular-buffer sketch after this list)
Supports simple query interface
CARMA Diagnostic
System watchdog alerts based on defined heuristics of failure conditions
Provides system monitoring and debug
Initial monitor version is complete
Studying FPGA monitoring options
Increasing the scheduling options
Tradeoff studies and analyses underway
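The Round-Robin Database keeps storage bounded by overwriting the oldest sample once the buffer is full. Below is a minimal circular-buffer sketch in C; the sample fields and the capacity of 64 entries are assumptions, not the monitor's actual schema.

```c
/* Minimal circular-buffer sketch of a round-robin performance log.
 * Sample fields and capacity are illustrative assumptions. */
#define RRD_CAPACITY 64

typedef struct {
    double timestamp;      /* seconds since epoch */
    double cpu_util;       /* host CPU utilization, 0..1 */
    double fpga_util;      /* RC-fabric utilization, 0..1 */
} rrd_sample;

typedef struct {
    rrd_sample entries[RRD_CAPACITY];
    unsigned   next;       /* index of the slot to overwrite next */
} rrd_log;

/* The newest sample silently replaces the oldest once the buffer is full,
 * keeping storage constant no matter how long the node runs. */
static void rrd_append(rrd_log *log, rrd_sample s)
{
    log->entries[log->next] = s;
    log->next = (log->next + 1) % RRD_CAPACITY;
}
```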
12. CARMA End-to-End Service Description
Functionality demonstrated to date
Graphical user interface
Job/task scheduling based on board requirements and configuration temporal locality
Parallel and serial jobs
CARMA registered and non-registered tasks
Remote execution and result retrieval
Configuration caching and management
Mixed RC and “CPU-only” tasks
Heterogeneous board execution (3 types thus far)
System and RC device monitoring
Inter-node communication via SCI or TCP/IP/GigE
Fault-tolerant design
Processes can be restarted while running
Virtually no system impact from CARMA overhead despite use of unoptimized code
Less than 5MB RAM per node
Less than 0.1% processor utilization on a 2.4 GHz Xeon server
Less than 200 Kbps network utilization
13. CARMA Framework Verification
Several test jobs executed concurrently
Parallel Add Test (MPI pattern sketched after this list), composed of
ADD.exe, a “CPU-only” task to add two numbers
AddOne.bit, an RC task to increment input value
Parallel N-Queens Test composed of
ADD.exe, a “CPU-only” task to add two numbers
NQueens.bit, an RC1000 task to calculate a subset of the total number of solutions for an N×N board
4 RC1000s and 4 Tararis communicating via MPI
Parallel Sieve of Eratosthenes (on Tarari)
Parallel Monte Carlo Pi Generator (on Tarari)
Blowfish encrypt/decrypt (on ADM-XRC)
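The Parallel Add Test pairs a "CPU-only" add with an RC increment across MPI ranks. The sketch below shows that communication pattern in plain MPI C; rc_add_one() is a hypothetical stand-in for invoking the AddOne.bit configuration through CARMA, and the program assumes at least two ranks.

```c
/* Sketch of the Parallel Add Test communication pattern in MPI.
 * rc_add_one() is a placeholder for invoking the AddOne.bit RC task. */
#include <mpi.h>
#include <stdio.h>

int rc_add_one(int x);   /* assumed wrapper around the RC board */

int main(int argc, char **argv)
{
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 2 + 3;                                /* ADD.exe: CPU-only add */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("result: %d\n", value);                /* expect 6 */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        value = rc_add_one(value);                    /* AddOne.bit: RC increment */
        MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```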
14. Conclusions
First working version of CARMA complete & tested
Numerous features supported
Simple GUI front-end interface
Coherent multitasking, multi-user environment
Dynamic RC fabric discovery and management
Robust job scheduling and management
Fault-tolerant and scalable services by design
Performance monitoring down into the RC fabric
Heterogeneous board support with hardware independence
Linking to COTS job management service
Initial testing shows the framework to be sound with very little overhead imposed upon the system
15. Future Work and Acknowledgements
Continue to fill in additional CARMA features
Include support for other boards, application mappers, and languages
Complete JM rollback feature and finish linkage to LSF
Include broker and caching mechanisms for the peer-to-peer distributed CM scheme
Include more intelligent scheduling algorithms (e.g. Last Release Time)
Expand RC device monitoring and include debug and optimization mechanisms
Enhance security including secure data transfer and authentication
Deploy on a large-scale test facility
Develop CARMA instantiations for other RC domains
Distributed shared-memory machines with RC (e.g. SGI Altix)
Embedded RC systems (e.g. satellite/aircraft systems, munitions)
We wish to thank the following for supporting this research:
Department of Defense
Xilinx
Celoxica
Alpha Data
Tarari
Key vendors of our HPC cluster resources (Intel, AMD, Cisco, Nortel)