Software Integration Highlights, CY2008
Lee Liming, JP Navarro
GIG Area Directors for Software Integration
University of Chicago, Argonne National Laboratory
April 2009

Expanding TeraGrid Capabilities
• Moving capabilities from working groups to production
  • Help working groups define new TG-wide capabilities (SGW support, Lustre & GPFS WAN, scheduling, etc.)
  • Formally document new/enhanced capabilities and work out integration, testing, support details
  • Prepare software binaries and installers for TG systems
• Operated central services
  • Information service
  • Software build & test service
  • Speed page (data movement performance monitor)
  • DMOVER and Lustre WAN
• Initiated the Quality Assurance activity
  • Predecessor to QA/CUE working group

Capability Model
• In 2005, we retooled our software coordination process
  • Emphasis on use cases and the user scenarios enabled by software
  • Bottom-up, user-driven capability model
  • Open processes for community input into the system definition
• Original TeraGrid (DTF) was aimed at a narrow set of distributed HPC applications
  • Single platform, narrow user base and target uses (distributed HPC)
  • Heavy emphasis on identical software environment
• By 2004 commissioning, TeraGrid had expanded in scope to cover all NSF HPC applications
  • Very diverse user community and resources
  • Very wide diversity of user scenarios and use patterns

2008 Availability & Usage
• Key idea: capability usage vs. component usage
• Most CTSS capabilities were available on all (or nearly all) TG systems and were used heavily or frequently everywhere
• Remote compute was used heavily on some systems (like those appropriate for SGW usage) and not on others
• Visualization capability was used heavily at UC/Argonne and TACC (other TG resources offer diverse visualization capabilities)
• Science workflow capability was used less than once/day, but each use generated hundreds or thousands of jobs
Heavy use means more than 100 uses/day on a single system.
Frequent use means 1 to 100 uses/day on a single system.
Infrequent use means less than 1 use/day on a single system.

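
The three usage levels defined above can be written as a small classifier. This is only a sketch: the function name and the sample counts are hypothetical, and the boundary handling (exactly 1 and exactly 100 uses/day both count as "frequent") follows a literal reading of the definitions.

```python
def classify_usage(uses_per_day: float) -> str:
    """Map average uses/day on a single system to the slide's usage labels."""
    if uses_per_day < 0:
        raise ValueError("uses_per_day must be non-negative")
    if uses_per_day > 100:
        return "heavy"       # more than 100 uses/day
    if uses_per_day >= 1:
        return "frequent"    # 1 to 100 uses/day, inclusive
    return "infrequent"      # less than 1 use/day


if __name__ == "__main__":
    # Hypothetical sample counts, not measured TeraGrid data.
    for count in (250, 100, 1, 0.5):
        print(count, classify_usage(count))
```

Note that the labels count uses, not jobs: by this measure the science workflow capability is "infrequent" even though each use generated hundreds or thousands of jobs.
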
2008 Operational Issues
• In 2008, CTSS comprised 10 separate capabilities, with ~80 software components on 19 platforms
• 16 issues reported by RPs
  • Installation docs incorrect/incomplete
  • A GIG-provided installer doesn’t fit well with a system
  • Issues with specific components (as provided by developers)
  • Inca test not accurate in all situations
  • Enhancement requests from admins

Capability Development & Expansion
• VM hosting service supports science teams that use highly tailored environments or service-oriented applications
  • Provided by IU Quarry and Purdue Wispy
• Science gateway support enables end-user tracking and improved security for gateways
  • Defined and on track for PY4 availability
• Client software distribution supports campus champions and related initiatives
  • Released for evaluation
• Public build/test system supports NSF SDCI/STCI and CISE program awardees
  • On track for PY4 availability

Advanced Scheduling Capabilities
• Documented designs and implementations for TeraGrid advanced scheduling capabilities
  • On-demand computation
  • Advance reservation
  • Co-scheduling
• Broadened availability of new capabilities
  • On-demand at IU, NCAR, NCSA, SDSC, TACC, and UC/Argonne
  • Advance reservation and co-scheduling at LONI, NCSA, SDSC
• Automatic resource selection
  • In development, still on schedule for end of PY4

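
To illustrate the idea behind co-scheduling (reserving the same time window on several resources at once), here is a minimal sketch that finds the earliest window free on every resource. It is not the TeraGrid implementation, which the slides do not describe; the site names, the hour-based time unit, and the assumption that each resource publishes a list of non-overlapping free intervals are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[float, float]  # (start, end), hypothetical unit: hours from now


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersect two sorted lists of non-overlapping intervals."""
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def earliest_common_window(
    free: Dict[str, List[Interval]], duration: float
) -> Optional[Interval]:
    """Return the earliest window of `duration` free on every resource, or None."""
    common: Optional[List[Interval]] = None
    for intervals in free.values():
        ordered = sorted(intervals)
        common = ordered if common is None else intersect(common, ordered)
    for start, end in common or []:
        if end - start >= duration:
            return (start, start + duration)
    return None


if __name__ == "__main__":
    # Hypothetical free slots on three made-up sites.
    free_slots = {
        "site_a": [(0, 4), (6, 12)],
        "site_b": [(3, 8)],
        "site_c": [(2, 5), (7, 10)],
    }
    print(earliest_common_window(free_slots, 1))
```

An advance reservation is the single-resource case of the same search: pass one resource and the function returns its earliest free slot of the requested length.
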
Information Services Enhancements
• TeraGrid’s Integrated Information Service is a vital communication channel for system-wide functions
  • Used by Inca to plan verification tests
  • Helps keep user documentation up-to-date
  • Provides queue status data for user portal monitors
  • Provides data for automatic resource selection
  • Configures speed page test runs
  • In general, enables automation of many routine housekeeping tasks
• Expanded content
  • Local HPC software registry, SGW-available science tools, resource descriptions
• Expanded access methods
  • REST application framework, multiple data formats

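
As a rough illustration of what "multiple data formats" can mean for an information service, the sketch below renders the same resource records as JSON and as XML using only the Python standard library. The record fields, element names, and sample values are hypothetical; the slides do not specify which formats or schemas the Integrated Information Service actually exposes.

```python
import json
import xml.etree.ElementTree as ET
from typing import Dict, List

Record = Dict[str, str]


def to_json(records: List[Record]) -> str:
    """Serialize resource records as a JSON array."""
    return json.dumps(records, indent=2, sort_keys=True)


def to_xml(records: List[Record]) -> str:
    """Serialize the same records as XML, one <resource> element per record."""
    root = ET.Element("resources")
    for record in records:
        node = ET.SubElement(root, "resource")
        # Keys become child element names, so they must be valid XML names.
        for key, value in sorted(record.items()):
            ET.SubElement(node, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical record, not real TeraGrid data.
    sample = [{"name": "example-cluster", "site": "example-site", "queue_depth": "12"}]
    print(to_json(sample))
    print(to_xml(sample))
```

Keeping one in-memory record shape and serializing at the edge lets different consumers (portal monitors, test planners, resource selectors) each request the format they parse most easily.
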
Questions?
• Moving capabilities from working groups to operations
  • Helping WGs move from ideas to production support
  • Capability-oriented software coordination model
  • Integration, testing, support planning
  • Preparing software for deployment on TG resources
• Specific capabilities
  • Advanced scheduling capabilities
  • Information services enhancements
  • Enhanced science gateway security, end user tracking
  • VM hosting for highly specialized or service-oriented applications
  • Software for campuses
  • Helping SDCI/STCI and CISE awardees prepare software for TG