Toward a Campus-Wide Grid Computing System An Overview of The Lattice Project Adam L. Bazinet and Michael P. Cummings Laboratory of Molecular Evolution Center for Bioinformatics and Computational Biology
Outline • Grid computing motivation • Goals of The Lattice Project • Basic architecture • Our current production Grid system • Implementation details • Results of usage • Demo • Research and development • Task Computing with colleagues at Fujitsu • Creating Grid-enabled workflows
Grid Computing • Definition: A model of distributed computing that uses resources that are geographically and administratively disparate. Individual users can access computers and data transparently, without having to consider location, operating system, account administration, and other details. In Grid computing the details are abstracted, and the resources are virtualized.
Why Go Grid? • Scientific problems are solved faster • Parallel execution means higher throughput • Make compute resources a commodity • Analogous to the electrical power grid • Foster growth and interaction in the research community • Use of the Grid spans departments and domains • Grid resources are typically shared resources
The Lattice Project: Initial Goals • Develop a Grid system for scientific research that: • Speeds up workflows by “Grid-enabling” various programs • Is simple and intuitive • Takes advantage of heterogeneous resources • Is capable of managing large numbers of jobs (thousands) • Supports multiple users and lowers the barriers to getting involved • Is community-driven and supported
Principles of Design • Make use of well supported open source software • Globus Toolkit • BOINC • Condor • Engineered software should be scalable, modular, and robust • Expose programs as well-defined services • Arbitrary user-supplied code cannot be run
Terminology • Client: A Grid user interface OR a machine that performs computation • Grid Service: A Grid-enabled program • Scheduler: Decides where Grid jobs will run • Resource: Executes Grid jobs
Software Components • Globus Toolkit version 3.2.1 • Backbone of the Grid • http://www.globus.org/ • Condor-G • Grid-level scheduler / resource broker • http://www.cs.wisc.edu/condor/ • BOINC: Berkeley Open Infrastructure for Network Computing • SETI@home-style desktop grid • http://boinc.berkeley.edu/ • Custom components • GSBL, GSG, Globus-BOINC adapter, MDS-matchmaking bridge, user interface(s), administrative scripts, and much more
Globus Toolkit 3 • Key components: • Globus Core • Grid service hosting environment • GSI – Grid Security Infrastructure • Uses public key cryptography • Secures communication • Authenticates and authorizes Grid users • WS GRAM – Job management • GASS – Point to point file transfer • MDS2 – Information provider
Condor-G • Condor-G is part of the Condor suite • Resources and jobs send Condor-G descriptions of themselves called ClassAds • Condor-G matches Grid jobs to suitable resources, then submits and manages them • This process is called matchmaking
BOINC • Most novel feature of our Grid • Public computing model • Untrusted resources • Is potentially our largest resource • We have targeted 3 platforms: • Windows / Linux x86 / Mac OS X
User Interface • The “Grid Brick”: a machine used to submit Grid jobs • Our primary interface for Grid users • Command line clients mimic normal program execution • Lattice Intranet • Provides instructions for submitting jobs and managing data input and output • Provides tools for describing and monitoring jobs • Other possibilities: • Web portal model of job submission • A client capable of composing complex workflows using Task Computing and Semantic Web technology developed by collaborators at Fujitsu
Demonstration • Job submission
Grid Client Stack • [Figure: layers of the client stack; recoverable labels: Command-line Interface, Perl, Java] • * Service-specific templates and stubs are created by the Grid Service Generator
Grid Service Stack • [Figure: layers of the service stack; recoverable labels: Grid Service Hosting Environment, a.k.a. “the container”; Java] • * Service-specific templates and stubs are created by the Grid Service Generator
Tools for Writing Grid Services • Grid Service Base Library (GSBL) • Java API for building Grid services with the Globus Toolkit • Shields programmers from having to work with the Globus API directly • Provides a high-level interface for operations such as job submission and file transfer • Grid Service Generator (GSG) • Simplifies the process of creating Grid Services • Intended for use with GSBL
GSBL: Design and Features • Classes for: • Clients and services (base classes) • Argument description and processing • File transfers • Job submission and control • Security configuration • Java synchronization and Globus notifications are combined to hide the event-based model from the programmer
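The last point above, hiding an event-based model behind a blocking call, can be sketched as follows. This is a minimal illustration in Python rather than Java; `JobHandle`, `on_notification`, and the status strings are hypothetical names, not GSBL or Globus API.

```python
import threading

class JobHandle:
    """Hypothetical stand-in for an event-driven job; not a GSBL or Globus class."""

    def __init__(self):
        self._done = threading.Event()
        self.status = "PENDING"

    def on_notification(self, new_status):
        # Called from a notification thread whenever the job state changes.
        self.status = new_status
        if new_status in ("DONE", "FAILED"):
            self._done.set()

    def wait(self, timeout=None):
        # Block the caller until a terminal state arrives or the timeout expires.
        finished = self._done.wait(timeout)
        return self.status if finished else None

def run_demo():
    job = JobHandle()
    # Simulate the middleware delivering notifications asynchronously.
    notifier = threading.Thread(
        target=lambda: [job.on_notification(s) for s in ("ACTIVE", "DONE")]
    )
    notifier.start()
    result = job.wait(timeout=5)
    notifier.join()
    return result
```

The caller only sees `wait()`, which reads like an ordinary synchronous call even though the state changes arrive as events on another thread.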
Grid Service Generator • Deploying a Grid service with GT3 is absurdly complicated • Many files and namespaces mean many opportunities for typos • GSG takes as input a few parameters (service name, location, an XML argument description, etc.) and generates all requisite configuration files and skeleton Java classes
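The idea of generating files from a few parameters can be illustrated with a small template-based sketch. The templates, file names, and the `generate_service` function are assumptions made for illustration; the real GSG output for GT3 is not shown in these slides and is certainly more involved.

```python
from string import Template

# Hypothetical templates; real GT3 deployment files are more involved.
JAVA_SKELETON = Template(
    "package $package;\n\n"
    "public class ${service}Service {\n"
    "    // TODO: implement $service\n"
    "}\n"
)
CONFIG_SNIPPET = Template("service.name=$service\nservice.location=$location\n")

def generate_service(service, package, location):
    """Return a mapping of file name to generated file contents."""
    params = {"service": service, "package": package, "location": location}
    return {
        f"{service}Service.java": JAVA_SKELETON.substitute(params),
        f"{service}.properties": CONFIG_SNIPPET.substitute(params),
    }

files = generate_service("Mdiv", "edu.example.grid", "services/mdiv")
for name, body in files.items():
    print(f"--- {name} ---\n{body}")
```

Because every file is derived from the same parameter set, a service name is typed once instead of being repeated by hand across many files.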
Grid Services • Creating Grid Services requires: • Knowledge of the application • Techniques for compiling and porting the application to various platforms • Knowledge of the infrastructure so it can be effectively tested and deployed • Challenges: • Maintaining bodies of Grid Service code as the number of applications grows and new versions of applications are released • Minimizing the number of updates that need to be applied when the framework changes
Condor-G: ClassAds • Resources and jobs send Condor-G descriptions of themselves called ClassAds • Jobs require certain capabilities of resources • Resources advertise their capabilities • Similar to a dating service: central broker points pairs of compatible jobs/resources at each other
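A toy version of the broker idea described above is sketched below. It is a one-sided, greedy simplification: only jobs state requirements here, and all attribute names (`os`, `memory_mb`) and pool names are hypothetical. It is not Condor-G's actual matchmaking algorithm.

```python
def matches(job, resource):
    """A job matches when every requirement is met by the resource's advertisement."""
    return all(check(resource.get(attr)) for attr, check in job["requirements"].items())

def matchmake(jobs, resources):
    """Greedy broker: pair each job with the first unclaimed compatible resource."""
    free = list(resources)
    pairs = {}
    for job in jobs:
        for res in free:
            if matches(job, res):
                pairs[job["name"]] = res["name"]
                free.remove(res)
                break  # stop scanning once this job has a resource
    return pairs

resources = [
    {"name": "condor-pool", "os": "LINUX", "memory_mb": 2048},
    {"name": "boinc-pool", "os": "WINDOWS", "memory_mb": 512},
]
jobs = [
    {"name": "linux-job", "requirements": {"os": lambda o: o == "LINUX"}},
    {"name": "small-job", "requirements": {"memory_mb": lambda m: m is not None and m >= 256}},
]

print(matchmake(jobs, resources))
```

A greedy pass like this depends on job order; a real matchmaker would also let resources express requirements and preferences about the jobs they accept.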
Generating ClassAds • Job ClassAds are generated by the Condor-G job manager • Job requirements are specified in the Grid service configuration files • Resource ClassAds are generated by extracting information from MDS • Lattice information providers supply data required for matchmaking
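As a rough illustration of turning information-provider data into a resource advertisement, the sketch below renders a flat record as `Name = value` lines in the style of a ClassAd. The record contents and attribute names are invented for the example, and the function is not the Lattice MDS-matchmaking bridge.

```python
def to_classad(record):
    """Render a flat dict as 'Name = value' lines, quoting string values."""
    lines = []
    for key in sorted(record):
        value = record[key]
        if isinstance(value, bool):
            rendered = "TRUE" if value else "FALSE"
        elif isinstance(value, (int, float)):
            rendered = str(value)
        else:
            rendered = '"' + str(value).replace('"', '\\"') + '"'
        lines.append(f"{key} = {rendered}")
    return "\n".join(lines)

# Hypothetical data as it might be extracted from an information service.
mds_record = {"Name": "condor-pool", "OpSys": "LINUX", "FreeCpus": 12, "LoadAvg": 0.25}
print(to_classad(mds_record))
```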
Monitoring and Discovery System (MDS2) • Globus information services component • LDAP based • Answers questions like: • What resources are available? • What capabilities do these resources have? • What is the load on these resources? • This in turn allows for intelligent decisions to be made in areas such as scheduling and resource accounting
Current Grid Resources • http://lattice.umiacs.umd.edu/resources/ • UMIACS Condor pool • ~ 400 processors • BOINC pools • Clients on campus > 100 • Public (off-campus) clients > 1000
BOINC • Works on the “pull” model, that is: • One or more servers create workunits • Clients connect asynchronously, pull down work, and return the results • Clients are relatively lightweight and easy to install and manage • One client can crunch work for multiple projects • Participants can join teams and are given credit for the work they complete • http://lattice.umiacs.umd.edu/boinc_public
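The pull model described above can be sketched in a few lines. This is an in-process toy, assuming hypothetical `WorkServer` and `run_client` names; a real BOINC client talks to the server over the network and is not structured like this.

```python
from collections import deque

class WorkServer:
    """Toy server: holds a queue of workunits and collects results."""

    def __init__(self, workunits):
        self.pending = deque(workunits)
        self.results = {}

    def request_work(self):
        # Clients initiate contact; the server never pushes work to them.
        return self.pending.popleft() if self.pending else None

    def report_result(self, workunit_id, result):
        self.results[workunit_id] = result

def run_client(server, compute):
    """Pull workunits until none remain, returning how many were processed."""
    done = 0
    while (wu := server.request_work()) is not None:
        wu_id, payload = wu
        server.report_result(wu_id, compute(payload))
        done += 1
    return done

server = WorkServer([(1, 10), (2, 20), (3, 30)])
processed = run_client(server, lambda x: x * x)
print(processed, server.results)
```

Because clients decide when to connect, the server needs no knowledge of which machines exist or when they are online.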
Globus-BOINC Adapter • Consists of a number of components that allow us to run Grid Services on BOINC • BOINC job manager • Custom validator and assimilator • Registers BOINC with Globus as a GRAM-addressable resource • BOINC compatibility library eases the process of porting applications to BOINC
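The slides do not describe how the custom validator works. As a generic illustration of why a validator is needed for untrusted resources, the sketch below accepts a result only when enough replicated copies of the same workunit agree. The `validate` function and its `quorum` parameter are assumptions for this example.

```python
from collections import Counter

def validate(results, quorum=2):
    """Return the agreed result if at least `quorum` replicas match, else None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count >= quorum else None

print(validate(["0.731", "0.731", "0.900"]))
print(validate(["0.731", "0.900"]))
```

An assimilator would then take the accepted value and hand it back to the Grid side as the job's output.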
Demonstration • Check job status
Research Projects Using the Grid • The Laboratory of David Fushman has run protein-protein docking algorithms on Lattice • CNS is the featured Grid service in this project • Floyd Reed and Holly Mortensen from the Laboratory of Sarah Tishkoff have run a number of population genetics simulations • MDIV and IM are the featured Grid services • The Laboratory of Michael Cummings has run statistical phylogenetic analyses • GSI is the featured Grid service
Results of Grid Usage • IM – 0.13 CPU years (BOINC) • MDIV – 4.93 CPU years (BOINC) • CNS – 12.4 CPU years (BOINC) • GSI – 94.05 CPU years (Condor) • Total: 111.51 CPU years
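The total above can be checked, and split by resource type, with a short calculation. The figures are taken from the slide; the variable names are only for illustration.

```python
# CPU years per Grid service, as reported on the slide.
cpu_years = {"IM": 0.13, "MDIV": 4.93, "CNS": 12.4, "GSI": 94.05}

total = round(sum(cpu_years.values()), 2)
boinc_total = round(cpu_years["IM"] + cpu_years["MDIV"] + cpu_years["CNS"], 2)
condor_share = round(100 * cpu_years["GSI"] / total, 1)

print(f"Total: {total} CPU years")
print(f"BOINC: {boinc_total} CPU years")
print(f"Condor share of total: {condor_share}%")
```

The sum reproduces the 111.51 CPU years stated on the slide, with the single Condor-hosted service accounting for most of it.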
GT4 Research and Development • We are currently upgrading the Grid system to use Globus Toolkit 4.0 • GT4 adheres strictly to emerging and established Web service standards • Actively developed and supported • Many components have been greatly improved • GridFTP/RFT (will replace GASS) • WS GRAM • MDS4 (XML based; replaces MDS2, LDAP based) • Our basic architecture will remain the same, and the upgrade will be made easier because of tools we have already developed (GSBL, GSG)
Fujitsu Task Computing Research http://taskcomputing.org/ Fujitsu Laboratories of America, Inc. College Park, Maryland
Task Computing (TC) • Goals of Task Computing • Let ordinary end-users accomplish complex tasks easily in environments rich with applications, devices, and services • Tasks can be composed on the fly from the services found in each environment and on the Internet • Tasks can then be shared and later edited by others to suit their needs • Built on Semantic Web Services technology, TC provides many ways to interact with tasks composed of services
The Core Idea • Example tasks: Play Jeff’s Video, Dial Contact from Outlook, View Weather of Maryland, … • The key is Semantic Service Descriptions (SSDs) for resources: OS/Application (.NET, etc.), Device (UPnP), Web Services • [Figure: services composed into tasks; recoverable service labels: Dial, Video from DV, Open, Save, Print, Add into Outlook, Aerial Photo of, Weather of, Play (Audio), Play (Video), View, Jeff’s Video, Contact from Outlook; source groups: Devices, OS/Application, Web Pages]
Task Computing Environment (TCE) • Windows software to realize TC • Core is written in Java • Requirements • Windows XP with IIS (Internet Information Services) installed • Java Runtime Environment (for TC clients only) • Single Windows installer with: • TC clients in many modalities (graphical, voice, Web-based) • More than 50 kinds of TC services • OS, application functions, devices • Many mechanisms for dynamic service creation • Web Services APIs that expose TC functions for use in your own applications • Available from http://taskcomputing.org for research institutes
TC Architecture • [Figure: layered architecture of the Task Computing Environment, as seen by the User] • Presentation Layer: Task Computing Client, Applications, Web-based Client • Web Service API • Middleware Layer: Discovery Engine, Execution & Execution Monitoring Engine, Service Composition Engine, Management Tools • Semantic Service Descriptions (one per service) • Service Layer: Services • Realization Layer: E-service, Device, Application, Content
TC Process • [Figure: process flow; recoverable labels: Discover, Execute, Create Task, Execute, By Web, By Email, Save, Share, Edit]