Grid Services. Presented by Karan Bhatia. Hype Curve. Overview. Grid Computing Background Definition Opportunities Markets Technical Challenges Security Infrastructure Resource Management Service Interoperability Summary. Grid Computing is ….
Grid Services Presented by Karan Bhatia
Overview • Grid Computing Background • Definition • Opportunities • Markets • Technical Challenges • Security Infrastructure • Resource Management • Service Interoperability • Summary
Grid Computing is … • “Co-ordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations.” [Foster, Kesselman, Tuecke] • Co-ordinated - multiple resources working in concert, e.g. disk & CPU, or instruments & database, etc. • Resources - compute cycles, databases, files, application services, instruments. • Problem solving - focus on solving scientific problems • Dynamic - environments that are changing in unpredictable ways • Virtual Organization - resources spanning multiple organizations and administrative domains, security domains, and technical domains
Grid Computing is … (Industry) • “about finding distributed, underutilized compute resources (systems, desktops, storage) and provisioning those resources to users or applications requiring them.” [The Grid Report, Clabby Analytics] • Distributed - all the resources lying around in departments or server rooms. • Underutilized - typical utilization of “big iron” is 5 to 10%. Organizations save money by increasing utilization rather than purchasing new resources. • Resources - servers and server cycles, applications, data resources • Provisioning - predict and schedule resource use depending on load.
Types of Grids… • Compute Grids - Seti@home, Entropia, United Devices, Condor • Data Grids - Storage Resource Broker (SRB), Avaki, BIRN, GEON • Collaboration Grids - Instrumentation (telescience), applications • Enterprise Grids - Majority of commercial interest • Partner Grids - B2B, Academic/Govt Grids • Service Grids - “Utility” Computing, “On Demand”, pervasive, autonomic, etc…
A Grid is … • “the next generation Internet,” • “all about free cycles ala SETI@HOME,” • “a distributed object system,” • “a new programming model,” • “a replacement for high performance computing,”
Another Grid Example … Google • Queries • 150 M queries/day (2000/s) • 100 countries • 3.3 B documents • Hardware • 15,000 Linux systems in 6 data centers • 15 Tflop/s and 1000 TB total capacity • 40-80 1U/2U servers/cabinet • 100 Mb/s Ethernet switches per cabinet, with gigabit uplinks • Growth from 4000 systems (18 M queries/day)
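The query and growth figures on this slide can be sanity-checked with a few lines of arithmetic. This sketch uses only the numbers quoted above. It computes a daily average, so the slide's 2000/s is presumably a rounded or peak figure (an assumption, the slide does not say).

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted on the slide.
queries_per_day = 150_000_000
average_qps = queries_per_day / SECONDS_PER_DAY

# Growth from the earlier deployment (4000 systems, 18 M queries/day).
system_growth = 15_000 / 4_000
query_growth = 150_000_000 / 18_000_000

print(f"average load: {average_qps:.0f} queries/s")
print(f"systems grew {system_growth:.2f}x, queries grew {query_growth:.2f}x")
```

On these numbers the query volume grew roughly twice as fast as the number of systems.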
Grid Resources - Data • SDSC Resources • HPSS: • SDSC's central long-term data storage system, • one of the world's largest IBM High Performance Storage System (HPSS) units, • currently holds more than a petabyte (a million gigabytes) of data in approximately 21 million files, • It has the capacity to store six petabytes of data; files are added at an average rate of 10,000 gigabytes per month. • Storage-Area Network (SAN): • A 72-processor Sun Microsystems SunFire 15K high-end server and 11 Brocade switches (1,400 ports) • 225,000 gigabytes of networked disk storage for data-oriented applications. • 1 TB of data = $2500
Grid Companies • IBM - “on demand” solutions • Sun Microsystems - N1 initiative • Oracle - 10g • Dell • HP - “utility” computing • Platform Computing - LSF, metaclustering • United Devices - Desktop grids • DataSynapse • Akamai • Google? • Sony online entertainment? • Where’s Microsoft?
Grid Organizations • Global Grid Forum (GGF) • Organization for the Advancement of Structured Information Standards (OASIS) • Distributed Management Task Force (DMTF) • World Wide Web Consortium (W3C) • Globus Alliance • NSF Middleware Initiative (NMI) • NASA IPG • DOE Science Grid • EU DataGrid • NSF TeraGrid
Challenges: Security • Grids traverse organizational boundaries • Different administration domains have different authentication mechanisms • Resources have different use agreements and sharing priorities • Single sign-on • Multiple passwords difficult to manage • Rights delegation • Trust • Authentication of users • Authorization of users • Resource access
Security • Public Key Infrastructure • Public key A.public • Private key A.private • Supports Encryption • Message to B: • m’ = F(m,B.public), send m’ to B • B receives m’, m = F’(m’,B.private) • Digital Signatures • Signed message to B: • m’ = (m,F(m,A.private)) • Receiver uses A.public to verify that m’ is from A and has not been tampered with
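A toy sketch of the two public-key operations on this slide, assuming the usual convention: a message for B is encrypted with B's public key and decrypted with B's private key, while A signs with A's private key and anyone verifies with A's public key. It uses textbook RSA with tiny primes and integer messages. The function names (`make_keypair`, `apply_key`, `sign`, `verify`) are illustrative and not part of any Grid toolkit. Real systems sign a hash of the message and use padded, full-size keys.

```python
# Toy textbook RSA with tiny primes: for illustration only, never for real use.

def make_keypair(p, q, e=17):
    """Return ((e, n), (d, n)): a public key and the matching private key."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse of e (Python 3.8+)
    return (e, n), (d, n)

def apply_key(value, key):
    """Plays the role of F and F' on the slide: modular exponentiation."""
    exponent, modulus = key
    return pow(value, exponent, modulus)

def sign(m, private_key):
    """Signed message m' = (m, F(m, private_key))."""
    return (m, apply_key(m, private_key))

def verify(signed, public_key):
    """Check that the signature, opened with the public key, matches m."""
    m, signature = signed
    return apply_key(signature, public_key) == m

a_public, a_private = make_keypair(61, 53)
b_public, b_private = make_keypair(89, 97)

# Encryption: A sends m to B, only B's private key recovers it.
m = 42
m_encrypted = apply_key(m, b_public)
m_decrypted = apply_key(m_encrypted, b_private)

# Signature: A signs m, B verifies it with A's public key.
signed = sign(m, a_private)
tampered = (m + 1, signed[1])  # message altered, signature left as-is

print(m_decrypted, verify(signed, a_public), verify(tampered, a_public))
```

The same key pair serves both purposes. Only the direction changes: public then private for confidentiality, private then public for authenticity.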
Grid Security Infrastructure (GSI) • A central concept in GSI authentication is the certificate. • Every user and service on the Grid is identified via a certificate, a text file containing the following information: • a subject name identifying the person or object that the certificate represents, • the public key belonging to the subject, • the identity of a Certificate Authority (CA) that has signed the certificate to certify that the public key and the identity both belong to the subject, • the digital signature of the named CA.
Proxy Certificate • A proxy consists of a new certificate with a new public and private key. • The new certificate contains the owner's identity, modified slightly to indicate that it is a proxy. • The new certificate is signed by the owner rather than a CA. • This establishes a chain of trust from the CA to the proxy through the owner. • The certificate also includes a time notation after which the proxy should no longer be accepted by others. • Proxies have limited lifetimes in order to minimize the security vulnerability. • Because the proxy isn't valid for very long, it doesn't have to be kept quite as secure as the owner's private key.
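The proxy rules above can be sketched as a small data model. This is a minimal illustration, not GSI code: the `Certificate` class, `make_proxy`, `chain_is_acceptable`, the subject names, and the "/CN=proxy" suffix are all assumptions made for the example. Cryptographic signature checks are deliberately left out, so only the issuer links and the expiry times are checked.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class Certificate:
    subject: str         # who the certificate identifies
    issuer: str          # who signed it (a CA, or the owner for a proxy)
    not_after: datetime  # time after which it should no longer be accepted

def make_proxy(owner, now, lifetime):
    """Derive a short-lived proxy that is issued by the owner, not by a CA."""
    return Certificate(
        subject=owner.subject + "/CN=proxy",  # owner's identity, marked as a proxy
        issuer=owner.subject,
        not_after=min(now + lifetime, owner.not_after),
    )

def chain_is_acceptable(chain, trusted_ca, now):
    """chain is ordered from the proxy up to the certificate issued by the CA."""
    if not chain:
        return False
    # Each certificate must have been issued by the next one in the chain.
    for cert, parent in zip(chain, chain[1:]):
        if cert.issuer != parent.subject:
            return False
    # The top of the chain must have been issued by a CA we trust.
    if chain[-1].issuer != trusted_ca:
        return False
    # Nothing in the chain may have expired.
    return all(now <= cert.not_after for cert in chain)

now = datetime(2004, 1, 1, 9, 0)
ca_name = "/O=Grid/CN=Example CA"
owner = Certificate("/O=Grid/CN=Alice", ca_name, now + timedelta(days=365))
proxy = make_proxy(owner, now, timedelta(hours=12))

print(chain_is_acceptable([proxy, owner], ca_name, now + timedelta(hours=1)))
print(chain_is_acceptable([proxy, owner], ca_name, now + timedelta(hours=13)))
```

With a 12-hour lifetime the proxy is accepted one hour after creation and rejected thirteen hours after, while the owner's certificate remains valid throughout.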
Additional Challenges • Certificate Management • MyProxy • Role-based Access Control • CAS, VOMS • Authorization services • Integration with applications & Portals
Challenges: Resource Management • Resources loosely-coupled • Higher network latencies • Planned and unplanned disruptions • How to provide QoS guarantees? • Case Study: Entropia Desktop Grids • Additional trust/security issues
Entropia 1: GIMPS • Over 1.5 Billion CPU hours served • 300,000+ machines, over 4 years operational • Every PC and hardware config imaginable (proc, memory, disk, etc.) • Every networking hookup imaginable • Found 35th, 36th, 37th, 38th, and 39th Mersenne Primes
Entropia 2: FightAids@home • Sept 2000 launch • Internet-Based • 54,657 total machines • 10,770,506 total hours of computation • 27,881 peak billions of calculations/sec
Entropia 3: DCGrid • Enterprise focus • Tremendous resources available in enterprise • Complements other HPC resources • Computing Platform • Arbitrary application (open scheduling model) • Security, unobtrusiveness, manageability guaranteed • Focus on • Pharmaceuticals, Chemicals, and Materials • Financial Services
Server vs. Desktop Grids • Server environment • Fixed IP, always connected • Always-on operation • Moderate number of systems (10’s – 100’s) • Dedicated use, trusted systems • Desktop environment • Dynamic, temporary IP, intermittent connection • Off evenings, off weekends, off lunch • Large numbers of systems (100’s – 1000’s - ?) • Shared resources, potentially untrusted users • These differences give rise to desktop Grid challenges
PC-Grid Challenges • Provide a stable compute environment for apps • Isolate app from variable desktop environment • Operate in environment of dynamic use • Unobtrusiveness and Fault Tolerance are key! • Provide simple application integration • Support ANY Application without modification • Provide centralized management console • Zero additional management costs
Workflow • [Figure: Entropia workflow across three layers, traced by numbered steps 1 to 8 and a, b] • Job Management - Job Manager (labels: end-user computation, resource description) • Resource Scheduling - Subjob Scheduler • Physical Node Management - Entropia Node Manager, Clients
Stable Compute Environment • Entropia Proprietary Sandbox • Binary-level protection • System virtualization (registry, file system, network) • Open Scheduling Infrastructure • Intelligent scheduling (match resources to subjob requirements) • Manage subjob redundancy/fault tolerance
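A minimal sketch of what "match resources to subjob requirements" with redundancy could look like. This is not Entropia's scheduler: the `Node` and `Subjob` fields, the `schedule` function, and the smallest-fit policy are assumptions chosen for illustration. Each subjob is placed on several distinct nodes so that one node dropping out does not lose the result.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    memory_mb: int
    disk_mb: int

@dataclass
class Subjob:
    name: str
    memory_mb: int
    disk_mb: int

def schedule(subjobs, nodes, redundancy=2):
    """Assign each subjob to `redundancy` distinct nodes that meet its needs."""
    free = list(nodes)
    plan = {}
    for subjob in subjobs:
        fits = [n for n in free
                if n.memory_mb >= subjob.memory_mb and n.disk_mb >= subjob.disk_mb]
        # Prefer the smallest nodes that fit, keeping larger ones for demanding subjobs.
        fits.sort(key=lambda n: (n.memory_mb, n.disk_mb))
        chosen = fits[:redundancy]
        if len(chosen) < redundancy:
            plan[subjob.name] = []  # not enough suitable nodes right now
            continue
        for node in chosen:
            free.remove(node)
        plan[subjob.name] = [node.name for node in chosen]
    return plan

nodes = [
    Node("pc1", 256, 1000),
    Node("pc2", 512, 2000),
    Node("pc3", 1024, 4000),
    Node("pc4", 128, 500),
]
subjobs = [Subjob("dock-001", 200, 800), Subjob("dock-002", 1000, 3000)]
plan = schedule(subjobs, nodes)
print(plan)
```

In this run the first subjob gets two replicas, while the second stays unplaced because only one remaining node is large enough.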
Manage Dynamic Use • PC primary use must be respected! • Entropia Proprietary Sandbox • Guaranteed to run at idle priority • Limit application capability • Monitor page faults, network access • Management • Provide time-of-use windows • Different levels of unobtrusiveness • Gathers 95+ % of cycles
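A time-of-use window can be sketched as a simple policy check. This assumes the most conservative level of unobtrusiveness, where grid work runs only inside an allowed window and only while the desktop is not in use. The names `in_window` and `may_run` and the 18:00 to 08:00 window are hypothetical, not Entropia settings.

```python
from datetime import time

def in_window(t, start, end):
    """True if t falls in [start, end); windows may wrap past midnight."""
    if start <= end:
        return start <= t < end
    return t >= start or t < end

def may_run(t, windows, desktop_in_use):
    """Grid work runs only inside a window and only on an idle desktop."""
    if desktop_in_use:
        return False
    return any(in_window(t, start, end) for start, end in windows)

# Hypothetical policy: grid work allowed from 18:00 until 08:00 the next day.
windows = [(time(18, 0), time(8, 0))]

print(may_run(time(23, 0), windows, desktop_in_use=False))
print(may_run(time(12, 0), windows, desktop_in_use=False))
```

A less strict level could drop the `desktop_in_use` check and rely on idle-priority execution alone.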
Application Integration • Support any Win32 binary • Language Neutral (C, C++, Fortran, Java, C#, etc.) • Compiler/library Neutral • [Figure: Apps A, B, and C pass through the Application Preparation Tools onto the Open Grid Platform; Client1 and Client2 run applications with commands such as qsub and qstat]
Application Performance • [Figure: performance plots for HMMER, GOLD, AUTODOCK, and DOCK]
Challenges: Service Interoperability • Trying to force homogeneity on users is futile. Everyone has their own preferences, sometimes even dogma. • The Internet provides the model…
Typical Application • [Figure: components shown are ComputeServer (x2), SimulationTool, WebBrowser, WebPortal, RegistrationService, Camera (x2), TelepresenceMonitor, DataViewerTool, Database service (x3), ChatTool, DataCatalog, CredentialRepository, Certificate authority] • Users work with client applications • Application services organize VOs & enable access to other services • Collective services aggregate &/or virtualize resources • Resources implement standard access & management interfaces
Typical Application • Implementations are provided by a mix of • Application-specific code • “Off the shelf” tools and services • Tools and services from the Globus Toolkit • Tools and services from the Grid community (compatible with GT) • Glued together by… • Application development • System integration
How it Really Happens (without the Grid) • [Figure: same components as the typical application, in groups labeled A to E: ComputeServer (x2), SimulationTool, WebBrowser, WebPortal, RegistrationService, Camera (x2), TelepresenceMonitor, DataViewerTool, Database service (x3), ChatTool, DataCatalog, CredentialRepository, Certificate authority] • Component sources: Application Developer 9, Off the Shelf 13, Globus Toolkit 0, Grid Community 0 • Users work with client applications • Application services organize VOs & enable access to other services • Collective services aggregate &/or virtualize resources • Resources implement standard access & management interfaces
How it Really Happens (with the Grid) • [Figure: components shown are Globus GRAM (x2), ComputeServer (x2), SimulationTool, WebBrowser, Portal, Globus Index Service, Camera (x2), TelepresenceMonitor, DataViewerTool, Globus DAI (x3), Database service (x3), portlet, Globus MCS/RLS, MyProxy, CertificateAuthority] • Component sources: Application Developer 2, Off the Shelf 9, Globus Toolkit 4, Grid Community 4 • Users work with client applications • Application services organize VOs & enable access to other services • Collective services aggregate &/or virtualize resources • Resources implement standard access & management interfaces
What You Get in the Globus Toolkit • OGSI(3.x)/WSRF(4.x) Core Implementation • Used to develop and run OGSA-compliant Grid Services (Java, C/C++) • Basic Grid Services • Popular among current Grid users, common interfaces to the most typical services; includes both OGSA and non-OGSA implementations • Developer APIs • C/C++ libraries and Java classes for building Grid-aware applications and tools • Tools and Examples • Useful tools and examples based on the developer APIs
Components in Globus Toolkit 3.0 • Security - GSI, WS-Security • Data Management - WU GridFTP, RFT (OGSI), RLS • Resource Management - Pre-WS GRAM, WS GRAM (OGSI) • Information Services - MDS2, WS-Index (OGSI) • WS Core - JAVA WS Core (OGSI), OGSI C Bindings
Components in Globus Toolkit 3.2 • Security - GSI, WS-Security, CAS (OGSI), SimpleCA • Data Management - WU GridFTP, RFT (OGSI), RLS, OGSI-DAI, XIO • Resource Management - Pre-WS GRAM, WS GRAM (OGSI) • Information Services - MDS2, WS-Index (OGSI) • WS Core - JAVA WS Core (OGSI), OGSI C Bindings, OGSI Python Bindings (contributed), pyGlobus (contributed)