Performance Comparison of Grid Information Services. Beth Plale, Computer Science Dept., Indiana University. Unified Relational GIS Project. Collaborative project with Peter Dinda, Northwestern University.
Schemas in performance evaluation influenced by “Key Concepts and Services of a Grid Information Service”, Beth Plale, Peter Dinda, Gregor von Laszewski, IASTED Parallel and Distributed Computing Systems (PDCS), September 2002
Criteria for Inclusion in GIS • Definition: an object in the repository represents an entity in the real-world grid • A grid entity has a representation in the GIS repository if it: • can be described • has value to more than one application • has persistency needs beyond a single application run
Services Provided by GIS • Query interface: request for information through query language • e.g., SELECT … FROM … WHERE in SQL • Update interface: request to add/update information in repository • e.g., UPDATE … in SQL • Management interface: activation, deactivation of service
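The three interfaces above can be sketched roughly as follows, with Python's built-in sqlite3 standing in for the repository. The class name `GridInfoService`, the `Host` table, and its columns are hypothetical and chosen only for illustration; none of them come from the project.

```python
import sqlite3


class GridInfoService:
    """Toy GIS exposing query, update, and management interfaces (illustrative only)."""

    def __init__(self):
        self._conn = None  # no repository until the service is activated

    # Management interface: activation and deactivation of the service.
    def activate(self):
        self._conn = sqlite3.connect(":memory:")
        # Hypothetical one-table schema, just enough to demonstrate the calls.
        self._conn.execute("CREATE TABLE Host (name TEXT PRIMARY KEY, load REAL)")

    def deactivate(self):
        self._conn.close()
        self._conn = None

    def is_active(self):
        return self._conn is not None

    # Update interface: add or update information in the repository.
    def update(self, name, load):
        self._conn.execute(
            "INSERT OR REPLACE INTO Host (name, load) VALUES (?, ?)", (name, load)
        )

    # Query interface: request information through a query language (SQL here).
    def query(self, sql, params=()):
        return self._conn.execute(sql, params).fetchall()


gis = GridInfoService()
gis.activate()
gis.update("node01", 0.75)
gis.update("node01", 0.25)  # second call overwrites the existing row
gis.update("node02", 0.90)
lightly_loaded = gis.query("SELECT name FROM Host WHERE load < ?", (0.5,))
```

Keeping the management calls separate from query and update mirrors the slide's split between using the service and controlling its lifecycle.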
Additional GIS Functionality • Replication • Provision of replica transparency • Distribution (a grid-driven necessity) • Partitioning of information across sites. • Security interface • Object level or column level? • Access control
View of GIS Service Interoperability [Figure: numbered flow (1 to 4) among the GCE testbed portal, the GCE testbed XML schema, an XML doc converter, and three back ends: XML db (Xindice), mySQL, and LDAP. Arrow labels: Xpath query, SQL query, LDAP query, XML doc.]
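As a rough illustration of the converter step, the sketch below maps one very restricted XPath form onto an SQL query and an LDAP search filter. It is not the project's converter: the supported pattern (`/Entity[attr='value']`), the function names, and the use of `objectClass` for the entity name are all assumptions made for the example.

```python
import re

# Hypothetical restriction: only paths of the form /Entity[attr='value'] are handled.
_SIMPLE_XPATH = re.compile(r"^/(\w+)\[(\w+)='([^']*)'\]$")


def parse_simple_xpath(xpath):
    """Split a restricted XPath into (entity, attribute, value)."""
    match = _SIMPLE_XPATH.match(xpath)
    if match is None:
        raise ValueError(f"unsupported XPath for this sketch: {xpath!r}")
    return match.groups()


def to_sql(xpath):
    """Return a parameterized SQL string plus its parameter tuple."""
    entity, attr, value = parse_simple_xpath(xpath)
    return f"SELECT * FROM {entity} WHERE {attr} = ?", (value,)


def escape_ldap_value(value):
    """Escape the characters that are special inside an LDAP search filter."""
    replacements = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}
    return "".join(replacements.get(ch, ch) for ch in value)


def to_ldap_filter(xpath):
    """Return an LDAP filter, assuming the entity name maps to an objectClass."""
    entity, attr, value = parse_simple_xpath(xpath)
    return f"(&(objectClass={entity})({attr}={escape_ldap_value(value)}))"


xpath = "/Cluster[Organization='NPACI']"
sql, params = to_sql(xpath)
ldap_filter = to_ldap_filter(xpath)
```

A real converter would need a full XPath parser and a mapping from the XML schema to each back end's schema; the regular expression here only keeps the example short.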
Benchmark Evaluation of Alternate GIS Representations • Evaluation of three databases: relational (mySQL), LDAP (openLDAP), and XML (Xindice) • Database schemas: derived from single ER diagram and based partly on GLUE v8 • Benchmark: set of query and update use cases derived from Grid job submission. • Cost metric: minimized query response times, minimized update times, and minimized size of resulting query set.
Benchmark Evaluation Assumptions • Grid entities have complex relationships. • The questions asked of GIS data are becoming more complex. • Some entities require extremely rapid update rates. • Thus a cost metric that considers multiple aspects: • Minimized query response times, • Minimized update times, and • Minimized size of resulting query set.
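The three components of the cost metric can be collected along these lines. This is only a sketch of the measurement idea, using an in-memory sqlite3 database rather than any of the three systems under evaluation; the `Host` table, the `timed` helper, and the sample data are hypothetical.

```python
import sqlite3
import time


def timed(fn):
    """Run fn() and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Host (name TEXT PRIMARY KEY, load REAL)")
conn.executemany(
    "INSERT INTO Host VALUES (?, ?)",
    [(f"node{i:03d}", (i % 10) / 10) for i in range(100)],
)

# Query response time, plus the size of the resulting query set.
rows, query_seconds = timed(
    lambda: conn.execute("SELECT name FROM Host WHERE load < 0.5").fetchall()
)

# Update time for one rapidly changing attribute.
_, update_seconds = timed(
    lambda: conn.execute("UPDATE Host SET load = 0.95 WHERE name = 'node000'")
)

metrics = {
    "query_seconds": query_seconds,
    "update_seconds": update_seconds,
    "result_rows": len(rows),
}
```

A single timing like this is noisy; a real benchmark would presumably repeat each use case many times and report a summary statistic.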
Benchmark Evaluation [Figure: flowchart. Start: the GCE XML and GLUE v8 input schemas are represented as an E-R diagram, which is transformed into schemas for relational (mySQL), LDAP (openLDAP), and XML (Xindice). These are populated by scripts and existing data and evaluated against the Grid GIS Benchmark Use Cases, which come from GCE job submission use cases.]
Top 5 classes per data set
Set I: 05-’02, large multi-site project. Top 5 classes: MDSDevice 36.5 %, HostInfo 24.5 %, MDSDeviceGroup 13.5 %, top 8.5 %, MDSSoftware 7.0 %; total 90.0 %.
Set II: 01-’02, large academic HPC site. Top 5 classes: Globus Queue 42.0 %, GlobusServicesJobMgr 26.0 %, GlobusNetworkInterface 17.5 %, GlobusPhysicalResource 8.0 %, GlobusDaemon 6.0 %; total 100.0 %.
Set III: 11-’00, DOE site. Top 5 classes: GlobusFileInstance 80.0 %, GlobusQueueEntry 6.5 %, GlobusQueue 3.2 %, GlobusOrganization 1.8 %, GlobusServiceJobManager 1.8 %; total 94.5 %.
E-R Diagram (GLUE v8) [Figure: entities shown are computing elements, clusters, subclusters, nodes, hosts (compute nodes), network nodes, network cards, network paths, end-to-end connections, end points (host, port, protocol), network benchmarks, users, user accounts, applications, and application sources. Relationship labels: has, use, run on, is-a, and one label that survives only as "instan from". Measurement labels: traceroute, packet loss, latency.roundtripDelay.ping, bandwidth.avail.TCP.singleStream.]
Relational (table) representation [Figure: tables shown are computing elements, clusters, subclusters, nodes, hosts (compute nodes), network nodes, network cards, network paths, end-to-end connections, end points (host, port, protocol), network benchmarks, users, user accounts, applications, and application sources. Measurement labels: traceroute, packet loss, latency.roundtripDelay.ping, bandwidth.avail.TCP.singleStream.]
Hierarchical representation [Figure: tree rooted at EDTtop. Nodes shown: network nodes, network path, connections, endpoints, compute elements, clusters, subclusters, hosts (compute nodes), user, user accounts, application sources, application.]
Benchmark: set of Use Cases of GIS query and update • Use cases based on job submission. • examples drawn from HotPage (M. Thomas) • Query 1: Suppose user is part of NPACI organization and knows his/her binary runs better on T3E. • “Of machines in NPACI organization, give me list of T3Es and their location for which availability is good, a binary is resident, and I have an account.”
Return machines and locations
SELECT C.CPUmodel, C.name, C.location
FROM Cluster as C, SubCluster as SC, Host as H, Application as A, UserAccount as UA, User as U
WHERE
  -- Cluster is NPACI and user has binary on machine
  C.Organization = 'NPACI' and SC.OwningCluster = C.ClusterName and SC.CPUModel = 'T3E'
  and A.OSName = SC.OSName and A.Owner = 'Jane Lee' and A.Location = C.Location
  -- Availability is good
  and (For All H where H.OwningCluster = C.ClusterName: avg(H.SMPLoad1minX100) < 0.50)
  -- User has valid account on cluster
  and C.ClusterUniqueID = UA.ID and UA.ID = U.ID and U.Name = 'Jane Lee'
  and UA.ExpireDate > 21-July-2002 and UA.ActivateDate <= 21-July-2002
-> GLUEv8
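One way Query 1 might be written as executable SQL is sketched below, run through Python's sqlite3 on a tiny invented data set. The table layouts, the sample rows, ISO-formatted dates, and the choice to store load as a fraction (to match the slide's 0.50 threshold) are assumptions; the join keys follow the slide as written. The "for all hosts" availability condition becomes a correlated subquery over `Host`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Cluster (ClusterName TEXT, ClusterUniqueID INTEGER, Organization TEXT, Location TEXT);
CREATE TABLE SubCluster (OwningCluster TEXT, CPUModel TEXT, OSName TEXT);
CREATE TABLE Host (OwningCluster TEXT, SMPLoad1minX100 REAL);
CREATE TABLE Application (OSName TEXT, Owner TEXT, Location TEXT);
CREATE TABLE UserAccount (ID INTEGER, ExpireDate TEXT, ActivateDate TEXT);
CREATE TABLE "User" (ID INTEGER, Name TEXT);

-- Invented sample data: two NPACI T3E clusters, one lightly and one heavily loaded.
INSERT INTO Cluster VALUES ('t3e-a', 1, 'NPACI', 'Site A'), ('t3e-b', 2, 'NPACI', 'Site B');
INSERT INTO SubCluster VALUES ('t3e-a', 'T3E', 'UNICOS/mk'), ('t3e-b', 'T3E', 'UNICOS/mk');
INSERT INTO Host VALUES ('t3e-a', 0.20), ('t3e-a', 0.40), ('t3e-b', 0.70), ('t3e-b', 0.90);
INSERT INTO Application VALUES ('UNICOS/mk', 'Jane Lee', 'Site A'), ('UNICOS/mk', 'Jane Lee', 'Site B');
INSERT INTO UserAccount VALUES (1, '2003-01-01', '2002-01-01'), (2, '2003-01-01', '2002-01-01');
INSERT INTO "User" VALUES (1, 'Jane Lee'), (2, 'Jane Lee');
""")

QUERY_1 = """
SELECT DISTINCT SC.CPUModel, C.ClusterName, C.Location
FROM Cluster AS C
JOIN SubCluster AS SC ON SC.OwningCluster = C.ClusterName
JOIN Application AS A ON A.OSName = SC.OSName AND A.Location = C.Location
JOIN UserAccount AS UA ON UA.ID = C.ClusterUniqueID
JOIN "User" AS U ON U.ID = UA.ID
WHERE C.Organization = 'NPACI'                       -- cluster is NPACI
  AND SC.CPUModel = 'T3E'
  AND A.Owner = :user                                -- user has a binary on the machine
  AND U.Name = :user                                 -- user has a valid account
  AND UA.ActivateDate <= :today AND UA.ExpireDate > :today
  AND (SELECT AVG(H.SMPLoad1minX100) FROM Host AS H  -- availability is good
       WHERE H.OwningCluster = C.ClusterName) < 0.50
"""

machines = conn.execute(QUERY_1, {"user": "Jane Lee", "today": "2002-07-21"}).fetchall()
```

Only the lightly loaded cluster survives the availability test, which shows how the aggregate condition interacts with the six-way join.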
“Of machines in NPACI organization, give me list of T3Es and their location for which availability is good, a binary is resident, and I have an account.” • “Availability is good” could be defined differently: • -- Defined here as ‘average load over all nodes in an SMP is less than 0.50’. • -- More difficult is ‘existence of 20 contiguous nodes.’ • ‘Binary is resident’ is fairly easy; ‘binary is nearby’ is a harder question to answer. • “Show histographic usage of my job or show historical usage of machine X for task Y where Y is job submission or transfer rate to HPSS”
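The two definitions of "availability is good" can be contrasted in a short sketch. The function names are hypothetical, and the second function assumes nodes can be listed in a single linear order, which a real machine topology may not allow.

```python
def average_load_ok(loads, threshold=0.50):
    """Availability as 'average load over all nodes is below the threshold'."""
    return sum(loads) / len(loads) < threshold


def has_contiguous_free(free, needed=20):
    """True if `free` (booleans ordered by node position) contains a run of
    at least `needed` consecutive free nodes."""
    run = 0
    for is_free in free:
        run = run + 1 if is_free else 0  # a busy node resets the run
        if run >= needed:
            return True
    return False


light = average_load_ok([0.2, 0.4, 0.6])

# 38 free nodes in total, but a busy node splits them into two runs of 19.
split_machine = [True] * 19 + [False] + [True] * 19
open_machine = [True] * 19 + [False] + [True] * 20
```

The first definition needs only a per-cluster aggregate; the second depends on node ordering, which is one reason it is arguably harder to express as a simple query.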
Benchmark Evaluation [Figure: flowchart. Labels: start, GCE XML and GLUE v8 input schemas, E-R diagram, relational (mySQL), LDAP (openLDAP), XML (Xindice), Grid GIS Benchmark Use Cases, GCE job submission use cases, scripts and existing data.] http://www.cs.indiana.edu/~plale