
Nondeterministic Queries in a Relational Grid Information Service

Explore how to handle complex compositional queries efficiently in RGIS using nondeterministic approaches to provide random samples of result sets. Learn about query transformations and performance evaluations.


Presentation Transcript


  1. Nondeterministic Queries in a Relational Grid Information Service • Peter A. Dinda, Dong Lu • Prescience Lab, Department of Computer Science, Northwestern University • http://plab.cs.northwestern.edu

  2. Overview • RGIS: GIS system based on the relational data model using SQL • Complex compositional queries can be posed • “Find me 16 hosts on the same LAN that together have 32 GB of RAM” • Can be very expensive to answer • Joins: worst case O(n^m) for m tables of size n • Introduce nondeterminism • User gets random sample of result set • Automated query transformation

  3. Outline • Overview • Model • Implementation • Nondeterministic queries • Performance evaluation • Related work • Conclusions • References: D. Lu and P. Dinda, Synthesizing Realistic Computational Grids, SC 2003; D. Lu, J. Skicewicz, and P. Dinda, Scoped and Approximate Queries in a Relational Grid Information Service, Grid 2003

  4. RGIS Model of a Grid • Annotated network topology graph • Annotation examples • Hosts: memory, disk, OS, NICs, etc. • Router/Switch: backplane bandwidth, ports • Link: latency and bandwidth • Highly dynamic data in streams, not DB • Virtualization, Futures, Leases • Virtual machines [Figure: object types (module, endpoint, router, iplink, host, maclink, macswitch, connectorswitch, connectorlink) arranged in Software, Network, Data link, and Physical layers]


  6. [Figure: schema diagram with the labels Software, Network, Data Link, Physical, Metadata, Types, Security]

  7. RGIS Design (Per Site)

  8. RGIS Design (Intersite) • Site RGIS server pushes local updates to friend sites • Site RGIS server consolidates updates from site and friend sites • Site RGIS server answers all queries originating from its site [Figure: RGIS servers at sites A, B, and C, with arrows labeled "Update Push To Friend Site"]

  9. Insert/Update/Delete • Dual Xeon 1 GHz, 2 GB, 8x36 GB RAID5, Oracle 9i

  10. 2,700 lines of authored SQL • 4,000 lines of generated PL/SQL • 22,000 lines of authored Perl • Main dependencies • DBI to Oracle 9i • SOAP::Lite • CGI • Not finished yet!

  11. RGIS Design (Per Site) [Figure annotation: This talk]


  13. Motivation • Queries for compositions of resources are easily expressed in SQL, for example “Find 2 hosts with Linux that together have 3 GB of RAM”:
      select h1.insertid, h2.insertid
      from hosts h1, hosts h2
      where h1.os='LINUX' and h2.os='LINUX' and h1.mem_mb+h2.mem_mb>=3072
      • But such queries can be very expensive to execute • However, we typically don’t need the entire result set, just some rows, and not always the same ones • And we need them in a bounded amount of time
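To make the Motivation query concrete, here is a minimal sketch that runs it against a four-row table. It assumes Python's built-in sqlite3 as a stand-in for the Oracle database used in the talk. The table contents are invented for illustration, and only the columns the query touches are modeled.

```python
import sqlite3

# In-memory stand-in for the RGIS hosts table (illustrative rows only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hosts (insertid INTEGER PRIMARY KEY, os TEXT, mem_mb INTEGER)"
)
conn.executemany(
    "INSERT INTO hosts VALUES (?, ?, ?)",
    [(1, "LINUX", 2048), (2, "LINUX", 1024), (3, "WINDOWS", 4096), (4, "LINUX", 512)],
)

# The query from the slide: a self-join over hosts, so the work grows
# with the square of the table size.
QUERY = """
SELECT h1.insertid, h2.insertid
FROM hosts h1, hosts h2
WHERE h1.os = 'LINUX' AND h2.os = 'LINUX'
  AND h1.mem_mb + h2.mem_mb >= 3072
"""
pairs = conn.execute(QUERY).fetchall()
print(sorted(pairs))
```

As written, the query does not require h1 and h2 to be different rows, so a single 2 GB host paired with itself also satisfies it. Adding a condition such as `h1.insertid < h2.insertid` would be one way to exclude that, though the slides do not show one.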

  14. Why Not Just Limit? • Oracle rownum, MySQL limit clause • “Return first k rows of result set” • Problem: Always get the SAME answer • Problem: May STILL take a long time • Results not discovered until near the end • Problem: Query time related to DATA as well as k
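The "same answer" problem can be observed with a small experiment. This is a sketch under the same assumptions as before: sqlite3 stands in for Oracle or MySQL, and the rows are invented. SQL makes no ordering promise without ORDER BY, so the repeatability shown here is observed behavior of an unchanged table, not a guarantee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hosts (insertid INTEGER PRIMARY KEY, os TEXT, mem_mb INTEGER)"
)
# 100 invented Linux hosts with 1 to 4 GB of RAM.
conn.executemany(
    "INSERT INTO hosts VALUES (?, 'LINUX', ?)",
    [(i, 1024 * (1 + i % 4)) for i in range(100)],
)

LIMITED = """
SELECT h1.insertid, h2.insertid
FROM hosts h1, hosts h2
WHERE h1.os = 'LINUX' AND h2.os = 'LINUX'
  AND h1.mem_mb + h2.mem_mb >= 3072
LIMIT 5
"""
# Issue the identical limited query three times and compare the answers.
runs = [conn.execute(LIMITED).fetchall() for _ in range(3)]
print(runs[0] == runs[1] == runs[2])
```

Every caller who issues this query is handed the same five pairs, which is the behavior the nondeterministic approach is meant to avoid.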

  15. Query Approaches • All results • Nondeterministic results (this paper): return random sample of result set • Approximate results and scoped results: available in Grid 2003 paper

  16. Nondeterministic Version of Query
      select nondeterministically h1.insertid, h2.insertid
      from hosts h1, hosts h2
      where h1.os='LINUX' and h2.os='LINUX' and h1.mem_mb+h2.mem_mb>=3072
      within 2 seconds

  17. Implementing non-deterministic queries: Using Oracle-Specific Extensions • The Query Manager and Rewriter takes the nondeterministic query:
      select nondeterministically h1.insertid, h2.insertid
      from hosts h1, hosts h2
      where h1.os='LINUX' and h2.os='LINUX' and h1.mem_mb+h2.mem_mb>=3072
      within 2 seconds
      and rewrites it as:
      SELECT H1.INSERTID, H2.INSERTID
      FROM HOSTS H1 SAMPLE(P), HOSTS H2 SAMPLE(P)
      WHERE (H1.OS='LINUX' AND H2.OS='LINUX' AND H1.MEM_MB+H2.MEM_MB>=3072)
      • Random sample of input tables with selection probability P determined by time constraint and server load

  18. Implementing non-deterministic queries: Using Our Schema (Not Oracle-Specific; rest of talk) • The Query Manager and Rewriter takes the nondeterministic query:
      select nondeterministically h1.insertid, h2.insertid
      from hosts h1, hosts h2
      where h1.os='LINUX' and h2.os='LINUX' and h1.mem_mb+h2.mem_mb>=3072
      within 2 seconds
      and rewrites it as:
      SELECT H1.INSERTID, H2.INSERTID
      FROM HOSTS H1, HOSTS H2, INSERTIDS TEMP_H1, INSERTIDS TEMP_H2
      WHERE (H1.OS='LINUX' AND H2.OS='LINUX' AND H1.MEM_MB+H2.MEM_MB>=3072)
      AND (H1.INSERTID=TEMP_H1.INSERTID AND TEMP_H1.rand > 982663452.975047 AND TEMP_H1.rand <= 1025613125.93505)
      AND (H2.INSERTID=TEMP_H2.INSERTID AND TEMP_H2.rand > 1877769069.94039 AND TEMP_H2.rand <= 1920718742.90039)
      • Random sample of input tables with selection probability P determined by time constraint and server load

  19. Implementing non-deterministic queries [Figure: table pairing each host insertid with a random_number on the range 0 to N, with a selection window from x to x+y] • x: random starting point • y = P*N • Reshuffling requirement
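The window-based rewrite can be sketched as follows. This is an illustration under stated assumptions, not the RGIS rewriter itself. sqlite3 stands in for Oracle; `build_db` and `rewrite` are hypothetical helper names. N = 2^32 is inferred from the slide 18 example, where both windows are 42949672.96 wide, which equals 0.01 × 2^32. The sketch keeps each window inside [0, N] instead of wrapping around, which is a simplification. It does not address the reshuffling requirement.

```python
import random
import sqlite3

N = 2 ** 32  # assumed range of the stored random numbers (inferred, see above)
BASE_WHERE = "h1.os = 'LINUX' AND h2.os = 'LINUX' AND h1.mem_mb + h2.mem_mb >= 3072"


def build_db(num_hosts, rng):
    """Create invented hosts plus an insertids table holding one random number per host."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hosts (insertid INTEGER PRIMARY KEY, os TEXT, mem_mb INTEGER)")
    conn.execute("CREATE TABLE insertids (insertid INTEGER PRIMARY KEY, rand REAL)")
    for i in range(num_hosts):
        mem = rng.choice([512, 1024, 2048, 4096])
        conn.execute("INSERT INTO hosts VALUES (?, 'LINUX', ?)", (i, mem))
        conn.execute("INSERT INTO insertids VALUES (?, ?)", (i, rng.uniform(0, N)))
    return conn


def rewrite(p, rng):
    """Return (sql, params) sampling each input table with selection probability p."""
    y = p * N  # window width, y = P*N as on the slide
    windows, params = [], []
    for alias in ("h1", "h2"):
        x = rng.uniform(0, N - y)  # random starting point; no wraparound (assumption)
        windows.append(
            f"({alias}.insertid = temp_{alias}.insertid "
            f"AND temp_{alias}.rand > ? AND temp_{alias}.rand <= ?)"
        )
        params += [x, x + y]
    sql = (
        "SELECT h1.insertid, h2.insertid "
        "FROM hosts h1, hosts h2, insertids temp_h1, insertids temp_h2 "
        f"WHERE ({BASE_WHERE}) AND " + " AND ".join(windows)
    )
    return sql, params


rng = random.Random(0)
conn = build_db(200, rng)
full = set(conn.execute(
    f"SELECT h1.insertid, h2.insertid FROM hosts h1, hosts h2 WHERE {BASE_WHERE}"))
sql, params = rewrite(0.1, rng)
sample = set(conn.execute(sql, params))
print(len(sample), "of", len(full), "result rows returned")
```

Because each call to `rewrite` draws fresh starting points, repeated calls select different slices of the input tables, so different callers see different rows of the full result set. The window bounds are passed as bound parameters here, whereas the slide shows them as literals in the generated SQL.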

  20. Deadlines • Hard-limiting: fork a time-limited thread or process • Climbing: start with a low probability p and issue the query; if there are no results, double the probability and try again, continuing until there are results or no time is left • Estimation: like climbing, but do polynomial estimation over previous runs to estimate whether the next run will exceed the deadline
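The climbing strategy can be sketched in a few lines. The names `climb` and `run_query` and the starting probability of 0.01 are assumptions for illustration. `run_query(p)` is a placeholder for issuing the rewritten query with selection probability p. The sketch only checks the clock between runs; interrupting a run already in progress is the job of hard-limiting, and the estimation variant is not covered.

```python
import time


def climb(run_query, deadline_s, p0=0.01):
    """Re-issue run_query(p) with doubling p until rows come back or time runs out.

    Returns (rows, tried), where tried lists the probabilities attempted.
    """
    end = time.monotonic() + deadline_s
    p = p0
    tried = []
    while time.monotonic() < end:
        tried.append(p)
        rows = run_query(p)
        if rows:
            return rows, tried
        if p >= 1.0:
            break  # the whole table was already sampled; nothing left to try
        p = min(1.0, 2 * p)
    return [], tried


# Stand-in query: pretend results only appear once at least 5% is sampled.
def fake_query(p):
    return [("hostA", "hostB")] if p >= 0.05 else []


rows, tried = climb(fake_query, deadline_s=5.0)
print(rows, tried)
```

Starting low keeps the common case cheap: queries with plentiful results return after the first, smallest sample, and only rare results pay for the larger ones.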


  22. GridG: Synthesizing Realistic Computational Grids • Generates a Grid as an annotated layer 3 topology • Hosts, routers, links • Graph conforms to power laws of Internet topology • Annotations include: • memory, clock speed, cpu type, number of CPUs, operating system type, link bandwidths, router bandwidths, etc. • Memory distribution according to Smith study of MDS contents • http://www.cs.northwestern.edu/~urgis/GridG

  23. Test Grids

  24. Nondeterministic query performance • Query: select two hosts that together have >3GB of RAM • Meaningful tradeoff between query processing time and result set size is possible

  25. Nondeterministic query performance • Query: select n hosts that together have >3GB of RAM, holding query time constant • Can use tradeoff to control query time independent of query complexity

  26. Deadlines • Find 2 hosts with collective 600 GB RAM (VERY RARE) in 50K host grid [Plot labels: Max, Min]

  27. Extending RGIS to Support Grid Computing On Virtual Machines • Virtuals • Each RGIS object has a unique id • Virtualization table associates unique id of virtual resources with unique ids of their constituent physical resources • Virtual nature of resource is hidden unless query explicitly requests it • Futures • An RGIS object that does not exist yet • Futures table of unique ids • Future nature of resource hidden unless query explicitly requests it
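One way to picture the virtualization table is the sketch below. The table and column names (`virtualization`, `virtual_id`, `physical_id`) and the rows are hypothetical, and sqlite3 again stands in for the real database; the slides do not show the actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hosts (insertid INTEGER PRIMARY KEY, os TEXT, mem_mb INTEGER);
CREATE TABLE virtualization (virtual_id INTEGER, physical_id INTEGER);
INSERT INTO hosts VALUES (1, 'LINUX', 4096);   -- physical host
INSERT INTO hosts VALUES (2, 'LINUX', 2048);   -- physical host
INSERT INTO hosts VALUES (10, 'LINUX', 1024);  -- virtual machine hosted on host 1
INSERT INTO virtualization VALUES (10, 1);
""")

# Ordinary query: virtual and physical hosts are indistinguishable.
all_linux = [row[0] for row in conn.execute(
    "SELECT insertid FROM hosts WHERE os = 'LINUX' ORDER BY insertid")]

# Explicit query: join the virtualization table to see what backs each virtual host.
backing = conn.execute("""
    SELECT h.insertid, v.physical_id
    FROM hosts h JOIN virtualization v ON h.insertid = v.virtual_id
""").fetchall()

print(all_linux, backing)
```

Keeping the mapping in a separate table means existing queries need no changes when virtual resources are added; only queries that care about the distinction pay for the extra join. A futures table of unique ids could be consulted the same way.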

  28. Related Work • SLP, X.500, LDAP • Condor ClassAds • MDS • R-GMA • Redline • Random sampling from databases • Olken, others

  29. Conclusions • GIS system based on relational data model • Powerful queries, but expensive to execute • Nondeterminism to control query time • Can be implemented without RDBMS support • Automated query translation in RGIS • Several techniques to implement deadlines for queries

  30. People and Acknowledgements • Students • Jason Skicewicz, Andrew Weinrich (Web + Soap), Jack Lange (CDN) • Collaborator • Relational Grid Resources Project at Indiana • Beth Plale • http://www.cs.indiana.edu/~plale/projects/RGR • Funder • NSF

  31. For More Information • URGIS Site • http://www.cs.northwestern.edu/~urgis • Prescience Lab • http://plab.cs.northwestern.edu • Join The User Comfort Study! http://comfort.cs.northwestern.edu (Special Advertising Section)
